ListMerge: Accelerating Top-k Aggregation Queries Over Large Number of Lists

Zhang, Shile; Sun, Chao; He, Zhenying

doi:10.1007/978-3-319-32049-6_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9643)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Sorted lists are widely used for feature indexing in a variety of applications, such as multimedia databases and information retrieval. Answering top-k aggregation queries over a set of lists plays an increasingly important role in these domains. Unfortunately, existing solutions, such as threshold-style (TA-style) algorithms, do not guarantee superior performance on a large number of lists. In this paper, we introduce a merge-based strategy, called ListMerge, to accelerate TA-style algorithms. ListMerge exploits a critical observation about TA-style algorithms: if the aggregation function is monotone and distributive, it is much more efficient to merge several lists together first and then apply a TA-style algorithm. This observation also inspires our cost model, which estimates the best number of lists to merge. Experimental results show that ListMerge can outperform the baseline algorithms by up to 4–20 times on synthetic datasets generated from various distributions.
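The observation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes SUM as the monotone, distributive aggregation function, non-negative scores, and a score of 0 for an object missing from a list. The names `threshold_algorithm`, `merge_lists`, `group_size` and `example_lists` are hypothetical, and the fixed `group_size` stands in for the value that the paper's cost model would choose.

```python
import heapq
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # object id -> score within one list


def sorted_list(scores: Scores) -> List[Tuple[str, float]]:
    """Return (id, score) pairs in descending score order, ties broken by id."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def threshold_algorithm(lists: List[Scores], k: int) -> List[Tuple[str, float]]:
    """Textbook threshold algorithm with SUM as the aggregation function.

    Assumes non-negative scores; an object missing from a list scores 0.
    """
    ordered = [sorted_list(s) for s in lists]
    depth = max(len(o) for o in ordered)
    seen: Dict[str, float] = {}
    for pos in range(depth):
        threshold = 0.0
        for o in ordered:
            if pos < len(o):
                obj, score = o[pos]      # sorted access
                threshold += score
                if obj not in seen:      # random access to every list
                    seen[obj] = sum(s.get(obj, 0.0) for s in lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: (kv[1], kv[0]))
        # Stop once the k-th best aggregate reaches the threshold.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: (kv[1], kv[0]))


def merge_lists(lists: List[Scores], group_size: int) -> List[Scores]:
    """Pre-aggregate consecutive groups of lists into one list per group.

    Valid because SUM is distributive: the sum of group sums equals the
    sum over all original lists.
    """
    merged: List[Scores] = []
    for i in range(0, len(lists), group_size):
        combined: Scores = {}
        for s in lists[i:i + group_size]:
            for obj, score in s.items():
                combined[obj] = combined.get(obj, 0.0) + score
        merged.append(combined)
    return merged


# Illustrative data only: four lists over four objects.
example_lists = [
    {"a": 9, "b": 7, "c": 1, "d": 3},
    {"a": 2, "b": 8, "c": 6, "d": 1},
    {"a": 5, "b": 4, "c": 9, "d": 2},
    {"a": 7, "b": 1, "c": 3, "d": 8},
]
direct = threshold_algorithm(example_lists, k=2)
via_merge = threshold_algorithm(merge_lists(example_lists, group_size=2), k=2)
```

Both calls return the same top-2 objects with the same aggregate scores; the merged run simply scans two lists per round instead of four. The sketch makes no claim about the speedup itself, which depends on the data distribution and on how many lists are merged.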

The work was partially supported by the National Natural Science Foundation of China (No. 61370080, No. 61170007) and Science and Technology Commission of Shanghai Municipality (No. 14511106802).

Notes

1. Just-in-time (JIT) compilation, also known as dynamic translation, is compilation done during execution of a program.

Author information

Authors and Affiliations

School of Software Engineering, Fudan University, Shanghai, China
Shile Zhang
School of Computer Science, Fudan University, Shanghai, China
Chao Sun & Zhenying He
Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China
Shile Zhang, Chao Sun & Zhenying He

Corresponding author

Correspondence to Zhenying He.

Editor information

Editors and Affiliations

Georgia Institute of Technology, Atlanta, Georgia, USA
Shamkant B. Navathe
University of Texas at Dallas, Richardson, Texas, USA
Weili Wu
University of Minnesota, Minneapolis, Minnesota, USA
Shashi Shekhar
Renmin University, Beijing, China
Xiaoyong Du
Fudan University, Shanghai, China
Sean X. Wang
Rutgers, The State University of New Jer, New Brunswick, New Jersey, USA
Hui Xiong

About this paper

Cite this paper

Zhang, S., Sun, C., He, Z. (2016). ListMerge: Accelerating Top-k Aggregation Queries Over Large Number of Lists. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, S., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9643. Springer, Cham. https://doi.org/10.1007/978-3-319-32049-6_5

DOI: https://doi.org/10.1007/978-3-319-32049-6_5
Published: 25 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32048-9
Online ISBN: 978-3-319-32049-6
eBook Packages: Computer Science, Computer Science (R0)
