Abstract
Sorted lists are widely used for feature indexing in a variety of applications, such as multimedia databases and information retrieval. Answering top-k aggregation queries over a set of lists plays an increasingly important role in these domains. Unfortunately, existing solutions, such as threshold-style (TA-style) algorithms, do not guarantee superior performance on a large number of lists. In this paper, we introduce a merge-based strategy, called ListMerge, to accelerate TA-style algorithms. ListMerge exploits a critical observation about TA-style algorithms: if the aggregation function is monotone and distributive, it is much more efficient to first merge several lists together and then apply a TA-style algorithm. This observation also inspires the development of our cost model, which estimates the best number of lists to merge. Experimental results show that ListMerge can outperform the baseline algorithms by up to 4–20 times on synthetic datasets generated from various distributions.
The work was partially supported by the National Natural Science Foundation of China (No. 61370080, No. 61170007) and Science and Technology Commission of Shanghai Municipality (No. 14511106802).
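The merge-then-query idea in the abstract can be illustrated with a minimal sketch. It is not the authors' ListMerge implementation: it assumes the aggregation function is a plain sum, that every list scores the same set of objects, and that scores are integers. The names `threshold_algorithm` and `merge_lists` are hypothetical, and the sketch ignores both the cost of building the merged lists and the paper's cost model for choosing how many lists to merge.

```python
import heapq


def threshold_algorithm(lists, k, agg=sum):
    """Textbook-style threshold algorithm (TA) over score tables.

    `lists` is a sequence of dicts {object_id: score}; every dict is
    assumed to contain the same object ids (an assumption of this sketch).
    Returns (top_k, sorted_accesses), where top_k is a list of
    (object_id, aggregated_score) pairs in descending score order.
    """
    # Sorted-access view of each list, best score first.
    views = [sorted(t.items(), key=lambda kv: kv[1], reverse=True) for t in lists]
    scores = {}          # object_id -> aggregated score over all lists
    sorted_accesses = 0
    top = []
    for depth in range(len(views[0])):
        frontier = []    # last score seen under sorted access in each list
        for view in views:
            obj, score = view[depth]
            sorted_accesses += 1
            frontier.append(score)
            if obj not in scores:
                # Random access: fetch the object's score from every list.
                scores[obj] = agg(t[obj] for t in lists)
        # No unseen object can score above the aggregate of the frontier.
        threshold = agg(frontier)
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            break
    return top, sorted_accesses


def merge_lists(lists, group_size, agg=sum):
    """Merge consecutive groups of `group_size` lists into one list each.

    Valid only when aggregating the per-group aggregates equals the
    aggregate over all original lists, as is the case for sum.
    """
    merged = []
    for start in range(0, len(lists), group_size):
        group = lists[start:start + group_size]
        merged.append({obj: agg(t[obj] for t in group) for obj in group[0]})
    return merged


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    data = [{obj: rng.randint(0, 1000) for obj in range(500)} for _ in range(16)]
    plain, plain_cost = threshold_algorithm(data, k=5)
    merged, merged_cost = threshold_algorithm(merge_lists(data, group_size=4), k=5)
    print("same top-k scores:", [s for _, s in plain] == [s for _, s in merged])
    print("sorted accesses (plain, merged):", plain_cost, merged_cost)
```

Because a sum of partial sums equals the overall sum, running TA on the merged lists returns the same top-k scores as running it on the original lists. The returned sorted-access counts give a rough way to compare the two runs, though they omit the merging cost that a real evaluation would have to include.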
Notes
1. Just-in-time (JIT) compilation, also known as dynamic translation, is compilation done during execution of a program.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, S., Sun, C., He, Z. (2016). ListMerge: Accelerating Top-k Aggregation Queries Over Large Number of Lists. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, S., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9643. Springer, Cham. https://doi.org/10.1007/978-3-319-32049-6_5
DOI: https://doi.org/10.1007/978-3-319-32049-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32048-9
Online ISBN: 978-3-319-32049-6
eBook Packages: Computer Science, Computer Science (R0)