ListMerge: Accelerating Top-k Aggregation Queries Over Large Number of Lists

  • Shile Zhang
  • Chao Sun
  • Zhenying He
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9643)

Abstract

Sorted lists are widely used for feature indexing in a variety of applications, such as multimedia databases and information retrieval. Answering top-k aggregation queries on a set of lists plays an increasingly important role in these domains. Unfortunately, existing solutions, such as threshold-style (TA-style) algorithms, do not guarantee superior performance on a large number of lists. In this paper, we introduce a merge-based strategy, called ListMerge, to accelerate TA-style algorithms. ListMerge exploits a critical observation about TA-style algorithms: if the aggregation function is monotone and distributive, it is much more efficient to merge several lists together and then apply a TA-style algorithm. This observation also inspires the development of our cost model, which estimates the best number of lists to merge. Experimental results show that ListMerge can outperform the baseline algorithms by a factor of 4 to 20 on synthetic datasets generated from various distributions.
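
The merge idea in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes sum as the aggregation function (which is monotone and distributive), assumes every object appears in every list, and uses hypothetical helper names (`threshold_algorithm`, `merge_lists`, `group_size`). It runs a textbook threshold algorithm once on the original lists and once on lists that were pre-merged in groups of two, and both runs return the same top-k objects.

```python
import heapq


def threshold_algorithm(lists, k):
    """Textbook TA with sum aggregation (illustrative sketch).

    lists: list of dicts {object: score}; every object is assumed to
    appear in every list.
    """
    # Sorted access order: each list sorted by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}  # object -> aggregated score
    top = []
    for depth in range(len(sorted_lists[0])):
        last_scores = []
        for sorted_list in sorted_lists:
            obj, score = sorted_list[depth]  # sorted access
            last_scores.append(score)
            if obj not in seen:
                # Random access: fetch this object's score from every list.
                seen[obj] = sum(l[obj] for l in lists)
        # No unseen object can score higher than the threshold.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            break
    return top


def merge_lists(lists, group_size):
    """Pre-aggregate consecutive groups of lists into one list each.

    Valid here because sum is distributive: the sum over all lists
    equals the sum of the per-group sums.
    """
    merged = []
    for start in range(0, len(lists), group_size):
        group = lists[start:start + group_size]
        merged.append({obj: sum(l[obj] for l in group) for obj in group[0]})
    return merged


# Hypothetical sample data: four lists over three objects.
lists = [
    {"a": 0.9, "b": 0.5, "c": 0.1},
    {"a": 0.2, "b": 0.8, "c": 0.3},
    {"a": 0.7, "b": 0.6, "c": 0.4},
    {"a": 0.1, "b": 0.9, "c": 0.2},
]

plain = [obj for obj, _ in threshold_algorithm(lists, 2)]
merged = [obj for obj, _ in threshold_algorithm(merge_lists(lists, 2), 2)]
print(plain, merged)
```

In this sketch the group size is fixed by hand. The abstract states that the paper's cost model evaluates the best number of lists to merge, which the sketch does not attempt to reproduce.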

Keywords

Random Access, Cost Model, Aggregation Function, Execution Cost, Access Cost
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Software Engineering, Fudan University, Shanghai, China
  2. School of Computer Science, Fudan University, Shanghai, China
  3. Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China
