The VLDB Journal, Volume 27, Issue 1, pp 27–52

Optimal algorithms for selecting top-k combinations of attributes: theory and applications

  • Chunbin Lin
  • Jiaheng Lu
  • Zhewei Wei
  • Jianguo Wang
  • Xiaokui Xiao
Regular Paper

Abstract

Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining, and databases. They are designed to discover the k objects, e.g., the top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications such as query recommendation require the best combinations of attributes rather than the best objects. A straightforward extension of the existing top-k algorithms is prohibitively expensive for this task because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k,m, which aims to find the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k,m algorithms that differ in their data access methods, i.e., sorted accesses and random accesses, and in their query certainties, i.e., exact and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for efficient query evaluation that reduce the computational cost, the memory cost, and the number of accesses. We provide a case study on a real application of top-k,m queries in an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k,m algorithms on multiple real-life datasets.
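
To make the query semantics concrete, the following is a minimal sketch of the naive enumeration baseline that the abstract describes as prohibitively expensive; it is not one of the article's algorithms. It assumes that attributes are organized into groups, that a combination picks one attribute from each group, that an object's score within a combination is the sum of its per-attribute scores, and that a combination's score is the sum of its top-m object scores. The function name `top_km_bruteforce`, the data layout, and the sample scores are all hypothetical.

```python
from heapq import nlargest
from itertools import product


def top_km_bruteforce(groups, k, m):
    """Return the k best attribute combinations as (score, names) pairs.

    groups: list of dicts, each mapping attribute name -> {object_id: score}.
    Assumption: sum is used both to aggregate an object's scores across the
    chosen attributes and to aggregate the top-m objects of a combination.
    """
    results = []
    # Enumerate every combination: one attribute per group (exponential cost).
    for combo in product(*(group.items() for group in groups)):
        names = tuple(name for name, _ in combo)
        score_maps = [scores for _, scores in combo]
        # Assumption: only objects present in every chosen attribute count.
        common = set(score_maps[0]).intersection(*score_maps[1:])
        object_scores = [sum(s[obj] for s in score_maps) for obj in common]
        combo_score = sum(nlargest(m, object_scores))
        results.append((combo_score, names))
    return nlargest(k, results)


# Hypothetical data: two groups with two attributes each, three objects.
groups = [
    {"A1": {"o1": 9, "o2": 8, "o3": 1}, "A2": {"o1": 5, "o2": 4, "o3": 3}},
    {"B1": {"o1": 7, "o2": 2, "o3": 6}, "B2": {"o1": 1, "o2": 9, "o3": 2}},
]
best = top_km_bruteforce(groups, k=2, m=2)
print(best)  # [(27, ('A1', 'B2')), (26, ('A1', 'B1'))]
```

The loop visits every element of the Cartesian product of the groups, which is why this baseline scales exponentially with the number of groups and motivates algorithms that stop early using sorted and random accesses.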

Keywords

  • Top-k query
  • Top-k,m query
  • Instance optimal algorithm

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of California, San Diego, USA
  2. Department of Computer Science, University of Helsinki, Helsinki, Finland
  3. School of Information, Renmin University of China, Beijing, China
  4. School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore