Knowledge and Information Systems, Volume 47, Issue 2, pp 301–328

TKAP: Efficiently processing top-k query on massive data by adaptive pruning

  • Xixian Han
  • Xianmin Liu
  • Jianzhong Li
  • Hong Gao
Regular Paper

Abstract

In many applications, the top-k query is an important operation that returns a set of interesting points from a potentially huge data space. Existing algorithms cannot process top-k queries on massive data efficiently, because they either maintain too many candidates, require auxiliary structures built on a specific attribute subset, or return results with only a probabilistic guarantee. This paper proposes TKAP, a sorted-list-based algorithm that uses data structures with low space overhead to compute top-k results on massive data efficiently. During round-robin retrieval over the sorted lists, TKAP performs adaptive pruning and maintains the required candidates until the stop condition is satisfied. The pruning adapts to the information obtained during round-robin retrieval, which yields a better pruning effect. This paper develops the adaptive pruning rule together with its theoretical analysis. Extensive experiments on synthetic and real-life data sets show that TKAP has a significant advantage over existing algorithms.
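The abstract does not spell out TKAP's adaptive pruning rule, so the sketch below illustrates only the generic sorted-list framework such algorithms build on: round-robin sorted access over one descending list per attribute, with a threshold-based stop condition, in the style of Fagin et al.'s Threshold Algorithm (TA). It is not TKAP. It assumes a sum scoring function where larger is better, data small enough to hold in memory, and random access by object id; the function name `threshold_topk` is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_topk(data: Dict[int, List[float]], k: int) -> List[Tuple[float, int]]:
    """Hypothetical TA-style top-k sketch (not TKAP).

    Score = sum of attributes, larger is better. Returns (score, id) pairs,
    best first. Assumes every object has the same number of attributes.
    """
    if not data or k <= 0:
        return []
    m = len(next(iter(data.values())))
    # One sorted list per attribute: (value, object id), descending by value.
    lists = [
        sorted(((vals[i], oid) for oid, vals in data.items()), reverse=True)
        for i in range(m)
    ]
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top-k
    seen = set()
    for depth in range(len(data)):
        # Round-robin: one sorted access per list in each round.
        for i in range(m):
            _, oid = lists[i][depth]
            if oid in seen:
                continue
            seen.add(oid)
            score = sum(data[oid])  # random access to all attributes of oid
            if len(heap) < k:
                heapq.heappush(heap, (score, oid))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, oid))
        # Threshold: the best score any still-unseen object could reach.
        threshold = sum(lists[i][depth][0] for i in range(m))
        if len(heap) == k and heap[0][0] >= threshold:
            break  # stop condition: no unseen object can enter the top-k
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    data = {1: [0.9, 0.1], 2: [0.8, 0.8], 3: [0.1, 0.9], 4: [0.5, 0.6]}
    print([oid for _, oid in threshold_topk(data, 2)])  # [2, 4]
```

On the four-object example, the scan stops after the third round, when the k-th best score (1.1) reaches the threshold 0.5 + 0.6, so object 3's low second attribute is never reached by sorted access. According to the abstract, TKAP's contribution lies in how candidates are maintained and pruned adaptively during this kind of retrieval, which the sketch does not model.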

Keywords

Massive data · TKAP algorithm · Sorted list · Adaptive pruning

Notes

Acknowledgments

This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61402130, 61272046, 61190115, 61173022 and 61033015, the Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, and the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308.

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Xixian Han (1)
  • Xianmin Liu (1)
  • Jianzhong Li (1)
  • Hong Gao (1)
  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
