Advertisement

Knowledge and Information Systems

, Volume 43, Issue 3, pp 689–718 | Cite as

TDEP: efficiently processing top-k dominating query on massive data

  • Xixian HanEmail author
  • Jianzhong Li
  • Hong Gao
Regular Paper

Abstract

In many applications including multicriteria decision making, top-k dominating query is a practically useful tool to return k tuples with the highest domination scores in a potentially huge data space. The existing algorithms, either requiring indexes built on the specific attribute subset, or incurring high I/O cost or memory cost, cannot process top-k dominating query on massive data efficiently. In this paper, a novel algorithm TDEP is proposed to utilize sorted lists built for each attribute with low cost to return top-k dominating result on massive data efficiently. Through analysis, it is found that TDEP can be divided into two phases: growing phase and shrinking phase. In each phase, TDEP retrieves the sorted lists in round-robin fashion and maintains the candidates until the stop condition is satisfied. The theoretical analysis is provided for the execution behavior in two phases. An efficient method is developed to compute the domination scores of tuples with the obtained candidates only. Besides, TDEP adopts early pruning to reduce the number of candidate tuples maintained significantly. The extensive experimental results, conducted on synthetic and real-life data sets, show the significant performance advantage of TDEP over the existing algorithms.

Keywords

Massive data Top-k dominating query TDEP algorithm Sorted lists  Early pruning 

Notes

Acknowledgments

We thank anonymous reviewers for their very useful comments and suggestions. This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61190115, 61173022, 61033015, 60831160525, 61272046, and 60903016, Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308, and National Science & Technology Pillar Program under Grant Nos. 2012BAA13B01, 2012BAH10F03, 2013BAH17F00.

References

  1. 1.
    Akbarinia R, Pacitti E, Valduriez P (2007) Best position algorithms for top-k queries. In: Proceedings of the 33rd international conference on very large data bases, pp 495–506Google Scholar
  2. 2.
    Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426CrossRefzbMATHGoogle Scholar
  3. 3.
    Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430Google Scholar
  4. 4.
    Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, pp 717–719Google Scholar
  5. 5.
    Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 102–113Google Scholar
  6. 6.
    Fagin R, Lotem A, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656CrossRefzbMATHMathSciNetGoogle Scholar
  7. 7.
    Fan H, Zaïane O, Foss A, Wu J (2009) Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19(1):31–51CrossRefGoogle Scholar
  8. 8.
    Feller W (1968) An introduction to probability theory, vol 1, 3rd edn. Wiley, New YorkzbMATHGoogle Scholar
  9. 9.
    Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28CrossRefGoogle Scholar
  10. 10.
    Güntzer U, Balke W, Kießling W (2000) Optimizing multi-feature queries for image databases. In: Proceedings of the 26th international conference on very large data bases, pp 419–428Google Scholar
  11. 11.
    Güntzer U, Balke W, Kießling W (2001) Towards efficient multi-feature queries in heterogeneous environments. In: Proceedings of the international conference on information technology: coding and computing, pp 622–628Google Scholar
  12. 12.
    Han X, Li J, Yang D (2011) Supporting early pruning in top-k query processing on massive data. Inf Process Lett 111(11):524–532CrossRefzbMATHMathSciNetGoogle Scholar
  13. 13.
    Han X, Li J, Wang J, Yang D (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535CrossRefGoogle Scholar
  14. 14.
    Hose K, Vlachou A (2012) A survey of skyline processing in highly distributed environments. VLDB J 21(3):359–384CrossRefGoogle Scholar
  15. 15.
    Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183CrossRefGoogle Scholar
  16. 16.
    Ilyas I, Beskales G, Soliman M (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):11CrossRefGoogle Scholar
  17. 17.
    Kontaki M, Papadopoulos A, Manolopoulos Y (2012) Continuous top-k dominating queries. IEEE Trans Knowl Data Eng 24(5):840–853CrossRefGoogle Scholar
  18. 18.
    Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286Google Scholar
  19. 19.
    Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 660–671Google Scholar
  20. 20.
    Mamoulis N, Yiu M, Cheng K, Cheung D (2007) Efficient top-k aggregation of ranked inputs. ACM Trans Database Syst 32(3):19CrossRefGoogle Scholar
  21. 21.
    Pang H, Ding X, Zheng B (2010) Efficient processing of exact top-k queries over disk-resident sorted lists. VLDB J 19(3):437–456CrossRefGoogle Scholar
  22. 22.
    Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82CrossRefGoogle Scholar
  23. 23.
    Salam A, Khayal M (2012) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst 30(1):57–86CrossRefGoogle Scholar
  24. 24.
    Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116Google Scholar
  25. 25.
    Skoutas D, Sacharidis D, Simitsis A, Kantere V, Sellis T (2009) Top-k dominant web services under multi-criteria matching. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 898–909Google Scholar
  26. 26.
    Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606CrossRefGoogle Scholar
  27. 27.
    Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310Google Scholar
  28. 28.
    Tiakas E, Papadopoulos A, Manolopoulos Y (2011) Progressive processing of subspace dominating queries. VLDB J 20(6):921–948CrossRefGoogle Scholar
  29. 29.
    Yang B, Huang H (2010) Topsil-miner: an efficient algorithm for mining top-k significant itemsets over data streams. Knowl Inf Syst 23(2):225–242CrossRefGoogle Scholar
  30. 30.
    Yiu M, Mamoulis N (2007) Efficient processing of top-k dominating queries on multi-dimensional data. In: Proceedings of the 33rd international conference on very large data, bases, pp 483–494Google Scholar
  31. 31.
    Yiu M, Mamoulis N (2009) Multi-dimensional top-k dominating queries. VLDB J 18(3):695–718CrossRefGoogle Scholar
  32. 32.
    Zhang W, Lin X, Zhang Y, Pei J, Wang W (2010) Threshold-based probabilistic top-k dominating queries. VLDB J 19(2):283–305CrossRefGoogle Scholar
  33. 33.
    Zipf G (1949) Human behaviour and the principle of least-effort. Addison-Wesley, CambridgeGoogle Scholar

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  1. 1.School of Computer Science and TechnologyHarbin Institute of TechnologyHarbinChina

Personalised recommendations