Advertisement

PRS: efficient range skyline computation on massive data via presorting

  • Xixian Han
  • Xue Li
  • Bailing Wang
  • Hong Gao
Regular Paper
  • 8 Downloads

Abstract

In many applications, range skyline query is an important operation to find the interesting tuples in a potentially huge data space. Given selection condition, range skyline query returns tuples both satisfying the specified selection condition and not dominated by other satisfying tuples. It is found that most of the existing skyline algorithms do not consider the selection condition. This paper proposes a novel table-scan-based Presorted-table-based Range Skyline (PRS) algorithm to efficiently compute range skyline results on massive data. PRS first presorts the table for early termination. The early termination checking is proposed in this paper, along with the theoretical analysis of scan depth. The selection checking and dominance checking are devised in this paper to skip the unsatisfying or dominated tuples directly. The theoretical analysis proves that the overwhelming majority of candidates can be skipped. The extensive experimental results, conducted on synthetic and real-life data sets, show that PRS outperforms the existing algorithms significantly.

Keywords

Massive data PRS algorithm Early termination Selection checking Dominance checking 

Notes

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61872106, 61632010, 61502121 and 61402130, National key research and development program of China under Grant No. 2016YFB1000703 and Weihai-HIT co-construction program under Grant No. ZMZ001702.

Supplementary material

10115_2018_1310_MOESM1_ESM.zip (2.2 mb)
Supplementary material 1 (zip 2301 KB)

References

  1. 1.
    Berger M (1987) Geometry I. Springer, BerlinCrossRefGoogle Scholar
  2. 2.
    Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430Google Scholar
  3. 3.
    Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, pp 717–719Google Scholar
  4. 4.
    Dellis E, Vlachou A, Vladimirskiy I, et al (2006) Constrained subspace skyline computation. In: Proceedings of the 15th ACM international conference on information and knowledge management, pp 415–424Google Scholar
  5. 5.
    Endres M (2015) The structure of preference orders. In: Proceedings of 19th East European conference on advances in databases and information systems, pp 32–45Google Scholar
  6. 6.
    Endres M, Kießling W (2011a) Semi-skyline optimization of constrained skyline queries. In: 22nd Australasian database conference, pp 7–16Google Scholar
  7. 7.
    Endres M, Kießling W (2011b) Skyline snippets. In: Proceedings of 9th international conference on flexible query answering systems, pp 246–257Google Scholar
  8. 8.
    Endres M, Kießling W (2014) High parallel skyline computation over low-cardinality domains. In: Proceedings of the 18th East European conference on advances in databases and information systems, pp 97–111Google Scholar
  9. 9.
    Endres M, Kießling W (2015) Parallel skyline computation exploiting the lattice structure. J Database Manag 26(4):18–43CrossRefGoogle Scholar
  10. 10.
    Endres M, Preisinger T (2017) Beyond skylines: explicit preferences. In: Proceedings of the 22nd international conference on database systems for advanced applications, Part I, pp 327–342Google Scholar
  11. 11.
    Endres M, Roocks P, Kießling W (2015) Scalagon: an efficient skyline algorithm for all seasons. In: Proceedings of 20th international conference on database systems for advanced applications, Part II, pp 292–308Google Scholar
  12. 12.
    Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 102–113Google Scholar
  13. 13.
    Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28CrossRefGoogle Scholar
  14. 14.
    Han X, Li J, Gao H (2015) Efficient top-k retrieval on massive data. IEEE Trans Knowl Data Eng 27(10):2687–2699CrossRefGoogle Scholar
  15. 15.
    Han X, Li J, Yang D, Wang J (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535CrossRefGoogle Scholar
  16. 16.
    Hsueh Y, Lin C, Chang C (2017) An efficient indexing method for skyline computations with partially ordered domains. IEEE Trans Knowl Data Eng 29(5):963–976CrossRefGoogle Scholar
  17. 17.
    Kießling W, Endres M, Wenzel F (2011) The preference SQL system—an overview. IEEE Data Eng Bull 34(2):11–18Google Scholar
  18. 18.
    Korn F, Pagel B, Faloutsos C (2001) On the ’dimensionality curse’ and the ’self-similarity blessing’. IEEE Trans Knowl Data Eng 13(1):96–111CrossRefGoogle Scholar
  19. 19.
    Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286Google Scholar
  20. 20.
    Kung H, Luccio F, Preparata F (1975) On finding the maxima of a set of vectors. J ACM 22(4):469–476MathSciNetCrossRefGoogle Scholar
  21. 21.
    Lee J, Hwang S (2014a) Scalable skyline computation using a balanced pivot selection technique. Inf Syst 39:1–21CrossRefGoogle Scholar
  22. 22.
    Lee J, Hwang S (2014b) Toward efficient multidimensional subspace skyline computation. VLDB J 23(1):129–145CrossRefGoogle Scholar
  23. 23.
    Liu X, Lu R, Ma J et al (2016) Efficient and privacy-preserving skyline computation framework across domains. Future Gener Comput Syst 62(C):161–174CrossRefGoogle Scholar
  24. 24.
    Mandl S, Kozachuk O, Endres M, Kießling W (2015) Preference analytics in exasolution. In: Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS). Proceedings, pp 613–632Google Scholar
  25. 25.
    Morse M, Patel J, Jagadish H (2007) Efficient skyline computation over low-cardinality domains. In: Proceedings of the 33rd international conference on very large data bases, pp 267–278Google Scholar
  26. 26.
    Mortensen M, Chester S, Assent I, Magnani M (2015) Efficient caching for constrained skyline queries. In: Proceedings of the 18th international conference on extending database technology, pp 337–348Google Scholar
  27. 27.
    O’Neil P, Quass D (1997) Improved query performance with variant indexes. In: Proceedings of the 1997 ACM SIGMOD international conference on management of data, pp 38–49Google Scholar
  28. 28.
    Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82CrossRefGoogle Scholar
  29. 29.
    Park Y, Min J, Shim K (2017) Efficient processing of skyline queries using mapreduce. IEEE Trans Knowl Data Eng 29(5):1031–1044CrossRefGoogle Scholar
  30. 30.
    Pei J, Yuan Y, Lin X et al (2006) Towards multidimensional subspace skyline analysis. ACM Trans Database Syst 31(4):1335–1381CrossRefGoogle Scholar
  31. 31.
    Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116Google Scholar
  32. 32.
    Steuer R (1986) Multiple criteria optimization. Wiley, HobokenzbMATHGoogle Scholar
  33. 33.
    Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310Google Scholar
  34. 34.
    Tao Y, Xiao X, Pei J (2007) Efficient skyline and top-k retrieval in subspaces. IEEE Trans Knowl Data Eng 19(8):1072–1088CrossRefGoogle Scholar
  35. 35.
    Wu K, Shoshani A, Stockinger K (2008) Analyses of multi-level and multi-component compressed bitmap indexes. ACM Trans Database Syst 35(1):2:1–2:52Google Scholar
  36. 36.
    Xia T, Zhang D, Fang Z et al (2012) Online subspace skyline query processing using the compressed skycube. ACM Trans Database Syst 37(2):15:1–15:36CrossRefGoogle Scholar
  37. 37.
    Xin D, Han J, Cheng H, Li X (2006) Answering top-k queries with multi-dimensional selections: the ranking cube approach. In: Proceedings of the 32nd international conference on very large data bases, pp 463–474Google Scholar
  38. 38.
    Zhang M, Alhajj R (2010) Skyline queries with constraints: integrating skyline and traditional query operators. Data Knowl Eng 69(1):153–168CrossRefGoogle Scholar

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. 1.School of Computer Science and TechnologyHarbin Institute of TechnologyHarbinChina

Personalised recommendations