Knowledge and Information Systems, Volume 43, Issue 2, pp 355–388

SEPT: an efficient skyline join algorithm on massive data

  • Xixian Han
  • Jianzhong Li
  • Hong Gao
  • Chengyu Yang
Regular Paper

Abstract

Skyline join is an important operation in many applications: it returns all join tuples that are not dominated by any other join tuple. Existing algorithms cannot process skyline joins on massive data efficiently. This paper presents SEPT, a novel skyline join algorithm for massive data. SEPT uses sorted positional index lists that carry join information; these lists have low space overhead and significantly reduce I/O cost. A sorted positional index list is constructed for each potential skyline attribute of the joined tables and is arranged in ascending order of that attribute. SEPT consists of two phases. In phase one, SEPT obtains candidate join positional index pairs for the skyline join results. While retrieving the sorted positional index lists, it prunes the candidate join positional index pairs whose corresponding join tuples are not skyline join results. In phase two, SEPT uses the remaining candidate join positional index pairs to obtain the skyline join results through a selective, sequential scan of the tables. Experimental results on synthetic and real data sets show that SEPT has a significant advantage over existing skyline join algorithms.
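To make the terms above concrete, here is a minimal sketch of what a skyline join computes and what a sorted positional index list might look like. It assumes an equi-join on a single key stored in column 0 of each row, that smaller values are preferred on every skyline attribute, and that a join tuple's skyline attributes are the concatenation of both rows' attributes. The names `dominates`, `skyline_join`, and `sorted_positional_index` are hypothetical. This is a brute-force reference for the semantics only, not SEPT: it materializes every join tuple and performs none of the paper's pruning or selective scanning.

```python
from typing import List, Sequence, Tuple

# Hypothetical row layout: (join_key, skyline_attr_1, skyline_attr_2, ...)
Row = Tuple[object, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every attribute and strictly better
    on at least one (assumption: smaller is better everywhere)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_join(r: List[Row], s: List[Row]) -> List[Tuple[int, int]]:
    """Naive reference: equi-join r and s on column 0, then keep the join
    tuples not dominated by any other join tuple.
    Returns (position in r, position in s) pairs."""
    joined = []
    for i, tr in enumerate(r):
        for j, ts in enumerate(s):
            if tr[0] == ts[0]:
                # Skyline vector of the join tuple: r's attributes then s's.
                joined.append(((i, j), tuple(tr[1:]) + tuple(ts[1:])))
    result = []
    for pair, vec in joined:
        # A vector never dominates itself, so comparing against all is safe.
        if not any(dominates(other, vec) for _, other in joined):
            result.append(pair)
    return result


def sorted_positional_index(table: List[Row], attr: int) -> List[Tuple[object, int, object]]:
    """One list per skyline attribute: (attribute value, row position,
    join key), in ascending order of the attribute value."""
    return sorted((row[attr], pos, row[0]) for pos, row in enumerate(table))
```

For example, with `r = [("a", 1, 5), ("a", 3, 2), ("b", 2, 2)]` and `s = [("a", 4), ("b", 1)]`, the join tuple built from `r[2]` and `s[1]`, with vector (2, 2, 1), dominates the one from `r[1]` and `s[0]`, with vector (3, 2, 4), so `skyline_join(r, s)` returns `[(0, 0), (2, 1)]`. Storing only positions and join keys in the index lists, rather than whole rows, reflects the low-space-overhead idea the abstract describes.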

Keywords

Massive data · Skyline join · Pruning · SEPT

Notes

Acknowledgments

We thank anonymous reviewers for their very useful comments and suggestions. This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61190115, 61173022, 61033015, 61272046, Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308, National Science & Technology Pillar Program under Grant Nos. 2012BAA13B01, 2012BAH10F03, 2013BAH17F00.

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

Xixian Han, Jianzhong Li, Hong Gao, Chengyu Yang: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
