Skip to main content
Log in

Algorithms and analyses for maximal vector computation

  • Special Issue Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied.The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are external and relationally well behaved. While many algorithms have been proposed, how they perform has been unclear. We study the performance of, and design choices behind, these algorithms. We prove runtime bounds based on the number of vectors N and the dimensionality K. Early algorithms based on divide and conquer established seemingly good average and worst-case asymptotic runtimes. In fact, the problem can be solved in \(\mathcal{O}(KN)\) average-case (holding K as fixed). We prove, however, that the performance is quite bad with respect to K. We demonstrate that the more recent skyline algorithms are better behaved, and can also achieve \(\mathcal{O}(KN)\) average-case. While K matters for these, in practice, its effect vanishes in the asymptotic. We introduce a new external algorithm, LESS, that is more efficient and better behaved. We evaluate LESS’s effectiveness and improvement over the field, and prove that its average-case running time is \(\mathcal{O}(KN)\).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Balke W.T., Güntzer U. (2004) Multi-objective query processing for databas systems. In: Nascimento M.A., Özsu M.T., Kossmann D., Miller R.J., Blakeley J.A., Schiefer K.B. (eds). Proceedings of the 30th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann, Toronto, Canada, pp. 936–947

    Google Scholar 

  2. Balke, W.T., Güntzer, U.: Supporting skyline queries on categorical data in web information systems. In: IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA 2004), pp. 1–6 (2004)

  3. Balke, W.T., Güntzer, U.: Efficient skyline queries under weak pareto dominance. In: IJCAI-05 Multidisciplinary Workshop on Advances in Preference Handling (Preference 2005), pp. 1–7 (2005)

  4. Barndorff-Nielsen O., Sobel M. (1966) On the distribution of the number of admissible points in a vector random sample. Theory Probab Appl 11(2): 249–269

    Article  MATH  MathSciNet  Google Scholar 

  5. Bentley, J.L., Clarkson, K.L., Levine, D.B.: Fast linear expected-time algorithms for computing maxima and convex hulls. In: Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 179–187. ACM/SIAM (1990)

  6. Bentley J.L., Kung H.T., Schkolnick M., Thompson C.D. (1978) On the average number of maxima in a set of vectors and applications. JACM 25(4): 536–543

    Article  MATH  MathSciNet  Google Scholar 

  7. Blum M., Floyd R.W., Pratt V., Rivest R.L., Tarjan R.E. (1973) Time bounds for selection. J. Comput. Syst. Sci. 7(4): 448–461

    Article  MATH  MathSciNet  Google Scholar 

  8. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of the 17th ICDE, pp. 421–430 (2001)

  9. Buchta C. (1989) On the average number of maxima in a set of vectors. Inf. Process. Lett. 33, 63–65

    Article  MATH  MathSciNet  Google Scholar 

  10. Chan, C.Y., Eng, P.K., Tan, K.L.: Efficient processing of skyline queries with partially-ordered domains. In: ICDE, pp. 190–191 (2005)

  11. Chan, C.Y., Eng, P.K., Tan, K.L.: Stratified computation of skylines with partially-ordered domains. In: SIGMOD Conference, pp. 203–214 (2005)

  12. Chaudhuri, S., Dalvi, N., Raghav, K.: Robust cardinality and cost estimation for skyline operator. In: ICDE (To appear, 2006)

  13. Chomicki J. (2002) Querying with intrinsic preferences. In: Jensen C.S., Jeffery K.G., Pokorný J., Saltenis S., Bertino E., Böhm K., Jarke M. (eds). Proceedings of the 8th International Conference on Extending Database Technology (EDBT), LNCS 2287. Springer, Prague, Czech Republic, pp. 34–51

    Google Scholar 

  14. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. Technical. Report 04, Computer Science, York University, Toronto, Ontario, Canada (2002)

  15. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of the 19th International Conference on Data Engineering (ICDE), pp. 717–719 (2003). See [14] for a longer version

  16. Chomicki J., Godfrey P., Gryz J., Liang D. (2005) Skyline with presorting: Theory and optimization. In: Klopotek M.A., Wierzchon S.T., Trojanowski K. (eds). Proceedings of the Intelligent Information Systems Conference (IIS): New Trends in Intelligent Information Processing and Web Mining, Advances in Soft Computing. Springer, Gdansk, Poland, pp. 593–602

    Google Scholar 

  17. Ciaccia, P.: Evaluating preferences with non-transitive preferences. Presentation at the Dagstuhl Seminar 04271 (Preferences: Specification, Inference, Applications) (2004)

  18. Eng P.K., Ooi B.C., Tan K.L. (2003) Indexing for progressive skyline computation. Data Knowl. Eng. 46(2): 169–201

    Article  Google Scholar 

  19. Godfrey P. (2004) Skyline cardinality for relational processing. In: Seipel D., Torres J.M.T. (eds) Proceedings of the 3rd International Symposium on Foundations of Information and Knowledge Systems (FoIKS). Springer, Wilhelminenberg Castle, Austria, pp. 78–97

    Google Scholar 

  20. Godfrey P., Shipley R., Gryz J. (2005) Maximal vector computation in large data sets. In: Böhm K., Jensen C.S., Haas L.M., Kersten M.L., Larson P.Å., Ooi B.C. (eds) Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 2005). ACM, Trondheim, Norway, pp. 229–240

    Google Scholar 

  21. Hellerstein J.M., Avnur R., Chou A., Hidber C., Olston C., Raman V., Roth T., Haas P.J. (1999) Interactive data analysis: The control project. IEEE Comput. 32(8):51–59

    Google Scholar 

  22. Huang, Z., Jensen, C.S., Lu, H., Ooi, B.C.: Skyline queries against mobile lightweight devices in manets. In: ICDE (To appear, 2006)

  23. Jin, W., Han, J., Ester, M.: Mining thick skylines over large databases. In: PKDD, pp. 255–266 (2004)

  24. Kossmann, D., Ramask, F., Rost, S.: Shooting stars in the sky: An online algorithm for skyline queries. In: Proceedings of 28th International Conference on Very Large Data Bases (VLDB-2002), pp. 275–286 (2002)

  25. Kung H.T., Luccio F., Preparata F.P. (1975) On finding the maxima of a set of vectors. JACM 22(4): 469–476

    Article  MATH  MathSciNet  Google Scholar 

  26. Lin, X., Yuan, Y., Wang, W., Lu, H.: Stabbing the sky: efficient skyline computation over sliding windows. In: ICDE, pp. 502–513 (2005)

  27. Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 467–478. ACM Press, Newyork (2003)

  28. Papadias D., Tao Y., Fu G., Seeger B. (2005) Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1): 41–82

    Article  Google Scholar 

  29. Pei, J., Jin, W., Ester, M., Tao, Y.: Catching the best views of skyline: a semantic approach based on decisive subspaces. In: VLDB, pp. 253–264 (2005)

  30. Tan K.L., Eng P.K., Ooi B.C. (2001) Efficient progressive skyline computation. In: Apers P.M.G., Atzeni P., Ceri S., Paraboschi S., Ramamohanarao K., Snodgrass R.T. (eds). Proceedings of 27th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann, Rome, Italy, pp. 301–310

    Google Scholar 

  31. Tao, Y., Xiao, X., Pei, J.: SUBSKY: efficient computation of skylines in subspaces. In: ICDE (to appear, 2006)

  32. Torlone, R., Ciaccia, P.: Finding the best when it’s a matter of preference. In: Ciaccia, P., Rabitti, F., Soda, G.: (eds) The 10th Italian National Conference on Advanced Data Base Systems (SEBD 2002), pp. 347–360 (2002)

  33. Torlone, R., Ciaccia, P.: Which are my preferred items? In: Workshop on Recommendation and Personalization in eCommerce (RPEC), pp. 1–9. Malaga, Spain (2002)

  34. Yuan, Y., Lin, X., Liu, Q., Wang, W., Yu, J.X., Zhang, Q.: Efficient computation of the skyline cube. In: VLDB, pp. 241–252 (2005)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Parke Godfrey.

Additional information

Part of this work was conducted at William & Mary where Ryan Shipley was a student and Parke Godfrey was on faculty while on leave of absence from York.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Godfrey, P., Shipley, R. & Gryz, J. Algorithms and analyses for maximal vector computation. The VLDB Journal 16, 5–28 (2007). https://doi.org/10.1007/s00778-006-0029-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-006-0029-7

Keywords

Navigation