The VLDB Journal

, Volume 16, Issue 1, pp 5–28 | Cite as

Algorithms and analyses for maximal vector computation

Special Issue Paper

Abstract

The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied.The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are external and relationally well behaved. While many algorithms have been proposed, how they perform has been unclear. We study the performance of, and design choices behind, these algorithms. We prove runtime bounds based on the number of vectors N and the dimensionality K. Early algorithms based on divide and conquer established seemingly good average and worst-case asymptotic runtimes. In fact, the problem can be solved in \(\mathcal{O}(KN)\) average-case (holding K as fixed). We prove, however, that the performance is quite bad with respect to K. We demonstrate that the more recent skyline algorithms are better behaved, and can also achieve \(\mathcal{O}(KN)\) average-case. While K matters for these, in practice, its effect vanishes in the asymptotic. We introduce a new external algorithm, LESS, that is more efficient and better behaved. We evaluate LESS’s effectiveness and improvement over the field, and prove that its average-case running time is \(\mathcal{O}(KN)\).

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Balke W.T., Güntzer U. (2004) Multi-objective query processing for databas systems. In: Nascimento M.A., Özsu M.T., Kossmann D., Miller R.J., Blakeley J.A., Schiefer K.B. (eds). Proceedings of the 30th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann, Toronto, Canada, pp. 936–947Google Scholar
  2. 2.
    Balke, W.T., Güntzer, U.: Supporting skyline queries on categorical data in web information systems. In: IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA 2004), pp. 1–6 (2004)Google Scholar
  3. 3.
    Balke, W.T., Güntzer, U.: Efficient skyline queries under weak pareto dominance. In: IJCAI-05 Multidisciplinary Workshop on Advances in Preference Handling (Preference 2005), pp. 1–7 (2005)Google Scholar
  4. 4.
    Barndorff-Nielsen O., Sobel M. (1966) On the distribution of the number of admissible points in a vector random sample. Theory Probab Appl 11(2): 249–269MATHMathSciNetCrossRefGoogle Scholar
  5. 5.
    Bentley, J.L., Clarkson, K.L., Levine, D.B.: Fast linear expected-time algorithms for computing maxima and convex hulls. In: Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 179–187. ACM/SIAM (1990)Google Scholar
  6. 6.
    Bentley J.L., Kung H.T., Schkolnick M., Thompson C.D. (1978) On the average number of maxima in a set of vectors and applications. JACM 25(4): 536–543MATHMathSciNetCrossRefGoogle Scholar
  7. 7.
    Blum M., Floyd R.W., Pratt V., Rivest R.L., Tarjan R.E. (1973) Time bounds for selection. J. Comput. Syst. Sci. 7(4): 448–461MATHMathSciNetCrossRefGoogle Scholar
  8. 8.
    Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of the 17th ICDE, pp. 421–430 (2001)Google Scholar
  9. 9.
    Buchta C. (1989) On the average number of maxima in a set of vectors. Inf. Process. Lett. 33, 63–65MATHMathSciNetCrossRefGoogle Scholar
  10. 10.
    Chan, C.Y., Eng, P.K., Tan, K.L.: Efficient processing of skyline queries with partially-ordered domains. In: ICDE, pp. 190–191 (2005)Google Scholar
  11. 11.
    Chan, C.Y., Eng, P.K., Tan, K.L.: Stratified computation of skylines with partially-ordered domains. In: SIGMOD Conference, pp. 203–214 (2005)Google Scholar
  12. 12.
    Chaudhuri, S., Dalvi, N., Raghav, K.: Robust cardinality and cost estimation for skyline operator. In: ICDE (To appear, 2006)Google Scholar
  13. 13.
    Chomicki J. (2002) Querying with intrinsic preferences. In: Jensen C.S., Jeffery K.G., Pokorný J., Saltenis S., Bertino E., Böhm K., Jarke M. (eds). Proceedings of the 8th International Conference on Extending Database Technology (EDBT), LNCS 2287. Springer, Prague, Czech Republic, pp. 34–51Google Scholar
  14. 14.
    Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. Technical. Report 04, Computer Science, York University, Toronto, Ontario, Canada (2002)Google Scholar
  15. 15.
    Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of the 19th International Conference on Data Engineering (ICDE), pp. 717–719 (2003). See [14] for a longer versionGoogle Scholar
  16. 16.
    Chomicki J., Godfrey P., Gryz J., Liang D. (2005) Skyline with presorting: Theory and optimization. In: Klopotek M.A., Wierzchon S.T., Trojanowski K. (eds). Proceedings of the Intelligent Information Systems Conference (IIS): New Trends in Intelligent Information Processing and Web Mining, Advances in Soft Computing. Springer, Gdansk, Poland, pp. 593–602Google Scholar
  17. 17.
    Ciaccia, P.: Evaluating preferences with non-transitive preferences. Presentation at the Dagstuhl Seminar 04271 (Preferences: Specification, Inference, Applications) (2004)Google Scholar
  18. 18.
    Eng P.K., Ooi B.C., Tan K.L. (2003) Indexing for progressive skyline computation. Data Knowl. Eng. 46(2): 169–201CrossRefGoogle Scholar
  19. 19.
    Godfrey P. (2004) Skyline cardinality for relational processing. In: Seipel D., Torres J.M.T. (eds) Proceedings of the 3rd International Symposium on Foundations of Information and Knowledge Systems (FoIKS). Springer, Wilhelminenberg Castle, Austria, pp. 78–97Google Scholar
  20. 20.
    Godfrey P., Shipley R., Gryz J. (2005) Maximal vector computation in large data sets. In: Böhm K., Jensen C.S., Haas L.M., Kersten M.L., Larson P.Å., Ooi B.C. (eds) Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 2005). ACM, Trondheim, Norway, pp. 229–240Google Scholar
  21. 21.
    Hellerstein J.M., Avnur R., Chou A., Hidber C., Olston C., Raman V., Roth T., Haas P.J. (1999) Interactive data analysis: The control project. IEEE Comput. 32(8):51–59Google Scholar
  22. 22.
    Huang, Z., Jensen, C.S., Lu, H., Ooi, B.C.: Skyline queries against mobile lightweight devices in manets. In: ICDE (To appear, 2006)Google Scholar
  23. 23.
    Jin, W., Han, J., Ester, M.: Mining thick skylines over large databases. In: PKDD, pp. 255–266 (2004)Google Scholar
  24. 24.
    Kossmann, D., Ramask, F., Rost, S.: Shooting stars in the sky: An online algorithm for skyline queries. In: Proceedings of 28th International Conference on Very Large Data Bases (VLDB-2002), pp. 275–286 (2002)Google Scholar
  25. 25.
    Kung H.T., Luccio F., Preparata F.P. (1975) On finding the maxima of a set of vectors. JACM 22(4): 469–476MATHMathSciNetCrossRefGoogle Scholar
  26. 26.
    Lin, X., Yuan, Y., Wang, W., Lu, H.: Stabbing the sky: efficient skyline computation over sliding windows. In: ICDE, pp. 502–513 (2005)Google Scholar
  27. 27.
    Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 467–478. ACM Press, Newyork (2003)Google Scholar
  28. 28.
    Papadias D., Tao Y., Fu G., Seeger B. (2005) Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1): 41–82CrossRefGoogle Scholar
  29. 29.
    Pei, J., Jin, W., Ester, M., Tao, Y.: Catching the best views of skyline: a semantic approach based on decisive subspaces. In: VLDB, pp. 253–264 (2005)Google Scholar
  30. 30.
    Tan K.L., Eng P.K., Ooi B.C. (2001) Efficient progressive skyline computation. In: Apers P.M.G., Atzeni P., Ceri S., Paraboschi S., Ramamohanarao K., Snodgrass R.T. (eds). Proceedings of 27th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann, Rome, Italy, pp. 301–310Google Scholar
  31. 31.
    Tao, Y., Xiao, X., Pei, J.: SUBSKY: efficient computation of skylines in subspaces. In: ICDE (to appear, 2006)Google Scholar
  32. 32.
    Torlone, R., Ciaccia, P.: Finding the best when it’s a matter of preference. In: Ciaccia, P., Rabitti, F., Soda, G.: (eds) The 10th Italian National Conference on Advanced Data Base Systems (SEBD 2002), pp. 347–360 (2002)Google Scholar
  33. 33.
    Torlone, R., Ciaccia, P.: Which are my preferred items? In: Workshop on Recommendation and Personalization in eCommerce (RPEC), pp. 1–9. Malaga, Spain (2002)Google Scholar
  34. 34.
    Yuan, Y., Lin, X., Liu, Q., Wang, W., Yu, J.X., Zhang, Q.: Efficient computation of the skyline cube. In: VLDB, pp. 241–252 (2005)Google Scholar

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  1. 1.York UniversityTorontoCanada
  2. 2.The College of William and MaryWilliamsburgUSA

Personalised recommendations