
Parallel Skyline Queries

Theory of Computing Systems

Abstract

In this paper, we design and analyze parallel algorithms for skyline queries. The skyline of a multidimensional set consists of the points for which no other point exists that is at least as good along every dimension. As a framework for parallel computation, we use both the MP model proposed in Koutris and Suciu (2011), which requires that the data be perfectly load-balanced, and the GMP model, a variation of the model in Afrati and Ullman (2010) that imposes weaker load-balancing constraints. In addition to load balancing, we want to minimize the number of blocking steps, where all processors must wait and synchronize. We propose a 2-step algorithm in the MP model for any dimension of the dataset, as well as a 1-step algorithm for the case of 2 and 3 dimensions. Finally, we present a 1-step algorithm in the GMP model for any number of dimensions and a 1-step algorithm in the MP model for uniform distributions of data points.
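To make the skyline definition concrete, the following is a minimal Python sketch. It assumes that smaller values are better in every dimension and that the input follows set semantics; both are illustrative assumptions. The function `two_round_skyline` is a hypothetical, generic "local skyline, then merge" scheme that simulates workers with plain lists. It is not the MP or GMP algorithm of the paper and does not model their load-balancing constraints.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(q: Point, p: Point) -> bool:
    """True if q is a different point that is at least as good as p
    in every dimension (assumption: smaller is better)."""
    return q != p and all(a <= b for a, b in zip(q, p))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the points that no other point dominates."""
    pts = set(points)  # set semantics: duplicates collapse
    return sorted(p for p in pts if not any(dominates(q, p) for q in pts))


def two_round_skyline(points: Iterable[Point], num_workers: int) -> List[Point]:
    """Hypothetical two-round scheme, simulated sequentially.

    Round 1: each simulated worker computes the skyline of its partition.
    Round 2: the local skylines are merged and filtered once more.
    """
    pts = sorted(set(points))
    # Round-robin partitioning gives each worker roughly the same number of points.
    partitions = [pts[i::num_workers] for i in range(num_workers)]
    local_skylines = [skyline(part) for part in partitions]
    # Every global skyline point survives round 1, so filtering the union is enough.
    candidates = [p for local in local_skylines for p in local]
    return skyline(candidates)


points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5), (2, 6)]
print(skyline(points))               # [(1, 5), (2, 3), (4, 1)]
print(two_round_skyline(points, 3))  # same result as skyline(points)
```

The merge step is correct because every dominated point is dominated by some global skyline point, and global skyline points are never discarded locally. The sketch funnels all local skylines through one final filter, so it says nothing about how the paper balances that work across processors.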


Notes

  1. Throughout this paper, we will assume set (and not bag) semantics.

  2. If the Conjunctive Query has k variables, then ε is at most 1/k.

  3. In [13], the size of the broadcast data was required to be O(n^ε), for some ε < 1. In this paper we impose a stricter bound by requiring it to be independent of n.

  4. It will be one of p, p log p, or p^(1/(d−1)).

References

  1. Afrati, F.N., Ullman, J.D.: Optimizing joins in a map-reduce environment. In: EDBT, ACM International Conference Proceeding Series, vol. 426, pp 99–110. ACM (2010)

  2. Berenbrink, P., Friedetzky, T., Hu, Z., Martin, R.A.: On weighted balls-into-bins games. Theor. Comput. Sci. 409(3), 511–520 (2008)

  3. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE, pp 421–430. IEEE Computer Society (2001)

  4. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: ICDE, pp 717–719. IEEE Computer Society (2003)

  5. Cosgaya-Lozano, A., Rau-Chaplin, A., Zeh, N.: Parallel computation of skyline queries. In: HPCS, p 12. IEEE Computer Society (2007)

  6. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: OSDI, pp 137–150 (2004)

  7. Dehne, F.K.H.A., Fabri, A., Rau-Chaplin, A.: Scalable parallel geometric algorithms for coarse grained multicomputers. In: Symposium on Computational Geometry, pp 298–307 (1993)

  8. Gates, A., Natkovich, O., Chopra, S., Kamath, P., Narayanam, S., Olston, C., Reed, B., Srinivasan, S., Srivastava, U.: Building a high-level dataflow system on top of MapReduce: The Pig experience. PVLDB 2(2), 1414–1425 (2009)

  9. Godfrey, P., Shipley, R., Gryz, J.: Maximal vector computation in large data sets. In: VLDB, pp 229–240. ACM (2005)

  10. Hellerstein, J.M.: The declarative imperative: experiences and conjectures in distributed logic. SIGMOD Record 39(1), 5–19 (2010)

  11. Karloff, H.J., Suri, S., Vassilvitskii, S.: A model of computation for MapReduce. In: SODA, pp 938–948. SIAM (2010)

  12. Köhler, H., Yang, J., Zhou, X.: Efficient parallel skyline processing using hyperplane projections. In: SIGMOD Conference, pp 85–96. ACM (2011)

  13. Koutris, P., Suciu, D.: Parallel evaluation of conjunctive queries. In: PODS, pp 223–234. ACM (2011)

  14. Kung, H.T., Luccio, F., Preparata, F.P.: On finding the maxima of a set of vectors. J. ACM 22(4), 469–476 (1975)

  15. Lee, K.C.K., Zheng, B., Li, H., Lee, W.C.: Approaching the skyline in Z order. In: VLDB, pp 279–290. ACM (2007)

  16. Matousek, J.: Computing dominances in E^n. Inf. Process. Lett. 38(5), 277–278 (1991)

  17. Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1), 41–82 (2005)

  18. Park, S., Kim, T., Park, J., Kim, J., Im, H.: Parallel skyline computation on multicore architectures. In: ICDE, pp 760–771. IEEE (2009)

  19. Raab, M., Steger, A.: Balls into bins - a simple and tight analysis. In: RANDOM, pp 159–170 (1998)

  20. Rocha-Junior, J.B., Vlachou, A., Doulkeridis, C., Nørvåg, K.: AGiDS: A grid-based strategy for distributed skyline query processing. In: Globe, Lecture Notes in Computer Science, vol. 5697, pp 12–23. Springer (2009)

  21. Stojmenovic, I., Miyakawa, M.: An optimal parallel algorithm for solving the maximal elements problem in the plane. Parallel Comput. 7(2), 249–251 (1988)

  22. Vlachou, A., Doulkeridis, C., Kotidis, Y.: Angle-based space partitioning for efficient parallel skyline computation. In: SIGMOD Conference, pp 227–238. ACM (2008)

  23. Wang, S., Ooi, B.C., Tung, A.K.H., Xu, L.: Efficient skyline query processing on peer-to-peer networks. In: ICDE, pp 1126–1135. IEEE (2007)

  24. Wu, P., Zhang, C., Feng, Y., Zhao, B.Y., Agrawal, D., Abbadi, A.E.: Parallelizing skyline queries for scalable distribution. In: EDBT, Lecture Notes in Computer Science, vol. 3896, pp 112–130. Springer (2006)

Author information

Corresponding author

Correspondence to Paraschos Koutris.

About this article

Cite this article

Afrati, F.N., Koutris, P., Suciu, D. et al. Parallel Skyline Queries. Theory Comput Syst 57, 1008–1037 (2015). https://doi.org/10.1007/s00224-015-9627-3

  • DOI: https://doi.org/10.1007/s00224-015-9627-3
