SkyAlign: a portable, work-efficient skyline algorithm for multicore and GPU architectures

  • Regular Paper
  • The VLDB Journal

Abstract

The skyline operator determines points in a multidimensional dataset that offer some optimal trade-off. State-of-the-art CPU skyline algorithms exploit quad-tree partitioning with complex branching to minimise the number of point-to-point comparisons. Branch-phobic GPU skyline algorithms rely on compute throughput rather than partitioning, but fail to match the performance of sequential algorithms. In this paper, we introduce a new skyline algorithm, SkyAlign, that is designed for the GPU, and a GPU-friendly, grid-based tree structure upon which the algorithm relies. The search tree allows us to dramatically reduce the amount of work done by the GPU algorithm by avoiding most point-to-point comparisons at the cost of some compute throughput. This trade-off allows SkyAlign to achieve orders of magnitude faster performance than its predecessors. Moreover, a NUMA-oblivious port of SkyAlign outperforms native multicore state of the art on challenging workloads by an increasing margin as more cores and sockets are utilised.
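
To make the operator concrete, the following is a minimal sketch of a brute-force skyline computation. It is not SkyAlign and uses no partitioning: it performs exactly the all-pairs point-to-point comparisons that the tree-based methods described above try to avoid. It assumes that smaller values are better in every dimension, and the names `Point`, `dominates` and `naive_skyline` are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical point type for illustration: one float per attribute.
using Point = std::vector<float>;

// p dominates q if p is no worse than q in every dimension and
// strictly better in at least one (assumption: smaller is better).
bool dominates(const Point& p, const Point& q) {
    bool strictly_better = false;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (p[i] > q[i]) return false;          // p is worse here, so no dominance
        if (p[i] < q[i]) strictly_better = true;
    }
    return strictly_better;
}

// Quadratic baseline: a point is in the skyline if no other point dominates it.
std::vector<Point> naive_skyline(const std::vector<Point>& data) {
    std::vector<Point> result;
    for (std::size_t i = 0; i < data.size(); ++i) {
        bool dominated = false;
        for (std::size_t j = 0; j < data.size() && !dominated; ++j) {
            if (j != i && dominates(data[j], data[i])) dominated = true;
        }
        if (!dominated) result.push_back(data[i]);
    }
    return result;
}
```

For the two-dimensional input (1, 4), (2, 2), (3, 3), (4, 1), this sketch keeps every point except (3, 3), which is dominated by (2, 2).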

Notes

  1. Without loss of generality and to simplify exposition, we assume smaller values are better; handling mixed preferences (e.g. Table 1) is a straightforward adaptation.

  2. Obtained by counting low-level operations in Algorithm 1 of GGS [2] (the branch-free dominance test). Branching DTs are ill-suited to GPUs and have unpredictable, variable cost.

  3. Hyper-threading also hides latencies, but it is not as impactful as the features that we will explicitly analyse.

  4. We assume “large” memory to simplify the algorithm.

  5. The Manhattan norm is the sum of all attribute values.

  6. Either via PCIe transfer from CPU (host) memory or because the previous GPU operator in the query plan completes.

  7. Progressive skyline algorithms [17] can output solution points as they are discovered, in contrast to an algorithm that must fully complete before any solution point can be confirmed.

  8. Strictly speaking, some warps run concurrently while others queue, and the order in which they are queued is unpredictable.

  9. The code is available at: https://github.com/sean-chester/SkyBench.

  10. This is the number of concurrent threads on our GPU.

  11. CUDA 7 and C++ are similar enough that converting from the former to the latter is trivial.

  12. The code is available at: https://github.com/sean-chester/SkyBench.

  13. BSkyTree does not use the pre-filter that the other three methods use, but its pivot selection routine has a similar effect.

  14. Up to eight instructions can be retired in a cycle if they are ready for execution, but the front-end (the instruction fetch-decode cycle) populates the queue at a slower rate.
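
Notes 2 and 5 can be illustrated with a short sketch. The dominance test below has no data-dependent early exit: it accumulates comparison results with bitwise operations and inspects them only once at the end. It is an illustrative assumption of what a branch-free test can look like, not a transcription of Algorithm 1 of GGS [2]. The function names `dominates_branch_free` and `manhattan_norm` are hypothetical, and smaller values are assumed to be better, as in Note 1.

```cpp
#include <cstddef>

// Branch-free dominance test (sketch): every dimension is always inspected,
// so the cost is fixed at d iterations regardless of the data.
bool dominates_branch_free(const float* p, const float* q, std::size_t d) {
    unsigned worse = 0;   // becomes 1 if p is worse than q in any dimension
    unsigned better = 0;  // becomes 1 if p is better than q in any dimension
    for (std::size_t i = 0; i < d; ++i) {
        worse  |= static_cast<unsigned>(p[i] > q[i]);
        better |= static_cast<unsigned>(p[i] < q[i]);
    }
    // p dominates q: never worse, and strictly better at least once.
    return (worse == 0u) & (better != 0u);
}

// Manhattan norm as defined in Note 5: the sum of all attribute values.
float manhattan_norm(const float* p, std::size_t d) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < d; ++i) sum += p[i];
    return sum;
}
```

If p dominates q, then the sum of p's attribute values is strictly smaller than that of q. Sort-based skyline algorithms commonly exploit this: once the points are ordered by ascending Manhattan norm, a point can only be dominated by a point that precedes it.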

References

  1. Bartolini, I., Ciaccia, P., Patella, M.: Efficient sort-based skyline evaluation. TODS 33(4), 31:1–49 (2008)

  2. Bøgh, K.S., Assent, I., Magnani, M.: Efficient GPU-based skyline computation. In: Proceedings of the DaMoN, pp. 5:1–6 (2013)

  3. Bøgh, K.S., Chester, S., Assent, I.: Work-efficient skyline computation for the GPU. PVLDB 8(9), 962–973 (2015)

  4. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of the ICDE, pp. 421–430 (2001)

  5. Chester, S., Šidlauskas, D., Assent, I., Bøgh, K.S.: Scalable parallelization of skyline computation for multi-core processors. In: Proceedings of the ICDE (2015)

  6. Cho, S.R., Lee, J., Hwang, S.W., Han, H., Lee, S.W.: VSkyline: vectorization for efficient skyline computation. SIGMOD Rec. 39(2), 19–26 (2010)

  7. Choi, W., Liu, L., Yu, B.: Multi-criteria decision making with skyline computation. In: Proceedings of the IRI, pp. 316–323 (2012)

  8. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of the ICDE, pp. 717–719 (2003)

  9. He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational query coprocessing on graphics processors. TODS 34(4), 1–39 (2009)

  10. He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational joins on graphics processors. In: Proceedings of the SIGMOD, pp. 511–524 (2008)

  11. Hose, K., Vlachou, A.: A survey of skyline processing in highly distributed environments. VLDB J. 21(3), 359–384 (2012)

  12. Im, H., Park, J., Park, S.: Parallel skyline computation on multicore architectures. Inf. Syst. 36(4), 808–823 (2011)

  13. Kaldewey, T., Lohman, G., Mueller, R., Volk, P.: GPU join processing revisited. In: Proceedings of the DaMoN, pp. 55–62 (2012)

  14. Lee, J., Hwang, S.W.: Scalable skyline computation using a balanced pivot selection technique. Inf. Syst. 39, 1–24 (2014)

  15. Lee, K.C.K., Zheng, B., Li, H., Lee, W.C.: Approaching the skyline in Z order. In: Proceedings of the VLDB, pp. 279–290 (2007)

  16. Mullesgaard, K., Pedersen, J.L., Lu, H., Zhou, Y.: Efficient skyline computation in MapReduce. In: Proceedings of the EDBT, pp. 37–48 (2014)

  17. Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. TODS 30(1), 41–82 (2005)

  18. Park, Y., Min, J.K., Shim, K.: Parallel computation of skyline and reverse skyline queries using MapReduce. PVLDB 6(14), 2002–2011 (2013)

  19. Tan, K.L., Eng, P.K., Ooi, B.C.: Efficient progressive skyline computation. In: Proceedings of the VLDB, pp. 301–310 (2001)

  20. Vlachou, A., Doulkeridis, C., Kotidis, Y.: Angle-based space partitioning for efficient parallel skyline computation. In: Proceedings of the SIGMOD, pp. 227–238 (2008)

  21. Woods, L., Alonso, G., Teubner, J.: Parallel computation of skyline queries. In: Proceedings of the FCCM, pp. 1–8 (2013)

  22. Zhang, K., Yang, D., Gao, H., Li, J., Wang, H., Cai, Z.: VMPSP: Efficient skyline computation using VMP-based space partitioning. In: Proceedings of the DASFAA Workshops, pp. 179–193 (2016)

  23. Zhang, S., Mamoulis, N., Cheung, D.W.: Scalable skyline computation using object-based space partitioning. In: Proceedings of the SIGMOD, pp. 483–494 (2009)

Acknowledgments

This research was supported through the WallViz (Danish Council for Strategic Research) and ExiBiDa (Norwegian Research Council) projects. The authors thank the Harvard DASlab for the use of their quad-socket machine and the anonymous reviewers for their helpful comments and suggestions of informative experiments.

Author information

Corresponding author

Correspondence to Sean Chester.

About this article

Cite this article

Bøgh, K.S., Chester, S. & Assent, I. SkyAlign: a portable, work-efficient skyline algorithm for multicore and GPU architectures. The VLDB Journal 25, 817–841 (2016). https://doi.org/10.1007/s00778-016-0438-1

  • DOI: https://doi.org/10.1007/s00778-016-0438-1
