
Two-Level Task Scheduling for Irregular Applications on GPU Platform

International Journal of Parallel Programming

Abstract

With their data-parallel design, GPUs depend on uniform work distribution to reach their full potential. Irregular applications therefore suffer serious performance degradation, because scheduling irregular tasks on a GPU is highly challenging: devising a suitable schedule requires an understanding of both the GPU architecture and the irregular application, not to mention error-prone concurrent programming. This paper proposes a two-level scheduling scheme that distributes irregular tasks and enables resource sharing on GPUs by managing tasks and threads hierarchically. We also group cache-friendly tasks to increase data reuse in the L1 cache, and we extend the scheduling to handle nested irregularities. In addition, we devise a programming framework that eases task scheduling for application programmers. Experimental results show that our approach effectively improves the performance of six irregular applications on a typical platform, yielding a harmonic-mean speedup of \(2.1\times\) at a small scheduling cost, without placing a heavy burden on programmers.
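The abstract does not spell out the mechanism, so the following is only a toy cost model in Python and not the paper's implementation. It assumes that "two-level" means thread groups pull whole tasks from a shared queue (level 1) and the threads inside a group split one task's work items (level 2). The function names, the unit cost per work item, and the example task sizes are all hypothetical.

```python
import heapq

def static_makespan(task_costs, num_threads):
    """Baseline (assumed): task i is fixed to thread i % num_threads.

    Each work item costs one time unit; the result is the finish time
    of the most heavily loaded thread.
    """
    loads = [0] * num_threads
    for i, cost in enumerate(task_costs):
        loads[i % num_threads] += cost
    return max(loads)

def two_level_makespan(task_costs, num_groups, threads_per_group):
    """Toy two-level model (an assumption, not the paper's algorithm).

    Level 1: whichever group becomes idle first takes the next task.
    Level 2: the group's threads share that task's work items evenly.
    """
    free_at = [0] * num_groups      # time at which each group is idle
    heapq.heapify(free_at)
    for cost in task_costs:
        start = heapq.heappop(free_at)
        # Ceiling division: the last round of items may leave threads idle.
        duration = -(-cost // threads_per_group)
        heapq.heappush(free_at, start + duration)
    return max(free_at)

# Hypothetical irregular workload: one heavy task among seven light ones.
costs = [64, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(costs, num_threads=8))
print(two_level_makespan(costs, num_groups=2, threads_per_group=4))
```

In this model the static mapping is bounded by the single heavy task, while the hierarchical mapping spreads that task over a group's threads and lets the other group drain the light tasks. The model ignores queue contention, cache effects, and nested irregularity, all of which the paper addresses.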



Acknowledgments

This work is supported by the National High Technology Research and Development Program of China (2012AA010902), the National Natural Science Foundation of China (61432018), and the Innovation Research Group of NSFC (61221062).

Author information

Corresponding author

Correspondence to Jing Li.

About this article

Cite this article

Li, J., Liu, L., Wu, Y. et al. Two-Level Task Scheduling for Irregular Applications on GPU Platform. Int J Parallel Prog 45, 79–93 (2017). https://doi.org/10.1007/s10766-015-0387-0

  • DOI: https://doi.org/10.1007/s10766-015-0387-0
