
Task Scheduling of Data-Parallel Applications on HSA Platform

  • Conference paper
Data Science (ICPCSEE 2018)

Abstract

As the growth in CPU processing speed has slowed year on year, heterogeneous “CPU-GPU” architectures that combine multi-core CPUs with GPU accelerators have become increasingly attractive. Against this backdrop, the Heterogeneous System Architecture (HSA) standard was released in 2012. Two new Accelerated Processing Unit (APU) architectures, AMD Kaveri and Carrizo, were released in 2014 and 2015 respectively, and both are compliant with HSA. These architectures incorporate two technologies central to HSA: hUMA (heterogeneous Unified Memory Access) and hQ (heterogeneous Queuing). This paper implements two data-parallel applications, radix sort and matrix-vector multiplication, on the Kaveri platform. Based on an analysis of their performance, a dynamic task scheduling strategy is proposed. The experimental results show that the running efficiency of the algorithms can be greatly improved by using an APU with reasonable task scheduling. Other data-parallel algorithms could be optimized in the same way on these heterogeneous multi-core architectures.
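The abstract does not describe the scheduling strategy itself, so the following is only a minimal sketch of one common way to split a data-parallel task between two devices: rows of a matrix-vector product are divided in proportion to each device's measured throughput. The function names (`split_rows`, `matvec_rows`, `scheduled_matvec`) and the rate parameters are hypothetical, and both partitions run sequentially on the CPU here; on an HSA platform they would be dispatched to the CPU and GPU concurrently over shared memory.

```python
def split_rows(n_rows, cpu_rate, gpu_rate):
    """Return (cpu_rows, gpu_rows), proportional to measured throughput.

    cpu_rate and gpu_rate are hypothetical measurements (e.g. rows per
    second from a short probe run); they are not values from the paper.
    """
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("rates must be positive")
    gpu_rows = round(n_rows * gpu_rate / (cpu_rate + gpu_rate))
    return n_rows - gpu_rows, gpu_rows


def matvec_rows(matrix, vector, start, stop):
    """Compute rows [start, stop) of the product matrix * vector."""
    return [sum(a * x for a, x in zip(matrix[i], vector))
            for i in range(start, stop)]


def scheduled_matvec(matrix, vector, cpu_rate, gpu_rate):
    """Split the rows between two 'devices' and concatenate the results."""
    cpu_rows, _ = split_rows(len(matrix), cpu_rate, gpu_rate)
    # Stand-in for the CPU partition.
    cpu_part = matvec_rows(matrix, vector, 0, cpu_rows)
    # Stand-in for the GPU partition (executed on the CPU in this sketch).
    gpu_part = matvec_rows(matrix, vector, cpu_rows, len(matrix))
    return cpu_part + gpu_part


if __name__ == "__main__":
    m = [[1, 2], [3, 4], [5, 6]]
    v = [1, 1]
    print(split_rows(100, 1.0, 3.0))
    print(scheduled_matvec(m, v, cpu_rate=1.0, gpu_rate=2.0))
```

A dynamic variant would re-measure the two rates after each batch and recompute the split, so that the partition follows the observed performance rather than a fixed ratio.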


Acknowledgement

This work was supported by the significant special project for Core electronic devices, high-end general chips and basic software products (2012ZX01039-004), and also supported by Beijing Key Laboratory on Integration and Analysis of Large Scale Stream Data.

Author information

Correspondence to Chong Chen or Wenbo Zhang.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bao, Z., Chen, C., Zhang, W. (2018). Task Scheduling of Data-Parallel Applications on HSA Platform. In: Zhou, Q., Gan, Y., Jing, W., Song, X., Wang, Y., Lu, Z. (eds) Data Science. ICPCSEE 2018. Communications in Computer and Information Science, vol 901. Springer, Singapore. https://doi.org/10.1007/978-981-13-2203-7_35

  • DOI: https://doi.org/10.1007/978-981-13-2203-7_35

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2202-0

  • Online ISBN: 978-981-13-2203-7

  • eBook Packages: Computer Science (R0)
