OptCL: A Middleware to Optimise Performance for High Performance Domain-Specific Languages on Heterogeneous Platforms

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 13157)

Abstract

Programming heterogeneous hardware architectures using OpenCL requires thorough knowledge of the hardware. Many High-Performance Domain-Specific Languages (HPDSLs) aim to simplify programming by abstracting away hardware details, allowing users to program in a sequential style. However, most HPDSLs still require users to manually map compute workloads to the most suitable hardware to achieve optimal performance, which again calls for knowledge of the underlying hardware and trial-and-error attempts. Furthermore, they often consider only an offloading mode in which compute-intensive tasks are offloaded to accelerators. During offloading, the CPUs remain idle, leaving part of the available computational power untapped. In this work, we propose OptCL, a tool that enables existing HPDSLs to use a heterogeneous co-execution mode, where possible, in which CPUs and accelerators process data simultaneously. Through a static analysis of data dependencies among compute-intensive code regions and performance predictions, the tool selects the best execution scheme among pure CPU execution, pure accelerator execution, and co-execution. We show that by enabling co-execution on dedicated and integrated CPU-GPU systems, speed-ups of up to 13× and 21× can be achieved.
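
The scheme selection described in the abstract can be illustrated with a minimal sketch. This is not OptCL's actual implementation: the names `Prediction`, `choose_scheme`, and `split_overhead` are hypothetical, the predicted times are assumed to be given, and runtime is assumed to scale linearly with the share of work items assigned to a device.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical predicted cost of one compute-intensive code region."""
    cpu_time: float              # predicted seconds if the CPU processes all items (> 0)
    gpu_time: float              # predicted seconds if the accelerator processes all items (> 0)
    independent: bool            # True if the dependency analysis allows splitting the items
    split_overhead: float = 0.0  # assumed fixed cost of partitioning and merging


def choose_scheme(p: Prediction) -> tuple[str, float, float]:
    """Return (scheme, predicted_time, fraction_of_items_on_cpu)."""
    candidates = [
        ("cpu", p.cpu_time, 1.0),
        ("accelerator", p.gpu_time, 0.0),
    ]
    if p.independent:
        # Under the linear-scaling assumption, both devices finish together
        # when the CPU share f satisfies f * cpu_time == (1 - f) * gpu_time.
        f = p.gpu_time / (p.cpu_time + p.gpu_time)
        candidates.append(("co-execution", f * p.cpu_time + p.split_overhead, f))
    # Pick the candidate with the smallest predicted time.
    return min(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    print(choose_scheme(Prediction(cpu_time=4.0, gpu_time=1.0, independent=True)))
    print(choose_scheme(Prediction(cpu_time=4.0, gpu_time=1.0, independent=True,
                                   split_overhead=0.5)))
    print(choose_scheme(Prediction(cpu_time=4.0, gpu_time=1.0, independent=False)))
```

In this simplified model, co-execution only wins when the region can be split and the assumed splitting overhead does not outweigh the gain from using both devices; otherwise the faster single device is chosen.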

Keywords

  • Heterogeneous hardware
  • OpenCL
  • Domain-Specific Language
  • CPU
  • GPU

This work was financially supported by the Singapore National Research Foundation under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.

Financial support was provided by the Deutsche Forschungsgemeinschaft (DFG) research grant UH-66/15-1 (MoSiLLDe).

Notes

  1. https://github.com/xjjex1990/OptCL

  2. https://github.com/BeauJoh/OpenDwarfs

  3. https://github.com/xjjex1990/OpenABL_Extension

Author information

Corresponding author

Correspondence to Jiajian Xiao.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Xiao, J., Andelfinger, P., Cai, W., Eckhoff, D., Knoll, A. (2022). OptCL: A Middleware to Optimise Performance for High Performance Domain-Specific Languages on Heterogeneous Platforms. In: Lai, Y., Wang, T., Jiang, M., Xu, G., Liang, W., Castiglione, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2021. Lecture Notes in Computer Science, vol 13157. Springer, Cham. https://doi.org/10.1007/978-3-030-95391-1_48

  • DOI: https://doi.org/10.1007/978-3-030-95391-1_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95390-4

  • Online ISBN: 978-3-030-95391-1

  • eBook Packages: Computer Science, Computer Science (R0)