
Exploration of Supervised Machine Learning Techniques for Runtime Selection of CPU vs. GPU Execution in Java Programs

  • Conference paper in: Accelerator Programming Using Directives (WACCPD 2017)

Abstract

While multi-core CPUs and many-core GPUs are both viable platforms for parallel computing, programming models for them can impose large burdens upon programmers due to their complex and low-level APIs. Since managed languages like Java are designed to run on multiple platforms, parallel language constructs and APIs such as the Java 8 Parallel Stream APIs can enable high-level parallel programming with the promise of performance portability for mainstream (“non-ninja”) programmers. To achieve this goal, it is important for the selection of the hardware device to be automated rather than specified by the programmer, as is done in current programming models. Due to the variety of factors affecting performance, predicting the preferable device for faster execution of individual kernels remains a difficult problem. While a prior approach uses machine learning to address this challenge, there has been no comparable study of which supervised machine learning algorithms perform well and which program features are worth tracking. In this paper, we explore (1) program features to be extracted by a compiler and (2) various machine learning techniques that improve prediction accuracy, thereby improving performance. The results show that an appropriate selection of program features and machine learning algorithm can further improve accuracy. In particular, support vector machines (SVMs), logistic regression, and the J48 decision tree prove to be reliable techniques for building accurate prediction models from just two, three, or four program features, achieving accuracies of 99.66%, 98.63%, and 98.28%, respectively, under 5-fold cross-validation.
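As a concrete illustration of the evaluation methodology named in the abstract, the sketch below runs 5-fold cross-validation of the three best-performing model families using Weka, the Java data-mining toolkit the paper references. This is a minimal sketch under stated assumptions: the ARFF file name, its layout (one row per kernel, compiler-extracted features plus a nominal CPU/GPU class label), and the default model parameters are hypothetical, not taken from the paper.

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.Logistic;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class DeviceSelectionEval {
        public static void main(String[] args) throws Exception {
            // Hypothetical ARFF file (assumption, not from the paper):
            // one row per kernel, feature attributes, last attribute = {CPU, GPU}.
            Instances data = new DataSource("kernel_features.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // SMO is Weka's sequential-minimal-optimization SVM trainer;
            // Logistic and J48 match the other two models named in the abstract.
            Classifier[] models = { new SMO(), new Logistic(), new J48() };
            for (Classifier model : models) {
                Evaluation eval = new Evaluation(data);
                // 5-fold cross-validation, fixed seed for reproducibility.
                eval.crossValidateModel(model, data, 5, new Random(1));
                System.out.printf("%-10s accuracy: %.2f%%%n",
                        model.getClass().getSimpleName(), eval.pctCorrect());
            }
        }
    }

Because Weka's SMO class trains a support vector machine, the three classifiers line up with the SVM, logistic regression, and J48 models whose accuracies the abstract reports.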



Author information


Correspondence to Gloria Y. K. Kim.


A Appendix

  • Backpropagation: In a Multilayer Perceptron (an artificial neural network), the iterative process of adjusting each connection weight in proportion to its contribution to the output error, so as to minimize that error (see the formulas after this list).

  • Logit function: In Logistic Regression, the inverse of the logistic (sigmoid) function; it maps a probability to its log-odds. (The sigmoid itself is the cumulative distribution function of the logistic distribution.)

  • Overfitting: In machine learning, the undesired situation in which a model learns noise in the training data as if it were signal, degrading its ability to make predictions on new data.

  • Sigmoid node: In a Multilayer Perceptron (an artificial neural network), a node whose activation is given by the sigmoid function, a special case of the logistic function, applied to its weighted input.
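For concreteness, the standard formulas behind these glossary entries can be written out explicitly (textbook definitions, not reproduced from the paper):

    \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
    \operatorname{logit}(p) = \ln\frac{p}{1 - p} = \sigma^{-1}(p), \qquad
    w \leftarrow w - \eta \, \frac{\partial E}{\partial w}

Here \sigma is the sigmoid (logistic) function that activates a sigmoid node and is the cumulative distribution function of the standard logistic distribution; logit is its inverse, mapping a probability p to its log-odds; and the final rule is the backpropagation weight update, with learning rate \eta and prediction error E.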


Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Kim, G.Y.K., Hayashi, A., Sarkar, V. (2018). Exploration of Supervised Machine Learning Techniques for Runtime Selection of CPU vs. GPU Execution in Java Programs. In: Chandrasekaran, S., Juckeland, G. (eds) Accelerator Programming Using Directives. WACCPD 2017. Lecture Notes in Computer Science, vol. 10732. Springer, Cham. https://doi.org/10.1007/978-3-319-74896-2_7


  • DOI: https://doi.org/10.1007/978-3-319-74896-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74895-5

  • Online ISBN: 978-3-319-74896-2

  • eBook Packages: Computer Science; Computer Science (R0)
