Boosting Java Performance Using GPGPUs

  • James Clarkson
  • Christos Kotselidis
  • Gavin Brown
  • Mikel Luján
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10172)


In this paper we describe Jacc, an experimental framework which allows developers to program GPGPUs directly from Java. The goal of Jacc is to allow developers to benefit from heterogeneous hardware whilst minimizing the amount of code refactoring required. Jacc utilizes two key abstractions: tasks, which encapsulate all the information needed to execute code on a GPGPU; and task graphs, which capture both inter-task control flow and data dependencies. These abstractions enable the Jacc runtime system to automatically choreograph data movement and synchronization between the host and the GPGPU, eliminating the need to explicitly manage disparate memory spaces. We demonstrate the advantages of Jacc, both in terms of programmability and performance, by evaluating it against existing Java frameworks. Across the eight evaluated benchmarks, experimental results show an average performance speedup of 19x, using an NVIDIA Tesla K20m GPU, and a 4x decrease in code complexity when compared with writing multi-threaded Java code.
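The task and task-graph abstractions described above can be pictured with a small, purely illustrative Java sketch. This is not Jacc's actual API: the class names, method signatures, and the host-side sequential execution below are assumptions made for illustration. In a real heterogeneous runtime, `execute()` is where kernels would be offloaded to the GPU and host-device copies and synchronization choreographed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the task / task-graph idea (not Jacc's real API).
public class TaskGraphSketch {

    // A task bundles a kernel with the data it operates on.
    static final class Task {
        final Consumer<float[]> kernel;
        final float[] data;
        Task(Consumer<float[]> kernel, float[] data) {
            this.kernel = kernel;
            this.data = data;
        }
    }

    // A task graph records execution order; a simple chain is the
    // simplest possible graph. A runtime could use this structure to
    // derive data dependencies and schedule transfers automatically.
    static final class TaskGraph {
        private final List<Task> tasks = new ArrayList<>();

        TaskGraph add(Consumer<float[]> kernel, float[] data) {
            tasks.add(new Task(kernel, data));
            return this; // fluent chaining
        }

        // Here tasks simply run on the host, in order; a GPU runtime
        // would instead manage device memory and kernel launches.
        void execute() {
            for (Task t : tasks) {
                t.kernel.accept(t.data);
            }
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        new TaskGraph()
            .add(arr -> { for (int i = 0; i < arr.length; i++) arr[i] *= 2f; }, a) // scale
            .add(arr -> { for (int i = 0; i < arr.length; i++) arr[i] += 1f; }, a) // offset
            .execute();
        System.out.println(java.util.Arrays.toString(a)); // [3.0, 5.0, 7.0, 9.0]
    }
}
```

The point of the abstraction is that the user expresses *what* runs and on *which* data; because both are captured in the graph, the runtime, rather than the programmer, can decide when buffers must move between disparate memory spaces.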


Keywords: Task Graph · Runtime System · Java Implementation · Outermost Loop · OpenMP Implementation



This work is supported by the AnyScale Apps and PAMELA projects funded by EPSRC EP/L000725/1 and EP/K008730/1. Dr. Luján is supported by a Royal Society University Research Fellowship.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • James Clarkson (1)
  • Christos Kotselidis (1)
  • Gavin Brown (1)
  • Mikel Luján (1)

  1. School of Computer Science, University of Manchester, Manchester, UK
