Abstract
In this paper we describe Jacc, an experimental framework which allows developers to program GPGPUs directly from Java. The goal of Jacc is to allow developers to benefit from using heterogeneous hardware whilst minimizing the amount of code refactoring required. Jacc utilizes two key abstractions: tasks, which encapsulate all the information needed to execute code on a GPGPU; and task graphs, which capture both inter-task control flow and data dependencies. These abstractions enable the Jacc runtime system to automatically choreograph data movement and synchronization between the host and the GPGPU, eliminating the need to explicitly manage disparate memory spaces. We demonstrate the advantages of Jacc, both in terms of programmability and performance, by evaluating it against existing Java frameworks. Across the eight evaluated benchmarks, experimental results show an average performance speedup of 19x, using an NVIDIA Tesla K20m GPU, and a 4x decrease in code complexity when compared with writing multi-threaded Java code.
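The task and task-graph abstractions described above can be illustrated with a small, self-contained Java sketch. The names here (`Task`, `TaskGraph`, `execute`) are hypothetical and do not reflect Jacc's actual API; the sketch only shows how attaching read/write sets to tasks lets a runtime decide, without programmer intervention, which buffers must be copied to the device, which can be reused, and which must be copied back to the host.

```java
import java.util.*;

// Illustrative sketch only: Task and TaskGraph are hypothetical names,
// not Jacc's real API. A task couples a kernel (here a plain Runnable)
// with the names of the buffers it reads and writes.
public class TaskGraphSketch {
    static final class Task {
        final String name;
        final Runnable kernel;
        final Set<String> reads, writes;
        Task(String name, Runnable kernel, Set<String> reads, Set<String> writes) {
            this.name = name; this.kernel = kernel;
            this.reads = reads; this.writes = writes;
        }
    }

    // The graph uses overlapping read/write sets to choreograph data
    // movement: a buffer already produced on the device is reused rather
    // than re-copied, and only written buffers are copied back to the host.
    static final class TaskGraph {
        private final List<Task> tasks = new ArrayList<>();
        TaskGraph add(Task t) { tasks.add(t); return this; }

        // Executes tasks in insertion order, logging the transfer
        // decisions that stand in for host<->device synchronization.
        List<String> execute() {
            List<String> log = new ArrayList<>();
            Set<String> dirtyOnDevice = new HashSet<>();
            for (Task t : tasks) {
                for (String r : t.reads) {
                    if (dirtyOnDevice.contains(r)) log.add("reuse " + r + " on device");
                    else log.add("copy " + r + " to device");
                }
                t.kernel.run();
                dirtyOnDevice.addAll(t.writes);
            }
            for (String w : dirtyOnDevice) log.add("copy " + w + " to host");
            return log;
        }
    }

    public static void main(String[] args) {
        float[] a = {1, 2}, b = {3, 4}, c = new float[2];
        TaskGraph g = new TaskGraph()
            .add(new Task("mul",
                    () -> { for (int i = 0; i < 2; i++) c[i] = a[i] * b[i]; },
                    Set.of("a", "b"), Set.of("c")))
            .add(new Task("scale",
                    () -> { for (int i = 0; i < 2; i++) c[i] *= 2; },
                    Set.of("c"), Set.of("c")));
        // "scale" reads c, which "mul" produced on the device, so the
        // runtime can keep c resident instead of round-tripping it.
        System.out.println(g.execute());
        System.out.println(Arrays.toString(c));
    }
}
```

Note 5 below hints at the same mechanism in Jacc itself: tracking which fields a task reads and modifies is what allows post-execution synchronization with the host to be limited to the written data.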
Notes
1. This problem is resolved in Java 8.
2. However, this can easily lead to a large number of indirect memory accesses in the generated code, which will degrade performance on a GPGPU.
3. There is no technical reason why support cannot be added at a later date.
4. In our experience, the majority of kernels that we could not auto-parallelize using this scheme failed because they contained multiple loop nests.
5. The schema also tracks which fields are accessed and modified by the code, to minimize the cost of synchronizing data with the host after a task has been executed.
6. The OpenMP implementation uses the OS-supplied libatlas library.
7. We found that changing Jacc's work-group size to match that of APARAPI severely reduced performance, although Jacc remained faster than APARAPI.
References
Aparapi. http://developer.amd.com/tools-and-sdks/opencl-zone/aparapi/
OpenCL. https://www.khronos.org/opencl/
Project Sumatra. http://openjdk.java.net/projects/sumatra/
Auerbach, J., Bacon, D.F., Cheng, P., Rabbah, R.: Lime: a Java-compatible and synthesizable language for heterogeneous architectures. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA 2010). ACM (2010)
Catanzaro, B., Garland, M., Keutzer, K.: Copperhead: compiling an embedded data parallel language. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP 2011). ACM (2011)
Chafi, H., Sujeeth, A.K., Brown, K.J., Lee, H., Atreya, A.R., Olukotun, K.: A domain-specific approach to heterogeneous parallelism, p. 35. ACM Press (2011)
Chafik, O.: ScalaCL: faster Scala: optimizing compiler plugin + GPU-based collections (OpenCL). http://code.google.com/p/scalacl
Dotzler, G., Veldema, R., Klemm, M.: JCudaMP. In: Proceedings of the 3rd International Workshop on Multicore Software Engineering (2010)
Fumero, J.J., Steuwer, M., Dubach, C.: A composable array function interface for heterogeneous computing in Java. In: Proceedings of the ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY 2014). ACM (2014)
Garg, R., Hendren, L.: Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation (PACT 2014). ACM (2014)
Hayashi, A., Grossman, M., Zhao, J., Shirako, J., Sarkar, V.: Accelerating Habanero-Java programs with OpenCL generation. In: Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools (2013)
Herhut, S., Hudson, R.L., Shpeisman, T., Sreeram, J.: River Trail: a path to parallelism in JavaScript. In: Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA 2013). ACM (2013)
Nystrom, N., White, D., Das, K.: Firepile: run-time compilation for GPUs in Scala. In: Proceedings of the 10th ACM International Conference on Generative Programming and Component Engineering (GPCE 2011). ACM (2011)
OpenMP Architecture Review Board: OpenMP Specification (version 4.0) (2014)
Pratt-Szeliga, P., Fawcett, J., Welch, R.: Rootbeer: seamlessly using GPUs from Java. In: Proceedings of the 14th International IEEE High Performance Computing and Communication Conference on Embedded Software and Systems (2012)
Vallée-Rai, R., Hendren, L., Sundaresan, V., Lam, P., Gagnon, E., Phong, C.: Soot: a Java optimization framework. In: Proceedings of CASCON 1999 (1999)
Yan, Y., Grossman, M., Sarkar, V.: JCUDA: a programmer-friendly interface for accelerating java programs with CUDA. In: Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. LNCS, vol. 5704, pp. 887–899. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03869-3_82
Acknowledgments
This work is supported by the AnyScale Apps and PAMELA projects funded by EPSRC EP/L000725/1 and EP/K008730/1. Dr. Luján is supported by a Royal Society University Research Fellowship.
© 2017 Springer International Publishing AG
Cite this paper
Clarkson, J., Kotselidis, C., Brown, G., Luján, M. (2017). Boosting Java Performance Using GPGPUs. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds) Architecture of Computing Systems - ARCS 2017. ARCS 2017. Lecture Notes in Computer Science(), vol 10172. Springer, Cham. https://doi.org/10.1007/978-3-319-54999-6_5
Print ISBN: 978-3-319-54998-9
Online ISBN: 978-3-319-54999-6