
Abstract

To obtain significant execution speedups, GPUs rely heavily on the data-level parallelism inherent in the target application. However, applications cannot always fully utilize these parallel computing resources because of intrinsic data dependencies or complex pointer operations, which make them difficult to map to parallel architectures. In this paper, we explore how to leverage aggressive software-based value prediction techniques on a GPU to accelerate programs that lack inherent data parallelism. Our experimental results show that, despite the overhead of software speculation and of communication between the CPU and GPU, we obtain up to 6.5\(\times\) speedup on a selected set of kernels taken from the SPEC CPU2006, PARSEC, and Sequoia benchmark suites.
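The general idea of software value prediction can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the function names (`run_chunk`, `speculative_run`), the saturating-accumulate loop body, and the constant predictor are all assumptions chosen for illustration, and the speculative chunks run sequentially here, whereas on a GPU each chunk would map to an independent thread. A loop with a carried dependence is split into chunks, the live-in value of each chunk is predicted, all chunks execute speculatively, and a final in-order pass validates the predictions and re-executes any chunk that started from a wrong value.

```python
def run_chunk(start_value, inputs, cap=100):
    """Loop body with a carried dependence on `acc` (hypothetical kernel)."""
    acc = start_value
    for v in inputs:
        # Saturating accumulate: once `acc` reaches the cap it stops changing,
        # which is what makes the carried value predictable in this sketch.
        acc = min(acc + v, cap)
    return acc


def speculative_run(data, chunk_size, initial, predict):
    """Return (final value, number of mispredicted chunks)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return initial, 0

    # Step 1: predict the live-in value of every chunk except the first,
    # whose live-in is known.
    predicted = [initial] + [predict(k) for k in range(1, len(chunks))]

    # Step 2: speculative execution. Each chunk depends only on its predicted
    # live-in, so these calls are independent of one another.
    results = [run_chunk(p, c) for p, c in zip(predicted, chunks)]

    # Step 3: in-order validation. A chunk is re-executed only if the value
    # it was started with differs from the true live-in.
    mispredictions = 0
    live_in = initial
    for k, chunk in enumerate(chunks):
        if predicted[k] != live_in:
            mispredictions += 1
            results[k] = run_chunk(live_in, chunk)
        live_in = results[k]
    return live_in, mispredictions


if __name__ == "__main__":
    # Assumed predictor: always guess that the accumulator has saturated.
    data = [30] * 4 + [1] * 12
    print(speculative_run(data, chunk_size=4, initial=0, predict=lambda k: 100))
```

When most predictions are correct, the validation pass does little work and the chunks' execution time is overlapped; each misprediction adds a sequential re-execution, which is the speculation overhead the abstract refers to.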


Acknowledgments

The work presented in this paper was supported in part by the NSF through an EEC Innovation Award (EEC-0946463), by AMD through the AMD Strategic Academic Partners Program, by NVIDIA through the NVIDIA CUDA Research Centers Program, and by the Vice Provost’s Office of Research at Northeastern University.

Author information

Corresponding author

Correspondence to Enqiang Sun.

About this article

Cite this article

Sun, E., Kaeli, D. Aggressive Value Prediction on a GPU. Int J Parallel Prog 42, 30–48 (2014). https://doi.org/10.1007/s10766-012-0232-7

  • DOI: https://doi.org/10.1007/s10766-012-0232-7
