International Journal of Parallel Programming

Volume 42, Issue 1, pp 30–48

Aggressive Value Prediction on a GPU

  • Enqiang Sun
  • David Kaeli


To obtain significant execution speedups, GPUs rely heavily on the inherent data-level parallelism present in the targeted application. However, application programs may not always be able to fully utilize these parallel computing resources due to intrinsic data dependencies or complex pointer operations. In this paper, we explore how to leverage aggressive software-based value prediction techniques on a GPU to accelerate programs that lack inherent data parallelism. This class of applications is typically difficult to map to parallel architectures because of the data dependencies and complex pointer manipulation they contain. Our experimental results show that, despite the overhead incurred by software speculation and by communication between the CPU and GPU, we obtain up to 6.5× speedup on a selected set of kernels taken from the SPEC CPU2006, PARSEC, and Sequoia benchmark suites.
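The core idea summarized above, breaking a loop-carried dependency by predicting the value each iteration would receive, running the iterations speculatively in parallel, and then validating, can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: the functions `step`, `run_serial`, and `run_speculative` and the simple stride predictor are assumptions chosen to make the technique concrete.

```python
def step(x):
    """Loop body: each iteration depends on the previous iteration's result."""
    return x + 3  # a stride-friendly dependency, easy to predict

def run_serial(x0, n):
    """Baseline: the loop must run sequentially due to the carried value."""
    x, out = x0, []
    for _ in range(n):
        x = step(x)
        out.append(x)
    return out

def run_speculative(x0, n, stride=3):
    """Value-prediction sketch: predict each iteration's input, compute all
    iterations independently (the data-parallel phase a GPU would run),
    then validate and re-execute only the mispredicted iterations."""
    # Stride predictor: guess the input of iteration i.
    preds = [x0 + stride * i for i in range(n)]
    # Speculative phase: every iteration is now independent.
    spec = [step(p) for p in preds]
    # Validation phase: walk the true dependency chain and repair misses.
    out, x = [], x0
    for i in range(n):
        if preds[i] != x:       # misprediction: recompute from the true value
            spec[i] = step(x)
        x = spec[i]
        out.append(x)
    return out
```

With an accurate predictor the speculative phase does all the work in parallel and validation is a cheap pass; every misprediction falls back to serial recomputation, which is the overhead the paper's speedup figures already account for.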


Keywords: Parallelism · General purpose GPU computing · Data dependency · Value prediction



The work presented in this paper was supported in part by the NSF through an EEC Innovation Award (EEC-0946463), by AMD through the AMD Strategic Academic Partners Program, by NVIDIA through the NVIDIA CUDA Research Centers Program, and by the Vice Provost’s Office of Research at Northeastern University.



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Northeastern University, Boston, USA
