International Journal of Parallel Programming, Volume 39, Issue 5, pp 533–552

Value Prediction and Speculative Execution on GPU

  • Shaoshan Liu
  • Christine Eisenbeis
  • Jean-Luc Gaudiot


Abstract

GPUs and CPUs have fundamentally different architectures. It is conventional wisdom that GPUs can accelerate only applications that exhibit very high parallelism, especially vector parallelism such as image processing. In this paper, we explore the possibility of using GPUs for value prediction and speculative execution: we implement software value prediction techniques to accelerate programs with limited parallelism, and software speculation techniques to accelerate programs that contain runtime parallelism and are therefore hard to parallelize statically. Our experimental results show that, due to the relatively high overhead, mapping software value prediction techniques onto existing GPUs may not bring any immediate performance gain. On the other hand, although software speculation techniques introduce some overhead as well, mapping them onto existing GPUs can already bring some performance gain over the CPU. Based on these observations, we explore hardware implementations of speculative execution operations in GPU architectures to reduce the software performance overheads. The results indicate that the hardware extensions yield an almost tenfold reduction in control-divergent sequential operations with only moderate hardware (5–8%) and power consumption (1–5%) overheads.
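To make the abstract's speculation scheme concrete, the sketch below shows one common form of software value prediction applied to a loop-carried dependency: the loop is split into chunks, the live-in value of each chunk is guessed with a stride predictor, chunks are executed speculatively (in the paper's setting, each chunk could map to a GPU thread), and a sequential validation pass re-executes any mispredicted chunk. This is an illustrative Python sketch, not the paper's implementation; the function names (`run_chunk`, `run_speculative`) and the specific stride predictor are our assumptions.

```python
def f(acc, x):
    # Example loop body with a true loop-carried dependency on acc.
    return acc + 2 * x + 1

def run_sequential(xs, acc0=0):
    acc = acc0
    for x in xs:
        acc = f(acc, x)
    return acc

def run_chunk(acc_in, chunk):
    # Execute one chunk of iterations from a given live-in value.
    acc = acc_in
    for x in chunk:
        acc = f(acc, x)
    return acc

def run_speculative(xs, acc0=0, chunk_size=4):
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    if not chunks:
        return acc0, 0
    # Stride value predictor: run the first chunk for real, then guess
    # that every later chunk advances acc by the same stride.
    out0 = run_chunk(acc0, chunks[0])
    stride = out0 - acc0
    pred_in = [acc0 + k * stride for k in range(len(chunks))]
    # Speculative phase: chunks are independent given their predicted
    # live-ins, so they could run in parallel; here we just loop.
    spec_out = [run_chunk(pred_in[k], chunks[k]) for k in range(len(chunks))]
    # Validation phase: walk chunks in order; on a misprediction,
    # squash the speculative result and re-execute sequentially.
    acc, mispredicts = acc0, 0
    for k, chunk in enumerate(chunks):
        if pred_in[k] == acc:
            acc = spec_out[k]       # prediction was right: reuse result
        else:
            mispredicts += 1
            acc = run_chunk(acc, chunk)  # rollback: redo with true live-in
    return acc, mispredicts
```

The validation pass guarantees the result always matches the sequential loop; mispredictions only cost re-execution, which is the software overhead the abstract refers to.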


Keywords: Value prediction · Speculative execution · GPU



This work is partly supported by the National Science Foundation under Grant No. CCF-0541403 and by the French Agence Nationale pour la Recherche (ANR) PetaQCD project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or of the ANR.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.



Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Shaoshan Liu (1)
  • Christine Eisenbeis (2)
  • Jean-Luc Gaudiot (3)
  1. Microsoft, Redmond, USA
  2. Alchemy team, INRIA Saclay - Île-de-France & Univ Paris-Sud 11 (LRI, UMR CNRS 8623), Orsay, France
  3. University of California, Irvine, USA
