Value Prediction and Speculative Execution on GPU
GPUs and CPUs have fundamentally different architectures. Conventional wisdom holds that GPUs can accelerate only applications that exhibit very high parallelism, especially vector parallelism, as in image processing. In this paper, we explore the possibility of using GPUs for value prediction and speculative execution: we implement software value prediction techniques to accelerate programs with limited parallelism, and software speculation techniques to accelerate programs whose parallelism is known only at runtime and which are therefore hard to parallelize statically. Our experimental results show that, because of its relatively high overhead, mapping software value prediction onto existing GPUs may not bring any immediate performance gain. Software speculation also introduces some overhead, but mapping it onto existing GPUs can already bring some performance gain over the CPU. Based on these observations, we explore a hardware implementation of speculative execution operations on GPU architectures to reduce the software performance overheads. The results indicate that the hardware extensions yield an almost tenfold reduction in control-divergent sequential operations, with only moderate overheads in hardware (5–8%) and power consumption (1–5%).
Keywords: Value prediction, Speculative execution, GPU
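To make the idea of software value prediction concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It simulates, sequentially on the CPU, a loop with one loop-carried value: every iteration first runs from a predicted input (the phase that could be spread across GPU threads), and a sequential pass then verifies each prediction and re-executes on a miss. The names `run_with_value_prediction`, `step`, and `predict`, and the choice of a stride predictor, are assumptions made for illustration only.

```python
def run_with_value_prediction(step, x0, n, predict):
    """Simulate value prediction for a loop x = step(x) run n times.

    step:    hypothetical loop body, maps the carried value to the next one
    predict: hypothetical predictor, guesses the input of iteration i
    Returns the final carried value and the number of mispredictions.
    """
    # Phase 1: speculative execution. Each iteration uses only its own
    # predicted input, so the iterations are independent of one another.
    predicted_in = [x0] + [predict(x0, i) for i in range(1, n)]
    speculative_out = [step(x) for x in predicted_in]

    # Phase 2: sequential verification. A wrong prediction forces the
    # iteration to be re-executed with the actual input value.
    mispredictions = 0
    actual = x0
    for i in range(n):
        if predicted_in[i] != actual:
            mispredictions += 1
            speculative_out[i] = step(actual)
        actual = speculative_out[i]
    return actual, mispredictions


def stride_predictor(x0, i, stride=3):
    """Guess that the carried value grows by a fixed stride per iteration."""
    return x0 + stride * i


if __name__ == "__main__":
    regular = lambda x: x + 3
    print(run_with_value_prediction(regular, 0, 8, stride_predictor))
```

In this sketch a perfectly regular loop verifies with no re-execution, while a single irregular update makes every later prediction wrong, so those iterations fall back to sequential re-execution. That verification and recovery work is one way to picture the overhead the abstract attributes to software value prediction.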
This work is partly supported by the National Science Foundation under Grant No. CCF-0541403 and by the French Agence Nationale pour la Recherche (ANR) PetaQCD project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or of the ANR.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.