International Journal of Parallel Programming, Volume 36, Issue 4, pp 361–385

The Impact of Speculative Execution on SMT Processors

  • Dongsoo Kang
  • Chen Liu
  • Jean-Luc Gaudiot


By executing two or more threads concurrently, Simultaneous MultiThreading (SMT) architectures can exploit both Instruction-Level Parallelism (ILP) and Thread-Level Parallelism (TLP) from the increased number of in-flight instructions fetched from multiple threads. However, because of incorrect control speculation, a significant number of these in-flight instructions are discarded from the pipelines of SMT processors, a problem that grows as these pipelines become wider and deeper. Although increasing the accuracy of branch predictors may reduce the number of discarded instructions, prediction accuracy cannot easily be scaled up, since aggressive branch prediction schemes depend strongly on the predictability inherent in the application programs. In this paper, we present an efficient thread scheduling mechanism for SMT processors called SAFE-T (Speculation-Aware Front-End Throttling). It is easy to implement and allows an SMT processor to selectively perform speculative execution of threads according to the confidence level of branch predictions, hence preventing wrong-path instructions from being fetched. SAFE-T reduces the number of discarded instructions by 57.9% on average and improves instructions per cycle (IPC) by 14.7% on average over the ICOUNT policy across the multi-programmed workloads we simulate.
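The general idea of confidence-based fetch throttling can be illustrated with a minimal sketch. This is not the SAFE-T implementation from the paper; the abstract does not give its details. The sketch assumes that each thread tracks a count of unresolved low-confidence branches, that a thread is gated once this count reaches a threshold, and that the front end falls back to plain ICOUNT when every thread is gated. The names `ThreadState`, `icount_select`, `throttled_select`, and `gate_threshold` are hypothetical. The baseline follows the usual description of ICOUNT, which gives fetch priority to the threads with the fewest instructions in the pre-issue pipeline stages.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Per-thread front-end state (all fields are illustrative assumptions)."""
    tid: int
    in_flight: int          # instructions in the pre-issue stages (ICOUNT metric)
    low_conf_branches: int  # unresolved branches predicted with low confidence


def icount_select(threads, max_threads=2):
    """Baseline ICOUNT: prefer threads with the fewest in-flight instructions."""
    ranked = sorted(threads, key=lambda t: (t.in_flight, t.tid))
    return [t.tid for t in ranked[:max_threads]]


def throttled_select(threads, max_threads=2, gate_threshold=1):
    """ICOUNT restricted to threads below the low-confidence gate threshold.

    If every thread is gated, fall back to plain ICOUNT so that fetch
    bandwidth is not left idle (an assumption of this sketch).
    """
    eligible = [t for t in threads if t.low_conf_branches < gate_threshold]
    if not eligible:
        return icount_select(threads, max_threads)
    return icount_select(eligible, max_threads)


if __name__ == "__main__":
    threads = [
        ThreadState(tid=0, in_flight=4, low_conf_branches=2),
        ThreadState(tid=1, in_flight=9, low_conf_branches=0),
        ThreadState(tid=2, in_flight=6, low_conf_branches=0),
    ]
    print(icount_select(threads))     # [0, 2]
    print(throttled_select(threads))  # [2, 1]
```

In this example, plain ICOUNT would fetch from thread 0 because it has the fewest in-flight instructions, even though it has two unresolved low-confidence branches and is therefore more likely to be on a wrong path. The throttled variant skips it and fetches from threads 2 and 1 instead.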


Keywords: Simultaneous multithreading, Thread scheduling, Speculation control, Confidence estimator





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Department of Electrical Engineering, University of Southern California, Los Angeles, USA
  2. Department of Electrical Engineering & Computer Science, University of California, Irvine, Irvine, USA
