Static Partitioning vs Dynamic Sharing of Resources in Simultaneous MultiThreading Microarchitectures

  • Chen Liu
  • Jean-Luc Gaudiot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3756)

Abstract

Simultaneous MultiThreading (SMT) achieves better system resource utilization and higher performance because it exploits Thread-Level Parallelism (TLP) in addition to “conventional” Instruction-Level Parallelism (ILP). Theoretically, system resources in every pipeline stage of an SMT microarchitecture can be dynamically shared. However, in commercial applications, all the major queues are statically partitioned. From an implementation point of view, static partitioning of resources is easier to implement and has lower hardware overhead and power consumption. In this paper, we strive to quantitatively determine the tradeoff between static partitioning and dynamic sharing. We find that static partitioning of either the instruction fetch queue (IFQ) or the reorder buffer (ROB) is not sufficient if implemented alone (3% and 9% performance decrease, respectively, in the worst case compared with dynamic sharing), while statically partitioning both the IFQ and the ROB achieves an average performance gain of at least 9%, and as much as 148% when running floating-point benchmarks, compared with dynamic sharing. We varied the number of functional units in our efforts to isolate the reason for this performance improvement. We found that statically partitioning both queues outperformed all the other partitioning mechanisms under the same system configuration. This demonstrates that the performance gain has been achieved by moving from dynamic sharing to static partitioning of the system resources.
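
The two policies compared in the abstract can be illustrated with a toy occupancy model. This is only a sketch under stated assumptions and is not the simulator or the mechanism used in the paper: the class name SharedQueue, its fields, and the equal-slice rule (capacity divided evenly among threads) are hypothetical choices made for illustration. Under static partitioning a thread may only fill its own slice of a queue; under dynamic sharing any thread may take any free entry.

```python
from dataclasses import dataclass, field


@dataclass
class SharedQueue:
    """Toy occupancy model of one pipeline queue (hypothetical, for illustration)."""
    capacity: int      # total number of entries in the queue
    num_threads: int   # number of hardware threads using the queue
    static: bool       # True: static partitioning, False: dynamic sharing
    occupancy: list = field(init=False)  # entries currently held by each thread

    def __post_init__(self):
        self.occupancy = [0] * self.num_threads

    def can_allocate(self, thread_id: int) -> bool:
        if self.static:
            # Assumption: each thread owns a fixed, equal slice of the entries.
            return self.occupancy[thread_id] < self.capacity // self.num_threads
        # Dynamic sharing: any thread may take any free entry.
        return sum(self.occupancy) < self.capacity

    def allocate(self, thread_id: int) -> bool:
        """Try to take one entry for thread_id; return True on success."""
        if not self.can_allocate(thread_id):
            return False
        self.occupancy[thread_id] += 1
        return True


# Thread 0 requests eight entries in a row from an 8-entry queue shared by 2 threads.
dynamic = SharedQueue(capacity=8, num_threads=2, static=False)
partitioned = SharedQueue(capacity=8, num_threads=2, static=True)
dynamic_granted = sum(dynamic.allocate(0) for _ in range(8))
partitioned_granted = sum(partitioned.allocate(0) for _ in range(8))
```

In this toy model, thread 0 fills the whole dynamically shared queue, so thread 1 cannot allocate until entries are freed, whereas the partitioned queue always leaves thread 1 its own slice. The model says nothing about performance, which the paper evaluates by simulation.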

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Chen Liu (1)
  • Jean-Luc Gaudiot (1)
  1. Department of Electrical Engineering and Computer Science, University of California, Irvine, USA
