Design Alternatives of Multithreaded Architecture

International Journal of Parallel Programming

Abstract

This paper compares two possible implementations of a multithreaded architecture and proposes a new architecture that combines the flexibility of the first with the low hardware complexity of the second. We present a performance analysis and a step-by-step complexity analysis of the two design alternatives: dynamic inter-thread resource scheduling and static resource allocation. We then introduce a new multithreaded architecture based on a new scheduling mechanism called “semi-static” scheduling. We show that with two concurrent threads the dynamic scheduling processor achieves 5 to 45% higher performance, at the cost of a much more complicated design. The analysis indicates that, for a relatively high number of execution resources, the complexity of the dynamic scheduling logic will inevitably require design compromises. Moreover, high chip-wide communication time and an incomplete bypassing network will limit dynamic scheduling and reduce its performance advantage. The static scheduling architecture, on the other hand, achieves low resource utilization. The semi-static architecture uses compiler techniques to exploit patterns of program parallelism and introduces a new hardware mechanism in order to achieve performance close to that of dynamic scheduling without significantly increasing the hardware complexity of the static design. It statically assigns part of the functional units but dynamically schedules the most performance-critical functional units on a medium-grain basis.
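
The three allocation policies named in the abstract can be contrasted with a toy model. The sketch below is an illustration only and is not the simulator or the hardware mechanism used in the paper. It assumes a single pool of identical functional units, a per-cycle demand for each thread, and no carry-over of unissued operations. The names `static_issue`, `dynamic_issue`, `semi_static_issue`, `private`, and `grain` are hypothetical. In particular, "medium-grain" is modeled here as handing a shared group of units to one thread for a whole window of `grain` cycles, with the owner chosen by look-ahead as a stand-in for compiler knowledge.

```python
from typing import List, Sequence

Demand = Sequence[Sequence[int]]  # demand[cycle][thread] = operations ready to issue


def static_issue(demand: Demand, units: int) -> int:
    """Static allocation: every thread owns an equal, fixed share of the units."""
    share = units // len(demand[0])
    return sum(min(d, share) for cycle in demand for d in cycle)


def dynamic_issue(demand: Demand, units: int) -> int:
    """Dynamic scheduling: any unit may serve any thread in any cycle."""
    return sum(min(sum(cycle), units) for cycle in demand)


def semi_static_issue(demand: Demand, units: int, private: int, grain: int) -> int:
    """Toy semi-static policy (assumed, not taken from the paper).

    Each thread keeps `private` units permanently. The remaining units form
    a shared group that is given to a single thread for `grain` cycles at a time.
    """
    threads = len(demand[0])
    shared = units - private * threads
    issued = 0
    for start in range(0, len(demand), grain):
        window = demand[start:start + grain]
        # Look-ahead over the window stands in for compiler-provided hints.
        totals: List[int] = [sum(cycle[t] for cycle in window) for t in range(threads)]
        owner = totals.index(max(totals))
        for cycle in window:
            for t, d in enumerate(cycle):
                capacity = private + (shared if t == owner else 0)
                issued += min(d, capacity)
    return issued


# Two threads whose bursts of parallelism alternate every two cycles.
demand = [(4, 0), (4, 0), (0, 4), (0, 4)]
print(static_issue(demand, units=4))                       # 8
print(semi_static_issue(demand, units=4, private=1, grain=2))  # 12
print(dynamic_issue(demand, units=4))                      # 16
```

In this particular workload the semi-static policy lands between the other two, because the shared units follow whichever thread is in its burst phase. The gap to the dynamic policy depends entirely on the assumed `private` and `grain` values and on how well the window boundaries line up with the bursts, so the numbers should not be read as a reproduction of the paper's results.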

Cite this article

Mendelson, A., Bekerman, M. Design Alternatives of Multithreaded Architecture. International Journal of Parallel Programming 27, 161–193 (1999). https://doi.org/10.1023/A:1018733528538
