Abstract
This paper compares two possible implementations of a multithreaded architecture and proposes a new architecture that combines the flexibility of the first with the low hardware complexity of the second. We present a performance analysis and a step-by-step complexity analysis of two design alternatives for a multithreaded architecture: dynamic inter-thread resource scheduling and static resource allocation. We then introduce a new multithreaded architecture based on a new scheduling mechanism called “semi-static.” We show that with two concurrent threads the dynamic scheduling processor achieves 5 to 45% higher performance, at the cost of a much more complicated design. Our analysis indicates that for a relatively high number of execution resources, the complexity of the dynamic scheduling logic will inevitably require design compromises. Moreover, high chip-wide communication time and an incomplete bypassing network will limit dynamic scheduling and reduce its performance advantage. The static scheduling architecture, on the other hand, achieves low resource utilization. The semi-static architecture uses compiler techniques to exploit patterns of program parallelism and introduces a new hardware mechanism in order to achieve performance close to that of dynamic scheduling without significantly increasing the hardware complexity of the static design. It statically assigns some of the functional units but dynamically schedules the most performance-critical functional units on a medium-grain basis.
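The contrast between the three allocation policies can be illustrated with a toy model. The sketch below is not the simulator or the hardware mechanism used in the paper; it is a minimal illustration under stated assumptions. Each thread is reduced to a hypothetical per-cycle count of ready operations, all functional units are treated as identical, and the semi-static pool is handed to whichever thread has the higher demand over the next block of cycles, a crude stand-in for compiler knowledge. All function and parameter names are invented for this example.

```python
def static_issue(demands, units):
    """Static allocation: each thread owns a fixed, equal share of the units."""
    share = units // len(demands)
    return sum(min(d, share) for cycle in zip(*demands) for d in cycle)


def dynamic_issue(demands, units):
    """Dynamic scheduling: all units are shared and re-arbitrated every cycle."""
    return sum(min(sum(cycle), units) for cycle in zip(*demands))


def semi_static_issue(demands, units, pooled, grain):
    """Semi-static sketch: some units are private to each thread, while a pool
    of `pooled` units is given to one thread for `grain` cycles at a time."""
    threads = len(demands)
    private = (units - pooled) // threads
    cycles = len(demands[0])
    issued = 0
    for start in range(0, cycles, grain):
        block = [d[start:start + grain] for d in demands]
        # Assumption: the pool goes to the thread with the highest total
        # demand in this block (a placeholder for compiler-provided hints).
        owner = max(range(threads), key=lambda t: sum(block[t]))
        for t in range(threads):
            cap = private + (pooled if t == owner else 0)
            issued += sum(min(d, cap) for d in block[t])
    return issued


# Hypothetical demand traces: ready operations per cycle for two threads.
thread_a = [4, 4, 1, 4, 1, 1, 4, 1]
thread_b = [1, 1, 4, 1, 4, 4, 1, 4]
demands = [thread_a, thread_b]

print("static     :", static_issue(demands, units=4))
print("semi-static:", semi_static_issue(demands, units=4, pooled=2, grain=4))
print("dynamic    :", dynamic_issue(demands, units=4))
```

On these made-up traces the semi-static policy issues more operations than the static split and fewer than per-cycle dynamic arbitration, because the pooled units follow each thread's busy phase but cannot react to single-cycle changes inside a block. The numbers say nothing about the magnitudes reported in the paper.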
Mendelson, A., Bekerman, M. Design Alternatives of Multithreaded Architecture. International Journal of Parallel Programming 27, 161–193 (1999). https://doi.org/10.1023/A:1018733528538