Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors

Laudon, James; Gupta, Anoop; Horowitz, Mark

doi:10.1007/978-1-4615-2698-8_8

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 281)

Abstract

Multiple-context processors have been proposed as an architectural technique to mitigate the effects of large memory latency in multiprocessors. In this paper, we examine two schemes for implementing multiple-context processors. The first scheme switches between contexts only on a cache miss, while the other interleaves the contexts on a cycle-by-cycle basis. Both schemes provide the capability for a single context to fully utilize the pipeline. We show that cycle-by-cycle interleaving of contexts provides a performance advantage over switching contexts only at a cache miss. This advantage results from the context interleaving hiding pipeline dependencies and reducing the context switch cost. In addition, we show that while the implementation of the interleaved scheme is more complex, the complexity is not overwhelming. As pipelines get deeper and operate at lower percentages of peak performance, the performance advantage of the interleaved scheme is likely to justify its additional complexity.
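The effect of context switch cost described in the abstract can be illustrated with a rough sketch. The closed-form toy model below is an assumption made for illustration, not the evaluation method of the chapter: every context runs for a fixed number of cycles, misses, and waits a fixed latency, and each switch wastes a fixed number of cycles. The function name `utilization` and all parameter values are hypothetical, and the model ignores the hiding of pipeline dependencies that the abstract also credits to interleaving.

```python
def utilization(num_contexts, run_length, miss_latency, switch_cost):
    """Fraction of cycles doing useful work in a deterministic toy model.

    Every context runs `run_length` cycles, misses, and waits `miss_latency`
    cycles; each switch away from a context wastes `switch_cost` cycles.
    Assumes switch_cost <= miss_latency.
    """
    useful = num_contexts * run_length
    # One round lasts at least as long as a single context's run plus its
    # miss, and at least as long as running (and switching away from)
    # every context once.
    period = max(run_length + miss_latency,
                 num_contexts * (run_length + switch_cost))
    return useful / period


# Illustrative parameters only; none of these numbers come from the chapter.
RUN, LATENCY = 20, 100
HIGH_SWITCH_COST, LOW_SWITCH_COST = 7, 1

for n in (1, 2, 4, 8):
    high = utilization(n, RUN, LATENCY, HIGH_SWITCH_COST)
    low = utilization(n, RUN, LATENCY, LOW_SWITCH_COST)
    print(f"{n} contexts: high switch cost {high:.2f}, low switch cost {low:.2f}")
```

In this sketch the two cases agree while memory latency dominates, and they diverge only once there are enough contexts to keep the processor busy, at which point utilization is bounded by run_length / (run_length + switch_cost). Treating the interleaved scheme as simply the lower-cost case is itself a simplifying assumption.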

Author information

Authors and Affiliations

Computer Systems Laboratory, Stanford University, Stanford, California, 94305, USA
James Laudon, Anoop Gupta & Mark Horowitz

Editor information

Editors and Affiliations

Exa Corporation, Cambridge, Massachusetts, USA
Robert A. Iannucci
McGill University, Montreal, Quebec, Canada
Guang R. Gao
Digital Equipment Corporation, Cambridge, Massachusetts, USA
Robert H. Halstead Jr.
Tera Computer Company, Seattle, Washington, USA
Burton Smith

Cite this chapter

Laudon, J., Gupta, A., Horowitz, M. (1994). Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors. In: Iannucci, R.A., Gao, G.R., Halstead, R.H., Smith, B. (eds) Multithreaded Computer Architecture: A Summary of the State of the Art. The Springer International Series in Engineering and Computer Science, vol 281. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2698-8_8

DOI: https://doi.org/10.1007/978-1-4615-2698-8_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-6161-9
Online ISBN: 978-1-4615-2698-8
eBook Packages: Springer Book Archive