Abstract
Multiple-context processors have been proposed as an architectural technique to mitigate the effects of large memory latency in multiprocessors. In this paper, we examine two schemes for implementing multiple-context processors. The first scheme switches between contexts only on a cache miss, while the other interleaves the contexts on a cycle-by-cycle basis. Both schemes provide the capability for a single context to fully utilize the pipeline. We show that cycle-by-cycle interleaving of contexts provides a performance advantage over switching contexts only at a cache miss. This advantage results from the context interleaving hiding pipeline dependencies and reducing the context switch cost. In addition, we show that while the implementation of the interleaved scheme is more complex, the complexity is not overwhelming. As pipelines get deeper and operate at lower percentages of peak performance, the performance advantage of the interleaved scheme is likely to justify its additional complexity.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, and B. Smith. The Tera computer system. In 1990 International Conference on Supercomputing, pages 1–6, June 1990.
A. Agarwal, B.-H. Lim, D. Kranz, and J. Kubiatowicz. APRIL: A processor architecture for multiprocessing. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 104–114, May 1990.
D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Pro-ceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 40–52, April 1991.
Cypress Semiconductor Corporation. SPARC RISC User’s Guide, 2nd edition, 1990.
H. Davis, S. R. Goldschmidt, and J. Hennessy. Multiprocessor simulation and tracing using Tango. In Proceedings of the 1991 International Conference on Parallel Processing, volume II, pages 99–107, August 1991.
Digital Equipment Corporation. Alpha Architecture Handbook, preliminary edition, 1992.
Digital Equipment Corporation. DECChip 21064-AA RISC Microprocessor Preliminary Data Sheet, 1992.
M. Dubois, C. Scheurich, and F. Briggs. Memory access buffering in multiprocessors. In Proceedings of the 13th Annual International Symposium on Computer Architecture, pages 434–442, June 1986.
M. K. Farrens and A. R. Pleszkun. Strategies for achieving improved processor throughput. In Proceedings of the 18th Annual International Symposium on Computer Architecture, pages 362–369, May 1991.
K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 15–26, May 1990.
A. Gupta, J. Hennessy, K. Gharachorloo, T. Mowry, and W.-D. Weber. Comparative evaluation of latency reducing and tolerating techniques. In Proceeding of the 18th Annual International Symposium on Computer Architecture, pages 254–263, May 1991.
R. H. Halstead, Jr. and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443–451, June 1988.
G. Kane. MIPS RISC Architecture. Prentice-Hall, 1988.
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the 8th Annual Symposium on Computer Architecture, pages 81–87, 1981.
K. Kurihara, D. Chaiken, and A. Agarwal. Latency tolerance through multithreading in large-scale multiprocessors. In Proceedings of the International Symposium on Shared Memory Multiprocessing, pages 91–101, April 1991.
J. Laudon. Architectural and Implementation Tradeoffs for Multiple-Context Processors. PhD thesis, Stanford University, Stanford, California, in preparation, 1993.
J. Laudon, A. Gupta, and M. Horowitz. Architectural and implementation tradeoffs in the design of multiple-context processors. Technical Report CSL-TR-92–523, Stanford University, May 1992.
J. K. F. Lee and A. J. Smith. Branch prediction strategies and branch target buffer design. IEEE Computer, 17(1):6–22, January 1984.
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148–159, May 1990.
D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and J. Hennessy, The DASH prototype: Logic overhead and performance. IEEE Transactions on Parallel and Distributed Systems, 4(1):41–61, January 1993.
E. Lusk, R. Overbeek, J. Boyle, R. Butler, T. Disz, B. Glickfeld, J. Patterson, and R. Stevens. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc., 1987.
MIPS Computer Systems, Inc. MIPS R4000 Microprocessor User’s Manual, 1991.
T. C. Mowry, M. S. Lam, and A. Gupta. Design and evaluation of a compiler algorithm for prefetching. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 62–73, October 1992.
R. H. Saavedra-Barrera, D. E. Culler, and T. von Eicken. Analysis of multithreaded architectures for parallel computing. In Proceedings of the 2nd Annual Symposium on Parallel Algorithms and Architecture, July 1990.
C. Scheurich and M. Dubois. Lockup-free caches in high-performance multiprocessors. Journal of Parallel and Distributed Computing, 11(1):25–36, January 1991.
J. P. Singh, W.-D. Weber, and A. Gupta. SPLASH: Stanford parallel applications for shared-memory. Computer Architecture News, 20(1):5–44, March 1992.
B. J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of the 1978 International Conference on Parallel Processing, pages 6–8, 1978.
M. Smith, M. Lam, and M. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 344–354, May 1990.
M. Sporer, F. H. Moss, and C. J. Mathias. An introduction to the architecture of the Stellar graphics supercomputer. In Proceedings of COMPCON Spring ‘88: Thirty-third IEEE Computer Society International Conference, pages 464–467, February 1988.
W.-D. Weber and A. Gupta, Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: Preliminary results. In Proceedings of the 16th Annual International Symposium on Computer Architecture, pages 273–280, June 1989.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1994 Springer Science+Business Media New York
About this chapter
Cite this chapter
Laudon, J., Gupta, A., Horowitz, M. (1994). Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors. In: Iannucci, R.A., Gao, G.R., Halstead, R.H., Smith, B. (eds) Multithreaded Computer Architecture: A Summary of the State of the ART. The Springer International Series in Engineering and Computer Science, vol 281. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2698-8_8
Download citation
DOI: https://doi.org/10.1007/978-1-4615-2698-8_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-6161-9
Online ISBN: 978-1-4615-2698-8
eBook Packages: Springer Book Archive