Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors

Chapter in: Multithreaded Computer Architecture: A Summary of the State of the ART

Abstract

Multiple-context processors have been proposed as an architectural technique to mitigate the effects of large memory latency in multiprocessors. In this paper, we examine two schemes for implementing multiple-context processors. The first scheme switches between contexts only on a cache miss, while the other interleaves the contexts on a cycle-by-cycle basis. Both schemes provide the capability for a single context to fully utilize the pipeline. We show that cycle-by-cycle interleaving of contexts provides a performance advantage over switching contexts only at a cache miss. This advantage results from the context interleaving hiding pipeline dependencies and reducing the context switch cost. In addition, we show that while the implementation of the interleaved scheme is more complex, the complexity is not overwhelming. As pipelines get deeper and operate at lower percentages of peak performance, the performance advantage of the interleaved scheme is likely to justify its additional complexity.
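The two sources of advantage named in the abstract (cheaper context switches and hidden pipeline dependencies) can be illustrated with a back-of-the-envelope utilization model. The sketch below is not the authors' simulator or methodology. It is a deterministic toy model, and every name and number in it is an assumption chosen for illustration: `run_length` (instructions between cache misses), `miss_latency`, `depth` (pipeline depth), and `stall` (exposed dependency stall cycles per instruction). It further assumes that the blocked scheme flushes the whole pipeline on a miss, while the interleaved scheme squashes only about `depth / n` instructions and exposes only `stall / n` of the dependency stalls, optimistically treating all `n` contexts as active.

```python
from math import ceil


def utilization(n_contexts, run_length, miss_latency, switch_cost, stall_per_instr):
    """Fraction of cycles spent issuing useful instructions (toy model).

    Each context issues `run_length` instructions, then misses and waits
    `miss_latency` cycles. While it runs, it also wastes `stall_per_instr`
    cycles per instruction on pipeline dependencies and `switch_cost`
    cycles on the context switch triggered by the miss.
    """
    # Cycles one context occupies the pipeline per miss.
    busy = run_length * (1 + stall_per_instr) + switch_cost
    # Enough contexts to hide the miss latency: only the overheads are lost.
    saturated = run_length / busy
    # Too few contexts: the pipeline idles for part of every miss.
    linear = n_contexts * run_length / (busy + miss_latency)
    return min(saturated, linear)


def blocked(n, run_length=25, miss_latency=100, depth=7, stall=0.5):
    # Assumption: a switch on a miss flushes the full pipeline, and
    # dependency stalls are too short to be worth switching on.
    return utilization(n, run_length, miss_latency, depth, stall)


def interleaved(n, run_length=25, miss_latency=100, depth=7, stall=0.5):
    # Assumption: with n contexts interleaved, about depth/n in-flight
    # instructions belong to the missing context, and other contexts
    # fill all but 1/n of the dependency stall cycles.
    return utilization(n, run_length, miss_latency, ceil(depth / n), stall / n)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(blocked(n), 3), round(interleaved(n), 3))
```

With a single context the two functions coincide by construction, so any gap at higher context counts comes only from the assumed switch cost and stall terms. The model says nothing about implementation complexity, which the abstract identifies as the cost of the interleaved scheme.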

Copyright information

© 1994 Springer Science+Business Media New York

About this chapter

Cite this chapter

Laudon, J., Gupta, A., Horowitz, M. (1994). Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors. In: Iannucci, R.A., Gao, G.R., Halstead, R.H., Smith, B. (eds) Multithreaded Computer Architecture: A Summary of the State of the ART. The Springer International Series in Engineering and Computer Science, vol 281. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2698-8_8

  • DOI: https://doi.org/10.1007/978-1-4615-2698-8_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-6161-9

  • Online ISBN: 978-1-4615-2698-8

  • eBook Packages: Springer Book Archive
