Coping with very high latencies in petaflop computer systems

  • II System Architecture
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1615)

Abstract

The very long and highly variable latencies in the deep memory hierarchy of a petaflop-scale architecture design, such as the Hybrid Technology Multi-Threaded Architecture (HTMT) [13], present a new challenge to its programming and execution model. One way to cope with such high and variable latencies is to expose the different memory regions of the machine directly and explicitly to the program execution model, allowing better management of communication. In this paper we describe the novel percolation model that lies at the heart of the HTMT program execution model [13]. The percolation model combines multithreading with dynamic prefetching of coarse-grain contexts. Earlier prefetching techniques have concentrated on moving blocks of data within the memory hierarchy. Rather than moving only contiguous blocks of data, the thread percolation approach manages contexts that include data, program instructions, and control state.

The main contributions of this paper include the specification of the HTMT runtime execution model based on the concept of percolation, and a discussion of the role of the compiler in a machine that exposes the memory hierarchy to the programming model.
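As a rough illustration of the percolation idea described in the abstract, the following minimal Python sketch moves whole contexts (data, code, and control state bundled together) toward the processor before they are allowed to run. It is a toy model under stated assumptions, not the HTMT runtime: the level names `far`, `middle`, and `near`, the `Context` class, and the `percolate` and `run` functions are all hypothetical and do not come from the paper.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical three-level hierarchy, ordered from slowest to fastest.
# The names are illustrative only and are not taken from the paper.
LEVELS = ["far", "middle", "near"]


@dataclass
class Context:
    """A coarse-grain context: data, program instructions, and control state."""
    name: str
    data: List[int]
    code: Callable[[List[int]], int]                     # stands in for instructions
    state: Dict[str, int] = field(default_factory=dict)  # stands in for control state
    level: str = "far"                                   # where the context currently lives


def percolate(ctx: Context) -> None:
    """Move the whole context one level closer to the processor."""
    idx = LEVELS.index(ctx.level)
    if idx < len(LEVELS) - 1:
        ctx.level = LEVELS[idx + 1]


def run(contexts: List[Context]) -> Tuple[Dict[str, int], int]:
    """Prefetch contexts ahead of execution; run only those already in 'near'.

    Returns the per-context results and the number of steps in which the
    processor had nothing resident to run (stalls).
    """
    pending: Deque[Context] = deque(contexts)
    ready: Deque[Context] = deque()
    results: Dict[str, int] = {}
    stalls = 0
    while pending or ready:
        # Prefetch step: every pending context moves one level closer.
        for _ in range(len(pending)):
            ctx = pending.popleft()
            percolate(ctx)
            (ready if ctx.level == "near" else pending).append(ctx)
        # Execute step: the processor only ever touches resident contexts.
        if ready:
            ctx = ready.popleft()
            assert ctx.level == "near"
            results[ctx.name] = ctx.code(ctx.data)
        else:
            stalls += 1
    return results, stalls
```

Calling `run([Context("a", [1, 2, 3], sum), Context("b", [4, 5, 6], sum)])` percolates both contexts together, so the only idle step is the initial one while they are still in transit. In this sketch the cost of the slow levels is paid once per batch of contexts rather than once per memory access, which is the intuition behind prefetching contexts instead of individual data blocks.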

References

  1. Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96), Boston, Massachusetts, October 20–23, 1996. IEEE Computer Society Press.

  2. ACM SIGARCH and IEEE Computer Society. Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, California, May 17–19, 1993. Computer Architecture News, 21(2), May 1993.

  3. ACM SIGARCH and IEEE Computer Society. Proceedings of the 22nd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 22–24, 1995. Computer Architecture News, 23(2), May 1995.

  4. Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson, David Kranz, John Kubiatowicz, Beng-Hong Lim, Kenneth Mackenzie, and Donald Yeung. The MIT Alewife machine: Architecture and performance. In Proceedings of the 22nd Annual International Symposium on Computer Architecture [3], Santa Margherita Ligure, Italy, June 22–24, 1995, pages 2–13. Computer Architecture News, 23(2), May 1995.

  5. Gail Alverson, Robert Alverson, David Callahan, Brian Koblenz, Allan Porterfield, and Burton Smith. Exploiting heterogeneous parallelism on a multithreaded multiprocessor. Presented at the Workshop on Multithreaded Computers, held at Supercomputing '91, Albuquerque, New Mexico, November 1991.

  6. José Nelson Amaral, Guang R. Gao, Phillip Merkey, Thomas Sterling, Zachary Ruiz, and Sean Ryan. An HTMT performance prediction case study: Implementing Cannon's dense matrix multiply algorithm. Technical report, University of Delaware, 1999.

  7. Karen Bergman and Coke Reed. Hybrid technology multithreaded architecture program design and development of the data vortex network. Technical report, Princeton University, 1998. Technical Note 2.0.

  8. Derek Chiou, Boon S. Ang, Robert Greiner, Arvind, James C. Hoe, Michael J. Beckerle, James E. Hicks, and Andy Boughton. StarT-NG: Delivering seamless parallel computing. In Seif Haridi, Khayri Ali, and Peter Magnusson, editors, Proceedings of the First International EURO-PAR Conference, number 966 in Lecture Notes in Computer Science, pages 101–116, Stockholm, Sweden, August 29–31, 1995. Springer-Verlag.

  9. David E. Culler, Seth C. Goldstein, Klaus E. Schauser, and Thorsten von Eicken. TAM—a compiler controlled threaded abstract machine. Journal of Parallel and Distributed Computing, 18:347–370, July 1993.

  10. David E. Culler, Anurag Sah, Klaus Erik Schauser, Thorsten von Eicken, and John Wawrzynek. Fine-grain parallelism with minimal hardware support: A compiler-controlled threaded abstract machine. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 164–175, Santa Clara, California, April 8–11, 1991. ACM SIGARCH, SIGPLAN, SIGOPS, and the IEEE Computer Society. Computer Architecture News, 19(2), April 1991; Operating Systems Review, 25, April 1991; SIGPLAN Notices, 26(4), April 1991.

  11. Mikhail Dorojevets, Paul Bunyk, Dmitri Zinoviev, and Konstantin Likharev. Petaflops RSFQ system design. In Applied Superconductivity Conference, September 1998.

  12. Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, and Whay S. Lee. The M-Machine multicomputer. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 146–156, Ann Arbor, Michigan, November 29–December 1, 1995. IEEE-CS TC-MICRO and ACM SIGMICRO.

  13. Guang R. Gao, Kevin B. Theobald, Andrés Márquez, and Thomas Sterling. The HTMT program execution model. CAPSL Technical Memo 09, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware, July 1997. In ftp://ftp.capsl.udel.edu/pub/doc/memos.

  14. Laurie J. Hendren, Xinan Tang, Yingchun Zhu, Guang R. Gao, Xun Xue, Haiying Cai, and Pierre Ouellet. Compiling C for the EARTH multithreaded architecture. In Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96) [1], pages 12–23.

  15. HTMT. Hybrid technology multi-threaded architectures. http://htmt.caltech.edu, 1998.

  16. Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Guang R. Gao, and Laurie J. Hendren. A study of the EARTH-MANNA multithreaded system. International Journal of Parallel Programming, 24(4):319–347, August 1996.

  17. Robert A. Iannucci. A dataflow/von Neumann hybrid architecture. Technical Report MIT/LCS/TR-418, MIT Laboratory for Computer Science, Cambridge, Massachusetts, July 1988. PhD thesis, May 1988.

  18. Robert A. Iannucci, Guang R. Gao, Robert H. Halstead, Jr., and Burton Smith, editors. Multithreaded Computer Architecture: A Summary of the State of the Art. Kluwer Academic Publishers, Norwell, Massachusetts, 1994. Book contains papers presented at the Workshop on Multithreaded Computers, held in conjunction with Supercomputing '91 in Albuquerque, New Mexico, November 1991.

  19. Yuetsu Kodama, Hirohumi Sakane, Mitsuhisa Sato, Hayato Yamana, Shuichi Sakai, and Yoshinori Yamaguchi. The EM-X parallel computer: Architecture and basic performance. In Proceedings of the 22nd Annual International Symposium on Computer Architecture [3], Santa Margherita Ligure, Italy, June 22–24, 1995, pages 14–23. Computer Architecture News, 23(2), May 1995.

  20. Peter M. Kogge, Jay B. Brockman, Thomas Sterling, and Guang Gao. Processing-in-memory: Chips to petaflops. Technical report, International Symposium on Computer Architecture, Denver, Co., June 1997.

  21. Andrés Márquez, Kevin B. Theobald, Xinan Tang, and Guang R. Gao. A superstrand architecture. CAPSL Technical Memo 14, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware, December 1997. In ftp://ftp.capsl.udel.edu/pub/doc/memos.

  22. Andrés Márquez, Kevin B. Theobald, Xinan Tang, Thomas Sterling, and Guang R. Gao. A superstrand architecture and its compilation. CAPSL Technical Memo 18, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware, March 1998.

  23. R. S. Nikhil and Arvind. Id: a language with implicit parallelism. In J. Feo, editor, A Comparative Study of Parallel Programming Languages: The Salishan Problems. Elsevier Science Publishers, February 1990.

  24. Michael D. Noakes, Deborah A. Wallach, and William J. Dally. The J-Machine multicomputer: An architectural evaluation. In Proceedings of the 20th Annual International Symposium on Computer Architecture [2], San Diego, California, May 17–19, 1993, pages 224–235. Computer Architecture News, 21(2), May 1993.

  25. Kazuaki Okamoto, Shuichi Sakai, Hiroshi Matsuoka, Takashi Yokota, and Hideo Hirono. Multithread execution mechanisms on RWC-1 for massively parallel computation. In Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96) [1], pages 116–121.

  26. Demetri Psaltis and Geoffrey W. Burr. Holographic data storage. Computer, 31(2):52–60, February 1998.

  27. Klaus E. Schauser, David E. Culler, and Seth C. Goldstein. Separation constraint partitioning—A new algorithm for partitioning non-strict programs into sequential threads. In Conference Record of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 259–271, San Francisco, California, January 22–25, 1995.

  28. Klaus Erik Schauser, David E. Culler, and Thorsten von Eicken. Compiler-controlled multithreading for lenient parallel languages. Report No. UCB/CSD 91/640, Computer Science Division, University of California at Berkeley, 1991.

  29. Ellen Spertus, Seth Copen Goldstein, Klaus Erik Schauser, Thorsten von Eicken, David E. Culler, and William J. Dally. Evaluation of mechanisms for fine-grained parallel programs in the J-Machine and the CM-5. In Proceedings of the 20th Annual International Symposium on Computer Architecture [2], San Diego, California, May 17–19, 1993, pages 302–313. Computer Architecture News, 21(2), May 1993.

  30. Xinan Tang, Jian Wang, Kevin B. Theobald, and Guang R. Gao. Thread partitioning and scheduling based on cost model. In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 272–281, Newport, Rhode Island, June 22–25, 1997. SIGACT/SIGARCH and EATCS.

  31. Kevin B. Theobald, José Nelson Amaral, Gerd Heber, Olivier Maquelin, Xinan Tang, and Guang R. Gao. Overview of the Threaded-C language. CAPSL Technical Memo 19, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware, March 1998. In ftp://ftp.capsl.udel.edu/pub/doc/memos.

  32. L. Wittie, D. Zinoviev, G. Sazaklis, and K. Likharev. CNET: Design of an RSFQ Switching network for petaflops-scale computing. IEEE Trans. on Appl. Supercond., June 1999. In press.

Editor information

Constantine Polychronopoulos, Kazuki Joe, Akira Fukuda, Shinji Tomita

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ryan, S., Amaral, J.N., Gao, G., Ruiz, Z., Marquez, A., Theobald, K. (1999). Coping with very high latencies in petaflop computer systems. In: Polychronopoulos, C., Joe, K., Fukuda, A., Tomita, S. (eds) High Performance Computing. ISHPC 1999. Lecture Notes in Computer Science, vol 1615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0094912

  • DOI: https://doi.org/10.1007/BFb0094912

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65969-3

  • Online ISBN: 978-3-540-48821-7

  • eBook Packages: Springer Book Archive
