On-Chip Networks for Multicore Systems

  • Li-Shiuan Peh
  • Stephen W. Keckler
  • Sriram Vangal
Chapter
Part of the Integrated Circuits and Systems book series (ICIR)

Abstract

With Moore’s law supplying billions of transistors, and uniprocessor architectures delivering diminishing performance, multicore chips are emerging as the prevailing architecture in both general-purpose and application-specific markets. As the core count increases, the need for a scalable on-chip communication fabric that can deliver high bandwidth is gaining in importance, leading to recent multicore chips interconnected with sophisticated on-chip networks. In this chapter, we first present a tutorial on on-chip network architecture fundamentals including on-chip network interfaces, topologies, routing, flow control, and router microarchitectures. Next, we detail case studies on two recent prototypes of on-chip networks: the UT-Austin TRIPS operand network and the Intel TeraFLOPS on-chip network. This chapter organization seeks to provide the foundations of on-chip networks so that readers can appreciate the different design choices faced in the two case studies. Finally, this chapter concludes with an outline of the challenges facing research into on-chip network architectures.

Keywords

Output Port, Virtual Channel, Multicore System, Very Long Instruction Word, Sleep Transistor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

Dr. Peh wishes to thank her entire Princeton research group, as well as the students who have taken the ELE580 graduate course on interconnection networks, as those research and teaching experiences helped significantly in the writing of this chapter. Her research has been kindly supported by the National Science Foundation, Intel Corporation, and the MARCO Gigascale Systems Research Center. Dr. Keckler thanks the entire TRIPS team, in particular Doug Burger, Paul Gratz, Heather Hanson, Robert McDonald, and Karthikeyan Sankaralingam, for their contributions to the design and implementation of the TRIPS operand network. The TRIPS project was supported by the Defense Advanced Research Projects Agency under contract F33615-01-C-4106 and by NSF CISE Research Infrastructure grant EIA-0303609. Dr. Vangal thanks the entire TeraFLOPS processor design team at Circuit Research Laboratories, Intel Corporation, for flawless execution of the design.

Copyright information

© Springer-Verlag US 2009

Authors and Affiliations

  • Li-Shiuan Peh (1)
  • Stephen W. Keckler (2)
  • Sriram Vangal (3)
  1. Princeton University, Princeton, USA
  2. The University of Texas at Austin, Austin, USA
  3. Intel Corporation, Hillsboro, USA
