Fast Network-on-Chip Design

Source-Synchronous Networks-On-Chip

Abstract

In the previous chapter, we showed how resonant clocking can be used as a high-speed, low-power, stable on-chip clock generation and distribution scheme. In this chapter, we use such a clock to design a high-speed, source-synchronous, ring-based NoC architecture. In Sect. 3.1, we introduce our NoC design, which comprises extremely fast, intersecting source-synchronous data rings. These rings traverse the CMP in both the horizontal and vertical directions, providing complete connectivity among all the PEs in the CMP. In our approach, the interconnection network operates in a separate clock domain that runs significantly faster than the PE clocks, which helps us achieve inter-processor communication with minimal latency. We perform architectural simulations of the ring-based NoC in Sect. 3.2. We propose a deadlock-free routing protocol for the source-synchronous ring-based NoC that uses link ordering and virtual-channel-based buffered flow control. Architectural results obtained on synthetic and real traffic demonstrate that the source-synchronous ring-based NoC has significantly lower latency and a higher maximum sustained injection rate than a state-of-the-art mesh-based NoC. Next, in Sect. 3.3, we propose a modified source-synchronous design in which the PEs extract a low-jitter clock directly from the high-speed ring clock by division, and hence are synchronous with the NoC. This is feasible because of the extremely good jitter characteristics of the SWO-based clock generation and distribution scheme of Sect. 2.2. Using this modified design, we propose a class of source-synchronous NoCs organized in an H-tree topology, which consume less logic and wiring area than a state-of-the-art mesh. Architectural simulations on synthetic and real traffic show that our H-tree-based NoC designs provide significantly lower latency and sustain a higher injection rate than a state-of-the-art mesh.
Using the modified source-synchronous design proposed in Sect. 3.3, we also evaluate two more floorplan-friendly NoC topologies in Sect. 3.4. These two topologies consume significantly less logic and wiring area than a state-of-the-art mesh. Architectural simulations on synthetic and real traffic show that they provide significantly lower latency while achieving the same or a better maximum sustained injection rate than a state-of-the-art mesh.
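
The abstract does not detail how link ordering and virtual channels remove deadlock, so the following is only a minimal sketch of the general idea on a single unidirectional ring, in the spirit of the classic dateline scheme; it is not the chapter's actual protocol. The ring size, the two-virtual-channel scheme, and the names `ring_route`, `dependency_edges`, and `has_cycle` are all illustrative assumptions. The sketch builds the channel dependency graph and checks whether it contains a cycle, since an acyclic dependency graph is the usual sufficient condition for deadlock freedom.

```python
from itertools import product


def ring_route(src, dst, n, use_dateline):
    """Hops (link, vc) from src to dst on a unidirectional ring of n nodes.

    Link i connects node i to node (i + 1) % n. With use_dateline=True
    (an assumed scheme), a packet starts on VC 0 and moves to VC 1 once
    it has wrapped from node n-1 to node 0.
    """
    hops = []
    node, vc = src, 0
    while node != dst:
        hops.append((node, vc))
        node = (node + 1) % n
        if node == 0 and use_dateline:
            vc = 1  # crossed the dateline: later links use the second VC
    return hops


def dependency_edges(n, use_dateline):
    """Channel dependencies: channel a -> channel b if some route uses b right after a."""
    edges = set()
    for src, dst in product(range(n), repeat=2):
        if src != dst:
            hops = ring_route(src, dst, n, use_dateline)
            edges.update(zip(hops, hops[1:]))
    return edges


def has_cycle(edges):
    """Detect a cycle with Kahn's topological sort."""
    nodes = {v for edge in edges for v in edge}
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed != len(nodes)  # leftover nodes lie on a cycle


if __name__ == "__main__":
    print("single VC cyclic:", has_cycle(dependency_edges(4, False)))
    print("dateline VCs cyclic:", has_cycle(dependency_edges(4, True)))
```

With a single virtual channel the link dependencies wrap around the ring and form a cycle; with the assumed dateline scheme every route visits channels in strictly increasing order (VC 0 links first, then VC 1 links), so no cycle can form.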

Author information

Corresponding author

Correspondence to Ayan Mandal.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Mandal, A., Khatri, S., Mahapatra, R. (2014). Fast Network-on-Chip Design. In: Source-Synchronous Networks-On-Chip. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9405-8_3

  • DOI: https://doi.org/10.1007/978-1-4614-9405-8_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-9404-1

  • Online ISBN: 978-1-4614-9405-8

  • eBook Packages: Engineering, Engineering (R0)
