Fast Network-on-Chip Design

  • Ayan Mandal
  • Sunil P. Khatri
  • Rabi N. Mahapatra


In the previous chapter, we showed how resonant clocking can be used as a high-speed, low-power, stable, on-chip clock generation and distribution scheme. In this chapter, we use such a clock to design a high-speed, source-synchronous, ring-based NoC architecture. In Sect. 3.1, we introduce our NoC design, which comprises extremely fast, intersecting source-synchronous data rings. These rings traverse the chip multiprocessor (CMP) in both the horizontal and vertical directions, providing complete connectivity among all the processing elements (PEs) in the CMP. In our approach, the interconnection network operates in a separate clock domain that runs significantly faster than the PE clocks, which helps us achieve inter-processor communication with minimal latency. In Sect. 3.2, we perform architectural simulations of the ring-based NoC. We propose a deadlock-free routing protocol for the source-synchronous ring-based NoC that uses link ordering and virtual-channel-based buffered flow control. Architectural results obtained on synthetic and real traffic demonstrate that the source-synchronous ring-based NoC has significantly lower latency and a higher maximum sustained injection rate than a state-of-the-art mesh-based NoC. Next, in Sect. 3.3, we propose a modified source-synchronous design in which the PEs extract a low-jitter clock directly from the high-speed ring clock by division, and hence are synchronous with the NoC. This is feasible because of the extremely good jitter characteristics of the SWO-based clock generation and distribution scheme of Sect. 2.2. Using this modified design, we propose a class of source-synchronous NoCs organized in an H-tree topology, which consume less logic and wiring area than a state-of-the-art mesh. Architectural simulations on synthetic and real traffic show that our H-tree-based NoC designs provide significantly lower latency and sustain a higher injection rate than a state-of-the-art mesh.
Using the modified source-synchronous design proposed in Sect. 3.3, we also evaluate two more floorplan-friendly NoC topologies in Sect. 3.4. These two topologies consume significantly less logic and wiring area than a state-of-the-art mesh. Architectural simulations on synthetic and real traffic show that they provide significantly lower latency while achieving the same or a better maximum sustained injection rate than a state-of-the-art mesh.


Injection Rate, Virtual Channel, Junction Station, Link Utilization, Architectural Simulation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Ayan Mandal (1)
  • Sunil P. Khatri (2)
  • Rabi N. Mahapatra (3)
  1. Computer Science and Engineering, Texas A&M University, College Station, USA
  2. Electrical and Computer Engineering, Texas A&M University, College Station, USA
  3. Computer Science and Engineering, Texas A&M University, College Station, USA
