Abstract
In the previous chapter, we showed how resonant clocking can be used as a high-speed, low-power, stable on-chip clock generation and distribution scheme. In this chapter, we use such a clock to design a high-speed source-synchronous ring-based NoC architecture. In Sect. 3.1, we introduce our NoC design, which comprises extremely fast, intersecting source-synchronous data rings. These rings traverse the CMP in both the horizontal and vertical directions, providing complete connectivity among all the PEs in the CMP. In our approach, the interconnection network operates in a separate clock domain that runs significantly faster than the PE clocks, which allows inter-processor communication with minimal latency. We perform architectural simulations of the ring-based NoC in Sect. 3.2, and we propose a deadlock-free routing protocol for the source-synchronous ring-based NoC that uses link ordering and virtual-channel-based buffered flow control. Architectural results on synthetic and real traffic demonstrate that the source-synchronous ring-based NoC has significantly lower latency and a higher maximum sustained injection rate than a state-of-the-art mesh-based NoC. Next, in Sect. 3.3, we propose a modified source-synchronous design in which the PEs extract a low-jitter clock directly from the high-speed ring clock by division, and are hence synchronous with the NoC. This is feasible because of the extremely good jitter characteristics of the SWO-based clock generation and distribution scheme of Sect. 2.2. Using this modified design, we propose a class of source-synchronous NoCs organized in an H-tree topology, which consume less logic and wiring area than a state-of-the-art mesh. Architectural simulations on synthetic and real traffic show that our H-tree-based NoC designs provide significantly lower latency and sustain a higher injection rate than a state-of-the-art mesh.
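Why virtual channels make ring routing deadlock-free can be illustrated with a channel dependency graph. The sketch below is a minimal model that assumes a single unidirectional N-node ring with minimal routing and two virtual channels separated by a "dateline" link. This is the classic Dally and Seitz construction. It is not necessarily the exact protocol proposed in Sect. 3.2, which combines link ordering with virtual-channel-based buffered flow control. Turns between the intersecting horizontal and vertical rings are not modeled. The function names and the choice of link n-1 as the dateline are hypothetical. Without the dateline, the dependencies between ring links close into a cycle, so deadlock is possible. Moving packets onto VC 1 once they reach the dateline link imposes a total order on the channels and removes the cycle.

```python
from itertools import product


def ring_route(src, dst, n):
    """Links used by a minimal route on a unidirectional n-node ring.

    Link i connects node i to node (i + 1) % n.
    """
    links = []
    node = src
    while node != dst:
        links.append(node)
        node = (node + 1) % n
    return links


def channel_dependencies(n, use_dateline):
    """Edges (c1, c2) of the channel dependency graph; a channel is (link, vc).

    Without a dateline every packet stays on VC 0. With a (hypothetical)
    dateline on link n - 1, a packet switches to VC 1 when it reaches that
    link and stays on VC 1 for the rest of its route.
    """
    dateline = n - 1
    edges = set()
    for src, dst in product(range(n), repeat=2):
        if src == dst:
            continue
        vc = 0
        prev = None
        for link in ring_route(src, dst, n):
            if use_dateline and link == dateline:
                vc = 1
            chan = (link, vc)
            if prev is not None:
                edges.add((prev, chan))  # holding prev while requesting chan
            prev = chan
    return edges


def has_cycle(edges):
    """Return True if the directed graph given by `edges` contains a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on DFS stack, finished
    colour = {v: WHITE for v in graph}
    for root in graph:
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if colour[child] == GREY:
                    return True  # back edge: cyclic dependency
                if colour[child] == WHITE:
                    colour[child] = GREY
                    stack.append((child, iter(graph[child])))
                    break
            else:
                colour[node] = BLACK
                stack.pop()
    return False


if __name__ == "__main__":
    print(has_cycle(channel_dependencies(8, use_dateline=False)))  # True
    print(has_cycle(channel_dependencies(8, use_dateline=True)))   # False
```

With the dateline, the channels can be listed in the order VC 0 on links 0 to n-2, then VC 1 on link n-1, then VC 1 on links 0 to n-3. Every dependency points forward in this list, which is the same ordered-links argument that underlies link-ordering schemes.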
Using the modified source-synchronous design proposed in Sect. 3.3, we also evaluate two more floorplan-friendly NoC topologies in Sect. 3.4. These topologies consume significantly less logic and wiring area than a state-of-the-art mesh. Architectural simulations on synthetic and real traffic show that they provide significantly lower latency while achieving the same or a better maximum sustained injection rate than a state-of-the-art mesh.
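Both the H-tree designs and the floorplan-friendly topologies build on the Sect. 3.3 design, in which each PE derives its clock from the high-speed ring clock by division. A minimal sketch can show why this keeps the PE and NoC domains synchronous. It assumes a simple integer divide-by-N counter, because the divider circuit itself is not described here. The function names and the 8 GHz ring frequency in the usage lines are hypothetical. Every PE rising edge lands on a ring-clock rising edge, so the two clocks keep a fixed phase relationship.

```python
from fractions import Fraction


def pe_clock_edges(ring_cycles, divide_by):
    """Ring-clock cycle indices at which the divided PE clock rises.

    Models an integer divide-by-N counter clocked by the ring clock: the
    counter wraps every `divide_by` ring edges, and the PE clock rises on
    each wrap. Every PE edge therefore coincides with a ring edge.
    """
    if divide_by < 1:
        raise ValueError("divide_by must be a positive integer")
    edges = []
    count = 0
    for cycle in range(ring_cycles):
        if count == 0:
            edges.append(cycle)  # PE rising edge aligned to this ring edge
        count = (count + 1) % divide_by
    return edges


def pe_frequency(ring_freq_hz, divide_by):
    """PE clock frequency obtained by integer division of the ring clock."""
    return Fraction(ring_freq_hz, divide_by)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(pe_clock_edges(12, 4))           # [0, 4, 8]
    print(pe_frequency(8_000_000_000, 4))  # 2000000000
```

An integer divisor is the simplest choice that guarantees edge alignment. A fractional divider would not place every PE edge on a ring edge.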
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Mandal, A., Khatri, S., Mahapatra, R. (2014). Fast Network-on-Chip Design. In: Source-Synchronous Networks-On-Chip. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9405-8_3
DOI: https://doi.org/10.1007/978-1-4614-9405-8_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-9404-1
Online ISBN: 978-1-4614-9405-8
eBook Packages: Engineering, Engineering (R0)