Memory Architecture and Management in an NoC Platform

  • Axel Jantsch
  • Xiaowen Chen
  • Abdul Naeem
  • Yuang Zhang
  • Sando Penolazzi
  • Zhonghai Lu


The memory organization and the management of the memory space are critical parts of every NoC-based platform design. We propose a Data Management Engine (DME), a block of programmable hardware that is part of every processing element. It off-loads the processing element (CPU, DSP, etc.) by managing the memory space, memory accesses, and communication over the on-chip network. The DME's main functions are virtual address translation, private and shared memory management, cache coherence, support for memory consistency models, and synchronization and protection mechanisms for shared-memory communication. The DME is fully programmable and configurable, which allows customized support for high-level data management functions such as dynamic memory allocation and abstract data types. This chapter describes the main concepts, design, and functionality of the DME and presents case studies illustrating its usage and performance.
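To make the first two functions listed above more concrete, the following is a minimal sketch of how a per-node engine might translate a virtual address and distinguish private from shared memory. It is an illustration only and does not reproduce the chapter's design: the names `Region` and `ToyDME`, the region-table layout, and the node numbering are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical mapping entry; the field layout is an assumption."""
    base: int       # first virtual address covered by the region
    size: int       # region length in bytes
    node: int       # id of the node whose local memory backs the region
    phys_base: int  # physical base address inside that node's memory
    shared: bool    # True if other nodes are allowed to access the region


class ToyDME:
    """Toy per-node address translator, not the actual DME."""

    def __init__(self, node_id, regions):
        self.node_id = node_id
        self.regions = list(regions)

    def translate(self, vaddr):
        """Return (target_node, physical_address, is_remote) for vaddr."""
        for r in self.regions:
            if r.base <= vaddr < r.base + r.size:
                is_remote = r.node != self.node_id
                # Protection check: private memory of another node is off limits.
                if is_remote and not r.shared:
                    raise PermissionError(f"{vaddr:#x} is private to node {r.node}")
                return r.node, r.phys_base + (vaddr - r.base), is_remote
        raise LookupError(f"unmapped virtual address {vaddr:#x}")


# Node 0 sees its own private region and a shared region hosted on node 1.
dme = ToyDME(0, [
    Region(base=0x0000, size=0x1000, node=0, phys_base=0x8000, shared=False),
    Region(base=0x1000, size=0x1000, node=1, phys_base=0x4000, shared=True),
])
local_access = dme.translate(0x0010)   # served from local memory
remote_access = dme.translate(0x1004)  # would require a request over the network
```

In this sketch a remote result only signals that the access would have to travel over the on-chip network; how such requests are issued, kept coherent, and ordered is outside what the abstract specifies.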


  • Network on Chip
  • SoC Architecture
  • Memory Organization



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Axel Jantsch (1)
  • Xiaowen Chen (1)
  • Abdul Naeem (1)
  • Yuang Zhang (1)
  • Sando Penolazzi (1)
  • Zhonghai Lu (1)

  1. Royal Institute of Technology, Stockholm, Sweden
