Journal of Computer Science and Technology, Volume 29, Issue 2, pp. 255–272

MIMS: Towards a Message Interface Based Memory System

  • Li-Cheng Chen
  • Ming-Yu Chen
  • Yuan Ruan
  • Yong-Bing Huang
  • Ze-Han Cui
  • Tian-Yue Lu
  • Yun-Gang Bao
Regular Paper

Abstract

The decades-old synchronous memory bus interface has restricted many innovations in the memory system, which faces various challenges (or walls) in the era of multi-core processors and big data. In this paper, we argue that a message-based interface should replace the traditional bus-based interface in the memory system. We propose a novel message interface based memory system called MIMS. The key innovation of MIMS is that processors communicate with the memory system through a universal and flexible message packet interface. Each message packet may encapsulate multiple memory requests (or commands) together with additional semantic information. The memory system becomes more intelligent and active by being equipped with a local buffer scheduler, which is responsible for processing packets, scheduling memory requests, preparing responses, and executing specific commands with the help of semantic information. Under the MIMS framework, many previous innovations in memory architecture, as well as new optimization opportunities such as address compression and the combination of continuous requests, can be naturally incorporated. Experimental results on a cycle-detailed 16-core simulation system show that, with accurate-granularity messages, MIMS can improve system performance by 53.21% and reduce the energy delay product (EDP) by 55.90%. Furthermore, it can improve effective bandwidth utilization by 62.42% and reduce memory access latency by 51% on average.
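The abstract does not specify the MIMS packet format, so the following is only a minimal Python sketch of the two packet-level optimizations it names: combining continuous requests and compressing addresses. Everything in it is an assumption for illustration: the hypothetical `Request` and `Packet` classes, the `(op, offset, size)` entry layout, and the base-plus-offset address encoding are not the authors' design. It also assumes that requests packed together are independent, so reordering them inside one packet is safe.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Request:
    op: str    # "R" for read, "W" for write (hypothetical encoding)
    addr: int  # starting byte address
    size: int  # length in bytes


@dataclass
class Packet:
    base: int  # one full address shared by every entry in the packet
    # Each entry stores (op, offset from base, size); the short offsets
    # stand in for "address compression" in this sketch.
    entries: List[Tuple[str, int, int]] = field(default_factory=list)


def combine_continuous(reqs: List[Request]) -> List[Request]:
    """Merge same-type requests whose address ranges are back-to-back."""
    merged: List[Request] = []
    for r in sorted(reqs, key=lambda r: (r.op, r.addr)):
        last = merged[-1] if merged else None
        if last and last.op == r.op and last.addr + last.size == r.addr:
            # Extend the previous request instead of emitting a new one.
            merged[-1] = Request(r.op, last.addr, last.size + r.size)
        else:
            merged.append(r)
    return merged


def pack(reqs: List[Request]) -> Packet:
    """Build one packet: combine requests, then store addresses as offsets."""
    merged = combine_continuous(reqs)
    base = min(r.addr for r in merged)
    pkt = Packet(base)
    for r in merged:
        pkt.entries.append((r.op, r.addr - base, r.size))
    return pkt


def unpack(pkt: Packet) -> List[Request]:
    """What a buffer-side scheduler would recover from the packet."""
    return [Request(op, pkt.base + off, size) for op, off, size in pkt.entries]


if __name__ == "__main__":
    reqs = [
        Request("R", 0x1000, 64),
        Request("R", 0x1040, 64),
        Request("R", 0x1080, 64),
        Request("W", 0x2000, 64),
    ]
    pkt = pack(reqs)
    # Three adjacent 64-byte reads collapse into one 192-byte read.
    print(pkt.base, pkt.entries)  # prints 4096 [('R', 0, 192), ('W', 4096, 64)]
```

In this sketch four requests travel as two entries sharing one full base address; a real message interface would also have to bound packet size and preserve read/write ordering where dependencies exist, which the sketch ignores.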

Keywords

message interface, memory system, asynchronous, granularity, semantic information

Supplementary material

11390_2014_1428_MOESM1_ESM.doc (28 kb)

Copyright information

© Springer Science+Business Media New York & Science Press, China 2014

Authors and Affiliations

  • Li-Cheng Chen (1, 2)
  • Ming-Yu Chen (1)
  • Yuan Ruan (1)
  • Yong-Bing Huang (1, 2)
  • Ze-Han Cui (1, 2)
  • Tian-Yue Lu (1, 2)
  • Yun-Gang Bao (1)

  1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
