The decades-old synchronous memory bus interface has restricted many innovations in the memory system, which faces various challenges (or walls) in the era of multi-core processors and big data. In this paper, we argue that a message-based interface should replace the traditional bus-based interface in the memory system. We propose a novel message interface based memory system called MIMS. The key innovation of MIMS is that processors communicate with the memory system through a universal and flexible message packet interface. Each message packet can encapsulate multiple memory requests (or commands) together with additional semantic information. The memory system becomes more intelligent and active by being equipped with a local buffer scheduler, which is responsible for processing packets, scheduling memory requests, preparing responses, and executing specific commands with the help of the semantic information. Under the MIMS framework, many previous innovations in memory architecture, as well as new optimization opportunities such as address compression and combining continuous requests, can be naturally incorporated. Experimental results on a 16-core cycle-detailed simulation system show that, with accurate-granularity messages, MIMS can improve system performance by 53.21% and reduce the energy-delay product (EDP) by 55.90%. Furthermore, it can improve effective bandwidth utilization by 62.42% and reduce memory access latency by 51% on average.
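As a rough illustration of the packet idea described above, the following Python sketch packs several memory requests into one message, merges requests that touch adjacent byte ranges, and stores each address as an offset from a shared base. It is only a sketch under stated assumptions: the names `Request`, `Packet`, `combine`, `pack`, and `unpack`, the field layout, and the free-form `hint` string are hypothetical and are not taken from the MIMS design itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Request:
    """One memory request (hypothetical layout, not the MIMS format)."""
    addr: int               # byte address
    size: int               # length in bytes
    is_write: bool = False


@dataclass
class Packet:
    """One message carrying several requests plus semantic information."""
    base: int                                   # shared base address
    entries: List[Tuple[int, int, bool]] = field(default_factory=list)
    hint: str = ""                              # assumed free-form semantic hint


def combine(requests: List[Request]) -> List[Request]:
    """Merge same-type requests whose byte ranges are adjacent or overlap."""
    merged: List[Request] = []
    for r in sorted(requests, key=lambda r: (r.is_write, r.addr)):
        last = merged[-1] if merged else None
        if last and last.is_write == r.is_write and r.addr <= last.addr + last.size:
            end = max(last.addr + last.size, r.addr + r.size)
            last.size = end - last.addr
        else:
            merged.append(Request(r.addr, r.size, r.is_write))
    return merged


def pack(requests: List[Request], hint: str = "") -> Packet:
    """Build a packet: combine requests, then store offsets from a base."""
    if not requests:
        raise ValueError("a packet needs at least one request")
    merged = combine(requests)
    base = min(r.addr for r in merged)
    entries = [(r.addr - base, r.size, r.is_write) for r in merged]
    return Packet(base, entries, hint)


def unpack(packet: Packet) -> List[Request]:
    """Memory-side view: rebuild full addresses from base plus offset."""
    return [Request(packet.base + off, size, w) for off, size, w in packet.entries]


if __name__ == "__main__":
    reqs = [
        Request(0x1000, 64),
        Request(0x1040, 64),        # adjacent to the first read, so merged
        Request(0x2000, 8),         # small, fine-grained read
        Request(0x1080, 64, True),  # write, kept separate from the reads
    ]
    pkt = pack(reqs, hint="stream")
    print(pkt)
    print(unpack(pkt))
```

Storing small offsets instead of full addresses is one simple way to picture address compression, and the variable `size` field stands in for variable-granularity access; a real packet format would fix bit widths for each field, which this sketch does not attempt.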
This work is partially supported by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010401, the National Basic Research 973 Program of China under Grant No. 2011CB302502, the National Natural Science Foundation of China under Grant Nos. 60925009, 61221062, 61331008, and the Huawei Research Program under Grant No. YBCB2011030.
Cite this article
Chen, L., Chen, M., Ruan, Y. et al. MIMS: Towards a Message Interface Based Memory System. J. Comput. Sci. Technol. 29, 255–272 (2014). https://doi.org/10.1007/s11390-014-1428-7
Keywords: message interface, memory system, semantic information