Soft Computing, Volume 22, Issue 6, pp 2065–2077

Exploiting dynamic transaction queue size in scalable memory systems

  • Mario Donato Marino
  • Tien-Hsiung Weng
  • Kuan-Ching Li
Methodologies and Application

Abstract

A straightforward way to increase parallelism via memory width in scalable memory systems is to employ a larger number of memory controllers (MCs). Nevertheless, several studies have pointed out that, even when bandwidth-bound applications run on systems with a larger number of MCs, the transaction queue entries are under-utilized; that is, the queues behave as shallower transaction queues, which provides an opportunity for power saving. To address this challenge, we propose transaction queues with dynamic size that select the most adequate size based on the number of entries utilized, while delivering adequate levels of bandwidth and minimizing power. Experimental results show that, while saving up to 75% of the number of entries, the dynamic transaction queue mechanism can present savings of up to 75% in bandwidth and a reduction of up to 20% in rank energy-per-bit compared to systems with 1–2 entries.
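The idea of sizing a transaction queue from its observed occupancy can be illustrated with a minimal sketch. This is not the authors' mechanism: the class name `DynamicTransactionQueue`, the selectable sizes, the observation window, and the "peak occupancy plus one entry of headroom" rule are all assumptions made only for illustration.

```python
from collections import deque


class DynamicTransactionQueue:
    """Toy transaction queue whose active capacity follows observed occupancy.

    Hypothetical model for illustration only; sizes and policy are assumptions.
    """

    SIZES = (1, 2, 4, 8, 16)  # assumed selectable capacities (entries)

    def __init__(self, window=4):
        self.entries = deque()
        self.capacity = self.SIZES[-1]  # start with every entry active
        self.window = window            # enqueue attempts per resize decision
        self.peak = 0                   # peak occupancy in the current window
        self.seen = 0                   # enqueue attempts in the current window

    def enqueue(self, transaction):
        """Accept the transaction if an active entry is free, else return False."""
        accepted = len(self.entries) < self.capacity
        if accepted:
            self.entries.append(transaction)
        self.peak = max(self.peak, len(self.entries))
        self.seen += 1
        if self.seen == self.window:
            self._resize()
        return accepted

    def dequeue(self):
        """Remove and return the oldest transaction, or None if the queue is empty."""
        return self.entries.popleft() if self.entries else None

    def _resize(self):
        # Smallest size covering the peak plus one entry of headroom, so a
        # saturated queue can grow; never below the current occupancy.
        needed = max(self.peak + 1, len(self.entries))
        self.capacity = next((s for s in self.SIZES if s >= needed), self.SIZES[-1])
        self.peak = len(self.entries)
        self.seen = 0
```

With a light load, where each transaction is served before the next arrives, this toy policy shrinks the active capacity to 2 entries after one window. A later burst that fills those entries causes rejections within that window and then grows the capacity to the next size. Entries outside the active capacity are the ones a real design could power down.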

Keywords

Memory, Controller, Dynamic, Transaction, Queue, Scalable

Notes

Acknowledgements

We would like to thank Maria Amelia Guitti Marino and anonymous reviewers for their valuable contributions and suggestions.

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  • Mario Donato Marino (1)
  • Tien-Hsiung Weng (3)
  • Kuan-Ching Li (2, 3)

  1. Leeds Beckett University, Leeds, UK
  2. Xiamen University, Xiamen, China
  3. Providence University, Taichung, Taiwan