Abstract
3D integration of solid-state memories and logic, as demonstrated by the Hybrid Memory Cube (HMC), offers major opportunities for revisiting near-memory computation and new hope of mitigating the power and performance losses caused by the “memory wall”. Several publications in the past few years demonstrate this renewed interest. In this paper we present the first exploration steps toward the design of the Smart Memory Cube (SMC), a new Processor-in-Memory (PIM) architecture that enhances the capabilities of the logic-base (LoB) die in HMC. An accurate simulation environment called SMCSim has been developed, along with a full-featured software stack. The key contribution of this work is a full-system analysis of near-memory computation, from high-level software down to low-level firmware and hardware layers, that accounts for offloading and dynamic overheads caused by the operating system (OS), cache coherence, and memory management. A zero-copy pointer-passing mechanism has been devised to allow low-overhead data sharing between the host and the PIM. Benchmarking results demonstrate up to 2X performance improvement in comparison with the host System-on-Chip (SoC), and around 1.5X against a similar host-side accelerator. Moreover, by scaling down the voltage and frequency of the PIM’s processor, it is possible to reduce energy by around 70% and 55% in comparison with the host and the accelerator, respectively.
Acknowledgment
This work was supported, in part, by EU FP7 ERC Project MULTITHERMAN (GA no. 291125). We would also like to thank Samsung Electronics for their support and funding.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Azarkhish, E., Rossi, D., Loi, I., Benini, L. (2016). Design and Evaluation of a Processing-in-Memory Architecture for the Smart Memory Cube. In: Hannig, F., Cardoso, J.M.P., Pionteck, T., Fey, D., Schröder-Preikschat, W., Teich, J. (eds) Architecture of Computing Systems – ARCS 2016. ARCS 2016. Lecture Notes in Computer Science, vol 9637. Springer, Cham. https://doi.org/10.1007/978-3-319-30695-7_2
DOI: https://doi.org/10.1007/978-3-319-30695-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30694-0
Online ISBN: 978-3-319-30695-7
eBook Packages: Computer Science (R0)