Design and Evaluation of a Processing-in-Memory Architecture for the Smart Memory Cube

Azarkhish, Erfan; Rossi, Davide; Loi, Igor; Benini, Luca

doi:10.1007/978-3-319-30695-7_2


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9637)

Included in the following conference series:

International Conference on Architecture of Computing Systems


Abstract

3D integration of solid-state memories and logic, as demonstrated by the Hybrid Memory Cube (HMC), offers major opportunities for revisiting near-memory computation and gives new hope for mitigating the power and performance losses caused by the “memory wall”. Several publications in the past few years demonstrate this renewed interest. In this paper we present the first exploration steps towards the design of the Smart Memory Cube (SMC), a new Processor-in-Memory (PIM) architecture that enhances the capabilities of the logic-base (LoB) die in HMC. An accurate simulation environment called SMCSim has been developed, along with a full-featured software stack. The key contribution of this work is a full-system analysis of near-memory computation, spanning high-level software down to low-level firmware and hardware layers, that accounts for offloading and for the dynamic overheads caused by the operating system (OS), cache coherence, and memory management. A zero-copy pointer-passing mechanism has been devised to allow low-overhead data sharing between the host and the PIM. Benchmarking results demonstrate up to 2X performance improvement compared with the host System-on-Chip (SoC), and around 1.5X against a similar host-side accelerator. Moreover, by scaling down the voltage and frequency of the PIM’s processor, energy can be reduced by around 70% and 55% compared with the host and the accelerator, respectively.
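The 70% and 55% energy figures are the authors' measured results. As a rough aid to reading them, the sketch below uses the textbook first-order CMOS model (an assumption here, not the paper's power model) to show why lowering voltage together with frequency can save energy: dynamic energy per cycle scales with C·V², so for a fixed cycle count it does not depend on the clock frequency, while leakage energy grows with the longer run time at a lower frequency. The function names and the operating points are hypothetical illustrations, not values from the paper.

```python
def dynamic_energy_joules(cycles: int, switched_capacitance_f: float,
                          vdd_v: float, activity: float = 1.0) -> float:
    """First-order dynamic energy of a task: E_dyn = N * alpha * C * Vdd^2.

    The clock frequency does not appear: lowering f alone stretches the run
    time but does not change the dynamic energy spent per cycle.
    """
    return cycles * activity * switched_capacitance_f * vdd_v ** 2


def static_energy_joules(leakage_power_w: float, cycles: int, freq_hz: float) -> float:
    """Leakage energy over the run: E_static = P_leak * (N / f).

    A lower frequency means a longer run time, so leakage energy grows.
    """
    return leakage_power_w * cycles / freq_hz


def relative_dynamic_energy(vdd_scaled_v: float, vdd_nominal_v: float) -> float:
    """Ratio of dynamic energy after voltage scaling, assuming the same
    cycle count, activity factor and switched capacitance (they cancel)."""
    if vdd_scaled_v <= 0 or vdd_nominal_v <= 0:
        raise ValueError("supply voltages must be positive")
    return (vdd_scaled_v / vdd_nominal_v) ** 2


if __name__ == "__main__":
    # Hypothetical operating points, NOT taken from the paper.
    nominal_v, scaled_v = 1.0, 0.7
    ratio = relative_dynamic_energy(scaled_v, nominal_v)
    print(f"dynamic energy ratio: {ratio:.2f}")
```

With the hypothetical 1.0 V to 0.7 V step, the script prints a dynamic energy ratio of 0.49. Whether total energy actually drops also depends on how much leakage energy the longer run time adds, which this sketch does not model for any real chip.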



Acknowledgment

This work was supported in part by the EU FP7 ERC Project MULTITHERMAN (GA no. 291125). We would also like to thank Samsung Electronics for their support and funding.

Author information

Authors and Affiliations

University of Bologna, Bologna, Italy
Erfan Azarkhish, Davide Rossi, Igor Loi & Luca Benini
Swiss Federal Institute of Technology in Zurich, Zurich, Switzerland
Luca Benini


Corresponding author

Correspondence to Erfan Azarkhish.

Editor information

Editors and Affiliations

Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Frank Hannig
Faculty of Engineering (FEUP), University of Porto, Porto, Portugal
João M. P. Cardoso
Universität zu Lübeck, Lübeck, Germany
Thilo Pionteck
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Dietmar Fey
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Wolfgang Schröder-Preikschat
Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
Jürgen Teich


About this paper

Cite this paper

Azarkhish, E., Rossi, D., Loi, I., Benini, L. (2016). Design and Evaluation of a Processing-in-Memory Architecture for the Smart Memory Cube. In: Hannig, F., Cardoso, J.M.P., Pionteck, T., Fey, D., Schröder-Preikschat, W., Teich, J. (eds) Architecture of Computing Systems – ARCS 2016. ARCS 2016. Lecture Notes in Computer Science, vol 9637. Springer, Cham. https://doi.org/10.1007/978-3-319-30695-7_2


DOI: https://doi.org/10.1007/978-3-319-30695-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30694-0
Online ISBN: 978-3-319-30695-7
eBook Packages: Computer Science, Computer Science (R0)
