Design and Evaluation of a Processing-in-Memory Architecture for the Smart Memory Cube

  • Conference paper
Architecture of Computing Systems – ARCS 2016 (ARCS 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9637)

Abstract

3D integration of solid-state memories and logic, as demonstrated by the Hybrid Memory Cube (HMC), offers major opportunities for revisiting near-memory computation and gives new hope of mitigating the power and performance losses caused by the “memory wall”. Several publications in the past few years demonstrate this renewed interest. In this paper we present the first exploration steps towards the design of the Smart Memory Cube (SMC), a new Processor-in-Memory (PIM) architecture that enhances the capabilities of the logic-base (LoB) die in the HMC. An accurate simulation environment called SMCSim has been developed, along with a full-featured software stack. The key contribution of this work is a full-system analysis of near-memory computation, from high-level software down to low-level firmware and hardware layers, considering offloading and dynamic overheads caused by the operating system (OS), cache coherence, and memory management. A zero-copy pointer-passing mechanism has been devised to allow low-overhead data sharing between the host and the PIM. Benchmarking results demonstrate up to 2× performance improvement in comparison with the host System-on-Chip (SoC), and around 1.5× against a similar host-side accelerator. Moreover, by scaling down the voltage and frequency of the PIM’s processor, it is possible to reduce energy by around 70% and 55% in comparison with the host and the accelerator, respectively.
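
As a rough illustration of why voltage scaling can reduce energy, the sketch below uses the textbook first-order CMOS switching-energy relation, E = a · C · V² · N, where N is the number of clock cycles. This is not the paper's energy model: the function names and all numeric values are hypothetical, and leakage, memory energy, and the longer runtime at lower frequency are ignored.

```python
def dynamic_energy(c_eff_farads, vdd_volts, cycles, activity=1.0):
    """First-order switching energy in joules: activity * C_eff * Vdd^2 * cycles.

    All parameters are illustrative assumptions, not values from the paper.
    """
    return activity * c_eff_farads * vdd_volts ** 2 * cycles


def relative_saving(v_nominal, v_scaled):
    """Fractional dynamic-energy saving when the same cycle count runs at a lower Vdd."""
    return 1.0 - (v_scaled / v_nominal) ** 2


if __name__ == "__main__":
    # Hypothetical operating points: 1.0 V nominal versus 0.5 V scaled.
    e_nominal = dynamic_energy(1e-9, 1.0, 1000)
    e_scaled = dynamic_energy(1e-9, 0.5, 1000)
    print(f"nominal: {e_nominal:.3e} J, scaled: {e_scaled:.3e} J")
    print(f"saving: {relative_saving(1.0, 0.5):.0%}")
```

In this simplified model the energy for a fixed amount of work depends on the square of the supply voltage and not directly on the clock frequency; frequency is lowered mainly because a lower voltage cannot sustain the original clock rate.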

Acknowledgment

This work was supported, in part, by EU FP7 ERC Project MULTITHERMAN (GA no. 291125). We would also like to thank Samsung Electronics for their support and funding.

Author information

Corresponding author

Correspondence to Erfan Azarkhish.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Azarkhish, E., Rossi, D., Loi, I., Benini, L. (2016). Design and Evaluation of a Processing-in-Memory Architecture for the Smart Memory Cube. In: Hannig, F., Cardoso, J.M.P., Pionteck, T., Fey, D., Schröder-Preikschat, W., Teich, J. (eds) Architecture of Computing Systems – ARCS 2016. ARCS 2016. Lecture Notes in Computer Science, vol 9637. Springer, Cham. https://doi.org/10.1007/978-3-319-30695-7_2

  • DOI: https://doi.org/10.1007/978-3-319-30695-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30694-0

  • Online ISBN: 978-3-319-30695-7

  • eBook Packages: Computer Science (R0)
