
A programmable shared-memory system for an array of processing-in-memory devices

Published in: Cluster Computing

Abstract

Processing in memory (PIM), the concept of integrating processing directly with memory, has been attracting much attention, since PIM can help overcome the throughput limitation caused by data movement between the CPU and memory. The challenge, however, is that it requires programmers to have a deep understanding of the PIM architecture to maximize benefits such as data locality and parallel thread execution across multiple PIM devices. In this study, we present AnalyzeThat, a programmable shared-memory system for parallel data processing with PIM devices. Central to AnalyzeThat is a rich PIM-aware data structure (PADS), an encapsulation that integrally ties together the data, the analysis tasks, and the runtime needed to interface with the PIM device array. The PADS abstraction provides (i) a sophisticated key-value data container that allows programmers to easily store data on multiple PIMs, (ii) a suite of parallel operations with which users can easily implement data analysis applications, and (iii) a runtime, hidden from programmers, which provides the mechanisms needed to overlay both the data and the tasks on the PIM device array intelligently, based on PIM-specific information collected from the hardware. To evaluate the system, we have developed a PIM emulation framework. Our experimental evaluation with representative data analytics applications suggests that the proposed system can significantly reduce the PIM programming effort without losing the technology's benefits.
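To make the PADS abstraction concrete, the following is a minimal, hypothetical sketch of a key-value container sharded across emulated PIM devices, with a parallel map operation that runs near the data. All names here (PIMDevice, PADS, parallel_map, map_local) are illustrative assumptions, not the actual AnalyzeThat API; the real runtime's placement and scheduling are far more sophisticated.

```python
# Hypothetical sketch of a PADS-style key-value container spread across
# emulated PIM devices. Not the actual AnalyzeThat interface.

class PIMDevice:
    """Emulates one PIM device: a local memory plus near-data compute."""
    def __init__(self, device_id):
        self.device_id = device_id
        self.store = {}          # local key-value memory on this device

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store[key]

    def map_local(self, fn):
        # "Near-data" compute: apply fn to local data without moving it
        # out of the device first.
        return {k: fn(k, v) for k, v in self.store.items()}


class PADS:
    """Key-value container that shards data across a PIM device array."""
    def __init__(self, num_devices):
        self.devices = [PIMDevice(i) for i in range(num_devices)]

    def _home(self, key):
        # Simple hash placement; a real runtime could instead use
        # hardware-collected load statistics to balance data.
        return self.devices[hash(key) % len(self.devices)]

    def put(self, key, value):
        self._home(key).put(key, value)

    def get(self, key):
        return self._home(key).get(key)

    def parallel_map(self, fn):
        # Each device processes only its own shard; the host merges results.
        merged = {}
        for dev in self.devices:
            merged.update(dev.map_local(fn))
        return merged


pads = PADS(num_devices=4)
for word, count in [("pim", 3), ("memory", 5), ("data", 2)]:
    pads.put(word, count)

doubled = pads.parallel_map(lambda k, v: v * 2)
```

The key design point the sketch mirrors is that `parallel_map` ships the function to each device's shard rather than pulling all data to the host, which is the data-movement saving the abstract describes.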




Notes

  1. The CV is defined as the ratio of the standard deviation \(s\) to the mean \(m\) of written data sizes across PIM devices: \( \mathrm{CV} = s / m \).
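The CV in Note 1 is straightforward to compute; the snippet below illustrates it with hypothetical per-device write sizes (the numbers are made up, and the choice of population standard deviation is an assumption, since the note does not specify sample vs. population).

```python
# Coefficient of variation (CV) of written data sizes across PIM devices,
# per Note 1: CV = s / m. Data values are illustrative only.
import statistics

written_bytes = [1024, 980, 1100, 950]   # hypothetical per-device write sizes

m = statistics.mean(written_bytes)       # mean m
s = statistics.pstdev(written_bytes)     # population standard deviation s
cv = s / m                               # CV close to 0 => balanced writes
```

A CV near zero indicates the runtime has spread written data evenly across the PIM array; a large CV signals skew.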



Acknowledgements

This research was supported in part by the U.S. DOE’s Office of Advanced Scientific Computing Research (ASCR) under the Scientific data management program, and the National Research Foundation of Korea (NRF) Grant funded by the Korea Government (MSIP) (No. 2015R1C1A1A0152105). The work was also supported by, and used the resources of, the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT Battelle, LLC for the U.S. DOE, under the contract No. DE-AC05-00OR22725.

Author information


Corresponding author

Correspondence to Youngjae Kim.

Additional information

The preliminary version of the paper was published in the Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (2017).

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).


About this article


Cite this article

Lee, S., Sim, H., Kim, Y. et al. A programmable shared-memory system for an array of processing-in-memory devices. Cluster Comput 22, 385–398 (2019). https://doi.org/10.1007/s10586-018-2844-1

