
A programmable shared-memory system for an array of processing-in-memory devices

Published in: Cluster Computing

Abstract

Processing in memory (PIM), the concept of integrating processing directly with memory, has been attracting much attention, since PIM can help overcome the throughput limitation caused by data movement between the CPU and memory. The challenge, however, is that it requires programmers to have a deep understanding of the PIM architecture to maximize benefits such as data locality and parallel thread execution across multiple PIM devices. In this study, we present AnalyzeThat, a programmable shared-memory system for parallel data processing with PIM devices. Central to AnalyzeThat is a rich PIM-aware data structure (PADS), an encapsulation that integrally ties together the data, the analysis tasks, and the runtime needed to interface with the PIM device array. The PADS abstraction provides (i) a sophisticated key-value data container that allows programmers to easily store data on multiple PIMs, (ii) a suite of parallel operations with which users can easily implement data analysis applications, and (iii) a runtime, hidden from programmers, which provides the mechanisms needed to overlay both the data and the tasks on the PIM device array intelligently, based on PIM-specific information collected from the hardware. To evaluate the system, we have developed a PIM emulation framework. Our experimental evaluation with representative data analytics applications suggests that the proposed system can significantly reduce the PIM programming effort without losing the technology's benefits.
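To make the PADS abstraction concrete, the following is a minimal, hypothetical sketch of a key-value container sharded across emulated PIM devices, with a parallel map operation that runs near the data. All names here (PIMDevice, PADS, parallel_map, map_local) are illustrative assumptions, not the actual AnalyzeThat API; the real runtime's placement and scheduling are far more sophisticated.

```python
# Hypothetical sketch of a PADS-style key-value container spread across
# emulated PIM devices. Not the actual AnalyzeThat interface.

class PIMDevice:
    """Emulates one PIM device: a local memory plus near-data compute."""
    def __init__(self, device_id):
        self.device_id = device_id
        self.store = {}          # local key-value memory on this device

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store[key]

    def map_local(self, fn):
        # "Near-data" compute: apply fn to local data without moving it
        # out of the device first.
        return {k: fn(k, v) for k, v in self.store.items()}


class PADS:
    """Key-value container that shards data across a PIM device array."""
    def __init__(self, num_devices):
        self.devices = [PIMDevice(i) for i in range(num_devices)]

    def _home(self, key):
        # Simple hash placement; a real runtime could instead use
        # hardware-collected load statistics to balance data.
        return self.devices[hash(key) % len(self.devices)]

    def put(self, key, value):
        self._home(key).put(key, value)

    def get(self, key):
        return self._home(key).get(key)

    def parallel_map(self, fn):
        # Each device processes only its own shard; the host merges results.
        merged = {}
        for dev in self.devices:
            merged.update(dev.map_local(fn))
        return merged


pads = PADS(num_devices=4)
for word, count in [("pim", 3), ("memory", 5), ("data", 2)]:
    pads.put(word, count)

doubled = pads.parallel_map(lambda k, v: v * 2)
```

The key design point the sketch mirrors is that `parallel_map` ships the function to each device's shard rather than pulling all data to the host, which is the data-movement saving the abstract describes.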




Notes

  1. The CV is defined as the ratio of the standard deviation \(s\) to the mean \(m\) of written data sizes across PIM devices: \( \mathrm{CV} = s / m \).
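The CV in Note 1 is straightforward to compute; the snippet below illustrates it with hypothetical per-device write sizes (the numbers are made up, and the choice of population standard deviation is an assumption, since the note does not specify sample vs. population).

```python
# Coefficient of variation (CV) of written data sizes across PIM devices,
# per Note 1: CV = s / m. Data values are illustrative only.
import statistics

written_bytes = [1024, 980, 1100, 950]   # hypothetical per-device write sizes

m = statistics.mean(written_bytes)       # mean m
s = statistics.pstdev(written_bytes)     # population standard deviation s
cv = s / m                               # CV close to 0 => balanced writes
```

A CV near zero indicates the runtime has spread written data evenly across the PIM array; a large CV signals skew.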



Acknowledgements

This research was supported in part by the U.S. DOE’s Office of Advanced Scientific Computing Research (ASCR) under the Scientific data management program, and the National Research Foundation of Korea (NRF) Grant funded by the Korea Government (MSIP) (No. 2015R1C1A1A0152105). The work was also supported by, and used the resources of, the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT Battelle, LLC for the U.S. DOE, under the contract No. DE-AC05-00OR22725.

Author information


Corresponding author

Correspondence to Youngjae Kim.

Additional information

The preliminary version of the paper was published in the Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (2017).

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).


About this article


Cite this article

Lee, S., Sim, H., Kim, Y. et al. A programmable shared-memory system for an array of processing-in-memory devices. Cluster Comput 22, 385–398 (2019). https://doi.org/10.1007/s10586-018-2844-1

