DVFS Space Exploration in Power Constrained Processing-in-Memory Systems

  • Marko Scrbak
  • Joseph L. Greathouse
  • Nuwan Jayasena
  • Krishna Kavi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10172)


In order to deliver high performance under stringent power constraints, future systems may include die-stacked memories with processing-in-memory (PIM) cores. Because of their proximity to memory, PIMs are expected to target applications that require high bandwidth. This implies that PIMs do not need the same computational capabilities as a traditional host processor and can therefore be implemented with slower, low-leakage transistors to increase energy efficiency. Such systems must carefully balance design-time choices, such as the circuits used to build the devices, with run-time choices, such as DVFS states and the preferred hardware platform on which to run an application. This paper explores these parameters in a GPGPU PIM system with a large compute-optimized host and a collection of bandwidth-optimized PIMs. We develop high-level performance and power models and use them to find optimal DVFS and kernel-placement decisions for a series of GPGPU applications, targeting maximum energy efficiency. We find, for instance, that the energy efficiency of PIM systems is greatly affected by DVFS; simply selecting the optimal hardware (host or PIM) results in 7× higher ED² than migrating work in conjunction with DVFS.
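The placement decision the abstract describes can be sketched as a small optimization over ED² (the energy-delay-squared product, where lower is better). The following is a minimal illustrative sketch, not the paper's model: all frequencies, times, and energies below are hypothetical placeholders, and the real work derives such estimates from high-level performance and power models rather than fixed tables.

```python
def ed2(time_s, energy_j):
    """Energy-delay-squared product: energy * time^2 (lower is better)."""
    return energy_j * time_s ** 2

# Hypothetical per-kernel (time in s, energy in J) estimates at each
# DVFS state, keyed by frequency in MHz. Placeholder values only.
host = {1000: (2.0, 8.0), 700: (2.6, 5.5)}   # compute-optimized host GPU
pim  = {600:  (1.5, 4.0), 400: (1.9, 1.5)}   # bandwidth-optimized PIM

# Choice 1: pick only the device, running at its nominal (highest) frequency.
device_only = min(ed2(*host[max(host)]), ed2(*pim[max(pim)]))

# Choice 2: jointly pick the device and its DVFS state.
joint = min(ed2(t, e) for states in (host, pim) for t, e in states.values())
```

With these placeholder numbers, the joint choice finds a low-frequency PIM state whose ED² beats either device at its nominal frequency, mirroring the paper's observation that device selection alone leaves substantial efficiency on the table compared to coordinating placement with DVFS.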


Keywords: Processing-in-Memory · DVFS · GPGPU · High performance computing · Energy efficiency · Computer architecture · 3D-DRAM



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Marko Scrbak (1)
  • Joseph L. Greathouse (2)
  • Nuwan Jayasena (2)
  • Krishna Kavi (1)
  1. University of North Texas, Denton, USA
  2. Advanced Micro Devices, Inc. (AMD), Sunnyvale, USA
