Influence of Stacked 3D Memory/Cache Architectures on GPUs

3D Integration for NoC-based SoC Architectures

Part of the book series: Integrated Circuits and Systems ((ICIR))

Abstract

This chapter investigates the architectural design of a 3D die-stacked Graphics Processing Unit (GPU). It discusses the design space of such a system and presents empirical results that quantify its expected performance gain. The chapter also discusses the cost, power, and thermal aspects of the proposed designs.

Acknowledgment

The work presented in this chapter was supported in part by NSF grants 0903432 and 0702617.

Author information

Corresponding author

Correspondence to Ahmed Al Maashri.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Al Maashri, A., Sun, G., Dong, X., Xie, Y., Vijaykrishnan, N. (2011). Influence of Stacked 3D Memory/Cache Architectures on GPUs. In: Sheibanyrad, A., Pétrot, F., Jantsch, A. (eds) 3D Integration for NoC-based SoC Architectures. Integrated Circuits and Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7618-5_11

  • DOI: https://doi.org/10.1007/978-1-4419-7618-5_11

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-7617-8

  • Online ISBN: 978-1-4419-7618-5

  • eBook Packages: Engineering, Engineering (R0)
