The Macro-DSE for HPC Processing Unit: The Physical Constraints Perspective

  • Conference paper
Green, Pervasive, and Cloud Computing

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9663)

Abstract

Because of the popularity of big data and cloud computing, the evolution of microarchitecture has to concentrate on raw computing ability, throughput, low power, and cost at the same time. Because of the huge non-recurring engineering (NRE) costs, computer architects and processor designers rely on simulation tools and models to optimize the main processing unit. Design space exploration (DSE) methodology is responsible for filtering all the possible choices. However, the thousands of parameters of a current multi-core processor make an exhaustive search too expensive. Future high performance computing (HPC) will no longer insist on peak double-precision floating-point (DFP) performance only, but also on high throughput and light weight. Depending on the level of detail, from the number of cores down to the size of an individual pipeline buffer, we can divide the DSE problem into a macro level and a micro level.

In this paper, we focus on the macro-DSE problem of choosing the right style for the processing core design. First, we extended McPAT, the de facto DSE tool, to support technology nodes from 65 nm to 16 nm and up to 256 cores. Based on the physical design constraints of chip area, power, and balanced design requirements, we examine and explore the design of future high-performance processing units. Although traditional HPC pursued peak performance only, our DSE results show that physical constraints will limit the processing unit of future HPC to a few choices. The experimental results show that with only a 74.8 % increase in chip die area and a 3.8 % increase in power, one many-core design can achieve 4 times the peak performance of another in both INT and FP, along with a 285.6 % increase in performance/power efficiency. The key insight of our experiment is that a unique type of processing core can be the best choice, depending on the specific physical design plan.
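The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' methodology: the function name `relative_gains` and the performance-per-area metric are assumptions introduced here, and the only inputs taken from the abstract are the 4 times peak performance, the 74.8 % area increase, and the 3.8 % power increase.

```python
def relative_gains(perf_ratio, area_increase_pct, power_increase_pct):
    """Return efficiency gains (in percent) of one design over a baseline.

    perf_ratio          -- peak performance relative to the baseline (4.0 = 4x)
    area_increase_pct   -- die area increase over the baseline, in percent
    power_increase_pct  -- power increase over the baseline, in percent
    """
    area_ratio = 1.0 + area_increase_pct / 100.0
    power_ratio = 1.0 + power_increase_pct / 100.0
    return {
        # Performance per watt, relative to the baseline design.
        "perf_per_watt_gain_pct": (perf_ratio / power_ratio - 1.0) * 100.0,
        # Performance per unit area: a derived metric, not reported in the abstract.
        "perf_per_area_gain_pct": (perf_ratio / area_ratio - 1.0) * 100.0,
    }


# Inputs taken from the abstract: 4x peak performance, +74.8 % area, +3.8 % power.
gains = relative_gains(4.0, 74.8, 3.8)
for name, value in gains.items():
    print(f"{name}: {value:.1f}")
```

With these rounded inputs the performance/power gain comes out at roughly 285.4 %, which agrees with the reported 285.6 % to within a small gap that is presumably due to rounding of the published percentages.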

Acknowledgements

We thank the other cpu@nudt team members who provided architecture, microarchitecture, and physical design parameters of various processors. This work is supported in part by NSFC grant No. 61272139 and National Science and Technology Major Project HGJ-2015ZX01028001-001.

Author information

Corresponding author

Correspondence to Yuxing Tang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Tang, Y., Wang, L., Deng, Y., Ni, X., Dou, Q. (2016). The Macro-DSE for HPC Processing Unit: The Physical Constraints Perspective. In: Huang, X., Xiang, Y., Li, KC. (eds) Green, Pervasive, and Cloud Computing. Lecture Notes in Computer Science, vol 9663. Springer, Cham. https://doi.org/10.1007/978-3-319-39077-2_7

  • DOI: https://doi.org/10.1007/978-3-319-39077-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39076-5

  • Online ISBN: 978-3-319-39077-2

  • eBook Packages: Computer Science (R0)
