Modeling the Impact of Reduced Memory Bandwidth on HPC Applications

Tiwari, Ananta; Gamst, Anthony; Laurenzano, Michael A.; Schulz, Martin; Carrington, Laura

doi:10.1007/978-3-319-09873-9_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8632)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

To deliver the energy efficiency and raw compute throughput necessary to realize exascale systems, projected designs call for massive numbers of (simple) cores per processor. An unfortunate consequence of such designs is that the memory bandwidth per core will be significantly reduced, which can substantially degrade the performance of many memory-intensive HPC workloads. System designers and application developers alike would benefit from a systematic framework that identifies the types of computations that are sensitive to reduced memory bandwidth, pinpoints the regions in their code that exhibit that sensitivity, and thereby guides them in developing mitigating solutions. This paper introduces a framework for identifying the properties of computations that are associated with memory bandwidth sensitivity, extracting those same properties from HPC applications, and associating bandwidth sensitivity with specific structures in the application source code. We apply our framework to a number of large-scale HPC applications and observe that the bandwidth sensitivity model has a mean absolute error that averages less than 5%.
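As a rough illustration of the error figure quoted above, the following sketch computes a mean absolute error between predicted and measured bandwidth sensitivity. The abstract does not define either quantity precisely, so this assumes that sensitivity is the relative slowdown of a code region when memory bandwidth is reduced. The function names and all timings are hypothetical and are not taken from the paper.

```python
from statistics import mean


def sensitivity(t_full: float, t_reduced: float) -> float:
    """Relative slowdown of a code region under reduced bandwidth.

    Assumed definition for illustration only; the paper may define
    sensitivity differently.
    """
    if t_full <= 0:
        raise ValueError("t_full must be positive")
    return (t_reduced - t_full) / t_full


def mean_absolute_error(predicted, measured):
    """Average of |predicted - measured| over all code regions."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(p - m) for p, m in zip(predicted, measured))


# Hypothetical (full-bandwidth, reduced-bandwidth) runtimes in seconds
# for three code regions; these are NOT results from the paper.
timings = [(10.0, 12.0), (4.0, 4.0), (8.0, 10.0)]
measured = [sensitivity(full, reduced) for full, reduced in timings]

# Hypothetical model predictions for the same three regions.
predicted = [0.25, 0.0, 0.20]

mae = mean_absolute_error(predicted, measured)
print(f"mean absolute error: {mae:.1%}")
```

Under this assumed definition, a region whose runtime is unchanged by the bandwidth reduction has sensitivity 0, and the error is reported in the same units as the sensitivity itself.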

The rights of this work are transferred to the extent transferable according to Title 17 §105 U.S.C.


Author information

Authors and Affiliations

Performance Modeling and Characterization Lab, San Diego Supercomputer Center, USA
Ananta Tiwari & Laura Carrington
Computational and Applied Statistics Lab, San Diego Supercomputer Center, USA
Anthony Gamst
Department of Computer Science and Engineering, University of Michigan, USA
Michael A. Laurenzano
Lawrence Livermore National Laboratory (LLNL), USA
Martin Schulz

Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, Universidade do Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Fernando Silva, Inês Dutra & Vítor Santos Costa

About this paper

Cite this paper

Tiwari, A., Gamst, A., Laurenzano, M.A., Schulz, M., Carrington, L. (2014). Modeling the Impact of Reduced Memory Bandwidth on HPC Applications. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_6

DOI: https://doi.org/10.1007/978-3-319-09873-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)
