Performance Analysis of the NWChem TCE for Different Communication Patterns

  • Priyanka Ghosh
  • Jeff R. Hammond
  • Sayan Ghosh
  • Barbara Chapman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8551)

Abstract

One-sided communication is a model that separates communication from synchronization; it has been in practice for over two decades in libraries such as SHMEM and Global Arrays (GA). GA is used in a number of application codes, especially NWChem, and provides a superset of SHMEM functionality that includes remote accumulate, among other features. Remote accumulate is an active-message operation that applies \(y \leftarrow y + a \cdot x\) at the target rather than just \(y \leftarrow x\) (as in Put), which gives the programmer additional choices with respect to algorithm design. In this paper, we discuss and evaluate communication scenarios for dense block-tensor contractions, one of the mainstays of the NWChem computational chemistry package. We show that, apart from the classical approach of dynamically scheduling data blocks for load balancing, reordering one-sided Get and Accumulate calls substantially affects the performance of tensor contractions on leadership-class machines. To understand why this reordering affects performance, we develop a proxy application for the NWChem Tensor Contraction Engine (TCE) module and use it to compare different implementations with a focus on communication.
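To make the Get/compute/Accumulate pattern described above concrete, the minimal sketch below uses MPI-3 RMA (one of the interfaces named in the keywords) rather than the Global Arrays calls used by NWChem itself; it is not the authors' proxy application. The window layout, the tile size BLOCK, and the scalar multiply standing in for the local DGEMM are illustrative assumptions. Each rank fetches a remote tile with MPI_Get, computes on it locally, and applies a one-sided update with MPI_Accumulate and MPI_SUM, i.e. \(y \leftarrow y + x\) at the target with no receive-side code.

```c
/* Illustrative sketch only: MPI-3 RMA Get / local compute / Accumulate,
 * mirroring (not reproducing) the block-tensor communication pattern
 * discussed in the abstract. Tile size and the scalar multiply are
 * stand-ins for real tensor blocks and DGEMM. */
#include <mpi.h>
#include <stdio.h>

#define BLOCK 4                      /* assumed tile size, for illustration */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nproc;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    /* Each rank exposes one tile of a (conceptual) distributed array. */
    double *tile;
    MPI_Win win;
    MPI_Win_allocate(BLOCK * sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &tile, &win);

    MPI_Win_lock_all(0, win);        /* passive-target epoch, GA-style */

    for (int i = 0; i < BLOCK; i++) tile[i] = rank + 1.0;
    MPI_Win_sync(win);               /* make local init visible to RMA */
    MPI_Barrier(MPI_COMM_WORLD);

    int target = (rank + 1) % nproc; /* fetch a neighbour's tile */
    double a[BLOCK], c[BLOCK];

    /* One-sided Get of a remote block. */
    MPI_Get(a, BLOCK, MPI_DOUBLE, target, 0, BLOCK, MPI_DOUBLE, win);
    MPI_Win_flush(target, win);      /* a[] is now safe to read */

    /* Stand-in for the local DGEMM on the fetched tiles: scale by alpha. */
    const double alpha = 2.0;
    for (int i = 0; i < BLOCK; i++) c[i] = alpha * a[i];

    /* Keep the sketch race-free: all Gets finish before any Accumulate.
     * Reordering or batching these Get and Accumulate calls is exactly
     * the dimension the paper explores. */
    MPI_Barrier(MPI_COMM_WORLD);

    /* Remote accumulate: y += c at the target, no receive-side code. */
    MPI_Accumulate(c, BLOCK, MPI_DOUBLE, target, 0, BLOCK, MPI_DOUBLE,
                   MPI_SUM, win);
    MPI_Win_flush(target, win);      /* update is complete at the target */

    MPI_Barrier(MPI_COMM_WORLD);     /* everyone has finished updating */
    MPI_Win_sync(win);               /* see remote updates in local memory */
    if (rank == 0)
        printf("tile[0] on rank 0 after accumulate: %.1f\n", tile[0]);

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Run, for example, with `mpicc` and `mpiexec -n 4`; rank 0 should print 3.0 (its initial 1.0 plus the neighbour's accumulated 2.0). Global Arrays expresses the same pattern with NGA_Get and NGA_Acc, where the accumulate carries the scaling factor \(a\) itself.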

Keywords

NWChem · One-sided communication · Global Arrays · MPI-3 · Tensor contractions

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Priyanka Ghosh (1)
  • Jeff R. Hammond (2)
  • Sayan Ghosh (1)
  • Barbara Chapman (1)
  1. Department of Computer Science, University of Houston, Houston, USA
  2. Leadership Computing Facility, Argonne National Laboratory, Lemont, USA