Abstract
The design of hardware for next-generation exascale computing systems will require a deep understanding of how software optimizations impact hardware design trade-offs. In order to characterize how co-tuning hardware and software parameters affects the performance of combustion simulation codes, we created ExaSAT, a compiler-driven static analysis and performance modeling framework. Our framework can evaluate hundreds of hardware/software configurations in seconds, providing an essential speed advantage over simulators and dynamic analysis techniques during the co-design process. Our analytic performance model shows that advanced code transformations, such as cache blocking and loop fusion, can have a significant impact on choices for cache and memory architecture. Our modeling helped us identify tuned configurations that achieve a 90% reduction in memory traffic, which could significantly improve performance and reduce energy consumption. These techniques will also be useful for the development of advanced programming models and runtimes, which must reason about these optimizations to deliver better performance and energy efficiency.
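The abstract's claim that loop fusion changes memory traffic can be illustrated with a back-of-the-envelope estimate. The sketch below is not ExaSAT's model. It is a minimal illustration that assumes each array is streamed from DRAM exactly once per sweep, that a write-allocate cache makes each written array cost two transfers, and that the fused temporary stays in registers. The function names and the two-loop kernel are hypothetical.

```python
def sweep_traffic_bytes(n_cells, arrays_read, arrays_written,
                        word_bytes=8, write_allocate=True):
    """Estimate compulsory DRAM traffic for one sweep over n_cells points.

    Assumption: every array is streamed exactly once per sweep, and a
    write-allocate cache reads a line before writing it back (2 transfers).
    """
    write_factor = 2 if write_allocate else 1
    return n_cells * word_bytes * (arrays_read + write_factor * arrays_written)


def unfused_traffic(n_cells, word_bytes=8):
    # Hypothetical loop 1: tmp = f(a, b)    -> reads a, b;   writes tmp
    # Hypothetical loop 2: out = g(tmp, c)  -> reads tmp, c; writes out
    loop1 = sweep_traffic_bytes(n_cells, 2, 1, word_bytes)
    loop2 = sweep_traffic_bytes(n_cells, 2, 1, word_bytes)
    return loop1 + loop2


def fused_traffic(n_cells, word_bytes=8):
    # Fused loop: out = g(f(a, b), c) -> reads a, b, c; writes out.
    # The temporary never leaves registers, so it adds no DRAM traffic.
    return sweep_traffic_bytes(n_cells, 3, 1, word_bytes)


if __name__ == "__main__":
    n = 64 ** 3  # grid points in a 64^3 box
    before = unfused_traffic(n)
    after = fused_traffic(n)
    print(f"unfused: {before} bytes, fused: {after} bytes, "
          f"reduction: {1 - after / before:.1%}")
```

Under these assumptions, fusing the two loops removes the write and re-read of the temporary array. The reduction for this toy kernel is unrelated to the 90% figure reported in the abstract, which comes from the authors' tuned configurations.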
This manuscript has been authored by an author at Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231 with the U.S. Department of Energy. The U.S. Government retains, and the publisher, by accepting the article for publication, acknowledges, that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. Government purposes.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chan, C., Unat, D., Lijewski, M., Zhang, W., Bell, J., Shalf, J. (2013). Software Design Space Exploration for Exascale Combustion Co-design. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_15
DOI: https://doi.org/10.1007/978-3-642-38750-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38749-4
Online ISBN: 978-3-642-38750-0
eBook Packages: Computer Science, Computer Science (R0)