QUARC: An Array Programming Approach to High Performance Computing

Deb, Diptorup; Fowler, Robert J.; Porterfield, Allan

doi:10.1007/978-3-319-52709-3_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10136)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

We present QUARC, a framework for the optimized compilation of domain-specific extensions to C++. Driven by needs for programmer productivity and portable performance for lattice QCD, the framework focuses on stencil-like computations on arrays with an arbitrary number of dimensions. QUARC uses a template meta-programming front end to define a high-level array language. Unlike approaches that generate scalarized loop nests in the front end, the instantiation of QUARC templates retains high-level abstraction suitable for optimization at the object (array) level. The back end compiler (CLANG/LLVM) is extended to implement array transformations such as transposition, reshaping, and partitioning for parallelism and for memory locality prior to scalarization. We present the design and implementation.
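
The contrast between scalarizing in the front end and keeping an array-level representation can be illustrated with a small expression-template sketch. This is not QUARC's API: the namespace `sketch` and the names `ArrayRef`, `Shift`, `Add`, `ref`, `shift`, `fuse`, and `evaluate` are hypothetical, and the rewrite is done in library code here purely for illustration, whereas the abstract places such transformations in the extended Clang/LLVM back end. The sketch builds a 1-D periodic stencil as a typed expression tree, applies one array-level rewrite (collapsing nested shifts), and only then produces a scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical names throughout; not QUARC's API

// Leaf node: a non-owning view of an existing 1-D array.
struct ArrayRef {
    const std::vector<double>* data;
    std::size_t size() const { return data->size(); }
    double at(std::size_t i) const { return (*data)[i]; }
};

// Array-level circular shift: result[i] = operand[(i + offset) mod n].
template <typename E>
struct Shift {
    E operand;
    long offset;
    std::size_t size() const { return operand.size(); }
    double at(std::size_t i) const {
        const long n = static_cast<long>(operand.size());
        assert(n > 0);
        // Double modulo keeps the index non-negative for negative offsets.
        const long j = (((static_cast<long>(i) + offset) % n) + n) % n;
        return operand.at(static_cast<std::size_t>(j));
    }
};

// Array-level elementwise sum of two expressions of equal size.
template <typename L, typename R>
struct Add {
    L lhs;
    R rhs;
    std::size_t size() const { return lhs.size(); }
    double at(std::size_t i) const { return lhs.at(i) + rhs.at(i); }
};

inline ArrayRef ref(const std::vector<double>& v) { return ArrayRef{&v}; }

template <typename E>
Shift<E> shift(E e, long offset) { return Shift<E>{e, offset}; }

// Found by argument-dependent lookup only for types in this namespace.
template <typename L, typename R>
Add<L, R> operator+(L l, R r) { return Add<L, R>{l, r}; }

// An array-level rewrite applied before any loop exists:
// two nested shifts collapse into a single shift.
template <typename E>
Shift<E> fuse(Shift<Shift<E>> s) {
    return Shift<E>{s.operand.operand, s.offset + s.operand.offset};
}

// Scalarization happens only here, as the final step.
template <typename E>
std::vector<double> evaluate(const E& expr) {
    std::vector<double> out(expr.size());
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = expr.at(i);
    return out;
}

}  // namespace sketch
```

With `u = {1, 2, 4, 8}` and `a = sketch::ref(u)`, the expression `sketch::shift(a, 1) + sketch::shift(a, -1)` has the type `Add<Shift<ArrayRef>, Shift<ArrayRef>>`, so the whole stencil is still visible as an object until `sketch::evaluate` turns it into a loop. A front end that emitted the loop nest immediately would leave no such object to transpose, reshape, or partition.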

Acknowledgement

This work was supported in part by the DOE Office of Science SciDAC program on grants DE-FG02-11ER26050/DE-SC0006925 and DE-SC0008706.

Author information

Authors and Affiliations

Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
Diptorup Deb, Robert J. Fowler & Allan Porterfield

Corresponding author

Correspondence to Diptorup Deb.

Editor information

Editors and Affiliations

University of Rochester, Rochester, New York, USA
Chen Ding
University of Rochester, Rochester, New York, USA
John Criswell
Huawei Inc., Santa Clara, California, USA
Peng Wu

About this paper

Cite this paper

Deb, D., Fowler, R.J., Porterfield, A. (2017). QUARC: An Array Programming Approach to High Performance Computing. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science, vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_1

DOI: https://doi.org/10.1007/978-3-319-52709-3_1
Published: 24 January 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52708-6
Online ISBN: 978-3-319-52709-3
eBook Packages: Computer Science, Computer Science (R0)
