Abstract
In this paper we evaluate the performance of the OpenACC and Mint toolkits against C and CUDA implementations of the standard PolyBench test suite. Our analysis reveals that performance is similar in many cases, but that certain code constructs impede Mint's ability to generate optimal code. We then present some small improvements, which we integrate into our own GPSME toolkit (derived from Mint), and show that our toolkit now outperforms OpenACC in the majority of tests.
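For readers unfamiliar with directive-based toolkits, the following minimal sketch shows the kind of annotated C loop nest such an evaluation compares against plain C and hand-written CUDA. It is an illustration only and is not taken from the paper: the function name `gemm_acc`, the row-major layout, and the choice of clauses are assumptions, and the kernel merely follows the general shape of a PolyBench-style matrix multiply.

```c
/* Hypothetical example: C = alpha*A*B + beta*C for n x n row-major
   matrices, in the style of a PolyBench gemm kernel. */
void gemm_acc(int n, double alpha, double beta,
              const double *A, const double *B, double *C)
{
    /* OpenACC directive asking the compiler to offload the two outer
       loops. A compiler without OpenACC support ignores the pragma and
       runs the loops serially on the CPU, giving the same result. */
    #pragma acc parallel loop collapse(2) \
        copyin(A[0:n*n], B[0:n*n]) copy(C[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            /* Inner reduction stays sequential within each (i, j). */
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}
```

The same source serves as both the serial C baseline and the accelerated version, which is the main appeal of the directive approach; how well the generated GPU code performs depends on the toolkit.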
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Williams, D. et al. (2014). Evaluation of Autoparallelization Toolkits for Commodity GPUs. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2013. Lecture Notes in Computer Science, vol 8384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55224-3_42
DOI: https://doi.org/10.1007/978-3-642-55224-3_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55223-6
Online ISBN: 978-3-642-55224-3
eBook Packages: Computer Science (R0)