Evaluation of Autoparallelization Toolkits for Commodity GPUs

  • David Williams (email author)
  • Valeriu Codreanu
  • Po Yang
  • Baoquan Liu
  • Feng Dong
  • Burhan Yasar
  • Babak Mahdian
  • Alessandro Chiarini
  • Xia Zhao
  • Jos B. T. M. Roerdink
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8384)


In this paper we evaluate the performance of the OpenACC and Mint toolkits against C and CUDA implementations of the standard PolyBench test suite. Our analysis reveals that performance is similar in many cases, but that a certain set of code constructs impedes the ability of Mint to generate optimal code. We then present some small improvements, which we integrate into our own GPSME toolkit (derived from Mint), and show that our toolkit now outperforms OpenACC in the majority of tests.


GPU computing, Autoparallelization, Evaluation



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • David Williams (1), email author
  • Valeriu Codreanu (1)
  • Po Yang (2)
  • Baoquan Liu (2)
  • Feng Dong (2)
  • Burhan Yasar (3)
  • Babak Mahdian (4)
  • Alessandro Chiarini (5)
  • Xia Zhao (6)
  • Jos B. T. M. Roerdink (1)

  1. University of Groningen, Groningen, The Netherlands
  2. University of Bedfordshire, Luton, UK
  3. RotaSoft Ltd, Ankara, Turkey
  4. ImageMetry, Prague, Czech Republic
  5. Super Computing Solutions, Bologna, Italy
  6. AnSmart, Wembley, UK
