Historic Learning Approach for Auto-tuning OpenACC Accelerated Scientific Applications

  • Conference paper
High Performance Computing for Computational Science -- VECPAR 2014 (VECPAR 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8969)

Abstract

Optimizing the performance of scientific applications usually requires in-depth knowledge of both the hardware and the software. We propose a performance tuning mechanism that automatically tunes OpenACC parameters to adapt to the execution environment on a given system, together with a historic-learning-based methodology that prunes the parameter search space to make auto-tuning more efficient. We apply this approach to tune the OpenACC gang and vector clauses for a better mapping of the compute kernels onto the underlying architecture. Our experiments show a significant performance improvement over the default compiler parameters and a drastic reduction in tuning time compared to a brute-force search.
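The idea of pruning a gang/vector search space with historic knowledge can be illustrated with a small sketch. It is not the authors' algorithm, which is not described in this preview. The candidate values, the `synthetic_runtime` cost function, and the nearest-record pruning rule are all assumptions made for illustration; in a real tuner, `measure` would compile and time the OpenACC kernel with the given clause values.

```python
# Hypothetical candidate values for the OpenACC gang and vector clauses.
GANGS = [32, 64, 128, 256, 512, 1024]
VECTORS = [32, 64, 128, 256, 512, 1024]


def synthetic_runtime(gang, vector, size):
    """Stand-in for timing one kernel run; NOT a real performance model."""
    target_gang = min(GANGS, key=lambda g: abs(g - size // 1024))
    return 1.0 + abs(gang - target_gang) / target_gang + abs(vector - 128) / 128


def search(candidates, size, measure):
    """Evaluate every candidate; return (best config, number of runs)."""
    ordered = sorted(candidates)  # fixed order keeps tie-breaking deterministic
    best = min(ordered, key=lambda cfg: measure(cfg[0], cfg[1], size))
    return best, len(ordered)


def brute_force(size, measure):
    """Baseline: try every (gang, vector) combination."""
    return search({(g, v) for g in GANGS for v in VECTORS}, size, measure)


def neighbours(config):
    """The config itself plus the adjacent grid points in each dimension."""
    gi, vi = GANGS.index(config[0]), VECTORS.index(config[1])
    return {
        (GANGS[i], VECTORS[j])
        for i in range(max(gi - 1, 0), min(gi + 2, len(GANGS)))
        for j in range(max(vi - 1, 0), min(vi + 2, len(VECTORS)))
    }


def pruned_search(size, history, measure, k=2):
    """Search only around the best configs of the k most similar past runs.

    `history` is a list of (problem size, best config) records.
    """
    nearest = sorted(history, key=lambda rec: abs(rec[0] - size))[:k]
    candidates = set()
    for _, config in nearest:
        candidates |= neighbours(config)
    return search(candidates, size, measure)


if __name__ == "__main__":
    # Build the history once with exhaustive searches on two problem sizes.
    history = [(s, brute_force(s, synthetic_runtime)[0]) for s in (65536, 131072)]
    # Tune a new problem size using the history to prune the search space.
    print("brute force:", brute_force(262144, synthetic_runtime))
    print("pruned     :", pruned_search(262144, history, synthetic_runtime))
```

In this toy setting the pruned search evaluates far fewer configurations than the exhaustive one. Its quality depends on how similar the new problem is to the recorded ones: if the true optimum lies outside the neighbourhood of past results, the pruned search will miss it.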

Acknowledgments

The authors would like to thank NVIDIA for the hardware donation to KAUST as CUDA center of research and KAUST IT Research Computing for their support.

Author information

Corresponding author

Correspondence to Saber Feki.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Siddiqui, S., AlZayer, F., Feki, S. (2015). Historic Learning Approach for Auto-tuning OpenACC Accelerated Scientific Applications. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_19

  • DOI: https://doi.org/10.1007/978-3-319-17353-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17352-8

  • Online ISBN: 978-3-319-17353-5

  • eBook Packages: Computer Science (R0)
