International Journal of Parallel Programming, Volume 41, Issue 6, pp 825–854

Experiences Developing the OpenUH Compiler and Runtime Infrastructure

  • Barbara Chapman
  • Deepak Eachempati
  • Oscar Hernandez


The OpenUH compiler is a branch of the open source Open64 compiler suite for C, C++, and Fortran 95/2003, with support for a variety of targets including x86_64, IA-64, and IA-32. For the past several years, we have used OpenUH to conduct research in parallel programming models and their implementation, static and dynamic analysis of parallel applications, and compiler integration with external tools. In this paper, we describe the evolution of the OpenUH infrastructure and how we’ve used it to carry out our research and teaching efforts.


Keywords: Compilers · OpenMP · PGAS · Parallelization



Acknowledgments

We would like to thank our funding agencies for their support. The work described in this paper was funded by the following grants: National Science Foundation contracts CCF-0444468, CCF-0702775, and CCF-0833201; Department of Energy contracts DE-FC03-01ER25502 and DE-FC02-06ER25759. Support for our CAF implementation was partially sponsored by Total.



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Barbara Chapman (1)
  • Deepak Eachempati (1)
  • Oscar Hernandez (2)

  1. Department of Computer Science, University of Houston, Houston, USA
  2. Oak Ridge National Laboratory, Oak Ridge, USA
