Science China Information Sciences, Volume 53, Issue 5, pp 932–944

OpenMP compiler for distributed memory architectures

  • Jue Wang
  • ChangJun Hu
  • JiLin Zhang
  • JianJiang Li
Research Papers


OpenMP is an emerging industry standard for shared memory architectures. Although OpenMP offers ease of use and supports incremental parallelization, message passing is still the most widely used programming model for distributed memory architectures. How to extend OpenMP effectively to distributed memory architectures has been an active research topic. This paper proposes an OpenMP system for distributed memory architectures, called KLCoMP. Based on the “partially replicating shared arrays” memory model, we propose an algorithm for recognizing shared arrays based on inter-procedural analysis, an optimization technique based on the producer/consumer relationship, and a communication generation technique for nonlinear references. We evaluate the performance on nine benchmarks covering computational fluid dynamics, integer sorting, molecular dynamics, earthquake simulation, and computational chemistry. The average scalability achieved by the KLCoMP versions is close to that achieved by the MPI versions. We also compare the performance of our translated programs with that of versions generated for Omni+SCASH, LLCoMP, and OpenMP(Purdue), and find that parallel applications (especially irregular applications) translated by KLCoMP can achieve better performance than the other versions.


Keywords: parallel compiling; high performance computing; distributed memory architecture; OpenMP; irregular application
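The abstract names a communication generation technique for nonlinear references without describing it. For orientation only, here is a minimal sketch of the classic inspector phase used for an indirect reference x[idx[i]]. It assumes x is block-distributed over the processes and that each process executes a contiguous range of loop iterations. This is a generic illustration, not KLCoMP's algorithm; the function names and the block-distribution convention are hypothetical.

```python
from collections import defaultdict


def block_owner(g, n, nprocs):
    """Return the process that owns global index g of an n-element array
    block-distributed over nprocs processes. Assumed convention: the first
    n % nprocs processes each hold one extra element."""
    base, extra = divmod(n, nprocs)
    cutoff = extra * (base + 1)  # indices below this lie in the larger blocks
    if g < cutoff:
        return g // (base + 1)
    return extra + (g - cutoff) // base


def build_receive_schedule(idx, lo, hi, n, nprocs, me):
    """Inspector for a loop `for i in range(lo, hi): use(x[idx[i]])` run by
    process `me`, where x has n elements block-distributed over nprocs.

    Returns {owner: sorted global indices of x that `me` must fetch from owner}.
    Duplicate references are fetched once; locally owned elements are skipped.
    """
    needed = defaultdict(set)
    for i in range(lo, hi):
        g = idx[i]
        owner = block_owner(g, n, nprocs)
        if owner != me:  # local elements need no communication
            needed[owner].add(g)
    return {p: sorted(s) for p, s in sorted(needed.items())}


# Hypothetical example: 8 elements on 2 processes (0..3 on P0, 4..7 on P1).
idx = [5, 1, 7, 5, 2, 6, 0, 3]
print(build_receive_schedule(idx, 0, 4, 8, 2, me=0))  # {1: [5, 7]}
print(build_receive_schedule(idx, 4, 8, 8, 2, me=1))  # {0: [0, 2, 3]}
```

In a full inspector/executor scheme, each process would then send these index lists to the owners, receive the corresponding values, and only afterwards run the loop (the executor phase). Reusing the schedule across outer iterations pays off when idx does not change between them.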


Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jue Wang (1)
  • ChangJun Hu (1)
  • JiLin Zhang (1)
  • JianJiang Li (1)
  1. School of Information and Engineering, University of Science and Technology Beijing, Beijing, China
