Performance of OSCAR Multigrain Parallelizing Compiler on SMP Servers

  • Kazuhisa Ishizaka
  • Takamichi Miyamoto
  • Jun Shirako
  • Motoki Obata
  • Keiji Kimura
  • Hironori Kasahara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3602)


This paper describes the performance of the OSCAR multigrain parallelizing compiler on various SMP servers: IBM pSeries 690, Sun Fire V880, Sun Ultra 80, NEC TX7/i6010 and SGI Altix 3700. In addition to loop parallelism, the OSCAR compiler hierarchically exploits coarse grain task parallelism among loops, subroutines and basic blocks, and near fine grain parallelism among statements inside a basic block. It also performs global cache optimization across different loops, or coarse grain tasks, using a data localization technique with inter-array padding to reduce memory access overhead. The current performance of the OSCAR compiler is evaluated on the above SMP servers. For example, the OSCAR compiler, which generates OpenMP parallelized programs from ordinary sequential Fortran programs, gives a 5.7 times speedup on average over seven SPEC CFP95 programs (tomcatv, swim, su2cor, hydro2d, mgrid, applu and turb3d) compared with the IBM XL Fortran compiler 8.1 on an IBM pSeries 690 24-processor SMP server. It also gives a 2.6 times speedup compared with the Intel Fortran Itanium Compiler 7.1 on an SGI Altix 3700 Itanium 2 16-processor server, a 1.7 times speedup compared with the NEC Fortran Itanium Compiler 3.4 on an NEC TX7/i6010 Itanium 2 8-processor server, a 2.5 times speedup compared with Sun Forte 7.0 on a Sun Ultra 80 UltraSPARC II 4-processor desktop workstation, and a 2.1 times speedup compared with the Sun Forte compiler 7.1 on a Sun Fire V880 UltraSPARC III Cu 8-processor server.
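To illustrate why padding between arrays can reduce cache conflicts, the following toy model may help. It is only a sketch under stated assumptions, not the OSCAR compiler's algorithm: the cache is assumed to be direct-mapped, the arrays are assumed to be laid out consecutively in memory, and all sizes and function names (`cache_set`, `layout`, `conflicting_pairs`) are hypothetical choices made for this example.

```python
# Toy model of inter-array padding (illustrative assumptions only):
# a direct-mapped cache, arrays placed back to back in memory, and a
# coarse grain task that touches the first chunk of every array.

def cache_set(addr, cache_bytes, line_bytes):
    """Return the cache set index of a byte address in a direct-mapped cache."""
    return (addr % cache_bytes) // line_bytes

def layout(array_bytes, n_arrays, pad_bytes=0):
    """Return base addresses of arrays laid out consecutively,
    with pad_bytes of unused space after each array."""
    return [i * (array_bytes + pad_bytes) for i in range(n_arrays)]

def conflicting_pairs(bases, chunk_bytes, cache_bytes, line_bytes):
    """Count array pairs whose first chunk_bytes map to a common cache set."""
    def footprint(base):
        return {cache_set(a, cache_bytes, line_bytes)
                for a in range(base, base + chunk_bytes, line_bytes)}
    prints = [footprint(b) for b in bases]
    return sum(1
               for i in range(len(prints))
               for j in range(i + 1, len(prints))
               if prints[i] & prints[j])

KIB = 1024
CACHE = 64 * KIB    # hypothetical cache size
LINE = 64           # hypothetical cache line size
ARRAY = 256 * KIB   # array size is a multiple of the cache size
CHUNK = 16 * KIB    # part of each array used by one task (cache / 4)

unpadded = layout(ARRAY, 4)
padded = layout(ARRAY, 4, pad_bytes=CHUNK)

print(conflicting_pairs(unpadded, CHUNK, CACHE, LINE))  # all pairs collide
print(conflicting_pairs(padded, CHUNK, CACHE, LINE))    # chunks no longer overlap
```

In this model, arrays whose size is a multiple of the cache size all start at the same cache set, so the chunks used by one task evict each other. Shifting each array by one chunk of padding places the four chunks in disjoint quarters of the cache.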


Keywords: Time Speedup, Cache Size, Dynamic Schedule, Static Schedule, Task Parallelism
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Kazuhisa Ishizaka (1)
  • Takamichi Miyamoto (1)
  • Jun Shirako (1)
  • Motoki Obata (2)
  • Keiji Kimura (1)
  • Hironori Kasahara (1)

  1. Department of Computer Science, Advanced Chip Multiprocessor Research Institute, Waseda University, Tokyo, Japan
  2. System Development Laboratory, Hitachi Co. Ltd.
