Hierarchical Parallelism Control for Multigrain Parallel Processing

  • Motoki Obata
  • Jun Shirako
  • Hiroki Kaminaga
  • Kazuhisa Ishizaka
  • Hironori Kasahara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2481)


To improve the effective performance and usability of shared-memory multiprocessor systems, a multigrain compilation scheme is important: one that hierarchically exploits coarse-grain parallelism among loops, subroutines, and basic blocks, together with conventional loop parallelism and near-fine-grain parallelism among statements inside a basic block. To use the parallelism of each nest level, or layer, efficiently in multigrain parallel processing, the compiler must determine how many processors, or groups of processors, to assign to each layer according to that layer's parallelism. This paper proposes an automatic hierarchical parallelism control scheme that assigns a suitable number of processors to each layer so that the parallelism of each hierarchy can be exploited efficiently. The performance of the proposed scheme is evaluated on an IBM RS6000 SMP server with 8 processors using 8 programs from SPEC95FP.
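The core idea of layer-by-layer processor assignment can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual algorithm: it assumes each layer exposes an estimated number of concurrently executable coarse-grain tasks, and splits the processors handed down from the parent layer into that many groups, leaving the processors inside each group to exploit the next (inner) layer.

```python
# Hypothetical sketch (not the paper's algorithm): split the processors
# given to a layer into groups according to that layer's estimated
# coarse-grain parallelism; each group then handles the inner layer.

def assign_groups(total_procs, layer_parallelism):
    """Return (num_groups, procs_per_group) for one nest layer.

    total_procs       -- processors handed down from the parent layer
    layer_parallelism -- estimated number of coarse-grain tasks that
                         can run concurrently in this layer
    """
    # Never create more groups than there are parallel tasks or processors.
    groups = max(1, min(layer_parallelism, total_procs))
    # Processors remaining inside each group exploit inner-layer parallelism.
    return groups, total_procs // groups

# Example: with 8 processors and 2 parallel macro-tasks in the outer
# layer, use 2 groups of 4; each group of 4 then runs the inner layer.
print(assign_groups(8, 2))   # (2, 4)
print(assign_groups(8, 16))  # (8, 1)
```

In the paper's setting this decision is made per nest level, so a group of 4 processors assigned to an outer macro-task would itself be partitioned again for the loops and subroutines nested inside it.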





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Motoki Obata 1,2
  • Jun Shirako 1
  • Hiroki Kaminaga 1
  • Kazuhisa Ishizaka 1,2
  • Hironori Kasahara 1,2

  1. Dept. of Electrical, Electronics and Computer Engineering, Waseda University
  2. Advanced Parallelizing Compiler Research Group