OSCAR Fortran Multigrain Compiler

Abstract

The OSCAR Fortran multigrain compiler [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] has been developed since 1986 for OSCAR (Optimally Scheduled Advanced Multiprocessor) [11], a multiprocessor system with centralized and distributed shared memories in addition to a local memory on each processor. The multigrain compiler lets ordinary users obtain high effective performance with little effort: it automatically parallelizes every block of a program, such as Do-all loops, Do-across loops, sequential loops, subroutines, and basic blocks outside loops, at both the inter- and intra-block levels. More concretely, the compiler hierarchically exploits coarse-grain parallelism among loops, subroutines, and basic blocks [2, 3, 4, 6]; conventional medium-grain parallelism among loop iterations in a Do-all loop; and near-fine-grain parallelism among statements inside a basic block [8, 9, 10]. Coarse-grain parallelism is detected automatically by the earliest executable condition analysis of macrotasks [3, 4], or coarse-grain tasks, which considers both control dependences and data dependences among macrotasks. Macrotasks are dynamically assigned to processor clusters with low overhead by a scheduling routine generated by the compiler [1, 4]. During this macro-dataflow processing, data-localization techniques use the local memory on each processor to minimize data-transfer overhead among macrotasks [12, 13]. A macrotask composed of a Do-all or Do-across loop that is assigned to a processor cluster is itself processed in parallel at the medium grain (that is, at the loop-iteration level) by the processors inside that cluster.
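
As a rough illustration of the scheduling idea described above, the following Python sketch models macrotasks, their predecessor sets, and a greedy dynamic assignment to processor clusters. It is a toy model, not OSCAR's generated scheduling routine: the task graph, the cost estimates, and the largest-cost-first priority are all assumptions made for illustration, and the earliest executable condition is simplified here to "all predecessors have finished," whereas the actual analysis [3, 4] also exploits control dependences to let macrotasks start earlier.

    # Toy model of dynamic macrotask scheduling onto processor clusters.
    # Hypothetical names throughout; not OSCAR's actual runtime code.

    def schedule(tasks, n_clusters):
        """Greedily dispatch each macrotask once its (simplified)
        earliest executable condition is met, onto the earliest-free
        processor cluster."""
        finish = {}                       # task name -> finish time
        free_at = [0.0] * n_clusters      # cluster -> time it becomes free
        remaining = dict(tasks)           # name -> (cost, predecessors)
        order = []
        while remaining:
            # Ready set: every predecessor has already completed.
            ready = [n for n, (_, deps) in remaining.items()
                     if all(d in finish for d in deps)]
            assert ready, "macrotask graph must be acyclic"
            # Largest estimated cost first (a common list-scheduling priority).
            name = max(ready, key=lambda n: remaining[n][0])
            cost, deps = remaining.pop(name)
            cl = min(range(n_clusters), key=lambda c: free_at[c])
            start = max([free_at[cl]] + [finish[d] for d in deps])
            finish[name] = start + cost
            free_at[cl] = finish[name]
            order.append((name, cl, start, finish[name]))
        return order

    # Example macrotask graph: name -> (estimated cost, predecessors).
    # MT1 might be a basic block, MT2 a Do-all loop, MT3 a subroutine,
    # MT4 a sequential loop that needs the results of MT2 and MT3.
    graph = {
        "MT1": (4.0, []),
        "MT2": (6.0, ["MT1"]),
        "MT3": (3.0, ["MT1"]),
        "MT4": (5.0, ["MT2", "MT3"]),
    }
    for name, cluster, start, end in schedule(graph, n_clusters=2):
        print(f"{name} on cluster {cluster}: t={start:.1f}..{end:.1f}")

In an actual macro-dataflow run, each macrotask assigned to a cluster would then be parallelized internally at the medium or near-fine grain, as described above.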

References

1. H. Kasahara et al., "A Multi-grain Parallelizing Compilation Scheme on OSCAR," Proc. 4th Workshop on Languages and Compilers for Parallel Computing, August 1991.

2. H. Kasahara, H. Honda, M. Iwata, and M. Hirota, "A Macro-dataflow Compilation Scheme for Hierarchical Multiprocessor Systems," Proc. International Conference on Parallel Processing, August 1990.

3. H. Honda, M. Iwata, and H. Kasahara, "Coarse Grain Parallelism Detection Scheme of Fortran Programs," Trans. IEICE, J73-D-I(12), December 1990 (in Japanese).

4. H. Kasahara, Parallel Processing Technology, Corona Publishing, Tokyo, June 1991 (in Japanese).

5. H. Kasahara, H. Honda, and S. Narita, "A Fortran Parallelizing Compilation Scheme for OSCAR Using Dependence Graph Analysis," IEICE Trans., E74(10):3105–3114, 1991.

6. H. Honda, K. Aida, M. Okamoto, A. Yoshida, W. Ogata, and H. Kasahara, "Fortran Macro-Dataflow Compiler," Proc. 4th Workshop on Compilers for Parallel Computers, December 1993.

7. M. Okamoto, K. Aida, M. Miyazawa, H. Honda, and H. Kasahara, "A Hierarchical Macro-Dataflow Computation Scheme for OSCAR Multi-grain Compiler," Trans. Information Processing Society of Japan, 35(4):513–521, April 1994 (in Japanese).

8. H. Kasahara and S. Narita, "An Approach to Supercomputing Using Multiprocessor Scheduling Algorithms," Proc. IEEE 1st International Conference on Supercomputing, 139–148, December 1985.

9. H. Kasahara, H. Honda, and S. Narita, "Parallel Processing of Near-fine Grain Tasks Using Static Scheduling on OSCAR," Proc. IEEE/ACM Supercomputing '90, November 1990.

10. W. Ogata, A. Yoshida, K. Aida, M. Okamoto, and H. Kasahara, "Near-fine Grain Parallel Processing without Synchronization Using Static Scheduling," Trans. Information Processing Society of Japan, 35(4):522–531, April 1994 (in Japanese).

11. H. Kasahara, S. Narita, and S. Hashimoto, "OSCAR's Architecture," Trans. IEICE, J71-D-I(8), August 1988 (in Japanese).

12. A. Yoshida, S. Maeda, W. Ogata, and H. Kasahara, "A Data-Localization Scheme for Fortran Macro-Dataflow Computation," Trans. Information Processing Society of Japan, 35(9):1848–1860, September 1994 (in Japanese).

13. A. Yoshida, S. Maeda, W. Ogata, and H. Kasahara, "A Data-Localization Scheme among Doall/Sequential Loops for Fortran Coarse-Grain Parallel Processing," Trans. IEICE, J78-D-I(2), February 1995 (in Japanese).

14. B.S. Baker, "An Algorithm for Structuring Flowgraphs," J. ACM, 24(1):98–120, January 1977.

15. M. Burke and R. Cytron, "Interprocedural Dependence Analysis and Parallelization," Proc. ACM SIGPLAN '86 Symposium on Compiler Construction, 1986.

16. F. Allen, M. Burke, R. Cytron, J. Ferrante, W. Hsieh, and V. Sarkar, "A Framework for Determining Useful Parallelism," Proc. 2nd ACM International Conference on Supercomputing, 1988.

17. J. Ferrante, K.J. Ottenstein, and J.D. Warren, "The Program Dependence Graph and Its Use in Optimization," ACM Trans. Programming Languages and Systems, 9(3):319–349, July 1987.

18. M. Girkar and C.D. Polychronopoulos, "Optimization of Data/Control Conditions in Task Graphs," Proc. 4th Workshop on Languages and Compilers for Parallel Computing, August 1991.

19. H. Kasahara, T. Fujii, H. Nakayama, and S. Narita, "A Parallel Processing Scheme for the Solution of Sparse Linear Equations Using Static Optimal Multiprocessor Scheduling Algorithms," Proc. 2nd International Conference on Supercomputing, May 1987.

20. H. Kasahara, W. Premchaiswadi, M. Tamura, Y. Maekawa, and S. Narita, "Parallel Processing of Sparse Matrix Solution Using Fine Grain Tasks on OSCAR," Proc. International Conference on Parallel Processing, August 1991.

21. F.G. Gustavson, W. Liniger, and R.A. Willoughby, "Symbolic Generation of an Optimal Crout Algorithm for Sparse Systems of Linear Equations," J. ACM, 17:87–109, January 1970.

22. A.A. Berlin and R.J. Surati, "Exploiting the Parallelism Exposed by Partial Evaluation," MIT AI Lab, A.I. Memo No. 1414, April 1993.

23. H. Kasahara and S. Narita, "Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing," IEEE Trans. Computers, C-33(11):1023–1029, November 1984.

24. E.G. Coffman Jr. (ed.), Computer and Job-shop Scheduling Theory, New York, Wiley, 1976.

25. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco, Freeman, 1979.

26. Y. Kodama, Y. Koumura, M. Sato, H. Sakane, S. Sakai, and Y. Yamaguchi, "EMC-Y: Parallel Processing Element Optimizing Communication and Computation," Proc. ACM International Conference on Supercomputing, July 1993.

27. H.G. Dietz, T. Schwederski, M.T. O'Keefe, and A. Zaafrani, "Extended Static Synchronization Beyond VLIW," Proc. Supercomputing '89, 1989.

28. M. O'Keefe and H. Dietz, "Hardware Barrier Synchronization: Static Barrier MIMD," Proc. 1990 International Conference on Parallel Processing, 1:35–42, August 1990.

29. D.A. Padua and M.J. Wolfe, "Advanced Compiler Optimizations for Supercomputers," Communications of the ACM, 29(12):1184–1201, December 1986.

30. M. Wolfe, Optimizing Supercompilers for Supercomputers, Cambridge, MA, MIT Press, 1989.

31. U. Banerjee, Dependence Analysis for Supercomputing, Boston, MA, Kluwer Academic, 1988.

32. W. Pugh, "The OMEGA Test: A Fast and Practical Integer Programming Algorithm for Dependence Analysis," Proc. Supercomputing '91, 1991.

33. P.M. Petersen and D.A. Padua, "Static and Dynamic Evaluation of Data Dependence Analysis," Proc. International Conference on Supercomputing, June 1993.

34. S.S. Munshi and B. Simons, "Scheduling Sequential Loops on Parallel Processors," SIAM J. Comput., 19(4):728–741, August 1990.

35. D.D. Gajski, D.J. Kuck, and D.A. Padua, "Dependence Driven Computation," Proc. COMPCON 81 Spring Computer Conference, 168–172, February 1981.

36. D. Gajski, D. Kuck, D. Lawrie, and A. Sameh, "CEDAR," Report UIUCDCS-R-83-1123, Department of Computer Science, University of Illinois at Urbana-Champaign, February 1983.

37. D.J. Kuck, E.S. Davidson, D.H. Lawrie, and A.H. Sameh, "Parallel Supercomputing Today and the Cedar Approach," Science, 231:967–974, February 1986.

38. H.E. Husmann, D.J. Kuck, and D.A. Padua, "Automatic Compound Function Definition for Multiprocessors," Proc. 1988 International Conference on Parallel Processing, August 1988.

39. J.A. Fisher, "The VLIW Machine: A Multiprocessor for Compiling Scientific Code," IEEE Computer, 17(7):45–53, July 1984.

40. R.P. Colwell et al., "A VLIW Architecture for a Trace Scheduling Compiler," IEEE Trans. Computers, C-37(8):967–979, August 1988.

41. J.R. Ellis, Bulldog: A Compiler for VLIW Architectures, Cambridge, MA, MIT Press, 1985.

42. J.A. Fisher, "Trace Scheduling: A Technique for Global Microcode Compaction," IEEE Trans. Computers, C-30(7):478–490, July 1981.

43. A. Nicolau, "Uniform Parallelism Exploitation in Ordinary Programs," Proc. 1985 International Conference on Parallel Processing, August 1985.

44. A. Aiken and A. Nicolau, "Perfect Pipelining: A New Loop Parallelization Technique," Cornell University Computer Science Report No. 87-873, October 1987.

45. N.P. Jouppi, "The Nonuniform Distribution of Instruction-Level and Machine Parallelism and Its Effect on Performance," IEEE Trans. Computers, C-38(12):1645–1657, December 1989.

46. C.D. Polychronopoulos, Parallel Programming and Compilers, Boston, MA, Kluwer Academic, 1988.

47. V. Sarkar, "Determining Average Program Execution Times and Their Variance," Proc. SIGPLAN '89, June 1989.

48. V. Sarkar, Partitioning and Scheduling Parallel Programs for Multiprocessors, Cambridge, MA, MIT Press, 1989.

49. S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, and C. Tseng, "An Overview of the Fortran D Programming System," Proc. Workshop on Languages and Compilers for Parallel Computing, 18–34, August 1991.

50. High Performance Fortran Forum, High Performance Fortran Language Specification, version 1.0, May 1993.

51. P. Tu and D. Padua, "Automatic Array Privatization," Proc. 6th Annual Workshop on Languages and Compilers for Parallel Computing, 1993.

52. Z. Li, "Array Privatization for Parallel Execution of Loops," Proc. 1992 ACM International Conference on Supercomputing, 313–322, 1992.

53. R. Eigenmann, J. Hoeflinger, G. Jaxon, Z. Li, and D. Padua, "Restructuring Fortran Programs for Cedar," Proc. International Conference on Parallel Processing, 1:57–66, 1991.

54. J. Li and M. Chen, "Generating Explicit Communication from Shared-Memory Program References," Proc. Supercomputing '90, 865–876, 1990.

55. M. Gupta and P. Banerjee, "Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers," IEEE Trans. Parallel and Distributed Systems, 3(2):179–193, 1992.

56. J.M. Anderson and M.S. Lam, "Global Optimizations for Parallelism and Locality on Scalable Parallel Machines," Proc. SIGPLAN '93 Conference on Programming Language Design and Implementation, 112–125, 1993.

57. L. Kipp, "Perfect Benchmarks Documentation Suite 1," CSRD, University of Illinois at Urbana-Champaign, 1993.

Copyright information

© 1995 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Kasahara, H., Honda, H., Aida, K., Okamoto, M., Yoshida, A., Ogata, W. (1995). OSCAR Fortran Multigrain Compiler. In: Bic, L.F., Nicolau, A., Sato, M. (eds) Parallel Language and Compiler Research in Japan. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2269-0_11

  • DOI: https://doi.org/10.1007/978-1-4615-2269-0_11

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5957-9

  • Online ISBN: 978-1-4615-2269-0
