Abstract
The OSCAR Fortran multigrain compiler [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] has been developed since 1986 for the multiprocessor system OSCAR (Optimally Scheduled Advanced Multiprocessor) [11], which has centralized and distributed shared memories in addition to a local memory on each processor. The multigrain compiler allows ordinary users to obtain high effective performance with little effort. It automatically parallelizes every block of a program, such as Do-all loops, Do-across loops, sequential loops, subroutines, and basic blocks outside loops, at both the inter-block and intra-block levels. More concretely, the compiler hierarchically exploits coarse-grain parallelism among loops, subroutines, and basic blocks [2, 3, 4, 6]; conventional medium-grain parallelism among loop iterations in a Do-all loop; and near-fine-grain parallelism among statements inside a basic block [8, 9, 10]. Coarse-grain parallelism is detected automatically by the earliest executable condition analysis of macrotasks [3, 4], or coarse-grain tasks, which considers both control dependencies and data dependencies among macrotasks. Macrotasks are assigned dynamically, with low overhead, to processor clusters by a scheduling routine generated by the compiler [1, 4]. During macro-dataflow processing, data-localization techniques use the local memory on each processor to minimize data-transfer overhead among macrotasks [12, 13]. A macrotask consisting of a Do-all or Do-across loop that is assigned to a processor cluster is itself hierarchically processed in parallel at the medium grain (i.e., at the loop-iteration level) by the processors inside that cluster.
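As a rough illustration of the macrotask-scheduling idea described above, the sketch below performs greedy list scheduling of a macrotask graph onto a fixed number of processor clusters. It is not the OSCAR compiler's actual algorithm: readiness is simplified to "all data-dependence predecessors have finished," omitting the control-dependence part of the earliest executable condition, and all names (`schedule_macrotasks`, `deps`, `cost`) are hypothetical.

```python
import heapq

def schedule_macrotasks(deps, cost, num_clusters):
    """Greedy list scheduling of a macrotask graph onto processor clusters.

    deps: {task: set of data-dependence predecessor tasks}
    cost: {task: estimated execution time}
    Returns {task: (cluster_id, start_time)}.
    """
    preds = {t: set(p) for t, p in deps.items()}
    succs = {t: set() for t in deps}
    for t, ps in deps.items():
        for p in ps:
            succs[p].add(t)

    # Earliest time each task could start (max finish time of predecessors).
    ready_time = {t: 0.0 for t in deps}
    # Min-heap of (time the cluster becomes free, cluster id).
    clusters = [(0.0, c) for c in range(num_clusters)]
    heapq.heapify(clusters)

    ready = [t for t, ps in preds.items() if not ps]
    placement = {}
    while ready:
        # Pick the ready macrotask that can start earliest.
        ready.sort(key=lambda t: ready_time[t])
        t = ready.pop(0)
        free_at, c = heapq.heappop(clusters)
        start = max(free_at, ready_time[t])
        end = start + cost[t]
        placement[t] = (c, start)
        heapq.heappush(clusters, (end, c))
        # Release successors whose predecessors are now all finished.
        for s in succs[t]:
            preds[s].discard(t)
            ready_time[s] = max(ready_time[s], end)
            if not preds[s]:
                ready.append(s)
    return placement
```

For a diamond-shaped graph A → {B, C} → D on two clusters, B and C run concurrently after A, and D starts only after both finish; a real macro-dataflow scheduler would additionally resolve branches at run time and account for data-transfer costs between clusters.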
References
[1] H. Kasahara et al., “A Multi-grain Parallelizing Compilation Scheme on OSCAR,” Proc. 4th Workshop on Languages and Compilers for Parallel Computing, August 1991.
[2] H. Kasahara, H. Honda, M. Iwata, and M. Hirota, “A Macro-dataflow Compilation Scheme for Hierarchical Multiprocessor Systems,” Proc. International Conference on Parallel Processing, August 1990.
[3] H. Honda, M. Iwata, and H. Kasahara, “Coarse Grain Parallelism Detection Scheme of Fortran Programs,” Trans. IEICE, J73-D-I(12), December 1990 (in Japanese).
[4] H. Kasahara, Parallel Processing Technology, Corona Publishing, Tokyo, June 1991 (in Japanese).
[5] H. Kasahara, H. Honda, and S. Narita, “A Fortran Parallelizing Compilation Scheme for OSCAR Using Dependence Graph Analysis,” IEICE Trans., E74(10):3105–3114, 1991.
[6] H. Honda, K. Aida, M. Okamoto, A. Yoshida, W. Ogata, and H. Kasahara, “Fortran Macro-Dataflow Compiler,” Proc. 4th Workshop on Compilers for Parallel Computers, December 1993.
[7] M. Okamoto, K. Aida, M. Miyazawa, H. Honda, and H. Kasahara, “A Hierarchical Macro-Dataflow Computation Scheme for OSCAR Multi-grain Compiler,” Trans. of Information Processing Society of Japan, 35(4):513–521, April 1994 (in Japanese).
[8] H. Kasahara and S. Narita, “An Approach to Supercomputing Using Multiprocessor Scheduling Algorithms,” Proc. IEEE 1st International Conference on Supercomputing, 139–148, December 1985.
[9] H. Kasahara, H. Honda, and S. Narita, “Parallel Processing of Near-fine Grain Tasks Using Static Scheduling on OSCAR,” Proc. IEEE ACM Supercomputing ’90, November 1990.
[10] W. Ogata, A. Yoshida, K. Aida, M. Okamoto, and H. Kasahara, “Near-fine Grain Parallel Processing without Synchronization Using Static Scheduling,” Trans. Information Processing Society of Japan, 35(4):522–531, April 1994 (in Japanese).
[11] H. Kasahara, S. Narita, and S. Hashimoto, “OSCAR’s Architecture,” Trans. IEICE, J71-D-I(8), August 1988 (in Japanese).
[12] A. Yoshida, S. Maeda, W. Ogata, and H. Kasahara, “A Data-Localization Scheme for Fortran Macro-Dataflow Computation,” Trans. Information Processing Society of Japan, 35(9):1848–1860, September 1994 (in Japanese).
[13] A. Yoshida, S. Maeda, W. Ogata, and H. Kasahara, “A Data-Localization Scheme among Doall/Sequential Loops for Fortran Coarse-Grain Parallel Processing,” Trans. IEICE, J78-D-I(2), February 1995 (in Japanese).
[14] B.S. Baker, “An Algorithm for Structuring Flowgraphs,” J. ACM, 24(1):98–120, January 1977.
[15] M. Burke and R. Cytron, “Interprocedural Dependence Analysis and Parallelization,” Proc. ACM SIGPLAN ’86 Symposium on Compiler Construction, 1986.
[16] F. Allen, M. Burke, R. Cytron, J. Ferrante, W. Hsieh, and V. Sarkar, “A Framework for Determining Useful Parallelism,” Proc. 2nd ACM International Conference on Supercomputing, 1988.
[17] J. Ferrante, K.J. Ottenstein, and J.D. Warren, “The Program Dependence Graph and Its Use in Optimization,” ACM Trans. Programming Languages and Systems, 9(3):319–349, July 1987.
[18] M. Girkar and C.D. Polychronopoulos, “Optimization of Data/Control Conditions in Task Graphs,” Proc. 4th Workshop on Languages and Compilers for Parallel Computing, August 1991.
[19] H. Kasahara, T. Fujii, H. Nakayama, and S. Narita, “A Parallel Processing Scheme for the Solution of Sparse Linear Equations Using Static Optimal Multiprocessor Scheduling Algorithms,” Proc. 2nd International Conference on Supercomputing, May 1987.
[20] H. Kasahara, W. Premchaiswadi, M. Tamura, Y. Maekawa, and S. Narita, “Parallel Processing of Sparse Matrix Solution Using Fine Grain Tasks on OSCAR,” Proc. International Conference on Parallel Processing, August 1991.
[21] F.G. Gustavson, W. Liniger, and R.A. Willoughby, “Symbolic Generation of an Optimal Crout Algorithm for Sparse Systems of Linear Equations,” J. ACM, 17:87–109, January 1970.
[22] A.A. Berlin and R.J. Surati, “Exploiting the Parallelism Exposed by Partial Evaluation,” MIT AI Lab., A.I. Memo No. 1414, April 1993.
[23] H. Kasahara and S. Narita, “Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing,” IEEE Trans. Computers, C-33(11):1023–1029, November 1984.
[24] E.G. Coffman Jr. (ed.), Computer and Job-shop Scheduling Theory, New York, Wiley, 1976.
[25] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco, Freeman, 1979.
[26] Y. Kodama, Y. Koumura, M. Sato, H. Sakane, S. Sakai, and Y. Yamaguchi, “EMC-Y: Parallel Processing Element Optimizing Communication and Computation,” Proc. ACM International Conference on Supercomputing, July 1993.
[27] H.G. Dietz, T. Schewederski, M.T. O’Keefe, and A. Zaafrani, “Extended Static Synchronization Beyond VLIW,” Proc. Supercomputing ’89, 1989.
[28] M. O’Keefe and H. Dietz, “Hardware Barrier Synchronization: Static Barrier MIMD,” Proc. 1990 International Conference on Parallel Processing, 1:35–42, August 1990.
[29] D.A. Padua and M.J. Wolfe, “Advanced Compiler Optimizations for Supercomputers,” Communications of the ACM, 29(12):1184–1201, December 1986.
[30] M. Wolfe, Optimizing Supercompilers for Supercomputers, Cambridge, MA, MIT Press, 1989.
[31] U. Banerjee, Dependence Analysis for Supercomputing, Boston, MA, Kluwer Academic, 1988.
[32] W. Pugh, “The OMEGA Test: A Fast and Practical Integer Programming Algorithm for Dependence Analysis,” Proc. Supercomputing ’91, 1991.
[33] P.M. Petersen and D.A. Padua, “Static and Dynamic Evaluation of Data Dependence Analysis,” Proc. International Conference on Supercomputing, June 1993.
[34] S.S. Munshi and B. Simons, “Scheduling Sequential Loops on Parallel Processors,” SIAM J. Comput., 19(4):728–741, August 1990.
[35] D.D. Gajski, D.J. Kuck, and D.A. Padua, “Dependence Driven Computation,” Proc. COMPCON 81 Spring Computer Conference, 168–172, February 1981.
[36] D. Gajski, D. Kuck, D. Lawrie, and A. Sameh, “CEDAR,” Report UIUCDCS-R-83-1123, Department of Computer Science, University of Illinois at Urbana-Champaign, February 1983.
[37] D.J. Kuck, E.S. Davidson, D.H. Lawrie, and A.H. Sameh, “Parallel Supercomputing Today and Cedar Approach,” Science, 231:967–974, February 1986.
[38] H.E. Husmann, D.J. Kuck, and D.A. Padua, “Automatic Compound Function Definition for Multiprocessors,” Proc. 1988 International Conference on Parallel Processing, August 1988.
[39] J.A. Fisher, “The VLIW Machine: A Multiprocessor for Compiling Scientific Code,” IEEE Computer, 17(7):45–53, July 1984.
[40] R.P. Colwell et al., “A VLIW Architecture for a Trace Scheduling Compiler,” IEEE Trans. Computers, C-37(8):967–979, August 1988.
[41] J.R. Ellis, Bulldog: A Compiler for VLIW Architectures, Cambridge, MA, MIT Press, 1985.
[42] J.A. Fisher, “Trace Scheduling: A Technique for Global Microcode Compaction,” IEEE Trans. Computers, C-30(7):478–490, July 1981.
[43] A. Nicolau, “Uniform Parallelism Exploitation in Ordinary Programs,” Proc. 1985 International Conference on Parallel Processing, August 1985.
[44] A. Aiken and A. Nicolau, “Perfect Pipelining: A New Loop Parallelization Technique,” Cornell University Computer Science Report, No. 87-873, October 1987.
[45] N.P. Jouppi, “The Nonuniform Distribution of Instruction-Level and Machine Parallelism and Its Effect on Performance,” IEEE Trans. Computers, C-38(12):1645–1657, December 1989.
[46] C.D. Polychronopoulos, Parallel Programming and Compilers, Boston, Kluwer Academic, 1988.
[47] V. Sarkar, “Determining Average Program Execution Times and Their Variance,” Proc. SIGPLAN ’89, June 1989.
[48] V. Sarkar, Partitioning and Scheduling Parallel Programs for Multiprocessors, Cambridge, MA, MIT Press, 1989.
[49] S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, and C. Tseng, “An Overview of the Fortran D Programming System,” Proc. Workshop on Languages and Compilers for Parallel Computing, 18–34, August 1991.
[50] High Performance Fortran Forum, High Performance Fortran Language Specification, Version 1.0, May 1993.
[51] P. Tu and D. Padua, “Automatic Array Privatization,” 6th Annual Workshop on Languages and Compilers for Parallel Computing, 1993.
[52] Z. Li, “Array Privatization for Parallel Execution of Loops,” Proc. 1992 ACM International Conference on Supercomputing, 313–322, 1992.
[53] R. Eigenmann, J. Hoeflinger, G. Jaxon, Z. Li, and D. Padua, “Restructuring Fortran Programs for Cedar,” International Conference on Parallel Processing, 1:57–66, 1991.
[54] J. Li and M. Chen, “Generating Explicit Communication from Shared-Memory Program References,” Proc. Supercomputing ’90, 865–876, 1990.
[55] M. Gupta and P. Banerjee, “Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers,” IEEE Trans. Parallel and Distributed Systems, 3(2):179–193, 1992.
[56] J.M. Anderson and M.S. Lam, “Global Optimizations for Parallelism and Locality on Scalable Parallel Machines,” Proc. SIGPLAN ’93 Conference on Programming Language Design and Implementation, 112–125, 1993.
[57] L. Kipp, “Perfect Benchmarks Documentation Suite 1,” CSRD, University of Illinois at Urbana-Champaign, 1993.
© 1995 Springer Science+Business Media Dordrecht
Kasahara, H., Honda, H., Aida, K., Okamoto, M., Yoshida, A., Ogata, W. (1995). OSCAR Fortran Multigrain Compiler. In: Bic, L.F., Nicolau, A., Sato, M. (eds) Parallel Language and Compiler Research in Japan. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2269-0_11
Print ISBN: 978-1-4613-5957-9
Online ISBN: 978-1-4615-2269-0