Abstract
This paper proposes a multi-grain parallelizing compilation scheme for Fortran programs. The scheme hierarchically exploits parallelism among coarse-grain tasks such as loops, subroutines, and basic blocks; among medium-grain tasks such as loop iterations; and among near-fine-grain tasks such as statements. Parallelism among the coarse-grain tasks, called macrotasks, is exploited by carefully analyzing control dependences and data dependences. Macrotasks are dynamically assigned to processor clusters to cope with run-time uncertainties such as conditional branches among macrotasks and variation in each macrotask's execution time; this parallel processing of macrotasks is called macro-dataflow computation. A macrotask consisting of a Do-all loop assigned to a processor cluster is processed at the medium grain in parallel by the processors inside the cluster. A macrotask consisting of a sequential loop or a basic block is processed on a processor cluster at the near-fine grain using static scheduling. A macrotask consisting of a subroutine or a large sequential loop is processed by hierarchically applying macro-dataflow computation inside a processor cluster. The performance of the proposed scheme is evaluated on a multiprocessor system named OSCAR. The evaluation shows that multi-grain parallel processing effectively exploits parallelism from Fortran programs.
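To make the macro-dataflow idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of dynamic assignment of macrotasks to processor clusters: a macrotask becomes ready once all of its data-dependence predecessors have finished, and is then assigned to the earliest-free cluster. All names (`macro_dataflow_schedule`, the task/dependence dictionaries) are illustrative, and the sketch simplifies the paper's scheme by omitting control dependences and conditional branches among macrotasks.

```python
from collections import deque

def macro_dataflow_schedule(tasks, deps, num_clusters):
    """Sketch of macro-dataflow-style list scheduling.

    tasks: {macrotask name: estimated execution cost}
    deps:  {macrotask name: set of data-dependence predecessors}
    Returns (assignment of tasks to clusters, overall makespan).
    """
    preds = {t: set(deps.get(t, ())) for t in tasks}
    succs = {t: set() for t in tasks}
    for t, ps in preds.items():
        for p in ps:
            succs[p].add(t)

    # Macrotasks with no unresolved dependences are ready immediately.
    ready = deque(sorted(t for t in tasks if not preds[t]))
    cluster_free = [0.0] * num_clusters   # time each cluster becomes idle
    finish = {}                           # finish time of each macrotask
    assignment = {}

    while ready:
        t = ready.popleft()
        # Dynamic assignment: pick the cluster that frees up earliest.
        c = min(range(num_clusters), key=lambda i: cluster_free[i])
        # A macrotask starts only after its cluster is free and all
        # of its predecessors have produced their data.
        start = max([cluster_free[c]] + [finish[p] for p in deps.get(t, ())])
        finish[t] = start + tasks[t]
        cluster_free[c] = finish[t]
        assignment[t] = c
        # Resolve dependences of successors; newly ready tasks join the queue.
        for s in succs[t]:
            preds[s].discard(t)
            if not preds[s]:
                ready.append(s)

    return assignment, max(finish.values())
```

In the hierarchical scheme described above, each scheduled macrotask would itself be processed in parallel inside its cluster (medium or near-fine grain), which this flat sketch does not model.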
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
Cite this paper
Kasahara, H., Honda, H., Mogi, A., Ogura, A., Fujiwara, K., Narita, S. (1992). A multi-grain parallelizing compilation scheme for OSCAR (optimally scheduled advanced multiprocessor). In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1991. Lecture Notes in Computer Science, vol 589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0038671
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55422-6
Online ISBN: 978-3-540-47063-2