Abstract
This paper addresses the partitioning and scheduling problems in mapping multi-stage regular iterative algorithms onto fixed-size distributed memory processor arrays. We first propose a versatile partitioning model that provides a unified framework integrating various partitioning schemes, such as “locally sequential globally parallel”, “locally parallel globally sequential” and “multi-projection”. Run-time data migration overhead is a crucial problem in mapping multi-stage algorithms. To alleviate it, we further relax the widely adopted atomic partitioning constraint in our model so that a more flexible partitioning scheme can be achieved. Based on this unified partitioning model, we then develop a novel hierarchical scheduling scheme that applies separate schedules at different processor hierarchies. The scheduling problem is formulated as a set of ILP problems and solved for optimal solutions with an existing software package. Examples indicate that our partitioning model is a superset of the existing schemes and that the proposed hierarchical scheduling scheme can outperform the conventional one-level linear schedule.
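The abstract names the classical “locally sequential globally parallel” (LSGP) and “locally parallel globally sequential” (LPGS) schemes without defining them. As a rough illustration only, the sketch below maps a one-dimensional array of N virtual processors onto P physical processors under each scheme, using the block (LSGP) and cyclic (LPGS) assignments as they are commonly described. It is not the paper's unified model, which also covers multi-projection and non-atomic partitions. The function names `lsgp` and `lpgs` and the 1-D setting are assumptions made for this example.

```python
import math


def lsgp(v: int, n_virtual: int, n_physical: int) -> tuple[int, int]:
    """Hypothetical LSGP mapping (illustrative sketch, 1-D case only).

    The virtual array is cut into n_physical contiguous blocks of
    ceil(N/P) virtual processors; each block runs on one physical
    processor, and its virtual processors execute sequentially.
    Returns (physical processor, local sequential slot).
    """
    if n_physical <= 0 or not 0 <= v < n_virtual:
        raise ValueError("need n_physical > 0 and 0 <= v < n_virtual")
    block = math.ceil(n_virtual / n_physical)  # virtual PEs per physical PE
    return v // block, v % block


def lpgs(v: int, n_physical: int) -> tuple[int, int]:
    """Hypothetical LPGS mapping (illustrative sketch, 1-D case only).

    The virtual array is cut into bands of n_physical virtual processors;
    each band runs on the whole physical array in parallel, and the bands
    execute one after another.
    Returns (physical processor, band index).
    """
    if n_physical <= 0 or v < 0:
        raise ValueError("need n_physical > 0 and v >= 0")
    return v % n_physical, v // n_physical


if __name__ == "__main__":
    N, P = 8, 4  # 8 virtual processors on a 4-processor array
    print("LSGP:", [lsgp(v, N, P) for v in range(N)])
    print("LPGS:", [lpgs(v, P) for v in range(N)])
```

With N = 8 and P = 4, LSGP gives each physical processor two neighbouring virtual processors executed in turn, whereas LPGS assigns virtual processor v to physical processor v mod 4 and executes the two bands of four sequentially. The trade-off sketched here is the usual one: LSGP needs local storage for a whole block, and LPGS needs external storage for data passed between bands.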
Cite this article
Hwang, Y.T., Hu, Y.H. A unified partitioning and scheduling scheme for mapping multi-stage regular iterative algorithms onto processor arrays. Journal of VLSI Signal Processing 11, 133–150 (1995). https://doi.org/10.1007/BF02106827