A unified partitioning and scheduling scheme for mapping multi-stage regular iterative algorithms onto processor arrays

Hwang, Yin-Tsung; Hu, Yu Hen

doi:10.1007/BF02106827


Published: 01 October 1995

Volume 11, pages 133–150 (1995)

Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology



Abstract

This paper addresses the partitioning and scheduling problems in mapping multi-stage regular iterative algorithms onto fixed-size, distributed-memory processor arrays. We first propose a versatile partitioning model that provides a unified framework for integrating various partitioning schemes, such as “locally sequential globally parallel”, “locally parallel globally sequential”, and “multi-projection”. To alleviate run-time data migration overhead, a crucial problem in mapping multi-stage algorithms, we further relax the widely adopted atomic partitioning constraint in our model so that a more flexible partitioning scheme can be achieved. Based on this unified partitioning model, we then develop a novel hierarchical scheduling scheme that applies separate schedules at different levels of the processor hierarchy. The scheduling problem is formulated as a set of integer linear programming (ILP) problems, which are solved for optimal solutions using an existing software package. Examples indicate that our partitioning model is a superset of the existing schemes and that the proposed hierarchical scheduling scheme can outperform the conventional one-level linear schedule.
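The two classical partitioning schemes named above, “locally sequential globally parallel” (LSGP) and “locally parallel globally sequential” (LPGS), can be illustrated with a minimal Python sketch. It assumes a one-dimensional array of virtual processors (for example, the result of projecting an index space) that must be folded onto a smaller physical array. The function names `lsgp`, `lpgs`, `num_passes`, and `cross_pe_edges` are hypothetical illustrations, not the authors' model, which additionally covers multi-projection and non-atomic partitions and derives its schedules by ILP.

```python
import math
from typing import Dict, Tuple

# Hypothetical mapping type: virtual PE index -> (physical PE, pass number).
Mapping = Dict[int, Tuple[int, int]]


def lsgp(n_virtual: int, n_physical: int) -> Mapping:
    """Locally sequential, globally parallel (LSGP).

    Consecutive virtual PEs are clustered onto one physical PE and executed
    there one after another; all clusters run concurrently. With ceil-sized
    clusters, some physical PEs may stay idle for some size combinations.
    """
    block = math.ceil(n_virtual / n_physical)
    return {v: (v // block, v % block) for v in range(n_virtual)}


def lpgs(n_virtual: int, n_physical: int) -> Mapping:
    """Locally parallel, globally sequential (LPGS).

    The virtual array is cut into partitions as large as the physical array;
    PEs inside a partition run concurrently and partitions run in sequence.
    """
    return {v: (v % n_physical, v // n_physical) for v in range(n_virtual)}


def num_passes(m: Mapping) -> int:
    """Number of sequential passes, i.e. the length of the outer schedule."""
    return max(p for _, p in m.values()) + 1


def cross_pe_edges(m: Mapping) -> int:
    """Count nearest-neighbour dependencies v -> v+1 that cross physical PEs,
    used here as a rough proxy for inter-processor communication."""
    return sum(m[v][0] != m[v + 1][0] for v in range(len(m) - 1))


if __name__ == "__main__":
    for name, scheme in (("LSGP", lsgp), ("LPGS", lpgs)):
        m = scheme(8, 3)
        print(name, [m[v] for v in range(8)],
              "passes:", num_passes(m), "cross-PE edges:", cross_pe_edges(m))
```

For 8 virtual processors folded onto 3 physical ones, both schemes need 3 sequential passes, but LSGP keeps neighbouring virtual processors on the same physical processor (2 cross-processor dependencies versus 7 for LPGS), which hints at why the choice of partition affects data movement. The sketch deliberately ignores timing within a pass and the relaxed, non-atomic partitions described in the abstract.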


Author information

Authors and Affiliations

Department of Electronic Engineering, National Yunlin Institute of Technology, Yunlin, Taiwan 40415, R.O.C.
Yin-Tsung Hwang
Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53706
Yu Hen Hu



About this article

Cite this article

Hwang, Y.T., Hu, Y.H. A unified partitioning and scheduling scheme for mapping multi-stage regular iterative algorithms onto processor arrays. Journal of VLSI Signal Processing 11, 133–150 (1995). https://doi.org/10.1007/BF02106827


Received: 15 January 1993
Revised: 15 January 1994
Issue Date: October 1995
