A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors

Govindarajan, R.; Altman, Erik R.; Gao, Guang R.

doi:10.1023/A:1014050303852

Published: March 2002

Volume 6, pages 243–275, (2002)
Design Automation for Embedded Systems

R. Govindarajan¹,
Erik R. Altman² &
Guang R. Gao³

Abstract

Exploiting instruction-level parallelism (ILP) is essential for achieving high performance in application-specific instruction set processors (ASIPs) and embedded processors. Unlike conventional general-purpose processors, ASIPs and embedded processors typically run a single application and hence must be optimized extensively for that application in order to extract maximum performance. Further, the low-power and low-cost requirements of ASIPs may demand reuse of pipeline stages, resulting in pipelines with complex structural hazards. In such architectures, exploiting higher ILP is a major challenge for the designer.

Existing techniques deal either with scheduling hardware pipelines to obtain higher throughput or with software pipelining, an instruction scheduling technique for iterative computation, to exploit greater ILP. We integrate these techniques to co-schedule hardware and software pipelines and thereby achieve greater instruction throughput. In this paper, we develop the underlying theory of Co-Scheduling, called the Modulo-Scheduled Pipeline (or MS-pipeline) theory. More specifically, we establish the necessary and sufficient condition for achieving the maximum throughput in a given pipeline operating under modulo scheduling. We also establish a sufficient condition for achieving a specified throughput, and based on it we develop a methodology for designing hardware pipelines that achieve such a throughput. Finally, we present initial experimental results that help establish the usefulness of MS-pipeline theory in software pipelining. Because the proposed theory helps to analyze and improve the throughput of Modulo-Scheduled Pipelines (MS-pipelines), it is especially useful in designing ASIPs and embedded processors.
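To make the notion of structural hazards under modulo scheduling more concrete, the following is a minimal sketch in Python. It is not the MS-pipeline theory or any algorithm from the paper; it only illustrates the standard ideas of a reservation table, its forbidden latencies, and a resource-conflict check when every operation repeats with a fixed period. The table contents, the function names, and the set-based representation are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical reservation table: row s is the set of cycles (relative to
# initiation) in which pipeline stage s is busy. Stage 1 is reused at
# cycles 1 and 3, which is what creates a structural hazard.
table = [{0}, {1, 3}, {2}]


def forbidden_latencies(reservation_table):
    """Latencies f > 0 such that two initiations f cycles apart
    would need the same stage in the same cycle."""
    forbidden = set()
    for busy in reservation_table:
        for a, b in combinations(sorted(busy), 2):
            forbidden.add(b - a)
    return forbidden


def modulo_conflict_free(reservation_table, period, offsets):
    """True if operations initiated at the given offsets, each repeating
    every `period` cycles, never claim the same stage in the same
    cycle modulo `period`."""
    for busy in reservation_table:
        used = set()  # slots of this stage already claimed, modulo period
        for start in offsets:
            for cycle in busy:
                slot = (start + cycle) % period
                if slot in used:
                    return False
                used.add(slot)
    return True


print(forbidden_latencies(table))               # {2}
print(modulo_conflict_free(table, 4, [0, 1]))   # True
print(modulo_conflict_free(table, 4, [0, 2]))   # False
print(modulo_conflict_free(table, 2, [0]))      # False
```

In this sketch, two initiations per period of 4 fit only when their offsets differ by a latency that is not forbidden modulo the period. With a period of 2, even a single initiation collides with its own next iteration, because stage 1 is used at cycles 1 and 3, which coincide modulo 2.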

Author information

Authors and Affiliations

Supercomputer Edn. and Res. Centre, Indian Institute of Science, Bangalore, 560 012, India
R. Govindarajan
IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY, 10598, U.S.A.
Erik R. Altman
Electrical & Computer Engineering, University of Delaware, Newark, DE, 19716, U.S.A.
Guang R. Gao

About this article

Cite this article

Govindarajan, R., Altman, E.R. & Gao, G.R. A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors. Design Automation for Embedded Systems 6, 243–275 (2002). https://doi.org/10.1023/A:1014050303852

Issue Date: March 2002
DOI: https://doi.org/10.1023/A:1014050303852
