An Effective Software Pipelining Algorithm for Clustered Embedded VLIW Processors

Akturan, Cagdas; Jacome, Margarida F.

doi:10.1023/A:1019799515784

Published: September 2002

Volume 7, pages 115–138, (2002)

Design Automation for Embedded Systems

Cagdas Akturan¹ &
Margarida F. Jacome¹

Abstract

This paper proposes a software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be used by embedded system designers to explore different code optimization alternatives, that is, high-quality customized retiming solutions for desired throughput and program memory size requirements, while minimizing register pressure. An extensive set of experimental results is presented, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements. In order to empirically assess the effectiveness of clustering for high ILP applications, additional experiments are presented contrasting the performance achieved by software pipelined kernels executing on clustered and on centralized machines.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, The University of Texas at Austin, ACES Building 5.118, Austin, TX, 78712-1084, USA
Cagdas Akturan & Margarida F. Jacome

About this article

Cite this article

Akturan, C., Jacome, M.F. An Effective Software Pipelining Algorithm for Clustered Embedded VLIW Processors. Design Automation for Embedded Systems 7, 115–138 (2002). https://doi.org/10.1023/A:1019799515784

Issue Date: September 2002
DOI: https://doi.org/10.1023/A:1019799515784
