An Effective Software Pipelining Algorithm for Clustered Embedded VLIW Processors

Abstract

This paper proposes a software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be used by embedded system designers to explore different code optimization alternatives, that is, high-quality customized retiming solutions for desired throughput and program memory size requirements, while minimizing register pressure. An extensive set of experimental results is presented, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements. In order to empirically assess the effectiveness of clustering for high-ILP applications, additional experiments are presented contrasting the performance achieved by software pipelined kernels executing on clustered and on centralized machines.

Cite this article

Akturan, C., Jacome, M.F. An Effective Software Pipelining Algorithm for Clustered Embedded VLIW Processors. Design Automation for Embedded Systems 7, 115–138 (2002). https://doi.org/10.1023/A:1019799515784

Keywords

  • Clustering
  • embedded systems
  • optimizing compilers
  • retiming
  • soft real-time applications
  • software pipelining
  • VLIW processor