Compiling for VLIW DSPs

Abstract

This chapter describes fundamental compiler techniques for VLIW DSP processors. We begin with a review of VLIW DSP architecture concepts, insofar as they are relevant to the compiler writer. As a case study, we consider the TI TMS320C6x™ clustered VLIW DSP processor family. We survey the main tasks of VLIW DSP code generation; discuss instruction selection, cluster assignment, instruction scheduling, and register allocation in greater detail; and present selected techniques for these, both heuristic and optimal. Some emphasis is put on phase ordering problems and on phase-coupled and integrated code generation techniques.

Notes

  1. Processors that decouple issue packets from fetch packets are commonly also referred to as Explicitly Parallel Instruction set Computing (EPIC) architectures.

  2. NOP (no operation) instructions occupy only an issue slot and no further resources.

  3. For simplicity of presentation, we assume here that the write latency and the read latency are constant for each instruction. In general, they may depend on run-time conditions exposed by the hardware and vary within an interval between the earliest and the latest write latency or read latency, respectively. See also our remarks on the LE model further below. For a more detailed latency model, we refer to Rau et al. [93].

  4. Note that in some papers and texts, the meanings of the terms delay and latency are reversed.

  5. Exception: For load and store instructions, two more resources are used to model the access of the load destination register and of the store source register, respectively, to the two register files, as only one loaded or stored register can be accessed per register file and clock cycle. Furthermore, load instructions can cause additional implicit delays (pipeline stalls) through unbalanced access to the internal memory banks (see later). This effect could likewise be modeled with additional resources representing the different memory banks. However, this is only useful for predicting stalls where the alignment of the accessed memory addresses is statically known.

  6. Even though ’C62x assembly language allows an issue packet to start in a fetch packet and continue into the next one, the assembler will automatically create and insert a fresh fetch packet after the first one, move the pending issue packet there, and fill up the remainder of the first fetch packet with NOP instructions.

  7. ’C66x and ’C67x support, for the basic arithmetic instructions, both single-precision and double-precision floating-point variants as defined by the IEEE 754 standard [98]. ’C66x combines the floating-point features of ’C67x with the advanced fixed-point features of ’C64x.
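
The padding behavior described in Note 6 can be illustrated with a minimal sketch. It is not TI's assembler algorithm: the function name `pack_issue_packets`, the slot count of eight instructions per fetch packet, the padding of the final fetch packet, and the representation of instructions as plain strings are all assumptions made for illustration. How the inserted NOPs are grouped into issue packets is not modeled.

```python
# Hypothetical sketch: lay out issue packets so that none straddles a
# fetch packet boundary, padding with NOPs where necessary.

FETCH_PACKET_SLOTS = 8  # assumption: eight instruction slots per fetch packet


def pack_issue_packets(issue_packets, slots=FETCH_PACKET_SLOTS):
    """Return a list of fetch packets, each a list of exactly `slots` instructions."""
    fetch_packets = []
    current = []  # fetch packet currently being filled
    for packet in issue_packets:
        if len(packet) > slots:
            raise ValueError("issue packet does not fit into one fetch packet")
        if len(current) + len(packet) > slots:
            # The issue packet would cross the boundary: pad the current
            # fetch packet with NOPs and move the issue packet to a fresh one.
            current.extend(["NOP"] * (slots - len(current)))
            fetch_packets.append(current)
            current = []
        current.extend(packet)
        if len(current) == slots:
            fetch_packets.append(current)
            current = []
    if current:
        # Simplification of this sketch: the last fetch packet is padded as well.
        current.extend(["NOP"] * (slots - len(current)))
        fetch_packets.append(current)
    return fetch_packets


if __name__ == "__main__":
    packets = [
        ["ADD", "MPY", "LDW"],
        ["SUB", "SHL", "STW", "B", "MV", "ADD2"],
        ["MPY"],
    ]
    for fetch_packet in pack_issue_packets(packets):
        print(fetch_packet)
```

In this example the second issue packet (six instructions) does not fit behind the first (three instructions), so the first fetch packet is filled up with five NOPs and the second issue packet starts a new fetch packet.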

References

  1. Alfred V. Aho, Mahadevan Ganapathi, and Steven W.K. Tjiang. Code Generation Using Tree Matching and Dynamic Programming. ACM Transactions on Programming Languages and Systems, 11(4):491–516, October 1989.

  2. Alexander Aiken and Alexandru Nicolau. Optimal loop parallelization. SIGPLAN Notices, 23(7):308–317, July 1988.

  3. Alex Aleta, Josep M. Codina, Jesus Sanchez, Antonio Gonzalez, and David Kaeli. AGAMOS: A graph-based approach to modulo scheduling for clustered microarchitectures. IEEE Transactions on Computers, 58(6):770–783, June 2009.

  4. Vicki H. Allan, Reese B. Jones, Randall M. Lee, and Stephen J. Allan. Software pipelining. ACM Computing Surveys, 27(3), September 1995.

  5. Analog Devices. TigerSHARC embedded processor ADSP-TS201S. Data sheet, www.analog.com/en/embedded-processing-dsp/tigersharc, 2006.

  6. Andrew W. Appel and Lal George. Optimal Spilling for CISC Machines with Few Registers. In Proc. ACM conf. on Programming language design and implementation, pages 243–253. ACM Press, 2001.

  7. Guido Araujo and Sharad Malik. Optimal code generation for embedded memory non-homogeneous register architectures. In Proc. 7th Int. Symposium on System Synthesis, pages 36–41, September 1995.

  8. Rosa M. Badia, Fermin Sanchez, and Jordi Cortadella. OSP: Optimal Software Pipelining with Minimum Register Pressure. Technical Report UPC-DAC-1996-25, DAC Dept. d’arquitectura de Computadors, Univ. Polytecnica de Catalunya, Barcelona, Campus Nord. Modul D6, E-08071 Barcelona, Spain, June 1996.

  9. Vasanth Bala and Norman Rubin. Efficient instruction scheduling using finite state automata. In Proc. 28th int. symp. on microarchitecture (MICRO-28), pages 46–56. IEEE, 1995.

  10. Steven Bashford and Rainer Leupers. Phase-coupled mapping of data flow graphs to irregular data paths. Design Automation for Embedded Systems (DAES), 4(2/3):119–165, 1999.

  11. Andrzej Bednarski and Christoph Kessler. Optimal integrated VLIW code generation with integer linear programming. In Proc. Int. Euro-Par 2006 Conference. Springer LNCS, August 2006.

  12. Mirza Beg and Peter van Beek. A constraint programming approach for instruction assignment. In Proc. Int. Workshop on Interaction between Compilers and Computer Architectures (INTERACT-15), pp. 25–34, February 2011.

  13. D. Bernstein, M.C. Golumbic, Y. Mansour, R.Y. Pinter, D.Q. Goldin, H. Krawczyk, and I. Nahshon. Spill code minimization techniques for optimizing compilers. In Proc. Int. Conf. on Progr. Lang. Design and Implem., pages 258–263, 1989.

  14. F. Bouchez, A. Darte, C. Guillon, and F. Rastello. Register allocation: what does the NP-completeness proof of Chaitin et al. really prove? […]. In Proc. 19th int. workshop on languages and compilers for parallel computing, New Orleans, November 2006.

  15. Thomas S. Brasier, Philip H. Sweany, Steven J. Beaty, and Steve Carr. Craig: A practical framework for combining instruction scheduling and register assignment. In Proc. Int. Conf. on Parallel Architectures and Compilation Techniques (PACT’95), 1995.

  16. Preston Briggs, Keith Cooper, Ken Kennedy, and Linda Torczon. Coloring heuristics for register allocation. In Proc. Int. Conf. on Progr. Lang. Design and Implem., pages 275–284, 1989.

  17. Preston Briggs, Keith Cooper, and Linda Torczon. Rematerialization. In Proc. Int. Conf. on Progr. Lang. Design and Implem., pages 311–321, 1992.

  18. Philip Brisk, Ajay K. Verma, and Paolo Ienne. Optimistic chordal coloring: a coalescing heuristic for SSA form programs. Des. Autom. Embed. Syst., 13:115–137, 2009.

  19. Roberto Castañeda-Lozano, Mats Carlsson, Gabriel Hjort-Blindell and Christian Schulte. Combinatorial spill code optimization and ultimate coalescing. In Proc. LCTES’14, pp. 23–32, June 2014.

  20. G.J. Chaitin, M.A. Auslander, A.K. Chandra, J. Cocke, M.E. Hopkins, and P.W. Markstein. Register allocation via coloring. Computer Languages, 6:47–57, 1981.

  21. Chia-Ming Chang, Chien-Ming Chen, and Chung-Ta King. Using integer linear programming for instruction scheduling and register allocation in multi-issue processors. Computers & Mathematics with Applications, 34(9):1–14, 1997.

  22. Chung-Kai Chen, Ling-Hua Tseng, Shih-Chang Chen, Young-Jia Lin, Yi-Ping You, Chia-Han Lu, and Jenq-Kuen Lee. Enabling compiler flow for embedded VLIW DSP processors with distributed register files. In Proc. LCTES’07, pages 146–148. ACM, 2007.

  23. Hong-Chich Chou and Chung-Ping Chung. An Optimal Instruction Scheduler for Superscalar Processors. IEEE Trans. on Parallel and Distr. Syst., 6(3):303–313, 1995.

  24. Michael Chu, Kevin Fan, and Scott Mahlke. Region-based hierarchical operation partitioning for multicluster processors. In Proc. Int. Conf. on Progr. Lang. Design and Implem. (PLDI’03), pp. 300–311, ACM, June 2003.

  25. Josep M. Codina, Jesus Sánchez, and Antonio González. A unified modulo scheduling and register allocation technique for clustered processors. In Proc. PACT-2001, September 2001.

  26. Edward S. Davidson, Leonard E. Shar, A. Thampy Thomas, and Janak H. Patel. Effective control for pipelined computers. In Proc. Spring COMPCON75 Digest of Papers, pages 181–184. IEEE Computer Society, February 1975.

  27. Giuseppe Desoli. Instruction assignment for clustered VLIW DSP compilers: a new approach. Technical Report HPL-98-13, HP Laboratories Cambridge, February 1998.

  28. Benoit Dupont de Dinechin. Kalray MPPA® Massively Parallel Processor Array. Slide set, Hot Chips 27 Symposium, IEEE, August 2015.

  29. Dietmar Ebner. SSA-based code generation techniques for embedded architectures. PhD thesis, Technische Universität Wien, Vienna, Austria, June 2009.

  30. Erik Eckstein, Oliver König, and Bernhard Scholz. Code instruction selection based on SSA-graphs. In A. Krall, editor, Proc. SCOPES-2003, Springer LNCS 2826, pages 49–65, 2003.

  31. Alexandre E. Eichenberger and Edward S. Davidson. A reduced multipipeline machine description that preserves scheduling constraints. In Proc. Int. Conf. on Progr. Lang. Design and Implem. (PLDI’96), pages 12–22, New York, NY, USA, 1996. ACM Press.

  32. Christine Eisenbeis and Antoine Sawaya. Optimal loop parallelization under register constraints. In Proc. 6th Workshop on Compilers for Parallel Computers (CPC’96), pages 245–259, December 1996.

  33. John Ellis. Bulldog: A Compiler for VLIW Architectures. MIT Press, Cambridge, MA, 1986.

  34. Mattias Eriksson and Christoph Kessler. Integrated Code Generation for Loops. ACM Transactions on Embedded Computing Systems 11S(1), Article 19, 24 pages, ACM, June 2012.

  35. Mattias Eriksson and Christoph Kessler. Integrated modulo scheduling for clustered VLIW architectures. In Proc. HiPEAC-2009 High-Performance and Embedded Architecture and Compilers, Paphos, Cyprus, pages 65–79. Springer LNCS 5409, January 2009.

  36. Mattias Eriksson, Oskar Skoog, and Christoph Kessler. Optimal vs. heuristic integrated code generation for clustered VLIW architectures. In Proc. 11th int. workshop on software and compilers for embedded systems (SCOPES’08). ACM, 2008.

  37. M. Anton Ertl. Optimal Code Selection in DAGs. In Proc. Int. Symposium on Principles of Programming Languages (POPL’99). ACM, 1999.

  38. Joseph A. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Trans. Comput., C–30(7):478–490, July 1981.

  39. Joseph A. Fisher, Paolo Faraboschi, and Cliff Young. Embedded computing: a VLIW approach to architecture, compilers and tools. Elsevier / Morgan Kaufmann, 2005.

  40. Björn Franke. C Compilers and Code Optimization for DSPs. In S. S. Bhattacharyya, E. F. Deprettere, R. Leupers, and J. Takala, eds., Handbook of Signal Processing Systems, Second Edition, Springer 2012.

  41. Christopher W. Fraser, David R. Hanson, and Todd A. Proebsting. Engineering a Simple, Efficient Code Generator Generator. ACM Letters on Programming Languages and Systems, 1(3):213–226, September 1992.

  42. Stefan M. Freudenberger and John C. Ruttenberg. Phase ordering of register allocation and instruction scheduling. In Code Generation: Concepts, Tools, Techniques [44], pages 146–170, 1992.

  43. Anup Gangwar, M. Balakrishnan, and Anshul Kumar. Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures. ACM Trans. Des. Autom. Electron. Syst., 12(1):1, 2007.

  44. Robert Giegerich and Susan L. Graham, editors. Code Generation - Concepts, Tools, Techniques. Springer Workshops in Computing, 1992.

  45. R.S. Glanville and S.L. Graham. A New Method for Compiler Code Generation. In Proc. Int. Symposium on Principles of Programming Languages, pages 231–240, January 1978.

  46. James R. Goodman and Wei-Chung Hsu. Code scheduling and register allocation in large basic blocks. In Proc. ACM Int. Conf. on Supercomputing, pages 442–452. ACM press, July 1988.

  47. David W. Goodwin and Kent D. Wilken. Optimal and near-optimal global register allocations using 0–1 integer programming. Softw. Pract. Exper., 26(8):929–965, 1996.

  48. R. Govindarajan, Erik Altman, and Guang Gao. A framework for resource-constrained rate-optimal software pipelining. IEEE Trans. Parallel and Distr. Syst., 7(11):1133–1149, November 1996.

  49. R. L. Graham. Bounds for certain multiprocessing anomalies. Bell System Technical Journal, 45(9):1563–1581, November 1966.

  50. Daniel Grund and Sebastian Hack. A fast cutting-plane algorithm for optimal coalescing. In Proc. 16th int. conf. on compiler construction, pages 111–125, March 2007.

  51. Sebastian Hack and Gerhard Goos. Optimal register allocation for SSA-form programs in polynomial time. Information Processing Letters, 98:150–155, 2006.

  52. Todd Hahn, Eric Stotzer, Dineel Sule, and Mike Asal. Compilation strategies for reducing code size on a VLIW processor with variable length instructions. In Proc. HiPEAC’08 conference, pages 147–160. Springer LNCS 4917, 2008.

  53. Silvina Hanono and Srinivas Devadas. Instruction selection, resource allocation, and scheduling in the AVIV retargetable code generator. In Proc. Design Automation Conf. ACM, 1998.

  54. W. A. Havanki. Treegion scheduling for VLIW processors. M.S. thesis, Dept. Electrical and Computer Engineering, North Carolina State Univ., Raleigh, NC, USA, 1997.

  55. Gabriel Hjort-Blindell. Instruction Selection – Principles, Methods, and Applications. Springer, 2016.

  56. Gabriel Hjort-Blindell, Mats Carlsson, Roberto Castaneda-Lozano, and Christian Schulte. Complete and practical universal instruction selection. ACM Trans. on Embedded Computing Systems (TECS), 16(5s), Art. 119, Sep. 2017.

  57. L.P. Horwitz, R. M. Karp, R. E. Miller, and S. Winograd. Index register allocation. Journal of the ACM, 13(1):43–61, January 1966.

  58. Wei-Chung Hsu, Charles N. Fischer, and James R. Goodman. On the minimization of loads/stores in local register allocation. IEEE Trans. Softw. Eng., 15(10):1252–1262, October 1989.

  59. Wen-Mei Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Ouellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm, and Daniel M. Lavery. The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput., 7(1-2):229–248, 1993.

  60. Krishnan Kailas, Kemal Ebcioglu, and Ashok Agrawala. CARS: A new code generation framework for clustered ILP processors. In Proc. 7th Int. Symp. on High-Performance Computer Architecture (HPCA’01), pages 133–143. IEEE Computer Society, June 2001.

  61. Daniel Kästner. Retargetable Postpass Optimisations by Integer Linear Programming. PhD thesis, Universität des Saarlandes, Saarbrücken, Germany, 2000.

  62. Christoph Kessler and Andrzej Bednarski. Optimal integrated code generation for clustered VLIW architectures. In Proc. ACM SIGPLAN Conf. on Languages, Compilers and Tools for Embedded Systems / Software and Compilers for Embedded Systems, LCTES-SCOPES’2002. ACM, June 2002.

  63. Christoph Kessler and Andrzej Bednarski. Optimal integrated code generation for VLIW architectures. Concurrency and Computation: Practice and Experience, 18:1353–1390, 2006.

  64. Christoph Kessler, Andrzej Bednarski, and Mattias Eriksson. Classification and generation of schedules for VLIW processors. Concurrency and Computation: Practice and Experience, 19:2369–2389, 2007.

  65. Christoph W. Keßler. Scheduling Expression DAGs for Minimal Register Need. Computer Languages, 24(1):33–53, September 1998.

  66. Nikolai Kim and Andreas Krall. Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture. In Proc. 11th Worksh. on Optim. for DSP and Embedded Syst. (ODES’14), pp. 25–32, ACM, 2014.

  67. Tokuzo Kiyohara and John C. Gyllenhaal. Code scheduling for VLIW/superscalar processors with limited register files. In Proc. 25th int. symp. on microarchitecture (MICRO-25). IEEE CS Press, 1992.

  68. Monica Lam. Software pipelining: An effective scheduling technique for VLIW machines. In Proc. CC’88, pages 318–328, July 1988.

  69. Rainer Leupers. Retargetable Code Generation for Digital Signal Processors. Kluwer, 1997.

  70. Rainer Leupers. Code Optimization Techniques for Embedded Processors. Kluwer, 2000.

  71. Rainer Leupers. Instruction scheduling for clustered VLIW DSPs. In Proc. PACT’00 int. conference on parallel architectures and compilation. IEEE Computer Society, 2000.

  72. Rainer Leupers and Steven Bashford. Graph-based code selection techniques for embedded processors. ACM TODAES, 5(4):794–814, October 2000.

  73. Rainer Leupers and Peter Marwedel. Time-constrained code compaction for DSPs. IEEE Transactions on VLSI Systems, 5(1):112–122, 1997.

  74. Dake Liu. Embedded DSP processor design. Morgan Kaufmann, 2008.

  75. Josep Llosa, Antonio Gonzalez, Mateo Valero, and Eduard Ayguade. Swing Modulo Scheduling: A Lifetime-Sensitive Approach. In Proc. PACT’96 conference, pages 80–86. IEEE, 1996.

  76. Josep Llosa, Mateo Valero, Eduard Ayguade, and Antonio Gonzalez. Hypernode reduction modulo scheduling. In Proc. 28th int. symp. on microarchitecture (MICRO-28), 1995.

  77. M. Lorenz and P. Marwedel. Phase coupled code generation for DSPs using a genetic algorithm. In Proc. conf. on design automation and test in Europe (DATE’04), pages 1270–1275, 2004.

  78. Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, and Roger A. Bringmann. Effective compiler support for predicated execution using the hyperblock. In Proc. 25th int. symp. on microarchitecture (MICRO-25), pages 45–54, December 1992.

  79. Abid M. Malik, Michael Chase, Tyrel Russell, and Peter van Beek. An application of constraint programming to superblock instruction scheduling. In Proc. 14th Int. Conf. on Principles and Practice of Constraint Programming, pages 97–111, September 2008.

  80. Waleed M. Meleis and Edward S. Davidson. Dual-issue scheduling with spills for binary trees. In Proc. 10th ACM-SIAM Symposium on Discrete Algorithms, pages 678–686. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1999.

  81. Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.

  82. Thomas Müller. Employing finite automata for resource scheduling. In Proc. 26th int. symp. on microarchitecture (MICRO-26), pages 12–20. IEEE, December 1993.

  83. S. G. Nagarakatte and R. Govindarajan. Register allocation and optimal spill code scheduling in software pipelined loops using 0-1 integer linear programming formulation. In Proc. int. conf. on compiler construction (CC-2007), pages 126–140. Springer LNCS 4420, 2007.

  84. Rahul Nagpal and Y. N. Srikant. Integrated temporal and spatial scheduling for extended operand clustered VLIW processors. In Proc. 1st conf. on Computing Frontiers, pages 457–470. ACM Press, 2004.

  85. Steven Novack and Alexandru Nicolau. Mutation scheduling: A unified approach to compiling for fine-grained parallelism. In Proc. Workshop on compilers and languages for parallel computers (LCPC’94), pages 16–30. Springer LNCS 892, 1994.

  86. NXP. Trimedia TM-1000. Data sheet, www.nxp.com, 1998.

  87. Erik Nyström and Alexandre E. Eichenberger. Effective cluster assignment for modulo scheduling. In Proc. 31st annual ACM/IEEE Int. symposium on microarchitecture (MICRO-31), IEEE CS Press, 1998.

  88. Emre Özer, Sanjeev Banerjia, and Thomas M. Conte. Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures. In Proc. 31st annual ACM/IEEE Int. Symposium on Microarchitecture, pages 308–315. IEEE CS Press, 1998.

  89. Massimiliano Poletto and Vivek Sarkar. Linear scan register allocation. ACM Transactions on Programming Languages and Systems, 21(5), September 1999.

  90. Todd A. Proebsting and Christopher W. Fraser. Detecting pipeline structural hazards quickly. In Proc. 21st symp. on principles of programming languages (POPL’94), pages 280–286. ACM Press, 1994.

  91. Qualcomm Technologies, Inc. Hexagon DSP Processor. Qualcomm Developer Network, https://developer.qualcomm.com/software/hexagon-dsp-sdk/dsp-processor, last accessed March 2017.

  92. B. Rau and C. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proc. 14th Annual Workshop on Microprogramming, pages 183–198, 1981.

  93. B. Ramakrishna Rau, Vinod Kathail, and Shail Aditya. Machine-description driven compilers for EPIC and VLIW processors. Design Automation for Embedded Systems, 4:71–118, 1999. Appeared also as technical report HPL-98-40 of HP labs, Sep. 1998.

  94. Recore Systems. Xentium VLIW DSP IP core. Product brief, http://www.recoresystems.com/fileadmin/downloads/Product_briefs/2016-1.0_Xentium_Product_Brief.pdf, 2016.

  95. Richard Scales. Software development techniques for the TMS320C6201 DSP. Texas Instruments Application Report SPRA481, www.ti.com, December 1998.

  96. Eric J. Stotzer and Ernst L. Leiss. Modulo scheduling without overlapped lifetimes. In Proc. LCTES-2009, pages 1–10. ACM, June 2009.

  97. Texas Instruments, Inc. TMS320C62x DSP CPU and instruction set reference guide. Document SPRU731A, www.ti.com, 2010.

  98. Texas Instruments, Inc. TMS320C66x DSP CPU and instruction set reference guide. Document SPRUGH7, www.ti.com, Nov. 2010.

  99. Texas Instruments, Inc. Optimizing loops on the C66x DSP. Application report SPRABG7, www.ti.com, Nov. 2010.

  100. Omri Traub, Glenn Holloway, and Michael D. Smith. Quality and Speed in Linear-scan Register Allocation. In Proc. ACM SIGPLAN Conf. on Progr. Lang. Design and Implem. (PLDI’98), pages 142–151, 1998.

  101. Steven R. Vegdahl. A Dynamic-Programming Technique for Compacting Loops. In Proc. 25th annual ACM/IEEE Int. symposium on microarchitecture (MICRO-25), pages 180–188. IEEE CS Press, 1992.

  102. Kent Wilken, Jack Liu, and Mark Heffernan. Optimal instruction scheduling using integer programming. In Proc. Int. Conf. on Progr. Lang. Design and Implem. (PLDI’00), pages 121–133, 2000.

  103. Tom Wilson, Gary Grewal, Ben Halley, and Dilip Banerji. An integrated approach to retargetable code generation. In Proc. Int. Symposium on High-Level Synthesis, pages 70–75, May 1994.

  104. Sebastian Winkel. Optimal global instruction scheduling for the Itanium processor architecture. PhD thesis, Universität des Saarlandes, Saarbrücken, Germany, September 2004.

  105. Sebastian Winkel. Optimal versus heuristic global code scheduling. In Proc. 40th annual ACM/IEEE Int. symposium on microarchitecture (MICRO-40), pages 43–55, 2007.

  106. Hongbo Yang, Ramaswamy Govindarajan, Guang R. Gao, George Cai, and Ziang Hu. Exploiting schedule slacks for rate-optimal power-minimum software pipelining. In Proc. Workshop on Compilers and Operating Systems for Low Power (COLP-2002), September 2002.

  107. Javier Zalamea, Josep Llosa, Eduard Ayguade, and Mateo Valero. Modulo scheduling with integrated register spilling for clustered VLIW architectures. In Proc. ACM/IEEE Int. symp. on microarchitecture (MICRO-34), pages 160–169, 2001.

  108. Thomas Zeitlhofer and Bernhard Wess. Operation scheduling for parallel functional units using genetic algorithms. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP’99), pages 1997–2000. IEEE Computer Society, 1999.

Acknowledgements

The author thanks Mattias Eriksson and Dake Liu for discussions and commenting on a draft of this chapter. The author also thanks Eric Stotzer from Texas Instruments for interesting discussions about code generation for the TI ’C6x DSP processor family.

This work was funded by Vetenskapsrådet (project Integrated Software Pipelining), SSF (project DSP platform for emerging telecommunication and multimedia) and by SeRC, Parallel Software and Data Engineering (www.e-science.se).

Corresponding author

Correspondence to Christoph W. Kessler.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

Cite this chapter

Kessler, C.W. (2019). Compiling for VLIW DSPs. In: Bhattacharyya, S., Deprettere, E., Leupers, R., Takala, J. (eds) Handbook of Signal Processing Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-91734-4_27

  • DOI: https://doi.org/10.1007/978-3-319-91734-4_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91733-7

  • Online ISBN: 978-3-319-91734-4

  • eBook Packages: Engineering, Engineering (R0)
