
ACOTES Project: Advanced Compiler Technologies for Embedded Streaming

International Journal of Parallel Programming


Streaming applications are built from data-driven computational components that consume and produce unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming streaming architectures efficiently is challenging. The programmer must carefully partition the computation and map it onto processes in the way that best matches the underlying streaming architecture, taking into account both the distributed resources (memory, processing, real-time requirements) and the communication overheads (processing and delay). These challenges have led to a number of proposed solutions whose goal is to improve programmer productivity when developing applications that process massive data streams on programmable, parallel embedded architectures. StreamIt is one such example. A more recent approach is the one developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach to streaming applications consists of compiler-assisted mapping of streaming tasks onto highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. Its analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a three-year collaboration between industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of the Advanced Compiler Technologies that we developed to support Embedded Streaming.
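To make the notion of data-driven components concrete, here is a minimal, hypothetical sketch in Python. Each computational component runs as its own thread and communicates only through bounded FIFO channels, firing whenever an input element arrives. The names (`run_pipeline`, `filter_stage`, the `_END` sentinel) and the one-thread-per-component mapping are illustrative assumptions, not the ACOTES programming model or tool chain. A finite input plus an end-of-stream sentinel stands in for an unbounded stream so that the example terminates.

```python
import queue
import threading
from typing import Callable, Iterable, List

# Hypothetical end-of-stream marker; a real streaming system would run unboundedly.
_END = object()


def source(items: Iterable[int], out_q: "queue.Queue") -> None:
    """Producer component: pushes each input element onto its output channel."""
    for x in items:
        out_q.put(x)  # blocks while the bounded channel is full (back-pressure)
    out_q.put(_END)


def filter_stage(fn: Callable[[int], int], in_q: "queue.Queue", out_q: "queue.Queue") -> None:
    """Data-driven component: fires each time an input token is available."""
    while True:
        x = in_q.get()
        if x is _END:
            out_q.put(_END)  # propagate end-of-stream downstream
            return
        out_q.put(fn(x))


def sink(in_q: "queue.Queue", results: List[int]) -> None:
    """Consumer component: collects every element that reaches the end."""
    while True:
        x = in_q.get()
        if x is _END:
            return
        results.append(x)


def run_pipeline(items: Iterable[int],
                 stages: List[Callable[[int], int]],
                 capacity: int = 4) -> List[int]:
    """Map each component to its own thread, connected by bounded FIFO channels."""
    channels = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]
    results: List[int] = []
    threads = [threading.Thread(target=source, args=(items, channels[0]))]
    for i, fn in enumerate(stages):
        threads.append(threading.Thread(
            target=filter_stage, args=(fn, channels[i], channels[i + 1])))
    threads.append(threading.Thread(target=sink, args=(channels[-1], results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run_pipeline(range(10), [lambda x: x * 2, lambda x: x + 1])
    print(out)  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```

The bounded `capacity` provides back-pressure, so a fast producer cannot exhaust memory. Mapping every component to its own thread is the naive choice. The partitioning and mapping problem described above is the question of how to group such components onto a limited set of processors given memory and communication costs, which this sketch does not attempt.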





Author information

Corresponding author

Correspondence to Harm Munk.


About this article

Cite this article

Munk, H., Ayguadé, E., Bastoul, C. et al. ACOTES Project: Advanced Compiler Technologies for Embedded Streaming. Int J Parallel Prog 39, 397–450 (2011).

