ACOTES Project: Advanced Compiler Technologies for Embedded Streaming

  • Harm Munk
  • Eduard Ayguadé
  • Cédric Bastoul
  • Paul Carpenter
  • Zbigniew Chamski
  • Albert Cohen
  • Marco Cornero
  • Philippe Dumont
  • Marc Duranton
  • Mohammed Fellahi
  • Roger Ferrer
  • Razya Ladelsky
  • Menno Lindwer
  • Xavier Martorell
  • Cupertino Miranda
  • Dorit Nuzman
  • Andrea Ornstein
  • Antoniu Pop
  • Sebastian Pop
  • Louis-Noël Pouchet
  • Alex Ramírez
  • David Ródenas
  • Erven Rohou
  • Ira Rosen
  • Uzi Shvadron
  • Konrad Trifunović
  • Ayal Zaks

Abstract

Streaming applications are built of data-driven computational components that consume and produce unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is challenging: the programmer must carefully partition the computation and map it to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of proposed solutions that aim to improve the programmer’s productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another, more recent approach is the one developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach to streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, in terms of both energy and design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaboration between industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of the Advanced Compiler Technologies that we developed to support Embedded Streaming.

Keywords

Parallel architectures, Compilers, Streaming applications, Automatic Parallelisation, HiPEAC

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Harm Munk (1)
  • Eduard Ayguadé (5)
  • Cédric Bastoul (3)
  • Paul Carpenter (5)
  • Zbigniew Chamski (1)
  • Albert Cohen (3)
  • Marco Cornero (4)
  • Philippe Dumont (1, 3)
  • Marc Duranton (1)
  • Mohammed Fellahi (3)
  • Roger Ferrer (5)
  • Razya Ladelsky (2)
  • Menno Lindwer (6)
  • Xavier Martorell (5)
  • Cupertino Miranda (3)
  • Dorit Nuzman (2)
  • Andrea Ornstein (4)
  • Antoniu Pop (7)
  • Sebastian Pop (8)
  • Louis-Noël Pouchet (3)
  • Alex Ramírez (5)
  • David Ródenas (5)
  • Erven Rohou (4)
  • Ira Rosen (2)
  • Uzi Shvadron (2)
  • Konrad Trifunović (3)
  • Ayal Zaks (2)
  1. NXP Semiconductors, Eindhoven, The Netherlands
  2. IBM Haifa Research Laboratories, Haifa, Israel
  3. Alchemy Group, INRIA Saclay and LRI, Paris-Sud 11 University, Paris, France
  4. STMicroelectronics, Cornaredo, Italy
  5. Universitat Politècnica de Catalunya, Barcelona, Spain
  6. Silicon Hive, Eindhoven, The Netherlands
  7. Centre de Recherche en Informatique, MINES ParisTech, Paris, France
  8. Compiler Performance Engineering, Advanced Micro Devices, Austin, USA
