Design Automation for Embedded Systems, Volume 11, Issue 4, pp 249–283

Memory-efficient multithreaded code generation from Simulink for heterogeneous MPSoC

  • Sang-Il Han
  • Soo-Ik Chae
  • Lisane Brisolara
  • Luigi Carro
  • Ricardo Reis
  • Xavier Guérin
  • Ahmed Amine Jerraya
Article

Abstract

Emerging embedded systems require heterogeneous multiprocessor SoC architectures that can satisfy both high-performance and programmability requirements. However, as the complexity of embedded systems increases, programming software for a growing number of processors faces several critical problems, such as multithreaded code generation, adaptation to heterogeneous architectures, short design time, and low-cost implementation. In this paper, we present a software code generation flow based on Simulink to address these problems. We propose a functional modeling style to capture data-intensive and control-dependent target applications, and a system architecture modeling style to seamlessly transform the functional model into the target architecture. Both models are described in Simulink. From a system architecture Simulink model, a code generator produces multithreaded code, inserting thread and communication primitives to abstract the heterogeneity of the target architecture. In addition, the multithreaded code generator, called LESCEA, applies extensions of dataflow-based memory optimization techniques that consider both data and control dependencies. Experimental results on a Motion-JPEG decoder and an H.264 decoder show that the proposed multithreaded code generator enables easy software programming on different multiprocessor architectures with substantially reduced data memory size (up to 68.0%) and code memory size (up to 15.9%).
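The abstract does not spell out how LESCEA's memory optimization works. As a rough illustration of the general idea behind dataflow-based buffer sharing, the Python sketch below (Python 3.9+) places buffers whose lifetimes in a static schedule do not overlap into the same memory region. As an assumed, simplified model of "control dependency", it also lets buffers that sit in different cases of the same switch share space, since only one case executes per iteration. The `Buffer` type, the `branch` tag, and the buffer names, sizes and schedule steps are all hypothetical; this is not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Buffer:
    name: str
    size: int    # bytes
    start: int   # first schedule step in which the buffer is live
    end: int     # last schedule step in which the buffer is live (inclusive)
    # Hypothetical control tag: (switch id, case index). Buffers in different
    # cases of the same switch are assumed never to be live together.
    branch: Optional[tuple[str, int]] = None


def conflicts(a: Buffer, b: Buffer) -> bool:
    """True if the two buffers must occupy disjoint memory."""
    if a.branch and b.branch and a.branch[0] == b.branch[0] and a.branch[1] != b.branch[1]:
        return False  # mutually exclusive branches may share memory
    return a.start <= b.end and b.start <= a.end  # lifetimes overlap


def allocate(buffers: list[Buffer]) -> tuple[dict[str, int], int]:
    """First-fit placement, largest buffers first. Returns (offsets, total size)."""
    offsets: dict[str, int] = {}
    placed: list[Buffer] = []
    for buf in sorted(buffers, key=lambda b: b.size, reverse=True):
        # Address ranges already used by buffers that conflict with this one.
        taken = sorted((offsets[p.name], offsets[p.name] + p.size)
                       for p in placed if conflicts(buf, p))
        offset = 0
        for lo, hi in taken:
            if offset + buf.size <= lo:
                break  # the gap below this range is large enough
            offset = max(offset, hi)
        offsets[buf.name] = offset
        placed.append(buf)
    total = max((offsets[b.name] + b.size for b in buffers), default=0)
    return offsets, total


# Hypothetical decoder pipeline: three chained stages, then two alternative
# prediction buffers selected by a macroblock-type switch.
bufs = [
    Buffer("vld_out", 4096, 0, 1),
    Buffer("iq_out", 4096, 1, 2),
    Buffer("idct_out", 4096, 2, 3),
    Buffer("intra_pred", 2048, 3, 4, ("mb_type", 0)),
    Buffer("inter_pred", 2048, 3, 4, ("mb_type", 1)),
]
offsets, total = allocate(bufs)
print(offsets)
print(total, "bytes shared vs", sum(b.size for b in bufs), "bytes unshared")
# total == 8192 here, versus 16384 bytes if every buffer had its own region
```

Largest-first, first-fit placement is only one simple heuristic for this interval-packing problem; it is used here for brevity, not because the paper prescribes it.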

Keywords

Multithreaded code generation · Memory size reduction · Multiprocessor SoC · Simulink



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Sang-Il Han (1)
  • Soo-Ik Chae (1)
  • Lisane Brisolara (2)
  • Luigi Carro (2)
  • Ricardo Reis (2)
  • Xavier Guérin (3)
  • Ahmed Amine Jerraya (4)

  1. School of Computer Science and Engineering, Seoul National University, Kwanak-gu, Seoul, South Korea
  2. Instituto de Informatica, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
  3. TIMA Laboratory, Grenoble, France
  4. CEA-LETI, MINATEC, Grenoble, France
