Memory-efficient multithreaded code generation from Simulink for heterogeneous MPSoC


Emerging embedded systems require heterogeneous multiprocessor SoC architectures that offer both high performance and programmability. However, as the complexity of embedded systems increases, programming software for a growing number of processors faces several critical problems, such as multithreaded code generation, adaptation to heterogeneous architectures, short design time, and low-cost implementation. In this paper, we present a Simulink-based software code generation flow that addresses these problems. We propose a functional modeling style to capture data-intensive and control-dependent target applications, and a system architecture modeling style to seamlessly transform the functional model into the target architecture. Both models are described in Simulink. From a system architecture Simulink model, a code generator produces multithreaded code, inserting thread and communication primitives that abstract the heterogeneity of the target architecture. In addition, the multithreaded code generator, called LESCEA, applies extensions of dataflow-based memory optimization techniques that consider both data and control dependencies. Experimental results on a Motion-JPEG decoder and an H.264 decoder show that the proposed multithreaded code generator enables easy software programming on different multiprocessor architectures with substantially reduced data memory size (up to 68.0%) and code memory size (up to 15.9%).




Author information



Corresponding author

Correspondence to Sang-Il Han.

Additional information

This manuscript extends, with multithreaded code generation, two earlier papers: “Buffer memory optimization for video codec application modeled in Simulink” by Sang-Il Han, Ahmed A. Jerraya, et al., which appeared in the Proceedings of DAC 2006, and “Functional modeling techniques for efficient SW code generation of video codec application” by Sang-Il Han, Ahmed A. Jerraya, et al., which appeared in the Proceedings of ASP-DAC 2006.


About this article

Cite this article

Han, SI., Chae, SI., Brisolara, L. et al. Memory-efficient multithreaded code generation from Simulink for heterogeneous MPSoC. Des Autom Embed Syst 11, 249–283 (2007).

Keywords


  • Multithreaded code generation
  • Memory size reduction
  • Multiprocessor SoC
  • Simulink