Memory-efficient multithreaded code generation from Simulink for heterogeneous MPSoC
Abstract
Emerging embedded systems require heterogeneous multiprocessor SoC architectures that can satisfy requirements for both high performance and programmability. However, as the complexity of embedded systems increases, programming software for a growing number of processors faces several critical problems: multithreaded code generation, adaptation to heterogeneous architectures, short design time, and low-cost implementation. In this paper, we present a Simulink-based software code generation flow that addresses these problems. We propose a functional modeling style to capture data-intensive and control-dependent target applications, and a system architecture modeling style to seamlessly transform the functional model into the target architecture. Both models are described in Simulink. From a system architecture Simulink model, a code generator produces multithreaded code, inserting thread and communication primitives that abstract the heterogeneity of the target architecture. In addition, the multithreaded code generator, called LESCEA, applies extensions of dataflow-based memory optimization techniques that consider both data and control dependencies. Experimental results on a Motion-JPEG decoder and an H.264 decoder show that the proposed generator enables easy software programming on different multiprocessor architectures while substantially reducing data memory size (by up to 68.0%) and code memory size (by up to 15.9%).
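The data memory savings above come from dataflow-based buffer optimization. One well-known technique of this family is lifetime-based buffer sharing: two buffers whose live ranges in a static schedule never overlap can occupy the same memory. The Python sketch below illustrates that idea only; it is not LESCEA's algorithm. Everything in it is an assumption for illustration: the `Buffer` record, the schedule steps and sizes, the greedy first-fit placement, and especially the `branch` tag, a hypothetical way to mark buffers that live in mutually exclusive arms of a conditional subsystem. That tag is one possible reading of "considering control dependency": such buffers can share memory even when their schedule lifetimes overlap, because at most one of them exists in any run.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Buffer:
    name: str
    size: int                     # bytes
    start: int                    # schedule step in which the producer writes it
    end: int                      # last schedule step in which a consumer reads it
    branch: Optional[str] = None  # hypothetical tag "cond:arm" for conditional subsystems


def exclusive(a: Buffer, b: Buffer) -> bool:
    """True if a and b live in different arms of the same conditional."""
    if a.branch is None or b.branch is None:
        return False
    cond_a, arm_a = a.branch.split(":")
    cond_b, arm_b = b.branch.split(":")
    return cond_a == cond_b and arm_a != arm_b


def conflicts(a: Buffer, b: Buffer, use_control: bool = True) -> bool:
    """Buffers may not share memory if their lifetimes overlap, unless
    (with use_control) they sit in mutually exclusive branches."""
    overlap = a.start <= b.end and b.start <= a.end
    return overlap and not (use_control and exclusive(a, b))


def allocate(buffers: List[Buffer], use_control: bool = True) -> Dict[str, int]:
    """Greedy first-fit: place larger buffers first, each at the lowest
    offset that avoids every already-placed conflicting buffer."""
    offsets: Dict[str, int] = {}
    for buf in sorted(buffers, key=lambda b: (-b.size, b.start)):
        busy = sorted(
            (offsets[o.name], offsets[o.name] + o.size)
            for o in buffers
            if o.name in offsets and conflicts(buf, o, use_control)
        )
        offset = 0
        for lo, hi in busy:
            if offset + buf.size <= lo:
                break  # fits in the gap below this occupied interval
            offset = max(offset, hi)
        offsets[buf.name] = offset
    return offsets


def footprint(buffers: List[Buffer], offsets: Dict[str, int]) -> int:
    """Size of the shared memory pool needed for the given placement."""
    return max(offsets[b.name] + b.size for b in buffers)


# Hypothetical pipeline: a -> b -> c, then c feeds one of two exclusive arms.
example = [
    Buffer("a", 64, 0, 1),
    Buffer("b", 64, 1, 2),
    Buffer("c", 64, 2, 3),
    Buffer("d", 64, 3, 4, "mode:intra"),
    Buffer("e", 64, 3, 4, "mode:inter"),
]

if __name__ == "__main__":
    print("unshared:", sum(b.size for b in example))                       # 320
    print("lifetime only:", footprint(example, allocate(example, False)))  # 192
    print("lifetime + control:", footprint(example, allocate(example)))    # 128
```

In this toy example, lifetime sharing alone shrinks the pool from 320 to 192 bytes, and also exploiting the exclusive branches brings it to 128 bytes. These figures describe only the invented example, not the paper's measurements. Greedy first-fit is used here for brevity; optimal offset assignment is a harder packing problem, and real generators may use different heuristics.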
Keywords
Multithreaded code generation, Memory size reduction, Multiprocessor SoC, Simulink