Memory Analysis and Optimized Allocation of Dataflow Applications on Shared-Memory MPSoCs
- 359 Downloads
- 3 Citations
Abstract
The majority of applications, ranging from the low complexity to very multifaceted entities requiring dedicated hardware accelerators, are very well suited for Multiprocessor Systems-on-Chips (MPSoCs). It is critical to understand the general characteristics of a given embedded application: its behavior and its requirements in terms of MPSoC resources. This paper presents a complete method to study the important aspect of memory characteristic of an application. This method spans the theoretical, architecture-independent memory characterization to the quasi optimal static memory allocation of an application on a real shared-memory MPSoCs. The application is modeled as an Synchronous Dataflow (SDF) graph which is used to derive a Memory Exclusion Graph (MEG) essential for the analysis and allocation techniques. Practical considerations, such as cache coherence and memory broadcasting, are extensively treated. Memory footprint optimization is demonstrated using the example of a stereo matching algorithm from the computer vision domain. Experimental results show a reduction of the memory footprint by up to 43 % compared to a state-of-the-art minimization technique, a throughput improvement of 33 % over dynamic allocation, and the introduction of a tradeoff between multicore scheduling flexibility and memory footprint.
Keywords
Memory allocation Multiprocessor system-on-chip Stereo vision Synchronous dataflowReferences
- 1.Arndt, O., Becker, D., Banz, C., Blume, H. (2013). Parallel implementation of real-time semi-global matching on embedded multi-core architectures. In Embedded computer systems: architectures, modeling, and simulation (SAMOS XIII).Google Scholar
- 2.Benazouz, M., Marchetti, O., Munier-Kordon, A., Urard, P. (2010). A new approach for minimizing buffer capacities with throughput constraint for embedded system design. In Computer systems and applications (AICCSA), 2010 IEEE/ACS.Google Scholar
- 3.Bodin, B., Munier-Kordon, A., de Dinechin, B. (2012). K-periodic schedules for evaluating the maximum throughput of a synchronous dataflow graph. In Embedded computer systems (SAMOS).Google Scholar
- 4.Bouchard, M., Angalović, M., Hertz, A. About equivalent interval colorings of weighted graphs. Discrete Appl. Math. doi: 10.1016/j.dam.2009.04.015.
- 5.Boutellier, J. (2009). Quasi-static scheduling for fine-grained embedded multiprocessing. Ph.D.thesis.Google Scholar
- 6.Desnos, K., Pelcat, M., Nezan, J., Aridhi, S. (2012). Memory bounds for the distributed execution of a hierarchical synchronous data-flow graph. In International conference on embedded computer systems (SAMOS).Google Scholar
- 7.Desnos, K., Pelcat, M., Nezan, J.F., Aridhi, S. (2013). Pre-and post-scheduling memory allocation strategies on mpsocs. In Electronic system level synthesis conference (ESLsyn).Google Scholar
- 8.Desnos, K., & Zhang, J. (2013). Preesm project - stereo matching. svn://svn.code.sf.net/p/preesm/code/trunk/tests/stereo.
- 9.El Assad, S., & Noura, H. (2013). Generator of chaotic sequences and corresponding generating system. EP Patent App. EP20,110,720,313. http://www.google.com/patents/EP2553567A1?cl=en.
- 10.Electronic Systems Group TU Eindhoven (2013). Sdf for free (sdf3). http://www.es.ele.tue.nl/sdf3/.
- 11.Embedded Vision Alliance (2013). Embedded vision alliance. http://www.embedded-vision.com.
- 12.Fabri, J. (1979). Automatic storage optimization. Courant Institute of Mathematical Sciences, New York University.Google Scholar
- 13.Fischaber, S., Woods, R., McAllister, J. (2007). Soc memory hierarchy derivation from dataflow graphs. In IEEE workshop on signal processing systems (pp. 469–474). doi: 10.1109/SIPS.2007.4387593 10.1109/SIPS.2007.4387593.
- 14.Greef, E.D., Catthoor, F., Man, H.D. (1997). Array placement for storage size reduction in embedded multimedia systems. ASAP.Google Scholar
- 15.Intel (2013). i7-3610qm processor product page. http://ark.intel.com/products/64899/.
- 16.Johnson, D.S. (1973). Near-optimal bin packing algorithms. Ph.D. thesis, Massachusetts Institute of Technology.Google Scholar
- 17.Kalray (2013). Many-core processors – dataflow. http://www.kalray.eu/technology/dataflow/.
- 18.Lee, E., & Messerschmitt, D. (1987). Synchronous data flow. Proceedings of the IEEE, 75(9), 1235–1245. doi: 10.1109/PROC.1987.13876 CrossRefGoogle Scholar
- 19.Lee, E.A., & Parks, T.M. (1995). Dataflow process networks. Proceedings of the IEEE, 83(5), 773–801.CrossRefGoogle Scholar
- 20.Malamas, E.N., Petrakis, E.G., Zervakis, M., Petit, L., Legat, J.D. (2003). A survey on industrial vision systems, applications and tools. Image and Vision Computing, 21(2), 171–188.CrossRefGoogle Scholar
- 21.Murthy, P., & Bhattacharyya, S. (2000). Shared memory implementations of synchronous dataflow specifications. In Proceedings of the design, automation and test in Europe conference and exhibition.Google Scholar
- 22.Murthy, P.K.,& Bhattacharyya, S.S. (2010). Memory management for synthesis of DSP software. CRC Press.Google Scholar
- 23.Östergård, P.R.J. (2001). A new algorithm for the maximum-weight clique problem. Nordic Journal of Computing, 8(4), 424–436. Google Scholar
- 24.Parks, T.M. (1995). Bounded scheduling of process networks. Ph.D. thesis, University of California.Google Scholar
- 25.Pelcat, M., Aridhi, S., Piat, J., Nezan, J.F. (2012). Physical layer multi-core prototyping: a dataflow-based approach for LTE eNodeB. Springer.Google Scholar
- 26.Pelcat, M., Nezan, J.F., Piat, J., Croizer, J., Aridhi, S. (2009). A System-Level architecture model for rapid prototyping of heterogeneous multicore embedded systems. DASIP.Google Scholar
- 27.Roy, S. (1999). Stereo without epipolar lines: a maximum-flow formulation. International Journal of Computer Vision, 34(2–3), 147–161.CrossRefGoogle Scholar
- 28.Sriram, S., & Bhattacharyya, S.S. (2009). Embedded multiprocessors: scheduling and synchronization, 2nd Edn. Boca Raton, FL: CRC Press, Inc.CrossRefGoogle Scholar
- 29.Stuijk, S., Geilen, M., Basten, T. (2006). Exploring trade-offs in buffer requirements and throughput constraints for synchronous dataflow graphs. In Proceedings of the 43rd annual design automation conference.Google Scholar
- 30.Szeliski, R., & Zabih, R. (2000). An experimental comparison of stereo algorithms:Vision algorithms: theory and practice. In Vision algorithms: theory and practice, pp. 1–19. Springer.Google Scholar
- 31.Szymanek, R., & Kuchcinski, K. (2001). A constructive algorithm for memory-aware task assignment and scheduling. In CODES Proceedings.Google Scholar
- 32.Texas Instruments. Tms320c6678 product page. http://www.ti.com/product/tms320c6678.
- 33.Urban, F., Raulet,M., Nezan, J.F., Déforges, O. (2006). Automatic dsp cache memory management and fast prototyping for multi-processor image applications. In 14th European signal processing conference. Eusipco.Google Scholar
- 34.Wagner, D. (2007). Handheld augmented reality. Ph.D.thesis.Google Scholar
- 35.Wulf, W.A., & McKee, S.A. (1995). Hitting the memory wall: implications of the obvious. ACM SIGARCH Computer Architecture News, 23(1), 20–24.CrossRefGoogle Scholar
- 36.Yamaguchi, K., & Masuda, S. (2008). A new exact algorithm for the maximum weight clique problem. In 23rd international conference on circuit/systems, computers and communications (ITC-CSCC’08).Google Scholar
- 37.Zhang, J., Nezan, J.F., Pelcat, M., Cousin, J.G. (2013). Real-time gpu-based local stereo matching method. In IEEE conference on design and architectures for signal and image processing (DASIP). Google Scholar