Journal of Signal Processing Systems, Volume 80, Issue 1, pp 19–37

Memory Analysis and Optimized Allocation of Dataflow Applications on Shared-Memory MPSoCs

In-Depth Study of a Computer Vision Application
  • Karol Desnos
  • Maxime Pelcat (Email author)
  • Jean-François Nezan
  • Slaheddine Aridhi
Article

Abstract

Most applications, from low-complexity ones to highly multifaceted ones requiring dedicated hardware accelerators, are well suited to Multiprocessor Systems-on-Chips (MPSoCs). It is critical to understand the general characteristics of a given embedded application: its behavior and its requirements in terms of MPSoC resources. This paper presents a complete method for studying the memory characteristics of an application. The method spans from theoretical, architecture-independent memory characterization to quasi-optimal static memory allocation of an application on a real shared-memory MPSoC. The application is modeled as a Synchronous Dataflow (SDF) graph, which is used to derive a Memory Exclusion Graph (MEG) essential to the analysis and allocation techniques. Practical considerations, such as cache coherence and memory broadcasting, are treated extensively. Memory footprint optimization is demonstrated using the example of a stereo matching algorithm from the computer vision domain. Experimental results show a reduction of the memory footprint by up to 43% compared to a state-of-the-art minimization technique, a throughput improvement of 33% over dynamic allocation, and the introduction of a tradeoff between multicore scheduling flexibility and memory footprint.
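
The idea of allocating memory from an exclusion graph can be illustrated with a minimal sketch. It assumes that each MEG vertex is a buffer weighted by its size and that an edge joins two buffers that must not share memory. The first-fit, largest-first heuristic below is a generic technique chosen for illustration and is not claimed to be the allocator used in the paper; the function names, buffer names, and sizes are all hypothetical.

```python
def first_fit_allocate(sizes, exclusions):
    """Give each buffer a start offset so that excluded pairs never overlap."""
    # Build a symmetric adjacency map from the exclusion edge list.
    neighbors = {name: set() for name in sizes}
    for a, b in exclusions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    offsets = {}
    # Place the largest buffers first (an assumed heuristic ordering).
    for name in sorted(sizes, key=lambda n: (-sizes[n], n)):
        # Address ranges already taken by placed buffers that exclude this one.
        busy = sorted(
            (offsets[o], offsets[o] + sizes[o])
            for o in neighbors[name]
            if o in offsets
        )
        start = 0
        for low, high in busy:
            if start + sizes[name] <= low:
                break  # the gap before this range is large enough
            start = max(start, high)
        offsets[name] = start
    return offsets


def footprint(sizes, offsets):
    """Return the highest address used by any buffer."""
    return max(offsets[n] + sizes[n] for n in sizes)


# Hypothetical example: four buffers (sizes in bytes) and their exclusions.
sizes = {"A": 100, "B": 60, "C": 40, "D": 100}
exclusions = [("A", "B"), ("B", "C"), ("C", "D")]
offsets = first_fit_allocate(sizes, exclusions)
print(offsets, footprint(sizes, offsets))
```

Buffers that are not joined by an exclusion edge may be mapped to the same addresses, which is why the resulting footprint can be smaller than the sum of all buffer sizes. A heuristic of this kind gives no optimality guarantee.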

Keywords

Memory allocation; Multiprocessor system-on-chip; Stereo vision; Synchronous dataflow

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Karol Desnos (1)
  • Maxime Pelcat (1, Email author)
  • Jean-François Nezan (1)
  • Slaheddine Aridhi (2)
  1. IETR, INSA Rennes, UMR CNRS 6164, UEB, Rennes, France
  2. Texas Instruments France, Villeneuve Loubet, France
