
Design Space Exploration for Efficient Data Intensive Computing on SoCs

  • Rosilde Corvino
  • Abdoulaye Gamatié
  • Pierre Boulet
Chapter

Abstract

Finding efficient implementations of data-intensive applications, such as radar/sonar signal and image processing, on a system-on-chip is a very challenging problem due to the increasing complexity and performance requirements of such applications. One major issue, crucial in this context, is the optimization of the data transfer and storage micro-architecture. In this chapter, we propose a comprehensive method to explore the mapping of high-level representations of applications onto a customizable hardware accelerator. The high-level representation is given in a language named Array-OL. The customizable architecture uses FIFO queues and a double buffering mechanism to mask the latency of data transfers and external memory accesses. The mapping of a high-level representation onto a given architecture is achieved by applying loop transformations in Array-OL. A method based on integer partitions is used to reduce the space of explored solutions. Our proposal aims to facilitate the inference of adequate hardware realizations for data-intensive applications. It is illustrated by a case study consisting of the implementation of a hydrophone monitoring application.
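Two mechanisms named in the abstract can be illustrated with a small sketch. This is not the chapter's method, whose details are not given here. First, a textbook timing model of double buffering: with two local buffers, fetching block i+1 from external memory overlaps with computing on block i, so only the first transfer and the last computation stay exposed. The parameters `t_transfer` and `t_compute` (per-block times) are assumptions introduced for illustration. Second, a plain enumeration of integer partitions. The abstract does not say which quantity the chapter partitions, so the example assumes, hypothetically, that a candidate configuration is an unordered split of an integer n. Enumerating partitions instead of ordered compositions already shrinks the candidate set: n = 4 has 5 partitions but 2^3 = 8 compositions.

```python
from typing import Iterator, List, Optional


def sequential_time(n_blocks: int, t_transfer: float, t_compute: float) -> float:
    """Single local buffer: each block is fetched, then processed, with no overlap."""
    return n_blocks * (t_transfer + t_compute)


def double_buffered_time(n_blocks: int, t_transfer: float, t_compute: float) -> float:
    """Two local buffers: the next block is fetched while the current one is processed.

    The first fetch and the last computation cannot be hidden; every other
    step costs only the slower of the two activities.
    """
    if n_blocks == 0:
        return 0.0
    return t_transfer + (n_blocks - 1) * max(t_transfer, t_compute) + t_compute


def integer_partitions(n: int, max_part: Optional[int] = None) -> Iterator[List[int]]:
    """Yield every partition of n as a non-increasing list of positive integers.

    Parts are generated in non-increasing order, so each unordered split
    appears exactly once (unlike ordered compositions).
    """
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    # Choose the largest part first, then partition the remainder with parts <= first.
    for first in range(min(n, max_part), 0, -1):
        for rest in integer_partitions(n - first, first):
            yield [first] + rest


if __name__ == "__main__":
    # Hypothetical workload: 4 blocks, 3 time units to fetch and 5 to compute each.
    print(sequential_time(4, 3.0, 5.0))       # 32.0
    print(double_buffered_time(4, 3.0, 5.0))  # 23.0
    print(list(integer_partitions(4)))        # [[4], [3, 1], [2, 2], [2, 1, 1], [1, 1, 1, 1]]
```

The overlap model shows why double buffering pays off most when transfer and compute times are balanced: each steady-state step then costs roughly half of the sequential step.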

Keywords

Local Memory, Data Block, External Memory, Task Repetition, Design Space Exploration
These keywords were added by machine and not by the authors.


Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Rosilde Corvino (1)
  • Abdoulaye Gamatié (2)
  • Pierre Boulet (2)
  1. University of Technology Eindhoven, Eindhoven, The Netherlands
  2. LIFL/CNRS and Inria, Parc Scientifique de la Haute Borne, Villeneuve d’Ascq, France
