Implementation and validation of architectural space exploration techniques for domain-specific reconfigurable computing

Abstract

Domain-specific coarse-grained reconfigurable architectures (CGRAs) hold great promise for energy-efficient, flexible designs that serve a suite of applications. Designing such a reconfigurable device for an application domain is very challenging because the needs of different applications must be carefully balanced to achieve the targeted design goals. It requires evaluating many potential architectural options to select an optimal solution. Exploring the design space manually would be very time consuming and may not even be feasible for very large designs. Even mapping one algorithm onto a customized architecture can take minutes to hours, and running a full power simulation on a complete suite of benchmarks for various architectural options requires several days. Finding the optimal point in a design space could therefore require a very long time. We have designed a framework that makes such design space exploration (DSE) feasible. The framework allows a family of algorithms and architectural options to be tested in minutes rather than days and enables rapid selection of architectural choices. In this paper, we describe our DSE framework for domain-specific reconfigurable computing, where the needs of the application domain drive the construction of the device architecture. The framework has been developed to automate design space case studies, allowing application developers to explore architectural tradeoffs efficiently and reach solutions quickly. We selected some of the core signal processing benchmarks from the MediaBench benchmark suite and some edge-detection benchmarks from the image processing domain for our case studies. We describe two search algorithms: a stepped search algorithm motivated by our manual design studies and a more traditional gradient-based optimization. Approximate energy models are developed in each case to guide the search toward a minimal energy solution.
We validate our search results by comparing the architectural solutions selected by our tool to an architecture optimized manually and by performing sensitivity tests to evaluate the ability of our algorithms to find good-quality minima in the design space. All selected fabric architectures were synthesized on a 130 nm cell-based ASIC fabrication process from IBM. These architectures consume almost the same amount of energy on average, but the gradient-based approach is more general and promises to extend well to new problem domains. We expect these or similar heuristics and the overall design flow of the system to be useful for a wide range of architectures, including mesh-based and other commonly used architectures for CGRAs.
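
As a rough illustration of what a gradient-style search guided by an approximate energy model can look like, the following minimal sketch performs steepest descent over a small discrete design space. It is not the authors' algorithm: the parameter names (`mux_inputs`, `dedicated_pct`, `width`), their value ranges, and the `toy_energy` function are hypothetical stand-ins chosen only to make the example runnable.

```python
from typing import Callable, Dict, List, Tuple

Design = Dict[str, int]

# Hypothetical design space: each architectural parameter and its allowed values.
SPACE: Dict[str, List[int]] = {
    "mux_inputs": [2, 4, 8, 16],       # assumed interconnect multiplexer sizes
    "dedicated_pct": [0, 25, 50, 75],  # assumed percentage of dedicated routes
    "width": [8, 16, 32],              # assumed number of ALUs per row
}


def toy_energy(d: Design) -> float:
    """Made-up convex stand-in for an approximate energy model."""
    return ((d["mux_inputs"] - 4) ** 2
            + 0.01 * (d["dedicated_pct"] - 50) ** 2
            + 0.05 * (d["width"] - 16) ** 2)


def neighbors(d: Design) -> List[Design]:
    """Designs that differ from d by one step in exactly one parameter."""
    out: List[Design] = []
    for name, values in SPACE.items():
        i = values.index(d[name])
        for j in (i - 1, i + 1):
            if 0 <= j < len(values):
                out.append({**d, name: values[j]})
    return out


def descend(start: Design,
            energy: Callable[[Design], float]) -> Tuple[Design, float]:
    """Move to the lowest-energy neighbor until no neighbor improves."""
    current, best = dict(start), energy(start)
    while True:
        scored = [(energy(n), n) for n in neighbors(current)]
        cand_e, cand = min(scored, key=lambda pair: pair[0])
        if cand_e >= best:
            return current, best  # local minimum of the estimated energy
        current, best = cand, cand_e
```

Calling `descend({"mux_inputs": 16, "dedicated_pct": 0, "width": 32}, toy_energy)` walks the design toward the minimum of the toy model. With a real energy estimate the surface need not be convex, which is one reason sensitivity tests of the kind described above matter: a descent like this can stop in a poor local minimum.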

Notes

  1. We note here that alternatives to this arrangement of dedicated pass gates are possible. In particular, we could provide dedicated routes in conjunction with each ALU to allow that ALU to be bypassed. However, we found such an arrangement to be expensive within the context of our design space, due to the need for additional multiplexers, and we do not consider it here. Instead, we search for the most efficient proportion of dedicated routes to provide, keeping the number of additional multiplexers to the minimum for which the energy gains outweigh the energy expense.

Author information

Corresponding author

Correspondence to Gayatri Mehta.

About this article

Cite this article

Mehta, G., Jones, A.K. Implementation and validation of architectural space exploration techniques for domain-specific reconfigurable computing. Des Autom Embed Syst 17, 27–51 (2013). https://doi.org/10.1007/s10617-013-9118-1

Keywords

  • Domain specific reconfigurable computing
  • Coarse-grained reconfigurable architectures
  • Design space exploration