Abstract
Coarse-Grained Reconfigurable Array (CGRA) architectures have been extensively used for accelerating time-consuming loops. The design of such systems requires a good balance between the architecture’s capabilities and the loops’ characteristics. A reliable design is characterized by an optimized cost-performance trade-off. The main aim of this paper is to present an exploration framework that automates the evaluation of CGRA architectures. Specifically, the framework helps the designer identify CGRA architectures tuned toward a specific application domain. The whole process is assisted (1) by an optimized retargetable compiler based on modulo scheduling and (2) by the Synopsys Design Compiler, which provides realization metrics such as area and clock frequency. Both tools operate on the description of a parametric CGRA architecture template that can instantiate a wide variety of such architectures. Many studies to date suggest that clock frequency influences performance; however, none of them examines the impact of the architecture on both clock frequency and performance. Our work studies, for the first time in a unified way, the area, the clock frequency, the instructions per cycle, and the performance. Hence, architectures with a good compromise between cost and performance can be identified. Another objective of the paper is to present the advances made to the compiler approach used by the exploration framework. Specifically, a new, more effective priority scheme is proposed, and the modulo scheduler has been equipped with a backtracking capability. The experiments demonstrate the algorithm’s efficiency and scalability on a given set of DSP benchmarks. Moreover, architectures optimized with respect to the cost-performance trade-off have been identified by an exploration over 72 CGRA architecture alternatives.
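To illustrate why area, clock frequency, and instructions per cycle need to be considered together, the following minimal sketch ranks a few design points by cost and performance. It is not the framework described in the abstract: the `Candidate` type, the `pareto_front` helper, the performance model (throughput taken as IPC multiplied by clock frequency), and all numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical CGRA design point (all fields are illustrative)."""
    name: str
    area_mm2: float    # area estimate, e.g. from logic synthesis
    clock_mhz: float   # clock frequency estimate, e.g. from logic synthesis
    ipc: float         # instructions per cycle, e.g. reported by the compiler

    @property
    def throughput_mips(self) -> float:
        # Assumed model: millions of instructions per second = IPC * MHz.
        return self.ipc * self.clock_mhz


def pareto_front(candidates):
    """Keep candidates that no other candidate beats on area and throughput."""
    front = []
    for c in candidates:
        dominated = any(
            o.area_mm2 <= c.area_mm2
            and o.throughput_mips >= c.throughput_mips
            and (o.area_mm2 < c.area_mm2
                 or o.throughput_mips > c.throughput_mips)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.area_mm2)


# Made-up numbers, not results from the paper.
CANDIDATES = [
    Candidate("4x4_mesh", area_mm2=1.0, clock_mhz=200.0, ipc=6.0),
    Candidate("4x4_rich", area_mm2=1.5, clock_mhz=150.0, ipc=7.0),
    Candidate("6x6_mesh", area_mm2=2.0, clock_mhz=180.0, ipc=10.0),
    Candidate("8x8_mesh", area_mm2=4.0, clock_mhz=100.0, ipc=16.0),
]

if __name__ == "__main__":
    for c in pareto_front(CANDIDATES):
        print(f"{c.name}: area={c.area_mm2}, "
              f"throughput={c.throughput_mips:.0f}")
```

In this toy data set the design point with the highest IPC is not on the front, because its lower clock frequency and larger area outweigh the extra parallelism. This is the kind of interaction that a joint study of area, clock frequency, and IPC is meant to expose.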
Cite this article
Dimitroulakos, G., Kostaras, N., Galanis, M.D. et al. Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays. J Supercomput 48, 115–151 (2009). https://doi.org/10.1007/s11227-008-0208-y