Exploiting Both Pipelining and Data Parallelism with SIMD Reconfigurable Architecture

  • Yongjoo Kim
  • Jongeun Lee
  • Jinyong Lee
  • Toan X. Mai
  • Ingoo Heo
  • Yunheung Paek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7199)


Reconfigurable architecture (RA) provides extremely high energy efficiency for certain application domains, but current mapping algorithms for it do not scale well with the number of cores. One approach to this problem is to use the SIMD (Single Instruction Multiple Data) paradigm. However, SIMD can complicate the mapping problem by adding another dimension, iteration mapping, to the already interdependent problems of data mapping and operation mapping, and it can significantly affect performance through memory bank conflicts. In this paper we introduce a SIMD reconfigurable architecture, which allows SIMD mapping at multiple levels of granularity, and investigate ways to minimize bank conflicts in such an architecture with the related sub-problems taken into consideration. We further present data tiling and evaluate a conflict-free scheduling algorithm as a way to eliminate bank conflicts for a certain class of iteration and data mappings.
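
The interaction between iteration mapping and data mapping mentioned in the abstract can be illustrated with a small counting model. The sketch below is not the paper's algorithm. It assumes a toy machine in which every SIMD lane issues one access `a[i]` per step in lockstep and each memory bank serves one access per step. The "sequential" and "interleaved" mapping rules, and all function and parameter names, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def bank_of(index, num_banks, array_len, data_mapping):
    """Bank holding element `index` under an assumed data mapping."""
    if data_mapping == "interleaved":
        # Consecutive elements go to consecutive banks.
        return index % num_banks
    if data_mapping == "sequential":
        # The array is split into contiguous blocks, one block per bank.
        return index // ceil_div(array_len, num_banks)
    raise ValueError(f"unknown data mapping: {data_mapping}")


def iteration_of(lane, step, num_lanes, num_iters, iter_mapping):
    """Loop iteration executed by `lane` at `step` under an assumed iteration mapping."""
    if iter_mapping == "interleaved":
        # Lanes take neighbouring iterations in each step.
        return step * num_lanes + lane
    if iter_mapping == "sequential":
        # Each lane takes one contiguous chunk of iterations.
        return lane * ceil_div(num_iters, num_lanes) + step
    raise ValueError(f"unknown iteration mapping: {iter_mapping}")


def count_conflicts(num_iters, num_lanes, num_banks, data_mapping, iter_mapping):
    """Count accesses that must wait because another lane hit the same bank in the same step."""
    steps = ceil_div(num_iters, num_lanes)
    conflicts = 0
    for step in range(steps):
        hits = Counter()
        for lane in range(num_lanes):
            i = iteration_of(lane, step, num_lanes, num_iters, iter_mapping)
            if i < num_iters:
                hits[bank_of(i, num_banks, num_iters, data_mapping)] += 1
        # Every access beyond the first to a bank in one step counts as a conflict.
        conflicts += sum(n - 1 for n in hits.values())
    return conflicts


if __name__ == "__main__":
    for data_mapping in ("interleaved", "sequential"):
        for iter_mapping in ("interleaved", "sequential"):
            c = count_conflicts(16, 4, 4, data_mapping, iter_mapping)
            print(f"data={data_mapping:11s} iteration={iter_mapping:11s} conflicts={c}")
```

In this toy model, matching the iteration mapping to the data mapping yields no conflicts, while mismatched pairs send every lane to the same bank in each step. That is one way to see why the two mappings cannot be chosen independently.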


Coarse-grained reconfigurable architecture; Application mapping; Sequential Interleaving; Memory bank conflict





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yongjoo Kim (1)
  • Jongeun Lee (2)
  • Jinyong Lee (1)
  • Toan X. Mai (2)
  • Ingoo Heo (1)
  • Yunheung Paek (1)

  1. School of EECS, Seoul National University, Seoul, Korea
  2. School of ECE, Ulsan National Institute of Science and Technology, Ulsan, Korea
