Partitioning and Vectorizing Binary Applications for a Reconfigurable Vector Computer

  • Tobias Kenter
  • Gavin Vaz
  • Christian Plessl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8405)


To leverage reconfigurable architectures in general-purpose computing, quick and automated methods for finding suitable accelerator designs are required. We address both requirements. To avoid long synthesis times, we target a vector coprocessor implemented on the FPGAs of a Convey HC-1. Previous studies showed that existing tools were not able to accelerate a real-world application with low effort. We present a toolflow that automatically identifies loops suitable for vectorization, generates a corresponding hardware/software bipartition, and generates coprocessor code. Where applicable, we leverage outer-loop vectorization. We evaluate our tools with a set of characteristic loops, systematically analyzing different dependency and data layout properties.


Heterogeneous System; Binary Acceleration; Outer-Loop Vectorization





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Tobias Kenter (1)
  • Gavin Vaz (1)
  • Christian Plessl (1)
  1. Department of Computer Science, University of Paderborn, Germany
