A Code-Based Analytical Approach for Using Separate Device Coprocessors in Computing Systems

  • Volker Hampel
  • Grigori Goronzy
  • Erik Maehle
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6566)


Special hardware accelerators such as FPGAs and GPUs are commonly added to a computing system as separate devices. Consequently, the accelerator and the host system do not share a common memory, and offloading data to the additional hardware incurs a communication penalty. Based on a combination of a program’s source code and execution profiling, we perform an analysis that uses arithmetic intensity as a cost function to identify the parts of the program most suitable for offloading to the accelerating hardware. The basic principles of this analysis are introduced and tested with a sample application. The results are discussed and evaluated against the performance of an FPGA-based and a GPU-based implementation.
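
The idea of ranking program parts by arithmetic intensity can be illustrated with a minimal sketch. It assumes that arithmetic intensity is taken as arithmetic operations per byte transferred between host and device; the abstract does not give the exact definition used in the paper. The `Region` type, its field names, the region names, and all numbers are hypothetical and serve only as illustration; they are not results from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A candidate code region, e.g. a function seen in a profile (hypothetical)."""
    name: str
    operations: int         # arithmetic operations executed, e.g. from a profiler
    bytes_transferred: int  # bytes that would move between host and accelerator


def arithmetic_intensity(region: Region) -> float:
    """Operations per byte crossing the host-device link (assumed definition)."""
    if region.bytes_transferred == 0:
        return float("inf")  # nothing to transfer, so no communication penalty
    return region.operations / region.bytes_transferred


def rank_regions(regions: List[Region]) -> List[Region]:
    """Sort candidates so the region with the highest intensity comes first."""
    return sorted(regions, key=arithmetic_intensity, reverse=True)


# Made-up numbers for illustration only.
candidates = [
    Region("region_a", operations=8_000_000, bytes_transferred=40_000),
    Region("region_b", operations=10_000, bytes_transferred=40_000),
    Region("region_c", operations=50_000_000, bytes_transferred=100_000),
]

ranked = rank_regions(candidates)
for region in ranked:
    print(f"{region.name}: {arithmetic_intensity(region):.2f} ops/byte")
```

Under these assumptions, a region that performs many operations per transferred byte amortizes the communication penalty better and is therefore a more promising candidate for the accelerator than one that mostly moves data.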


FPGA, GPU, hardware accelerator, profiling, analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Volker Hampel (1)
  • Grigori Goronzy (1)
  • Erik Maehle (1)
  1. Institute of Computer Engineering, University of Lübeck, Lübeck, Germany
