Advertisement

A Code-Based Analytical Approach for Using Separate Device Coprocessors in Computing Systems

  • Volker Hampel
  • Grigori Goronzy
  • Erik Maehle
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6566)

Abstract

Special hardware accelerators like FPGAs and GPUs are commonly introduced into a computing system as a separate device. Consequently, the accelerator and the host system do not share a common memory. Sourcing out the data to the additional hardware thus introduces a communication penalty. Based on a combination of a program’s source code and execution profiling we perform an analysis which evaluates the arithmetic intensity as a cost function to identify those parts most reasonable to source out to the accelerating hardware. The basic principles of this analysis are introduced and tested with a sample application. Its concrete results are discussed and evaluated based on the performance of a FPGA-based and a GPU-based implementation.

Keywords

FPGA GPU hardware accelerator profiling analysis 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Harris, M.: Mapping Computational Concepts to GPUs. In: Pharr, M. (ed.) GPU Gems 2, ch. 31, Addison-Wesley Longman, Amsterdam (2005)Google Scholar
  2. 2.
    Palmer, J.: The Intel® 8087 numeric data processor. In: ISCA 1980: Proceedings of the 7th annual symposium on Computer Architecture, La Baule, USA, pp. 174–181 (1980), http://doi.acm.org/10.1145/800053.801923
  3. 3.
    Tripp, J.L., Gokhale, M.B., Peterson, K.D.: Trident: From High-Level Language to Hardware Circuitry. Computer 40(3), 28–37 (2007), http://dx.doi.org/10.1109/MC.2007.107 CrossRefGoogle Scholar
  4. 4.
    Han, T.D., Abdelrahman, T.S.: hiCUDA: High-Level GPGPU Programming. IEEE Transactions on Parallel and Distributed Systems (March 31, 2010), http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.62
  5. 5.
    Weber, R., Gothandaraman, A., Hinde, R.J., Peterson, G.D.: Comparing Hardware Accelerators in Scientific Applications: A Case Study. IEEE Transactions on Parallel and Distributed Systems (June 02, 2010), http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.125
  6. 6.
    Park, S.J., Ross, J., Shires, D., Richie, D., Henz, B., Nguyen, L.: Hybrid Core Acceleration of UWB SIRE Radar Signal Processing. IEEE Transactions on Parallel and Distributed Systems (May 27, 2010), http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.117
  7. 7.
    Park, I.K., Singhal, N., Lee, M.H., Cho, S., Kim, C.: Design and Performance Evaluation of Image Processing Algorithms on GPUs. IEEE Transactions on Parallel and Distributed Systems (May 27, 2010), http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.115
  8. 8.
    Ryoo, S., Rodrigues, C.I., Stone, S.S., Baghsorkhi, S.S., Ueng, S.-Z., Stratton, J.A., Hwu, W.W.: Program optimization space pruning for a multithreaded gpu. In: CGO 2008: Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization, Boston, MA, USA, pp. 195–204 (2008), http://doi.acm.org/10.1145/1356058.1356084
  9. 9.
    Ryoo, S., Rodrigues, C.I., Baghsorkhi, S.S., Stone, S.S., Kirk, D.B., Hwu, W.W.: Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In: PPoPP 2008: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Salt Lake City, UT, USA, pp. 73–83 (2008), http://doi.acm.org/10.1145/1345206.1345220
  10. 10.
    Suffern, K.G.: Ray Tracing from the Ground up, A K Peters Ltd (2007)Google Scholar
  11. 11.
    Sobe, P., Hampel, V.: FPGA-Accelerated Deletion-Tolerant Coding for Reliable Distributed Storage. In: Lukowicz, P., Thiele, L., Tröster, G. (eds.) ARCS 2007. LNCS, vol. 4415, pp. 14–27. Springer, Heidelberg (2007), http://dx.doi.org/10.1007/978-3-540-71270-1_2 CrossRefGoogle Scholar
  12. 12.
    Cray Inc.: Cray XD1 FPGA Development. Release 1.4 (2006)Google Scholar
  13. 13.
    Valgrind Developers: Valgrind User Manual. Release 3.5.0 (August 19, 2009)Google Scholar
  14. 14.
    Munshi, A. (ed.): The OpenCL-Specification. Version 1.1 (June 11, 2010)Google Scholar
  15. 15.
    Nvidia Corp.: NVIDIA CUDA C Programming Guide. Version 3.2 (September 8, 2010)Google Scholar
  16. 16.
    Nvidia Corp.: NVIDIA OpenCL Best Practices Guide. Version 2.3 (August 31, 2009)Google Scholar
  17. 17.
    Brewer, T.M.: Hybrid-core Computing: Punching through the power/performance wall. Scientific Computing, November/December (2009), http://www.conveycomputer.com/Resources/ScientificComputing62629.pdf

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Volker Hampel
    • 1
  • Grigori Goronzy
    • 1
  • Erik Maehle
    • 1
  1. 1.Institute of Computer EngineeringUniversity of LübeckLübeckGermany

Personalised recommendations