
GPU vs FPGA: A Comparative Analysis for Non-standard Precision

  • Umar Ibrahim Minhas
  • Samuel Bayliss
  • George A. Constantinides
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8405)

Abstract

FPGAs and GPUs are increasingly used in a range of high-performance computing applications. When implementing numerical algorithms on either platform, we can choose to represent operands with different levels of accuracy. A trade-off exists between the numerical accuracy of arithmetic operators and the resources needed to implement them. Where algorithmic requirements for numerical stability are captured in a design description, this trade-off can be exploited to optimize performance by using high-accuracy operators only where they are most required. Support for half and double-double floating-point representations allows additional flexibility to achieve this. The aim of this work is to study the language and hardware support, and the achievable peak performance, for non-standard precisions on a GPU and an FPGA. A compute-intensive program, matrix-matrix multiply, is selected as a benchmark and implemented for a range of matrix sizes. The results show that for sufficiently large matrices, GPUs outperform FPGA-based implementations, but for some smaller matrix sizes, specialized FPGA floating-point operators for half and double-double precision can deliver higher throughput than a GPU implementation.
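The double-double format discussed above pairs two IEEE doubles so that the second carries the exact rounding error of the first, giving roughly 106 significand bits. As a minimal sketch (not code from the paper), the following CUDA C++ shows the standard Knuth two-sum building block and the simple double-double addition used in prior GPU extended-precision work; the `dd`, `two_sum`, and `dd_add` names are illustrative assumptions.

```cuda
#include <cstdio>
#include <cmath>

// A double-double value: hi carries the leading ~53 bits, lo the exact
// rounding error of hi, for roughly 106 significand bits in total.
// Illustrative sketch only; names are not taken from the paper.
struct dd { double hi, lo; };

// Knuth's error-free transformation: s + err == a + b exactly.
// Compile without --use_fast_math so these operation orderings survive.
__host__ __device__ inline void two_sum(double a, double b,
                                        double &s, double &err) {
    s = a + b;
    double v = s - a;
    err = (a - (s - v)) + (b - v);
}

// Double-double addition (the "sloppy" variant common on GPUs): add the
// leading parts exactly, fold the trailing parts into the error term,
// then renormalise so |lo| is at most half an ulp of hi.
__host__ __device__ inline dd dd_add(dd x, dd y) {
    double s, e;
    two_sum(x.hi, y.hi, s, e);
    e += x.lo + y.lo;
    dd r;
    two_sum(s, e, r.hi, r.lo);
    return r;
}

int main() {
    // 1 + 2^-80 cannot be represented in one double,
    // but survives in double-double.
    dd a = {1.0, 0.0};
    dd b = {ldexp(1.0, -80), 0.0};
    dd c = dd_add(a, b);
    printf("hi = %.17g  lo = %.17g\n", c.hi, c.lo);
    return 0;
}
```

A full double-double matrix multiply would compose this addition with an analogous error-free two-product (which relies on the hardware FMA instruction) inside the usual tiled GEMM loop.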

Keywords

GPU · FPGA · High Performance Computing (HPC) · Non-standard Precision · Half Precision · Double-double Precision



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Umar Ibrahim Minhas¹
  • Samuel Bayliss¹
  • George A. Constantinides¹

  1. Department of Electrical and Electronic Engineering, Imperial College London, London, UK
