A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem

  • Abid Rafique
  • Nachiket Kapre
  • George A. Constantinides
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7199)

Abstract

Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix sizes of a few hundred) are well suited to FPGA acceleration. This paper presents a streaming architecture, comprising floating-point operators coupled with high-bandwidth on-chip memories, for the Lanczos method, an iterative algorithm for computing the eigenvalues of symmetric matrices. We show that the Lanczos method can be specialized for computing only the extremal eigenvalues, and we present an architecture that achieves a sustained single-precision floating-point performance of 175 GFLOPS on a Virtex6-SX475T for a dense matrix of size 335×335. We perform a quantitative comparison with parallel implementations of the Lanczos method that use the optimized Intel MKL and CUBLAS libraries on a multi-core CPU and a GPU, respectively. We find that, for a range of matrices, the FPGA implementation outperforms both. When the FPGA solves a single eigenvalue problem, it achieves a speedup of 8.2–27.3× (13.4× geometric mean) over an Intel Xeon X5650 and 26.2–116× (52.8× geometric mean) over an Nvidia C2050; when it solves multiple eigenvalue problems, the speedups are 41–520× (103× geometric mean) and 131–2220× (408× geometric mean), respectively.
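The abstract names the Lanczos method without spelling out the iteration. As a minimal software sketch, assuming plain Lanczos with no reorthogonalization and a random unit start vector (choices not taken from the paper), the NumPy code below builds the symmetric tridiagonal matrix T_k from the three-term recurrence and reads the extremal eigenvalue estimates off the smallest and largest eigenvalues of T_k. The function name `lanczos_extremal` and its parameters are hypothetical, and the code illustrates the algorithm only, not the authors' FPGA architecture. For a dense matrix each step is dominated by one matrix-vector product (about 2n² flops against O(n) for the vector updates), which is consistent with the abstract's emphasis on memory bandwidth.

```python
import numpy as np


def lanczos_extremal(A, k, seed=0):
    """Estimate the smallest and largest eigenvalues of a symmetric matrix A
    with k steps of plain Lanczos (no reorthogonalization; an assumption)."""
    n = A.shape[0]
    k = min(k, n)
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)              # random unit start vector (hypothetical choice)
    q_prev = np.zeros(n)
    beta = 0.0
    alphas, betas = [], []
    for j in range(k):
        w = A @ q                       # dense mat-vec: ~2n^2 flops, dominates each step
        alpha = q @ w
        w = w - alpha * q - beta * q_prev   # three-term recurrence
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if j == k - 1 or beta < 1e-12:      # last step, or invariant subspace found
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    # T_k is symmetric tridiagonal: alphas on the diagonal, betas beside it.
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    ritz = np.linalg.eigvalsh(T)        # Ritz values, sorted ascending
    return ritz[0], ritz[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 335                             # matrix size quoted in the abstract
    M = rng.standard_normal((n, n))
    A = (M + M.T) / 2                   # random dense symmetric test matrix
    print(lanczos_extremal(A, k=60))    # Lanczos estimates
    ev = np.linalg.eigvalsh(A)
    print(ev[0], ev[-1])                # dense reference values for comparison
```

Because only the extremal Ritz values are requested, T_k stays small (k by k) even when k is much less than n; how quickly the estimates approach the reference values depends on how well separated the extremal eigenvalues are from the rest of the spectrum.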

Keywords

Field Programmable Gate Array, Lanczos Method, Eigenvalue Computation, Extremal Eigenvalue, FPGA Design
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Abid Rafique (1)
  • Nachiket Kapre (1)
  • George A. Constantinides (1)
  1. Electrical and Electronic Engineering, Imperial College London, London, UK