A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7199)

Abstract

Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix dimensions of a few hundred) are well suited to FPGA acceleration. This paper presents a streaming architecture comprising floating-point operators coupled with high-bandwidth on-chip memories for the Lanczos method, an iterative algorithm for computing the eigenvalues of symmetric matrices. We show that the Lanczos method can be specialized to compute only the extremal eigenvalues, and we present an architecture that achieves a sustained single-precision floating-point performance of 175 GFLOPs on a Virtex6-SX475T for a dense matrix of size 335×335. We perform a quantitative comparison with parallel implementations of the Lanczos method that use the optimized Intel MKL and CUBLAS libraries for multi-core and GPU, respectively. For a range of matrices, the FPGA implementation outperforms both. When the FPGA solves a single eigenvalue problem, the speedup is 8.2-27.3× (13.4× geo. mean) over an Intel Xeon X5650 and 26.2-116× (52.8× geo. mean) over an Nvidia C2050. When it solves multiple eigenvalue problems, the speedups are 41-520× (103× geo. mean) and 131-2220× (408× geo. mean), respectively.
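
For readers unfamiliar with the method, the following is a minimal software sketch of the textbook Lanczos iteration for the largest eigenvalue of a dense symmetric matrix. It is not the paper's FPGA architecture, which the abstract does not describe in detail. The function names (`lanczos`, `count_below`, `largest_eig_tridiag`), the random start vector, and the use of Sturm-sequence bisection on the tridiagonal matrix are all assumptions made for illustration. The sketch omits reorthogonalization, so it is only meant for small, well-conditioned examples.

```python
import math
import random

def matvec(A, v):
    # Dense matrix-vector product: the dominant cost of each Lanczos step.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, m, seed=0):
    """Run up to m Lanczos steps on symmetric A.

    Returns (alpha, beta): the diagonal and off-diagonal of the
    tridiagonal matrix T whose extremal eigenvalues approximate those of A.
    """
    n = len(A)
    rng = random.Random(seed)  # assumed: random unit start vector
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(q, q))
    q = [x / norm for x in q]
    q_prev = [0.0] * n
    alpha, beta, b = [], [], 0.0
    for _ in range(min(m, n)):
        w = matvec(A, q)
        a = dot(w, q)
        # Three-term recurrence: remove components along q and q_prev.
        w = [wi - a * qi - b * pi for wi, qi, pi in zip(w, q, q_prev)]
        alpha.append(a)
        b = math.sqrt(dot(w, w))
        if b < 1e-12:  # invariant subspace found, stop early
            break
        beta.append(b)
        q_prev, q = q, [wi / b for wi in w]
    return alpha, beta[:len(alpha) - 1]

def count_below(alpha, beta, x):
    # Sturm count: number of eigenvalues of T strictly below x.
    count, d = 0, 1.0
    for i, a in enumerate(alpha):
        off = beta[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-30  # avoid division by zero in the next step
        if d < 0.0:
            count += 1
    return count

def largest_eig_tridiag(alpha, beta, tol=1e-10):
    # Bisection between Gershgorin bounds for the largest eigenvalue of T.
    n = len(alpha)
    radius = [(beta[i - 1] if i > 0 else 0.0) + (beta[i] if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(a - r for a, r in zip(alpha, radius))
    hi = max(a + r for a, r in zip(alpha, radius))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(alpha, beta, mid) == n:
            hi = mid  # every eigenvalue lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Small symmetric test matrix; its exact largest eigenvalue is 2 + sqrt(2).
    A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
    alpha, beta = lanczos(A, 3)
    print(largest_eig_tridiag(alpha, beta))
```

In this sketch the matrix-vector product inside the loop accounts for most of the arithmetic and memory traffic, which is consistent with the abstract's emphasis on memory bandwidth for matrices of a few hundred rows.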

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rafique, A., Kapre, N., Constantinides, G.A. (2012). A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem. In: Choy, O.C.S., Cheung, R.C.C., Athanas, P., Sano, K. (eds) Reconfigurable Computing: Architectures, Tools and Applications. ARC 2012. Lecture Notes in Computer Science, vol 7199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28365-9_20

  • DOI: https://doi.org/10.1007/978-3-642-28365-9_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28364-2

  • Online ISBN: 978-3-642-28365-9

  • eBook Packages: Computer Science, Computer Science (R0)
