A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem

Rafique, Abid; Kapre, Nachiket; Constantinides, George A.

doi:10.1007/978-3-642-28365-9_20

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7199)

Abstract

Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix dimensions in the low hundreds) are well suited to FPGA acceleration. This paper presents a streaming architecture, comprising floating-point operators coupled with high-bandwidth on-chip memories, for the Lanczos method, an iterative algorithm for computing the eigenvalues of symmetric matrices. We show that the Lanczos method can be specialized to compute only the extremal eigenvalues, and we present an architecture that achieves a sustained single-precision floating-point performance of 175 GFLOPs on a Virtex6-SX475T for a dense matrix of size 335×335. We perform a quantitative comparison with parallel implementations of the Lanczos method that use the optimized Intel MKL and CUBLAS libraries for multi-core and GPU respectively. For a range of matrices, the FPGA implementation outperforms both. When the FPGA solves a single eigenvalue problem, the speedup is 8.2-27.3× (13.4× geo. mean) over an Intel Xeon X5650 and 26.2-116× (52.8× geo. mean) over an Nvidia C2050. When it solves multiple eigenvalue problems, the speedup is 41-520× (103× geo. mean) and 131-2220× (408× geo. mean) respectively.
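As a rough software illustration of the computation the abstract describes, the sketch below runs the Lanczos iteration on a small dense symmetric matrix and then extracts only the largest eigenvalue of the resulting tridiagonal matrix. It is not the paper's architecture. The function names (`lanczos`, `count_below`, `largest_eigenvalue`), the random start vector, the full reorthogonalization step, and the use of Sturm-sequence bisection for the tridiagonal stage are all assumptions made for this example.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def lanczos(A, k, seed=0):
    """Run up to k Lanczos steps on symmetric A.

    Returns (alpha, beta): the diagonal and off-diagonal of the
    tridiagonal matrix T built by the iteration.
    """
    n = len(A)
    rng = random.Random(seed)
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    scale = math.sqrt(dot(q, q))
    q = [x / scale for x in q]
    q_prev, b = [0.0] * n, 0.0
    basis, alpha, beta = [], [], []
    for _ in range(min(k, n)):
        basis.append(q)
        w = [dot(row, q) for row in A]           # matrix-vector product
        a = dot(q, w)
        w = [wi - a * qi - b * pi for wi, qi, pi in zip(w, q, q_prev)]
        # Assumption: full reorthogonalization, added here for robustness.
        for u in basis:
            c = dot(u, w)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        alpha.append(a)
        b = math.sqrt(dot(w, w))
        if b < 1e-12:                            # invariant subspace found
            break
        beta.append(b)
        q_prev, q = q, [wi / b for wi in w]
    return alpha, beta[:len(alpha) - 1]


def count_below(alpha, beta, x):
    """Sturm count: number of eigenvalues of T strictly below x."""
    count, d = 0, 1.0
    for i, a in enumerate(alpha):
        off = beta[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if abs(d) < 1e-30:                       # avoid division by zero
            d = -1e-30
        if d < 0.0:
            count += 1
    return count


def largest_eigenvalue(A, k, iters=100):
    """Estimate the largest eigenvalue of A from k Lanczos steps."""
    alpha, beta = lanczos(A, k)
    m = len(alpha)
    # Gershgorin bounds on the spectrum of T.
    radius = [(abs(beta[i - 1]) if i > 0 else 0.0)
              + (abs(beta[i]) if i < m - 1 else 0.0) for i in range(m)]
    lo = min(a - r for a, r in zip(alpha, radius))
    hi = max(a + r for a, r in zip(alpha, radius))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_below(alpha, beta, mid) == m:   # all eigenvalues below mid
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Calling `largest_eigenvalue(A, k)` with `k` smaller than the matrix dimension gives an approximation from a `k`-step Krylov subspace. The smallest eigenvalue can be obtained in the same way by bisecting for the point where the Sturm count first becomes nonzero.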

Author information

Authors and Affiliations

Electrical and Electronic Engineering, Imperial College London, London, SW7 2BT, UK
Abid Rafique, Nachiket Kapre & George A. Constantinides

Editor information

Editors and Affiliations

Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Oliver C. S. Choy
Department of Electronic Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong, China
Ray C. C. Cheung
Department of ECE, Virginia Tech., 302 Whittemore Hall, 24061, Blacksburg, VA, USA
Peter Athanas
Tohoku University, 6-6-01 Aramaki Aza Aoba, Aobaku, 981-8579, Sendai, Miyagi, Japan
Kentaro Sano

Cite this paper

Rafique, A., Kapre, N., Constantinides, G.A. (2012). A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem. In: Choy, O.C.S., Cheung, R.C.C., Athanas, P., Sano, K. (eds) Reconfigurable Computing: Architectures, Tools and Applications. ARC 2012. Lecture Notes in Computer Science, vol 7199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28365-9_20

DOI: https://doi.org/10.1007/978-3-642-28365-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28364-2
Online ISBN: 978-3-642-28365-9
eBook Packages: Computer Science, Computer Science (R0)
