A Fine-Grained Pipelined Implementation for Large-Scale Matrix Inversion on FPGA

Zhou, Jie; Dou, Yong; Zhao, Jianxun; Xia, Fei; Lei, Yuanwu; Tang, Yuxing

doi:10.1007/978-3-642-03644-6_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)

Included in the following conference series:

International Workshop on Advanced Parallel Processing Technologies

Abstract

Large-scale matrix inversion plays an important role in many applications. However, to the best of our knowledge, there is no FPGA-based implementation. In this paper, we explore the possibility of accelerating large-scale matrix inversion on FPGA. To exploit the computational potential of FPGA, we introduce a fine-grained parallel algorithm for matrix inversion. A scalable linear array of processing elements (PEs), which is the core component of the FPGA accelerator, is proposed to implement this algorithm. A total of 12 PEs can be integrated into an Altera Stratix II EP2S130F1020C5 FPGA on our self-designed board. Experimental results show that a speedup factor of 2.6 and a maximum power-performance of 41 can be achieved compared to a Pentium Dual CPU with double SSE threads.
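
For readers unfamiliar with the underlying operation, the following is a minimal software sketch of dense matrix inversion. The abstract does not name the algorithm the accelerator uses, so Gauss-Jordan elimination with partial pivoting is an assumption chosen for illustration only; the function name `invert` and the singularity tolerance are likewise hypothetical. The sketch is sequential and says nothing about how the work is distributed across the PEs.

```python
from typing import List

Matrix = List[List[float]]


def invert(a: Matrix, tol: float = 1e-12) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination.

    Illustrative only: the algorithm choice is an assumption,
    not the method described in the paper.
    """
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalize the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                if f != 0.0:
                    aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]
```

The row updates inside the elimination loop are independent of one another, which is the kind of fine-grained parallelism a hardware pipeline can exploit, although the mapping used in the paper may differ.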

Author information

Authors and Affiliations

National Laboratory for Parallel & Distributed Processing, NUDT, Changsha, P.R. China, 410073
Jie Zhou, Yong Dou, Fei Xia, Yuanwu Lei & Yuxing Tang
Academy of Armored Forces Engineering, Beijing, China, 100072
Jianxun Zhao

Editor information

Editors and Affiliations

National University of Defense Technology, Department of Computer Science, 410073, Changsha, P.R. China
Yong Dou
Ecole Polytechnique Fédérale de Lausanne (EPFL), Dépt. Physique, 1015 Lausanne, Switzerland
Ralf Gruber
HSR - Hochschule für Technik Rapperswil, Oberseestr. 10, 8640 Rapperswil, Switzerland
Josef M. Joller

Cite this paper

Zhou, J., Dou, Y., Zhao, J., Xia, F., Lei, Y., Tang, Y. (2009). A Fine-Grained Pipelined Implementation for Large-Scale Matrix Inversion on FPGA. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_9

DOI: https://doi.org/10.1007/978-3-642-03644-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science, Computer Science (R0)
