
Vectorizable Design and Implementation of Matrix Multiplication on Vector Processor

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 553))

Abstract

Matrix-vector multiplication lies at the core of many algorithms in scientific computing, yet mapping it efficiently onto vector processors is a difficult problem. In this study, motivated by the back-propagation (BP) algorithm used in deep-learning applications, we analyze the BP algorithm in depth and, guided by the characteristics of the vector-processor architecture, propose an efficient vectorization method for matrix-vector multiplication. The L1D is configured in SRAM mode, and a double-buffered "ping-pong" scheme smooths data transfers through the multi-level memory hierarchy, so that kernel computation overlaps with DMA data movement and the kernel runs at peak speed, achieving the best computational efficiency. Transferring the matrix in transposed form via DMA avoids inefficient column-wise accesses and floating-point summation reductions across the vector processing elements (VPEs), yielding optimal kernel computing performance. Experimental results on MATRIX2 show that the presented double-precision matrix multiplication reaches 94.45 GFLOPS on a single core, with a kernel computation efficiency of 99.39%.



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grants 61133007 and 61572025).

Author information


Corresponding author

Correspondence to Junyang Zhang.


Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

Cite this paper

Zhang, J., Guo, Y., Hu, X. (2017). Vectorizable Design and Implementation of Matrix Multiplication on Vector Processor. In: Bhatia, S., Mishra, K., Tiwari, S., Singh, V. (eds) Advances in Computer and Computational Sciences. Advances in Intelligent Systems and Computing, vol 553. Springer, Singapore. https://doi.org/10.1007/978-981-10-3770-2_11

  • DOI: https://doi.org/10.1007/978-981-10-3770-2_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3769-6

  • Online ISBN: 978-981-10-3770-2

  • eBook Packages: Engineering (R0)
