Abstract
Matrix-vector multiplication is a core computation in many scientific computing algorithms, and mapping it efficiently onto vector processors is a difficult problem. In this study, motivated by the use of the back-propagation (BP) algorithm in deep learning applications, and building on an in-depth analysis of the BP algorithm and the characteristics of the vector processor architecture, we propose an efficient vectorization method for matrix-vector multiplication. The L1D is configured in SRAM mode, and a double-buffer “ping-pong” scheme smooths data transfer across the multilevel storage hierarchy, so that kernel computation overlaps with DMA data movement and the kernel runs at peak speed, achieving the best computational efficiency. Transferring the matrix in transposed form via DMA avoids both inefficient column-wise access to the matrix and floating-point summation reduction across the VPEs, yielding optimal kernel computing performance. Experimental results on MATRIX2 show that the single-core performance of the presented double-precision matrix multiplication reaches 94.45 GFLOPS, and the efficiency of kernel computation reaches 99.39%.
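The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the tile size, and the plain list slicing that stands in for a DMA transfer are all assumptions made for illustration. The sketch computes y = Ax from the transposed matrix, so each step broadcasts one scalar x[j] and does a lane-wise multiply-add, and no summation across lanes is needed. It also alternates between two buffers in "ping-pong" fashion, although in plain Python the "transfer" and the computation run one after the other rather than overlapping.

```python
# Illustrative sketch only: names, tile size, and the sequential "DMA"
# are assumptions, not the paper's implementation.

def transpose(a):
    """Stand-in for a transposed DMA transfer: returns A^T as a list of rows."""
    return [list(col) for col in zip(*a)]


def matvec_transposed(a_t, x, tile_rows=2):
    """Compute y = A x from A^T without any cross-lane reduction.

    Row j of A^T is column j of A. Each step broadcasts the scalar x[j]
    and does a lane-wise multiply-add into y, so lane i only touches y[i].
    """
    n_cols = len(a_t)       # number of columns of A
    n_rows = len(a_t[0])    # number of rows of A (one "lane" per row)
    y = [0.0] * n_rows

    # Tiles of A^T rows, each small enough to fit one buffer.
    tiles = [(s, min(s + tile_rows, n_cols)) for s in range(0, n_cols, tile_rows)]

    def load(tile):
        start, stop = tile
        return a_t[start:stop]  # stand-in for a DMA transfer into local SRAM

    # Two buffers used alternately: while the kernel consumes one, the
    # other is refilled. On real hardware the refill would run
    # concurrently with the computation; here it is sequential.
    buffers = [None, None]
    buffers[0] = load(tiles[0])  # prefetch the first tile

    for t, (start, _stop) in enumerate(tiles):
        cur = t % 2
        if t + 1 < len(tiles):
            buffers[1 - cur] = load(tiles[t + 1])  # issue the next transfer
        for offset, row in enumerate(buffers[cur]):
            xj = x[start + offset]          # scalar broadcast to all lanes
            for i in range(n_rows):         # lane-wise multiply-add
                y[i] += xj * row[i]
    return y


def matvec_reference(a, x):
    """Row-wise dot products, which need a sum reduction per output element."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]
```

For a small matrix, `matvec_transposed(transpose(a), x)` and `matvec_reference(a, x)` return the same values. The difference is the access pattern: the reference version reduces a whole row into one scalar, while the transposed version only accumulates element-wise.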
Acknowledgements
This work is supported by the National Natural Science Foundation of China (grants 61133007 and 61572025).
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, J., Guo, Y., Hu, X. (2017). Vectorizable Design and Implementation of Matrix Multiplication on Vector Processor. In: Bhatia, S., Mishra, K., Tiwari, S., Singh, V. (eds) Advances in Computer and Computational Sciences. Advances in Intelligent Systems and Computing, vol 553. Springer, Singapore. https://doi.org/10.1007/978-981-10-3770-2_11
DOI: https://doi.org/10.1007/978-981-10-3770-2_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3769-6
Online ISBN: 978-981-10-3770-2
eBook Packages: Engineering, Engineering (R0)