Abstract
Matrix multiplication is an essential building block of many linear algebra operations and applications. This paper presents parallel algorithms that keep a shared A or B matrix in memory, designed for the massively multithreaded Fiteng1000 processor. We discuss how these parallel matrix multiplication algorithms are implemented on this many-threaded multi-core processor. For good performance, three choices matter: the 2D thread spatial topography, the memory layer in which the matrices are placed, and the sizes of the matrices. The parallel codes are written in C and assembly language under the OpenMP parallel programming environment. Performance results on the Fiteng1000 processor show that the algorithms parallelize well and achieve near-peak performance.
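The idea of arranging threads in a 2D grid over the output matrix can be illustrated with a minimal C/OpenMP sketch. It is not the paper's implementation: the function name `matmul_2d_grid`, the helper `block_start`, the row-major layout, and the grid parameters `pr` and `pc` are all assumptions made for illustration, and the sketch omits the assembly kernels and memory-layer placement the abstract mentions. Each tile of C is computed independently, so tiles in the same grid row read the same row panel of A, and tiles in the same grid column read the same column panel of B.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: first index of block `b` when `n` items are split
   into `parts` nearly equal contiguous blocks. */
static size_t block_start(size_t n, size_t parts, size_t b)
{
    return (n * b) / parts;
}

/* Computes C (m x n) = A (m x k) * B (k x n); all matrices are row-major.
   The pr x pc grid of tiles stands in for a 2D arrangement of threads
   (an assumption of this sketch, not the paper's actual scheme). */
void matmul_2d_grid(size_t m, size_t n, size_t k,
                    const double *A, const double *B, double *C,
                    int pr, int pc)
{
    assert(pr > 0 && pc > 0);

    /* One loop iteration per tile; OpenMP distributes tiles over threads.
       Without OpenMP the pragma is ignored and the loops run serially. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ti = 0; ti < pr; ++ti) {
        for (int tj = 0; tj < pc; ++tj) {
            /* Row range [i0, i1) and column range [j0, j1) of this tile. */
            size_t i0 = block_start(m, (size_t)pr, (size_t)ti);
            size_t i1 = block_start(m, (size_t)pr, (size_t)ti + 1);
            size_t j0 = block_start(n, (size_t)pc, (size_t)tj);
            size_t j1 = block_start(n, (size_t)pc, (size_t)tj + 1);

            for (size_t i = i0; i < i1; ++i) {
                /* Clear this tile's part of row i before accumulating. */
                for (size_t j = j0; j < j1; ++j)
                    C[i * n + j] = 0.0;
                /* i-p-j order keeps accesses to B and C unit-stride. */
                for (size_t p = 0; p < k; ++p) {
                    double a = A[i * k + p];
                    for (size_t j = j0; j < j1; ++j)
                        C[i * n + j] += a * B[p * n + j];
                }
            }
        }
    }
}
```

Because tiles never overlap, no synchronization on C is needed. The shape of the grid (`pr` by `pc`) controls how much of A and B each thread touches, which is one reason the choice of thread layout can affect performance.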
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Liu, J. et al. (2012). High-Performance Matrix Multiply on a Massively Multithreaded Fiteng1000 Processor. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_18
DOI: https://doi.org/10.1007/978-3-642-33065-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33064-3
Online ISBN: 978-3-642-33065-0
eBook Packages: Computer Science (R0)