Hybrid 2D/1D Blocking as Optimal Matrix-Matrix Multiplication

Gusev, Marjan; Ristov, Sasko; Velkoski, Goran

doi:10.1007/978-3-642-37169-1_2


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 207)

Included in the following conference series:

International Conference on ICT Innovations


Abstract

Multiplying huge matrices generates more cache misses than multiplying smaller ones. Decomposing the matrices into 2D blocks that fit in the L1 CPU cache reduces cache misses, since the operations then access only data stored in the L1 cache. However, 2D blocking also requires additional reads, writes, and operations compared to 1D partitioning, since the blocks are read multiple times.
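
The following minimal Python sketch makes the 2D approach concrete. It assumes 8-byte elements and sizes the square block so that one tile each of A, B and C fits in an L1 data cache whose size is passed in as a parameter. The function names and the three-tile sizing rule are illustrative assumptions, not taken from the paper. Pure Python will not show the cache effects themselves; only the loop structure carries over to a compiled implementation.

```python
# Hypothetical sketch of square 2D blocking (tiling) for C = A x B.
# Assumptions (not from the paper): matrices are n x n lists of lists of
# floats, elements are 8-byte doubles, and the L1 size is a parameter.
import math


def square_block_size(l1_bytes: int, elem_bytes: int = 8) -> int:
    """Largest b such that three b x b tiles (of A, B and C) fit in l1_bytes."""
    return math.isqrt(l1_bytes // (3 * elem_bytes))


def matmul_2d_blocked(A, B, b):
    """Multiply n x n matrices A and B using square b x b tiles."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, b):          # row band of A and C
        for kk in range(0, n, b):      # column band of A / row band of B
            for jj in range(0, n, b):  # column band of B and C
                # Accumulate tile A[ii.., kk..] x tile B[kk.., jj..]
                # into tile C[ii.., jj..]; edge tiles may be smaller than b.
                for i in range(ii, min(ii + b, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + b, n)):
                        a = Ai[k]
                        Bk = B[k]
                        for j in range(jj, min(jj + b, n)):
                            Ci[j] += a * Bk[j]
    return C
```

For example, `square_block_size(64 * 1024)` gives 52, i.e. 52 × 52 tiles for an assumed 64 KiB cache. In this loop order each tile of B is revisited once per row band `ii` and each tile of C once per band `kk`, which illustrates the repeated reads that the abstract attributes to 2D blocking.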

In this paper we propose a new hybrid 2D/1D partitioning that exploits the advantages of both approaches. The idea is first to partition the matrices into 2D blocks and then to multiply each block using 1D partitioning, in order to minimize cache misses. As in 2D block decomposition, we select a block size that fits in the L1 cache, but we use rectangular instead of square blocks in order to minimize the number of operations and to reduce cache associativity effects. The experiments show that the proposed algorithm outperforms the 2D blocking algorithm for huge matrices on an AMD Phenom CPU.
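
One possible reading of the hybrid scheme is shown below as a hypothetical Python sketch, not the authors' algorithm. B is cut into rectangular blocks of `bk` rows and `bj` columns (the 2D level), with `bj` much larger than `bk` so that each block row is a long contiguous run. Each block is then multiplied by streaming all rows of A past it, one row at a time (the 1D level). The names `rect_block_rows` and `matmul_hybrid`, the 8-byte element size, and the rule that only the B block must fit in L1 are all assumptions.

```python
# Hypothetical sketch of a hybrid 2D/1D scheme (an interpretation, not the
# paper's code). Assumptions: n x n lists of lists of floats, 8-byte
# elements, and only the rectangular block of B is sized to fit in L1.


def rect_block_rows(l1_bytes: int, cols: int, elem_bytes: int = 8) -> int:
    """Rows of a rectangular (rows x cols) block of B that fits in l1_bytes."""
    return max(1, l1_bytes // (cols * elem_bytes))


def matmul_hybrid(A, B, bk, bj):
    """C = A x B using rectangular bk x bj blocks of B.

    2D level: B is cut into rectangular blocks of bk rows and bj columns.
    1D level: every row of A is streamed past the current block, updating
    one contiguous row segment of C at a time.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for jj in range(0, n, bj):          # column band of B and C
        j_end = min(jj + bj, n)
        for kk in range(0, n, bk):      # row band of B / column band of A
            k_end = min(kk + bk, n)
            for i in range(n):          # 1D sweep over all rows of A and C
                Ai, Ci = A[i], C[i]
                for k in range(kk, k_end):
                    a = Ai[k]
                    Bk = B[k]
                    for j in range(jj, j_end):  # long contiguous block row
                        Ci[j] += a * Bk[j]
    return C
```

With an assumed 64 KiB cache and 512-column blocks, `rect_block_rows(64 * 1024, 512)` gives 16, i.e. 16 × 512 blocks of B. The idea behind the long, narrow shape is that the innermost loop runs along a full block row, so there are fewer loop restarts per multiply-add than with square tiles of the same area. This sketch does not claim to reproduce the paper's operation count or its associativity analysis.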



Author information

Authors and Affiliations

Faculty of Information Sciences and Computer Engineering, Ss. Cyril and Methodius University, Rugjer Boshkovikj 16, 1000, Skopje, Macedonia
Marjan Gusev, Sasko Ristov & Goran Velkoski


Corresponding author

Correspondence to Marjan Gusev.

Editor information

Editors and Affiliations

Faculty of Computer Science, Ss. Cyril and Methodius University, Ruger Boskovic 16, PO Box 393, Skopje, 1000, Macedonia
Smile Markovski
Faculty of Information Sciences, Ss. Cyril and Methodius University, Ruger Boskovic 16, Skopje, 1000, Macedonia
Marjan Gusev



About this paper

Cite this paper

Gusev, M., Ristov, S., Velkoski, G. (2013). Hybrid 2D/1D Blocking as Optimal Matrix-Matrix Multiplication. In: Markovski, S., Gusev, M. (eds) ICT Innovations 2012. Advances in Intelligent Systems and Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37169-1_2


DOI: https://doi.org/10.1007/978-3-642-37169-1_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37168-4
Online ISBN: 978-3-642-37169-1
eBook Packages: Engineering, Engineering (R0)
