The I/O Complexity of Strassen’s Matrix Multiplication with Recomputation
Conference paper
Abstract
A tight \(\varOmega ((n/\sqrt{M})^{\log _2 7}M)\) lower bound is derived on the I/O complexity of Strassen’s algorithm for multiplying two \(n \times n\) matrices in a two-level storage hierarchy with \(M\) words of fast memory. A proof technique is introduced that exploits Grigoriev’s flow of the matrix multiplication function as well as some combinatorial properties of the Strassen computational directed acyclic graph (CDAG). Applications to parallel computation are also developed. The result generalizes a similar bound previously obtained under the no-recomputation constraint, that is, the assumption that intermediate results cannot be computed more than once.
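To give a feel for how the bound in the abstract scales, the following minimal sketch evaluates the expression \((n/\sqrt{M})^{\log _2 7}M\) numerically. It is an illustration only: the function name `strassen_io_bound` is hypothetical, and the constant hidden by the \(\varOmega\) notation is taken to be 1, which the abstract does not state.

```python
import math


def strassen_io_bound(n: int, M: int) -> float:
    """Evaluate (n / sqrt(M)) ** log2(7) * M.

    Assumption: the constant factor hidden by the Omega notation is
    taken to be 1, so only ratios between values are meaningful.
    """
    if n <= 0 or M <= 0:
        raise ValueError("n and M must be positive")
    return (n / math.sqrt(M)) ** math.log2(7) * M


if __name__ == "__main__":
    base = strassen_io_bound(1024, 1024)
    # Doubling n multiplies the expression by 2 ** log2(7) = 7.
    print(strassen_io_bound(2048, 1024) / base)
    # Quadrupling M halves n / sqrt(M), giving a factor 4 * (1/7) = 4/7.
    print(strassen_io_bound(1024, 4096) / base)
```

Under this reading, doubling the matrix dimension multiplies the expression by 7, while quadrupling the fast memory reduces it only by a factor of 4/7.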
Copyright information
© Springer International Publishing AG 2017