The I/O Complexity of Strassen’s Matrix Multiplication with Recomputation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10389)

Abstract

A tight \(\varOmega((n/\sqrt{M})^{\log_2 7} M)\) lower bound is derived on the I/O complexity of Strassen’s algorithm for multiplying two \(n \times n\) matrices in a two-level storage hierarchy with \(M\) words of fast memory. A proof technique is introduced that exploits Grigoriev’s flow of the matrix multiplication function as well as some combinatorial properties of the Strassen computational directed acyclic graph (CDAG). Applications to parallel computation are also developed. The result generalizes a similar bound previously obtained under the no-recomputation constraint, that is, the assumption that intermediate results cannot be computed more than once.
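To get a feel for how the bound scales, the short sketch below evaluates the expression inside the \(\varOmega\) for given \(n\) and \(M\). It is only an illustration: the hidden constant is taken to be 1, which is an assumption and not a value stated here, and the function name `strassen_io_lower_bound` is hypothetical.

```python
import math


def strassen_io_lower_bound(n: int, M: int) -> float:
    """Evaluate (n / sqrt(M)) ** log2(7) * M.

    This is the expression inside the Omega of the bound, with the
    hidden constant assumed to be 1 for illustration only.
    n is the matrix dimension, M the number of words of fast memory.
    """
    if n <= 0 or M <= 0:
        raise ValueError("n and M must be positive")
    return (n / math.sqrt(M)) ** math.log2(7) * M


if __name__ == "__main__":
    M = 256
    for n in (16, 64, 256, 1024):
        print(f"n={n:5d}  M={M}  bound ~ {strassen_io_lower_bound(n, M):.4g}")
```

Two properties of the expression are easy to read off from this sketch: when \(M = n^2\) the value reduces to \(n^2\), the size of one matrix, and doubling \(n\) with \(M\) fixed multiplies the value by 7, mirroring the seven recursive subproblems of Strassen’s algorithm.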

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Information Engineering, University of Padova, Padova, Italy
  2. Department of Computer Science, Brown University, Providence, USA
