Skip to main content

Is Cache-Oblivious DGEMM Viable?

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4699))

Abstract

We present a study of implementations of DGEMM using both the cache-oblivious and cache-conscious programming styles. The cache-oblivious programs use recursion and automatically block DGEMM operands A,B,C for the memory hierarchy. The cache-conscious programs use iteration and explicitly block A,B,C for register files, all caches and memory. Our study shows that the cache-oblivious programs achieve substantially less performance than the cache-conscious programs. We discuss why this is so and suggest approaches for improving the performance of cache-oblivious programs.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agarwal, R.C., Gustavson, F.G., Zubair, M.: Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM Journal of Research and Development 38(5), 563–576 (1994)

    Google Scholar 

  2. Belady, L.A.: A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal 5(2), 78–101 (1966)

    Google Scholar 

  3. Chatterjee, S., et al.: Design and Exploitation of a High-performance SIMD Floating-point Unit for Blue Gene/L. IBM Journal of Research and Development 49(2-3), 377–391 (2005)

    Article  Google Scholar 

  4. Frigo, M., Leiserson, C., Prokop, H., Ramachandran, S.: Cache-oblivious Algorithms. In: FOCS 1999: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, p. 285. IEEE Computer Society Press, Los Alamitos (1999)

    Google Scholar 

  5. Dongarra, J.J., Gustavson, F.G., Karp, A.: Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine. SIAM Review 26(1), 91–112 (1984)

    Article  MATH  MathSciNet  Google Scholar 

  6. Elmroth, E., Gustavson, F.G., Kågström, B., Jonsson, I.: Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software. SIAM Review 46(1), 3–45 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  7. Hong, J.-W., Kung, H.T.: I/O complexity: The red-blue pebble game. In: Proc. of the thirteenth annual ACM symposium on Theory of computing, pp. 326–333 (1981)

    Google Scholar 

  8. Gunnels, J.A., Gustavson, F.G., Henry, G.M., van de Geijn, R.A.: A Family of High-Performance Matrix Multiplication Algorithms. In: Dongarra, J.J., Madsen, K., Waśniewski, J. (eds.) PARA 2004. LNCS, vol. 3732, pp. 256–265. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  9. Gustavson, F.G.: Recursion Leads to Automatic Variable Blocking for Dense Linear-Algebra Algorithms. IBM Journal of Research and Development 41(6), 737–755 (1997)

    Google Scholar 

  10. Gustavson, F.G.: High Performance Linear Algebra Algorithms using New Generalized Data Structures for Matrices. IBM Journal of Research and Development 47(1), 31–55 (2003)

    MathSciNet  Google Scholar 

  11. Gustavson, F.G., Gunnels, J.A., Sexton, J.C.: Minimal Data Copy for Dense Linear Algebra Factorization. In: Kågström, B., Elmroth, E. (eds.) Computational Science - Para 2006. LNCS, vol. xxxx, pp. 540–549. Springer, Heidelberg (2006)

    Google Scholar 

  12. Gustavson, F.G., Henriksson, A., Jonsson, I., Kågström, B., Ling, P.: Recursive blocked data formats and BLAS’s for dense linear algebra algorithms. In: Kagström, B., Elmroth, E., Waśniewski, J., Dongarra, J.J. (eds.) PARA 1998. LNCS, vol. 1541, pp. 195–206. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  13. Gustavson, F.G., Henriksson, A., Jonsson, I., Kågström, B., Ling, P.: Superscalar GEMM-based level 3 BLAS—the on-going evolution of a portable and high-performance library. In: Kagström, B., Elmroth, E., Waśniewski, J., Dongarra, J.J. (eds.) PARA 1998. LNCS, vol. 1541, pp. 207–215. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  14. Park, N., Hong, B., Prasanna, V.K.: Tiling, Block Data Layout, and Memory Hierarchy Performance. IEEE Trans. Parallel and Distributed Systems 14(7), 640–654 (2003)

    Article  Google Scholar 

  15. Roeder, T., Yotov, K., Pingali, K., Gunnels, J., Gustavson, F.: The Price of Cache Obliviousness. Department of Computer Science, University of Texas, Austin Technical Report CS-TR-06-43 (September 2006)

    Google Scholar 

  16. Sinharoy, B., Kalla, R.N., Tendler, J.M, Kovacs, R.G., Eickemeyer, R.J., Joyner, J.B.: POWER5 System Microarchitecture. IBM Journal of Research and Development 49(4/5), 505–521 (2005)

    Google Scholar 

  17. Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated Empirical Optimization of Software and the ATLAS Project. Parallel Computing (1-2), 3–35 (2001)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Bo Kågström Erik Elmroth Jack Dongarra Jerzy Waśniewski

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gunnels, J.A., Gustavson, F.G., Pingali, K., Yotov, K. (2007). Is Cache-Oblivious DGEMM Viable?. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_109

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-75755-9_109

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75754-2

  • Online ISBN: 978-3-540-75755-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics