Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7133)

Abstract

Over the past five years, almost all computer manufacturers have dramatically changed their computer architectures to Multicore (MC) processors. We briefly describe Cache Blocking as it relates to computer architectures since about 1985, covering the where, when, how and why of Cache Blocking for dense linear algebra. It will be seen that the arrangement in memory of the submatrices A_ij of A that are being processed is very important.
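
The abstract's closing point, that the memory arrangement of the submatrices A_ij matters, can be illustrated with a small sketch. This is not the chapter's algorithm: the function names (`to_block_layout`, `block_offset`, `blocked_matmul`, `from_block_layout`), the square block order `nb`, and the requirement that `nb` divides `n` are all assumptions made here to keep the example short. It copies a row-major matrix into a layout where each nb-by-nb submatrix occupies one contiguous run of memory, then multiplies two matrices block by block.

```python
def to_block_layout(a, n, nb):
    """Copy an n x n row-major matrix (flat list) into square-block layout.

    Each nb x nb submatrix A_ij ends up in one contiguous run of nb*nb
    entries. Assumes nb divides n (an assumption of this sketch only).
    """
    assert n % nb == 0
    m = n // nb  # number of blocks per dimension
    out = []
    for bi in range(m):
        for bj in range(m):
            for r in range(nb):
                start = (bi * nb + r) * n + bj * nb
                out.extend(a[start:start + nb])
    return out


def block_offset(bi, bj, m, nb):
    """Index of the first entry of block (bi, bj) in block layout."""
    return (bi * m + bj) * nb * nb


def blocked_matmul(a_blk, b_blk, n, nb):
    """Compute C = A * B with all three matrices held in block layout."""
    m = n // nb
    c_blk = [0.0] * (n * n)
    for bi in range(m):
        for bj in range(m):
            c0 = block_offset(bi, bj, m, nb)
            for bk in range(m):
                a0 = block_offset(bi, bk, m, nb)
                b0 = block_offset(bk, bj, m, nb)
                # Inner kernel: touches three contiguous nb*nb runs only.
                for r in range(nb):
                    for k in range(nb):
                        a_rk = a_blk[a0 + r * nb + k]
                        for c in range(nb):
                            c_blk[c0 + r * nb + c] += a_rk * b_blk[b0 + k * nb + c]
    return c_blk


def from_block_layout(blk, n, nb):
    """Inverse of to_block_layout: return a flat row-major matrix."""
    m = n // nb
    out = [0.0] * (n * n)
    for bi in range(m):
        for bj in range(m):
            base = block_offset(bi, bj, m, nb)
            for r in range(nb):
                dst = (bi * nb + r) * n + bj * nb
                out[dst:dst + nb] = blk[base + r * nb:base + (r + 1) * nb]
    return out
```

In this layout the inner kernel reads and writes only three contiguous runs of nb*nb entries, so with a suitably chosen `nb` the working set of one block update can stay in cache. With plain row-major storage, the rows of a submatrix are instead separated by a stride of `n`.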

Editor information

Kristján Jónasson

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gustavson, F.G. (2012). Cache Blocking. In: Jónasson, K. (ed.) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28151-8_3

  • DOI: https://doi.org/10.1007/978-3-642-28151-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28150-1

  • Online ISBN: 978-3-642-28151-8

  • eBook Packages: Computer Science, Computer Science (R0)
