A Square Block Format for Symmetric Band Matrices

  • Fred G. Gustavson
  • José R. Herrero
  • Enric Morancho
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8384)


This contribution describes a Square Block (SB) format for storing a banded symmetric matrix. The format is obtained by rearranging the LAPACK band layout "in place" into an SB layout: the submatrices are stored as a set of square blocks. The new format reduces storage space, improves the locality of memory accesses, yields regular access patterns, and exposes parallelism.
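To make the idea concrete, the following is a minimal sketch of repacking LAPACK upper symmetric band storage into square blocks. The function name, the block partitioning, and the use of an out-of-place dictionary of blocks are illustrative assumptions for exposition; the paper's actual algorithm performs the conversion in place.

```python
import numpy as np

def band_to_square_blocks(AB, n, k, nb):
    """Repack LAPACK upper symmetric band storage into square blocks.

    AB is the (k+1) x n LAPACK band array with AB[k + i - j, j] = A[i, j]
    for max(0, j - k) <= i <= j, where k is the bandwidth.
    Returns a dict mapping block coordinates (bi, bj) to nb x nb arrays;
    only blocks that intersect the band are ever materialized.
    (Illustrative, out-of-place sketch; not the paper's in-place method.)
    """
    blocks = {}
    for j in range(n):
        for i in range(max(0, j - k), j + 1):
            bi, bj = i // nb, j // nb
            blk = blocks.setdefault((bi, bj), np.zeros((nb, nb)))
            blk[i % nb, j % nb] = AB[k + i - j, j]
    return blocks
```

Each square block is contiguous in memory, so a blocked Cholesky factorization can operate on whole blocks with unit-stride access, which is the locality and regularity benefit the abstract refers to.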


Keywords: Upper Square Block Band Format · Banded Cholesky factorization · New data storage format · Locality · Parallelism



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Fred G. Gustavson (1, 2)
  • José R. Herrero (3)
  • Enric Morancho (3)
  1. IBM T.J. Watson Research Center, New York, USA
  2. Umeå University, Umeå, Sweden
  3. Computer Architecture Department, Universitat Politècnica de Catalunya, BarcelonaTech, Barcelona, Spain
