Power/Performance Trade-Offs of Small Batched LU Based Solvers on GPUs

  • Oreste Villa
  • Massimiliano Fatica
  • Nitin Gawande
  • Antonino Tumeo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8097)


In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphics Processing Units (GPUs), evaluating the alternatives as a function of the size of the systems to solve. We discuss three solutions that operate at different levels of parallelization and use different GPU features. The first, exploiting the CUBLAS library, handles matrices of size up to 32x32 and employs warp-level parallelism (one matrix, one warp) and shared memory. The second works at thread-block level (one matrix, one thread block), still exploiting shared memory but handling matrices up to 76x76. The third is thread-level parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and relies only on the high memory bandwidth of the GPU. The first and second solutions support only partial pivoting; the third easily supports both partial and full pivoting, making it attractive for problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as a function of the size of the linear systems that are solved simultaneously. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).
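To make the "one matrix, one thread" mapping concrete, the following is a minimal CPU-side sketch in Python of the serial work a single thread might perform: LU elimination with partial pivoting on one small system, followed by back substitution. It is an illustration only and not the authors' implementation; the function names `lu_solve_partial_pivot` and `batched_solve` are hypothetical, and the outer loop over the batch merely stands in for the independent GPU threads.

```python
def lu_solve_partial_pivot(a, b):
    """Solve a x = b for one small dense system (hypothetical helper).

    Works on copies, so the caller's data is left untouched.
    """
    n = len(a)
    a = [row[:] for row in a]
    x = b[:]
    for k in range(n):
        # Partial pivoting: choose the row with the largest magnitude
        # entry in column k, at or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            x[k], x[p] = x[p], x[k]
        # Eliminate column k below the diagonal; the right-hand side is
        # updated in the same pass (forward substitution fused in).
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            a[i][k] = m  # store the multiplier (the L factor)
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
            x[i] -= m * x[k]
    # Back substitution with the upper-triangular factor U.
    for i in range(n - 1, -1, -1):
        s = x[i]
        for j in range(i + 1, n):
            s -= a[i][j] * x[j]
        x[i] = s / a[i][i]
    return x


def batched_solve(mats, rhs):
    """Solve every system in the batch independently.

    Each iteration corresponds to the work of one thread in the
    thread-level scheme; here it simply runs sequentially.
    """
    return [lu_solve_partial_pivot(a, b) for a, b in zip(mats, rhs)]


if __name__ == "__main__":
    mats = [[[2.0, 1.0], [1.0, 3.0]], [[0.0, 1.0], [1.0, 0.0]]]
    rhs = [[3.0, 5.0], [2.0, 3.0]]
    print(batched_solve(mats, rhs))
```

Because every system is solved without reference to any other, the batch has no inter-matrix dependencies, which is what allows the same routine to be mapped to a thread, a warp, or a thread block per matrix. Full pivoting would additionally search the whole trailing submatrix for the pivot and track column swaps.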





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Oreste Villa (1)
  • Massimiliano Fatica (2)
  • Nitin Gawande (1)
  • Antonino Tumeo (1)
  1. Pacific Northwest National Laboratory, Richland, USA
  2. NVIDIA, Santa Clara, USA
