Towards Distributed Heterogenous High-Performance Computing with ViennaCL

  • Josef Weinbub
  • Karl Rupp
  • Siegfried Selberherr
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7116)


One of the major drawbacks of computing with graphics adapters is that their limited memory restricts the problem sizes that can be handled. To overcome this limitation for the ViennaCL library, we investigate a partitioning approach for one of the standard benchmark problems in High-Performance Computing (HPC), namely the dense matrix-matrix product. We apply this partitioning approach to problems exceeding the available memory on graphics adapters. Moreover, we investigate its applicability on distributed memory systems by utilizing the Message Passing Interface (MPI). Our approach is presented in detail and benchmark results are given.
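
To illustrate the general idea of partitioning a dense matrix-matrix product, the following is a minimal sketch in plain Python. It is not the authors' implementation and does not use the ViennaCL or MPI APIs; the function names `tile_matmul` and `partitioned_matmul` and the parameters `row_block` and `col_block` are hypothetical and chosen only for illustration. The sketch assumes that one row panel of A, one column panel of B, and the resulting tile of C fit into the limited memory at the same time, so that the full product can be assembled tile by tile.

```python
def tile_matmul(a_panel, b_panel):
    """Multiply a row panel of A (r x k) by a column panel of B (k x c).

    Stand-in for the computation that would run on the memory-limited
    device; here it is an ordinary triple loop on the host.
    """
    k = len(b_panel)
    cols = len(b_panel[0])
    return [[sum(row[p] * b_panel[p][j] for p in range(k)) for j in range(cols)]
            for row in a_panel]


def partitioned_matmul(A, B, row_block, col_block):
    """Compute C = A * B one tile at a time (hypothetical helper).

    Only `row_block` rows of A and `col_block` columns of B are handed
    to tile_matmul per step, which models a device that cannot hold
    the full matrices.
    """
    n, m = len(A), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, row_block):
        a_panel = A[i0:i0 + row_block]                     # row panel of A
        for j0 in range(0, m, col_block):
            b_panel = [row[j0:j0 + col_block] for row in B]  # column panel of B
            tile = tile_matmul(a_panel, b_panel)
            # Copy the finished tile into its position in C.
            for di, tile_row in enumerate(tile):
                C[i0 + di][j0:j0 + len(tile_row)] = tile_row
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(partitioned_matmul(A, B, row_block=1, col_block=1))
```

Because each tile of C depends only on one row panel of A and one column panel of B, the tiles are independent of each other. Under the same assumptions, this independence is what would allow the tiles to be assigned to different devices or computing nodes.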


Message Passing Interface, Global Memory, Computing Node, Execution Performance, Distribute Memory System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Josef Weinbub
    • 1
  • Karl Rupp
    • 1
    • 2
  • Siegfried Selberherr
    • 1
  1. Institute for Microelectronics, TU Wien, Vienna, Austria
  2. Institute for Analysis and Scientific Computing, TU Wien, Vienna, Austria
