Abstract
One of the major drawbacks of computing on graphics adapters is the limited memory available for relevant problem sizes. To overcome this limitation for the ViennaCL library, we investigate a partitioning approach for one of the standard benchmark problems in High-Performance Computing (HPC), namely the dense matrix-matrix product. We apply this partitioning approach to problems that exceed the available memory on graphics adapters. Moreover, we investigate its applicability to distributed memory systems by employing the Message Passing Interface (MPI). Our approach is presented in detail, and benchmark results are given.
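The partitioning idea underlying such out-of-core matrix products can be sketched as follows. This is a minimal illustration in plain Python, not the ViennaCL implementation: the product C = A·B is decomposed into block products, so that only a few blocks need to reside in device memory at any time; the actual host-device (or MPI) transfers are only indicated by comments.

```python
def matmul_blocked(A, B, n, bs):
    """Blocked dense matrix-matrix product C = A * B.

    A, B are n x n matrices given as nested lists; bs is the block
    size (assumed to divide n evenly in this sketch).
    """
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, bs):          # block row of C
        for j0 in range(0, n, bs):      # block column of C
            for k0 in range(0, n, bs):  # accumulate A(i0,k0) * B(k0,j0)
                # In an out-of-core GPU setting, the three blocks
                # A(i0,k0), B(k0,j0), C(i0,j0) would be transferred
                # to the device here; in an MPI setting, they would
                # be communicated between ranks.
                for i in range(i0, i0 + bs):
                    for j in range(j0, j0 + bs):
                        s = C[i][j]
                        for k in range(k0, k0 + bs):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C
```

Because each block triple is processed independently of the others, the working set is bounded by the block size rather than by the full problem size, which is what makes problems larger than device memory tractable.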
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Weinbub, J., Rupp, K., Selberherr, S. (2012). Towards Distributed Heterogenous High-Performance Computing with ViennaCL. In: Lirkov, I., Margenov, S., Waśniewski, J. (eds) Large-Scale Scientific Computing. LSSC 2011. Lecture Notes in Computer Science, vol 7116. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29843-1_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29842-4
Online ISBN: 978-3-642-29843-1