GPU Solutions to Multi-scale Problems in Science and Engineering

Part of the series Lecture Notes in Earth System Sciences pp 639-652


High Throughput Heterogeneous Computing and Interactive Visualization on a Desktop Supercomputer

  • S. Zhang, Minnesota Supercomputing Institute, University of Minnesota
  • R. Weiss, Department of Geology and Geophysics, University of Minnesota
  • S. Wang, Department of Geology and Geophysics, University of Minnesota
  • G. A. Barnett Jr., Department of Applied Mathematics, University of Colorado
  • D. A. Yuen, Minnesota Supercomputing Institute, University of Minnesota; Department of Geology and Geophysics, University of Minnesota


A desktop supercomputer was built from scratch for less than $2500 by assembling basic parts, including a Tesla C1060 card and a GeForce GTX 295 card. This commodity desktop runs Linux together with CUDA, the Message Passing Interface (MPI), and other required software. MPI is used not only to distribute and transfer computing loads among the GPU devices, but also to control the visualization process. Several heterogeneous computing applications have been run successfully on this desktop. Calculating long-range forces in the n-body problem with the fast multipole method can consume more than 85% of the cycles and generate 480 GFLOPS of throughput. Mixed programming in CUDA-based C and Matlab has facilitated interactive visualization during simulations. One such MIMD application is the simulation of an idealized Belousov-Zhabotinsky reaction (BZR), which is distributed evenly over three GPU devices (two on the GTX 295 and one on the Tesla) through MPI and visualized at a given frequency to display the evolution of the simulated reaction. One additional MPI process is over-subscribed onto one GPU device to monitor the thermal status and memory usage of all the GPU devices as the BZR simulation progresses, further enhancing the throughput. (A movie capturing the self-organization of cellular spirals resembling the Belousov-Zhabotinsky reaction is submitted as part of the paper.) Our test runs have shown that running multiple applications on one GPU device, or running one application across multiple GPU devices, can be done as conveniently as on traditional CPUs.
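
The process layout described in the abstract (three compute processes, one per GPU device, plus one extra monitoring process sharing a device) can be illustrated with a small sketch. This is not the authors' code: the function names `split_rows` and `assign_ranks`, the grid height of 1024 rows, the row-slab decomposition, and the choice of device 0 for the monitor are all assumptions made for illustration. The sketch uses plain Python in place of MPI and CUDA calls so that it runs anywhere.

```python
# Hypothetical sketch: all names, the grid size, and the row-slab
# decomposition are assumptions, not details taken from the paper.


def split_rows(n_rows, n_workers):
    """Return (start, stop) row ranges whose sizes differ by at most one row."""
    base, extra = divmod(n_rows, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers each take one additional row.
        stop = start + base + (1 if worker < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def assign_ranks(n_devices, monitor_device=0):
    """Map process ranks to GPU device indices.

    Ranks 0..n_devices-1 each compute on their own device. One extra rank
    is over-subscribed onto `monitor_device` and only does monitoring.
    """
    roles = [
        {"rank": r, "device": r, "role": "compute"} for r in range(n_devices)
    ]
    roles.append(
        {"rank": n_devices, "device": monitor_device, "role": "monitor"}
    )
    return roles


if __name__ == "__main__":
    slabs = split_rows(n_rows=1024, n_workers=3)
    for info in assign_ranks(n_devices=3):
        if info["role"] == "compute":
            start, stop = slabs[info["rank"]]
            print(f"rank {info['rank']} -> device {info['device']}: "
                  f"rows [{start}, {stop})")
        else:
            print(f"rank {info['rank']} -> device {info['device']}: "
                  f"monitor only")
```

In a real MPI and CUDA program, each rank would presumably select its device from such a mapping before allocating memory, and neighboring compute ranks would exchange boundary rows between time steps. Those details are not given in the abstract and are left out here.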


Keywords: CUDA, SIMD, MIMD, Matlab, MPI, Heterogeneous computing