High Throughput Heterogeneous Computing and Interactive Visualization on a Desktop Supercomputer

  • S. Zhang
  • R. Weiss
  • S. Wang
  • G. A. Barnett Jr.
  • D. A. Yuen
Chapter
Part of the Lecture Notes in Earth System Sciences book series (LNESS)

Abstract

For less than $2500, we built a desktop supercomputer from scratch out of basic parts, including a Tesla C1060 card and a GeForce GTX 295 card. This commodity desktop runs Linux together with CUDA, the Message Passing Interface (MPI), and other needed software. MPI is used not only to distribute and transfer computing loads among the GPU devices, but also to control the visualization process. Several heterogeneous computing applications have run successfully on this desktop. Calculating long-range forces in the n-body problem with the fast multipole method can consume more than 85% of the cycles and generate 480 GFLOPS of throughput. Mixed programming in CUDA-based C and Matlab has facilitated interactive visualization during simulations. One such MIMD application is the simulation of an idealized Belousov-Zhabotinsky Reaction (BZR), which is distributed evenly across three GPU devices (two on the GTX 295 and one on the Tesla) through MPI and visualized at a set frequency to display the evolution of the simulated reaction. One additional MPI process is over-subscribed onto one GPU device to monitor the thermal status and memory usage of all the GPU devices as the BZR simulation progresses, further enhancing the throughput. (A movie capturing the self-organization of cellular spirals resembling the Belousov-Zhabotinsky Reaction is submitted as part of the paper.) Our test runs have shown that running multiple applications on one GPU device, or one application across multiple GPU devices, can be done as conveniently as on traditional CPUs.
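The process layout described in the abstract (three compute processes spread evenly over three GPU devices, plus one extra monitoring process over-subscribed onto a device) can be illustrated with a minimal sketch. This is plain Python rather than the authors' CUDA/MPI code, and every name in it (`device_for_rank`, `split_rows`, the round-robin mapping, the 512-row grid) is a hypothetical choice made for illustration; the chapter does not state how ranks were actually mapped to devices or how the domain was partitioned.

```python
from typing import List, Tuple

# Assumed layout, taken from the abstract: 3 GPU devices, 3 compute
# processes, and 1 additional monitoring process (4 ranks in total).
N_DEVICES = 3
N_COMPUTE_RANKS = 3
MONITOR_RANK = 3  # hypothetical: the last rank does the monitoring


def device_for_rank(rank: int, n_devices: int = N_DEVICES) -> int:
    """Map a process rank to a device index round-robin (an assumption).

    With 4 ranks and 3 devices, rank 3 wraps around to device 0,
    which is what over-subscribing one device looks like.
    """
    return rank % n_devices


def split_rows(n_rows: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split n_rows into contiguous [start, stop) ranges, as evenly as possible."""
    base, extra = divmod(n_rows, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if worker < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


if __name__ == "__main__":
    # Hypothetical 512-row simulation grid shared by the compute ranks.
    for rank, rows in enumerate(split_rows(512, N_COMPUTE_RANKS)):
        print(f"compute rank {rank}: device {device_for_rank(rank)}, rows {rows}")
    print(f"monitor rank {MONITOR_RANK}: device {device_for_rank(MONITOR_RANK)}")
```

In an actual MPI program, each process would presumably obtain its rank from the MPI runtime and select its device through the CUDA runtime before launching kernels; the sketch only shows the bookkeeping behind an even split with one over-subscribed device.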

Keywords

CUDA, SIMD, MIMD, Matlab, MPI, Heterogeneous computing

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • S. Zhang (1)
  • R. Weiss (2)
  • S. Wang (2)
  • G. A. Barnett Jr. (3)
  • D. A. Yuen (1, 2)
  1. Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, USA
  2. Department of Geology and Geophysics, University of Minnesota, Minneapolis, USA
  3. Department of Applied Mathematics, University of Colorado, Boulder, USA
