Journal of Real-Time Image Processing, Volume 5, Issue 3, pp 179–193

Orders-of-magnitude performance increases in GPU-accelerated correlation of images from the International Space Station

  • Peter J. Lu (corresponding author)
  • Hidekazu Oki
  • Catherine A. Frey
  • Gregory E. Chamitoff
  • Leroy Chiao
  • Edward M. Fincke
  • C. Michael Foale
  • Sandra H. Magnus
  • William S. McArthur Jr.
  • Daniel M. Tani
  • Peggy A. Whitson
  • Jeffrey N. Williams
  • William V. Meyer
  • Ronald J. Sicker
  • Brion J. Au
  • Mark Christiansen
  • Andrew B. Schofield
  • David A. Weitz
Original Research Paper


We implement image correlation, a fundamental component of many real-time imaging and tracking systems, on a graphics processing unit (GPU) using NVIDIA’s CUDA platform. We use our code to analyze images of liquid-gas phase separation in a model colloid-polymer system, photographed in the absence of gravity aboard the International Space Station (ISS). Our GPU code is 4,000 times faster than simple MATLAB code performing the same calculation on a central processing unit (CPU), 130 times faster than simple C code, and 30 times faster than optimized C++ code using single-instruction, multiple-data (SIMD) extensions. The speed increases from these parallel algorithms enable us to analyze images downlinked from the ISS rapidly and send feedback to astronauts on orbit while the experiments are still being run.
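To make the calculation named above concrete, here is a minimal sketch of direct-space image autocorrelation, written in pure Python for readability. It is not the authors' MATLAB, C, C++ or CUDA code. The function name `autocorrelate`, the list-of-rows image format, the non-periodic boundaries and the normalization by the number of overlapping pixel pairs are all assumptions made for illustration.

```python
def autocorrelate(img, max_dx, max_dy):
    """Direct-space autocorrelation g(dx, dy) of a 2D image (illustrative sketch).

    img is a list of equal-length rows of numbers. For each shift with
    0 <= dx < max_dx and 0 <= dy < max_dy, g[dy][dx] is the mean of
    img[y][x] * img[y + dy][x + dx] over all pixel pairs that both lie
    inside the image (non-periodic boundaries, an assumption here).
    """
    h, w = len(img), len(img[0])
    g = [[0.0] * max_dx for _ in range(max_dy)]
    # Every (dx, dy) entry is independent of the others, so this outer
    # double loop is the work a GPU thread or a SIMD lane could take over.
    for dy in range(max_dy):
        for dx in range(max_dx):
            total, n = 0.0, 0
            for y in range(h - dy):
                row, shifted = img[y], img[y + dy]
                for x in range(w - dx):
                    total += row[x] * shifted[x + dx]
                    n += 1
            # Shifts larger than the image have no overlapping pairs.
            g[dy][dx] = total / n if n else 0.0
    return g


if __name__ == "__main__":
    checker = [[1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]]
    print(autocorrelate(checker, 2, 2))  # [[0.5, 0.0], [0.0, 0.5]]
```

The cost grows with the number of pixels times the number of shifts, which is why a naive CPU loop is slow. Because each shift (dx, dy) is computed independently, the outer loops parallelize naturally; that general property is what SIMD and GPU implementations of correlation can exploit.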


Keywords: GPU · CUDA · Autocorrelation · International Space Station · SIMD



This work was supported by NASA grant NNX08AE09G and the NVIDIA professor partnership program. We thank A. Bik, J. Curley, A. Ghuloum, D. Luebke, E. Phillips, B. Saar, H. Saito, M. Schnubbel-Stutz, L. Vogt, and many helpful individuals throughout NASA and its contractors.



Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Peter J. Lu (1; corresponding author)
  • Hidekazu Oki (2)
  • Catherine A. Frey (3)
  • Gregory E. Chamitoff (4)
  • Leroy Chiao (4)
  • Edward M. Fincke (4)
  • C. Michael Foale (4)
  • Sandra H. Magnus (4)
  • William S. McArthur Jr. (4)
  • Daniel M. Tani (4)
  • Peggy A. Whitson (4)
  • Jeffrey N. Williams (4)
  • William V. Meyer (5)
  • Ronald J. Sicker (5)
  • Brion J. Au (6)
  • Mark Christiansen (7)
  • Andrew B. Schofield (8)
  • David A. Weitz (1)
  1. Department of Physics and SEAS, Harvard University, Cambridge, USA
  2. Shinagawa-ku, Tokyo, Japan
  3. ZIN Technologies, Inc., Middleburg Heights, USA
  4. International Space Station, Low Earth Orbit, and NASA Johnson Space Center, Houston, USA
  5. NASA Glenn Research Center, Cleveland, USA
  6. United Space Alliance and NASA Johnson Space Center, Houston, USA
  7. Flowseeker LLC, San Francisco, USA
  8. The School of Physics and Astronomy, University of Edinburgh, Edinburgh, UK
