Creating and Debugging Performance CUDA C

Part of the Studies in Computational Intelligence book series (SCI, volume 415)


Various practical ways of testing, locating and removing bugs in parallel general-purpose computation on graphics hardware (GPGPU) applications are described. Some of these are generic, whilst others relate directly to stochastic bioinspired techniques such as genetic programming. We pass on software engineering lessons learnt during CUDA C programming and ways to obtain high performance from nVidia GPU and Tesla cards, including examples of both successful and less successful recent applications.


C programming, GPU, GPGPU, GPPPU, parallel computing, computer game hardware, graphics controller, rcs, randomised search





Copyright information

© Springer Berlin Heidelberg 2012

Authors and Affiliations

  1. CREST, Computer Science, Department of Computer Science, University College London, London, UK
