Creating and Debugging Performance CUDA C

Part of the Studies in Computational Intelligence book series (SCI, volume 415)

Abstract

Various practical ways of testing, locating and removing bugs in parallel general-purpose computation on graphics hardware (GPGPU) applications are described. Some of these are generic, whilst others relate directly to stochastic bioinspired techniques such as genetic programming. We pass on software engineering lessons learnt during CUDA C programming and ways to obtain high performance from nVidia GPU and Tesla cards, including examples of both successful and less successful recent applications.

Keywords

C programming, GPU, GPGPU, GPPPU, parallel computing, computer game hardware, graphics controller, rcs, randomised search

Copyright information

© Springer Berlin Heidelberg 2012

Authors and Affiliations

  1. CREST, Computer Science, Department of Computer Science, University College London, London, UK