Large Scale Bioinformatics Data Mining with Parallel Genetic Programming on Graphics Processing Units

  • William B. Langdon
Part of the Studies in Computational Intelligence book series (SCI, volume 269)


A suitable single instruction multiple data (SIMD) genetic programming (GP) interpreter can achieve high (giga GPop per second) performance on a SIMD GPU graphics card by simultaneously running multiple diverse members of the GP population. Single program multiple data (SPMD) dataflow parallelisation is achieved because the single interpreter treats the different GP programs as data. On a single nVidia GeForce 8800 GTX GPU with 128 parallel nodes, the interpreter can outrun a compiled approach, in which data parallelism comes only from running a single program at a time across multiple inputs.
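
As an illustration only, the idea of one interpreter treating many programs as data can be sketched in plain Python. The opcode set, the postfix (reverse Polish) encoding, the NOP padding and the name run_population are assumptions made for this sketch and are not taken from the chapter. The inner loop over programs stands in for what would be parallel threads on a GPU.

```python
# Hypothetical opcode set for illustration; not the chapter's instruction set.
NOP, PUSH_X, PUSH_1, ADD, SUB, MUL = range(6)


def run_population(programs, x):
    """Evaluate every postfix program on input x using one shared interpreter loop."""
    length = max(len(p) for p in programs)
    # Pad with NOP so all programs have equal length and can step in lockstep.
    padded = [list(p) + [NOP] * (length - len(p)) for p in programs]
    stacks = [[] for _ in padded]

    for step in range(length):  # one shared instruction pointer
        # On a GPU each program would be handled by its own thread;
        # here a sequential loop plays that role.
        for prog, stack in zip(padded, stacks):
            op = prog[step]  # the program is just data read by the interpreter
            if op == PUSH_X:
                stack.append(x)
            elif op == PUSH_1:
                stack.append(1.0)
            elif op in (ADD, SUB, MUL):
                b = stack.pop()
                a = stack.pop()
                if op == ADD:
                    stack.append(a + b)
                elif op == SUB:
                    stack.append(a - b)
                else:
                    stack.append(a * b)
            # NOP: do nothing

    return [stack[-1] for stack in stacks]


programs = [
    [PUSH_X, PUSH_1, ADD],                 # x + 1
    [PUSH_X, PUSH_X, MUL, PUSH_1, SUB],    # x * x - 1
    [PUSH_X],                              # x
]
results = run_population(programs, 3.0)
print(results)  # [4.0, 8.0, 3.0]
```

Three programs of different shapes are evaluated by the same loop, and none is compiled separately. This is the sense in which the interpreter, rather than the individual program, is the unit that runs in parallel.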

The RapidMind GPGPU Linux C++ system has been demonstrated by predicting the ten-year-plus outcome of breast cancer from a dataset containing a million inputs. NCBI GEO GSE3494 contains hundreds of Affymetrix HG-U133A and HG-U133B GeneChip biopsies. Multiple GP runs, each with a population of five million programs, winnow useful variables from the chaff at more than 500 million GPops per second. Sources are available via FTP.


Keywords: Genetic Programming; Stream Processor; Graphic Hardware; Single Instruction Multiple Data; Cartesian Genetic Programming
These keywords were added by machine and not by the authors.




Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • William B. Langdon (1)
  1. King’s College London