Planta, Volume 248, Issue 5, pp 1307–1318

A deep convolutional neural network approach for predicting phenotypes from genotypes

  • Wenlong Ma
  • Zhixu Qiu
  • Jie Song
  • Jiajia Li
  • Qian Cheng
  • Jingjing Zhai
  • Chuang Ma
Original Article

Abstract

Main conclusion

Deep learning is a promising technology to accurately select individuals with high phenotypic values based on genotypic data.

Abstract

Genomic selection (GS) is a promising breeding strategy in which the phenotypes of plant individuals are usually predicted from genome-wide genotypic markers. In this study, we present a deep learning method, named DeepGS, to predict phenotypes from genotypes. Built on a deep convolutional neural network, DeepGS uses hidden variables that jointly represent features in genotypes when making predictions; it also employs convolution, sampling and dropout strategies to reduce the complexity of high-dimensional genotypic data. We used a large GS dataset to train DeepGS and compared its performance with other methods. The experimental results indicate that DeepGS can be used as a complement to the commonly used RR-BLUP in the prediction of phenotypes from genotypes. The complementarity between DeepGS and RR-BLUP can be exploited through an ensemble learning approach to select individuals with high phenotypic values more accurately, even when outlier individuals are absent or only subsets of genotypic markers are used. The source code of DeepGS and of the ensemble learning approach has been packaged into Docker images to facilitate their application in different GS programs.
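
The ensemble idea in the abstract can be illustrated with a minimal sketch. It assumes the ensemble is a weighted average of the predictions of two models, with the weight chosen by a simple grid search that maximises Pearson correlation on validation data. The abstract does not state the combination rule or how the weight is found, so both are assumptions, and every function name and number below is hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def blend(pred_a, pred_b, w):
    """Weighted average of two prediction vectors; w is the weight on pred_a."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def pick_weight(pred_a, pred_b, observed, steps=20):
    """Grid-search w in [0, 1] for the highest correlation with observed values.

    Assumption: a plain grid search stands in for whatever optimiser
    the authors actually use.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates,
               key=lambda w: pearson(blend(pred_a, pred_b, w), observed))


# Hypothetical validation-set phenotypes and two models' predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_cnn = [1.2, 1.8, 3.5, 3.9, 4.6]    # stand-in for a CNN-based model
pred_ridge = [0.8, 2.4, 2.7, 4.3, 5.1]  # stand-in for a ridge-type model

w = pick_weight(pred_cnn, pred_ridge, observed)
ensemble = blend(pred_cnn, pred_ridge, w)
```

Because the candidate weights include 0 and 1, the blended predictions correlate with the validation phenotypes at least as well as either model alone on that same validation set; whether the gain carries over to new individuals has to be checked on held-out data.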

Keywords

Deep learning; Ensemble learning; Genomic selection; High phenotypic values; Machine learning; Genotypic marker

Abbreviations

CNN: Deep convolutional neural network
DL: Deep learning
GS: Genomic selection
MNV: Mean normalized discounted cumulative gain value
(RR)-BLUP: (Ridge regression)-Best linear unbiased prediction
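
To make the MNV abbreviation concrete, the following is a rough sketch of a normalized discounted cumulative gain (NDCG) score averaged over several top-k cutoffs. It assumes the common formulation with the observed phenotypic value as the gain and a 1/log2(rank + 1) discount, and it assumes positive phenotypic values. This page does not give the exact formula or the cutoffs used by the authors, so the function names, the cutoffs and the data are all hypothetical.

```python
import math


def dcg(values):
    """Discounted cumulative gain: the value at rank r is divided by log2(r + 1)."""
    return sum(v / math.log2(rank + 2) for rank, v in enumerate(values))


def ndcg_at_k(observed, predicted, k):
    """Gain of the top-k individuals by prediction, relative to the ideal top-k."""
    order = sorted(range(len(observed)), key=lambda i: predicted[i], reverse=True)
    picked = [observed[i] for i in order[:k]]
    ideal = sorted(observed, reverse=True)[:k]
    return dcg(picked) / dcg(ideal)


def mean_ndcg(observed, predicted, ks):
    """Mean NDCG over a set of cutoffs (assumed reading of a 'mean NDCG value')."""
    return sum(ndcg_at_k(observed, predicted, k) for k in ks) / len(ks)


# Hypothetical phenotypes and predictions for five individuals.
observed = [3.0, 1.0, 2.0, 5.0, 4.0]
predicted = [2.5, 1.5, 2.0, 4.0, 4.5]
score = mean_ndcg(observed, predicted, ks=[1, 2, 3])
```

A score of 1 means the individuals ranked highest by prediction are exactly those with the highest observed values, in the same order, at every cutoff considered.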

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China (31570371), the Agricultural Science and Technology Innovation and Research Project of Shaanxi Province, China (2015NY011), the Youth 1000-Talent Program of China, the Hundred Talents Program of Shaanxi Province of China, the Innovative Talents Promotion Project of Shaanxi Province of China (2017KJXX-67), and the Fund of Northwest A&F University.

Compliance with ethical standards

Conflict of interest

We declare that we have no competing interests.

Supplementary material

425_2018_2976_MOESM1_ESM.pdf (895 kb)
Supplementary material 1 (PDF 896 kb)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, China
  2. Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, China
  3. Biomass Energy Center for Arid and Semi-arid Lands, Northwest A&F University, Shaanxi, China
