Application of Machine Learning-Based Classification to Genomic Selection and Performance Improvement

  • Zhixu Qiu
  • Qian Cheng
  • Jie Song
  • Yunjia Tang
  • Chuang Ma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9771)


Genomic selection (GS) is a novel breeding strategy that selects individuals with high breeding value using computer programs. Although GS has long been practiced in animal breeding, its application to crops with high breeding efficiency remains challenging because of the limited training population size, genotype-environment interactions, and complex interaction patterns between molecular markers. In this study, we developed a bioinformatics pipeline to perform machine learning (ML)-based classification for GS. We built a random forest-based ML classifier that achieved better prediction performance than four widely used GS prediction models on the maize GS dataset under study. We found that the ML-based GS classification system requires a reasonable ratio between positive and negative samples in the training dataset. Moreover, we recommend more careful selection of informative SNPs to build an ML-based GS model with high prediction performance.
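The point about the ratio between positive and negative training samples can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not say how samples are labeled or balanced, so the sketch assumes that the top fraction of individuals by phenotype forms the positive class and that negatives are randomly subsampled to a chosen ratio. The function names, the parameters `top_fraction` and `neg_per_pos`, and the simulated phenotypes are all hypothetical.

```python
import random


def label_by_top_fraction(phenotypes, top_fraction=0.2):
    """Label the top `top_fraction` of individuals by phenotype as positive (1).

    Assumption: "positive" means a high phenotype value; the abstract does not
    define the classes.
    """
    n_pos = max(1, round(len(phenotypes) * top_fraction))
    # Indices ordered from the highest to the lowest phenotype value.
    ranked = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i], reverse=True)
    positives = set(ranked[:n_pos])
    return [1 if i in positives else 0 for i in range(len(phenotypes))]


def balance_training_set(labels, neg_per_pos=1.0, seed=0):
    """Keep every positive and a random subset of negatives.

    `neg_per_pos` is the hypothetical target number of negatives per positive.
    Returns the sorted indices of the retained training samples.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), round(len(pos) * neg_per_pos))
    return sorted(pos + rng.sample(neg, n_neg))


# Toy data: 100 simulated phenotype values (not taken from the paper).
rng = random.Random(42)
phenotypes = [rng.gauss(0.0, 1.0) for _ in range(100)]

labels = label_by_top_fraction(phenotypes, top_fraction=0.2)
train_idx = balance_training_set(labels, neg_per_pos=2.0)

n_pos = sum(labels[i] for i in train_idx)
n_neg = len(train_idx) - n_pos
print(f"training set: {n_pos} positives, {n_neg} negatives")
```

The retained indices could then be used to subset a marker matrix before fitting a classifier such as a random forest. Varying `neg_per_pos` is one simple way to explore how the class ratio affects prediction performance.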


Genomic selection, Marker-assisted breeding, Relative efficiency, Machine learning, Random forest



This work was supported by grants from the National Natural Science Foundation of China (No. 31570371), the Agricultural Science and Technology Innovation and Research Project of Shaanxi Province, China (No. 2015NY011), and the Fund of Northwest A&F University (No. Z111021403 and Z109021514).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Zhixu Qiu (1, 2)
  • Qian Cheng (1, 2)
  • Jie Song (1, 2)
  • Yunjia Tang (1, 2)
  • Chuang Ma (1, 2), corresponding author
  1. State Key Laboratory of Crop Stress Biology for Arid Areas, Northwest A&F University, Yangling, China
  2. Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, China
