
Predicting Rice Phenotypes with Meta-learning

  • Conference paper
  • Discovery Science (DS 2018)

Abstract

The features in some machine learning datasets can naturally be divided into groups. Genomic data is one example: its features can be grouped by chromosome. In many applications these groupings are ignored, because interactions may exist between features belonging to different groups. However, including a group that does not influence the response adds noise when fitting a model, which leads to suboptimal predictive accuracy. Here we present two general frameworks for generating and combining meta-features when feature groupings are present. We evaluated both frameworks on a genomic rice dataset, where the regression task is to predict plant phenotypes, and conclude that each framework has its own use cases.
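The abstract does not spell out either framework, so the following is only a minimal sketch of the general idea under stated assumptions: ordinary stacking over feature groups, where one ridge regression is fitted per group (for example per chromosome), its 5-fold out-of-fold predictions become meta-features, and a second ridge regression combines them. The choice of ridge as both base and meta learner, the fold scheme, the function names (`grouped_stack`, `fit_ridge`, `out_of_fold`) and the synthetic data are all hypothetical illustrations, not the authors' method.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(rows, y, lam=1.0):
    """Ridge regression with an unpenalised intercept; returns [intercept, w1, w2, ...]."""
    a = [[1.0] + list(r) for r in rows]
    p = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # penalise every weight except the intercept
        ata[i][i] += lam
    aty = [sum(row[i] * t for row, t in zip(a, y)) for i in range(p)]
    return solve(ata, aty)


def predict(w, rows):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in rows]


def select(rows, cols):
    """Keep only the columns that belong to one feature group."""
    return [[r[c] for c in cols] for r in rows]


def out_of_fold(rows, y, k, lam):
    """k-fold out-of-fold predictions, so no meta-feature is fitted on its own target."""
    n = len(rows)
    preds = [0.0] * n
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        w = fit_ridge([rows[i] for i in train], [y[i] for i in train], lam)
        for i, p in zip(test, predict(w, [rows[i] for i in test])):
            preds[i] = p
    return preds


def grouped_stack(rows, y, groups, k=5, lam=1.0):
    """One base model per feature group, then a meta-model on their out-of-fold predictions."""
    names = sorted(groups)
    meta_cols = [out_of_fold(select(rows, groups[g]), y, k, lam) for g in names]
    meta_w = fit_ridge([list(t) for t in zip(*meta_cols)], y, lam)
    # Refit each base model on all rows for use at prediction time.
    base_ws = {g: fit_ridge(select(rows, groups[g]), y, lam) for g in names}

    def model(new_rows):
        feats = [predict(base_ws[g], select(new_rows, groups[g])) for g in names]
        return predict(meta_w, [list(t) for t in zip(*feats)])

    return model, dict(zip(names, meta_w[1:]))


# Hypothetical synthetic data: six features in three "chromosome" groups,
# where only the chr1 features influence the response.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [2 * r[0] - r[1] + rng.gauss(0, 0.1) for r in rows]
groups = {"chr1": [0, 1], "chr2": [2, 3], "chr3": [4, 5]}

model, weights = grouped_stack(rows, y, groups)
print(weights)  # meta-model weight for each group's meta-feature
```

On this synthetic data the meta-model is expected to place nearly all of its weight on the `chr1` meta-feature and very little on `chr2` and `chr3`. This illustrates one way a per-group combiner can down-weight groups that do not influence the response. The out-of-fold step matters because in-sample base predictions would overstate how informative each group is.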




Author information


Correspondence to Oghenejokpeme I. Orhobor.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Orhobor, O.I., Alexandrov, N.N., King, R.D. (2018). Predicting Rice Phenotypes with Meta-learning. In: Soldatova, L., Vanschoren, J., Papadopoulos, G., Ceci, M. (eds) Discovery Science. DS 2018. Lecture Notes in Computer Science, vol 11198. Springer, Cham. https://doi.org/10.1007/978-3-030-01771-2_10


  • DOI: https://doi.org/10.1007/978-3-030-01771-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01770-5

  • Online ISBN: 978-3-030-01771-2

  • eBook Packages: Computer Science, Computer Science (R0)
