Predicting Rice Phenotypes with Meta-learning

Orhobor, Oghenejokpeme I.; Alexandrov, Nickolai N.; King, Ross D.

doi:10.1007/978-3-030-01771-2_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11198)

Included in the following conference series:

International Conference on Discovery Science

Abstract

The features in some machine learning datasets can naturally be divided into groups. This is the case with genomic data, where features can be grouped by chromosome. In many applications it is common for these groupings to be ignored, as interactions may exist between features belonging to different groups. However, including a group that does not influence a response introduces noise when fitting a model, leading to suboptimal predictive accuracy. Here we present two general frameworks for the generation and combination of meta-features when feature groupings are present. We evaluated the frameworks on a genomic rice dataset where the regression task is to predict plant phenotype. We conclude that there are use cases for both frameworks.
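The abstract does not spell out the two frameworks, so the following is only a minimal sketch of the general idea of generating and combining meta-features from feature groups, not the authors' method. It assumes one ridge regression model per group as the base learner, out-of-fold predictions as the meta-features, and a second ridge model as the combiner. The data are synthetic, and all names (`ridge_fit`, `group_meta_features`, `groups`) are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression on centred data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def ridge_predict(model, X):
    w, b = model
    return X @ w + b


def group_meta_features(X, y, groups, n_folds=5, lam=1.0, seed=0):
    """Build one meta-feature per feature group from out-of-fold predictions."""
    n = X.shape[0]
    group_ids = np.unique(groups)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    Z = np.zeros((n, len(group_ids)))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        for j, g in enumerate(group_ids):
            cols = np.flatnonzero(groups == g)  # features in this group only
            model = ridge_fit(X[np.ix_(train_idx, cols)], y[train_idx], lam)
            Z[test_idx, j] = ridge_predict(model, X[np.ix_(test_idx, cols)])
    return Z


# Synthetic stand-in for grouped genomic data (assumption: 4 groups of 10
# features each, playing the role of chromosomes).
rng = np.random.default_rng(42)
n_samples, n_groups, per_group = 200, 4, 10
groups = np.repeat(np.arange(n_groups), per_group)
X = rng.normal(size=(n_samples, n_groups * per_group))

# Only group 0 influences the response; the other groups are pure noise.
beta = np.zeros(X.shape[1])
beta[:per_group] = rng.normal(size=per_group)
y = X @ beta + 0.1 * rng.normal(size=n_samples)

# Level 0: per-group meta-features. Level 1: combine them with a meta-learner.
Z = group_meta_features(X, y, groups)
meta_w, meta_b = ridge_fit(Z, y, lam=1.0)
print("meta-learner weight per group:", np.round(meta_w, 3))
```

In this toy setting the combiner can down-weight groups that do not influence the response, which is the motivation the abstract gives for handling groups separately. Interactions between features in different groups are not captured by this particular sketch.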

Author information

Authors and Affiliations

The University of Manchester, Manchester, M13 9PL, United Kingdom
Oghenejokpeme I. Orhobor & Ross D. King
The International Rice Research Institute, Los Baños, Philippines
Nickolai N. Alexandrov

Corresponding author

Correspondence to Oghenejokpeme I. Orhobor.

Editor information

Editors and Affiliations

Goldsmiths University of London, London, UK
Larisa Soldatova
Eindhoven University of Technology, Eindhoven, The Netherlands
Joaquin Vanschoren
University of Cyprus, Nicosia, Cyprus
George Papadopoulos
Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci

Cite this paper

Orhobor, O.I., Alexandrov, N.N., King, R.D. (2018). Predicting Rice Phenotypes with Meta-learning. In: Soldatova, L., Vanschoren, J., Papadopoulos, G., Ceci, M. (eds) Discovery Science. DS 2018. Lecture Notes in Computer Science, vol 11198. Springer, Cham. https://doi.org/10.1007/978-3-030-01771-2_10

DOI: https://doi.org/10.1007/978-3-030-01771-2_10
Published: 07 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01770-5
Online ISBN: 978-3-030-01771-2
eBook Packages: Computer Science, Computer Science (R0)
