Abstract
The features in some machine learning datasets can naturally be divided into groups. This is the case with genomic data, where features can be grouped by chromosome. In many applications it is common for these groupings to be ignored, as interactions may exist between features belonging to different groups. However, including a group that does not influence a response introduces noise when fitting a model, leading to suboptimal predictive accuracy. Here we present two general frameworks for the generation and combination of meta-features when feature groupings are present. We evaluated the frameworks on a genomic rice dataset where the regression task is to predict plant phenotype. We conclude that there are use cases for both frameworks.
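The abstract does not describe the two frameworks in detail, so the following is only a minimal sketch of one generic way to generate and combine meta-features from feature groups, not the authors' method. It assumes one ridge model per group (for example, per chromosome), uses each model's out-of-fold predictions as a meta-feature, and combines the meta-features with a second ridge model. The helper names (`ridge_fit`, `ridge_predict`, `group_meta_features`), the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression; returns [intercept, coefficients...]."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalised
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def ridge_predict(w, X):
    """Predict with weights produced by ridge_fit."""
    return w[0] + X @ w[1:]

def group_meta_features(X, y, groups, n_folds=5, lam=1.0, seed=0):
    """Out-of-fold predictions from one ridge model per feature group.

    groups is a list of column-index arrays, one per group.
    Returns an (n_samples, n_groups) matrix of meta-features.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    meta = np.zeros((n, len(groups)))
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        for g, cols in enumerate(groups):
            # Fit on the training folds using only this group's columns.
            w = ridge_fit(X[train_mask][:, cols], y[train_mask], lam)
            meta[test_idx, g] = ridge_predict(w, X[test_idx][:, cols])
    return meta

# Synthetic data (an assumption for illustration): 4 groups of 10 features,
# where only group 0 influences the response.
rng = np.random.default_rng(42)
n_samples, n_groups, group_size = 200, 4, 10
X = rng.normal(size=(n_samples, n_groups * group_size))
groups = [np.arange(g * group_size, (g + 1) * group_size) for g in range(n_groups)]
y = X[:, groups[0]] @ rng.normal(size=group_size) + 0.1 * rng.normal(size=n_samples)

# Level 0: one meta-feature per group; level 1: combine the meta-features.
meta = group_meta_features(X, y, groups)
combiner = ridge_fit(meta, y, lam=1e-3)
```

In this toy setting the combiner should place most of its weight on the meta-feature from the informative group, which illustrates how a group-aware second stage can down-weight groups that only contribute noise.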
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Orhobor, O.I., Alexandrov, N.N., King, R.D. (2018). Predicting Rice Phenotypes with Meta-learning. In: Soldatova, L., Vanschoren, J., Papadopoulos, G., Ceci, M. (eds) Discovery Science. DS 2018. Lecture Notes in Computer Science, vol 11198. Springer, Cham. https://doi.org/10.1007/978-3-030-01771-2_10
DOI: https://doi.org/10.1007/978-3-030-01771-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01770-5
Online ISBN: 978-3-030-01771-2
eBook Packages: Computer Science, Computer Science (R0)