Abstract
Key message
Moisture content during nixtamalization can be accurately predicted from NIR spectroscopy when coupled with a support vector machine (SVM) model, is strongly modulated by the environment, and has a complex genetic architecture.
Abstract
The lack of high-throughput phenotyping systems for determining moisture content during the maize nixtamalization cooking process has made it difficult to breed for this trait. This study provides a high-throughput, quantitative measure of kernel moisture content during nixtamalization based on NIR scanning of uncooked maize kernels. Machine learning was used to develop models from the combination of NIR spectra and moisture content determined with a scaled-down benchtop cook method. A linear support vector machine (SVM) model with a Spearman’s rank correlation coefficient of 0.852 between wet laboratory and predicted values was developed from 100 diverse temperate genotypes grown in replicate across two environments. This model was applied to NIR spectral data from 501 diverse temperate genotypes grown in replicate in five environments. Analysis of variance revealed that environment explained the highest percentage of the variation (51.5%), followed by genotype (15.6%) and genotype-by-environment interaction (11.2%). A genome-wide association study identified 26 significant loci across five environments that explained between 5.04% and 16.01% of the variation (average = 10.41%). However, genome-wide markers explained 10.54% to 45.99% (average = 31.68%) of the variation, indicating that the genetic architecture of this trait is likely complex and controlled by many loci of small effect. This study provides a high-throughput method to evaluate moisture content during nixtamalization that is feasible at the scale of a breeding program. It also provides information about the factors contributing to variation in this trait, which breeders and food companies can use to develop strategies to improve this important processing trait.
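The abstract reports model quality as a Spearman's rank correlation between wet laboratory and predicted moisture values. As a minimal sketch of how such a coefficient can be computed, the Python below ranks both series (averaging tied ranks) and takes the Pearson correlation of the ranks. It is illustrative only and is not the authors' code, which is available in the GitHub repository listed under Code availability. The function names and the moisture values are hypothetical and were invented for this example.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(observed, predicted):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(observed), average_ranks(predicted)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical moisture values (%), invented purely for illustration.
wet_lab = [44.1, 46.3, 45.0, 47.8, 43.2, 46.9]
predicted = [44.5, 45.9, 45.3, 47.1, 43.9, 47.4]
rho = spearman_rho(wet_lab, predicted)
print(round(rho, 3))
```

Because the coefficient depends only on rank order, it rewards a model that sorts genotypes correctly even when the predicted values are offset from the measured ones, which suits a breeding context where selection is based on ranking.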






Availability of data and material
All raw data are included in the supplemental tables.
Code availability
All code is publicly available on GitHub at https://github.com/HirschLabUMN/ML_Moisture_Prediction.
Notes
Shaun Purcell PLINK (1.07).
Acknowledgements
The authors acknowledge the Minnesota Supercomputing Institute (MSI) at the University of Minnesota for providing resources that contributed to the research results reported in this paper.
Funding
This work was funded in part by NSF IOS-1546272 to CNH and MDY-N, PepsiCo, Inc. to CNH, the Iowa Agriculture and Home Economics Research Station Project IOW03649 to MDY-N, and USDA-ARS base funds to SF-G.
Author information
Contributions
CNH, GA, MYN, SFG, DE, AW, NA conceived this experiment. MJB, JSR, AMG, TJH, MH conducted the experiments. MJB, JSR, DPE analyzed the data. MJB visualized the data. MJB and CNH wrote the original draft. All co-authors edited and approved the final manuscript.
Ethics declarations
Conflict of interest
NA, AJW, and DE are employed by PepsiCo, Inc., a goods and beverage company that sources food grade corn. The views expressed in this manuscript are those of the authors and do not necessarily reflect the position or policy of PepsiCo, Inc.
Additional information
Communicated by Benjamin Stich.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Burns, M.J., Renk, J.S., Eickholt, D.P. et al. Predicting moisture content during maize nixtamalization using machine learning with NIR spectroscopy. Theor Appl Genet 134, 3743–3757 (2021). https://doi.org/10.1007/s00122-021-03926-8
DOI: https://doi.org/10.1007/s00122-021-03926-8


