Bayesian discrete lognormal regression model for genomic prediction

Montesinos-López, Abelardo; Gutiérrez-Pulido, Humberto; Ramos-Pulido, Sofía; Montesinos-López, José Cricelio; Montesinos-López, Osval A.; Crossa, José

doi:10.1007/s00122-023-04526-4

Original Article
Published: 14 January 2024

Volume 137, article number 21, (2024)

Theoretical and Applied Genetics

Abelardo Montesinos-López¹,
Humberto Gutiérrez-Pulido¹,
Sofía Ramos-Pulido¹,
José Cricelio Montesinos-López²,
Osval A. Montesinos-López ORCID: orcid.org/0000-0002-3973-6547³ &
…
José Crossa ORCID: orcid.org/0000-0001-9429-5855⁴,⁵,⁶

Abstract

Key message

Genomic prediction models for quantitative traits assume continuous and normally distributed phenotypes. In this research, we propose a novel Bayesian discrete lognormal regression model for count traits.

Abstract

Genomic selection is a powerful tool in modern breeding programs that uses genomic information to predict the performance of individuals and select those with desirable traits. It has revolutionized animal and plant breeding, as it allows breeders to identify the best candidates without labor-intensive and time-consuming phenotypic evaluations. While several statistical models have been developed, most of them target continuous quantitative traits and only a few handle count responses. In this paper, we propose a discrete lognormal regression model in the Bayesian context, with a Gibbs sampler to explore the corresponding posterior distribution and make predictions. The model is evaluated on two wheat disease-resistance datasets against the traditional Gaussian model and a lognormal model. The results indicate that the proposed model is a competitive and natural choice for predicting count traits in genomic prediction.

Data availability

The genomic and phenotypic data used in this study can be downloaded from http://hdl.handle.net/11529/10575.

Acknowledgements

We are thankful for the financial support provided by the Bill & Melinda Gates Foundation [INV-003439, BMGF/FCDO, Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods (AG2MW)], the USAID projects [USAID Amend. No. 9 MTO 069033, USAID-CIMMYT Wheat/AGGMW, AGG-Maize Supplementary Project, AGG (Stress Tolerant Maize for Africa)], and the CIMMYT CRP (maize and wheat). We acknowledge the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) through the Research Council of Norway for grants 301835 (Sustainable Management of Rust Diseases in Wheat) and 320090 (Phenotyping for Healthier and more Productive Wheat Crops).

Funding

We are thankful for the financial support provided by the Bill & Melinda Gates Foundation [INV-003439, BMGF/FCDO, Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods (AG2MW)].

Author information

Authors and Affiliations

Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, C. P. 44430, Guadalajara, Jalisco, México
Abelardo Montesinos-López, Humberto Gutiérrez-Pulido & Sofía Ramos-Pulido
Department of Public Health Sciences, University of California Davis, Davis, CA, 95616, USA
José Cricelio Montesinos-López
Facultad de Telemática, Universidad de Colima, C. P. 28040, Colima, Edo. de Colima, México
Osval A. Montesinos-López
International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, C. P. 56237, Texcoco, Edo. de México, México
José Crossa
Colegio de Postgraduados, C. P. 56230, Montecillos, Edo. de México, México
José Crossa
Centre for Crop & Food Innovation, Food Futures Institute, Murdoch University, Murdoch, 6150, Australia
José Crossa

Contributions

AML, OAML, and SRP developed the idea, implemented the model, and wrote the manuscript. HGP, JCML, and JC assisted in writing and critically reviewing the article.

Corresponding authors

Correspondence to Osval A. Montesinos-López or José Crossa.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable (no human or animal data are used).

Consent to participate

The authors have declared that they have consented to participate.

Consent for publication

The authors have declared that they have consented to participate.

Additional information

Communicated by Mikko J. Sillanpää.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

The observed response variable in model (1) is the result of applying the floor function to a continuous Lognormal regression model, that is, $Y_{i} = \left\lfloor {L_{i}^{*} } \right\rfloor$, where given ${{\varvec{x}}}_{i}$ the latent variable ${L}_{i}^{*}$ follows a Lognormal distribution with parameters ${\mu }_{i}={\beta }_{0}+\sum_{j=1}^{p}{x}_{ij}{\beta }_{j}$ and ${\sigma }_{i}^{2}={\sigma }^{2}$, $i=1,\dots ,n.$ Then, by expressing ${L}_{i}^{*}={\text{exp}}({L}_{i})$ where ${L}_{i}|{{\varvec{x}}}_{i}\sim N\left({\mu }_{i},{\sigma }^{2}\right)$, $i=1,\dots ,n$, and by augmenting the posterior distribution of the parameters of model (1), ${\beta }_{0},{\varvec{\beta}},{\sigma }_{\beta }^{2},{\sigma }^{2}$, with latent variables ${L}_{i}$ (${\varvec{L}}={\left({L}_{1},..,{L}_{n}\right)}^{T}$), the joint posterior of ${\beta }_{0},{\varvec{\beta}},{\sigma }_{\beta }^{2},{\sigma }^{2}$ and ${\varvec{L}}$ is given by
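The data-generating construction described above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function name `simulate_discrete_lognormal`, the toy design matrix, and all parameter values are hypothetical and are not taken from the article.

```python
import numpy as np

def simulate_discrete_lognormal(X, beta0, beta, sigma2, rng):
    """Draw Y_i = floor(exp(L_i)) with L_i ~ N(beta0 + x_i' beta, sigma2)."""
    mu = beta0 + X @ beta                       # linear predictor mu_i
    L = rng.normal(mu, np.sqrt(sigma2))         # latent normal variable L_i
    y = np.floor(np.exp(L)).astype(int)         # observed count Y_i
    return y, L

# Toy example with made-up values (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y, L = simulate_discrete_lognormal(X, 1.0, np.array([0.5, -0.3, 0.2]), 0.25, rng)
```

By construction each latent value satisfies log(y_i) <= L_i < log(y_i + 1), with the lower bound read as minus infinity when y_i = 0. This is the constraint that appears as the indicator function in the joint posterior.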

$${f}_{{\beta }_{0},{\varvec{\beta}},{\sigma }_{\beta }^{2},{\sigma }^{2},{\varvec{L}}|{\varvec{Y}}}\left({\beta }_{0},{\varvec{\beta}},{\sigma }_{\beta }^{2},{\sigma }^{2},{\varvec{l}}|{\varvec{y}}\right)\propto \left\{\prod_{i=1}^{n}\frac{1}{\sqrt{{\sigma }^{2}}}{\text{exp}}\left[-\frac{1}{2{\sigma }^{2}}{\left({l}_{i}-{\beta }_{0}-\sum_{j=1}^{p}{x}_{ij}{\beta }_{j}\right)}^{2}\right]{I}_{\left\{{\text{log}}\left({y}_{i}\right)\le {l}_{i}\le {\text{log}}\left({y}_{i}+1\right)\right\}}\right\}\times {f}_{{\varvec{\beta}}|{\sigma }_{\beta }^{2}}\left({\varvec{\beta}}|{\sigma }_{\beta }^{2}\right){f}_{{\sigma }_{\beta }^{2}}\left({\sigma }_{\beta }^{2}\right){f}_{{\sigma }^{2}}\left({\sigma }^{2}\right)$$

(A1)

where ${\varvec{l}}={\left({l}_{1},\dots ,{l}_{n}\right)}^{T}$. From here, after simple algebraic manipulation, the full conditional posterior for ${\beta }_{0}$ is a normal distribution with mean $\frac{1}{n}\sum_{i=1}^{n}{l}_{i}^{\left(0\right)}$ and variance $\frac{{\sigma }^{2}}{n}$, where ${l}_{i}^{(0)}={l}_{i}-\sum_{j=1}^{p}{x}_{ij}{\beta }_{j}$.

Similarly, the full conditional posterior for each ${\beta }_{k},$ $k=1,..,p,$ is a normal distribution with variance $\frac{1}{{\sigma }_{\beta }^{-2}+{\sigma }^{-2}\sum_{i=1}^{n}{x}_{ik}^{2}}$ and mean $\frac{{\sigma }^{-2}}{{\sigma }_{\beta }^{-2}+{\sigma }^{-2}\sum_{i=1}^{n}{x}_{ik}^{2}}\sum_{i=1}^{n}{l}_{i}^{(k)}{x}_{ik}$ where ${l}_{i}^{(k)}={l}_{i}-{\beta }_{0}-\sum_{\begin{array}{c}j=1\\ j\ne k\end{array}}^{p}{x}_{ij}{\beta }_{j}$.
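The two normal full conditionals stated above for the intercept and for each marker effect can be sketched as follows. The function names `update_beta0` and `update_beta`, the running-residual bookkeeping, and the toy data are assumptions made for illustration. The article's own implementation is not shown in this text and may differ.

```python
import numpy as np

def update_beta0(l, X, beta, sigma2, rng):
    """Draw beta0 from N(mean of l_i^(0), sigma2 / n)."""
    l0 = l - X @ beta                           # l_i^(0) = l_i - sum_j x_ij beta_j
    return rng.normal(l0.mean(), np.sqrt(sigma2 / len(l)))

def update_beta(l, X, beta0, beta, sigma2, sigma2_beta, rng):
    """Draw each beta_k in turn from its normal full conditional."""
    beta = beta.copy()
    resid = l - beta0 - X @ beta                # full residual
    for k in range(X.shape[1]):
        xk = X[:, k]
        resid += xk * beta[k]                   # resid now equals l^(k)
        prec = 1.0 / sigma2_beta + (xk @ xk) / sigma2
        mean = (xk @ resid) / sigma2 / prec
        beta[k] = rng.normal(mean, np.sqrt(1.0 / prec))
        resid -= xk * beta[k]                   # restore the full residual
    return beta

# Toy check on a fully observed latent vector (made-up values).
rng = np.random.default_rng(1)
n, p = 500, 2
X = rng.normal(size=(n, p))
l = 2.0 + X @ np.array([1.0, -1.0]) + rng.normal(scale=0.1, size=n)
beta0, beta = 0.0, np.zeros(p)
for _ in range(50):
    beta0 = update_beta0(l, X, beta, sigma2=0.01, rng=rng)
    beta = update_beta(l, X, beta0, beta, sigma2=0.01, sigma2_beta=100.0, rng=rng)
```

Keeping a running residual avoids recomputing the full linear predictor for every marker, which matters when p is large, as is typical for marker data.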

The full conditional posterior for ${\sigma }_{\beta }^{2}$ is

$${f}_{{\sigma }_{\beta }^{2}}\left({\sigma }_{\beta }^{2}|{\varvec{y}},-\right)\propto \frac{1}{{{\sigma }_{\beta }^{2}}^{p/2}}{\text{exp}}\left[-\frac{1}{2{\sigma }_{\beta }^{2}}\sum_{j=1}^{p}{\beta }_{j}^{2}\right]\frac{1}{{{\sigma }_{\beta }^{2}}^{1+{v}_{\beta }/2}}{\text{exp}}\left(-\frac{{s}_{\beta }}{2{\sigma }_{\beta }^{2}}\right)\propto \frac{1}{{{\sigma }_{\beta }^{2}}^{1+({v}_{\beta }+p)/2}}{\text{exp}}\left[-\frac{\left({s}_{\beta }+\sum_{j=1}^{p}{\beta }_{j}^{2}\right)}{2{\sigma }_{\beta }^{2}}\right]$$

which corresponds to the density of a scaled inverse chi-squared distribution (${\chi }^{-2}$), so ${\sigma }_{\beta }^{2}|{\varvec{y}},-\sim {\chi }^{-2}\left({\widetilde{v}}_{\beta },{\widetilde{s}}_{\beta }\right)$ with ${\widetilde{v}}_{\beta }={v}_{\beta }+p$ and ${\widetilde{s}}_{\beta }={s}_{\beta }+\sum_{j=1}^{p}{\beta }_{j}^{2}$. Likewise, the full conditional posterior for ${\sigma }^{2}$ is ${\sigma }^{2}|{\varvec{y}},-\sim {\chi }^{-2}\left(\widetilde{v},\widetilde{s}\right)$ with $\widetilde{v}=v+n$ and $\widetilde{s}=s+\sum_{i=1}^{n}{\left({l}_{i}-{\beta }_{0}-\sum_{j=1}^{p}{x}_{ij}{\beta }_{j}\right)}^{2}$. Here, the symbol $-$ denotes all parameters other than the one whose conditional distribution is specified.
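A density proportional to $(\sigma^2)^{-(1+\widetilde{v}/2)}\exp\left(-\widetilde{s}/(2\sigma^2)\right)$, as written above, can be sampled by dividing $\widetilde{s}$ by a chi-squared draw with $\widetilde{v}$ degrees of freedom. The sketch below relies on that identity. The function names and the hyperparameter values in the check are hypothetical.

```python
import numpy as np

def draw_scaled_inv_chi2(df, scale_sum, rng, size=None):
    """Draw from density proportional to x^-(1+df/2) * exp(-scale_sum / (2x))."""
    return scale_sum / rng.chisquare(df, size=size)

def update_sigma2_beta(beta, v_beta, s_beta, rng):
    """Full conditional of the marker-effect variance."""
    return draw_scaled_inv_chi2(v_beta + beta.size, s_beta + beta @ beta, rng)

def update_sigma2(l, X, beta0, beta, v, s, rng):
    """Full conditional of the residual variance on the latent scale."""
    resid = l - beta0 - X @ beta
    return draw_scaled_inv_chi2(v + l.size, s + resid @ resid, rng)

# Monte Carlo check with made-up values: the mean is scale_sum / (df - 2).
rng = np.random.default_rng(2)
draws = draw_scaled_inv_chi2(10, 8.0, rng, size=200_000)
```

With `df = 10` and `scale_sum = 8.0` the theoretical mean is 1, so the sample mean of `draws` should be close to that value.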

Now, from equation (A1) the full conditional posterior for ${\varvec{L}}$ is given by

$${f}_{{\varvec{L}}|{\varvec{Y}}}\left({\varvec{l}}|{\varvec{y}},-\right)\propto \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi {\sigma }^{2}}}{\text{exp}}\left[-\frac{1}{2{\sigma }^{2}}{\left({l}_{i}-{\beta }_{0}-\sum_{j=1}^{p}{x}_{ij}{\beta }_{j}\right)}^{2}\right]{I}_{\left\{{\text{log}}\left({y}_{i}\right)\le {l}_{i}\le {\text{log}}({y}_{i}+1)\right\}}$$

and from here, conditional on ${\varvec{Y}}$ and the parameters of model (1), ${L}_{1},\dots ,{L}_{n}$ are independent random variables, where ${L}_{i}$ follows a normal distribution with parameters ${\beta }_{0}+\sum_{j=1}^{p}{x}_{ij}{\beta }_{j}$ and ${\sigma }^{2}$ truncated to the interval $\left({\text{log}}\left({y}_{i}\right),{\text{log}}\left({y}_{i}+1\right)\right)$, $i=1,\dots ,n$.
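The truncated normal draws for the latent variables can be sketched with the inverse-CDF method. This is a minimal illustration, assuming the Python standard-library `statistics.NormalDist` for the normal CDF and quantile function. The function name `update_latent`, the clipping constant `eps`, and the toy counts are hypothetical. The inverse-CDF approach loses accuracy when the interval lies far in a tail of the normal distribution, so a dedicated truncated normal sampler may be preferable in practice.

```python
import math
from statistics import NormalDist
import numpy as np

_STD = NormalDist()  # standard normal, used for cdf and inv_cdf

def update_latent(y, mu, sigma2, rng, eps=1e-12):
    """Draw L_i from N(mu_i, sigma2) truncated to (log y_i, log(y_i + 1))."""
    sd = math.sqrt(sigma2)
    l = np.empty(len(y))
    for i, (yi, mi) in enumerate(zip(y, mu)):
        lo = -math.inf if yi == 0 else math.log(yi)   # log(0) read as -infinity
        hi = math.log(yi + 1)
        p_lo = 0.0 if yi == 0 else _STD.cdf((lo - mi) / sd)
        p_hi = _STD.cdf((hi - mi) / sd)
        u = p_lo + rng.uniform() * (p_hi - p_lo)      # uniform on (p_lo, p_hi)
        u = min(max(u, eps), 1.0 - eps)               # keep inv_cdf well defined
        draw = mi + sd * _STD.inv_cdf(u)
        l[i] = min(max(draw, lo), hi)                 # guard against rounding
    return l

# Toy counts and linear predictors (made-up values).
rng = np.random.default_rng(3)
y = np.array([0, 1, 5, 20])
mu = np.log(y + 0.5)
l = update_latent(y, mu, sigma2=0.25, rng=rng)
```

One sweep of the Gibbs sampler would alternate this latent update with the updates for ${\beta }_{0}$, each ${\beta }_{k}$, ${\sigma }_{\beta }^{2}$ and ${\sigma }^{2}$ given earlier in this appendix.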

Appendix 2

See Figs. 3 and 4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Montesinos-López, A., Gutiérrez-Pulido, H., Ramos-Pulido, S. et al. Bayesian discrete lognormal regression model for genomic prediction. Theor Appl Genet 137, 21 (2024). https://doi.org/10.1007/s00122-023-04526-4

Received: 10 May 2023
Accepted: 11 December 2023
Published: 14 January 2024
DOI: https://doi.org/10.1007/s00122-023-04526-4
