Bayesian discrete lognormal regression model for genomic prediction

  • Original Article
Theoretical and Applied Genetics

Abstract

Key message

Genomic prediction models for quantitative traits assume continuous and normally distributed phenotypes. In this research, we proposed a novel Bayesian discrete lognormal regression model.

Abstract

Genomic selection is a powerful tool in modern breeding programs that uses genomic information to predict the performance of individuals and select those with desirable traits. It has revolutionized animal and plant breeding, as it allows breeders to identify the best candidates without labor-intensive and time-consuming phenotypic evaluations. While several statistical models have been developed, most of them are for quantitative continuous traits and only a few are for count responses. In this paper, we propose a discrete lognormal regression model in the Bayesian context, with a Gibbs sampler to explore the corresponding posterior distribution and make predictions. The model is applied to two datasets of disease resistance in wheat and evaluated against the traditional Gaussian model and a lognormal model. The results indicate that the proposed model is a competitive and natural model for predicting count genomic traits.

Data availability

The genomic and phenotypic data used in this study can be downloaded from http://hdl.handle.net/11529/10575.

Acknowledgements

We are thankful for the financial support provided by the Bill & Melinda Gates Foundation [INV-003439, BMGF/FCDO, Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods (AG2MW)], the USAID projects [USAID Amend. No. 9 MTO 069033, USAID-CIMMYT Wheat/AGGMW, AGG-Maize Supplementary Project, AGG (Stress Tolerant Maize for Africa)], and the CIMMYT CRP (maize and wheat). We acknowledge the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) through the Research Council of Norway for grants 301835 (Sustainable Management of Rust Diseases in Wheat) and 320090 (Phenotyping for Healthier and more Productive Wheat Crops).

Funding

We are thankful for the financial support provided by the Bill & Melinda Gates Foundation [INV-003439, BMGF/FCDO, Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods (AG2MW)].

Author information

Contributions

AML, OAML, and SRP developed the idea, implemented the model, and wrote the manuscript. HGP, JCML, and JC assisted in writing and critically reviewing the article.

Corresponding authors

Correspondence to Osval A. Montesinos-López or José Crossa.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable (no human or animal data were used).

Consent to participate

The authors have declared that they consent to participate.

Consent for publication

The authors have declared that they consent to participate.

Additional information

Communicated by Mikko J. Sillanpää.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

The observed response variable in model (1) is the result of applying the floor function to a continuous lognormal regression model, that is, \(Y_{i} = \left\lfloor L_{i}^{*} \right\rfloor\), where, given \({\varvec{x}}_{i}\), the latent variable \(L_{i}^{*}\) follows a lognormal distribution with parameters \(\mu_{i}=\beta_{0}+\sum_{j=1}^{p}x_{ij}\beta_{j}\) and \(\sigma_{i}^{2}=\sigma^{2}\), \(i=1,\dots,n\). Then, by expressing \(L_{i}^{*}=\exp(L_{i})\), where \(L_{i}|{\varvec{x}}_{i}\sim N\left(\mu_{i},\sigma^{2}\right)\), \(i=1,\dots,n\), and by augmenting the posterior distribution of the parameters of model (1), \(\beta_{0},{\varvec{\beta}},\sigma_{\beta}^{2},\sigma^{2}\), with the latent variables \(L_{i}\) (\({\varvec{L}}=\left(L_{1},\dots,L_{n}\right)^{T}\)), the joint posterior of \(\beta_{0},{\varvec{\beta}},\sigma_{\beta}^{2},\sigma^{2}\) and \({\varvec{L}}\) is given by

$$f_{\beta_{0},{\varvec{\beta}},\sigma_{\beta}^{2},\sigma^{2},{\varvec{L}}|{\varvec{Y}}}\left(\beta_{0},{\varvec{\beta}},\sigma_{\beta}^{2},\sigma^{2},{\varvec{l}}|{\varvec{y}}\right)\propto \left\{\prod_{i=1}^{n}\frac{1}{\sqrt{\sigma^{2}}}\exp\left[-\frac{1}{2\sigma^{2}}\left(l_{i}-\beta_{0}-\sum_{j=1}^{p}x_{ij}\beta_{j}\right)^{2}\right]I_{\left\{\log\left(y_{i}\right)\le l_{i}\le \log\left(y_{i}+1\right)\right\}}\right\}\times f_{{\varvec{\beta}}|\sigma_{\beta}^{2}}\left({\varvec{\beta}}|\sigma_{\beta}^{2}\right)f_{\sigma_{\beta}^{2}}\left(\sigma_{\beta}^{2}\right)f_{\sigma^{2}}\left(\sigma^{2}\right) \tag{A1}$$

where \({\varvec{l}}=\left(l_{1},\dots,l_{n}\right)^{T}\). From here, after simple algebraic manipulations, the full conditional posterior for \(\beta_{0}\) is a normal distribution with mean \(\frac{1}{n}\sum_{i=1}^{n}l_{i}^{(0)}\) and variance \(\frac{\sigma^{2}}{n}\), where \(l_{i}^{(0)}=l_{i}-\sum_{j=1}^{p}x_{ij}\beta_{j}\).
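As a minimal sketch of this update (not the authors' implementation), the draw of \(\beta_{0}\) can be written with the Python standard library. The function name `sample_beta0`, the list-of-lists layout of `X`, and the toy values are assumptions made only for illustration.

```python
import math
import random


def sample_beta0(l, X, beta, sigma2, rng):
    """Draw beta_0 from its full conditional N(mean of l_i^(0), sigma2 / n)."""
    n = len(l)
    # l_i^(0) = l_i - sum_j x_ij * beta_j (latent value with marker effects removed)
    partial = [l[i] - sum(x * b for x, b in zip(X[i], beta)) for i in range(n)]
    mean = sum(partial) / n
    sd = math.sqrt(sigma2 / n)
    return rng.gauss(mean, sd)


# Hypothetical toy inputs: n = 3 observations, p = 2 covariates.
rng = random.Random(1)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
beta = [0.5, -0.5]
l = [2.5, 1.5, 2.0]  # current values of the latent variables
draw = sample_beta0(l, X, beta, sigma2=1.0, rng=rng)
```

With these toy values every \(l_{i}^{(0)}\) equals 2, so the draws are centered at 2 with variance \(\sigma^{2}/3\).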

Similarly, the full conditional posterior for each \(\beta_{k}\), \(k=1,\dots,p\), is a normal distribution with variance \(\frac{1}{\sigma_{\beta}^{-2}+\sigma^{-2}\sum_{i=1}^{n}x_{ik}^{2}}\) and mean \(\frac{\sigma^{-2}}{\sigma_{\beta}^{-2}+\sigma^{-2}\sum_{i=1}^{n}x_{ik}^{2}}\sum_{i=1}^{n}l_{i}^{(k)}x_{ik}\), where \(l_{i}^{(k)}=l_{i}-\beta_{0}-\sum_{j=1,\, j\ne k}^{p}x_{ij}\beta_{j}\).
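The single-effect update above could be sketched as follows. This is an illustrative version under assumed names (`beta_k_moments`, `sample_beta_k`) and a plain list-of-lists `X`; it computes the conditional mean and variance exactly as stated in the text and then draws from the corresponding normal distribution.

```python
import math
import random


def beta_k_moments(k, l, X, beta0, beta, sigma2, sigma2_beta):
    """Return (mean, variance) of the full conditional of beta_k."""
    n, p = len(l), len(beta)
    # l_i^(k): latent value minus intercept and all effects except the k-th
    resid = [
        l[i] - beta0 - sum(X[i][j] * beta[j] for j in range(p) if j != k)
        for i in range(n)
    ]
    # Posterior precision: prior precision plus data precision for column k
    precision = 1.0 / sigma2_beta + sum(X[i][k] ** 2 for i in range(n)) / sigma2
    var = 1.0 / precision
    mean = var * sum(resid[i] * X[i][k] for i in range(n)) / sigma2
    return mean, var


def sample_beta_k(k, l, X, beta0, beta, sigma2, sigma2_beta, rng):
    """Draw beta_k from N(mean, var) given all other quantities."""
    mean, var = beta_k_moments(k, l, X, beta0, beta, sigma2, sigma2_beta)
    return rng.gauss(mean, math.sqrt(var))


# Hypothetical toy inputs (indices are 0-based in the code).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
l = [3.0, 1.0, 2.0]
beta = [0.0, -1.0]
mean0, var0 = beta_k_moments(0, l, X, 1.0, beta, 1.0, 1.0)
new_beta0_effect = sample_beta_k(0, l, X, 1.0, beta, 1.0, 1.0, random.Random(2))
```

In a full sampler each \(\beta_{k}\) would be updated in turn, overwriting `beta[k]` before moving to the next index so that later updates see the newest values.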

The full conditional posterior for \({\sigma }_{\beta }^{2}\) is

$$f_{\sigma_{\beta}^{2}}\left(\sigma_{\beta}^{2}|{\varvec{y}},-\right)\propto \frac{1}{\left(\sigma_{\beta}^{2}\right)^{p/2}}\exp\left[-\frac{1}{2\sigma_{\beta}^{2}}\sum_{j=1}^{p}\beta_{j}^{2}\right]\frac{1}{\left(\sigma_{\beta}^{2}\right)^{1+v_{\beta}/2}}\exp\left(-\frac{s_{\beta}}{2\sigma_{\beta}^{2}}\right)\propto \frac{1}{\left(\sigma_{\beta}^{2}\right)^{1+(v_{\beta}+p)/2}}\exp\left[-\frac{s_{\beta}+\sum_{j=1}^{p}\beta_{j}^{2}}{2\sigma_{\beta}^{2}}\right]$$

which corresponds to the density of a scaled inverse chi-squared distribution (\(\chi^{-2}\)), so \(\sigma_{\beta}^{2}|{\varvec{y}},-\sim \chi^{-2}\left(\widetilde{v}_{\beta},\widetilde{s}_{\beta}\right)\), with \(\widetilde{v}_{\beta}=v_{\beta}+p\) and \(\widetilde{s}_{\beta}=s_{\beta}+\sum_{j=1}^{p}\beta_{j}^{2}\). Likewise, the full conditional posterior for \(\sigma^{2}\) is \(\sigma^{2}|{\varvec{y}},-\sim \chi^{-2}\left(\widetilde{v},\widetilde{s}\right)\), with \(\widetilde{v}=v+n\) and \(\widetilde{s}=s+\sum_{i=1}^{n}\left(l_{i}-\beta_{0}-\sum_{j=1}^{p}x_{ij}\beta_{j}\right)^{2}\). Here, the symbol \(-\) denotes all the parameters other than the one for which the conditional distribution is specified.
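The two variance updates can be sketched in the same way. The sketch assumes the parameterization read off the density above, in which a \(\chi^{-2}(v,s)\) variable has density proportional to \(x^{-(1+v/2)}\exp\left(-s/(2x)\right)\); under that assumption a draw is \(s/Q\) with \(Q\) chi-squared with \(v\) degrees of freedom. Other sources scale \(s\) differently, so this convention should be checked against the prior actually used. All function names are hypothetical.

```python
import random


def sample_scaled_inv_chi2(df, scale, rng):
    """Draw x with density proportional to x^-(1 + df/2) * exp(-scale / (2x))."""
    # chi-squared(df) is a Gamma with shape df/2 and scale 2;
    # if Q ~ chi-squared(df), then scale / Q has the density above.
    q = rng.gammavariate(df / 2.0, 2.0)
    return scale / q


def sample_sigma2_beta(beta, v_beta, s_beta, rng):
    """Update sigma_beta^2: df = v_beta + p, scale = s_beta + sum of beta_j^2."""
    return sample_scaled_inv_chi2(
        v_beta + len(beta), s_beta + sum(b * b for b in beta), rng
    )


def sample_sigma2(l, X, beta0, beta, v, s, rng):
    """Update sigma^2: df = v + n, scale = s + residual sum of squares."""
    rss = sum(
        (l[i] - beta0 - sum(x * b for x, b in zip(X[i], beta))) ** 2
        for i in range(len(l))
    )
    return sample_scaled_inv_chi2(v + len(l), s + rss, rng)


# Hypothetical toy call.
rng = random.Random(7)
s2b = sample_sigma2_beta([0.5, -0.5], v_beta=5.0, s_beta=1.0, rng=rng)
s2 = sample_sigma2([2.5, 1.5], [[1.0, 0.0], [0.0, 1.0]], 2.0, [0.5, -0.5], 5.0, 1.0, rng)
```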

Now, from equation (A1) the full conditional posterior for \({\varvec{L}}\) is given by

$${f}_{{\varvec{L}}|{\varvec{Y}}}\left({\varvec{l}}|{\varvec{y}},-\right)\propto \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi {\sigma }^{2}}}{\text{exp}}\left[-\frac{1}{2{\sigma }^{2}}{\left({l}_{i}-{\beta }_{0}-\sum_{j=1}^{p}{x}_{ij}{\beta }_{j}\right)}^{2}\right]{I}_{\left\{{\text{log}}\left({y}_{i}\right)\le {l}_{i}\le {\text{log}}({y}_{i}+1)\right\}}$$

and from here, conditional on \({\varvec{Y}}\) and the parameters of model (1), \(L_{1},\dots,L_{n}\) are independent random variables, where \(L_{i}\) has a truncated normal distribution on \(\left(\log\left(y_{i}\right),\log\left(y_{i}+1\right)\right)\) with parameters \(\beta_{0}+\sum_{j=1}^{p}x_{ij}\beta_{j}\) and \(\sigma^{2}\), \(i=1,\dots,n\).
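One simple way to draw each latent variable is the inverse-CDF method, sketched below with the standard library. This is an assumption about how the truncated normal could be sampled, not a description of the authors' code; the name `sample_latent` and the clamping constant `eps` are illustrative. For \(y_{i}=0\) the lower limit \(\log(0)\) is treated as \(-\infty\).

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its cdf and inverse cdf


def sample_latent(y, mu, sigma, rng, eps=1e-12):
    """Draw L ~ N(mu, sigma^2) truncated to (log(y), log(y + 1))."""
    lo = math.log(y) if y > 0 else -math.inf  # log(0) taken as -infinity
    hi = math.log(y + 1)
    # Probabilities of the truncation limits under the untruncated normal
    a = 0.0 if lo == -math.inf else _STD.cdf((lo - mu) / sigma)
    b = _STD.cdf((hi - mu) / sigma)
    # Uniform draw on (a, b), kept strictly inside (0, 1) for inv_cdf
    u = a + rng.random() * (b - a)
    u = min(max(u, eps), 1.0 - eps)
    draw = mu + sigma * _STD.inv_cdf(u)
    # Guard against rounding that would push the draw outside the interval
    return min(max(draw, lo), hi)


# Hypothetical toy call: one latent draw per observed count.
rng = random.Random(3)
counts = [0, 1, 5, 40]
latent = [sample_latent(y, mu=1.0, sigma=1.0, rng=rng) for y in counts]
```

The inverse-CDF approach loses accuracy when the interval lies far in the tail of the normal distribution; a dedicated truncated normal sampler would be a safer choice in that case.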

Appendix 2

See Figs. 3 and 4.

Fig. 3 Trace plots for two chains of the log-posterior density of model (1) with predictor (2)

Fig. 4 Trace plots for two chains of the fixed (environment) effects and variance components of model (1) with predictor (2)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Montesinos-López, A., Gutiérrez-Pulido, H., Ramos-Pulido, S. et al. Bayesian discrete lognormal regression model for genomic prediction. Theor Appl Genet 137, 21 (2024). https://doi.org/10.1007/s00122-023-04526-4

  • DOI: https://doi.org/10.1007/s00122-023-04526-4
