
7.1 Introduction

Categorical and count response variables are common in many plant breeding programs. For this reason, this chapter covers the fundamental and practical issues involved in implementing genomic prediction models for these types of response variables, using current open-source software. The prediction models presented here are Bayesian and classical models. For each type of model, we provide the fundamental principles and concepts, and illustrate their practical implementation with examples in the genomic selection context. We take into account the most common terms in the predictor, such as the main effects of environments and genotypes, and interaction terms like the genotype × environment and marker × environment interactions. The models under the Bayesian framework are presented first, followed by those under a classical framework. Next, the fundamentals and applications of the Bayesian ordinal regression model are provided.

7.2 Bayesian Ordinal Regression Model

In many applications, the response (Y) is not continuous, but rather multi-categorical in nature: ordinal (the categories of the response can be ordered, 1, …, C) or nominal (the categories have no order). For example, in plant breeding, categorical scores for resistance and susceptibility are often collected [disease: 1 (no disease), 2 (low infection), 3 (moderate infection), 4 (high infection), and 5 (complete infection)]. Another example is in animal breeding, where calving ease is scored as 1, 2, or 3 to indicate a normal birth, a slightly difficult birth, or an extremely difficult birth, respectively. Each categorical trait is assigned to one of a set of mutually exclusive and exhaustive response values.

It is well documented that linear regression models in this situation are difficult to justify (Gianola 1980, 1982; Gianola and Foulley 1983) and inadequate results can be obtained, unless there is a large number of categories and the data seem to show a distribution that is approximately normal (Montesinos-López et al. 2015b).

For the ordinal model, we appeal to the existence of a latent continuous random variable and the categories are conceived as contiguous intervals on the continuous scale, as presented in McCullagh (1980), where the points of division (thresholds) are denoted as γ0, γ1, …, γC. In this way, the ordinal model assumes that conditioned to xi (covariates of dimension p), Yi is a random variable that takes values 1, …, C, with the following probabilities:

$$ {p}_{ic}=P\left({Y}_i=c\right)=P\left({\gamma}_{c-1}\le {L}_i\le {\gamma}_c\right) $$
(7.1)
$$ =F\left({\gamma}_c+{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \right)-F\left({\gamma}_{c-1}+{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \right),\kern0.5em c=1,\dots, C, $$

where \( {L}_i=-{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} +{\epsilon}_i \) is a latent variable, ϵi is a random error term with cumulative distribution function F, β = (β1, …, βp)T denotes the parameter vector associated with the effects of the covariates, and − ∞  = γ0 < γ1 < … < γC = ∞ are threshold parameters that also need to be estimated; that is, the parameter vector to be estimated in this model is \( \boldsymbol{\theta} ={\left({\boldsymbol{\beta}}^{\mathrm{T}},{\boldsymbol{\gamma}}^{\mathrm{T}},{\sigma}_{\beta}^2\right)}^{\mathrm{T}} \), where γ = (γ1, …, γC − 1)T.

All the ordinal models presented in this chapter share the property that the categories can be considered as contiguous intervals on some continuous scale, but they differ in their assumptions regarding the distribution of the latent variable. When F is the standard logistic distribution function, the resulting model is known as the logistic ordinal model:

$$ {p}_{ic}=F\left({\gamma}_c+{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \right)-F\left({\gamma}_{c-1}+{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \right),\kern0.5em c=1,\dots, C, $$

where \( F(z)=\frac{1}{1+\exp \left(-z\right)}. \) When F is the standard normal distribution function, the resulting model is the ordinal probit model:

$$ {p}_{ic}=F\left({\gamma}_c+{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \right)-F\left({\gamma}_{c-1}+{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \right),\kern0.5em c=1,\dots, C, $$

where \( F(z)={\int}_{-\infty}^z\frac{1}{\sqrt{2\pi }}\exp \left(-\frac{u^2}{2}\right) du \). When the response only takes two values, model (7.1) reduces to the binary regression model, which in the first case is better known as logistic regression and in the second case as the probit regression model. Hence, logistic regression or probit regression is a particular case of ordinal regression even under a Bayesian framework, and the Bayesian frameworks described for ordinal regression in this chapter reduce to logistic or probit regression when only one threshold parameter (γ1) needs to be estimated, or equivalently when this threshold is set to 0 and an intercept parameter (β0) is added to the vector of coefficients to be estimated, β = (β0, β1, …, βp)T. In what follows, the development concentrates on the probit ordinal regression model.
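To make the link between the thresholds, the linear predictor, and the category probabilities concrete, the following minimal R sketch evaluates the probabilities in (7.1) under the probit and logistic links; the threshold values and the value of \( {\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \) used below are hypothetical and serve only for illustration:

# Hypothetical thresholds gamma_0 < gamma_1 < ... < gamma_C for C = 4 categories
gamma <- c(-Inf, -1.0, 0.2, 1.1, Inf)
eta_i <- 0.35                      # hypothetical value of x_i' beta for one individual
# p_ic = F(gamma_c + x_i' beta) - F(gamma_{c-1} + x_i' beta), c = 1, ..., C
p_probit <- pnorm(gamma[-1] + eta_i) - pnorm(gamma[-length(gamma)] + eta_i)
p_logit  <- plogis(gamma[-1] + eta_i) - plogis(gamma[-length(gamma)] + eta_i)
round(rbind(probit = p_probit, logistic = p_logit), 3)   # each row sums to 1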

A Bayesian specification for the ordinal regression is very similar to the linear model described in Chap. 6, and assumes the following prior distribution for the parameter vector \( \boldsymbol{\theta} ={\left({\boldsymbol{\beta}}^T,{\boldsymbol{\gamma}}^T,{\sigma}_{\beta}^2\right)}^{\mathrm{T}}: \)

$$ f\left(\boldsymbol{\theta} \right)\propto \frac{1}{{\left({\sigma}_{\beta}^2\right)}^{\frac{p}{2}}}\exp \left(\left[-\frac{1}{2{\sigma}_{\beta}^2}{\boldsymbol{\beta}}^{\mathrm{T}}\boldsymbol{\beta} \right]\right)\frac{{\left(\frac{S_{\beta }}{2}\right)}^{\frac{v_{\beta }}{2}}}{\Gamma \left(\frac{v_{\beta }}{2}\right){\left({\sigma}_{\beta}^2\right)}^{1+\frac{v_{\beta }}{2}}}\exp \left(-\frac{S_{\beta }}{2{\sigma}_{\beta}^2}\right) $$

That is, a flat prior is assumed for the threshold parameter γ, a multivariate normal distribution for the vector of beta coefficients, \( \boldsymbol{\beta} \mid {\sigma}_{\beta}^2\sim {N}_p\left(\mathbf{0},{\sigma}_{\beta}^2{\boldsymbol{I}}_p\right) \), and a scaled inverse chi-squared for \( {\sigma}_{\beta}^2. \) From now on, this model will be referred to as BRR in this chapter in analogy to the Bayesian linear regression and the way that this is implemented in the BGLR R package. Similar to the genomic linear regression model, the posterior distribution of the parameter vector does not have a known form and a Gibbs sampler is used to explore this; for this reason, in the coming lines, the Gibbs sampling method proposed by Albert and Chib (1993) is described. To make it possible to derive the full conditional distributions, the parameter vector is augmented with a latent variable in the representation of model (7.1).

The joint probability density function of the vector of observations, Y = (Y1, …, Yn)T, and the latent variables L = (L1, …, Ln)T, evaluated in the vector values y = (y1, …, yn)T and l = (l1, …, ln)T, is given by

$$ {\displaystyle \begin{array}{c}f\left(\boldsymbol{y},\boldsymbol{l}|\boldsymbol{\beta}, \boldsymbol{\gamma} \right)=\prod \limits_{i=1}^n{f}_{L_i}\left({l}_i\right){I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right)\\ {}=\prod \limits_{i=1}^n\exp \left[-\frac{1}{2}{\left({l}_i+{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \right)}^2\right]{I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right)\\ {}=\exp \left[-\frac{1}{2}\sum \limits_{i=1}^n{\left({l}_i+{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \right)}^2\right]\prod \limits_{i=1}^n{I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right),\end{array}} $$

where \( {I}_A \) is the indicator function of a set A, and \( {f}_{L_i}\left({l}_i\right) \) is the density function of the latent variable that in the ordinal probit regression model corresponds to the normal distribution with mean \( -{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \) and unit variance. Then the full conditional of the jth component of β, βj, given the rest of the parameters and the latent variables, is given by

$$ {\displaystyle \begin{array}{c}f\left({\beta}_j|-\right)\propto f\left(\boldsymbol{y},\boldsymbol{l}|\boldsymbol{\beta}, \boldsymbol{\gamma} \right)f\left(\boldsymbol{\beta} |{\sigma}_{\beta}^2\right)\\ {}\propto \prod \limits_{i=1}^n{f}_{L_i}\left({l}_i\right){I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right)f\left(\boldsymbol{\beta} |{\sigma}_{\beta}^2\right)\\ {}\propto \exp \left[-\frac{1}{2}\sum \limits_{i=1}^n{\left({e}_{ij}+{x}_{ij}{\beta}_j\right)}^2-\frac{1}{2{\sigma}_{\beta}^2}{\beta}_j^2\right]\\ {}\propto \exp \left\{-\frac{1}{2{\tilde{\sigma}}_{\beta_j}^2}{\left({\beta}_j-{\tilde{\beta}}_j\right)}^2\right\},\end{array}} $$

where \( {e}_{ij}={l}_i+{\sum}_{\genfrac{}{}{0pt}{}{k=1}{k\ne j}}^p{x}_{ik}{\beta}_k \), l = (l1, …, ln)T, xj = [x1j, …, xnj]T, ej = [e1j, …, enj]T, \( {\overset{\sim }{\sigma}}_{\beta_j}^2={\left({\sigma}_{\beta}^{-2}+{\boldsymbol{x}}_j^{\mathrm{T}}{\boldsymbol{x}}_j\right)}^{-1} \), and \( {\overset{\sim }{\beta}}_j=-{\overset{\sim }{\sigma}}_{\beta_j}^2\left({\boldsymbol{x}}_j^{\mathrm{T}}{\boldsymbol{e}}_j\right) \). So, the full conditional distribution of βj is \( N\left({\overset{\sim }{\beta}}_j,{\overset{\sim }{\sigma}}_{\beta_j}^2\right) \).

Now, the full conditional distribution of the threshold parameter γc is

$$ {\displaystyle \begin{array}{c}f\left({\gamma}_c|-\right)\propto f\left(\boldsymbol{y},\boldsymbol{l}|\boldsymbol{\beta}, \boldsymbol{\gamma} \right)\\ {}\propto \prod \limits_{i=1}^n{I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right)\\ {}\propto \prod \limits_{i\in \left\{i:{y}_i=c\right\}}{I}_{\left({\gamma}_{c-1},{\gamma}_c\right)}\left({l}_i\right)\prod \limits_{i\in \left\{i:{y}_i=c+1\right\}}{I}_{\left({\gamma}_c,{\gamma}_{c+1}\right)}\left({l}_i\right)\\ {}\propto {I}_{\left({a}_c,{b}_c\right)}\left({\gamma}_c\right),\end{array}} $$

where ac =  max {li : yi = c} and bc =  min {li : yi = c + 1}. So γc ∣  −  ∼ U(ac, bc), c = 1, 2, …, C − 1. Now, the full conditional distribution of the latent variables is

$$ f\left(\boldsymbol{l}|-\right)\propto f\left(\boldsymbol{y},\boldsymbol{l}|\boldsymbol{\beta}, \boldsymbol{\gamma} \right)\propto \prod \limits_{i=1}^n{f}_{L_i}\left({l}_i\right){I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right) $$

from which we have that conditional on the rest of parameters, the latent variables are independent truncated normal random variables, that is, \( {L}_i\mid -\sim N\left(-{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta}, 1\right) \) truncated in the interval \( \left({\gamma}_{y_i-1},{\gamma}_{y_i}\right) \), i = 1, …, n.

The full conditional distribution of \( {\sigma}_{\beta}^2 \) is the same as in the linear regression model described in Chap. 6, so a Gibbs sampler for exploration of the posterior distribution of the parameters of the models can be implemented by iterating the following steps:

  1. Initialize the parameters: βj = βj0, j = 1, …, p, γc = γc0, c = 1, …, C − 1, li = li0, i = 1, …, n, and \( {\sigma}_{\beta}^2={\sigma}_{\beta 0}^2 \).

  2. For each i = 1, …, n, simulate li from the normal distribution \( N\left(-{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta}, 1\right) \) truncated in \( \left({\gamma}_{y_i-1},{\gamma}_{y_i}\right) \).

  3. For each j = 1, …, p, simulate from the full conditional distribution of βj, that is, from a \( N\left({\overset{\sim }{\beta}}_j,{\overset{\sim }{\sigma}}_{\beta_j}^2\right), \) where \( {\overset{\sim }{\sigma}}_{\beta_j}^2={\left({\sigma}_{\beta}^{-2}+{\boldsymbol{x}}_j^{\mathrm{T}}{\boldsymbol{x}}_j\right)}^{-1} \) and \( {\overset{\sim }{\beta}}_j=-{\overset{\sim }{\sigma}}_{\beta_j}^2\left({\boldsymbol{x}}_j^{\mathrm{T}}{\boldsymbol{e}}_j\right) \).

  4. For each c = 1, 2, …, C − 1, simulate from the full conditional distribution of γc, γc ∣  −  ∼ U(ac, bc), where ac =  max {li : yi = c} and bc =  min {li : yi = c + 1}.

  5. Simulate \( {\sigma}_{\beta}^2 \) from a scaled inverse chi-squared distribution with parameters \( {\overset{\sim }{v}}_{\beta }={v}_{\beta }+p \) and \( {\overset{\sim }{S}}_{\beta }={S}_{\beta }+{\boldsymbol{\beta}}^{\mathrm{T}}\boldsymbol{\beta} \), that is, from a \( {\chi}^{-2}\left({\overset{\sim }{v}}_{\beta },{\overset{\sim }{S}}_{\beta}\right). \)

  6. Repeat steps 2–5 until the burn-in period and the desired number of samples are reached (a minimal R sketch of these steps is given below).
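The following is a minimal didactic R sketch of steps 1–6 for the ordinal probit model with the BRR prior. The function and variable names are ours (this is not part of BGLR), and the sketch assumes that every category 1, …, C is observed in y:

# Didactic Gibbs sampler for the ordinal probit BRR model (7.1); X: n x p marker matrix,
# y: ordinal response coded 1, ..., C; vb, Sb: hyperparameters of the prior of sigma_beta^2
gibbs_ordinal_probit <- function(y, X, nIter = 2000, vb = 5, Sb = 1) {
  n <- nrow(X); p <- ncol(X); C <- max(y)
  # Step 1: initial values
  beta   <- rep(0, p)
  gamma  <- c(-Inf, qnorm((1:(C - 1)) / C), Inf)  # gamma_0, gamma_1, ..., gamma_C
  sigma2 <- 1
  xtx    <- colSums(X^2)
  Beta   <- matrix(NA, nIter, p); Gamma <- matrix(NA, nIter, C - 1)
  for (t in 1:nIter) {
    eta <- drop(X %*% beta)
    # Step 2: l_i ~ N(-x_i' beta, 1) truncated to (gamma_{y_i - 1}, gamma_{y_i})
    u <- runif(n, pnorm(gamma[y] + eta), pnorm(gamma[y + 1] + eta))
    l <- -eta + qnorm(u)
    # Step 3: beta_j ~ N(beta_j_tilde, sigma2_j_tilde)
    for (j in 1:p) {
      e_j  <- l + eta - X[, j] * beta[j]          # e_ij = l_i + sum_{k != j} x_ik beta_k
      s2j  <- 1 / (1 / sigma2 + xtx[j])
      bj   <- -s2j * sum(X[, j] * e_j)
      bnew <- rnorm(1, bj, sqrt(s2j))
      eta  <- eta + X[, j] * (bnew - beta[j])
      beta[j] <- bnew
    }
    # Step 4: gamma_c ~ U(max{l_i : y_i = c}, min{l_i : y_i = c + 1})
    for (k in 1:(C - 1)) gamma[k + 1] <- runif(1, max(l[y == k]), min(l[y == k + 1]))
    # Step 5: sigma_beta^2 ~ scaled inverse chi-squared(vb + p, Sb + beta' beta)
    sigma2 <- (Sb + sum(beta^2)) / rchisq(1, df = vb + p)
    Beta[t, ] <- beta; Gamma[t, ] <- gamma[2:C]
  }
  list(beta = Beta, gamma = Gamma)
}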

A similar Gibbs sampler is implemented in the BGLR R package, and the hyperparameters are established in a similar way as in the linear regression models described in Chap. 6. When the hyperparameters Sβ and vβ are not specified, by default BGLR assigns vβ = 5 and gives Sβ a value such that the prior mode of \( {\sigma}_{\beta}^2 \) (χ−2(vβ, Sβ)) is equal to half of the “total variance” of the latent variables (Pérez and de los Campos 2014). This model can also be implemented with a fixed value of the variance component \( {\sigma}_{0\beta}^2 \) by choosing a very large value of the degrees of freedom parameter (vβ = 10^10) and taking the scale parameter \( {S}_{\beta }={\sigma}_{0\beta}^2\left({v}_{\beta }+2\right) \).
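For instance, assuming the df0 and S0 hyperparameter arguments of the BRR prior in BGLR, and a hypothetical value sigma2_0 at which one wishes to (approximately) fix the variance component, this specification could look as follows:

sigma2_0 <- 0.5                                  # hypothetical value at which sigma_beta^2 is fixed
ETA = list(list(X = X, model = 'BRR',
                df0 = 1e10,                      # very large degrees of freedom
                S0 = sigma2_0*(1e10 + 2)))       # scale such that the prior mode equals sigma2_0
A = BGLR(y = y, response_type = 'ordinal', ETA = ETA, nIter = 1e4, burnIn = 1e3)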

The basic R code with the BGLR implementation of this model is as follows:

ETA = list(list(X=X, model='BRR'))
A = BGLR(y=y, response_type='ordinal', ETA=ETA, nIter = 1e4, burnIn = 1e3)
Probs = A$probs

where the predictor component is specified in ETA: argument X specifies the design matrix, whereas the argument model describes the priors of the model’s parameters. The response variable vector is given in y, and the ordinal model is specified in the response_type argument. In the last line, the posterior means of the probabilities of each category are extracted, and can be used to give an estimation of a particular metric when comparing models or to evaluate the performance of this model, which will be explained later. An application of this ordinal model is given by Montesinos-López et al. (2015a).

Similar to the linear regression model in Chap. 6, when the number of markers used in genomic prediction is very large relative to the number of observations, p ≫ n, the dimension of the posterior distribution to be explored (number of parameters in θ, p + C − 1 + 1) can be reduced by simulating the vector of genetic effects b = (b1, …, bn)T, where \( {b}_i={\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \), i = 1, …, n, instead of β. Because \( \boldsymbol{\beta} \mid {\sigma}_{\beta}^2\sim {N}_p\left(\mathbf{0},{\sigma}_{\beta}^2{\boldsymbol{I}}_p\right) \), the induced distribution of b = (b1, …, bn)T is \( \boldsymbol{b}\mid {\sigma}_g^2\sim {N}_n\left(\mathbf{0},{\sigma}_g^2\boldsymbol{G}\right) \), where \( {\sigma}_g^2=p{\sigma}_{\beta}^2 \) and \( \boldsymbol{G}=\frac{1}{p}\boldsymbol{X}{\boldsymbol{X}}^{\mathrm{T}} \). Then, with this and assuming a scaled inverse chi-squared distribution as prior for \( {\sigma}_g^2 \), \( {\sigma}_g^2\sim {\chi}^{-2}\left({v}_g,{S}_g\right) \), and a flat prior for the threshold parameters (γ), an ordinal probit GBLUP Bayesian regression model specification of model (7.1) is given by

$$ {p}_{ic}=P\left({Y}_i=c\right)=\Phi \left({\gamma}_c+{b}_i\right)-\Phi \left({\gamma}_{c-1}+{b}_i\right),\kern0.5em c=1,\dots, C, $$
(7.2)

where now Li =  − bi + ϵi is the latent variable, and Φ is the standard normal cumulative distribution function. In matrix form, the model for the latent variable can be specified as L =  − b + ϵ, where L = (L1, …, Ln)T and ϵ ∼ Nn(0, In). A Gibbs sampler of the posterior of the parameters of this model can be obtained similarly to what was done for model (7.1). Indeed, the full conditional density of b is given by

$$ {\displaystyle \begin{array}{c}f\left(\boldsymbol{b}|-\right)\propto f\left(\boldsymbol{y},\boldsymbol{l}|\boldsymbol{b},\boldsymbol{\gamma} \right)f\left(\boldsymbol{b}|{\sigma}_g^2\right)\\ {}\propto \prod \limits_{i=1}^n{f}_{L_i}\left({l}_i\right){I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right)f\left(\boldsymbol{b}|{\sigma}_g^2\right)\\ {}\propto \exp \left[-\frac{1}{2}{\left(\boldsymbol{l}+\boldsymbol{b}\right)}^{\mathrm{T}}\left(\boldsymbol{l}+\boldsymbol{b}\right)-\frac{1}{2{\sigma}_g^2}{\boldsymbol{b}}^{\mathrm{T}}{\boldsymbol{G}}^{-1}\boldsymbol{b}\right]\\ {}\propto \exp \left[-\frac{1}{2}{\left(\boldsymbol{b}-\tilde{\boldsymbol{b}}\right)}^{\mathrm{T}}{\tilde{\boldsymbol{\Sigma}}}_b^{-1}\left(\boldsymbol{b}-\tilde{\boldsymbol{b}}\right)\right],\end{array}} $$

where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_b={\left({\sigma}_g^{-2}{\boldsymbol{G}}^{-1}+{\boldsymbol{I}}_n\right)}^{-1} \) and \( \overset{\sim }{\boldsymbol{b}}=-{\overset{\sim }{\boldsymbol{\Sigma}}}_b\boldsymbol{l} \). So the full conditional distribution of b is \( {N}_n\left(\overset{\sim }{\boldsymbol{b}},{\overset{\sim }{\boldsymbol{\Sigma}}}_b\right) \). Then, a Gibbs sampler exploration of this model can be done by iterating the following steps:

  1. Initialize the parameters: bi = bi0, i = 1, …, n, γc = γc0, c = 1, …, C − 1, li = li0, i = 1, …, n, and \( {\sigma}_g^2={\sigma}_{g0}^2 \).

  2. For each i = 1, …, n, simulate li from the normal distribution \( N\left(-{b}_i,1\right) \) truncated in \( \left({\gamma}_{y_i-1},{\gamma}_{y_i}\right) \).

  3. Simulate b from \( {N}_n\left(\overset{\sim }{\boldsymbol{b}},{\overset{\sim }{\boldsymbol{\Sigma}}}_b\right) \), with \( {\overset{\sim }{\boldsymbol{\Sigma}}}_b={\left({\sigma}_g^{-2}{\boldsymbol{G}}^{-1}+{\boldsymbol{I}}_n\right)}^{-1} \) and \( \overset{\sim }{\boldsymbol{b}}=-{\overset{\sim }{\boldsymbol{\Sigma}}}_b\boldsymbol{l} \).

  4. For each c = 1, 2, …, C − 1, simulate from the full conditional distribution of γc, γc ∣  −  ∼ U(ac, bc), where ac =  max {li : yi = c} and bc =  min {li : yi = c + 1}.

  5. Simulate \( {\sigma}_g^2 \) from a scaled inverse chi-squared distribution with parameters \( \tilde{v}_{g}={v}_g+n \) and \( {\overset{\sim }{S}}_g={S}_g+{\boldsymbol{b}}^{\mathrm{T}}{\boldsymbol{G}}^{-1}\boldsymbol{b} \), that is, from a \( {\chi}^{-2}\left(\tilde{v}_{g},{\overset{\sim }{S}}_g\right). \)

  6. Repeat steps 2–5 until the burn-in period and the desired number of samples are reached.

This model can also be implemented with the BGLR package:

ETA = list(list(K=G, model='RKHS'))
A = BGLR(y=y, response_type='ordinal', ETA=ETA, nIter = 1e4, burnIn = 1e3)
Probs = A$probs

where instead of specifying the design matrix and the prior model for the associated regression coefficients, the genomic relationship matrix G is now given, and the corresponding model is RKHS, which can also be used when another type of relationship information between the lines is available, for example, the pedigree relationship matrix.
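For completeness, a minimal sketch of how the genomic relationship matrix used in K can be computed from the marker matrix (following \( \boldsymbol{G}=\frac{1}{p}\boldsymbol{X}{\boldsymbol{X}}^{\mathrm{T}} \) with centered and standardized markers, and assuming no monomorphic markers) is:

# Genomic relationship matrix from the marker matrix X (n lines x p markers)
Xs = scale(X, center = TRUE, scale = TRUE)   # center and standardize each marker
G  = tcrossprod(Xs)/ncol(Xs)                 # G = Xs %*% t(Xs) / p
ETA = list(list(K = G, model = 'RKHS'))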

Like the Bayesian linear regression model, other variants of model BRR can be obtained by adopting different prior models for the beta coefficients (β): FIXED, BayesA, BayesB, BayesC, or BL, for example. See Chap. 6 for details of each of these prior models for the regression coefficients. Indeed, the full conditional distributions used to implement a Gibbs sampler in each of these ordinal models are the same as the corresponding Bayesian linear regression models, except that the full conditional distribution for the variance component of random errors is not needed and two full conditional distributions (for the threshold parameters and the latent variables) are added, where now the latent variable will play the role of the response value in the Bayesian linear regression. These models can also be implemented in the BGLR package.

For example, a Gibbs sampler implementation for ordinal model (7.1) with a BayesA prior (see Chap. 6) can be done by the following steps:

  1. Initialize the parameters: βj = βj0, j = 1, …, p, γc = γc0, c = 1, …, C − 1, li = li0, i = 1, …, n, \( {\sigma}_{\beta_j}^2={\sigma}_{\beta_j0}^2 \), j = 1, …, p, and Sβ = Sβ0.

  2. For each i = 1, …, n, simulate li from the normal distribution \( N\left(-{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta}, 1\right) \) truncated in \( \left({\gamma}_{y_i-1},{\gamma}_{y_i}\right) \).

  3. For each j = 1, …, p, simulate from the full conditional distribution of βj, that is, from a \( N\left({\overset{\sim }{\beta}}_j,{\overset{\sim }{\sigma}}_{\beta_j}^2\right), \) where \( {\overset{\sim }{\sigma}}_{\beta_j}^2={\left({\sigma}_{\beta_j}^{-2}+{\boldsymbol{x}}_j^{\mathrm{T}}{\boldsymbol{x}}_j\right)}^{-1} \) and \( {\overset{\sim }{\beta}}_j=-{\overset{\sim }{\sigma}}_{\beta_j}^2\left({\boldsymbol{x}}_j^{\mathrm{T}}{\boldsymbol{e}}_j\right) \).

  4. For each c = 1, 2, …, C − 1, simulate from the full conditional distribution of γc, γc ∣  −  ∼ U(ac, bc), where ac =  max {li : yi = c} and bc =  min {li : yi = c + 1}.

  5. Simulate from the full conditional of each \( {\sigma}_{\beta_j}^2 \):

    $$ {\sigma}_{\beta_j}^2\mid \boldsymbol{\gamma}, \boldsymbol{l},\boldsymbol{\beta},{\boldsymbol{\sigma}}_{-j}^2,{S}_{\beta}\sim {\chi}^{-2}\left({\overset{\sim }{v}}_{\beta_j},{\overset{\sim }{S}}_{\beta_j}\right), $$

    where \( {\overset{\sim }{v}}_{\beta_j}={v}_{\beta }+1 \), the scale parameter is \( {\overset{\sim }{S}}_{\beta_j}={S}_{\beta }+{\beta}_j^2 \), and \( {\boldsymbol{\sigma}}_{-j}^2 \) is the vector \( {\boldsymbol{\sigma}}_{\beta}^2=\left({\sigma}_{\beta_1}^2,\dots, {\sigma}_{\beta_p}^2\right) \) without the jth entry.

  6. Simulate from the full conditional of Sβ:

    $$ {\displaystyle \begin{array}{c}f\left({S}_{\beta }|-\right)\propto \left[\prod \limits_{j=1}^pf\left({\sigma}_{\beta_j}^2|{S}_{\beta }\ \right)\right]f\left({S}_{\beta}\right)\\ {}\propto \prod \limits_{j=1}^p\left[\frac{{\left(\frac{S_{\beta }}{2}\right)}^{\frac{v_{\beta }}{2}}}{\Gamma \left(\frac{v_{\beta }}{2}\right){\left({\sigma}_{\beta_j}^2\right)}^{1+\frac{v_{\beta }}{2}}}\exp \left(-\frac{S_{\beta }}{2{\sigma}_{\beta_j}^2}\right)\right]{S}_{\beta}^{s-1}\exp \left(-{rS}_{\beta}\right)\\ {}\propto {S}_{\beta}^{s+\frac{pv_{\beta }}{2}-1}\exp \left[-\left(r+\frac{1}{2}\sum \limits_{j=1}^p\frac{1}{\sigma_{\beta_j}^2}\right){S}_{\beta}\right]\end{array}} $$

    which corresponds to the kernel of a gamma distribution with shape parameter \( \overset{\sim }{s}=s+\frac{pv_{\beta }}{2} \) and rate parameter \( \overset{\sim }{r}=r+\frac{1}{2}{\sum}_{j=1}^p\frac{1}{\sigma_{\beta_j}^2} \), and so \( {S}_{\beta}\mid -\sim \mathrm{Gamma}\left(\overset{\sim }{s},\overset{\sim }{r}\right). \)

  7. Repeat steps 2–6 depending on how many samples of the parameter vector (\( {\boldsymbol{\beta}}^{\mathrm{T}},{\boldsymbol{\sigma}}_{\beta}^2,{\boldsymbol{\gamma}}^{\mathrm{T}} \)) you wish to simulate (a short R sketch of the draws in steps 5 and 6 is given below).
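A short R sketch of the BayesA-specific draws in steps 5 and 6 follows; all values below are hypothetical and only illustrate these two sampling steps:

# Hypothetical current state of the chain, for illustration only
beta   <- rnorm(10)                 # current marker effects (p = 10)
S_beta <- 1                         # current value of the common scale S_beta
v_beta <- 5; s <- 1.1; r <- 0.1     # hyperparameters (illustrative values)
p <- length(beta)
# Step 5: sigma2_beta_j | - ~ scaled inverse chi-squared(v_beta + 1, S_beta + beta_j^2)
sigma2_beta <- (S_beta + beta^2)/rchisq(p, df = v_beta + 1)
# Step 6: S_beta | - ~ Gamma(shape = s + p*v_beta/2, rate = r + 0.5*sum(1/sigma2_beta_j))
S_beta <- rgamma(1, shape = s + p*v_beta/2, rate = r + 0.5*sum(1/sigma2_beta))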

The implementation of BayesA for an ordinal response variable in BGLR is as follows:

ETA = list(list(X=X, model='BayesA'))
A = BGLR(y=y, response_type='ordinal', ETA=ETA, nIter = 1e4, burnIn = 1e3)
Probs = A$probs

The other Bayesian ordinal regression models, BayesB, BayesC, and BL, are implemented in BGLR by simply replacing the corresponding model in ETA, as commented before; see Chap. 6 for details and the differences between all these prior models for the regression coefficients.

7.2.1 Illustrative Examples

Example 1 BRR and GBLUP

To illustrate how these models work and how they can be implemented, here we used a toy data set with an ordinal response with five levels, consisting of 50 lines planted in three environments (a total of 150 observations) and genotypic information on 1000 markers for each line. To evaluate the performance of the described models (BRR, BayesA, GBLUP, BayesB, BayesC, and BL), 10 random partitions were used in a cross-validation strategy, where 80% of the complete data set was used to train the model and the rest to evaluate it in terms of the Brier score (BS) and the proportion of cases correctly classified (PCCC).
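Both metrics can be computed from the matrix of estimated category probabilities returned by BGLR for the testing observations and the observed categories. A minimal sketch follows; the multi-category Brier score below is computed as the mean squared difference between the predicted probabilities and the indicator of the observed category, which may differ by a constant scaling from the exact definition used to produce the tables:

# Probs_tst: matrix of predicted probabilities (rows = testing observations, columns = categories)
# y_tst: observed categories (coded 1, ..., C) of the testing observations
brier_score <- function(Probs_tst, y_tst) {
  C <- ncol(Probs_tst)
  Y01 <- outer(y_tst, 1:C, "==")*1            # indicator matrix of the observed categories
  mean(rowSums((Probs_tst - Y01)^2))
}
pccc <- function(Probs_tst, y_tst) {
  y_hat <- apply(Probs_tst, 1, which.max)     # predicted category = most probable category
  mean(y_hat == y_tst)
}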

The six models were implemented with the BGLR R package, using a burn-in period of 1000, 10,000 iterations, and the default hyperparameters for the prior distributions. The results are shown in Table 7.1, where we can observe a similar performance of all models in terms of the Brier score, and a similar but poor performance of all models in terms of the proportion of cases correctly classified (PCCC). With the BRR model, only in 4 out of 10 partitions were the PCCC values greater than or equal to what can be obtained by random assignment (1/5), while with the GBLUP model this happened in 3 out of 10 partitions. A similar behavior was obtained with the rest of the models. On average, the PCCC values were 0.1566, 0.1466, 0.15, 0.1466, 0.1533, and 0.1433 for the BRR, GBLUP, BayesA, BayesB, BayesC, and Bayes Lasso (BL) models, respectively.

Table 7.1 Brier score (BS) and proportion of cases correctly classified (PCCC) across 10 random partitions, with 80% of the total data set used for training and the rest for testing, under model (7.1) with different prior models for the marker coefficients: ordinal Bayesian Ridge regression model (BRR), BayesA, BayesB, BayesC, and BL; and under the ordinal GBLUP regression model (7.2)

The R code to reproduce the results in Table 7.1 is given in Appendix 1.

In some applications, additional information is available, such as the sites (locations) where the experiments were conducted, environmental covariates, etc., which can be taken into account to improve the prediction performance.

One extension of model (7.1) that takes into account environment effects and environment–marker interaction is given by

$$ {p}_{ic}=P\left({Y}_i=c\right)=F\left({\gamma}_c-{\eta}_i\right)-F\left({\gamma}_{c-1}-{\eta}_i\right),c=1,\dots, C, $$
(7.3)

where now the predictor is \( {\eta}_i={\boldsymbol{x}}_{Ei}^{\mathrm{T}}{\boldsymbol{\beta}}_E+{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} +{\boldsymbol{x}}_{EM i}^T{\boldsymbol{\beta}}_{EM} \), which, in addition to the marker effects (\( {\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \)) in the second summand, includes the environment effects and the environment–marker interaction effects in the first and third summands, respectively. In the latent variable representation, this model is equivalently expressed as

$$ P\left({Y}_i=c\right)=P\left({\gamma}_{c-1}\le {L}_i\le {\gamma}_c\right),c=1,\dots, C, $$

where L = (L1, …, Ln)T = XEβE + Xβ + XEMβEM + ϵ is the vector with latent random variables of all observations, ϵ ∼ Nn(0, In) is a random error vector, XE, X, and XEM are the design matrices of the environments, markers, and environment–marker interactions, respectively, while βE and βEM are the vectors of the environment effects and the interaction effects, respectively, with a prior distribution that can be specified as was done for β. In fact, with the BGLR function, it is also possible to implement all these extensions, since it allows using any of the several priors included here: FIXED, BRR, BayesA, BayesB, BayesC, and BL. For example, the basic BGLR code to implement model (7.3) with a flat prior (“FIXED”) for the environment effects, a BRR prior for marker effects and for the environment–marker interaction effects, is as follows:

X = scale(X)
#Environment matrix design
XE = model.matrix(~Env, data=dat_F)[,-1]
#Environment-marker interaction matrix design
XEM = model.matrix(~0+X:Env, data=dat_F)
#Predictor
ETA = list(list(X=XE, model='FIXED'),
           list(X=X, model='BRR'),
           list(X=XEM, model='BRR'))
A = BGLR(y=y, response_type='ordinal', ETA=ETA, nIter = 1e4, burnIn = 1e3, verbose = FALSE)
Probs = A$probs

where dat_F is the data file that contains all the information about how the data were collected (GID: lines or individuals; Env: environment; y: response variable of the trait). Other prior models for the beta coefficients of each predictor component are obtained by simply replacing the “model” argument of the corresponding component of the predictor. For example, for a BayesA prior for the marker effects, in the second sub-list we must use model='BayesA'.

The latent random vector of model (7.1) under the GBLUP specification, plus genotypic and environment×genotype interaction effects, takes the form

$$ \boldsymbol{L}={\boldsymbol{X}}_E{\boldsymbol{\beta}}_E+{\boldsymbol{Z}}_L\boldsymbol{g}+{\boldsymbol{Z}}_{LE}\boldsymbol{gE}+\boldsymbol{\epsilon} $$
(7.4)

which is like model (6.7) in Chap. 6, where ZL and ZLE are the incidence matrices of the genotype and environment×genotype interaction effects, respectively, and g and gE are the corresponding random effects with distributions \( {N}_J\left(\mathbf{0},{\sigma}_g^2\boldsymbol{G}\right) \) and \( {N}_{IJ}\left(\mathbf{0},{\sigma}_{gE}^2\left({\boldsymbol{I}}_I\boldsymbol{\bigotimes}\boldsymbol{G}\right)\right) \), respectively, where J is the number of different lines in the data set and I is the number of environments. This model can be trained with the BGLR package as follows:

I = length(unique(dat_F$Env))
XE = model.matrix(~0+Env, data=dat_F)[,-1]
Z_L = model.matrix(~0+GID, data=dat_F, xlev = list(GID=unique(dat_F$GID)))
K_L = Z_L%*%G%*%t(Z_L)
Z_LE = model.matrix(~0+GID:Env, data=dat_F,
                    xlev = list(GID=unique(dat_F$GID), Env=unique(dat_F$Env)))
K_LE = Z_LE%*%kronecker(diag(I), G)%*%t(Z_LE)
ETA = list(list(model='FIXED', X=XE),
           list(model='RKHS', K=K_L),
           list(model='RKHS', K=K_LE))
A = BGLR(y, response_type='ordinal', ETA=ETA, nIter = 1e4, burnIn = 1e3)
Probs = A$probs

where dat_F is as before, that is, the data file that contains all the information of how the data were collected. Similar specification can be done when other kinds of covariates are available, for example, environmental covariates.

Example 2 Environments + Markers + Marker×Environment Interaction

This example illustrates how models (7.3) and (7.4) can be fitted and used for prediction. Here we used the same data set as in Example 1, but now the environment and environment×marker effects are included in the predictor. For model (7.3), a flat prior is assigned to the environment effects [FIXED, a normal distribution with mean 0 and a very large variance (10^10)], and one of the BRR, BayesA, BayesB, BayesC, or BL prior models is assigned to the marker and marker×environment interaction effects. For model (7.4), a flat prior is also assigned to the environment effects. The performance evaluation was done using the same 10 random partitions used in Example 1, where 80% of the complete data set was used for training the model and the rest for testing, and the Brier score (BS) and the proportion of cases correctly classified (PCCC) were computed.

The results are shown in Table 7.2. They indicate an improved prediction performance of these models compared to the models fitted in Example 1, which only take into account the marker effects (see Table 7.1). However, this improvement is only slight under the Brier score, because the reduction in the average BS across the 10 partitions was 1.55, 3.74, 0.96, 0.83, 1.41, and 1.32% for BRR, GBLUP, BayesA, BayesB, BayesC, and BL, respectively. A more notable improvement was obtained with the PCCC criterion, where now for all models, in about 8 out of the 10 partitions, the value of this metric was greater than the one obtained by random chance. Indeed, the average values across the 10 partitions were 0.27, 0.28, 0.28, 0.27, 0.28, and 0.27 for BRR, GBLUP, BayesA, BayesB, BayesC, and BL, respectively. This represents 74.47, 90.90, 86.67, 86.36, 80.43, and 86.05% improvement for these models with respect to their counterpart models where only marker effects or genomic effects were considered. So, the greatest improvement was observed with the GBLUP model under both metrics, but the performance of all models is almost indistinguishable, with an advantage in execution time for the GBLUP model with respect to the rest. These issues were pointed out in Chaps. 5 and 6. This difference is even greater when the number of markers is larger than the number of observations.

Table 7.2 Brier score (BS) and proportion of cases correctly classified (PCCC) across 10 random partitions, with 80% of the total data set used for training and the rest for testing, under model (7.3) with different prior models for the marker effects and the environment×marker interaction: BRR, BayesA, BayesB, BayesC, and BL; and under the ordinal GBLUP regression model (7.4)

Example 3 Binary Traits

For this example, we used the EYT Toy data set consisting of 40 lines, four environments (Bed5IR, EHT, Flat5IR, and LHT), and a binary response variable based on plant height (0 = low, 1 = high). For this example, marker information is not available, only the genomic relationship matrix for the 40 lines. So, only models (7.3) (M3) and (7.4) (M4) are fitted, and their prediction performance was evaluated using cross-validation. Also, model (7.5) (M5) is added to this comparison, which excludes the line×environment interaction effects, that is, only the environment effects and the genetic effects are taken into account in the linear predictor:

$$ \boldsymbol{L}={\boldsymbol{X}}_E{\boldsymbol{\beta}}_E+{\boldsymbol{Z}}_L\boldsymbol{g}+\boldsymbol{\epsilon} $$
(7.5)

The results are presented in Table 7.3, with the BS and PCCC metrics obtained in each partition of the random CV strategy. From this we can observe that the best performance with both metrics was obtained with the model that considered only the genetic effects (M3; (7.3)). On average, models M4 and M5 gave a BS that was 24.02 and 19.79% greater than that of model M3, while with model M3, on average across the 10 partitions, an improvement of 21.57 and 19.19% in terms of PCCC was obtained with respect to models M4 and M5. The difference between these last two models is only slight: the average BS value of the first model was 2.01% greater than that of the second, and the PCCC of the second model was 0.50% greater than that of the first. The R code to reproduce these results is given in Appendix 3.

Table 7.3 Brier score (BS) and proportion of cases correctly classified (PCCC) across 10 random partitions, with 80% of the total data set used for training and the rest for testing, under models (7.3), (7.4), and (7.5) with a flat prior for environment effects

7.3 Ordinal Logistic Regression

As described at the beginning of this chapter, the ordinal logistic model is given by model (7.1) with F the cumulative logistic distribution function. Again, as in the ordinal probit regression model, the posterior distribution of the parameters is not easy to simulate and numerical methods are required. Here we describe the Gibbs sampler proposed by Montesinos-López et al. (2015b), which, in addition to the latent variable Li in the representation of model (7.1), augments the parameter vector with Pólya-Gamma latent random variables.

By using the following identity proposed by Polson et al. (2013):

$$ \frac{{\left[\exp \left(\eta \right)\right]}^a}{{\left[1+\exp \left(\eta \right)\right]}^b}={2}^{-b}\exp \left( k\eta \right){\int}_0^{\infty}\exp \left(-\frac{\eta^2}{2}\omega \right){f}_{\upomega}\left(\omega; b,0\right) d\omega, $$

where k = a − b/2 and fω(ω; b, 0) is the density of a Pólya-Gamma random variable ω with parameters b and d = 0 (ω ∼ PG(b, d); see Polson et al. (2013) for details), the joint distribution of the vector of observations, Y = (Y1, …, Yn)T, and the latent variables L = (L1, …, Ln)T can be expressed as

$$ {\displaystyle \begin{array}{c}f\left(\boldsymbol{y},\boldsymbol{l}|\boldsymbol{\beta}, \boldsymbol{\gamma} \right)=\prod \limits_{i=1}^n{f}_{L_i}\left({l}_i\right){I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right)\\ {}=\prod \limits_{i=1}^n\frac{\exp \left(-{l}_i-{\eta}_i\right)}{{\left[1+\exp \left(-{l}_i-{\eta}_i\right)\right]}^2}{I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right)\\ {}\propto \prod \limits_{i=1}^n\left[{\int}_0^{\infty}\exp \left(-\frac{{\left({l}_i+{\eta}_i\right)}^2}{2}{\omega}_i\right){f}_{\omega_i}\left({\omega}_i;2,0\right)d{\omega}_i\right]{I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right),\end{array}} $$

where \( {\eta}_i={\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \) and ω1, …, ωn are independent random variables with the same Pólya-Gamma distribution with parameters b = 2 and d = 0.

Now, the Bayesian specification developed in Montesinos-López et al. (2015b) assumes the same priors as for the ordinal probit model (BRR prior model), except that now for the threshold parameter vector γ, the prior distribution proposed by Sorensen et al. (1995) is adopted, which is the distribution of the order statistics of a random sample of size C − 1 of the uniform distribution in (γL, γU), that is,

$$ f\left(\boldsymbol{\gamma} \right)=\left(C-1\right)!{\left(\frac{1}{\gamma_U-{\gamma}_L}\right)}^{C-1}{I}_{\left\{\boldsymbol{\gamma} \in {\boldsymbol{S}}_{\gamma}\right\}}, $$

where Sγ = {(γ1, …, γC − 1) : γL < γ1 < γ2 < ⋯ < γC − 1 < γU}.

Then, by conditioning on the rest of parameters, including the latent Pólya-Gamma random variables ω = (ω1, …, ωn)T, the full conditional of γc is given by

$$ {\displaystyle \begin{array}{c}f\left({\gamma}_c|-\right)\propto f\left(\boldsymbol{y},\boldsymbol{l},\boldsymbol{\omega} |\boldsymbol{\beta}, \boldsymbol{\gamma} \right)f\left(\boldsymbol{\gamma} \right)\\ {}\propto \left[\prod \limits_{i=1}^n{I}_{\left({\gamma}_{y_i-1},{\gamma}_{y_i}\right)}\left({l}_i\right)\right]\left(C-1\right)!{\left(\frac{1}{\gamma_U-{\gamma}_L}\right)}^{C-1}{I}_{{\boldsymbol{S}}_{\gamma }}\left(\boldsymbol{\gamma} \right)\\ {}\propto \left[\prod \limits_{i\in \left\{i:{y}_i=c\right\}}{I}_{\left({\gamma}_{c-1},{\gamma}_c\right)}\left({l}_i\right)\prod \limits_{i\in \left\{i:{y}_i=c+1\right\}}{I}_{\left({\gamma}_c,{\gamma}_{c+1}\right)}\left({l}_i\right)\right]\left(C-1\right)!{\left(\frac{1}{\gamma_U-{\gamma}_L}\right)}^{C-1}{I}_{{\boldsymbol{S}}_{\gamma }}\left(\boldsymbol{\gamma} \right)\\ {}\propto {I}_{\left({a}_c,{b}_c\right)}\left({\gamma}_c\right)\end{array}} $$

where f(y, l, ω| β, γ) is the joint density of the observations and the latent random variables L and ω, \( {a}_c=\max \left\{{\gamma}_{c-1},\max \left\{{l}_i:{y}_i=c\right\}\right\} \), and \( {b}_c=\min \left\{{\gamma}_{c+1},\min \left\{{l}_i:{y}_i=c+1\right\}\right\} \). So the full conditional for the threshold parameters is also uniform, γc ∣  −  ∼ U(ac, bc), c = 1, 2, …, C − 1.

Now, the conditional distribution of βj (jth element of β) is given by

$$ {\displaystyle \begin{array}{c}f\left({\beta}_j|{\sigma}_{\beta}^2,\boldsymbol{\gamma}, \boldsymbol{L},\boldsymbol{\omega} \right)\propto f\left(\boldsymbol{y},\boldsymbol{l},\boldsymbol{\omega} |\boldsymbol{\beta}, \boldsymbol{\gamma} \right)f\left(\boldsymbol{\beta} |{\sigma}_{\beta}^2\right)\\ {}\propto \exp \left[-\frac{1}{2}\sum \limits_{i=1}^n{\omega}_i{\left({l}_i+{\eta}_i\right)}^2-\frac{1}{2{\sigma}_{\beta}^2}{\beta}_j^2\right]\\ {}\propto \exp \left[-\frac{1}{2}\sum \limits_{i=1}^n{\omega}_i{\left({e}_{ij}+{x}_{ij}{\beta}_j\right)}^2-\frac{1}{2{\sigma}_{\beta}^2}{\beta}_j^2\right]\\ {}\propto \exp \left\{-\frac{1}{2{\tilde{\sigma}}_{\beta_j}^2}{\left({\beta}_j-{\tilde{\beta}}_j\right)}^2\right\},\end{array}} $$

where \( {e}_{ij}={l}_i+{\sum}_{\genfrac{}{}{0pt}{}{k=1}{k\ne j}}^p{x}_{ik}{\beta}_k \), ej = [e1j, …, enj]T, xj = [x1j, …, xnj]T, \( {\overset{\sim }{\sigma}}_{\beta_j}^2={\left({\sigma}_{\beta}^{-2}+{\sum}_{i=1}^n{\omega}_i{x}_{ij}^2\right)}^{-1} \), and \( {\overset{\sim }{\beta}}_j=-{\overset{\sim }{\sigma}}_{\beta_j}^2\left({\sum}_{i=1}^n{\omega}_i{x}_{ij}{e}_{ij}\right) \). From this, the full conditional distribution of βj is a normal distribution with mean \( {\overset{\sim }{\beta}}_j \) and variance \( {\overset{\sim }{\sigma}}_{\beta_j}^2. \)

The full conditional distribution of \( {\sigma}_{\beta}^2 \) is the same as the one obtained for the ordinal probit model, \( {\sigma}_{\beta}^2\mid -\sim {\chi}^{-2}\left({\overset{\sim }{v}}_{\beta },{\overset{\sim }{S}}_{\beta}\right) \) with \( {\overset{\sim }{v}}_{\beta }={v}_{\beta }+p \) and \( {\overset{\sim }{S}}_{\beta }={S}_{\beta }+{\boldsymbol{\beta}}^{\mathrm{T}}\boldsymbol{\beta} . \) In a similar fashion, for each latent variable Li, i = 1, …, n, it can be seen that its full conditional distribution is also truncated normal in \( \left({\gamma}_{y_i-1},{\gamma}_{y_i}\right) \), but with mean parameter \( -{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta} \) and variance 1/ωi, i.e., \( {L}_i\mid -\sim N\left(-{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta}, {\omega}_i^{-1}\right) \) truncated in \( \left({\gamma}_{y_i-1},{\gamma}_{y_i}\right) \), for each i = 1, …, n. Finally, by following Eq. (5) in Polson et al. (2013), note that the joint full conditional distribution of the Pólya-Gamma random variables ω can be written as

$$ {\displaystyle \begin{array}{c}f\left(\boldsymbol{\omega} |-\right)\propto f\left(\boldsymbol{y},\boldsymbol{l},\boldsymbol{\omega} |\boldsymbol{\beta}, \boldsymbol{\gamma} \right)\\ {}\propto \prod \limits_{i=1}^n\exp \left(-\frac{{\left({l}_i+{\eta}_i\right)}^2}{2}{\omega}_i\right){f}_{\upomega_{\mathrm{i}}}\left({\omega}_i;2,0\right)\\ {}\propto \prod \limits_{i=1}^n{f}_{\upomega_{\mathrm{i}}}\left({\omega}_i;2,{l}_i+{\eta}_i\right).\end{array}} $$

From here, conditionally on \( \boldsymbol{\beta}, {\sigma}_{\beta}^2,\boldsymbol{\gamma}, \) and L, ω1, …, ωn are independent Pólya-Gamma random variables with parameters b = 2 and d = li + ηi, i = 1, …, n, respectively, that is, ωi ∣  −  ∼ PG(2, li + ηi), i = 1, …, n.

With the above derived full conditionals, a Gibbs sampler exploration of this ordinal logistic regression model can be done with the following steps:

  1. Initialize the parameters: βj = βj0, j = 1, …, p, γc = γc0, c = 1, …, C − 1, li = li0, i = 1, …, n, and \( {\sigma}_{\beta}^2={\sigma}_{\beta 0}^2 \).

  2. Simulate ω1, …, ωn as independent Pólya-Gamma random variables with parameters b = 2 and d = li + ηi, i = 1, …, n.

  3. For each i = 1, …, n, simulate li from the normal distribution \( N\left(-{\boldsymbol{x}}_i^{\mathrm{T}}\boldsymbol{\beta}, {\omega}_i^{-1}\right) \) truncated in \( \left({\gamma}_{y_i-1},{\gamma}_{y_i}\right) \).

  4. For each j = 1, …, p, simulate from the full conditional distribution of βj, that is, from a \( N\left({\overset{\sim }{\beta}}_j,{\overset{\sim }{\sigma}}_{\beta_j}^2\right), \) where \( {\overset{\sim }{\sigma}}_{\beta_j}^2={\left({\sigma}_{\beta}^{-2}+{\sum}_{i=1}^n{\omega}_i{x}_{ij}^2\right)}^{-1} \) and \( {\overset{\sim }{\beta}}_j=-{\overset{\sim }{\sigma}}_{\beta_j}^2\left({\sum}_{i=1}^n{\omega}_i{x}_{ij}{e}_{ij}\right) \).

  5. For each c = 1, 2, …, C − 1, simulate from the full conditional distribution of γc, γc ∣  −  ∼ U(ac, bc), where \( {a}_c=\max \left\{{\gamma}_{c-1},\max \left\{{l}_i:{y}_i=c\right\}\right\} \) and \( {b}_c=\min \left\{{\gamma}_{c+1},\min \left\{{l}_i:{y}_i=c+1\right\}\right\} \).

  6. Simulate \( {\sigma}_{\beta}^2 \) from a scaled inverse chi-squared distribution with parameters \( {\overset{\sim }{v}}_{\beta }={v}_{\beta }+p \) and \( {\overset{\sim }{S}}_{\beta }={S}_{\beta }+{\boldsymbol{\beta}}^{\mathrm{T}}\boldsymbol{\beta} \), that is, from a \( {\chi}^{-2}\left({\overset{\sim }{v}}_{\beta },{\overset{\sim }{S}}_{\beta}\right). \)

  7. Repeat steps 2–6 until the burn-in period and the desired number of samples are reached (an R fragment illustrating the steps that differ from the probit sampler is given below).
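Relative to the probit sampler sketched earlier in this chapter, only steps 2–4 change. The following R fragment illustrates those steps with hypothetical starting values, assuming that a Pólya-Gamma sampler such as the pgdraw function of the pgdraw package is available:

library(pgdraw)                      # assumed Polya-Gamma sampler: pgdraw(b, c) draws from PG(b, c)
n <- 100; p <- 20; C <- 4
X <- matrix(rnorm(n*p), n, p)        # hypothetical marker matrix
y <- sample(1:C, n, replace = TRUE)  # hypothetical ordinal response
beta <- rep(0, p); sigma2 <- 1
gamma <- c(-Inf, qnorm((1:(C - 1))/C), Inf)
eta <- drop(X %*% beta); l <- rep(0, n)
# Step 2: omega_i | - ~ PG(2, l_i + eta_i)
omega <- pgdraw(2, l + eta)
# Step 3: l_i | - ~ N(-eta_i, 1/omega_i) truncated to (gamma_{y_i - 1}, gamma_{y_i})
sd_i <- 1/sqrt(omega)
u <- runif(n, pnorm(gamma[y], -eta, sd_i), pnorm(gamma[y + 1], -eta, sd_i))
l <- qnorm(u, -eta, sd_i)
# Step 4: beta_j | - ~ N(beta_j_tilde, sigma2_j_tilde) with omega-weighted sums
for (j in 1:p) {
  e_j  <- l + eta - X[, j]*beta[j]
  s2j  <- 1/(1/sigma2 + sum(omega*X[, j]^2))
  bj   <- -s2j*sum(omega*X[, j]*e_j)
  bnew <- rnorm(1, bj, sqrt(s2j))
  eta  <- eta + X[, j]*(bnew - beta[j])
  beta[j] <- bnew
}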

Similar modifications can be done to obtain Gibbs samplers corresponding to other prior adopted models for the beta coefficients (FIXED, BayesA, BayesB, BayesC, or BL; see Chap. 6 for details of these priors). Also, for the ordinal logistic GBLUP Bayesian regression model specification as done in (7.2) for the ordinal probit model, a Gibbs sampler implementation can be done following the steps below, which can be obtained directly from the ones described above:

  1. Initialize the parameters: bi = bi0, i = 1, …, n, γc = γc0, c = 1, …, C − 1, li = li0, i = 1, …, n, and \( {\sigma}_g^2={\sigma}_{g0}^2 \).

  2. Simulate ω1, …, ωn as independent Pólya-Gamma random variables with parameters b = 2 and d = li + bi, i = 1, …, n.

  3. For each i = 1, …, n, simulate li from the normal distribution \( N\left(-{b}_i,{\omega}_i^{-1}\right) \) truncated in \( \left({\gamma}_{y_i-1},{\gamma}_{y_i}\right) \).

  4. Simulate b from \( {N}_n\left(\overset{\sim }{\boldsymbol{b}},{\overset{\sim }{\boldsymbol{\Sigma}}}_b\right) \), with \( {\overset{\sim }{\boldsymbol{\Sigma}}}_b={\left({\sigma}_g^{-2}{\boldsymbol{G}}^{-1}+{\boldsymbol{D}}_{\omega}\right)}^{-1} \) and \( \overset{\sim }{\boldsymbol{b}}=-{\overset{\sim }{\boldsymbol{\Sigma}}}_b{\boldsymbol{D}}_{\omega}\boldsymbol{l} \), where Dω = Diag(ω1, …, ωn).

  5. For each c = 1, 2, …, C − 1, simulate from the full conditional distribution of γc, γc ∣  −  ∼ U(ac, bc), where \( {a}_c=\max \left\{{\gamma}_{c-1},\max \left\{{l}_i:{y}_i=c\right\}\right\} \) and \( {b}_c=\min \left\{{\gamma}_{c+1},\min \left\{{l}_i:{y}_i=c+1\right\}\right\} \).

  6. Simulate \( {\sigma}_g^2 \) from a scaled inverse chi-squared distribution with parameters \( \tilde{v}_{g}={v}_g+n \) and \( {\overset{\sim }{S}}_g={S}_g+{\boldsymbol{b}}^{\mathrm{T}}{\boldsymbol{G}}^{-1}\boldsymbol{b} \), that is, from a \( {\chi}^{-2}\left(\tilde{v}_{g},{\overset{\sim }{S}}_g\right). \)

  7. Repeat steps 2–6 until the burn-in period and the desired number of samples are reached.

There is considerable empirical evidence that there are no large differences between the prediction performance of the probit and logistic regression models. For this reason, here we only described the Gibbs sampler for ordinal data under a logistic framework and did not provide illustrative examples; moreover, this logistic ordinal model cannot be implemented in BGLR.

7.4 Penalized Multinomial Logistic Regression

An extension of the logistic regression model described in Chap. 3 is the multinomial regression model, which is also used to explain or predict a categorical nominal response variable (one without any natural order) with more than two categories. For example, a study could investigate the association of markers with diabesity (diabetes and obesity, diabetes but no obesity, obesity but no diabetes, and no diabetes and no obesity); another example could be to study the effects of age group on the histological subtypes of cancer in women (adenocarcinoma, adenosquamous, others), or to predict the preference in an election among four candidates (C1, C2, C3, and C4) using socioeconomic and demographic variables, etc.

The multinomial logistic regression model assumes that, conditionally on a covariate vector xi, a multinomial response random variable Yi takes one of the categories 1, 2, …, C with the following probabilities:

$$ P\left({Y}_i=c|{\boldsymbol{x}}_i\right)=\frac{\exp \left({\beta}_{0c}+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_c\right)}{\sum_{l=1}^C\exp \left({\beta}_{0l}+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_l\right)},c=1,\dots, C, $$
(7.6)

where βc, c = 1, …, C, is a vector of coefficients of the same dimension as x.

Model (7.6) is not identifiable because evaluating these probabilities at the parameter values \( \left({\beta}_{0c}^{\ast },{\boldsymbol{\beta}}_c^{\ast \mathrm{T}}\right)=\left({\beta}_{0c}+{\beta}_{0}^{\ast \ast },{\boldsymbol{\beta}}_{c}^{\mathrm{T}}+{\boldsymbol{\beta}}^{\ast \ast \mathrm{T}}\right) \), c = 1, …, C, where \( {\beta}_{0}^{\ast \ast } \) is an arbitrary constant and \( {\boldsymbol{\beta}}^{\ast \ast } \) an arbitrary vector, gives the same probabilities as computing them with the original parameter values \( \left({\beta}_{0c},{\boldsymbol{\beta}}_c^{\mathrm{T}}\right) \), c = 1, …, C. A common constraint that avoids this lack of identifiability is to set \( \left({\beta}_{0C},{\boldsymbol{\beta}}_C^{\mathrm{T}}\right)=\left(0,{\mathbf{0}}^{\mathrm{T}}\right) \), although any one of the other C − 1 parameter vectors could be chosen instead.

With such a constraint (\( \left({\beta}_{0C},{\boldsymbol{\beta}}_C^{\mathrm{T}}\right)=\left(0,{\mathbf{0}}^{\mathrm{T}}\right) \)), we can identify a model by assuming that the effects of the covariate vector xi over the log “odds” of each of the categories c = 1, …, C − 1 with respect to category C (baseline category) are given (Agresti 2012) by

$$ \log \left(\frac{P\left({Y}_i=c|x\right)}{P\left({Y}_i=C|x\right)}\right)={\beta}_{0c}+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_c,\kern0.5em c=1,\dots, C-1, $$

where the effects of xi depend on the chosen response baseline category. Similar expressions can be obtained when using the unconstrained model in (7.6). From these relationships, the log odds corresponding to the probabilities of any two categories, c and l, can be derived:

$$ \log \left(\frac{P\left({Y}_i=c|x\right)}{P\left({Y}_i=l|x\right)}\right)={\beta}_{0c}-{\beta}_{0l}+{\boldsymbol{x}}_i^{\mathrm{T}}\left({\boldsymbol{\beta}}_c-{\boldsymbol{\beta}}_l\right),\kern0.75em c=1,\dots, C-1. $$

Sometimes the number of covariates is larger than the number of observations (for example, in expression arrays and genomic prediction where the number of markers is often larger than the number of phenotyped individuals), so the conventional constraint described above to force identifiability in the model is not enough and some identifiability problems remain. One way to avoid this problem is to use similar quadratic regularization maximum likelihood estimation methods (Zhu and Hastie 2004) as done for some models in Chap. 3, but here only on the slopes (βc) for the covariates, under which setting the constraints mentioned before are no longer necessary.

For a given value of the regularization parameter λ > 0, the regularized maximum likelihood estimate of the beta coefficients, \( \boldsymbol{\beta} ={\left({\beta}_{01},{\boldsymbol{\beta}}_1^{\mathrm{T}},{\beta}_{02},{\boldsymbol{\beta}}_2^{\mathrm{T}},\dots, {\beta}_{0C},{\boldsymbol{\beta}}_C^{\mathrm{T}}\right)}^{\mathrm{T}} \), is the value that maximizes the penalized log-likelihood:

$$ {\mathrm{\ell}}_p\left(\boldsymbol{\beta}; \boldsymbol{y}\right)=\mathrm{\ell}\left(\boldsymbol{\beta}; \boldsymbol{y}\right)-\lambda \sum \limits_{c=1}^C{\boldsymbol{\beta}}_c^{\mathrm{T}}{\boldsymbol{\beta}}_c, $$
(7.7)

where \( \mathrm{\ell}\left(\boldsymbol{\beta}; \boldsymbol{y}\right)=\log \left[L\left(\boldsymbol{\beta}; \boldsymbol{y}\right)\right] \) is the logarithm of the likelihood and takes the form:

$$ {\displaystyle \begin{array}{c}\mathrm{\ell}\left(\boldsymbol{\beta}; \boldsymbol{y}\right)=\sum \limits_{i=1}^n\log \left[P\left({Y}_i={y}_i|{\boldsymbol{x}}_i\right)\right]=\sum \limits_{i=1}^n\sum \limits_{c=1}^C{I}_{\left\{{y}_i=c\right\}}\log \left[\frac{\exp \left({\beta}_{0c}+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_c\right)}{\sum_{l=1}^C\exp \left({\beta}_{0l}+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_l\right)}\right]\\ {}=\sum \limits_{i=1}^n\sum \limits_{c=1}^C{I}_{\left\{{y}_i=c\right\}}\left({\beta}_{0c}+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_c\right)-\sum \limits_{i=1}^n\log \left[\sum \limits_{l=1}^C\exp \left({\beta}_{0l}+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_l\right)\right]\end{array}} $$
(7.8)
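A direct R transcription of the penalized log-likelihood in (7.7)–(7.8) with the Ridge penalty may help to fix ideas; the function and argument names below are ours, with B0 containing the C intercepts and the columns of B containing β1, …, βC:

# Penalized multinomial log-likelihood (7.7)-(7.8) with Ridge penalty
# B0: vector of C intercepts; B: p x C matrix whose columns are beta_1, ..., beta_C
# X: n x p matrix of covariates; y: response coded 1, ..., C; lambda: regularization parameter
penalized_loglik <- function(B0, B, X, y, lambda) {
  Eta <- sweep(X %*% B, 2, B0, "+")                  # n x C matrix with entries beta_0c + x_i' beta_c
  loglik <- sum(Eta[cbind(seq_along(y), y)]) -       # first term of (7.8)
            sum(log(rowSums(exp(Eta))))              # second term of (7.8)
  loglik - lambda*sum(B^2)                           # Ridge penalty; intercepts are not penalized
}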

When p is large (p ≫ n), direct optimization of \( {\mathrm{\ell}}_p\left(\boldsymbol{\beta}; \boldsymbol{y}\right) \) is almost impossible. An alternative is to use the sequential minimization optimization algorithm proposed by Zhu and Hastie (2004), which is applied after a transformation trick is used to make the involved computations feasible, because the number of parameters in the optimization is reduced to only (n + 1)C instead of (p + 1)C.

Another alternative available in the glmnet package is the one proposed by Friedman et al. (2010) that is similar to that of the logistic Ridge regression in Chap. 3. This consists of maximizing (7.7) by using a block-coordinate descent strategy, where each block is formed by the beta coefficients corresponding to each class, \( {\boldsymbol{\beta}}_c^{\ast \mathrm{T}}=\left({\beta}_{0c},{\boldsymbol{\beta}}_c^{\mathrm{T}}\right) \), but with \( \mathrm{\ell}\left(\boldsymbol{\beta}; \boldsymbol{y}\right) \) replaced by a quadratic approximation with respect to the beta coefficients of the chosen block, \( \left({\beta}_{0c},{\boldsymbol{\beta}}_c^{\mathrm{T}}\right) \), at the current values of β (\( \overset{\sim }{\boldsymbol{\beta}} \)). That is, the update of block c is achieved by maximizing the following function with respect to β0c and βc:

$$ {f}_c\left({\beta}_{0c},{\boldsymbol{\beta}}_c\right)={\ell}_c^{\ast}\left(\boldsymbol{\beta}; \boldsymbol{y}\right)-\lambda {\boldsymbol{\beta}}_c^{\mathrm{T}}{\boldsymbol{\beta}}_c, $$
(7.9)

where \( {\ell}_c^{\ast}\left(\boldsymbol{\beta}; \boldsymbol{y}\right)=-\frac{1}{2}{\sum}_{i=1}^n{w}_{ic}{\left({y}_{ic}^{\ast }-{\beta}_{0c}-{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_c\right)}^2+\overset{\sim }{c} \) is the second-order Taylor approximation of \( \mathrm{\ell}\left(\boldsymbol{\beta}; \boldsymbol{y}\right) \) with respect to the beta coefficients that conform block c, \( \left({\beta}_{0c},{\boldsymbol{\beta}}_c^{\mathrm{T}}\right) \), about the current estimates \( \overset{\sim }{\boldsymbol{\beta}}={\left({\overset{\sim }{\beta}}_{01},{\overset{\sim }{\boldsymbol{\beta}}}_1^{\mathrm{T}},{\overset{\sim }{\beta}}_{02},{\overset{\sim }{\boldsymbol{\beta}}}_2^{\mathrm{T}},\dots, {\overset{\sim }{\beta}}_{0C},{\overset{\sim }{\boldsymbol{\beta}}}_C^{\mathrm{T}}\right)}^{\mathrm{T}} \); \( {y}_{ic}^{\ast }={\overset{\sim }{\beta}}_{0c}+{\boldsymbol{x}}_i^{\mathrm{T}}{\overset{\sim }{\boldsymbol{\beta}}}_c+{w}_{ic}^{-1}\left({I}_{\left\{{y}_i=c\right\}}-\tilde{p}_{c}\left({\boldsymbol{x}}_i\right)\right) \) is the “working response,” \( {w}_{ic}=\tilde{p}_{c}\left({\boldsymbol{x}}_i\right)\left(1-\tilde{p}_{c}\left({\boldsymbol{x}}_i\right)\right) \), and \( \tilde{p}_{c}\left({\boldsymbol{x}}_i\right) \) is P(Yi = c| xi) given in (7.6) but evaluated at the current parameter values.

When X is the design matrix with standardized independent variables, the updated parameters of block c, \( {\hat{\boldsymbol{\beta}}}_c^{\ast }, \) can be obtained with the following formula:

$$ {\hat{\boldsymbol{\beta}}}_c^{\ast }={\left({\boldsymbol{X}}^{\ast \mathrm{T}}{\boldsymbol{W}}_c{\boldsymbol{X}}^{\ast }+\lambda \boldsymbol{D}\right)}^{-1}{\boldsymbol{X}}^{\ast \mathrm{T}}{\boldsymbol{W}}_c{\boldsymbol{y}}_c^{\ast }, $$

where X∗ = [1n X], Wc = Diag(w1c, …, wnc), \( {\boldsymbol{y}}_c^{\ast}={\left({y}_{1c}^{\ast },\dots, {y}_{nc}^{\ast}\right)}^{\mathrm{T}} \), and D is an identity matrix of dimension (p + 1) × (p + 1) except that its first entry is 0 instead of 1 (so that the intercept is not penalized). However, in the context of p ≫ n, a computationally feasible optimization of (7.9) is achieved by using coordinate descent methods, as done in the glmnet package and commented in Sect. 3.6.2 of Chap. 3.
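For small p, the closed-form block update above can be transcribed directly; the following sketch uses our own names and serves only to illustrate the formula (in practice, glmnet uses coordinate descent instead of this matrix inversion):

# One Ridge block update for class c using the closed-form formula above (small p only)
# Xs: n x (p+1) matrix [1_n, X]; w_c: weights w_ic; y_star_c: working responses; lambda: penalty
ridge_block_update <- function(Xs, w_c, y_star_c, lambda) {
  D <- diag(ncol(Xs)); D[1, 1] <- 0                   # first entry 0: the intercept is not penalized
  XtW <- t(Xs*w_c)                                    # equals t(Xs) %*% diag(w_c)
  solve(XtW %*% Xs + lambda*D, XtW %*% y_star_c)      # (X*' W_c X* + lambda D)^{-1} X*' W_c y*_c
}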

For other penalization terms, a very similar algorithm to the one described before can be used. For example, for Lasso penalty, the penalized likelihood (7.7) is modified as

$$ {\mathrm{\ell}}_p\left(\boldsymbol{\beta}; \boldsymbol{y}\right)=\mathrm{\ell}\left(\boldsymbol{\beta}; \boldsymbol{y}\right)-\lambda \sum \limits_{c=1}^C\sum \limits_{j=1}^p\left|{\beta}_{cj}\right| $$
(7.10)

and block updating can be done as in (7.9), except that now \( {\boldsymbol{\beta}}_c^{\mathrm{T}}{\boldsymbol{\beta}}_c \) is replaced by \( {\sum}_{j=1}^p\left|{\beta}_{cj}\right| \). As with the penalized logistic regression studied in Chap. 3, the most common approach for choosing the “optimal” regularization parameter λ in the penalized multinomial regression model (7.7) is a k-fold cross-validation strategy with the misclassification error as the metric, and this is what is used here. For more details, see Friedman et al. (2010). It is important to point out that the tuning parameter λ used in glmnet is equal to the one used in the penalized log-likelihoods (7.7) and (7.10) divided by the number of observations.

7.4.1 Illustrative Examples for Multinomial Penalized Logistic Regression

Example 4

To illustrate the penalized multinomial regression model, here we considered the ordinal data of Example 1, but treating the data as nominal, which under the prediction paradigm may not be so important. To evaluate the performance of this model with Ridge (7.9) and Lasso (7.10) penalization, the same 10 random partitions used in Example 1 were used in the cross-validation strategy. By combining these two penalizations and including different covariate information, the performance of six resulting models was evaluated: the penalized Ridge multinomial logistic regression model (PRMLRM) with markers as predictors (PRMLRM-1); PRMLRM with environments and markers as predictors (PRMLRM-2); PRMLRM with environments, markers, and environment×marker interaction as predictors (PRMLRM-3); and the penalized Lasso multinomial logistic regression models, PLMLRM-1, PLMLRM-2, and PLMLRM-3, which have the same predictors as PRMLRM-1, PRMLRM-2, and PRMLRM-3, respectively.

The results are shown in Table 7.4, where again the BS and PCCC metrics were computed for each partition. We can observe a similar performance of all models with both metrics. The largest difference in the average BS values across the 10 partitions was found between the PRMLRM-3 (worst) model and the PLMLRM-1 and PLMLRM-2 (best) models, the average BS value of the former being 1.34% greater than that of the latter two. With respect to the other metric, the best average PCCC value (0.29) was obtained with model PRMLRM-2, a 2.36% better performance than the worst average PCCC value (0.28), obtained with models PRMLRM-1-2.

Table 7.4 Brier score (BS) and proportion of cases correctly classified (PCCC) across 10 random partitions, with 80% of the total data set used for training and the rest for testing, under model (7.6), with Ridge and Lasso penalization and different covariate information: the penalized Ridge multinomial logistic regression (PRMLR) model with markers as covariates (PRMLRM-1); the PRMLR model with environment and markers as covariates (PRMLRM-2); the PRMLR model with environment, markers, and environment×marker interaction as covariates (PRMLRM-3); PLMLRM-1, PLMLRM-2, and PLMLRM-3 denote the penalized Lasso multinomial logistic regression models with the same covariates as PRMLRM-1, PRMLRM-2, and PRMLRM-3, respectively

All these models gave a better performance than all the models fitted in Example 1, but only a slightly better performance than the models considered in Example 2, which, besides the marker information (the only input included in the models of Example 1 and in PRMLRM-1 and PLMLRM-1), also included environment and environment×marker interaction effects as predictors. Specifically, even the simpler models in Table 7.4 that use only marker information in the predictor (PRMLRM-1 and PLMLRM-1) gave results comparable to the more complex and competitive models fitted in Example 2. The relative difference in the average BS value between the best models of these two sets (models in Example 2 vs. models in Table 7.4) was 1.25%, while the relative difference in the average PCCC was 3.57%. The R code to reproduce the results of the models under the Ridge penalization is given in Appendix 4. The code for the Lasso penalization models is the same as for the Ridge penalization but with the alpha value set to 1 instead of 0 in the basic glmnet call used to fit the penalized Ridge multinomial logistic regression models:

A = cv.glmnet(X, y, family='multinomial', type.measure = "class", nfolds = 10, alpha = 0)

where X and y are the design matrix and the vector of the response variable used for training, and the multinomial model is specified with family='multinomial'. With type.measure = "class" we specify the metric used by the inner cross-validation (CV) strategy to internally select the “optimal” value of the regularization parameter λ (over a grid of 100 values of λ, by default); this misclassification error is equal to 1 − PCCC. nfolds is the number of folds used in the inner CV strategy; when this value is not specified, it is set to 10 by default. alpha is the mixing parameter of the more general Elastic Net penalty: when it is set to 0, the Ridge penalty is obtained; when it is set to 1, the Lasso penalty is used; and when alpha takes values between 0 and 1, the Elastic Net penalization is implemented.
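As a minimal sketch of how the two metrics reported in this chapter can be computed from the fitted object A (here X_tst and y_tst are illustrative names for the testing design matrix and testing response, not objects created by glmnet):

probs = predict(A, newx = X_tst, s = "lambda.min", type = "response")[, , 1]  # matrix of predicted class probabilities
y_hat = colnames(probs)[max.col(probs)]                                       # predicted class = highest probability
PCCC = mean(y_hat == y_tst)                                                   # proportion of cases correctly classified
Y_obs = model.matrix(~0 + factor(y_tst, levels = colnames(probs)))            # one-hot coding of the observed classes
BS = mean(rowSums((probs - Y_obs)^2))                                         # Brier score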

A kind of GBLUP implementation of the simpler models (PRMLRM-1 and PLMLRM-1) considered in Example 4 can be obtained by computing the genomic relationship matrix from the marker information and using as predictor the lower triangular factor of its Cholesky decomposition (or its square root matrix). Something similar can be done to include the Env×Line interaction term. These kinds of “GBLUP” implementations of PRMLRM-1 and PLMLRM-1 are referred to as GMLRM-R1 and GMLRM-L1, respectively. In both cases, the input matrix is X = ZLLg, where ZL is the incidence matrix of the lines and Lg is the lower triangular factor of the Cholesky decomposition of G, \( \boldsymbol{G}={\boldsymbol{L}}_g{\boldsymbol{L}}_g^{\mathrm{T}} \). For the rest of the models, this kind of “GBLUP” implementation will be referred to as GMLRM-R2 and GMLRM-R3 for the Ridge penalty, and as GMLRM-L2 and GMLRM-L3 for the Lasso penalty. In these cases, the input matrix is X = [XE, ZLLg] when environment and genetic effects are included in the predictor (GMLRM-R2, GMLRM-L2), and X = [XE, ZLLg, ZEL(II ⊗ Lg)] when the genotype×environment interaction effect is also taken into account (GMLRM-R3, GMLRM-L3), where II is the identity matrix of order I, I is the number of environments, and ZEL is the incidence matrix of the environment×line combinations.

The basic code for the glmnet implementation of the GMLRM-L3 model is given below; the other five kinds of “GBLUP” implementations described above can be fitted by simply removing the corresponding predictors and/or setting the alpha value to 0 for the Ridge penalty:

Lg = t(chol(G))
#Environment matrix design
XE = model.matrix(~Env, data = dat_F)
#Matrix design of the genetic effect
ZL = model.matrix(~0 + GID, data = dat_F)
ZLa = ZL %*% Lg
#Environment-genetic interaction
ZEL = model.matrix(~0 + GID:Env, data = dat_F)
ZELa = ZEL %*% kronecker(diag(dim(XE)[2]), Lg)
X = cbind(XE[,-1], ZLa, ZELa) #Input X matrix
A = cv.glmnet(X, y, family = 'multinomial', type.measure = "class", alpha = 1) #alpha=0 for Ridge penalty

Example 5

Here we considered the data in Example 4 to illustrate the implementation of the following six kinds of “GBLUP” models: GMLRM-R1, GMLRM-R2, GMLRM-R3, GMLRM-L1, GMLRM-L2, and GMLRM-L3. To evaluate the performance of these models, the same CV strategy was used, where for each of the 10 random partitions, 80% of the full data set was taken to train the models and the rest to evaluate their performance.

The results presented in Table 7.5 are comparable to those given in Table 7.4 for Example 4. The relative difference between the model in Table 7.4 with the best average BS value (PLMLRM-1) and the model in Table 7.5 with the best average BS value (GMLRM-R1) is 0.6990%, in favor of the latter model. Regarding the average PCCC, the “kind” of GBLUP model with the best value, GMLRM-L2, was 2.2989% greater than the average PCCC value of the best models in Table 7.4, PRMLRM-2 and PLMLRM-3. Furthermore, among the models in Table 7.5, the greatest difference between the average BS values was observed between models GMLRM-R1 (best) and GMLRM-R3 (worst), a relative difference of 2.748%. The best average PCCC performance was observed with model GMLRM-L2 and the worst with model GMLRM-L3, a relative difference of 7.23% between the best and the worst.

Table 7.5 Brier score (BS) and proportion of cases correctly classified (PCCC) across 10 random partitions, with 80% of the data used for training and the rest for testing, for the multinomial model (7.6) under the six types of “GBLUP” implementation of the Ridge and Lasso penalizations described in Example 4, obtained by varying the input information: GMLRM-R1 (X = ZLLg), GMLRM-R2 (X = [XE, ZLLg]), GMLRM-R3 (X = [XE, ZLLg, ZEL(II ⊗ Lg)]), GMLRM-L1 (X = ZLLg), GMLRM-L2 (X = [XE, ZLLg]), and GMLRM-L3 (X = [XE, ZLLg, ZEL(II ⊗ Lg)]), where the first three use the Ridge penalty and the last three the Lasso penalty

The R code to reproduce the results in Table 7.5 for the first three models is given in Appendix 5. By only changing the value of alpha to 1, the same code allows implementing the last three models with the Lasso penalty.

7.5 Penalized Poisson Regression

In contrast to ordinal data, sometimes the response has no known upper bound; rather, it is a count or frequency, for example, the number of customers per minute in a specific supermarket, the number of infected spikelets per plant, the number of defects per unit, the number of seeds per plant, etc. The most often used model to relate a set of explanatory variables to a count response variable is Poisson regression.

Given the vector of covariates xi = (xi1, …, xip)T, the Poisson log-linear regression models the number of events Yi as a Poisson random variable with probability mass function

$$ P\left({Y}_i=y|{\boldsymbol{x}}_i\right)=\frac{\lambda_i^y\exp \left(-{\lambda}_i\right)}{y!},y=0,1,2,\dots, $$
(7.11)

where the expected value of Yi is given by \( {\lambda}_i=\exp \left({\beta}_0+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_0\right) \), β0 is the intercept, and \( {\boldsymbol{\beta}}_0={\left({\beta}_1,\dots, {\beta}_p\right)}^{\mathrm{T}} \) is the vector of beta coefficients (effects of the covariates). For example, Yi may be the number of defects of a product under specific production conditions (temperature, raw material, schedule, operator, etc.) or the number of spikelets per plant, which can be predicted from environment and marker information.

For parameter estimation in this model, the most usual method is maximum likelihood estimation, but this is not suitable in the context of a large number of covariates, so here, as for the logistic and multinomial models, we describe the penalized likelihood method. For the Ridge penalty, the penalized log-likelihood function is given by

$$ {\displaystyle \begin{array}{c}{\mathrm{\ell}}_p\left({\beta}_0,{\boldsymbol{\beta}}_0;\boldsymbol{y}\right)=\sum \limits_{i=1}^n\log \left[{f}_{Y_i}\left({y}_i;{\beta}_0,{\boldsymbol{\beta}}_0\right)\right]-\frac{\lambda }{2}\sum \limits_{j=1}^p{\beta}_j^2\\ {}=\sum \limits_{i=1}^n{y}_i\left({\beta}_0+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_0\right)-\sum \limits_{i=1}^n\exp \left({\beta}_0+{\boldsymbol{x}}_i^{\mathrm{T}}{\boldsymbol{\beta}}_0\right)-\sum \limits_{i=1}^n\log \left({y}_i!\right)-\frac{\lambda }{2}\sum \limits_{j=1}^p{\beta}_j^2\end{array}}. $$
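Before turning to the coordinate descent approach, the following toy sketch (simulated data; all object names and values are illustrative) shows that, for a small number of covariates, this Ridge-penalized Poisson log-likelihood can be maximized directly with a general-purpose optimizer such as optim:

set.seed(2)
n = 40; p = 4; lambda = 0.5
X = matrix(rnorm(n * p), n, p)
y = rpois(n, lambda = exp(0.3 + drop(X %*% rep(0.2, p))))  # simulated counts
neg_pen_ll = function(par) {                               # negative penalized log-likelihood
  b0 = par[1]; b = par[-1]
  eta = drop(b0 + X %*% b)
  -(sum(y * eta - exp(eta) - lfactorial(y)) - (lambda / 2) * sum(b^2))
}
fit = optim(rep(0, p + 1), neg_pen_ll, method = "BFGS")
fit$par                                                    # estimated (beta_0, beta_1, ..., beta_p)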

Direct optimization of this kind, as in the toy sketch above, is only practical for a few covariates; in a general genomic prediction context where p ≫ n, other alternatives are more useful. One is the same procedure described above for the multinomial and logistic regressions, in which the log-likelihood is replaced by a second-order approximation at the current beta coefficients and the resulting weighted least-squares problem is then solved by coordinate descent methods. This method for Poisson regression is also implemented in the glmnet package, and the basic code for implementing it is as follows:

A = cv.glmnet(X, y, family='poisson', type.measure = "mse", nfolds = 10, alpha = 0)

where again X and y are the design matrix and the vector of the response variable used for training the Poisson model, which is specified by family='poisson'. The metric used internally by the inner cross-validation (CV) strategy to choose the “optimal” regularization parameter λ (over a grid of 100 values of λ, by default) is specified in type.measure; in addition to the mean squared error ("mse"), the deviance ("deviance") and the mean absolute error ("mae") can also be adopted. The other arguments are the same as described for the logistic and multinomial regression models. To implement the Lasso Poisson regression (Poisson regression with the Lasso penalty), this same code can be used by simply setting alpha = 1 instead of 0. Furthermore, in the genomic context, as done before for other kinds of responses (continuous, binary, nominal), more complex models involving marker (genetic), environment, and/or marker × environment interaction effects can also be implemented with this function.
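A minimal sketch (X_tst and y_tst are illustrative names for the testing data, as before) of how a fitted Poisson model can be evaluated with the mean squared error of prediction, the metric used in Example 6 below, is:

lambda_hat = predict(A, newx = X_tst, s = "lambda.min", type = "response")  # predicted expected counts
MSE = mean((y_tst - lambda_hat)^2)                                          # mean squared error of prediction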

All these predictors, analogous to those described for the multinomial model, are shown in Table 7.6, where the first three (PRPRM-1–3) correspond to Poisson Ridge regression with marker, environment plus marker, and environment plus marker plus environment×marker interaction effects, respectively. The second group of three models (PLPRM-1–3) are the same as before but with the Lasso penalization. The third group of three models (GPRM-R1–R3) are the corresponding GBLUP implementations of the first three predictors, while the last group of three models (GPRM-L1–L3) corresponds to the GBLUP implementations of the three Lasso penalized models.

Table 7.6 Penalized Poisson regression models with Ridge or Lasso penalties and different covariate information: the penalized Ridge Poisson regression (PRPRM) model with markers as covariates (PRPRM-1); the PRPRM model with environment and markers as covariates (PRPRM-2); the PRPRM model with environment, markers, and environment×marker interaction as covariates (PRPRM-3); PLPRM-1, PLPRM-2, and PLPRM-3 denote the corresponding penalized Lasso Poisson regression models. GPRM-R1–R3 are the corresponding GBLUP implementations of the PRPRM models, while GPRM-L1–L3 are the respective GBLUP implementations of the PLPRM models. XE, XM, and XEM are the design matrices of environment, markers, and environment×marker interaction; ZL and ZEL are the incidence matrices of line and environment×line interaction effects; and Lg is the lower triangular factor of the Cholesky decomposition of G, \( \boldsymbol{G}={\boldsymbol{L}}_g{\boldsymbol{L}}_g^T \)
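As a sketch of how one of these GBLUP-type Poisson models (e.g., GPRM-L3) could be fitted, the matrices XE, ZLa, and ZELa constructed in the multinomial code above can be reused (these object names are ours, carried over from that example); the simpler predictors are obtained by dropping the corresponding columns, and alpha = 0 gives the Ridge versions:

X = cbind(XE[,-1], ZLa, ZELa)  # environment, genetic, and environment x genetic effects
A = cv.glmnet(X, y, family = 'poisson', type.measure = "mse", alpha = 1)  # alpha = 0 for Ridge penalty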

Example 6

To illustrate how to fit the Poisson model with the predictors given in Table 7.6 and compare these in terms of the mean squared error of prediction, we considered a small data set that is part of the data set used in Montesinos-López et al. (2016). It consists of 50 lines in three environments and a total of 1635 markers. The performance evaluation was done using 10 random partitions where 80% of the data was used to train the model and the rest to test the model. The glmnet R package was used to train all these regression models.

The results are shown in Table 7.7, where the first six models are the six penalized Poisson regression models given in Table 7.6. Among the Ridge penalized models (PRPRM-1–3), we can observe a slightly better performance of the most complex model, which takes into account environment, marker, and environment×marker interaction effects (PRPRM-3). Among the penalized Lasso Poisson regression models (PLPRM-1–3), the most complex model (PLPRM-3) also performed best, with a more notable difference, and it was also better than the best penalized Ridge model (PRPRM-3).

Table 7.7 Mean square error of prediction (MSE) of six penalized Poisson regression models with Ridge or Lasso penalties and different covariate information (first six models) and MSE of the GBLUP implementation of the corresponding models (second six models), across 10 random partitions, with 80% of the total data set used for training and the rest for testing. See Table 7.6 for details of each model

Furthermore, among the GBLUP implementations of the Poisson regression models with the Ridge penalty, the best MSE performance was obtained with the model that includes environment and genetic effects (GPRM-R2), and this was the best among all Ridge penalized models (PRPRM-1–3 and GPRM-R1–R3); the average MSE of the best PRPRM (PRPRM-3) is 7.5668% greater than the average MSE of the best GPRM with the Ridge penalty (GPRM-R2). Under the GBLUP implementations of the Poisson regression models with the Lasso penalty (GPRM-L1–L3), the best prediction performance was also obtained when using environment and genetic effects (GPRM-L2), but its average MSE differed only slightly (by 2.71%) from the average MSE of the best Lasso penalized Poisson regression model (PLPRM-3). Additionally, this GBLUP implementation (GPRM-L2) showed the best average MSE performance among all 12 implemented models, with a notable difference from the second best (GPRM-R2), whose average MSE was 12.01% greater than that of the GPRM-L2 model.

The poor average MSE performance of model GPRM-L3 is due to the high MSE value obtained in partition 2. However, such high variability of the MSE across partitions may not be observed with other, larger data sets.

The R code to reproduce these results is shown in Appendix 6.

7.6 Final Comments

In this chapter, we give the fundamentals of some popular Bayesian and classical regression models for categorical and count data, and we provide many practical examples of how to implement these models for genomic prediction. The examples take into account in the predictor not only the effects of markers or genotypes, but also illustrate how to include the effects of environment, genotype × environment interaction, and marker × environment interaction. Also, for each type of response variable, we evaluated the prediction performance with appropriate metrics, which is very important since the metrics for evaluating prediction performance depend on the type of response variable. It is important to point out that the components to include in the predictor are not restricted to those illustrated here, since the user can include other main effects or interaction effects in a similar fashion to that illustrated in the examples provided.