Throughout this book, we have been using the acronym GLMM to denote generalized linear mixed models. The common denominator among these models is that they all contain a linear model (LM) part, which refers to the fixed effects component of the linear predictor, Xβ. In a GLMM, the prefix “G” indicates that the distribution of the observations need not be normal, and the first “M” indicates that the linear predictor includes mixed effects and thus contains random effects, expressed by the term “Zb.” The fixed component of the linear predictor is important because the fixed effects describe the treatment design, which, in turn, is determined by the objectives or the initial research questions that the study wishes to answer. Therefore, if the researcher proposes a reasonable model to analyze an experiment, then he/she must be able to express each objective as a question about a model parameter or about a linear combination of model parameters.

Example

Assume a 2 × 2 factorial, with two levels in each of factors A and B, in which all possible combinations are tested. In this case, the model corresponds to a two-way model with interaction and a linear predictor given by

$$ {\eta}_{ij}=\mu +{\alpha}_i+{\beta}_j+{\left(\alpha \beta \right)}_{ij};i,j=1,2 $$

As in all the statistical models studied so far, the linear predictor is expressed in terms of the link function, and ηij estimates the mean μij (of a treatment combination) directly if the data follow a normal distribution and indirectly if they do not. For this example, the inference should focus on one or more of the following options (estimable functions): a treatment combination mean; a main effect mean, i.e., the mean of factor A averaged over the levels of factor B, or vice versa; the difference between main effects; the difference between simple effects, i.e., the difference between two levels of factor A at a given level of B, or the difference between two levels of factor B at a given level of A; and so on. Each of these options can be expressed in terms of the parameters of the linear predictor, as shown in Table 3.1.

Table 3.1 Estimable functions in a factorial 2 × 2 treatment structure using the identity link function

Assuming that the data have a normal distribution, which is equivalent to using an identity link function, the estimator in terms of the linear predictor (column 2) directly estimates the expected values in column 3. If the data do not follow a normal distribution, then column 2 estimates the expected values in column 3 only indirectly, and the link function must be inverted to obtain them. For link functions other than the identity, the estimates in column 2 therefore require more careful handling. In an experimental design with a factorial treatment structure, the analysis should first examine the interaction between the two factors. If the interaction is not significant, the main effects provide useful information; if it is significant, the main effects are confounded with the interaction, and it is better to focus on the simple effects.

3.1 Three Aspects to Consider for an Inference

When constructing a model, the researcher must decide whether the effects are fixed or random. This decision has important implications with respect to the estimation criteria and in the interpretation of the tests and estimates obtained. Given these implications, three important aspects, described in the following sections, must be taken into consideration in statistical modeling.

3.1.1 Data Scale in the Modeling Process Versus Original Data

This is an issue specific to models with a link function other than the “identity,” since the scale on which the model is estimated is not the same as the scale of the original data when the assumption of normality of the response variable is no longer valid. When the data are normally distributed, the estimable function directly estimates the expected value. However, this is not true if the data follow a non-normal distribution. For example, in a logistic model for binomial data in a completely randomized design, the estimable function η + τi estimates a logit, or log odds. However, the expected value for individuals receiving the ith treatment is a probability, not a logit, so the estimate must be converted to a probability using the inverse link; that is:

$$ {\pi}_i=\frac{1}{\left(1+{e}^{-\left(\eta +{\tau}_i\right)}\right)} $$

Thus, for functions other than “identity,” there are two ways of expressing the estimates: (1) in terms of the parameters directly estimated from the GLMM (model scale) or (2) in terms of the expected value of the response variable (data scale).
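As a numerical sketch of these two scales (the value of η + τi below is invented for illustration, not taken from any table in this chapter), the conversion from model scale to data scale under a logit link can be written as:

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: converts a model-scale estimate (log odds)
    into a data-scale estimate (probability)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical model-scale estimate eta + tau_i for one treatment
model_scale = 0.8                    # a logit (log odds)
data_scale = inv_logit(model_scale)  # the corresponding probability

print(model_scale, round(data_scale, 4))
```

The same pair of numbers illustrates option (1), reporting the logit itself, versus option (2), reporting the probability obtained through the inverse link.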

3.1.2 Inference Space

This problem arises only when the linear predictor contains random effects. In these models, estimates are obtained through a linear combination (an estimable function) of the fixed effects, even though the linear predictor contains random effects. Kβ denotes the estimable function, where K is a matrix of order [(p + 1) × k] and β is the vector of fixed effects parameters of order [(p + 1) × 1]. The estimable function Kβ supports a broad inference, as it generalizes results to the entire population represented by the random effects.

In contrast, the linear combination Kβ + Zb, where Z is a matrix of coefficients on the random effects with nonzero entries, is a predictable function, and its inference is limited to the levels defined in b. Suppose that you are conducting an experiment with three treatments at several locations (L); then the estimable function τ1 − τ2 provides the inference about the difference between treatments 1 and 2 over the whole population under study. The predictable function [τ1 − τ2 + (τL)1j − (τL)2j], on the other hand, constrains the inference about the difference between treatments 1 and 2 to location j. The type of inference produced by predictable functions is called “narrow inference” because the nonzero coefficients in matrix Z reduce the scope of inference from the entire population to the levels identified in Z. Thus, the predictable function Kβ + Zb should be used for level-specific estimates, whereas the estimable function Kβ should be used for estimates valid for the entire population under study.
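The distinction can be sketched with a few invented numbers (the fixed-effect estimates and treatment-by-location BLUPs below are hypothetical, not from any fitted model in this chapter):

```python
# Hypothetical solutions from a mixed model with 3 treatments at several locations
tau = {1: 7.2, 2: -3.1, 3: 0.0}   # fixed-effect estimates (invented)
tau_loc = {                        # BLUPs of treatment-by-location effects (invented)
    (1, 2): 1.4,
    (2, 2): -0.6,
}

# Broad inference: K'beta only -- the treatment 1 vs 2 difference,
# generalized over the whole population of locations
broad = tau[1] - tau[2]

# Narrow inference: K'beta + Z'b -- the same difference,
# but specific to location 2 (nonzero coefficients on b)
narrow = tau[1] - tau[2] + tau_loc[(1, 2)] - tau_loc[(2, 2)]

print(broad, narrow)
```

The two quantities differ exactly by the random-effect solutions that the nonzero coefficients in Z bring into the linear combination.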

3.1.3 Inference Based on Marginal and Conditional Models

As mentioned in the previous chapter, the specification of a generalized linear mixed model (GLMM) is done in terms of two probability distributions: (1) the distribution of the observations given the random effects, y ∣ b, and (2) the distribution of the random effects, b. This two-part specification characterizes Gaussian mixed models (MMs) and carries over unchanged to mixed models whose response variables are not normally distributed.

From probability theory, the marginal distribution of the data (y) can be obtained by integrating the joint distribution of y and b over the random effects b. Of the two distributions, the marginal distribution of the data is the only one that is observable. Many seemingly reasonable non-Gaussian mixed models do not distinguish between the distributions of y ∣ b and y. Models that do not make this distinction are called marginal models. Estimates obtained from marginal models have different expected values from those produced by conditional models; therefore, marginal models are not estimated in the same way as conditional models.

3.2 Illustrative Examples of the Data Scale and the Model Scale

In linear models, inference begins with the estimable function Kβ. These models are defined in terms of the linear predictor η = g(μ) = Xβ (if there are no random effects) or η = Xβ + Zb (if there are random effects in the model), and Kβ produces results in terms of the link function.

For linear normal response models such as LMs and LMMs, the link function is not visible because they use the “identity” function as the link. Linear combinations of model parameters directly estimate the quantities of interest, such as differences between treatments, and support many other hypothesis tests. Inference for an LM is therefore straightforward.

For GLMs and GLMMs with a non-normal response, the estimation of Kβ yields a linear combination of elements of the linear predictor η, which is a linear combination of g(μ), typically a nonlinear function of μ. For example, with Poisson data, Kβ is on the logarithmic (log) scale, and with binomial data, it is on the logit or probit scale. However, most of the time, the researcher wants binomial results expressed as the probability of the outcome of interest and Poisson results expressed as counts. Since both GLMs and GLMMs carry out estimation on the model scale (determined by the link used), reporting the results on the data scale requires applying the inverse link to the linear predictor. For example, with the logit link for binomial data, the results are then expressed as probabilities; with the log link for Poisson data, they are expressed as counts. The following example illustrates the model scale and the data scale.

Example 3.1

Consider the following experiment in which three chemical seed coat softeners were tested to study their effect on the germination of tomato seeds in Styrofoam trays (Table 3.2).

Table 3.2 Percentage of germinated seeds (Y) out of total seeds (N)

To illustrate the two concepts above, we first analyze these data in a completely randomized design (CRD), assuming the response variable to be normal, and then we analyze the same experimental design with a binomial response variable. We are interested in comparing the treatment means. Note that, for demonstration purposes, we are assuming that Y has a normal distribution, when in fact it has a binomial distribution.

The components of this model are defined as follows:

  • Distribution: yij~N(μi, σ2)

  • Linear predictor: ηi = η + τi; (i = 1, 2, 3)

  • Link function: ηi = μi (identity link)

The analysis of variance (ANOVA) (part (a)) and estimated parameters (part (b)) of this experimental design indicate that there is a highly significant difference between the treatments (P = 0.0033) for the germination of tomato seeds. Table 3.3 shows part of the results.

Table 3.3 Results of the analysis of variance using a CRD

The estimated parameter values of the model (obtained with the “solution” option) are shown in the table above; because the model is over-parameterized, the solution for treatment 3 is set to zero. The estimable functions Kβ for the treatment means are as follows:

$$ {\boldsymbol{K}}^{\prime }=\left[\begin{array}{cccc}1& 1& 0& 0\\ {}1& 0& 1& 0\\ {}1& 0& 0& 1\end{array}\right];\kern0.5em \boldsymbol{\beta} =\left[\begin{array}{c}\eta \\ {}{\tau}_1\\ {}{\tau}_2\\ {}{\tau}_3\end{array}\right] $$

From the estimated parameters, the mean of each treatment (i = 1, 2, 3) is obtained as \( {\hat{\mu}}_i=\hat{\eta}+{\hat{\tau}}_i \): for treatment 1, \( {\hat{\mu}}_1=\hat{\eta}+{\hat{\tau}}_1=41.6667+7.3333=49 \); for treatment 2, \( {\hat{\mu}}_2=\hat{\eta}+{\hat{\tau}}_2=41.6667-18=23.6667 \); and for treatment 3, \( {\hat{\mu}}_3=\hat{\eta}+{\hat{\tau}}_3=41.6667+0=41.6667 \). The value of the mean squared error (\( {\hat{\sigma}}^2 \)), which appears in the table as “Scale,” is 29.5556.

For the differences between treatments, the values of τi − τi′ for i ≠ i′ are as follows:

\( {\hat{\tau}}_1-{\hat{\tau}}_2=\hat{\eta}+{\hat{\tau}}_1-\left(\hat{\eta}+{\hat{\tau}}_2\right)=7.3333-\left(-18\right)=25.3333 \), \( {\hat{\tau}}_1-{\hat{\tau}}_3=\hat{\eta}+{\hat{\tau}}_1-\left(\hat{\eta}+{\hat{\tau}}_3\right)=7.3333-0.0=7.3333 \), and \( {\hat{\tau}}_2-{\hat{\tau}}_3=\hat{\eta}+{\hat{\tau}}_2-\left(\hat{\eta}+{\hat{\tau}}_3\right)=-18.0-0.0=-18.0 \). These estimates can be obtained using the Statistical Analysis Software (SAS) “estimate” and “lsmeans” statements, as shown below:
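These computations can be reproduced with a short script (a verification sketch using the “solution” estimates quoted above, not part of the SAS analysis itself):

```python
# Intercept and treatment solutions from the CRD fit (trt 3 set to zero)
eta_hat = 41.6667
tau_hat = {1: 7.3333, 2: -18.0, 3: 0.0}

# Treatment means: mu_i = eta + tau_i
means = {i: eta_hat + tau_hat[i] for i in (1, 2, 3)}

# Pairwise differences: tau_i - tau_i' (the intercept cancels)
diff_12 = tau_hat[1] - tau_hat[2]
diff_13 = tau_hat[1] - tau_hat[3]
diff_23 = tau_hat[2] - tau_hat[3]

print(means, diff_12, diff_13, diff_23)
```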

proc glimmix data=germi;
 class trt;
 model y=trt / solution;
 lsmeans trt / diff e;
 estimate 'lsm trt 1' intercept 1 trt 1 0 0;
 estimate 'lsm trt 2' intercept 1 trt 0 1 0;
 estimate 'lsm trt 3' intercept 1 trt 0 0 1;
 estimate 'overall mean' intercept 1 trt 0.33333 0.33333 0.33333;
 estimate 'overall mean' intercept 3 trt 1 1 1 / divisor=3;
 estimate 'trt diff 1&2' trt 1 -1 0;
 estimate 'trt diff 2&3' trt 0 1 -1;
run;

The “estimate” statement requires us to specify what we wish to estimate: “intercept” refers to the intercept (η) and “trt” to the treatment effects (τi) under evaluation, with the coefficients needed for each estimate listed after them. The “lsmeans” statement asks GLIMMIX to estimate the treatment means, “diff” requests the differences between pairs of treatments, and “e” displays the coefficients of the estimable functions used by “lsmeans.” Some of the output of the above code is shown in Table 3.4.

Table 3.4 Results obtained using the “estimate” and “lsmeans” commands

Next, we analyze the same data, also under a CRD, but now assuming a binomial distribution for the response variable, where Nij denotes the number of independent Bernoulli trials in the ijth observation. The components of the model are as follows:

  • Distribution: yij~Binomial(Nij, πi)

  • Linear predictor: ηi = η + τi; (i = 1, 2, 3)

  • Link function: \( {\eta}_i=\mathrm{logit}\left({\pi}_i\right)=\log \left(\frac{\pi_i}{1-{\pi}_i}\right) \) (logit link)

Fitting a binomial model to these data yields the fixed effects solution shown, on the model scale, in Table 3.5.

Table 3.5 Estimated parameters at the model scale

The above results were obtained using the following SAS code:

proc glimmix data=germi;
 class trt;
 model y/n=trt / solution;
run;

Similar to the previous example, we can estimate the treatment means and the differences between pairs of treatments. The linear predictors for the treatments are \( {\hat{\eta}}_1=\hat{\eta}+{\hat{\tau}}_1=0.5108+0.5093=1.0201 \), \( {\hat{\eta}}_2=\hat{\eta}+{\hat{\tau}}_2=0.5108-1.1079=-0.5971 \), and \( {\hat{\eta}}_3=\hat{\eta}+{\hat{\tau}}_3=0.5108+0.0=0.5108 \); the differences between treatments (1 and 2, 1 and 3, and 2 and 3) are \( {\hat{\eta}}_1-{\hat{\eta}}_2=1.0201-\left(-0.5971\right)=1.6173 \), \( {\hat{\eta}}_1-{\hat{\eta}}_3=1.0201-0.5108=0.5093 \), and \( {\hat{\eta}}_2-{\hat{\eta}}_3=-0.5971-0.5108=-1.1079 \), respectively.

Using the relationship between the linear predictor and the link function, \( {\eta}_i=\mathrm{logit}\left({\pi}_i\right)=\log \left(\frac{\pi_i}{1-{\pi}_i}\right), \) we can estimate the probability of observing a favorable outcome under each of the treatments, that is, π1, π2, and π3. Applying the inverse link, we obtain:

$$ {\hat{\pi}}_1=1/\left(1+{e}^{-\left(\hat{\eta}+{\hat{\tau}}_1\right)}\right);{\hat{\pi}}_2=1/\left(1+{e}^{-\left(\hat{\eta}+{\hat{\tau}}_2\right)}\right);\mathrm{and}\ {\hat{\pi}}_3=1/\left(1+{e}^{-\left(\hat{\eta}+{\hat{\tau}}_3\right)}\right) $$

Substituting the corresponding values, we obtain

$$ {\hat{\pi}}_1=1/\left(1+{e}^{-\left(\hat{\eta}+{\hat{\tau}}_1\right)}\right)=1/\left(1+{e}^{-1.0201}\right)=0.735, $$
$$ {\hat{\pi}}_2=1/\left(1+{e}^{-\left(\hat{\eta}+{\hat{\tau}}_2\right)}\right)=1/\left(1+{e}^{-\left(-0.5971\right)}\right)=0.355,\mathrm{and} $$
$$ {\hat{\pi}}_3=1/\left(1+{e}^{-\left(\hat{\eta}+{\hat{\tau}}_3\right)}\right)=1/\left(1+{e}^{-(0.5108)}\right)=0.625 $$
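The same conversions can be checked numerically (a verification sketch using the model-scale estimates reported above):

```python
import math

def inv_logit(eta):
    # Inverse of the logit link: model scale (log odds) -> data scale (probability)
    return 1.0 / (1.0 + math.exp(-eta))

# Linear predictors eta + tau_i on the model scale, from the fitted binomial model
eta = {1: 1.0201, 2: -0.5971, 3: 0.5108}

# Germination probabilities on the data scale
pi = {i: round(inv_logit(e), 3) for i, e in eta.items()}

print(pi)
```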

Here, we can see that the treatment with the highest probability of success is treatment 1, followed by treatment 3, whereas treatment 2 has the lowest probability of success. Now, for the difference between two treatments, τi − τi′ for i ≠ i′, we can estimate the logarithm of the odds ratio as

$$ {\tau}_i-{\tau}_{i^{\prime }}=\log \left(\frac{\pi_i}{1-{\pi}_i}\right)-\log \left(\frac{\pi_{i^{\prime }}}{1-{\pi}_{i^{\prime }}}\right)=\log \left(\frac{\pi_i/\left(1-{\pi}_i\right)}{\pi_{i^{\prime }}/\left(1-{\pi}_{i^{\prime }}\right)}\right) $$

where, in this particular case, \( {\mathrm{odds}}_i=\frac{\pi_i}{1-{\pi}_i} \) is the odds of treatment i and \( \mathrm{odds}\ \mathrm{ratio}=\frac{\pi_i/\left(1-{\pi}_i\right)}{\pi_{i^{\prime }}/\left(1-{\pi}_{i^{\prime }}\right)} \) is the odds ratio for treatments i and i′, for i ≠ i′; thus, τi − τi′ is the log of the odds ratio.

Exponentiating the above expression (the log odds ratio), we get

$$ \mathrm{Odds}\ \mathrm{ratio}={e}^{\left({\hat{\tau}}_i-{\hat{\tau}}_{i^{\prime }}\right)} $$

The value of the odds ratio for treatments 1 and 3 is

$$ \mathrm{Odds}\ {\mathrm{ratio}}_{1-3}={e}^{\left({\hat{\tau}}_1-{\hat{\tau}}_3\right)}={e}^{0.5093}=1.664 $$

Similarly, for the pairs of treatments 1–2 and 2–3, the resulting odds ratios are \( \mathrm{Odds}\ {\mathrm{ratio}}_{1-2}={e}^{1.6173}=5.040 \) and \( \mathrm{Odds}\ {\mathrm{ratio}}_{2-3}={e}^{-1.1079}=0.330 \), respectively. It is important to mention that an odds ratio is not the difference of probabilities πi − πi′ for i ≠ i′.

From the previous example, it is clear that, when the response variable is not normal, parameter estimation and inference occur on two scales. The linear predictor and the estimable function Kβ are expressed in terms of the link function (here, the logit), giving estimates on the model scale (the scale of the link function). Under the logit link, the log odds and their differences (log odds ratios) are common and useful quantities in categorical data analysis for reporting treatments or treatment differences on the data scale.

Commonly, estimates on the model scale in GLMMs are not easy to interpret, so the data scale plays a very important role. Obtaining estimates on the data scale involves applying the inverse of the link function to the estimable function Kβ, as we did in the previous example to convert the log odds for each treatment into a probability. However, because link functions are generally nonlinear, the inverse link should not be applied directly to differences between treatments; doing so produces meaningless results.

Thus, in the logit model, we have two ways of handling a difference. First, we can apply the inverse link to the linear predictor of each treatment and then take the difference between probabilities, \( {\hat{\pi}}_i-{\hat{\pi}}_{i^{\prime }} \); that is, we estimate πi − πi′ through \( \left[1/\left(1+{e}^{-\left(\hat{\eta}+{\hat{\tau}}_i\right)}\right)\right]-\left[1/\left(1+{e}^{-\left(\hat{\eta}+{\hat{\tau}}_{i^{\prime }}\right)}\right)\right] \) and not as \( 1/\left(1+{e}^{-\left({\hat{\tau}}_i-{\hat{\tau}}_{i^{\prime }}\right)}\right) \). Second, since τi − τi′ estimates the logarithm of the odds ratio, exponentiating it, \( {e}^{\left({\hat{\tau}}_i-{\hat{\tau}}_{i^{\prime }}\right)} \), produces an estimate of the odds ratio. Both approaches are valid, and the choice between them depends on the requirements of the particular study.
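A short sketch contrasts the two approaches for treatments 1 and 3, using the estimates from this example:

```python
import math

def inv_logit(eta):
    # Inverse of the logit link: log odds -> probability
    return 1.0 / (1.0 + math.exp(-eta))

eta1, eta3 = 1.0201, 0.5108   # linear predictors for treatments 1 and 3
diff = eta1 - eta3            # tau1 - tau3 = 0.5093, a log odds ratio

# Approach 1: inverse-link each predictor, then take the difference of probabilities
prob_diff = inv_logit(eta1) - inv_logit(eta3)

# Approach 2: exponentiate the log odds ratio to obtain the odds ratio
odds_ratio = math.exp(diff)

print(round(prob_diff, 3), round(odds_ratio, 3))
```

Note that applying the inverse link directly to the difference, `inv_logit(diff)`, yields neither of these quantities, which is exactly the misuse warned against above.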

With the GLIMMIX procedure, we can obtain the solution on the data scale with the “ilink,” “exp,” and “oddsratio” options, as shown in the following SAS code:

proc glimmix;
 class trt;
 model y/n=trt / solution oddsratio;
 lsmeans trt / diff oddsratio ilink;
 estimate 'lsm trt 1' intercept 1 trt 1 0 0 / ilink;
 estimate 'lsm trt 2' intercept 1 trt 0 1 0 / ilink;
 estimate 'lsm trt 3' intercept 1 trt 0 0 1 / ilink;
 estimate 'overall mean' intercept 1 trt 0.33333 0.33333 0.33333 / ilink;
 estimate 'overall mean' intercept 3 trt 1 1 1 / divisor=3 ilink;
 estimate 'trt diff 1&2' trt 1 -1 0 / oddsratio ilink;
 estimate 'trt diff 1&3' trt 1 0 -1 / oddsratio ilink;
 estimate 'trt diff 2&3' trt 0 1 -1 / oddsratio ilink;
 estimate 'trt diff 1&2' trt 1 -1 0 / exp;
 estimate 'trt diff 1&3' trt 1 0 -1 / exp;
 estimate 'trt diff 2&3' trt 0 1 -1 / exp;
run;

Part of the output of “proc GLIMMIX” is shown in Table 3.6. The “Odds ratio estimates” (part (a)) are the result of the “oddsratio” command in the previous program, whereas the confidence intervals are provided by default.

Table 3.6 Results of the “ilink,” “exp,” and “oddsratio” commands

What appears under “Estimate” (in part (b)) is on the model scale, \( {\hat{\eta}}_i=\hat{\eta}+{\hat{\tau}}_i \), and what appears under “Mean” (in part (b)) is the inverse link of that estimate, \( {\hat{\pi}}_i=1/\left(1+{e}^{-\left(\hat{\eta}+{\hat{\tau}}_i\right)}\right) \), which in this case is a probability and corresponds to the data scale. Similarly, for the differences, what appears under “Estimate” is on the model scale, \( {\hat{\tau}}_i-{\hat{\tau}}_{i^{\prime }} \), whereas the “Odds ratio” values were estimated using \( {e}^{\left({\hat{\tau}}_i-{\hat{\tau}}_{i^{\prime }}\right)} \) and are on the data scale.

In Table 3.7, the log odds ratio under the “Estimates” column appears exponentiated as an “Exponentiated estimate,” regardless of whether we use the “oddsratio” or “exp” option in the “estimate” statement. For the overall mean, the inverse link applied to \( \hat{\eta}+\frac{1}{3}\left({\hat{\tau}}_1+{\hat{\tau}}_2+{\hat{\tau}}_3\right) \) is 0.5772, which is different from the average of the \( {\hat{\pi}}_i \) values, \( \frac{1}{3}\left({\hat{\pi}}_1+{\hat{\pi}}_2+{\hat{\pi}}_3\right)=\frac{1}{3}\left(0.735+0.355+0.625\right)=0.5717 \). This illustrates that we have to be careful when using the output of proc GLIMMIX: it can produce output on both the model scale and the data scale through the inverse link, but the inverse link has to be applied appropriately; otherwise, we will get meaningless results.
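This caveat is easy to reproduce (a verification sketch; the discrepancy arises because the inverse link is nonlinear, so the inverse link of an average is not the average of the inverse links):

```python
import math

def inv_logit(eta):
    # Inverse of the logit link
    return 1.0 / (1.0 + math.exp(-eta))

# Linear predictors for the three treatments (model scale)
eta = [1.0201, -0.5971, 0.5108]

# Inverse link applied to the average linear predictor
p_of_mean = inv_logit(sum(eta) / 3)

# Average of the treatment-wise probabilities
mean_of_p = sum(inv_logit(e) for e in eta) / 3

print(round(p_of_mean, 4), round(mean_of_p, 4))
```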

Table 3.7 Estimates at the model scale and at the data scale

Example 3.2: Randomized complete block design (RCBD) with normal and binomial responses

Now, assume that we have the same example but in an RCBD. The three treatments were tested in each of the blocks, as shown in Table 3.8.

Table 3.8 Percentage of germinated seeds (Y) out of total seeds (N) in a randomized complete block design

In this example, first, the data are analyzed assuming a normal response and assuming that the block effect is fixed; then, they are analyzed assuming a binomial response.

The model components under a Gaussian response variable are as follows:

  • Distribution: yij ~ N(μij, σ2)

  • Linear predictor: ηij = η + τi + blockj; (i, j = 1, 2, 3)

  • Link function: ηij = μij; (identity link)

From the theory of linear models, we know that we can estimate the ith treatment mean through

$$ {\overline{\eta}}_{i\bullet }=\frac{1}{3}\sum \limits_{j=1}^3{\eta}_{ij}=\eta +{\tau}_i+\frac{1}{3}\sum \limits_{j=1}^3{\mathrm{block}}_j=\eta +{\tau}_i+{\overline{\mathrm{block}}}_{\bullet } $$

where \( {\overline{\mathrm{block}}}_{\bullet }=\frac{1}{3}\sum \limits_{j=1}^3{\mathrm{block}}_j. \)

The difference between the means of two treatments i and i′ is estimated as

$$ {\overline{\eta}}_{i\bullet }-{\overline{\eta}}_{i^{\prime}\bullet }=\eta +{\tau}_i+{\overline{\mathrm{block}}}_{\bullet }-\left(\eta +{\tau}_{i^{\prime }}+{\overline{\mathrm{block}}}_{\bullet}\right)={\tau}_i-{\tau}_{i^{\prime }} $$

The goal of this experiment could be to compare the treatment means, that is, to test \( {\overline{\eta}}_{1\bullet }={\overline{\eta}}_{2\bullet }={\overline{\eta}}_{3\bullet } \) (equivalently, τ1 = τ2 = τ3), or to compare one treatment with the average of the other treatments: for example, treatment 1 with the average of treatments 2 and 3 (Trt1.vs.average.Trt2.and.Trt3).

For the hypothesis test of the equality of treatments (τ1 = τ2 = τ3), the estimable function Kβ is given by:

$$ {\boldsymbol{K}}^{\prime }=\left[\begin{array}{ccccccc}0& 1& 0& -1& 0& 0& 0\\ {}0& 0& 1& -1& 0& 0& 0\end{array}\right];\kern0.5em \boldsymbol{\beta} =\left[\begin{array}{c}\eta \\ {}{\tau}_1\\ {}{\tau}_2\\ {}{\tau}_3\\ {}{\mathrm{block}}_1\\ {}{\mathrm{block}}_2\\ {}{\mathrm{block}}_3\end{array}\right] $$

For the contrasts Trt1.vs.average.Trt2.and.Trt3 and Trt2.vs.average.Trt1.and.Trt3, Kβ is given by

$$ \mathrm{Trt}1.\mathrm{vs}.\mathrm{average}.\mathrm{Trt}2.\mathrm{and}.\mathrm{Trt}3={\overline{\eta}}_{1\bullet }-\frac{1}{2}\left({\overline{\eta}}_{2\bullet }+{\overline{\eta}}_{3\bullet}\right)={\tau}_1-\left(\frac{\tau_2+{\tau}_3}{2}\right) $$
$$ \mathrm{Trt}2.\mathrm{vs}.\mathrm{average}.\mathrm{Trt}1.\mathrm{and}.\mathrm{Trt}3={\overline{\eta}}_{2\bullet }-\frac{1}{2}\left({\overline{\eta}}_{1\bullet }+{\overline{\eta}}_{3\bullet}\right)={\tau}_2-\left(\frac{\tau_1+{\tau}_3}{2}\right) $$
$$ {\boldsymbol{K}}^{\prime }=\left[\begin{array}{ccccccc}0& 2& -1& -1& 0& 0& 0\\ {}0& -1& 2& -1& 0& 0& 0\end{array}\right];\kern0.5em \boldsymbol{\beta} =\left[\begin{array}{c}\eta \\ {}{\tau}_1\\ {}{\tau}_2\\ {}{\tau}_3\\ {}{\mathrm{block}}_1\\ {}{\mathrm{block}}_2\\ {}{\mathrm{block}}_3\end{array}\right] $$
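Forming such contrasts amounts to simple matrix arithmetic. The sketch below applies doubled contrast coefficients (2, −1, −1 and −1, 2, −1 on the treatment effects, halved afterward) to an invented solution vector β̂; the actual RCBD estimates appear only in the tables, so these numbers are hypothetical:

```python
# Rows of K' (coefficients on eta, tau1..tau3, block1..block3),
# each row being twice the contrast of interest
K_t = [
    [0,  2, -1, -1, 0, 0, 0],   # 2*tau1 - tau2 - tau3
    [0, -1,  2, -1, 0, 0, 0],   # 2*tau2 - tau1 - tau3
]

# Hypothetical parameter solutions (invented for illustration)
beta_hat = [40.0, 6.0, -12.0, 0.0, 1.5, -0.5, 0.0]

# K'beta / 2: divide by 2 so each contrast compares one treatment
# with the average of the other two
contrasts = [sum(k * b for k, b in zip(row, beta_hat)) / 2 for row in K_t]

print(contrasts)
```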

The following GLIMMIX procedure allows us to implement the above example.

proc glimmix;
 class trt block;
 model y = trt block / solution;
 lsmeans trt / diff e;
 estimate 'lsm trt1' intercept 3 trt 3 0 0 block 1 1 1 / divisor=3;
 estimate 'overall mean' intercept 3 trt 1 1 1 block 1 1 1 / divisor=3;
 estimate 'average trt1&trt2' intercept 6 trt 3 3 0 block 2 2 2 / divisor=6;
 estimate 'average trt1&trt2&trt3' intercept 9 trt 3 3 3 block 3 3 3 / divisor=9;
 estimate 'trt1 vs trt2' trt 1 -1 0;
 estimate 'trt1 vs trt3' trt 1 0 -1;
 estimate 'trt2 vs trt3' trt 0 1 -1;
 estimate 'trt1 vs trt2' trt 1 -1 0,
          'trt1 vs trt3' trt 1 0 -1,
          'trt2 vs trt3' trt 0 1 -1 / adjust=sidak;
 contrast 'trt1 vs trt2' trt 1 -1 0;
 contrast 'trt1 vs trt3' trt 1 0 -1;
 contrast 'trt2 vs trt3' trt 0 1 -1;
 contrast 'trt1 vs average trt2,trt3' trt 2 -1 -1;
 contrast 'trt2 vs average trt1,trt3' trt -1 2 -1;
 contrast 'type 3 trt ss' trt 1 0 -1, trt 0 1 -1;
 contrast 'type 3 trt test' trt 2 -1 -1, trt -1 2 -1;
run;

Part of the GLIMMIX output is shown in Table 3.9. Parameter estimates are shown for treatments 1 and 2 and for blocks 1 and 2, but not for treatment 3 and block 3, because the model is not of full rank. A generalized inverse, computed through the SWEEP operator of SAS, is used in the estimation; in this case, it sets the last effect of each class equal to zero (Table 3.9).

Table 3.9 Estimation of treatment and block parameters

“Coefficients” (part (a) of Table 3.10), obtained with the “e” option for the least squares means of treatments in “lsmeans,” shows how SAS combines the parameter solution to calculate the treatment means (part (b)). In part (c), we can see the differences between means obtained with the “diff” option in “lsmeans.”

Table 3.10 Coefficients for treatment and block used in least squares

The estimates obtained from the “estimate” statement with multiple estimable functions under the Sidak adjustment, together with the contrasts, are shown in Table 3.11. This adjustment allows us to control the type I error rate. The “adjust=sidak” option in “estimate” (part (b)) provides the adjusted P-values, denoted AdjP, in addition to Pr > |t|.

Table 3.11 Multiple estimates and contrasts

The planned contrasts in matrix K′ and the F-values obtained with the “contrast” statement produce the same results (part (c)).

Now, the same dataset is fitted using the same predictor but assuming that the response variable is binomial. This analysis is intended to show the options available in the SAS statements for fitting non-normal responses, in this case, binomial. Practically the same statements used in the previous program for normal data are used, but now the additional options (“ilink,” “oddsratio,” or “exp”) are illustrated, with details of the circumstances under which each should be used. This matters because all estimable functions produce estimates on the model scale, and we must decide which conversions are needed to obtain results on the data scale. Below, the estimable functions and the appropriate conversions required to produce results on the data scale are listed:

  (a) Least squares means (“lsmeans”): no conversion for normal data; inverse link (“ilink”) for non-normal data

  (b) Differences between pairs of treatment means from “lsmeans”: no conversion for normal data; “oddsratio” for non-normal data

  (c) Estimation of a treatment mean (“estimate”): no conversion for normal data; inverse link (“ilink”) for non-normal data

  (d) Estimation of the difference between treatments i and i′: exponentiation (“exp”) (or odds ratio)

  (e) Multiple estimates of treatment differences: “exp” (or odds ratio) for non-normal data

  (f) Contrasts: conversion to the data scale is not necessary, since a contrast is only an F-test

The following GLIMMIX program shows how to implement this model with a binomial response.

proc glimmix;
 class trt block;
 model y/n = trt block / solution oddsratio;
 lsmeans trt / diff e oddsratio;
 estimate 'lsm trt1' intercept 3 trt 3 0 0 block 1 1 1 / divisor=3 ilink;
 estimate 'difference trt1 vs trt2' trt 1 -1 0 / exp;
 estimate 'avg trt1&trt2&trt3' intercept 9 trt 3 3 3 block 3 3 3 / divisor=9;
 estimate 'trt1 vs trt2' trt 1 -1 0 / exp;
 estimate 'trt1 vs trt3' trt 1 0 -1 / exp;
 estimate 'trt2 vs trt3' trt 0 1 -1 / exp;
 estimate 'trt1 vs trt2' trt 1 -1 0,
          'trt1 vs trt3' trt 1 0 -1,
          'trt2 vs trt3' trt 0 1 -1 / exp adjust=sidak;
 contrast 'trt1 vs trt2' trt 1 -1 0;
 contrast 'trt1 vs trt3' trt 1 0 -1;
 contrast 'trt2 vs trt3' trt 0 1 -1;
 contrast 'trt1 vs average trt2,trt3' trt 2 -1 -1;
 contrast 'trt2 vs average trt1,trt3' trt -1 2 -1;
 contrast 'type 3 trt ss' trt 1 0 -1, trt 0 1 -1;
 contrast 'type 3 trt test' trt 2 -1 -1, trt -1 2 -1;
run;

Part of the output is shown in Table 3.12. The estimated treatment and block parameters of the model are given in part (a) of Table 3.12; the last effect of each class was set to zero because the design matrices are not of full rank. Part (b) shows the type III tests of fixed effects, and part (c) the odds ratio estimates. Note that \( {\hat{\sigma}}^2 \) does not appear in the output because the variance of the binomial distribution is not an independent parameter.

Table 3.12 Results of the analysis of variance in a binomial model

Table 3.12 (parts (b) and (c)), which shows the type III tests of fixed effects as well as the odds ratios, indicates that only the effect of treatments is significant, not the effect of blocks, suggesting that it would also have been valid to analyze these data under a completely randomized design. Two sets of odds ratios were estimated (part (c)): one for the treatment effects and the other for the block effects in the model. In the calculation of odds ratios, by default, the last level of a factor is compared with each of the other levels of that same factor.

The estimates obtained with “estimate” in Table 3.13 (parts (a) and (b)) are on the model scale, whereas the last column is obtained by exponentiating the difference \( \left({e}^{{\tau}_i-{\tau}_{i^{\prime }}}\right) \) to give an odds ratio.
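The passage back and forth between the model (logit) scale and the data scale can be sketched in a few lines of Python. The linear predictor values below are hypothetical, not the ones in Table 3.13; the point is that a difference on the logit scale corresponds to an odds ratio on the data scale:

```python
import math

def inv_logit(eta):
    """Inverse link: logit (model) scale -> probability (data) scale."""
    return 1.0 / (1.0 + math.exp(-eta))

eta1, eta2 = 0.5, -0.3               # hypothetical linear predictors for trt1, trt2
pi1, pi2 = inv_logit(eta1), inv_logit(eta2)

# Exponentiating the logit-scale difference gives the odds ratio directly ...
odds_ratio = math.exp(eta1 - eta2)
# ... which equals the ratio of the odds computed from the two probabilities:
check = (pi1 / (1 - pi1)) / (pi2 / (1 - pi2))
print(round(odds_ratio, 4), round(check, 4))
```

This is exactly what the "exp" option on the "estimate" statements does with the estimated treatment differences.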

Table 3.13 Different estimates obtained with “estimate”

The least squares means for treatments and the linear predictors of the treatment differences (parts (a) and (b) of Table 3.14, respectively), obtained with “lsmeans,” are the values in the “Estimate” column; these, together with their corresponding standard errors, were computed from the linear predictor \( {\hat{\eta}}_{i}=\hat{\eta}+{\hat{\tau}}_i+{\hat{\overline{\mathrm{block}}}}_{\bullet } \).

Table 3.14 Estimated linear predictors for treatments and treatment differences with their respective inverse values

These estimates are on the model scale, whereas the values in the “Mean” column and their respective standard errors were obtained by applying the inverse link to give the probability of success for each treatment (\( {\hat{\pi}}_i \)). The estimated linear predictors for the mean differences were obtained with the “oddsratio” option; applying the inverse link to these predictors gives the mean differences on the data scale.

3.3 Fixed and Random Effects in the Inference Space

In an analysis, inference can be directed solely at the fixed effects (population inference) or at a combination of fixed and random effects (specific inference). To illustrate these two levels of inference, we consider two examples:

3.3.1 A Broad Inference Space or a Population Inference

In practice, the random effects in a linear mixed model (LMM) should represent the population from which the data were collected, as if they had been drawn from a well-planned sample. In a model, random effects can be locations, regions, states, blocks, and so on, and they have two very particular characteristics.

  • Random effects represent the target population.

  • Random effects have a probability distribution.

These two characteristics give us a broad inference space in which we can calculate point estimates, interval estimates, and hypothesis tests applicable to the entire population represented by the random effects. Formally, an estimate or hypothesis test based on an LMM lies in the broad inference space if it is defined by an estimable function Kβ, that is, if the coefficients on the random effects (the Z part) are all zero; otherwise, it is defined by the predictable function Kβ + Zb, which yields a specific inference.
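The distinction can be sketched with a toy numerical example (all values invented): a broad-space estimate uses only the fixed effects solutions through K′β̂, whereas a specific (narrow-space) prediction also carries coefficients on the random-effect solutions:

```python
import numpy as np

# Hypothetical solutions: intercept + 3 treatment effects, and 3 block BLUPs
beta_hat = np.array([10.0, 2.0, -1.0, 0.0])   # eta, tau1, tau2, tau3
b_hat    = np.array([0.8, -0.3, -0.5])        # block solutions (sum to zero)

# Broad inference: estimable function K'beta, zero coefficients on random effects
K = np.array([1.0, 1.0, 0.0, 0.0])            # mean of treatment 1
broad = K @ beta_hat

# Specific inference: predictable function K'beta + M'b, here a prediction
# of the treatment 1 mean in block 1 specifically
M = np.array([1.0, 0.0, 0.0])
narrow = K @ beta_hat + M @ b_hat
print(broad, round(narrow, 4))
```

The broad estimate applies to the whole population of blocks; the narrow prediction applies only to the particular block whose solution enters through M.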

3.3.2 Mixed Models with a Normal Response

In Example 3.2, the response variable was modeled as a function of fixed effects due to treatments and blocks, since the block effects were also assumed to be fixed. Now, suppose that the treatments were applied by three different people (blocks); assuming fixed block effects would then be questionable, since each person works according to his or her experience, skill, and so forth. Clearly, there is some variability between blocks that is not due to the experiment, and it has to be removed, so the block effects must be considered random. In this example, let us assume that the three blocks (persons) were randomly selected from a population. Thus, the components of the model are defined as follows:

  • Distribution:

    1. (a)

      yij∣ blockj ~ N(μij, σ2)

    2. (b)

      \( {\mathrm{block}}_j\sim N\left(0,{\sigma}_{\mathrm{block}}^2\right) \)

  • Linear predictor: ηij = η + τi + blockj (i, j = 1, 2, 3)

  • Link function: ηij = μij (identity link)

Note the impact of this change on the estimable function for the treatment means. In Example 3.2, the estimable function was \( E\left({\overline{y}}_{i\bullet}\right)=\eta +{\tau}_i+{\overline{\mathrm{block}}}_{\bullet } \). With the mixed model (fixed treatment effects and random blocks), the estimable function becomes \( E\left({\overline{y}}_{i\bullet}\right)=\eta +{\tau}_i \) because E(blockj) = 0. Therefore, the estimable function for the mean of each treatment is η + τi. In this situation, two questions arise:

  • How much do the results obtained from a fixed effects model differ from those obtained from a mixed model?

  • How can we compare the two results?

The following program allows us to estimate a mixed model with a normal response.

proc glimmix;
  class trt block;
  model y = trt / solution;
  random block / solution;
  lsmeans trt / diff e;
  estimate 'lsm trt1' intercept 1 trt 1 0 0 | block 0 0 0;
  estimate 'lsm trt2' intercept 1 trt 0 1 0 | block 0 0 0;
  estimate 'lsm trt3' intercept 1 trt 0 0 1 | block 0 0 0;
  estimate 'blup trt1' intercept 3 trt 3 0 0 | block 1 1 1 / divisor=3;
  estimate 'blup trt2' intercept 3 trt 0 3 0 | block 1 1 1 / divisor=3;
  estimate 'blup trt3' intercept 3 trt 0 0 3 | block 1 1 1 / divisor=3;
run;

In the previous SAS GLIMMIX code, each “estimate” statement lists the coefficients for the fixed effects before the vertical bar (|) and the coefficients for the random effects after it, that is:

$$ {\boldsymbol{K}}^{\prime}\boldsymbol{\beta} +{\boldsymbol{Z}}^{\prime}\boldsymbol{b}=\overset{\mathrm{fixed\ effects}}{\overbrace{\left[\begin{array}{cccc}1& 1& 0& 0\\ {}1& 0& 1& 0\\ {}1& 0& 0& 1\end{array}\right]}}\left(\begin{array}{c}\eta \\ {}{\tau}_1\\ {}{\tau}_2\\ {}{\tau}_3\end{array}\right)+\overset{\mathrm{random\ effects}}{\overbrace{\left[\begin{array}{ccc}1& 1& 1\\ {}1& 1& 1\\ {}1& 1& 1\end{array}\right]}}\left(\begin{array}{c}{\mathrm{block}}_1\\ {}{\mathrm{block}}_2\\ {}{\mathrm{block}}_3\end{array}\right) $$
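The arithmetic behind an "estimate" statement such as 'blup trt1' can be checked numerically. Dividing intercept 3 trt 3 0 0 | block 1 1 1 by the divisor of 3 gives coefficient 1 on η and τ1 and 1/3 on each block solution. The parameter values below are hypothetical, used only to verify the mechanics:

```python
import numpy as np

beta_hat = np.array([15.0, 4.0, 1.0, 0.0])    # hypothetical eta, tau1, tau2, tau3
b_hat    = np.array([1.2, -0.7, -0.5])        # hypothetical block solutions

# Coefficients from: estimate 'blup trt1' intercept 3 trt 3 0 0 | block 1 1 1 / divisor=3;
k = np.array([3.0, 3.0, 0.0, 0.0]) / 3.0      # fixed effects part
m = np.array([1.0, 1.0, 1.0]) / 3.0           # random effects part

blup_trt1 = k @ beta_hat + m @ b_hat          # K'beta + M'b for treatment 1
print(blup_trt1)
```

Because the block solutions sum to zero here, the BLUP coincides with the broad-space mean η̂ + τ̂1, which anticipates the comparison in Table 3.16.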

Part of the output is shown in Table 3.15. Part (a) shows the estimated variance components: the variance due to blocks is \( {\hat{\sigma}}_{\mathrm{block}}^2=11.2778 \), whereas the mean squared error (MSE) is \( {\hat{\sigma}}^2=18.2778 \). The fixed effects solutions, obtained with the “solution” option, are provided in part (b). The analysis of variance (part (c)) indicates a significant difference between treatments (P = 0.0045), so the null hypothesis H0 : μ1 = μ2 = μ3 must be rejected.

Table 3.15 Variance components, fixed effects, and fixed effects test

Because the estimated mean block effect is zero, the means (BLUEs) and the best linear unbiased predictors (BLUPs) obtained for each treatment with the “estimate” statement are similar, as shown in Table 3.16 (part (a)). Parts (b) and (c) show the means and the differences between pairs of means estimated with the “lsmeans” statement and the “diff” option.

Table 3.16 Estimated means, best linear unbiased estimates (BLUEs), and BLUPs for treatment and the difference between two means

3.4 Marginal and Conditional Models

The process of analyzing a dataset has two main objectives: the first is model selection, which aims to find well-fitting parsimonious models for the responses being measured, and the second is model prediction, where estimates from the selected models are used to predict quantities of interest and their uncertainties.

The differences that may arise in this analysis process are mainly due to the choice of unidentifiable constraints on the random effects. To compare two different models, we must compare analogous quantities: different constraints can lead to apparently very different but inferentially identical models. The conditional model is regarded as the basic model, and any conditional model leads to a specific marginal model. Lee and Nelder (2004) proposed and worked with conditional models derived from hierarchical generalized linear models (HGLMs) and marginal models derived from these conditional models. Marginal models have often been fitted using generalized estimating equations (GEEs), whose drawbacks are also discussed there.

3.4.1 Marginal Versus Conditional Models

Consider two models with a normal distribution: one is a random effects model (a mixed model)

$$ {y}_{ij}=\mu +{\tau}_i+{b}_j+{\varepsilon}_{ij} $$

where \( {b}_j\sim N\left(0,{\sigma}_b^2\right) \) and εij~N(0, σ2). The other is a marginal model

$$ E\left({y}_{ij}\right)=\mu +{\tau}_i $$

where the elements in V(y) = Σ are variances and covariances that have an arbitrary correlation structure. Zeger et al. (1988) pointed out that given a marginal model, the generalized estimating equations are consistent. An obvious advantage of using random effects models is that they allow conditional inferences in addition to marginal inferences (Robinson 1991). Using the model with random effects, we can obtain not only the conditional mean

$$ {\mu}_{ij}^C=E\left({y}_{ij}|{b}_j\right)=\mu +{\tau}_i+{b}_j $$

but also the marginal mean

$$ {\mu}_{ij}=E\left({\mu}_{ij}^C\right)=E\left[E\left({y}_{ij}|{b}_j\right)\right]=\mu +{\tau}_i $$

whereas with the marginal model, we can obtain only the marginal mean μij.
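With an identity link, averaging the conditional mean over the distribution of the random effect recovers the marginal mean exactly. A small Monte Carlo sketch makes this concrete (the parameter values are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau_i = 20.0, 3.0                 # hypothetical fixed effects
sigma_b = 2.0                         # standard deviation of the block effects

b = rng.normal(0.0, sigma_b, size=100_000)   # simulated block effects b_j
conditional_means = mu + tau_i + b           # mu_ij^C = mu + tau_i + b_j

# Averaging over the b_j approximates E(mu_ij^C) = mu + tau_i
marginal_mean = conditional_means.mean()
print(round(marginal_mean, 1))
```

Note that this exact equality is special to the identity link: with a nonlinear link such as the logit, the average of the inverse-linked conditional means does not equal the inverse link applied to μ + τi, which is why conditional and marginal GLMMs answer different questions.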

It may be reasonable to assume that the unobservable block effects (bj) follow a certain distribution. However, the center of this distribution cannot be identified because it is confounded with the intercept. Therefore, in the random effects model, we impose the unidentifiable constraints E(bj) = 0 and E(εij) = 0, as we do for the error terms in linear models. In the fitted mixed model, these restrictions become \( {\sum}_j{\hat{b}}_j=0 \) and \( {\sum}_j{\hat{\varepsilon}}_{ij}=0 \) in any estimation procedure. First, we consider the case in which the data follow a normal distribution; we then briefly discuss how the results differ for data with a non-normal distribution.

3.4.2 Normal Distribution

Example

The effect of different substrates (factor A), i.e., three substrates made from vermicompost and one from compost, on the development of physiological variables and on the mortality of cuttings of three clones (factor B) of robusta coffee (Coffea canephora) was evaluated. The levels of factor A were randomly assigned to rows within each block, with the restriction that each block receive levels A1, A2, A3, and A4, and each level of factor B (B1, B2, and B3) was randomly assigned within each level of factor A in each block. The data for this experiment are tabulated in Table 3.17.

Table 3.17 Mortality of coffee seedling clones (C) in different substrates (S)

Note that although there are two randomization processes, there are effectively three sizes of experimental units: rows for the levels of A, columns for the levels of B, and row–column intersections for the A × B combinations. Thus, the experimental design used was a randomized complete block design with a strip-plot treatment arrangement.

The model, for these data, is given below:

$$ {y}_{ij k}=\mu +{b}_k+{\alpha}_i+{\left(\alpha b\right)}_{ik}+{\beta}_j+{\left(\beta b\right)}_{jk}+{\left(\alpha \beta \right)}_{ij}+{\varepsilon}_{ij k} $$

where yijk is the kth response observed at the ith level of factor A and at the jth level of factor B, μ is the overall mean, bk is the random effect due to blocks assuming \( {b}_k\sim N\left(0,{\sigma}_b^2\right) \), αi is the fixed effect due to substrate type (S), (αb)ik is the random effect due to the interaction of a substrate with blocks assuming \( {\left(\alpha b\right)}_{ik}\sim N\left(0,{\sigma}_{\alpha b}^2\right) \), βj is the fixed effect due to the coffee clone type (C), (βb)jk is the random effect due to the interaction of a coffee clone with blocks assuming \( {\left(\beta b\right)}_{jk}\sim N\left(0,{\sigma}_{\beta b}^2\right) \), (αβ)ij is the interaction fixed effect between a substrate and a coffee clone, and εijk is the normal random error εijk~N(0, σ2). The components of the model for this dataset are as follows:

  • Linear predictor: ηijk = μ + bk + αi + (αb)ik + βj + (βb)jk + (αβ)ij

  • Distributions: yijk ∣ bk, (αb)ik, (βb)jk~N(μijk, σ2)

    • \( {b}_k\sim N\left(0,{\sigma}_b^2\right) \); \( {\left(\alpha b\right)}_{ik}\sim N\left(0,{\sigma}_{\alpha b}^2\right) \); \( {\left(\beta b\right)}_{jk}\sim N\left(0,{\sigma}_{\beta b}^2\right) \)

  • Link function: ηijk = μijk
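Data consistent with this strip-plot model can be simulated directly from the linear predictor, which is a useful check on one's understanding of where each random term enters. All standard deviations and fixed effects below are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n_a, n_b, n_blocks = 4, 3, 4          # substrate levels, clone levels, blocks

# Hypothetical standard deviations for the random terms and the error
sd_block, sd_ab, sd_bb, sd_e = 5.0, 6.0, 8.0, 12.0
mu = 50.0
alpha = np.array([0.0, 2.0, -1.0, 3.0])      # substrate (A) fixed effects
beta  = np.array([0.0, 1.5, -2.0])           # clone (B) fixed effects

b_k  = rng.normal(0, sd_block, n_blocks)             # block effects b_k
ab   = rng.normal(0, sd_ab, (n_a, n_blocks))         # substrate x block (alpha b)_ik
bb   = rng.normal(0, sd_bb, (n_b, n_blocks))         # clone x block (beta b)_jk

y = np.empty((n_a, n_b, n_blocks))
for i in range(n_a):
    for j in range(n_b):
        for k in range(n_blocks):
            eps = rng.normal(0, sd_e)                # residual error eps_ijk
            y[i, j, k] = (mu + b_k[k] + alpha[i] + ab[i, k]
                          + beta[j] + bb[j, k] + eps)

print(y.shape)
```

Each row (level of A) shares one draw of (αb)ik per block and each column (level of B) shares one draw of (βb)jk per block, which is exactly what makes the rows and columns larger experimental units than the cells.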

The following GLIMMIX syntax fits a GLMM with a normal response.

proc glimmix;
  class block s c;
  model y = s|c;
  random intercept s c / subject=block;
  lsmeans s*c / slicediff=s;
run;

Part of the results of this analysis is shown below. The estimated variance components for blocks, block × substrate, and block × clone and the MSE are \( {\hat{\sigma}}_b^2=23.4714 \), \( {\hat{\sigma}}_{\alpha b}^2=35.4995 \), \( {\hat{\sigma}}_{\beta b}^2=67.0160 \), and \( {\hat{\sigma}}^2=\mathrm{MSE}=139.58 \), respectively, as listed in part (a) of Table 3.18. The fixed effects tests for both factors and their interaction (part (b)) are not statistically significant.

Table 3.18 Estimated variance components and type III tests of fixed effects

According to the “slicediff = s” option in the “lsmeans” statement, Table 3.19 shows the simple effects of each substrate level at varying clone levels.

Table 3.19 Simple effect comparisons across substrate levels

3.4.3 Non-normal Distribution

Example

Using the data in Table 3.17 but under a beta distribution, the components of the GLMM change slightly:

  • Distributions: yijk ∣ bk, (αb)ik, (βb)jk~Beta(μijk, ϕ), where ϕ is the scale parameter.

    • \( {b}_k\sim N\left(0,{\sigma}_b^2\right) \); \( {\left(\alpha b\right)}_{ik}\sim N\left(0,{\sigma}_{\alpha b}^2\right) \); \( {\left(\beta b\right)}_{jk}\sim N\left(0,{\sigma}_{\beta b}^2\right) \)

  • Linear predictor: ηijk = μ + bk + αi + (αb)ik + βj + (βb)jk + (αβ)ij

  • Link function: ηijk = logit(μijk)
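GLIMMIX parameterizes the beta distribution by its mean μ and scale φ; in terms of the standard shape parameters, this corresponds to a = μφ and b = (1 − μ)φ, with variance μ(1 − μ)/(1 + φ). A short Python sketch of how a logit-scale linear predictor maps to these quantities (the η value is hypothetical; φ is taken close to the estimate reported below in Table 3.20):

```python
import math

def inv_logit(eta):
    """Logit link inverse: model scale -> mean proportion."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_shapes(mu, phi):
    """Mean/scale parameterization -> standard shape parameters (a, b)."""
    return mu * phi, (1.0 - mu) * phi

eta = -1.2                      # hypothetical linear predictor on the logit scale
phi = 16.6                      # scale parameter, roughly the estimate in Table 3.20
mu = inv_logit(eta)             # mean mortality proportion on the data scale
a, b = beta_shapes(mu, phi)

var = mu * (1.0 - mu) / (1.0 + phi)   # variance implied by mean and scale
print(round(mu, 4), round(a + b, 1))
```

Note that a + b = φ, so a larger scale parameter means a more concentrated distribution around the mean.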

The following GLIMMIX syntax fits a GLMM with a beta response.

proc glimmix method=laplace;
  class block s c;
  model pct = s|c / dist=beta;
  random intercept s c / subject=block;
  lsmeans s*c / plot=meanplot(sliceby=s join) slicediff=s ilink;
run;

Some of the SAS output from this analysis is shown below. The estimated variance components for blocks, block × substrate, and block × clone are \( {\hat{\sigma}}_b^2=0.06723 \), \( {\hat{\sigma}}_{\alpha b}^2=0.1594 \), and \( {\hat{\sigma}}_{\beta b}^2=0.1932 \), with scale parameter \( \hat{\phi}=16.6041 \), as listed in part (a) of Table 3.20. The fixed effects tests for both factors and their interaction (part (b)) are again not statistically significant. Compared with the normal fit in the previous example, the variance components under the beta distribution are smaller even when multiplied by 100 (to match the percentage scale), and the type III fixed effects tests are closer to being significant.

Table 3.20 Variance components and the fixed effects test

Table 3.21 shows, for each substrate level across clone levels, the estimated simple effects (linear predictors). These differ from the previous results, mainly because in a GLMM these values are linear predictors on the model scale rather than estimated means on the data scale (Example 3.4.2). It is also important to note that the degrees-of-freedom correction used in the estimation of means is not yet available when fitting a GLMM.

Table 3.21 Simple effect comparisons across substrate levels

3.5 Exercises

Exercise 3.5.1

The data in Table 3.22 show the yield of five barley varieties in a randomized complete block experiment conducted in Minnesota (Immer et al. 1934).

Table 3.22 Total yields (grams) of barley varieties in 12 independent trials
  1. (a)

    Write a complete description of the statistical model associated with this study and the assumptions of this model.

  2. (b)

    Compute the ANOVA for the design model in part (a) and determine whether there is a significant difference among the varieties.

  3. (c)

    Use the least significant difference (LSD) method to make pairwise comparisons of the variety mean yields.

Exercise 3.5.2

Lew (2007) conducted an experiment to determine whether cultured cells respond to two drugs (chemical formulations). The experiment was conducted using a cell culture line placed in Petri dishes. Each experimental trial consisted of three Petri dishes: one treated with drug 1, one treated with drug 2, and one untreated as a control. The data are shown in the following Table 3.23:

Table 3.23 Number of cells cultured in different drugs
  1. (a)

    Write a complete description of the statistical model associated with this study and the assumptions of this model.

  2. (b)

    Analyze the data using a completely randomized design. Is there a significant difference between the treatment groups?

  3. (c)

    Analyze the data as a randomized complete block design, where the number of trials represents a blocking factor.

  4. (d)

    Is there any difference between the results obtained in (b) and (c)? If so, explain what might cause the difference and state which method you would recommend.