9.1 Introduction

Repeated measures data, also known as longitudinal data, arise from experiments in which observations are made on the same experimental units at several planned times. These experiments can be of the regression or analysis of variance (ANOVA) type and can contain two or more treatments. They are laid out in familiar designs: a completely randomized design (CRD); a randomized complete block design (RCBD) or randomized incomplete blocks when blocking is appropriate; or, when appropriate, row-and-column designs such as Latin squares. Repeated measures designs are widely used in the biological sciences. They are well understood for normally distributed data but less so for binary, ordinal, count, and other non-normal data. Nevertheless, recent developments in statistical computing methodology and software have greatly increased the number of tools available for analyzing categorical and count data.

A generalized linear mixed model (GLMM) is one of the most useful and flexible tools in modern statistics, because it allows complex correlation structures to be incorporated into the framework of a generalized linear model. Fitting such models has been the subject of much research over the last three decades. GLMMs for repeated measures combine generalized linear model (GLM) theory (e.g., a binomial, multinomial, or Poisson response variable) with linear mixed effects models.

Experimentation is sometimes misunderstood, because researchers believe that it involves only manipulating the levels of independent variables and observing the subsequent responses in dependent variables. Independent variables whose levels are determined or set by the experimenter are said to have fixed effects. Random effects are also very common; their levels are assumed to be randomly selected from an infinite population of possible levels. Many variables of interest in research cannot be fully manipulated experimentally but can nevertheless be studied by treating them as random effects. For example, the genetic composition of individuals of a species cannot be manipulated experimentally, yet it is of great interest to geneticists who want to assess the genetic contribution to individual variation in specific behaviors.

A GLMM with repeated measures generalizes the standard linear model in two ways: (1) each experimental unit contributes more than one response, and these responses can be binary, ordinal, counts, and so on; and (2) the data exhibit nonconstant correlation and/or variability. The mixed model therefore gives you the flexibility to model not only the means of your data (as in the standard linear model) but also their variances and covariances. Random effects are usually assumed to be normally distributed. Because normally distributed data are described entirely by their means and variances/covariances, the two sets of parameters in a linear mixed model specify the full probability distribution of the data. The parameters of the mean structure are called fixed effects parameters, and the corresponding effects can be qualitative (as in traditional analysis of variance) or quantitative (as in standard regression). The parameters of the variance–covariance structure are called covariance parameters, and they are what distinguish a linear mixed model from the standard linear model. Covariance parameters arise most often in two typical scenarios:

  (a) Experimental units on which data are measured can be grouped into clusters, and data from a common cluster are correlated.

  (b) Repeated measurements of the same experimental unit are taken, and these repeated measurements correlate or show some variability.

The first scenario can be generalized to include a set of clusters nested within one another. For example, if students are the experimental unit, they can be grouped into classes (clusters), which, in turn, can be grouped into schools. Each level of this hierarchy may present an additional source of variability and correlation. The second scenario occurs in longitudinal studies, in which repeated measurements of the same experimental unit over time are taken. Alternatively, these repeated measures could be spatial or multivariate.
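The following Python sketch makes scenario (a) concrete for the nested students, classes, and schools example. It computes the correlations implied by a two-level random-intercept model on the normal (identity-link) scale. The variance components are hypothetical. In a GLMM the same components act on the link scale, so correlations on the data scale would differ.

```python
def nested_correlations(var_school: float, var_class: float, var_resid: float) -> dict:
    """Correlations implied by y = mean + school + class(school) + residual.

    All three terms are independent normal effects (identity-link case only).
    """
    total = var_school + var_class + var_resid
    return {
        # two students in the same class share both the school and the class effect
        "same_class": (var_school + var_class) / total,
        # two students in different classes of the same school share only the school effect
        "same_school_other_class": var_school / total,
        # students in different schools share no random effect
        "different_school": 0.0,
    }

# Hypothetical variance components, for illustration only
print(nested_correlations(var_school=1.0, var_class=0.5, var_resid=2.5))
# {'same_class': 0.375, 'same_school_other_class': 0.25, 'different_school': 0.0}
```

Each extra level of the hierarchy adds one variance component and one more distinct correlation value, which is why every level "may present an additional source of variability and correlation."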

9.2 Example of Turf Quality

The proportional odds model, introduced by McCullagh (1980), extends the generalized linear model to ordinal responses. Recall that the proportional odds model is a special case of a GLM with a cumulative link function, in which the probability that an observation falls in a given category or below is modeled. With a logit link and only two categories (a binary response), the proportional odds model reduces to standard logistic regression. As with any other type of response variable, repeated measurements are common in agronomic research. They produce clustered data in which repeated observations on the same experimental unit are correlated, and this correlation must be taken into account in the analysis.

The data were obtained from an experiment studying the turf quality of five grass varieties. The varieties were sown independently in 17 or 18 plots. The plots (experimental units) were evaluated in May, July, and September of the growing season, and turf quality was classified on an ordinal scale into three categories: low, medium, and excellent quality, as shown in Table 9.1.

Table 9.1 Turf quality of five grass varieties (low, Med = medium, Excel = Excellent, Sept = September)

The components of the GLMM, with repeated measures with an ordinal multinomial response, are as follows:

  • Distributions: \( \left({y}_{1ij},{y}_{2ij},{y}_{3ij}\right)\mid {\rho}_{ij}\sim \mathrm{Multinomial}\left({N}_{ij},{\pi}_{1ij},{\pi}_{2ij},{\pi}_{3ij}\right) \), where y1ij, y2ij, and y3ij are the observed frequencies of turf quality in categories c = 1 (low), 2 (medium), and 3 (excellent) for variety i in month j; Nij is the total number of plots rated; and ρij is the random effect of the variety × month (measurement time) combination, assuming \( {\rho}_{ij}\sim N\left(0,{\sigma}_{\rho}^2\right) \).

  • Linear predictor: \( {\eta}_{(c)ij}={\eta}_c+{\tau}_i+{\rho}_{ij} \), where η(c)ij is the cth link (c = 1, 2) for the ijth variety × month combination, ηc is the intercept for the cth link, τi is the fixed effect of the ith variety, and ρij is the random effect of the ijth variety × month combination \( \left({\rho}_{ij}\sim N\left(0,{\sigma}_{\mathrm{variety}\times \mathrm{month}}^2\right)\right) \). The cumulative logit link functions are as follows:

$$ \log \left(\frac{\pi_{1 ij}}{1-{\pi}_{1 ij}}\right)={\eta}_{(1) ij} $$
$$ \log \left(\frac{\pi_{1 ij}+{\pi}_{2 ij}}{1-\left({\pi}_{1 ij}+{\pi}_{2 ij}\right)}\right)={\eta}_{(2) ij} $$
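The two cumulative logits above determine all three category probabilities. This minimal Python sketch inverts them using hypothetical linear predictor values. The intercepts must be increasing (η(1) < η(2)) so that every category probability is positive.

```python
import math

def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def category_probs(cum_etas: list[float]) -> list[float]:
    """Category probabilities from cumulative-logit linear predictors.

    cum_etas[c-1] = logit(P(Y <= c)) for c = 1..C-1, in increasing order.
    Returns [pi_1, ..., pi_C].
    """
    cum = [inv_logit(e) for e in cum_etas] + [1.0]  # P(Y <= C) = 1
    probs, previous = [], 0.0
    for p in cum:
        probs.append(p - previous)  # pi_c = P(Y <= c) - P(Y <= c - 1)
        previous = p
    return probs

# Hypothetical linear predictors for one variety: low | medium | excellent
print(category_probs([-1.0, 1.0]))
```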

The following SAS (Statistical Analysis System) program fits a repeated measures GLMM with an ordinal response.

proc glimmix data=turfgrass method=laplace;
  class Variety time;
  /* proportional odds (cumulative logit) model; SOLUTION prints the fixed effects */
  model cat(order=data) = variety|time / dist=multinomial link=clogit solution oddsratio;
  /* TYPE= was changed (CS, AR(1), TOEP(1), UN) in separate runs to compare fits */
  random intercept / subject=variety type=cs solution;
  /* cumulative probabilities P(Y <= c) per variety; ILINK applies the inverse link */
  estimate 'c=1, var=1' intercept 1 0 variety 1 0 0 0 0,
           'c=2, var=1' intercept 0 1 variety 1 0 0 0 0,
           'c=1, var=2' intercept 1 0 variety 0 1 0 0 0,
           'c=2, var=2' intercept 0 1 variety 0 1 0 0 0,
           'c=1, var=3' intercept 1 0 variety 0 0 1 0 0,
           'c=2, var=3' intercept 0 1 variety 0 0 1 0 0,
           'c=1, var=4' intercept 1 0 variety 0 0 0 1 0,
           'c=2, var=4' intercept 0 1 variety 0 0 0 1 0,
           'c=1, var=5' intercept 1 0 variety 0 0 0 0 1,
           'c=2, var=5' intercept 0 1 variety 0 0 0 0 1 / ilink;
  freq y;
run;

Mixed models have advantages over fixed-effects linear models (Littell et al. 1996) because they incorporate both fixed effects (Xβ) and random effects (Zb). This makes it possible to specify different variance–covariance structures for repeated measures experiments (with or without missing data) and compare them to see which structure best fits the data (Henderson 1984; Smith et al. 2005). Building a good model therefore involves selecting the covariance structure that best fits the dataset. PROC GLIMMIX provides several information criteria for this purpose, including minus two restricted log likelihood (−2RLL), the Akaike information criterion (AIC), the corrected Akaike information criterion (AICC), and the Bayesian information criterion (BIC). These are used as fit measures to select the variance structure that best models the dataset from among compound symmetry (“CS”), first-order autoregressive (“AR(1)”), Toeplitz (“Toep(1)”), and unstructured (“UN”).
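The information criteria penalize −2 log likelihood by the number of covariance and fixed parameters. This Python sketch uses the textbook definitions with hypothetical −2LL values and parameter counts. PROC GLIMMIX's exact choice of d (number of parameters) and n (observations or subjects) depends on the estimation method and model. Treat this as an illustration of the comparison logic, not a reproduction of the SAS output.

```python
import math

def information_criteria(neg2ll: float, n_params: int, n_obs: int) -> dict:
    """Textbook AIC, AICC, and BIC from -2 log likelihood (smaller is better)."""
    d, n = n_params, n_obs
    return {
        "AIC": neg2ll + 2 * d,
        "AICC": neg2ll + 2 * d * n / (n - d - 1),  # small-sample correction of AIC
        "BIC": neg2ll + d * math.log(n),           # heavier penalty as n grows
    }

# Hypothetical -2LL values and parameter counts for two candidate structures
candidates = {"CS": (412.3, 4), "UN": (405.9, 9)}
for name, (neg2ll, d) in candidates.items():
    print(name, information_criteria(neg2ll, d, n_obs=90))
```

A richer structure such as UN always lowers −2LL. It is preferred only if that gain outweighs the added penalty.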

Most of these statements have already been explained. To compare correlation structures with the program above, the TYPE= option is set to CS, AR(1), TOEP(1), and UN in separate runs, each run specifying one covariance structure. Part of the results is shown below.
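The candidate structures differ in how covariances among the three measurement times (May, July, September) are patterned. This Python sketch builds each pattern with hypothetical values and counts its covariance parameters. The SAS documentation defines TOEP(q) as a banded Toeplitz structure with q bands, including the main diagonal, so TOEP(1) reduces to a diagonal matrix. The sketch follows that convention as we understand it.

```python
def compound_symmetry(t: int, var: float, rho: float) -> list[list[float]]:
    # equal variances; every pair of times shares the same correlation rho
    return [[var if i == j else var * rho for j in range(t)] for i in range(t)]

def ar1(t: int, var: float, rho: float) -> list[list[float]]:
    # correlation decays geometrically with the lag |i - j|
    return [[var * rho ** abs(i - j) for j in range(t)] for i in range(t)]

def banded_toeplitz(t: int, lag_covs: list[float]) -> list[list[float]]:
    # lag_covs[h] is the covariance at lag h; lags >= len(lag_covs) are zero,
    # so a single entry gives a diagonal matrix
    q = len(lag_covs)
    return [[lag_covs[abs(i - j)] if abs(i - j) < q else 0.0 for j in range(t)]
            for i in range(t)]

def n_cov_params(structure: str, t: int, q: int = 1) -> int:
    # UN leaves every variance and covariance free: t(t + 1)/2 parameters
    counts = {"CS": 2, "AR(1)": 2, "TOEP(q)": q, "UN": t * (t + 1) // 2}
    return counts[structure]

# Three measurement times; hypothetical variance and correlation
for row in ar1(3, var=1.0, rho=0.6):
    print(row)
print({s: n_cov_params(s, t=3, q=2) for s in ("CS", "AR(1)", "TOEP(q)", "UN")})
```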

According to the fit statistics (Table 9.2), the covariance structure that best fits the dataset is the Toeplitz structure of order 1 (Toep(1)). The type III tests of fixed effects in Table 9.3, part (a), indicate that the grass varieties differ in turfgrass quality (P = 0.0202). The SOLUTION option in the MODEL statement prints the fixed effects solutions (intercepts and treatment effects). We use these solutions to estimate the linear predictors \( {\hat{\eta}}_{ci}={\hat{\eta}}_c+{\widehat{\mathrm{Variety}}}_i \) (part (b)).

Table 9.2 Fit statistics under different correlation structures
Table 9.3 Results of the analysis of variance

The probabilities \( {\hat{\pi}}_{ci} \) obtained using the “Estimate” information are tabulated under the “Mean” column of Table 9.4.

Table 9.4 Estimated linear predictors and means on the model scale (Estimate) and on the data scale (Mean) for observed turfgrass quality in grass varieties in the cumulative logit (proportional odds) multinomial model

From these values, for the row "c = 1, var = 1" the linear predictor is \( {\hat{\eta}}_{11}={\hat{\eta}}_1+{\widehat{\mathrm{variety}}}_1=-2.0248 \). Applying the inverse logit link to \( {\hat{\eta}}_{11} \) gives \( {\hat{\pi}}_{11}=1/\left(1+{e}^{2.0248}\right)=0.1166 \), the probability of observing “Low”-quality turf for variety 1. For the row "c = 2, var = 1", the inverse link of the linear predictor is 0.6507, which estimates the cumulative probability \( {\hat{\pi}}_{11}+{\hat{\pi}}_{21} \) of “Low” or “Medium” quality. Substituting the value of \( {\hat{\pi}}_{11} \), the probability that variety 1 produces “Medium”-quality turf is \( {\hat{\pi}}_{21}=0.6507-0.1166=0.5341 \). With these two estimates, the probability that variety 1 yields “Excellent”-quality turf is \( {\hat{\pi}}_{31}=1-\left({\hat{\pi}}_{11}+{\hat{\pi}}_{21}\right)=1-0.6507=0.3493 \). The remaining probabilities \( {\hat{\pi}}_{ci} \) for the other grass varieties are obtained in the same way.
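The arithmetic for variety 1 can be checked with a few lines of Python. The inputs are the two values quoted in the text: η̂11 = −2.0248 and the reported cumulative probability 0.6507 for the row "c = 2, var = 1".

```python
import math

def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))

pi_low = inv_logit(-2.0248)        # eta_hat_11 from the text -> P(Low)
cum_medium = 0.6507                # reported inverse link for c = 2: P(Low or Medium)
pi_medium = cum_medium - pi_low    # P(Medium)
pi_excellent = 1.0 - cum_medium    # P(Excellent)
print(round(pi_low, 4), round(pi_medium, 4), round(pi_excellent, 4))
# 0.1166 0.5341 0.3493
```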

9.3 Effect of Insecticides on Aphid Growth

A cage experiment was used to investigate the effect of three insecticides on aphid colonies with partial resistance to a common active compound. There were eight treatments: all combinations of four insecticide levels (the three insecticides and a control with no insecticide) with two types of colony (susceptible or partially resistant). The experiment was laid out as an RCBD with six blocks of eight cages, and each cage in a block received one treatment combination. A colony of aphids was reared in each cage, and the number of live aphids was recorded before the insecticide was applied and again 2 and 6 days after application. Both hatching and deaths could occur within each cage between evaluations. The dataset from this experiment is shown below (Table 9.5).

Table 9.5 Effect of insecticides (C = control, R = resistant, S = susceptible) on aphid growth

Following the same reasoning as in previous examples, the components of the GLMM with a Poisson response and repeated measures, which models the number of aphids (yijkl), are as follows.

$$ \mathrm{Distributions}:{y}_{ij kl}\mid {b}_l,\mathrm{insecticide}\times \mathrm{clone}{\left(\mathrm{block}\right)}_{ij(l)}\sim \mathrm{Poisson}\left({\lambda}_{ij kl}\right) $$
$$ {b}_l\sim N\left(0,{\sigma}_{\mathrm{block}}^2\right),\kern0.5em \mathrm{insecticide}\times \mathrm{clone}{\left(\mathrm{block}\right)}_{ij(l)}\sim N\left(0,{\sigma}_{\mathrm{insecticide}\times \mathrm{clone}\times \mathrm{block}}^2\right). $$
  • Linear predictor: \( {\eta}_{ijkl}=\theta +{I}_i+{C}_j+{(IC)}_{ij}+{b}_l+ IC{(b)}_{ij(l)}+{\tau}_k+{\left(I\tau \right)}_{ik}+{\left(C\tau \right)}_{jk}+{\left( IC\tau \right)}_{ijk} \), where ηijkl is the linear predictor; θ is the intercept; Ii (i = 1, 2, 3, 4) is the fixed effect of the insecticide treatment (three insecticides and the control); Cj (j = 1, 2) is the fixed effect of the aphid clone; (IC)ij is the fixed interaction effect between insecticide and clone; bl (l = 1, 2, 3, 4, 5, 6) is the random block effect, assuming \( {b}_l\sim N\left(0,{\sigma}_{\mathrm{block}}^2\right) \); IC(b)ij(l) is the random effect of the insecticide × clone combination within blocks (that is, the cage), assuming \( \mathrm{insecticide}\times \mathrm{clone}{\left(\mathrm{block}\right)}_{ij(l)}\sim N\left(0,{\sigma}_{\mathrm{insecticide}\times \mathrm{clone}\times \mathrm{block}}^2\right) \); τk (k = 1, 2, 3) is the fixed effect of measurement time; and (Iτ)ik, (Cτ)jk, and (ICτ)ijk are the fixed interaction effects of time with insecticide, clone, and insecticide × clone, respectively.

  • Link function: log(λijkl) = ηijkl relates the linear predictor to the mean λijkl.

The following SAS program fits the GLMM with a Poisson distribution to the repeated measures data.

proc glimmix nobound method=laplace;
  class ID Block Insecticide Cage Clone time;
  model y = Insecticide|Clone|time / dist=poi;
  /* random block intercept plus insecticide x clone within block (the cage) */
  random intercept Insecticide*Clone / subject=Block;
  lsmeans Insecticide|Clone|time / lines ilink;
run;
quit;

To choose the covariance structure, we first compared several structures assuming a Poisson distribution for the response variable. According to the fit statistics, the structure that best models the data is the first-order autoregressive structure (AR(1)). The fit statistic for the conditional distribution, Pearson chi-square/DF = 5.77, is well above 1. This indicates extra variation (overdispersion), so the Poisson distribution does not fit the data adequately (Table 9.6).
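The Pearson statistic divides each squared residual by the variance the model assumes. Under a Poisson model, that variance equals the mean. This Python sketch shows the calculation with hypothetical observed counts and fitted means. PROC GLIMMIX computes the conditional version from fitted values that include the predicted random effects, so this is only an illustration of the formula.

```python
def pearson_chi2_df(y: list[float], mu: list[float], n_params: int) -> float:
    """Pearson chi-square / residual df under a Poisson variance (Var = mean)."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Hypothetical counts and fitted means, for illustration only
y = [12, 30, 5, 44, 9, 21]
mu = [15.0, 22.0, 8.0, 35.0, 12.0, 20.0]
print(round(pearson_chi2_df(y, mu, n_params=2), 2))
# 1.94
```

Values near 1 are consistent with the assumed variance. Values far above 1, such as the 5.77 reported for the aphid data, point to overdispersion.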

Table 9.6 Results of the analysis of variance in the Poisson GLMM

Because the data are overdispersed, a highly recommended alternative is to choose a more appropriate distribution for this dataset. Here, the linear predictor stays the same, but a negative binomial distribution is assumed for the response variable. That is,

$$ {y}_{ij kl}\mid {b}_l,\mathrm{insecticide}\times \mathrm{clone}{\left(\mathrm{block}\right)}_{ij(l)}\sim \mathrm{Negative}\ \mathrm{Binomial}\left({\lambda}_{ij kl},\phi \right) $$

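This negative binomial model arises as a Poisson–gamma mixture. Conditionally on the random block and insecticide × clone(block) effects, and on an observation-level multiplier uijkl, the counts are Poisson: \( {y}_{ijkl}\mid {b}_l,\mathrm{insecticide}\times \mathrm{clone}{\left(\mathrm{block}\right)}_{ij(l)},{u}_{ijkl}\sim \mathrm{Poisson}\left({\lambda}_{ijkl}{u}_{ijkl}\right) \), where \( {u}_{ijkl}\sim \mathrm{Gamma}\left(\frac{1}{\phi },\phi \right) \) has shape 1/ϕ and scale ϕ, so that E(uijkl) = 1 and Var(uijkl) = ϕ. Integrating over uijkl gives \( {y}_{ijkl}\mid {b}_l,\mathrm{insecticide}\times \mathrm{clone}{\left(\mathrm{block}\right)}_{ij(l)}\sim \mathrm{Negative}\ \mathrm{Binomial}\left({\lambda}_{ijkl},\phi \right) \), with mean λijkl and variance \( {\lambda}_{ijkl}+\phi {\lambda}_{ijkl}^2 \). The link function is log(λijkl) = ηijkl.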
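A short Python simulation illustrates the Poisson–gamma mixture. It assumes a hypothetical Poisson mean λ = 10 and uses the scale estimate ϕ̂ = 0.1584 reported later for these data. Multiplying the Poisson mean by a gamma variable with mean 1 and variance ϕ leaves the mean at λ but inflates the variance to λ + ϕλ².

```python
import math
import random

def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def negbin_by_mixing(lam: float, phi: float, n: int, seed: int = 1) -> list[int]:
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.gammavariate(1.0 / phi, phi)   # shape 1/phi, scale phi: mean 1, variance phi
        draws.append(poisson_sample(lam * u, rng))
    return draws

lam, phi = 10.0, 0.1584   # lam is hypothetical; phi is the reported scale estimate
y = negbin_by_mixing(lam, phi, 100_000)
mean = sum(y) / len(y)
var = sum((v - mean) ** 2 for v in y) / (len(y) - 1)
print(f"mean {mean:.2f}, variance {var:.2f} (theory: {lam:.2f}, {lam + phi * lam ** 2:.2f})")
```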

The following SAS code fits the GLMM with a negative binomial distribution.

proc glimmix nobound method=laplace;
  class ID Block Insecticide Cage Clone time;
  model y = Insecticide|Clone|time / dist=negbin;
  random intercept Insecticide*Clone / subject=Block;
  lsmeans Insecticide|Clone|time / lines ilink;
run;

Part of the results is shown in Table 9.7. Part (a) shows the fit statistics under the negative binomial distribution, and part (b) shows the conditional fit statistic, Pearson chi-square/DF = 0.81. This value, close to 1, indicates that the overdispersion has been accounted for and that the negative binomial distribution adequately models the response variable.

Table 9.7 Fit statistics

Part (a) of Table 9.8 shows the estimated variance components under an AR(1) covariance structure. The estimates for blocks, for the insecticide × clone interaction within blocks, and for the scale parameter are \( {\hat{\sigma}}_{\mathrm{block}}^2=0.06613 \), \( {\hat{\sigma}}_{\mathrm{insecticide}\times \mathrm{clone}\left(\mathrm{block}\right)}^2=-0.7575 \), and \( \hat{\phi}=0.1584 \), respectively. The negative variance estimate is possible because the NOBOUND option removes the nonnegativity constraint on covariance parameters. The type III tests of fixed effects (part (b)) indicate significant effects on the mean number of aphids of insecticide type (P < 0.0001), clone (P = 0.0387), and measurement time (P = 0.0137). The insecticide × measurement time (P < 0.0001) and clone × measurement time (P = 0.0259) interactions are also significant. The insecticide × clone × measurement time interaction is close to significance (P = 0.0663).

Table 9.8 Estimated variance components and tests of fixed effects

The linear predictors (model scale) and estimated means (data scale) for the main effects and interactions appear under the “Estimate” and “Mean” columns, respectively. The mean numbers of aphids by insecticide, clone, and time are given below.
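With a log link, the “Mean” column is the exponential of the “Estimate” column. A standard error on the data scale can be approximated with the delta method. We believe this is how the ILINK option reports standard errors of the mean, but the GLIMMIX documentation should be checked for the exact computation. The estimate and standard error below are hypothetical.

```python
import math

def to_data_scale(estimate: float, se_estimate: float) -> tuple[float, float]:
    """Back-transform a log-link estimate and its standard error to the count scale."""
    mean = math.exp(estimate)      # inverse of the log link
    se_mean = mean * se_estimate   # delta method: derivative of exp(eta) is exp(eta)
    return mean, se_mean

# Hypothetical model-scale LS mean and standard error
mean, se = to_data_scale(2.5, 0.2)
print(f"Mean = {mean:.3f}, SE = {se:.3f}")
```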

For insecticide type (Table 9.9):

Table 9.9 Estimates of insecticide least squares (LS) means on the model scale (Estimate) and the data scale (Mean)

For clone (Table 9.10):

Table 9.10 Clone least squares means on the model scale (Estimate) and the data scale (Mean)

For the interaction insecticide*clone (Table 9.11):

Table 9.11 Insecticide*clone least squares means on the model scale (Estimate) and the data scale (Mean)

For measurement time (Table 9.12):

Table 9.12 Time least squares means on the model scale (Estimate) and the data scale (Mean)

For the interaction insecticide*time (Table 9.13):

Table 9.13 Insecticide*time least squares means on the model scale (Estimate) and the data scale (Mean)

For the interaction clone*time (Table 9.14):

Table 9.14 Clone*time least squares means on the model scale (Estimate) and the data scale (Mean)

For the interaction insecticide*clone*time (Table 9.15):

Table 9.15 Insecticide*clone*time least squares means on the model scale (Estimate) and the data scale (Mean)

9.4 Manufacture of Livestock Feed

In this experiment, two types of pelleted feed were manufactured with different amounts of whole sorghum. Including the whole grain produced one feed with a high pellet durability index (PDI) and one with a low PDI. The researcher wanted to know how much this difference in PDI affected the amount of intact pelleted feed delivered to the different positions along a feeding line. The line was run four times with the high-PDI feed and four times with the low-PDI feed. After each run, the total weight of feed in each of the 12 identified trays was measured. The feed in each tray was then sieved, and the fine particles (crushed pellets) were weighed. The response of interest was the proportion of fines, that is, the weight of fines divided by the total weight of feed in each tray. The data for this experiment are in the Appendix (Data: Feeding line experiment).

The experimental design was a split plot with the whole plots in a completely randomized design. There were two fixed factors: feed, with 2 levels (high-PDI feed (H) and low-PDI feed (L)), and tray, with 12 levels (positions 1, 2, 3, ..., 12 along the feed line). The runs (1, 2, 3, 4 for each feed) are the whole-plot units and may induce correlation among trays from the same run. It is therefore advisable to examine which variance structure is suitable for this analysis.

The ANOVA table (Table 9.16) with degrees of freedom for this experiment is shown below.

Table 9.16 Results of the analysis of variance of the experiment
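Under the layout described (2 feeds, each run 4 times, with 12 trays per run), the classical split-plot partition of the degrees of freedom can be sketched in Python. This is a sketch under those stated assumptions and has not been checked line by line against Table 9.16.

```python
def split_plot_df(n_feeds: int, runs_per_feed: int, n_trays: int) -> dict:
    """Degrees of freedom for a split plot with whole plots (runs) in a CRD."""
    a, r, b = n_feeds, runs_per_feed, n_trays
    df = {
        "feed": a - 1,
        "run(feed)": a * (r - 1),            # whole-plot error
        "tray": b - 1,
        "feed x tray": (a - 1) * (b - 1),
        "residual": a * (r - 1) * (b - 1),   # split-plot error
    }
    df["total"] = a * r * b - 1
    # the components must add up to the total degrees of freedom
    assert sum(v for k, v in df.items() if k != "total") == df["total"]
    return df

print(split_plot_df(n_feeds=2, runs_per_feed=4, n_trays=12))
# {'feed': 1, 'run(feed)': 6, 'tray': 11, 'feed x tray': 11, 'residual': 66, 'total': 95}
```

Feed is tested against the run(feed) variation with only 6 degrees of freedom. Tray and feed × tray are tested against the split-plot residual.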

The researcher aims to draw conclusions about pellet breakdown along the feed line for the two types of feed, high PDI and low PDI. The following model is used to describe the experiment:

$$ {y}_{ij k}=\mu +{\alpha}_i+\alpha {(r)}_{ik}+{\beta}_j+{\left(\alpha \beta \right)}_{ij}+{\varepsilon}_{ij k} $$

where yijk is the proportion observed in the run k (k = 1, 2, 3, 4), tray j (j = 1, 2, …, 12), and in feed i (i = 1, 2); μ is the overall mean; αi is the fixed effect of feed i; α(r)ik is the random effect of the ith feed within the kth run, assuming \( \alpha {(r)}_{ik}\sim N\left(0,{\sigma}_{\alpha r}^2\right) \); βj is the fixed effect due to the jth tray; (αβ)ij is the effect of the interaction between the ith feed and the jth tray; and εijk is the experimental error. The components of the conditional GLMM assuming that the response variable follows a beta distribution are listed below:

The distribution of the response variable is \( {y}_{ijk}\mid \alpha {(r)}_{ik}\sim \mathrm{Beta}\left({\pi}_{ijk},\phi \right) \), where πijk is the mean proportion and ϕ is the scale (precision) parameter. The linear predictor is ηijk = μ + αi + α(r)ik + βj + (αβ)ij, and the link function is the logit, \( \log \left(\frac{\pi_{ijk}}{1-{\pi}_{ijk}}\right)={\eta}_{ijk} \). The following GLIMMIX syntax fits a GLMM with a beta distribution.
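The beta model is parameterized by its mean π and a precision ϕ. As we read the GLIMMIX documentation, the variance is π(1 − π)/(1 + ϕ). This Python sketch maps a hypothetical logit-scale predictor and precision to the usual Beta(α, β) shape parameters and checks the moments.

```python
import math

def beta_shapes(eta: float, phi: float) -> tuple[float, float, float, float]:
    """Return (mean, variance, alpha, beta) under the mean-precision parameterization."""
    mean = 1.0 / (1.0 + math.exp(-eta))        # inverse logit
    alpha, beta = mean * phi, (1.0 - mean) * phi
    variance = mean * (1.0 - mean) / (1.0 + phi)
    return mean, variance, alpha, beta

mean, variance, a, b = beta_shapes(eta=-2.0, phi=30.0)  # hypothetical values
# cross-check against the usual Beta(alpha, beta) moment formulas
assert abs(a / (a + b) - mean) < 1e-12
assert abs(a * b / ((a + b) ** 2 * (a + b + 1)) - variance) < 1e-12
print(f"mean proportion {mean:.4f}, variance {variance:.6f}")
```

A larger ϕ concentrates the proportions more tightly around their mean.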

proc glimmix method=laplace;
  class tray feed run;
  model ratio = feed|tray / dist=beta;
  random intercept / subject=feed(run) type=toep(1);
  lsmeans feed|tray / lines ilink;
run;

Part of the output is shown below. Four covariance structures (“CS,” “AR(1),” “Toep(1),” and “UN”) were tested to see which one best fits the response variable. Of these covariance structures, “Toep(1)” produced the best fit statistics (part (a), Table 9.17).

Table 9.17 Results of the analysis of variance

Another important result for deciding how to proceed is the conditional distribution statistic (Pearson chi-square/DF = 0.96). Its value, close to 1, indicates that the beta model fits the data adequately. The tests of fixed effects (part (c)) indicate statistically significant effects of feed type (P = 0.0001) and tray (P = 0.0001).

The linear predictors and estimated mean proportions of the factors and their interaction are listed under the “Estimate” and “Mean” columns of the following tables, respectively.

For feed type (Table 9.18):

Table 9.18 Feed least squares means on the model scale (Estimate) and the data scale (Mean)

For the tray (Table 9.19):

Table 9.19 Tray least squares means on the model scale (Estimate) and the data scale (Mean)

For the interaction feed*tray (Table 9.20):

Table 9.20 Tray*feed least squares means on the model scale (Estimate) and the data scale (Mean)

9.5 Characterization of Spatial and Temporal Variations in Fecal Coliform Density

During a 1-month period (June 1981), 30 river water samples were collected from the channel at 3 stations, A, B, and C (downstream to upstream) on 5 randomly selected days at 9:00 a.m. and 3:00 p.m. (1 sample per station per hour per day). Each sample was analyzed for fecal coliform by method FC-96. The data from this experiment are shown in Table 9.21.

Table 9.21 Variation in fecal coliform densities of the river water samples from three sampling stations on five sampling days at 9:00 a.m. (TM = 1) and 3:00 p.m. (TM = 2)

To assess the relative magnitudes of sources of variation due to time, site, and subsampling on the number of coliforms per milliliter (yijk), an analysis of variance using a GLMM with a Poisson response was performed, as described below:

We denote by yijk the number of colonies per milliliter, whose conditional distribution is \( {y}_{ijk}\mid \mathrm{sampling}{\left(\mathrm{site}\right)}_{ik}\sim \mathrm{Poisson}\left({\lambda}_{ijk}\right) \), with the linear predictor ηijk defined by

$$ {\eta}_{ij k}=\theta +{site}_i+\mathrm{sampling}{\left(\mathrm{site}\right)}_{ik}+{\mathrm{time}}_j+{\left(\mathrm{site}\times \mathrm{time}\right)}_{ij} $$
$$ \left(i=1,2,3;j=1,2,3,4,5;k=1,2\right) $$

where ηijk is the linear predictor; θ is the intercept; sitei is the fixed effect of sampling site i; sampling(site)ik is the random effect of the sampling time of day (9:00 a.m. or 3:00 p.m.) nested within the site, assuming \( \mathrm{sampling}{\left(\mathrm{site}\right)}_{ik}\sim N\left(0,{\sigma}_{\mathrm{sampling}\left(\mathrm{site}\right)}^2\right) \); timej is the fixed effect of sampling date; and (site × time)ij is the interaction effect between site and sampling date. The link function, log(λijk) = ηijk, relates the linear predictor to the mean.

The following GLIMMIX syntax fits a GLMM with a Poisson response.

proc glimmix data=ufc nobound method=laplace;
  class T TM Site;
  model ufc = Site|T / dist=Poisson link=log;
  random intercept / subject=TM(Site) type=toep(1);
  lsmeans Site|T / lines ilink;
run;

Part of the results is summarized in Table 9.22. Four covariance structures were tested to determine which best models the response variable (part (a)). All four produced very similar results, and the Toep(1) structure was chosen. Under this structure, the conditional distribution statistic is Pearson chi-square/DF = 54.41, which indicates strong overdispersion in the dataset. It is therefore important to look for an alternative distribution that accounts for it.

Table 9.22 Results of the analysis of variance

The hypothesis tests in part (c) indicate significant effects of sampling date (P = 0.0001) and of the site × sampling date interaction (P = 0.0001). In other words, the concentration of fecal coliform units per milliliter depends on the date of collection. However, the data are strongly overdispersed, so these Poisson-based tests may be unreliable. One way to deal with overdispersion is to fit a quasi-Poisson model, which adds a dispersion parameter during fitting to account for the extra variance. Another option is to look for a distribution that fits the data adequately; here, the negative binomial distribution is a good alternative.
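A quasi-Poisson model assumes Var(y) = φμ. It leaves the point estimates unchanged but multiplies standard errors by √φ̂ and divides F statistics by φ̂. This Python sketch plugs in the Pearson chi-square/DF of 54.41 from the Poisson fit as a rough dispersion estimate. Using the conditional GLMM statistic this way is only an illustration of the size of the correction.

```python
import math

def quasi_poisson_se(se_poisson: float, dispersion: float) -> float:
    """Quasi-Poisson standard error: Poisson SE times sqrt(dispersion)."""
    return se_poisson * math.sqrt(dispersion)

dispersion = 54.41   # Pearson chi-square/DF from the Poisson fit
factor = math.sqrt(dispersion)
print(f"Standard errors grow by about {factor:.2f}; F statistics shrink by {dispersion:.2f}")
```

A correction of this size can easily turn a "significant" Poisson test into a nonsignificant one.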

Next, we assume that the response variable follows a negative binomial distribution. That is, the distribution of yijk (number of colonies per milliliter) is yijk ∣ sampling(site)ik ~ Negative Binomial(λijk, ϕ), where ϕ is the scale parameter. The linear predictor ηijk and the link function remain unchanged.

The following GLIMMIX commands fit a GLMM with a negative binomial distribution.

proc glimmix data=ufc nobound method=laplace;
  class T TM Site;
  model ufc = Site|T / dist=negbin;
  /* a single slash introduces the options of the RANDOM statement */
  random intercept / subject=TM(Site) type=toep(1);
  lsmeans Site|T / lines ilink;
run;

Part of the output of the above program is shown below. The fit statistics under the negative binomial distribution (part (a) of Table 9.23) are much smaller than those obtained under the Poisson model, indicating that the negative binomial distribution fits the response variable better. Furthermore, the conditional distribution statistic (Pearson chi-square/DF = 0.76) indicates that the negative binomial distribution is a good choice for these data.

Table 9.23 Fit statistics under the negative binomial distribution

This statistic (Pearson chi-square/DF = 0.76, part (b)) measures the residual variability relative to the variance assumed by the fitted model. Under the negative binomial model, that variance is \( {\lambda}_{ijk}+\phi {\lambda}_{ijk}^2 \) rather than λijk. Because the value is below 1, no overdispersion remains relative to the negative binomial variance function. Accounting for the overdispersion also changes the F-tests of the fixed effects (Table 9.24): the sampling date is still significant, but, unlike in the Poisson GLMM, the interaction between the two factors is not.
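Under a negative binomial model, the Pearson statistic divides each squared residual by μ + kμ², where k is the scale parameter, rather than by μ. This is why the same residuals yield a much smaller ratio than under the Poisson model. This Python sketch compares the two variance functions using hypothetical counts, fitted means, and k.

```python
def pearson_chi2_df(y: list[float], mu: list[float], n_params: int, k: float = 0.0) -> float:
    """Pearson chi-square/DF with variance function mu + k*mu**2 (k = 0 gives Poisson)."""
    chi2 = sum((yi - mi) ** 2 / (mi + k * mi ** 2) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Hypothetical counts, fitted means, and scale k, for illustration only
y = [3, 40, 12, 95, 7, 60]
mu = [10.0, 25.0, 15.0, 50.0, 12.0, 45.0]
print("Poisson:", round(pearson_chi2_df(y, mu, 2), 2))
print("Negative binomial (k = 0.3):", round(pearson_chi2_df(y, mu, 2, k=0.3), 2))
```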

Table 9.24 Type III fixed effects tests

The linear predictors and estimated means of the main effects and of the interaction between both factors are under the columns “Estimate” and “Mean,” respectively. The sampling site means are presented below (Table 9.25).

Table 9.25 Means and standard errors on the model scale (Estimate) and on the data scale (Mean) of the sampling site data

The averages by sampling date are listed below (Table 9.26).

Table 9.26 Means and standard errors of measurement time on the model scale (Estimate) and the data scale (Mean)

The means of the interaction site × sampling date are shown below (Table 9.27).

Table 9.27 Means and standard errors for the interaction T*site on the model scale (Estimate) and the data scale (Mean)

9.6 Log-Normal Distribution

Positively skewed distributions are very common, especially in biological data. Data often have a lower bound, usually 0 or the detection limit, but no upper bound. As a result, observations below the median can lie no farther from it than the lower bound, whereas observations above the median may lie many times farther away, producing a positively skewed distribution. Such skewed distributions can often be approximated by a log-normal distribution (Limpert et al. 2001).

A log-normal distribution is characterized by strictly positive values, positive skewness, a nonconstant variance that is proportional to the square of the mean, and a normally distributed natural logarithm. Its probability density function is asymmetric: more than half of the probability mass lies below the expected value, and a long right tail extends toward high values. Figure 9.1 shows the positive skewness of a log-normal distribution whose natural logarithm has mean 1 and standard deviation 0.6.

Fig. 9.1 Density function f(x) of the log-normal distribution with parameters 1 and 0.6, plotted against x
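The key features of Fig. 9.1 can be computed directly. This Python sketch takes the parameters 1 and 0.6 as the mean μ and standard deviation σ of log(Y). It locates the mode, median, and mean and evaluates the peak height of the density.

```python
import math

def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of Y when log(Y) ~ N(mu, sigma^2); defined for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

mu, sigma = 1.0, 0.6
mode = math.exp(mu - sigma ** 2)        # location of the peak
median = math.exp(mu)
mean = math.exp(mu + sigma ** 2 / 2)    # pulled to the right by the long tail
peak = lognormal_pdf(mode, mu, sigma)
print(f"mode {mode:.3f}, median {median:.3f}, mean {mean:.3f}, peak density {peak:.3f}")
# mode 1.896, median 2.718, mean 3.254, peak density 0.293
```

The ordering mode < median < mean is the numerical signature of positive skewness.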

9.6.1 Emission of Nitrous Oxide (N2O) in Beef Cattle Manure with Different Percentages of Crude Protein in the Diet

The experiment was conducted between January and February 2017 at the Colegio de Postgraduados Campus Córdoba, located in Amatlán de los Reyes, Veracruz, México. Four 5- to 6-month-old males of the Criollo lechero tropical (CLT) breed were used. They were randomly assigned to individual pens of 4.8 × 2.1 m, each with 75% shade, a cup drinker, and a drawer-type feeder. To obtain the required crude protein percentages, the following diets (treatments 1–4) were formulated: Trt1 (12% crude protein), Trt2 (14% crude protein), Trt3 (16% crude protein), and Trt4 (commercial feed with 16% crude protein). Each animal received the four treatments, in random order, in different periods. Each treatment was applied for 11 days: the first 7 were adaptation days, and the following 4 days were used to measure gases from the daily accumulated excreta. The experiment lasted 44 days in total. The data from this experiment are tabulated in the Appendix (Data: Nitrous oxide emission). N2O fluxes were calculated from the linear or nonlinear increase in N2O concentration (ppm) inside static chambers over time. The fluxes were then converted to micrograms of N2O–N per m² per hour (y); for more details, see Hernández-Tapia et al. (2019). The statistical model was an analysis of covariance model for a randomized complete block design with repeated measures, as described below.

$$ {y}_{ij k}=\mu +{\tau}_i+{\mathrm{animal}}_j+{\mathrm{time}}_k+{\left(\tau \times \mathrm{time}\right)}_{ik}+{\beta}_i\left({x}_{ij}-\overline{x}\right)+{\varepsilon}_{ij k} $$

where yijk is the flux of N2O–N (μg m−2 h−1); μ is the overall mean; τi is the fixed effect of treatment i (i = 1, 2, 3, 4); animalj is the random effect of animal j (j = 1, 2, 3, 4), assuming \( {\mathrm{animal}}_j\sim N\left(0,{\sigma}_{\mathrm{animal}}^2\right) \); timek is the fixed effect of measurement time k (k = 1, 2, 3, 4, 5); and (τ × time)ik is the interaction effect between τi and timek. βi is the linear regression coefficient for the covariate xij measured under treatment i on animal j. The covariate xij can be the pH, humidity (HE), or temperature (TE) of the manure, the maximum ambient temperature (TMaxA), minimum ambient temperature (TMinA), maximum ambient humidity (HMaxA), minimum ambient humidity (HMinA), or initial weight (kilograms) at the start of a treatment. Finally, \( \overline{x} \) is the mean of the covariate in question, and εijk is the experimental error, which is not normally distributed.

The linear predictor for the N2O–N flux is \( {\eta}_{ijk}=\mu +{\tau}_i+{\mathrm{animal}}_j+{\mathrm{time}}_k+{\left(\tau \times \mathrm{time}\right)}_{ik}+{\beta}_i\left({x}_{ij}-\overline{x}\right) \). Conditional on the animal effect, the response \( y_{ijk} \) has a log-normal distribution, that is, \( \log \left({y}_{ijk}\right)\mid {\mathrm{animal}}_j\sim N\left({\mu}_{ijk},{\sigma}^2\right) \) with \( {\mu}_{ijk}={\eta}_{ijk} \). Hence, the conditional mean of \( y_{ijk} \) is \( {e}^{{\mu}_{ijk}+{\sigma}^2/2} \) and its conditional variance is \( \left({e}^{\sigma^2}-1\right){e}^{2{\mu}_{ijk}+{\sigma}^2} \); the remaining parameters were described above.
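For a log-normal response with \( \log \left({y}_{ijk}\right)\sim N\left({\mu}_{ijk},{\sigma}^2\right) \), the conditional mean is \( {e}^{{\mu}_{ijk}+{\sigma}^2/2} \) and the conditional variance is \( \left({e}^{\sigma^2}-1\right){e}^{2{\mu}_{ijk}+{\sigma}^2} \). As a numerical sketch, take the hypothetical values \( {\mu}_{ijk}=2 \) and \( {\sigma}^2=0.5 \) (chosen only for illustration; they are not estimates from this study):

$$ E\left({y}_{ijk}\mid {\mathrm{animal}}_j\right)={e}^{2+0.5/2}={e}^{2.25}\approx 9.488,\qquad \mathrm{Var}\left({y}_{ijk}\mid {\mathrm{animal}}_j\right)=\left({e}^{0.5}-1\right){e}^{2(2)+0.5}\approx 0.6487\times 90.017\approx 58.40 $$

Note that \( {e}^{{\mu}_{ijk}}={e}^2\approx 7.389 \) is the median, not the mean, of this log-normal distribution, so exponentiating the linear predictor alone understates the mean flux on the data scale.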

The following GLIMMIX syntax fits a GLMM with a log-normal response:

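proc glimmix data=co2 nobound method=laplace;
   class animal trt time;
   /* xbar = centered TMinA covariate; xbar*trt gives treatment-specific slopes (beta_i) */
   model flox = trt|time xbar xbar*trt / dist=lognormal;
   random intercept / subject=animal type=cs;
   lsmeans trt|time / lines ilink;
run;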

Most of these statements were described in previous chapters; the new element in this chapter is the covariate xbar, which corresponds to the centered TMinA term \( \left({x}_{ij}-\overline{x}\right) \) of the model. Part of the output is shown below.

Regardless of the treatment applied, gas emissions from cattle manure are influenced by several factors (covariates) that the researcher cannot control and that affect both the estimated means and the experimental error; these covariates are assumed to be linearly related to the response variable. Covariates such as the pH, humidity, and temperature of the excreta, as well as the maximum and minimum temperature and humidity of the environment, influence the dynamics of gas emission. Each of these covariates was evaluated in the covariance model to adjust the estimated means of the N2O–N flux. According to the fit statistics of the proposed models (Table 9.28), model 5, which includes the minimum ambient temperature (TMinA), provides the best fit to the N2O–N flux, because it has the lowest AIC, AICC, BIC, and mean squared error (MSE).
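The information criteria used to rank the models in Table 9.28 have the general forms below, where \( \ell \) is the maximized log-likelihood, d is the number of estimated parameters, and n is the number of observations. Exactly which parameters and which n enter each criterion depends on the estimation method, so take this as a sketch of the textbook definitions rather than the precise GLIMMIX computation; the criteria are only comparable among models fitted to the same data with the same likelihood-based method (here, METHOD=LAPLACE).

$$ \mathrm{AIC}=-2\ell +2d,\qquad \mathrm{AICC}=-2\ell +\frac{2dn}{n-d-1},\qquad \mathrm{BIC}=-2\ell +d\log n $$

For example, with the hypothetical values \( -2\ell =100 \), d = 5, and n = 80, AIC = 110, AICC = 100 + 800/74 ≈ 110.81, and BIC = 100 + 5 log 80 ≈ 121.91; smaller values indicate a better balance between fit and model complexity.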

Table 9.28 Fit statistics in the different models proposed

The conditional fit statistics (part (a)) and the estimated variance components (part (b)) are shown in Table 9.29. The type III tests of fixed effects (part (c)) indicate significant effects of treatment (P = 0.0008), time (P = 0.0288), the treatment × time interaction (P = 0.0140), the covariate TMinA (P = 0.0079), and the TMinA × treatment interaction (P = 0.038).

Table 9.29 Conditional fit statistics, variance components, and type III fixed effect tests

The mean N2O–N emissions of Trt1 (12% crude protein, CP) and Trt2 (14% CP) differed significantly. Trt1 produced the highest N2O–N flux despite having the lowest CP content (Table 9.30).

Table 9.30 Mean and standard error of the N2O–N flux (μg N2O–N m⁻² h⁻¹) of the different treatments under study

9.7 Effect of a Chemical Salt on the Percentage Inhibition of the Fusarium sp.

To assess the tolerance of the fungus Fusarium sp. to different concentrations of a chemical salt, a bioassay was carried out to evaluate the percentage inhibition of the fungus. Petri dishes containing a nutrient culture medium for fungal development were amended with the salt at concentrations of 0, 500, 1000, and 2000 ppm. Mycelial growth was measured for 6 days, and the percentage inhibition of Fusarium sp. growth was calculated. Part of the data is shown below; the complete dataset is in the Appendix (Data: Percentage inhibition).

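| Bio | Day | Conc (ppm) | Rep | Y (% inhibition) | Bio | Day | Conc (ppm) | Rep | Y (% inhibition) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 3 | 5.263 | 2 | 1 | 0 | 2 | 0.0016 |
| 1 | 1 | 0 | 4 | 5.263 | 2 | 1 | 0 | 3 | 14.285 |
| 1 | 2 | 0 | 2 | 1.935 | 2 | 2 | 500 | 2 | 31.506 |
| 1 | 2 | 0 | 3 | 4.516 | 2 | 2 | 500 | 3 | 42.465 |
| 1 | 3 | 0 | 2 | 1.234 | 2 | 3 | 500 | 3 | 35.042 |
| 1 | 3 | 0 | 3 | 3.703 | 2 | 3 | 500 | 4 | 24.786 |
| 1 | 4 | 0 | 3 | 4.672 | 2 | 4 | 500 | 2 | 23.123 |
| 1 | 4 | 500 | 1 | 19.626 | 2 | 4 | 500 | 3 | 27.927 |
| 1 | 5 | 0 | 3 | 4.065 | 2 | 5 | 500 | 1 | 13.253 |
| 1 | 5 | 0 | 4 | 4.065 | 2 | 5 | 500 | 2 | 21.285 |
| 1 | 6 | 1000 | 3 | 15.862 | 2 | 6 | 2000 | 1 | 31.197 |
| 1 | 6 | 1000 | 4 | 18.62 | 2 | 6 | 2000 | 2 | 29.173 |
| 1 | 6 | 2000 | 1 | 32.413 | 2 | 6 | 2000 | 3 | 29.848 |
| 1 | 6 | 2000 | 2 | 29.655 | 2 | 6 | 2000 | 4 | 30.522 |
| 1 | 6 | 2000 | 3 | 31.724 | | | | | |
| 1 | 6 | 2000 | 4 | 35.172 | | | | | |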

Following the same reasoning as in previous examples, the components of the repeated measures GLMM with a beta response for the percentage inhibition of Fusarium sp. (\( y_{ijkl} \)) are as follows:

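  • Distributions: \( {y}_{ijkl}\mid {\omega}_{kl},\mathrm{conc}{\left(\omega \right)}_{i(kl)}\sim \mathrm{Beta}\left({\pi}_{ijkl},\phi \right) \), with i = 1, …, 4 (salt concentration), j = 1, …, 6 (day), k = 1, 2 (bioassay), and l = 1, …, \( r_i \) (Petri dish); \( {\omega}_{kl}\sim N\left(0,{\sigma}_{\omega}^2\right) \) and \( \mathrm{conc}{\left(\omega \right)}_{i(kl)}\sim N\left(0,{\sigma}_{\mathrm{conc}\left(\omega \right)}^2\right) \).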

  • Linear predictor:

$$ {\eta}_{ijkl}=\theta +{\mathrm{conc}}_i+{\omega}_{kl}+\mathrm{conc}{\left(\omega \right)}_{i(kl)}+{\mathrm{time}}_j+{\left(\mathrm{conc}\times \mathrm{time}\right)}_{ij} $$

where \( \eta_{ijkl} \) is the linear predictor; θ is the intercept; \( \mathrm{conc}_i \) is the fixed effect of salt concentration i; \( \omega_{kl} \) is the random effect of Petri dish l within bioassay k, assuming \( {\omega}_{kl}\sim N\left(0,{\sigma}_{\omega}^2\right) \); \( \mathrm{conc}{\left(\omega \right)}_{i(kl)} \) is the random effect of salt concentration × Petri dish within bioassay, assuming \( \mathrm{conc}{\left(\omega \right)}_{i(kl)}\sim N\left(0,{\sigma}_{\mathrm{conc}\left(\omega \right)}^2\right) \); \( \mathrm{time}_j \) is the fixed effect of measurement day j; and \( {\left(\mathrm{conc}\times \mathrm{time}\right)}_{ij} \) is the interaction effect of salt concentration with measurement day.

  • Link function: \( \mathrm{logit}\left({\pi}_{ijkl}\right)=\log \left\{{\pi}_{ijkl}/\left(1-{\pi}_{ijkl}\right)\right\}={\eta}_{ijkl} \) relates the linear predictor to the mean \( \pi_{ijkl} \).
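Because the beta distribution is defined on the open interval (0, 1), percentages such as those in the table above enter the model as proportions (for example, 31.506% as 0.31506). In the mean and scale parameterization \( \mathrm{Beta}\left(\pi ,\phi \right) \), the conditional variance is \( \pi \left(1-\pi \right)/\left(1+\phi \right) \), and the mean is recovered from the linear predictor with the inverse logit. As an illustration, take a hypothetical linear predictor value \( \eta =\log \left(0.3/0.7\right)\approx -0.8473 \) together with the scale estimate \( \hat{\phi}=52.281 \) reported for this example:

$$ \hat{\pi}=\frac{1}{1+{e}^{-\eta }}=\frac{1}{1+{e}^{0.8473}}\approx 0.30,\qquad \widehat{\mathrm{Var}}\left(y\right)=\frac{0.30\times 0.70}{1+52.281}\approx 0.00394,\qquad \widehat{\mathrm{SD}}\left(y\right)\approx 0.063 $$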

The following SAS program fits the repeated measures beta GLMM:

proc glimmix data=inhibition method=laplace nobound;
   class Bio Day Conc Rep;
   /* pct must be a proportion strictly between 0 and 1 */
   model pct = Conc|Day / dist=beta link=logit;
   random intercept / subject=Conc(Bio) type=cs;
   lsmeans Conc|Day / lines ilink;
run;

Before interpreting the model, we compare the fit of several covariance structures for the beta response (Table 9.31, part (a)). According to the fit statistics, the Toeplitz (TOEP(1)) and unstructured (UN) structures fit the data best.

Table 9.31 Fit statistics for the conditional distribution and variance components

Having selected the covariance structure, in this case Toeplitz of order 1 (TOEP(1)), we present part of the results of the fit (Table 9.31, part (b)). The Pearson chi-square/DF = 1.07 indicates that there is no overdispersion and that the beta distribution fits the data adequately. Under TOEP(1), the estimated variance component of the concentration–repetition–bioassay effect is \( {\hat{\sigma}}_{\mathrm{conc}\left(\omega \right)}^2=0.00285 \), and the estimated scale parameter is \( \hat{\phi}=52.281 \) (part (c)).
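The overdispersion check rests on the Pearson statistic divided by its degrees of freedom. In its usual form, with \( {\hat{\mu}}_i \) the fitted mean and \( \widehat{\mathrm{Var}}\left({y}_i\right) \) the model-implied variance of observation i, n observations, and p effective parameters, it is

$$ \frac{X^2}{\mathrm{DF}}=\frac{1}{n-p}\sum_{i=1}^n\frac{{\left({y}_i-{\hat{\mu}}_i\right)}^2}{\widehat{\mathrm{Var}}\left({y}_i\right)} $$

Values near 1, such as the 1.07 obtained here, indicate that the variance implied by the beta model agrees with the observed variability, whereas values well above 1 point to overdispersion. How GLIMMIX counts the degrees of freedom for the conditional statistic depends on the model and estimation method, so this is a general sketch rather than the procedure's exact formula.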

The type III tests of fixed effects (Table 9.32) indicate highly significant effects of salt concentration (P = 0.0002), time (P = 0.0001), and the concentration × time interaction (P = 0.0074) on the growth inhibition of Fusarium sp.

Table 9.32 Type III fixed effects tests

The estimated linear predictors (model scale) and the corresponding estimated proportions (data scale) for the main effects (Table 9.33, parts (a) and (b)) and for the interaction (Table 9.34) are shown in the columns "Estimate" and "Mean," respectively.

Table 9.33 Least-squares means of concentration and measurement time on the model scale (Estimate) and the data scale (Mean)
Table 9.34 Measurement time × salt concentration interaction on the model scale (Estimate) and the data scale (Mean)

9.8 Carbon Dioxide (CO2) Emission as a Function of Soil Moisture and Microbial Activity

Productive agricultural soil requires a certain level of aeration to maintain active plant root growth and soil microbial activity. One scientist found that soil oxygenation had been affected in soils fertilized with nutrient-rich sludge from a sewage treatment plant. Soil aeration can be reduced by (1) the high water content of the added sludge and compaction by the heavy machinery used to apply it and, ironically, (2) the increased microbial activity that occurs when sludge with a high organic matter content is added. The objective of the research was to determine the moisture levels at which aeration becomes a limiting factor for soil microbial activity. The study included a control treatment (no sludge) and three sludge-fertilized treatments with soil moisture levels of 0.24, 0.26, and 0.28 kg water/kg soil.

Soil samples were randomly assigned to the four treatments in a completely randomized design. The samples were placed in sealed containers, compacted to simulate the degree of compaction experienced in the field, and incubated under conditions favorable for microbial activity. Microbial activity, measured as the increase in CO2, was used as an indicator of the level of soil oxygenation. CO2 evolution per kilogram of soil per day was measured in each container on days 2, 4, 6, and 8 after the start of incubation, and microbial activity in each sample was recorded as the percentage increase in CO2 produced above the atmospheric level. The data are shown in Table 9.35.

Table 9.35 Repeated measurements of emissions of CO2 by bacterial activity in soil under different moisture conditions

The analysis of variance table for this experiment is shown below (Table 9.36).

Table 9.36 Analysis of variance of a CRD with repeated measures

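Let \( pct_{ijk} \) be the percentage of CO2 emission (expressed as a proportion between 0 and 1), and assume that \( pct_{ijk} \) has a beta distribution with mean \( \pi_{ijk} \) and scale parameter ϕ, i.e., \( {pct}_{ijk}\sim \mathrm{Beta}\left({\pi}_{ijk},\phi \right) \). The linear predictor \( \eta_{ijk} \), which is related to the mean through the link function, is given by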

$$ {\eta}_{ij k}=\theta +{\alpha}_i+\alpha {(r)}_{i(k)}+{\tau}_j+{\left(\alpha \tau \right)}_{ij};i=1,\dots, 4,j=1,\dots, 4,k=1,2,3 $$

where θ is the intercept; \( \alpha_i \) is the fixed effect of treatment i; \( \alpha {(r)}_{i(k)} \) is the random effect of the treatment × repetition combination (the container, i.e., the experimental unit), assuming that \( \alpha {(r)}_{i(k)}\sim N\left(0,{\sigma}_{\alpha (r)}^2\right) \); \( \tau_j \) is the fixed effect of measurement time j; and \( {\left(\alpha \tau \right)}_{ij} \) is the interaction effect of treatment with measurement time. The link function is \( \mathrm{logit}\left({\pi}_{ijk}\right)={\eta}_{ijk} \).
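Because the link is the logit, a difference between two treatments on the model scale is a log odds ratio for the mean proportion. As a sketch, for treatments i and i′ at the same measurement time j, with the random container effects set to zero (so θ and \( \tau_j \) cancel):

$$ \log \left\{\frac{{\pi}_{ij}/\left(1-{\pi}_{ij}\right)}{{\pi}_{i^{\prime }j}/\left(1-{\pi}_{i^{\prime }j}\right)}\right\}=\left({\alpha}_i-{\alpha}_{i^{\prime }}\right)+\left[{\left(\alpha \tau \right)}_{ij}-{\left(\alpha \tau \right)}_{i^{\prime }j}\right] $$

For instance, a hypothetical difference of 0.40 on the logit scale corresponds to an odds ratio of \( {e}^{0.40}\approx 1.49 \), that is, about 49% higher odds for treatment i than for treatment i′ at that time.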

The following SAS syntax fits a GLMM on repeated measures with a beta distribution.

proc glimmix data=co2 method=laplace;
   class trt container time;
   model pct = trt|time / dist=beta link=logit;
   random trt / subject=container;
   lsmeans trt|time / lines ilink;
run;

Part of the results is shown below. The fit statistics under different covariance structures (Table 9.37, part (a)), such as AIC and AICC, indicate that a Toeplitz covariance structure of order 1 provides the best fit to the data from this experiment.
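To make the structures compared in Table 9.37 concrete, the matrices below show, for three measurement times, the covariance structures that SAS requests with TYPE=TOEP(1), TYPE=CS, and TYPE=UN. TOEP(1) keeps only the main diagonal of a Toeplitz matrix, so it has a single parameter; CS adds one common covariance; UN estimates every variance and covariance separately:

$$ \mathrm{TOEP}(1):\left[\begin{array}{ccc}{\sigma}^2& 0& 0\\ 0& {\sigma}^2& 0\\ 0& 0& {\sigma}^2\end{array}\right],\quad \mathrm{CS}:\left[\begin{array}{ccc}{\sigma}_1+{\sigma}^2& {\sigma}_1& {\sigma}_1\\ {\sigma}_1& {\sigma}_1+{\sigma}^2& {\sigma}_1\\ {\sigma}_1& {\sigma}_1& {\sigma}_1+{\sigma}^2\end{array}\right],\quad \mathrm{UN}:\left[\begin{array}{ccc}{\sigma}_{11}& {\sigma}_{12}& {\sigma}_{13}\\ {\sigma}_{12}& {\sigma}_{22}& {\sigma}_{23}\\ {\sigma}_{13}& {\sigma}_{23}& {\sigma}_{33}\end{array}\right] $$

With t time points, TOEP(1) has 1 parameter, CS has 2, and UN has t(t + 1)/2 (6 when t = 3), so a simpler structure is usually preferred when the information criteria are similar.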

Table 9.37 Fit statistics of the beta GLMM under different covariance structures

Table 9.38, part (a), shows the estimated variance component due to treatment × repetition, \( {\hat{\sigma}}_{\alpha (r)}^2=0.03363 \), and the estimated scale parameter, \( \hat{\phi}=790.82 \); the hypothesis test (part (b)) indicates that the treatment means differ significantly (P = 0.0011).

Table 9.38 Variance components and fixed effects test

Table 9.39 shows the estimated mean CO2 emissions of the treatments: the treatment with a moisture level of 0.24 kg water/kg soil favored higher microbial activity, whereas the treatments with 0.26 and 0.28 kg water/kg soil showed similar microbial activity.

Table 9.39 Means and standard errors on the model scale (Estimate) and the data scale (Mean)

Figure 9.2 shows that the treatment with a moisture level of 0.24 kg water/kg soil provided the best conditions for soil microbial activity, whereas the remaining treatments significantly affected the activity of microorganisms.

Fig. 9.2 CO2 emission as a measure of microbial activity (line graph of percent CO2 for the control, C, and the moisture treatments T 0.24, T 0.26, and T 0.28; T 0.24 shows a decreasing trend, C an increasing trend, and T 0.26 and T 0.28 fluctuating trends)

9.9 Effect of Soil Compaction and Soil Moisture on Microbial Activity

A soil scientist conducted an experiment to evaluate the effects of soil compaction and soil moisture on microbial activity, since aeration may be restricted in highly saturated or compacted soils, thus reducing microbial activity. The experiment included three levels of soil compaction (bulk densities of 1.1, 1.4, and 1.6 Mg soil/m³) and three levels of soil moisture (0.1, 0.2, and 0.24 kg water/kg soil). The treated soil samples were placed in sealed containers and incubated under conditions favorable to microbial activity, and the percentage increase in CO2 produced above atmospheric levels was measured in each sample. The experimental design was a completely randomized design (CRD) with a 3 × 3 factorial treatment structure and two replicate containers per treatment. CO2 evolution (per kg soil per day) was measured on three successive days. The data from this experiment are shown in Table 9.40.

Table 9.40 Percentage of CO2 by bacterial activity as a function of soil density (Mg soil/m³) and soil humidity (kg water/kg soil)

The analysis of variance table for this experiment is shown below (Table 9.41).

Table 9.41 Analysis of variance of a CRD with a factorial treatment structure and repeated measures

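Let \( pct_{ijkl} \) be the percentage of CO2 emission (expressed as a proportion between 0 and 1), and assume that \( pct_{ijkl} \) has a beta distribution with mean \( \pi_{ijkl} \) and scale parameter ϕ, i.e., \( {pct}_{ijkl}\sim \mathrm{Beta}\left({\pi}_{ijkl},\phi \right) \). The linear predictor \( \eta_{ijkl} \), which is related to the mean through the link function, is given by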

$$ {\eta}_{ij k l}=\theta +{\alpha}_i+{\beta}_j+{\left(\alpha \beta \right)}_{ij}+\alpha \beta {(r)}_{ij(l)}+{\tau}_k+{\left(\alpha \tau \right)}_{ik}+{\left(\beta \tau \right)}_{jk}+{\left(\alpha \beta \tau \right)}_{ij k} $$
$$ i=1,2,3,j=1,2,3,k=1,2,3,l=1,2 $$

where θ is the intercept; \( \alpha_i \) is the fixed effect of density level i; \( \beta_j \) is the fixed effect of humidity level j; \( {\left(\alpha \beta \right)}_{ij} \) is the effect of the density × humidity interaction; \( \alpha \beta {(r)}_{ij(l)} \) is the random effect of the density × humidity × repetition interaction, assuming \( \alpha \beta {(r)}_{ij(l)}\sim N\left(0,{\sigma}_{\alpha \beta (r)}^2\right) \); \( \tau_k \) is the fixed effect of measurement time k; \( {\left(\alpha \tau \right)}_{ik} \) is the fixed effect of the density × time interaction; \( {\left(\beta \tau \right)}_{jk} \) is the fixed effect of the humidity × time interaction; and \( {\left(\alpha \beta \tau \right)}_{ijk} \) is the fixed effect of the density × humidity × time interaction. The link function is \( \mathrm{logit}\left({\pi}_{ijkl}\right)={\eta}_{ijkl} \).

The following SAS GLIMMIX syntax fits a repeated measures GLMM with a beta distribution.

proc glimmix data=co2_fact nobound method=laplace;
   class density humidity rep time;
   model pct = density|humidity|time / dist=beta link=logit;
   random density*humidity / subject=rep type=toep(1);
   lsmeans density|humidity|time / lines ilink;
run;

Part of the results is listed below. The fit statistics (AIC and AICC) in Table 9.42, part (a), indicate that a Toeplitz covariance structure of order 1 provides the best fit to the data.

Table 9.42 Fit statistics of a beta GLMM with a factorial structure of treatments under different covariance structures

The type III tests of fixed effects in Table 9.43 indicate that soil density (P = 0.0021), humidity (P = 0.0001), the evolution of emission over time (P = 0.0001), and the interaction between moisture and time of measurement (P = 0.0001) are statistically significant.

Table 9.43 Hypothesis testing of the factors under study

The least-squares means obtained with the LSMEANS statement are shown on the model scale in the "Estimate" column and on the data scale in the "Mean" column of Table 9.44.
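The "Mean" column is obtained by applying the inverse link to the "Estimate" column, and its standard error is a delta-method approximation: for the logit link, the derivative of the inverse link is \( \pi \left(1-\pi \right) \). As a sketch with a hypothetical estimate of −1.2 and a standard error of 0.15 on the model scale:

$$ \hat{\pi}=\frac{1}{1+{e}^{1.2}}\approx 0.2315,\qquad \mathrm{SE}\left(\hat{\pi}\right)\approx \hat{\pi}\left(1-\hat{\pi}\right)\times \mathrm{SE}\left(\hat{\eta}\right)=0.2315\times 0.7685\times 0.15\approx 0.0267 $$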

Table 9.44 Means, standard errors, and comparison of means (least significant difference, LSD) on the model scale (Estimate) and the data scale (Mean)

9.10 Joint Model for Binary and Poisson Data

Another advantage of the GLIMMIX procedure is its ability to fit models in which the distribution and/or link function varies across observations. This is accomplished with the DIST=BYOBS or LINK=BYOBS option in the MODEL statement, each followed by the name of a variable that identifies the distribution or link of each observation. The dataset created below provides an example of a bivariate outcome: the postoperative condition and the length of hospital stay of 32 herniorrhaphy patients. These data are from Mosteller and Tukey (1977) and are reproduced in Hand et al. (1994) (Table 9.45).

Table 9.45 Hospital condition and length of stay of patients

For each patient, two responses were recorded. A binary response takes the value one if a patient experienced a routine recovery and the value zero if postoperative intensive care was required. The second response variable is a count variable that measures the length of hospital stay after the surgery (in days). The binary variable “OKstatus” is a regressor variable that distinguishes patients according to their postoperative physical status (“1” implies better status), and the variable age is the age of the patient.

These data could be modeled with a separate logistic model for the binary outcome and a Poisson model for the count outcome. However, such separate analyses would ignore the correlation between the two responses. It is reasonable to assume that the two responses are correlated, because the length of postoperative hospitalization likely depends on whether the patient requires intensive care.

In the following analysis, the correlation between the two types of response for a patient is modeled through a shared (G-side) random effect. The dataset variable "dist" identifies the distribution of each observation. For observations with a binary distribution, the response variable option (event='1') determines which value of the binary variable is modeled as the event of interest. Because no LINK= option is specified, the link is also chosen observation by observation as the default link for the respective distribution. The following GLIMMIX statements fit this dataset with two distributions:

data Poi_Bin;
   length dist $7;
   input d $ patient age OKstatus response @@;
   if d = 'B' then dist = 'Binary';
   else dist = 'Poisson';
   datalines;
B 1 78 1 0  P 1 78 1 9  B 2 60 1 0  P 2 60 1 4
B 3 68 1 1  P 3 68 1 7  B 4 62 0 1  P 4 62 0 35
.......................................................................
.......................................................................
.......................................................................
B 29 54 1 0  P 29 54 1 2  B 30 43 1 1  P 30 43 1 3
B 31 4 1 1  P 31 4 1 3  B 32 52 1 1  P 32 52 1 8
;

proc glimmix data=Poi_Bin;
   class patient dist;
   /* separate intercepts and slopes for the binary and Poisson responses */
   model response(event='1') = dist dist*age dist*OKstatus
         / noint s dist=byobs(dist);
   /* one random intercept per patient, shared by both responses */
   random int / subject=patient;
   lsmeans dist / lines ilink;
run;
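Written as equations, the program above fits the following model for patient i, where \( u_i \) is our notation for the random patient intercept and the β and α symbols match the prediction formulas given later in this section:

$$ \mathrm{logit}\left({\pi}_i\right)={\beta}_0+{\beta}_1\times {\mathrm{age}}_i+{\beta}_2\times {\mathrm{okstatus}}_i+{u}_i,\qquad \log \left({\lambda}_i\right)={\alpha}_0+{\alpha}_1\times {\mathrm{age}}_i+{\alpha}_2\times {\mathrm{okstatus}}_i+{u}_i,\qquad {u}_i\sim N\left(0,{\sigma}_{\mathrm{patient}}^2\right) $$

The NOINT option together with the dist, dist*age, and dist*OKstatus terms gives each distribution its own intercept and slopes, while the single RANDOM statement adds the same \( u_i \) to both linear predictors; this shared term is what induces the correlation between the two responses of a patient.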

Part of the output is shown below. The "Model Information" table (Table 9.46) shows that the data distribution is multivariate and that multiple link functions may be involved; by default, PROC GLIMMIX uses a logit link for the binary observations and a log link for the Poisson observations.

Table 9.46 Model information

Table 9.47 shows the generalized chi-square/DF = 0.90, which indicates that there is no overdispersion, and the estimated variance component due to patient, \( {\hat{\sigma}}_{\mathrm{patient}}^2=0.299 \). The tests of the fixed effects of age and status are shown in part (c).

Table 9.47 Results of the analysis of variance

In addition, the estimated intercepts and the slopes of each explanatory variable for both probability distributions are tabulated in Table 9.48.

Table 9.48 Maximum likelihood estimators for fixed effects

Thus, to calculate the probability that a patient will experience a routine recovery (with the random patient effect set to zero), the following expression is used:

$$ {\displaystyle \begin{array}{l}\hat{\pi}=\frac{1}{1+\exp \left\{-\left({\hat{\beta}}_0+{\hat{\beta}}_1\times \mathrm{age}+{\hat{\beta}}_2\times \mathrm{okstatus}\right)\right\}}\\ {}=\frac{1}{1+\exp \left\{-5.7783+0.07572\times \mathrm{age}+0.4697\times \mathrm{okstatus}\right\}}\end{array}} $$

whereas the following expression is used to calculate the average value of the length of hospital stay after the surgery (in days):

$$ \hat{\lambda}=\exp \left\{{\hat{\alpha}}_0+{\hat{\alpha}}_1\times \mathrm{age}+{\hat{\alpha}}_2\times \mathrm{okstatus}\right\}=\exp \left\{0.8410+0.01875\times \mathrm{age}-0.1856\times \mathrm{okstatus}\right\} $$
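As a worked illustration of the two expressions above, consider a hypothetical 60-year-old patient with OKstatus = 1. With the random patient effect set to zero, these are conditional predictions for a typical patient rather than population-averaged values:

$$ \hat{\pi}=\frac{1}{1+\exp \left\{-5.7783+0.07572(60)+0.4697(1)\right\}}=\frac{1}{1+{e}^{-0.7654}}\approx 0.683,\qquad \hat{\lambda}=\exp \left\{0.8410+0.01875(60)-0.1856(1)\right\}={e}^{1.7804}\approx 5.93 $$

That is, such a patient would have an estimated probability of about 0.68 of a routine recovery and an expected postoperative stay of about 5.9 days.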

9.11 Exercises

Exercise 9.11.1

Consider an experiment in which three treatments are compared. There are r blocks of n animals each, formed using grouping criteria relevant to the experiment, and within each block one animal is randomly assigned to each treatment. Measurements were taken on the animals at week 0, when the treatments were applied, and again at weeks 4 and 12. The variables measured included weight, the presence or absence of disease symptoms, and the severity of symptoms, classified as "worse," "no change," or "better." The focus of this exercise is the repeated measures analysis of the last two types of data in this list, that is, binary and ordinal categorical responses, in an experiment with a treatment factor and repeated measures over time. Whether the observations are normally distributed, categorical, or follow some other distribution, a general approach to repeated measures analysis based on the linear mixed model uses the following form:

$$ \mbox{Observation}=\mbox{systematic between-subjects variation}+\mbox{random between-subjects variation}+\mbox{systematic within-subjects effects}+\mbox{random within-subjects variation} $$

The following table shows the data from an experiment in which each cell contains the number of animals in a given treatment × week × response category combination (Table 9.49).

Table 9.49 Results of a repeated measures experiment with an ordinal response variable
  1. (a)

    List all the components of the repeated measures multinomial GLMM.

  2. (b)

    Study and choose the best covariance structure that models this dataset. Cite the most relevant results.

  3. (c)

    Fit the multinomial cumulative logit model to these data. Perform a complete and appropriate analysis of the data, focusing on:

    1. (i)

      An evaluation of the effects of the combination of treatments

    2. (ii)

      Odds ratio interpretation

    3. (iii)

      The expected probability per category for each treatment

  4. (d)

    Test whether the proportional odds assumption is viable. Cite relevant evidence to support your conclusion regarding the adequacy of the assumption.

Repeat (b) through (d) of Exercise 9.11.1, assuming a generalized logit multinomial model. Discuss your results.

Repeat (b) through (d) of Exercise 9.11.1, assuming a multinomial cumulative probit model. Discuss your results and compare them with those obtained with the cumulative logit and generalized logit models.

Alternatively, the contingency table approach can be implemented using a log-linear model. For Exercise 9.11.1, fit the log-linear model

$$ \log \left({\lambda}_{ij k}\right)=\mu +{\tau}_i+{\varpi}_j+{\left(\tau \varpi \right)}_{ij}+{c}_k+{\left(\tau c\right)}_{ik}+{\left(\tau \varpi c\right)}_{ij k} $$

where \( \lambda_{ijk} \) is the expected count for treatment i, week j, and response category k, and τ, ϖ, and c refer to the treatment, week, and response category effects, respectively.

Exercise 9.11.2

Fertilization of turf has traditionally been accomplished through surface applications. The introduction of new equipment (Hydroject) has made it possible to place soluble materials below the surface (Table 9.50).

Table 9.50 Nitrogen injection treatment factors study

A study was conducted during the 1997 growing season to compare surface application and subsurface injection of nitrogen on the green color of bentgrass (Agrostis palustris L. Huds) 1 year after transplanting. The treatment structure was a full factorial of turf management practice (four levels) and nitrogen application rate (two levels, in g/m²). The eight treatment combinations were arranged in a completely randomized design with four replications. Turf color was rated in each experimental unit at weekly intervals for 4 weeks as poor, average, good, or excellent.

Of particular interest were the water injection effect, the subsurface effect, and the comparison of injection versus surface application. These are contrasts among the levels of the management practice factor, and a primary objective was to determine whether this factor interacts with the rate of application.

  1. (a)

    List all the GLMM components of this experiment.

  2. (b)

    Fit the multinomial cumulative logit proportional odds model to these data. Perform a complete and appropriate analysis of the data, focusing on:

    1. (i)

      An evaluation of the effects of the combination of treatments

    2. (ii)

      Interpretation of the odds ratios

    3. (iii)

      The expected probability per category for each treatment

  3. (c)

    Test whether the proportional odds assumption is viable. Cite relevant evidence to support your conclusion regarding the adequacy of the assumption.

Exercise 9.11.3

Refer to Exercise 9.11.1.

  1. (a)

    Fit the multinomial generalized logit proportional odds model to these data.

  2. (b)

    List all the components of the GLMM of this experiment.

  3. (c)

    Perform a complete and appropriate analysis of the data, focusing on:

    1. (i)

      An evaluation of the effects of the combination of treatments

    2. (ii)

      Interpretation of odds ratios

    3. (iii)

      The expected probability per category for each treatment

  4. (d)

    Test whether the proportional odds assumption is viable. Cite relevant evidence to support your conclusion regarding the adequacy of the assumption.

Exercise 9.11.4

Refer to Exercise 9.11.1.

  1. (a)

    List all the components of the GLMM of this experiment.

  2. (b)

    Fit the multinomial cumulative probit proportional odds model to these data. Perform a complete and appropriate analysis of the data, focusing on:

    1. (i)

      An evaluation of the effects of the combination of treatments

    2. (ii)

      Interpretation of the odds ratios

    3. (iii)

      The expected probability per category for each treatment

  3. (c)

    Test whether the proportional odds assumption is viable. Cite relevant evidence to support your conclusion regarding the adequacy of the assumption.