1.1 Introduction to Linear Models

Linear models are commonly used to describe and analyze datasets from many research areas, such as biology, agriculture, and the social sciences. A linear model aims to represent the nature of a dataset as faithfully as possible. A model is usually made up of one or more factors, which can be nominal or discrete variables (sex, year, etc.) or continuous variables (age, height, etc.), that have an effect on the observed data. Linear models are the most commonly used statistical models for estimating and predicting a response based on a set of observations.

Linear models get their name because they are linear in the model parameters. The general form of a linear model is given by

$$ \boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon} $$
(1.1)

where y is the n × 1 vector of observed responses, X is the n × (p + 1) design matrix of fixed constants, β is the (p + 1) × 1 vector of parameters to be estimated (unknown), and ε is the n × 1 vector of random errors. Linearity arises because the mean response of the vector y is linear in the vector of unknown parameters β. Mathematically, this is checked by taking the first derivative of the predictor with respect to β: if the derivative still depends on any of the beta parameters, then the model is nonlinear; otherwise, it is linear. Here, the derivative of the predictor in (1.1) with respect to β is equal to X, which no longer depends on the β parameters, so the model in (1.1) is linear.
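For instance (an illustrative predictor, not one of this chapter's examples), for the linear predictor \( {\eta}_i={\beta}_0+{\beta}_1{X}_i \), the derivatives \( \partial {\eta}_i/\partial {\beta}_0=1 \) and \( \partial {\eta}_i/\partial {\beta}_1={X}_i \) are free of the betas, so the model is linear. By contrast, for \( {\eta}_i={\beta}_0\exp \left({\beta}_1{X}_i\right) \),

$$ \frac{\partial {\eta}_i}{\partial {\beta}_1}={\beta}_0{X}_i\exp \left({\beta}_1{X}_i\right) $$

still depends on both β0 and β1, so that model is nonlinear.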

Several models used in statistics are examples of the general linear model y = Xβ + ϵ. These include regression models and analysis of variance (ANOVA) models. Regression models generally refer to those in which the design matrix X is of full column rank, whereas in analysis of variance models, the design matrix X is not of full column rank. Some linear models are briefly described in the following sections.

1.2 Regression Models

Linear models are often used to model the relationship between a variable, known as the response or dependent variable, y, and one or more predictors, known as independent or explanatory variables, X1, X2, ⋯, Xp.

1.2.1 Simple Linear Regression

Consider a model in which a response variable y is linearly related to an explanatory variable X1 via

$$ {y}_i={\beta}_0+{\beta}_1{X}_{1i}+{\varepsilon}_i $$

where the εi are uncorrelated random errors (i = 1, 2, ⋯, n), commonly assumed to be normally distributed with mean 0 and constant variance σ2 > 0, εi ~ N(0, σ2). If X11, X12, ⋯, X1n are fixed constants, then this is a general linear model y = Xβ + ϵ where

$$ {\boldsymbol{y}}_{n\times 1}=\begin{pmatrix}{y}_1\\ {y}_2\\ \vdots \\ {y}_n\end{pmatrix},\quad {\boldsymbol{X}}_{n\times 2}=\begin{pmatrix}1& {X}_{11}\\ 1& {X}_{12}\\ \vdots & \vdots \\ 1& {X}_{1n}\end{pmatrix},\quad {\boldsymbol{\beta}}_{2\times 1}=\begin{pmatrix}{\beta}_0\\ {\beta}_1\end{pmatrix},\quad {\boldsymbol{\epsilon}}_{n\times 1}=\begin{pmatrix}{\varepsilon}_1\\ {\varepsilon}_2\\ \vdots \\ {\varepsilon}_n\end{pmatrix} $$

Example

Let us consider the relationship between performance test scores and the tissue concentration of lysergic acid diethylamide, commonly known as LSD (from the German Lysergsäure-Diethylamid), in a group of volunteers who received the drug (Wagner et al. 1968). The average scores on the mathematical test and the LSD tissue concentrations are shown in Table 1.1.

Table 1.1 Average mathematical test scores and LSD tissue concentrations

The components of this regression model are as follows:

$$ \mathrm{Distribution}:{y}_i\sim N\left({\eta}_i,{\sigma}^2\right) $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_i={\beta}_0+{\beta}_1\times {\mathrm{Conc}}_i $$
$$ \mathrm{Link}\ \mathrm{function}:{\mu}_i={\eta}_i\ \left(\mathrm{identity}\right) $$

The syntax for performing a simple linear regression using the GLIMMIX procedure in Statistical Analysis Software (SAS) is as follows:

proc glimmix;
  model y = X1 / solution;
run;

Part of the results is shown in Table 1.2. The analysis of variance (part (a)) indicates that drug concentration has a significant effect on average mathematical performance (P = 0.0019). The estimates of the regression parameters β0 and β1, together with the mean squared error (MSE, reported as “Scale”), are shown in Table 1.2(b) under “Parameter estimates.”

Table 1.2 Results of the simple regression analysis

With these results, the linear predictor \( \left({\hat{\eta}}_i\right) \) that predicts the average mathematical performance as a function of LSD concentration is as follows:

$$ {\hat{\eta}}_i=89.124-9.01\times {\mathrm{Conc}}_i $$

This means that we can predict the average mathematical performance of an individual given the LSD concentration (Conci) applied. From the estimated parameters, we can say that there is a negative relationship between LSD concentration and mathematical score. Figure 1.1 clearly shows that an increase in drug concentration has a negative effect on the mathematical score of the youth. This fitted model explains 87.7% of the variability in the data (Fig. 1.1).
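For instance, for a hypothetical tissue concentration of Conci = 5, the predicted average score would be

$$ {\hat{\eta}}_i=89.124-9.01\times 5=44.07 $$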

Fig. 1.1 Relationship between the applied drug concentration and the mathematical score of the youth (scatter plot of the average score against concentration, with the fitted line showing a downward trend)

1.2.2 Multiple Linear Regression

Suppose that a response variable y is linearly related to several independent variables X1, X2, ⋯, Xp such that

$$ {y}_i={\beta}_0+{\beta}_1{X}_{i1}+{\beta}_2{X}_{i2}+\cdots +{\beta}_p{X}_{ip}+{\varepsilon}_i $$

for i = 1, 2, ⋯, n. Here, εi are uncorrelated random errors (i = 1, 2, ⋯, n) normally distributed with a zero mean and constant variance σ2, i.e., εi ~ N(0, σ2). If the explanatory variables are fixed constants, then the above model belongs to a general linear model of the form y = Xβ + ε, as can be seen below:

$$ {\boldsymbol{y}}_{n\times 1}=\begin{pmatrix}{y}_1\\ {y}_2\\ \vdots \\ {y}_n\end{pmatrix},\quad {\boldsymbol{X}}_{n\times \left(p+1\right)}=\begin{pmatrix}1& {X}_{11}& {X}_{12}& \cdots & {X}_{1p}\\ 1& {X}_{21}& {X}_{22}& \cdots & {X}_{2p}\\ \vdots & \vdots & \vdots & & \vdots \\ 1& {X}_{n1}& {X}_{n2}& \cdots & {X}_{np}\end{pmatrix},\quad {\boldsymbol{\beta}}_{\left(p+1\right)\times 1}=\begin{pmatrix}{\beta}_0\\ {\beta}_1\\ {\beta}_2\\ \vdots \\ {\beta}_p\end{pmatrix},\quad {\boldsymbol{\varepsilon}}_{n\times 1}=\begin{pmatrix}{\varepsilon}_1\\ {\varepsilon}_2\\ \vdots \\ {\varepsilon}_n\end{pmatrix} $$

A regression analysis can be used to assess the relationship between explanatory variables and the response variable. It is also a useful tool for predicting future observations or simply describing the structure of the data.

Example

Let us fit a regression model of the relationship between body weight and the heart girth and heart length of seven young bulls, using the data shown in Table 1.3.

Table 1.3 Body weight (kilograms) and its relationship with circumference (centimeters) and heart length (centimeters) of seven young bulls

The components of this multiple regression model are as follows:

$$ \mathrm{Distribution}:{y}_i\sim N\left({\eta}_i,{\sigma}^2\right) $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_i={\beta}_0+{X}_{i1}{\beta}_1+{X}_{i2}{\beta}_2 $$
$$ \mathrm{Link}\ \mathrm{function}:{\mu}_i={\eta}_i\left(\mathrm{identity}\right) $$

The syntax for performing a multiple regression using the GLIMMIX procedure in SAS, assuming that there is no interaction between bull heart girth (X1) and length (X2), is shown below:

proc glimmix;
  model y = X1 X2 / solution cl;
run;

The option “solution cl” prompts GLIMMIX to report the estimated parameters and their respective confidence intervals. Other useful options are “htype=1, 2, and 3,” which request the type I, II, and III tests (sums of squares). The type III tests of fixed effects in part (a) of Table 1.4 indicate that there is a linear relationship between heart length (size) and weight in young bulls. The estimated parameters with their respective confidence intervals \( \left({\hat{\beta}}_0,{\hat{\beta}}_1,{\hat{\beta}}_2\right) \) as well as the MSE (scale) of the fitted regression model are listed in part (b).

Table 1.4 Results of the multiple regression analysis

Note that in a linear model, the parameters enter linearly, but the variables themselves do not have to be linear. For example, consider the following two models:

$$ {y}_i={\beta}_0+{\beta}_1{X}_{i1}+{\beta}_2\log \left({X}_{i2}\right)+\cdots +{\beta}_k{X}_{ik}+{\epsilon}_i $$
$$ {y}_i={\beta}_0+{X}_{i1}^{\beta_1}+{\beta}_2{X}_{i2}+\cdots +{\beta}_k\exp \left({X}_{ik}\right)+{\epsilon}_i $$

The first example is a linear model, since none of its derivatives with respect to the beta coefficients depend on the betas. The second one is not, because of the term \( {X}_{i1}^{\beta_1} \), whose derivative with respect to β1 is equal to \( {X}_{i1}^{\beta_1}\log \left({X}_{i1}\right) \). This clearly shows that the second example is a nonlinear model because the derivative of the predictor depends on β1.

1.3 Analysis of Variance Models

1.3.1 One-Way Analysis of Variance

Consider an experiment in which t treatments (t ≥ 2) are to be tested, with ni experimental units selected and randomly assigned to the ith treatment. The model describing this experiment is as follows:

$$ {y}_{ij}=\mu +{\tau}_i+{\epsilon}_{ij} $$

for i = 1, 2, ⋯, t and j = 1, 2, ⋯, ni. Here, the ϵij are uncorrelated random errors with a normal distribution with a zero mean and constant variance σ2 (εij ~ N(0, σ2)). If the treatment effects are considered fixed constants (drawn from a finite set of levels), then this model is a special case of the general linear model (1.1), with the total number of experimental units \( n={\sum}_{i=1}^t{n}_i \).

In matrix terms, the model for this experimental design is:

$$ {\boldsymbol{y}}_{n\times 1}=\begin{pmatrix}{y}_{11}\\ {y}_{12}\\ \vdots \\ {y}_{t{n}_t}\end{pmatrix},\quad {\boldsymbol{X}}_{n\times \left(t+1\right)}=\begin{pmatrix}{\mathbf{1}}_{n_1}& {\mathbf{1}}_{n_1}& {\mathbf{0}}_{n_1}& \cdots & {\mathbf{0}}_{n_1}\\ {\mathbf{1}}_{n_2}& {\mathbf{0}}_{n_2}& {\mathbf{1}}_{n_2}& \cdots & {\mathbf{0}}_{n_2}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {\mathbf{1}}_{n_t}& {\mathbf{0}}_{n_t}& {\mathbf{0}}_{n_t}& \cdots & {\mathbf{1}}_{n_t}\end{pmatrix},\quad {\boldsymbol{\beta}}_{\left(t+1\right)\times 1}=\begin{pmatrix}\mu \\ {\tau}_1\\ {\tau}_2\\ \vdots \\ {\tau}_t\end{pmatrix},\quad {\boldsymbol{\epsilon}}_{n\times 1}=\begin{pmatrix}{\epsilon}_{11}\\ {\epsilon}_{12}\\ \vdots \\ {\epsilon}_{t{n}_t}\end{pmatrix} $$

where \( {\mathbf{1}}_{n_i} \) is the vector of ones of order ni and \( {\mathbf{0}}_{n_i} \) is the vector of zeros of order ni. Note that the matrix X is not of full column rank because its first column can be obtained as a linear combination of the remaining t columns.
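Concretely, the intercept column of X is the sum of the t treatment columns,

$$ \begin{pmatrix}{\mathbf{1}}_{n_1}\\ {\mathbf{1}}_{n_2}\\ \vdots \\ {\mathbf{1}}_{n_t}\end{pmatrix}=\begin{pmatrix}{\mathbf{1}}_{n_1}\\ {\mathbf{0}}_{n_2}\\ \vdots \\ {\mathbf{0}}_{n_t}\end{pmatrix}+\begin{pmatrix}{\mathbf{0}}_{n_1}\\ {\mathbf{1}}_{n_2}\\ \vdots \\ {\mathbf{0}}_{n_t}\end{pmatrix}+\cdots +\begin{pmatrix}{\mathbf{0}}_{n_1}\\ {\mathbf{0}}_{n_2}\\ \vdots \\ {\mathbf{1}}_{n_t}\end{pmatrix}, $$

so one column is redundant and μ, τ1, ⋯, τt are not all separately estimable without a constraint.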

Example

Assume that measurements of the biomass produced by three different types of bacteria are collected in three separate Petri dishes (replicates) in a glucose broth culture medium for each bacterium (Table 1.5).

Table 1.5 Biomass production of the three types of bacteria

The sources of variation and degrees of freedom (DFs) for this experiment are shown in Table 1.6.

Table 1.6 Analysis of variance

The components of this one-way model, assuming that each response yij is normally distributed, are as follows:

$$ \mathrm{Distribution}:{y}_{ij}\sim N\left({\mu}_{ij},{\sigma}^2\right) $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_i=\alpha +{\tau}_i $$
$$ \mathrm{Link}\ \mathrm{function}:{\mu}_i={\eta}_i\left(\mathrm{identity}\right) $$

where yij is the response observed at the jth repetition in the ith bacterium, ηi is the linear predictor, α is the intercept (the grand mean), and τi is the fixed effect due to the type of bacterium.

The SAS syntax for a one-way analysis of variance (ANOVA) is as follows:

proc glimmix data=biomass;
  class bacteria;
  model y = bacteria;
  lsmeans bacteria / lines;
run;

Similar to “proc glm” or “proc mixed,” the “class” statement defines the class (categorical or nominal) variables to be included in the model, in this case, the class variable “bacteria.” The “model” statement declares (lists) the response variable “y” and all the class or continuous variables that enter the model, whereas the “lsmeans” statement asks GLIMMIX to estimate the treatment means, and the “lines” option requests a comparison of those means. Part of the results is presented below.

By default, “proc GLIMMIX” provides the fit statistics (information criteria), which are extremely useful for comparing or choosing the model that explains the largest possible proportion of the variation present in a dataset, i.e., the best-fitting model (part (a) of Table 1.7). The statistic “−2 res log likelihood” is most useful for comparing nested models, whereas the remaining statistics are useful for comparing models that are not necessarily nested. The mean squared error (MSE) in GLIMMIX is given by the statistic “Pearson chi-square/DF”; in this analysis, its value is 8.78 \( \left({\hat{\sigma}}^2=\mathrm{MSE}=8.78\right) \). In part (b), the analysis of variance indicates that at least one type of bacterium produces a different biomass (P < 0.0001). That is, the null hypothesis H0 : τA = τB = τC is rejected at a significance level of 5%.

Table 1.7 Results of the one-way analysis of variance

The estimated least squares (LS) means obtained with “lsmeans” are tabulated under the “Estimate” column, with their standard errors in the “Standard error” column of Table 1.8. The comparisons of these means are performed (by default) with Fisher’s least significant difference (LSD).

Table 1.8 Means and estimated standard errors of the one-way model

Finally, Table 1.9 presents the comparison of means obtained with “lines”; it indicates that bacteria type C has a better fermentative conversion of glucose to lactic acid than bacteria types B and A. Means sharing the same letter within a column are statistically equal.

Table 1.9 Comparison of the means (LSD) in the one-way model
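If an adjustment for multiple comparisons is preferred over the default LSD, a variant of the same call (a sketch, using the dataset and variable names above) adds the “adjust” option to the “lsmeans” statement:

proc glimmix data=biomass;
  class bacteria;
  model y = bacteria;
  lsmeans bacteria / lines adjust=tukey; /* Tukey-Kramer-adjusted comparisons */
run;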

1.3.2 Two-Way Nested Analysis of Variance

Let us consider an experiment with two factors, A and B, in which each level of factor B is nested within a level of factor A; that is, each level of factor B appears at only one level of factor A. The model that describes this experiment is as follows:

$$ {y}_{ijk}=\mu +{\alpha}_i+{\beta}_{j(i)}+{\epsilon}_{ijk} $$

for i = 1, 2, ⋯, a; j = 1, 2, ⋯, bi; and k = 1, 2, ⋯, nij. In this model, μ is the overall mean, αi represents the effect due to the ith level of factor A, and βj(i) represents the effect of the jth level of factor B nested within the ith level of factor A. Assuming that all factors are fixed and that the errors εijk are normally distributed, that is, εijk ~ N(0, σ2), this model is a general linear model of the form y = Xβ + ϵ. For example, suppose that there are a = 3 levels of factor A, bi = 2 levels of factor B within each level of A, and nij = 2 replicates; then the vectors and matrices have the following form:

$$ \boldsymbol{y}=\begin{pmatrix}{y}_{111}\\ {y}_{112}\\ {y}_{121}\\ {y}_{122}\\ {y}_{211}\\ {y}_{212}\\ {y}_{221}\\ {y}_{222}\\ {y}_{311}\\ {y}_{312}\\ {y}_{321}\\ {y}_{322}\end{pmatrix},\quad \boldsymbol{X}=\begin{pmatrix}1& 1& 0& 0& 1& 0& 0& 0& 0& 0\\ 1& 1& 0& 0& 1& 0& 0& 0& 0& 0\\ 1& 1& 0& 0& 0& 1& 0& 0& 0& 0\\ 1& 1& 0& 0& 0& 1& 0& 0& 0& 0\\ 1& 0& 1& 0& 0& 0& 1& 0& 0& 0\\ 1& 0& 1& 0& 0& 0& 1& 0& 0& 0\\ 1& 0& 1& 0& 0& 0& 0& 1& 0& 0\\ 1& 0& 1& 0& 0& 0& 0& 1& 0& 0\\ 1& 0& 0& 1& 0& 0& 0& 0& 1& 0\\ 1& 0& 0& 1& 0& 0& 0& 0& 1& 0\\ 1& 0& 0& 1& 0& 0& 0& 0& 0& 1\\ 1& 0& 0& 1& 0& 0& 0& 0& 0& 1\end{pmatrix},\quad \boldsymbol{\beta}=\begin{pmatrix}\mu \\ {\alpha}_1\\ {\alpha}_2\\ {\alpha}_3\\ {\beta}_{11}\\ {\beta}_{12}\\ {\beta}_{21}\\ {\beta}_{22}\\ {\beta}_{31}\\ {\beta}_{32}\end{pmatrix},\quad \boldsymbol{\epsilon}=\begin{pmatrix}{\epsilon}_{111}\\ {\epsilon}_{112}\\ {\epsilon}_{121}\\ {\epsilon}_{122}\\ {\epsilon}_{211}\\ {\epsilon}_{212}\\ {\epsilon}_{221}\\ {\epsilon}_{222}\\ {\epsilon}_{311}\\ {\epsilon}_{312}\\ {\epsilon}_{321}\\ {\epsilon}_{322}\end{pmatrix}. $$

Example

Suppose that a researcher was studying the assimilation of fluorescently labeled proteins in rat kidneys and wanted to know whether his two technicians, technician A and technician B, were performing the procedure consistently. Technician A randomly chose three rats, and technician B randomly chose three other rats, and each technician measured the protein assimilation in each rat. Since rats are expensive and measurements are cheap, both technicians measured protein assimilation at various random locations in the kidneys of each rat (Table 1.10).

Table 1.10 Levels of protein assimilation in the rat kidneys measured by both technicians

When performing a nested ANOVA, we are often interested in testing the null hypothesis H0 : τA = τB. As in this example, we do not wish to test whether the subgroups (rats within technicians) are significantly different, since the goal is to assess whether both technicians are performing their jobs consistently. The sources of variation and degrees of freedom are shown in Table 1.11.

Table 1.11 Sources of variation and degrees of freedom of the two-way nested design

The components of this two-way model, assuming that the response variable yij is normally distributed, are as follows:

$$ \mathrm{Distribution}:{y}_{ij}\sim N\left({\mu}_{ij},{\sigma}^2\right) $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij}=\alpha +{\tau}_i+\beta {\left(\tau \right)}_{j(i)} $$
$$ \mathrm{Link}\ \mathrm{function}:{\mu}_{ij}={\eta}_{ij}\ \left(\mathrm{identity}\right) $$

where yij is the level of assimilation of the fluorescent protein obtained from rat j by technician i, α is the intercept, τi is the fixed effect due to the technician, and β(τ)j(i) is the nested effect of rat j within technician i.

The SAS commands for the main effects of factor A and factor B nested within A are as follows:

proc glimmix data=rata nobound;
  class technician rat rep;
  model protein = technician rat(technician);
  lsmeans technician rat(technician) / lines;
run;

Part of the results is shown in Table 1.12. The results indicate that there is minimal residual variability, since the value of the mean squared error (“Pearson chi-square/DF”) is 0.04 (part (a)). This means that the variance between group means is smaller than would be expected. The analysis of variance in part (b) indicates that there is no difference between the technicians in the measurement of fluorescent proteins in the rats (P = 0.3065). Since there is variation between rats in average protein uptake, mean differences in protein uptake between rats within technicians are to be expected (P = 0.0067).

Table 1.12 Fit statistics of the two-way nested design

In Table 1.13 part (a), the values of the least squares means tabulated under the “Estimate” column are shown with their respective “Standard errors.” It can be seen that rats under technician A have statistically the same mean protein uptake as do rats under technician B (part (b)).

Table 1.13 Comparison of the means (LSD) in the nested model

Comparison of the means for the rat subgroups under both technicians showed similar means for the rats under technician A but different means for the rats under technician B (parts (a) and (b), Table 1.14).

Table 1.14 Comparison of the means (LSD) of the subgroups nested within technicians

1.3.3 Two-Way Analysis of Variance with Interaction

This experiment is used when one wishes to test two factors, A and B, with a levels of factor A and b levels of factor B. In this experiment, both factors are crossed; this means that each level of factor A occurs in combination with each level of factor B. The model with interaction is given by:

$$ {y}_{ijk}=\mu +{\alpha}_i+{\beta}_j+{\gamma}_{ij}+{\epsilon}_{ijk} $$

for i = 1, 2, ⋯, a; j = 1, 2, ⋯, b; k = 1, 2, ⋯, nij; and εijk ~ N(0, σ2). If all the parameters of the model are fixed, then this model can be expressed as y = Xβ + ϵ. For this model with a = 3, b = 2, and nij = 3, the matrix expression has the form:

$$ \boldsymbol{y}=\begin{pmatrix}{y}_{111}\\ {y}_{112}\\ {y}_{113}\\ {y}_{121}\\ {y}_{122}\\ {y}_{123}\\ {y}_{211}\\ {y}_{212}\\ {y}_{213}\\ {y}_{221}\\ {y}_{222}\\ {y}_{223}\\ {y}_{311}\\ {y}_{312}\\ {y}_{313}\\ {y}_{321}\\ {y}_{322}\\ {y}_{323}\end{pmatrix},\quad \boldsymbol{X}=\begin{pmatrix}1& 1& 0& 0& 1& 0& 1& 0& 0& 0& 0& 0\\ 1& 1& 0& 0& 1& 0& 1& 0& 0& 0& 0& 0\\ 1& 1& 0& 0& 1& 0& 1& 0& 0& 0& 0& 0\\ 1& 1& 0& 0& 0& 1& 0& 1& 0& 0& 0& 0\\ 1& 1& 0& 0& 0& 1& 0& 1& 0& 0& 0& 0\\ 1& 1& 0& 0& 0& 1& 0& 1& 0& 0& 0& 0\\ 1& 0& 1& 0& 1& 0& 0& 0& 1& 0& 0& 0\\ 1& 0& 1& 0& 1& 0& 0& 0& 1& 0& 0& 0\\ 1& 0& 1& 0& 1& 0& 0& 0& 1& 0& 0& 0\\ 1& 0& 1& 0& 0& 1& 0& 0& 0& 1& 0& 0\\ 1& 0& 1& 0& 0& 1& 0& 0& 0& 1& 0& 0\\ 1& 0& 1& 0& 0& 1& 0& 0& 0& 1& 0& 0\\ 1& 0& 0& 1& 1& 0& 0& 0& 0& 0& 1& 0\\ 1& 0& 0& 1& 1& 0& 0& 0& 0& 0& 1& 0\\ 1& 0& 0& 1& 1& 0& 0& 0& 0& 0& 1& 0\\ 1& 0& 0& 1& 0& 1& 0& 0& 0& 0& 0& 1\\ 1& 0& 0& 1& 0& 1& 0& 0& 0& 0& 0& 1\\ 1& 0& 0& 1& 0& 1& 0& 0& 0& 0& 0& 1\end{pmatrix},\quad \boldsymbol{\beta}=\begin{pmatrix}\mu \\ {\alpha}_1\\ {\alpha}_2\\ {\alpha}_3\\ {\beta}_1\\ {\beta}_2\\ {\gamma}_{11}\\ {\gamma}_{12}\\ {\gamma}_{21}\\ {\gamma}_{22}\\ {\gamma}_{31}\\ {\gamma}_{32}\end{pmatrix},\quad \boldsymbol{\epsilon}=\begin{pmatrix}{\epsilon}_{111}\\ {\epsilon}_{112}\\ {\epsilon}_{113}\\ {\epsilon}_{121}\\ \vdots \\ {\epsilon}_{322}\\ {\epsilon}_{323}\end{pmatrix} $$

Example

This experiment consisted of developing an in vitro efficacy test for self-tanning formulations. Two brands, 1 = erythrulose and 2 = dihydroxyacetone (factor A), and three formulations, 1 = solution, 2 = gel, and 3 = cream (factor B), were tested with four replicates for each brand × formulation combination, according to Jermann et al. (2001). The total color change was measured for each combination. The dataset is shown in Table 1.15.

Table 1.15 Color change (Y) in each of the brands and formulations

For this two-way model, assuming that the response variable yijk has a normal distribution, the components are as follows:

$$ \mathrm{Distribution}:{y}_{ijk}\sim N\left({\mu}_{ijk},{\sigma}^2\right) $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij}=\mu +{\alpha}_i+{\beta}_j+{\gamma}_{ij} $$
$$ \mathrm{Link}\ \mathrm{function}:{\mu}_{ij}={\eta}_{ij}\kern0.5em \left(\mathrm{identity}\right) $$

where yijk is the color change observed at the kth repetition at the ith level of factor A and the jth level of factor B, μ is the intercept (the overall mean), αi is the fixed effect due to the level of factor A (brand), βj represents the fixed effect of the level of factor B (type of formulation), and γij is the fixed effect due to the interaction between brand and formulation. Table 1.16 shows the sources of variation and degrees of freedom.

Table 1.16 Analysis of variance of the two-way model with interaction

The following code in GLIMMIX in SAS allows us to estimate the main effects and the interaction:

proc glimmix;
  class brand formulation;
  model y = brand|formulation;
  lsmeans brand|formulation / lines;
run;

Part of the results is shown below. Of all the fit statistics in part (a) of Table 1.17, the value to highlight in this analysis is “Pearson chi-square/DF,” which corresponds to the mean squared error (MSE), even though we are evaluating different possible models for this dataset. The value of the MSE is 5.53. The type III tests of fixed effects, in part (b) of Table 1.17, indicate that brand (P < 0.0001), formulation (P < 0.0001), and the interaction between both factors (P = 0.0231) all have a significant effect on the change of self-tanning color.

Table 1.17 Results of the analysis of variance of the two-way model with interaction

The least squares means obtained with “lsmeans” are shown in Table 1.18 for the levels of the tanning brand factor, in Table 1.19 for the levels of the formulation factor, and in Table 1.20 for the interaction of both factors. The “lines” option allows us to compare the means using the LSD method.

Table 1.18 Means and standard errors of the tanning brand
Table 1.19 Means and standard errors of the tanning brand formulation
Table 1.20 Comparison of the means of the interaction of both factors

The least squares means for the tanning brand factor are given in Table 1.18.

The least squares means for the type of tanning brand formulation are given in Table 1.19.

The hypothesis test for the interaction should be performed first; only if the interaction effect is not significant should the main effects be tested. If the interaction is significant, then the tests for the main effects are meaningless. Here, the interaction analysis shows that brand 2 (dihydroxyacetone), in all three formulations, shows a greater color change than brand 1 (erythrulose); simple-effect comparisons, as sketched below, can be used to explore such an interaction further.
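When the interaction is significant, simple-effect comparisons are more informative than main-effect means. A minimal sketch (assuming the same variable names as above) uses the “slicediff” option of the “lsmeans” statement:

proc glimmix;
  class brand formulation;
  model y = brand|formulation;
  lsmeans brand*formulation / slicediff=brand; /* compares formulations within each brand */
run;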

Now, considering the previous model without interaction (γ11 = γ12 = ⋯ = γ32 = 0) where factor A has a levels and factor B has b levels, the model without interaction is given by:

$$ {y}_{ijk}=\mu +{\alpha}_i+{\beta}_j+{\epsilon}_{ijk} $$

for i = 1, 2, ⋯, a; j = 1, 2, ⋯, b; k = 1, 2, ⋯, nij; and εijk ~ N(0, σ2). The model without interaction with a = 3, b = 2, and nij = 3 reduces to:

$$ \boldsymbol{y}=\begin{pmatrix}{y}_{111}\\ {y}_{112}\\ {y}_{113}\\ {y}_{121}\\ {y}_{122}\\ {y}_{123}\\ {y}_{211}\\ {y}_{212}\\ {y}_{213}\\ {y}_{221}\\ {y}_{222}\\ {y}_{223}\\ {y}_{311}\\ {y}_{312}\\ {y}_{313}\\ {y}_{321}\\ {y}_{322}\\ {y}_{323}\end{pmatrix},\quad \boldsymbol{X}=\begin{pmatrix}1& 1& 0& 0& 1& 0\\ 1& 1& 0& 0& 1& 0\\ 1& 1& 0& 0& 1& 0\\ 1& 1& 0& 0& 0& 1\\ 1& 1& 0& 0& 0& 1\\ 1& 1& 0& 0& 0& 1\\ 1& 0& 1& 0& 1& 0\\ 1& 0& 1& 0& 1& 0\\ 1& 0& 1& 0& 1& 0\\ 1& 0& 1& 0& 0& 1\\ 1& 0& 1& 0& 0& 1\\ 1& 0& 1& 0& 0& 1\\ 1& 0& 0& 1& 1& 0\\ 1& 0& 0& 1& 1& 0\\ 1& 0& 0& 1& 1& 0\\ 1& 0& 0& 1& 0& 1\\ 1& 0& 0& 1& 0& 1\\ 1& 0& 0& 1& 0& 1\end{pmatrix},\quad \boldsymbol{\beta}=\begin{pmatrix}\mu \\ {\alpha}_1\\ {\alpha}_2\\ {\alpha}_3\\ {\beta}_1\\ {\beta}_2\end{pmatrix},\quad \boldsymbol{\epsilon}=\begin{pmatrix}{\epsilon}_{111}\\ {\epsilon}_{112}\\ \vdots \\ {\epsilon}_{322}\\ {\epsilon}_{323}\end{pmatrix} $$

Note that the design matrix for the model without interaction is the same as that for the model with interaction, except that the last six columns are removed.

Let us assume that the interaction effect is not significant. The following SAS code estimates the main effects of both factors. Running the program and analysis is left as practice for the readers.

proc glimmix;
  class brand formulation;
  model y = formulation brand;
  lsmeans brand formulation / lines;
run;

1.4 Analysis of Covariance (ANCOVA)

Consider an experiment to compare t ≥ 2 treatments after adjusting for the effects of a covariate x. The model for an analysis of covariance is given by:

$$ {y}_{ij}=\mu +{\tau}_i+{\beta}_i{x}_{ij}+{\epsilon}_{ij} $$

for i = 1, 2, ⋯, t and j = 1, 2, ⋯, ni. Here, the ϵij are independent, normally distributed random errors with a zero mean and constant variance σ2 > 0. In this model, μ is the overall mean, τi is the fixed effect of the ith treatment (ignoring the covariate x), βi denotes the slope of the line that relates the response variable y to x for the ith treatment, and the xij are fixed covariates. Assuming t = 3 and n1 = n2 = n3 = 3, we have:

$$ \boldsymbol{y}=\begin{pmatrix}{y}_{11}\\ {y}_{12}\\ {y}_{13}\\ {y}_{21}\\ {y}_{22}\\ {y}_{23}\\ {y}_{31}\\ {y}_{32}\\ {y}_{33}\end{pmatrix},\quad \boldsymbol{X}=\begin{pmatrix}1& 1& 0& 0& {x}_{11}& 0& 0\\ 1& 1& 0& 0& {x}_{12}& 0& 0\\ 1& 1& 0& 0& {x}_{13}& 0& 0\\ 1& 0& 1& 0& 0& {x}_{21}& 0\\ 1& 0& 1& 0& 0& {x}_{22}& 0\\ 1& 0& 1& 0& 0& {x}_{23}& 0\\ 1& 0& 0& 1& 0& 0& {x}_{31}\\ 1& 0& 0& 1& 0& 0& {x}_{32}\\ 1& 0& 0& 1& 0& 0& {x}_{33}\end{pmatrix},\quad \boldsymbol{\beta}=\begin{pmatrix}\mu \\ {\tau}_1\\ {\tau}_2\\ {\tau}_3\\ {\beta}_1\\ {\beta}_2\\ {\beta}_3\end{pmatrix},\quad \boldsymbol{\epsilon}=\begin{pmatrix}{\epsilon}_{11}\\ {\epsilon}_{12}\\ {\epsilon}_{13}\\ {\epsilon}_{21}\\ {\epsilon}_{22}\\ {\epsilon}_{23}\\ {\epsilon}_{31}\\ {\epsilon}_{32}\\ {\epsilon}_{33}\end{pmatrix} $$

As can be seen, the analysis of covariance (ANCOVA) model follows a general linear model of the form y = Xβ + ϵ.

For example, consider a hypothetical study of flower production in two subspecies of plants. The number of flowers per plant may vary between the subspecies, but, within each subspecies, flower production may also vary with the size of each plant, and this relationship may be positive or negative. A positive relationship might arise if plants with more resources (sunlight, water, nutrients) could invest more energy in both growth and flower production. A negative relationship could arise if there was a trade-off between the energy invested in growth and the energy invested in flower production. In this study, subspecies is a categorical variable and plant size is a continuous variable (the covariate). Measuring plant size and flower production in the two subspecies allows the investigation of three different questions:

  • Is flower production influenced by subspecies?

  • Is flower production influenced by plant size?

  • Is the effect of plant size on flower production influenced by subspecies?

Example 1. A central question in plant reproductive ecology is how hermaphroditic plant species allocate resources to male and female structures. A study conducted to address this question counted the number of stamens (male structures that produce pollen) and ovules (female structures that, when fertilized by a pollen grain, become seeds) in the flowers of “prairie larkspur” plants in two populations in southeastern Minnesota. The total number of flowers produced by each plant was also determined to assess whether plant size affected ovule production per flower. The dataset for this example can be found in the Appendix (Data: Larkspur plants).

An ANCOVA is appropriate for this study to test the following three null hypotheses from these data:

  (a) There is no difference in the average number of ovules per flower between the two populations (the main effect).

  (b) There is no effect of plant size on the average number of ovules per flower (the covariate effect).

  (c) The effect of plant size on the mean number of ovules per flower does not differ between the study sites (the interaction effect).

The components of the ANCOVA model, assuming that the response variable yij is normally distributed, are as follows:

$$ \mathrm{Distribution}:{y}_{ij}\sim N\left({\mu}_{ij},{\sigma}^2\right) $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij}=\mu +{\tau}_i+\mathrm{plant}{\left(\tau \right)}_{j(i)}+{\beta}_i\left({X}_{ij}-{\overline{X}}_{..}\right) $$
$$ \mathrm{Link}\ \mathrm{function}:{\mu}_{ij}={\eta}_{ij}\ \left(\mathrm{identity}\right) $$

where yij is the number of ovules observed in the jth plant of the ith population, μ is the overall mean, τi is the fixed effect due to population i, plant(τ)j(i) is the random effect due to plant j in population i, βi is the slope for population i, \( {\overline{X}}_{..} \) is the overall mean size of all plants, and Xij is the size of plant j in population i. The sources of variation and degrees of freedom for the ANCOVA are shown in Table 1.21.

Table 1.21 Analysis of covariance

The basic syntax in GLIMMIX for analysis of covariance with different slopes is as follows:

proc glimmix;
  class population plant;
  model ovules = population xbar population*xbar / ddfm=satterthwaite;
  random plant(population);
  lsmeans population / lines;
run;
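The centered covariate xbar is computed before calling GLIMMIX. A minimal sketch, assuming the raw data are in a dataset named larkspur with the plant size stored in a variable named size (both names hypothetical):

proc means data=larkspur noprint;           /* overall mean of plant size */
  var size;
  output out=msize mean=meansize;
run;

data larkspur;
  if _n_ = 1 then set msize(keep=meansize); /* carry the overall mean into every row */
  set larkspur;
  xbar = size - meansize;                   /* centered covariate */
run;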

In the above syntax, the “class” statement lists all the class or categorical variables but not the covariate (continuous variable), which, in this case, has been centered on the average size of all plants \( \left(\mathrm{xbar}={X}_{ij}-{\overline{X}}_{..}\right) \). The options “ddfm=satterthwaite” and “lines” ask proc GLIMMIX to apply the Satterthwaite degrees-of-freedom correction and to compare the means using the LSD method, respectively. Part of the results is shown in Table 1.22.

Table 1.22 Results of the analysis of covariance for the two populations of larkspur plants

The estimates of the variance components (part (a)) due to plants and to within-treatment variability are \( {\hat{\sigma}}_{\mathrm{plant}\left(\mathrm{population}\right)}^2=12.795 \) and \( {\hat{\sigma}}^2=\mathrm{MSE}=0.9321 \), respectively. The analysis of variance in part (b) shows a significant effect of population (P = 0.0084) and of plant size (P = 0.0001) on the average number of ovules per flower, as well as a significant population × plant size interaction (P = 0.0066). The estimated means of the number of ovules for both populations, with their respective standard errors, are tabulated in the “Estimate” column in part (c), and the comparison of the means is given in part (d).

If in the previous model we assume that the slopes were equal (β1 = β2), then the ANCOVA reduces to:

$$ {y}_{ij}=\mu +{\tau}_i+\beta \left({X}_{ij}-{\overline{X}}_{..}\right)+{\epsilon}_{ij} $$

The ANCOVA model with t = 3 and n1 = n2 = n3 = 3 for this case (equal slopes) reduces to:

$$ \boldsymbol{y}=\begin{pmatrix}{y}_{11}\\ {y}_{12}\\ {y}_{13}\\ {y}_{21}\\ {y}_{22}\\ {y}_{23}\\ {y}_{31}\\ {y}_{32}\\ {y}_{33}\end{pmatrix},\quad \boldsymbol{X}=\begin{pmatrix}1& 1& 0& 0& {x}_{11}\\ 1& 1& 0& 0& {x}_{12}\\ 1& 1& 0& 0& {x}_{13}\\ 1& 0& 1& 0& {x}_{21}\\ 1& 0& 1& 0& {x}_{22}\\ 1& 0& 1& 0& {x}_{23}\\ 1& 0& 0& 1& {x}_{31}\\ 1& 0& 0& 1& {x}_{32}\\ 1& 0& 0& 1& {x}_{33}\end{pmatrix},\quad \boldsymbol{\beta}=\begin{pmatrix}\mu \\ {\tau}_1\\ {\tau}_2\\ {\tau}_3\\ \beta \end{pmatrix},\quad \boldsymbol{\varepsilon}=\begin{pmatrix}{\epsilon}_{11}\\ {\epsilon}_{12}\\ {\epsilon}_{13}\\ {\epsilon}_{21}\\ {\epsilon}_{22}\\ {\epsilon}_{23}\\ {\epsilon}_{31}\\ {\epsilon}_{32}\\ {\epsilon}_{33}\end{pmatrix}. $$

The basic syntax using GLIMMIX for an analysis of covariance with equal slopes is as follows:

proc glimmix;
  class population plant;
  model ovules = population xbar / ddfm=satterthwaite;
  random plant(population);
  lsmeans population / lines;
run;

So far, we have exemplified the general linear model of the form y = Xβ + ϵ. In the following sections, some characteristics of linear mixed models (LMMs) are described.

1.5 Mixed Models

1.5.1 Introduction

Linear mixed models (LMMs) are appropriate for analyzing continuous response variables in which the residuals are normally distributed. These types of models are well suited for studies of grouped datasets such as (1) students in classrooms, animals in herds, people grouped by municipality or geographic region, or randomized block experimental designs such as batches of raw materials for an industrial process and (2) longitudinal or repeated measures studies, in which subjects are measured repeatedly over time or under different conditions. These designs occur in a wide variety of settings: biology, agriculture, industry, and socioeconomic sciences. LMMs provide researchers with powerful and flexible analytical tools for these types of data.

The name linear mixed models comes from the fact that these models are linear in the parameters and that the covariates, or independent variables, may involve a combination of fixed and random effects. “Fixed effects” can be associated with continuous covariates, such as the weight of an animal in kilograms, maize yield in tons per hectare, a baseline test score, or socioeconomic status, which take values over a continuous range, or with factors, such as gender, variety, or treatment group, which are categorical. Fixed effects are unknown constant parameters associated with the continuous covariates or the levels of the categorical factors in an LMM. The estimation of these parameters in LMMs is generally of intrinsic interest because they indicate the relationship of the covariates with the continuous response variable.

When the levels of a factor are drawn from a large enough sample such that each particular level is not of interest (e.g., classrooms, regions, herds, or clinics that are randomly sampled from a population), the effects associated with the levels of those factors can be modeled as random effects in an LMM. “Random effects” are represented by random (unobserved) variables that we generally assume to have a particular distribution, with normal distribution being the most common.

Mixed models are extremely useful because they allow us to address two important aspects:

  1. From a statistical point of view, biological data are often structured in a way that does not satisfy the assumption of independence. Examples include the following:

    (a) Multiple measurements of the same subject/organism

    (b) Experiments organized into spatial blocks

    (c) Observational data in which multiple investigations were conducted in different locations

    (d) Syntheses of data from similar experiments that were performed by different researchers

  2. From a biological perspective, the processes being measured can be affected by multiple sources of variation, often occurring at different spatial or temporal scales. We are interested in statistical methods that can model multiple sources of stochasticity, at multiple scales, so that we can measure the relative magnitude of the different sources of variation and determine which predictors explain variation at each scale.

1.5.2 Mixed Models

The matrix notation for a mixed model is very similar to that for a fixed effects (systematic) model. The main difference is that, instead of using a single design matrix to describe the entire systematic part of the model, the matrix notation for a mixed model uses at least two design matrices: a design matrix X to describe the fixed effects and a design matrix Z to describe the random effects. The fixed effects design matrix X is constructed in the same way as in the general linear fixed effects model (y = Xβ + ε); it has dimension n × (p + 1), where n is the number of observations in the dataset and p + 1 is the number of fixed effects parameters to be estimated. The random effects design matrix Z is constructed in the same way, but for the random effects; it has dimension n × q, where q is the number of random effects coefficients in the model.
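As a small illustration (hypothetical, not one of this chapter's datasets), consider six observations grouped into three blocks of two, with one random effect per block (q = 3). Ordered by block, the random effects design matrix and the random effects vector are:

$$ {\boldsymbol{Z}}_{6\times 3}=\begin{pmatrix}1& 0& 0\\ 1& 0& 0\\ 0& 1& 0\\ 0& 1& 0\\ 0& 0& 1\\ 0& 0& 1\end{pmatrix},\quad \boldsymbol{b}=\begin{pmatrix}{b}_1\\ {b}_2\\ {b}_3\end{pmatrix} $$

Row i of Z simply picks out the block effect received by observation i; this Z reappears in the variance–covariance matrix V shown in Sect. 1.5.6.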

In matrix notation, a linear mixed model can be represented as

$$ \boldsymbol{y}=\overset{\mathrm{systematic}}{\overbrace{\boldsymbol{X}\boldsymbol{\beta}}}+\overset{\mathrm{random}}{\overbrace{\boldsymbol{Zb}}}+\overset{\mathrm{experimental\ error}}{\overbrace{\boldsymbol{\varepsilon}}} $$
(1.2)
$$ \boldsymbol{b}\sim N\left(\mathbf{0},\boldsymbol{G}\right)\ \mathrm{and}\ \boldsymbol{\varepsilon} \sim N\left(\mathbf{0},\boldsymbol{R}\right) $$

where y is the n × 1 vector of observations, β is the (p + 1) × 1 vector of fixed effects, b is the q × 1 vector of random effects, ε is the n × 1 vector of random error terms, X is the n × (p + 1) design matrix relating the observations to β, and Z is the n × q design matrix relating the observations to the random effects b.

Assuming that both b and ε are uncorrelated random variables with a zero mean and variance–covariance matrices G and R, respectively, we have

$$ E\left(\boldsymbol{b}\right)=\mathbf{0},E\left(\boldsymbol{\varepsilon}\ \right)=\mathbf{0} $$
$$ \mathrm{Var}\left(\boldsymbol{b}\right)=\boldsymbol{G},\mathrm{Var}\left(\boldsymbol{\varepsilon}\ \right)=\boldsymbol{R} $$
$$ \mathrm{Cov}\left(\boldsymbol{b},\boldsymbol{\varepsilon} \right)=\mathbf{0} $$

It is not difficult to verify that Var(y) = Var(Xβ + Zb + ε) is

$$ \mathrm{Var}\left(\boldsymbol{y}\right)=\boldsymbol{ZG}{\boldsymbol{Z}}^{\prime }+\boldsymbol{R}=\boldsymbol{V} $$
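This follows because Xβ is a constant vector and b and ε are uncorrelated:

$$ \mathrm{Var}\left(\boldsymbol{Zb}+\boldsymbol{\varepsilon}\right)=\boldsymbol{Z}\mathrm{Var}\left(\boldsymbol{b}\right){\boldsymbol{Z}}^{\prime }+\mathrm{Var}\left(\boldsymbol{\varepsilon}\right)+\boldsymbol{Z}\mathrm{Cov}\left(\boldsymbol{b},\boldsymbol{\varepsilon}\right)+\mathrm{Cov}\left(\boldsymbol{\varepsilon},\boldsymbol{b}\right){\boldsymbol{Z}}^{\prime }=\boldsymbol{ZG}{\boldsymbol{Z}}^{\prime }+\boldsymbol{R} $$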

Matrix V is an important component when working with linear mixed models (LMMs) because it contains random sources of variation and also defines how such models differ from ordinary least squares estimation. If the model contains only random effects, such as a randomized complete block design (RCBD), then matrix G is the first point of attention. On the other hand, for repeated measures or for spatial analysis, matrix R is extremely important. Assuming that the random effects (blocks) have a normal distribution,

$$ \boldsymbol{b}\sim N\left(\mathbf{0},\boldsymbol{G}\right)\ \mathrm{and}\ \mathrm{Var}\left(\boldsymbol{\varepsilon}\ \right)=\boldsymbol{R} $$

Then the vector of observations y has a normal distribution, that is, y ~ N(Xβ, V). The same model can be written in probability distribution form in two different but equivalent ways. The first is the marginal model

$$ \boldsymbol{y}\sim N\left(E\left[y\right]=\boldsymbol{X}\boldsymbol{\beta }, \boldsymbol{V}=\boldsymbol{ZG}{\boldsymbol{Z}}^{\prime }+\boldsymbol{R}\right) $$
(1.3)

In this marginal model, the mean is based only on the fixed effects, and the parameters describing the random effects are contained in the variance–covariance matrix V (Littell et al. 2006). In general, a structure is imposed on b in terms of Var(b) = G, and, therefore, marginally, the components of y depend on the structure of V = ZGZ′ + R.

The second model is the conditional model

$$ \boldsymbol{y}\mid \boldsymbol{b}\sim N\left(\boldsymbol{X}\boldsymbol{\beta } +\boldsymbol{Zb},\boldsymbol{R}\right) $$
(1.4)

In this conditional model, b is distributed as shown in Eq. (1.2). For LMMs, the two formulations are exactly the same; but if the response variable is modeled under a non-normal distribution, then the two models differ (Stroup 2012) and generalized linear mixed models are required.

The estimator of the fixed effects β provides the best linear unbiased estimators (commonly known as BLUEs), whereas the predictor of b provides the best linear unbiased predictors (commonly known as BLUPs) of the random effects. Estimating the expected value of the marginal LMM (1.3) yields the BLUEs, and estimating that of the conditional LMM (1.4) yields the BLUPs. The BLUEs of β and the BLUPs of b are as follows:

$$ \hat{\boldsymbol{\beta}}={\left({\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{V}}^{-\mathbf{1}}\boldsymbol{X}\right)}^{-\mathbf{1}}{\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{V}}^{-\mathbf{1}}\boldsymbol{y} $$
$$ \hat{\boldsymbol{b}}=\boldsymbol{G}{\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{V}}^{-\mathbf{1}}\left(\boldsymbol{y}-\boldsymbol{X}\hat{\boldsymbol{\beta}}\right) $$

This solution is practical when working with small datasets; in the context of big data, it is computationally very demanding because the inverse of the matrix V must be computed. For this reason, the BLUEs of β and the BLUPs of b are normally obtained from Henderson's mixed model equations, which are presented later in this chapter.

1.5.3 Distribution of the Response Variable Conditional on Random Effects (y| b)

The distribution selected by the researcher should be, or be a good approximation to, the true distribution of the response variable in the population under study. A good representation of the population distribution of a response variable should not only take into account the nature of the response variable (e.g., continuous, discrete, etc.) and the shape of the distribution but should also provide a good model for the relationship between the mean and the variance. In this chapter, we assume that the data are normally distributed with mean μ and variance σ2, yij ~ N(μ, σ2), and that the random effects have a normal distribution with mean 0 and constant variance \( {\sigma}_b^2 \), that is, \( {b}_j\sim N\left(0,{\sigma}_b^2\right) \).

1.5.4 Types of Factors and Their Related Effects on LMMs

In an LMM, there are two types of factors: fixed factors, which make up the systematic part, and random factors, which constitute the stochastic part, each with its related effects on the dependent (response) variable. In the following sections, we provide a brief description of these factors and their implications in the context of an LMM.

1.5.4.1 Fixed Factors

A fixed factor is commonly used in standard analysis of variance (ANOVA) or analysis of covariance (ANCOVA) models. It is defined as a categorical or classification variable, for which the researcher has included all levels (or conditions) in the model that are of interest in the study. Fixed factors may include qualitative covariates, such as gender; classification variables implied by a sampling design, such as a region or a stratum, or by a study design, such as the method of treatment in a randomized clinical trial; and so on. The levels of a fixed factor are chosen to represent specific conditions so that they can be used to define contrasts (or sets of contrasts) of interest in the research study.

1.5.4.2 Random Factors

A random factor is a classification variable whose levels can be regarded as randomly sampled from a population of levels. Not all possible levels of a random factor are present in the dataset; rather, the researcher's intention is to make inferences about the entire population of levels from the selected sample of factor levels. Random factors are included in an analysis so that the variation in the dependent variable across the levels of the random factor can be assessed and the results of the data analysis can be generalized to all levels of the random factor in the population.

1.5.4.3 Fixed Versus Random Factors

In contrast to fixed factor levels, random factor levels do not represent conditions specifically chosen to meet the objectives of the study. However, depending on the objectives of the study, the same factor may be considered as either a fixed factor or a random factor.

Fixed effects, commonly referred to as regression coefficients or fixed effect parameters, describe the relationships between the dependent variable and predictor variables (i.e., fixed factors or continuous covariates) for an entire population of units of analysis or for a relatively small number of subpopulations defined by the levels of a fixed factor. Fixed effects may describe the contrasts or differences between levels of a fixed factor (e.g., sex between males and females) in the mean responses for a continuous dependent variable or may describe the effect of a continuous covariate on the dependent variable. Fixed effects are assumed to be unknown fixed quantities in an LMM and are estimated based on analysis of the data collected in a study.

Random effects are random values associated with the levels of a random factor (or factors) in an LMM. These values, which are specific to a given level of a random factor, generally represent random deviations from the relationships described by the fixed effects. For example, random effects associated with the levels of a random factor may enter an LMM as random intercepts (random deviations of a given subject or group from the overall intercept) or as random coefficients (random deviations of a given subject or group from the overall fixed effects coefficients). In contrast to fixed effects, random effects are represented as stochastic variables in an LMM.

1.5.5 Nested Versus Crossed Factors and Their Corresponding Effects

When a given level of one factor (random or fixed) can be measured only at a single level of another factor and not across multiple levels, the levels of the first factor are said to be nested within the levels of the second factor. The effects of the nested factor on the response variable are known as nested effects. For example, suppose that you want to conduct a particular study at the primary level in a school zone: you would select schools and classrooms at random. Classroom levels (one of the random factors) are nested within school levels (another random factor), since each classroom can appear within only a single school; the corresponding GLIMMIX notation is sketched below.
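In GLIMMIX syntax, nesting is written with parentheses. A minimal sketch for the school example (all dataset and variable names hypothetical):

proc glimmix data=scores;
  class school classroom;
  model score = ;                   /* intercept-only fixed part */
  random school classroom(school);  /* classrooms nested within schools */
run;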

When a given level of one factor (random or fixed) can be measured across multiple levels of another factor, one factor is said to be crossed with the other, and the effects of these factors on the dependent variable are known as crossed effects.

1.5.6 Estimation Methods

Standard methods of estimation in mixed models with a normal response are maximum likelihood (ML) and restricted maximum likelihood (REML). The linear mixed effects model is as follows:

$$ \boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta } +\boldsymbol{Zb}+\boldsymbol{\varepsilon} $$

The variance–covariance matrix V for a one-way analysis of variance (ANOVA) with a random block effect, with six observations arranged in three blocks of two, is equal to:

$$ \boldsymbol{V}=\mathrm{Var}\left(\boldsymbol{y}\right)=\boldsymbol{ZG}{\boldsymbol{Z}}^{\prime }+{\sigma}^2\boldsymbol{I}=\begin{pmatrix}{\sigma}^2+{\sigma}_b^2& {\sigma}_b^2& 0& 0& 0& 0\\ {\sigma}_b^2& {\sigma}^2+{\sigma}_b^2& 0& 0& 0& 0\\ 0& 0& {\sigma}^2+{\sigma}_b^2& {\sigma}_b^2& 0& 0\\ 0& 0& {\sigma}_b^2& {\sigma}^2+{\sigma}_b^2& 0& 0\\ 0& 0& 0& 0& {\sigma}^2+{\sigma}_b^2& {\sigma}_b^2\\ 0& 0& 0& 0& {\sigma}_b^2& {\sigma}^2+{\sigma}_b^2\end{pmatrix} $$

The variance of y11 is \( {V}_{11}={\sigma}^2+{\sigma}_b^2 \), and the covariance between y11 and y21 is \( {V}_{12}={V}_{21}={\sigma}_b^2 \); these two observations come from the same block. The covariance between y11 and every other observation is zero. All possible covariances can be read off the matrix V.
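This V is generated by the 6 × 3 block-indicator matrix Z shown in Sect. 1.5.2 together with \( \boldsymbol{G}={\sigma}_b^2{\boldsymbol{I}}_3 \) and \( \boldsymbol{R}={\sigma}^2{\boldsymbol{I}}_6 \). In GLIMMIX, such a random block effect is declared on the “random” statement; a minimal sketch with hypothetical dataset and variable names:

proc glimmix data=rcbd;
  class treatment block;
  model y = treatment;   /* fixed treatment effects */
  random block;          /* random block effects with variance sigma_b^2 */
run;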

1.5.6.1 Maximum Likelihood

The likelihood function l is a function of the observations and the model parameters; it measures how probable a particular observed response vector y is, given a set of model parameters β and b. The log-likelihood functions of y ∣ b and of b for a mixed model are given by:

$$ \boldsymbol{l}\left(\boldsymbol{y}|\boldsymbol{b}\right)=-\frac{n}{2}\log \left(2\pi \right)-\frac{1}{2}\log \left|\boldsymbol{R}\right|-\frac{1}{2}{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Zb}\right)}^{\prime }{\boldsymbol{R}}^{-1}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Zb}\right) $$

and

$$ \boldsymbol{l}\left(\boldsymbol{b}\right)=-\frac{N_b}{2}\log \left(2\pi \right)-\frac{1}{2}\log \left|\boldsymbol{G}\right|-\frac{1}{2}{\boldsymbol{b}}^{\boldsymbol{T}}{\boldsymbol{G}}^{-1}\boldsymbol{b} $$

where Nb represents the total number of random effect levels. Therefore, the joint log-likelihood of y and b, up to an additive constant, is equal to:

$$ \boldsymbol{l}\left(\boldsymbol{y},\boldsymbol{b}\right)=-\left(\frac{1}{2}\right){\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Zb}\right)}^T{\boldsymbol{R}}^{-1}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Zb}\right)-\left(\frac{1}{2}\right){\boldsymbol{b}}^{\boldsymbol{T}}{\boldsymbol{G}}^{-1}\boldsymbol{b} $$

Now, the maximum likelihood estimators are obtained by differentiating the above expression with respect to β and b:

$$ \frac{\partial \boldsymbol{l}\left(\boldsymbol{y},\boldsymbol{b}\right)}{\partial {\boldsymbol{\beta}}^{\boldsymbol{T}}}={\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{y}-{\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{X}\boldsymbol{\beta } -{\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{Zb} $$
$$ \frac{\partial \boldsymbol{l}\left(\boldsymbol{y},\boldsymbol{b}\right)}{\partial {\boldsymbol{b}}^{\boldsymbol{T}}}={\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{y}-{\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{X}\boldsymbol{\beta } -{\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{Zb}-{\boldsymbol{G}}^{-\mathbf{1}}\boldsymbol{b} $$

Setting these derivatives to zero and solving for β and b, we obtain the following mixed model equations:

$$ \left(\begin{array}{cc}{\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{X}& {\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{Z}\\ {}{\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{X}& {\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{Z}+{\boldsymbol{G}}^{-\mathbf{1}}\end{array}\right)\left(\begin{array}{c}\hat{\boldsymbol{\beta}}\\ {}\hat{\boldsymbol{b}}\end{array}\right)=\left(\begin{array}{c}{\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{y}\\ {}{\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{y}\end{array}\right) $$

The solution can be written as:

$$ \left(\begin{array}{c}\hat{\boldsymbol{\beta}}\\ {}\hat{\boldsymbol{b}}\end{array}\right)={\left(\begin{array}{cc}{\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{X}& {\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{Z}\\ {}{\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{X}& {\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{Z}+{\boldsymbol{G}}^{-\mathbf{1}}\end{array}\right)}^{-\mathbf{1}}\left(\begin{array}{c}{\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{y}\\ {}{\boldsymbol{Z}}^{\boldsymbol{T}}{\boldsymbol{R}}^{-\mathbf{1}}\boldsymbol{y}\end{array}\right) $$

Here, β is the vector of fixed effects parameters and b is the vector of random effects. The information about these parameters enters through the two covariance matrices G and R, so the solution no longer depends on V as the previous one did. Moreover, this solution, known as Henderson’s (1950) mixed model equations, is computationally much more efficient than the previous expression for β and b since it does not require inverting the matrix V = ZGZ′ + R. Solving these mixed model equations assumes that the components of G and R are known, whereas, in practice, they must be estimated. The next section therefore presents a popular method for estimating the variance components of G and R that is extremely versatile and powerful.
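To make the computation concrete, the following PROC IML sketch solves Henderson’s mixed model equations for the six-observation, two-treatment, three-block layout used above. The response values in y and the variance components behind G and R are hypothetical, chosen only for illustration:

proc iml;
  /* hypothetical data: 6 observations, 2 treatments, 3 blocks */
  y = {10, 12, 11, 14, 9, 13};
  X = {1 1 0, 1 0 1, 1 1 0, 1 0 1, 1 1 0, 1 0 1};  /* mean + 2 treatments */
  Z = {1 0 0, 1 0 0, 0 1 0, 0 1 0, 0 0 1, 0 0 1};  /* 3 blocks            */
  Ginv = (1/4) * I(3);     /* G = 4*I assumed known (hypothetical value)  */
  Rinv = (1/2) * I(6);     /* R = 2*I assumed known (hypothetical value)  */
  /* coefficient matrix and right-hand side of the mixed model equations  */
  C   = (X`*Rinv*X || X`*Rinv*Z) // (Z`*Rinv*X || Z`*Rinv*Z + Ginv);
  rhs = (X`*Rinv*y) // (Z`*Rinv*y);
  sol = ginv(C) * rhs;     /* generalized inverse: X is not of full rank  */
  print sol;               /* first 3 rows: beta-hat; last 3 rows: b-hat  */
quit;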

1.5.6.2 Restricted Maximum Likelihood Estimation

The restricted maximum likelihood method is also known as the residual maximum likelihood method and is extremely useful, among other things, for estimating variance components. This method is also based on the maximum likelihood method; however, instead of maximizing the likelihood function of the original data, it maximizes the likelihood function of a set of error contrasts obtained by removing the fixed effects from the original response. That is, instead of maximizing over y, the maximization is carried out over Ky, where K is a matrix of known constants such that KX = 0, which implies that:

$$ E\left(\boldsymbol{Ky}\right)=\boldsymbol{KX}\boldsymbol{\beta } +\boldsymbol{KZ}E\left(\boldsymbol{b}\right)+\boldsymbol{K}E\left(\boldsymbol{\varepsilon} \right)=\boldsymbol{KX}\boldsymbol{\beta } =\mathbf{0} $$
$$ \mathrm{Var}\left(\boldsymbol{Ky}\right)=\boldsymbol{KV}{\boldsymbol{K}}^{\boldsymbol{T}} $$

This implies that Ky is distributed N(0, KVKT), and the likelihood of Ky is called the restricted maximum likelihood (REML). There are many ways to choose K; typically, K = I − X(XTX)−XT, the ordinary least squares residual projector, is used (in practice, n − p linearly independent rows of this matrix, where p = rank(X)). Note that this choice satisfies KX = X − X(XTX)−XTX = X − X = 0. Therefore, the log likelihood of Ky is equal to

$$ \boldsymbol{l}\left(\boldsymbol{V}|\boldsymbol{Ky}\right)=-\frac{n-p}{2}\log \left(2\pi \right)-\frac{1}{2}\log \left|\boldsymbol{KV}{\boldsymbol{K}}^{\boldsymbol{T}}\right|-\frac{1}{2}{\left(\boldsymbol{Ky}\right)}^{\boldsymbol{T}}{\left(\boldsymbol{KV}{\boldsymbol{K}}^{\boldsymbol{T}}\right)}^{-1}\left(\boldsymbol{Ky}\right) $$

This log likelihood after some algebra, according to Stroup (2012), is equal to:

$$ \boldsymbol{l}\left(\boldsymbol{V}|\boldsymbol{Ky}\right)=-\frac{n-p}{2}\log \left(2\pi \right)-\frac{1}{2}\log \left|\boldsymbol{V}\right|-\frac{1}{2}\log \left|{\boldsymbol{X}}^{\boldsymbol{T}}{\boldsymbol{V}}^{-\mathbf{1}}\boldsymbol{X}\right|-\frac{1}{2}{\boldsymbol{r}}^{\boldsymbol{T}}{\boldsymbol{V}}^{-\mathbf{1}}\boldsymbol{r} $$

where p = rank(X) and \( \boldsymbol{r}=\boldsymbol{y}-\boldsymbol{X}{\hat{\boldsymbol{\beta}}}_{\boldsymbol{ML}} \), with \( {\hat{\boldsymbol{\beta}}}_{\boldsymbol{ML}}={\left({\boldsymbol{X}}^{\prime }{\boldsymbol{V}}^{-1}\boldsymbol{X}\right)}^{-}{\boldsymbol{X}}^{\prime }{\boldsymbol{V}}^{-1}\boldsymbol{y} \).

The variance components of G and R are estimated with iterative methods, such as the Newton–Raphson or Fisher’s scoring method, which maximize the likelihood function l(V ∣ Ky) with respect to the variance components. The maximization process begins with starting values for the variance components of G and R; with these values of G and R, a new, more refined version of the estimates of β and b is computed; these estimates are then used to update the variance components of G and R, and the process continues until the established convergence criterion is met.
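In SAS, REML is the default estimation method of the MIXED procedure (and, for normal responses, of GLIMMIX). As a minimal sketch, assuming a hypothetical dataset ex with response y, fixed factor trt, and random factor block, the two methods can be requested explicitly as follows:

/* REML (the default) */
proc mixed data=ex method=reml;
  class trt block;
  model y = trt;
  random block;
run;

/* maximum likelihood, for comparison */
proc mixed data=ex method=ml;
  class trt block;
  model y = trt;
  random block;
run;

Unlike ML, REML accounts for the degrees of freedom used in estimating the fixed effects β, so its variance component estimates are less biased in small samples.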

1.5.7 One-Way Random Effects Model

Suppose that we randomly select a levels from a sufficiently large set of levels of the factor of interest. In this case, we say that the factor is random. Random factors are usually categorical. The effects of continuous covariates (e.g., linear, quadratic, or even exponential terms) are generally treated as “systematic” or “fixed” effects; random effects are not systematic. Let us assume a simple one-way model:

$$ {y}_{ij}=\mu +{\tau}_i+{\varepsilon}_{ij};\kern1.5em i=1,2,\cdots, a;\kern1.5em j=1,2,\cdots, {n}_i $$

However, in this case, the treatment effects and the error term are random variables, i.e., \( {\tau}_i\sim N\left(0,{\sigma}_{\tau}^2\right) \) and εij~N(0, σ2), respectively. The terms τi and εij are uncorrelated, and the variances \( {\sigma}_{\tau}^2 \) and σ2 are commonly referred to as “variance components.”

There can be some confusion about the differences between noise factors and random factors. Noise factors can be fixed or random.

Factors are random when we think of their levels as coming from a random sample of a larger population, and their effect is not systematic. It is not always clear when a factor is random. For example, suppose that the vice president of a chain of stores is interested in the effects of implementing a management policy in his stores. If the experiment includes all five existing stores, he might consider “store” a fixed factor because the levels of the factor “store” do not come from a random sample. However, if the chain has 100 stores and 5 of them are taken for the experiment because the company is considering rapid expansion and plans to implement the selected new policy at the new locations, then “store” could be considered a random factor.

In fixed effects models, the researcher’s interest would focus on testing the equality of means of treatments (stores). This would not be appropriate, however, for the case in which 5 stores are randomly selected out of 100 because the treatments are randomly selected and we are interested in the population of treatments (stores), not in a particular store or group of stores. The appropriate hypothesis test for this random effect model would be

$$ {H}_0:{\sigma}_{\tau}^2=0\kern0.75em \mathrm{vs}\kern0.5em {H}_a:{\sigma}_{\tau}^2>0 $$

Partitioning the total sum of squares as in a standard analysis of variance still works; however, the form of the appropriate test statistic depends on the expected mean squares. In this case, the appropriate test statistic would be

$$ {F}_{\mathrm{c}}=\frac{{\mathrm{Mean}\ \mathrm{Square}}_{\mathrm{Treatments}}}{{\mathrm{Mean}\ \mathrm{Square}}_{\mathrm{Error}}} $$

Fc follows an F-distribution (Fisher–Snedecor) with degrees of freedom a − 1 in the numerator and N − a in the denominator, where \( N=\sum \limits_{i=1}^a{n}_i \).
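In SAS, this null hypothesis can also be assessed with a likelihood ratio test of the variance component. The following is a minimal sketch, assuming a hypothetical dataset stores with response y and random factor store, using the COVTEST statement of the GLIMMIX procedure:

proc glimmix data=stores;
  class store;
  model y = ;                            /* intercept-only fixed part  */
  random store;                          /* store as a random factor   */
  covtest "store variance = 0" zerog;    /* LRT of H0: sigma_tau^2 = 0 */
run;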

In a completely random model, we are interested in estimating the variance components \( {\sigma}_{\tau}^2\ \mathrm{and}\ {\sigma}^2 \). To do so, we use the analysis of variance method, which consists of equating the expected mean squares to their observed values as follows:

$$ {\hat{\sigma}}^2+n{\hat{\sigma}}_{\tau}^2={\mathrm{Mean}\ \mathrm{Square}}_{\mathrm{Treatments}} $$

where \( {\hat{\sigma}}^2={\mathrm{Mean}\ \mathrm{Square}}_{\mathrm{Error}} \) and n is the number of observations per treatment in the balanced case, so that

$$ {\hat{\sigma}}_{\tau}^2=\frac{{\mathrm{Mean}\ \mathrm{Square}}_{\mathrm{Treatments}}-{\hat{\sigma}}^2}{n} $$
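These method-of-moments estimates can be reproduced in SAS with the Type 1 (ANOVA) method of PROC MIXED. A minimal sketch, again assuming the hypothetical dataset stores with response y and random factor store:

proc mixed data=stores method=type1;
  class store;
  model y = ;        /* intercept-only fixed part */
  random store;      /* store as a random factor  */
run;

/* method=reml (the default) would give the REML estimates instead */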

1.5.8 Analysis of Variance Model of a Randomized Block Design

Consider a one-way analysis of variance model with an additive randomized block effect. Assume two treatments and three blocks,

$$ {y}_{ij}=\mu +{\tau}_i+{b}_j+{\epsilon}_{ij} $$

where \( {b}_j\sim N\left(0,{\sigma}_b^2\right)\ \mathrm{and}\ {\epsilon}_{ij}\sim N\left(0,{\sigma}^2\right) \) with i = 1, 2 and j = 1, 2, 3. The random effects bj and the errors ϵij are independent and uncorrelated. In addition, treatment effects are assumed to be fixed. The matrix notation of this model is as follows:

$$ \underset{\boldsymbol{y}}{\underbrace{\left(\begin{array}{c}{y}_{11}\\ {}{y}_{21}\\ {}{y}_{12}\\ {}{y}_{22}\\ {}{y}_{13}\\ {}{y}_{23}\end{array}\right)}}=\overset{\mathrm{Systematic}}{\overbrace{\underset{\boldsymbol{X}}{\underbrace{\left(\begin{array}{ccc}1& 1& 0\\ {}1& 0& 1\\ {}1& 1& 0\\ {}1& 0& 1\\ {}1& 1& 0\\ {}1& 0& 1\end{array}\right)}}\kern0.5em \underset{\boldsymbol{\beta}}{\underbrace{\left(\begin{array}{c}\mu \\ {}{\tau}_1\\ {}{\tau}_2\end{array}\right)}}}}+\overset{\mathrm{Random}}{\overbrace{\underset{\boldsymbol{Z}}{\underbrace{\left(\begin{array}{ccc}1& 0& 0\\ {}1& 0& 0\\ {}0& 1& 0\\ {}0& 1& 0\\ {}0& 0& 1\\ {}0& 0& 1\end{array}\right)}}\kern0.5em \underset{\boldsymbol{b}}{\underbrace{\left(\begin{array}{c}{b}_1\\ {}{b}_2\\ {}{b}_3\end{array}\right)}}\kern0.5em +\kern0.5em \underset{\boldsymbol{\varepsilon}}{\underbrace{\left(\begin{array}{c}{\varepsilon}_{11}\\ {}{\varepsilon}_{21}\\ {}{\varepsilon}_{12}\\ {}{\varepsilon}_{22}\\ {}{\varepsilon}_{13}\\ {}{\varepsilon}_{23}\end{array}\right)}}}} $$

where b~N(0, G) and ε~N(0, R). The variance–covariance matrix G for the random effects is, in this case, a 3 × 3 diagonal matrix with diagonal elements \( {\sigma}_b^2 \), that is, \( \boldsymbol{G}={\sigma}_b^2{\boldsymbol{I}}_3 \), and \( \boldsymbol{R}={\sigma}^2{\boldsymbol{I}}_6 \). Note how the matrix representation of this model corresponds exactly to the mixed model formulation. That is,

$$ \boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta } +\boldsymbol{Zb}+\boldsymbol{\varepsilon}, \kern0.75em \mathrm{where}\kern1em \boldsymbol{b}\sim N\left(\mathbf{0},\boldsymbol{G}\right)\ \mathrm{and}\ \boldsymbol{\varepsilon} \sim N\left(\mathbf{0},\boldsymbol{R}\right). $$
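Multiplying out ZGZ′ + R for these matrices recovers exactly the block-diagonal variance–covariance matrix V displayed in Sect. 1.5.6:

$$ \boldsymbol{V}=\boldsymbol{ZG}{\boldsymbol{Z}}^{\prime }+\boldsymbol{R}={\sigma}_b^2\boldsymbol{Z}{\boldsymbol{Z}}^{\prime }+{\sigma}^2{\boldsymbol{I}}_6={\sigma}_b^2\left(\begin{array}{ccc}{\boldsymbol{J}}_2& \mathbf{0}& \mathbf{0}\\ {}\mathbf{0}& {\boldsymbol{J}}_2& \mathbf{0}\\ {}\mathbf{0}& \mathbf{0}& {\boldsymbol{J}}_2\end{array}\right)+{\sigma}^2{\boldsymbol{I}}_6 $$

where J2 denotes the 2 × 2 matrix of ones: each pair of observations sharing a block has variance \( {\sigma}^2+{\sigma}_b^2 \) on the diagonal and covariance \( {\sigma}_b^2 \) off the diagonal, and observations from different blocks are uncorrelated.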

Example

An animal nutritionist is interested in comparing the effect of three diets on weight gain in piglets. To conduct the experiment, the nutritionist randomly selects 3 litters from a set of 20, each containing 3 healthy, similar-sized, recently weaned piglets. Within each litter, each of the three piglets is randomly assigned to one of the diets.

A randomized complete block design (RCBD) is a variation of the completely randomized design (CRD). In this design, blocks of experimental units are chosen such that the units within a block are as homogeneous as possible, while units in different blocks differ. In a randomized complete block design, each block generally contains one experimental unit per treatment, although having more than one experimental unit per treatment in each block is possible.

An RCBD has two sources of variation: the factor of interest that includes the treatments to be studied and the “block factor” that identifies the litters used in the experiment.

Assumptions in RCBD:

1. Sampling: Blocks (litters) are independently and randomly selected, and treatments are randomly assigned to the experimental units within each block.

2. Errors are independent and identically normally distributed with a zero mean and a constant variance σ2.

Table 1.23 lists the weight in kilograms of piglets from three different litters under three different diets. To make inferences about the pattern of weight gain for the entire population (all litters) of piglets, the litters must be considered in the model as a random effect. Thus, the linear mixed model describing the variability of piglet weight gain in this research, as a function of diets, is as follows:

$$ {y}_{ij}=\mu +{\tau}_i+{b}_j+{\varepsilon}_{ij}\ \mathrm{for}\ i=1,2,3;j=1,2,3 $$

where yij is the weight observed in the ijth piglet, μ is the overall mean, τi is the fixed effect due to the ith diet, bj is the random effect due to the jth block (litter), assuming \( {b}_j\sim N\left(0,{\sigma}_b^2\right) \), and εij is the random error term, assumed independent and identically normally distributed with mean 0 and variance σ2, i.e., εij~N(0, σ2).

Table 1.23 Weight gain (kilograms) of the three litters of piglets

The random terms bj and εij are assumed to be independent and uncorrelated. Table 1.24 shows an outline of the analysis of variance for this dataset.

Table 1.24 Analysis of variance of the randomized complete block design

The SAS program to analyze this dataset is as follows:

proc glimmix data=piglets;
  class litter diet;
  model gain = diet / ddfm=satterthwaite;
  random litter;
  lsmeans diet / lines;
  contrast "Diet1 vs Diet2" diet 1 -1 0;
  contrast "Diet1 vs Diet3" diet 1 0 -1;
run;
quit;

Two options in the previous syntax are of great importance in this example: (1) the “ddfm=satterthwaite” option applies the Satterthwaite correction to the denominator degrees of freedom, which matters most when the number of experimental units (EUs) differs between treatments, and (2) the “lines” option groups the least squares means (“lsmeans”) with letters; means labeled with different letters are significantly different.

The output for this code is shown in Table 1.25. Subsection (a) of this table shows the estimated variance due to litter \( \left({\hat{\sigma}}_{\mathrm{litter}}^2=5.3117\right) \) and the mean squared error \( \left({\hat{\sigma}}^2=3.2961\right) \). The analysis of variance, part (b), shows a highly significant effect of diet on piglet weight gain (P = 0.0091). In part (c), we observe the estimated means and their standard errors (obtained with “lsmeans diet/lines”), and part (d) shows the grouping of means. These last results show that the weight gains of piglets under treatments I and II are not statistically different from each other, but both are statistically different from treatment III.

Table 1.25 Results of the analysis of variance of the three different diets tested on piglet weight gain

Since the researcher wishes to make inferences about the entire population of litters, the factor “litter” must be entered as a random effect; otherwise, the ability of the F-test to detect differences between treatments is diminished, as the P-value changes from 0.0091 to 0.0248. Another way to see the importance of including random effects in an ANOVA is to calculate the relative efficiency (RE) of the two designs.

Table 1.26 shows the results of the analysis of variance if the data are analyzed under a completely randomized design (CRD), i.e., omitting the block (litter) effect: yij = μ + τi + εij.

Table 1.26 Analysis of variance under a CRD

In this case, if the experiment had been analyzed under a CRD, then the relative efficiency (RE) between an RCBD and a CRD would be:

$$ \mathrm{RE}=\frac{{\mathrm{MSE}}_{\mathrm{CRD}}}{{\mathrm{MSE}}_{\mathrm{RCBD}}}=\frac{\left(b-1\right){\mathrm{MSB}}_{\mathrm{RCBD}}+b\left(t-1\right){\mathrm{MSE}}_{\mathrm{RCBD}}}{\left( bt-1\right){\mathrm{MSE}}_{\mathrm{RCBD}}} $$

where MSECRD is the mean squared error under a CRD, MSERCBD is the mean squared error under an RCBD, MSBRCBD is the mean square due to blocks in an RCBD, and t and b are the numbers of treatments and blocks, respectively. If blocking is not useful, then the RE is equal to 1; the higher the RE, the more effective the blocking is in reducing the error variance. This value can be interpreted as the ratio \( \raisebox{1ex}{$r$}\!\left/ \!\raisebox{-1ex}{$b$}\right. \), where r is the number of experimental units that would have to be assigned to each treatment if a CRD were used instead of an RCBD.

Table 1.27 shows the mean squared errors (MSE) of the CRD and the RCBD (Pearson’s chi-square/DF) obtained with the GLIMMIX procedure in SAS, as well as a series of fit statistics.

Table 1.27 Fit statistics of a CRD and RCBD

The MSEs for the CRD and the RCBD are 8.61 and 3.3, respectively. Substituting these values into the above equation, we obtain

$$ \mathrm{RE}=\frac{{\mathrm{MSE}}_{\mathrm{CRD}}}{{\mathrm{MSE}}_{\mathrm{RCBD}}}=\frac{8.61}{3.3}=2.609 $$

This value indicates that an RCBD is 2.609 times more efficient than a CRD. In other words, approximately 8 (2.609 × 3 ≈ 8) experimental units per treatment would have been required under a CRD to obtain the same MSE as that obtained with the RCBD.

1.6 Exercises

Exercise 1.6.1

The following dataset corresponds to the growth of pea plants, in eye units, in tissue culture with auxins (0.114 mm). The purpose of this experiment was to test the effects of the addition of various types of sugars to the culture medium on growth in length. Pea plants were randomly assigned to one of five treatments: control (no sugar), 2% glucose, 2% fructose, 1% glucose + 1% fructose, and 2% sucrose. A total of 10 observations were taken for each treatment, assuming that the measurements are approximately normally distributed with constant variance. Here, the individual plants to which the treatments were applied are the experimental units. The data from this experiment are shown below (Table 1.28):

Table 1.28 Growth of pea plants in the culture medium with auxins with different types of sugars
(a) Write the statistical model that best describes this dataset, indicating its components.

(b) Calculate the analysis of variance for this experiment.

(c) Is there any significant difference between treatments on average plant growth?

Exercise 1.6.2

A forage company wants to test three different types of fertilizers (F1, F2, and F3) for the production of two forage species (A and B) for cattle and compare them with the fertilizer it usually applies, which we will call the control. For this, the company decides to use 48 pots, with 6 replications, in the greenhouse to test the combinations of fertilizers and forage species. The data from this experiment are shown in Table 1.29:

(a) Write and describe the statistical model of the experimental design with all its components.

(b) Calculate the analysis of variance for this experiment.

(c) Is there any significant difference between treatments on average plant growth?

Table 1.29 Growth (height in centimeters) of the two forage species with three types of fertilizers plus a control

Exercise 1.6.3

The data in this experiment correspond to plants regrown after grazing by sheep and goats. The initial size of each plant (at the top of its rootstock) is recorded, and the weight of the seeds (g) that it produces at the end of the season is the response or dependent variable. The data for this experiment are as follows (Table 1.30):

(a) List and describe all the components of the linear mixed model.

(b) Calculate the ANOVA for this dataset and answer the following questions:

• Is seed weight influenced by the type of grazing?

• Is seed weight influenced by the plant size?

• Is the effect of grazing type on seed weight influenced by the initial plant size?

Table 1.30 Fruit production after grazing

Exercise 1.6.4

An experiment was conducted to study the effect of supplementation of weaned lambs on health and growth rate when exposed to helminthiasis. A total of 16 Dorper (breed 1) and 16 Red Maasai (breed 2) lambs were treated with an anthelmintic at 3 months of age (after weaning) and randomly allocated into “blocks” of 4 per breed, classified on the basis of 3-month body weight, to supplemented and unsupplemented groups. Thus, two lambs in each block were randomly allocated to the supplemented group (night-fed cottonseed meal and wheat bran) and two to the unsupplemented group. All lambs were kept on grazing for a further 3 months. The data recorded included the initial body weight (kilograms) at weaning, the weight 3 months after weaning, the percentage red blood cell volume (RBCV), and the fecal egg count (FEC) at 6 months of age. Data from this experiment are shown below (Table 1.31):

Table 1.31 Supplementation trial in Dorper (breed 1) and Red Maasai (breed 2) lambs
(a) List and describe all the components of the linear mixed model.

(b) Calculate the ANOVA for this dataset and answer the following questions: Did supplementation improve weight gain? Did supplementation affect RBCV and FEC? Were there differences in weight gain, RBCV, or FEC between breeds?