Generalized Linear Mixed Models for Proportions and Percentages

Salinas Ruíz, Josafhat; Montesinos López, Osval Antonio; Hernández Ramírez, Gabriela; Crossa Hiriart, Jose

doi:10.1007/978-3-031-32800-8_6


Abstract

In this chapter, we review generalized linear mixed models (GLMMs) whose response is either a proportion or a percentage. By proportion and percentage data, we mean data whose expected value lies between 0 and 1 or between 0 and 100, respectively. For the remainder of this book, we will refer to this type of data only as proportions, since a proportion can be converted to a percentage simply by multiplying it by 100. Proportions can be classified into two types: discrete and continuous. Discrete proportions arise when the unit of observation consists of N distinct entities, of which y individuals have the attribute of interest. N must be a positive integer and y a nonnegative integer, with y ≤ N. Therefore, the observed proportion is a discrete fraction that can take only the values $ \frac{0}{N},\frac{1}{N},\cdots, \frac{N}{N} $. A binomial random variable is the sum of a series of N independent binary trials (i.e., trials with only two possible outcomes: success or failure), where all trials have the same probability of success. For binary and binomial distributions, the target of inference is the parameter $ \pi = E\left(\frac{y}{N}\right) $, with $ 0\le \pi \le 1 $. Continuous proportions (ratios) arise when the researcher measures responses such as the fraction of the area of a leaf infested with a fungus, the proportion of damaged cloth in a square meter, the fraction of a contaminated area, and so on. As with the binomial parameter π, continuous proportions take values between 0 and 1, but, unlike binomial proportions, they do not result from a set of Bernoulli trials. For this reason, the beta distribution is most often used when the response variable is a continuous proportion. In the following sections, we first address modeling issues for binary and binomial data.
When the response variable is binomial, the model can be fitted either with a linearization method (pseudo-likelihood, PL) or with an integral approximation of the likelihood (Laplace or quadrature) (Stroup 2012).
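To show what an integral approximation does, here is a minimal Python sketch; it is not the chapter's software. It assumes a hypothetical cluster of binomial responses that share a normal random intercept $ b \sim N(0, \sigma^2) $ with a logit link, $ \eta = \beta_0 + b $ (the link, the helper names, and the numbers are all assumptions for illustration). The likelihood contribution of the cluster is the integral of $ f(y \mid b)\,\phi(b) $ over $ b $: `laplace_marginal` approximates it with the Laplace method (a second-order expansion around the mode of the integrand), and `trapezoid_marginal` computes a brute-force quadrature reference on a fine grid.

```python
import math


def log1pexp(x):
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def log_joint(b, y, n, beta0, sigma):
    """log[f(y | b) * phi(b; 0, sigma^2)] for one cluster, logit link, eta = beta0 + b."""
    eta = beta0 + b
    loglik = sum(
        math.lgamma(nj + 1) - math.lgamma(yj + 1) - math.lgamma(nj - yj + 1)  # log C(n, y)
        + yj * eta - nj * log1pexp(eta)  # equals y*log(p) + (n - y)*log(1 - p)
        for yj, nj in zip(y, n)
    )
    return loglik - 0.5 * (b / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def laplace_marginal(y, n, beta0, sigma, max_iter=100):
    """Laplace approximation to the integral of exp(log_joint) over b."""
    b = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        grad = sum(yj - nj * p for yj, nj in zip(y, n)) - b / sigma**2
        hess = -sum(n) * p * (1 - p) - 1.0 / sigma**2  # always negative: concave integrand
        step = grad / hess
        b -= step  # Newton step toward the mode
        if abs(step) < 1e-12:
            break
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    hess = -sum(n) * p * (1 - p) - 1.0 / sigma**2  # curvature at the mode
    return math.exp(log_joint(b, y, n, beta0, sigma)) * math.sqrt(2 * math.pi / -hess)


def trapezoid_marginal(y, n, beta0, sigma, half_width=12.0, n_grid=20001):
    """Brute-force reference: trapezoidal rule over b in [-half_width*sigma, half_width*sigma]."""
    lo = -half_width * sigma
    h = 2 * half_width * sigma / (n_grid - 1)
    vals = [math.exp(log_joint(lo + k * h, y, n, beta0, sigma)) for k in range(n_grid)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


# Hypothetical cluster: two binomial responses (1 of 4, 2 of 5) sharing one random intercept
print(laplace_marginal([1, 2], [4, 5], beta0=-0.5, sigma=1.0))
print(trapezoid_marginal([1, 2], [4, 5], beta0=-0.5, sigma=1.0))
```

Dedicated GLMM software generally uses adaptive Gauss-Hermite quadrature, which needs only a handful of nodes placed around the mode; the Laplace approximation is its one-node special case. Pseudo-likelihood avoids the integral altogether by repeatedly fitting a linearized model.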


6.1 Response Variables as Ratios and Percentages


6.2 Analysis of Discrete Proportions: Binary and Binomial Responses

A binomial random variable is the number of successes in a series of N independent binary trials, or Bernoulli trials (i.e., trials with two possible outcomes: success or failure), where all trials have the same probability of success. In the context of a GLMM, the data consist of several binomial responses, each of which is the result of a set of binary trials. The ith response consists of two pieces of information: the number of trials $ n_i $ and the number of successes $ y_i $, as shown in the following example.
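The definition can be checked numerically. The following minimal Python sketch (the function names and the three $ (y_i, n_i) $ pairs are hypothetical, not taken from the chapter) builds the distribution of the number of successes by enumerating every sequence of N Bernoulli trials, compares it with the binomial probability mass function, and shows that when several binomial responses share one success probability, its maximum likelihood estimate is the total number of successes divided by the total number of trials.

```python
import itertools
import math


def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def pmf_by_enumeration(n, p):
    """Distribution of the number of successes, built by listing all 2**n
    success/failure sequences of n independent Bernoulli(p) trials."""
    probs = [0.0] * (n + 1)
    for seq in itertools.product((0, 1), repeat=n):
        y = sum(seq)  # successes in this sequence
        # independence: the sequence probability is a product of per-trial probabilities
        probs[y] += p**y * (1 - p) ** (n - y)
    return probs


def binom_loglik(data, p):
    """Log-likelihood of a common success probability p for (y_i, n_i) pairs."""
    return sum(math.log(binom_pmf(y, n, p)) for y, n in data)


def pooled_mle(data):
    """MLE of a common p: total successes divided by total trials."""
    return sum(y for y, _ in data) / sum(n for _, n in data)


# Hypothetical responses: each pair is (successes y_i, trials n_i)
data = [(3, 10), (7, 20), (12, 30)]
p_hat = pooled_mle(data)
print(p_hat)
print(pmf_by_enumeration(5, 0.3))
print([binom_pmf(y, 5, 0.3) for y in range(6)])
```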

6.2.1 Completely Randomized Design (CRD): Methylation Experiment

An agent that induces demethylation is applied to plants; this agent converts methylated nucleotides to their unmethylated forms, thus causing epigenetic changes that produce abnormal phenotypes such as deformation or stunting (Amoah et al. 2008). A pilot study was conducted to investigate the relationship between the dose of the demethylating agent and the observed proportion of plants with a normal phenotype. Seeds were treated with the demethylating agent at six different doses, including the control. Plants were sown in trays, each tray containing seeds previously treated with the same dose of the demethylating agent. Each dose was replicated 4 times: 2 trays with 60 plants and 2 trays with 100 plants. The trays were allocated following a completely randomized design (CRD). Table 6.1 shows the number of plants with a normal phenotype in each tray, together with the number of plants per tray (N). The notation 59(60) indicates that 59 normal plants were found out of 60 plants under study; likewise, 14(100) indicates that 14 normal plants were found out of 100 plants under study.
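The y(N) notation maps directly onto the two pieces of information in each binomial response. Below is a minimal sketch (the helper names are hypothetical) that parses such cells and reports the observed proportion together with its binomial standard error $ \sqrt{p(1-p)/N} $. Because that standard error shrinks as N grows, keeping y and N separate, rather than only the proportion, is one reason the binomial form suits a mix of 60-plant and 100-plant trays.

```python
import math
import re

# Matches the table notation "y(N)", e.g. "59(60)"
_Y_OF_N = re.compile(r"^\s*(\d+)\s*\(\s*(\d+)\s*\)\s*$")


def parse_count(cell):
    """Turn a 'y(N)' cell into the pair (y, N), checking N > 0 and 0 <= y <= N."""
    match = _Y_OF_N.match(cell)
    if match is None:
        raise ValueError(f"not in y(N) form: {cell!r}")
    y, n = int(match.group(1)), int(match.group(2))  # \d+ already guarantees y >= 0
    if n <= 0 or y > n:
        raise ValueError(f"need N > 0 and 0 <= y <= N, got {cell!r}")
    return y, n


def proportion_and_se(y, n):
    """Observed proportion and its binomial standard error sqrt(p(1-p)/N)."""
    p = y / n
    return p, math.sqrt(p * (1 - p) / n)


for cell in ("59(60)", "14(100)"):
    y, n = parse_count(cell)
    p, se = proportion_and_se(y, n)
    print(f"{cell}: y={y}, N={n}, proportion={p:.3f}, SE={se:.3f}")
```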

Table 6.1 Number of normal plants out of a total of N plants per tray and dose of the demethylating agent

[Figure: field layout of Block 1 and Block 2, showing how the levels of factor A (A₁, A₂, A₃), factor B (B₁, B₂), and factor C (C₁, C₂) were arranged within each block]

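Data: Fleas (Bioen = block; SP = species; Treat = treatment; Rep = replicate; Overvi = number of surviving fleas; Dead = number of dead fleas; Overvi + Dead = 10 in every row)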
Bioen	SP	Treat	Rep	Overvi	Dead
B1	Daphnia	T1	1	10	0
B1	Daphnia	T1	2	10	0
B1	Daphnia	T1	3	10	0
B1	Daphnia	T2	1	10	0
B1	Daphnia	T2	2	10	0
B1	Daphnia	T2	3	10	0
B1	Daphnia	T3	1	9	1
B1	Daphnia	T3	2	9	1
B1	Daphnia	T3	3	8	2
B1	Daphnia	T4	1	2	8
B1	Daphnia	T4	2	2	8
B1	Daphnia	T4	3	3	7
B1	Daphnia	T5	1	0	10
B1	Daphnia	T5	2	0	10
B1	Daphnia	T5	3	0	10
B1	Daphnia	T6	1	0	10
B1	Daphnia	T6	2	0	10
B1	Daphnia	T6	3	0	10
B2	Daphnia	T1	1	10	0
B2	Daphnia	T1	2	10	0
B2	Daphnia	T1	3	10	0
B2	Daphnia	T2	1	10	0
B2	Daphnia	T2	2	10	0
B2	Daphnia	T2	3	10	0
B2	Daphnia	T3	1	9	1
B2	Daphnia	T3	2	9	1
B2	Daphnia	T3	3	9	1
B2	Daphnia	T4	1	2	8
B2	Daphnia	T4	2	2	8
B2	Daphnia	T4	3	2	8
B2	Daphnia	T5	1	0	10
B2	Daphnia	T5	2	0	10
B2	Daphnia	T5	3	0	10
B2	Daphnia	T6	1	0	10
B2	Daphnia	T6	2	0	10
B2	Daphnia	T6	3	0	10
B3	Daphnia	T1	1	10	0
B3	Daphnia	T1	2	10	0
B3	Daphnia	T1	3	10	0
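Each row of the Fleas data is a binomial response: Dead fleas out of Overvi + Dead = 10 exposed. The minimal sketch below (the dictionary layout and function name are hypothetical) transcribes the Dead counts from the rows shown above and pools them into an observed mortality proportion per treatment, using `Fraction` for exact arithmetic. The table reproduced here is only an excerpt (Daphnia only, and Bioen B3 only for T1), so the numbers describe these rows alone.

```python
from collections import defaultdict
from fractions import Fraction

FLEAS_PER_UNIT = 10  # Overvi + Dead equals 10 in every row shown

# Dead counts for Rep 1, 2, 3 of species Daphnia, keyed by (Bioen, Treat),
# transcribed from the rows above
DEAD = {
    ("B1", "T1"): [0, 0, 0], ("B1", "T2"): [0, 0, 0], ("B1", "T3"): [1, 1, 2],
    ("B1", "T4"): [8, 8, 7], ("B1", "T5"): [10, 10, 10], ("B1", "T6"): [10, 10, 10],
    ("B2", "T1"): [0, 0, 0], ("B2", "T2"): [0, 0, 0], ("B2", "T3"): [1, 1, 1],
    ("B2", "T4"): [8, 8, 8], ("B2", "T5"): [10, 10, 10], ("B2", "T6"): [10, 10, 10],
    ("B3", "T1"): [0, 0, 0],
}


def mortality_by_treatment(dead):
    """Pooled proportion of dead fleas per treatment, summed over Bioen and Rep."""
    totals = defaultdict(lambda: [0, 0])  # Treat -> [dead, exposed]
    for (_bioen, treat), counts in dead.items():
        totals[treat][0] += sum(counts)
        totals[treat][1] += FLEAS_PER_UNIT * len(counts)
    return {treat: Fraction(d, n) for treat, (d, n) in sorted(totals.items())}


for treat, prop in mortality_by_treatment(DEAD).items():
    print(treat, prop, f"{float(prop):.3f}")
```

In these rows, T1, T2, T5, and T6 have observed mortality of exactly 0 or 1, so their empirical logits are infinite. Treatments like these are a common source of convergence problems (quasi-complete separation) when a binomial GLMM with a logit link is fitted.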

Data: Commercial crop explant detachment
Block	A	B	C	y	N
1	1	1	1	15	73
2	1	1	1	10	86
1	1	1	2	17	69
2	1	1	2	19	32
1	1	2	1	26	125
2	1	2	1	21	62
1	1	2	2	14	81
2	1	2	2	12	21
1	2	1	1	10	92
2	2	1	1	12	108
1	2	1	2	30	44
2	2	1	2	32	33
1	2	2	1	37	91
2	2	2	1	30	42
1	2	2	2	32	98
2	2	2	2	37	44
1	3	1	1	18	52
2	3	1	1	18	73
1	3	1	2	23	108
2	3	1	2	21	55
1	3	2	1	24	106
2	3	2	1	27	92
1	3	2	2	37	64
2	3	2	2	37	97
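Because each row of the explant data records y successes out of N, a natural first look pools the counts by factor level, so that rows with larger N carry more weight than they would in a simple average of the row proportions. A minimal sketch (the names `ROWS`, `COLUMN`, and `pooled_by` are hypothetical) is shown below. These marginal proportions ignore Block and any plot-level random effects, so they are descriptive only, not GLMM estimates.

```python
from collections import defaultdict

# (Block, A, B, C, y, N), transcribed from the table above
ROWS = [
    (1, 1, 1, 1, 15, 73), (2, 1, 1, 1, 10, 86), (1, 1, 1, 2, 17, 69), (2, 1, 1, 2, 19, 32),
    (1, 1, 2, 1, 26, 125), (2, 1, 2, 1, 21, 62), (1, 1, 2, 2, 14, 81), (2, 1, 2, 2, 12, 21),
    (1, 2, 1, 1, 10, 92), (2, 2, 1, 1, 12, 108), (1, 2, 1, 2, 30, 44), (2, 2, 1, 2, 32, 33),
    (1, 2, 2, 1, 37, 91), (2, 2, 2, 1, 30, 42), (1, 2, 2, 2, 32, 98), (2, 2, 2, 2, 37, 44),
    (1, 3, 1, 1, 18, 52), (2, 3, 1, 1, 18, 73), (1, 3, 1, 2, 23, 108), (2, 3, 1, 2, 21, 55),
    (1, 3, 2, 1, 24, 106), (2, 3, 2, 1, 27, 92), (1, 3, 2, 2, 37, 64), (2, 3, 2, 2, 37, 97),
]
COLUMN = {"Block": 0, "A": 1, "B": 2, "C": 3}


def pooled_by(rows, factor):
    """Pool y and N over rows sharing a level of `factor`; return level -> (y, N, y/N)."""
    idx = COLUMN[factor]
    sums = defaultdict(lambda: [0, 0])
    for row in rows:
        y, n = row[4], row[5]
        if not 0 <= y <= n:  # every row must be a valid binomial count
            raise ValueError(f"invalid count in row {row}")
        sums[row[idx]][0] += y
        sums[row[idx]][1] += n
    return {level: (y, n, y / n) for level, (y, n) in sorted(sums.items())}


for factor in ("A", "B", "C"):
    print(factor, pooled_by(ROWS, factor))
```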

