The Analysis of Individual Health Trajectories Across the Life Course: Latent Class Growth Models Versus Mixed Models

  • Trynke Hoekstra
  • Jos W. R. Twisk
Part of the Life Course Research and Social Policies book series (LCRS, volume 4)

Abstract

When analysing individual trajectories across the life course, the fact that repeated observations of individuals are not independent (i.e. they are correlated) should be taken into account. Several techniques to do this are available, such as mixed models (MM) and latent growth models (LGM). These models can also elegantly incorporate different stages of the life course, including childhood, adolescence and adulthood, in the modelling process. MM do so by including a ‘time’ variable denoting each stage in the model, whereas LGM can be conducted in a piecewise manner, where each ‘piece’ represents a life course stage. Moreover, both techniques can be extended further to allow for possible heterogeneity in health trajectory (shape), but do so in different ways. MM can include random slopes to account for heterogeneity in growth; LGM can be extended into latent class growth models, which use the data to reveal possible subgroups of individuals with distinct health trajectories across the life course.

This chapter explains and compares the two techniques using existing life course data from the Amsterdam Growth and Health Longitudinal Study cohort, combining a methodological focus with empirical findings.

Keywords

Bayesian Information Criterion · Random Slope · Individual Trajectory · Latent Growth Model · Health Trajectory

Introduction

Researchers in the field of life course research are often interested in analysing longitudinal data. One of the main advantages of longitudinal data is the possibility to study individual development over time (Singer and Willett 2003; Twisk 2013). Studying these individual trajectories helps, for example, to better understand how risk factors for diseases naturally develop throughout the life course and aids in unravelling the aetiology of diseases, which is important for early detection and prevention. Several statistical techniques to study these individual trajectories are available, of which mixed models (MM) and latent (class) growth models (LCGM) are the most common. Both techniques aim to study (heterogeneity in) individual health trajectories, but do so in different ways. In short, MM are regression-based techniques designed to study the individual development of a certain variable over time by adjusting for the correlations between the observations within one subject. This is done by estimating the differences between the subjects: a variance for the intercept and a variance for the slope(s). The intercept is indicative of the baseline values and the slope(s) of the rate of development over time. LCGM also aim to study the individual development of a certain variable over time, but additionally use the data to generate groups of individuals with comparable trajectories over time.

The purpose of this chapter is to explain and compare MM and LCGM in more detail by combining a methodological focus with empirical findings using existing life course data of the Amsterdam Growth and Health Longitudinal Study cohort. We will address practical issues with which researchers within life course research are confronted when aiming to find answers to their research questions, paying particular attention to when, how and why to use MM and LCGM. Further, detailed readings are recommended for interested readers (Hancock and Samuelsen 2008; Nagin and Odgers 2010; Nagin 2005; Preacher et al. 2008; Singer and Willett 2003; Twisk 2013).

The Amsterdam Growth and Health Longitudinal Study

In 1974, the Amsterdam Growth and Health Longitudinal Study (AGHLS) was initially planned to monitor the growth, health and lifestyle of groups of adolescents entering secondary school, over a period of 4 years (Wijnstok et al. 2013). The reason for this follow-up was a series of intervention studies to measure the effectiveness of more intensive and extra physical education lessons in 12–13 year-old boys (Bakker et al. 2003; Kemper 1995). In general, no clear effects were found. There were indications that large inter-individual differences between the pupils in biological development and habitual physical activity could have masked any intervention effects. At that time, health authorities were complaining about the level of fitness of youngsters in their late teens. In growing towards independence, the life-style habits of teenagers change considerably (with regard to physical activity, food intake, tobacco smoking and alcohol consumption). Thereby, their health perspective might also have changed. This illustrated that the teenage period is an important period in the life course. Because individual changes in growth and development can be described most precisely by studying the same participants over a longer period of time, the Amsterdam Growth and Health Longitudinal Study (AGHLS) was set up; approximately 500 boys and girls (mean age of 13 years) from the first two grades of two secondary schools in the Netherlands were included in the study.

In the first 4 years, the AGHLS was primarily aimed at the investigation of the natural course of dietary intake, physical activity and physical fitness as risk factors for cardiovascular disease, such as cholesterol levels. Subsequently, the AGHLS was expanded with further rounds of measurements after the adolescence period, during the young adulthood phase. At each of these follow-up rounds of measurement, cardiovascular disease risk indicators and lifestyle variables were measured in a similar way to allow longitudinal comparison. Also, new, age-relevant measures for health and lifestyle were added to the test battery at each follow-up round. For example, when subjects were in their 20s, measurements of bone mineral density were added (Welten et al. 1994). In their 30s, the test battery was extended with, for example, carotid ultrasound measurements to determine large arterial properties to examine not only the ‘clinical’ cardiovascular risk factors but also the early preclinical cardiovascular damage (Schouten et al. 2011; van de Laar et al. 2012). In their 40s, microvascular function was added to the test battery as an even earlier indicator for vascular damage (Wijnstok et al. 2010, 2012). Moreover, to study the onset, timing and progression of neuropsychiatric disease, such as Alzheimer’s disease, a magnetoencephalogram, a novel brain scan to assess communication of different brain sections, was added to the test battery (Douw et al. 2014).

During the adolescence period, participants were measured annually during school hours, and thereafter, six more examinations took place, of which the most recent took place when the participants were 42 years old. Currently, around 350 participants are still enrolled in the study. In 2013, a cohort profile describing in detail the Amsterdam Growth and Health Longitudinal Study was published (Wijnstok et al. 2013).

Focus of the Cohort for This Chapter

The substantive focus of this chapter will be illustrated by body mass index (BMI), a well-known construct across many fields of research. Because this chapter focuses mainly on methodological issues, albeit with empirical data, substantive findings will not be discussed in great detail. Interested readers are referred to the literature specifically focused on BMI, which is outside the scope of this chapter (e.g. Hoekstra et al. 2011).

For the analyses in this chapter we selected subjects who had valid measurements of BMI available at baseline (i.e. at age 13) and who had at least two additional valid measurements over time. In total, 378 subjects (51% females) who were measured 3–10 times were included in the analyses presented in this chapter. Baseline characteristics of the sample are presented in Table 9.1. The sample size varies across the first four measurements because part of the sample was measured only once during that period (Wijnstok et al. 2013). We performed the MM in MLwiN 2.30 (Rasbash et al. 2005) and the LCGM in Mplus 7.11 (Muthén and Muthén 2012). We first analysed the development of BMI over time for the whole cohort and then analysed differences in the development of BMI over time between males and females.
Table 9.1 Descriptive information (mean, standard deviation) regarding the development of body mass index (BMI, kg/m²)

| Age | Sample size | Females: BMI, SD | Males: BMI, SD | Study sample: BMI, SD |
| --- | --- | --- | --- | --- |
| 13 | 321 | 17.81, 2.14 | 16.93, 1.42 | 17.39, 1.88 |
| 14 | 279 | 18.66, 2.08 | 17.68, 1.49 | 18.23, 1.90 |
| 15 | 272 | 19.41, 2.14 | 18.38, 1.66 | 18.95, 2.01 |
| 16 | 275 | 20.01, 2.10 | 19.21, 1.70 | 19.66, 1.97 |
| 21 | 131 | 21.65, 2.48 | 21.20, 1.67 | 21.45, 2.17 |
| 27 | 133 | 22.03, 2.36 | 22.46, 2.17 | 22.22, 2.28 |
| 29 | 125 | 22.46, 2.58 | 22.94, 2.22 | 23.70, 2.41 |
| 32 | 314 | 22.85, 3.13 | 23.79, 2.41 | 23.28, 2.85 |
| 36 | 300 | 23.57, 3.47 | 24.60, 2.68 | 24.06, 3.16 |
| 42 | 336 | 24.09, 3.87 | 25.20, 2.92 | 24.62, 3.49 |

SD standard deviation

Analysis of Health Trajectories Over Time: Mixed Models

When analysing health trajectories over time, we should take into account the fact that repeated observations of each individual in the dataset are not independent (Singer and Willett 2003; Twisk 2013). Simple regression analysis and analysis of variance cannot fully accommodate these correlated measures, so more sophisticated statistical techniques are needed to analyse such data. Several techniques are available, of which mixed models (MM) are the best known.

Model Assumptions

Mixed models have relatively straightforward underlying assumptions. Because they are regression-based models, MM rely on the same assumptions as ordinary regression (Altman 1991). Specifically, the residuals of the outcome variable should be normally distributed and, in addition, the residuals around the (random) intercept and slopes should approximate a normal distribution. This can easily be assessed in any statistical software package. MM are also well suited to data with unequal numbers of repeated measurements, and the model assumes data to be missing at random.

The idea behind MM (Twisk 2013) was initially developed in the social sciences, specifically for educational research. Investigating the performance of pupils within schools, researchers realised that the performances of pupils within the same class were not independent (in other words; these performances are correlated). This type of study design is characterised by a hierarchical structure; also referred to as a multilevel structure. Pupils are nested within classes and classes are nested within schools. More or less the same structure is seen in longitudinal datasets where observations are nested within the subject: observations of one subject over time are correlated. MM adjusts for this correlation between repeated observations by modelling the variability among individuals through the inclusion of both fixed and random effects. To understand the general idea behind a mixed model, it is important to realise that with simple regression analyses adjusting for a certain variable (for example gender) means that different intercepts are calculated. In a longitudinal setting (thus adjusting for subject), this then means that for each subject, different intercepts are calculated. The simplest form of a MM takes into account this random intercept only. Figure 9.1 demonstrates that a model with a random intercept only allows the intercepts to vary across subjects. In addition to a random intercept, it is also possible that the development over time varies across individuals (indicated by a random slope). This is demonstrated in Fig. 9.2, where only linear slopes are assumed.
Fig. 9.1 Visualisation of different intercepts for each subject (random intercept)

Fig. 9.2 Visualisation of differences in the development over time for each subject (random intercept and slope)
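
In equation form, the model sketched in Fig. 9.2 is commonly written as follows. The notation is an assumption introduced here for illustration (subject i, measurement occasion j, fixed effects β, subject-specific deviations u), not the chapter's own, and only a linear slope is shown.

```latex
y_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\, t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{u0}^{2} & \sigma_{u01} \\ \sigma_{u01} & \sigma_{u1}^{2} \end{pmatrix} \right),
\qquad
\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^{2})
```

Setting u_{1i} = 0 for every subject gives the random-intercept-only model of Fig. 9.1. Under this parameterisation, allowing a random slope adds two parameters: the slope variance and its covariance with the intercept.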

The Modelling of Time

Within life course epidemiology, MM are usually conducted to describe the development of a particular outcome variable over time (Beunen et al. 2002; Twisk 2013). In that case, the only independent variable of interest is a time variable, which can be included either as a continuous variable (indicating a linear development or other polynomial function of the outcome variable over time) or as a categorical variable. Modelling time as a categorical variable allows for the incorporation of different developmental shapes during different stages in the life course, such as childhood, adolescence, adulthood and old age. This closely maps onto one of the focal points of life course research, which offers opportunities to understand the natural history of chronic disorders with a specific focus on health transitions and phases across the life course (Kuh and Ben-Shlomo 2004).

Comparing Models

Model fit of MM can be assessed by comparing the log likelihood values of neighbouring (nested) models. The likelihood ratio test (Bollen 1989) calculates the difference in −2 log likelihood between two neighbouring models. This difference is assumed to follow a Chi-squared distribution with degrees of freedom equal to the difference in the number of parameters estimated in the two models.

Strategy and Results: Mixed Models

First, we will start by modelling a linear development of BMI over time. This model is relatively straightforward and includes the time variable as a continuous variable, which, in the case of the AGHLS would be 1, 2, 3, 4, 9, 15, 17, 20, 24, and 30. These values reflect the actual time between measurement rounds, taking into account the unequally spaced time points. Alternatively, ignoring the unequally spaced time points, this variable could also have values between 1 and 10 (one unit per measurement).
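
The two ways of coding time described above can be sketched in a few lines of Python. The ages come from Table 9.1; the function names and the offset of 1 (so that age 13 maps to time 1) are assumptions made for this illustration.

```python
# Ages at the ten measurement rounds, taken from Table 9.1.
AGES = [13, 14, 15, 16, 21, 27, 29, 32, 36, 42]


def years_since_start(ages, offset=1):
    """Code time in years, so unequal spacing between rounds is preserved."""
    first = ages[0]
    return [age - first + offset for age in ages]


def measurement_index(ages):
    """Code time as 1..n, one unit per measurement round (ignores spacing)."""
    return list(range(1, len(ages) + 1))


time_years = years_since_start(AGES)
time_rounds = measurement_index(AGES)
print(time_years)   # [1, 2, 3, 4, 9, 15, 17, 20, 24, 30]
print(time_rounds)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

With the first coding, a slope is expressed per year; with the second, it is expressed per measurement round, which is harder to interpret when the rounds are unequally spaced.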

The simplest model includes a random intercept only. Table 9.2 shows the results of this model and indicates that body mass index values increase over time. The regression coefficient for time in the linear model (model 1) is 0.240, so for each year, BMI increases (linearly) by 0.240 kg/m². The corresponding standard error (0.003) indicates the precision of the estimated regression coefficient (the smaller, the more precise) and can be used to assess the statistical significance of the regression coefficient by calculating the z-statistic (Altman 1991; Twisk 2013). The z-statistic is calculated by dividing the regression coefficient by the corresponding standard error. Absolute values above 1.96 indicate statistical significance at the 5% level.
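
The z-statistic calculation described above can be sketched as follows, using only the Python standard library. The function name is an assumption for this illustration; the coefficient and standard error are those reported for linear model 1.

```python
import math


def z_test(coefficient, standard_error):
    """Return the z-statistic and its two-sided p-value under N(0, 1)."""
    z = coefficient / standard_error
    # Two-sided tail area of a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Linear model 1: slope 0.240 with standard error 0.003.
z, p = z_test(0.240, 0.003)
print(round(z, 1), abs(z) > 1.96, p < 0.05)  # 80.0 True True
```

A z-statistic of about 80 lies far beyond 1.96, so the p-value is effectively zero.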
Table 9.2 Results of the mixed models

| Model | Study sample: slope (SE) | −2 LL | Males: slope (SE) | Females: slope (SE) | −2 LL |
| --- | --- | --- | --- | --- | --- |
| Linear | | | | | |
| Model 1 | 0.240 (0.003) | 10,469 | 0.285 (0.005) | 0.200 (0.004) | 10,288 |
| Model 2 | 0.242 (0.006) | 9,425 | 0.286 (0.009) | 0.202 (0.008) | 9,367 |
| Quadratic | | | | | |
| Model 1 | 0.436 (0.013); −0.007 (0.0004)ᵃ | 10,244 | 0.516 (0.018); −0.008 (0.001)ᵃ | 0.368 (0.017); −0.006 (0.001)ᵃ | 10,035 |
| Model 2 | 0.443 (0.014); −0.007 (0.0004)ᵃ | 8,753 | 0.527 (0.019); −0.008 (0.001)ᵃ | 0.368 (0.017); −0.006 (0.001)ᵃ | 8,688 |
| Life course | | | | | |
| Model 1 | | 10,134 | | | 9,916 |
| Time 2 | 0.936 (0.13) | | 0.788 (0.181) | 6.277 (0.160) | |
| Time 3 | 1.677 (0.13) | | 1.520 (0.183) | 5.753 (0.165) | |
| Time 4 | 2.334 (0.13) | | 2.344 (0.183) | 5.088 (0.162) | |
| Time 9 | 4.046 (0.17) | | 4.318 (0.237) | 4.706 (0.224) | |
| Time 15 | 5.928 (0.17) | | 5.633 (0.233) | 4.286 (0.213) | |
| Time 17 | 5.400 (0.17) | | 6.117 (0.231) | 3.749 (0.212) | |
| Time 20 | 5.920 (0.12) | | 6.871 (0.173) | 2.255 (0.167) | |
| Time 24 | 6.654 (0.13) | | 7.636 (0.173) | 1.729 (0.167) | |
| Time 30 | 7.221 (0.12) | | 8.253 (0.168) | 0.982 (0.166) | |
| Model 2 | | 8,026 | | | 7,958 |
| Time 2 | 0.809 (0.04) | | 0.895 (0.059) | 0.747 (0.065) | |
| Time 3 | 1.563 (0.06) | | 1.622 (0.079) | 1.487 (0.088) | |
| Time 4 | 2.212 (0.06) | | 2.153 (0.087) | 2.276 (0.097) | |
| Time 9 | 4.023 (0.13) | | 3.711 (0.179) | 4.350 (0.200) | |
| Time 15 | 4.893 (0.14) | | 4.214 (0.184) | 5.641 (0.199) | |
| Time 17 | 5.325 (0.16) | | 4.552 (0.205) | 6.176 (0.216) | |
| Time 20 | 5.960 (0.14) | | 5.079 (0.185) | 6.934 (0.195) | |
| Time 24 | 6.630 (0.15) | | 5.711 (0.199) | 7.642 (0.209) | |
| Time 30 | 7.204 (0.18) | | 6.259 (0.234) | 8.246 (0.246) | |

−2 LL: −2 log likelihood; SE: standard error

Model 1: model with only a random intercept

Model 2: model with a random intercept and random slopes

ᵃ Parameters for the quadratic terms

The next step is to extend the model with a random slope for time, indicating that the development of BMI over time differs between the subjects in the sample (model 2). The model with a random intercept only and the model with a random intercept plus a random slope can be compared with the likelihood ratio test to assess whether the inclusion of a random slope improves the model fit. For each model, the −2 log likelihood is provided; although the value itself has no interpretation, the difference between two −2 log likelihoods can be used to compare neighbouring models. In our case, the difference between the two −2 log likelihoods follows a Chi-squared distribution with two degrees of freedom and is highly statistically significant (i.e. 10,469 − 9,425 = 1,044, which is much larger than the critical value of 5.99 at the 5% level). The result of the likelihood ratio test thus indicates that a model with both a random intercept and a random slope is significantly better than a model with a random intercept only.
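
The likelihood ratio test for the two linear models can be reproduced with a short standard-library sketch. The function names are assumptions for this illustration, and the tail probability is only implemented for one or an even number of degrees of freedom, which covers the comparisons in this chapter.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a Chi-squared variable (df = 1 or even df)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch only covers df = 1 or even df")


def likelihood_ratio_test(minus2ll_simple, minus2ll_complex, extra_params):
    """Compare two nested models by the drop in -2 log likelihood."""
    statistic = minus2ll_simple - minus2ll_complex
    return statistic, chi2_sf(statistic, extra_params)


# Linear models in Table 9.2: random intercept only vs. intercept plus slope.
stat, p = likelihood_ratio_test(10469, 9425, extra_params=2)
print(stat, p < 0.05)  # 1044 True
```

The same function can be applied to other neighbouring models in Table 9.2, provided the number of extra parameters is counted correctly.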

These two relatively straightforward models assume that the development of body mass index is linear. It is also possible that the development over time is better represented by, for example, a second- or third-order polynomial function. Table 9.2 additionally shows the results of a model assuming a quadratic development (time as well as time × time) of body mass index over time. Whether this quadratic term is needed can again be evaluated with the likelihood ratio test, or by evaluating the P-value of the regression coefficient of the quadratic term. The likelihood ratio test indicates that a model assuming a quadratic development fits better than a model assuming linear development over time, and this finding is confirmed by a significant quadratic term (−0.007/0.0004 = −17.5, an absolute value far above 1.96, which corresponds to P < 0.05). The interpretation of the regression coefficients is similar to that in a linear model, except that there are two coefficients to interpret: one for the linear slope and one for the quadratic slope.

Incorporation of Life Course Phases

The use of a ‘straightforward’ mathematical function in modelling the development of body mass index over time always assumes a particular development over time and does not necessarily take into account differential developmental shapes throughout the phases of the life course. A very elegant solution in this respect is to model time as a categorical variable instead of a continuous one. When time is modelled as a categorical variable, the development over time is modelled without assuming a certain shape. Table 9.2 additionally shows the results of a MM analysing the development of body mass index with time as a categorical variable (represented by nine dummy variables for the ten measurements). The regression coefficient for each of the nine dummy variables indicates the difference in BMI between that particular measurement and the first measurement, which is the ‘reference’ category. For example, the regression coefficient for time 2 (0.936) is interpreted as the difference in BMI between the age of 13 and the age of 14. The positive coefficient indicates an increase in BMI. The corresponding model 2 is the same model with an additional random slope.
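
The dummy coding of time can be illustrated with a small Python sketch. The function names are assumptions for this illustration; the coefficients are the study-sample estimates of life course model 1 in Table 9.2.

```python
# Time values of the ten measurement rounds (first value = reference category).
TIME_VALUES = [1, 2, 3, 4, 9, 15, 17, 20, 24, 30]
# Life course model 1 (random intercept only), study sample, Table 9.2.
COEFFICIENTS = [0.936, 1.677, 2.334, 4.046, 5.928, 5.400, 5.920, 6.654, 7.221]


def dummy_code(time, levels=TIME_VALUES):
    """One 0/1 indicator per level except the first, which is the reference."""
    if time not in levels:
        raise ValueError(f"unknown time value: {time}")
    return [1 if time == level else 0 for level in levels[1:]]


def difference_from_baseline(time):
    """Estimated BMI difference between `time` and the first measurement."""
    return sum(d * b for d, b in zip(dummy_code(time), COEFFICIENTS))


print(dummy_code(1))                # reference: all nine dummies are 0
print(dummy_code(2))                # only the first dummy is 1
print(difference_from_baseline(2))  # 0.936
```

Because each measurement has its own coefficient, no shape is imposed on the trajectory; the price is nine parameters instead of one or two.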

Comparing Groups

Although the analysis of the development over time is of great interest, researchers are often interested in dividing the cohort under study into groups of subjects with comparable developmental trajectories, firstly as a tool to describe the population under study and secondly as a first step to study either determinants or consequences of different trajectories. The possibility to divide the cohort under study within MM, however, is limited and only allows for the grouping of subjects into (predefined) categories such as gender. Categorisation into these predefined categories thus results in average health trajectories for each category, mostly assuming that each male or female has a similar health trajectory (shape). To do so, the interaction between the time variable and the group variable must be added to the model. Adding random slopes within the predefined categories makes it possible to model some heterogeneity within each category. However, this is not often done in practice, mainly because many software packages are not capable of performing such analyses. Table 9.2 shows the results of the MM analyses comparing the development over time between males and females. The numbers are interpreted in exactly the same way as for the total population. All differences in development over time between males and females were statistically significant.

Although MM can allow for heterogeneity in health trajectory (shape) through the inclusion of random slopes, they only allow for the a-priori selection of subgroups of subjects. LCGM, on the other hand, relies on the data to generate latent subgroups of subjects with different health trajectories over time and consequently, potentially differential risks of disease. The application of such a ‘group-based’ approach, as opposed to the traditional ‘classification-based’ approach where subjects are classified into predefined groups is interesting for life course researchers too, because it allows the unravelling of distinct trajectories throughout the life course including their determinants and consequences and allows for the identification of high-risk subjects, who may need supplementary treatment or monitoring over the life course. Moreover, such an approach can also allow the subsequent identification of low-risk subjects who might need less attention.

There are several statistical techniques available to study heterogeneity in developmental trajectories using the data to form the groups; the most commonly used and the most flexible techniques available are probably those based on structural equation modelling (Bentler 1980; Bollen 1989; Kline 2005), i.e. latent class (growth) models (LCGM). These models are actually extensions of MM and latent growth models (Preacher et al. 2008) and have recently been introduced in life course research (Dunn et al. 2006; Dunn 2010; Hoekstra et al. 2011).

Analysis of Health Trajectories Over Time: Latent Class Growth Models

Latent class growth models (LCGM) are extensions of MM and latent growth models (LGM). LGM are regression-based models that allow the analysis of observed and unobserved (or latent) variables. Latent variables are unobserved variables inferred from the data (Preacher et al. 2008).

Model Assumptions

LCGM also have relatively straightforward underlying assumptions, because they too are regression-based models. The assumption of within-class conditional normality is particularly important: the measures are assumed to be normally distributed within each class (Bauer 2007).

The observed variables are the body mass index measurements over time and the latent variables represent aspects of the repeated measurements over time. The most important latent variables are the intercept parameter and the slope parameters, analogous to a MM. The intercept also represents the initial, or baseline, status of the body mass index variable and the slope represents the rate at which this body mass index changes over time. In addition to the intercept and slope parameters, a LGM also models the variation, or variance around both these parameters. Similar to the random intercept and random slope in a MM, these parameters tell us something about the inter-individual differences; the larger the variance parameters, the more subjects in the study sample differ according to their initial status and/or development over time. Where in MM and LGM one single, average, trajectory is assumed to be sufficient to describe the individual trajectories (i.e. all subjects are represented by one underlying population), in LCGM one single trajectory is considered insufficient. This means that subjects in the study might come from multiple underlying (latent) subpopulations, with corresponding heterogeneity in developmental trajectory shape. The application of LCGM allows the (statistical) identification of the number, and characteristics (again, intercept, slope, and the variances around them) of these underlying subpopulations, hereby grouping the sample under study not based on predefined groups, but on groups (or classes) within the data. Thus, the subgroups to be revealed are latent. LCGM are also referred to as group-based models (Nagin and Tremblay 2001; Nagin 1999, 2005) and were first developed as the counterpart of the MM; both techniques aim to model individual-level heterogeneity in developmental trajectories but do so in different ways. In MM, we aim to describe the development of body mass index over time, incorporating heterogeneity within the sample. 
The main objective of LCGM is to statistically classify the subjects into distinct subgroups, each with their own growth parameters. In group-based LCGM, both the within-class intercepts and slope variances are set to zero (i.e. these parameters are not estimated), implying that subjects classified into the same class are very similar to each other regarding their individual trajectories. LCGM that do estimate within-class intercept and/or slope variance are latent class growth mixture models (Muthén and Muthén 2000) and imply larger within-class heterogeneity.
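
One way to write the group-based model described above is given below. The symbols are notation assumed here for illustration (c_i for the latent class of subject i, π_k for the class proportions, K for the number of classes); only a linear slope is shown, and a common residual variance across classes is assumed for simplicity.

```latex
y_{ij} \mid (c_i = k) = \beta_{0k} + \beta_{1k}\, t_{ij} + \varepsilon_{ij},
\qquad
\varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^{2}),
\qquad
P(c_i = k) = \pi_k, \quad \sum_{k=1}^{K} \pi_k = 1
```

Each class k has its own intercept and slope but no within-class random effects. A growth mixture model would add subject-specific deviations around the class-specific intercept and slope, with variances estimated within each class.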

Choosing the Optimal Number of Classes; Comparing Models

Model fit of LCGM cannot be compared with the likelihood ratio test used in the MM example, because simulation studies have shown that the difference in −2 log likelihood does not follow a Chi-squared distribution, which makes the test inappropriate (Nylund et al. 2007). Thus, other fit indices are needed to compare neighbouring models. The most often used fit indices are the bootstrapped likelihood ratio test (BLRT) and the Bayesian information criterion (BIC). The BLRT is also a likelihood-based test (McLachlan and Peel 2000) and overcomes the problems with the traditional likelihood ratio test. It uses bootstrap samples to empirically estimate the distribution of the difference in log likelihood. The test involves a Monte Carlo method in which data are simulated, and it provides a P-value. A non-significant P-value favours the model with one class fewer, whereas a significant P-value favours the model with the additional class.

The BIC (Schwarz 1978) is also used in MM and considers both the likelihood of the model and the number of parameters in the model, thereby penalising more complex models. A lower BIC value indicates a better fitting model, where a difference of at least 10 points is often considered a meaningful improvement (Raftery 1995).
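
As a rough sketch, the BIC and the 10-point rule can be expressed as follows. The function names are assumptions for this illustration; the BIC values are those reported for the piecewise (life course) models in Table 9.3, and whether n counts subjects or observations depends on the software, so `n_obs` is left to the user.

```python
import math


def bic(minus2ll, n_params, n_obs):
    """Schwarz criterion: -2 log likelihood plus ln(n) per parameter."""
    return minus2ll + n_params * math.log(n_obs)


def compare_bic(bic_by_classes, threshold=10.0):
    """Return the lowest-BIC model and all models within `threshold` of it."""
    best = min(bic_by_classes, key=bic_by_classes.get)
    close = sorted(k for k, value in bic_by_classes.items()
                   if value - bic_by_classes[best] < threshold)
    return best, close


# BIC values of the piecewise (life course) models reported in Table 9.3,
# keyed by the number of classes.
PIECEWISE_BIC = {1: 8283.726, 2: 8262.715, 3: 8265.720, 4: 8274.552}
best, close = compare_bic(PIECEWISE_BIC)
print(best, close)  # 2 [2, 3]
```

The two-class model has the lowest BIC, but the three-class model lies within 10 points of it, so the BIC alone does not separate the two.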

Choosing the Optimal Number of Classes Is Not Straightforward in LCGM

When conducting LCGM, model fit indices often do not consistently point to one best fitting model. Although the BLRT has been demonstrated to be superior in choosing the optimal number of classes in simulated data (Nylund et al. 2007), in ‘real’ datasets this is often not the case, so both indices described in the previous section should be considered. Moreover, because of these common inconsistencies, model parsimony (favouring the ‘simplest’ model), successful convergence, a minimum of 1% of the study sample in each class, theoretical background and substantive interpretation of the classes (uninterpretable and theoretically impossible models are rejected) are taken into account as well (Jung and Wickrama 2008; Nylund et al. 2007).
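
The class-size criterion mentioned above is easy to check mechanically. The sketch below uses the class sizes of the piecewise models reported in Table 9.3; the function names are assumptions for this illustration.

```python
def smallest_class_share(class_sizes):
    """Proportion of the sample that falls in the smallest class."""
    return min(class_sizes) / sum(class_sizes)


def meets_minimum_share(class_sizes, minimum=0.01):
    """True if every class holds at least `minimum` of the sample."""
    return smallest_class_share(class_sizes) >= minimum


# Subjects per class for the piecewise (life course) models in Table 9.3.
PIECEWISE_CLASS_SIZES = {2: [324, 12], 3: [297, 24, 15], 4: [16, 2, 294, 24]}
for n_classes, sizes in PIECEWISE_CLASS_SIZES.items():
    share = smallest_class_share(sizes)
    print(n_classes, round(100 * share, 1), meets_minimum_share(sizes))
```

In this example the four-class solution contains a class of two subjects (about 0.6% of 336), which falls below the 1% threshold, whereas the two- and three-class solutions pass.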

Finally, LCGM models are computationally-heavy models, often with convergence issues or hitting local maxima because of the complexity of the models. Random starts are recommended to avoid these issues as much as possible. In the current analyses, we applied 1,000 random starting values with 50 final optimisations. Only solutions with replicated log likelihoods were accepted. Because LCGM are complicated models, often problems arise when estimating more than three classes.
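
The rule of accepting only solutions with replicated log likelihoods can be sketched as a simple check. The log likelihood values below are made up for illustration, and the tolerance and the requirement of at least two matching runs are assumptions, not settings reported in this chapter.

```python
def best_loglik_is_replicated(final_logliks, tolerance=1e-3, min_hits=2):
    """True if the highest log likelihood is reached by at least `min_hits` runs."""
    best = max(final_logliks)
    hits = sum(1 for value in final_logliks if best - value <= tolerance)
    return hits >= min_hits


# Made-up final-stage log likelihoods from different sets of starting values.
replicated = [-4120.512, -4120.512, -4120.5121, -4133.870]
not_replicated = [-4120.512, -4127.004, -4133.870]
print(best_loglik_is_replicated(replicated))      # True
print(best_loglik_is_replicated(not_replicated))  # False
```

When the best value is found only once, it may be a local maximum, and more random starts are usually tried before the solution is interpreted.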

Strategy and Models: Latent Class Growth Models

Various LCGM were run before choosing a final model. First, several linear LCGM with the within-class intercept and slope variances fixed were investigated. Table 9.3 shows these results. The linear models are shown in the top four rows of the table and point towards a three- or four-class solution. The interpretation of the slopes is the same as for the slopes presented in Table 9.2, except that they are now class-specific; each class has its own estimated slope(s). For example, the slope of 0.269 in the one-class model indicates that for each year, BMI increases (linearly) by 0.269 kg/m².
Table 9.3 Results of the latent class growth models

| Number of classes | Intercept | Slope(s) | BIC | BLRT | Probability | Subjects per class |
| --- | --- | --- | --- | --- | --- | --- |
| 1, linear slope | Class 1: 18.248 | 0.269 | 9899.165 | Not calculated | 1.00 | 336 |
| 2, linear slope | Class 1: 18.081 | 0.261 | 9884.609 | P < 0.001 | 0.930 | 324/12 |
| | Class 2: 21.658 | 0.434 | | | | |
| 3, linear slope | Class 1: 17.950 | 0.258 | 9887.077 | P < 0.001 | 0.863 | 310/10/16 |
| | Class 2: 23.545 | 0.140 | | | | |
| | Class 3: 19.734 | 0.479 | | | | |
| 4, linear slope | Class 1: 19.397 | 0.465 | 9894.435 | P = 0.034 | 0.829 | 31/286/3/16 |
| | Class 2: 17.757 | 0.259 | | | | |
| | Class 3: 25.537 | 0.267 | | | | |
| | Class 4: 21.344 | 0.119 | | | | |
| 1, quadratic slope | Class 1: 17.809 | 0.436; −0.007ᵃ | 8671.655 | Not calculated | 1.00 | 336 |
| 2, quadratic slope | Class 1: 20.661 | 0.530; −0.003ᵃ | 8639.308 | P < 0.001 | 0.956 | 17/322 |
| | Class 2: 17.663 | 0.431; −0.007ᵃ | | | | |
| 1, life course modelᵇ | Class 1: 17.442 | 0.743; 1.208; 0.538 | 8283.726 | Not calculated | 1.00 | 336 |
| 2, life course modelᵇ | Class 1: 17.292 | 0.752; 1.159; 0.497 | 8262.715 | P < 0.001 | 0.936 | 324/12 |
| | Class 2: 20.892 | 0.515; 2.299; 1.528 | | | | |
| 3, life course modelᵇ | Class 1: 17.043 | 0.796; 1.130; 0.566 | 8265.720 | P < 0.001 | 0.867 | 297/24/15 |
| | Class 2: 20.160 | 0.551; 1.514; −0.322 | | | | |
| | Class 3: 20.924 | 0.517; 2.313; 1.502 | | | | |
| 4, life course modelᵇ | Class 1: 19.498 | 0.681; 2.410; 1.294 | 8274.552 | P = 0.05 | 0.910 | 16/2/294/24 |
| | Class 2: 25.892 | 0.184; 0.824; 2.080 | | | | |
| | Class 3: 16.997 | 0.771; 1.116; 0.555 | | | | |
| | Class 4: 20.238 | 0.525; 1.431; −0.302 | | | | |

ᵃ Parameters for the quadratic slopes

ᵇ Life course phase 1 (piece 1) is defined by ages 13–16, piece 2 by ages 21–29 and piece 3 by ages 32–42

Next, quadratic slopes were added to the model, allowing for curved developmental patterns as described in the section about MM. However, the quadratic models with three and four classes are increasingly complex and often result in untrustworthy output because of this complexity (e.g. negative variance estimates, difficulties with estimating other model parameters, or classes with zero subjects (Jung and Wickrama 2008)). In our case, we had problems with the estimation of the quadratic slopes for some classes that appeared to include only two individuals. Therefore, for the quadratic models, only the one- and two-class models were available for comparison.

Incorporation of Life Course Phases

We subsequently investigated piecewise models, which allow for different phases in development. Models with three pieces were investigated, allowing for different growth rates (and directions) during each phase of the life course. Each phase thus has its own slope(s). Phase one was defined as the adolescence phase (age 13–16), phase two as the young adulthood phase (age 21–27) and phase three as the adulthood phase (age 32–42). The last rows of Table 9.3 show the results of these analyses. These models fit clearly better than the linear and quadratic models, as indicated by their much lower BIC values (for example, 8262.715 for the two class piecewise model versus 8639.308 for the two class quadratic model).
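The piecewise idea can be made concrete with a small sketch of how age might be translated into one time score per piece. This is only an illustration under stated assumptions: the helper names `piecewise_time_scores` and `expected_value` are hypothetical, time is scored in years, and the pieces are assumed to join at knot ages 16 and 27 so that together they cover the whole range from 13 to 42. The time scores actually used in the analyses may differ, and the parameter values in the example are made up rather than taken from Table 9.3.

```python
def piecewise_time_scores(age, knots):
    """Return the years accrued within each piece up to the given age.

    `knots` lists the ages at which pieces start and end, e.g. [13, 16, 27, 42]
    defines three contiguous pieces (an assumption made for this sketch).
    """
    scores = []
    for lower, upper in zip(knots[:-1], knots[1:]):
        # Before a piece starts its score is 0; after it ends the score
        # stays fixed at the length of the piece.
        clamped_age = min(max(age, lower), upper)
        scores.append(clamped_age - lower)
    return scores


def expected_value(intercept, slopes, age, knots):
    """Model-implied mean at `age`: intercept plus slope times time score per piece."""
    scores = piecewise_time_scores(age, knots)
    return intercept + sum(slope * score for slope, score in zip(slopes, scores))


# Assumed knot ages (illustrative, not taken from the chapter).
KNOTS = [13, 16, 27, 42]

# Hypothetical parameters for one class: an intercept and one slope per piece.
EXAMPLE_INTERCEPT = 17.0
EXAMPLE_SLOPES = [0.5, 0.2, 0.1]

if __name__ == "__main__":
    for age in (13, 16, 21, 27, 32, 42):
        scores = piecewise_time_scores(age, KNOTS)
        mean = expected_value(EXAMPLE_INTERCEPT, EXAMPLE_SLOPES, age, KNOTS)
        print(age, scores, round(mean, 2))
```

With this coding, each slope describes change only within its own piece, so growth can speed up, slow down or reverse from one life course phase to the next. The sketch is not meant to reproduce the estimates in Table 9.3.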

The Final Model

Comparing the three sets of models based on model fit alone was complicated, as the different model fit indices often do not agree with each other (Nylund et al. 2007). Based on the BIC, for example, the “best” model (i.e. the model with the lowest value) is the two class piecewise model, although the BIC of the three class piecewise model is almost the same (a difference of three points). The literature offers guidance on what counts as a “significant” improvement in BIC values: an improvement of at least 10 points indicates a sufficient improvement of the model (Raftery 1995). Based on the BIC, the two and three class piecewise models therefore have equivalent fit. Further, based on the BLRT, the “best” model is a four class model (indicated by a non-significant P-value). The two model fit indices (BIC and BLRT) therefore do not point consistently to one definite solution. Hence, we also took the substantive interpretation of the trajectories into account, assessing whether they make substantive sense. We assessed neighbouring models with similar statistical fit and rejected solutions that included classes that had no theoretically plausible meaning. Based on this, our final model was a three class piecewise model, showing a “normative” BMI trajectory (N = 297), a stabilising trajectory (N = 24) and a progressively overweight trajectory (N = 15).
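The BIC comparison described above amounts to simple arithmetic, which the following sketch spells out for the piecewise models. The BIC values are copied from Table 9.3 and the 10-point threshold follows the guideline attributed to Raftery (1995); the function name `compare_bic` and the dictionary labels are hypothetical and serve only this illustration.

```python
# BIC values of the piecewise (life course) models as reported in Table 9.3.
PIECEWISE_BIC = {
    "1 class piecewise": 8283.726,
    "2 class piecewise": 8262.715,
    "3 class piecewise": 8265.720,
    "4 class piecewise": 8274.552,
}


def compare_bic(bic_by_model, threshold=10.0):
    """Find the lowest-BIC model and flag models within `threshold` points of it.

    Returns the name of the best model and, for every model, a tuple of
    (BIC difference from the best model, whether that difference is below the threshold).
    """
    best = min(bic_by_model, key=bic_by_model.get)
    comparison = {}
    for name, value in bic_by_model.items():
        difference = value - bic_by_model[best]
        comparison[name] = (round(difference, 3), difference < threshold)
    return best, comparison


if __name__ == "__main__":
    best_model, comparison = compare_bic(PIECEWISE_BIC)
    print("Lowest BIC:", best_model)
    for name, (difference, equivalent) in comparison.items():
        print(name, difference, "equivalent fit" if equivalent else "worse fit")
```

Under this rule the two and three class piecewise models cannot be distinguished on the BIC alone, which is why the substantive interpretation of the classes was used to decide between them.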

Once the number of classes in the final model has been chosen, the necessity of random intercept or slope variance within classes can be assessed. This is done by additionally estimating one or more class-specific variance parameters for the intercepts and slopes. Subsequently, the model fit indices described earlier can again be interpreted. The results of these further analyses are not shown in the table, as none of the additional models showed a sufficient improvement in the model fit indices. In line with existing literature (Jung and Wickrama 2008), the final model was therefore the most parsimonious one: a three class piecewise model without random intercept and slope variance within classes (shown in bold in Table 9.3). Figure 9.3 shows the average trajectories of the final model.
Fig. 9.3

Estimated trajectories of Body Mass Index (Y-axis) from the age of 13–42 years (X-axis)

Comparing Mixed Models and Latent Class Growth Models

Both MM and LCGM can be used to answer research questions that deal with the analysis of (heterogeneity in) individual health trajectories over time. Research questions about predictors of the development of a health or disease marker over time, for example, can be analysed by means of an MM as well as an LCGM. However, although both techniques study individual developmental trajectories over time, LCGM classifies these trajectories into homogeneous subgroups based on the individual trajectories themselves, whereas MM relies on theory to define such homogeneous groups a priori.

Conclusion

This chapter explained and compared mixed models and latent class growth models using existing life course data of the Amsterdam Growth and Health Study cohort. We combined a methodological focus with empirical findings and demonstrated the value of both techniques for life course researchers who aim to study (heterogeneity in) individual health trajectories over time. Both techniques can be used to study the individual development of a certain variable over time but, depending on the specific focus of the research question, either LCGM or MM is preferred.

References

  1. Altman, D. (1991). Practical statistics for medical research. Boca Raton: Chapman and Hall/CRC Press.
  2. Bakker, I., Twisk, J. W., van Mechelen, W., Mensink, G. B., & Kemper, H. C. (2003). Computerization of a dietary history interview in a running cohort; evaluation within the Amsterdam Growth and Health Longitudinal Study. European Journal of Clinical Nutrition, 57, 394–404.
  3. Bauer, D. J. (2007). Observations on the use of growth mixture models in psychological research. Multivariate Behavioral Research, 42, 757–768.
  4. Bentler, P. (1980). Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology, 31, 419–456.
  5. Beunen, G., Baxter-Jones, A. D. G., Mirwald, R. L., Thomis, M., Lefevre, J., Malina, R. M., & Bailey, D. A. (2002). Intraindividual allometric development of aerobic power in 8- to 16-year-old boys. Medicine and Science in Sports and Exercise, 34(3), 503–510.
  6. Bollen, K. (1989). Structural equations with latent variables. New York: Wiley.
  7. Douw, L., Nieboer, D., van Dijk, B. W., Stam, C. J., & Twisk, J. W. R. (2014). A healthy brain in a healthy body: Brain network correlates of physical and mental fitness. PLoS One, 9(2), e88202. doi: 10.1371/journal.pone.0088202.
  8. Dunn, K. M. (2010). Extending conceptual frameworks: Life course epidemiology for the study of back pain. BMC Musculoskeletal Disorders, 2, 11–23.
  9. Dunn, K. M., Jordan, K., & Croft, P. R. (2006). Characterizing the course of low back pain: A latent class analysis. American Journal of Epidemiology, 163, 754–761.
  10. Hancock, G., & Samuelsen, K. (Eds.). (2008). Advances in latent variable mixture models. Charlotte: Information Age Publishing.
  11. Hoekstra, T., Barbosa-Leiker, C., Koppes, L., & Twisk, J. (2011). Developmental trajectories of body mass index throughout the life course: An application of latent class growth (mixture) modelling. Longitudinal and Life Course Studies, 2, 319–330.
  12. Jung, T., & Wickrama, K. A. S. (2008). An introduction to latent class growth analysis and growth mixture modeling. Social and Personality Psychology Compass, 2, 302–317.
  13. Kemper, H. (1995). The Amsterdam growth study: A longitudinal analysis of health, fitness and lifestyle (Vol. 6). Champaign: Human Kinetics.
  14. Kline, R. (2005). Principles and practice of structural equation modeling. New York: The Guilford Press.
  15. Kuh, D., & Ben-Shlomo, Y. (2004). A life course approach to chronic disease epidemiology. Oxford: Oxford University Press.
  16. McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
  17. Muthén, B., & Muthén, L. (2000). Integrating person-centered and variable centered analyses: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882–891.
  18. Muthén, L., & Muthén, B. (2012). Mplus user’s guide (7th ed.). Los Angeles: Muthén & Muthén.
  19. Nagin, D. S. (1999). Analyzing developmental trajectories. A semi-parametric group based approach. Psychological Methods, 6, 18–34.
  20. Nagin, D. S. (2005). Group-based modeling of development. Cambridge: Harvard University Press.
  21. Nagin, D. S., & Odgers, C. (2010). Group-based trajectory modelling in clinical research. Annual Review of Clinical Psychology, 6, 109–138.
  22. Nagin, D. S., & Tremblay, R. E. (2001). Developmental trajectory groups: Fact or a useful statistical fiction? Criminology, 43, 873–904.
  23. Nylund, K., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modelling: A Monte Carlo simulation study. Structural Equation Modeling, 14, 535–569.
  24. Preacher, K., Wichman, A., MacCallum, R., & Briggs, N. (2008). Latent growth curve modeling. Thousand Oaks/New Delhi/London/Singapore: Sage Publications.
  25. Raftery, A. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163.
  26. Rasbash, J., Charlton, C., Browne, W., Healy, M., & Cameron, B. (2005). MLWiN. Bristol: Center for Multilevel Modelling.
  27. Schouten, F., Twisk, J. W., de Boer, M. R., Serné, E. H., Stehouwer, C. D., Smulders, Y. M., & Ferreira, I. (2011). Increases in central fat and decreases in peripheral fat masses are associated with accelerated arterial stiffening in healthy adults. American Journal of Clinical Nutrition, 94, 40–48.
  28. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
  29. Singer, J., & Willett, J. (2003). Applied longitudinal data analysis. Oxford: Oxford University Press.
  30. Twisk, J. (2013). Applied longitudinal data analysis for epidemiology. A practical guide. New York: Cambridge University Press.
  31. Van de Laar, R., Stehouwer, C., van Bussel, B., te Velde, S., Prins, M., Twisk, J., & Ferreira, I. (2012). Lower lifetime dietary fiber intake is associated with carotid artery stiffness: The Amsterdam Growth and Health Longitudinal Study. American Journal of Clinical Nutrition, 96, 14–23.
  32. Welten, D. C., Kemper, H. C., Post, G. B., Van Mechelen, W., Twisk, J., Lips, P., & Teule, G. J. (1994). Weight-bearing activity during youth is a more important factor for peak bone mass than calcium intake. Journal of Bone and Mineral Research: The Official Journal of the American Society for Bone and Mineral Research, 9(7), 1089–1096. doi: 10.1002/jbmr.5650090717.
  33. Wijnstok, N., Twisk, J., Young, I., Woodside, J., McFarlane, C., McEneny, J., & Boreham, C. (2010). Inflammation markers are associated with cardiovascular diseases risk in adolescents: The Young Hearts project 2000. Journal of Adolescent Health, 47, 346–351.
  34. Wijnstok, N., Hoekstra, T., Twisk, J., Smulders, Y., & Serné, E. (2012). The relationship of body fatness and body fat distribution with microvascular recruitment: The Amsterdam Growth and Health Longitudinal Study. Microcirculation, 19, 273–279.
  35. Wijnstok, N., Hoekstra, T., Twisk, J., van Mechelen, W., & Kemper, H. (2013). Cohort profile: The Amsterdam Growth and Health Longitudinal Study. International Journal of Epidemiology, 42, 422–429.

Copyright information

© The Author(s) 2015

Open Access This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  1. EMGO+ Institute for Health and Care Research, Amsterdam, Netherlands
  2. Department of Epidemiology and Biostatistics, VU University Medical Center Amsterdam, Amsterdam, Netherlands
  3. Department of Health Sciences, VU University, Amsterdam, Netherlands
