Researchers in the field of life course research are often interested in analysing longitudinal data. One of the main advantages of longitudinal data is the possibility to study individual development over time (Singer and Willett 2003; Twisk 2013). Studying these individual trajectories helps, for example, to better understand how risk factors for diseases naturally develop throughout the life course, and aids in unravelling the aetiology of diseases, which is important for early detection and prevention. Several statistical techniques to study these individual trajectories are available, of which mixed models (MM) and latent (class) growth models (LCGM) are the most common. Both techniques aim to study (heterogeneity in) individual health trajectories, but do so in different ways. In short, MM are regression-based techniques designed to study the individual development of a certain variable over time while adjusting for the correlation between the observations within one subject. This is done by estimating the differences between the subjects: a variance for the intercept and a variance for the slope(s). The intercept is indicative of the baseline values and the slope(s) of the rate of development over time. LCGM also aim to study the individual development of a certain variable over time, but additionally use the data to generate groups of individuals with comparable trajectories over time.

The purpose of this chapter is to explain and compare MM and LCGM in more detail by combining a methodological focus with empirical findings using existing life course data of the Amsterdam Growth and Health Longitudinal Study cohort. We will address practical issues that life course researchers face when trying to answer their research questions, paying particular attention to when, how and why to use MM and LCGM. Interested readers are referred to more detailed readings (Hancock and Samuelsen 2008; Nagin and Odgers 2010; Nagin 2005; Preacher et al. 2008; Singer and Willett 2003; Twisk 2013).

The Amsterdam Growth and Health Longitudinal Study

The Amsterdam Growth and Health Longitudinal Study (AGHLS) was initially planned in 1974 to monitor the growth, health and lifestyle of groups of adolescents entering secondary school over a period of 4 years (Wijnstok et al. 2013). The reason for this follow-up was a series of intervention studies measuring the effectiveness of more intensive and extra physical education lessons in 12- to 13-year-old boys (Bakker et al. 2003; Kemper 1995). In general, no clear effects were found. There were indications that large inter-individual differences between the pupils in biological development and habitual physical activity could have masked any intervention effects. At that time, health authorities were complaining about the level of fitness of youngsters in their late teens. In growing towards independence, the lifestyle habits of teenagers change considerably (with regard to physical activity, food intake, tobacco smoking and alcohol consumption). As a result, their health perspective might also have changed. This illustrated that the teenage period is an important period in the life course. Because individual changes in growth and development can be described most precisely by studying the same participants over a longer period of time, the AGHLS was set up; approximately 500 boys and girls (mean age of 13 years) from the first two grades of two secondary schools in the Netherlands were included in the study.

In the first 4 years, the AGHLS was primarily aimed at the investigation of the natural course of dietary intake, physical activity and physical fitness as risk factors for cardiovascular disease, such as cholesterol levels. Subsequently, the AGHLS was expanded with further rounds of measurements after the adolescence period, during the young adulthood phase. At each of these follow-up rounds of measurement, cardiovascular disease risk indicators and lifestyle variables were measured in a similar way to allow longitudinal comparison. Also, new, age-relevant measures for health and lifestyle were added to the test battery at each follow-up round. For example, when subjects were in their 20s, measurements of bone mineral density were added (Welten et al. 1994). In their 30s, the test battery was extended with, for example, carotid ultrasound measurements to determine large arterial properties to examine not only the ‘clinical’ cardiovascular risk factors but also the early preclinical cardiovascular damage (Schouten et al. 2011; van de Laar et al. 2012). In their 40s, microvascular function was added to the test battery as an even earlier indicator for vascular damage (Wijnstok et al. 2010, 2012). Moreover, to study the onset, timing and progression of neuropsychiatric disease, such as Alzheimer’s disease, a magnetoencephalogram, a novel brain scan to assess communication of different brain sections, was added to the test battery (Douw et al. 2014).

During the adolescence period, participants were measured annually during school hours, and thereafter, six more examinations took place, of which the most recent took place when the participants were 42 years old. Currently, around 350 participants are still enrolled in the study. In 2013, a cohort profile describing in detail the Amsterdam Growth and Health Longitudinal Study was published (Wijnstok et al. 2013).

Focus of the Cohort for This Chapter

The methods in this chapter are illustrated using body mass index (BMI), a well-known construct across many fields of research. Because this chapter focuses mainly on methodological issues, albeit illustrated with empirical data, substantive findings will not be discussed in great detail. Interested readers are referred to the literature specifically focused on BMI, which is outside the scope of this chapter (e.g. Hoekstra et al. 2011).

For the analyses in this chapter we selected subjects who had a valid measurement of BMI at baseline (i.e. at age 13) and at least two additional valid measurements over time. In total, 378 subjects (51 % female) who were measured 3–10 times were included in the analyses presented in this chapter. Baseline characteristics of the sample are presented in Table 9.1. The sample size varies across the first four measurements because part of the sample was measured once during that period (Wijnstok et al. 2013). We performed the MM in MLwiN 2.30 (Rasbash et al. 2005) and the LCGM in Mplus 7.11 (Muthén and Muthén 2012). We first analysed the development of BMI over time for the whole cohort, and then analysed differences in the development of BMI over time between males and females.

Table 9.1 Descriptive information (mean, standard deviation) regarding the development of body mass index

Analysis of Health Trajectories Over Time: Mixed Models

When analysing health trajectories over time, we should take into account the fact that repeated observations of each individual in the dataset are not independent (Singer and Willett 2003; Twisk 2013). Simple regression analysis and analysis of variance are unable to fully incorporate these correlated measures, so more sophisticated statistical techniques are needed to analyse these data appropriately. Several such techniques are available, of which mixed models (MM) are the best known.

Model Assumptions

Mixed models have relatively straightforward underlying assumptions. Because they are regression-based models, MM rely on the same assumptions as ordinary regression analysis (Altman 1991). Specifically, the residuals of the outcome variable should be normally distributed and, in addition, the residuals around the (random) intercept and slopes should approximate a normal distribution. This can be easily assessed in any statistical software package. MM are also well suited to data with unequal numbers of repeated measurements across subjects, and the model assumes data to be missing at random.

The idea behind MM (Twisk 2013) was initially developed in the social sciences, specifically for educational research. Investigating the performance of pupils within schools, researchers realised that the performances of pupils within the same class were not independent (in other words, these performances are correlated). This type of study design is characterised by a hierarchical structure, also referred to as a multilevel structure. Pupils are nested within classes and classes are nested within schools. More or less the same structure is seen in longitudinal datasets, where observations are nested within the subject: observations of one subject over time are correlated. MM adjust for this correlation between repeated observations by modelling the variability among individuals through the inclusion of both fixed and random effects. To understand the general idea behind a mixed model, it is important to realise that in simple regression analysis, adjusting for a certain variable (for example gender) means that different intercepts are calculated for the categories of that variable. In a longitudinal setting (thus adjusting for subject), this means that a different intercept is calculated for each subject. The simplest form of a MM takes into account this random intercept only. Figure 9.1 demonstrates that a model with a random intercept only allows the intercepts to vary across subjects. In addition to a random intercept, it is also possible that the development over time varies across individuals (indicated by a random slope). This is demonstrated in Fig. 9.2, where only linear slopes are assumed.
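The random intercept and random slope idea can be made concrete with a small simulation. The following is a minimal sketch in Python, not the model fitted in this chapter: the function name, all parameter values and the assumption that the random intercept and random slope are uncorrelated are illustrative choices only.

```python
import random


def simulate_trajectories(n_subjects, times, beta0, beta1,
                          sd_intercept, sd_slope, sd_resid, seed=0):
    """Simulate y_ij = (beta0 + u0_i) + (beta1 + u1_i) * t_j + e_ij.

    u0_i and u1_i are subject-specific deviations (random intercept and
    random slope); here they are drawn independently, which is an assumption.
    """
    rng = random.Random(seed)
    rows = []
    for subject in range(n_subjects):
        u0 = rng.gauss(0.0, sd_intercept)  # deviation from the average intercept
        u1 = rng.gauss(0.0, sd_slope)      # deviation from the average slope
        for t in times:
            y = (beta0 + u0) + (beta1 + u1) * t + rng.gauss(0.0, sd_resid)
            rows.append((subject, t, y))
    return rows


# Hypothetical values: sd_slope=0.0 gives a random-intercept-only model
# (parallel lines), sd_slope>0 lets the lines fan out over time.
intercept_only = simulate_trajectories(5, [0, 1, 2, 3], 18.0, 0.24, 1.5, 0.0, 0.5)
with_slopes = simulate_trajectories(5, [0, 1, 2, 3], 18.0, 0.24, 1.5, 0.1, 0.5)
```

Setting the slope standard deviation to zero reproduces the situation of Fig. 9.1, and a positive value the situation of Fig. 9.2.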

Fig. 9.1 Visualisation of different intercepts for each subject (random intercept)

Fig. 9.2 Visualisation of differences in the development over time for each subject (random intercept and slope)

The Modelling of Time

Within life course epidemiology, MM are usually conducted to describe the development of a particular outcome variable over time (Beunen et al. 2002; Twisk 2013). In that case, the only independent variable of interest is a time variable, which can be included either as a continuous variable (indicating a linear development or other polynomial function of the outcome variable over time) or as a categorical variable. Modelling time as a categorical variable allows for the incorporation of different developmental shapes during different stages in the life course, such as childhood, adolescence, adulthood and old age. This closely maps onto one of the focal points of life course research, which offers opportunities to understand the natural history of chronic disorders with a specific focus on health transitions and phases across the life course (Kuh and Ben-Shlomo 2004).

Comparing Models

Model fit of MM can be assessed by comparing the log likelihood values of neighbouring (nested) models. The likelihood ratio test (Bollen 1989) calculates the difference in -2 log likelihood between two neighbouring models. Under the null hypothesis that the simpler model is adequate, this difference is assumed to follow a Chi-squared distribution with degrees of freedom equal to the difference in the number of parameters estimated in the two models.

Strategy and Results: Mixed Models

We start by modelling a linear development of BMI over time. This model is relatively straightforward and includes the time variable as a continuous variable, which, in the case of the AGHLS, takes the values 1, 2, 3, 4, 9, 15, 17, 20, 24, and 30. These values reflect the actual time between measurement rounds, taking into account the unequally spaced time points. Alternatively, ignoring the unequally spaced time points, this variable could also take the values 1 to 10 (one unit per measurement).

The simplest model includes a random intercept only. Table 9.2 demonstrates the results of this model and indicates that body mass index values increase over time. The regression coefficient for time for the linear model (model 1) is 0.240, so for each year, the BMI increases (linearly) by 0.240 kg/m². The corresponding standard error (0.003; indicative of the precision of the estimated regression coefficient: the smaller, the more precise) can be used to assess the statistical significance of the regression coefficient by calculating the z-statistic (Altman 1991; Twisk 2013). The z-statistic is calculated by dividing the regression coefficient by the corresponding standard error. Absolute values above 1.96 indicate statistical significance at the 5 % level.
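The z-statistic described here is simple arithmetic, and can be sketched in a few lines of Python. The function name is a hypothetical helper; the two-sided P-value is obtained from the standard normal distribution via the complementary error function.

```python
import math


def wald_z_test(coefficient, standard_error):
    """Return the z-statistic and its two-sided P-value (standard normal)."""
    z = coefficient / standard_error
    # For a standard normal Z: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Linear time coefficient and standard error reported for model 1
z, p = wald_z_test(0.240, 0.003)
print(round(z, 1), p < 0.05)
```

With the values from model 1 the coefficient is 80 standard errors away from zero, far beyond the threshold of 1.96.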

Table 9.2 Results of the mixed models

The next step is to extend the model with a random slope for time, indicating that the development of BMI over time differs between the subjects in the sample (model 2). The model with a random intercept only and the model with a random intercept plus a random slope can be compared by the likelihood ratio test to assess whether the inclusion of a random slope improves the model fit. For each model, the -2 log likelihood is provided and although the value itself has no interpretation, the difference between two -2 log likelihoods can be used to compare neighbouring models. In our case, the difference in the two -2 log likelihoods follows a Chi-squared distribution with two degrees of freedom and is highly statistically significant (i.e. 10,469 − 9,425 = 1,044; this difference is much larger than the critical value of 5.99 at the 5 % level). The result of the likelihood ratio test thus indicates that a model with both a random intercept and a random slope is significantly better than a model with a random intercept only.
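The likelihood ratio test for this comparison can be reproduced by hand. The sketch below is an illustration using the Python standard library only; the function names are hypothetical, and the Chi-squared tail probability is computed from the closed-form expressions that exist for integer degrees of freedom rather than from a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a Chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal-tail part plus a finite series
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * r + 1)
    return tail


def likelihood_ratio_test(deviance_simple, deviance_complex, df):
    """Compare two nested models by their -2 log likelihood values."""
    statistic = deviance_simple - deviance_complex
    return statistic, chi2_sf(statistic, df)


# -2 log likelihoods of the random intercept model and the
# random intercept plus random slope model; two extra parameters
statistic, p_value = likelihood_ratio_test(10469, 9425, df=2)
print(statistic, p_value < 0.05)
```

The two degrees of freedom correspond to the two additional parameters in model 2, commonly the slope variance and the intercept-slope covariance.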

These two relatively straightforward models assume that the development of body mass index is linear. It is also possible that the development over time is better represented by, for example, a second- or third-order polynomial function. Table 9.2 additionally shows the results of a model assuming a quadratic development (time as well as time * time) of body mass index over time. Whether or not this quadratic term is needed can again be evaluated by the likelihood ratio test, or by evaluating the P-value of the regression coefficient of the quadratic term. The likelihood ratio test indicates that a model assuming a quadratic development fits better than a model assuming a linear development over time, and this finding is confirmed by a significant quadratic term (−0.007/0.0004 = −17.5; the absolute value exceeds 1.96, which corresponds to P < 0.05). The interpretation of the regression coefficients is similar to that in a linear model, except that there are two coefficients to interpret: one for the linear slope and one for the quadratic slope.

Incorporation of Life Course Phases

The use of a ‘straightforward’ mathematical function in modelling the development of body mass index over time always assumes a particular development over time and does not necessarily take into account differential developmental shapes throughout the phases of the life course. A very elegant solution in this respect is to model time as a categorical variable instead of a continuous one. When time is modelled as a categorical variable, the development over time is modelled without assuming a certain shape in the development. Table 9.2 additionally shows the results of a MM analysing the development of body mass index with time as a categorical variable (ten measurements, represented by nine dummy variables). The regression coefficient for each of the nine dummy variables indicates the difference in BMI between that particular measurement and the first measurement, which is the ‘reference’ category. For example, the regression coefficient for time 2 (0.936) is interpreted as the difference in BMI between the age of 13 and the age of 14. The positive coefficient indicates an increase in BMI. The corresponding model 2 is the same model with a random slope added.
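The dummy coding of time described above can be sketched as follows. This is an illustration only; the function name is hypothetical, and statistical packages usually create these indicator variables automatically when time is declared categorical.

```python
def dummy_code(measurement, n_measurements=10):
    """Return n_measurements - 1 indicator variables for one measurement round.

    Measurement 1 is the reference category, so it is coded as all zeros;
    measurement k (k >= 2) gets a 1 in position k - 2.
    """
    if not 1 <= measurement <= n_measurements:
        raise ValueError("measurement must lie between 1 and n_measurements")
    return [1 if measurement == k else 0 for k in range(2, n_measurements + 1)]


for m in (1, 2, 10):
    print(m, dummy_code(m))
```

Because the first measurement is coded as all zeros, the intercept of the model equals the mean BMI at the first measurement and each dummy coefficient is a difference from that mean.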

Comparing Groups

Although the analysis of the development over time is of great interest, researchers are often interested in dividing the cohort under study into groups of subjects with comparable developmental trajectories: firstly as a tool to describe the population under study and secondly as a first step to study either determinants or consequences of different trajectories. The possibility to divide the cohort under study within MM, however, is limited and only allows for the grouping of subjects into (predefined) categories such as gender. Categorisation into these predefined categories thus results in average health trajectories for each category, mostly assuming that each male or female has a similar health trajectory (shape). To do so, the interaction between the time variable and the group variable must be added to the model. Adding random slopes for the predefined categories to the model makes it possible to model some heterogeneity within each category. However, this is not often done in practice, mainly because many software packages are not capable of performing such analyses. Table 9.2 shows the results of the MM analyses comparing the development over time between males and females. The interpretation of the numbers is exactly the same as for the total population. All differences in development over time between males and females were statistically significant.
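How the interaction between time and group produces group-specific trajectories can be illustrated with the fixed part of a linear model. The sketch below is a simplified illustration; the coefficient values are hypothetical and are not the estimates from Table 9.2.

```python
def predicted_bmi(time, female, b0, b_time, b_female, b_interaction):
    """Fixed part of a linear model with a time-by-group interaction."""
    return b0 + b_time * time + b_female * female + b_interaction * time * female


def group_slope(female, b_time, b_interaction):
    """Rate of change per unit of time for the given group (0 = male, 1 = female)."""
    return b_time + b_interaction * female


# Hypothetical coefficients, for illustration only
coefs = dict(b0=17.5, b_time=0.25, b_female=0.4, b_interaction=-0.03)
print(group_slope(0, coefs["b_time"], coefs["b_interaction"]))
print(group_slope(1, coefs["b_time"], coefs["b_interaction"]))
```

The interaction coefficient is thus the difference in slope between the two groups, and testing it against zero tests whether the development over time differs between males and females.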

Although MM can allow for heterogeneity in health trajectory (shape) through the inclusion of random slopes, they only allow for the a priori selection of subgroups of subjects. LCGM, on the other hand, rely on the data to generate latent subgroups of subjects with different health trajectories over time and, consequently, potentially differential risks of disease. The application of such a ‘group-based’ approach, as opposed to the traditional ‘classification-based’ approach where subjects are classified into predefined groups, is interesting for life course researchers too, because it allows the unravelling of distinct trajectories throughout the life course, including their determinants and consequences, and allows for the identification of high-risk subjects, who may need supplementary treatment or monitoring over the life course. Moreover, such an approach can also allow the subsequent identification of low-risk subjects who might need less attention.

There are several statistical techniques available to study heterogeneity in developmental trajectories using the data to form the groups; the most commonly used and the most flexible techniques available are probably those based on structural equation modelling (Bentler 1980; Bollen 1989; Kline 2005), i.e. latent class (growth) models (LCGM). These models are actually extensions of MM and latent growth models (Preacher et al. 2008) and have recently been introduced in life course research (Dunn et al. 2006; Dunn 2010; Hoekstra et al. 2011).

Analysis of Health Trajectories Over Time: Latent Class Growth Models

Latent class growth models (LCGM) are extensions of MM and latent growth models (LGM). LGM are regression-based models that allow the analysis of observed and unobserved (or latent) variables. Latent variables are unobserved variables inferred from the data (Preacher et al. 2008).

Model Assumptions

LCGM also have relatively straightforward underlying assumptions, because they are also regression-based models. Specifically, the assumption of within-class conditional normality is important: the measures are assumed to be normally distributed within classes (Bauer 2007).

The observed variables are the body mass index measurements over time and the latent variables represent aspects of the repeated measurements over time. The most important latent variables are the intercept parameter and the slope parameters, analogous to a MM. As in a MM, the intercept represents the initial, or baseline, status of the body mass index variable and the slope represents the rate at which body mass index changes over time. In addition to the intercept and slope parameters, a LGM also models the variation, or variance, around both these parameters. Similar to the random intercept and random slope in a MM, these parameters tell us something about the inter-individual differences: the larger the variance parameters, the more the subjects in the study sample differ in their initial status and/or development over time. Where in MM and LGM one single, average trajectory is assumed to be sufficient to describe the individual trajectories (i.e. all subjects are represented by one underlying population), in LCGM one single trajectory is considered insufficient. This means that subjects in the study might come from multiple underlying (latent) subpopulations, with corresponding heterogeneity in developmental trajectory shape. The application of LCGM allows the (statistical) identification of the number and characteristics (again, intercept, slope, and the variances around them) of these underlying subpopulations, thereby grouping the sample under study not based on predefined groups, but on groups (or classes) within the data. Thus, the subgroups to be revealed are latent.

LCGM are also referred to as group-based models (Nagin and Tremblay 2001; Nagin 1999, 2005) and were first developed as the counterpart of the MM; both techniques aim to model individual-level heterogeneity in developmental trajectories but do so in different ways. In MM, we aim to describe the development of body mass index over time, incorporating heterogeneity within the sample. The main objective of LCGM is to statistically classify the subjects into distinct subgroups, each with their own growth parameters. In group-based LCGM, both the within-class intercept and slope variances are set to zero (i.e. these parameters are not estimated), implying that subjects classified into the same class are very similar to each other regarding their individual trajectories. LCGM that do estimate within-class intercept and/or slope variance are latent class growth mixture models (Muthén and Muthén 2000) and imply larger within-class heterogeneity.

Choosing the Optimal Number of Classes; Comparing Models

Model fit of LCGM cannot be compared with the likelihood ratio test used in the MM example, because simulation studies have shown that the difference in -2 log likelihood between models with different numbers of classes does not follow a Chi-squared distribution, so the test is inappropriate (Nylund et al. 2007). Thus, other fit indices need to be used if we want to compare neighbouring models. The most often used fit indices are the bootstrapped likelihood ratio test (BLRT) and the Bayesian information criterion (BIC). The BLRT is also a likelihood-based test (McLachlan and Peel 2000), and overcomes the problems with the traditional likelihood ratio test. The BLRT uses bootstrap samples to empirically estimate the distribution of the difference in log likelihood, thereby estimating the specific difference distribution. The test includes a Monte Carlo method involving the simulation of data and provides a P-value. A non-significant P-value favours the model with one class fewer, whereas a significant P-value favours the model with the additional class.
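The final step of a bootstrapped likelihood ratio test, turning the bootstrap replicates into a P-value, can be sketched as follows. This is a generic illustration and not the Mplus implementation: the function name is hypothetical, the model-fitting steps are only described in comments, and adding one to the numerator and denominator is one common convention that we assume here.

```python
def bootstrap_p_value(observed_statistic, bootstrap_statistics):
    """P-value of an observed statistic against its bootstrap distribution.

    Assumed procedure that produced the inputs (not implemented here):
      1. fit the k-1 and k class models to the real data and take the
         difference in -2 log likelihood (observed_statistic);
      2. simulate a data set from the fitted k-1 class model;
      3. fit both models to the simulated data and store the difference;
      4. repeat steps 2 and 3 many times (bootstrap_statistics).
    """
    exceed = sum(1 for s in bootstrap_statistics if s >= observed_statistic)
    return (exceed + 1) / (len(bootstrap_statistics) + 1)


# Hypothetical replicate values, for illustration only
print(bootstrap_p_value(10.0, [1.0, 2.0, 3.0, 12.0]))
```

A small P-value means that differences as large as the observed one are rare when the data truly come from the model with one class fewer.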

The BIC (Schwarz 1978) is also used in MM and considers both the likelihood of the model and the number of parameters in the model, thereby penalising more complex models. A lower BIC value indicates a better fitting model, where a difference of at least 10 points is often considered a meaningful improvement (Raftery 1995).
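The BIC and the 10-point rule can be written down in a few lines. The sketch below uses the standard definition of the BIC; the function names and the example values are hypothetical, and which sample size enters the penalty (here called n_observations) differs between software packages, so that choice is an assumption.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion: -2 log L + k * ln(n)."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


def compare_bic(bic_a, bic_b, min_difference=10.0):
    """Return 'a' or 'b' for the preferred model, or 'equivalent'
    when the BIC values differ by less than min_difference points."""
    if abs(bic_a - bic_b) < min_difference:
        return "equivalent"
    return "a" if bic_a < bic_b else "b"


# Hypothetical BIC values that differ by three points
print(compare_bic(9000.0, 9003.0))
```

When two models are judged equivalent by this rule, other considerations such as parsimony and interpretability have to decide between them.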

Choosing the Optimal Number of Classes Is Not Straightforward in LCGM

When conducting LCGM, model fit indices often do not consistently point to one best fitting model, and the two model fit indices described in the previous section should both be considered. Although the BLRT has been demonstrated to be superior in choosing the optimal number of classes in simulated data (Nylund et al. 2007), in ‘real’ datasets this is often not the case. Moreover, because of the common inconsistencies, model parsimony (favouring the ‘simplest’ model), successful convergence, a minimum of 1 % of the study sample in each class, theoretical background and substantive interpretation of the classes (uninterpretable and theoretically impossible models are rejected) are taken into account too (Jung and Wickrama 2008; Nylund et al. 2007).

Finally, LCGM are computationally heavy models that often have convergence problems or converge on local maxima because of their complexity. Multiple random starting values are recommended to avoid these issues as much as possible. In the current analyses, we applied 1,000 random starting values with 50 final optimisations. Only solutions with replicated log likelihoods were accepted. Because LCGM are complicated models, problems often arise when estimating more than three classes.
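The rule that only solutions with replicated log likelihoods are accepted can be expressed as a simple check on the log likelihoods of the final optimisations. The sketch below is an illustration; the function name, the tolerance and the example values are assumptions and do not reflect settings of any particular software package.

```python
def best_loglik_replicated(final_logliks, tolerance=1e-3, min_replications=2):
    """Check whether the highest log likelihood was reached by several starts.

    final_logliks holds the log likelihood of each final-stage optimisation.
    A best value found only once may be a local maximum or a fluke.
    """
    best = max(final_logliks)
    n_hits = sum(1 for ll in final_logliks if abs(ll - best) <= tolerance)
    return n_hits >= min_replications


# Hypothetical log likelihoods from three final optimisations
print(best_loglik_replicated([-4712.5, -4712.5, -4720.1]))
print(best_loglik_replicated([-4712.5, -4715.0, -4720.1]))
```

If the best log likelihood is not replicated, the usual remedy is to increase the number of random starting values and rerun the model.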

Strategy and Models: Latent Class Growth Models

Various LCGM were run before choosing a final model. First, several linear LCGM with the within-class intercept and slope variances fixed were investigated. Table 9.3 shows these results. The linear models are shown in the top four rows of the table, and point towards a three- or four-class solution. The interpretation of the slopes is the same as for the slopes presented in Table 9.2, only they are now class-specific; each class has its own estimated slope(s). For example, the slope of 0.269 indicates that for each year, the BMI increases (linearly) by 0.269 kg/m².

Table 9.3 Results of the latent class growth models

Next, quadratic slopes were added to the model, allowing for curved developmental patterns as described in the section about MM. However, the quadratic models with three and four classes are increasingly complex and often result in untrustworthy output because of this complexity (e.g. negative variance estimates, difficulties with estimating other model parameters, or classes with zero subjects (Jung and Wickrama 2008)). In our case, we had problems with the estimation of the quadratic slopes for some classes that appeared to include only two individuals. Therefore, for the quadratic models, we had only two models to compare.

Incorporation of Life Course Phases

We subsequently investigated piecewise models, which allow for different phases in development. Models with three pieces were investigated, allowing for different growth rates (and directions) during each phase of the life course. Each phase thus has its own slope(s). Phase one was defined as the adolescence phase (age 13–16), phase two as the young adulthood phase (age 21–27) and phase three as the adulthood phase (age 32–42). The last rows of Table 9.3 show the results of these analyses. These models clearly fit better, as indicated by much lower BIC values.
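One common way to set up a piecewise model is to split the time variable into one variable per phase, each of which only increases while its own phase is running. The sketch below illustrates this coding in Python. It is not the specification used for Table 9.3: the function names, the slopes and in particular the knots at ages 16 and 27 are assumptions, since the coding of the intervals between the phases is not described here.

```python
def piecewise_time(age, start=13.0, knots=(16.0, 27.0)):
    """Split elapsed time since `start` into one time variable per phase.

    Each element counts the years spent in that phase, so the elements
    always sum to age - start. The knot ages are illustrative assumptions.
    """
    bounds = (start,) + tuple(knots) + (float("inf"),)
    return [min(max(age - lower, 0.0), upper - lower)
            for lower, upper in zip(bounds[:-1], bounds[1:])]


def piecewise_prediction(age, intercept, slopes):
    """Predicted value with a separate linear slope for each phase."""
    pieces = piecewise_time(age)
    return intercept + sum(s * p for s, p in zip(slopes, pieces))


for age in (13, 16, 21, 42):
    print(age, piecewise_time(age))
```

Because every phase has its own slope, the trajectory can rise steeply in one phase and level off or decline in the next while remaining continuous at the knots.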

The Final Model

Comparing the three sets of four models based on model fit alone was complicated, as the different model fit indices are often not in agreement with each other (Nylund et al. 2007). Based on the BIC, for example, the “best” model (i.e. the model with the lowest value) is the two-class piecewise model, although the BIC of the three-class piecewise model is almost the same (a difference of three points). The literature advises on what counts as a “significant” improvement in BIC values: an improvement of at least 10 points indicates a sufficient improvement of the model (Raftery 1995), which means that, based on the BIC, the two- and three-class piecewise models have equivalent fit. Further, based on the BLRT, the “best” model is a four-class model (indicated by a non-significant P-value). The two model fit indices (BIC and BLRT) therefore do not point consistently to one definite solution. Hence, we also took the substantive interpretation of the trajectories into account, assessing whether they make substantive sense. We assessed neighbouring models with similar statistical fit and rejected solutions that included classes that had no theoretically plausible meaning. Based on this, our final model was a three-class piecewise model, showing a “normative” BMI trajectory (N = 297), a stabilising trajectory (N = 24) and a progressively overweight trajectory (N = 15).

Once the number of classes in the final model has been chosen, the necessity of within-class random intercept or slope variance can be assessed. This is done by additionally estimating one or more class-specific variance parameters for the intercepts and slopes. Subsequently, the model fit indices described earlier can again be interpreted. The results of these further analyses are not shown in the table, as none of the additional models showed a sufficient improvement in the model fit indices. Based on existing literature (Jung and Wickrama 2008), the ultimate final model was the most parsimonious model: a three-class piecewise model without within-class random intercept and slope variance (shown in bold in Table 9.3). Figure 9.3 shows the average trajectories of the final model.

Fig. 9.3 Estimated trajectories of Body Mass Index (Y-axis) from the age of 13–42 years (X-axis)

Comparing Mixed Models and Latent Class Growth Models

Both MM and LCGM can be used to answer research questions that deal with the analysis of (heterogeneity in) individual health trajectories over time. Research questions dealing with the investigation of predictors of the development of a health or disease marker over time, for example, can be analysed by means of a MM as well as a LCGM. However, although both MM and LCGM study individual developmental trajectories over time, LCGM classify these individual trajectories into homogeneous subgroups based on the trajectories themselves, whereas MM rely on theory to create such homogeneous groups a priori.


This chapter explained and compared mixed models and latent class growth models using existing life course data of the Amsterdam Growth and Health Longitudinal Study cohort. We combined a methodological focus with empirical findings and demonstrated the value of both techniques for life course researchers who aim to study (heterogeneity in) individual health trajectories over time. Both techniques can be used to study the individual development of a certain variable over time, but depending on the specific focus of the research question, either LCGM or MM is preferred.