The core of life course or lifespan research consists in studying how individual trajectories are shaped and unfold over time, from conception to death (Baltes 1987). Two concepts are fundamental in this endeavor: stability and change. Indeed, while certain individual characteristics remain constant across one’s lifespan (e.g., sex, ethnicity), others undergo profound change, to the point that they might mutate into other characteristics (e.g., health, cognitive capacities). Such changes need not be independent of each other. They may simply co-occur, in the sense that while they happen simultaneously, it is hard to infer causality mechanisms between them. They may also be intrinsically related, where one change process is a necessary antecedent of the other, which becomes the consequence.

The empirical study of lifespan development hence requires statistical models capable, at the very least, of (a) estimating constancy and change in information from a collection of observed measurements and (b) assessing the degree of interrelationship among constant and changing characteristics. Moreover, in a contextual paradigm typical of life course research, it is also highly desirable to ascertain which characteristics are intrinsic to the individual and which are influenced by, or stem from, external factors. Thus, the statistical models adopted in life course research face a number of challenging questions, which in turn can only be addressed with an appropriate data collection methodology.

The Rationale and Objectives of Longitudinal Research

There is common agreement that to study any changing phenomenon it is best to utilize longitudinal data, rather than cross-sectional data with only one time of measurement per individual. Baltes and Nesselroade (1979), in their seminal chapter, have defined the rationale of longitudinal research as “the study of phenomena in their time-related constancy and change” (p. 2). This conceptualization encompasses various types of methodologies applied across many disciplines, such as panel or wave designs, repeated measurement experiments, single-subject designs, and time series. The common feature of these designs is the presence of repeated observations on the same units of observation (usually individuals). For instance, the same individuals are assessed multiple times, necessarily at different time points, and the researcher’s interest lies in assessing what is constant, what changes across the multiple measurements, and what may explain the changes.

Following this rationale, Baltes and Nesselroade (1979) have outlined five objectives of longitudinal research. We repeat them here because these objectives motivate the use of longitudinal research and thereby should be directly addressed by statistical models applied to longitudinal data. To clarify, we provide a working example. The five objectives are:

  • Direct identification of intraindividual change. This objective aims at estimating change over (at least two) repeated assessments at the individual level, thereby identifying individual trajectories of change. The estimation of intraindividual change is at the very heart of longitudinal research and can only be approximated (usually unsatisfactorily) by analyzing single measurements of different individuals assessed at different time points. Cross-sectional data, thus, do not allow the exploration of this objective. A researcher interested, for instance, in assessing social inequalities in health trajectories across the lifespan (e.g., Cullati et al. 2014) will start out by assessing whether the health status of individuals changes across time. Individuals for whom intraindividual change is observed will not display flat trajectories.

  • Direct identification of interindividual differences (or similarities) in intraindividual change. In life course and lifespan research it is unrealistic to assume that entities are perfect replicates of each other. Hence, once intraindividual change is identified, this objective focuses on assessing whether individuals are similar to or differ from each other with respect to their specific intraindividual change patterns. Indeed, while all individuals may change in time, some may display increase in the intensity of the observed behavior in time, while others may show decrease. To continue the previous example, it may well be that for some individuals their health trajectories are rather stable in time, while for others their trajectories decline, evincing deterioration in general health, or rise, signifying health improvement. Lack of interindividual differences in intraindividual change would be reflected by parallel health trajectories, meaning that all individuals’ health changes equally.

  • Analysis of interrelationships in change. It is seldom that life course researchers focus on single variables, because rarely do important characteristics of our lifespan change in a completely independent and autonomous fashion. Most often the interest lies in understanding how multiple variables are associated, either occurring in parallel, or influencing each other. For instance, change in one observed behavior (e.g., declining health) may occur in parallel with other characteristics (e.g., aging, socioeconomic status, increase in body mass index, loss of income), or be more likely to come about if another behavior is previously triggered (e.g., loss of employment). Such interrelationships, again, cannot be approximated by cross-sectional methodology, unless the characteristics are inert and do not undergo any change. That is, static correlations between cross-sectional assessments of behaviors are usually bad approximations of the dynamics at play between these behaviors.

  • Analyses of causes (determinants) of intraindividual change. If intraindividual change can be identified (cf., first objective), it becomes highly pertinent to determine possible causes (or determinants) of such change. The passing of time is a convenient metric along which to describe and analyze change, but it seldom constitutes a theoretically sound explanation of any change process. While we know that in old age health generally declines, it is recognized that various causes are at play at very different levels (e.g., cellular, biological, epigenetic, psychological, social, all of which may correlate with aging).

  • Analyses of causes (determinants) of interindividual differences in intraindividual change. If interindividual differences in intraindividual change (second objective) are detected, we may wonder what causes such differences. It may well be that the causes of intraindividual change (fourth objective) are different across individuals (e.g., while an individual’s health improvement may primarily be due to change in marital status, another individual’s health improvement might be triggered by change in occupation status). This fifth objective seems promising with regard to addressing social inequalities in health trajectories across the lifespan.

The statistical models we discuss in this chapter are widely used in many disciplines because they define parameters that directly address the five objectives of longitudinal research. Moreover, recent extensions of these models broaden the scope of longitudinal research by allowing the exploration of additional questions, such as the presence of interindividual differences due to (known or unknown) group membership. These models have also been adapted to handle realities of longitudinal designs that make the whole research enterprise more arduous, such as unavoidable missing values.

A Linear Model of Change

Let \( Y_{ij} \) represent the measurement of \( Y \) at time \( i \) for individual \( j \). Assume that there are \( j = 1, 2, \ldots, n \) individuals composing a sample of size \( n \) and that this sample has been assessed at times \( i = 1, 2, \ldots, T \), hence \( T \) times. The usual statistical assumption for such a sample is that the \( n \) individuals are independent of each other, meaning that the scores \( Y_{ij} \) of an individual \( j \) are not related to the scores \( Y_{ij'} \) of another individual \( j' \), at any time point \( i \). Nevertheless, it is natural to expect that the score \( Y_{ij} \) of individual \( j \) at time \( i \) is correlated with the score \( Y_{i'j} \) of that same individual at another time point \( i' \), given that both depend on that individual’s characteristics. The first objective of longitudinal research motivates understanding how the repeated measurements \( Y_{ij} \) are related in time. In other words, we need a mathematical model that relates a meaningful variable that changes in time (hence assessed at multiple time points \( i \)) to the outcome of interest \( Y \), for every individual \( j \).

Historically, longitudinal data were first analyzed one individual at a time, separately, with what were called individual growth models (the term growth is most probably due to the fact that the first applications of this kind studied the physical growth of children undergoing puberty; see McArdle 2001). In other words, for \( n \) individuals, \( n \) models were computed separately. In this context, the repeated assessments were most frequently related to the aging of the individual, where \( a_{ij} \) is the age of individual \( j \) at time \( i \). Note that other definitions of time can be specified (e.g., occasions of measurement, assessment trials in a laboratory experiment, hours of the day when studying a circadian function). A simple model assuming that \( Y \) is linearly related to age (\( a_{ij} \)) is the simple linear regression model specified in Eq. 8.1. Because this model has to be estimated separately for each individual \( j \), we drop the subscript \( j \).

$$ {Y}_i={\pi}_0+{\pi}_1{a}_i+{E}_i \qquad (8.1) $$

In the end, the analyst will obtain, for each individual \( j \), a value for: (a) the intercept \( \pi_0 \), which defines the predicted value of \( Y \) when age = 0; (b) the slope \( \pi_1 \), which defines the predicted change (either positive for growth or negative for decline) of \( Y \) for each unit change (e.g., one year, one month) in age; (c) the standard error of the estimate, which is closely related to the variability of the errors \( E_i \) at each time point and which indicates the quality of the overall prediction of \( Y \) by age. Note that to interpret the intercept, it is customary to center age around a meaningful value, such as the sample average (by subtracting the sample average age from each individual’s age). The intercept is then the predicted \( Y \) value for an individual of average age. Alternative age scalings have been proposed and might allow for a more meaningful interpretation of the intercept in particular research situations (e.g., Mehta and West 2000; Wainer 2000). The assumptions of the model are that the errors \( E_i \) are normally distributed (i.e., they follow a normal, Gaussian curve) and that their variance does not depend on the values of age (i.e., homoscedasticity).

If the model is estimated for each individual, n estimates of these parameters are obtained. Of course, other mathematical relations between age and Y can be tested, such as polynomial or exponential functions (for instance, to model human growth, various exponential functions have been proposed). What must be kept in mind is that so far the analysis is individual-specific. At the first analytical step, the growth model is estimated for each individual. At the second analytical step, the individual estimates are subsequently summarized. Any conclusion about the overall sample would have to be inferred by summarizing the n estimates, for instance by calculating the average and the variance of the intercepts and of the slopes across all individuals. Note that the two steps are computed independently of each other.
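The two analytical steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated toy values (not from any study discussed here), the helper name `fit_line` is hypothetical, and only the standard library is used.

```python
import random
from statistics import mean, variance


def fit_line(ages, scores):
    """Ordinary least squares for one individual; returns (intercept, slope)."""
    a_bar, y_bar = mean(ages), mean(scores)
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (y - y_bar) for a, y in zip(ages, scores))
    slope = sxy / sxx
    return y_bar - slope * a_bar, slope


# Hypothetical toy data: 50 individuals observed at 5 ages, centered on the third.
rng = random.Random(1)
ages = [-2, -1, 0, 1, 2]
sample = []
for _ in range(50):
    pi0 = rng.gauss(3.0, 0.5)    # this individual's intercept
    pi1 = rng.gauss(-0.1, 0.05)  # this individual's slope
    sample.append([pi0 + pi1 * a + rng.gauss(0.0, 0.3) for a in ages])

# Step 1: one separate regression per individual (Eq. 8.1).
estimates = [fit_line(ages, scores) for scores in sample]

# Step 2: summarize the n pairs of estimates across the sample.
intercepts = [e[0] for e in estimates]
slopes = [e[1] for e in estimates]
print("mean intercept:", mean(intercepts), "variance:", variance(intercepts))
print("mean slope:", mean(slopes), "variance:", variance(slopes))
```

Note that the variances computed in the second step mix true interindividual differences with the sampling error of each individual regression, so they will tend to overstate heterogeneity when each individual contributes few observations.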

The Linear Mixed-Effects Model

In 1982 Laird and Ware proposed a model that allowed estimating simultaneously intercept and slope information at both the individual and the sample level. The model is an expansion of the individual growth model and is presented in Eq. 8.2.

$$ \begin{array}{c}{Y}_{ij}={\pi}_{0j}+{\pi}_{1j}{a}_{ij}+{E}_{ij}\\ {}{\pi}_{0j}={\beta}_0+{U}_{0j}\\ {}{\pi}_{1j}={\beta}_1+{U}_{1j}\end{array} \qquad (8.2) $$

Given that the repeated assessments of \( Y \) of all individuals are analyzed simultaneously, Eq. 8.2 requires the addition of the subscript \( j \), which identifies the individual. Moreover, this approach supposes one set of growth parameters, that is, one intercept and one slope, per individual, which again justifies the subscript \( j \) on both parameters \( \pi_0 \) and \( \pi_1 \). Technically, it is not correct to say that an intercept and a slope value are estimated for each individual. That is, the model does not explicitly estimate a \( \pi_{0j} \) and a \( \pi_{1j} \) value for each individual \( j \). The model presupposes, however, that each individual may have an intercept and a slope value that deviate from the central (population average) values, which are indicated by \( \beta_0 \) and \( \beta_1 \) and which are explicitly estimated. Also estimated are the interindividual variances of the individual deviations \( U_{0j} \) and \( U_{1j} \) around the central values, and possibly the covariance between \( U_{0j} \) and \( U_{1j} \). The variances of \( U_{0j} \) and \( U_{1j} \) are defined by the parameters \( \sigma^2_I \) and \( \sigma^2_S \), respectively, and their covariance by the parameter \( \sigma_{IS} \). Lastly, the errors of prediction \( E_{ij} \) are not individually estimated. These are often assumed to have a constant variance in time, estimated by the parameter \( \sigma^2_E \), and to be uncorrelated in time. In sum, then, in its most frequent specification, this model estimates six parameters: two central values, for the intercept, \( \beta_0 \), and the slope, \( \beta_1 \); two variances and a covariance of growth parameters, \( \sigma^2_I \), \( \sigma^2_S \), and \( \sigma_{IS} \); and an error variance, \( \sigma^2_E \). Figure 8.1 illustrates schematically the parameters of the linear mixed-effects model (LMEM).

Fig. 8.1 Schematic representation of the parameters estimated in the LMEM (Eq. 8.2). The thin lines represent the best-fitting trajectory of each individual, while the thick line represents the best-fitting trajectory based on the central values (representative of the overall sample). In this example the value of the slope, \( \beta_1 \), is positive. Note that the estimated intercept variance, \( \sigma^2_I \), and intercept-slope covariance, \( \sigma_{IS} \), depend on where age has been centered
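The role of centering can be made explicit with a short derivation. Suppose age is recentered at an arbitrary constant \( c \) (a symbol introduced here for illustration), so that \( a^{*}_{ij} = a_{ij} - c \). The first line of Eq. 8.2 can then be rewritten as \( Y_{ij} = (\pi_{0j} + c\,\pi_{1j}) + \pi_{1j} a^{*}_{ij} + E_{ij} \): the slope is unchanged, while the new intercept is \( \pi^{*}_{0j} = \pi_{0j} + c\,\pi_{1j} \). Under the assumptions of the linear model, it follows that

$$ \begin{array}{c}\mathrm{Var}\left({\pi}^{*}_{0j}\right)={\sigma}_I^2+2c\,{\sigma}_{IS}+{c}^2{\sigma}_S^2\\ {}\mathrm{Cov}\left({\pi}^{*}_{0j},{\pi}_{1j}\right)={\sigma}_{IS}+c\,{\sigma}_S^2\\ {}\mathrm{Var}\left({\pi}_{1j}\right)={\sigma}_S^2\end{array} $$

Hence, in this linear specification, the intercept variance and the intercept-slope covariance change with the centering point, whereas the slope variance does not.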

This model distinguishes two kinds of parameters: the central values, which are common to all individuals (\( \beta_0 \) and \( \beta_1 \)) and are called fixed effects, and the individual deviations from these central values (\( U_{0j} \) and \( U_{1j} \)), called the random effects. Again, the random effects and the individual errors (\( E_{ij} \)) are not directly estimated, but their variances and covariance (\( \sigma^2_I \), \( \sigma^2_S \), \( \sigma_{IS} \), and \( \sigma^2_E \)) are. Fixed effects thus apply to all individuals and are not subject to individual variations. Random effects, on the other hand, vary across individuals and are typically assumed to follow a normal (Gaussian) distribution. The model hence assumes that all random effects (\( U_{0j} \), \( U_{1j} \)) and errors (\( E_{ij} \)) are normally distributed (symbolized by \( U_{0j} \sim \mathcal{N}(0, \sigma^2_I) \), \( U_{1j} \sim \mathcal{N}(0, \sigma^2_S) \), and \( E_{ij} \sim \mathcal{N}(0, \sigma^2_E) \)), that the random effects may covary (\( \mathrm{Cov}(U_{0j}, U_{1j}) = \sigma_{IS} \)), that the errors do not covary with the random effects (\( \mathrm{Cov}(E_{ij}, U_{0j}) = \mathrm{Cov}(E_{ij}, U_{1j}) = 0 \)), and that the errors do not covary in time (\( \mathrm{Cov}(E_{ij}, E_{i'j}) = 0 \)).
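These distributional assumptions fully describe how data arise under Eq. 8.2, which can be illustrated by simulating from the model. The sketch below is a minimal illustration, not an estimation routine; the function name `simulate_lmem` and all parameter values are hypothetical, and correlated random effects are obtained by drawing the slope deviation conditionally on the intercept deviation.

```python
import math
import random


def simulate_lmem(n, ages, beta0, beta1, var_i, var_s, cov_is, var_e, seed=0):
    """Draw scores Y_ij from Eq. 8.2 for n individuals observed at the given ages."""
    rng = random.Random(seed)
    sd_i = math.sqrt(var_i)
    sd_e = math.sqrt(var_e)
    # U1 given U0 is normal with mean (cov_is / var_i) * U0 and reduced variance,
    # which reproduces Var(U1) = var_s and Cov(U0, U1) = cov_is.
    weight = cov_is / var_i
    sd_s_given_i = math.sqrt(var_s - cov_is ** 2 / var_i)
    data = []
    for _ in range(n):
        u0 = rng.gauss(0.0, sd_i)
        u1 = weight * u0 + rng.gauss(0.0, sd_s_given_i)
        pi0, pi1 = beta0 + u0, beta1 + u1  # individual intercept and slope
        data.append([pi0 + pi1 * a + rng.gauss(0.0, sd_e) for a in ages])
    return data


# Hypothetical parameter values, chosen only for illustration.
ages = [-2, -1, 0, 1, 2]
data = simulate_lmem(n=2000, ages=ages, beta0=3.0, beta1=-0.1,
                     var_i=0.25, var_s=0.01, cov_is=-0.02, var_e=0.09)
mean_at_center = sum(row[2] for row in data) / len(data)
print("average score at the centering age:", mean_at_center)
```

With a large simulated sample, the average score at the centering age should lie close to the fixed intercept, since the random effects and errors all have mean zero.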

The coexistence of fixed and random effects within the same model gives it the name of linear mixed-effects model. The term linear denotes that the function linking \( Y \) to the predictor \( a_{ij} \) is linear in its parameters, meaning that the parameters associated with the prediction of \( Y \) (on the right side of the equal sign in Eq. 8.2) are at most multiplied by a predictor and then added (i.e., the prediction is a linear combination of the parameters and the predictors), rather than, for instance, being exponentiated. This model has been developed in different disciplines, and is also known as the random-effects model (Laird and Ware 1982), the hierarchical linear model (Bryk and Raudenbush 1987), and the multilevel model (Goldstein 1989).

The LMEM of Eq. 8.2 has notable advantages over the individually estimated growth models of Eq. 8.1. First, the estimation is simultaneous: instead of involving two analytical steps, it is computed in a single step. All \( n \) individuals’ data are analyzed together, in a single analysis. This makes for a considerable gain in time. Second, the statistical tests obtained with the LMEM are superior to those of individually estimated growth models. More precisely, the Type I error rate of statistical tests is closer to its nominal value within the LMEM than in individual growth models, whose tests are usually too liberal (Snijders and Bosker 2012). Third, the LMEM allows for a statistical test of sample heterogeneity, which is not possible with the individual growth approach. Hence, statistical tests for the significance of the variances and covariance of the random effects (\( \sigma^2_I \), \( \sigma^2_S \), and \( \sigma_{IS} \)) are possible within the LMEM. This feature is extremely important, as these parameters represent heterogeneity in growth parameters, a concept often of chief interest from a theoretical perspective. Fourth, the LMEM can be extended to define parameters that operationalize the five objectives of longitudinal research: (a) intraindividual change is directly identified with the first line of Eq. 8.2, and \( \beta_0 \) and \( \beta_1 \) define the average intraindividual change function; (b) the intercept variance \( \sigma^2_I \) and the slope variance \( \sigma^2_S \) identify directly interindividual differences in intraindividual change; (c) in multivariate specifications of the LMEM it is possible to covary intercept and slope of one change process with those of another (MacCallum et al. 1997; more detail in the section “Three Notable Extensions”, below); (d) it is straightforward to add a predictor to the first line of Eq. 8.2 to test a determinant of intraindividual change; and (e) if intercept variance and slope variance are significant, it is straightforward to extend the LMEM to include a predictor of interindividual differences in intraindividual change (more detail in the section “Inclusion of Covariates”, below). In other words, the LMEM appears to provide explicit statistical tests directly associated with the rationale and objectives of longitudinal research enunciated by Baltes and Nesselroade (1979). Consequently, this statistical model addresses key theoretical questions of life course research.

The Latent Curve Model

Meredith and Tisak (1984) presented the precepts of how the LMEM defined above can be specified as a structural equation model. The work was later formalized by the same authors (Meredith and Tisak 1990), and applied by McArdle (1986). Structural equation modeling can broadly be defined as a set of statistical techniques aimed at testing hypothesized relationships among chosen variables. In this context, the Latent Curve Model (LCM) formalizes how a series of repeated measurements of variable \( Y \) for individual \( j \), represented by the vector \( \mathbf{Y}_j \), is related to the passing of time (or aging of the individual) as specified in Eq. 8.3 (note that it is customary to write the names of vectors and matrices in boldface).

$$ {\mathbf{Y}}_j=\varLambda {\boldsymbol{\upeta}}_j+{\mathbf{E}}_j \qquad (8.3) $$

For each individual \( j \), the \( T \) repeated measurements of \( Y \) are stacked in the vector \( \mathbf{Y}_j \) (of size \( T \times 1 \), corresponding to \( Y_{1j} \) to \( Y_{Tj} \) in Eq. 8.2). \( \boldsymbol{\eta}_j \) represents the vector (of size \( 2 \times 1 \)) of the two growth factors (or latent variables), intercept and slope (equivalent to \( \pi_{0j} \) and \( \pi_{1j} \) in Eq. 8.2). \( \Lambda \) is the matrix (of size \( T \times 2 \)) of the factor loadings associating the repeated observations with the growth factors. Note that \( \Lambda \) is not indexed by \( j \), indicating that its values are the same for all individuals. Finally, \( \mathbf{E}_j \) is the vector (of size \( T \times 1 \), corresponding to \( E_{1j} \) up to \( E_{Tj} \) in Eq. 8.2) of time-specific errors. The loadings associating the intercept with the measurements (i.e., the first column of \( \Lambda \)) are conveniently fixed at 1 (the multiplier of \( \pi_{0j} \) in Eq. 8.2). To specify linear change, the loadings of the slope factor (i.e., the second column of \( \Lambda \)) increase linearly with time, or age. Hence, in this basic specification, the elements of the loading matrix \( \Lambda \) are not estimated. For instance, if the sample is observed at five ages, the loadings of the intercept would be \( [1\ 1\ 1\ 1\ 1]' \) while those of the slope might be \( [-2\ -1\ 0\ 1\ 2]' \) (the values of \( a_{ij} \) in Eq. 8.2; the prime symbol \( ' \) stands for transposed). This is equivalent to centering the variable age on the third value in the LMEM. To clarify, we expand the general LCM notation in an illustration below (cf. Eq. 8.7).

The crucial parameter estimates of the LCM are associated with the growth factors \( \boldsymbol{\eta} \). The central values of the intercept and of the slope factor correspond to the fixed effects \( \beta_0 \) and \( \beta_1 \) of the LMEM; the variances of the two factors correspond to \( \sigma^2_I \) and \( \sigma^2_S \), and their covariance corresponds to \( \sigma_{IS} \). Finally, if the error variance is constrained to be constant in time, its estimate corresponds to \( \sigma^2_E \). In the end, it can be shown that, although they originated in rather different fields of statistics, the LMEM and the LCM as specified here are completely equivalent (e.g., Bauer 2003; Chou et al. 1998; McArdle and Hamagami 1996).
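One way to see the correspondence is to compute the mean vector and covariance matrix that the LCM implies for the repeated measurements. Under the assumptions stated above, the implied means are the loadings times the factor means, and the implied covariance matrix is the loadings times the factor covariance matrix times the transposed loadings, plus the constant error variance on the diagonal. The sketch below uses hypothetical parameter values and helper names (`matmul`, `implied_moments`) that do not come from the chapter.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def implied_moments(loadings, factor_means, factor_cov, var_e):
    """Model-implied mean vector and covariance matrix of the T repeated measures."""
    mu = [sum(l * m for l, m in zip(row, factor_means)) for row in loadings]
    loadings_t = [list(col) for col in zip(*loadings)]
    sigma = matmul(matmul(loadings, factor_cov), loadings_t)
    for t in range(len(sigma)):
        sigma[t][t] += var_e  # constant, uncorrelated error variance
    return mu, sigma


# Five ages centered on the third: intercept loadings of 1, slope loadings -2..2.
ages = [-2, -1, 0, 1, 2]
loadings = [[1, a] for a in ages]

# Hypothetical values: factor means (beta_0, beta_1) and factor (co)variances.
factor_means = [3.0, -0.1]
factor_cov = [[0.25, -0.02],
              [-0.02, 0.01]]
mu, sigma = implied_moments(loadings, factor_means, factor_cov, var_e=0.09)

for row in sigma:
    print([round(v, 3) for v in row])
```

Each diagonal element equals \( \sigma^2_I + 2a\,\sigma_{IS} + a^2\sigma^2_S + \sigma^2_E \) for the corresponding loading \( a \), which is the variance of \( Y_{ij} \) that follows from Eq. 8.2 as well.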

Inclusion of Covariates

The growth model implemented via the LMEM or the LCM allows for testing sample heterogeneity in the growth factors. For instance, are individuals different from each other with respect to their intercept score? This question is operationalized by testing the null hypothesis that the intercept variance is zero (\( H_0: \sigma^2_I = 0 \)). Likewise, we can ask whether there are individual differences in the change process, or whether the entities are different from each other with respect to their slope score. Testing the null hypothesis of zero slope variance (\( H_0: \sigma^2_S = 0 \)) addresses this question. If there appear to be interindividual differences in intercept and in slope (i.e., if the two variances are different from zero), a natural question is whether the intercept and the slope scores are related. This question is addressed by the null hypothesis of zero covariance (\( H_0: \sigma_{IS} = 0 \)).

If sample heterogeneity in intercept and/or slope appears significant, we can wonder whether a given characteristic of the entity may influence the growth factors. This question can easily be addressed within the LMEM and LCM by expanding the models to include predictors. There are two kinds of predictors. Time-varying predictors are those that vary across time, such as a medical or physical measurement taken at each assessment time, and hence require not only a subscript j to determine the individual, but also a subscript i to specify their time of assessment. These predictors are also called time-dependent or level 1 and may help explain intraindividual change. Time-invariant predictors, on the other hand, are individual specific and do not change in time, such as sex. These are simply denoted with a subscript j and are also called level 2, and may help explain interindividual differences in intraindividual change.

The LMEM can be expanded to include a time-varying predictor \( x_{ij} \) and a time-invariant predictor \( z_j \) as shown in Eq. 8.4.

$$ \begin{array}{c}{Y}_{ij}={\pi}_{0j}+{\pi}_{1j}{a}_{ij}+{\pi}_2{x}_{ij}+{E}_{ij}\\ {}{\pi}_{0j}={\beta}_{00}+{\beta}_{01}{z}_j+{U}_{0j}\\ {}{\pi}_{1j}={\beta}_{10}+{\beta}_{11}{z}_j+{U}_{1j}\end{array} \qquad (8.4) $$

In the LCM the inclusion of a time-varying and a time-invariant predictor is shown in Eq. 8.5 (in which we use the notation that is standard in the structural equation modeling literature; e.g., Bollen and Curran 2006).

$$ \begin{array}{c}{\mathbf{Y}}_j=\varLambda {\boldsymbol{\upeta}}_j+B{\mathbf{x}}_j+{\mathbf{E}}_j\\ {}{\boldsymbol{\upeta}}_j=\boldsymbol{\upalpha} +\boldsymbol{\Gamma} {\xi}_j+{\boldsymbol{\upzeta}}_j\end{array} \qquad (8.5) $$

Equation 8.5 specifies that the outcome \( \mathbf{Y}_j \) is not only influenced by the growth factors \( \boldsymbol{\eta}_j \), but also by an observed time-varying covariate \( \mathbf{x}_j \), through a regression weight specified in the scalar \( B \) (equivalent to \( \pi_2 \) in Eq. 8.4). The vector \( \mathbf{x}_j \) has \( T \) rows and one column, while the scalar \( B \) is a single number. The growth factors \( \boldsymbol{\eta}_j \) may be influenced by a time-invariant covariate \( \xi_j \) (\( z_j \) in Eq. 8.4) via the regression intercepts \( \boldsymbol{\alpha} \) (\( \beta_{00} \) and \( \beta_{10} \) in Eq. 8.4) and the regression weights \( \boldsymbol{\Gamma} \) (\( \beta_{01} \) and \( \beta_{11} \) in Eq. 8.4). Given that the prediction of \( \boldsymbol{\eta}_j \) by \( \xi_j \) will likely not be perfect in practice, Eq. 8.5 includes the regression errors \( \boldsymbol{\zeta}_j \) (\( U_{0j} \) and \( U_{1j} \) in Eq. 8.4). Again, it can be shown that the models in Eqs. 8.4 and 8.5 are equivalent.
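The role of each coefficient becomes more visible when the second and third lines of Eq. 8.4 are substituted into the first, which gives a single combined equation:

$$ {Y}_{ij}={\beta}_{00}+{\beta}_{01}{z}_j+{\beta}_{10}{a}_{ij}+{\beta}_{11}{z}_j{a}_{ij}+{\pi}_2{x}_{ij}+{U}_{0j}+{U}_{1j}{a}_{ij}+{E}_{ij} $$

In this form, \( \beta_{01} \) shifts the intercept according to the time-invariant predictor, while \( \beta_{11} \) multiplies the product of \( z_j \) and \( a_{ij} \): the effect of a time-invariant predictor on the slope is an interaction between that predictor and age.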

Note that it is not necessary that a time-invariant predictor influence both the intercept and the slope. Just like in ordinary linear regression, multiple predictors, including interactions, can be added to the model. Also, the same substantive variable could be considered as time-varying (for instance, daily values of a characteristic) or as time-invariant (for example, average value of the same characteristic over multiple days). In such cases it is important to note that the meaning of the variable can change dramatically and special care must be taken when interpreting the results (Goldstein 2011; Raudenbush and Bryk 2002; Snijders and Bosker 2012). Also, the LCM allows for more flexibility when testing covariates’ effects. For instance, the effect of a time-varying predictor is typically specified to be the same at all time points in the LMEM. While relaxing this equality constraint requires some data manipulation in LMEM, this can easily be achieved in the LCM.

We expect the variability of the random effects and of the errors (i.e., \( \sigma^2_I \), \( \sigma^2_S \), and \( \sigma^2_E \)) to diminish as we add important predictors to the model. In other words, the unexplained variance of the models should decrease as we add important predictors (this effect can be tested statistically). Effect size measures, similar to the coefficient of determination (\( R^2 \)) in ordinary linear regression, have been proposed to quantify these effects (e.g., Snijders and Bosker 1994).
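The general idea behind such measures is a proportional reduction in unexplained variance between a baseline model and a model with added predictors. The sketch below illustrates that idea only; it is not a reproduction of the exact definitions in the cited work, which should be consulted directly. The function name and all variance values are hypothetical, and the combined measure assumes a model with a random intercept only.

```python
def proportional_reduction(var_baseline, var_model):
    """Share of a baseline variance that is no longer unexplained in a fuller model."""
    if var_baseline <= 0:
        raise ValueError("baseline variance must be positive")
    return 1.0 - var_model / var_baseline


# Hypothetical variance estimates, not taken from any table in this chapter.
baseline = {"var_i": 0.30, "var_e": 0.20}  # model without the added predictor
extended = {"var_i": 0.24, "var_e": 0.16}  # model with the added predictor

r2_intercept = proportional_reduction(baseline["var_i"], extended["var_i"])
r2_error = proportional_reduction(baseline["var_e"], extended["var_e"])
# Combined measure: reduction in the total unexplained variance of a single score.
r2_total = proportional_reduction(
    baseline["var_i"] + baseline["var_e"],
    extended["var_i"] + extended["var_e"],
)
print(round(r2_intercept, 3), round(r2_error, 3), round(r2_total, 3))
```

Because variance estimates can occasionally increase when a predictor is added, component-wise reductions of this kind may turn out negative in practice, which is one reason to prefer measures based on the total unexplained variance.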


Given the equivalence of the two models, when should an analyst use the LMEM and when the LCM? To a certain extent, the answer is a matter of preference. Given that the two approaches stem from different traditions, historically the statistical programs estimating these models were quite specific. Either they would estimate the LMEM (e.g., MIXED in SPSS, PROC MIXED in SAS, nlme and lme4 in R, mixed in Stata, HLM, MLwiN) or the LCM (e.g., LISREL, AMOS, Mx, OpenMx in R, sem in Stata, EQS, Mplus). More recently, some software packages allow for both kinds of estimation (e.g., R, LISREL, Mplus), although the optimization algorithms implemented in such software may not be the most efficient.

There are, however, empirical conditions in which the implementation of one model over the other may be indicated. Ghisletta and Lindenberger (2004) described the relative advantages and disadvantages of the LMEM and the LCM approaches and suggested some practical guidelines concerning the choice between the two. In brief, the LMEM may be preferable over the LCM when (a) the data set is not balanced (i.e., not every entity has been observed at all times and at the same time points; hence, each entity \( j \) has been observed \( T_j \) times, where \( T_j \) may vary across entities) and there are many incomplete data; (b) the theoretical functional form of change is a common mathematical function (e.g., linear, polynomial, exponential); (c) the structure of the error covariance matrix is common (e.g., independent-diagonal, auto-regressive, unstructured); (d) the theoretical relation between the growth factors (i.e., intercept and slope) can only be a covariance (rather than, for instance, a regression weight); and (e) the role of a time-invariant variable, characteristic of the individual (e.g., sex, ethnicity, age at study inception), is that of predicting the growth factors, rather than, for instance, correlating with them.

The LCM approach may be more indicated than the LMEM approach when (a) there is a limited amount of unbalanced and incomplete data; (b) the change function is more complex than common mathematical functions (but remains linear in its parameters) or, when unknown, may be estimated directly from the data; (c) the structure of the errors is of theoretical interest and requires flexible specifications; (d) the growth factors are not limited to simply covary but could be regressed on each other; and (e) the associations between external covariates and the growth factors are not limited to unilateral predictions.

These suggestions are meant as starting guidelines and not definite rules. Both approaches have undergone notable extensions, which greatly increase their flexibility in modeling change processes. For instance, the LMEM has been extended to relate the outcome (\( Y_{ij} \)) to the predictors (on the right side of Eqs. 8.2 or 8.4) through a so-called link function, with the outcome following a distribution from the exponential family. This model is called the Generalized Linear Mixed Model (GLMM; McCulloch and Searle 2001). Even more flexibility in the relationships between the outcome and the predictors has been introduced in the context of the so-called Nonlinear Mixed-Effects Models (Davidian and Giltinan 1995). The LCM has also undergone notable extensions. In particular, when applied to multiple variables, the model has been extended to include dynamic associations between the growth factors, making it possible to establish which variable, if any, drives the change of the other, hence determining the leader and the lagger within a multivariate dynamic system (McArdle and Hamagami 2001).

An Illustration from the Swiss Household Panel

To illustrate the LMEM and the LCM, we use data from the second sample of the Swiss Household Panel (SHP), which is a study based at the Swiss Centre of Expertise in the Social Sciences FORS. The project is financed by the Swiss National Science Foundation. The SHP is a nationally representative survey of households investigating trends in social dynamics among the Swiss population. Data are collected annually by Computer Assisted Telephone Interviewing. For this illustration, we focus on eight waves collected on 3,665 individuals, running from 2004 to 2011, and examine the evolution of self-reported health (SRH) over time. SRH is measured by the first question of the Short-Form-36 (“In general, do you think your health is…”, rated from 1 (excellent) to 5 (poor)). Albeit simple, this measure is interesting because it predicts several important health outcomes, such as sick leave, and mortality (Berkman and Syme 1979; Halford et al. 2012).

We reverse coded SRH, such that 1 represents poor health and 5 indicates excellent health. Independent variables are sex (\( sex_j \), coded 0 for men and 1 for women), a time-invariant predictor, and satisfaction with personal relationships (\( satrel_{ij} \), rated from 0, not at all satisfied, to 10, completely satisfied), a time-varying predictor. In addition, to examine intraindividual change in SRH and allow for the influence of sex on this change, we included time (\( time_{ij} \)) as a time-varying predictor, yielding the following model:

LMEM Notation

$$ \begin{array}{c}SR{H}_{ij}={\pi}_{0j}+{\pi}_{1j}tim{e}_{ij}+{\pi}_2 satre{l}_{ij}+{E}_{ij}\\ {}{\pi}_{0j}={\beta}_{00}+{\beta}_{01}se{x}_j+{U}_{0j}\\ {}{\pi}_{1j}={\beta}_{10}+{\beta}_{11}se{x}_j+{U}_{1j}\end{array} \qquad (8.6) $$

LCM Notation

$$ \begin{array}{c}\left(\begin{array}{c}SRH_{1j}\\ SRH_{2j}\\ SRH_{3j}\\ SRH_{4j}\\ SRH_{5j}\\ SRH_{6j}\\ SRH_{7j}\\ SRH_{8j}\end{array}\right)=\left(\begin{array}{cc}1& 0\\ 1& 1\\ 1& 2\\ 1& 3\\ 1& 4\\ 1& 5\\ 1& 6\\ 1& 7\end{array}\right)\left(\begin{array}{c}{\eta}_{1j}\\ {\eta}_{2j}\end{array}\right)+B\left(\begin{array}{c}satrel_{1j}\\ satrel_{2j}\\ satrel_{3j}\\ satrel_{4j}\\ satrel_{5j}\\ satrel_{6j}\\ satrel_{7j}\\ satrel_{8j}\end{array}\right)+\left(\begin{array}{c}{E}_{1j}\\ {E}_{2j}\\ {E}_{3j}\\ {E}_{4j}\\ {E}_{5j}\\ {E}_{6j}\\ {E}_{7j}\\ {E}_{8j}\end{array}\right)\\ {}\left(\begin{array}{c}{\eta}_{1j}\\ {\eta}_{2j}\end{array}\right)=\left(\begin{array}{c}{\alpha}_1\\ {\alpha}_2\end{array}\right)+\left(\begin{array}{c}{\varGamma}_1\\ {\varGamma}_2\end{array}\right)sex_j+\left(\begin{array}{c}{\zeta}_{1j}\\ {\zeta}_{2j}\end{array}\right)\end{array} \qquad (8.7) $$

We used the lme4 and OpenMx packages in the R environment (R Core Team 2013) to estimate the models (for the code, see the Appendix). Results are shown in Table 8.1 (all estimates are significant at the α = 1% level, unless otherwise indicated). SRH decreases slightly over time ($\beta_{10}$ in LMEM notation, cf. Eq. 8.6; $\alpha_2$ in LCM notation, cf. Eq. 8.7), and women report slightly poorer health ($\beta_{01}$; $\varGamma_1$). Furthermore, women and men show similar intraindividual change in SRH ($\beta_{11}$ and $\varGamma_2$ are non-significant). Satisfaction with personal relationships is positively associated with SRH ($\pi_2$; $B$): a 1-point increase in satisfaction with relationships is associated with a 0.044 (0.043 in the LCM) increase in SRH. Finally, compared to the error variance ($\sigma^2_E$), there is relatively large interindividual variability in SRH ($\sigma^2_I$) and small variability in intraindividual change ($\sigma^2_S$). The covariance between the random intercept and random slope (correlation = −0.38) indicates that individuals with higher initial SRH tend to decline faster (a larger negative slope) than individuals with lower initial SRH.
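
To see how the fixed effects combine into a trajectory, the following minimal Python sketch computes the model-implied mean SRH for men and women, with random effects and residuals set to zero. All parameter values are hypothetical placeholders chosen only to match the signs described in the text; they are not the Table 8.1 estimates. The single exception is the satisfaction coefficient of 0.044, which is reported above. The function name and the coding of time as 0 to 7 are likewise assumptions.

```python
def expected_srh(time, sex, satrel, b00, b01, b10, b11, pi2):
    """Model-implied mean SRH with random effects and residual set to zero."""
    intercept = b00 + b01 * sex   # pi_0j without U_0j
    slope = b10 + b11 * sex       # pi_1j without U_1j
    return intercept + slope * time + pi2 * satrel


# Hypothetical fixed effects, NOT the Table 8.1 estimates.
# Only pi2 = 0.044 is a value reported in the text.
params = dict(b00=3.8, b01=-0.05, b10=-0.02, b11=0.0, pi2=0.044)

for wave in range(8):  # time coded 0..7, assumed to map to 2004..2011
    men = expected_srh(wave, sex=0, satrel=8, **params)
    women = expected_srh(wave, sex=1, satrel=8, **params)
    print(f"{2004 + wave}: men {men:.3f}, women {women:.3f}")

# Effect of a 1-point difference in relationship satisfaction, all else equal.
gain = expected_srh(0, 0, 9, **params) - expected_srh(0, 0, 8, **params)
print(f"gain per satisfaction point: {gain:.3f}")
```

With a sex-by-time coefficient of zero, the two lines stay parallel, which is what a non-significant $\beta_{11}$ amounts to in practice.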

Table 8.1 Estimates of LMEM and LCM to predict evolution of SRH over time with sex and satisfaction with relationships as independent variables

As can be seen from the results displayed in Table 8.1, the solutions estimated from the LMEM and LCM are virtually identical. The slight differences are not meaningful and can be attributed to differences in the optimization algorithms adopted by the two approaches.

Three Notable Extensions

Both the LMEM and the LCM have been extended to accommodate three notable features. First, because of the practical realities of any longitudinal study, a statistical model aimed at analyzing longitudinal data must be able to handle incomplete data. For instance, in any long-term longitudinal study lasting several weeks, months, years, or decades, it is virtually impossible that all participants present at inception remain for the total duration. So-called micro-longitudinal studies (such as psychological experiments lasting a few minutes, during which individuals are repeatedly assessed) are not necessarily immune to this type of dropout mechanism, either. This gives rise to data being incomplete for some, often most, participants.

Treating the data as complete by limiting one’s analyses to the entities that provided all longitudinal assessments (so-called complete-case analysis) will most likely lead to biased estimation, because the subsample with complete data is usually a selective subset of the starting sample. That is, the parameter estimates obtained from the application of any statistical model to complete-case longitudinal data will likely differ systematically from the population parameters. Note that the chief determinant of the amount of estimation bias is not necessarily the quantity of incomplete data, but rather the nature of the process causing the data to be incomplete.
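
The selection effect can be made concrete with a small simulation. The Python sketch below is illustrative only: the data-generating values, the logistic dropout rule, and all names are assumptions, not features of the SHP data. It generates eight waves of a health score, lets the risk of dropping out rise as the current score falls, and then compares the last-wave mean of all simulated individuals (observable only in a simulation) with the last-wave mean of those who never dropped out.

```python
import math
import random
from statistics import mean


def simulate(n=5000, waves=8, seed=1):
    """Compare last-wave means for the full sample and for completers only."""
    rng = random.Random(seed)
    last_all, last_completers = [], []
    for _ in range(n):
        level = rng.gauss(4.0, 0.6)      # hypothetical individual intercept
        change = rng.gauss(-0.03, 0.03)  # hypothetical individual slope
        stayed = True
        for t in range(waves):
            y = level + change * t + rng.gauss(0.0, 0.4)
            if stayed and t < waves - 1:
                # Assumed dropout rule: lower current health, higher dropout risk.
                p_drop = 1.0 / (1.0 + math.exp(2.0 * (y - 2.5)))
                stayed = rng.random() >= p_drop
        last_all.append(y)  # known for everyone only because data are simulated
        if stayed:
            last_completers.append(y)
    return mean(last_all), mean(last_completers), len(last_completers) / n


mean_all, mean_completers, retained = simulate()
print(f"retained until last wave: {retained:.1%}")
print(f"last-wave mean, everyone:   {mean_all:.2f}")
print(f"last-wave mean, completers: {mean_completers:.2f}")
```

Because individuals with low scores leave more often, the completers' mean overstates the mean of the full sample. Changing the dropout rule, rather than the proportion retained, is what changes the size of this gap.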

Second, the frequent application of the LMEM and LCM to single variables assessed repeatedly has motivated multivariate applications to address relations among change processes. It is not realistic for a substantive life course researcher to theorize about single behaviors or attitudes in isolation. From a theoretical perspective, it is usually more interesting to study interrelationships between multiple variables, to assess, for instance, whether an entity changing on one behavior also tends to change on another. Multivariate extensions of both the LMEM and the LCM easily address such questions.

Third, it is often illusory to assume that the samples at hand represent homogeneous groupings of entities. We may have reasons to believe that known characteristics of the entities influence their change process. Males and females, the wealthy and the poor, or those with a low and those with a high level of education may differ from each other. Rather than considering such characteristics as covariates in the LMEM and LCM, it is possible to use them to create subsamples, which are then compared with respect to all parameters of the models. Moreover, group membership may not be known in advance but may instead be uncovered from the data in a rather exploratory fashion. This last application has received wide attention in several disciplines studying the life course.

Incomplete Data

Virtually every longitudinal study faces the reality of not having all assessments at all time points on all participants. For instance, in the illustration above, by the end of the study 50.4% of the initial participants were no longer in the study. Modern methods allow obtaining unbiased estimates despite the incompleteness of the outcome data Y. Under specific conditions (see Rubin 1976), any variable X that may be related to the reasons for data incompleteness in Y is called informative or auxiliary. For instance, older and less healthy participants are more likely to drop out of a longitudinal study than younger and healthier participants. Thus, age and health are informative X variables that are related to the probability of not providing outcome Y values. Informative variables can be added to a longitudinal model to decrease estimation bias. This is why it is of chief importance to measure as many potentially informative X variables as possible at the first wave of a longitudinal study, when attrition (i.e., incomplete data in Y) has not yet occurred. This information may be used in subsequent waves to lessen estimation bias (for more detail, see Graham 2009, and Schafer and Graham 2002). In the previous illustration, satisfaction with personal relationships was related to the probability of dropping out, so that its inclusion in the model not only addresses an interesting substantive issue but also reduces overall estimation bias.

In some situations it is desirable to have incomplete data, mainly to avoid other, more deleterious effects. For instance, to understand the interrelationships among a very large set of variables, it is not necessary to administer all variables to all participants. Doing so could introduce major fatigue and demotivation effects, which are likely to lower the validity and reliability of the assessments. Rather, it is possible to randomly create subgroups of participants, each of which is administered a reduced set of variables. What matters is that the subsets of variables overlap across the participants, so that all interrelationships among the variables can be estimated. If carefully planned, then, the administration time and participants’ burden can be reduced dramatically, without increasing parameter bias or requiring informative variables (McArdle 1994).
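
The overlap requirement can be checked mechanically. The Python sketch below uses a three-form layout as one possible planned-missingness design; this layout, the block and item names, and the number of participants are all assumptions made for illustration and are not taken from the cited source. It verifies that every pair of items is administered together in at least one form and reports the share of items each participant answers.

```python
import random
from itertools import combinations

# Hypothetical item blocks; names and sizes are illustrative only.
blocks = {
    "X": ["x1", "x2"],
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
}
# Assumed three-form layout: every form gets X and omits one of A, B, C.
forms = {1: ["X", "A", "B"], 2: ["X", "A", "C"], 3: ["X", "B", "C"]}


def items_in(form):
    """Return the set of items administered in the given form."""
    return {item for block in forms[form] for item in blocks[block]}


all_items = sorted(item for block in blocks.values() for item in block)

# A pair is estimable only if at least one form administers both items.
uncovered = [
    pair for pair in combinations(all_items, 2)
    if not any(set(pair) <= items_in(form) for form in forms)
]

# Share of the full item set that each participant actually answers.
burden = {form: len(items_in(form)) / len(all_items) for form in forms}

# Random assignment of (hypothetical) participant ids to forms.
rng = random.Random(0)
assignment = {pid: rng.choice(sorted(forms)) for pid in range(12)}

print("uncovered pairs:", uncovered)
print("burden per form:", burden)
```

An empty list of uncovered pairs means that every covariance among the items is observed in some subgroup, while each participant answers only three quarters of the items. Because assignment to forms is random, the resulting incompleteness is unrelated to participants' characteristics.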

In longitudinal research this feature may be of crucial importance. Researchers interested in wide segments of the life course (say, young adulthood, from age 18 to 35 years) can probably not afford to observe an age-homogeneous sample during the entire time epoch of interest (that is, study a sample of 18-year-old participants for 17 years, until they are 35 years old). By means of so-called age-convergence analyses (Bell 1953, 1954) and cohort sequential designs (Schaie 1977), it is possible to reduce the overall length of the study without having to shorten the time epoch of substantive interest. For instance, three samples, of 18-, 23-, and 28-year-old participants, may be observed for seven years. The total interval of time is now covered in seven rather than 17 years, and the partial age overlap of the three samples (18–25, 23–30, and 28–35 years) allows testing whether the samples develop alike (i.e., convergence).
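
The arithmetic of this example can be laid out in a few lines of Python. The sketch uses only the starting ages and the follow-up length given above; the variable names are arbitrary.

```python
start_ages = [18, 23, 28]   # ages at first assessment, from the example above
follow_up_years = 7         # each sample is observed for seven years

# Age range covered by each sample.
age_ranges = [(age, age + follow_up_years) for age in start_ages]

# Total age span covered by the design as a whole.
covered = (min(lo for lo, _ in age_ranges), max(hi for _, hi in age_ranges))

# Overlap between adjacent samples: (start of older sample, end of younger one).
overlaps = [
    (age_ranges[k + 1][0], age_ranges[k][1])
    for k in range(len(age_ranges) - 1)
]

# The design has no age gap if every overlap interval is non-empty.
no_gaps = all(lo <= hi for lo, hi in overlaps)

# Years a single 18-year-old sample would need to cover the same span.
single_cohort_years = covered[1] - covered[0]

print("age ranges:", age_ranges)
print("covered span:", covered)
print("overlaps:", overlaps, "no gaps:", no_gaps)
print("study length:", follow_up_years, "instead of", single_cohort_years)
```

The two overlap intervals (ages 23 to 25 and 28 to 30) are the only regions in which convergence between adjacent samples can be tested directly.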

The software implementing the LMEM is more flexible than that of LCM analyses when dealing with incomplete Y data. Indeed, the optimization algorithms used to estimate parameters with incomplete data sets are usually much more efficient for the LMEM. Nevertheless, the LCM allows for a more flexible treatment of informative covariates, so that in problematic instances of incomplete data the LCM may be the preferred strategy (Graham 2003). However, neither the LMEM nor the LCM compensates directly for missing X values. In that case, two valuable alternative strategies are either to apply multiple imputation first, to impute all incomplete X and Y values, and then apply the longitudinal model (e.g., Carpenter and Goldstein 2004), or to estimate the model in the Bayesian framework (e.g., Muthén and Muthén 1998–2012).

Multivariate Extensions

By definition, a statistical model for longitudinal data must involve multiple variables, in that the outcome of interest must be observed at least twice (at times i = 1 and i = 2). Despite this technicality, it is common to call a longitudinal model applied to a single outcome univariate, denoting that a single Y outcome is analyzed. It follows that a multivariate longitudinal model is one that analyzes at least two outcomes (Y 1 and Y 2 ). In its simplest specification, a multivariate LMEM supposes an intercept and a slope growth factor for each outcome, which covary freely (MacCallum et al. 1997). This allows assessing the degree of communality of multiple change processes. Note that the functional form of growth need not be the same across the outcomes. In the previous illustration it would be possible to study change in both self-rated health and satisfaction with personal relationships. A multivariate LMEM might specify an intercept and a slope for both outcomes and estimate all six covariances among the four growth factors.
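
To make the count explicit: two outcomes with an intercept and a slope each give four growth factors, whose covariance matrix holds four variances and $4 \times 3 / 2 = 6$ covariances. The symbols below ($I$ and $S$ for intercept and slope, subscripts $h$ for health and $r$ for relationship satisfaction, and $\Psi$ for the covariance matrix) are labels introduced here for illustration and are not part of the notation used above:

$$ \Psi = \left(\begin{array}{cccc} \sigma^2_{I_h} & & & \\ \sigma_{I_h S_h} & \sigma^2_{S_h} & & \\ \sigma_{I_h I_r} & \sigma_{S_h I_r} & \sigma^2_{I_r} & \\ \sigma_{I_h S_r} & \sigma_{S_h S_r} & \sigma_{I_r S_r} & \sigma^2_{S_r} \end{array}\right) $$

Only the lower triangle is shown because the matrix is symmetric. The element $\sigma_{S_h S_r}$, for example, expresses the extent to which individual change in health goes together with individual change in satisfaction.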

The LCM can also be easily adapted to multiple outcomes. Here, the association between the growth factors is not limited to symmetrical effects (i.e., covariances), but may be freely specified by the analyst. For instance, it is possible to let the growth factors of an earlier process predict those of a later process (rather than simply covary with them; Singer and Willett 2003). Such a relation cannot be tested within a multivariate LMEM. In the previous illustration we might want to assess the effect of the degree of satisfaction with personal relationships, and of change therein, on self-rated health. To this end, we might define a multivariate LCM with satisfaction assessments from 2004 to 2007 and health assessments from 2008 to 2011. We could then estimate the effects of the intercept and slope of the former (and temporally preceding) construct on the intercept and slope of the latter.

Another important multivariate extension of the LCM stems from its implementation within the structural equation modeling framework, where it is possible to define latent variables based on the common, shared variance of a chosen set of observed variables via a common factor model (Spearman 1904). Assuming the common factor can be defined at each wave of measurement of a longitudinal study, the outcome of interest is no longer an observed variable, but the latent variable itself. An LCM (with growth factors of higher order) can hence be specified to assess the growth of the common factor (McArdle called this extension a Curve of Factors Model; McArdle 1988). For instance, rather than relying on a single health question, multiple questions (and/or objective health measurements) could be assessed to define a common health factor. The LCM would then study the change trajectory not of the single health assessments, but of the common health factor.

Specific multivariate extensions of the LMEM/LCM are particularly useful when trying to assess causal relations between multiple outcomes. For instance, the LCM can be modified to define multiple slope factors, each acting between two adjacent assessments i-1 and i. Then, each assessment at time i-1 can influence the immediately upcoming change between i-1 and i. In a bivariate setting it is then possible to estimate, for instance, the influence of variable A at time i-1 on the change in variable B between i-1 and i, and vice versa. This analysis would allow determining whether satisfaction with personal relationships drives changes in self-assessed health, or vice versa, or whether both variables influence each other’s change. Similar extensions are possible within the multivariate LMEM, with the inclusion of instrumental variables to predict time-invariant and/or time-varying confounders (e.g., Skrondal and Rabe-Hesketh 2004). However, for this extension, as well as for all models discussed here, it is important to bear in mind that they cannot be considered proof of causal relationships, because it is impossible to ensure that no other construct or variable is causing such relationships.
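
One way to write such a bivariate specification, given here only as a sketch in the spirit of latent change score models, is shown below. All symbols are assumptions introduced for illustration and do not appear in the models above: $A_{ij}$ and $B_{ij}$ are the two outcomes for individual $j$ at occasion $i$, $s_{Aj}$ and $s_{Bj}$ are individual constant change components, $\beta_A$ and $\beta_B$ are self-feedback parameters, and $\gamma_{AB}$ and $\gamma_{BA}$ are coupling parameters. Measurement error is omitted for brevity.

$$ \begin{array}{l} A_{ij} - A_{(i-1)j} = s_{Aj} + \beta_A\, A_{(i-1)j} + \gamma_{AB}\, B_{(i-1)j} \\ B_{ij} - B_{(i-1)j} = s_{Bj} + \beta_B\, B_{(i-1)j} + \gamma_{BA}\, A_{(i-1)j} \end{array} $$

Under this parameterization, $\gamma_{AB} \neq 0$ together with $\gamma_{BA} = 0$ would suggest that the level of B precedes change in A but not the reverse, whereas two nonzero coupling parameters would suggest reciprocal influences.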

Multiple Group Analyses

The specifications of growth models discussed thus far assume that the sample is homogeneous with respect to the growth process, meaning that all entities within the sample follow the same functional form of growth (albeit allowing for interindividual differences with respect to the magnitude of the growth factors – i.e., random effects). Adding a time-invariant covariate that clearly splits the overall sample into subsamples (e.g., men and women), as discussed in Eqs. 8.4 and 8.5, allows for a difference in mean intercept and mean slope between men and women. However, all other parameters and the shape of change are constrained to be equal across the sexes. The adequacy of this constraint can be tested. A multiple-group analysis, by sex, of the previous illustration would reveal that men and women have different mean intercepts of health (with women’s lower than men’s), but similar mean slopes (consistent with the non-significant time-by-sex interaction in Table 8.1).

Splitting the sample into two subsamples allows freeing the equality constraints of all remaining parameters, as well as recovering group-specific growth features that would otherwise go unobserved. For instance, in a clinical trial we would expect a treatment group to react differently from a control group, not just with respect to the mean intercept and mean slope, but much more generally. The control group may not undergo any change and remain constant. Hence, its growth model may be reduced to an intercept-only model. The treatment group would probably undergo marked change as a result of the treatment, which would require the growth model to specify factors to this effect. Such a group difference cannot be modeled by considering group membership as a simple covariate.

In yet other cases we may suspect the existence of different subgroups with respect to the analyzed change process, but know neither the groups’ characteristics nor individual group membership. Thus, in a standard LCM, we cannot split the sample into subsamples according to the values of a known grouping variable. In this case, the grouping variable is assumed to be latent or unobserved, and its values are unknown. Recent developments have extended the LCM to allow uncovering subgroups based on latent grouping variables. Latent class growth analyses (Nagin 1999) and growth mixture models (Muthén et al. 2002) are two such models, and both are currently implemented only in the LCM framework. Advanced versions of this approach can also be applied to state-trait type models, in order to uncover groups of individuals based on their degree of variability vs. stability (Courvoisier et al. 2007).
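
The idea of a latent grouping variable can be sketched as a finite mixture. The notation below is introduced here for illustration only: $K$ is the number of latent classes, $p_k$ the class proportions, $f_k$ the class-specific densities with parameters $\theta_k$, and $\mathbf{y}_j$ the vector of repeated measures of individual $j$.

$$ f(\mathbf{y}_j) = \sum_{k=1}^{K} p_k\, f_k(\mathbf{y}_j \mid \theta_k), \qquad \sum_{k=1}^{K} p_k = 1, \quad p_k \geq 0 $$

In a growth mixture model each $f_k$ is itself an LCM with its own growth factor means and (co)variances. Latent class growth analysis is the special case in which the growth factor variances within each class are fixed to zero, so that all individuals in a class share one trajectory up to residual error.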


The LMEM and the LCM are statistical models of change that have permeated many scientific disciplines. Today they are widely used, thanks also to excellent reviews and textbooks, often supplemented by online material (e.g., Collins 2006; Duncan et al. 2006; McArdle and Nesselroade 2014; Pinheiro and Bates 2000; Singer and Willett 2003). Additionally, university curricula in psychology, sociology, demography, gerontology, and other disciplines related to the study of the life course have started integrating courses or workshops that discuss, among other statistical models, the LMEM and the LCM.

We believe that disciplines concerned with the study of the life course and the lifespan can greatly benefit from using the LMEM and the LCM. At the same time, we do not conceive of statistics as an independent academic discipline that mainly provides tools to substantive researchers. The many theoretical and methodological challenges of life course research call for flexible statistical tools. Ideally, statisticians and substantive life course researchers engage in dialogue and make progress together. Increased use of the LMEM and the LCM by life course researchers will motivate statisticians to extend both models further. With this chapter we hope to have further encouraged this dialogue.