Identifying typical trajectories in longitudinal data: modelling strategies and interpretations

Individual-level longitudinal data on biological, behavioural, and social dimensions are becoming increasingly available. Typically, these data are analysed using mixed effects models, with the result summarised in terms of an average trajectory plus measures of the individual variations around this average. However, public health investigations would benefit from finer modelling of these individual variations which identify not just one average trajectory, but several typical trajectories. If evidence of heterogeneity in the development of these variables is found, the role played by temporally preceding (explanatory) variables as well as the potential impact of differential trajectories may have on later outcomes is often of interest. A wide choice of methods for uncovering typical trajectories and relating them to precursors and later outcomes exists. However, despite their increasing use, no practical overview of these methods targeted at epidemiological applications exists. Hence we provide: (a) a review of the three most commonly used methods for the identification of latent trajectories (growth mixture models, latent class growth analysis, and longitudinal latent class analysis); and (b) recommendations for the identification and interpretation of these trajectories and of their relationship with other variables. For illustration, we use longitudinal data on childhood body mass index and parental reports of fussy eating, collected in the Avon Longitudinal Study of Parents and Children. Electronic supplementary material The online version of this article (10.1007/s10654-020-00615-6) contains supplementary material, which is available to authorized users.


Introduction
Repeated observations of the same variable over time are increasingly frequent not only in purposely designed observational studies but also in large linked administrative health databases. In most applications, this type of data is analysed using mixed effects models [1,2], leading to estimates of a population average trajectory, parametrised in terms of fixed effects, and the variation of the individual trajectories around this average. The latter is captured by the variances and covariances of subject-specific random effects. More recently, the focus of modelling such data has moved towards investigating whether there are multiple typical trajectories (see for example adolescent smoking [3], treatment response [4] and comorbidity [5]), leading to the characterisation of latent subgroups of individuals who share a common profile over time. Such groups are often referred to as "phenotypes" (e.g., early onset versus late onset of illness). Aiming to classify individuals into subgroups based on their longitudinal data has been described as being a person-centred approach, as opposed to the variable-centred approach typical of many regression analyses [6]. Often however these latent classes are studied in relation to explanatory variables [7][8][9] and/or later outcomes [10][11][12], and thus a person-centred classification may itself become a variable in a regression model, thereby blurring this distinction.
There are several modelling approaches that focus on identifying these trajectories, with alternative strategies available to relate them to earlier variables or later outcomes. The common feature of these approaches is that they all assume that a latent variable, composed of several classes, underlies the heterogeneity in how the variables evolve over time. These common approaches are: 1. Growth mixture models 2. Latent class growth analysis, also known as group-based trajectory models

Longitudinal latent class analysis
In this paper, we provide an overview of these three approaches and compare them in terms of assumptions, feasibility, and interpretation of the derived classes using mixed effects models as a reference. Another class of methods for the identification of latent trajectories are generalizations of cluster analysis (e.g., extentions of k-means clustering to longitudinal data [13]). As these methods do not invoke models, but rather rely on algorithms to classify individuals, they are not considered here. Their performance, however, has been found to be closely related to that of latent class growth analysis when trajectories vary smoothly with time [14].
To discuss the practical implications of adopting each of these modelling approaches above, and to illustrate how differences in resulting classes may derive, we analyse data derived from the Avon Longitudinal Study of Parents and Children (ALSPAC [15,16]).

Mixed effects models
Mixed effects models when applied to longitudinal data, relate outcomes collected on the same individual to their observation times, allowing for the shape of this relationship to vary across individuals. Consider a single outcome variable, Z ij , observed on individual i at times t ij , where i = 1, 2, …, N, and j = 0, 1, …, J. A typical specification of a mixed effects models for continuous outcomes, assuming a linear relationship with time, and the same observation times for all individuals, t j , is where 0i and 1i are individual-specific coefficients, which have fixed ( 0 and 1 ) and random ( u 0i and u 1i ) components, with 0i = 0 + u 0i and 1i = 1 + u 1i . The fixed coefficients 0 and 1 are shared by all individuals, while the error terms u i = u 0i , u 1i are unobserved random variables that capture the individual departures from the population average trajectory, 0 + 1 t j . The error terms u i are usually assumed to be jointly normally distributed with mean zero and free covariance matrix Ω u , and the residual errors ij to be independently and normally distributed, conditionally on u i and t, with constant variance 2 . The ij capture the distance between the observed data for the i-th individual to the true individual-specific trajectory, ( 0i + 1i t j ) (Fig. 1a). Here we consider t j to indicate the actual observation time, so that the relationship with time is properly captured. When information is gathered in terms of waves, as in panel data, we would recommend translating this information into an appropriate time-scale.
(1) Z ij = 0i + 1i t j + ij , When Z ij is an ordered categorical variable, with (K + 1) categories, a mixed effects model is usually specified in terms of a latent continuous variable Z ′ ij specified as where 0i and 1i are defined as before but with the independent error ij following a logistic distribution with mean 0 and variance 2 3 (where is the constant representing the ratio of a circle's circumference over its diameter). The observed categorical variable Z ij is assumed to have been generated from this latent variable according to unobserved cut-points ("thresholds" The thresholds are the expected values of the latent variable Z ′ ij at which an individual transitions from a value k to a value (k + 1) on the categorical outcome variable Z ij .
Generalisations of models (1) and (2) that include nonlinear relationships with time are straightforward, likewise models where the coefficients for these additional non-linear terms include random components, as in Estimation is generally by maximum likelihood (ML, or restricted maximum likelihood when the study is small [17]), with the estimation-maximisation algorithm used in the presence of missing outcome data under the missing at random (MAR) assumption [18].
When individuals are observed at the same times t j , as assumed here, there is an alternative formalization of mixed effects models that arises from to the confirmatory factor analysis framework (and, more generally, the structural equation modelling [SEM] literature). This framework views the random coefficients of a mixed effects model as latent factors, "manifested" by the joint distribution of the longitudinal observations, Z i = Z i1 , Z i2 , , … , Z iJ [19]. Model (1) for example could also be written as where 0i and 1i are the original individual-specific coefficients that are now viewed as latent variables. The regression coefficients j (referred to as "factor loadings" in the SEM literature) are not estimated but are pre-determined according to the timing of the observations. For model (1) the factor loadings would be: etc. This representation of model (1) can be viewed graphically in Fig. 2a, where the factor loadings are shown above the arrows linking the latent individual-specific coefficients to the observed data. Adopting this approach has several advantages, in particular the option of using SEM software for estimation, and also extending the model for example by allowing the error terms ij to have time-specific variances, 2 j , or more complex extensions as discussed below.

Growth mixture models
Growth mixture models assume that there are multiple mixed effects models, each representing a subgroup (i.e. "class") of trajectories that share a common mean and shape (with, potentially, class-specific error variance structures) [20,21]. Growth mixture models are therefore generalisations of mixed effects models (Fig. 1b).
Formally, they are specified as follows. Let C indicate the number of latent classes in the population, distributed with probabilities p c , c = 1,…, C, with 0 ≤ p c ≤ 1 and ∑ C c=1 p c = 1 [22]. As the latent classes are unknown, we model the observed data using as a mixed effects model specific to the latent class c each individual belongs to, with the joint distribution of the data then being a mixture of these distributions, weighted by the probability of each class, p c . For example, a growth mixture model generalisation of model (1) and c ij are defined as before, although specifically for each class c. The graphical representation of this model is shown in Fig. 2b. Assuming that all classes have the same error structure may be unrealistic; therefore class-specific covariances Ω c u for the individual-level error terms are often considered.
For categorical variables, we would specify Z � ij|c = c 0i + c 1i t j + c ij , with c ij following a logistic distribution.
Because the number of classes is unknown, the estimation is carried out conditionally on a pre-specified number of classes. Estimation is by Maximum likelihood (ML) with the expectation-maximization (EM) algorithm because the classes are unobserved [23]. As several local maxima for the likelihood are expected to be found with such complex models, multiple starting points for the estimation routine are recommended, before maximization is deemed to have been reached [19]. Following estimation, posterior class probabilities can be derived and used to assign individuals to classes according to their largest value ("modal assignment"), or to weigh individuals when calculating predicted class frequencies.
In order to identify the number of classes that best fits the data, a number of goodness-of-fit criteria are compared. Those commonly recommended in the literature [24] are the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and its sample size-corrected version (c-BIC). For each of these, lower scores indicate (relatively) better fitting models. The parametric bootstrap likelihood ratio test (BLRT) has also been recommended as an additional comparative tool given its performance in simulations [25]. However it is disadvantaged by being computationally intensive and affected by poor performance in small samples [19]. These goodness-of-fit criteria do not necessarily agree, in the sense that they may not all point to selecting the same model. Hence, additional considerations are often invoked, such as interpretability of the latent trajectories, and the avoidance of too small classes (e.g. < 5% of the study population) that may lead to lack of reproducibility of the results. The quality of the classification of a model, the so-called "entropy", is also often reported, with values close to 1 indicating good classification. Specifically, this is a summary measure that captures how well class membership is predicted given the observed outcomes. However, this interpretation requires the model to be correct, and thus entropy values should not be overinterpreted [25].
As described, these criteria are applied sequentially on models with increasing numbers of classes using the same dataset. It has been suggested that cross-validation should be used instead [26]. This would involve fitting the model with a given number of classes on a subset of the data, followed by using the selected model on the remaining data and assessing its goodness of fit. A more sophisticated version of this would involve k-fold cross-validation. This approach, however, requires larger datasets than those usually available in typical epidemiological studies and would still depend on which goodness of fit criterion is used.

Latent class growth analysis
Latent class growth analysis [27] specifies models that are similar to growth mixture models. However, latent class growth analysis models assume no individual-level random variation within each class, and therefore individuals assigned to the same class share exactly the same trajectory.
Formally, latent class growth analysis specifies models with the same structure as model (5) but with fixed effects regression coefficients, albeit specific to each class. Denoting a latent class growth analysis class by s, this model is expressed as, where Z is a continuous variable and e s ij are independently distributed error terms. Because there is no within-cluster variation (i.e. there are no u s i and the class-specific coefficients s 0 and s 1 are the same for every member of class s), these error terms capture random perturbations of each observed data point from their class specific trajectory (Fig. 1c). The assumption that these errors are independently distributed, as implicit in most software [28,29], may be unrealistic however as one would expect individual trajectories that belong to the same class to be heterogeneous and the individual-specific departures from the class-specific trajectories to be correlated. Departures from this assumption can have consequences, as discussed in "Assumptions".

Longitudinal latent class analysis
These models are a variation of latent class growth analysis models that ignores the longitudinal nature of the data. The model for an individual belonging to the longitudinal latent class r is specified as, where I t=t j are dummy (0/1) indicators of the times when Z ij is observed (Fig. 1d). Hence, latent classes are identified without exploiting the information on the time order of the observations, but also without forcing any parametric relationship between the outcomes and time.

Assumptions
Mixed effects models, growth mixture models, and latent class growth analysis rely on parametric assumptions for the relationship between the observed outcomes and time. These models, together with longitudinal latent class analysis, rely on distributional assumptions for the error terms. Mixed effects models and growth mixture models make additional assumptions regarding the within-subject correlations (parametrized by Ω u and Ω c u , respectively). Violations of these assumptions have different consequences depending on the type of outcome and modelling approach. Misspecified distributions and correlation structures in mixed effects models do not impact on the consistency of the fixed effect estimates when the observed outcomes are continuous, but they may bias inferences [1]. If the outcomes are categorical, however, bias will affect the fixed effects estimates as well [1,30]. Non-parametric specifications of the random effect distributions have been proposed to deal with these issues [31], as described below.
The impact of these misspecifications may also influence the estimated number of classes of a growth mixture model. If, the assumed covariance structure is too simple, the number of classes may be greater because more are needed to capture the variability in the data [32]. For this reason, and as demonstrated in simulations [33], when selecting the number of classes for growth mixture models, one should in principle allow for general specifications, e.g., with class-specific covariance matrices Ω c u . and timespecific residual error variance 2 j [33]. How general these matrices can be, will be limited by the study size and may not be suitable with binary outcome data when their prevalence is low [33].
The assumption of independence for the residual errors e s ij , conditional on class s, which is usually made when performing latent class growth analysis, is most likely to be incorrect, especially when there are several observations per individual. Violations may lead to biased estimates of the class-specific regression coefficients [33] unless the classes are well separated, e.g. entropy > 0.8 [32]. Such bias is more prominent when the true covariance structure is complex, the study size is small (< 500), or the outcomes are binary [30].
Another assumption often made with longitudinal data is that of the outcome data being missing at random (MAR) [18]. This assumes that the propensity of missing an observation, possibly because of an individual dropping-out of the study, depends on the observed data only. If met, model estimation by ML (for mixed effects models), or ML with EM (for growth mixture models), based on incomplete data is not affected by selection bias [17,34]. It is often the case, however, that missingness depends on other variables, most commonly social factors. In such circumstances, one could include the predictors of missingness in the model, as discussed in "Relating latent classes to earlier explanatory variables or to later outcomes".

Interpretation
In interpreting the results of whichever approach, one has to take into consideration all of the issues described above. Of note is that latent class growth analysis models were initially proposed as a semi-parametric version of mixed effects models where the variation in trajectories around a single class is approximated by a number of fixed trajectories, as opposed to assuming jointly normally distributed random effects [35]. In other words, the classes are used to capture the overall variation so that, when the data are truly from a mixture of K classes (as in growth mixture models), a larger number of classes will be needed to extract the main features of the data when adopting latent class growth analysis [23,27]. Thus, interpreting the resulting classes as if they had a theoretical underpinning would be inappropriate in most settings. In contrast, growth mixture models distinguish the typologies represented by the latent classes from the within-class variation. Again, however interpretation should be cautious because of their stronger parametric assumptions.

Analytical strategy
These observations highlight the need for a comprehensive set of model specifications to be considered and then compared, ranging from single class mixed effects models to growth mixture models and then latent class growth analysis and longitudinal latent class analysis models, before concluding whether there are multiple trajectory types and what they capture.
As a first step, we would recommend fitting the most general mixed effects model that the data can identify in order to investigate the extent of between-individual heterogeneities. The distributions and correlations of the predicted random effects from such a model could then be used to aid the interpretation of the best fitting growth mixture and best fitting latent class growth analysis (or longitudinal latent class analysis) models. Comparing the classes predicted from these different model specifications, numerically and/or graphically, would also help clarify whether similar typologies emerge when adopting different modelling approaches.
If, even after allowing for the fact that some of the classes from a latent class growth analysis model actually will aim to capture the distribution of individual trajectories within a particular "true" class, little agreement is found, one should investigate whether model misspecifications might explain the discrepancies. As discussed in "Assumptions", these may lead to biased parameter estimates and/or incorrect selection of the number of classes. Examination of the distributions of the estimated time-specific residuals derived for each class might indicate for example that the model is not properly reflecting the data if they were found to be skewed. This would happen for example if the relationship with time were misspecified in one of the classes.

Relating latent classes to earlier explanatory variables or to later outcomes
Once classes are derived, it is possible to relate them to earlier explanatory variables or later outcomes. Any inferences drawn on these relationships, however, should account for the fact that the classes are not directly observed but derived under certain modelling assumptions. There are two main approaches to achieve this.
The first approach-the "1-step approach"-consists of extending the original model for the latent trajectories to include associations with the explanatory or the later outcome variables of interest. This is easily achieved within an SEM framework (Fig. 2c, d), with the joint estimation of the latent classes and their relationship with other variables (respectively the "measurement" and "structural" parts of the SEM model) accounting for the uncertainties of class assignment.
The second commonly used approach breaks down the estimation into three steps ("3-step approach"). The best fitting latent trajectory model is fitted (1st step) and then used to assign individuals to their most likely class using the predicted posterior probabilities of belonging to each class (2nd step). These classifications are then included as outcomes or predictors in the relevant new analyses, accounting for the uncertainty of the classification performed in step 2 (via the probabilities of the true class given the assigned class estimated in step 1) [36].
The first approach is not generally recommended when the aim is to relate explanatory variables to the latent classes because the identification of the latent classes is potentially affected by which variables are included in the model [37]. One exception to this concern is when the reason for including the covariates in a 1-step analysis is to meet the MAR assumption when the longitudinal outcome data are affected by missingness. In this case, one would want to condition on these covariates to avoid the bias that would arise from analysing incomplete data.
More serious concerns arise when relating latent classes to a later outcome, because in the latter case the outcome has the same direction of association with the classes as the longitudinal variables that lead to their identification (see Fig. 2d; [36]).
When the entropy of the latent class model is greater than 0.80, results from the 1-or 3-step approach have been found to be similar [36]. In practice, however, the 1-step approach may be unfeasible, especially when the longitudinal data are categorical, so that the 3-step approach should be adopted (with multiple imputation if missingness depends on covariates, and with the selection of the number of classes made from the most frequently best solution among the imputed sets).

Participants
We analysed data from the Avon Longitudinal Study of Parents and Children (ALSPAC), a population based, longitudinal cohort of mothers and their children born in the southwest of England, to illustrate the different modelling strategies. Details of the study are given elsewhere [15,16]. Briefly, all pregnant women expected to give birth between the 1st April 1991 and 31st December 1992 were invited to enrol in the study. From all pregnancies (n = 14,676), 14,451 mothers opted to take part, and 13,988 of their children were alive at 1 year. Analyses are restricted to girls only for simplicity, after randomly selecting one child per set when birth was from a multiple pregnancy. Please note that the study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool: http://www.brist ol.ac.uk/alspa c/resea rcher s/our-data/.

Longitudinal variables
We aimed to model the repeated measures of a continuous and an ordinal variable: a. Body mass index (BMI; in kg/m 2 ), objectively measured up to six times when participants were (around) 8, 10, 11, 12, 13, and 16 years. Height was measured to the nearest millimetre with the use of a Harpenden Stadiometer (Holtain Ltd.). Weight was measured with a Tanita Body Fat Analyzer (Tanita TBF UK Ltd.) to the nearest 50 g. b. Parental reporting of fussy eating consisted of responses to the question "How worried are you because your child is choosy?" for which there were three possible answers: "No/did not happen", "Not worried", and "A bit/greatly worried". These were observed up to eight times during the first ten years of life, specifically at around 1.3, 2.0, 3.2, 4.6, 5.5, 6.9, 8.7, and 9.6 years. A more detailed description of these data can be found in Herle et al. [38].

Explanatory variable
Birth weight (in kg) was used as the explanatory variable of interest in our examples. This variable was available on 4462 (99%) girls among those with at least one longitudinal BMI and on 5750 (99%) girls with at least one fussy eating measurement. Mean birth weight was 3.37 kg (SD = 0.51) and 3.36 kg (SD = 0.51) in these two subgroups. It was internally standardized using these means and SDs in the analyses.

Later outcome
Body fat mass index (FMI) [39] at age 18 years was the later outcome of interest. It was defined as the ratio of total body fat mass (in kg) over height (in metres) squared. Body fat was objectively measured using the Tanita Body Fat Analyser (Model TBF 401A) and height as described above. Data on FMI were available on 2443 (55%) girls with at least one longitudinal BMI measurement and 2464 (42%) girls with at least one longitudinal fussy eating measurement. Mean FMI was 21.57 kg (SD = 9.56) and 21.62 kg (SD = 9.52), respectively.

Computer code
Examples of Mplus and Stata code used for these analyses can be found in https ://githu b.com/Morit zHerl e/Ident ifyin g-typic al-traje ctori es-in-longi tudin al-data. Some of these analyses can also be performed in R (with the lcmm package); the relevant code can also be found in this depository.

Ethics
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Figure 3a shows the observed individual BMI trajectories for all participants with at least one BMI observation ("spaghetti plot"), while Fig. 3b shows the equivalent plot ("lasagne plot") for the categorical fussy eating variable, with a change in colour along time representing a change in category. Both variables show considerable and increasing variation over time, as well as an increasing frequency of missing data. Details of the completeness of the longitudinal BMI data are given in Supplementary Tables 1 and 2; they highlight that the majority of the participants included in these analyses had six data points and are therefore quite complete. A total of 4517 girls had at least one longitudinal BMI measure and 5824 girls had at least one longitudinal parental report of fussy eating. In the following, we assume that MAR was satisfied and included in the analyses all girls with at least one longitudinal observation of the relevant outcome variable.

Mixed effects models
As a first step, we fitted mixed effects models to the longitudinal BMI measures, with age capturing the dependence on time. Given the observed trajectories, we specified models that included a linear and quadratic term for age and correlated random effects for the intercept and the two slopes of the linear and the quadratic age term, before considering simpler specifications. The resulting best fitting model had random intercepts, and random coefficients for the linear and the quadratic age term (with freely estimated covariances and residual variances; details in Supplementary Table 3).

Growth mixture models
Mixed effects models can be viewed as single class growth mixture model. Hence, their goodness of fit can be compared with that obtained from growth mixture models with increasing numbers of classes, allowing for class-specific covariance structures [33]. We compared growth mixture models with up to six classes using the criteria described in "Growth mixture models". Among the models that assumed the same error structure across classes, the three-class solution fitted the data best according to the AIC, BIC and c-BIC (red and blue line in Fig. 4, Supplementary Table 4). Relaxing this assumption led to improvements in these indices but led to substantially worse entropy (it dropped from 88.9% to 57.5% for the (best) 3-class solution) and to some very large values for elements of the estimated covariance matrix Ω c u for one of the classes. Hence, we selected the three-class growth mixture model with homogeneous error structure as the best fitting model.
The selected model's predicted average trajectories are shown in Fig. 5a, together with the trajectory predicted by is very similar to that predicted by the mixed effects model. This is not surprising given that class 1 is the most frequent, with 88% of the participants assigned to it according to their predicted posterior probability. Class 3 (GMM-3; 5%, in green) has a fairly parallel trajectory to that of GMM-1, albeit starting from a higher value. Class 2 (GMM-2; 7%, in blue) starts at a lower value than GMM-3 but increases faster over time, leading to the highest predicted BMI by age 16 years.

Latent class growth analysis
Six alternative specifications of latent class growth analysis model were fitted to the longitudinal BMI data. The fourclass solution gave the best fit according to the goodnessof-fit criteria (green line in Fig. 4, Supplementary Table 4). The predicted trajectories for the four classes do not cross (Fig. 5b), unlike those identified by the best fitting growth mixture model. The second class (LCGA-2, in blue, 37%) overlaps with the trajectory predicted by mixed effects model (MEM; in black).
As expected, the lack of intra-class variability assumed by the latent class growth analysis model led to a greater number of classes than found by the best fitting growth mixture model. However, they also differed in shape. This might derive from biases affecting either model as a consequence of incorrect assumptions about the correlation structure of the BMI observations.

Longitudinal latent class analysis
The same modelling steps used to select the best growth mixture model and latent class growth analysis model were used when fitting longitudinal latent class analysis models. The best fitting model predicted identical trajectories to those obtained by latent class growth analysis. This is not surprising since the only difference between the two models is how time (here age) is accounted for: it is included as a continuous explanatory variable in the latent class growth analysis specification (and modelled here using a quadratic function) while it is an ordered categorical variable in longitudinal latent class analysis (see Supplementary Table 4).

Changing outcome scale
Each of the fitted models described above assumes that the residual errors are normally distributed, conditionally on class (in latent class growth and longitudinal latent class analysis) or on class and individual (for growth mixture models). If this assumption is inappropriate, results may be biased, with our conclusions regarding the number of latent classes potentially erroneous [40]. For this reason, we refitted all models on log-transformed BMI values, given its known skewness.
The new mixed effects model (i.e., the one-class GMM) showed a marked improvement in fit (in terms of AIC, BIC and c-BIC) relative to the mixed effects model fitted on the original scale (Supplementary Table 4). The best fitting growth mixture model fitted on the transformed data had four classes (with no gain in the fit indices when allowing class-specific covariance structures; Fig. 4). Interestingly, the most frequent classes of the growth mixture model fitted on the original and log-transformed values (GMM-1, in red, in Fig. 5a, c, respectively 88% and 75% of the total) have very similar trajectories (after back-transformation of the latter). However, the solutions differed with regards to the remaining classes, with the classes derived from the new model showing a separation of the individuals who start with moderate values: log-transforming the data on BMI before fitting the growth mixture model separated a group of individuals who continue to increase their BMI (GMM-3, in blue, 6%) from those whose increase slows down after age 12 (GMM-2, in pink, 13%).
The best latent class growth analysis model fitted on logtransformed BMI had five classes (Fig. 5d), with the new class showing a finer separation among the individuals. Again, the best fitting latent class growth analysis model has more classes than the best fitting growth mixture model, with the first three latent class growth analysis classes nearly completely overlapping with GMM-1 (hence capturing its distribution). However, the other classes do not follow this pattern (Supplementary Fig. 1).
In order to further interrogate these results, Fig. 6 compares the distributions of the random intercepts, linear slopes, and quadratic slopes predicted by the mixed effects models fitted on log(BMI), with those of the equivalent random coefficients predicted by the growth mixture model with four classes and the latent class growth analysis model with five classes (although the latter-by definition-do not have any within-class heterogeneity). The skewed distributions of the random intercepts predicted by the mixed effects model are neatly separated into the four growth mixture model classes. In contrast, the class-specific intercepts given by the latent class growth analysis do not fully reflect the spread of the mixed effects model random intercepts, and do not capture at all its extreme values. Similar comments apply to the distributions of the predicted random slopes (especially the linear ones, Fig. 6). Examination of the estimated residuals from each of these models showed no particular skewness. There is therefore no direct evidence of departure from the models' assumptions. Note however that all estimates were obtained assuming the missing mechanism was MAR. If this were not the case, estimates would be affected by selection bias in directions that cannot be predicted. We discuss this further in "Alternative estimation approaches".

Mixed effects models
Several mixed effects models for the longitudinal categorical fussy eating were fitted, with age capturing the dependence on time and with alternative specifications of their random components. The model with linear and quadratic terms for age and random intercepts, linear and quadratic slopes fitted the data best. The stacked predicted probabilities of parental reporting of, respectively, "Did not happen", "Not worried", and "A bit/greatly worried" are shown in Fig. 7a (details in Supplementary Table 6). They show stable probabilities over time of each category of parental reported fussy eating.

Growth mixture models
A three-class growth mixture model with homogeneous error structure (Ω u ) was selected as the best fitting model, with attempts at relaxing this assumption resulting in noconvergence (Supplementary Table 7). According to this classification, most children (75.1%) were assigned to class 1 (GMM-1, Fig. 7b), which is characterised by predicted probabilities closely resembling those identified by the mixed effects model. The second most common class (GMM-2, 16.5%, Fig. 7c) comprises parents reporting that fussy eating "did not happen" with increasing predicted probabilities. The smallest class (GMM-3, 8.4%, Fig. 7d) includes children whose parents report worrying (a bit or greatly) with high and increasing probabilities, while "not worrying" and "did not happen" are reported with fast decreasing probabilities over time.

Latent class growth analysis
Latent class growth analysis of fussy eating suggested six classes fitted the data best (Supplementary Table 7). The cumulative predicted probabilities of each of the three categories varied substantially across classes (Fig. 8). Class 1 (LCGA-1; 20.9%) identifies a group of children whose parents report that fussy eating "did not happen" with high and fairly stable probabilities, while in class 2 (LCGA-2; 6.9%) parents worry (a bit or greatly) about their children's fussy eating with fairly stable probabilities. In class 3 (LCGA-3; 37.5%) parents mostly do "not worry". The other three classes show time-varying probabilities: LCGA-4 (5.9%) and LCGA-5 (19.8%) comprise parents that progressively increasingly and progressively decreasingly worry. LCGA-6 (9.1%) includes parents that progressively increasingly report that fussy eating "did not happen".

Longitudinal latent class analysis
The best fitting longitudinal latent class analysis model also consisted of 6 classes (Supplementary Table 7). The shape of their predicted probabilities is very close to those derived from the best fitting latent class growth analysis. Again, the role of time seems to be well captured by the linear and quadratic terms used in the selected latent class growth analysis model.

Comparisons
The distributions of the random coefficients predicted by the mixed effects model, growth mixture model with three classes, and latent class growth analysis model with six classes are compared in Fig. 9. The distribution of random coefficients of the largest growth mixture model class (GMM-1) practically coincides with those for the mixed effects model, with the class-specific coefficients for LCGA-3 mirroring their means. In contrast, the distributions for GMM-2 and GMM-3 capture respectively the lower and upper tails of the mixed effects model distributions.There is a similar spread across the latent class growth analysis coefficients. The riverplot that links the classes predicted by the growth mixture with those from the latent class growth analysis model shows that LCGA-3 and LCGA-5 make up most of GMM-1 (all capturing large probabilities of being "not worried"), while LCGA-1 and LCGA-6 correspond to GMM-2 (large probabilities of "did not happen"), and LCGA-2 and LCGA-4 to GMM-3 (decreasing probabilities of "did not happen"; Supplementary Fig. 2). Hence, once it is recognised that the latent class growth analysis captures within-class variation by creating further classes, we find that the two modelling approaches lead to similar classifications.

Associations with explanatory and outcome variables
The interpretation of the classes derived so far may be enhanced by relating them to precursors or later outcomes.

Association with birth weight
The best fitted models for log(BMI) and fussy eating were used to define their respective latent classes before relating them to birth weight (as in Fig. 2c). Multinomial logistic regression models were fitted where the probability of belonging to each class depended on this explanatory variable. Results are reported in terms of estimated relative risk ratios (RRR), i.e., ratios of the relative probability of being in a given class over the probability of being in the reference class, per 1 standard deviation (SD) increase in birth weight. We chose as reference the most frequent class from each growth mixture model; Class 1 for both longitudinal log(BMI) and fussy eating, and the closest latent class growth analysis class to these reference classes. These were respectively class 2 for log(BMI) and class 3 for fussy eating. We report results obtained using the 3-step approach first, before comparing them with those from the 1-step approach. Birth weight was associated with an increased risk of being in the highest BMI growth mixture model class (GMM-4), relative to the reference (GMM-1), with an estimated 32% increase in relative risk [RRR = 1.32, 95% confidence interval (CI): 1.05, 1.68] per 1 SD increase in birth weight (Table 1, 3-step results). The estimated RRRs across the latent class growth analysis classes for BMI, relative to LGCA-2 (which trajectory is similar to GMM-1), show a negative association for the first class (LCGA-1) and a positive association with the other classes (Table 1). Of note is the similarity in RRRs for LCGA-5 and GMM-4. These results highlight, regardless of modelling approach, a positive association between birth weight and trajectories with persistently higher BMI.
With regards to the fussy eating classes, the RRRs for GMM-2 and LCGA-1 (relative to their respective, and comparable, reference classes) show increased relative risks with higher birth weight (Table 1, 3-step results). These two classes are characterised by large (GMM-2) and increasing (LCGA-1) frequencies of parental reporting that fussy eating "did not happen". The opposite is seen with the classes characterised by increasing (GMM-3, LCGA-4) or large and stable (LCGA-2) parental worrying about fussy eating. Their RRR are all less than 1 (0.92-0.96). It appears therefore that fussy eating may be less commonly reported in children with greater birth weight.

Association with fat mass index
Similar steps were followed to examine the association between the BMI and fussy eating classes and FMI at age 18 years (Fig. 2d). Linear regression models were fitted on log-transformed FMI to address the right-skewness of its distribution.
The results for the BMI classes derived from the growth mixture and latent class growth analysis models are in agreement again, and in line with expectations, with larger average FMI associated with higher BMI trajectories ( Table 2, 3-step results). Interestingly, GMM-3 and GMM-4 show similar differences in log-FMI, relative to GMM-1, both larger than those found for GMM-2 (characterized by relatively higher BMI but only initially), indicating the importance of BMI in later adolescence.
Among the latent class growth analyses classes, only LCGA-2 has similar average FMI to the reference class LCGA-3, despite being characterised by very different trajectories, all the other LCGA classes having instead lower mean values. All these differences are however small and estimated with wide confidence intervals.
Overall, FMI at age 18 is on average higher in individuals that belong to classes with persistently high BMI values, as identified by the two modelling approaches as GMM-3 and LCGA-5. The growth mixture model gives an additional insight in identifying also GMM-2 as having the highest Fig. 8 Stacked predicted probabilities of parental reports of their child's fussy eating ("Did not happen", "Not worried" and "A bit/greatly worried") predicted by the best fitting latent class growth analysis (LCGA) with 6 classes; Avon Longitudinal Study of parents and children (ALSPAC) study, N = 5824 average FMI. This class has average BMI onset but the fastest increase over time. In contrast, no clear associations were found between FMI and the fussy eating classes derived from either modelling approach. This is likely a reflection of the complex consequences of fussy eating. Fussy children might only like a small variety of foods, some of which may have high caloric content [41].

Alternative estimation approaches
The results concerning the relationship between the explanatory/outcome variable and the latent classes were generally very similar when performing the 1-step or 3-step approach for latent class growth analysis. This was not the case when fitting growth mixture models. When relating birth weight with the growth mixture model classes, the 1-step approach identified different classes in comparison to the 3-step approach. This is a consequence of the impact of birth weight on the identification of the classes. When relating the growth mixture model classes to later FMI, we encountered no-convergence, even when the model was simplified by constraining the quadratic random effect to have zero variance. The derived classes were different between the 1-step and 3-step approach regardless.
To account for the possible departure from MAR, we also fitted both models conditionally on maternal education and maternal age at birth of the child (Supplementary Table 8). Similar to what happened with birth weight, this led to different frequencies of the latent classes when using the 1-step approach with growth mixture models. It therefore seems advisable to avoid using a 1-step approach when fitting growth mixture models.

Final remarks
We have compared three different analytical approaches to derive latent trajectories from a variable observed longitudinally. In doing so we have reviewed the assumptions invoked when fitting these models (summarised in Table 3), and highlighted the importance of carefully evaluating them, because misspecifications may lead to biased estimates of the trajectories and to an overestimation of the number of classes. For this reason, any interpretation of the resulting classes needs to take into account possible sources of misspecification of the models and of the impact such misspecifications may have. Additionally, when describing the classes identified by latent class growth analysis (and its simplification, longitudinal latent class analysis), one should acknowledge their derivation as non-parametric representations of variation in the individual trajectories, as opposed to just (possibly substantive) underlying typologies.
Our view is that each of these modelling approaches offers a useful representation of the heterogeneities in individual trajectories and that much can be learnt from    Examining the predicted trajectories against those from latent class growth analysis ones may help identify whether the relationship with time assumed in the latter should be modified comparing results. We have found that starting the analyses by first fitting a mixed effects model to the data helps understanding the data and that much in gained by examining the correspondence across classes obtained from different models, and by locating the class-specific parameters estimated by latent class growth analysis within the distributions of predicted random effects from the corresponding mixed effects and latent growth models.
Comparing the strength and direction of the associations between the latent classes and both birth weight and fat mass index was enlightening for the understanding of the underlying typologies. Furthermore, assessing the support for an association between known predictors and the classes (or the classes and a subsequent outcome) offers insights into the typologies captured by the classes. However, much care should be invested in comparing results across models to avoid overinterpreting the results.
Generalizations of these models to more than one longitudinal variable are in principle straightforward, although they lead to complexities in both specification and estimation. Not surprisingly, even greater caution should accompany the interpretation of any resulting latent trajectories from multivariate longitudinal data.
In summary, this overview and the applications presented stress the importance of extensive and careful modelling, the advantages of comparing results across modelling approaches, and the need to temper the temptation of interpreting the classes derived by any of these models as confirmed phenotypes.