Subgroup detection in linear growth curve models with generalized linear mixed model (GLMM) trees

Growth curve models are popular tools for studying the development of a response variable within subjects over time. Heterogeneity between subjects is common in such models, and researchers are typically interested in explaining or predicting this heterogeneity. We show how generalized linear mixed-effects model (GLMM) trees can be used to identify subgroups with differently shaped trajectories in linear growth curve models. Originally developed for clustered cross-sectional data, GLMM trees are extended here to longitudinal data. The resulting extended GLMM trees are directly applicable to growth curve models as an important special case. In simulated and real-world data, we assess the performance of the extensions and compare against other partitioning methods for growth curve models. Extended GLMM trees perform more accurately than the original algorithm and LongCART, and about as accurately as structural equation model (SEM) trees. In addition, GLMM trees allow for modeling both discrete and continuous time series, are less sensitive to (mis-)specification of the random-effects structure, and are much faster to compute.


Introduction
Development over time is of prime interest in psychological research. For example, in educational studies researchers may want to model students' academic development over time; in clinical studies researchers may want to model patients' symptoms over time.
Mixed-effects or latent-variable models can be used to model such trajectories and allow for explaining heterogeneity with covariates of a-priori known relevance (e.g., McNeish & Matta, 2018). However, when these covariates are not known in advance, methods for identifying them are needed.
As an example, trajectories of science knowledge and skills from a sample of 250 children are depicted in Figure 1. The children were assessed at three timepoints across grades 3 through 8.* The red line depicts the estimated average trajectory, while the gray lines depict individual trajectories. The gray lines reveal substantial variability between the children, both in initial levels and in growth over time. An obvious research aim would be to identify covariates that can explain or predict this heterogeneity.

Recursive Partitioning Methods for Growth Curve Models
Recursive partitioning methods (RPMs), also known as "trees", allow for identifying relevant predictors from a potentially (very) large number of covariates. Figure 2 shows an example tree, which identified socio-economic status (SES), gross motor skills (GMOTOR) and internalizing problems (INTERN) from a set of 11 socio-demographic and behavioral characteristics of the children, assessed at baseline.

* Further details on the source of these data are provided in Study III.

Figure 1
Growth curves of science ability. Note. Gray lines depict observed individual trajectories; the red line depicts the average growth curve as estimated with a linear mixed-effects model, comprising a fixed effect of time and a random intercept with respect to individuals. The x-axis is not linear in the number of months, because time was scaled as (# of months)^(2/3) in order to obtain approximately linear trajectories.

Figure 2
Partitioned growth curves of science ability. Note. The x-axes represent the number of months after the baseline assessment; the y-axes represent science ability. Gray lines depict observed individual trajectories; red lines depict the average growth curve within each terminal node, as estimated with a linear mixed-effects model comprising node-specific fixed effects of time and a random intercept with respect to individuals. The x-axes are not linear in the number of months, because time was scaled as (# of months)^(2/3) in order to obtain approximately linear trajectories. The table presents numerical estimates of fixed intercepts and slopes.
Five subgroups were identified, corresponding to the terminal nodes of the tree, each with a different estimate of the fixed intercept and slope. Groups of children with higher SES also have higher intercepts, indicating higher average science ability. The group of children with lower SES (node 2) is further split based on gross motor skills, with higher motor skills resulting in a higher intercept. The group of children with intermediate levels of SES (node 6) is further split based on internalizing problems, with fewer internalizing problems resulting in a higher intercept.
The two groups (or nodes) with higher intercepts also have higher slopes, indicating that children with higher ability also gain more ability over time.
The tree in Figure 2 was estimated with generalized linear mixed-effects model trees (GLMM trees; Fokkema, Smits, Zeileis, Hothorn, & Kelderman, 2018). GLMM trees were originally proposed for subgroup detection in clustered cross-sectional studies, where subjects are nested in treatment centers, classrooms and/or geographical areas, for example.
In the current paper we extend GLMM trees so that they can be applied to partitioning linear growth curve models (LGCMs). The general idea of GLMM trees is appealing for subgroup detection in almost any type of mixed-effects model. Compared to clustered cross-sectional data, however, longitudinal data may require a different estimation approach: The variance of the random effects tends to be higher with longitudinal data, and the predictors of interest tend to be measured at higher levels (e.g., time-invariant covariates). In this paper, we propose and test two extensions of GLMM trees that account for these characteristics. We focus on the specific use case of partitioning LGCMs, but the extensions are critical for a wider range of settings where covariates are measured at higher levels and/or where the random effects have substantial variance.
The main characteristic that sets GLMM trees apart from other methods for partitioning LGCMs is their local-global estimation approach: GLMM trees do not fit a full parametric model in each of the subgroups defined by the terminal nodes of the tree. Instead, fixed-effects parameters are estimated locally, using the observations within a terminal node, while the random-effects parameters are estimated globally, using all observations. This local-global estimation approach was first proposed by Hajjem, Bellavance, and Larocque (2011) and Sela and Simonoff (2012) for trees with constant fits (i.e., intercepts only) in the terminal nodes. With GLMM trees, the approach was generalized to allow for GLMs with any number of parameters in the terminal nodes, thus allowing for non-Gaussian responses and targeted detection of a wide range of possible interaction effects in mixed-effects models (Fokkema et al., 2018).

Other methods for partitioning LGCMs take a fully local estimation approach: Within every subgroup defined by the terminal nodes, a full parametric model is estimated based on the observations in that subgroup only. This fully local estimation approach provides more flexibility, but yields a higher computational burden and model complexity. In contrast, GLMM trees estimate a (much) lower number of random-effects parameters, which likely reduces overfitting and improves stability and generalizability of the results. Furthermore, fully local estimation requires possible partitioning variables to be measured at the highest level of nesting, while GLMM trees' local-global estimation approach allows partitioning variables to be measured at any level.
The computational advantage of GLMM trees is strongest compared to longRPart, longRPart2, IT-LT† and LRT-based SEM trees. These methods employ an exhaustive split detection procedure, where for every possible split point in the current node, the full parametric model needs to be re-estimated in the two resulting nodes. To choose the optimal split, the splitting criterion (such as a p-value from a likelihood-ratio test) is derived from these two models. Not only does this bring a heavy computational load, it also introduces a selection bias towards covariates with a larger number of possible cutpoints (Shih, 2004; Shih & Tsai, 2004). LongCART, MELT, GEE-based decision trees and score-based SEM trees also fit full parametric models in each of the nodes, but do not require model refitting for cutpoint selection; they employ the predictions or residuals from the fitted model in the current node for selecting the best split. This reduces computational load, while it also allows for separating variable and cutpoint selection, thus preventing selection bias. The GLMM tree algorithm shares these advantages, because it also employs a two-step approach to split selection.

† Both IT methods specifically target subgroups with different time-by-treatment interactions, so they are not generally applicable for partitioning growth curve models.
Given their unbiased variable selection and lower model complexity and computational burden, GLMM trees might be particularly useful for subgroup detection in LGCMs. The next section explains how GLMM trees are estimated and proposes adjustments for partitioning longitudinal trajectories. Next, the performance of the proposed adjustments is evaluated: In Study I, we assess performance in simulated datasets; in Study II, we compare the performance of GLMM trees with that of two other methods, SEM trees and LongCART. In Study III, we assess performance of the proposed adjustments in existing datasets on children's development of reading, math and science abilities. In the Discussion, we summarize our findings and discuss implications.

Estimation of GLMM trees and Adaptations for Longitudinal Data
In the GLMM tree model (Fokkema et al., 2018), the expectation µ_i of outcome vector y_i is modeled through a linear predictor and a suitable link function g:

g(µ_i) = X_i β + Z_i b_i.   (1)

Throughout this paper we focus on the case of a continuous, normally-distributed response y_i with constant variance σ²_ε. Therefore, the identity function can be used for the link g, but generalizations to other response variable types within the GLM framework are straightforward. In the general notation above, X_i is the n_i × (p + 1) fixed-effects design matrix for subject i (i = 1, . . . , N), comprising p regressors plus one column of 1s for the intercept. In the following, we assume that time is the predictor variable of interest (i.e., p = 1), where the number n_i and spacing of observed timepoints may differ between subjects. The fixed-effects parameters in GLMM trees (here, intercept and time slope) are node-specific, i.e., their value depends on the subgroup/node j into which subject i falls:

g(µ_i) = X_i β_j + Z_i b_i.   (2)

As in a traditional GLMM, Z_i is the random-effects design matrix for subject i, comprising a subset of the columns of X_i, and b_i is the corresponding vector of random effects for subject i. Finally, b_i is assumed to follow a (possibly multivariate) normal distribution with mean zero and (co)variance matrix Σ.
The parameters of a traditional GLMM can be estimated, among other techniques, by maximum likelihood (ML) or restricted ML (REML). Thus, when it is known into which node j each subject i falls, the GLMM specified by Equation 2 can be fitted "as usual", yielding local subgroup-specific fixed-effect estimates β̂_j and global random-effect estimates b̂_i. To infer the subgroup membership for all observations i, the random-effect estimate is treated as a known offset and a GLM tree is estimated using the model-based (MOB) recursive partitioning algorithm of Zeileis, Hothorn, and Hornik (2008). The overall GLMM tree model is then estimated by alternating between estimating the partition (i.e., subgroups or terminal nodes j) and estimating the random- and fixed-effects parameters, as per the following algorithm:

0. Initialize by setting step r = 0 and all random-effect estimates b̂_{i,(0)} = 0.

1. Set r = r + 1. Fit a GLM tree using Z_i b̂_{i,(r−1)} as an offset, yielding the partition j_(r).

2. Fit the mixed-effects model g(µ_i) = X_i β_{j,(r)} + Z_i b_{i,(r)} with the partition j_(r) from Step 1. Extract the random-effect estimates b̂_{i,(r)} from the fitted model.

3. Repeat Steps 1 and 2 until convergence.
The initialization in Step 0 simply assumes zero random effects. Convergence of the algorithm is monitored through the log-likelihood of the mixed-effects model fitted in Step 2. Typically, the algorithm converges when the partition j_(r) from the GLM tree is the same as j_(r−1) from the previous iteration.
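To make the alternating scheme concrete, the following is a minimal sketch in R for a Gaussian response, assuming a long-format dataset with hypothetical variables y, time, id and partitioning covariates u1-u3. It uses partykit::lmtree() for Step 1 and lme4::lmer() for Step 2; the glmertree package implements the full algorithm, so this sketch is purely illustrative.

```r
library(partykit)  # lmtree(): model-based recursive partitioning
library(lme4)      # lmer(): linear mixed-effects models

lmm_tree_sketch <- function(dat, tol = 1e-4, max_iter = 100) {
  dat$ranef_offset <- 0                # Step 0: initialize b-hat at zero
  ll_old <- -Inf
  for (r in seq_len(max_iter)) {
    ## Step 1: LM tree, treating the current random-effect predictions
    ## as a fixed, known offset
    tree <- lmtree(y ~ time | u1 + u2 + u3, data = dat,
                   offset = ranef_offset)
    dat$node <- factor(predict(tree, newdata = dat, type = "node"))
    ## Step 2: LMM with node-specific intercepts and time slopes and a
    ## global random intercept per subject
    lmm <- if (nlevels(dat$node) > 1) {
      lmer(y ~ -1 + node + node:time + (1 | id), data = dat)
    } else {
      lmer(y ~ time + (1 | id), data = dat)   # no splits yet
    }
    dat$ranef_offset <- predict(lmm, random.only = TRUE)
    ## Step 3: monitor convergence via the LMM log-likelihood
    ll_new <- as.numeric(logLik(lmm))
    if (abs(ll_new - ll_old) < tol) break
    ll_old <- ll_new
  }
  list(tree = tree, lmm = lmm, iterations = r)
}
```

In glmertree, the corresponding model would be fitted in a single call, e.g. lmertree(y ~ time | id | u1 + u2 + u3, data = dat).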
The following two subsections describe alternative approaches for the initialization in Step 0 and for fitting the GLM tree in Step 1. Each subsection first reviews the well-established methods and then discusses modifications that may improve performance when partitioning longitudinal data.
Initialization
In Fokkema et al. (2018), we found that initializing estimation of GLMM trees with zero random effects performed well in cross-sectional clustered data. With longitudinal data, however, random effects tend to be more pronounced: Repeated measures on the same subjects tend to be correlated more strongly than observations nested within the same unit in cross-sectional data. If the random effects are sizable, the initial assumption of zero random effects could provide an unrealistic starting point that may be difficult to overcome in subsequent iterations. Our expectation is that for partitioning LGCMs, initializing estimation with the random effects instead of the subgroup structure may improve subgroup recovery. Specifically, this means the algorithm starts by estimating the classic version of the mixed-effects model from Equation 2, with just one set of fixed-effects coefficients β and all subjects in a single group (i.e., Equation 1).
The alternative initialization step is thus:

0′. Initialize by setting step r = 0 and fit the mixed-effects model g(µ_i) = X_i β + Z_i b_i to the full sample. Extract the random-effect estimates b̂_{i,(0)} from the fitted model.
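A minimal sketch of this alternative initialization, under the same hypothetical variable names as above; the ranefstart argument shown for glmertree reflects our reading of that package's interface and should be checked against its documentation.

```r
## Step 0': fit the classic LMM (one global beta, no partition) and use
## its random-effect predictions as the initial offset
lmm0 <- lme4::lmer(y ~ time + (1 | id), data = dat)
dat$ranef_offset <- predict(lmm0, random.only = TRUE)

## In glmertree, the same behavior would be requested via ranefstart:
## glmertree::lmertree(y ~ time | id | u1 + u2 + u3, data = dat,
##                     ranefstart = TRUE)
```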
To illustrate, we applied both initialization approaches to the dataset from Figure 1. The tree presented in Figure 2 was in fact estimated by initializing with the random effects; initializing estimation assuming zero random effects resulted in the tree in Figure 3. The split based on gross motor skills in Figure 2 was not implemented in Figure 3, while additional splits were implemented based on gender, race, internalizing problems and fine motor skills. Considering the relatively large number of subgroups in Figure 3, some with relatively small sample sizes, this tree may overfit and not generalize well to other samples.

Figure 3
Note. The x-axes represent the number of months after the baseline assessment; the y-axes represent science ability. Gray lines depict observed individual trajectories. Red lines depict the average growth curve within each terminal node, as estimated with a linear mixed-effects model with node-specific fixed effects of time and random intercepts with respect to individuals.

Partitioning
The subgroup structure in Step 1 is estimated by a GLM tree, using the general model-based recursive partitioning (MOB) algorithm of Zeileis et al. (2008). Here, we give a general overview and then comment on the aspects of the algorithm that may be particularly relevant for LGCMs. In the case of GLMs (with random effects held constant in an offset), the MOB algorithm cycles iteratively through the following steps:

(a) Fit the GLM to all observations in the current subgroup.
(b) Test for instability of the GLM parameters with respect to each of the partitioning variables.
(c) If there is some overall parameter instability, split the subgroup with respect to the partitioning variable associated with the highest instability.

Figure 4
Note. The x-axes represent the number of months after the baseline assessment; the y-axes represent science ability. Gray lines depict observed individual trajectories. Red lines depict the average growth curve within each terminal node, as estimated with a linear mixed-effects model with node-specific fixed effects of time and random intercepts with respect to individuals.
The parameter stability tests in step (b) are based on the scores: the partial derivatives of the model's casewise log-likelihood contributions with respect to the parameters, evaluated at the parameter estimates. Under the usual regularity conditions, the scores have an expected value of 0. The parameter stability tests evaluate whether the scores fluctuate randomly around this mean of 0, or exhibit systematic deviations when ordered by the values of a covariate available for partitioning. For continuous covariates u_k (or ordered covariates with a large enough number of unique values), this involves computing the following cumulative score process W_k(t) with respect to each potential partitioning variable (Zeileis et al., 2008):

W_k(t) = Ĵ^(−1/2) · n_j^(−1/2) · Σ_{i=1}^{⌊n_j · t⌋} ψ_{σ(u_ik)},    0 ≤ t ≤ 1,

where Ĵ is a suitable estimate of the covariance matrix of the scores, and n_j gives the number of observations in the current subgroup. Further, ψ_{σ(u_ik)} denotes the scores evaluated at the parameter estimates, with subscript σ(u_ik) denoting their ordering by the values of partitioning covariate u_k. Note that 0 ≤ t ≤ 1; thus ⌊n_j · t⌋ = 1 for the observation associated with a unique minimum on the partitioning variable, and ⌊n_j · t⌋ = n_j for the observation with a unique maximum.
From the cumulative score process W_k(t), a range of test statistics can be derived which capture increased fluctuation (beyond the random fluctuation expected under parameter stability). For numerical partitioning variables, a maximum Lagrange multiplier test statistic can be computed, which takes the maximum of the squared Euclidean norm of W_k(t), weighted by its variance (Zeileis & Hornik, 2007). This statistic is referred to as the supLM statistic, and is asymptotically equivalent to the maximum of likelihood-ratio statistics. Approximate asymptotic p-values for the supLM statistic can be computed with the method of Hansen (1997). Categorical covariates do not provide an implicit ordering, and scores are therefore binned at each level of the covariate. From these, a test statistic is computed that does not depend on the ordering of the levels (Merkle, Fan, & Zeileis, 2014).
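Written out, the statistic takes the following form (a sketch following Zeileis & Hornik, 2007; the trimming interval [π, 1 − π] over which the maximum is taken is an implementation detail we state as an assumption):

```latex
\mathrm{supLM}_k \;=\; \max_{t \in [\pi,\, 1 - \pi]}
  \frac{\bigl\| W_k(t) \bigr\|_2^2}{t\,(1 - t)}
```

Here t(1 − t) is the variance of a Brownian bridge at t, so the statistic compares the fluctuation of the empirical score process against its expected fluctuation under parameter stability.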
When partitioning longitudinal data, covariates will often be measured at the subject level (i.e., time-invariant covariates), which should be accounted for in computing the estimated covariance matrix Ĵ. In general, this computation makes use of the scores. By summing the scores within clusters prior to computing the covariances, so-called clustered covariances are obtained, which account for dependence between observations within the same cluster (Zeileis, Köll, & Graham, 2020). This resembles a GEE-type approach with an independence working correlation structure. Our expectation is that in partitioning LGCMs, the use of clustered covariances in the parameter stability tests will improve subgroup recovery.
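In practice, clustered covariances can be requested by passing a cluster identifier to the tree-fitting function. The sketch below assumes that glmertree's lmertree() accepts a cluster argument for this purpose (to be checked against the package documentation) and uses the hypothetical variable names from above.

```r
library(glmertree)

## LMM tree with a random intercept per subject; the cluster argument
## makes the parameter stability tests sum scores within subjects
## before computing covariances (clustered covariances)
lt <- lmertree(y ~ time | id | u1 + u2 + u3,
               data = dat, cluster = id)
plot(lt, which = "tree")
```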
The tree in Figure 4 was estimated using clustered covariances in the parameter stability tests. Compared to Figures 2 and 3, the cluster-level parameter stability tests provided the most parsimonious tree structure thus far. The estimated variances of the random intercept were 0.21 (Figure 2), 0.20 (Figure 3) and 0.23 (Figure 4), indicating that the more parsimonious the tree structure, the more variance will be captured by the random effects.
In the next sections, we assess performance of the original algorithm and proposed adaptations through more extensive empirical evaluations.

Study I

Data generation
We simulated data according to the subgroup structure depicted in Figure 5. Every dataset comprised four non-overlapping subgroups, corresponding to the terminal nodes of the tree in Figure 5. The subgroups are defined by the three true partitioning variables u_1, u_2 and u_3. All partitioning variables were generated from a normal distribution with µ = 0 and σ² = 25. To allow for assessing possible selection bias toward partitioning variables with a larger number of possible cutpoints, variable u_1 was transformed to a binary factor, with values below the mean set to 0 and values above the mean set to 1. The response was computed as

y_i = X_i β_j + Z_i b_i + ε_i,

where β_j corresponds to the fixed effects in terminal node j of which subject i is part. The β_j values are reported below the terminal node panels in Figure 5. The fixed- and random-effects design matrices X_i and Z_i are identical, each comprising two columns: a vector of 1s for the intercept, and a vector of timepoints. The same set of timepoints was generated for all subjects: 0, 1, 2, 3, 4.‡ Values of b_i (random intercepts and slopes) were generated from a multivariate normal distribution with mean zero and a 2 × 2 diagonal covariance matrix Σ, with the diagonal entries determined by the level of the data-generating design described below. Values of ε_i were independently generated from a normal distribution with µ = 0 and σ² = 5.

‡ Although GLMM trees are not restricted to have the same set of timepoints for each subject, this design allowed for comparison with SEM trees in Study II.

Figure 5
Design of subgroups and fixed effects.
We varied the following five data-generating characteristics:

• Number of subjects: small (N = 100) or large (N = 250).
A full factorial design was employed, yielding 2^5 = 32 cells of the design; 100 repetitions were performed per cell. All data generation and analysis was performed in R (version 4.1.2; R Core Team, 2022).
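As an illustration, the data-generating process can be sketched in R as follows. The subgroup-assignment rules and β_j values below are placeholders (the true design is given in Figure 5), and the random-effect variances shown are example values; the actual levels were varied by the design.

```r
library(MASS)   # mvrnorm() for multivariate normal random effects
set.seed(1)
N <- 250; times <- 0:4
u1c <- rnorm(N, 0, 5); u2 <- rnorm(N, 0, 5); u3 <- rnorm(N, 0, 5)
u1 <- factor(as.numeric(u1c > mean(u1c)))   # u1 binarized at the mean
## hypothetical subgroup-assignment rules (the true rules are in Figure 5)
node <- ifelse(u1 == "0", ifelse(u2 <= 0, 1, 2), ifelse(u3 <= 0, 3, 4))
beta <- rbind(c(1, 0.5), c(2, 1), c(3, 1.5), c(4, 2))   # placeholder betas
b <- mvrnorm(N, mu = c(0, 0), Sigma = diag(c(1, 0.1)))  # example variances
dat <- do.call(rbind, lapply(seq_len(N), function(i) {
  mu <- (beta[node[i], 1] + b[i, 1]) + (beta[node[i], 2] + b[i, 2]) * times
  data.frame(id = i, time = times,
             y  = mu + rnorm(length(times), sd = sqrt(5)),  # sigma^2 = 5
             u1 = u1[i], u2 = u2[i], u3 = u3[i])
}))
```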

Model fitting
We applied ten different fitting approaches to every generated dataset (see the sketch after this list). Each variation combines one of three random-effects specifications (none vs. intercepts vs. intercepts + slopes) with one of two random-effects initializations (if any; all zeros vs. full-sample estimates) and one of two covariance specifications in the parameter instability tests (classic vs. clustered):

• None: σ_b0 = σ_b1 = 0. Random effects are not estimated and their variances are thus fixed to 0, yielding linear model (LM) trees with fixed effects only and the following two variations of covariance specifications:
  – Default: Classic observation-level covariances.
  – Alternative: Clustered covariances.

• Intercepts: σ_b0 > 0; σ_b1 = 0. This yields linear mixed-effects model (LMM) trees in which only the variance of the random intercept was freely estimated and the variance of the random slope was fixed to 0. The four variations considered are the following:
  – Default: Classic observation-level covariances in the parameter stability tests and random-effects initialization with all zeros (original Step 0).
  – Alternative: Clustered covariances in the parameter stability tests.
  – Alternative: Random-effects initialization with the full-sample estimates (alternative Step 0′).
  – Alternative: Clustered covariances and random-effects initialization with the full-sample estimates.

• Intercepts and slopes: σ_b0 > 0; σ_b1 > 0. This yields LMM trees in which the variances of both the random intercept and slope were freely estimated. The four variations considered are the same as for the intercept-only LMM trees.
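A sketch of the resulting ten calls, assuming the cluster and ranefstart arguments behave as described above (LM trees via partykit, LMM trees via glmertree; data and variable names as in the simulation sketch):

```r
library(partykit); library(glmertree)

## LM trees (no random effects): classic vs. clustered covariances
lm_default <- lmtree(y ~ time | u1 + u2 + u3, data = dat)
lm_cluster <- lmtree(y ~ time | u1 + u2 + u3, data = dat, cluster = id)

## LMM trees: random intercept only vs. random intercept + slope,
## each in four variations
forms <- list(int  = y ~ time | id | u1 + u2 + u3,
              both = y ~ time | (1 + time | id) | u1 + u2 + u3)
fits <- lapply(forms, function(f) list(
  default  = lmertree(f, data = dat),
  cl_cov   = lmertree(f, data = dat, cluster = id),
  ran_init = lmertree(f, data = dat, ranefstart = TRUE),
  combined = lmertree(f, data = dat, cluster = id, ranefstart = TRUE)))
```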

Evaluation of performance
We evaluated tree accuracy by counting the number of splits in each tree, and computing the standard deviation (SD) of the number of splits and the mean absolute deviation (MAD) from the true tree size (3 splits). Trees with > 3 splits are indicative of Type-I errors, while trees with < 3 splits are indicative of Type-II errors (i.e., power too low to detect the true partitioning variables). We also assessed whether the variable selected for the first split in every tree was the true first splitting variable.
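For a fitted (binary) tree, the number of splits equals the number of terminal nodes minus 1, and the accuracy metrics can be computed directly; a small sketch with hypothetical split counts:

```r
## splits per tree: partykit::width() counts the terminal nodes
n_splits <- function(tree) partykit::width(tree) - 1

## accuracy metrics over repetitions, given a vector of split counts
splits <- c(3, 4, 2, 3, 5)   # hypothetical counts
true_size <- 3
c(M   = mean(splits),
  SD  = sd(splits),
  MAD = mean(abs(splits - true_size)))
```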

Results
Figure 6 depicts the number of splits implemented by each partitioning approach.
The default fitting approach tends to overfit, irrespective of the random-effects specification, implementing > 3 splits in 89% of datasets, on average. The use of cluster-level covariances in the parameter stability tests successfully mitigated overfitting: LMM trees with clustered covariances with (middle panel) or without (right panel) random slopes showed an average number of splits closest to the true tree size.
In terms of MAD, however, LMM trees initializing estimation with the random effects performed best, but only if the random slope was not estimated. Thus, initializing estimation with the random effects may only be useful if the random-effects specification is kept relatively simple. With more complex random-effects specifications, actual subgroup differences may be more likely captured by the random effects than by the tree structure.

Table 1
Variables selected for the first split by each LM tree (top two rows) and LMM tree (bottom eight rows) estimation approach.
The second-lowest MAD was observed for LM trees with clustered covariances (left panel).
Combined use of random-effects initialization and cluster-level covariances was not very effective, irrespective of the random-effects specification.
Distributions of the number of splits, separated according to the levels of the data-generating design, are presented in Figure A1 and discussed in Appendix A. The results show a pattern very similar to Figure 6; no substantial interactions between data-generating and model-specification parameters were observed. Main effects of the data-generating parameters were as expected: The strongest effects were for σ²_b0 and N, with higher values resulting in a higher number of splits. The number of noise variables and σ²_b1 had smaller effects, while the correlation between partitioning variables hardly affected the number of implemented splits.
Table 1 shows the variables selected for the first split, and indicates high accuracy of recovery of the first split for all LM(M) tree fitting approaches. Only very rarely was u_1 not selected for the first split, if a first split was implemented.

Figure 6
Tree size distributions for LM trees (left panel) and LMM trees (middle and right panels). Note. M = mean number of splits; SD = standard deviation of the number of splits; MAD = mean absolute deviation from the true tree size. Gray circles represent counts; dark gray horizontal lines represent the true number of splits (3). Panel labels indicate whether variances of the random intercept and slope were fixed to 0 or freely estimated. Distances on the y-axis are on the log scale. cl.cov. = cluster-level covariances employed in parameter stability tests; ran.eff. = estimation initialized with the random effects on the full sample.

Study II: Comparison with Other Partitioning Methods

Next, we compared the performance of LM(M) trees with that of SEM trees and LongCART. This allowed for evaluating the possible (dis)advantages of global versus local estimation of random-effects parameters, as well as the performance of the different splitting criteria employed by each method. The same data-generating design as in Study I was employed. To reduce the number of comparisons, we only include performance of LM(M) trees fitted using clustered covariances, because they showed good performance in Study I.

Method
We fitted a total of six SEM trees to every dataset. We used two different splitting criteria:

• The default "naive" splitting approach, which employs likelihood-ratio tests (LRTs) as the splitting criterion (Brandmaier et al., 2013). That is, for each candidate split, the log-likelihood of the SEM fitted to the observations in the current node is compared against the sum of the log-likelihoods of a two-group SEM, in which the two groups are defined by the candidate split. An LRT can thus be computed for each candidate split, which quantifies the improvement in fit that would result from implementing this split. In each step, the candidate split yielding the highest LRT is selected for splitting, and splitting is continued as long as a candidate split yields a p-value of the LRT below a pre-specified α level (0.05, by default).
• The score-based splitting approach of Arnold, Voelkle, and Brandmaier (2021). This approach uses the MOB algorithm described in the Introduction, where the parametric model fitted in step (a) is a SEM. While for GLMM trees, parameter stability tests are computed for the fixed-effects parameters only, score-based SEM trees compute parameter stability tests based on both fixed- and random-effects parameters.

For each splitting criterion, three different random-effects specifications were employed:

• None: σ_b0 = σ_b1 = 0. Random effects were not estimated and their variances were thus fixed to 0.

• Intercepts: σ_b0 > 0; σ_b1 = 0. In every node, the variance of the random intercept was freely estimated; the variance of the random slope was fixed to 0.

• Intercepts and slopes: σ_b0 > 0; σ_b1 > 0. In every node, the variances of the random intercept and slope, as well as their correlation, were freely estimated.
To specify the node-specific models for SEM trees, we employed an LGCM specification with the response at each timepoint regressed on a latent intercept and slope. Intercept loadings were fixed to 1; slope loadings were fixed to 0, 1, 2, 3 and 4, respectively. Errors were assumed uncorrelated between timepoints, and an error variance was freely estimated for each timepoint. We used package lavaan (version 0.6-11; Rosseel, 2012) to fit the SEMs and package semtree (version 0.9.17; Brandmaier et al., 2013) to fit the SEM trees.
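The node-specific LGCM described above can be written in lavaan syntax as follows; the semtree calls are a sketch, with the wide-format dataset (dat_wide) and the predictors argument as assumptions to be checked against the semtree documentation.

```r
library(lavaan); library(semtree)

lgcm <- '
  i =~ 1*y0 + 1*y1 + 1*y2 + 1*y3 + 1*y4   # intercept loadings fixed to 1
  s =~ 0*y0 + 1*y1 + 2*y2 + 3*y3 + 4*y4   # slope loadings fixed to 0-4
  i ~~ i + s                               # intercept/slope (co)variances
  s ~~ s
  y0 ~~ y0; y1 ~~ y1; y2 ~~ y2; y3 ~~ y3; y4 ~~ y4  # per-timepoint errors
'
fit <- growth(lgcm, data = dat_wide)       # dat_wide: hypothetical wide data

## LRT-based ("naive") vs. score-based SEM trees
st_lrt   <- semtree(fit, data = dat_wide, predictors = c("u1", "u2", "u3"))
st_score <- semtree(fit, data = dat_wide, predictors = c("u1", "u2", "u3"),
                    control = semtree.control(method = "score"))
```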
We fitted a single LongCART tree to each dataset. The LongCART function estimates node-specific models comprising a random intercept term; this default cannot be changed. A fixed-effects model was specified with the response regressed on time, plus a subject-specific random intercept. We used package LongCART (version 3.1; Kundu, 2021) to fit LongCART trees.
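A sketch of the corresponding LongCART call; the argument names (patid, gvars, tgvars, with tgvars coding the types of the partitioning variables) follow our reading of the package documentation and should be verified, and the dataset is hypothetical.

```r
library(LongCART)

## Node-specific model: y regressed on time, plus a subject-specific
## random intercept (the package default). gvars names the candidate
## partitioning variables; tgvars gives their assumed types
## (0 = categorical, 1 = continuous).
LongCART(data = dat, patid = "id", fixed = y ~ time,
         gvars = c("u1", "u2", "u3"), tgvars = c(0, 1, 1))
```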

Results

Tree size

Figure 7 depicts tree size distributions for the different algorithms. SEM trees (middle two panels) performed well, exhibiting overfitting only when no random effects were specified. LRT-based SEM trees tended to implement more splits than score-based SEM trees. Both LRT- and score-based SEM trees yielded smaller trees with increasing complexity of the random-effects specification. With the correct random-effects specification (i.e., both random intercept and slope estimated), LRT-based SEM trees showed an average tree size very close to the true tree size, while under-specification (i.e., specification of only a random intercept) increased tree size by 0.2 splits on average. Score-based SEM trees performed best when only random intercepts were estimated, implementing too few splits when random slopes were (correctly) specified. LongCART trees seem to suffer from a lack of power, implementing only two splits on average. Overall, LM(M) trees with clustered covariances showed the best performance, closely followed by LRT-based SEM trees, which were affected more strongly by mis-specification of the random effects.

Distributions of the number of splits, separated according to the levels of the data-generating parameters, are depicted and discussed in Appendix A and Figure A2. They are omitted here, as they show a pattern very similar to Figure 7. Of the four data-generating parameters, N and σ²_b0 showed the strongest effects, with higher values resulting in a higher number of splits, as expected. The number of splits implemented by SEM trees was most strongly affected by the data-generating parameters when the random effects were mis-specified (i.e., random intercepts and/or slopes fixed to 0).

Split selection
Table 2 presents variable selection frequencies for the first split in the fitted trees.
For SEM trees, the LRT criterion yields almost perfect accuracy for the first split. The score-based criterion for SEM trees provides near-perfect accuracy when random effects were correctly specified. When random effects were mis-specified, score-based SEM trees selected the wrong variable for the first split in about 12% of datasets. Closer inspection of stability tests for individual models and parameters suggested that the score-based tests for SEMs are more sensitive to instability in the fixed slope than in the fixed intercept, explaining why u_2 or u_3 were often selected for the first split.
Table 2
Variables selected for the first splits by each of the partitioning approaches.
LongCART trees exhibit low accuracy in recovering the first split, selecting the wrong variable in all datasets where at least one split was implemented. LongCART showed a strong tendency to select u_2 or u_3 for the first split. Closer inspection of the fitted LongCART trees revealed that in 99% of datasets in which no splits were implemented, u_1 was the strongest splitting candidate, but the parameter stability tests did not reach significance. This suggests the tests proposed by Kundu and Harezlak (2019) are less sensitive to instability of the intercept (compared to instability of the slope), or less sensitive to instability with respect to categorical covariates (compared to instability with respect to continuous covariates).

Computation time
Figure 8 presents computation time distributions for the partitioning algorithms. A clear computational advantage is observed for LM trees. LMM trees require longer computation times because of the estimation of random effects. Yet, this increase is minor compared to the computation times required by LongCART and SEM trees: LRT-based SEM trees required strikingly long computation times, while score-based SEM trees were computationally more efficient, as expected.

Study III

Data

We analyzed data from the Early Childhood Longitudinal Study, Kindergarten Class (ECLS-K), in which children were followed from kindergarten through 8th grade in 2007; here we focus on assessments from kindergarten, 1st, 3rd, 5th and 8th grade. Response variables are reading, math, and science abilities, which were assessed using multi-item cognitive tests. Latent ability estimates were computed with a mean of zero and variance of one. Reading and math abilities were assessed in all five rounds of data collection; science knowledge was assessed in 3rd, 5th and 8th grade. We analyzed data from children who completed all assessments, yielding N = 6,277 for reading, N = 6,512 for math, and N = 6,625 for science.
Time was measured as the number of months since the baseline assessment. In order to obtain approximately linear trajectories, we chose the timing metric based on visual inspection of the observed data: (# of months)^(1/2) was used as the timing metric for reading and math trajectories, and (# of months)^(2/3) for science trajectories.

Fitting approaches
We applied five LM(M) trees to the data, focusing on the original GLMM tree approach and approaches that performed well in the simulations:

• An LM tree (σ_b0 = σ_b1 = 0) using clustered covariances in the parameter stability tests.
• A default LMM tree with a random intercept, using classic observation-level covariances and zero-random-effects initialization.
• An LMM tree with a random intercept, initializing estimation with the full-sample random-effects estimates.
• An LMM tree with a random intercept, using clustered covariances.
• An LMM tree with random intercepts and slopes, using clustered covariances.
Although LRT- and score-based SEM trees performed very well in the simulations, they could not be used in this study because growth-curve SEMs do not allow for incorporating continuous time.

Evaluation of performance
The ECLS-K datasets have exceptionally large sample sizes, so we employed random sampling to obtain training samples of N = 250 children, likely more representative of real-world studies in psychology. We performed 100 repetitions for each response variable (math, reading, or science). We evaluated predictive accuracy by computing the mean squared difference between predicted and observed response variable values (MSE) for all children not included in the training sample in the current repetition. This separation of train and test observations does not allow for using the random effects in computing predictions; the cross-validated MSEs thus only quantify the accuracy of the fixed-effects parameters. Tree size was measured by counting the number of splits in each tree.
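The evaluation loop can be sketched as follows, assuming a long-format dataset dat with a subject identifier id and hypothetical covariates u1-u3; re.form = NA requests predictions from the fixed effects only, so that no random-effect estimates are needed for the held-out children.

```r
cv_mse <- replicate(100, {
  train_ids <- sample(unique(dat$id), 250)
  train <- subset(dat, id %in% train_ids)
  test  <- subset(dat, !(id %in% train_ids))
  lt <- glmertree::lmertree(y ~ time | id | u1 + u2 + u3,
                            data = train, cluster = id)
  ## fixed-effects-only predictions for unseen subjects
  pred <- predict(lt, newdata = test, re.form = NA)
  mean((test$y - pred)^2)
})
mean(cv_mse)
```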

Results
Figure 9 and Table 3 present MSE distributions. Differences in predictive performance are small; all R² differences are smaller than 0.01. Bonferroni-adjusted pairwise t-tests indicated no difference in performance between default LMM trees and LMM trees with random-effects initialization for any of the three outcomes. In contrast, the three LM(M) trees using clustered covariances performed significantly better for the math and reading outcomes. For the science outcome, no significant differences were observed. Thus, cluster-level covariances seem to provide the most robust improvement in performance.
Figure 10 presents tree size distributions. Similar to the simulation results, default LMM trees implement the largest number of splits. Cluster-level covariances provide a robust reduction in the number of splits. Given their indistinguishable predictive performance, LM trees with clustered covariances may be preferred if obtaining a sparse result is critical. LMM trees with clustered covariances may provide less sparsity but better predictive accuracy, irrespective of whether random intercepts and/or slopes were estimated. In contrast to the simulation results, random-effects initialization here yielded tree sizes similar to the default fitting approach.
Table 3
Cross-validated mean squared errors for each of the response variables.

Figure 9
Mean squared errors for trees fitted to math, reading and science ability trajectories. The secondary y-axis on the right quantifies the proportion of variance explained, computed as 1 − mean(MSE)/var(y).

Figure 10
Sizes of trees fitted to math, reading, and science ability trajectories. Note. Top, middle and bottom panels depict math, reading and science ability trajectories, respectively.

Discussion
The simulations showed that the proposed extensions of GLMM trees are effective for partitioning LGCMs. The use of clustered covariances appears most effective, and their good performance appears largely unaffected by (mis-)specification of the random effects; they may therefore be the safest choice in practice. Initializing estimation with the random effects was also effective, but only when the random-effects specification was kept simple (i.e., no estimation of random slopes). Combining cluster-level covariances and random-effects initialization worsened performance and is thus not recommended.
Strong performance of clustered covariances was also observed in partitioning real-world academic trajectories. They provided substantially smaller trees for all outcomes, and better or equal predictive accuracy. In comparison to cluster-level covariances, random-effects initialization resulted in larger trees for all outcomes and worse performance for the reading and math outcomes.
The simulations showed comparable performance of LM(M) and SEM trees in partitioning LGCMs. SEM trees may, however, be more sensitive to mis-specification of the random effects, with under-specification resulting in too many splits. In line with the results of Arnold et al. (2021), we found score-based SEM trees to have somewhat lower power than LRT-based SEM trees, but at a much lower computational cost. LongCART trees often selected the wrong partitioning variable for the first split, and were outperformed by LM(M) and SEM trees. The LongCART parameter stability tests (Kundu & Harezlak, 2019) may be underpowered for detecting instability of the fixed intercept, or for detecting instability with respect to categorical covariates.
The simulations clearly illustrated the lower computational burden of GLMM trees. This is in large part due to their local-global estimation approach, where fixed-effects parameters are estimated locally within a node and random-effects parameters are estimated globally, using all observations. In contrast, SEM trees and LongCART fit the full mixed-effects model in each node, which substantially increases computational load. The local-global estimation approach also reduces model complexity, because a lower number of random-effects parameters needs to be estimated.

Yet, a possible downside of the local-global estimation approach is that it does not allow for recovering subgroups with differences in random-effects parameters. When there is a specific interest in partitioning the random effects, score-based SEM trees may be preferred. Alternatively, researchers may want to use the parameter stability tests for mixed-effects models developed by Wang and Merkle (2018; see also Wang, Graves, Rosseel, & Merkle, 2022; Wang, Merkle, Anguera, & Turner, 2021). This will be useful, for example, when the number of or distances between timepoints differ between respondents and SEM-based growth curve models cannot be applied (McNeish & Matta, 2018).
The current evaluations were limited to Gaussian responses and LGCMs. Future studies should assess the performance of GLMM trees in partitioning longitudinal data with, for example, binomial or count responses. We expect that the strong performance of cluster-level covariances generalizes to other settings where covariates are measured at higher levels, in longitudinal and/or otherwise nested data structures, but whether our conclusions generalize to settings beyond LGCMs remains to be evaluated. Finally, we used the outer-product-of-gradients (OPG) estimator for computing (clustered) covariances. Future work could assess potential benefits of using the full sandwich estimator, although it is computationally more burdensome.

Figure A1
Effects of data-generating parameters on tree size for LM(M) trees.




Figure 7
Tree size distributions for LM(M) trees with clustered covariances, SEM trees and LongCART. Note. M = mean number of splits; SD = standard deviation of the number of splits; MAD = mean absolute deviation from the true tree size. Gray circles represent counts; dark gray horizontal lines represent the true number of splits (3). Distances on the y-axis are on the log scale; x-axis labels indicate whether variances of the random intercept and slope were fixed to 0 or freely estimated.

Figure 8
Computation time distributions for the partitioning algorithms.


Figure A2
Effects of data-generating parameters on tree size for LM(M) trees with clustered covariances, SEM trees and LongCART.