Within-Person Variability Score-Based Causal Inference: A Two-Step Estimation for Joint Effects of Time-Varying Treatments

Behavioral science researchers have shown strong interest in disaggregating within-person relations from between-person differences (stable traits) using longitudinal data. In this paper, we propose a method of within-person variability score-based causal inference for estimating joint effects of time-varying continuous treatments by controlling for stable traits of persons. After explaining the assumed data-generating process and providing formal definitions of stable trait factors, within-person variability scores, and joint effects of time-varying treatments at the within-person level, we introduce the proposed method, which consists of a two-step analysis. Within-person variability scores for each person, which are disaggregated from stable traits of that person, are first calculated using weights based on a best linear correlation preserving predictor through structural equation modeling (SEM). Causal parameters are then estimated via a potential outcome approach, either marginal structural models (MSMs) or structural nested mean models (SNMMs), using calculated within-person variability scores. Unlike the approach that relies entirely on SEM, the present method does not assume linearity for observed time-varying confounders at the within-person level. We emphasize the use of SNMMs with G-estimation because of its property of being doubly robust to model misspecifications in how observed time-varying confounders are functionally related to treatments/predictors and outcomes at the within-person level. Through simulation, we show that the proposed method can recover causal parameters well and that causal estimates might be severely biased if one does not properly account for stable traits. An empirical application using data regarding sleep habits and mental health status from the Tokyo Teen Cohort study is also provided. Supplementary Information The online version contains supplementary material available at 10.1007/s11336-022-09879-1.


INTRODUCTION
Estimating the causal effects of (a sequence of) time-varying treatments/predictors on outcomes is a challenging issue in longitudinal observational studies because researchers must account for time-varying and time-invariant confounders. For this analytic purpose, potential outcome approaches such as marginal structural models (MSMs; Robins, 1999;Robins, Hernán, & Brumback, 2000) have been widely used in epidemiology. Although actual applications have been relatively infrequent, structural nested models (SNMs; Robins, 1989Robins, , 1992 with G-estimation are in principle more suitable and robust for handling violation of the usual assumptions of no unobserved confounders and sequential ignorability (Robins, 1999;Robins & Hernán, 2009;Vansteelandt & Joffe, 2014).
Parallel with such methodological development, behavioral science researchers have shown interest in inferring within-person relations in longitudinally observed variables, namely, how changes in one variable influence another for the same person. Investigations based on within-person relations might produce conclusions opposite to those based on between-person relations. For example, a person is more likely to have a heart attack during exercise (within-person relation), despite people who exercise more having a lower risk of heart attack (between-person relation; Curran & Bauer, 2011).
Statistical inference for disaggregating within-and between-person (or within-and between-group) relations has been a concern in behavioral sciences for more than half a century. However, recent methodological development and extensive discussion (Cole, Martin, & Steiger, 2005;Hamaker, 2012;Hamaker, Kuiper, & Grasman, 2015;Hoffman, 2014;Usami, Murayama, & Hamaker, 2019) have rapidly increased interest in this topic.
In the psychometrics literature, along with multilevel modeling (e.g., Wang & Maxwell, 2015), structural equation modeling (SEM)-based approaches have become one popular method for uncovering within-person relations. Among these approaches, applications of a random-intercept cross-lagged panel model (RI-CLPM;Hamaker et al., 2015), which includes common factors called stable trait factors, have rapidly increased, reaching more than 1250 citations on Google as of December 2021. This model was originally proposed to uncover reciprocal relations among focal variables that arise at the within-person level (i.e., simultaneous investigations for the effects of a variable X on a variable Y , along with the effects of Y on X), without explicit inclusion of (time-varying) observed confounders L (however, Mulder & Hamaker (2021) discussed an extension that included a between-level predictor).
Despite its popularity and theoretical appeal, the concepts of stable traits and withinperson relations in the RI-CLPM have not been fully characterized in the causal inference literature. This might be partly because psychometricians have used these terms vaguely and ambiguously in statistical models, without clarifying the assumed data-generating process (DGP) and providing clear mathematical definitions. For this reason, the RI-CLPM has not been contrasted with many other methodologies used for causal inference (e.g., MSMs and SNMs). One potential advantage of the RI-CLPM as SEM is that it can easily include and estimate measurement errors in statistical models under parametric assumptions. However, the RI-CLPM demands linear regressions at the within-person level that are correctly specified to link focal variables (as well as time-varying observed confounders, if included in the model). The linearity assumption typically imposed with respect to timevarying observed confounders in path modeling and SEM has often been criticized in the causal inference literature (e.g., Hong, 2015), and relaxing this assumption is often a key to consistently estimating the causal quantity of interest (e.g., Imai & Kim, 2019).
In this paper, we propose a method of within-person variability score-based causal inference for estimating joint effects of time-varying continuous treatments/predictors at the within-person level by controlling for stable traits (i.e., between-person differences). The proposed method is a two-step analysis. A within-person variability score for each person, which is disaggregated from the stable trait factor score of that person, is first calculated using weights based on a best linear correlation preserving predictor through SEM. Causal parameters are then estimated by MSMs or SNMs, using calculated within-person variability scores. This approach is more flexible than the one that relies entirely on SEM (e.g., the RI-CLPM that includes time-varying observed confounders) in terms of modeling how time-varying observed confounders are functionally related with treatments/predictors and outcomes at the within-person level, without imposing the linearity assumption in these relations. We particularly emphasize the utility of SNMs with G-estimation because of its attractive property of being doubly robust to model misspecifications in how time-varying observed confounders are functionally related with treatments/predictors and outcomes at the within-person level.
The proposed method can be viewed as one that synthesizes two traditions for factor analysis methods and SEM in psychometrics and a method of causal inference (MSMs or SNMs) in epidemiology. Because causal estimands that are defined at the within-person level are less common in the causal inference literature (Lüdtke & Robitzsch, 2021), the proposed method offers new insights for researchers in a broad range of disciplines who are interested in causal inference. Also, the idea of using within-person variability scores can be applied to many other issues that are closely relevant to causal hypotheses, including reciprocal effects and mediation effects.
The remainder of this paper is organized as follows. Because the concepts of stable traits and within-person relations have not been fully characterized in the causal inference literature, in Section 2 we start our discussion by introducing the two different DGPs in which time-invariant factors are included. Notably, unlike in a previous study , we argue that stable trait factors (assumed in the RI-CLPM as a statistical model) are merely random intercepts for persons and cannot be statistically characterized as time-invariant unobserved confounders. After providing formal definitions of stable trait factors (for between-person relations) and within-person variability scores (for withinperson relations), the definition of joint effects of time-varying treatments at the withinperson level and their identification conditions are described in Section 3. We then introduce the proposed methodology in Section 4. In Section 5, we perform simulations and show that the proposed method can recover causal parameters well, and that causal estimates might be severely biased if stable traits are not properly accounted for. Section 6 describes an empirical application of the proposed method using data from the Tokyo Teen Cohort (TTC) study (Ando et al., 2019). The final section gives some concluding remarks and discusses our future research agenda.

CAUSAL MODELS AND DATA-GENERATING PROCESSES
In this section, we first explain two different DGPs and causal models in which timeinvariant factors are included. In the first DGP, we assume that time-invariant factors have both direct and indirect effects on measurements; recent work by Gische, West, and Voelkle (2021), which provided a didactic presentation of the directed acyclic graph (DAG)based approach and key concepts regarding causal inference based on a cross-lagged panel design, assumed this process. In the second DGP, we assume that time-invariant factors have only direct effects; this corresponds to the process that researchers (implicitly) assume in applying the RI-CLPM to infer within-person relations. This distinction of processes is inspired by , who highlighted how common factors included in the different statistical models to examine reciprocal relations have different conceptual and mathematical properties.
Below, we suppose that data are generated at fixed time points t 0 , t 1 , . . . ,t K . Let A ik denote a continuous treatment/predictor at time t k (k = 0, . . . , K − 1) for person i, and let L ik denote time-varying observed confounders at that time for person i. 1 Furthermore, Y ik is the outcome at time t k (k = 0, . . . , K) for person i and is part of the time-varying confounders L ik . Suppose that a time-varying confounder has three characteristics: it is independently associated with future outcomes Y ik , it predicts subsequent levels of treatment as well as future confounders, and it is affected by an earlier treatment and confounders (Vansteelandt & Joffe, 2014). In this paper, for the purpose of explanation, we assume a single confounder that is measured concurrently with the outcome at each time point and is measured before the treatment/predictor level is determined for each person. Thus, we presume that the variables are ordered as L 0 , A 0 , L 1 , A 1 , . . . , L K−1 , A K−1 , L K . 2 1 Time-invariant observed confounders L i can be included as a special case, but in the DGPs discussed herein, only time-varying observed confounders L ik are assumed for simplicity.
2 Y is not shown explicitly here because it is part of L. We often omit Y in expressing time-varying observed confounders in this paper.

Data-generating Process 1: Time-invariant Factors Have Both
Direct and Indirect Effects on Measurements Gische et al. (2021) introduced the DAG-based approach to causal inference, explaining how (SEM-based) statistical models can identify the causal models. Figure 1a is a DAG that expresses linear causal relations among variables in K = 4; this is similar to the one presented by Gische et al. (2021)  In this linear causal DAG model, we suppose that time-invariant factors η are additive to express person-specific differences in the mean levels of the respective variables (Y , A, and L), which do not change over time. Without loss of generality, we assume that these factors have zero means (E(η (Y ) ) = E(η (A) ) = E(η (L) ) = 0). If we are interested in the longitudinal change of sleep time in adolescents, as will be investigated in the later empirical example, then η reflects all time-invariant factors that might affect the level of sleep time in an adolescent during the course of study (e.g., sex, year of birth, constitution, genetic endowment, health, exercise habits, home environment including discipline, engagement in club/extracurricular activities in school). The values of coefficients corresponding to the paths from time-invariant factors to measurements are restricted to be equal to one.
These restrictions (a) assign a scale to the latent random intercept and (b) reflect the assumption that the structural coefficients from the random intercepts to measurements do not change over time (Gische et al., 2021). The bidirected dashed edges between time-invariant factors indicate that they might show covarying relations due to unobserved confounding. Likewise, the bidirected dashed edge between initial measurements Y 0 and L 0 indicates that they might show covarying relations due to unobserved confounding.
Under this linear causal DAG model, the DGP can be represented by the following set of linear equations (k ≥ 1): Here, µ k , and µ (L) k are (fixed) intercepts at time t k and are omitted in the DAG representation. Residual terms are denoted by d and are assumed to be uncorrelated with time-invariant factors; they are also usually omitted in the DAG representation. It is also assumed that there is no unobserved common cause among these residuals (i.e., concurrent residuals are mutually uncorrelated). Note that the equations assume homogeneity: the coefficients α, β, and γ are fixed and constant across persons.
As suggested by Gische et al. (2021), a statistical model that captures the DAG depicted in Figure 1a (i.e., Equation 1) can be globally identified: all parameters can theoretically be estimated uniquely from observational data. Assuming sufficient sample size, correct model specification, and no excess multivariate kurtosis, the SEM-based maximum likelihood (ML) method provides estimates that are asymptotically unbiased, efficient, and consistent (Bollen, 1989). More details about causal identification and estimation in linearly parameterized causal DAG models are provided by Gische and Voelkle (in press).
A notable feature of this DGP is that time-invariant factors have both direct and indirect effects on measurements. For example, In addition, other time-invariant factors η (A) and η (L) also have indirect effects on Y 3 (e.g., Also, directed edges from time-varying factors, which are expressed by writing the variable name with an asterisk (e.g., Y * 3 ), are drawn to the corresponding measurements. Directed edges are assumed between these time-varying factors, rather than between measurements as in Figure 1a The values of coefficients corresponding to the paths from time-varying factors to mea-surements are all restricted to be one, and we assume that these time-varying factors have zero means. Under this linear causal DAG model, the DGP can be represented by the following linear equations (with the assumption of homogeneity of coefficients among persons) that have two major parts: for k ≥ 0, and   explained that the conceptual and mathematical roles of common factors differ according to whether or not they are modeled with lagged regression in the statistical model. More specifically, in models that include accumulating factors, their influences on measurements at time t k (e.g., η (Y ) → Y k ) transmit to the future measurements (e.g., Y k → Y k+1 ) through the lagged regression, which is also influenced by the same accumulating factors (e.g., η (Y ) → Y k+1 ); as a result, the magnitudes of impacts from these factors change over time. In contrast, in models that include stable trait factors (e.g., the RI-CLPM), their impacts are stable over time because they have only direct effects. In this way, the conceptual meaning and inferential results for (within-person) relations among variables being modeled differ in each statistical model according to whether researchers assume the inclusion of accumulating factors or stable trait factors; see Usami et al. (2019, pp. 643-644) for a more detailed comparison. As we have argued, the conceptual and mathematical roles differ between stable trait factors and accumulating factors. Importantly, the differences between these factors can also be characterized as whether or not they can be considered as time-invariant unobserved confounders. For example, the accumulating factors of outcomes (η (Y ) ) cause measurements Y k (k ≥ 1) while also being associated with measurements of treatments/predictor at the previous time point (A k−1 ). In this sense, accumulating factors can be considered as unobserved confounders in evaluating causal effects of treatments/predictors. In contrast, the stable trait factors of outcomes (I (Y ) ) also cause measurements Y k but are uncorrelated with within-person variations such as Y * k and A * k−1 . More specifically, when measurements Y k are unconditional, I (Y ) does not confound the relations among within-person variations (e.g., the path from A * k−1 to Y * k ) because the path from I (Y ) to within-person variations Y * k is blocked by the measurement Y k , which act as colliders. Therefore, stable trait factors cannot be viewed as time-invariant unobserved confounders; rather, they should be characterized as merely random intercepts that are uncorrelated with predictors (i.e., withinperson variations). This view differs from that of , who explain stable trait factors as time-invariant unobserved confounders.

Implications of Comparing
If the assumed (linear and first-order) DGP as in Figure 1b is correct and if all variables are observable, then controlling for only Y * k−1 and L * k−1 is sufficient to evaluate the within-person relation between A * k−1 and Y * k . One could argue that controlling for stable trait factors is not required to identify causal parameters for treatment effects at the within-person level, the reason being that within-person processes (time-varying factors) and between-person differences (stable trait factors as time-invariant factors) are mutually uncorrelated. However, because all these factors are actually latent variables and unobservable, we need to use measurements (as colliders) to infer treatment effects at the within-person level, and appropriate control of stable trait factors as latent variables is required in the statistical model. If the assumed DGP as in Figure 1b is correct, then not controlling for stable trait factors causes biased estimates of causal parameters for the within-person relation (e.g., Usami, Todo & Murayama, 2019 and the later simulations).

Initial conditions
How to treat the initial measurements (i.e., Y 0 , L 0 , and A 0 ) is also important for distinguishing between the two DGPs in Figure 1. Special attention needs to be paid to the initial measurements because there are no incoming directed edges to these variables from variables prior to the initial time point (k = −1, −2, . . . ). Although we have assumed so far that the DGP does not start prior to the initial measurements, this assumption is not realistic in many applications, and the initial measurements must somehow account for the past of the process that is not explicitly modeled (see Figure S1 in the Online Supplemental Material for more details). Assuming a DGP similar to that in Figure 1a Figure 5) provided a straightforward and interpretable approach that freely estimates the coefficients (loading) from η to all initial measurements. For example, for η (Y ) , the coefficients from this factor to Y 0 , L 0 , and A 0 are freely specified rather than being fixed to either one or zero. In applying dynamic panel models in econometrics, one usually assumes that individual-specific components (i.e., accumulating factors) are correlated with the initial measurements to account for the past of the process.
Importantly, if the second DGP ( Figure 1b) is correct, then such special considerations are not required. This is because time-invariant factors I have only direct effects on measurements (rather than on temporal deviations as within-person variations) and no directed edges are assumed between observed variables. In other words, past time-varying . . ) as variations in within-person processes cause observed variables separately from I as stable between-person differences (see also Figure S1 in the Online Supplemental Material). Therefore, if the assumed DGP as depicted in Figure 1b is correct, then the RI-CLPM, which includes time-varying observed confounders and assumes that initial variables at the within-person level (Y * 0 , L * 0 ) are exogeneous (and are mutually correlated) and that loadings from I to the corresponding initial measurements (i.e., I (Y ) → Y 0 , I (A) → A 0 and I (L) → L 0 ) are all set to one, can identify causal parameters for treatment effects, even if the DGP actually starts prior to the initial measurements.

Summary and Discussion
The critical difference between the two different DGPs in Figure 1 is whether the assumed time-invariant factors have only (stable) direct effects (i.e., stable trait factors) or both direct and indirect effects on measurements (i.e., accumulating factors). Because of this difference, stable trait factors as merely random intercepts cannot be viewed as timeinvariant unobserved confounders, while special considerations for initial measurements are not required if the assumed DGP includes only stable trait factors (i.e., Figure 1b).
Although researchers are increasingly using the RI-CLPM as a statistical model to uncover within-person relations, stable traits and (implicitly) assumed DGPs have not been fully characterized in the causal inference literature. Below, we assume a DGP that includes stable trait factors as in Figure 1b, and we propose a method of within-person variability score-based causal inference for (joint) effects of time-varying treatments at the withinperson level. This approach is more flexible than the one that relies entirely on SEM (e.g., the RI-CLPM that includes time-varying observed confounders) in terms of the linearity assumption regarding observed confounders at the within-person level.
A causal DAG represents a researcher's theory about the causal process and should be drawn based on subject-matter knowledge. However, in many cases, researchers do not exactly know the true DGP and how time-invariant factors (if they exist) influence measurements (e.g., linearly or nonlinearly, directly or indirectly, or both). Although it is ideal if one can unambiguously articulate the theoretically derived expected relations for variables, this can be quite challenging in practical applications (Curran, 2011). If linear SEM-based statistical models are used, then as a data-driven approach one could compare model fit indices between two statistical models that appropriately represent the causal models of Equation (1) and Equations (2) and (3) (i.e., the RI-CLPM that includes timevarying observed confounders), and this would be useful for investigating the sensitivity of the conclusions.
In the context of applying the RI-CLPM, Lüdtke & Robitzsch (2021) argued that including stable trait factors might be better suited for short-term studies that typically use shorter time lags between time points. In short-term studies, one might be more certain that there are no indirect effects from time-invariant factors (i.e., stable trait factors). Also, if time-invariant unobserved confounders (rather than random intercepts that merely represent between-person differences as stable trait factors) are likely to be present, then other statistical approaches that account for such confounders might be more suitable.
However, in our opinion there are no clear criteria that delineate when and how to include (time-invariant) factors in the assumed DGP, and continued discussion that also considers empirical investigations of each research hypothesis and sensitivity of results (e.g., the later simulations) will be required in the future. The terms "(stable) traits" and "within-person relations" have been used vaguely and ambiguously in statistical models, despite the existence of mathematical and interpretative differences among models (e.g., . Inspired by the discussion so far, we provide the formal definitions of these below.

FORMAL DEFINITIONS OF STABLE
A stable trait factor of person i (say, for Y ) is defined in this paper as (i) the timeinvariant factor that has additive influence on measurements, and (ii) its quantity is equal to the difference between the expected value of measurement (i.e., true score) of this person at time point t k (expressed as T (Y ) ik ) and the temporal group mean (µ (Y ) k ), which is invariant over time: Next, the within-person variability score Y * ik is defined as (i) a time-varying factor that has additive influence on measurements, and (ii) its quantity is equal to the difference between a measurement and its expected value: with the assumptions of E(Y * ik ) = 0 and independence between T (Y ) ik and Y * ik . From this formulation, stable trait factors and within-person variability scores are uncorrelated be- Thus, variances of measurements at time point t k can be expressed as the sum of those of stable trait factor scores and within-person variability scores. This means that the time series for within-person variability scores have the following covariance structure: In this paper, we use the terms within-person relation and between-person relation to describe the relations between variables that are based on within-person variability scores and stable trait factor scores, respectively.

Within-person Level
Next, we explain the definition of joint (causal) effects of treatments at the within-person level using the potential outcome approach. We assume a similar causal DAG model to that in Figure 1b: (i) measurements are expressed by the linear sum of stable trait factors and within-person variability scores, and (ii) within-person variability scores are expressed by functions (with assumption of homogeneity) of those in past time. However, unlike the presentation in Section 2, we relax some assumptions about the within-person variability scores to allow the following: (a) higher-order lagged effects and interaction effects of treatments/predictors can exist at the within-person level, and (b) time-varying observed confounders can be nonlinearly related with outcomes and treatments/predictors at the within-person level. The current focus is on evaluating the within-person relation between variables, that is, how the (joint) intervention of treatments/predictors influences future outcomes at the within-person level.
Below, we use overbarsȲ * . . , K) denote the within-person variability score for the outcome that would take at time point t k for person i were this person to receive treatment history at the within- Here, A * ik = 0 (k = 0, . . . , K − 1) indicates that the amount of treatments/predictors for person i is equal to the expected score of this person at time point t k (i.e., which we connect to the within-person variability score by the consistency assumption ik is a latent variable and unobservable, while potential outcomes for measurements (i.e., observed variables) are assumed in the standard potential outcome approach.
In potential outcome approach, causal effect refers to a contrast between potential outcomes under different treatment values. Therefore, for each causal effect, we can imagine a (hypothetical) randomized experiment to quantify it (i.e., target trial; Hernán & Robins, 2021). For example, (average) causal effect on Y * ik when a continuous treatment/predictor The standard assumption of no unobserved confounders or sequential ignorability indicates that Here, (ā * i(k−2) , 0) is the counterfactual history, that is, the history that agrees withā * i(k−2) through time t k−2 and is zero thereafter. Along with the assumed causal DAG above as well as consistency and sequential ignorability, we impose the stable unit treatment value assumption (SUTVA; no unmodeled spillovers, e.g., Hong, 2015) and assumptions of positivity (i.e., the probability of receiving each level of treatment conditional on past confounders and treatments is greater than zero) and modularity. Under these assumptions, the average causal effect on Y * ik when A * i(k−1) increases one unit from the reference value a * r i(k−1) at time t k−1 can be expressed using the difference in conditional means given information on confounders and treatment history as In other words, the causal effect of treatment A * r i(k−1) at the within-person level can be evaluated by the difference in conditional means of Y * ik between persons who receive A * i(k−1) = a * r i(k−1) + 1 (i.e., treatment levels that are a * r i(k−1) + 1 larger than their expected scores µ

given information on confounders and treatment history. 3
Similarly, the average joint (causal) effects of a sequence of treatments/predictorsĀ * i(k−1) on Y * ik when they increase one unit from the reference valuesā * r i(k−1) can be expressed as Figure 2 provides conceptual diagram modified from Figure 1b to account for the joint interventionsĀ * i(k−1) =ā * i(k−1) when DGP can be represented by linear and first-order models.
As a simple example, suppose K = 2 and that the DGP can be represented by linear and first-order models as in Equations (2) and (3) (assuming homogeneity and no interaction effects of treatments/predictors). Then, a conditional mean If we use a do-operator for a hypothetical experiment, which is popular for defining and quantifying the causal effect in the structural causal model approach (Pearl, 2009), Equation (10) ). The (conditional) interventional means are numerically equivalent to conditional means (i.e., Equation 10) under the causal DAG such as in Figure 1b in combination with linear equations and normally distributed residual terms. However, these quantities are conceptually different (see Gische et al., 2021;Gische & Voelkle, in press) and differ under nonlinear models or non-Gaussian models. MSMs and SNMs are methods based on the potential outcome approach, and a do-operator has not been used explicitly in this context. can be expressed as the linear (weighted) sum of the terms a * i0 and a * i1 : From this result, joint (causal) effects of treatments A * i0 and A * i1 when increasing one unit from the reference values a * r i0 and a * r thus the causal effect of treatment A * i0 when increasing one unit from the reference values a * r i0 at the within-person level becomes β 10 = β (Y ) 1 , which is equivalent to the so-called cross-lagged parameter in Equation (3).
From Equations (5) and (6) (i.e., stable trait factors are uncorrelated with within-person variability scores), we have the relation is the term that is not associated with treatments at the within-person level, it can be shown that The right side of the equation can be interpreted as person-specific joint (causal) effects in the sense that it accounts for stable traits of persons (I (Y ) i ). Therefore, joint (causal) effects of A * on Y * (i.e., Equation 11; at the within-person level) can be interpreted as person-specific joint (causal) effects on Y under the assumed linear causal DAG such as in Because intervention of I , conditional means in Equation (15) are not influenced by this intervention (e.g., Figure S2 in the Online Supplemental Material for a conceptual diagram). Therefore, given information of I (Y ) , this (person-specific) causal effect is equal to zero. This indicates that magnitudes of expected scores µ (A) + I

Identification Conditions for Causal Parameters
So far, we have assumed a causal DAG model that is similar to that shown in Figure 1b Regarding the second assumption, if K ≥ 2 and the DGP can be represented by linear (and first-order) equations such as in Equations (2) and (3) (assuming homogeneity and no interaction effects of treatments/predictors), then in this special case, the RI-CLPM (that includes time-varying observed confounders) as a statistical model can identify parameters for joint (causal) effects. However, the linearity assumption that is typically imposed for time-varying observed confounders and outcomes (and treatments/predictors) in path modeling and SEM (including the RI-CLPM) has often been criticized in the causal inference literature (e.g., Hong, 2015), and relaxing this assumption is often key to consistently estimating the causal quantity of interest (e.g., Imai & Kim, 2019). In addition, ensuring a correct specification in terms of the linearity is very challenging in that many equations must be diagnosed in longitudinal designs.
As we will see, the proposed method still requires specifications of the structure for within-person variability scores in each variable (Y , A, and L in the first step) as well as parametric models for treatments and outcomes at the within-person level (in the second step). However, the assumption of linearity is not required for these parametric models in MSMs and SNMs, and in MSMs one does not need to model the relation between outcomes and time-varying observed confounders (at the within-person level) because it is the means of potential outcomes that are marginalized over these confounders that are of concern. Notably, SNMs with G-estimation have the property of being doubly robust to model misspecifications in how time-varying observed confounders are functionally related with treatments and outcomes (at the within-person level).

PROPOSED METHODOLOGY
We are now ready to introduce a method of within-person variability score-based causal inference for estimating joint effects of time-varying continuous treatments, assuming that the above conditions for identification are satisfied. The proposed method consists of a twostep analysis. First, within-person variability scores are calculated using weights through SEM that models only the measurement parts that include stable trait factors. Then, causal parameters are estimated by MSMs or SNMs, using the scores calculated in the first step. This approach is more flexible than the one that relies entirely on SEM (e.g., the RI-CLPM that includes time-varying observed confounders) in terms of modeling how time-varying observed confounders are functionally related with treatments/predictors and outcomes at the within-person level, without imposing the linearity assumption in these relations. Before explaining the proposed methodology, we briefly discuss the motivation for adopting a two-step method, rather than simultaneously estimating stable trait factors (or within-person variations) and causal parameters.
In general, partial misspecification in measurements and/or structural models is known to cause large biases in estimates of model parameters. In the present context, when a simultaneous estimation procedure such as the RI-CLPM is used, misspecification in the structural models at the within-person level may greatly affect parameter estimates in the measurement model ((co)variances of stable factors and within-person variability scores), and vice versa.
To avoid such confounding in interpreting the estimation results, in the SEM context Anderson and Gerbing (1988) proposed a two-step procedure that first confirms the measurement model with a saturated model, so that structural relations have no impact on the measurement model. Then, using an appropriate measurement model, the substantive structural relations model of interest is added (Hoshino & Bentler, 2013). Applications of similar multistep estimation procedures can be seen for diverse classes of latent variable models (Bakk & Kuha, 2017;Croon, 2002;Skrondal & Laake, 2001;Vermunt, 2010).
Another potential advantage of two-step estimation is its feasibility. MSMs and SNMs usually do not assume common factors, and the optimization procedure for these models is different from that in SEM. For this reason, fully customized programming is required if performing simultaneous estimation. However, in two-step estimation, parameters in measurement models can be estimated in the first step through various software packages for SEM, including Amos, SAS PROC CALIS, R packages (sem, lavaan, OpenMx), LIS-REL, EQS, and Mplus. MSMs and SNMs can be straightforwardly applied just by using calculated within-person variability scores instead of measurements. Two-step estimation is also advantageous because it poses less risk of improper solutions. This problem is encountered relatively often when applying the RI-CLPM because of negative variance parameters and a singular approximate Hessian matrix for stable trait factor variance-covariance (e.g., Usami, Todo, & Murayama, 2019), which is likely caused by misspecifications in linear regressions (i.e., the structural model). We will separately estimate stable trait factors for each variable (Y , A, and L) without influence from specified structural models, thus minimizing the risk of improper solutions.

Within-person Variability Scores
The first step of our method is divided into two sub-steps: (i) specification of the measurement models and parameter estimation and (ii) prediction of within-person variability scores.

Specification of the measurement models and parameter estimation
As stated earlier, we assume that measurements are expressed by the linear sum of stable trait factors and within-person variability scores that are mutually uncorrelated, as in Equation (2). This equation can be viewed as a factor analysis model that includes a single common factor I (whose factor loadings are all one) 4 and a unique factor as temporal 4 Although we defined stable trait factors as the (time-invariant) difference between the expected value of a given person's measurement and the temporal group mean, one could argue for another definition that allows for time-varying influences on measurements. If this is the case, time-varying factor loadings can be freely specified in this step (except for one fixed factor loading for identification). However, there deviations. In vector notation, the causal model of Equation (2) for outcome Y becomes where µ (Y ) is a (K + 1) × 1 mean vector, E(I We denote as Ψ (Y ) a (K + 1) × (K + 1) variance-covariance matrix of within-person variability scores. This implies that the variance-covariance matrix of Y . Unlike the standard factor analysis model, Ψ (Y ) has a dependence structure and is not diagonal. Therefore, in using SEM to estimate the parameters in Equation (16), some structure-such as compound symmetry, a Toeplitz structure, or a (first-order) autoregressive (AR) structure-must be specified in Ψ (Y ) for model identification. When the model is correctly specified, consistent estimators for µ (Y ) , φ 2 (Y ) , and Ψ (Y ) can be obtained by MLE in SEM (Jöreskog & Lawley, 1968).
In SEM, missing values can be easily handled by full information maximum likelihood (Enders & Bandalos, 2001) with the assumption of missing at random (MAR; Rubin, 1976).
If data are suspected to be missing not at random (MNAR), then appropriate sensitivity analyses and/or multiple imputation should be considered (Resseguier, Giorgi, & Paoletti, 2011). Models that account for MNAR can be easily estimated in popular software packages for SEM (see Enders, 2011;Newsom, 2015).
Another advantage of SEM is that validity of the specified model can be diagnosed via multiple model fit indices, along with model comparisons using information criteria. In this paper, we use three current major indices (e.g., Hu & Bentler, 1999;Kline, 2016) Similarly, we also set measurement models for treatments/predictors A and observed confounders L separately in this sub-step, then estimate parameters for mean vectors (µ (A) and µ (L) ), stable trait factor variances (φ 2 (A) and φ 2 (L) ), and variance-covariance matrices of within-person variability scores Ψ (A) and Ψ (L) . may be some cost in that the minimum number of time points required to identify the measurement model becomes larger than that in specifying time-invariant loadings.

Predicting within-person variability scores
vectors of measurements and within-person variability scores, respectively, and let µ = (µ (Y ) , µ (A) , µ (L) ) t be a mean vector. Also let Σ and Ψ be covariance matrices for measurements X i and within-person variability scores We consider linear prediction of within-person variability scoresX * i under the condition that Σ and Ψ are known. Consider a (3K + 1) × (3K + 1) weight matrix W that provides within-person variability scores from measurements aŝ satisfying the relation Unlike standard applications of factor analysis, we are interested in predicting withinperson variability (unique factor) scores, rather than stable trait factor (common factor) scores. However, the current problem of determining weights W shares the similar motivation of predicting factor scores. In the factor analysis literature, a predictor that preserves the covariance structure of common factors has been developed as a linear correlation preserving predictor (Anderson & Rubin, 1956;Green, 1969;ten Berge, Krijinen, Wansbeek, & Shapiro, 1999).
With this point in mind, W that can provide the best linear predictor ofX * i minimizing the risk function, defined as the trace of a residual covariance matrix (i.e., mean squared , which also satisfies the relation in Equation (18), can be obtained by utilizing singular value decomposition as Here, for a positive (semi)definite matrix C, we denote as C 1/2 the positive (semi)definite matrix such that its square equals C. Matrices C −1/2 and C 3/2 are the inverse (if it exists) and the third power of C 1/2 , respectively. A derivation of W is provided in the Online Supplemental Material.
We use the sample meansX and covariance matrix S of X as estimators of µ and Σ.
As implied from the relation in Equation (7), we use estimated stable trait factor variances to estimate Ψ asΨ whereΦ + consists of estimated stable trait factor (co)variances. In the simple case where the initial measurement of Y (Y 0 ) is missing and the number of measurements equals K for each variable,Φ + becomeŝ whereΦ is an estimator of a 3 × 3 stable trait factor covariance matrix Φ. Because stable trait factor covariances are not estimated in the previous sub-step, we use covariances between calculated linear correlation preserving predictors for variables. For example, this predictor for Y can be expressed aŝ can be calculated in the same manner, whereby we obtainφ (Y,A) = Cov(Î the model is correctly specified in the previous sub-step. From Equations (17) and (19)-(21), we can thus obtainX * i without specifying the structural models that connect within-person variability scores from different variables (Y * , A * , and L * ), successfully maintaining independence from the next step.

Applying MSMs and SNMMs
The second step of the proposed method is straightforward, because we just need to apply MSMs or SNMs using calculated within-person variability scores. Robins and co-workers developed SNMs with G-estimation (Robins, 1989;Robins, Blevins, Ritter, & Wulfsohn, 1992) and MSMs with an inverse probability weight (IPW) estimator (Robins, 1999;Robins, Hernán & Brumback, 2000). These methods have been extended to treat clustered outcomes (e.g., Brumback, He, Prasad, Freeman, & Rheingans, 2014;He, Stephens-Shields, & Joffe, 2015. However, (joint) causal effects under the control of stable trait factors have not been investigated in this area because inference for stable traits and within-person relations has been an issue in the psychometric and behavioral science literature, and these concepts have yet to be fully characterized in the causal inference literature.
MSMs are advantageous in that they can be easily understood and fit with standard, off-the-shelf software that allows for weights (e.g., He, Stephens-Shields, & Joffe, 2019;Vansteelandt & Joffe, 2014). However, it is well known that MSMs can be highly sensitive to misspecification of the treatment assignment model, even when there is a moderate number of time points (e.g., Hong, 2015;Lefebvre, Delaney, & Platt, 2008). Imai and Ratkovic (2015) proposed a covariate balancing propensity score methodology for robust IPW estimation.
Because of the attractive property of being doubly robust in G-estimators, SNMs are a better approach for handling violation of the usual assumptions of no unmeasured confounders or sequential ignorability (Vansteelandt & Joffe, 2014). In addition, SNMs can allow direct modeling of the interactions and moderation effects of treatments/predictors A with observed confounders L. Another advantage of SNMs is that the variance of locally efficient IPW estimators in MSMs exceeds that of G-estimators in SNMs, unless A and L are independent. We therefore emphasize the utility of SNMs in this paper. Because we are now interested in evaluating the joint effects of treatments on the mean of an outcome, rather than those on the entire distribution of the outcome, we apply structural nested mean models (SNMMs; Robins, 1994).
Note that potential disadvantages of SNMs are their limited utility for G-estimation when applying logistic SNMs and their limited availability of off-the-shelf software. Regarding the latter point, Wallace, Moodie, and Stephens (2017) developed an R package for G-estimation of SNMMs.

MSMs using within-person variability scores
MSMs are typically applied to evaluate the joint effects of a sequence of treatments on the outcome, which is measured only at the end of a fixed follow-up period (t K ). For generality of discussion, as before we assume that the outcome is measured each time and that the primary interest is evaluation of effects of a sequence of past treatments on the outcome at each time point.
MSMs consider the marginal mean of potential outcomes that are marginalized over the observed confounders L. In the current context, we consider potential outcomes at the within-person level, namely, with k = 1, 2, . . . , K. 5 The average joint (causal) effects ofĀ * i(k−1) on Y * ik when increasing one unit from the reference values in each treatment become k t=1 β k(t−1) . Parameters τ = (α 1 , . . . , α K , β 10 , β 20 , β 21 , . . . , β K(K−1) ) t can be estimated by fitting a weighted conditional model with an IPW estimator. One useful option for calculating weights is to use stabilized weights w ik for person i at time point t k (Hernán, Brumback, & Robins, 2002) as , L * it ) = 0 (the positivity assumption). Parameters will be biased if the treatment assignment model f 1) ) does not result in bias. In MSMs, unlike the RI-CLPM (that includes L), one does not need to model the relation between outcomes and time-varying observed confounders at the within-person level because marginal (joint) effects of treatments/predictors are the primary focus in applying this method. Also, one 5 Other terms such as quadratic effects (e.g., A * 2 ik ) for time-varying treatments can be included in MSMs. Also, one can include observed covariates/non-confounders to assess effect modification (Hernán & Robins, 2021).
can allow a nonlinear relation between treatments/predictors and confounders in the treatment assignment model f (A * it |A * i(t−1) , L * it ), although estimates are sensitive to this model misspecification.

SNMMs using within-person variability scores
SNMMs simulate the sequential removal of an amount (blip) of treatment at t k−1 on subsequent average outcomes, after having removed the effects of all subsequent treatments.
SNMMs then model the effect of a blip in treatment at t k−1 on the subsequent outcome means while holding all future treatments fixed at a reference level 0 (Vansteelandt & Joffe, 2014); in other words, the level that is equal to expected scores of a person in the current context.
In the following empirical applications using the data of K = 2, a linear SNMM using the identity link g(x) = x is given by Here, the first equation models the effect of A * i1 on Y * i2 , the second models the effect of A * i0 on Y * i2 , and the third models the effect of A * i0 on Y * i1 . The (conditional) average joint effects of A * i0 and A * i1 on Y * i2 when increasing one unit from the reference values in each treatment become β 20 + γ 20 l * i0 + β 21 + γ 21 l * i1 . This effect becomes β 20 + β 21 if there are no interaction effects between confounders and treatments.
SNMMs consider a transformation U * im (τ ) of Y * ik , the mean value of which is equal to the mean that would be observed if treatment were stopped from time t k−1 onward, in the sense that is the identity link. For instance, in the above example of K = 2, The assumptions of sequential ignorability (Equation 9) together with identity (Equation 27) imply that for k = 1, . . . , K. The parameters τ can therefore be estimated by solving the estimating where d k−1 (L * i(k−1) ,Ā * i(k−1) ) is an arbitrary p × (K − k + 1)-dimensional function, with p the dimension of τ , and V −1 is a p × (K − k + 1)-dimensional vector that includes the reciprocal of the variance of each element in U ). This estimating equation essentially sets the sum across the time points of the conditional covariances between U * i(k−1) (τ ) and the function andĀ * i(k−2) , are zero. If there is homoscedasticity in V , then local semiparametric efficiency under the SNMM is attained upon choosing (Vansteelandt & Joffe, 2014). Solving the estimating equation (30) requires a parametric model A for the treatment/predictor A * ik : f (A * i(k−1) |L * i(k−1) ,Ā * i(k−2) ; η) with k = 1, . . . , K. It also requires a parametric model B for the conditional mean of U * i(k−1) (τ ), namely, . Notably, when the parameters η and κ are variation-independent, G-estimators that solve Equation (30), obtained by substituting η and κ with consistent estimators, are doubly robust (Robins & Rotnitzky, 2001, cited from Vansteelandt & Joffe, 2014, meaning that estimates of causal parameters are consistent when either model A or model B is correctly specified. In addition, unlike the RI-CLPM (that includes L), one can allow nonlinear effects of time-varying observed confounders on treatments/predictors and outcomes in models A and B.

Method
This section describes a Monte Carlo simulation for systematically investigating how effectively the proposed method using calculated within-person variability scores can recover causal parameters, and it presents comparisons of estimation performance versus other potential (centering) methods to account for stable traits. We consider two different scenarios: (i) the assumed linear (and first-order) DGP of Figure 1b (i.e., causal models represented in Equations 2 and 3) is correct and other assumptions of consistency, sequential ignorability, SUTVA, positivity, modularity, and multivariate normality are all satisfied, and (ii) some assumptions are violated and the statistical model contains misspecifications. In the whole simulation, for simplicity we also assume that causal effects are homogeneous among persons and interactions or moderation effects with observed confounders are not present.
In the first scenario, initial within-person variability scores (Y * i0 , A * i0 , and L * i0 ) are first generated so that they are normally distributed and their variances and covariances become 10 and 3, respectively. Then, within-person variability scores at succeeding times are sequentially generated via a first-order linear autoregressive model (i.e., Equation 3) with the stationarity assumption 6 : If K = 4, this setting produces (see also the calculation in Equation 12) where α k (k = 1, . . . , 4) is a constant. Because no moderation effects are assumed, estimating 10 different causal parameters τ K=4 = (β 10 , β 20 , β 21 , β 30 , β 31 , β 32 , β 40 , β 41 , β 42 , β 43 ) t is a common goal between MSMs and SNMMs. The variance of normal residual d was set to 5 for each variable, making the variance of within-person variability scores for each variable become almost 10 at each time point (the proportion of variance explained in Equation 32 becomes almost 50%).
Independently of generating within-person variability scores, three kinds of stable trait i , and I (L) i ) are generated by multivariate normal with a correlation of 0.3. Observed values are then generated using the relation of Equation (2), where temporal group means are set to zero at each time point (i.e., µ . In this simulation, we systematically changed the total number of persons as N = 200, 600, and 1000, the number of time points as K = 4 and 8, and the size of stable trait factor variances as φ 2 (Y ) = φ 2 (A) = φ 2 (L) = 10/9, 30/7, and 10. This setting of stable trait factor variances indicates that the proportion of this variance to that of measurements becomes around 10%, 30%, and 50%, respectively, at each time point. To make it easier to compare the results between the K = 4 and K = 8 conditions, in K = 8 we suppose . This setting produces conditional means of (potential) outcomes as functions of treatments at k = 6, and α 5 + β 54 a * i4 = α 5 + 0.40a * i4 at k = 5, where α k (k = 5, . . . , 8) is a constant. There are a total of 10 causal parameters τ K=8 = (β 54 , β 64 , β 65 , β 74 , β 75 , β 76 , β 84 , β 85 , β 86 , β 87 ) t that are equal to those in the K = 4 condition (i.e., τ K=4 = τ K=8 ).
By crossing these factors, we generated 200 simulation data for each combination of factors. For comparison, each simulation dataset was analyzed by MSMs and SNMMs using four different scores: 1) true within-person variability scores (true factor score centering: for Y ), 2) within-person variability scores predicted by the proposed method (Equation 17), 3) scores based on observed person-specific means (observed-mean centering, e.g.,Ŷ * ik = Y ik −Ȳ i , whereȲ i = K k=0 Y ik /(K + 1)), and 4) observed scores (no centering, e.g.,Ŷ * ik = Y ik ). In the current scenario, the no-centering method totally ignores the presence of stable traits. On the other hand, because observed means include the components of both stable traits (between-person differences) and within-person variability, observed-mean centering fails to perfectly disentangle stable individual differences from within-person variability. 7 Under each simulation condition, we calculated the bias and root mean squared error (RMSE) of 10 kinds of estimates of causal parameters from MSMs and SNMMs.
In the first step of the proposed method, to identify the measurement model (e.g., Equation 16 for Y ) SEM that assumes a linear AR(1) structure with time-varying autore-gressive parameters and residual variances is specified for within-person variability scores in each variable. Although a true model (i.e., AR(K) structure) cannot be specified because of the identification problem, we confirmed that the AR(1) structure generally provides acceptable model fits under the current parameter setting.
The results are discarded when improper solutions appear in the first step because of out-of-range parameter estimates (e.g., negative variance). In the current simulation, fewer than 0.1% of all estimates produced such improper solutions. We also confirmed that improper solutions were not found in the second step of applying MSMs and SNMMs.
When applying MSMs, a first-order linear regression model is specified for the treatment assignment model, namely, f (A t |A t−1 , L t ) (i.e., the correct specification). For SNMMs, models A and B are also specified in an appropriate manner.
ik , indicating that the treatment assignment model that includes only linear effects of Y * ik assumed in the current MSM and SNMM is misspecified. Note that causal parameters for time-varying treatments (τ ) remain unchanged even if quadratic effects exist in the treatment assignment model because time-varying treatments are now intervened (see Figure 2).
The simulation was conducted in R, using the lavaan package (Rosseel, 2012) to esti-mate parameters by SEM with MLE in the first step and the ipw package for MSMs in the second step. In SNMMs, we solve Equation (30) via the Newton-Raphson method.
Simulation code is available in the Online Supplemental Material.

Results
Because of space limitations, Figure 3   Another critical aspect of this method is that linear dependence prevents identification of joint effects of all past treatments on Y K (in this case, β 40 , β 41 , β 42 , β 43 in K = 4). We therefore do not recommend use of observed-mean centering. The no-centering method shows serious negative biases when φ 2 is not small, indicating that ignoring the presence of stable traits is critical to estimating causal effects. Magnitudes of stable trait factor variances should vary depending on the nature of variables and study period, but in the author's experience many studies that applied the RI-CLPM have shown significant and moderate to large sizes ofφ 2 (e.g., the proportion of stable trait factor variance to that of measurements is above 30%). The following application also demonstrates large stable trait factor variance estimates.
As supplemental analyses, we additionally explored the performance of the methods under different parameter settings, as well as different model specification of SEM in the first step. From this, we find similar tendencies in the results (Figures S5-S8): (a) SNMMs show smaller RMSEs than do MSMs, and (b) the proposed method shows adequate performance in terms of biases and RMSEs, and it works better than the no-centering method (especially when φ is larger) and the observed-mean centering method (especially when K is smaller).
We also investigated the performance of linear correlation preserving predictor (Î i ), confirming that the proposed method worked much better than this method on average ( Figures S5-S8).
Similar results were also observed in the second scenario, where model misspecifications are present (Figures S9-S14). More specifically, when measurement errors were present, the biases and RMSEs became larger in all methods ( Figures S9 and S10). However, the proposed method still outperforms other centering methods. When time-invariant factors do not influence measurements as stable trait factors, the overall results of biases and RMSEs were not largely affected (Figures S11-S12), regardless of the magnitude of φ 2 .
This result is a little surprising, considering that the specified time-varying loadings from factors (= 1 + 0.5k/K at time t k ) are not small (i.e., the impact of this factor on the variance of measurement at time t K is almost twice that at time t 0 ). This may suggest that causal parameters can be recovered relatively well even when ignoring time-varying impacts from time-invariant factors that are actually present in the first step. However, future investigations are required in order to better clarify when estimated causal parameters are seriously biased under various scenarios for misspecified measurement models. When quadratic effects from time-varying observed confounders are present in the treatment assignment model but are ignored in analyses, biases and RMSEs in MSMs become larger on average. In the proposed method, this is salient in the RMSEs for the K = 8 and φ 2 = 10 conditions ( Figures S13-S14). SNMMs, which have the property of being doubly robust in G-estimators, were less influenced even if these by-no-means small quadratic effects are ignored, and in many conditions the proposed method again outperforms other centering methods.

EMPIRICAL APPLICATION
This section describes an empirical application of the proposed method using data from the Tokyo Teen Cohort (TTC) study (Ando et al., 2019). We assume a similar causal DAG model to that in Figure  In this example, we estimate the (joint) causal effects of time-varying sleep duration (A) on later depressive symptoms (Y ) in adolescents. Several epidemiological studies have suggested a relationship between sleep habits (sleep duration, bedtime, and bedtime regularity) and mental health status (depression and anxiety) in adolescents. For example, Matamura et al. (2014) applied the CLPM to data from 314 monozygotic twins living in Japan and showed that sleep duration had significant associations with mental health indices, even after controlling for genetic and shared environmental factors. However, to the author's knowledge, no studies have investigated this relation that accounts for stable traits in sleep duration and symptoms (i.e., at the within-person level).
The Short Mood and Feelings Questionnaire (SMFQ; Angold et al., 1995) was used to measure depression in adolescents (Y ). The SMFQ consists of 13 items assessing depressive symptoms rated on a three-point scale (0: not true, 1: sometimes true, 2: true) regarding feelings and actions over the preceding two weeks. Higher SMFQ scores suggest moresevere symptoms. These data were measured at home by self-report questionnaires. In this example, sleep duration in hours (A) was measured by the question "How long do you usually sleep on weekdays?" Observed confounders were body mass index (BMI; L B ) and bedtime (L A ), which was measured by the question "When do you usually go to bed on weekdays?" Because many adolescents reported no problems for all items on the SMFQ, the score distribution was positively skewed. In the present example, we focus on the clinical group comprising N = 416 adolescents (13.1%) with SMFQ scores of 6 or higher during the study. Katon, Russo, Richardson, McCauley, and Lozano (2008) reported 80% sensitivity and 81% specificity at this cut-off for diagnosis of major depression based on the Computerized Diagnostic Interview Schedule for Children (C-DISC). Missing data were primarily due to dropout. Of the 416 samples, 113 adolescents provided all three responses in the study. Descriptive statistics of sleep duration, SMFQ score, bedtime, and BMI are available in the Online Supplemental Material (Table S1).
In the first step, we use generalized least squares in the lavaan package to estimate the model parameters for each variable. To identify the measurement model (e.g., Equation 16 for Y ), SEM that assumes an AR(1) structure with time-varying autoregressive parameters and residual variances is specified for within-person variability scores of each variable.
Let P i be the total number of variables observed in adolescent i. P i × P i weights W i are calculated from estimated parameters under the assumption of MAR. Within-person variability scores X * i are then calculated using this weight and measurements X i,obs for adolescent i asX * i = W t i X i,obs . Causal parameters (β and γ) of sleep duration at 10 and 12 years old (A * 0 and A * 1 ) on later depressive symptoms (SMFQ scores Y * 1 and Y * 2 ) are estimated using calculated withinperson variability scores by linear SNMM. In linear SNMM, blip functions and U * (τ ) are set as in Equations (26) and (28), except that the two confounders L A and L B are present in this example. When applying SNMMs, models A and B are both specified using firstorder linear regression models. All calculated within-person variability scores were used in the analysis under the assumption of MAR.
We confirmed that the first step did not find improper solutions, and that current AR(1) models that assume time-varying parameters fit better than those that do not. Table S2 summarizes the model fit indices and estimated parameters in this step. All stable trait factor variance estimates are significant, indicating the necessity of controlling for stable traits. Specifically, the proportions of variances in measurements attributable to estimated stable trait factors at k = 0 are 24.5%, 54.5%, 48.2%, and 74.8% for Y , A, L A , and L B , respectively. Table 1 provides the estimation results of causal parameters, along with estimates based on the no-centering and observed-mean centering methods for comparison. As seen in Table 1, the proposed method reveals that intervention of longer sleep duration at 12 years old (A * 1 ) has a positive effect (β 21 = −2.704, 95%CI [-4.938,-0.470], p <.05) on later depressive symptoms at 14 years old (Y * 2 ) at the within-person level, but this estimate is not significant in the no-centering and observed mean score-centering methods. Similar positive effects of sleep duration were found in previous studies (Matamura et al., 2014), but the present analysis newly investigates this causal hypothesis at the within-person level by controlling for stable traits of persons. When the no-centering method is applied, the causal effect estimate of A * 0 on Y * 1 is significant, showing that intervention of longer sleep duration at 10 years old has a negative effect (β 10 = 1.693, 95% CI [0.405,2.981], p <.05) on later depressive symptoms at 12 years old. Considering that the magnitudes of the estimated stable trait factor variances were moderate or large for all variables, causal effect estimates in the no-centering method are unreliable and might be seriously biased.
In supplemental analyses, we confirmed that the major findings did not change even when using only data of adolescents who provided all three responses (N = 113) and a different cutoff for SMFQ (Angold et al., 1995;Tables S2-S5). Again, statistical significance as well as sign and magnitude in estimates of causal parameters might change according to the choice of calculation (centering) methods for within-person variability scores, and ignoring the presence of stable traits of persons might lead to incorrect conclusions.

GENERAL DISCUSSION
We proposed a two-step estimation method for within-person variability score-based causal inference to estimate joint effects of time-varying (continuous) treatments/predictors by effectively controlling for stable traits. In the first step, a within-person variability score for each person, which is disaggregated from the stable trait factor score, is calculated using weights based on the best linear correlation preserving predictor through SEM. Causal parameters are then estimated by MSMs or SNMs, using calculated within-person variability scores. The proposed method can be viewed as one that synthesizes the two traditions of factor analysis/SEM in psychometrics and a method of causal inference (MSMs or SNMs) in epidemiology.
In this paper, we began by providing formal definitions of stable trait factors (for between-person relations) and within-person variability scores (for within-person relations), because these concepts have not been fully characterized in the causal inference literature despite the fact that they have been attracting increasing attention in psychometrics and behavioral science (e.g., Hamaker et al., 2015;. On the other hand, in epidemiology the conceptual and mathematical differences between stable trait factors and accumulating factors, along with which kind of time-invariant factor is included in each statistical model, have received less attention. This paper may help bridge the gap. We have also clarified the assumptions required to identify causal parameters for withinperson variability score-based causal inference: (i) (as depicted in Figure 1b As for the second assumption, our approach is more flexible than the RI-CLPM (that includes time-varying observed confounders), which researchers are becoming increasingly interested in for uncovering within-person relations among variables, in that the assumption of linearity is not required with respect to time-varying observed confounders at the withinperson level. We particularly emphasize the utility of SNMs with G-estimation, because of its property of being doubly robust to the model misspecifications in how the time-varying observed confounders are functionally related with treatments/predictors and outcomes, along with flexibility in that it allows investigation of moderation effects of treatments with observed confounders.
Through simulation and empirical application, we illustrated that ignoring the presence of stable traits might lead to incorrect conclusions in causal effects. We also confirmed that the proposed approach is superior to observed-mean centering, as a conventional method to predict stable traits of persons. Especially when K is small, observed-mean centering showed serious negative biases in estimates of causal parameters. Considering that most research applying the RI-CLPM to uncover within-person relations used longitudinal data with two or three time points (K = 1, 2; e.g., Usami, Todo & Murayama, 2019), observedmean centering cannot be recommended.
A recent study provided closed-form parametric expressions of causal effects for linear models (Gische et al., 2021), and Gische and Voelkle (in press) proposed asymptotically efficient estimators in the case of ML estimation. It is suggested that, at least in large samples, the parametric procedure proposed by Gische and Voelkle (in press) may give smaller standard errors compared with the proposed method in simulations in which data are generated by a linear model with normal residuals. Comparing the performances of these methods and the RI-CLPM under various conditions that account for nonlinear relations among variables is an important topic for future studies.
One caveat for the proposed method, which is relevant to the second assumption above, is that in the first step, one must correctly specify the structure (such as the AR(1) struc-ture) for within-person variability scores in each variable so that the (identified) SEM can yield consistent estimates of parameters, which are required for consistent estimation of causal effects in the subsequent step. However, in general, how to establish the correct (or even a plausible) DAG model is a major challenge (Hamaker, Mulder & van IJzendoorn, 2020; see also the discussion in Section 2.4). Relatedly, the premise that stable trait factors exist and loadings from factors are equal to those in the DGP might be restrictive in actual applications; therefore, we need to carefully account for the consequences of possible model misspecifications in the first step on the results in the second step to precisely infer withinperson relations. The good news is that in the present simulation, we confirmed that the specified time-varying AR(1) structure works well to recover causal parameters, and its performance was not largely influenced even when there were model misspecifications (i.e., time-varying effects from time-invariant factors). However, additional large-scale simulations to further clarify the robustness of the method regarding this point are needed in future studies.
We can also use model fit indices to evaluate how well the structure specified in the first step fits to the data, especially when the number of time points is large. However, this procedure is not a fundamental solution. Even if a researcher is certain that SEM that assumes stable trait factors for each variable can be specified in the first step, in general there is still at great risk of violating some assumptions. Notably, relating to the first assumption above, stable trait factors and within-person variability scores (temporal deviations) at each time point might be correlated in actual applications if they share common causes (e.g., genotype; see also McNeish & Kelly (2019), who argued the issue of endogeneity in applying mixed-effects model). This point is closely related to the model being able to perfectly disentangle the within-and between-person relations. Future studies should investigate how this violation impacts the estimated causal parameters.
We used two-step estimation to account for feasibility, but this issue remains in that one still must write programming code, as that in the Online Supplemental Material. We are planning to develop packages for the proposed method. Another potential limitation is that the proposed approach (as well as the RI-CLPM) demands longitudinal data with three or more time points (K ≥ 2) to identify the measurement model (SEM) in the first step unless strong parameter constraints are imposed.
We assumed continuous variables in this paper, but we can extend this methodology to categorical variables. For binary and ordered categorical variables, one simple method is to use estimated tetrachoric or polychoric correlations as input for factor analysis and then to calculate within-person variability scores using estimated stable trait factor and unique factors (within-person variability scores) variances. There are alternative methods for conducting factor analysis for ordinal variables (e.g., Jöreskog & Moustaki, 2001), but whichever method is used, rescaling stable trait factor and unique factor scores variances might be required to make calculated within-person variability scores interpretable. In some cases, using variances of original (categorical) data might be one option for this purpose. However, the estimation performance for causal parameters and the influence of biased tetrachoric or polychoric correlations caused by misspecified distributions must be investigated in future research.
Because we take an SEM approach in the first step, accounting for measurement errors, which is closely related to violation of the consistency assumption, is feasible under the parametric assumption. Although we expect that longitudinal data with large K are required for precisely estimating measurement-error variances, we plan to investigate how the proposed method works under measurement models that include measurement errors.
This paper opens a new avenue for exploring other various research questions that are closely relevant to causal hypotheses. For example, use of within-person variability scores can be extended to cases in which one is interested in uncovering reciprocal effects (e.g.,  and mediation effects (e.g., Goldsmith et al., 2018;Tchetgen Tchetgen & Shpitser, 2012), as well as to multilevel modeling and hierarchical continuous time modeling (Driver & Voelkle, 2018). Note that there is still room for discussion on the issues of within-person relation and stable traits, as well as the issue about when and how to include (time-invariant) factors in the assumed DGP (see Section 2.4). The present paper is intended to promote substantial discussion about the conceptual and statistical properties of the time-invariant factors (e.g., stable trait factors or accumulating factors) included in the assumed DGP among researchers who wish to infer within-person relations and causality, and the hope is that the proposed method helps in exploring various causal hypotheses in longitudinal design and guiding better decision-making for researchers.