Abstract
This paper provides a theoretical foundation to examine the effectiveness of post-hoc adjustment approaches such as propensity score matching in reducing the selection bias of synthetic cohort design (SCD) for causal inference and program evaluation. Compared with the Solomon four-group design, the SCD often encounters selection bias due to the imbalance of covariates between the two cohorts. The efficiency of SCD is ensured by the historical equivalence of groups (HEoG) assumption, indicating the comparability between the two cohorts. The multilevel structural equation modeling framework is used to define the HEoG assumption. According to the mathematical proof, HEoG ensures that the use of SCD results in an unbiased estimator of the schooling effect. Practical considerations and suggestions for future research and use of SCD are discussed.
Notes
The comparability can be statistically tested through the multivariate group comparison approach (Tatsuoka 1971). The comparability of the two groups is revealed by the discriminant function (DF) of X (Tatsuoka 1971). X includes q column vectors such as level-1 (student-level) covariates and their interaction terms and level-2 (class- or school-level) covariates and their interaction terms. It is denoted as \(X=(x_{1}, \ldots , x_{q})\). The DF is a linear combination of the columns of X. For example, the first DF of X can be written as \(DF_{1}=v_{11}x_{1}+ v_{12}x_{2}+\cdots + v_{1q}x_{q}\). Vector \(V_{1}=(v_{11}, \ldots , v_{1q})\) is the first eigenvector of \({(\Sigma ^W)}^{-1}\Sigma ^B\), where \(\Sigma ^W\) and \(\Sigma ^B\) are the within-group and between-group variance–covariance matrices of X, respectively. Notice that the within-group variance–covariance matrix \(\Sigma ^W\) should be computed by taking account of the hierarchical structure of the data (see Schmidt and Houang 1986). If \({(\Sigma ^W)}^{-1}\Sigma ^B\) has n non-zero eigenvalues, then we can define n DFs, namely \(DF_{1}, DF_{2}, \ldots , DF_{n}\). Using DFs simplifies group comparability testing when the number of covariates is large. Following descriptive discriminant analysis (DDA; Huberty and Olejnik 2006), group comparability testing can determine whether Cohort 2 at time 0 is comparable to Cohort 1 at time 1 regarding covariates X. Here, a two-step testing approach can be conducted. First, one computes the latent roots of \({(\Sigma ^W)}^{-1}\Sigma ^B\) to construct the DFs and tests whether the two groups are comparable on all covariates jointly. Second, if they are not jointly comparable, univariate group comparisons reveal where the non-comparability lies. This identifies the set of covariates on whose means the two groups differ. That set of covariates can then be used as matching variables.
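The first step of this procedure can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `discriminant_functions`, the simulated cohorts, and the covariate means are hypothetical, and the sketch deliberately ignores the hierarchical structure of the data, which the note says should be accounted for when computing \(\Sigma ^W\). Sums-of-squares-and-cross-products matrices are used in place of variance–covariance matrices; this only rescales the eigenvalues and leaves the eigenvectors unchanged.

```python
import numpy as np

def discriminant_functions(X, group):
    """Eigenvalues and eigenvectors of inv(S_W) @ S_B, largest eigenvalue first.

    Assumption: single-level data. S_W and S_B are within- and between-group
    SSCP matrices, which differ from covariance matrices only by scalar divisors.
    """
    X = np.asarray(X, dtype=float)
    group = np.asarray(group)
    grand_mean = X.mean(axis=0)
    q = X.shape[1]
    S_W = np.zeros((q, q))
    S_B = np.zeros((q, q))
    for g in np.unique(group):
        Xg = X[group == g]
        centered = Xg - Xg.mean(axis=0)
        S_W += centered.T @ centered
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        S_B += len(Xg) * (d @ d.T)
    # Solve S_W * M = S_B instead of forming the inverse explicitly.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], eigvecs.real[:, order]

# Hypothetical data: two cohorts measured on three covariates.
rng = np.random.default_rng(0)
cohort1 = rng.normal(loc=[0.0, 0.0, 0.0], size=(200, 3))
cohort2 = rng.normal(loc=[0.5, 0.0, -0.3], size=(200, 3))
X = np.vstack([cohort1, cohort2])
group = np.repeat([1, 2], 200)

eigvals, eigvecs = discriminant_functions(X, group)
n_nonzero = int(np.sum(eigvals > 1e-8))  # number of usable DFs
df1_scores = X @ eigvecs[:, 0]           # scores on the first DF
```

With two groups the between-group matrix has rank one, so only one DF is available; the cohorts can then be compared on `df1_scores` rather than on each covariate separately.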
Selection bias, also called “sample selection bias”, refers to the bias that is due to the use of non-random samples in estimating relationships among variables of interest. It can occur in two situations: (1) self-selection by the objects being studied, and (2) non-random sample selection by data analysts or researchers. The use of selection-biased samples results in a biased estimator of the effect of an intervention that should have been assigned randomly. The intervention can refer to “treatment of migration, manpower training, or unionism” (Heckman 1979, p. 154).
\(\alpha\) represents the baseline value due to history or prior learning, which is identical in both treatment and control groups after randomization. It was not specified in Solomon (1949); however, it is important in this study for three reasons. First, it is a quantity that indicates the initial comparability of the groups. Second, it is involved in the process of computing treatment effects (see the SCD subsection of this study). Third, and more importantly, it is a critical criterion for matching the groups when randomization is unavailable. Note that when randomization is unavailable, another useful method for creating comparable groups is matching (Solomon 1949).
For example, the interaction effect of pre-test and treatment is a function of the four quantities. The quantity in experimental group 1 is \(Q_{E_1} =f(\alpha +\delta +\gamma +\tau +I)\). The quantity in experimental group 2 is \(Q_{E_2} =f(\alpha +\gamma +\delta )\). The quantity in control group 1 is \(Q_{C_1} =f(\alpha +\tau +\gamma )\), and the quantity in control group 2 is \(Q_{C_2} =f(\alpha +\gamma )\). Interaction effect, denoted as I, is computed as \(Q_{E_1}-Q_{E_2} -Q_{C_1} +Q_{C_2}\).
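As a quick arithmetic check of this contrast, the sketch below takes \(f\) to be the identity function, which is an assumption made only for illustration, and uses made-up numeric values for the components. The function name `interaction_effect` is hypothetical.

```python
def interaction_effect(q_e1, q_e2, q_c1, q_c2):
    """Difference-in-differences contrast Q_E1 - Q_E2 - Q_C1 + Q_C2."""
    return q_e1 - q_e2 - q_c1 + q_c2

# Hypothetical component values; f is assumed to be the identity.
alpha, delta, gamma, tau, interaction = 50.0, 4.0, 2.0, 1.5, 0.75

q_e1 = alpha + delta + gamma + tau + interaction  # experimental group 1
q_e2 = alpha + gamma + delta                      # experimental group 2
q_c1 = alpha + tau + gamma                        # control group 1
q_c2 = alpha + gamma                              # control group 2

recovered = interaction_effect(q_e1, q_e2, q_c1, q_c2)
```

All other components cancel in the contrast, so `recovered` equals the interaction term that was put in.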
The mean of the sampling distribution of \(\hat{\delta }_{C2T1-C1T1}\) is \(\delta _{C2T1-C1T1}\). At the population level, \(\delta _{C2T1-C1T1}\) is an estimator of \(\delta _{C2T0-C2T1}\). Accordingly, bias can be defined as \(BIAS(\delta _{C2T1-C1T1})=\delta _{C2T1-C1T1}-\delta _{C2T0-C2T1}\).
In the two-parameter logistic (2PL) uni/multidimensional IRT model (Lord and Novick 1968; Reckase 2009), the measurement equations for the pre- and post-test are
$$\begin{aligned} \log \left[ {\frac{{\rm {prb}}(Y_{0}^{E_1} =1)}{1-{\rm {prb}}(Y_{0}^{E_1} =1)}} \right] =a_0 (\eta _{0}-b_{0}) \end{aligned} \tag{5}$$
and
$$\begin{aligned} \log \left[ {\frac{{\rm {prb}}(Y_{1}^{E_1} =1)}{1-{\rm {prb}}(Y_{1}^{E_1} =1)}} \right] =a_1 (\eta _{1} -b_{1}), \end{aligned} \tag{6}$$
respectively. \(b_{0}\) and \(b_{1}\) are the item difficulty parameter vectors. \(a_0\) and \(a_1\) are the discrimination parameter vectors. Multidimensional IRT parameter settings and dimension specification are on p. 71 and p. 93 of Reckase (2009).
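For the unidimensional case, the logit form above can be inverted to give the response probability directly. The following is a minimal sketch under that assumption; the function name `prob_correct_2pl` and the item parameter values are hypothetical and are not taken from the study.

```python
import numpy as np

def prob_correct_2pl(eta, a, b):
    """Probability of a correct response under the unidimensional 2PL model.

    Inverts log[p / (1 - p)] = a * (eta - b) for each item.
    """
    eta = np.asarray(eta, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    logit = a * (eta - b)
    return 1.0 / (1.0 + np.exp(-logit))

# Hypothetical pre-test item parameters for three items.
a0 = np.array([0.8, 1.2, 1.5])   # discriminations
b0 = np.array([-0.5, 0.0, 1.0])  # difficulties

# Response probabilities for a student with latent proficiency eta_0 = 0.
p = prob_correct_2pl(0.0, a0, b0)
```

An item whose difficulty equals the student's proficiency has a response probability of one half, whatever its discrimination.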
This equation specifies a general case. For simplicity, \(\mathcal {V}\) can be set to 1 across all four groups. \(\tau\) and \(\gamma\) are specified in the structural model because they reflect changes associated with latent mathematics proficiency. The latent changes will further reveal their effects through the measurement equation.
For simplicity, the superscripts (group indices) are dropped; Table 2 clearly displays each group in a separate row, so adding the superscripts would be redundant. Also, after covariates have been included, the error terms are now denoted by \(\xi\) rather than \(\varepsilon\).
In education studies, this effect is often referred to as the “schooling effect”.
References
Battistin E, Chesher A (2014) Treatment effect estimation with covariate measurement error. J Econom 178(2):707–715
Berger V (2005) Selection bias and covariate imbalances in randomized clinical trials. Wiley, New York
Biemer PP, Groves RM, Lyberg LE (2004) Measurement errors in surveys. Wiley, Hoboken
Bloom H (2005) Learning more from social experiments: evolving analytic approaches. Russell Sage Foundation, New York
Bloom HS, Richburg-Hayes L, Black AR (2007) Using covariates to improve precision for studies that randomize schools to evaluate educational interventions. Educ Eval Policy Anal 29(1):30–59
Bollen KA (1989) Structural equations with latent variables. Wiley, New York
Burstein L (1992) The IEA study of mathematics III: student growth and classroom processes. Pergamon Press, Oxford
Campbell RT, Hudson CM (1985) Synthetic cohorts from panel surveys. Res Aging 7(1):81–93
Campbell DT, Stanley JC (1963) Experimental and quasi-experimental designs for research. Rand McNally College Publishing Company, Skokie
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective. CRC Press, Boca Raton
Cheung GW, Rensvold RB (2002) Evaluating goodness-of-fit indexes for testing measurement invariance. Struct Equ Model Multidiscip J 9(2):233–255
Cochran WG (1957) Analysis of covariance: its nature and uses. Biometrics 13(3):261–281
Cochran WG (1968) Errors of measurement in statistics. Technometrics 10(4):637–666
Cochran WG, Chambers SP (1965) The planning of observational studies of human populations. J R Stat Soc Ser A (General) 128(2):234–266
Cochran WG, Rubin DB (1973) Controlling bias in observational studies: a review. Sankhyā Indian J Stat Ser A 35:417–446
Cox DR, Reid N (2000) The theory of the design of experiments. CRC Press, Boca Raton
Elder GH (1998) The life course as developmental theory. Child Dev 69(1):1–12
Freedman LS, Green SB, Byar DP (1990) Assessing the gain in efficiency due to matching in a community intervention study. Stat Med 9(8):943–952
Fuller WA (1987) Measurement error models. Wiley, New York
Fuller WA (1995) Estimation in the presence of measurement error. Int Stat Rev (Revue Internationale de Statistique) 63(2):121–141
Hansen MH, Hurwitz WN, Bershad MA (1961) Measurement errors in censuses and surveys. Bull Inst Int Stat 38(2):359–374
Haviland AM, Nagin DS (2005) Causal inferences with group based trajectory models. Psychometrika 70(3):557–578
Heckman JJ (1979) Sample selection bias as a specification error. Econom J Econom Soc 47(1):153–161
Hedges LV (2007) Correcting a significance test for clustering. J Educ Behav Stat 32(2):151–179
Heimberg RG, Stein MB, Hiripi E, Kessler RC (2000) Trends in the prevalence of social phobia in the United States: a synthetic cohort analysis of changes over four decades. Eur Psychiatry 15(1):29–37
Hong G, Raudenbush SW (2006) Evaluating kindergarten retention policy. J Am Stat Assoc 101(475):901–910
Huberty CJ, Olejnik S (2006) Applied MANOVA and discriminant analysis. Wiley, New York
International Association for the Evaluation of Educational Achievement (1977) The second international mathematics study. IEA, Amsterdam
Jakubowski M et al (2015) Latent variables and propensity score matching: a simulation study with application to data from the Programme for International Student Assessment in Poland. Empir Econ 48(3):1287–1325
Jöreskog KG, Sörbom D (1996) LISREL 8: user’s reference guide. Scientific Software International, Chicago
Kaplan D (1999) An extension of the propensity score adjustment method for the analysis of group differences in mimic models. Multivar Behav Res 34(4):467–492
Kaplan D (2008) Structural equation modeling: foundations and extensions. Sage Publications, Thousand Oaks
Kessler RC, Stein MB, Berglund P (1998) Social phobia subtypes in the national comorbidity survey. Am J Psychiatry 155(5):613–619
Lee SY (2007) Structural equation modeling: a Bayesian approach. Wiley, New York
Leon AC, Hedeker D (2005) A mixed-effects quintile-stratified propensity adjustment for effectiveness analyses of ordered categorical doses. Stat Med 24(4):647–658
Li YP, Propert KJ, Rosenbaum PR (2001) Balanced risk set matching. J Am Stat Assoc 96(455):870–882
Lord FM (1980) Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates, Mahwah
Lord FM, Novick MR (1968) Statistical theories of mental test scores. Addison-Wesley Publishing Company, Boston
Mahalanobis PC (1946) A sample survey of after-effects of the Bengal famine of 1943. Sankhyā Indian J Stat 7(4):337–400
McCaffrey DF, Lockwood JR, Setodji CM (2013) Inverse probability weighting with error-prone covariates. Biometrika 100(3):671–680
Muthén LK, Muthén BO (1998–2012) Mplus user’s guide. Muthén & Muthén
Muthén BO (1994) Multilevel covariance structure analysis. Sociol Methods Res 22(3):376–398
Muthén BO, Jöreskog KG (1983) Selectivity problems in quasi-experimental studies. Eval Rev 7(2):139–174
Raab GM, Butcher I (2001) Balance in cluster randomized trials. Stat Med 20(3):351–365
Raudenbush SW, Liu XF (2001) Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychol Methods 6(4):387–401
Reckase M (2009) Multidimensional item response theory, vol 150. Springer, Berlin
Rosenbaum PR (1986) Dropping out of high school in the United States: an observational study. J Educ Behav Stat 11(3):207–224
Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70(1):41–55
Rubin DB (1973) Matching to remove bias in observational studies. Biometrics 29(1):159–183
Rubin DB (1978) Bayesian inference for causal effects: the role of randomization. Ann Stat 6(1):34–58
Schafer JL, Graham JW (2002) Missing data: our view of the state of the art. Psychol Methods 7(2):147 (ISSN 1939-1463)
Schmidt WH, Burstein L (1992) Concomitants of growth in mathematics achievement during the Population A school year. In: Burstein L (ed) The IEA study of mathematics III: student growth and classroom processes. Pergamon Press, Oxford, pp 309–327
Schmidt WH, Houang TR (1986) Ein Vergleich von drei Analyseverfahren für hierarchisch strukturierte Daten. In: Saldern MV (ed) Mehrebenenanalyse. PVU, Weinheim, pp 71–81
Solomon RL (1949) An extension of control group design. Psychol Bull 46(2):137–150
Song M, Herman R (2010) Critical issues and common pitfalls in designing and conducting impact studies in education. Educ Eval Policy Anal 32(3):351–371
Spiegelman D, Schneeweiss S, McDermott A (1997) Measurement error correction for logistic regression models with an “alloyed gold standard”. Am J Epidemiol 145(2):184
Steiner PM, Cook TD, Shadish WR (2011) On the importance of reliable covariate measurement in selection bias adjustments using propensity scores. J Educ Behav Stat 36(2):213–236
Stuart EA, Rubin DB (2008) Matching with multiple control groups with adjustment for group differences. J Educ Behav Stat 33(3):279–306
Tatsuoka MM (1971) Multivariate analysis: techniques for educational and psychological research. Wiley, New York
Wang Q (2015) Propensity score matching on multilevel data. In: Pan W, Bai H (eds) Propensity score analysis: fundamentals and developments. Guilford, New York, pp 217–235
Wang Q, Maier K, Houang R (2017a) Omitted variables, R², and bias reduction in matching hierarchical data: a Monte Carlo study. J Stat Adv Theory Appl 17(1):43–81
Wang Q, Houang R, Maier K (2017b) Bias reduction rates for latent variable matching versus matching through surrogate variables with measurement errors. Interdiscip Educ Psychol 1(1):9
Webb-Vargas Y, Rudolph KE, Lenis D, Murakami P, Stuart EA (2015) An imputation-based solution to using mismeasured covariates in propensity score analysis. Stat Methods Med Res 26(4):1824–1837. https://doi.org/10.1177/0962280215588771
Wiley DE, Wolfe RG (1992) Major survey design issues for the IEA third international mathematics and science study. Prospects 22(3):297–304
Wolfe RG (1987) Second international mathematics study: training manual for use of the databank of the longitudinal, classroom process surveys for Population A in the IEA second international mathematics study. Contractor’s report. Center for Education Statistics, Washington, DC
Wooldridge JM (2002) Econometric analysis of cross section and panel data. MIT Press, Cambridge
Additional information
Communicated by Hsiu-Ting Yu
This paper is based on work supported by the National Science Foundation under Grant no. DUE-0831581. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Appendix 1: Variance–covariance decomposition of the extended Solomon four-group design (SFGD) based on two-level SEM framework
1.1 Appendix 1.1: Variance–covariance matrix of SFGD experimental group 1
The following is the variance–covariance matrix of Y and X in the pre- and post-test design of the SFGD’s experimental group 1.
1.2 Appendix 1.2: Variance–covariance matrix of SFGD experimental group 2
1.3 Appendix 1.3: Variance–covariance matrix of SFGD control group 1
1.4 Appendix 1.4: Variance–covariance matrix of SFGD control group 2
1.5 Appendix 1.5: Detailed variance–covariance decomposition
This appendix discusses the detailed variance–covariance decomposition of the SFGD’s experimental group 1. The measurement model is defined in Eq. (14), where \(\varepsilon \sim N(0,\Theta _{\varepsilon })\). \(\varepsilon\) is independent of \(\eta\), \(\xi\) and \(\zeta\). \(\zeta \sim N(0,\Theta _{\zeta })\) is independent of \(\eta\), \(\xi\) and \(\varepsilon\). The latent variable \(\xi\) is hierarchically structured and includes both within-cluster and between-cluster components. The latent variable \(\eta\) of Y is hierarchically structured (Muthén 1994, p. 379). This is because \(\eta\) has a functional relationship with \(\xi\) in the structural model.
If a is the intercept vector and \(\Pi\) the loading matrix, then the structural model (Jöreskog and Sörbom 1996) is written as Eq. (15), where \(u\sim N(0,\Theta _u)\) is independent of \(\xi\), \(\varepsilon\) and \(\zeta\). The independence assumption will be used in the computation of the covariance of Y and X. This model is a two-level factor analysis model (Muthén and Muthén 1998–2012). Correspondingly, based on the SEM framework, data variation can be decomposed into within- and between-cluster components (Muthén 1994, p. 380).
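The idea of splitting data variation into within- and between-cluster parts can be illustrated at the sample level. The sketch below is an analogue of, not a substitute for, the model-implied decomposition derived in this appendix: it shows that the total sums-of-squares-and-cross-products (SSCP) matrix equals the within-cluster SSCP plus the between-cluster SSCP. The function name `decompose_sscp` and the simulated two-level data are hypothetical.

```python
import numpy as np

def decompose_sscp(X, cluster):
    """Split the total SSCP matrix into within- and between-cluster parts."""
    X = np.asarray(X, dtype=float)
    cluster = np.asarray(cluster)
    grand_mean = X.mean(axis=0)
    total = (X - grand_mean).T @ (X - grand_mean)
    within = np.zeros_like(total)
    between = np.zeros_like(total)
    for c in np.unique(cluster):
        Xc = X[cluster == c]
        dev = Xc - Xc.mean(axis=0)                   # student deviations from the cluster mean
        within += dev.T @ dev
        d = (Xc.mean(axis=0) - grand_mean)[:, None]  # cluster-mean deviation from the grand mean
        between += len(Xc) * (d @ d.T)
    return total, within, between

# Hypothetical two-level data: 30 classes of 20 students, two variables.
rng = np.random.default_rng(1)
n_clusters, n_per = 30, 20
cluster = np.repeat(np.arange(n_clusters), n_per)
cluster_part = rng.normal(scale=0.7, size=(n_clusters, 2))[cluster]
student_part = rng.normal(scale=1.0, size=(n_clusters * n_per, 2))
X = cluster_part + student_part

total, within, between = decompose_sscp(X, cluster)
```

The identity `total == within + between` holds exactly up to floating-point error, which is the sample counterpart of writing a covariance matrix as the sum of a within-cluster and a between-cluster component.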
1.5.1 Appendix 1.5.1: Decomposition of variation of X
The variation of the latent variable \(\xi\) can be decomposed as
The variation of X’s residual can be decomposed into between- and within-cluster components. That is,
The variation of outcome X is decomposed as
with
and
1.5.2 Appendix 1.5.2: Decomposition of latent variable \(\eta\)
The variation of \(\eta\) can be decomposed using the structural model. First, the residual variance is decomposed as
The variation of \(\eta\) is decomposed as
with
and
1.5.3 Appendix 1.5.3: Decomposition of variation of Y
The variation of Y’s residual can also be decomposed into between- and within-cluster components,
Now the variation of variable Y is decomposed as
with
and
1.5.4 Appendix 1.5.4: Covariance of Y and X
Based on independence assumptions in the measurement and structural models above, the covariance of Y and X is computed as:
Thus, the whole data variance–covariance matrix is shown below:
1.5.5 Appendix 1.5.5: Variance–covariance decomposition across times
This is the temporal decomposition of variance–covariance in Appendix 1.1.
Note that in the measurement model of Eq. (14), \(\varepsilon \sim N(0,\Theta _{\varepsilon })\) is independent of \(\eta\), \(\xi\) and \(\zeta\), and \(\zeta \sim N(0,\Theta _{\zeta })\) is independent of \(\eta\), \(\xi\) and \(\varepsilon\). Rewrite each component in the model into two parts. One part represents the measure collected at time 0 and the other the measure collected at time 1. For the first equation, \(Y=\left( {\begin{array}{c} {Y_0 }\\ {Y_1 }\\ \end{array} } \right)\), \(\delta =\left( {\begin{array}{c} {\delta _0 }\\ {\delta _1 }\\ \end{array} } \right)\), \({\Lambda } =\left( {\begin{array}{cc} {\Lambda _0} & 0 \\ 0 & {\Lambda _1} \\ \end{array} } \right)\), \(\eta =\left( {\begin{array}{c} {\eta _0 } \\ {\eta _1 } \\ \end{array} } \right)\), \(\varepsilon =\left( {\begin{array}{c} {\varepsilon _0 }\\ {\varepsilon _1 }\\ \end{array} } \right)\). Variation of variable \(Y_t\) is decomposed as
with
and
for t = 0, 1.
Similarly, write
Correspondingly, the variation of outcome \(X_{t}\) is decomposed as
with
and
for t = 0,1.
The latent variables \(\xi _0\), \(\xi _1\), \(\eta _0\), and \(\eta _1\) are hierarchically structured and include within-cluster and between-cluster components (Muthén 1994, p. 379). This is because \(\eta\) has a functional relationship with \(\xi\) in the structural model of Eq. (20), where \(U\sim N(0,\Theta _U)\) is independent of \(\xi\), \(\varepsilon\) and \(\zeta\). Thus, it results in
with
and
Now, write the variance–covariance matrix as
1.5.6 Appendix 1.5.6: Compute \({\mathbb {V}}(Y)\)
In order to determine \({\mathbb {COV}}(Y_0, Y_1)\), the structural relationship between \(\eta _0\) and \(\eta _1\) is given by Eq. (17), where slope \({\mathcal {V}}\) represents the schooling effect, \(\gamma\) represents the maturation effect, and \(\tau\) represents the pre-test effect at time 1 (Solomon 1949).
Thus, \({\mathbb {COV}}(Y_0 ,Y_1 )={\mathbb {COV}}[\delta _0 +{\Lambda _0} \eta _0 +\varepsilon _0 ,\delta _1 +{\Lambda _1} \eta _1 +\varepsilon _1 ]= {\mathbb {COV}}[{\Lambda _0} \eta _0 ,{\Lambda _1} (\tau +\gamma +{\mathcal {V}} \eta _0 )]={\Lambda _0} {\mathbb {V}}(\eta _0 ){\mathcal {V}}'{\Lambda }'_1 ,\) with \({\mathbb {COV}}(\varepsilon _0 ,\varepsilon _1 )=0\).
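This covariance can be checked numerically. The following Monte Carlo sketch uses made-up loading matrices, a made-up \({\mathcal {V}}\), and a made-up \({\mathbb {V}}(\eta _0)\); none of these values come from the study. It omits the intercepts, which do not affect covariances, and assumes no structural residual, as in the expression above. The implied covariance is written with the transposes in the order \({\mathcal {V}}'{\Lambda }'_1\), which is the conformable order when \({\mathcal {V}}\) is a matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000  # simulated students

# Hypothetical parameters: 3 items per occasion, 2 latent dimensions.
Lambda0 = np.array([[1.0, 0.0], [0.8, 0.3], [0.0, 1.0]])
Lambda1 = np.array([[1.0, 0.0], [0.6, 0.5], [0.2, 1.0]])
V = np.array([[0.9, 0.1], [0.2, 0.7]])          # slope linking eta_0 to eta_1
shift = np.array([0.5, 0.3])                    # stands in for tau + gamma
var_eta0 = np.array([[1.0, 0.4], [0.4, 1.5]])   # V(eta_0)

# Structural part: eta_1 = tau + gamma + V eta_0 (no residual, by assumption).
eta0 = rng.multivariate_normal(np.zeros(2), var_eta0, size=n)
eta1 = shift + eta0 @ V.T

# Measurement part with mutually independent errors at the two occasions.
Y0 = eta0 @ Lambda0.T + rng.normal(scale=0.5, size=(n, 3))
Y1 = eta1 @ Lambda1.T + rng.normal(scale=0.5, size=(n, 3))

# Empirical cross-covariance block versus the model-implied expression.
joint = np.cov(np.hstack([Y0, Y1]), rowvar=False)
empirical = joint[:3, 3:]
implied = Lambda0 @ var_eta0 @ V.T @ Lambda1.T
max_abs_gap = np.max(np.abs(empirical - implied))
```

With this sample size `max_abs_gap` is small, reflecting sampling error only; the measurement errors drop out of the cross-covariance because they are independent across occasions and of the latent variables.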
Plug in those components and write
1.5.7 Appendix 1.5.7: Compute \({\mathbb {V}}(X)\)
In order to determine \({\mathbb {COV}}(X_0 ,X_1 )\), the structural relationship between \(\xi _0\) and \(\xi _1\) is given by Eq. (19).
Thus, \({\mathbb {COV}}(X_0 ,X_1 )={\mathbb {COV}}[v_0 +\Gamma _0 \xi _0 +\zeta _0 ,v_1 +\Gamma _1 \xi _1 +\zeta _1 ]={\mathbb {COV}}[\Gamma _0 \xi _0 ,\Gamma _1 (\beta +{\mathcal {K}} \xi _0 )]=\Gamma _0 {\mathbb {V}}(\xi _0 ){\mathcal {K}}'{\Gamma }'_1 ,\) with \({\mathbb {COV}}(\zeta _0 ,\zeta _1 )=0\), and
Thus,
1.5.8 Appendix 1.5.8: Compute \({\mathbb {COV}}(Y_t,X_{t^{'}})\)
Components of \({\mathbb {COV}}(Y_t ,X_{t^{'}})\)—for \(t,t^{'} = 0,1\)—are computed as
Thus, the four components are computed as
and
These procedures derive all components displayed in the matrices of Appendix 1.1. Other variance–covariance matrices in Appendices 1.2–1.4 can be derived with similar procedures.
About this article
Cite this article
Wang, Q., Houang, R.T. & Maier, K. Multilevel structural equation modeling-based quasi-experimental synthetic cohort design. Behaviormetrika 45, 261–294 (2018). https://doi.org/10.1007/s41237-018-0053-0
DOI: https://doi.org/10.1007/s41237-018-0053-0
Keywords
- Propensity score matching
- Solomon four-group design
- Multilevel analysis
- Quasi-longitudinal design
- Causal inference
- Multilevel structural equation modeling
- Matching
- Synthetic cohort design