I congratulate Hao Wu and Michael W. Browne (henceforth, WB) on their thought-provoking approach to specification error in moment structure analysis. To my reading, the novel and challenging contribution of their paper is to interpret specification error as a (stochastic) second-level variation of the sample covariance matrix, variation that is said to be induced by (physical, real?) “adventitious error” that selects the actual population from which sample data are extracted. Instead of the adage “all models are wrong” [“...but some are useful”], WB imply that each model is wrong in the current population but is true in a hypothetical superpopulation. The RMSEA emerges as the natural descriptor of the “distance” between the actual and a hypothetical population. My comments elaborate this view and aim to widen the perspective of WB’s paper by relating their approach to alternatives in the literature.
WB Versus Chen (1979)
At first reading, one sees a striking overlap between WB’s approach and Chen (1979) (henceforth, Ch). Let \(s = \text{ vec } \, (S)\), where \(S = \frac{1}{n} \sum _{i=1}^n (x_i - \bar{x}) (x_i - \bar{x})^\prime \) and \(x_1, \ldots , x_n\) are iid observations of a \(p\)-variate random vector \(x\). Here, \( \text{ vec } \, (.)\) denotes the usual vectorization operator.
Both WB and Ch assume the following two-level variation setup for \(s\):

1.
Level-one variation: \(s \) is a realization of a distribution with mean \(\sigma \) and covariance matrix \(\Gamma _\sigma \), say
$$\begin{aligned} \text{ L1 }: \quad s \sim (\sigma , \Gamma _\sigma ) ; \end{aligned}$$ 
2.
Level-two variation: \(\sigma \) is a realization of a distribution with mean \(\omega \) and covariance matrix \(\Gamma _\omega \), say
$$\begin{aligned} \text{ L2 }: \quad \sigma \sim (\omega , \Gamma _\omega ) . \end{aligned}$$
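Combining the two levels by conditioning (iterated expectations and the law of total variance) gives the unconditional moments of \(s\), a standard step that underlies the role of L2 in what follows:

$$\begin{aligned} E_u(s) = E_{\text{ L2 }} [ \, E_{\text{ L1 }} (s \mid \sigma ) \, ] = \omega , \qquad \text{ Var}_u (s) = E_{\text{ L2 }} [ \Gamma _\sigma ] + \Gamma _\omega . \end{aligned}$$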
WB and Ch coincide in the distributions assigned to L1 and L2: a Wishart distribution with degrees of freedom (df) \(n\) for L1, and an Inverted Wishart distribution with (unknown) scalar parameter \(m\) for L2;^{Footnote 1} these distributions imply that \(\Gamma _\sigma \) and \(\Gamma _\omega \) are functions of \(\sigma \) and \(n\), and of \(\omega \) and \(m\), respectively. Both WB and Ch specify a moment structure on \(\omega \), namely
$$\begin{aligned} M_\omega : \quad \omega = \omega (\tau ) , \end{aligned}$$
where \(\omega (\tau )\) is a (continuously differentiable) function of \(\tau \), a \(q\)-dimensional vector of parameters varying in \(\varUpsilon \), a compact subset of \(R^q\); finally, both WB and Ch estimate the parameter vector \(\tau \) and the scalar parameter \(m\) by the maximum likelihood (ML) method. WB and Ch diverge, however, in the computational approach used: Ch uses the EM algorithm, while WB use the Newton–Raphson method. They also diverge in that Ch takes a Bayesian approach, while WB use (frequentist) asymptotic methods.
In our view, however, the fundamental difference between WB and Ch is conceptual: it is the role assigned to the moment structure \( M_\omega \) and the parameter vector \(\tau \). WB view \(\tau \) as the vector of structural parameters (loadings, regression coefficients, etc.) and \( M_\omega \) as a true model for the superpopulation mean vector \(\omega \). In contrast, Ch views \(\tau \) as a vector of incidental parameters with no substantive interpretation, and \( M_\omega \) as prior information that serves to improve estimation of \(\theta \), the vector of structural parameters (loadings, regression coefficients, etc.), a function (explicit or implicit) of the moment vector \(\sigma \), \(\theta = \theta (\sigma ) \). In an example where \( M_\omega \) is a factor model and a component of \(\theta \) is a regression coefficient (\( \beta = \sigma _{12}/\sigma _{22}\)), Ch illustrates how introducing the variation level L2 and \( M_\omega \) improves the mean squared error (MSE) of the classical estimator of \(\beta \), especially when sample size \(n\) is small. For large samples, Ch’s estimates converge to the classical ones that assume only L1 variation. In Ch’s approach, \( M_\omega \) is a parsimonious parameterization of the superpopulation parameter \(\omega \); if \( M_\omega \) is saturated, no reduction in MSE is achieved. Ch’s variation L2 is a statistical artifact that produces a regularized estimator of \(\theta \) for small samples.
In contrast, in WB’s approach, L2 confers on \( M_\omega \) the status of a true model, not for the “operative population” (i.e., for \(\sigma \)), but for the mean vector \(\omega \) of the postulated superpopulation. The vector \(\sigma \) is just a realization of L2; so, \( M_\omega \) is always misspecified for \(\sigma \). WB even assign a (physical?) driving force to the variation L2, which they call “adventitious error,” and they get specific about the likely sources of this error.
In our view, however, data do not provide empirical evidence of the existence of L2 (nothing in that vein is pointed out in WB), so we will just contemplate L2 as a mathematical device for modeling. The assumption that the distribution of L2 is an Inverted Wishart distribution seems critical for WB’s whole setup. This distributional assumption and the presumption that \( M_\omega \) is not saturated permit identification of a key scalar parameter, \(\nu = 1/m\) (if \( M_\omega \) is saturated, WB’s approach collapses). The parameter \(\nu \) takes the role of a scalar measure of the intensity of the variation L2, a kind of “distance” between \(\sigma \) and \(\omega \) (the mean vectors of the “operative” and hypothetical populations). The assumption of an Inverted Wishart distribution even provides grounds for deriving an estimate of \(\nu \), which turns out to be the square of the well-known RMSEA (Browne & Cudeck, 1992). Asymptotic derivations based on \(n \rightarrow \infty \) and \(\nu \rightarrow 0\) even provide a standard error for this estimate. Interestingly, the two-level variation setup of WB offers a (theoretical) revisiting of the RMSEA.
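The link between \(\nu \) and the RMSEA can be made concrete with a short sketch: the Browne–Cudeck point estimate of the RMSEA is computed from a chi-square statistic and squared to yield an estimate of \(\nu \). The numerical values, and the use of \(n\) rather than \(n-1\) in the denominator, are illustrative assumptions, not WB’s exact expressions.

```python
import math

def rmsea(T, df, n):
    """Browne-Cudeck style RMSEA point estimate from a chi-square statistic T
    with df degrees of freedom and sample size n (n vs. n - 1 is a convention)."""
    return math.sqrt(max((T - df) / (df * n), 0.0))

# Hypothetical values of the test statistic, its df, and the sample size
T, df, n = 18.3, 5, 400
nu_hat = rmsea(T, df, n) ** 2   # under WB's setup, an estimate of nu = 1/m
```

Note that the estimate is truncated at zero, so \(\hat{\nu }\) is zero whenever the statistic falls below its df.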
WB point to insufficiencies of the traditional approach to moment structure analysis.
The Traditional Approach, TA
WB claim that their approach overcomes the problems of what they call the “traditional approach” (TA). TA is simply L1, L2, and \(M_\omega \) with \(\Gamma _\omega = 0\); i.e., the case where L2 vanishes, so that \(\sigma \equiv \omega \) and \( M_\omega \equiv M_\sigma \) is a model for \(\sigma \).^{Footnote 2} As in Ch, TA addresses estimation of the vector \(\theta \) of structural parameters, a function of \(\sigma \) (implicit in \(M_\sigma \)). For large samples, TA coincides with Ch. TA and Ch diverge from WB in small and large samples, except when \(M_\omega \) is saturated, in which case the three approaches are the same.
WB question the validity of the Pitman drift used by TA when \(M_\omega \) is misspecified (recall that in TA \(\omega \equiv \sigma \)); we read “[TA] though fairly successful, is still imperfect” (end of Section 1.2), and they argue that “[the Pitman drift device] is implausible in practice” (end of Section 2.3).
On the Pitman Drift Device
Let \(\delta \equiv \text{ min }_\tau {\mid \mid }(\sigma (\tau ) - \sigma ) {\mid \mid }\) be a measure of the deviation of the model \( M_\sigma \) from \(\sigma \) (here \(\mid \mid . \mid \mid \) denotes a norm), and let \(\delta _n\) be \(\delta \) when the sample size is \(n\). In case of misspecification of \( M_\sigma \), i.e., when \(\delta \ne 0\), TA derives asymptotic distributions for estimates, test statistics, and model diagnostics using the mathematical device that \(\sqrt{n}\delta _n \rightarrow \mu \), where \(\mu \) is finite (e.g., Satorra, 1989). This device is known to produce asymptotic approximations to the actual distributions that are accurate when \(n\) is large and \(\delta \) is small. In the case of estimates, the device ensures that the estimates are consistent for the parameters of a limit model where \(\delta =0\); for test statistics, it produces noncentral distributions that approximate the actual distributions. It is not needed that the populations “physically” move with \(n\), as seems implicit in WB’s comment “[this Pitman drift] is implausible in practice because the population should not be affected by sample size” (end of Section 2.3). What is needed is that the model is not too deviant from a model where \(\delta = 0\), i.e., that the posited model approximates the true model. The Pitman drift should be viewed just as a mathematical device to obtain an asymptotic approximation of the distribution of statistics of interest, with the accuracy of the approximation usually judged by Monte Carlo evaluation.
The Pitman drift (also called a sequence of local alternatives) is classical in statistics (e.g., Wald, 1943) and has been used extensively. Satorra and Saris (1985) used this device to justify a procedure for approximating the power of the chi-square goodness-of-fit test (see, e.g., Satorra & Saris, 1983; Satorra, Saris, & de Pijper, 1991, for Monte Carlo evidence on the accuracy of this procedure). More recently, Chun and Shapiro (2009) showed that, for small deviations from the null, the Pitman drift device provides more accurate approximations to the actual distribution of the chi-square goodness-of-fit test than fixed alternatives.
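The Satorra–Saris power procedure can be sketched in a few lines: under the drift \(\sqrt{n}\,\delta _n \rightarrow \mu \), the goodness-of-fit statistic is approximately noncentral chi-square with noncentrality \(n F_{\min }\), where \(F_{\min }\) is the fit-function value at the misspecified alternative. The value of \(F_{\min }\) below is hypothetical, and SciPy is used in place of the R tooling of the paper.

```python
from scipy.stats import chi2, ncx2

df, n = 5, 400
F_min = 0.02                 # hypothetical minimized fit-function value at the alternative
lam = n * F_min              # noncentrality parameter implied by the Pitman drift device
crit = chi2.ppf(0.95, df)    # 5%-level critical value under the central chi-square null
power = 1 - ncx2.cdf(crit, df, lam)   # approximate power of the goodness-of-fit test
```

In practice \(F_{\min }\) is obtained by fitting the analysis model to the covariance matrix implied by the misspecified alternative, which is the step Monte Carlo studies then check for accuracy.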
We note that the asymptotic results of WB are obtained under the double limit \(n \rightarrow \infty \) and \(\nu \rightarrow 0\); we read “This assumption [\(\nu \rightarrow 0\)] is more plausible than the traditional Pitman drift assumption in that adventitious error is only assumed small, but not assumed to get smaller when the sample size increases” (middle of Section 10). Clearly, \(\nu \) is tied to the size of misspecification and is thus the counterpart of \(\delta \), so \(\nu \rightarrow 0\) is tantamount to \(\delta _n \rightarrow 0\), and, in our view, WB’s asymptotics stand on identical footing with the Pitman drift as regards being “plausible in practice.” Whether the limits of \(n\) and \(\delta _n\) operate in one dimension, the Pitman drift device of \(\sqrt{n}\delta _n \rightarrow \mu \), or in two dimensions, \(n \rightarrow \infty \) and \(\nu \rightarrow 0\) independently, does not change their plausibility; except that, for undertaking a Monte Carlo evaluation of the accuracy of these asymptotic approximations, the Pitman drift offers a much simpler frame than the double limit involved in WB.
TA and WB’s Two-Level Data
Assume that data arise from the two-level setup of WB, under an exact model \(M_\omega \) for \(\omega \). What would be the consequences of applying standard TA analysis to data \(x_1, \ldots , x_n\) sampled from \(\sigma \)? This section investigates this issue using a Monte Carlo illustration.
We consider the following data-generating process. For a given choice of \(\omega \) and \(\nu = 1/m\), a population vector \(\sigma \) is sampled from L2 and, given \(\sigma \), a sample \(x_1, \ldots , x_n\) of size \(n\) is extracted from L1, with the Wishart and Inverted Wishart distributions used in L1 and L2, respectively. The sample covariance matrix \(S\) is computed from \(x_1, \ldots , x_n\) and fitted—using TA—to the following single-factor model \(M_\sigma : \, \sigma = \sigma (\theta ) \), where \(\sigma = \text{ vec }\, \Sigma \),
$$\begin{aligned} \Sigma (\theta ) = \lambda \lambda ^\prime + \Psi , \qquad \lambda = (\lambda _1, \lambda _2, \lambda _3, \lambda _4, \lambda _5)^\prime , \qquad \Psi = \text{ diag } (\psi _1, \psi _2, \psi _3, \psi _4, \psi _5), \end{aligned}$$
and \(\theta = (\lambda _1, \lambda _2,\lambda _3,\lambda _4,\lambda _5,\psi _1,\psi _2,\psi _3,\psi _4,\psi _5)\). For the simulations, we use \(\omega = \text{ vec }\, \varOmega \) with
$$\begin{aligned} \varOmega = \begin{pmatrix} 1 & .64 & .64 & .64 & .64 \\ .64 & 1 & .64 & .64 & .64 \\ .64 & .64 & 1 & .64 & .64 \\ .64 & .64 & .64 & 1 & .64 \\ .64 & .64 & .64 & .64 & 1 \end{pmatrix} , \end{aligned}$$
the covariance matrix implied by a single-factor model with loading parameters equal to \(.8\), unique variances equal to \(.36\), and variance of the common factor equal to \(1\). Thus \(M_\sigma \) is a true model for \(\omega \), but a misspecified model for \(\sigma \). The degrees of freedom of the chi-square goodness-of-fit model test are df = 5.
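One replicate of this data-generating process can be sketched as follows. The Inverted Wishart scale is one choice of parameterization (among possible conventions) that makes \(E[\Sigma ] = \varOmega \) under SciPy’s definitions, and the Wishart df convention (\(n\) vs. \(n-1\)) is immaterial for the sketch; the paper itself uses R and lavaan instead.

```python
import numpy as np
from scipy.stats import invwishart, wishart

p, m, n = 5, 60, 2000
# Omega: single-factor covariance, loadings .8, uniquenesses .36, factor variance 1
Omega = np.full((p, p), 0.64) + 0.36 * np.eye(p)

rng = np.random.default_rng(1234)

# L2: draw the population covariance Sigma around Omega (Inverted Wishart;
# scale chosen so that E[Sigma] = Omega, since the mean is scale / (df - p - 1))
Sigma = invwishart.rvs(df=m, scale=(m - p - 1) * Omega, random_state=rng)

# L1: draw the sample covariance S given Sigma (n * S ~ Wishart(n, Sigma))
S = wishart.rvs(df=n, scale=Sigma / n, random_state=rng)
```

In the Monte Carlo study, 1000 such replicates of \(S\) are each fitted by ML; any ML factor-analysis routine could play the role that lavaan plays in the paper.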
Given the above \(\varOmega \), for each combination of \(m\) and \(n\) listed in the first two columns of Table 1, 1000 replicates of \(S\) are generated and the corresponding TA analyses undertaken. TA is carried out using lavaan (Rosseel, 2012) with ML estimation, and all computations are performed in R (R Development Core Team, 2008). The Monte Carlo distribution of the TA parameter estimates and standard errors (se’s), as well as the distribution of a scaled version of the regular chi-square goodness-of-fit test, is reported in Table 1. For simplicity, we only consider the estimate of \(\lambda _1\); the other estimates perform in a similar manner.
We can reason that the unconditional expectation \(E_u(S)\) of \(S\) (i.e., the expectation under variations L1 and L2) is
$$\begin{aligned} E_u(S) = E_{\text{ L2 }} [ \, E_{\text{ L1 }} (S) \, ] = E_{\text{ L2 }} (\Sigma ) = \varOmega , \end{aligned}$$
where \(E_L\) denotes expectation conditional on level \(L\); that is, the unconditional expected value of \(S\) is the matrix \(\varOmega \) for which \(M_\sigma \) holds exactly. Consistency of \(S\) as an estimate of \(\varOmega \) could also be achieved, but at the cost of asymptotic limits with \(n \rightarrow \infty \) and \(\nu \rightarrow 0\). Thus, we should expect the TA estimate of \(\lambda _1\) to be centered around its true value \(0.8\), despite the fact that the model \(M_\sigma \) analyzed by TA is misspecified for \(\sigma \). The standard errors (se’s) produced by TA, however, are necessarily conditional on \(\sigma \), the population from which \(x_1, \ldots , x_n\) has been extracted. That is, the se’s produced by TA do not take into account the variation L2, the sampling variation of \(\sigma \) within L2. Unconditional se’s for TA estimates can be deduced from the unconditional distribution of \(S\). Clearly, given the nature of the distribution assumed in L2, the unconditional asymptotic variance of vec \(S\) is obtained by multiplying the conditional one (the one that assumes only variation L1) by the following scaling factor
$$\begin{aligned} c = 1 + n \, \nu , \end{aligned}$$
a number that necessarily satisfies \(c > 1\). For this result, we require \(n \rightarrow \infty \) and \(\nu \rightarrow 0\). That is, multiplication by \(\sqrt{c} \) transforms the (conditional on L1) se’s produced by TA into the unconditional ones that incorporate the variation of L2. Moreover, scaling by \(c\) the regular TA chi-square goodness-of-fit test (as in Satorra & Bentler, 1994) produces an asymptotic chi-square statistic that takes into account both L1 and L2 variations.^{Footnote 3} These expectations will now be confronted with the Monte Carlo results shown in Table 1.
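A minimal numeric sketch of the two corrections, assuming (as an illustration consistent with the limits above) that the scaling factor takes the form \(c = 1 + n\nu \) with \(\nu = 1/m\); the se and chi-square values are hypothetical:

```python
import math

n, m = 2000, 60
nu = 1.0 / m
c = 1 + n * nu               # assumed form of the scaling factor; necessarily > 1

se_TA = 0.012                # hypothetical conditional (L1-only) standard error
T_TA = 180.0                 # hypothetical regular TA chi-square statistic

se_u = math.sqrt(c) * se_TA  # unconditional se, accounting for L2 variation
T_scaled = T_TA / c          # scaled statistic, asymptotically chi-square (df = 5)
```

With \(m = 60\) and \(n = 2000\) this gives \(c \approx 34\), which is why both corrections matter most for large samples combined with non-negligible \(\nu \).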
We observe that: (i) the Monte Carlo mean of the TA estimate \(\hat{\lambda }_1\) is close to the true population value \(.8\) (column 3); (ii) the mean of the se’s computed by TA (column 5) shows a downward bias with respect to the true unconditional se’s estimated by Monte Carlo (column 4); (iii) the scaled se’s (column 6) are fairly close to the Monte Carlo se’s (column 4); and (iv) the Monte Carlo estimate of the rejection rate of the (scaled) chi-square test at the 5% level agrees with its nominal value of 5% (the last column of the table). Thus, Table 1 confirms our expectations. Noteworthy is the large value of \(c\) in the case of large misspecification (\(m=60\)) and large sample size (\(n = 2000\)).
Conclusion
WB write “[TA] though fairly successful, is still imperfect” and propose replacing the single-level approximate model L1 of TA by a two-level variation, L1 and L2, where the model \(M_\sigma \) is exact, not in the “operative population” but in the (hypothetical) superpopulation L2. The two-level variation L1 and L2 parallels the formulation of Chen (1979), but with a fundamental difference: while for WB the parameters of interest to the researcher are the parameters of a true superpopulation model (the model in L2), for Ch, L2 is just a statistical device aimed at improving estimation of \(\theta \), a vector of structural parameters residing in L1. In Ch, when \(n \rightarrow \infty \), estimation converges to TA. This contrasts with WB, where the parameter of interest resides in L2, and even when sample size \(n \rightarrow \infty \) (i.e., when first-level variation L1 vanishes), the crux of estimation rests on L2 (this is precisely the case of the very large value of \(c\) in Table 1).
WB’s two-level setup has the virtue of encapsulating the whole model misspecification issue in the single scalar \(\nu \), or its estimate, the square of the RMSEA. No other model evaluation statistic, however, emerges from WB’s approach; not even a test of what is presumably a very restrictive assumption, the Inverted Wishart distribution assumed in L2, a distribution assumed to mimic adventitious-error variability.
The Monte Carlo illustration has shown that when L2 is present, TA estimates are valid (consistent, when \(\nu \) is small) estimates of the parameters of the true model \(M_\omega \) that resides in L2. The se’s produced by TA, however, are conditional on the “operative population”; they account only for L1 variation. A simple scaling factor, however, can be used to expand the TA se’s to account for the variation added by L2. For given \(m\) and \(n\), a simple scaling of the regular chi-square goodness-of-fit test produced by TA yields an asymptotic chi-square statistic under the unconditional distribution.
WB’s paper opens a new interpretation of the RMSEA as a measure of the magnitude of “adventitious error.” Model evaluation, however, cannot be reduced to a single number. TA offers several statistics to assess misspecification (model test statistic, modification indices, parameter change indicators, etc.). Can these diagnostic statistics be extended to WB’s two-level setup when the covariance structure is postulated in L2? How can we assess empirically the distribution assumed in L2, as well as the whole two-level model setup? Given the mathematical similarity of Ch and WB, could a connection be forged that yields more tools for model diagnostics than just the RMSEA? An interesting feature of Ch’s approach is the reduction in mean squared error of estimates, compared to TA, in small samples.
Notes
 1.
Ch denotes as \(\nu \) the parameter that WB denote as \(m\), and vice versa. Moreover, WB use \(\nu \) to denote \(1/m\).
 2.
Note that in current TA, no distribution is specified in L1, so \(\Gamma _\sigma \) is not necessarily a function of \(\sigma \); a nonparametric estimate of \(\Gamma _\sigma \) is used instead.
 3.
WB show that the RMSEA is an estimate of \(\sqrt{\nu } \) (when \(n \rightarrow \infty \) and \(\nu \rightarrow 0\)), so the scaling factor for the standard errors could be computed using the RMSEA (when \(n\) is large and \(\nu \) is small). This, however, would not work for scaling the chi-square test.
References
Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods and Research, 21, 230–258.
Chen, C.F. (1979). Bayesian inference for a normal dispersion matrix and its application to stochastic multiple regression analysis. Journal of the Royal Statistical Society, Series B, 41, 235–248.
Chun, S. Y., & Shapiro, A. (2009). Normal versus noncentral chi-square asymptotics of misspecified models. Multivariate Behavioral Research, 44, 803–827.
R Development Core Team. (2008). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified approach. Psychometrika, 54(1), 131–151.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 285–305). Thousand Oaks, CA: Sage Publications.
Satorra, A., & Saris, W. E. (1983). The accuracy of a procedure for calculating the power of the likelihood ratio test as used within the LISREL framework. In C. P. Middendorp, B. Niemoller, & W. E. Saris (Eds.), Sociometric Research 1982 (pp. 127–190). Amsterdam: Sociometric Research Foundation.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50(1), 83–89.
Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54, 426–482.
Additional information
Work supported by grant ECO2011-28875 from the Spanish Ministry of Science and Innovation.
Satorra, A. A Comment on a Paper by H. Wu and M. W. Browne (2014). Psychometrika 80, 613–618 (2015). https://doi.org/10.1007/s11336-015-9455-z