Direction dependence analysis: A framework to test the direction of effects in linear models with an implementation in SPSS
Abstract
In nonexperimental data, at least three possible explanations exist for the association of two variables x and y: (1) x is the cause of y, (2) y is the cause of x, or (3) an unmeasured confounder is present. Statistical tests that identify which of the three explanatory models fits best would be a useful adjunct to the use of theory alone. The present article introduces one such statistical method, direction dependence analysis (DDA), which assesses the relative plausibility of the three explanatory models on the basis of higher-moment information about the variables (i.e., skewness and kurtosis). DDA involves the evaluation of three properties of the data: (1) the observed distributions of the variables, (2) the residual distributions of the competing models, and (3) the independence properties of the predictors and residuals of the competing models. When the observed variables are nonnormally distributed, we show that DDA components can be used to uniquely identify each explanatory model. Statistical inference methods for model selection are presented, and macros to implement DDA in SPSS are provided. An empirical example is given to illustrate the approach. Conceptual and empirical considerations are discussed for best-practice applications in psychological data, and sample size recommendations based on previous simulation studies are provided.
Keywords
Linear regression model, Direction of effects, Direction dependence, Observational data, Nonnormality
This article introduces methods of direction dependence and presents a unified statistical framework to discern the causal direction of effects in linear models using observational data. Existing regression-type methods allow researchers to quantify the magnitude of hypothesized effects but are of limited use when establishing the direction of effects between variables—that is, whether x → y or y → x correctly describes the causal flow between two variables x and y. The statistical framework proposed in this article allows researchers to draw conclusions about the direction of effects. In the present work, we focus on observational (nonexperimental) data settings because this type of data is the chief material for the presented principles of direction dependence. The issue of effect directionality in the context of experimental studies (e.g., when decomposing total effects into direct and indirect effect components) will be taken up in the Discussion section.
Establishing cause–effect relations between variables is a central aim of many empirical studies in the social sciences. The direction of influence between variables is a key element of any causal theory that purports to explain the data-generating mechanism (Bollen, 1989). Questions concerning direction of effect naturally arise in observational studies. For example, it may not be entirely clear whether tobacco consumption causes depression and anxiety or whether people with symptoms of depression and anxiety are more likely to engage in health damaging behavior (Munafò & Araya, 2010; Taylor et al., 2014); whether violent video games expose players to aggressive behavior or whether aggressive people are simply more attracted to violent video games (Gentile, Lynch, Linder, & Walsh, 2004); or whether lead exposure contributes to the development of ADHD or whether children with ADHD symptoms are unable to stay focused enough to avoid lead-tainted objects (Nigg et al., 2008).
Methods of causal discovery have experienced rapid development within the last decades and various causal search algorithms have been proposed (see Spirtes & Zhang, 2016, for a recent overview). These search algorithms are designed to learn plausible causal structures from multivariate data. Examples of such algorithms include the PC algorithm (Spirtes, Glymour, & Scheines, 2000), greedy equivalence search (Chickering, 2002), cyclic causal discovery (Richardson & Spirtes, 1999), fast causal inference (Zhang, 2008), and linear non-Gaussian acyclic models (cf. Shimizu, 2016; Shimizu, Hoyer, Hyvärinen, & Kerminen, 2006) that are either designed to discover (Markov) equivalence classes of directed acyclic graphs (DAGs; i.e., a small subset of candidate models that have the same support by the data in terms of model fit; cf. Verma & Pearl, 1991) or uncover DAG structures beyond equivalence classes. All these algorithms constitute important exploratory tools for causal learning and are, thus, ideally suited to generate new substantive hypotheses concerning the causal nature of constructs.
DDA, in contrast, is concerned with a research scenario that is confirmatory in nature—that is, situations in which a substantive theory about the causal relation exists and the researcher wishes to know if the causal direction assumed by the model is plausible relative to the alternative scenarios a reasonable skeptic might propose. The primary goal is to probe this causal theory against alternatives while adjusting for potential background variables known to be explanatory in nature. Thus, instead of extracting plausible DAG structures (or classes of equivalent DAGs) for a given dataset, one is interested in testing a specific model (e.g., lead exposure → ADHD) against a plausible alternative (ADHD → lead exposure). The present article is designed to introduce principles of DDA to quantitative researchers. Although previous work (e.g., Dodge & Rousson, 2000, 2001; Dodge & Yadegari, 2010; Muddapur, 2003; Sungur, 2005; Wiedermann & von Eye, 2015a, 2015b, 2015c) has focused on direction dependence methods to choose between the two models x → y and y → x (Figs. 1a and b), direction dependency in the presence of confounders has received considerably less attention. To fill this void, we present extensions of DDA to scenarios in which confounders are present and incorporate these new insights into the existing direction dependence principle. As a result, we propose a unified framework that allows one to identify each explanatory model given in Fig. 1.
The article is structured as follows: First, we define statistical models suitable for DDA and introduce model assumptions. We then introduce elements of DDA and summarize new results that describe the behavior of DDA tests when unmeasured confounders are present. Next, three SPSS macros are introduced that make DDA accessible to applied researchers and a data example is given to illustrate their application. The article closes with a discussion of conceptual and empirical requirements of DDA, potential data-analytic pitfalls, and potential extensions of the DDA methodology. In addition, sample size recommendations based on previous simulation studies are provided.
The direction dependence principle
Direction dependence can be defined as the asymmetry between cause and effect. The model x → y implies that changing x (the cause) changes y (the effect) but changing y does not lead to change in x (see also Pearl, 2009; Peters, Janzing, & Schölkopf, 2017). Reversely, when changing y changes x but, at the same time, changing x does not change y, then the model y → x describes the causal relation. Limitations of conventional association-based approaches to uncover the asymmetry of cause and effect can be explained by the fact that these methods only consider variation up to the second order moments and, thus, rely on correlation in its symmetric form—that is, cor(x, y) = cor(y, x). The key element of DDA is to consider variable information beyond second order moments (specifically skewness and kurtosis) because asymmetry properties of the Pearson correlation and the related linear model appear under nonnormality. These asymmetry properties are of importance when x and y are not exchangeable in their roles as explanatory and response variables without leading to systematic model violations. DDA, thus, requires and makes use of nonnormality of variables to gain deeper insight into the causal mechanism. DDA consists of three core components: (1) distributional properties of observed variables, (2) distributional properties of error terms of competing models, and (3) independence properties of error terms and predictors in the competing models. On the basis of new results regarding direction dependence in the presence of confounders, we show that unique patterns of DDA component outcomes exist for each of the three models shown in Fig. 1. These outcome patterns enable researchers to select between competing models. In the following paragraphs, we formally define the statistical models considered. We then introduce DDA components separately for confounder-free “true” models and when confounders are present. 
In addition, statistical inference compatible with direction dependence is discussed. To simplify presentation, we assume that x → y corresponds to the “true” model and y → x represents the directionally mis-specified model.
Model definitions
We start the introduction to DDA by defining the statistical models considered. Although statistical models can either be used for the purposes of explanation or prediction (Geisser, 1993), DDA is designed for the task of validating explanatory models, that is, to test the causal hypothesis assumed under a given theory. Assume that a construct \( \mathcal{X} \) (e.g., lead exposure) causes construct \( \mathcal{Y} \) (e.g., ADHD symptomatology) through mechanism \( \mathcal{F} \)—that is, \( \mathcal{Y}=\mathcal{F}\left(\mathcal{X}\right) \). Furthermore, let x and y be operationalizations of \( \mathcal{X} \) and \( \mathcal{Y} \) (e.g., blood lead concentration and number of DSM-IV hyperactive–impulsive symptoms) and define f as the statistical model (e.g., the linear model) to approximate \( \mathcal{F} \)—that is, y = f(x). The direction dependence framework provides a set of statistical tools to evaluate the directionality assumption of y = f(x) implied by the causal theory \( \mathcal{X} \) → \( \mathcal{Y} \).
The slope b_{yx} denotes the change in the fitted value of y for a one-unit increase in x and represents the causal effect of x → y. Estimates of the causal effect are usually obtained using OLS or, in structural equation models (SEMs), maximum likelihood estimation. Nonnormality of the “true” predictor, quantified as nonzero skewness \( {\gamma}_x=E\left[{\left(x-E\left[x\right]\right)}^3\right]/{\sigma}_x^3 \) and/or nonzero excess kurtosis \( {\kappa}_x=E\left[{\left(x-E\left[x\right]\right)}^4\right]/{\sigma}_x^4-3 \) (with E being the expected value operator), is assumed to reflect an inherent distributional characteristic of \( \mathcal{X} \) (as opposed to nonnormality due to boundaries of the operationalized x). The error term e_{yx} is assumed to be normally distributed (with zero mean and variance \( {\sigma}_{e_{yx}}^2 \)), serially independent, and independent of x.
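For orientation, the bivariate versions of the two competing linear models implied by these definitions can be sketched as follows. The slope and error symbols for the "true" model are those used in the text; the names \( b_{xy} \) and \( e_{xy} \) for the mis-specified model follow the notation of the later sections, and dropping the intercepts (i.e., treating x and y as mean-centered) is an assumption made here for brevity.

```latex
\begin{align}
  y &= b_{yx}\,x + e_{yx} && \text{(``true'' model, } x \to y\text{)} \\
  x &= b_{xy}\,y + e_{xy} && \text{(directionally mis-specified model, } y \to x\text{)}
\end{align}
```

Only in the first equation is the error term assumed to be normal and independent of the predictor; the second equation is the model a researcher would fit when the direction is reversed.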
It is common to include covariates (e.g., background or baseline measures) in statistical models to increase precision of parameter estimates and statistical power. In other words, covariates are independent variables that are considered in a target model to control for their influences on the putative response. In contrast, failing to include important covariates can lead to confounded parameter estimates when covariates are (in addition to their relation to the response) correlated with other predictors. However, several authors have cautioned against careless use of covariates because conditioning on covariates can also increase the bias of causal estimates (Pearl, 2009; Spirtes, Richardson, Meek, Scheines, & Glymour, 1998). Similar considerations hold for statistical models in the context of DDA. To be eligible for DDA, covariates must be known to be on the explanatory side of the statistical model. In addition, one must ensure that a recursive causal ordering of the covariates themselves is theoretically possible and that all covariates can be expressed as linear combinations of mutually independent external influences. We can formally express these prerequisites for a given set of covariates z_{j} (j = 1, ..., J) as z_{j} = ∑_{k(i) ≤ k(j)}a_{ji}η_{i}, with k(i) ≤ k(j) describing the causal order of covariates (i.e., z_{i} precedes z_{j}). The parameter a_{ji} describes the total effect, and η_{i} denotes the external influence associated with z_{i}. When no other covariate precedes z_{j}, one obtains z_{j} = η_{j} with a_{ji} = 1. For example, suppose that two covariates (z_{1} and z_{2}) are known to influence y and that z_{1} precedes z_{2} and no other covariate precedes z_{1} (i.e., z_{1} = η_{1}). In this case, one obtains z_{2} = a_{21}η_{1} with z_{1} = η_{1}, which implies that z_{2} can be expressed as a (weighted) external influence. Consider again the example of ADHD and blood lead exposure.
Two factors that are known to affect ADHD symptomology are prenatal maternal emotional stress (Harris & Seckl, 2011) and the cultural context of the child (Miller, Nigg, & Miller, 2009; see also Nigg, 2012). Arguments of temporality or logical order of effects can be used to evaluate the eligibility of covariates for DDA. Both prenatal maternal stress and cultural context are located earlier in time than the child’s blood lead level and ADHD symptomology under study, which justifies their use as background variables. Furthermore, in principle, we are also able to establish a causal order of the covariates themselves—that is, cultural context may be conceived as a background variable directly or indirectly contributing to maternal stress level. In other words, both variables are unlikely to render the target model cyclic, which makes them eligible to be covariates in DDA. In general, covariates can be either continuous or categorical. For categorical covariates, however, we need to assume that these variables constitute external influences themselves—that is, we exclude cases in which categorical variables serve as outcomes of other independent variables in the model (detailed explanations will be given below). Although this assumption is stricter than in the continuous case, it still allows multiple-group scenarios in which the magnitude of the causal effect of predictor and outcome can vary across groups. When categorical covariates are present, a two-stage approach of model estimation is preferable. That is, in a first step, the effect of categorical covariates is partialled out of the putative predictor (e.g., x), the putative outcome (y), and all the continuous covariates, and the regression residuals extracted from these auxiliary models are subsequently used as “purified” measures (an example is given below). According to the Frisch–Waugh–Lovell theorem (cf. Frisch & Waugh, 1933; Lovell, 1963; sometimes called the regression anatomy formula: Angrist & Pischke, 2009), regressing the “purified” outcome on the “purified” independent variables in the second step leads to the same model parameters as in the full multiple regression model including categorical covariates.
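The two-stage logic can be illustrated with a small simulation. The following is a minimal sketch, not the SPSS implementation: the binary covariate `g`, the coefficients, and the helper functions `solve`, `ols`, and `residuals` are hypothetical and only serve to show that the slope from the "purified" regression agrees with the slope from the full model.

```python
import random

def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """OLS coefficients of y on the given predictor columns (no intercept is added)."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

def residuals(columns, y):
    """Residuals of y after regressing it on the given columns."""
    coef = ols(columns, y)
    return [yi - sum(c * col[i] for c, col in zip(coef, columns)) for i, yi in enumerate(y)]

rng = random.Random(1)
n = 500
ones = [1.0] * n
g = [float(rng.random() < 0.4) for _ in range(n)]       # hypothetical binary covariate
x = [rng.expovariate(1.0) + 0.8 * gi for gi in g]       # skewed predictor
y = [0.5 * xi - 0.7 * gi + rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]

# Full model: y regressed on intercept, x, and g
b_full = ols([ones, x, g], y)[1]

# Two-stage approach: partial g out of x and y, then regress residuals on residuals
x_r = residuals([ones, g], x)
y_r = residuals([ones, g], y)
b_two_stage = ols([x_r], y_r)[0]

print(round(b_full, 6), round(b_two_stage, 6))
```

Both printed slopes should agree up to floating-point error, which is the property the Frisch–Waugh–Lovell theorem guarantees. The residualized `x_r` and `y_r` would then play the role of the "purified" measures in the subsequent DDA steps.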
DDA Component I: Distributional properties of observed variables
Absence of confounders
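A short derivation sketch shows why, without confounders, the response is closer to normality than the predictor. It assumes the bivariate model \( y = b_{yx}x + e_{yx} \) with a normal error term that is independent of x (as stated under Model definitions) and uses two standard facts: third central moments and fourth cumulants are additive for independent variables, and \( \rho_{xy} = b_{yx}\sigma_x/\sigma_y \). The normal error contributes zero to both higher moments.

```latex
\begin{align}
\gamma_y &= \frac{E\left[(y-E[y])^3\right]}{\sigma_y^3}
          = \frac{b_{yx}^3\,\sigma_x^3\,\gamma_x + 0}{\sigma_y^3}
          = \rho_{xy}^3\,\gamma_x, \\
\kappa_y &= \frac{b_{yx}^4\,\sigma_x^4\,\kappa_x + 0}{\sigma_y^4}
          = \rho_{xy}^4\,\kappa_x .
\end{align}
```

Because \( \mid \rho_{xy} \mid \le 1 \), it follows that \( \mid \gamma_y \mid \le \mid \gamma_x \mid \) and \( \mid \kappa_y \mid \le \mid \kappa_x \mid \) under the model x → y, with strict inequality whenever x is nonnormal and the correlation is not perfect.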
Presence of confounders
Thus, directional conclusions depend on the relative strength of the confounding effects. No biases in terms of model selection are expected when ∣ρ_{yu}∣ < ∣ρ_{xu}∣ because ∣γ_{y}∣ < ∣γ_{x}∣ and ∣κ_{y}∣ < ∣κ_{x}∣ still hold, which suggests the model x → y. In contrast, biases are likely to occur when ∣ρ_{yu}∣ > ∣ρ_{xu}∣, because ∣γ_{y}∣ > ∣γ_{x}∣ and ∣κ_{y}∣ > ∣κ_{x}∣ increase the risk of erroneously selecting the mis-specified model y → x.
Statistical inference
von Eye and DeShon (2012) proposed using normality tests, such as D’Agostino’s (1971) skewness and/or Anscombe and Glynn’s (1983) kurtosis test, to evaluate hypotheses compatible with observed-variable based direction dependence. Directional decisions are based on separately evaluating nonnormality of predictor and response. In addition, Pornprasertmanit and Little (2012) suggested nonparametric bootstrap CIs for higher-order moment differences (Δ(γ) = ∣γ_{x}∣ − ∣γ_{y}∣ and Δ(κ) = ∣κ_{x}∣ − ∣κ_{y}∣).
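The bootstrap idea for the higher-moment differences can be sketched as follows. This is an illustrative sketch on simulated data, not the SPSS macro: the percentile method, the case-wise resampling scheme, the data-generating values, and the function names are assumptions chosen for the example.

```python
import random

def skewness(v):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m3 = sum((a - m) ** 3 for a in v) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(v):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m4 = sum((a - m) ** 4 for a in v) / n
    return m4 / m2 ** 2 - 3.0

def bootstrap_ci(x, y, stat, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap CI for |stat(x)| - |stat(y)|, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(abs(stat([x[i] for i in idx])) - abs(stat([y[i] for i in idx])))
    diffs.sort()
    lower = diffs[int((1 - level) / 2 * n_boot)]
    upper = diffs[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper

# Simulated example in which x -> y holds: skewed predictor, normal error
rng = random.Random(2024)
x = [rng.expovariate(1.0) for _ in range(500)]
y = [0.8 * xi + rng.gauss(0.0, 1.0) for xi in x]

ci_skew = bootstrap_ci(x, y, skewness)
ci_kurt = bootstrap_ci(x, y, excess_kurtosis)
print("95% CI for delta(gamma):", ci_skew)
print("95% CI for delta(kappa):", ci_kurt)
```

A confidence interval for Δ(γ) that lies entirely above zero is in line with x → y, whereas an interval entirely below zero would point toward y → x; an interval that covers zero does not support a directional decision for this component.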
DDA Component II: Distributional properties of error terms
Absence of confounders
Thus, the skewness and excess kurtosis of e_{xy} systematically increase with the magnitude of nonnormality of the “true” predictor. Furthermore, because normality of the error term is assumed in the “true” model (i.e., \( {\gamma}_{e_{yx}}={\kappa}_{e_{yx}}=0 \)), differences in higher moments of e_{yx} and e_{xy} provide, again, information about the directional plausibility of a linear model. This DDA component can straightforwardly be extended to multiple linear regression models when adjusting for possible covariates (cf. Wiedermann & von Eye, 2015b). Under model x → y, one obtains \( \mid {\gamma}_{e_{xy}}\mid >\mid {\gamma}_{e_{yx}}\mid \) and \( \mid {\kappa}_{e_{xy}}\mid >\mid {\kappa}_{e_{yx}}\mid \); under model y → x, one obtains \( \mid {\gamma}_{e_{xy}}\mid <\mid {\gamma}_{e_{yx}}\mid \) and/or \( \mid {\kappa}_{e_{xy}}\mid <\mid {\kappa}_{e_{yx}}\mid \).
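The residual pattern can be checked numerically. The sketch below simulates data under x → y, fits both regressions, and compares the residual skewness values. The closed-form value used for comparison, \( (1-\rho_{xy}^2)^{3/2}\gamma_x \), is derived here from \( e_{xy} = (1-\rho_{xy}^2)x - b_{xy}e_{yx} \) for mean-centered variables and should be read as an assumption of this example rather than as a quotation of the article's equations; all function names and simulation settings are hypothetical.

```python
import random

def skewness(v):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m3 = sum((a - m) ** 3 for a in v) / n
    return m3 / m2 ** 1.5

def slope_and_residuals(pred, resp):
    """Simple OLS of resp on pred (with intercept); returns the slope and the residuals."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(resp) / n
    sxy = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    sxx = sum((p - mp) ** 2 for p in pred)
    b = sxy / sxx
    return b, [(r - mr) - b * (p - mp) for p, r in zip(pred, resp)]

# Simulated "true" model x -> y with a skewed predictor and a normal error term
rng = random.Random(42)
n = 20000
x = [rng.expovariate(1.0) for _ in range(n)]
y = [0.6 * xi + rng.gauss(0.0, 1.0) for xi in x]

b_yx, e_yx = slope_and_residuals(x, y)   # target model
b_xy, e_xy = slope_and_residuals(y, x)   # directionally mis-specified model

rho_sq = b_yx * b_xy                      # product of the two slopes equals rho squared
predicted = (1 - rho_sq) ** 1.5 * skewness(x)

print("skewness of e_yx:", round(skewness(e_yx), 3))
print("skewness of e_xy:", round(skewness(e_xy), 3))
print("predicted for e_xy:", round(predicted, 3))
```

In this setup the residuals of the correctly specified model should be close to symmetric, while the residuals of the reversed model inherit part of the predictor's skewness.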
Presence of confounders
Statistical inference
Again, nonnormality tests can be used to separately evaluate distributional properties of model residuals (cf. Wiedermann et al., 2015). An asymptotic significance test and bootstrap CIs for the skewness difference of residuals (\( \Delta \left({\gamma}_e\right)=\mid {\gamma}_{e_{xy}}\mid -\mid {\gamma}_{e_{yx}}\mid \)) have been proposed by Wiedermann et al. (2015) and Wiedermann and von Eye (2015b). The asymptotic test requires normality of the “true” error term. Only error symmetry is required for the bootstrap approach. Analogous procedures for the difference in excess-kurtosis values were discussed by Wiedermann (2015).
DDA Component III: Independence properties of predictor and error term
Absence of confounders
Thus, both the “true” predictor x and the “true” error term e_{yx} contribute to y in Eq. 1 and e_{xy} in Eq. 18. Although this illustration serves as an intuitive explanation, a rigorous proof of nonindependence follows from the Darmois–Skitovich theorem (Darmois, 1953; Skitovich, 1953). The theorem states that if two linear functions (υ_{1} and υ_{2}) of the same independent random variables w_{j} (j = 1 ,..., J), υ_{1} = ∑_{j}α_{j}w_{j} and υ_{2} = ∑_{j}β_{j}w_{j}, with α_{j} and β_{j} being constants, are independent, then all w_{j} for which α_{j}β_{j} ≠ 0 must be normally distributed. The reverse corollary implies that if a common w_{j} exists that is nonnormal, then υ_{1} and υ_{2} must be nonindependent (cf. Shimizu et al., 2011; Wiedermann & von Eye, 2015a). Thus, e_{xy} in Eq. 18 and y in Eq. 1 are nonindependent because of the common nonnormal variable x, and \( \left(1-{\rho}_{xy}^2\right){b}_{yx}\ne 0 \) (excluding ∣ρ_{xy}∣ = 1, due to practical irrelevance). Since the Darmois–Skitovich theorem applies for J variables, covariates can straightforwardly be included in the Models 1 and 2, provided that the covariates fulfill the requirements described above. However, the Darmois–Skitovich theorem concerns continuous random variables w_{j}. Thus, when categorical covariates exist, a two-step regression approach should be applied first with subsequent DDA being performed on residualized x and y variables. Because independence is assumed in the correctly specified model, direction dependence statements are possible through separately evaluating independence in competing models (cf. Shimizu et al., 2011; Wiedermann & von Eye, 2015a). In essence, if the null hypothesis H_{0} : x ⊥ e_{yx} is retained and, at the same time, H_{0} : y ⊥ e_{xy} is rejected, then it is more likely that the observed effect transmits from x to y. Conversely, if H_{0} : x ⊥ e_{yx} is rejected and H_{0} : y ⊥ e_{xy} is retained, then the model y → x should be preferred.
Presence of confounders
Thus, through reconsidering the “true” model given in Eq. 3 and, again, making use of the Darmois–Skitovich theorem, one concludes that the independence assumption is likely to be violated in both candidate models whenever a nonnormal confounder is present and \( \left[{b}_{yu}+\left({b}_{yx}-{b}_{yx}^{\prime}\right){b}_{xu}\right]{b}_{xu} \) and \( \left[{b}_{xu}-{b}_{xy}^{\prime}\left({b}_{yu}+{b}_{yx}{b}_{xu}\right)\right]{b}_{yu} \) deviate from zero.
Statistical inference
Significance tests to evaluate nonindependence of (linearly uncorrelated) variables have extensively been discussed in signal processing (Hyvärinen, Karhunen, & Oja, 2001). The first class of tests considered here uses the basic definition of stochastic independence, E[g_{1}(υ_{1})g_{2}(υ_{2})] − E[g_{1}(υ_{1})]E[g_{2}(υ_{2})] = 0 for any absolutely integrable functions g_{1} and g_{2}. Thus, independence tests can be constructed using correlation tests of the form cor[g_{1}(x), g_{2}(e_{yx})] and cor[g_{1}(y), g_{2}(e_{xy})], where at least one function is nonlinear. These tests are easy to use because they essentially rely on the Pearson correlation test applied to nonlinearly transformed variables.
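A nonlinear correlation test of this kind can be sketched in a few lines. The choices below are assumptions made for illustration: g_{1} is the square of the mean-centered predictor, g_{2} is the identity, the p value uses a normal approximation to the t distribution of the Pearson test, and the data are simulated under x → y.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def residuals(pred, resp):
    """Residuals of the simple OLS regression of resp on pred (with intercept)."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(resp) / n
    b = (sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
         / sum((p - mp) ** 2 for p in pred))
    return [(r - mr) - b * (p - mp) for p, r in zip(pred, resp)]

def nonlinear_cor_test(pred, resp):
    """cor[g1(pred), g2(e)] with g1 = squared centered predictor and g2 = identity."""
    n = len(pred)
    e = residuals(pred, resp)
    mp = sum(pred) / n
    g1 = [(p - mp) ** 2 for p in pred]
    r = pearson(g1, e)
    z = r * math.sqrt((n - 2) / (1 - r * r))
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided, normal approximation
    return r, p_value

rng = random.Random(11)
x = [rng.expovariate(1.0) for _ in range(5000)]
y = [0.6 * xi + rng.gauss(0.0, 1.0) for xi in x]

r_true, p_true = nonlinear_cor_test(x, y)   # target model x -> y
r_mis, p_mis = nonlinear_cor_test(y, x)     # mis-specified model y -> x
print("x -> y:", round(r_true, 3), p_true)
print("y -> x:", round(r_mis, 3), p_mis)
```

By construction, OLS residuals are linearly uncorrelated with the predictor in both models, so only the nonlinear transformation can reveal the dependence in the mis-specified direction.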
In other words, the power of detecting nonindependence in y → x increases with nonnormality of x. Although proofs for Eqs. 21 and 22 can be found in Wiedermann and von Eye (2015a, 2016), a proof of Eq. 23 is given in online Appendix A. Note that the covariances in Eqs. 21 and 23 involve squared residuals, which reveals a direct link to significance tests originally designed for detecting patterns of heteroscedasticity (cf. Wiedermann, Artner, & von Eye, 2017). Heteroscedasticity occurs, among other situations, when the variance of the error can be expressed as a function, g, of independent variables—that is, \( Var\left({e}_{xy}|y\right)={\sigma}_{e_{xy}}^2g(y) \) (see, e.g., Kaufman, 2013). It follows that homoscedasticity tests that relate squared residuals to functions of model predictors (such as the Breusch–Pagan test) are likely to indicate patterns of nonconstant error variances in directionally mis-specified models.
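The link to homoscedasticity tests can be illustrated with a studentized Breusch–Pagan statistic. The sketch below implements Koenker's variant for a single predictor (n times the R² of regressing squared residuals on the predictor, referred to a chi-square distribution with one degree of freedom); whether this matches the exact variant used in the SPSS macros is an assumption, and the simulated data and function names are hypothetical.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def residuals(pred, resp):
    """Residuals of the simple OLS regression of resp on pred (with intercept)."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(resp) / n
    b = (sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
         / sum((p - mp) ** 2 for p in pred))
    return [(r - mr) - b * (p - mp) for p, r in zip(pred, resp)]

def studentized_bp(pred, resp):
    """Koenker-type Breusch-Pagan test for one predictor: LM = n * R^2, df = 1."""
    e2 = [e * e for e in residuals(pred, resp)]
    r2 = pearson(pred, e2) ** 2              # R^2 of the auxiliary simple regression
    lm = len(pred) * r2
    p_value = math.erfc(math.sqrt(lm / 2))   # upper tail of chi-square with 1 df
    return lm, p_value

rng = random.Random(5)
x = [rng.expovariate(1.0) for _ in range(1000)]
y = [0.6 * xi + rng.gauss(0.0, 1.0) for xi in x]

lm_true, p_true = studentized_bp(x, y)   # target model x -> y
lm_mis, p_mis = studentized_bp(y, x)     # mis-specified model y -> x
print("x -> y:", round(lm_true, 2), p_true)
print("y -> x:", round(lm_mis, 2), p_mis)
```

The error variance in the simulated target model is constant, so a large statistic for the reversed model reflects directional mis-specification rather than genuine heteroscedasticity in the data-generating process.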
Covariances based on the hyperbolic tangent function have been proposed by Hyvärinen (2010) and Hyvärinen and Smith (2013). The value tanh(υ) is the derivative of the log-density of an inverse hyperbolic cosine distribution that provides an approximation of the likelihood ratio of directionally competing models in the bivariate case. The inverse hyperbolic cosine distribution constitutes a reasonably close approximation for several leptokurtic observed variables (cf. Mumford & Ramsey, 2014). Although tanh-based correlation tests are ideally suited for symmetric nonnormal variables, the statistical power of this approach can be expected to be low for skewed variables.
Since the choice of g_{1} and g_{2} is almost arbitrary, nonlinear approaches do not constitute rigorous directionality tests. Additional Type II errors beyond cases of small sample sizes are introduced because testing all existing functions g_{1} and g_{2} is impossible. Recently, a promising alternative was suggested, the Hilbert-Schmidt Independence Criterion (HSIC; Gretton et al., 2008). The HSIC evaluates the independence of functions of random variables and is provably omnibus in detecting any dependence between two random variables in the large sample limit. Sen and Sen (2014) introduced the HSIC in the context of testing the independence of predictors and error terms of linear regression models and proposed a bootstrap approach to approximating the distribution of the test statistic.
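To make the HSIC idea concrete, the following sketch computes the biased HSIC estimate, trace(KHLH)/n², with Gaussian kernels and evaluates it with a permutation test. Several choices are assumptions made for brevity and differ from the procedure described above: a permutation test replaces the bootstrap of Sen and Sen (2014), the kernel bandwidth is fixed at 1 on standardized variables, and the residuals are treated as if they were exchangeable observations.

```python
import math
import random

def standardize(v):
    """Center and scale a sequence to mean 0 and standard deviation 1."""
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / s for a in v]

def gram(v, bandwidth=1.0):
    """Gaussian kernel matrix; the fixed bandwidth is a simplifying assumption."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in v] for a in v]

def center(k):
    """Double-center a symmetric kernel matrix, i.e., compute H K H."""
    n = len(k)
    row_mean = [sum(row) / n for row in k]
    grand = sum(row_mean) / n
    return [[k[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]

def hsic_stat(kc, lmat, perm):
    """Biased HSIC estimate with the second variable reordered by perm."""
    n = len(kc)
    total = 0.0
    for i in range(n):
        ki, li = kc[i], lmat[perm[i]]
        total += sum(ki[j] * li[perm[j]] for j in range(n))
    return total / n ** 2

def hsic_permutation_test(u, v, n_perm=100, seed=0):
    """Return the HSIC estimate and a permutation p value for independence of u and v."""
    rng = random.Random(seed)
    kc = center(gram(standardize(u)))
    lmat = gram(standardize(v))
    identity = list(range(len(u)))
    observed = hsic_stat(kc, lmat, identity)
    exceed = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if hsic_stat(kc, lmat, perm) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)

def residuals(pred, resp):
    """Residuals of the simple OLS regression of resp on pred (with intercept)."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(resp) / n
    b = (sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
         / sum((p - mp) ** 2 for p in pred))
    return [(r - mr) - b * (p - mp) for p, r in zip(pred, resp)]

rng = random.Random(7)
x = [rng.expovariate(1.0) for _ in range(150)]
y = [xi + rng.gauss(0.0, 1.0) for xi in x]
print("x -> y:", hsic_permutation_test(x, residuals(x, y)))
print("y -> x:", hsic_permutation_test(y, residuals(y, x)))
```

With data generated under x → y, one would expect the larger HSIC value and the smaller p value for the reversed model, although with a sample this small and only 100 permutations the outcome of a single run is not guaranteed.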
Model selection
Properties, significance tests, and patterns of DDA components for the three candidate models
Aspect | Distribution of Observed Variables | Distribution of Error Terms | Independence of Predictor and Error Term |
---|---|---|---|
General Properties | The “true” outcome of a confounder-free model will always be closer to the normal distribution than the “true” predictor | The “true” error term of a confounder-free model will always be closer to the normal distribution than the error term of the mis-specified model | Predictor and error term of the “true” model will be independent, whereas nonindependence will be observed in the mis-specified model |
Significance Tests | Separate D’Agostino skewness and Anscombe–Glynn kurtosis tests; bootstrap confidence interval for higher-moment differences | Separate D’Agostino skewness and Anscombe–Glynn kurtosis tests; asymptotic skewness and excess-kurtosis difference tests; bootstrap confidence interval for higher-moment differences | Separate nonlinear correlation tests; separate Breusch–Pagan and robust Breusch–Pagan homoscedasticity tests; HSIC test |
Model: x → y | y is closer to the normal distribution than x: ∣γ_{y}∣ < ∣γ_{x}∣ and ∣κ_{y}∣ < ∣κ_{x}∣ | e_{yx} is closer to the normal distribution than e_{xy}: \( \mid {\gamma}_{e_{yx}}\mid <\mid {\gamma}_{e_{xy}}\mid \) and \( \mid {\kappa}_{e_{yx}}\mid <\mid {\kappa}_{e_{xy}}\mid \) | e_{yx} and x are independent, and e_{xy} and y are dependent |
Model: y → x | x is closer to the normal distribution than y: ∣γ_{x}∣ < ∣γ_{y}∣ and ∣κ_{x}∣ < ∣κ_{y}∣ | e_{xy} is closer to the normal distribution than e_{yx}: \( \mid {\gamma}_{e_{xy}}\mid <\mid {\gamma}_{e_{yx}}\mid \) and \( \mid {\kappa}_{e_{xy}}\mid <\mid {\kappa}_{e_{yx}}\mid \) | e_{xy} and y are independent, and e_{yx} and x are dependent |
Presence of Confounder | Higher-moment differences depend on the correlations between x and u and between y and u: ∣γ_{y}∣ < ∣γ_{x}∣ and ∣κ_{y}∣ < ∣κ_{x}∣ if ∣ρ_{xu}∣ > ∣ρ_{yu}∣; ∣γ_{x}∣ < ∣γ_{y}∣ and ∣κ_{x}∣ < ∣κ_{y}∣ if ∣ρ_{xu}∣ < ∣ρ_{yu}∣ | Higher-moment differences of e_{yx} and e_{xy} depend on the semipartial correlation coefficients ρ_{y(u| x)} and ρ_{x(u| y)}: \( \mid {\gamma}_{e_{yx}}\mid <\mid {\gamma}_{e_{xy}}\mid \) and \( \mid {\kappa}_{e_{yx}}\mid <\mid {\kappa}_{e_{xy}}\mid \) if ∣ρ_{y(u| x)}∣ < ∣ρ_{x(u| y)}∣; \( \mid {\gamma}_{e_{xy}}\mid <\mid {\gamma}_{e_{yx}}\mid \) and \( \mid {\kappa}_{e_{xy}}\mid <\mid {\kappa}_{e_{yx}}\mid \) if ∣ρ_{x(u| y)}∣ < ∣ρ_{y(u| x)}∣ | Independence assumption will be violated in both models—that is, e_{yx} and x are dependent, and e_{xy} and y are also dependent |
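The decision patterns in the table can be written down as a small lookup routine. This is only a sketch of one possible reading of the table: the function name, its arguments, the return labels, and the handling of cases the table does not cover (independence retained in both models, or components that disagree) are assumptions added for illustration.

```python
from typing import Optional

def select_model(target_indep_rejected: bool,
                 alternative_indep_rejected: bool,
                 moments_favor: Optional[str] = None) -> str:
    """Map DDA component outcomes to a decision about the target model x -> y.

    target_indep_rejected: H0 "x independent of e_yx" was rejected.
    alternative_indep_rejected: H0 "y independent of e_xy" was rejected.
    moments_favor: "x -> y", "y -> x", or None if the higher-moment
        components (observed variables, residuals) are not decisive.
    """
    if target_indep_rejected and alternative_indep_rejected:
        # Pattern of the confounder row: independence violated in both models
        return "no decision: unmeasured confounder likely"
    if not target_indep_rejected and not alternative_indep_rejected:
        # Not covered by the table; treated here as undecided (assumption)
        return "no decision: independence retained in both models"
    candidate = "x -> y" if alternative_indep_rejected else "y -> x"
    if moments_favor is not None and moments_favor != candidate:
        # Components disagree; treated here as inconclusive (assumption)
        return "no decision: components disagree"
    return candidate

print(select_model(False, True, "x -> y"))
print(select_model(True, False, None))
print(select_model(True, True, "x -> y"))
```

In practice the decision would rest on the full set of tests and on substantive theory, so a routine like this is best seen as a compact summary of the table rather than as an automated decision rule.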
A worked empirical example with SPSS
Summary of arguments and their position in the three SPSS macros
To present a fully worked empirical example (consisting of preevaluating distributional requirements, building a valid target model, and subsequently using DDA) and demonstrate the use of the macros for DDA, we use data from a cross-sectional study on the triple-code model (Dehaene & Cohen, 1998). The triple-code model is used to explain the development of numerical cognition in children and proposes that numbers are represented in three different codes that serve different purposes in number processing. The analog magnitude code (AMC) represents numbers on a mental number line, includes knowledge of the proximity and size of numerical quantities, and is used in approximate estimations and magnitude comparisons. The auditory verbal code (AVC) represents numbers in syntactically organized word sequences that are important for verbal input/output, counting, and retrieving memorized arithmetic facts. The visual Arabic code (VAC) represents numerical quantities in Arabic format necessary for multidigit operations and parity judgments. Using the triple code model, von Aster and Shalev (2007) suggested a hierarchical developmental model of numerical cognition in which AMC is viewed as an inherited core system necessary to further develop the AVC and, as well, the VAC. In other words, the model posits a directional link between AMC and AVC (i.e., AMC → AVC) and AMC and VAC (i.e., AMC → VAC). In the present demonstration, we focus on the directionality of AMC and AVC.
Koller and Alexandrowicz (2010) collected AMC and AVC ability measures for 341 second- to fourth-grade elementary school children (185 girls and 156 boys, aged between 6 and 11 yrs.) using the Neuropsychological Test Battery for Number Processing and Calculation in Children (ZAREKI-R; von Aster, Weinhold Zulauf, & Horn, 2006). AMC sum scores are based on 31 dichotomous items (focusing on perceptual quantity estimation, placing numbers on an analog number line, counting backward, enumeration, magnitude comparison of spoken numbers, and contextual magnitude judgment), and AVC sum scores are based on 52 dichotomous items (mental calculations [addition, subtraction, and multiplication], repeating numbers forward and backward, and story problems). The sum scores were standardized prior to the analysis in order to improve interpretability. Because fourth-graders were most likely to solve all items of the AMC scale, we focused on the second- and third-grade children (n = 216; 123 girls and 93 boys) in order to avoid biased DDA results due to ceiling effects.
Bivariate Pearson correlations and descriptive measures of observed variables (means and standard deviations of AMC and AVC are based on sum scores)
No. | Variable | (2) | (3) | (4) | (5) | M | SD | γ | κ |
---|---|---|---|---|---|---|---|---|---|
(1) | Analogue magnitude code (AMC) | .725 | .240 | –.442 | –.374 | 25.47 | 4.26 | –1.13 | 1.18 |
(2) | Auditory verbal code (AVC) | – | .294 | –.466 | –.364 | 36.08 | 6.75 | –0.72 | 0.45 |
(3) | Years of age | | – | –.304 | .067 | 8.26 | 0.79 | 0.13 | –0.35 |
(4) | Time to complete the test (in minutes) | | | – | .198 | 29.29 | 6.12 | 0.80 | 0.60 |
(5) | Preexisting difficulties | | | | – | 0.35 | 0.48 | 0.64 | –1.59 |
Distributional requirements for DDA
DDA requires that the distributions of the observed variables deviate from normality. Thus, before estimating the target model (AMC → AVC), we evaluated the assumption of nonnormality of the variables. The AMC and AVC measures were negatively skewed, with excess-kurtosis values greater than zero (Table 3). The Shapiro–Wilk test rejected the null hypothesis of normality for both ability measures (ps < .001). Visual inspection was used to rule out the presence of outliers, and frequencies of the minimum/maximum scores were computed in order to assess potential floor/ceiling effects. For the AVC scale, no participant reached the minimum or maximum score. For AMC, no participants received the minimum, and 14 out of 216 (6.5%) reached the maximum score, which is clearly below the commonly used cutoff of 15%–20% to define a ceiling effect (e.g., Lim et al., 2015; Terwee et al., 2007). Overall, the variables can be considered in line with the distributional requirements of DDA.
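The ceiling-effect check amounts to simple arithmetic, sketched below. The score vector is hypothetical and only reproduces the counts reported above (14 of 216 children at the maximum AMC score of 31); the function name and the use of the lower cutoff of 15% are choices made for this example.

```python
def share_at(scores, value):
    """Proportion of observations that equal the given score."""
    return sum(1 for s in scores if s == value) / len(scores)

# Hypothetical score vector matching the reported counts: 14 of 216 at the maximum (31)
scores = [31] * 14 + [25] * 202

ceiling_share = share_at(scores, 31)
cutoff = 0.15  # lower end of the 15%-20% range mentioned in the text

print(f"{ceiling_share:.1%}")   # prints 6.5%
print("ceiling effect suspected:", ceiling_share >= cutoff)
```

The same function can be applied to the minimum score to screen for floor effects.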
Estimating and validating the target model
Results of the two competing models (B = unstandardized coefficients, Std. Error = standard error, Beta = standardized coefficients)
Evaluating the direction of effect
The corresponding output is given in Box 1. The upper panel summarizes the results of D’Agostino skewness and Anscombe–Glynn kurtosis tests for the putative response (columns 1–3) and predictor (columns 4–6). Skewness and excess-kurtosis values were close to zero for AVC_r, and we can retain the null hypothesis of normality. In contrast, AMC_r significantly deviated from normality with respect to skewness. The results for excess-kurtosis estimates point in the same direction. The lower panel of Box 1 reports the 95% nonparametric bootstrap CIs for the differences in skewness Δ(γ) = ∣γ_{AMC_r}∣ − ∣γ_{AVC_r}∣ and excess kurtosis Δ(κ) = ∣κ_{AMC_r}∣ − ∣κ_{AVC_r}∣. Although AMC_r is significantly more skewed than AVC_r, the difference in excess kurtosis was nonsignificant. Overall, the third-moment estimates provide evidence in line with direction dependence requirements necessary for AMC_r → AVC_r.
The upper panel of Box 2 summarizes separate skewness and excess-kurtosis tests of the regression residuals. Columns 1–3 refer to the target model, and columns 4–6 give the results for the alternative model. Although the higher-moment estimates were larger (in absolute values) for the alternative model, we cannot reject the null hypothesis of normality at the 5% level. Similar results were obtained for the higher-moment difference measures \( \Delta \left({\gamma}_e\right)=\mid {\gamma}_e^{\left( AVC\_r\to AMC\_r\right)}\mid -\mid {\gamma}_e^{\left( AMC\_r\to AVC\_r\right)}\mid \) and \( \Delta \left({\kappa}_e\right)=\mid {\kappa}_e^{\left( AVC\_r\to AMC\_r\right)}\mid -\mid {\kappa}_e^{\left( AMC\_r\to AVC\_r\right)}\mid \) (see the lower panel of Box 2). Both the asymptotic higher-moment difference tests (columns 1–3) and 95% nonparametric bootstrap CIs (last two columns) suggested that the two models are not distinguishable in terms of their residual distributions. Thus, no clear-cut decision is possible for this component.
Box 2. Results of the residual distribution tests for the two competing models.
The macro call computes HSIC tests for the two competing models using 500 bootstrap samples. Box 4 gives the corresponding output. Note that the HSIC will be zero if and only if the predictor and the error term are stochastically independent. Again, a nonsignificant result was observed for the target model, whereas the HSIC reached significance for the alternative model. In sum, all independence measures indicated that AMC_r → AVC_r is more likely to hold for the present dataset.
Considering the overall results of DDA for the numerical-cognition example, we conclude that, taking into account the covariates, AVC is indeed more likely to reflect the response, and AMC is more likely to be on the explanatory side. In other words, on the basis of the present sample, the DDA results empirically support von Aster and Shalev’s (2007) hierarchical developmental model of numerical cognition.
Discussion
DDA allows researchers to test hypotheses compatible with the directional relation between pairs of variables while adjusting for covariates that possibly contribute to the causal process. This empirical falsification approach is based on the translation of a substantive causal theory into a linear target model that is then compared with the corresponding alternative model. The two models differ in the direction that is hypothesized for the causal process. DDA component patterns can then be used to retain the target model, retain the directionally competing model, or conclude that no distinct decisions are possible due to the presence of unmeasured confounders. Here, it is important to reiterate that directional conclusions derived from DDA component patterns are based on the operationalization of latent constructs \( \mathcal{X} \) and \( \mathcal{Y} \) using the linear model as an approximation of an unknown “true” functional relation \( \mathcal{F} \). The trustworthiness of DDA thus ultimately depends on both the quality of the operationalization and the validity of the linear model for the description of the causal mechanism. Although both requirements essentially apply to any linear modeling approach, they deserve particular attention in the context of DDA.
Because higher moments of variables constitute the key elements to select directionally competing models, DDA assumes that nonnormality of variables reflects inherent distributional characteristics of the constructs under study. Although the phenomenon of nonnormal variable distributions and its occurrence in practice have extensively been studied in the psychometric literature (Blanca, Arnau, López-Montiel, Bono, & Bendayan, 2013; Cain, Zhang, & Yuan, 2017; D. L. Cook, 1959; Lord, 1955; Micceri, 1989), not every form of nonnormality makes variables eligible for DDA. In classical test theory, for example, the impact of discrimination and difficulty of a measurement instrument on the relation between latent traits and true score is well understood. To ensure that the observed score distributions adequately reflect distributional properties of a latent trait, the test characteristic curve should go straight through the range of the trait distribution, which is usually achieved by using items with a broad range of difficulties (Lord & Novick, 1968, p. 392). In addition, item response theory (IRT) models^{5} such as the Rasch model (Rasch, 1960/1980, for dichotomous data) and the partial credit model (Masters, 1982, for polytomous data) are valuable alternatives. These models (1) come with empirical measures to evaluate the adequacy of describing a given dataset, (2) provide accordingly “weighted” parameter estimates (i.e., taking into account item difficulties), and (3) if the measurement model holds, exhibit the feature of specific objectivity (i.e., items can be compared irrespective of the distribution of person parameters and subjects can be compared using any proper set of items), which allows the most adequate estimation of the underlying trait distributions. 
For example, data on numerical cognition used for illustrative purposes were shown to be in line with the Rasch model (see Koller & Alexandrowicz, 2010), which implies that raw scores are sufficient statistics for the latent person abilities. In contrast, applying DDA in cases in which nonnormality of variables is a by-product of poor item selection, scaling (Ho & Yu, 2015), or the result of ceiling/floor effects will lead to biased results (note that, in the empirical example, the fourth grade children who were most likely to solve all scale-specific items were excluded to reduce the risk of biases due to ceiling effects). Overall, selecting high-quality measurement instruments at the study planning stage, or carefully evaluating psychometric properties of secondary data are central steps toward meaningful DDA outcomes.
Explanatory modeling, in general, requires that selected statistical models f can easily be linked to the corresponding theoretical model \( \mathcal{F} \) (Shmueli, 2010). Because the “true” data-generating mechanism \( \mathcal{Y}=\mathcal{F}\left(\mathcal{X}\right) \) is unknown in any statistical modeling approach (Cudeck & Henly, 2003), it is impossible to examine empirically whether y = f(x) is close enough to \( \mathcal{Y}=\mathcal{F}\left(\mathcal{X}\right) \). The appropriateness of f must be established indirectly by critically evaluating model validity using regression diagnostics (cf. Belsley, Kuh, & Welsch, 1980; R. D. Cook & Weisberg, 1982). Several model checks are indispensable before applying DDA. First, one needs to ensure that the assumption of linearity is justified (in the illustrative example we used visual diagnostics and evaluated changes in R^{2} values when adding higher polynomials of all continuous variables). Second, potential issues of multicollinearity must be evaluated (e.g., by inspecting pairwise predictor correlations and VIFs) to avoid biased inference due to inflated standard errors. Third, the absence of outliers and highly influential data points must be confirmed (e.g., by examining Cook’s distances, leverage statistics, or deleted studentized residuals). Ideally, the process of building a valid target model and the subsequent evaluation of its directional properties constitute two separate steps. This implies that unintended DDA outcomes should not be used as a basis to delete “misbehaving” data points.
The case of nonnormal “true” errors
The DDA framework presented here assumes that the “true” error follows a normal distribution. Although, in best practice applications, normality of residuals should routinely be evaluated to guarantee valid statistical inference (Box & Watson, 1962; Hampel, 1973; Pearson, 1931; White & MacDonald, 1980), normality is not required for OLS coefficients to be the best linear unbiased estimates. Normal “true” errors are particularly important for residual-distribution-based DDA tests when measures of both skewness and excess kurtosis are considered simultaneously, because normality of the correctly specified error then serves as a benchmark for model comparison. However, when one focuses only on the skewness of competing error terms, model selection can be performed as long as \( {\gamma}_{e_{yx}} \) = 0; that is, no explicit assumptions are made concerning \( {\kappa}_{e_{yx}} \). Model selection should then be based on nonparametric bootstrap CIs of skewness differences instead of the asymptotic skewness difference test (cf. Wiedermann & von Eye, 2015c). Conversely, when focusing solely on the excess kurtosis of error terms, no explicit assumptions are made concerning symmetry of the “true” error distribution and, as long as \( {\kappa}_{e_{yx}}=0 \) holds for the “true” model, \( {\gamma}_{e_{yx}} \) is allowed to vary within the range \( -\sqrt{2} \) to \( \sqrt{2} \) according to the skewness–kurtosis inequality κ ≥ γ^{2} − 2 (cf. Teuscher & Guiard, 1995).
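The admissible skewness range stated above follows from the inequality in a single step. As a short worked check, using only the skewness–kurtosis inequality already cited and setting the excess kurtosis of the “true” error to zero:

\[ {\kappa}_{e_{yx}}=0\kern0.5em \Rightarrow \kern0.5em 0\ge {\gamma}_{e_{yx}}^2-2\kern0.5em \Rightarrow \kern0.5em {\gamma}_{e_{yx}}^2\le 2\kern0.5em \Rightarrow \kern0.5em -\sqrt{2}\le {\gamma}_{e_{yx}}\le \sqrt{2}\approx 1.41 \]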
Although DDA, based on the skewness and excess kurtosis of the observed variables, also requires normality of the “true” errors, focusing on either skewness or the excess kurtosis relaxes distributional assumptions about the “true” error in the same fashion (for a detailed discussion on distinguishing directionally competing models under error nonnormality, see also Wiedermann & Hagmann, 2015). In addition, alternative DDA measures based on higher-order correlations \( {\rho}_{ij}\left(x,y\right)={\operatorname{cov}}_{ij}\left(x,y\right)/\left({\sigma}_x^i{\sigma}_y^j\right) \) with cov_{ij}(x, y) = E[(x – E[x])^{i} (y – E[y])^{j}] are available that do not make any assumptions about the “true” error distribution. Dodge and Rousson (2001) showed that ρ_{xy} = ρ_{12}(x, y)/ρ_{21}(x, y) holds whenever the “true” predictor is asymmetrically distributed without imposing distributional assumptions on the error. Thus, one obtains \( {\rho}_{12}^2\left(x,y\right)<{\rho}_{21}^2\left(x,y\right) \) under x → y and \( {\rho}_{12}^2\left(x,y\right)>{\rho}_{21}^2\left(x,y\right) \) under y → x independent of the error term distribution. A nonparametric bootstrap approach can again be carried out for statistical inference. Similarly, kurtosis-based DDA measures can be obtained when focusing on \( {\rho}_{13}^2\left(x,y\right) \) and \( {\rho}_{31}^2\left(x,y\right) \) (cf. Wiedermann, 2017). Implementing additional DDA measures for potentially nonnormal “true” errors in the SPSS macros is planned for the future.
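The higher-order correlations defined above are simple to compute from their definition. The following Python sketch does so for simulated data with a skewed predictor and a deliberately nonnormal (uniform) error. The function name, sample size, and data-generating values are assumptions chosen for illustration; bootstrap inference is left out.

```python
import random


def higher_order_corr(x, y, i, j):
    """rho_ij(x, y) = E[(x - E[x])^i (y - E[y])^j] / (sd_x^i * sd_y^j)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sdx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sdy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    cov_ij = sum((a - mx) ** i * (b - my) ** j for a, b in zip(x, y)) / n
    return cov_ij / (sdx ** i * sdy ** j)


# Hypothetical data generated under x -> y: skewed x, uniform (nonnormal) error.
rng = random.Random(11)
n = 100_000
x = [rng.expovariate(1.0) for _ in range(n)]
y = [0.5 * a + rng.uniform(-1.0, 1.0) for a in x]

rho_xy = higher_order_corr(x, y, 1, 1)  # ordinary Pearson correlation
rho_12 = higher_order_corr(x, y, 1, 2)
rho_21 = higher_order_corr(x, y, 2, 1)

print("rho_xy:", rho_xy)
print("rho_12 / rho_21:", rho_12 / rho_21)
print("rho_12^2 < rho_21^2:", rho_12 ** 2 < rho_21 ** 2)
```

Under these assumptions the ratio ρ_{12}/ρ_{21} should lie close to ρ_{xy}, and the squared ρ_{12} should be the smaller of the two, in line with the x → y pattern described above.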
Methods to assess independence of predictor(s) and error can straightforwardly be applied without any further modification even when the “true” error is nonnormal. The reason for this is that the Darmois–Skitovich theorem, as applied in the present context, does not impose distributional assumptions on the “true” error. Nonindependence of predictor(s) and error will hold when at least one common variable is nonnormal. Thus, evaluating the independence assumption of competing models can be carried out when (1) only the “true” predictor, (2) only the “true” error, or (3) both deviate from normality as long as the product of corresponding coefficients (see Eq. 18) is unequal to zero. However, results of competing BP-tests to assess patterns of heteroscedasticity in the two candidate models must be interpreted with caution when residuals of both models deviate from normality. In this case, Type I error rates of the test will be distorted and directional decisions must be based on Koenker’s robust BP test.
Power and sample size considerations: What we know so far
To provide guidelines for the number of observations needed to achieve sufficient power, we summarize previous simulation studies on DDA components and focus on three factors that affect empirical power rates: the magnitude of nonnormality, the magnitude of the causal effects, and sample size. Dodge and Rousson (2016) evaluated the power of nonparametric bootstrap CIs of Δ(γ) = |γ_{x}| – |γ_{y}| and Δ(κ) = |κ_{x}| – |κ_{y}| and concluded that skewness-based model selection outperformed the kurtosis-based approach in terms of statistical power to detect the “true” model. Here, for small effects (R^{2} = .25) and skewness values of 2, sample sizes as small as n = 50 may be sufficient to achieve a statistical power close to 80%. In contrast, for kurtosis-based selection, sample sizes of n = 500 and excess-kurtosis values larger than 4 are needed to achieve similar statistical power.
Wiedermann and von Eye (2015b) evaluated power properties of residual distribution-based methods considering separate D’Agostino skewness tests, the asymptotic skewness difference test, and nonparametric bootstrap CIs for \( \Delta \left({\gamma}_e\right)=\mid {\gamma}_{e_{xy}}\mid -\mid {\gamma}_{e_{yx}}\mid \), and concluded that acceptable power levels can already be observed for n = 75 when causal effects are small (ρ_{xy} = .25) and the true predictor is sufficiently skewed (i.e., γ_{x} ≥ 2). Because model selection based on separate normality tests proves more powerful than tests based on Δ(γ_{e}), n = 50 may already be sufficient for separate D’Agostino tests. In general, at least n = 125 is required for less skewed variables (e.g., γ_{x} = 1) and lower correlations (e.g., ρ_{xy} = .25). Model selection based on excess-kurtosis differences of residual distributions was evaluated by Wiedermann (2015). Again, separate Anscombe–Glynn tests outperformed procedures based on the difference of excess-kurtosis estimates. Here, for n = 200 and ρ_{xy} = .4, excess-kurtosis values larger than 4 are necessary for power rates close to 80%.
Wiedermann, Artner, and von Eye (2017) compared the performance of nine homoscedasticity tests to evaluate the independence assumption in competing models and showed that the BP-test was the most powerful procedure to select the correct model. For slightly skewed predictors (γ_{x} = 0.75), large effects and large sample sizes n ≥ 400 may be required to achieve sufficient power. For γ_{x} ≥ 1.5 and medium effect sizes, at least n = 200 may be required. Quite similar results were obtained for model selection based on nonlinear correlation tests of the form \( cor\left(x,{e}_{yx}^2\right) \) (Wiedermann & von Eye, 2016). However, γ_{x} ≥ 1.5 and large effects are necessary to obtain power rates beyond 80% when n ≥ 200. Systematic simulation experiments that (1) compare the statistical power of several other independence tests and (2) evaluate all DDA components simultaneously constitute important future endeavors.
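The nonlinear correlation check mentioned above correlates the predictor with the squared residuals in each candidate model. The Python sketch below does this for simulated data. The helper names and data-generating values are assumptions, no covariates are included, and only the correlations are reported, not the associated significance tests.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def ols_residuals(predictor, response):
    """Residuals of a simple least-squares regression of response on predictor."""
    n = len(predictor)
    mp = sum(predictor) / n
    mr = sum(response) / n
    slope = (sum((p - mp) * (r - mr) for p, r in zip(predictor, response))
             / sum((p - mp) ** 2 for p in predictor))
    intercept = mr - slope * mp
    return [r - (intercept + slope * p) for p, r in zip(predictor, response)]


# Hypothetical data generated under x -> y with a skewed x and a normal error.
rng = random.Random(3)
x = [rng.expovariate(1.0) for _ in range(10_000)]
y = [xi + rng.gauss(0.0, 1.0) for xi in x]

e_yx = ols_residuals(x, y)  # target model: y regressed on x
e_xy = ols_residuals(y, x)  # alternative model: x regressed on y

cor_target = pearson(x, [e ** 2 for e in e_yx])
cor_alt = pearson(y, [e ** 2 for e in e_xy])
print("cor(x, e_yx^2):", cor_target)
print("cor(y, e_xy^2):", cor_alt)
```

A correlation near zero in one model together with a clearly nonzero correlation in the other points toward the former as the better-fitting direction, provided that the predictor is sufficiently nonnormal.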
Further application scenarios and extensions
It is important to note that the proposed method is not restricted to the presented standard multiple regression setup. DDA is also applicable in other scenarios in which directionality issues have been deemed to be untestable. For example, when a statistical relation between two variables, x and y, has been established, researchers may further entertain hypotheses about the role of a third measured variable. Whether this third variable (m) should be conceptualized as a mediator (an intervening variable that transmits the effect from x to y) or as an observed confounder cannot be answered with standard statistical methods (MacKinnon, Krull, & Lockwood, 2000). From a DDA perspective, distinguishing between these models reduces to separately evaluating the directionality of x and m (i.e., whether x → m or m → x should be preferred) and m and y (i.e., whether m → y or y → m holds for the data) provided that nonnormality requirements are fulfilled (for extensions of residual- and independence-based DDA to mediation models, see Wiedermann & von Eye, 2015c, 2016). Furthermore, the application of DDA may not be restricted to observational studies. Directionality issues may also occur in experimental studies, in particular those designed to test hypotheses that go beyond total effects in randomized trials. Here, mediation models may, again, provide sound explanations of how experimental interventions causally affect the target outcome (Bullock, Green, & Ha, 2010; Heckman & Smith, 1995; Imai, Keele, & Tingley, 2010). However, even when the predictor is under experimental control, it is well known that neither the direction (Wiedermann & von Eye, 2015c) nor the magnitude of the causal effect of the mediator on the outcome can be identified uniquely without imposing strong assumptions on the data, assumptions that are similar to those required in observational studies (Imai, Tingley, & Yamamoto, 2013; Keele, 2015).
Again, DDA may help to gain further insight through evaluating competing mediator-outcome paths while adjusting for an experimentally controlled predictor.
Extensions of the direction dependence methodology proposed in this article can go in a number of directions. First, developing DDA for moderation models would enable researchers to test the direction of effect while accounting for a third variable that modifies the relation between predictor and response (the fact that the nature of the moderator effect may depend on the direction of the postulated model has been shown by Judd & Kenny, 2010). Similarly, future work is needed to study principles of direction dependence in polynomial (i.e., models that consider higher-order terms, cf. Aiken & West, 1991) and more general linearizable regression models (i.e., nonlinear regression functions that can be linearized through proper variable transformations). Another possible extension concerns the complexity of the research design. Although the presented framework is designed for single-level data, developing DDA for multilevel regression models (Raudenbush & Bryk, 2002) would make it possible to account for hierarchical (nested) data structures. Further, throughout the article, we assumed that the “true” predictor is measured without measurement error. Although first attempts to extend DDA components to measurement error models are given in von Eye and Wiedermann (2014) and Wiedermann, Merkle, and von Eye (2018), extending direction dependence to latent variable models may overcome potential biases in directional decisions resulting from imprecise measurement of constructs. Finally, the present study focused on cases in which the tentative predictor and the tentative response are continuous variables (covariates can either be continuous or categorical). The reason for this is that both candidate models (x → y and y → x) must be specified as standard linear regression models (similarly, the proposed SPSS macros are designed to evaluate two competing standard linear models). Although previous studies (cf. Inazumi et al., 2011; Peters, Janzing, & Schölkopf, 2011; von Eye & Wiedermann, 2016, 2017; Wiedermann & von Eye, 2018) discussed principles of direction dependence when both variables are categorical in nature, extending DDA to the generalized linear modeling framework (McCullagh & Nelder, 1989) would be most promising for evaluating causal relations among categorical, count, and continuous variables.
Author note
We thank the two anonymous reviewers, Wes Bonifay, Francis Huang, Edgar C. Merkle, Anna P. Nutt, and Phillip K. Wood for their constructive comments on an earlier version of the article. We are also indebted to Ingrid Koller for providing the data used for illustrative purposes.
Footnotes
- 1.
Reciprocal causal models (x affects y, and vice versa) may serve as a fourth possible explanation for variable associations. Although it is mathematically possible to estimate reciprocal effects with cross-sectional data (James & Singh, 1978), some controversy exists about the adequacy of those estimates (Wong & Law, 1999) due to the absence of temporality. In some theories, temporal precedence constitutes a crucial element to quantify feedback loops, and thus, longitudinal data are usually preferred (Rogosa, 1985).
- 2.
Although both types of statistical models are of importance for theory building (Braun & Oswald, 2011) and can be characterized as association-based models in the context of observational data, each one plays a different role in the process of testing and redefining theories. In explanatory models, a priori theories carry the crucial element of causation and the goal is to match a statistical model f and an underlying mechanism \( \mathcal{F} \) and use x and y as tools to estimate and validate f for the purpose of testing the causal hypothesis of interest. In contrast, in predictive models, f is considered to be a tool for capturing variable associations, and x and y are of primary interest to build valid models for the purpose of forecasting new response values (Shmueli, 2010).
- 3.
Note that correlation and higher-moment parameters refer to population values, which implies that DDA quantities will exactly hold in the population. Due to sampling variability, DDA quantities will not hold exactly for sample estimates but converge to their true values with increasing sample size.
- 4.
We do not focus on tanh-based tests because of low power for asymmetrically distributed variables.
- 5.
We thank one of the anonymous reviewers for this suggestion.
References
- Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks: Sage.
- Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics: An empiricist’s companion. Princeton: Princeton University Press.
- Anscombe, F. J., & Glynn, W. J. (1983). Distribution of the kurtosis statistics b2 for normal samples. Biometrika, 70, 227–234. doi: https://doi.org/10.2307/2335960
- Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New York: Wiley.
- Blanca, M. J., Arnau, J., López-Montiel, D., Bono, R., & Bendayan, R. (2013). Skewness and kurtosis in real data samples. Methodology, 9, 78–84. doi: https://doi.org/10.1027/1614-2241/a000057
- Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
- Box, G. E. P., & Watson, G. S. (1962). Robustness to nonnormality of regression tests. Biometrika, 49, 93–106. doi: https://doi.org/10.1093/biomet/49.1-2.93
- Braun, M. T., & Oswald, F. L. (2011). Exploratory regression analysis: A tool for selecting models and determining predictor importance. Behavior Research Methods, 43, 331–339. doi: https://doi.org/10.3758/s13428-010-0046-8
- Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, but what’s the mechanism? (Don’t expect an easy answer). Journal of Personality and Social Psychology, 98, 550–558. doi: https://doi.org/10.1037/a0018933
- Cain, M. K., Zhang, Z., & Yuan, K. H. (2017). Univariate and multivariate skewness and kurtosis for measuring nonnormality: Prevalence, influence and estimation. Behavior Research Methods, 49, 1716–1735. doi: https://doi.org/10.3758/s13428-016-0814-1
- Chickering, D. M. (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3, 507–554.
- Cook, D. L. (1959). A replication of Lord’s study on skewness and kurtosis of observed test-score distributions. Educational and Psychological Measurement, 19, 81–87. doi: https://doi.org/10.1177/001316445901900109
- Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall.
- Cudeck, R., & Henly, S. J. (2003). A realistic perspective on pattern representation in growth data: Comment on Bauer and Curran (2003). Psychological Methods, 8, 378–383. doi: https://doi.org/10.1037/1082-989X.8.3.378
- D’Agostino, R. B. (1971). An omnibus test of normality for moderate and large sample sizes. Biometrika, 58, 341–348. doi: https://doi.org/10.2307/2334522
- Darmois, G. (1953). Analyse générale des liaisons stochastique. Review of the International Statistical Institute, 21, 2–8. doi: https://doi.org/10.2307/1401511
- Dehaene, S., & Cohen, L. (1998). Levels of representation in number processing. In B. Stemmer & H. A. Whitaker (Eds.), The handbook of neurolinguistics (pp. 331–341). New York: Academic Press.
- Dodge, Y., & Rousson, V. (2000). Direction dependence in a regression line. Communications in Statistics: Theory and Methods, 29, 1957–1972. doi: https://doi.org/10.1080/03610920008832589
- Dodge, Y., & Rousson, V. (2001). On asymmetric properties of the correlation coefficient in the regression setting. American Statistician, 55, 51–54. doi: https://doi.org/10.1198/000313001300339932
- Dodge, Y., & Rousson, V. (2016). Recent developments on the direction of a regression line. In W. Wiedermann & A. von Eye (Eds.), Statistics and causality: Methods for applied empirical research (pp. 45–62). Hoboken: Wiley.
- Dodge, Y., & Yadegari, I. (2010). On direction of dependence. Metrika, 72, 139–150. doi: https://doi.org/10.1007/s00184-009-0273-0
- Entner, D., Hoyer, P. O., & Spirtes, P. (2012). Statistical test for consistent estimation of causal effects in linear non-Gaussian models. Journal of Machine Learning Research: Workshop and Conference Proceedings, 22, 364–372.
- Frisch, R., & Waugh, F. (1933). Partial time regressions as compared with individual trends. Econometrica, 1, 387–401. doi: https://doi.org/10.2307/1907330
- Geisser, S. (1993). Predictive inference: An introduction. London: Chapman & Hall.
- Gentile, D. A., Lynch, P. J., Linder, J. R., & Walsh, D. A. (2004). The effects of violent video game habits on adolescent hostility, aggressive behaviors, and school performance. Journal of Adolescence, 27, 5–22. doi: https://doi.org/10.1016/j.adolescence.2003.10.002
- Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2008). A kernel statistical test of independence. In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), Advances in neural information processing systems (Vol. 20, pp. 585–592). Cambridge: MIT Press.
- Hampel, F. R. (1973). Robust estimation: A condensed partial survey. Zeitschrift für Wahrscheinlichkeitstheorie, 27, 87–104. doi: https://doi.org/10.1007/bf00536619
- Harris, A., & Seckl, J. (2011). Glucocorticoids, prenatal stress and the programming of disease. Hormones and Behavior, 59, 279–289. doi: https://doi.org/10.1016/j.yhbeh.2010.06.007
- Heckman, J. J., & Smith, J. A. (1995). Assessing the case for social experiments. Journal of Economic Perspectives, 9, 85–110. doi: https://doi.org/10.1257/jep.9.2.85
- Ho, A. D., & Yu, C. C. (2015). Descriptive statistics for modern test score distributions: Skewness, kurtosis, discreteness, and ceiling effects. Educational and Psychological Measurement, 75, 365–388. doi: https://doi.org/10.1177/0013164414548576
- Hoyer, P. O., Shimizu, S., Kerminen, A. J., & Palviainen, M. (2008). Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49, 362–378. doi: https://doi.org/10.1016/j.ijar.2008.02.006
- Hyvärinen, A. (2010). Pairwise measures of causal direction in linear non-Gaussian acyclic models. In JMLR: Workshop and Conference Proceedings (Vol. 13, pp. 1–16). Tokyo, Japan: JMLR.
- Hyvärinen, A., Karhunen, J., & Oja, E. (2001). Independent components analysis. New York: Wiley.
- Hyvärinen, A., & Smith, S. M. (2013). Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14, 111–152.
- Imai, K., Keele, L., & Tingley, D. (2010). A general approach to causal mediation analysis. Psychological Methods, 15, 309–334. doi: https://doi.org/10.1037/a0020761
- Imai, K., Tingley, D., & Yamamoto, T. (2013). Experimental designs for identifying causal mechanisms. Journal of the Royal Statistical Society: Series A, 176, 5–51. doi: https://doi.org/10.1111/j.1467-985x.2012.01032.x
- Inazumi, T., Washio, T., Shimizu, S., Suzuki, J., Yamamoto, A., & Kawahara, Y. (2011). Discovering causal structures in binary exclusive-or skew acyclic models. In F. Cozman & A. Pfeffer (Eds.), Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (pp. 373–382). Corvallis: AUAI Press. arXiv:1202.3736
- James, L. R., & Singh, B. K. (1978). An introduction to the logic, assumptions, and basic analytic procedures of two-stage least squares. Psychological Bulletin, 85, 1104–1122. doi: https://doi.org/10.1037/0033-2909.85.5.1104
- Judd, C. M., & Kenny, D. A. (2010). Data analysis. In D. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (5th ed., Vol. 1, pp. 115–139). New York: Wiley.
- Kaufman, R. L. (2013). Heteroskedasticity in regression: Detection and correction. Thousand Oaks: Sage.
- Keele, L. (2015). Causal mediation analysis: Warning! Assumptions ahead. American Journal of Evaluation, 36, 500–513. doi: https://doi.org/10.1177/1098214015594689
- Koller, I., & Alexandrowicz, R. W. (2010). A psychometric analysis of the ZAREKI-R using Rasch-models. Diagnostica, 56, 57–67. doi: https://doi.org/10.1026/0012-1924/a000003
- Lim, C. R., Harris, K., Dawson, J., Beard, D. J., Fitzpatrick, R., & Price, A. J. (2015). Floor and ceiling effects in the OHS: An analysis of the NHS PROMs data set. BMJ Open, 5, e007765. doi: https://doi.org/10.1136/bmjopen-2015-007765
- Lord, F. M. (1955). A survey of observed test-score distributions with respect to skewness and kurtosis. Educational and Psychological Measurement, 15, 383–389. doi: https://doi.org/10.1177/001316445501500406
- Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
- Lovell, M. (1963). Seasonal adjustment of economic time series and multiple regression analysis. Journal of the American Statistical Association, 58, 993–1010. doi: https://doi.org/10.1080/01621459.1963.10480682
- MacKinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1, 173–181. doi: https://doi.org/10.1023/A:1026595011371
- Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174. doi: https://doi.org/10.1007/bf02296272
- McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.
- Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166. doi: https://doi.org/10.1037/0033-2909.105.1.156 CrossRefGoogle Scholar
- Miller, T. W., Nigg, J. T., & Miller, R. L. (2009). Attention deficit hyperactivity disorder in African American children: What can be concluded from the past ten years? Clinical Psychology Review, 29, 77–86. doi: https://doi.org/10.1016/j.cpr.2008.10.001 CrossRefPubMedGoogle Scholar
- Muddapur, M. V. (2003). On directional dependence in a regression line. Communications in Statistics: Theory and Methods, 32, 2053–2057. doi: https://doi.org/10.1081/sta-120023266 CrossRefGoogle Scholar
- Mumford, J. A., & Ramsey, J. D. (2014). Bayesian networks for fMRI: A primer. NeuroImage, 86, 573–582. doi: https://doi.org/10.1016/j.neuroimage.2013.10.020 CrossRefPubMedGoogle Scholar
- Munafò, M. R., & Araya, R. (2010). Cigarette smoking and depression: A question of causation. British Journal of Psychiatry, 196, 425–426. doi: https://doi.org/10.1192/bjp.bp.109.074880 CrossRefPubMedGoogle Scholar
- Nigg, J. T. (2012). Future directions in ADHD etiology research. Journal of Clinical Child & Adolescent Psychology, 41, 524–533. doi: https://doi.org/10.1080/15374416.2012.686870 CrossRefGoogle Scholar
- Nigg, J. T., Knottnerus, G. M., Martel, M. M., Nikolas, M., Cavanagh, K., Karmaus, W., & Rappley, M. D. (2008). Low blood lead levels associated with clinically diagnosed attention-deficit/hyperactivity disorder and mediated by weak cognitive control. Biological Psychiatry, 63, 325–331. doi: https://doi.org/10.1016/j.biopsych.2007.07.013 CrossRefPubMedGoogle Scholar
- Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd). Cambridge: Cambridge University Press.CrossRefGoogle Scholar
- Pearson, E. S. (1931). The analysis of variance in case of non-normal variation. Biometrika, 23, 114–133. doi: https://doi.org/10.2307/2333631
- Peters, J., Janzing, D., & Schölkopf, B. (2011). Causal inference on discrete data using additive noise models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 2436–2450. doi: https://doi.org/10.1109/tpami.2011.71
- Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of causal inference: Foundations and learning algorithms. Cambridge: MIT Press.
- Pornprasertmanit, S., & Little, T. D. (2012). Determining directional dependency in causal associations. International Journal of Behavioral Development, 36, 313–322. doi: https://doi.org/10.1177/0165025412448944
- Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press. (Original work published 1960)
- Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks: Sage.
- Richardson, T., & Spirtes, P. (1999). Automated discovery of linear feedback models. In C. Glymour & G. F. Cooper (Eds.), Computation, causation and discovery (pp. 253–304). Cambridge: MIT Press.
- Rogosa, D. R. (1985). Analysis of reciprocal effects. In T. Husen & N. Postlethwaite (Eds.), International encyclopedia of education (pp. 4221–4225). London: Pergamon Press.
- Sen, A., & Sen, B. (2014). Testing independence and goodness-of-fit in linear models. Biometrika, 101, 927–942. doi: https://doi.org/10.1093/biomet/asu026
- Shimizu, S. (2016). Non-Gaussian structural equation models for causal discovery. In W. Wiedermann & A. von Eye (Eds.), Statistics and causality: Methods for applied empirical research (pp. 153–276). Hoboken: Wiley.
- Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. J. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7, 2003–2030.
- Shimizu, S., Inazumi, T., Sogawa, Y., Hyvärinen, A., Kawahara, Y., Washio, T., . . . Bollen, K. (2011). DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12, 1225–1248.
- Shmueli, G. (2010). To explain or to predict? Statistical Science, 25, 289–310. doi: https://doi.org/10.1214/10-sts330
- Skitovich, W. P. (1953). On a property of the normal distribution. Doklady Akademii Nauk SSSR, 89, 217–219.
- Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search (2nd ed.). Cambridge: MIT Press.
- Spirtes, P., Richardson, T., Meek, C., Scheines, R., & Glymour, C. (1998). Using path diagrams as a structural equation modeling tool. Sociological Methods and Research, 27, 182–225. doi: https://doi.org/10.1177/0049124198027002003
- Spirtes, P., & Zhang, K. (2016). Causal discovery and inference: Concepts and recent methodological advances. Applied Informatics, 3, 1–28. doi: https://doi.org/10.1186/s40535-016-0018-x
- Sungur, E. A. (2005). A note on directional dependence in regression setting. Communications in Statistics: Theory and Methods, 34, 1957–1965. doi: https://doi.org/10.1080/03610920500201228
- Taylor, G., McNeill, A., Girling, A., Farley, A., Lindson-Hawley, N., & Aveyard, P. (2014). Change in mental health after smoking cessation: Systematic review and meta-analysis. British Medical Journal, 348, 1–22. doi: https://doi.org/10.1136/bmj.g1151
- Terwee, C. B., Bot, S. D., de Boer, M. R., van der Windt, D. A., Knol, D. L., Dekker, J., . . . de Vet, H. C. (2007). Quality criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology, 60, 34–42. doi: https://doi.org/10.1016/j.jclinepi.2006.03.012
- Teuscher, F., & Guiard, V. (1995). Sharp inequalities between skewness and kurtosis for unimodal distributions. Statistics and Probability Letters, 22, 257–260. doi: https://doi.org/10.1016/016771529400074I
- Verma, T. S., & Pearl, J. (1991). Equivalence and synthesis of causal models. Uncertainty in Artificial Intelligence, 6, 220–227.
- von Aster, M., Weinhold Zulauf, M., & Horn, R. (2006). Neuropsychologische Testbatterie fuer Zahlenverarbeitung und Rechnen bei Kindern (ZAREKI-R) [Neuropsychological test battery for number processing and calculation in children]. Frankfurt: Harcourt Test Services.
- von Aster, M. G., & Shalev, R. S. (2007). Number development and dyscalculia. Developmental Medicine and Child Neurology, 49, 868–873. doi: https://doi.org/10.1111/j.1469-8749.2007.00868.x
- von Eye, A., & DeShon, R. P. (2012). Directional dependence in developmental research. International Journal of Behavioral Development, 36, 303–312. doi: https://doi.org/10.1177/0165025412439968
- von Eye, A., & Wiedermann, W. (2014). On direction of dependence in latent variable contexts. Educational and Psychological Measurement, 74, 5–30. doi: https://doi.org/10.1177/0013164413505863
- von Eye, A., & Wiedermann, W. (2016). Direction of effects in categorical variables: A structural perspective. In W. Wiedermann & A. von Eye (Eds.), Statistics and causality: Methods for applied empirical research (pp. 107–130). Hoboken: Wiley.
- von Eye, A., & Wiedermann, W. (2017). Direction of effects in categorical variables: Looking inside the table. Journal of Person-Oriented Research, 3, 11–26. doi: https://doi.org/10.17505/jpor.2017.02
- White, H., & MacDonald, G. M. (1980). Some large-sample tests for nonnormality in the linear regression model. Journal of the American Statistical Association, 75, 16–28. doi: https://doi.org/10.2307/2287373
- Wiedermann, W. (2015). Decisions concerning the direction of effects in linear regression models using the fourth central moment. In M. Stemmler, A. von Eye, & W. Wiedermann (Eds.), Dependent data in social sciences research: Forms, issues, and methods of analysis (pp. 149–169). New York: Springer.
- Wiedermann, W. (2017). A note on fourth moment-based direction dependence measures when regression errors are non normal. Communications in Statistics: Theory and Methods. doi: https://doi.org/10.1080/03610926.2017.1388403
- Wiedermann, W., Artner, R., & von Eye, A. (2017). Heteroscedasticity as a basis of direction dependence in reversible linear regression models. Multivariate Behavioral Research, 52, 222–241. doi: https://doi.org/10.1080/00273171.2016.1275498
- Wiedermann, W., & Hagmann, M. (2015). Asymmetric properties of the Pearson correlation coefficient: Correlation as the negative association between linear regression residuals. Communications in Statistics: Theory and Methods, 45, 6263–6283. doi: https://doi.org/10.1080/03610926.2014.960582
- Wiedermann, W., Hagmann, M., Kossmeier, M., & von Eye, A. (2013). Resampling techniques to determine direction of effects in linear regression models. Interstat. Retrieved May 13, 2013, from http://interstat.statjournals.net/YEAR/2013/articles/1305002.pdf
- Wiedermann, W., Hagmann, M., & von Eye, A. (2015). Significance tests to determine the direction of effects in linear regression models. British Journal of Mathematical and Statistical Psychology, 68, 116–141. doi: https://doi.org/10.1111/bmsp.12037
- Wiedermann, W., Merkle, E. C., & von Eye, A. (2018). Direction of dependence in measurement error models. British Journal of Mathematical and Statistical Psychology, 71, 117–145. doi: https://doi.org/10.1111/bmsp.12111
- Wiedermann, W., & von Eye, A. (2015a). Direction-dependence analysis: A confirmatory approach for testing directional theories. International Journal of Behavioral Development, 39, 570–580. doi: https://doi.org/10.1177/0165025415582056
- Wiedermann, W., & von Eye, A. (2015b). Direction of effects in multiple linear regression models. Multivariate Behavioral Research, 50, 23–40. doi: https://doi.org/10.1080/00273171.2014.958429
- Wiedermann, W., & von Eye, A. (2015c). Direction of effects in mediation analysis. Psychological Methods, 20, 221–244. doi: https://doi.org/10.1037/met0000027
- Wiedermann, W., & von Eye, A. (2016). Directionality of effects in causal mediation analysis. In W. Wiedermann & A. von Eye (Eds.), Statistics and causality: Methods for applied empirical research (pp. 63–106). Hoboken: Wiley.
- Wiedermann, W., & von Eye, A. (2018). Log-linear models to evaluate direction of effect in binary variables. Statistical Papers. doi: https://doi.org/10.1007/s00362-017-0936-2
- Wong, C. S., & Law, K. S. (1999). Testing reciprocal relations by nonrecursive structural equation models using cross-sectional data. Organizational Research Methods, 2, 69–87. doi: https://doi.org/10.1177/109442819921005
- Zhang, J. (2008). On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172, 1873–1896. doi: https://doi.org/10.1016/j.artint.2008.08.001