This article introduces methods of direction dependence and presents a unified statistical framework for discerning the causal direction of effects in linear models using observational data. Existing regression-type methods allow researchers to quantify the magnitude of hypothesized effects but are of limited use for establishing the direction of effects between variables—that is, whether x → y or y → x correctly describes the causal flow between two variables x and y. The statistical framework proposed in this article allows researchers to draw conclusions about the direction of effects. In the present work, we focus on observational (nonexperimental) data settings because this type of data is the primary application domain for the presented principles of direction dependence. The issue of effect directionality in the context of experimental studies (e.g., when decomposing total effects into direct and indirect effect components) will be taken up in the Discussion section.

Establishing cause–effect relations between variables is a central aim of many empirical studies in the social sciences. The direction of influence between variables is a key element of any causal theory that purports to explain the data-generating mechanism (Bollen, 1989). Questions concerning direction of effect naturally arise in observational studies. For example, it may not be entirely clear whether tobacco consumption causes depression and anxiety or whether people with symptoms of depression and anxiety are more likely to engage in health damaging behavior (Munafò & Araya, 2010; Taylor et al., 2014); whether violent video games expose players to aggressive behavior or whether aggressive people are simply more attracted to violent video games (Gentile, Lynch, Linder, & Walsh, 2004); or whether lead exposure contributes to the development of ADHD or whether children with ADHD symptoms are unable to stay focused enough to avoid lead-tainted objects (Nigg et al., 2008).

Although useful for assessing the magnitude and statistical significance of an assumed causal effect, standard regression-based methods are of limited use when addressing directionality issues per se. When an association between x and y exists, at least three possible explanations can be entertained: (1) x causes y (x → y), (2) y causes x (y → x), and (3) neither relation exists, due to a spurious association of both variables with a third variable (sometimes termed a “confounder”; see Fig. 1).Footnote 1 The Pearson product-moment correlation and ordinary least squares (OLS) estimates do not adjudicate regarding the model that best represents the data-generating mechanism (von Eye & DeShon, 2012). Researchers who use regression models must therefore make their decision as to the direction of effect on the basis of a priori theory and substantive arguments. However, statistical tools are often desirable to empirically demonstrate the explanatory superiority of one theory over plausible alternatives. The present contribution introduces such a tool—direction dependence analysis (DDA; Wiedermann & von Eye, 2015a). Whereas standard regression models use only estimates of first- and second-order moments (i.e., means, variances, and covariances) to assess the magnitude and statistical significance of regression weights, DDA uses estimates of higher-order moments (i.e., skewness and kurtosis) to assess the relative plausibility of directional alternatives.

Fig. 1

Conceptual diagrams of three possible models to explain the association between two variables x and y. Squares are used for observed variables and circles are used for latent variables (with error terms denoted with e). Model a = directional effect of x on y; Model b = directional effect of y on x; Model c = confounded x–y association due to an unmeasured third variable u. The two dashed arrows in Model c indicate that either x → y or y → x can be biased by confounding

Methods of causal discovery have experienced rapid development within the last decades and various causal search algorithms have been proposed (see Spirtes & Zhang, 2016, for a recent overview). These search algorithms are designed to learn plausible causal structures from multivariate data. Examples of such algorithms include the PC algorithm (Spirtes, Glymour, & Scheines, 2000), greedy equivalence search (Chickering, 2002), cyclic causal discovery (Richardson & Spirtes, 1999), fast causal inference (Zhang, 2008), and linear non-Gaussian acyclic models (cf. Shimizu, 2016; Shimizu, Hoyer, Hyvärinen, & Kerminen, 2006) that are either designed to discover (Markov) equivalence classes of directed acyclic graphs (DAGs; i.e., a small subset of candidate models that have the same support by the data in terms of model fit; cf. Verma & Pearl, 1991) or uncover DAG structures beyond equivalence classes. All these algorithms constitute important exploratory tools for causal learning and are, thus, ideally suited to generate new substantive hypotheses concerning the causal nature of constructs.

DDA, in contrast, is concerned with a research scenario that is confirmatory in nature—that is, situations in which a substantive theory about the causal relation exists and the researcher wishes to know whether the causal direction assumed by the model is plausible relative to the alternative scenarios a reasonable skeptic might propose. The primary goal is to probe this causal theory against alternatives while adjusting for potential background variables known to be explanatory in nature. Thus, instead of extracting plausible DAG structures (or classes of equivalent DAGs) for a given dataset, one is interested in testing a specific model (e.g., lead exposure → ADHD) against a plausible alternative (ADHD → lead exposure). The present article is designed to introduce principles of DDA to quantitative researchers. Although previous work (e.g., Dodge & Rousson, 2000, 2001; Dodge & Yadegari, 2010; Muddapur, 2003; Sungur, 2005; Wiedermann & von Eye, 2015a, 2015b, 2015c) has focused on direction dependence methods to choose between the two models x → y and y → x (Figs. 1a and b), direction dependence in the presence of confounders has received considerably less attention. To fill this void, we present extensions of DDA to scenarios in which confounders are present and incorporate these new insights into the existing direction dependence principle. As a result, we propose a unified framework that allows one to identify each explanatory model given in Fig. 1.

The article is structured as follows: First, we define statistical models suitable for DDA and introduce model assumptions. We then introduce elements of DDA and summarize new results that describe the behavior of DDA tests when unmeasured confounders are present. Next, three SPSS macros are introduced that make DDA accessible to applied researchers and a data example is given to illustrate their application. The article closes with a discussion of conceptual and empirical requirements of DDA, potential data-analytic pitfalls, and potential extensions of the DDA methodology. In addition, sample size recommendations based on previous simulation studies are provided.

The direction dependence principle

Direction dependence can be defined as the asymmetry between cause and effect. The model x → y implies that changing x (the cause) changes y (the effect), but changing y does not lead to change in x (see also Pearl, 2009; Peters, Janzing, & Schölkopf, 2017). Conversely, when changing y changes x but, at the same time, changing x does not change y, the model y → x describes the causal relation. The limitations of conventional association-based approaches for uncovering the asymmetry of cause and effect can be explained by the fact that these methods only consider variation up to second-order moments and, thus, rely on correlation in its symmetric form—that is, cor(x, y) = cor(y, x). The key element of DDA is to consider variable information beyond second-order moments (specifically, skewness and kurtosis), because asymmetry properties of the Pearson correlation and the related linear model appear under nonnormality. These asymmetry properties are of importance when x and y are not exchangeable in their roles as explanatory and response variables without leading to systematic model violations. DDA, thus, requires and makes use of nonnormality of variables to gain deeper insight into the causal mechanism. DDA consists of three core components: (1) distributional properties of observed variables, (2) distributional properties of error terms of competing models, and (3) independence properties of error terms and predictors in the competing models. On the basis of new results regarding direction dependence in the presence of confounders, we show that unique patterns of DDA component outcomes exist for each of the three models shown in Fig. 1. These outcome patterns enable researchers to select between competing models. In the following paragraphs, we formally define the statistical models considered. We then introduce the DDA components separately for confounder-free “true” models and for models in which confounders are present.
In addition, statistical inference compatible with direction dependence is discussed. To simplify the presentation, we assume that x → y corresponds to the “true” model and y → x represents the directionally mis-specified model.

Model definitions

We start the introduction to DDA by defining the statistical models considered. Although statistical models can be used either for the purposes of explanation or predictionFootnote 2 (Geisser, 1993), DDA is designed for the task of validating explanatory models—that is, for testing the causal hypothesis assumed under a given theory. Assume that a construct \( \mathcal{X} \) (e.g., lead exposure) causes construct \( \mathcal{Y} \) (e.g., ADHD symptomatology) through mechanism \( \mathcal{F} \)—that is, \( \mathcal{Y}=\mathcal{F}\left(\mathcal{X}\right) \). Furthermore, let x and y be operationalizations of \( \mathcal{X} \) and \( \mathcal{Y} \) (e.g., blood lead concentration and number of DSM-IV hyperactive–impulsive symptoms), and define f as the statistical model (e.g., the linear model) that approximates \( \mathcal{F} \)—that is, y = f(x). The direction dependence framework provides a set of statistical tools for evaluating the directionality assumption of y = f(x) implied by the causal theory \( \mathcal{X}\to \mathcal{Y} \).

DDA assumes that \( \mathcal{X} \) is a nonnormally distributed construct whose cause lies outside the causal mechanism \( \mathcal{X}\to \mathcal{Y} \) and that the two observed variables, x and y, are continuous. Furthermore, we assume that the data-generating mechanism is recursive in nature (i.e., the causal flow is unidirectional, without feedback loops) and can be approximated by the linear model—that is, given that \( \mathcal{X}\to \mathcal{Y} \) constitutes the “true” mechanism, y = f(x) can be written as (cf. Fig. 1a; for simplicity, but without loss of generality, we assume that the intercept is fixed at zero)

$$ y={b}_{yx}x+{e}_{yx} $$
(1)

The slope byx denotes the change in the fitted value of y for a one-unit increase in x and represents the causal effect of x → y. Estimates of the causal effect are usually obtained using OLS or, in structural equation models (SEMs), maximum likelihood estimation. Nonnormality of the “true” predictor, quantified as nonzero skewness \( {\gamma}_x=E\left[{\left(x-E\left[x\right]\right)}^3\right]/{\sigma}_x^3 \) and/or nonzero excess kurtosis \( {\kappa}_x=E\left[{\left(x-E\left[x\right]\right)}^4\right]/{\sigma}_x^4-3 \) (with E being the expected value operator), is assumed to reflect an inherent distributional characteristic of \( \mathcal{X} \) (as opposed to nonnormality due to boundaries of the operationalized x). The error term eyx is assumed to be normally distributed (with zero mean and variance \( {\sigma}_{e_{yx}}^2 \)), serially independent, and independent of x.
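The sample analogues of these higher-moment quantities are straightforward to compute. The following minimal Python sketch (our own illustration; the DDA software discussed later is distributed as SPSS macros) estimates skewness and excess kurtosis according to the definitions above, using a chi-square predictor whose population values (γ = 1, κ = 1.5 for eight degrees of freedom) match the simulation settings used later in the article:

```python
import numpy as np

def skewness(v):
    # gamma = E[(v - E[v])^3] / sigma^3
    d = v - v.mean()
    return (d ** 3).mean() / v.std() ** 3

def excess_kurtosis(v):
    # kappa = E[(v - E[v])^4] / sigma^4 - 3
    d = v - v.mean()
    return (d ** 4).mean() / v.std() ** 4 - 3.0

rng = np.random.default_rng(1)
x = rng.chisquare(df=8, size=200_000)  # population values: gamma = 1, kappa = 1.5
print(round(skewness(x), 2), round(excess_kurtosis(x), 2))
```

With a sample this large, the estimates fall close to the population values of 1 and 1.5.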

When \( \mathcal{Y}\to \mathcal{X} \) instead of \( \mathcal{X}\to \mathcal{Y} \) describes the causal mechanism (Fig. 1b), the corresponding linear model is

$$ x={b}_{xy}y+{e}_{xy} $$
(2)

where bxy captures the causal effect of y on x. Here, y represents a nonnormal external influence and exy denotes a normally distributed error term with zero mean and variance \( {\sigma}_{e_{xy}}^2 \), which is serially independent and independent of y.

Finally, as a third possible explanation, we consider an unknown confounding construct \( \mathcal{U} \) (and its continuously operationalized variable u) that has a causal effect on both \( \mathcal{X} \) and \( \mathcal{Y} \). A conceptual diagram of a confounded x–y relation is given in Fig. 1c. The dashed arrows in Fig. 1c indicate that either the model x → y in Eq. 1 or the model y → x in Eq. 2 can be biased by confounding. For the purpose of introducing the confounder model, we focus on the model x → y. In this case, the model in Eq. 1 changes to

$$ {\displaystyle \begin{array}{l}x={b}_{xu}u+{e}_{xu}\\ {}y={b}_{yx}x+{b}_{yu}u+{e}_{yu}\end{array}} $$
(3)

where bxu and byu denote the effects of the confounder on x and y, respectively. The considered setup also includes the case of “pure” confounding as a special case (i.e., the observed association between x and y is entirely attributable to the existence of u) when byx = 0. In Model 3, u is assumed to be a nonnormally distributed external influence, and exu and eyu are normally distributed error terms (exhibiting zero means and variances \( {\sigma}_{e_{xu}}^2 \) and \( {\sigma}_{e_{yu}}^2 \)) that are independent of u and of each other.

It is common to include covariates (e.g., background or baseline measures) in statistical models to increase the precision of parameter estimates and statistical power. In other words, covariates are independent variables that are included in a target model to control for their influences on the putative response. Conversely, failing to include important covariates can lead to confounded parameter estimates when the covariates are (in addition to their relation to the response) correlated with other predictors. However, several authors have cautioned against the careless use of covariates, because conditioning on covariates can also increase the bias of causal estimates (Pearl, 2009; Spirtes, Richardson, Meek, Scheines, & Glymour, 1998). Similar considerations hold for statistical models in the context of DDA. To be eligible for DDA, covariates must be known to be on the explanatory side of the statistical model. In addition, one must ensure that a recursive causal ordering of the covariates themselves is theoretically possible and that all covariates can be expressed as linear combinations of mutually independent external influences. We can formally express these prerequisites for a given set of covariates zj (j = 1, ... , J) as \( {z}_j={\sum}_{k(i)\le k(j)}{a}_{ji}{\eta}_i \), with k(i) ≤ k(j) describing the causal order of the covariates (i.e., zi precedes zj), aji denoting the corresponding total effect, and ηi denoting the external influence associated with zi. When no other covariate precedes zj, one obtains zj = ηj with ajj = 1. For example, suppose that two covariates (z1 and z2) are known to influence y, that z1 precedes z2, and that no other covariate precedes z1 (i.e., z1 = η1). In this case, one obtains z2 = a21η1 + η2 with z1 = η1, which implies that z2 can be expressed as a linear combination of (weighted) external influences. Consider again the example of ADHD and blood lead exposure.
Two factors that are known to affect ADHD symptomatology are prenatal maternal emotional stress (Harris & Seckl, 2011) and the cultural context of the child (Miller, Nigg, & Miller, 2009; see also Nigg, 2012). Arguments of temporality or the logical order of effects can be used to evaluate the eligibility of covariates for DDA. Both prenatal maternal stress and cultural context are located earlier in time than the child’s blood lead level and ADHD symptomatology under study, which justifies their use as background variables. Furthermore, in principle, we are also able to establish a causal order of the covariates themselves—that is, cultural context may be conceived as a background variable directly or indirectly contributing to maternal stress level. In other words, both variables are unlikely to render the target model cyclic, which makes them eligible as covariates in DDA. In general, covariates can be either continuous or categorical. For categorical covariates, however, we need to assume that these variables constitute external influences themselves—that is, we exclude cases in which categorical variables serve as outcomes of other independent variables in the model (detailed explanations are given below). Although this assumption is stricter than in the continuous case, it still allows for multiple-group scenarios in which the magnitude of the causal effect of the predictor on the outcome can vary across groups. When categorical covariates are present, a two-stage approach to model estimation is preferable. That is, in a first step, the effect of the categorical covariates is partialled out of the putative predictor (e.g., x), the putative outcome (y), and all continuous covariates; the regression residuals extracted from these auxiliary models are subsequently used as “purified” measures (an example is given below). According to the Frisch–Waugh–Lovell theorem (cf. Frisch & Waugh, 1933; Lovell, 1963; sometimes called the regression anatomy formula: Angrist & Pischke, 2009), regressing the “purified” outcome on the “purified” independent variables in the second step yields the same model parameters as the full multiple regression model including the categorical covariates.
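The two-stage logic can be illustrated numerically. In the Python sketch below (variable names and effect sizes are our own illustrative choices), x and y are residualized on a covariate z, and the second-stage slope reproduces the coefficient of x from the full multiple regression, as the Frisch–Waugh–Lovell theorem guarantees:

```python
import numpy as np

def ols(predictors, response):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(response))] + list(predictors))
    return np.linalg.lstsq(X, response, rcond=None)[0]

def residualize(v, z):
    """Partial the covariate z out of v."""
    b0, b1 = ols([z], v)
    return v - b0 - b1 * z

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=n)                       # covariate (background variable)
x = 0.5 * z + rng.chisquare(8, n)            # putative predictor, related to z
y = 0.7 * x + 0.3 * z + rng.normal(size=n)   # putative outcome

# Full model: y regressed on x and z; keep the coefficient of x
b_full = ols([x, z], y)[1]

# Two-stage: partial z out of x and y, then regress residual on residual
b_two_stage = ols([residualize(x, z)], residualize(y, z))[1]
print(np.isclose(b_full, b_two_stage))  # the two estimates coincide
```

The equality is exact up to floating-point error, so the printed value is True.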

DDA Component I: Distributional properties of observed variables

Absence of confounders

Asymmetry properties in terms of observed-variable distributions emerge from the additive nature of the linear model—that is, a response is defined as the sum of a (nonnormally distributed) explanatory part and a (normally distributed) error component. Intuitively, distributional differences between predictor and response emerge because the distribution of the response is the convolution of a nonnormal and a normal distribution. In other words, adding a normal error term to a nonnormal predictor will necessarily cause the response to be more normally distributed than the predictor. Dodge and Rousson (2000, 2001), as well as Dodge and Yadegari (2010), presented algebraic proofs of this relation and showed that the Pearson correlation ρxy has asymmetric properties when higher moments of x and y are considered. Specifically, the cube of ρxy can be expressed as the ratio of the skewness of the response to that of the predictor,Footnote 3

$$ {\rho}_{xy}^3=\frac{\gamma_y}{\gamma_x} $$
(4)

(as long as γx ≠ 0) and the fourth power of ρxy can be written as the ratio of excess kurtosis of response and predictor,

$$ {\rho}_{xy}^4=\frac{\kappa_y}{\kappa_x} $$
(5)

(as long as κx ≠ 0). Because ρxy is bounded on the interval [–1, 1], the absolute values of the skewness and excess kurtosis of the response y will always be smaller than the absolute skewness and excess-kurtosis values of the predictor x. In other words, when Model 1 approximates the data-generating mechanism, y will be closer to the normal distribution than x. This asymmetry property opens the door to evaluating the directional plausibility of a linear model by comparing the skewness and excess kurtosis of a tentative response and a tentative predictor. Note that Eqs. 4 and 5, as proposed by Dodge and Rousson (2000, 2001), hold for the bivariate case. However, a two-step regression approach can be used to adjust for the covariates defined above. First, two regression models are estimated in which x and y serve as responses and the covariates zj are used as independent variables—that is, \( y={\sum}_{j=1}^J{b}_{yz_j}{z}_j+{e}_{yz_j} \) and \( x={\sum}_{j=1}^J{b}_{xz_j}{z}_j+{e}_{xz_j} \). Next, the estimated regression residuals of the two models, \( {e}_{yz_j} \) and \( {e}_{xz_j} \), are used as auxiliary variables reflecting the (unexplained) portion of variation after adjusting for the covariates zj. Regressing \( {e}_{yz_j} \) on \( {e}_{xz_j} \) gives the same regression coefficient as obtained in the multiple linear model {x, zj} → y, and the OLS model \( {e}_{yz_j}\to {e}_{xz_j} \) gives the identical estimate as the multiple linear model {y, zj} → x. Direction dependence decisions can then be based on these auxiliary measures. For example, for one covariate z, Model 1 extends to

$$ y={b}_{yx}x+{b}_{yz}z+{e}_{y(xz)} $$
(6)

“Purified” measures of x and y are obtained through \( {e}_{xz}=x-{b}_{xz}^{\prime }z \) and \( {e}_{yz}=y-{b}_{yz}^{\prime }z \), where \( {b}_{xz}^{\prime } \) and \( {b}_{yz}^{\prime } \) denote the OLS estimates when regressing x and y on z. Then Eq. 6 can be rewritten as

$$ {e}_{yz}={a}_{yx}{e}_{xz}+{\theta}_{yx} $$
(7)

with \( {a}_{yx}={b}_{yx}=\left({\rho}_{xy}-{\rho}_{yz}{\rho}_{xz}\right)/\left(1-{\rho}_{xz}^2\right) \) being the partial regression coefficient and θyx denoting the error term that is identical to ey(xz). For the model in Eq. 7, one obtains (a proof is given in online Appendix A)

$$ {\rho}_{xy\left|z\right.}^3=\frac{\gamma_{e_{yz}}}{\gamma_{e_{xz}}} $$
(8)
$$ {\rho}_{xy\left|z\right.}^4=\frac{\kappa_{e_{yz}}}{\kappa_{e_{xz}}} $$
(9)

(as long as \( {\upgamma}_{e_{xz}}\ne 0 \) and \( {\kappa}_{e_{xz}}\ne 0 \)), where \( {\rho}_{xy\mid z}={a}_{yx}\frac{\sigma_{e_{xz}}}{\sigma_{e_{yz}}}=\frac{\rho_{xy}-{\rho}_{yz}{\rho}_{xz}}{\sqrt{1-{\rho}_{xz}^2}\sqrt{1-{\rho}_{yz}^2}} \) is the partial correlation of x and y adjusting for z, and \( {\gamma}_{e_{yz}} \), \( {\gamma}_{e_{xz}} \), \( {\kappa}_{e_{yz}} \), and \( {\kappa}_{e_{xz}} \) are the skewness and excess-kurtosis values of eyz and exz. Because −1 ≤ ρxy∣z ≤ 1, the higher moments of eyz and exz possess the same properties as the higher moments of x and y in the bivariate case. Under the model x → y, we obtain \( \mid {\gamma}_{e_{yz}}\mid <\mid {\gamma}_{e_{xz}}\mid \) and \( \mid {\kappa}_{e_{yz}}\mid <\mid {\kappa}_{e_{xz}}\mid \), whereas \( \mid {\gamma}_{e_{yz}}\mid >\mid {\gamma}_{e_{xz}}\mid \) and \( \mid {\kappa}_{e_{yz}}\mid >\mid {\kappa}_{e_{xz}}\mid \) hold under the model y → x.
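Equations 4 and 5 can be checked by simulation. In this Python sketch (the sample size, distributions, and effect sizes are our own choices), the cube of the Pearson correlation is compared with the skewness ratio, and the fourth power with the excess-kurtosis ratio, under a true model x → y with a normal error term:

```python
import numpy as np

def skew(v):
    d = v - v.mean()
    return (d ** 3).mean() / v.std() ** 3

def exkurt(v):
    d = v - v.mean()
    return (d ** 4).mean() / v.std() ** 4 - 3.0

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.chisquare(8, n)       # gamma_x = 1, kappa_x = 1.5
y = x + rng.normal(0, 2, n)   # "true" model x -> y with normal error

r = np.corrcoef(x, y)[0, 1]
print(round(r ** 3, 3), round(skew(y) / skew(x), 3))      # Eq. 4: the two agree
print(round(r ** 4, 3), round(exkurt(y) / exkurt(x), 3))  # Eq. 5: the two agree
```

Both printed pairs match up to sampling error, and the response y is visibly closer to normality than the predictor x.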

Presence of confounders

Because any continuous nonnormal confounder can affect both the distribution of x and the distribution of y, directional decisions based on higher moments of x and y are influenced by (1) the magnitude of nonnormality of u, (2) the connection strength of u and x, and (3) the connection strength of u and y. Formally, this can be shown by rewriting higher moments of x and y as a function of higher moments of u—that is, \( {\gamma}_x={\rho}_{xu}^3{\gamma}_u \), \( {\kappa}_x={\rho}_{xu}^4{\kappa}_u \), \( {\gamma}_y={\rho}_{yu}^3{\gamma}_u \), and \( {\kappa}_y={\rho}_{yu}^4{\kappa}_u \) (which follows from applying Dodge and Rousson’s, 2000, 2001, results to Model 3). Thus, the statistical power to determine the direction of effect depends on the magnitude of confounding effects and the degree of nonnormality of the confounder. If either the distribution of the confounder is close to normality or the influence of the confounder is weak (i.e., ρxu and ρyu are close to zero), no decisions can be made due to lack of sufficient nonnormality of x and y. The influence of the confounder on direction dependence decisions is given through

$$ \frac{\gamma_y}{\gamma_x}={\left(\frac{\rho_{yu}}{\rho_{xu}}\right)}^3 $$
(10)

and

$$ \frac{\kappa_y}{\kappa_x}={\left(\frac{\rho_{yu}}{\rho_{xu}}\right)}^4 $$
(11)

Thus, directional conclusions depend on the relative strength of the confounding effects. No biases in terms of model selection are expected when ∣ρyu∣ < ∣ρxu∣, because ∣γy∣ < ∣γx∣ and ∣κy∣ < ∣κx∣ still hold, which suggests the model x → y. In contrast, biases are likely to occur when ∣ρyu∣ > ∣ρxu∣, because ∣γy∣ > ∣γx∣ and ∣κy∣ > ∣κx∣ increase the risk of erroneously selecting the mis-specified model y → x.
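These relations can likewise be verified numerically. The sketch below (parameter values are our own illustration) simulates “pure” confounding (byx = 0) and compares the observed skewness ratio with the correlation ratio of Eq. 10:

```python
import numpy as np

def skew(v):
    d = v - v.mean()
    return (d ** 3).mean() / v.std() ** 3

rng = np.random.default_rng(4)
n = 1_000_000
u = rng.chisquare(8, n)            # nonnormal confounder, gamma_u = 1
x = 1.0 * u + rng.normal(0, 2, n)  # b_xu = 1
y = 0.5 * u + rng.normal(0, 2, n)  # b_yu = 0.5; "pure" confounding (b_yx = 0)

r_xu = np.corrcoef(x, u)[0, 1]
r_yu = np.corrcoef(y, u)[0, 1]
# Eq. 10: the skewness ratio equals the cubed ratio of confounder correlations
print(round(skew(y) / skew(x), 3), round((r_yu / r_xu) ** 3, 3))
```

Because ∣ρyu∣ < ∣ρxu∣ in this setup, the observed-variable pattern here still (correctly) points toward x → y even though the association is entirely spurious, illustrating why Component I alone cannot detect confounding.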

Statistical inference

von Eye and DeShon (2012) proposed using normality tests, such as D’Agostino’s (1971) skewness test and/or Anscombe and Glynn’s (1983) kurtosis test, to evaluate hypotheses compatible with observed-variable-based direction dependence. Directional decisions are based on separately evaluating the nonnormality of the predictor and the response. In addition, Pornprasertmanit and Little (2012) suggested nonparametric bootstrap CIs for higher-order moment differences (Δ(γ) = ∣γx∣ − ∣γy∣ and Δ(κ) = ∣κx∣ − ∣κy∣).
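The percentile-bootstrap idea for Δ(γ) can be sketched as follows (a minimal Python version of the resampling logic; the number of resamples and the data-generating values are our own choices, not prescriptions from the cited work):

```python
import numpy as np

def skew(v):
    d = v - v.mean()
    return (d ** 3).mean() / v.std() ** 3

def delta_gamma(x, y):
    # Delta(gamma) = |gamma_x| - |gamma_y|
    return abs(skew(x)) - abs(skew(y))

rng = np.random.default_rng(5)
n = 500
x = rng.chisquare(8, n)
y = x + rng.normal(0, 2, n)  # "true" model x -> y

boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)  # resample (x, y) pairs with replacement
    boot.append(delta_gamma(x[idx], y[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% CI for Delta(gamma): [{lo:.3f}, {hi:.3f}]")
```

A CI lying entirely above zero indicates ∣γx∣ > ∣γy∣ and thus supports x → y; a CI covering zero leaves the direction undecided by this component.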

DDA Component II: Distributional properties of error terms

Absence of confounders

The second DDA component focuses on the distributional shape of the error terms, eyx and exy. In essence, distributional differences of the two error terms are likely to occur when the nonnormal “true” predictor is erroneously used as the outcome because predictor nonnormality will, to some extent, be preserved in the error term of the mis-specified model. Wiedermann, Hagmann, Kossmeier, and von Eye (2013), Wiedermann, Hagmann, and von Eye (2015), and Wiedermann (2015) showed that higher moments of the error term obtained from the mis-specified model (exy) can be expressed as functions of the third and fourth moments of the true predictor (x)—that is,

$$ {\gamma}_{e_{xy}}={\left(1-{\rho}_{xy}^2\right)}^{\raisebox{1ex}{$3$}\!\left/ \!\raisebox{-1ex}{$2$}\right.}{\gamma}_x $$
(12)

and

$$ {\kappa}_{e_{xy}}={\left(1-{\rho}_{xy}^2\right)}^2{\kappa}_x $$
(13)

Thus, the skewness and excess kurtosis of exy systematically increase with the magnitude of nonnormality of the “true” predictor. Furthermore, because normality of the error term is assumed in the “true” model (i.e., \( {\gamma}_{e_{yx}}={\kappa}_{e_{yx}}=0 \)), differences in the higher moments of eyx and exy provide, again, information about the directional plausibility of a linear model. This DDA component can straightforwardly be extended to multiple linear regression models when adjusting for possible covariates (cf. Wiedermann & von Eye, 2015b). Under the model x → y, one obtains \( \mid {\gamma}_{e_{xy}}\mid >\mid {\gamma}_{e_{yx}}\mid \) and \( \mid {\kappa}_{e_{xy}}\mid >\mid {\kappa}_{e_{yx}}\mid \); under the model y → x, one obtains \( \mid {\gamma}_{e_{xy}}\mid <\mid {\gamma}_{e_{yx}}\mid \) and/or \( \mid {\kappa}_{e_{xy}}\mid <\mid {\kappa}_{e_{yx}}\mid \).
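Equation 12 and the resulting residual asymmetry can be checked with a short simulation (Python; the parameter values are our own). The residual of the mis-specified model inherits skewness from the nonnormal predictor, in the amount Eq. 12 predicts, whereas the residual of the true model remains (approximately) normal:

```python
import numpy as np

def skew(v):
    d = v - v.mean()
    return (d ** 3).mean() / v.std() ** 3

def ols_resid(pred, resp):
    # Residuals of a simple bivariate OLS regression of resp on pred
    b = np.cov(pred, resp)[0, 1] / pred.var(ddof=1)
    return resp - resp.mean() - b * (pred - pred.mean())

rng = np.random.default_rng(6)
n = 1_000_000
x = rng.chisquare(8, n)       # gamma_x = 1
y = x + rng.normal(0, 4, n)   # "true" model x -> y, normal error

e_yx = ols_resid(x, y)   # residual of the correctly specified model
e_xy = ols_resid(y, x)   # residual of the directionally mis-specified model

r = np.corrcoef(x, y)[0, 1]
print(round(skew(e_yx), 3))                     # ~ 0: normal "true" error
print(round(skew(e_xy), 3),                     # clearly nonzero ...
      round((1 - r ** 2) ** 1.5 * skew(x), 3))  # ... and matches Eq. 12
```

The same check for Eq. 13 can be run by replacing the skewness function with an excess-kurtosis estimator and the exponent 3/2 with 2.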

Presence of confounders

When an unmeasured confounder is present, the two competing models can be written as

$$ y={b}_{yx}^{\prime }x+{e}_{yx}^{\prime } $$
(14)
$$ x={b}_{xy}^{\prime }y+{e}_{xy}^{\prime } $$
(15)

where \( {b}_{yx}^{\prime } \) and \( {b}_{xy}^{\prime } \) are biased estimates of byx and bxy. Although the causal estimate in Eq. 14 is biased, the model still correctly represents the data-generating process in terms of directionality. In this case, higher moments of \( {e}_{yx}^{\prime } \) and \( {e}_{xy}^{\prime } \) depend on the magnitude of nonnormality of u and the magnitudes of bxu and byu. Specifically, the higher moments can be written as functions of semipartial correlations and higher moments of u. That is, for \( {e}_{yx}^{\prime } \) one obtains

$$ {\displaystyle \begin{array}{l}{\gamma}_{e_{yx}^{\prime }}={\rho}_{y\left(u|x\right)}^3{\gamma}_u\\ {}{\kappa}_{e_{yx}^{\prime }}={\rho}_{y\left(u|x\right)}^4{\kappa}_u\end{array}} $$
(16)

and for \( {e}_{xy}^{\prime } \) one obtains

$$ {\displaystyle \begin{array}{l}{\gamma}_{e_{xy}^{\prime }}={\rho}_{x\left(u|y\right)}^3{\gamma}_u\\ {}{\kappa}_{e_{xy}^{\prime }}={\rho}_{x\left(u|y\right)}^4{\kappa}_u,\end{array}} $$
(17)

with \( {\rho}_{y\left(u|x\right)}=\left({\rho}_{yu}-{\rho}_{xy}{\rho}_{xu}\right)/\sqrt{1-{\rho}_{xy}^2} \) being the semipartial correlation coefficient for y and u given x and \( {\rho}_{x\left(u|y\right)}=\left({\rho}_{xu}-{\rho}_{xy}{\rho}_{yu}\right)/\sqrt{1-{\rho}_{xy}^2} \) describing the semipartial correlation between x and u given y (see online Appendix A for a proof). The distributions of both error terms will be close to normality, and thus no distinct decision will be possible, when u is close to normality and/or the semipartial correlations are close to zero. If the confounder is sufficiently nonnormal, the distributional properties of the error terms, and thus directional decisions, depend on the magnitudes of the semipartial correlations. Unbiased directional decisions are possible when ∣ρy(u|x)∣ < ∣ρx(u|y)∣, because \( \mid {\gamma}_{e_{xy}^{\prime }}\mid >\mid {\gamma}_{e_{yx}^{\prime }}\mid \) and \( \mid {\kappa}_{e_{xy}^{\prime }}\mid >\mid {\kappa}_{e_{yx}^{\prime }}\mid \), which implies x → y. In contrast, if ∣ρy(u|x)∣ > ∣ρx(u|y)∣, then erroneously selecting y → x is likely, because \( \mid {\gamma}_{e_{xy}^{\prime }}\mid <\mid {\gamma}_{e_{yx}^{\prime }}\mid \) and \( \mid {\kappa}_{e_{xy}^{\prime }}\mid <\mid {\kappa}_{e_{yx}^{\prime }}\mid \).

Statistical inference

Again, nonnormality tests can be used to separately evaluate distributional properties of model residuals (cf. Wiedermann et al., 2015). An asymptotic significance test and bootstrap CIs for the skewness difference of residuals (\( \Delta \left({\gamma}_e\right)=\mid {\gamma}_{e_{xy}}\mid -\mid {\gamma}_{e_{yx}}\mid \)) have been proposed by Wiedermann et al. (2015) and Wiedermann and von Eye (2015b). The asymptotic test requires normality of the “true” error term. Only error symmetry is required for the bootstrap approach. Analogous procedures for the difference in excess-kurtosis values were discussed by Wiedermann (2015).

DDA Component III: Independence properties of predictor and error term

Absence of confounders

The independence assumption in the linear model implies that the magnitude of the error made when fitting the response is not related in any form to the predictor(s). In OLS regression, it is well known that the estimated residuals will be linearly uncorrelated with the predictor(s), which holds even when the model is directionally mis-specified. However, when the “true” predictor x is nonnormal, the predictor and error term of the mis-specified model, y and exy, will be stochastically nonindependent. To illustrate this, we start with a simulated data example. Two variables (x and y) were generated according to the linear model x → y (with zero intercept, unit slope, and a standard normal error term eyx). The “true” predictor x was drawn from either a standard normal (γx = κx = 0), a standard uniform (i.e., γx = 0, κx = –1.2), or a chi-square distribution with eight degrees of freedom (γx = 1, κx = 1.5). Figure 2 shows scatterplots of the observed predictors and estimated residuals for the “true” model and the mis-specified model y → x. In the normal case, the two models cannot be distinguished from each other. That is, both models show circular data patterns, as expected given linear uncorrelatedness. This no longer holds for nonnormal predictors. Here, the two competing models are mutually distinguishable. Although linear uncorrelatedness also holds in all the nonnormal data scenarios, clear dependence structures occur in the mis-specified model. Note that these dependence structures are not the result of special properties of the uniform and chi-square distributions. In fact, the opposite is the case: The normal distribution constitutes the special case in which competing models cannot be uniquely distinguished, because uncorrelatedness implies stochastic independence in the normal domain (cf. Hoyer, Shimizu, Kerminen, & Palviainen, 2008).

Fig. 2

Bivariate patterns of the predictors and residuals of competing linear models (x → y and y → x) for both normal and nonnormal “true” predictors
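A minimal version of this simulation can be run as follows (Python; the squared-residual correlation used here is our own crude probe for nonlinear dependence, not a formal independence test). Linear predictor–residual correlations vanish by construction in both models, but the nonlinear probe is nonzero only in the mis-specified model:

```python
import numpy as np

def ols_resid(pred, resp):
    # Residuals of a simple bivariate OLS regression of resp on pred
    b = np.cov(pred, resp)[0, 1] / pred.var(ddof=1)
    return resp - resp.mean() - b * (pred - pred.mean())

rng = np.random.default_rng(7)
n = 200_000
x = rng.chisquare(8, n)       # nonnormal "true" predictor
y = x + rng.normal(0, 4, n)   # "true" model x -> y

e_yx = ols_resid(x, y)   # residual, correctly specified model
e_xy = ols_resid(y, x)   # residual, mis-specified model

# Linear correlations are zero (up to floating point) in BOTH models
lin_true = np.corrcoef(x, e_yx)[0, 1]
lin_mis = np.corrcoef(y, e_xy)[0, 1]

# A crude nonlinear probe (predictor vs. squared residual) flags dependence
# only in the mis-specified model
nl_true = np.corrcoef(x, e_yx ** 2)[0, 1]
nl_mis = np.corrcoef(y, e_xy ** 2)[0, 1]
print(round(lin_true, 3), round(lin_mis, 3), round(nl_true, 3), round(nl_mis, 3))
```

The first three printed values are near zero, whereas the last is clearly nonzero, mirroring the dependence structures visible in Fig. 2.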

Formally, nonindependence in the mis-specified model becomes intuitively obvious if we solve for the error term of the mis-specified model in Eq. 2 and insert the “true” Model 1, which results in (see also Entner, Hoyer, & Spirtes, 2012; Shimizu, 2016)

$$ {e}_{xy}=x-{b}_{xy}y=\left(1-{\rho}_{xy}^2\right)x-{b}_{xy}{e}_{yx} $$
(18)

Thus, both the “true” predictor x and the “true” error term eyx contribute to y in Eq. 1 and to exy in Eq. 18. Although this illustration serves as an intuitive explanation, a rigorous proof of nonindependence follows from the Darmois–Skitovich theorem (Darmois, 1953; Skitovich, 1953). The theorem states that if two linear functions (υ1 and υ2) of the same independent random variables wj (j = 1 ,..., J), υ1 = ∑jαjwj and υ2 = ∑jβjwj, with αj and βj being constants, are independent, then all wj for which αjβj ≠ 0 must be normally distributed. The reverse corollary implies that if a common wj exists that is nonnormal, then υ1 and υ2 must be nonindependent (cf. Shimizu et al., 2011; Wiedermann & von Eye, 2015a). Thus, exy in Eq. 18 and y in Eq. 1 are nonindependent because they share the common nonnormal variable x and \( \left(1-{\rho}_{xy}^2\right){b}_{yx}\ne 0 \) (excluding ∣ρxy∣  = 1 as practically irrelevant). Because the Darmois–Skitovich theorem applies to J variables, covariates can straightforwardly be included in Models 1 and 2, provided that the covariates fulfill the requirements described above. However, the theorem concerns continuous random variables wj; when categorical covariates exist, a two-step regression approach should therefore be applied, with DDA subsequently performed on the residualized x and y variables. Because independence is assumed in the correctly specified model, direction dependence statements are possible by separately evaluating independence in the competing models (cf. Shimizu et al., 2011; Wiedermann & von Eye, 2015a). In essence, if the null hypothesis H0 : x ⊥ eyx is retained and, at the same time, H0 : y ⊥ exy is rejected, then it is more likely that the observed effect is transmitted from x to y. Conversely, if H0 : x ⊥ eyx is rejected and H0 : y ⊥ exy is retained, then the model y → x should be preferred.
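The decomposition in Eq. 18 can also be checked numerically. The sketch below (Python/NumPy; the exponential predictor and the slope of 0.5 are arbitrary illustrative choices) verifies that, with centered variables and sample OLS slopes, the residual of the mis-specified model decomposes exactly into the two components of Eq. 18:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
# Skewed "true" predictor and a moderate effect (both choices are illustrative)
x = rng.exponential(size=n)
y = 0.5 * x + rng.normal(size=n)

xc, yc = x - x.mean(), y - y.mean()      # centering removes the intercepts

b_yx = (xc @ yc) / (xc @ xc)             # slope of the "true" model x -> y
b_xy = (xc @ yc) / (yc @ yc)             # slope of the mis-specified model y -> x
e_yx = yc - b_yx * xc                    # residual of the "true" model
e_xy = xc - b_xy * yc                    # residual of the mis-specified model
r2 = np.corrcoef(x, y)[0, 1] ** 2        # sample rho_xy^2 = b_yx * b_xy

# Eq. 18 (sample version): e_xy = (1 - rho^2) * x - b_xy * e_yx
rhs = (1 - r2) * xc - b_xy * e_yx
print(np.max(np.abs(e_xy - rhs)))        # numerically zero
```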

Presence of confounders

When confounding affects the relation between x and y, the predictor(s) and errors of both models contain information about the confounder. This can be shown by solving Eqs. 14 and 15 for the corresponding error terms, which gives

$$ {e}_{yx}^{\prime }=\left[{b}_{yu}+\left({b}_{yx}-{b}_{yx}^{\prime}\right){b}_{xu}\right]u+\left({b}_{yx}-{b}_{yx}^{\prime}\right){e}_{xu}+{e}_{yu} $$
(19)
$$ {e}_{xy}^{\prime }=\left[{b}_{xu}-{b}_{xy}^{\prime}\left({b}_{yu}+{b}_{yx}{b}_{xu}\right)\right]u+\left(1-{b}_{xy}^{\prime }{b}_{yx}\right){e}_{xu}+{b}_{xy}^{\prime }{e}_{yu} $$
(20)

Thus, through reconsidering the “true” model given in Eq. 3 and, again, making use of the Darmois–Skitovich theorem, one concludes that the independence assumption is likely to be violated in both candidate models whenever a nonnormal confounder is present and \( \left[{b}_{yu}+\left({b}_{yx}-{b}_{yx}^{\prime}\right){b}_{xu}\right]{b}_{xu} \) and \( \left[{b}_{xu}-{b}_{xy}^{\prime}\left({b}_{yu}+{b}_{yx}{b}_{xu}\right)\right]{b}_{yu} \) deviate from zero.

Statistical inference

Significance tests for evaluating the nonindependence of (linearly uncorrelated) variables have been discussed extensively in the signal-processing literature (Hyvärinen, Karhunen, & Oja, 2001). The first class of tests considered here uses the basic definition of stochastic independence, E[g1(υ1)g2(υ2)] − E[g1(υ1)]E[g2(υ2)] = 0 for any absolutely integrable functions g1 and g2. Thus, independence tests can be constructed as correlation tests of the form cor[g1(x), g2(eyx)] and cor[g1(y), g2(exy)], where at least one function is nonlinear. These tests are easy to use because they essentially rely on the Pearson correlation test applied to nonlinearly transformed variables.

Two nonlinear functions may be of particular value in the present context: the square function, g(υ) = υ2, and the hyperbolic tangent function, g(υ) = tanh(υ). The square function is a powerful candidate because the covariances of predictor and error in the mis-specified model contain information about the higher moments of the “true” predictor (Wiedermann & von Eye, 2015a)—that is,

$$ \operatorname{cov}\left(y,{e}_{xy}^2\right)={\rho}_{xy}{\left(1-{\rho}_{xy}^2\right)}^2{\gamma}_x $$
(21)
$$ \operatorname{cov}\left({y}^2,{e}_{xy}\right)={\rho}_{xy}^2\left(1-{\rho}_{xy}^2\right){\gamma}_x $$
(22)
$$ \operatorname{cov}\left({y}^2,{e}_{xy}^2\right)={\rho}_{xy}^2{\left(1-{\rho}_{xy}^2\right)}^2{\kappa}_x $$
(23)

In other words, the power to detect nonindependence in y → x increases with the nonnormality of x. Although proofs for Eqs. 21 and 22 can be found in Wiedermann and von Eye (2015a, 2016), a proof of Eq. 23 is given in online Appendix A. Note that the covariances in Eqs. 21 and 23 involve squared residuals, which reveals a direct link to significance tests originally designed for detecting patterns of heteroscedasticity (cf. Wiedermann, Artner, & von Eye, 2017). Because heteroscedasticity occurs, among other cases, when the variance of the error can be expressed as a function g of the independent variables—that is, \( Var\left({e}_{xy}|y\right)={\sigma}_{e_{xy}}^2g(y) \) (see, e.g., Kaufman, 2013)—it follows that homoscedasticity tests that relate squared residuals to functions of the model predictors (such as the Breusch–Pagan test) are likely to indicate patterns of nonconstant error variance in directionally mis-specified models.
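A minimal version of the Breusch–Pagan logic—regress the squared residuals on the model predictor and refer n·R² of this auxiliary regression to a chi-square distribution—can be sketched as follows (Python with NumPy/SciPy; the simulated data mirror the chi-square scenario above and are purely illustrative, not the authors' implementation):

```python
import numpy as np
from scipy import stats

def ols_resid(pred, resp):
    pc = pred - pred.mean()
    b = (pc @ (resp - resp.mean())) / (pc @ pc)
    return resp - resp.mean() - b * pc

def breusch_pagan(pred, resid):
    """LM form of the Breusch-Pagan test with a single predictor:
    regress squared residuals on the predictor; n * R^2 ~ chi-square(1)."""
    zc = resid ** 2 - np.mean(resid ** 2)
    pc = pred - pred.mean()
    b = (pc @ zc) / (pc @ pc)
    r2_aux = (b ** 2) * (pc @ pc) / (zc @ zc)   # R^2 of the auxiliary regression
    lm = len(pred) * r2_aux
    return lm, stats.chi2.sf(lm, df=1)

rng = np.random.default_rng(3)
n = 2_000
x = (rng.chisquare(8, n) - 8) / 4      # skewed "true" predictor (gamma_x = 1)
y = x + rng.normal(size=n)

lm_t, p_t = breusch_pagan(x, ols_resid(x, y))   # target model x -> y
lm_a, p_a = breusch_pagan(y, ols_resid(y, x))   # mis-specified model y -> x
print(p_t, p_a)   # dependence surfaces as heteroscedasticity in y -> x only
```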

Covariances based on the hyperbolic tangent function have been proposed by Hyvärinen (2010) and Hyvärinen and Smith (2013). The value tanh(υ) is the derivative of the log-density of an inverse hyperbolic cosine distribution that provides an approximation of the likelihood ratio of directionally competing models in the bivariate case. The inverse hyperbolic cosine distribution constitutes a reasonably close approximation for several leptokurtic observed variables (cf. Mumford & Ramsey, 2014). Although tanh-based correlation tests are ideally suited for symmetric nonnormal variables, the statistical power of this approach can be expected to be low for skewed variables.
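Both families of nonlinear correlation tests reduce to Pearson correlation tests on transformed variables. The following sketch (Python with NumPy/SciPy; simulated data with a skewed predictor, not the empirical example and not the authors' macros) applies the square and tanh functions to the correctly and the incorrectly specified model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 2_000
x = (rng.chisquare(8, n) - 8) / 4      # skewed "true" predictor
y = x + rng.normal(size=n)

def ols_resid(pred, resp):
    pc = pred - pred.mean()
    b = (pc @ (resp - resp.mean())) / (pc @ pc)
    return resp - resp.mean() - b * pc

e_yx, e_xy = ols_resid(x, y), ols_resid(y, x)

# cor[g(.), .] tests: Pearson correlation of a transformed predictor with
# the residual, for the square and the hyperbolic tangent functions
for g, name in [(np.square, "square"), (np.tanh, "tanh")]:
    r_t, p_t = stats.pearsonr(g(x), e_yx)   # target model x -> y
    r_a, p_a = stats.pearsonr(g(y), e_xy)   # mis-specified model y -> x
    print(f"{name}: p(target) = {p_t:.3f}, p(alternative) = {p_a:.3g}")
```

Consistent with the text, the square-function test should reject clearly in the mis-specified model for this skewed predictor, whereas the power of the tanh-based test for skewed data is expected to be lower.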

Because the choice of g1 and g2 is essentially arbitrary, and testing all possible function pairs is impossible, nonlinear correlation approaches do not constitute rigorous independence tests; beyond the usual small-sample limitations, an unfortunate choice of functions introduces additional Type II errors. Recently, a promising alternative was suggested: the Hilbert–Schmidt independence criterion (HSIC; Gretton et al., 2008). The HSIC evaluates the independence of functions of random variables and provably detects any dependence between two random variables in the large-sample limit. Sen and Sen (2014) introduced the HSIC in the context of testing the independence of predictors and error terms of linear regression models and proposed a bootstrap approach to approximating the distribution of the test statistic.
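A biased empirical HSIC estimator with Gaussian kernels can be written compactly. The sketch below (Python/NumPy; the median-heuristic bandwidth and the simulated skewed predictor are our own illustrative choices, and no bootstrap p value is computed) simply compares the statistic across the two candidate models:

```python
import numpy as np

def gauss_gram(v):
    """Gaussian kernel Gram matrix with the median-heuristic bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic(a, b):
    """Biased empirical HSIC: zero (in the limit) iff a and b are independent."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    return np.trace(H @ gauss_gram(a) @ H @ gauss_gram(b)) / n ** 2

rng = np.random.default_rng(4)
n = 500
x = rng.exponential(size=n) - 1        # strongly skewed "true" predictor
y = x + rng.normal(size=n)

def ols_resid(pred, resp):
    pc = pred - pred.mean()
    b = (pc @ (resp - resp.mean())) / (pc @ pc)
    return resp - resp.mean() - b * pc

h_target = hsic(x, ols_resid(x, y))    # target model x -> y
h_alt = hsic(y, ols_resid(y, x))       # mis-specified model y -> x
print(h_target, h_alt)                 # h_alt is markedly larger
```

In practice, the statistic would be referred to a bootstrap or permutation distribution, as in Sen and Sen (2014).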

Model selection

Reconsidering the possible outcomes for the three DDA components, it becomes evident that each model in Fig. 1 can be uniquely identified through a specific pattern of DDA components. Table 1 summarizes these patterns, together with the corresponding significance tests for each DDA component. In general, DDA model selection requires the specification of a target and an alternative model. Whether x → y or y → x serves as the target model is arbitrary in terms of model comparison; however, following the logic of confirmatory model testing, we suggest that the target model reflect the substantive causal theory of interest and the alternative model reflect the contradicting theory. The target model, say x → y, finds support when (1) the distribution of the response y is closer to normality than the distribution of x, (2) the residual distribution of x → y is closer to normality than that of y → x, and (3) the independence assumption of residuals and predictor(s) holds for x → y and is, at the same time, violated for y → x. Here, DDA’s independence procedures constitute the key element for testing the presence of unobserved confounders: independence must hold for x → y while being violated for y → x in order to conclude that the effect is transmitted from x to y. Whenever the independence assumption is either violated in both models or satisfied in both, one has to conclude that unmeasured confounders are present (the latter possibility emerges from the fact that confounders can decrease the skewness/excess kurtosis of x and y to a degree that renders nonindependence no longer detectable). When the independence tests allow a distinct decision, higher-moment tests for the observed variables and residuals are then used to test the directionality assumption inherent in the target model.
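The independence-based part of this selection logic can be condensed into a small helper function (an illustrative sketch of the decision pattern described above, not a reproduction of Table 1; the function name and the α threshold are our own):

```python
def dda_independence_decision(p_target, p_alt, alpha=0.05):
    """Decision pattern for DDA's independence component.
    p_target / p_alt: p values of independence tests (H0: independence)
    for the target and the directionally competing model."""
    target_ok = p_target >= alpha        # independence retained for target
    alt_ok = p_alt >= alpha              # independence retained for alternative
    if target_ok and not alt_ok:
        return "target model supported"
    if alt_ok and not target_ok:
        return "alternative model supported"
    # both violated, or both retained: no distinct directional decision
    return "no distinct decision (possible unmeasured confounding)"

print(dda_independence_decision(0.40, 0.001))   # -> target model supported
print(dda_independence_decision(0.30, 0.60))    # -> no distinct decision
```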

Table 1 Properties, significance tests, and patterns of DDA components for the three candidate models

A worked empirical example with SPSS

To make DDA accessible to applied researchers, we provide three SPSS macros: the first analyzes the distributional properties of the observed variables, the second evaluates the distributional properties of the residuals, and the third implements tests of the independence assumption for the competing candidate models (the DDA macros and introductory material can be downloaded from http://www.ddaproject.com/). All macros use positional arguments to control the input parameters. Table 2 summarizes the input parameters and options for the three macros, together with generic examples of their use. Computational details and introductions to the implemented DDA inference methods are given in online Appendix B.

Table 2 Summary of arguments and their position in the three SPSS macros

To present a fully worked empirical example (consisting of preevaluating distributional requirements, building a valid target model, and subsequently using DDA) and demonstrate the use of the macros for DDA, we use data from a cross-sectional study on the triple-code model (Dehaene & Cohen, 1998). The triple-code model is used to explain the development of numerical cognition in children and proposes that numbers are represented in three different codes that serve different purposes in number processing. The analog magnitude code (AMC) represents numbers on a mental number line, includes knowledge of the proximity and size of numerical quantities, and is used in approximate estimations and magnitude comparisons. The auditory verbal code (AVC) represents numbers in syntactically organized word sequences that are important for verbal input/output, counting, and retrieving memorized arithmetic facts. The visual Arabic code (VAC) represents numerical quantities in Arabic format necessary for multidigit operations and parity judgments. Using the triple code model, von Aster and Shalev (2007) suggested a hierarchical developmental model of numerical cognition in which AMC is viewed as an inherited core system necessary to further develop the AVC and, as well, the VAC. In other words, the model posits a directional link between AMC and AVC (i.e., AMC → AVC) and AMC and VAC (i.e., AMC → VAC). In the present demonstration, we focus on the directionality of AMC and AVC.

Koller and Alexandrowicz (2010) collected AMC and AVC ability measures for 341 second- to fourth-grade elementary school children (185 girls and 156 boys, aged 6–11 years) using the Neuropsychological Test Battery for Number Processing and Calculation in Children (ZAREKI-R; von Aster, Weinhold Zulauf, & Horn, 2006). AMC sum scores are based on 31 dichotomous items (focusing on perceptual quantity estimation, placing numbers on an analog number line, counting backward, enumeration, magnitude comparison of spoken numbers, and contextual magnitude judgment), and AVC sum scores are based on 52 dichotomous items (mental calculations [addition, subtraction, and multiplication], repeating numbers forward and backward, and story problems). The sum scores were standardized prior to the analysis in order to improve interpretability. Because fourth-graders were most likely to solve all items of the AMC scale, we focused on the second- and third-grade children (n = 216; 123 girls and 93 boys) in order to avoid biased DDA results due to ceiling effects.

DDA was used to evaluate the two competing regression models (AMC → AVC vs. AVC → AMC) under adjustment for the covariates age, time needed for test completion in minutes (an indirect measure of the perceived test difficulty), and preexisting difficulties with numerical quantities (0 = no, 1 = yes). All covariates preceded test performance in time, allowing us to exclude cyclic relations. Table 3 shows pairwise correlations and descriptive measures for all considered variables. Before applying DDA, two preevaluation stages are crucial to obtaining meaningful results: (1) evaluating the distributional requirements of DDA and (2) carefully building a valid target model. Both stages are discussed in detail below.

Table 3 Bivariate Pearson correlations and descriptive measures of observed variables (means and standard deviations of AMC and AVC are based on sum scores)

Distributional requirements for DDA

DDA requires that the distributions of the observed variables deviate from normality. Thus, before estimating the target model (AMC → AVC), we evaluated the assumption of nonnormality of the variables. The AMC and AVC measures were negatively skewed, with excess-kurtosis values greater than zero (Table 3). The Shapiro–Wilk test rejected the null hypothesis of normality for both ability measures (ps < .001). Visual inspection was used to rule out the presence of outliers, and frequencies of the minimum/maximum scores were computed in order to assess potential floor/ceiling effects. For the AVC scale, no participant reached the minimum or maximum score. For AMC, no participants received the minimum, and 14 out of 216 (6.5%) reached the maximum score, which is clearly below the commonly used cutoff of 15%–20% to define a ceiling effect (e.g., Lim et al., 2015; Terwee et al., 2007). Overall, the variables can be considered in line with the distributional requirements of DDA.
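These distributional prechecks are easy to reproduce outside SPSS. The sketch below uses SciPy, which implements the D’Agostino skewness test as `skewtest` and the Anscombe–Glynn kurtosis test as `kurtosistest`; the simulated, negatively skewed sum scores are hypothetical stand-ins, not the ZAREKI-R data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# Hypothetical, negatively skewed sum scores on a 0-31 scale (stand-in data)
scores = rng.beta(5, 1.5, size=216) * 31

print("skewness:", stats.skew(scores))                  # negative
print("excess kurtosis:", stats.kurtosis(scores))       # Fisher definition
print("D'Agostino skewness test:", stats.skewtest(scores))
print("Anscombe-Glynn kurtosis test:", stats.kurtosistest(scores))
print("Shapiro-Wilk test:", stats.shapiro(scores))

# Floor/ceiling screening: share of cases at the scale minimum/maximum
print("at minimum:", np.mean(scores <= 0), "at maximum:", np.mean(scores >= 31))
```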

Estimating and validating the target model

We started by partialling out the effect of the binary indicator “difficulties with numbers” using separate OLS regressions and extracted the residuals as “purified” AMC and AVC measures—that is, AMC_r = AMC – (0.032 – 0.850 × difficulties) and AVC_r = AVC – (0.035 – 0.795 × difficulties). Figure 3 shows the univariate distributions and the bivariate scatterplot (with the LOWESS smoothed line superimposed) for AMC_r and AVC_r. In a similar fashion, we partialled out the effect of the binary indicator on the remaining continuous covariates—that is, purified age = age – (8.222 + 0.111 × difficulties) and purified time = time – (28.411 + 2.535 × difficulties).

Fig. 3

Univariate distributions (main diagonal) and scatterplot (upper right panel) of “purified” analog magnitude code (AMC_residuals) and auditory verbal code (AVC_residuals) scores, including the bivariate LOWESS smoothed line for 216 second- and third-grade elementary school children

Next, we estimated the target model (AMC_r → AVC_r) under adjustment for the purified continuous covariates and evaluated the validity of the model using regression diagnostics. Table 4 summarizes the results for both the target model (upper panel) and the alternative model (lower panel). The linearity assumption of the target model was confirmed through inspection of the LOWESS plots and the inclusion of higher-order terms. Adding quadratic terms for the predictors did not significantly improve the model fit (e.g., including the squared predictor values increased the model R2 from .507 to .510, which was nonsignificant at the 5% level). Variance inflation factors for the predictors varied from 1.16 to 1.27, suggesting the absence of multicollinearity issues. Furthermore, we estimated leverage values and Cook’s distances for the model to check for highly influential observations. One observation, with the maximum Cook’s distance of .195 (95% of the observations had a Cook’s distance of .023 or smaller) and a leverage value of 0.075 (exceeding three times the average leverage value), was excluded from the subsequent DDA.

Table 4 Results of the two competing models (B = unstandardized coefficients, Std. Error = standard error, Beta = standardized coefficients)

Evaluating the direction of effect

To test whether the target model was indeed better suited to approximate the data-generating mechanism, we first applied the macro after setting the working directory C:\myproject (used to save and read temporary files during computations) and reading and activating the dataset data_example.sav. The macro call (see Table 2 for the argument structure) was then used to obtain the observed-variable-based direction dependence tests, with 1,000 bootstrap samples (used for the construction of confidence interval [CI] limits), a confidence level of 95%, and two-sided significance tests of normality.

The corresponding output is given in Box 1. The upper panel summarizes the results of the D’Agostino skewness and Anscombe–Glynn kurtosis tests for the putative response (columns 1–3) and predictor (columns 4–6). Skewness and excess-kurtosis values were close to zero for AVC_r, so we can retain the null hypothesis of normality. In contrast, AMC_r deviated significantly from normality with respect to skewness. The results for the excess-kurtosis estimates point in the same direction. The lower panel of Box 1 reports the 95% nonparametric bootstrap CIs for the differences in skewness Δ(γ) =  ∣γAMC_r∣−∣γAVC_r∣ and excess kurtosis Δ(κ) = ∣κAMC_r∣−∣κAVC_r∣. Although AMC_r is significantly more skewed than AVC_r, the difference in excess kurtosis was nonsignificant. Overall, the third-moment estimates provide evidence in line with the direction dependence requirements of the target model AMC_r → AVC_r.

Box 1. Results of the observed-variable-based direction dependence tests

Next, we evaluated the properties of the residuals obtained from the two competing models, using the residual-based direction dependence tests (again with two-sided tests, 1,000 bootstrap samples for constructing CI limits, and a 95% confidence level).

The upper panel of Box 2 summarizes separate skewness and excess-kurtosis tests of the regression residuals. Columns 1–3 refer to the target model, and columns 4–6 give the results for the alternative model. Although the higher-moment estimates were larger (in absolute values) for the alternative model, we cannot reject the null hypothesis of normality at the 5% level. Similar results were obtained for the higher-moment difference measures \( \Delta \left({\gamma}_e\right)=\mid {\gamma}_e^{\left( AVC\_r\to AMC\_r\right)}\mid -\mid {\gamma}_e^{\left( AMC\_r\to AVC\_r\right)}\mid \) and \( \Delta \left({\kappa}_e\right)=\mid {\kappa}_e^{\left( AVC\_r\to AMC\_r\right)}\mid -\mid {\kappa}_e^{\left( AMC\_r\to AVC\_r\right)}\mid \) (see the lower panel of Box 2). Both the asymptotic higher-moment difference tests (columns 1–3) and the 95% nonparametric bootstrap CIs (last two columns) suggested that the two models are not distinguishable in terms of their residual distributions. Thus, no clear-cut decision is possible for this component.

Box 2. Results of the residual-based direction dependence tests

In the final step, we analyzed the independence properties of the two candidate models, which is the most important element for interpreting OLS estimates as causal. The corresponding macro call computes Breusch–Pagan (BP) homoscedasticity tests and nonlinear correlation tests using the square function. The results are summarized in Box 3. The upper section of the output gives the results of the BP and robust BP tests for the target and the alternative model. Overall, the results were clearly in favor of the target model—that is, the homoscedasticity assumption holds for the target model and, at the same time, is violated for the alternative model. In addition, we used scatterplots of the standardized predicted values and standardized residuals of both models as a visual aid (see Fig. 4). No conspicuous patterns were observed for the target model, whereas the plot for the alternative model suggested an inverse U-shaped pattern. The lower section of Box 3 summarizes the results of the nonlinear correlation tests. In general, given a selected function g, Pearson correlation coefficients, t values, and p values are computed for cor[g(pred), e], cor[pred, g(e)], and cor[g(pred), g(e)]. In the present example, the nonlinear correlation tests based on the square function again clearly favored the target model; that is, all tests were nonsignificant for the target model and, at the same time, all tests rejected the null hypothesis for the alternative model.

Fig. 4

Residual plots of the target and the alternative models, with bivariate LOWESS smoothed lines

Box 3. Results of homoscedasticity tests and nonlinear correlation tests based on the square function


Finally, we computed HSIC tests for the two competing models using 500 bootstrap samples; Box 4 gives the corresponding output. Note that the HSIC will be zero if and only if the predictor and the error term are stochastically independent. Again, a nonsignificant result was observed for the target model, whereas the HSIC reached significance for the alternative model. In sum, all independence measures indicated that the target model is more likely to hold for the present dataset.

Box 4. Results of the HSIC tests for the two competing models


Considering the overall results of DDA for the numerical-cognition example, we conclude that, taking into account the covariates, AVC is indeed more likely to reflect the response, and AMC is more likely to be on the explanatory side. In other words, on the basis of the present sample, the DDA results empirically support von Aster and Shalev’s (2007) hierarchical developmental model of numerical cognition.

Discussion

DDA allows researchers to test hypotheses about the directional relation between pairs of variables while adjusting for covariates that possibly contribute to the causal process. This empirical falsification approach is based on translating a substantive causal theory into a linear target model that is then compared with the corresponding alternative model; the two models differ in the direction hypothesized for the causal process. DDA component patterns can then be used to retain the target model, retain the directionally competing model, or conclude that no distinct decision is possible due to the presence of unmeasured confounders. Here, it is important to reiterate that directional conclusions derived from DDA component patterns rest on the operationalization of the latent constructs \( \mathcal{X} \) and \( \mathcal{Y} \) and on the linear model as an approximation of an unknown “true” functional relation \( \mathcal{F} \). The trustworthiness of DDA thus ultimately depends on both the quality of operationalization and the validity of the linear model as a description of the causal mechanism. Although both requirements essentially apply to any linear modeling approach, they deserve particular attention in the context of DDA.

Because higher moments of variables constitute the key elements for selecting among directionally competing models, DDA assumes that nonnormality of variables reflects inherent distributional characteristics of the constructs under study. Although the phenomenon of nonnormal variable distributions and its occurrence in practice have been studied extensively in the psychometric literature (Blanca, Arnau, López-Montiel, Bono, & Bendayan, 2013; Cain, Zhang, & Yuan, 2017; D. L. Cook, 1959; Lord, 1955; Micceri, 1989), not every form of nonnormality makes variables eligible for DDA. In classical test theory, for example, the impact of the discrimination and difficulty of a measurement instrument on the relation between latent traits and true scores is well understood. To ensure that the observed score distributions adequately reflect the distributional properties of a latent trait, the test characteristic curve should go straight through the range of the trait distribution, which is usually achieved by using items with a broad range of difficulties (Lord & Novick, 1968, p. 392). In addition, item response theory (IRT) models such as the Rasch model (Rasch, 1960/1980, for dichotomous data) and the partial credit model (Masters, 1982, for polytomous data) are valuable alternatives. These models (1) come with empirical measures to evaluate the adequacy of describing a given dataset, (2) provide accordingly “weighted” parameter estimates (i.e., taking into account item difficulties), and (3) if the measurement model holds, exhibit the feature of specific objectivity (i.e., items can be compared irrespective of the distribution of person parameters, and subjects can be compared using any proper set of items), which allows the most adequate estimation of the underlying trait distributions.
For example, the data on numerical cognition used here for illustrative purposes were shown to be in line with the Rasch model (see Koller & Alexandrowicz, 2010), which implies that raw scores are sufficient statistics for the latent person abilities. In contrast, applying DDA in cases in which nonnormality of variables is a by-product of poor item selection or scaling (Ho & Yu, 2015), or the result of ceiling/floor effects, will lead to biased results (note that, in the empirical example, the fourth-grade children, who were most likely to solve all scale-specific items, were excluded to reduce the risk of bias due to ceiling effects). Overall, selecting high-quality measurement instruments at the study planning stage, or carefully evaluating the psychometric properties of secondary data, are central steps toward meaningful DDA outcomes.

Explanatory modeling, in general, requires that the selected statistical model f can easily be linked to the corresponding theoretical model \( \mathcal{F} \) (Shmueli, 2010). Because the “true” data-generating mechanism \( \mathcal{Y}=\mathcal{F}\left(\mathcal{X}\right) \) is unknown in any statistical modeling approach (Cudeck & Henly, 2003), empirical examinations of whether y = f(x) is close enough to \( \mathcal{Y}=\mathcal{F}\left(\mathcal{X}\right) \) are impossible. The appropriateness of f must be established indirectly, by critically evaluating model validity using regression diagnostics (cf. Belsley, Kuh, & Welsch, 1980; R. D. Cook & Weisberg, 1982). Several model checks are indispensable before applying DDA. First, one needs to ensure that the assumption of linearity is justified (in the illustrative example, we used visual diagnostics and evaluated changes in R2 when adding higher polynomials of all continuous variables). Second, evaluating potential issues of multicollinearity (e.g., by inspecting pairwise predictor correlations and VIFs) is necessary to avoid biased inference due to inflated standard errors. Third, the absence of outliers and highly influential data points must be confirmed (e.g., by examining Cook’s distances, leverage statistics, or deleted studentized residuals). Ideally, the process of building a valid target model and the subsequent evaluation of its directional properties constitute two separate steps. This implies that unintended DDA outcomes should not be used as a basis for deleting “misbehaving” data points.

The case of nonnormal “true” errors

The DDA framework presented here assumes that the “true” error follows a normal distribution. Although, in best-practice applications, normality of residuals should routinely be evaluated to guarantee valid statistical inference (Box & Watson, 1962; Hampel, 1973; Pearson, 1931; White & MacDonald, 1980), normality is not required for OLS coefficients to be the best linear unbiased estimates. Normal “true” errors are particularly important for residual-distribution-based DDA tests when measures of both skewness and excess kurtosis are considered simultaneously, because normality of the correctly specified error then serves as a benchmark for model comparison. However, when one focuses only on the skewness of the competing error terms, model selection can be performed as long as \( {\gamma}_{e_{yx}} \) = 0—that is, no explicit assumption is made concerning \( {\kappa}_{e_{yx}} \). Model selection should then be based on nonparametric bootstrap CIs of skewness differences instead of the asymptotic skewness difference test (cf. Wiedermann & von Eye, 2015c). Conversely, when focusing solely on the excess kurtosis of the error terms, no explicit assumption is made concerning the symmetry of the “true” error distribution; as long as \( {\kappa}_{e_{yx}}=0 \) holds for the “true” model, \( {\gamma}_{e_{yx}} \) is allowed to vary within the range \( -\sqrt{2} \) to \( \sqrt{2} \), according to the skewness–kurtosis inequality κ ≥ γ2 − 2 (cf. Teuscher & Guiard, 1995).

Although DDA based on the skewness and excess kurtosis of the observed variables also requires normality of the “true” errors, focusing on either skewness or excess kurtosis relaxes the distributional assumptions about the “true” error in the same fashion (for a detailed discussion of distinguishing directionally competing models under error nonnormality, see also Wiedermann & Hagmann, 2015). In addition, alternative DDA measures based on higher-order correlations \( {\rho}_{ij}\left(x,y\right)={\operatorname{cov}}_{ij}\left(x,y\right)/\left({\sigma}_x^i{\sigma}_y^j\right) \), with covij(x, y) = E[(x − E[x])i (y − E[y])j], are available that make no assumptions about the “true” error distribution. Dodge and Rousson (2001) showed that ρxy = ρ12(x, y)/ρ21(x, y) holds whenever the “true” predictor is asymmetrically distributed, without imposing distributional assumptions on the error. Thus, one obtains \( {\rho}_{12}^2\left(x,y\right)<{\rho}_{21}^2\left(x,y\right) \) under x → y and \( {\rho}_{12}^2\left(x,y\right)>{\rho}_{21}^2\left(x,y\right) \) under y → x, independent of the error distribution. A nonparametric bootstrap approach can again be used for statistical inference. Similarly, kurtosis-based DDA measures can be obtained by focusing on \( {\rho}_{13}^2\left(x,y\right) \) and \( {\rho}_{31}^2\left(x,y\right) \) (cf. Wiedermann, 2017). Implementing additional DDA measures for potentially nonnormal “true” errors in the macros is planned for the future.
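The Dodge–Rousson measure can be computed directly from sample moments. The sketch below (Python/NumPy; the exponential predictor and Laplace error are our own illustrative choices, the latter chosen deliberately nonnormal to show that no assumption on the error distribution is needed) computes ρ12 and ρ21 and checks the direction inequality:

```python
import numpy as np

def rho_ij(x, y, i, j):
    """Higher-order correlation rho_ij = cov_ij(x, y) / (sigma_x^i * sigma_y^j)."""
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    return np.mean(xs ** i * ys ** j)

rng = np.random.default_rng(7)
n = 10_000
x = rng.exponential(size=n)           # skewed "true" predictor (gamma_x = 2)
y = 0.8 * x + rng.laplace(size=n)     # deliberately nonnormal error

r12, r21 = rho_ij(x, y, 1, 2), rho_ij(x, y, 2, 1)
print(r12 / r21, np.corrcoef(x, y)[0, 1])   # Dodge-Rousson: ratio ~ rho_xy
print(r12 ** 2 < r21 ** 2)                  # True under x -> y
```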

Methods to assess the independence of predictor(s) and error can be applied without further modification even when the “true” error is nonnormal. The reason is that the Darmois–Skitovich theorem, as applied in the present context, imposes no distributional assumptions on the “true” error: nonindependence of predictor(s) and error will hold whenever at least one common variable is nonnormal. Thus, the independence assumption of competing models can be evaluated when (1) only the “true” predictor, (2) only the “true” error, or (3) both deviate from normality, as long as the product of the corresponding coefficients (see Eq. 18) is unequal to zero. However, results of competing BP tests to assess patterns of heteroscedasticity in the two candidate models must be interpreted with caution when the residuals of both models deviate from normality. In this case, the Type I error rates of the test will be distorted, and directional decisions should be based on Koenker’s robust BP test.

Power and sample size considerations: What we know so far

To provide guidelines for the number of observations necessary to achieve sufficient power, we summarize previous simulation studies on the DDA components, focusing on three factors that affect empirical power rates: the magnitude of nonnormality, the magnitude of the causal effects, and the sample size. Dodge and Rousson (2016) evaluated the power of nonparametric bootstrap CIs of Δ(γ) = |γx| – |γy| and Δ(κ) = |κx| – |κy| and concluded that skewness-based model selection outperforms the kurtosis-based approach in terms of the statistical power to detect the “true” model. Here, for small effects (R2 = .25) and skewness values of 2, sample sizes as small as n = 50 may be sufficient to achieve statistical power close to 80%. In contrast, for kurtosis-based selection, sample sizes of n = 500 and excess-kurtosis values larger than 4 are needed to achieve similar power.
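A nonparametric bootstrap CI for Δ(γ), of the kind evaluated in these power studies, can be sketched as follows (Python with NumPy/SciPy; the sample size, distributions, and number of bootstrap samples are illustrative choices, not the settings of the cited simulations):

```python
import numpy as np
from scipy import stats

def skew_diff_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for Delta(gamma) = |gamma_x| - |gamma_y|."""
    rng = np.random.default_rng(seed)
    n = len(x)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)          # resample cases, keeping pairs intact
        diffs[i] = abs(stats.skew(x[idx])) - abs(stats.skew(y[idx]))
    return tuple(np.quantile(diffs, [(1 - level) / 2, (1 + level) / 2]))

rng = np.random.default_rng(8)
n = 500
x = rng.chisquare(2, n)                      # strongly skewed "true" predictor
y = x + rng.normal(scale=2.0, size=n)        # R^2 = .5 by construction
lo, hi = skew_diff_ci(x, y)
print(lo, hi)    # a CI entirely above zero supports x -> y
```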

Wiedermann and von Eye (2015b) evaluated power properties of residual distribution-based methods considering separate D’Agostino skewness tests, the asymptotic skewness difference test, and nonparametric bootstrap CIs for \( \Delta \left({\gamma}_e\right)=\mid {\gamma}_{e_{xy}}\mid -\mid {\gamma}_{e_{yx}}\mid \), and concluded that acceptable power levels can already be observed for n = 75 when causal effects are small (ρxy = .25) and the true predictor is sufficiently skewed (i.e., γx ≥ 2). Because model selection based on separate normality tests proves more powerful than tests based on Δ(γe), n = 50 may already be sufficient for separate D’Agostino tests. In general, at least n = 125 is required for less skewed variables (e.g., γx = 1) and lower correlations (e.g., ρxy = .25). Model selection based on excess-kurtosis differences of residual distributions was evaluated by Wiedermann (2015). Again, separate Anscombe–Glynn tests outperformed procedures based on the difference of excess-kurtosis estimates. Here, for n = 200 and ρxy = .4, excess-kurtosis values larger than 4 are necessary for power rates close to 80%.
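Model selection via separate D’Agostino skewness tests on the two residual distributions can be sketched as follows (SciPy’s `skewtest` implements D’Agostino’s test; the simulated data-generating choices are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def ols_resid(pred, resp):
    """Residuals of a simple OLS regression of resp on pred."""
    X = np.column_stack([np.ones(len(pred)), pred])
    return resp - X @ np.linalg.lstsq(X, resp, rcond=None)[0]

rng = np.random.default_rng(11)
n = 300
x = rng.exponential(size=n)          # skewed "true" predictor (gamma_x = 2)
y = 0.5 * x + rng.normal(size=n)     # data generated under x -> y

# D'Agostino skewness tests on both residual distributions:
p_xy = stats.skewtest(ols_resid(x, y)).pvalue   # target model: ~normal residuals
p_yx = stats.skewtest(ols_resid(y, x)).pvalue   # mis-specified: skewed residuals
# The model whose residuals pass the normality check is retained.
```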

Wiedermann, Artner, and von Eye (2017) compared the performance of nine homoscedasticity tests to evaluate the independence assumption in competing models and showed that the BP-test was the most powerful procedure to select the correct model. For slightly skewed predictors (γx = 0.75), large effects and large sample sizes n ≥ 400 may be required to achieve sufficient power. For γx ≥ 1.5 and medium effect sizes, at least n = 200 may be required. Quite similar results were obtained for model selection based on nonlinear correlation tests of the form \( cor\left(x,{e}_{yx}^2\right) \) (Wiedermann & von Eye, 2016). However, γx ≥ 1.5 and large effects are necessary to obtain power rates beyond 80% when n ≥ 200. Systematic simulation experiments that (1) compare the statistical power of several other independence tests and (2) evaluate all DDA components simultaneously constitute important future endeavors.
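The nonlinear correlation test can be sketched as a plain Pearson correlation between each tentative predictor and the squared residuals of its model; the data-generating choices below (skewed cause, large effect) are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def ols_resid(pred, resp):
    """Residuals of a simple OLS regression of resp on pred."""
    X = np.column_stack([np.ones(len(pred)), pred])
    return resp - X @ np.linalg.lstsq(X, resp, rcond=None)[0]

rng = np.random.default_rng(5)
n = 2000
x = rng.exponential(size=n)          # gamma_x = 2
y = 0.8 * x + rng.normal(size=n)     # large effect under x -> y

# cor(predictor, squared residuals) for each candidate model:
r_xy, p_xy = stats.pearsonr(x, ols_resid(x, y) ** 2)   # target model
r_yx, p_yx = stats.pearsonr(y, ols_resid(y, x) ** 2)   # mis-specified model
# Only the mis-specified model is expected to yield a significant correlation.
```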

Further application scenarios and extensions

It is important to note that the proposed method is not restricted to the standard multiple regression setup presented here. DDA is also applicable in other scenarios in which directionality issues have been deemed untestable. For example, when a statistical relation between two variables, x and y, has been established, researchers may further entertain hypotheses about the role of a third measured variable. Whether this third variable (m) should be conceptualized as a mediator (an intervening variable that transmits the effect from x to y) or as an observed confounder cannot be answered with standard statistical methods (MacKinnon, Krull, & Lockwood, 2000). From a DDA perspective, distinguishing between these models reduces to separately evaluating the directionality of x and m (i.e., whether x → m or m → x should be preferred) and of m and y (i.e., whether m → y or y → m holds for the data), provided that the nonnormality requirements are fulfilled (for extensions of residual- and independence-based DDA to mediation models, see Wiedermann & von Eye, 2015c, 2016). Furthermore, the application of DDA need not be restricted to observational studies. Directionality issues also arise in experimental studies, in particular those designed to test hypotheses that go beyond total effects in randomized trials. Here, mediation models may again provide sound explanations of how experimental interventions causally affect the target outcome (Bullock, Green, & Ha, 2010; Heckman & Smith, 1995; Imai, Keele, & Tingley, 2010). However, even when the predictor is under experimental control, it is well known that neither the direction (Wiedermann & von Eye, 2015c) nor the magnitude of the causal effect of the mediator on the outcome can be identified uniquely without imposing strong assumptions on the data, assumptions similar to those required in observational studies (Imai, Tingley, & Yamamoto, 2013; Keele, 2015). Again, DDA may help researchers gain further insight by evaluating competing mediator–outcome paths while adjusting for an experimentally controlled predictor.
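The pairwise decomposition for the mediation case can be sketched with the observed-variable skewness rule (under a → b with normal error, |γb| = |ρ|³|γa| < |γa|); the simulated chain and the use of the simple marginal-skewness comparison, without covariate adjustment, are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def prefers_first_as_cause(a, b):
    """Skewness-based direction rule: under a -> b with normal error,
    |skew(b)| = |rho|^3 * |skew(a)|, so the cause is the more skewed variable."""
    return abs(stats.skew(a)) > abs(stats.skew(b))

rng = np.random.default_rng(2)
n = 20_000
x = rng.exponential(size=n)          # skewed exogenous cause
m = 0.6 * x + rng.normal(size=n)     # x -> m
y = 0.6 * m + rng.normal(size=n)     # m -> y, i.e., a mediation chain x -> m -> y

xm = prefers_first_as_cause(x, m)    # evaluates x -> m versus m -> x
my = prefers_first_as_cause(m, y)    # evaluates m -> y versus y -> m
# Both checks supporting the stated directions is consistent with mediation
# (x -> m -> y) rather than m acting as a confounder of x and y.
```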

Extensions of the direction dependence methodology proposed in this article can go in a number of directions. First, developing DDA for moderation models would enable researchers to test the direction of effect while accounting for a third variable that modifies the relation between predictor and response (the fact that the nature of a moderator effect may depend on the direction of the postulated model has been shown by Judd & Kenny, 2010). Similarly, future work is needed to study principles of direction dependence in polynomial regression models (i.e., models that include higher-order terms; cf. Aiken & West, 1991) and in more general linearizable regression models (i.e., nonlinear regression functions that can be linearized through proper variable transformations). Another possible extension concerns the complexity of the research design. Although the presented framework is designed for single-level data, developing DDA for multilevel regression models (Raudenbush & Bryk, 2002) would allow researchers to account for hierarchical (nested) data structures. Further, throughout the article we assumed that the “true” predictor is measured without measurement error. Although first attempts to extend DDA components to measurement error models are given in von Eye and Wiedermann (2014) and Wiedermann, Merkle, and von Eye (2018), extending direction dependence to latent variable models may overcome potential biases in directional decisions resulting from imprecise measurement of constructs. Finally, the present study focused on cases in which both the tentative predictor and the tentative response are continuous variables (covariates can be either continuous or categorical). The reason for this is that both candidate models (x → y and y → x) must be specified as standard linear regression models (similarly, the proposed SPSS macros are designed to evaluate two competing standard linear models). Although previous studies (cf.
Inazumi et al., 2011; Peters, Janzing, & Schölkopf, 2011; von Eye & Wiedermann, 2016, 2017; Wiedermann & von Eye, 2018) discussed principles of direction dependence when both variables are categorical in nature, extending DDA to the generalized linear modeling framework (McCullagh & Nelder, 1989) would be most promising for evaluating causal relations among categorical, count, and continuous variables.

Author note

We thank the two anonymous reviewers, Wes Bonifay, Francis Huang, Edgar C. Merkle, Anna P. Nutt, and Phillip K. Wood for their constructive comments on an earlier version of the article. We are also indebted to Ingrid Koller for providing the data used for illustrative purposes.