
A constraint-based algorithm for causal discovery with cycles, latent variables and selection bias

  • Regular Paper
  • Published in the International Journal of Data Science and Analytics

Abstract

Causal processes in nature may contain cycles, and real datasets may violate causal sufficiency as well as contain selection bias. No constraint-based causal discovery algorithm can currently handle cycles, latent variables and selection bias (CLS) simultaneously. I therefore introduce an algorithm called cyclic causal inference (CCI) that makes sound inferences with a conditional independence oracle under CLS, provided that we can represent the cyclic causal process as a non-recursive linear structural equation model with independent errors. Empirical results show that CCI outperforms the cyclic causal discovery algorithm in the cyclic case as well as rivals the fast causal inference and really fast causal inference algorithms in the acyclic case. An R implementation is available at https://github.com/ericstrobl/CCI.


Notes

  1. We can perform the fixed point method more efficiently in the linear case by first representing the structural equations in matrix format: \(\varvec{X} = B \varvec{X} + \varvec{\varepsilon }\). Then, after drawing the values of \(\varvec{\varepsilon }\), we can obtain the values of \(\varvec{X}\) by solving the linear system \(({\mathbb {I}}-B)\varvec{X} = \varvec{\varepsilon }\), i.e., \(\varvec{X} = ({\mathbb {I}}-B)^{-1}\varvec{\varepsilon }\), where \({\mathbb {I}}\) denotes the identity matrix.

  2. CCD cannot handle selection bias as proposed in [20], but the algorithm may be able to do so if we modify the proofs.

  3. The CPMAG is also known as a partial ancestral graph (PAG). However, we will use the term CPMAG in order to mimic the use of the term CPDAG.
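The matrix shortcut in footnote 1 can be sketched concretely. The following is an illustrative Python/NumPy sketch, not part of the paper: the function names and the example coefficient matrix are mine, and it assumes \(B\) has spectral radius below one so that the fixed-point iteration converges.

```python
import numpy as np

def draw_sample(B, eps):
    """Solve (I - B) X = eps for the equilibrium values of X."""
    I = np.eye(B.shape[0])
    return np.linalg.solve(I - B, eps)

def draw_sample_fixed_point(B, eps, iters=200):
    """Reference fixed-point iteration X <- B X + eps."""
    X = np.zeros_like(eps)
    for _ in range(iters):
        X = B @ X + eps
    return X

# A 2-cycle X1 -> X2 and X2 -> X1 with small coefficients
# (spectral radius of B is about 0.45, so iteration converges).
B = np.array([[0.0, 0.4],
              [0.5, 0.0]])
eps = np.array([1.0, -2.0])
print(np.allclose(draw_sample(B, eps), draw_sample_fixed_point(B, eps)))  # -> True
```

The direct solve and the fixed-point iteration agree whenever the iteration converges, which is the efficiency gain the footnote describes.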

References

  1. Blondel, G., Arias, M., Gavaldà, R.: Identifiability and transportability in dynamic causal networks. Int. J. Data Sci. Anal. 3(2), 131–147 (2017). https://doi.org/10.1007/s41060-016-0028-8

  2. Colombo, D., Maathuis, M.H., Kalisch, M., Richardson, T.S.: Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Stat. 40(1), 294–321 (2012). https://doi.org/10.1214/11-AOS940

  3. Dagum, P., Galper, A., Horvitz, E., Seiver, A.: Uncertain reasoning and forecasting. Int. J. Forecast. 11, 73–87 (1995)

  4. Eberhardt, F.: Introduction to the foundations of causal discovery. Int. J. Data Sci. Anal. 3(2), 81–91 (2017). https://doi.org/10.1007/s41060-016-0038-6

  5. Evans, R.J.: Graphs for margins of Bayesian networks. Scand. J. Stat. 43(3), 625–648 (2016)

  6. Fisher, F.M.: A correspondence principle for simultaneous equation models. Econometrica 38(1), 73–92 (1970). https://EconPapers.repec.org/RePEc:ecm:emetrp:v:38:y:1970:i:1:p:73-92

  7. Forré, P., Mooij, J.M.: Markov properties for graphical models with cycles and latent variables. arXiv preprint arXiv:1710.08775 [math.ST] (2017). https://arxiv.org/abs/1710.08775

  8. Forré, P., Mooij, J.M.: Constraint-based causal discovery for non-linear structural causal models with cycles and latent confounders. In: Proceedings of the 34th Annual Conference on Uncertainty in Artificial Intelligence (UAI-18) (2018)

  9. Hyttinen, A., Hoyer, P.O., Eberhardt, F., Järvisalo, M.: Discovering cyclic causal models with latent variables: a general SAT-based procedure. In: Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI 2013, Bellevue, WA, USA, August 11–15 (2013). https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=2391&proceeding_id=29

  10. Hyttinen, A., Eberhardt, F., Järvisalo, M.: Constraint-based causal discovery: conflict resolution with answer set programming. In: Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, UAI'14, pp. 340–349. AUAI Press, Arlington, VA (2014). http://dl.acm.org/citation.cfm?id=3020751.3020787

  11. Kalisch, M., Bühlmann, P.: Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8, 613–636 (2007). http://dl.acm.org/citation.cfm?id=1248659.1248681

  12. Lauritzen, S.L., Richardson, T.S.: Chain graph models and their causal interpretations. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 64(3), 321–348 (2002). https://doi.org/10.1111/1467-9868.00340

  13. Lauritzen, S.L., Dawid, A.P., Larsen, B.N., Leimer, H.G.: Independence properties of directed Markov fields. Networks 20(5), 491–505 (1990). https://doi.org/10.1002/net.3230200503

  14. Mahmood, S.S., Levy, D., Vasan, R.S., Wang, T.J.: The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. The Lancet 383(9921), 999–1008 (2014). https://doi.org/10.1016/S0140-6736(13)61752-3

  15. Meek, C.: Causal inference and causal explanation with background knowledge. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI'95, pp. 403–410. Morgan Kaufmann, San Francisco, CA (1995). http://dl.acm.org/citation.cfm?id=2074158.2074204

  16. Mooij, J.M., Heskes, T.: Cyclic causal discovery from continuous equilibrium data. In: Nicholson, A., Smyth, P. (eds.) Proceedings of the 29th Annual Conference on Uncertainty in Artificial Intelligence (UAI-13), pp. 431–439. AUAI Press (2013). http://auai.org/uai2013/prints/papers/23.pdf

  17. Raghu, V.K., Ramsey, J.D., Morris, A., Manatakis, D.V., Spirtes, P., Chrysanthis, P.K., Glymour, C., Benos, P.V.: Comparison of strategies for scalable causal discovery of latent variable models from mixed data. Int. J. Data Sci. Anal. 6(1), 33–45 (2018). https://doi.org/10.1007/s41060-018-0104-3

  18. Richardson, T.: Properties of cyclic graphical models. Master's thesis, Carnegie Mellon University (1994)

  19. Richardson, T.: A discovery algorithm for directed cyclic graphs. In: Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, UAI'96, pp. 454–461. Morgan Kaufmann, San Francisco, CA (1996). http://dl.acm.org/citation.cfm?id=2074284.2074338

  20. Richardson, T., Spirtes, P.: Automated causal discovery under linear feedback. In: Computation, Causation, and Discovery, pp. 253–302. AAAI Press, Menlo Park, CA (1999)

  21. Richardson, T., Spirtes, P.: Ancestral graph Markov models. Ann. Stat. 30(4), 962–1030 (2002)

  22. Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D.A., Nolan, G.P.: Causal protein-signaling networks derived from multiparameter single-cell data. Science 308(5721), 523–529 (2005)

  23. Spirtes, P.: Directed cyclic graphical representations of feedback models. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI'95, pp. 491–498. Morgan Kaufmann, San Francisco, CA (1995). http://dl.acm.org/citation.cfm?id=2074158.2074214

  24. Spirtes, P., Richardson, T.: A polynomial time algorithm for determining DAG equivalence in the presence of latent variables and selection bias. In: Proceedings of the 6th International Workshop on Artificial Intelligence and Statistics (1996)

  25. Spirtes, P., Meek, C., Richardson, T.: Causal inference in the presence of latent variables and selection bias. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI'95, pp. 499–506. Morgan Kaufmann, San Francisco, CA (1995). http://dl.acm.org/citation.cfm?id=2074158.2074215

  26. Spirtes, P., Meek, C., Richardson, T.: An algorithm for causal inference in the presence of latent variables and selection bias. In: Computation, Causation, and Discovery, pp. 211–252. AAAI Press, Menlo Park, CA (1999)

  27. Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search, 2nd edn. MIT Press, Cambridge (2000)

  28. Strobl, E.V.: Causal discovery under non-stationary feedback. PhD thesis, University of Pittsburgh (2017)

  29. Strobl, E.V., Zhang, K., Visweswaran, S.: Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. arXiv preprint arXiv:1702.03877 (2017). http://arxiv.org/abs/1702.03877

  30. Tsamardinos, I., Brown, L.E., Aliferis, C.F.: The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65(1), 31–78 (2006). https://doi.org/10.1007/s10994-006-6889-7

  31. Zhang, J.: On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artif. Intell. 172(16–17), 1873–1896 (2008). https://doi.org/10.1016/j.artint.2008.08.001


Acknowledgements

We would like to thank the anonymous reviewers for their helpful and constructive comments. This project was carried out independently during off hours without a source of funding.

Author information

Corresponding author

Correspondence to Eric V. Strobl.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Appendices

Appendix: Algorithms

We will utilize ideas developed for the PC, FCI, RFCI and CCD algorithms in order to construct CCI. We therefore briefly review PC, FCI, RFCI and CCD in the next four subsections.

1.1 The PC algorithm

The PC algorithm considers the following problem: Assume that \({\mathbb {P}}\) is d-separation faithful to an unknown DAG \({\mathbb {G}}\). Then, given oracle information about the conditional independencies between any pair of variables \(X_i\) and \(X_j\) given any \(\varvec{W} \subseteq \varvec{X}\setminus \{X_i,X_j \}\) in \({\mathbb {P}}\), reconstruct as much of the underlying DAG as possible. The PC algorithm ultimately accomplishes this goal by reconstructing the DAG up to its Markov equivalence class, or the set of DAGs with the same conditional dependence and independence relations between variables in \(\varvec{X}\) [15, 27].

The PC algorithm represents the Markov equivalence class of DAGs using a completed partially directed acyclic graph (CPDAG). A partially directed acyclic graph (PDAG) is a graph with both directed and undirected edges. A PDAG is completed when the following conditions hold: (1) every directed edge also exists in every DAG belonging to the Markov equivalence class of the DAG, and (2) there exists a DAG with \(X_i \rightarrow X_j\) and a DAG with \(X_i \leftarrow X_j\) in the Markov equivalence class for every undirected edge \(X_i - X_j\). Each edge in the CPDAG also has the following interpretation:

  (i) An edge (directed or undirected) is absent between two vertices \(X_i\) and \(X_j\) if and only if there exists some \(\varvec{W} \subseteq \varvec{X}\setminus \{X_i, X_j\}\) such that \(X_i \perp \!\!\!\perp X_j | \varvec{W}\).

  (ii) If there exists a directed edge from \(X_i\) to \(X_j\), then \(X_i \in Pa (X_j)\).


The PC algorithm learns the CPDAG through a three-step procedure. First, the algorithm initializes a fully connected undirected graph and then determines the presence or absence of each undirected edge using the following fact: Under d-separation faithfulness, \(X_i\) and \(X_j\) are non-adjacent if and only if \(X_i\) and \(X_j\) are conditionally independent given some subset of \( Pa (X_i)\setminus X_j\) or some subset of \( Pa (X_j)\setminus X_i\). Note that PC cannot differentiate the parents and children of a vertex from its other neighbors using an undirected graph. Thus, PC tests whether \(X_i\) and \(X_j\) are conditionally independent given all subsets of \( Adj (X_i)\setminus X_j\) and all subsets of \( Adj (X_j)\setminus X_i\), where \( Adj (X_i)\) denotes the vertices adjacent to \(X_i\) in \({\mathbb {G}}\) (a superset of \( Pa (X_i)\)), in order to determine the final adjacencies; we refer to this sub-procedure of PC as skeleton discovery and list the pseudocode in Algorithm 3. The PC algorithm therefore removes the edge between \(X_i\) and \(X_j\) during skeleton discovery whenever such a conditional independence is found.

Step 2 of the PC algorithm orients each unshielded triple as a v-structure \(X_i \rightarrow X_j \leftarrow X_k\) if \(X_j\) is not in the set of variables that rendered \(X_i\) and \(X_k\) conditionally independent during the skeleton discovery phase of the algorithm. The final step of the PC algorithm repeatedly applies three orientation rules to replace as many tails as possible with arrowheads [15].
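The first two steps of PC can be sketched with a conditional-independence oracle. The sketch below is illustrative rather than the paper's pseudocode: the oracle hard-codes the independence model of the single collider \(X_0 \rightarrow X_2 \leftarrow X_1\), and the subset search is the naive version rather than PC's size-ordered search.

```python
from itertools import combinations

def oracle(i, j, W):
    """True iff X_i _||_ X_j | W in the collider model X0 -> X2 <- X1."""
    return {i, j} == {0, 1} and len(W) == 0

def pc_skeleton(n, oracle):
    """Step 1: start complete, drop an edge when some separating set is found."""
    adj = {i: set(range(n)) - {i} for i in range(n)}
    sepset = {}
    for i in range(n):
        for j in range(i + 1, n):
            # Test subsets of the current adjacencies of i and of j.
            for side in (adj[i] - {j}, adj[j] - {i}):
                for size in range(len(side) + 1):
                    for W in combinations(sorted(side), size):
                        if oracle(i, j, set(W)):
                            adj[i].discard(j); adj[j].discard(i)
                            sepset[(i, j)] = sepset[(j, i)] = set(W)
    return adj, sepset

def v_structures(adj, sepset):
    """Step 2: orient unshielded triples whose middle vertex is outside the sepset."""
    vs = []
    for j in range(len(adj)):
        for i, k in combinations(sorted(adj[j]), 2):
            if k not in adj[i] and (i, k) in sepset and j not in sepset[(i, k)]:
                vs.append((i, j, k))
    return vs

adj, sepset = pc_skeleton(3, oracle)
print(v_structures(adj, sepset))  # -> [(0, 2, 1)]
```

Because \(X_0 \perp \!\!\!\perp X_1\) marginally, the edge \(X_0 - X_1\) is removed with an empty separating set; since \(X_2\) is not in that set, the unshielded triple is oriented as the collider \(X_0 \rightarrow X_2 \leftarrow X_1\).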

1.2 The FCI algorithm

The FCI algorithm considers the following problem: Assume that the distribution of \(\varvec{X} = \varvec{O} \cup \varvec{L} \cup \varvec{S}\) is d-separation faithful to an unknown DAG. Then, given oracle information about the conditional independencies between any pair of variables \(O_i\) and \(O_j\) given any \(\varvec{W} \subseteq \varvec{O}\setminus \{O_i,O_j \}\) as well as \(\varvec{S}\), reconstruct as much information about the underlying DAG as possible [27]. The FCI algorithm ultimately accomplishes this goal by reconstructing a MAG up to its Markov equivalence class, or the set of MAGs with the same conditional dependence and independence relations between variables in \(\varvec{O}\) given \(\varvec{S}\) [31].

The FCI algorithm represents the Markov equivalence class of MAGs using a completed partial maximal ancestral graph (CPMAG).Footnote 3 A partial maximal ancestral graph (PMAG) is nothing more than a MAG with possibly some circle endpoints. A PMAG is completed (and hence a CPMAG) when the following conditions hold: (1) every tail and arrowhead also exists in every MAG belonging to the Markov equivalence class of the MAG, and (2) there exists a MAG with a tail and a MAG with an arrowhead in the Markov equivalence class for every circle endpoint. Each edge in the CPMAG also has the following interpretations:

  (i) An edge is absent between two vertices \(O_i\) and \(O_j\) if and only if there exists some \(\varvec{W} \subseteq \varvec{O}\setminus \{O_i, O_j\}\) such that \(O_i \perp \!\!\!\perp O_j | \varvec{W} \cup \varvec{S}\). That is, an edge is absent if and only if there does not exist an inducing path between \(O_i\) and \(O_j\).

  (ii) If an edge between \(O_i\) and \(O_j\) has an arrowhead at \(O_j\), then \(O_j \not \in Anc (O_i \cup \varvec{S})\).

  (iii) If an edge between \(O_i\) and \(O_j\) has a tail at \(O_j\), then \(O_j \in Anc (O_i \cup \varvec{S})\).

The FCI algorithm learns the CPMAG through a three-step procedure involving skeleton discovery, v-structure orientation and orientation rule application. The skeleton discovery procedure involves running PC’s skeleton discovery procedure, orienting v-structures using Algorithm 4 and then re-performing skeleton discovery using possible d-separating sets (see Definition 4) constructed after the v-structure discovery process. FCI then orients v-structures again using Algorithm 4 on the final skeleton. The third step of FCI involves the repetitive application of 10 orientation rules [31].


1.3 The RFCI algorithm

Discovering inducing paths can require large possible d-separating sets, so the FCI algorithm often takes too long to complete. The RFCI algorithm [2] resolves this problem by recovering a graph where the presence and absence of an edge have the following modified interpretations:

  (i) The absence of an edge between two vertices \(O_i\) and \(O_j\) implies that there exists some \(\varvec{W} \subseteq \varvec{O}\setminus \{O_i, O_j\}\) such that \(O_i \perp \!\!\!\perp O_j | \varvec{W} \cup \varvec{S}\).

  (ii) The presence of an edge between two vertices \(O_i\) and \(O_j\) implies that \(O_i \not \perp \!\!\!\perp O_j | \varvec{W} \cup \varvec{S}\) for all \(\varvec{W} \subseteq Adj (O_i){\setminus }O_j\) and for all \(\varvec{W} \subseteq Adj (O_j){\setminus }O_i\). Here, \( Adj (O_i)\) denotes the set of vertices adjacent to \(O_i\) in RFCI's graph.

We encourage the reader to compare these edge interpretations with the edge interpretations of FCI’s CPMAG.

The RFCI algorithm learns its graph (not necessarily a CPMAG) also through a three-step process. The algorithm performs skeleton discovery using PC's skeleton discovery procedure (Algorithm 3). RFCI then orients v-structures using Algorithm 6. Notice that Algorithm 6 requires more steps than Algorithm 4 used in FCI because an inducing path may not exist between any two adjacent vertices after only running PC's skeleton discovery procedure. RFCI must therefore check for additional conditional dependence relations in order to infer the non-ancestral relations. In the last step, RFCI repeatedly applies the 10 orientation rules of FCI with some modifications to the fourth orientation rule (see [2] for further details).


1.4 The CCD algorithm

The CCD algorithm considers the following problem: Assume that \({\mathbb {P}}\) is d-separation faithful to an unknown possibly cyclic directed graph \({\mathbb {G}}\). Then, given oracle information about the conditional independencies between any pair of variables \(X_i\) and \(X_j\) given any \(\varvec{W} \subseteq \varvec{X}\setminus \{X_i,X_j \}\) in \({\mathbb {P}}\), output a partial oriented MAAG (see Sect. 6 for a definition) of the underlying directed graph [19, 20]. Notice that CCD does not consider latent or selection variables.

The CCD algorithm involves six steps. The first step corresponds to skeleton discovery and is analogous to PC's procedure (Algorithm 3). CCD also orients v-structures like PC does. In its third step, however, the algorithm checks for certain long-range d-separation relations in order to infer additional non-ancestral relations. The fourth step proceeds similarly to (but not exactly like) CCI's Step 4 by discovering additional non-minimal d-separating sets. Finally, the fifth and sixth steps of CCD utilize the aforementioned non-minimal d-separating sets in order to orient additional endpoints. Note that CCD does not apply orientation rules.

Appendix: Proofs

In the arguments to follow, I will always consider a directed graph (cyclic or acyclic) with vertices \(\varvec{X} = \varvec{O} \cup \varvec{L} \cup \varvec{S}\), where \(\varvec{O}, \varvec{L}\) and \(\varvec{S}\) are disjoint sets.
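The arguments below repeatedly reason about d-connection in a directed graph given a conditioning set. For checking small examples mechanically, here is a hedged brute-force sketch; it is not the paper's machinery. It enumerates simple paths as vertex sequences, so it assumes at most one edge between any pair of vertices (2-cycles between the same pair are not handled), and all helper names are mine.

```python
def descendants(edges, v):
    """v together with everything reachable from v along directed edges."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for (a, b) in edges:
            if a == u and b not in seen:
                seen.add(b); stack.append(b)
    return seen

def _simple_paths(edges, b, path):
    """Yield all simple paths (edge directions ignored) extending `path` to b."""
    u = path[-1]
    if u == b:
        yield list(path); return
    nbrs = {y for (x, y) in edges if x == u} | {x for (x, y) in edges if y == u}
    for v in nbrs - set(path):
        path.append(v)
        yield from _simple_paths(edges, b, path)
        path.pop()

def d_connected(edges, a, b, W):
    """True iff some path between a and b is active given conditioning set W:
    every collider has a descendant in W; no non-collider lies in W."""
    for path in _simple_paths(edges, b, [a]):
        active = True
        for i in range(1, len(path) - 1):
            prev, v, nxt = path[i - 1], path[i], path[i + 1]
            collider = (prev, v) in edges and (nxt, v) in edges
            if collider:
                if not (descendants(edges, v) & W):
                    active = False; break
            elif v in W:
                active = False; break
        if active:
            return True
    return False

# Collider 0 -> 2 <- 1 with child 2 -> 3:
E = {(0, 2), (1, 2), (2, 3)}
print(d_connected(E, 0, 1, set()))  # -> False
print(d_connected(E, 0, 1, {3}))    # -> True (3 is a descendant of the collider)
```

Conditioning on the collider's descendant activates the path, exactly the behavior the collider clauses in the lemmas below rely on; because descendant sets are computed by plain reachability, the same check applies to small cyclic graphs.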

1.1 Utility lemmas

Lemma 14

(Lemma 2.5 in Colombo et al. [2]) Suppose that \(X_i\) and \(X_j\) are not in \(\varvec{W} \subseteq \varvec{X}{\setminus }\{X_i, X_j\}\), there is a sequence \(\sigma \) of distinct vertices in \(\varvec{X}\) from \(X_i\) to \(X_j\), and there is a set \({\mathcal {T}}\) of paths such that:

  1. for each pair of adjacent vertices \(X_v\) and \(X_w\) in \(\sigma \), there is a unique path in \({\mathcal {T}}\) that d-connects \(X_v\) and \(X_w\) given \(\varvec{W}\);

  2. if a vertex \(X_q\) in \(\sigma \) is in \(\varvec{W}\), then the paths in \({\mathcal {T}}\) that contain \(X_q\) as an endpoint collide at \(X_q\);

  3. if for three vertices \(X_v\), \(X_w\) and \(X_q\) occurring in that order in \(\sigma \), the d-connecting paths in \({\mathcal {T}}\) between \(X_v\) and \(X_w\), and between \(X_w\) and \(X_q\) collide at \(X_w\), then \(X_w\) has a descendant in \(\varvec{W}\).

Then, there is a path \(\varPi _{X_i X_j}\) in \({\mathbb {G}}\) that d-connects \(X_i\) and \(X_j\) given \(\varvec{W}\). In addition, if all of the edges in all of the paths in \({\mathcal {T}}\) that contain \(X_i\) are into (out of) \(X_i\), then \(\varPi _{X_i X_j}\) is into (out of) \(X_i\), and similarly for \(X_j\).

Lemma 15

Consider a directed graph with vertices \(O_i\) and \(O_j\) as well as a set of vertices \(\varvec{R}\) such that \(O_i, O_j \not \in \varvec{R}\). Suppose that there is a set \(\varvec{W}\) with \(O_i, O_j \not \in \varvec{W}\) such that \(\varvec{R} \subseteq \varvec{W}\) and every proper subset \(\varvec{V} \subset \varvec{W}\) where \(\varvec{R} \subseteq \varvec{V}\) d-connects \(O_i\) and \(O_j\) given \(\varvec{V} \cup \varvec{S}\). If \(O_i\) and \(O_j\) are d-separated given \(\varvec{W} \cup \varvec{S}\) where \(O_k \in \varvec{W}\), then \(O_k\) is an ancestor of \(\{O_i, O_j\} \cup \varvec{R} \cup \varvec{S}\).

Proof

We will prove the claim by contrapositive. That is, we will prove the following statement: Suppose that there is a set \(\varvec{W}\) with \(O_i, O_j \not \in \varvec{W}\) and \(\varvec{R} \subseteq \varvec{W}\), and every proper subset \(\varvec{V} \subset \varvec{W}\) where \(\varvec{R} \subseteq \varvec{V}\) d-connects \(O_i\) and \(O_j\) given \(\varvec{V} \cup \varvec{S}\). If \(O_k\) is not an ancestor of \(\{O_i, O_j\} \cup \varvec{R} \cup \varvec{S}\), then \(O_i\) and \(O_j\) are d-connected given \(\varvec{W} \cup \varvec{S}\) where \(O_k \in \varvec{W}\).

Let \(\varvec{W}^* = Anc (\{O_i, O_j \} \cup \varvec{R} \cup \varvec{S}) \cap \varvec{W}\). Note that \(\varvec{W}^*\) is a proper subset of \(\varvec{W}\) because \(\varvec{W}^*\) is a subset of \(\varvec{W}{\setminus }O_k\), so \(O_i\) and \(O_j\) must be d-connected given \(\varvec{W}^* \cup \varvec{S}\) by a path \(\varPi \) by assumption. By the definition of a d-connecting path, we know that every element in \(\varPi \) must be an ancestor of \(O_i\), \(O_j\), \(\varvec{R}\), \(\varvec{S}\) or \(\varvec{W}^*\) (or some union). Moreover, because \(\varvec{W}^* = Anc (\{O_i, O_j \} \cup \varvec{R} \cup \varvec{S}) \cap \varvec{W}\), every element in \(\varvec{W}^*\) is an ancestor of \(\{O_i, O_j\} \cup \varvec{R} \cup \varvec{S}\). Thus, every element on the path \(\varPi \) is an ancestor of \(\{O_i, O_j\} \cup \varvec{R} \cup \varvec{S}\). Since \(\varvec{W}^* \subset \varvec{W}\), the only way in which \(\varPi \) could fail to d-connect \(O_i\) and \(O_j\) given \(\varvec{W} \cup \varvec{S}\) would be if some element of \(\varvec{W}{\setminus }\varvec{W}^*\) were located on \(\varPi \). But neither \(O_k\) nor any element in \(\varvec{W}{\setminus }\varvec{W}^*\) is an ancestor of \(\{O_i, O_j\} \cup \varvec{R} \cup \varvec{S}\), so it follows that no vertex in \(\varvec{W}{\setminus }\varvec{W}^*\) lies on \(\varPi \). We conclude that \(O_i\) and \(O_j\) are d-connected given \(\varvec{W} \cup \varvec{S}\). \(\square \)

1.2 Step 1: Skeleton discovery

Lemma 1

There exists an inducing path between \(O_i\) and \(O_j\) if and only if \(O_i\) and \(O_j\) are d-connected given \(\varvec{W} \cup \varvec{S}\) for all possible subsets \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_i, O_j \}\).

Proof

I first prove the forward direction. Consider any set \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_i, O_j\}\). Suppose there exists an inducing path \(\varPi \) between \(O_i\) and \(O_j\). We have two situations:

  1. There exists a collider \(C_1\) on \(\varPi \) that is an ancestor of \(O_i\) via a directed path \(C_1 \leadsto O_i\) but not an ancestor of \(\varvec{W} \cup \varvec{S}\). Let \(C_1\) more specifically be such a collider on \(\varPi \) closest to \(O_j\). Now, one of the following two conditions will hold:

     (a) There also exists a collider \(C_2\) on \(\varPi \) that is an ancestor of \(O_j\) via a directed path \(C_2 \leadsto O_j\) but not an ancestor of \(\varvec{W} \cup \varvec{S}\). Let \(C_2\) more specifically denote such a collider which is closest to \(C_1\) on \(\varPi \) (if two such colliders are equidistant from \(C_1\), then choose one arbitrarily). Let \(\varPi _{C_1 C_2}\) denote the part of the inducing path between \(C_1\) and \(C_2\). Recall that every non-collider on \(\varPi _{C_1 C_2}\) is a member of \(\varvec{L}\) because \(\varPi \) is an inducing path. Moreover, every collider on \(\varPi _{C_1 C_2}\) is an ancestor of \(\varvec{W}\cup \varvec{S}\) by construction. Then, the path \({\mathcal {T}} = \{C_1 \leadsto O_i, \varPi _{C_1 C_2}, C_2 \leadsto O_j \}\) is a d-connecting path by invoking Lemma 14 with \({\mathcal {T}}\).

     (b) There does not exist a collider \(C_2\) on \(\varPi \) that is an ancestor of \(O_j\) via a directed path \(C_2 \leadsto O_j\) and not an ancestor of \(\varvec{W} \cup \varvec{S}\). It follows that all colliders on \(\varPi \) are ancestors of \(O_i \cup \varvec{W} \cup \varvec{S}\). More specifically, all of the colliders on \(\varPi _{O_j C_1}\) are ancestors of \(\varvec{W} \cup \varvec{S}\) by construction. Recall also that every non-collider on \(\varPi _{O_j C_1}\) is a member of \(\varvec{L}\) because \(\varPi \) is an inducing path. We conclude that the path \({\mathcal {T}} = \{\varPi _{O_j C_1}, C_1 \leadsto O_i \}\) is a d-connecting path by invoking Lemma 14 with \({\mathcal {T}}\).

  2. There does not exist a collider \(C_1\) on \(\varPi \) that is an ancestor of \(O_i\) via a directed path \(C_1 \leadsto O_i\) and not an ancestor of \(\varvec{W} \cup \varvec{S}\). This implies that all colliders on \(\varPi \) are ancestors of \(O_j \cup \varvec{W} \cup \varvec{S}\). Let \(\varPi _{O_i C_3}\) correspond to the part of the inducing path between \(O_i\) and \(C_3\), where \(C_3\) corresponds to the collider closest to \(O_i\) that is an ancestor of \(O_j\) via a directed path \(C_3 \leadsto O_j\) but not an ancestor of \(\varvec{W} \cup \varvec{S}\); if we do not encounter such a collider, then set \(C_3 = O_j\). Notice then that all colliders on \(\varPi _{O_i C_3}\) are ancestors of \(\varvec{W} \cup \varvec{S}\). Recall also that every non-collider on \(\varPi _{O_i C_3}\) is a member of \(\varvec{L}\) because \(\varPi \) is an inducing path. Thus, the path \({\mathcal {T}} = \{ \varPi _{O_iC_3}, C_3 \leadsto O_j \}\) is a d-connecting path by invoking Lemma 14 with \({\mathcal {T}}\).

For the backward direction, assume \(O_i\) and \(O_j\) are d-connected given \(\varvec{W} \cup \varvec{S}\) for all possible subsets \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_i, O_j\}\). Then, \(O_i\) and \(O_j\) are d-connected given \((( Anc (\{O_i, O_j \} \cup \varvec{S}) \cap \varvec{O} ) \cup \varvec{S}){\setminus }\{O_i, O_j\}\). The backward direction follows by invoking Lemma 8 in [26], whose argument remains unchanged even for a cyclic directed graph. \(\square \)

Lemma 2

If there does not exist an inducing path between \(O_i\) and \(O_j\), then \(O_i\) and \(O_j\) are d-separated given \( D-SEP (O_i,O_j) \cup \varvec{S}\). Likewise, \(O_i\) and \(O_j\) are d-separated given \( D-SEP (O_j,O_i) \cup \varvec{S}\).

Proof

We will prove this by contradiction. Assume that we have \(O_i \not \perp \!\!\!\perp _d O_j| D-SEP (O_i,O_j) \cup \varvec{S}\). If there does not exist an inducing path between \(O_i\) and \(O_j\), then there exists some \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_i, O_j\}\) such that \(O_i \perp \!\!\!\perp _d O_j|\varvec{W} \cup \varvec{S}\) by Lemma 1. Let \(\varPi \) correspond to the path d-connecting \(O_i\) and \(O_j\) given \( D-SEP (O_i,O_j) \cup \varvec{S}\).

We have two conditions:

  1. Suppose that every vertex in \(\varvec{O}\) on \(\varPi \) is a collider on \(\varPi \). This implies that all non-colliders on \(\varPi \) must be in \(\varvec{L} \cup \varvec{S}\). But no non-collider on \(\varPi \) can be in \(\varvec{S}\) because \(\varPi \) would be inactive in that case. Thus, all non-colliders on \(\varPi \) must more specifically be in \(\varvec{L}\). Now, recall that we assumed that \(O_i \not \perp \!\!\!\perp _d O_j| D-SEP (O_i,O_j) \cup \varvec{S}\), so every collider on \(\varPi \) (including those in \(\varvec{O}\)) must be an ancestor of \( D-SEP (O_i,O_j) \cup \varvec{S}\) and hence also an ancestor of \(\{O_i, O_j\} \cup \varvec{S}\). The above facts imply that there exists an inducing path between \(O_i\) and \(O_j\), which is contradictory.

  2. Suppose that there exists at least one vertex in \(\varvec{O}\) on \(\varPi \) that is a non-collider. Let \(O_k\) denote the first such vertex on \(\varPi \) closest to \(O_i\). Note that every vertex on \(\varPi \) is an ancestor of \(\{O_i,O_j\} \cup D-SEP (O_i,O_j) \cup \varvec{S}\) by the definition of d-connection and hence an ancestor of \(\{O_i,O_j\} \cup \varvec{S}\). This implies that \(O_k\) is an ancestor of \(\{O_i,O_j\} \cup \varvec{S}\). We will show that \(O_k \in D-SEP (O_i,O_j)\) in order to arrive at the contradiction that \(\varPi \) does not d-connect \(O_i\) and \(O_j\) given \( D-SEP (O_i,O_j) \cup \varvec{S}\). Consider the subpath \(\varPi _{O_iO_k}\). Let \(\langle C_1, \dots , C_m \rangle \) denote the possibly empty sequence of colliders on \(\varPi _{O_iO_k}\) which are ancestors of \( D-SEP (O_i,O_j)\) but not \(\varvec{S}\). Also, let \(C_n\) denote an arbitrary collider in \(\langle C_1, \dots , C_m \rangle \). Notice that there is a directed path \(C_n \leadsto O_n\) with \(O_n \in D-SEP (O_i,O_j)\). Let \(F_n\) denote the first observable on \(C_n \leadsto O_n\), which may be \(O_n\) if no other observable lies on \(C_n \leadsto O_n\). We will show that there exists an inducing path between \(F_n\) and \(F_{n+1}\), where \(F_{n+1}\) corresponds to the first observable on \(C_{n+1} \leadsto O_{n+1}\). First note that \(F_n, F_{n+1} \not \in Anc (\varvec{S})\) because \(C_n,C_{n+1} \not \in Anc (\varvec{S})\). Consider the path \(\varPhi _n\) constructed by concatenating the paths \(C_n \leadsto F_n\), \(\varPi _{C_n C_{n+1}}\) and \(C_{n+1} \leadsto F_{n+1}\). Notice that, by construction, the only observables in \(\varPhi _n\) lie on \(\varPi _{C_n C_{n+1}}\). Moreover, every observable on \(\varPi _{C_n C_{n+1}}\) is a collider because \(O_k\) is the first observable that is a non-collider on \(\varPi \); this implies that only a latent or a selection variable on \(\varPi _{C_n C_{n+1}}\) can be a non-collider. But no selection variable is also a non-collider on \(\varPi _{C_n C_{n+1}}\) because \(\varPi \) d-connects \(O_i\) and \(O_j\) given \( D-SEP (O_i,O_j) \cup \varvec{S}\). We conclude that only a latent variable can be a non-collider on \(\varPi _{C_n C_{n+1}}\). Next, every collider on \(\varPi _{C_n C_{n+1}}\) is an ancestor of \(\varvec{S}\) by construction of \(\langle C_1, \dots , C_m \rangle \). We have shown that all colliders on \(\varPhi _n\) are ancestors of \(\varvec{S}\) and all non-colliders on \(\varPhi _n\) are in \(\varvec{L}\). This implies that \(\varPhi _n\) is an inducing path between \(F_n\) and \(F_{n+1}\), specifically one that is into \(F_n\) and into \(F_{n+1}\) by construction. We will now tie up the endpoints. We can also concatenate the paths \(\varPi _{O_iC_1}\) and \(C_1 \leadsto F_1\) in order to form an inducing path \(\varPhi _0\) between \(O_i\) and \(F_1\) that is into \(F_1\). Similarly, we can concatenate the paths \(\varPi _{O_kC_m}\) and \(C_m \leadsto F_m\) in order to form an inducing path \(\varPhi _m\) between \(O_k\) and \(F_m\) that is into \(F_m\). We have constructed a sequence of vertices \(\langle O_i \equiv F_0, F_1, \dots , F_m, F_{m+1} \equiv O_k \rangle \), where each vertex is an ancestor of \(\{O_i,O_j\} \cup \varvec{S}\) and any given \(F_l\) is connected to \(F_{l-1}\) by an inducing path into \(F_l\) and to \(F_{l+1}\) by an inducing path also into \(F_l\). Hence, \(O_k \in D-SEP (O_i, O_j)\). But this implies that \(\varPi \) does not d-connect \(O_i\) and \(O_j\) given \( D-SEP (O_i,O_j) \cup \varvec{S}\) because \(O_k\) is a non-collider on \(\varPi \); contradiction.

We have shown that if there does not exist an inducing path between \(O_i\) and \(O_j\), then \(O_i \perp \!\!\!\perp _d O_j | D-SEP (O_i,O_j) \cup \varvec{S}\). Now, \(O_i \perp \!\!\!\perp _d O_j | D-SEP (O_i,O_j) \cup \varvec{S}\) \(\implies \) \(O_j \perp \!\!\!\perp _d O_i | D-SEP (O_j,O_i) \cup \varvec{S}\) because i and j are arbitrary indices. Moreover, \(O_j \perp \!\!\!\perp _d O_i | D-SEP (O_j,O_i) \cup \varvec{S}\) \(\implies \) \(O_i \perp \!\!\!\perp _d O_j | D-SEP (O_j,O_i) \cup \varvec{S}\) by the symmetry of d-separation. We conclude that if there does not exist an inducing path between \(O_i\) and \(O_j\), then we also have \(O_i \perp \!\!\!\perp _d O_j | D-SEP (O_j,O_i) \cup \varvec{S}\). \(\square \)

Lemma 3

If an inducing path does not exist between \(O_i\) and \(O_j\) in \({\mathbb {G}}\), then \(O_i\) and \(O_j\) are d-separated given some \(\varvec{W} \cup \varvec{S}\) with \(\varvec{W} \subseteq PD-SEP (O_i)\) in the MAAG \({\mathbb {G}}^{\prime }\). Likewise, \(O_i\) and \(O_j\) are d-separated given some \(\varvec{W} \cup \varvec{S}\) with \(\varvec{W} \subseteq PD-SEP (O_j)\) in \({\mathbb {G}}^{\prime }\).

Proof

It suffices to show that \( D-SEP (O_i, O_j) \subseteq PD-SEP (O_i)\) by Lemma 2. The argument will hold analogously for \( D-SEP (O_j, O_i) \subseteq PD-SEP (O_j)\). If \(O_k \in D-SEP (O_i,O_j)\), then there exists a sequence of observables \(\varPi _{O_i, O_k}\) between \(O_i\) and \(O_k\) such that an inducing path exists between any two consecutive observables \(\langle O_h, O_{h+1} \rangle \) in \(\varPi _{O_i, O_k}\). Thus, there also exists a path \(\varPi ^\prime _{O_i, O_k}\) between \(O_i\) and \(O_k\) in \({\mathbb {G}}^\prime \) whose vertices involve all and only the vertices in \(\varPi _{O_i, O_k}\). We also know that, in every consecutive triplet \(\langle O_{h-1}, O_h, O_{h+1} \rangle \), the inducing path from \(O_{h-1}\) to \(O_h\) is into \(O_h\), and the inducing path from \(O_{h+1}\) to \(O_h\) is also into \(O_h\); hence, \(O_h\) is a collider in \({\mathbb {G}}\). We now need to show that any triplet \(\langle O_{h-1}, O_h, O_{h+1} \rangle \) on \(\varPi ^\prime _{O_i, O_k}\) is a v-structure in \({\mathbb {G}}^\prime \) or a triangle in \({\mathbb {G}}^\prime \). We have two situations:

  1.

    Suppose that the collider \(O_h \not \in Anc (\{O_{h-1}, O_{h+1}\} \cup \varvec{S})\). Then, the concatenation of the path between \(O_{h-1}\) and \(O_h\) and the path between \(O_h\) and \(O_{h+1}\) is not an inducing path. Hence, \(O_h\) lies in an unshielded triple \(\langle O_{h-1}, O_h, O_{h+1} \rangle \) on \(\varPi ^\prime _{O_i, O_k}\). More specifically, \(O_h\) lies in a v-structure because \(O_h \not \in Anc (\{O_{h-1}, O_{h+1}\} \cup \varvec{S})\) by assumption.

  2.

    Suppose that \(O_h \in Anc (\{O_{h-1}, O_{h+1}\} \cup \varvec{S})\). Then, there exists an inducing path between \(O_{h-1}\) and \(O_{h+1}\), so \(O_h\) is in a triangle on \(\varPi ^\prime _{O_i, O_k}\).\(\square \)

Lemma 4

If an inducing path does not exist between \(O_i\) and \(O_j\) in \({\mathbb {G}}\), then \(O_i\) and \(O_j\) are d-separated given some \(\varvec{W} \cup \varvec{S}\) with \(\varvec{W} \subseteq PD-SEP (O_i)\) in \({\mathbb {G}}^{\prime \prime }\). Likewise, \(O_i\) and \(O_j\) are d-separated given some \(\varvec{W} \cup \varvec{S}\) with \(\varvec{W} \subseteq PD-SEP (O_j)\) in \({\mathbb {G}}^{\prime \prime }\).

Proof

In light of Lemma 3, it suffices to show that \( PD-SEP (O_i)\) formed using the MAAG \({\mathbb {G}}^\prime \) is a subset of \( PD-SEP (O_i)\) formed using \({\mathbb {G}}^{\prime \prime }\). Recall that all edges in \({\mathbb {G}}^\prime \) are also in \({\mathbb {G}}^{\prime \prime }\). Hence, all triangles in \({\mathbb {G}}^\prime \) are also triangles in \({\mathbb {G}}^{\prime \prime }\). We now need to show that all v-structures in \({\mathbb {G}}^\prime \) are also v-structures in \({\mathbb {G}}^{\prime \prime }\) or are triangles in \({\mathbb {G}}^{\prime \prime }\). Let \(\langle O_{h-1}, O_h, O_{h+1} \rangle \) denote an arbitrary v-structure in \({\mathbb {G}}^\prime \). The edge between \(O_{h-1}\) and \(O_h\) as well as the edge between \(O_h\) and \(O_{h+1}\) must be in \({\mathbb {G}}^{\prime \prime }\), because again all edges in \({\mathbb {G}}^\prime \) are also in \({\mathbb {G}}^{\prime \prime }\). We have two cases:

  1.

    An edge exists between \(O_{h-1}\) and \(O_{h+1}\) in \({\mathbb {G}}^{\prime \prime }\). Then, the triple \(\langle O_{h-1}, O_h, O_{h+1} \rangle \) forms a triangle in \({\mathbb {G}}^{\prime \prime }\).

  2.

    An edge does not exist between \(O_{h-1}\) and \(O_{h+1}\) in \({\mathbb {G}}^{\prime \prime }\). Recall that \(\langle O_{h-1}, O_h, O_{h+1} \rangle \) is a v-structure in \({\mathbb {G}}^\prime \), so \(O_h \not \in Anc (\{O_{h-1}, O_{h+1}\} \cup \varvec{S})\). Note that PC’s skeleton discovery procedure only discovers minimal separating sets, so if we have \(O_{h-1} \perp \!\!\!\perp _d O_{h+1} | \varvec{W} \cup \varvec{S}\) with \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_{h-1}, O_{h+1}\}\) and \(O_h \in \varvec{W}\), then \(O_h \in Anc (\{O_{h-1}, O_{h+1}\} \cup \varvec{S})\) by Lemma 15 with \(\varvec{R} = \emptyset \); but this contradicts the fact that \(O_h \not \in Anc (\{O_{h-1}, O_{h+1}\} \cup \varvec{S})\). Hence, \(O_h \not \in \varvec{W}\), so \(\langle O_{h-1}, O_h, O_{h+1} \rangle \) is also a v-structure in \({\mathbb {G}}^{\prime \prime }\). \(\square \)

1.3 Steps 2 and 3: Short- and long-range non-ancestral relations

Lemma 16

If \(O_i\) is an ancestor of \(O_j \cup \varvec{S}\), \(O_j\) and some vertex \(O_k\) are d-separated given \(\varvec{W} \cup \varvec{S}\) with \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_j, O_k\}\), \(O_i\) and \(O_j\) are d-connected given \(\varvec{W} \cup \varvec{S}\), and \(O_i \not \in \varvec{W}\), then \(O_i\) and \(O_k\) are d-separated given \(\varvec{W} \cup \varvec{S}\).

Proof

Suppose for a contradiction that \(O_i\) and \(O_k\) are d-connected given \(\varvec{W} \cup \varvec{S}\). There are two cases.

In the first case, suppose that \(O_i\) has a descendant in \(\varvec{W} \cup \varvec{S}\). Recall, however, that we have \(O_i \not \in \varvec{W} \cup \varvec{S}\), so we can merge the d-connecting path \(\varPi _{O_jO_i}\) between \(O_j\) and \(O_i\) and the d-connecting path \(\varPi _{O_iO_k}\) between \(O_i\) and \(O_k\) by invoking Lemma 14 with \({\mathcal {T}} = \{ \varPi _{O_jO_i}, \varPi _{O_iO_k}\}\) in order to form a d-connecting path between \(O_j\) and \(O_k\) given \(\varvec{W} \cup \varvec{S}\). We have arrived at a contradiction.

In the second case, suppose that \(O_i\) does not have a descendant in \(\varvec{W} \cup \varvec{S}\). Recall also that \(O_i\) is an ancestor of \(O_j \cup \varvec{S}\) by assumption. These two facts imply that there exists a directed path \(O_i \leadsto O_j\) that does not include \(\varvec{W} \cup \varvec{S}\); hence, \(O_i \leadsto O_j\) is d-connecting given \(\varvec{W} \cup \varvec{S}\). We can again invoke Lemma 14, this time with \({\mathcal {T}} = \{O_i \leadsto O_j, \varPi _{O_iO_k}\}\), in order to form a d-connecting path between \(O_j\) and \(O_k\) given \(\varvec{W} \cup \varvec{S}\). We have thus arrived at another contradiction.

We have exhausted all possibilities and therefore conclude that \(O_i\) and \(O_k\) are in fact d-separated given \(\varvec{W} \cup \varvec{S}\). \(\square \)

We can write the contrapositive of the above lemma as follows:

Corollary 2

Let \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_j, O_k\}\). If \(O_i\) and \(O_j\) are d-connected given \(\varvec{W} \cup \varvec{S}\), \(O_k\) and \(O_i\) are d-connected given \(\varvec{W} \cup \varvec{S}\), \(O_k\) and \(O_j\) are d-separated given \(\varvec{W} \cup \varvec{S}\), and \(O_i \not \in \varvec{W}\), then \(O_i\) is not an ancestor of \(O_j \cup \varvec{S}\).

Lemma 5

Consider a set \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_i, O_j\}\). Now, suppose that \(O_i\) and \(O_k\) are d-connected given \(\varvec{W} \cup \varvec{S}\) and that \(O_j\) and \(O_k\) are d-connected given \(\varvec{W} \cup \varvec{S}\). If \(O_i\) and \(O_j\) are d-separated given \(\varvec{W} \cup \varvec{S}\) such that \(O_k \not \in \varvec{W}\), then \(O_k\) is not an ancestor of \(\{O_i, O_j\} \cup \varvec{S}\).

Proof

Follows by applying Corollary 2 twice with \(O_i\) and \(O_k\) d-connected and with \(O_j\) and \(O_k\) d-connected. \(\square \)
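Steps 2 and 3 of CCI apply this rule with a conditional independence oracle in place of d-separation under faithfulness. The sketch below illustrates the logic on a hypothetical three-variable collider structure; the function names and the hand-coded oracle are illustrative only and are not part of the paper's R implementation.

```python
# A minimal sketch of the Lemma 5 rule: if O_i and O_j are d-separated given
# W ∪ S while O_k is d-connected to both O_i and O_j given W ∪ S and O_k ∉ W,
# then O_k is not an ancestor of {O_i, O_j} ∪ S. The oracle below is
# hand-coded for a hypothetical collider graph O_i -> O_k <- O_j.

def not_ancestor(oracle, o_i, o_j, o_k, w):
    """Return True when the Lemma 5 rule certifies O_k ∉ Anc({O_i, O_j} ∪ S)."""
    return (o_k not in w
            and oracle(o_i, o_j, w)        # O_i d-separated from O_j | W ∪ S
            and not oracle(o_i, o_k, w)    # O_i d-connected to O_k | W ∪ S
            and not oracle(o_j, o_k, w))   # O_j d-connected to O_k | W ∪ S

def collider_oracle(a, b, w):
    # d-separation facts for O_i -> O_k <- O_j: the non-adjacent pair
    # (O_i, O_j) is d-separated iff the collider O_k is not conditioned on
    if frozenset((a, b)) == frozenset(("Oi", "Oj")):
        return "Ok" not in w
    return False

print(not_ancestor(collider_oracle, "Oi", "Oj", "Ok", frozenset()))  # True
```

Here the child \(O_k\) is correctly flagged as a non-ancestor of \(\{O_i, O_j\} \cup \varvec{S}\) with \(\varvec{W} = \emptyset \); conditioning on \(O_k\) itself disables the rule, as the lemma requires.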

1.4 Step 5: Orienting with non-minimal D-separating sets

Lemma 6

Consider a quadruple of vertices \(\langle O_i, O_j, O_k, O_l \rangle \). Suppose that we have:

  1.

    \(O_i\) and \(O_k\) non-adjacent.

  2.

    \(O_i * \!\! \rightarrow O_l \leftarrow \!\! * O_k\).

  3.

    \(O_i\) and \(O_k\) are d-separated given some \(\varvec{W} \cup \varvec{S}\) with \(O_j \in \varvec{W}\) and \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_i, O_k\}\).

  4.

    \(O_j * \!\! {- \! \circ }O_l\).

If \(O_l \not \in \varvec{W} = Sep (O_i,O_k)\), then we have \(O_j * \!\! \rightarrow O_l\). If \(O_i * \!\! \rightarrow O_j \leftarrow \!\! * O_k\) and \(O_l \in \varvec{W} = SupSep (O_i,O_j,O_k)\), then we have \(O_j * \!\! - O_l\).

Proof

We prove the first conclusion by contrapositive. Assume that we have \(O_j * \!\! - O_l\). Now, suppose for a contradiction that \(O_l \not \in \varvec{W}\) (but \(O_j \in \varvec{W}\)). Note that \(O_j \cup \varvec{S}\) contains at least one descendant of \(O_l\) because \(O_l \in Anc (O_j \cup \varvec{S})\). With Lemma 14, we can use the d-connecting path between \(O_i\) and \(O_l\) given \(\varvec{W} \cup \varvec{S}\) as well as the d-connecting path between \(O_k\) and \(O_l\) given \(\varvec{W} \cup \varvec{S}\) to form a d-connecting path between \(O_i\) and \(O_k\) given \(\varvec{W} \cup \varvec{S}\) irrespective of whether or not the paths collide at \(O_l\); this contradicts the fact that \(O_i\) and \(O_k\) are d-separated given \(\varvec{W} \cup \varvec{S}\).

For the second conclusion, assume that we have \(O_l \in \varvec{W}\). We know from Lemma 15 with \(\varvec{R}=O_j \cup Sep (O_i,O_k)\) that \(O_l\) is an ancestor of \(\{O_i, O_j, O_k\} \cup Sep (O_i,O_k) \cup \varvec{S}\). Recall that every member of \( Sep (O_i,O_k)\) is an ancestor of \(\{O_i, O_k\} \cup \varvec{S}\) by Lemma 15 with \(\varvec{R} = \emptyset \). Hence, \(O_l\) is more specifically an ancestor of \(\{O_i, O_j, O_k\} \cup \varvec{S}\). Now, since we have \(O_i * \!\! \rightarrow O_l \leftarrow \!\! * O_k\), we can conclude that \(O_l \in Anc (O_j)\). Hence, we have \(O_j * \!\! - O_l\). \(\square \)

1.5 Step 6: Long-range ancestral relations

Lemma 7

If \(O_i\) and \(O_k\) are d-separated given \(\varvec{W} \cup \varvec{S}\), where \(\varvec{W} \subseteq \varvec{O}{\setminus }\{ O_i, O_k\}\), and \(\varvec{Q} \subseteq Anc (\{O_i, O_k\} \cup \varvec{W} \cup \varvec{S}){\setminus }\{O_i, O_k \}\), then \(O_i\) and \(O_k\) are also d-separated given \(\varvec{Q} \cup \varvec{W} \cup \varvec{S}\).

Proof

We will prove this by contrapositive. Suppose that there is a path \(\varPi _{O_iO_k}\) which d-connects \(O_i\) and \(O_k\) given some \(\varvec{Q} \cup \varvec{W} \cup \varvec{S}\). Then, every vertex on \(\varPi _{O_iO_k}\) is an ancestor of \(\{O_i, O_k\} \cup \varvec{Q} \cup \varvec{W} \cup \varvec{S}\) by the definition of a d-connecting path. Since \(\varvec{Q} \subseteq Anc (\{O_i, O_k\} \cup \varvec{W} \cup \varvec{S}) \setminus \{O_i, O_k \} \), every vertex on \(\varPi _{O_iO_k}\) must more specifically be an ancestor of \(\{O_i, O_k\} \cup \varvec{W} \cup \varvec{S}\).

Let \(O_a\) denote the collider furthest from \(O_i\) on \(\varPi _{O_iO_k}\) which is an ancestor of \(O_i \cup \varvec{S}\) and not in \(\varvec{W} \cup \varvec{S}\) (or \(O_i\) if no such collider exists). Similarly, let \(O_b\) denote the first collider after \(O_a\) on \(\varPi _{O_iO_k}\) which is an ancestor of \(O_k \cup \varvec{S}\) and not in \(\varvec{W} \cup \varvec{S}\) (or \(O_k\) if no such collider exists). The directed path \(\varPi _{O_aO_i}\) from \(O_a\) to \(O_i \cup \varvec{S}\) and the directed path \(\varPi _{O_bO_k}\) from \(O_b\) to \(O_k \cup \varvec{S}\) are d-connecting given \(\varvec{W} \cup \varvec{S}\), since no vertices on \(\varPi _{O_aO_i}\) or \(\varPi _{O_bO_k}\) are in \(\varvec{W} \cup \varvec{S}\). The subpath \(\varPi _{O_aO_b}\) of \(\varPi _{O_iO_k}\) between \(O_a\) and \(O_b\) is also d-connecting given \(\varvec{W} \cup \varvec{S}\) because every collider is an ancestor of \(\varvec{W} \cup \varvec{S}\), and every non-collider is in \(\varvec{L}\). Lemma 14 implies that we can take \({\mathcal {T}} = \{\varPi _{O_aO_i},\varPi _{O_aO_b},\varPi _{O_bO_k} \}\) to form a d-connecting path between \(O_i\) and \(O_k\) given \(\varvec{W} \cup \varvec{S}\). \(\square \)
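To see Lemma 7 in action, the sketch below implements the standard moralization test for d-separation and checks a toy instance: adding an ancestor \(Q\) of \(O_i\) to a separating set preserves d-separation. The graph, the variable names and the restriction to DAGs are illustrative assumptions here; the lemma itself also covers the cyclic graphs that CCI targets.

```python
from itertools import combinations

def _ancestral(parents, nodes):
    # all ancestors of `nodes`, including the nodes themselves
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, zs):
    """Moralization criterion for X ⟂_d Y | Z in the DAG given by parent sets."""
    relevant = _ancestral(parents, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in relevant}
    for v in relevant:
        ps = [p for p in parents.get(v, ()) if p in relevant]
        for p in ps:                      # undirect each edge
            adj[v].add(p); adj[p].add(v)
        for a, b in combinations(ps, 2):  # marry parents of a common child
            adj[a].add(b); adj[b].add(a)
    # d-separated iff no path in the moral graph joins X to Y while avoiding Z
    seen = {x for x in xs if x not in zs}
    stack = list(seen)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in adj[v] - seen - set(zs):
            seen.add(w)
            stack.append(w)
    return True

# Toy instance of Lemma 7 with the hypothetical DAG Q -> Oi -> M -> Ok,
# so Q is an ancestor of Oi:
parents = {"Oi": {"Q"}, "M": {"Oi"}, "Ok": {"M"}}
assert d_separated(parents, {"Oi"}, {"Ok"}, {"M"})       # Oi ⟂ Ok | M
assert d_separated(parents, {"Oi"}, {"Ok"}, {"M", "Q"})  # still holds after adding Q
```

The two assertions confirm the lemma's conclusion on this example: enlarging the separating set \(\{M\}\) with the ancestor \(Q\) does not open a d-connecting path.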

1.6 Step 7: Orientation rules

Lemma 8

Suppose that there is a set \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_i, O_j\}\) such that, for every proper subset \(\varvec{V} \subset \varvec{W}\), \(O_i\) and \(O_j\) are d-connected given \(\varvec{V} \cup \varvec{S}\). If \(O_i\) and \(O_j\) are d-separated given \(\varvec{W} \cup \varvec{S}\) where \(O_k \in \varvec{W}\), then \(O_k\) is an ancestor of \(\{O_i, O_j\} \cup \varvec{S}\).

Proof

This is a special case of Lemma 15 with \(\varvec{R} = \emptyset \). \(\square \)

Lemma 10

If we have \(O_i * \!\! \rightarrow O_j \text {---} O_k\) with \(O_i\) and \(O_k\) non-adjacent, then \(O_i * \!\! \rightarrow O_j\) is in a triangle involving \(O_i,O_j\) and \(O_l\) (\(l \not = k\)) with \(O_j \text {---} O_l\) and \(O_i * \!\! \rightarrow O_l\). Moreover, there exists a sequence of undirected edges between \(O_l\) and \(O_k\) that does not include \(O_j\).

Proof

Note that neither \(O_j\) nor \(O_k\) can be an ancestor of \(\varvec{S}\), because this would contradict the arrowhead at \(O_j\). Therefore, \(O_j\) is an ancestor of \(O_k\), and \(O_k\) is an ancestor of \(O_j\), so there is a cycle involving \(O_j\) and \(O_k\). Since we have an arrowhead at \(O_j\), there must be an inducing path \(\varPi _{O_i O_j}\) between \(O_i\) and \(O_j\) that is either out of \(O_j\) or into \(O_j\):

  1.

    Suppose that \(\varPi _{O_i O_j}\) is out of \(O_j\). Every vertex on \(\varPi _{O_i O_j}\) is an ancestor of \(\{O_i, O_j\} \cup \varvec{S}\) by the definition of an inducing path. Thus, \(O_j \in Anc (\{O_i, O_j\} \cup \varvec{S})\). Recall that we also have the arrowhead \(O_i * \rightarrow O_j\), so we more specifically have the obvious relation \(O_j \in Anc (O_j)\). Let \(C_1\) denote the collider closest to \(O_j\) on \(\varPi _{O_i O_j}\). Such a collider must exist or else \(O_j \in Anc (O_i)\) which contradicts the arrowhead \(O_i * \rightarrow O_j\). Since \(\varPi _{O_i O_j}\) is an inducing path, we must have \(C_1 \in Anc (\{O_i, O_j\} \cup \varvec{S})\). However, \(C_1\) cannot be an ancestor of \(O_i \cup \varvec{S}\) because that would imply that we have \(O_j \in Anc (O_i \cup \varvec{S})\). We therefore more specifically have \(C_1 \in Anc (O_j)\). Let \(C_1 \leadsto O_j\) denote a directed path to \(O_j\). We have two scenarios:

    (a)

      \(C_1 \leadsto O_j\) contains a member of \(\varvec{O}\) besides \(O_j\). Denote that member of \(\varvec{O}\) closest to \(C_1\) as \(O_l\) (note that we may have \(C_1 = O_l\)). Then, \(\varPi _{O_i C_1}\), the part of \(\varPi _{O_i O_j}\) between \(O_i\) and \(C_1\), as well as \(C_1 \leadsto O_l\) together form an inducing path between \(O_i\) and \(O_l\) (every non-collider on \(C_1 \leadsto O_l\) is in \(\varvec{L}\) by construction). Moreover, we must have \(O_i * \rightarrow O_l\) because \(O_l \not \in Anc (O_i \cup \varvec{S})\) by construction. There also exists an inducing path \(\varPi _{O_j O_l}\) between \(O_j\) and \(O_l\) because all non-colliders on \(\varPi _{O_j O_l}\) are in \(\varvec{L}\). We more specifically must have \(O_j - O_l\) because \(O_j \in Anc (O_l)\) and \(O_l \in Anc (O_j)\) by construction. Finally, there exists a sequence of undirected edges to \(O_k\) because every member of \(\varvec{O}\) on \(C_1 \leadsto O_j\) between \(O_l\) and \(O_k\) is an ancestor of \(O_k\) and \(O_k\) is an ancestor of them.

    (b)

      \(C_1 \leadsto O_j\) does not contain a member of \(\varvec{O}\) besides \(O_j\). But then \(\varPi _{O_i C_1}\) as well as \(C_1 \leadsto O_j\) form an inducing path because every non-collider on \(C_1 \leadsto O_j\) must be in \(\varvec{L}\). Hence, there exists an inducing path between \(O_i\) and \(O_j\) that is into \(O_j\). See below for the continuation of the argument.

  2.

    Suppose that \(\varPi _{O_i O_j}\) is into \(O_j\). We also know that there is an inducing path between \(O_j\) and \(O_k\). Furthermore, there exists a directed path from \(O_k\) to \(O_j\) by the first paragraph. Hence, there exists an inducing path \(\varPi _{O_j O_l}\) between some variable \(O_l\) (\(O_l\) is in the cycle involving \(O_j\) and \(O_k\) with possibly \(l=k\)) and \(O_j\) which is into \(O_j\). Suppose that \(l=k\); this would imply that \(O_i\) and \(O_k\) are adjacent in the MAAG, since \(\varPi _{O_i O_j}\) and \(\varPi _{O_j O_k}\) would together form an inducing path between \(O_i\) and \(O_k\) (the collider \(O_j\) is an ancestor of \(O_k\)). Hence, the inducing path must involve \(O_j\) and some other observable \(O_l\) with \(l \not = k\). Call this inducing path \(\varPi _{O_j O_l}\). Note that the concatenation of \(\varPi _{O_i O_j}\) and \(\varPi _{O_j O_l}\) is an inducing path between \(O_i\) and \(O_l\) because \(O_j\) is an ancestor of \(O_l\). Thus, \(O_i * \!\! \rightarrow O_j\) is in a triangle involving \(O_i, O_j\) and \(O_l\). Finally, recall that \(O_l\) is a member of a cycle involving \(O_j\) and \(O_k\). Hence, \(O_l\) is an ancestor of \(O_j\), and \(O_j\) is an ancestor of \(O_l\). Now, \(O_l\) is also not an ancestor of \(\varvec{S}\) because otherwise both \(O_j\) and \(O_k\) would also be ancestors of \(\varvec{S}\). Next, suppose for a contradiction that \(O_l\) is an ancestor of \(O_i\). Then, \(O_j\) must be an ancestor of \(O_i\), which contradicts the arrowhead \(O_i * \!\! \rightarrow O_j\). \(\square \)

1.7 Main result

Theorem 2

(Soundness) Assume that the global directed Markov property holds with respect to a directed graph \({\mathbb {G}}\) (e.g., when \({\mathbb {G}}\) is a DAG, or we have a linear SEM-IE with directed cyclic graph \({\mathbb {G}}\)). If d-separation faithfulness holds, then CCI outputs a partially oriented MAAG of \({\mathbb {G}}\).

Fig. 10

a CCD outperforms CCI even when many (3–7) latent variables exist according to the corrected SHD metric. However, b CCI outperforms CCD when we change the metric to the SHD to the CCI oracle output. The same result holds with the original setup of 0–3 latent variables, as shown in (c)

Proof

Under d-separation faithfulness, \(O_i\) and \(O_j\) are d-separated given \(\varvec{W} \cup \varvec{S}\) with \(\varvec{W} \subseteq \varvec{O}{\setminus }\{O_i, O_j \}\) if and only if \(O_i \perp \!\!\!\perp O_j | \varvec{W} \cup \varvec{S}\). Hence, we may use the terms d-separation and conditional independence as well as d-connection and conditional dependence interchangeably.

Lemma 1 implies that an inducing path exists between \(O_i\) and \(O_j\) in a maximal ancestral graph if and only if \(O_i\) and \(O_j\) are conditionally dependent given every possible subset of \(\varvec{O} \setminus \{O_i, O_j\}\) combined with \(\varvec{S}\). Lemmas 2 and 3 imply that we can discover the inducing paths using subsets of \( PD-SEP (O_i)\) and \( PD-SEP (O_j)\). Hence, Step 1 of CCI is sound.

We can justify Steps 2 and 3 by invoking Lemma 5. Correctness of Step 5 follows by Lemma 6, and Step 6 by the contrapositive of Lemma 7. Finally, correctness of the orientation rules follows by invoking Lemmas 11, 12 and 13 for orientation rules 1–3, 4–5 and 6–7, respectively. \(\square \)
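The linear SEM-IE setting that the theorem covers is also easy to simulate: as footnote 1 notes, after drawing the independent errors \(\varvec{\varepsilon }\), we recover \(\varvec{X}\) by solving \(\varvec{X} = ({\mathbb {I}}-B)^{-1}\varvec{\varepsilon }\), even when \(B\) encodes a cycle. A minimal sketch with a hypothetical coefficient matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficient matrix B for X = B X + eps; B[i, j] is the
# coefficient of X_j in the equation for X_i. Entries B[1, 2] and B[2, 1]
# form a cycle between X_1 and X_2.
B = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.8, 0.0, 0.5, 0.0],
              [0.0, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.0]])

n = 1000
eps = rng.normal(size=(4, n))             # independent errors
X = np.linalg.solve(np.eye(4) - B, eps)   # X = (I - B)^{-1} eps

assert np.allclose(X, B @ X + eps)        # the structural equations hold
```

The final assertion verifies that every simulated sample satisfies the (cyclic) structural equations simultaneously, which is exactly the equilibrium solution that the fixed-point method converges to in the linear case.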

Appendix: Extra experimental results

1.1 More latent variables

We next ran experiments similar to those in Sect. 10.4, but drew the number of latent variables from 3 to 7 rather than 0 to 3 in order to analyze the effect of adding many latent variables. We also set the number of selection variables to zero, since CCD (or a slight variation of it) may be sound even with selection variables. We report the results in Fig. 10 as averaged over 400 random directed cyclic graphs. CCD still outperforms CCI according to the corrected SHD metric (Fig. 10a). However, if we change the metric from the corrected SHD to the CCI oracle SHD (i.e., the SHD to the output of the CCI oracle), then CCI outperforms CCD (Fig. 10b). The same results hold true with the original simulation experiments of Sect. 10.4 (Fig. 10c). We conclude that CCD outperforms CCI with the corrected SHD metric but that the reverse is true with the CCI oracle SHD metric.
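Both scores are SHD-style counts of disagreements between an estimated graph and a reference graph, where the reference is the ground truth for the corrected SHD and the oracle CCI output for the CCI oracle SHD. The exact definition of the corrected SHD is not reproduced here, so the endpoint encoding below is only a hypothetical illustration:

```python
# Illustrative endpoint-difference count between two partially oriented
# graphs. Entry g[i][j] encodes the mark at O_j on the edge between O_i and
# O_j: 0 = no edge, 1 = circle, 2 = arrowhead, 3 = tail (hypothetical codes).
def shd(g_est, g_ref):
    p = len(g_est)
    return sum(g_est[i][j] != g_ref[i][j]
               for i in range(p) for j in range(p) if i != j)

g_ref = [[0, 2, 0],
         [3, 0, 2],
         [0, 3, 0]]
g_est = [[0, 2, 0],
         [1, 0, 2],
         [0, 3, 0]]          # one circle where the reference has a tail
print(shd(g_est, g_ref))     # 1
```

Swapping `g_ref` between the ground-truth graph and the oracle CCI output is what distinguishes the two metrics in the comparison above.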

1.2 Precision and recall

We now report the precision and recall scores, where a positive is defined as a tail endpoint and a negative as an arrowhead. We summarize the results for the cyclic case in Fig. 11. CCI obtains higher precision than both CCD and CCD+OR; the results were significant at a Bonferroni-corrected level of 0.05/6 with sample sizes greater than 1000 (min \(t = 4.23\), \(p = 2.55\)E−5). Note that we exclude CCI–OR from the graph, since very few tails are oriented without application of the orientation rules. However, both CCI and CCI–OR underperform CCD and CCD+OR in recall at sample sizes greater than 500 (max \(t = -5.62\), \(p = 2.49\)E−8). We conclude that the ancestral relations CCI identifies are more accurate, although CCI identifies fewer of them. We hypothesize that this holds because CCD does not admit bidirected edges, so the algorithm may add tails too aggressively when latent variables exist.
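With tails as positives and arrowheads as negatives, the endpoint-level precision and recall can be sketched as follows; the integer endpoint codes and the treatment of circle marks as abstentions are assumptions made for illustration:

```python
# Endpoint-level precision and recall where a positive is a tail and a
# negative is an arrowhead. Circles in the estimate count as abstentions:
# they are never predicted positives but do count as missed true tails.
TAIL, ARROW = 3, 2

def tail_precision_recall(est, truth):
    tp = sum(e == TAIL and t == TAIL for e, t in zip(est, truth))
    fp = sum(e == TAIL and t == ARROW for e, t in zip(est, truth))
    fn = sum(e != TAIL and t == TAIL for e, t in zip(est, truth))
    return tp / (tp + fp), tp / (tp + fn)

# hypothetical flattened endpoint lists (1 = circle)
est   = [3, 3, 2, 2, 1]
truth = [3, 2, 3, 2, 3]
print(tail_precision_recall(est, truth))  # (0.5, 0.3333333333333333)
```

Under this convention, an algorithm that orients tails cautiously (like CCI) can score high precision and low recall simultaneously, which matches the pattern reported above.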

We summarize the results for the acyclic case in Fig. 12. Here, precision results were comparable. CCI performed significantly worse than FCI and RFCI in recall (max \(t = -3.52\), \(p=4.59\)E−4), although the effect size was always less than 0.03 on average. We conclude that the performances of CCI, FCI and RFCI are comparable in the acyclic case.

Fig. 11

Precision and recall in (a) and (b) according to the ground truth directed cyclic graphs. Positives correspond to tails and negatives to arrowheads. CCI outperforms all other algorithms in tail orientation with sample sizes greater than 1000

Fig. 12

Precision and recall in (a) and (b) according to the ground truth DAGs. All three algorithms perform comparably in this case

Fig. 13

a Sensitivity and b specificity of CCI versus ASP. We report timing results in (c). The ASP solver outperforms CCI with the smaller sample sizes, but CCI completes within a much shorter time frame

1.3 CCI versus ASP solver

We next compared CCI with the Clingo ASP solver-based approach proposed in [10]. We gave the solver the results of all pairwise Fisher’s z tests with conditioning set sizes of 0 and 1 using the same alpha values reported in Sect. 10.2. We also utilized the “log-weights” approach to weight each test result, since this procedure performs the best according to the original paper. We generated 1000 DCGs using the same approach mentioned in Sect. 10.1 but with only 7 variables and an expected neighborhood size of 1.5 because the ASP approach does not scale to many variables. We also do not include selection variables because the ASP approach was not designed to handle them.

Note that the ASP solver always outputs a fully oriented graph. The authors in the original paper [10] compared a partially oriented graph \(\widehat{{\mathbb {G}}}\) outputted by algorithms such as FCI with a fully oriented graph \({\mathbb {G}}^*\) by assigning each circle endpoint in \(\widehat{{\mathbb {G}}}\) to be either an arrowhead or a tail; the authors then compared all possible conditional independence/dependence relations implied by \(\widehat{{\mathbb {G}}}^*\) with those implied by \({\mathbb {G}}^*\). This approach is reasonable when dealing with MAGs because \(\widehat{{\mathbb {G}}}^*\) lies within the Markov equivalence class in the asymptotic sample limit, and every MAG in the Markov equivalence class shares the same conditional independence/dependence relations under faithfulness. However, the evaluation method does not generalize to MAAGs because we do not know if oracle CCI is complete; substituting tails or arrowheads for any circle endpoint in a partially oriented MAAG may introduce endpoints which are not present among any member of the Markov equivalence class.

We need an alternative way to compare a partially oriented MAAG to a (fully oriented) MAAG. To do this, we assessed sensitivity and specificity rather than precision and recall. Here, we first run the oracle version of CCI which will output a correct partially oriented MAAG with tails and arrowheads. We then compare the estimated arrowheads and tails of sample CCI and sample ASP with those of oracle CCI. This is a reasonable approach because every arrowhead in the oracle output of CCI is an arrowhead in all members of the Markov equivalence class; likewise for the tails. We can therefore compare a partially oriented output of the sample version of CCI to the fully oriented version of the ASP solver. In comparison, we may not be able to compute precision and recall accurately because the estimated arrowheads and tails of the fully oriented ASP output may not correspond to all members of the Markov equivalence class even in the asymptotic limit; this can place CCI at an unfair disadvantage.
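A sketch of this comparison: score the estimated endpoint marks against the oracle CCI endpoints, taking sensitivity over the oracle tails and specificity over the oracle arrowheads. Which mark counts as the positive class is an assumption here, since the text does not fix the convention:

```python
# Compare estimated endpoint marks against the oracle CCI output. Oracle
# circles are simply skipped, since they commit to neither mark. Endpoint
# codes (2 = arrowhead, 3 = tail) are hypothetical.
TAIL, ARROW = 3, 2

def sens_spec(est, oracle):
    tails  = [e for e, o in zip(est, oracle) if o == TAIL]
    arrows = [e for e, o in zip(est, oracle) if o == ARROW]
    sens = sum(e == TAIL for e in tails) / len(tails)    # oracle tails recovered
    spec = sum(e == ARROW for e in arrows) / len(arrows) # oracle arrowheads recovered
    return sens, spec

# hypothetical flattened endpoint lists
est    = [3, 2, 3, 2]
oracle = [3, 3, 2, 2]
print(sens_spec(est, oracle))  # (0.5, 0.5)
```

Because every oracle tail and arrowhead is shared by all members of the Markov equivalence class, these two rates penalize neither the partially oriented CCI output nor the fully oriented ASP output unfairly.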

We report the sensitivity and specificity results in Fig. 13. As expected, the ASP solver outperforms CCI on average with the smaller sample sizes, but ASP has a much longer mean run time (around 10–100 times longer); this replicates similar results seen in the acyclic case [10]. CCI, however, overcomes the accuracy disadvantage at larger sample sizes because CCI can utilize larger conditioning set sizes. In contrast, the ASP approach has trouble solving for an admissible graph within a short time frame when including many constraints. We conclude that CCI is less accurate but orders of magnitude faster than the ASP solver.

Cite this article

Strobl, E.V. A constraint-based algorithm for causal discovery with cycles, latent variables and selection bias. Int J Data Sci Anal 8, 33–56 (2019). https://doi.org/10.1007/s41060-018-0158-2
