Causal Inference by String Diagram Surgery
Abstract
Extracting causal relationships from observed correlations is a growing area in probabilistic reasoning, originating with the seminal work of Pearl and others from the early 1990s. This paper develops a new, categorically oriented view based on a clear distinction between syntax (string diagrams) and semantics (stochastic matrices), connected via interpretations as structure-preserving functors.
A key notion in the identification of causal effects is that of an intervention, whereby a variable is forcefully set to a particular value independent of any prior dependencies. We represent the effect of such an intervention as an endofunctor which performs ‘string diagram surgery’ within the syntactic category of string diagrams. This diagram surgery in turn yields a new, interventional distribution via the interpretation functor. While in general there is no way to compute interventional distributions purely from observed data, we show that this is possible in certain special cases using a calculational tool called comb disintegration.
We showcase this technique on a well-known example, predicting the causal effect of smoking on cancer in the presence of a confounding common cause. We then conclude by showing that this technique provides simple sufficient conditions for computing interventions which apply to a wide variety of situations considered in the causal inference literature.
Keywords
Causality, String diagrams, Probabilistic reasoning
1 Introduction
An important conceptual tool for distinguishing correlation from causation is the possibility of intervention. For example, a randomised drug trial attempts to destroy any confounding ‘common cause’ explanation for correlations between drug use and recovery by randomly assigning a patient to the control or treatment group, independent of any background factors. In an ideal setting, the observed correlations of such a trial will reflect genuine causal influence. Unfortunately, it is not always possible (or ethical) to ascertain causal effects by means of actual interventions. For instance, one is unlikely to get approval to run a clinical trial on whether smoking causes cancer by randomly assigning 50% of the patients to smoke, and waiting a bit to see who gets cancer. However, in certain situations it is possible to predict the effect of such a hypothetical intervention from purely observational data.
In this paper, we will focus on the problem of causal identifiability. For this problem, we are given observational data as a joint distribution on a set of variables and we are furthermore provided with a causal structure associated with those variables. This structure, which typically takes the form of a directed acyclic graph or some variation thereof, tells us which variables can in principle have a causal influence on others. The problem then becomes whether we can measure how strong those causal influences are, by means of computing an interventional distribution. That is, can we ascertain what would have happened if a (hypothetical) intervention had occurred?
Over the past three decades, a great deal of work has been done in identifying necessary and sufficient conditions for causal identifiability in various special cases, starting with very specific notions such as the back-door and front-door criteria [20] and progressing to more general necessary and sufficient conditions for causal identifiability based on the do-calculus [11], or combinatoric concepts such as confounded components in semi-Markovian models [25, 26].
This style of causal reasoning relies crucially on a delicate interplay between syntax and semantics, which is often not made explicit in the literature. The syntactic object of interest is the causal structure (e.g. a causal graph), which captures something about our understanding of the world, and the mechanisms which gave rise to some observed phenomena. The semantic object of interest is the data: joint and conditional probability distributions on some variables. Fixing a causal structure entails certain constraints on which probability distributions can arise, hence it is natural to see distributions satisfying those constraints as models of the syntax.
In this paper, we make this interplay precise using functorial semantics in the spirit of Lawvere [17], and develop basic syntactic and semantic tools for causal reasoning in this setting. We take as our starting point a functorial presentation of Bayesian networks similar to the one appearing in [7]. The syntactic role is played by string diagrams, which give an intuitive way to represent morphisms of a monoidal category as boxes plugged together by wires. Given a directed acyclic graph (dag) G, we can form a free category \(\mathsf {Syn}_{\scriptscriptstyle G} \) whose arrows are (formal) string diagrams which represent the causal structure syntactically. Structure preserving functors from \(\mathsf {Syn}_{\scriptscriptstyle G} \) to \(\mathsf {Stoch}\), the category of stochastic matrices, then correspond exactly to Bayesian networks based on the dag G.
Within this framework, we develop the notion of intervention as an operation of ‘string diagram surgery’. Intuitively, this cuts a string diagram at a certain variable, severing its link to the past. Formally, this is represented as an endofunctor on the syntactic category \(\mathsf {cut}_{\scriptscriptstyle X} :\mathsf {Syn}_{\scriptscriptstyle G} \rightarrow \mathsf {Syn}_{\scriptscriptstyle G} \), which propagates through a model \(\mathcal F :\mathsf {Syn}_{\scriptscriptstyle G} \rightarrow \mathsf {Stoch} \) to send observational probabilities \(\mathcal F(\omega )\) to interventional probabilities \(\mathcal F(\mathsf {cut}_{\scriptscriptstyle X}(\omega ))\).
The \(\mathsf {cut}_{\scriptscriptstyle X}\) endofunctor gives us a diagrammatic means of computing interventional distributions given complete knowledge of \(\mathcal F\). However, more interestingly, we can sometimes compute interventionals given only partial knowledge of \(\mathcal F\), namely some observational data. We show that this can also be done via a technique we call comb disintegration, which is a string diagrammatic version of a technique called c-factorisation introduced by Tian and Pearl [26]. Our approach generalises disintegration, a calculational tool whereby a joint state on two variables is factored into a single-variable state and a channel, representing the marginal and conditional parts of the distribution, respectively. Disintegration has recently been formulated categorically in [5] and using string diagrams in [4]. We take the latter as a starting point, but instead consider a factorisation of a three-variable state into a channel and a comb. The latter is a special kind of map which allows inputs and outputs to be interleaved. Combs were originally studied in the context of quantum communication protocols, seen as games [8], but have recently been used extensively in the study of causally-ordered quantum [3, 21] and generalised [15] processes. While originally imagined for quantum processes, the categorical formulation given in [15] makes sense in both the classical case (\(\mathsf {Stoch}\)) and the quantum case. Much like Tian and Pearl’s technique, comb factorisation allows one to characterise when the confounding parts of a causal structure are suitably isolated from each other, then exploit that isolation to perform the concrete calculation of interventional distributions.
However, unlike in the traditional formulation, the syntactic and semantic aspects of causal identifiability within our framework exactly mirror one another. Namely, we can give conditions for causal identifiability in terms of the factorisation of a morphism in \(\mathsf {Syn}_{\scriptscriptstyle G}\), whereas the actual concrete computation of the interventional distribution involves factorisation of its interpretation in \(\mathsf {Stoch}\). Thanks to the functorial semantics, the former immediately implies the latter.
To introduce the framework, we make use of a running example taken from Pearl’s book [20]: identifying the causal effect of smoking on cancer with the help of an auxiliary variable (the presence of tar in the lungs). After providing some preliminaries on stochastic matrices and the functorial presentation of Bayesian networks in Sects. 2 and 3, we introduce the smoking example in Sect. 4. In Sect. 5 we formalise the notion of intervention as string diagram surgery, and in Sect. 6 we introduce combs and prove our main calculational result: the existence and uniqueness of comb factorisations. In Sect. 7, we show how to apply this theorem in computing the interventional distribution in the smoking example, and in Sect. 8, we show how this theorem can be applied in a more general case which captures (and slightly generalises) the conditions given in [26]. In Sect. 9, we conclude and describe several avenues of future work.
2 Stochastic Matrices and Conditional Probabilities
Symmetric monoidal categories (SMCs) give a very general setting for studying processes which can be composed in sequence (via the usual categorical composition \(\circ \)) and in parallel (via the monoidal composition \(\otimes \)). Throughout this paper, we will use string diagram notation [24] for depicting composition of morphisms in an SMC. In this notation, morphisms are depicted as boxes with labelled input and output wires, composition \(\circ \) as ‘plugging’ boxes together, and the monoidal product \(\otimes \) as placing boxes side-by-side. Identity morphisms are depicted simply as a wire and the unit I of \(\otimes \) as the empty diagram. The ‘symmetric’ part of the structure consists of symmetry morphisms, which enable us to permute inputs and outputs arbitrarily. We depict these as wire-crossings. Morphisms whose domain is I are called states, and they will play a special role throughout this paper.
Composition of stochastic matrices is matrix multiplication. In terms of conditional probabilities, that is multiplication followed by marginalization over the shared variable: \(\sum _B P(C|B)P(B|A)\). Identities are thus given by identity matrices, which we will often express in terms of the Kronecker delta function \(\varvec{\delta }_i^j\).
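As a concrete illustration of composition by matrix multiplication, the following minimal Python sketch composes two small stochastic matrices. It assumes the convention that a morphism \(\varvec{f} :A \rightarrow B\) is stored as a matrix with entry f[b][a] = P(b|a), so that every column sums to 1; the helper names and the numbers are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as Fr

def compose(g, f):
    """Return g∘f, assuming f[b][a] = P(b|a) and g[c][b] = P(c|b)."""
    rows, inner, cols = len(g), len(f), len(f[0])
    return [[sum(g[c][b] * f[b][a] for b in range(inner)) for a in range(cols)]
            for c in range(rows)]

def identity(n):
    """Identity matrix, i.e. the Kronecker delta."""
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

def is_stochastic(m):
    """Non-negative entries and every column summing to 1."""
    return all(x >= 0 for row in m for x in row) and all(sum(col) == 1 for col in zip(*m))

# Hypothetical channels on two-element sets.
f = [[Fr(9, 10), Fr(1, 5)],
     [Fr(1, 10), Fr(4, 5)]]   # P(B|A)
g = [[Fr(1, 2), Fr(1, 4)],
     [Fr(1, 2), Fr(3, 4)]]    # P(C|B)

gf = compose(g, f)            # P(C|A) = sum over B of P(C|B) P(B|A)
```

Exact fractions are used so that the column sums can be compared with 1 without rounding error.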
Definition 2.1
We assume that the CDU structure on I is trivial and the CDU structure on \(A \otimes B\) is constructed in the obvious way from the structure on A and B. We also use the first equation in (2) to justify writing ‘copy’ maps with arbitrarily many output wires.
Similar to [2], we can form the free CDU category \(\mathsf {FreeCDU}(X,\varSigma )\) over a pair \((X, \varSigma )\) of a generating set of objects X and a generating set \(\varSigma \) of typed morphisms \(f :u \rightarrow w\), with \(u,w \in X^{\star }\), as follows. The category \(\mathsf {FreeCDU}(X,\varSigma )\) has \(X^{\star }\) as its set of objects, and as morphisms the string diagrams constructed from the elements of \(\varSigma \) and the copy, discard and uniform-state maps for each \(x \in X\), taken modulo the equations (2).
Lemma 2.2
\(\mathsf {Stoch} \) is a CDU category, with CDU structure defined as in (1).
Proposition 2.3
Note that Eq. (3) and the CDU rules immediately imply that the unique \(\varvec{a} :I \rightarrow A\) in Proposition 2.3 is the marginal of \(\varvec{\omega }\) onto A.
3 Bayesian Networks as String Diagrams
Bayesian networks are a widely-used tool in probabilistic reasoning. They give a succinct representation of conditional (in)dependences between variables as a directed acyclic graph. Traditionally, a Bayesian network on a set of variables \(A, B, C, \ldots \) is defined as a directed acyclic graph (dag) G, an assignment of sets to each of the nodes \(V_G := \{A, B, C, \ldots \}\) of G and a joint probability distribution over those variables which factorises as \(P(V_G) = \prod _{A \in V_G} P(A\,|\,\text {Pa}(A))\) where \(\text {Pa}(A)\) is the set of parents of A in G. Any joint distribution that factorises this way is said to satisfy the global Markov property with respect to the dag G. Alternatively, a Bayesian network can be seen as a dag equipped with a set of conditional probabilities \(\{ P(A \,|\, \text {Pa}(A)) \mid A \in V_G \}\) which can be combined to form the joint state. Thanks to disintegration, these two perspectives are equivalent.
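The equivalence of the two perspectives can be checked on a toy example. The sketch below assumes a hypothetical dag with edges A → B, A → C and B → C over binary variables, with made-up conditional probabilities; it builds the joint state from the conditionals and then recovers one of them again by disintegration (marginalise, then divide).

```python
from fractions import Fraction as Fr
from itertools import product

BITS = (0, 1)

# Hypothetical conditional probabilities for the dag A -> B, A -> C, B -> C.
p_a = {0: Fr(1, 2), 1: Fr(1, 2)}
p_b_given_a = {(0, 0): Fr(3, 4), (1, 0): Fr(1, 4),
               (0, 1): Fr(1, 4), (1, 1): Fr(3, 4)}          # key is (b, a)
p_c1 = {(0, 0): Fr(1, 10), (0, 1): Fr(1, 2),
        (1, 0): Fr(2, 5), (1, 1): Fr(9, 10)}                # P(C=1 | a, b)

def p_c_given_ab(c, a, b):
    return p_c1[(a, b)] if c == 1 else 1 - p_c1[(a, b)]

# Joint state: product of P(X | Pa(X)) over all variables X.
joint = {(a, b, c): p_a[a] * p_b_given_a[(b, a)] * p_c_given_ab(c, a, b)
         for a, b, c in product(BITS, repeat=3)}

# Disintegration: recover the marginal on A and the channel P(B | A).
marg_a = {a: sum(joint[(a, b, c)] for b in BITS for c in BITS) for a in BITS}
marg_ab = {(a, b): sum(joint[(a, b, c)] for c in BITS) for a in BITS for b in BITS}
recovered_b_given_a = {(b, a): marg_ab[(a, b)] / marg_a[a] for a in BITS for b in BITS}
```

The division is only defined when the marginal on A has full support, which holds for the numbers chosen here.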
By ‘a string diagram in \(\mathsf {Stoch}\)’, we mean not only the stochastic matrix itself, but also its decomposition into components. We can formalise exactly what we mean by taking a perspective on Bayesian networks which draws inspiration from Lawvere’s functorial semantics of algebraic theories [16]. In this perspective, which elaborates on [7, Ch. 4], we maintain a conceptual distinction between the purely syntactic object (the diagram) and its probabilistic interpretation.
Proposition 3.1
There is a 1-1 correspondence between Bayesian networks based on the dag G and CDU functors of type \(\mathsf {Syn}_{\scriptscriptstyle G} \rightarrow \mathsf {Stoch} \).
We refer to [12] for a proof. This proposition justifies the following definition of a category \(\mathsf {BN}_{\scriptscriptstyle {G}} \) of G-based Bayesian networks: objects are CDU functors \(\mathsf {Syn}_{\scriptscriptstyle G} \rightarrow \mathsf {Stoch} \) and arrows are monoidal natural transformations between them.
4 Towards Causal Inference: The Smoking Scenario
We can again imagine intervening at S by replacing \(\varvec{s} : H \rightarrow S\) by the map that discards its input and outputs the uniform state on S. Again, this ‘cutting’ of the diagram will result in a new interventional distribution \(\varvec{\omega }'\). However, unlike before, it is possible to compute this distribution from the observational distribution \(\varvec{\omega }\).
However, in order to do that, we first need to develop the appropriate categorical framework. In Sect. 5, we will model ‘cutting’ as a functor. In Sect. 6, we will introduce a generalisation of disintegration, which we call comb disintegration. These tools will enable us to compute \(\varvec{\omega }'\) from \(\varvec{\omega }\) in Sect. 7.
5 Interventional Distributions as Diagram Surgery
The goal of this section is to define the ‘cut’ operation in (7) as an endofunctor on the category of Bayesian networks. First, we observe that such an operation exclusively concerns the string diagram part of a Bayesian network: following the functorial semantics given in Sect. 3, it is thus appropriate to define cut as an endofunctor on \(\mathsf {Syn}_{\scriptscriptstyle G} \), for a given dag G.
Definition 5.1

For each object \(B \in V_G\), \(\mathsf {cut}_{\scriptscriptstyle A}(B) = B\).

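For each box with output wire of type A, \(\mathsf {cut}_{\scriptscriptstyle A}\) replaces the box by discarding maps on its inputs followed by the uniform state on A, and \(\mathsf {cut}_{\scriptscriptstyle A}\) leaves any other box unchanged (diagrams omitted).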
Intuitively, \(\mathsf {cut}_{\scriptscriptstyle A}\) applied to a string diagram f of \(\mathsf {Syn}_{\scriptscriptstyle G} \) removes from f each occurrence of a box with output wire of type A.
Proposition 3.1 allows us to “transport” the cutting operation over to Bayesian networks. Given any Bayesian network based on G, let \(\mathcal {F}:\mathsf {Syn}_{\scriptscriptstyle G} \rightarrow \mathsf {Stoch} \) be the corresponding CDU functor given by Proposition 3.1. Then, we can define its A-cutting as the Bayesian network identified by the CDU functor \(\mathcal {F}\circ \mathsf {cut}_{\scriptscriptstyle A}\). This yields an (idempotent) endofunctor \(\mathsf {Cut}_{\scriptscriptstyle A} :\mathsf {BN}_{\scriptscriptstyle {G}} \rightarrow \mathsf {BN}_{\scriptscriptstyle {G}} \).
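At the level of a concrete Bayesian network, cutting can be sketched as replacing one conditional probability table by a table that ignores its parents and returns the uniform distribution. The Python sketch below is an illustration under stated assumptions and not the construction from the paper's accompanying code: the dag H → S, H → C, S → C, the numbers, and the names `cpt`, `cut` and `joint` are all hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

BITS = (0, 1)
ORDER = ("H", "S", "C")                       # a topological order of the dag
PARENTS = {"H": (), "S": ("H",), "C": ("H", "S")}

def cpt(p_one):
    """Expand {parent values: P(var=1 | parents)} into {(value, parent values): prob}."""
    table = {}
    for pa, p in p_one.items():
        table[(1, pa)] = p
        table[(0, pa)] = 1 - p
    return table

# Hypothetical model: H confounds S and C.
model = {
    "H": cpt({(): Fr(1, 2)}),
    "S": cpt({(0,): Fr(1, 5), (1,): Fr(4, 5)}),
    "C": cpt({(0, 0): Fr(1, 10), (0, 1): Fr(1, 5), (1, 0): Fr(1, 2), (1, 1): Fr(3, 5)}),
}

def cut(cpts, var):
    """Replace the table of `var` by one that ignores the parents and is uniform."""
    new = dict(cpts)
    new[var] = {key: Fr(1, len(BITS)) for key in cpts[var]}
    return new

def joint(cpts):
    """Interpret the network as a joint distribution over (H, S, C)."""
    dist = {}
    for values in product(BITS, repeat=len(ORDER)):
        world = dict(zip(ORDER, values))
        prob = Fr(1)
        for var in ORDER:
            pa = tuple(world[p] for p in PARENTS[var])
            prob *= cpts[var][(world[var], pa)]
        dist[values] = prob
    return dist

def cond_c_given_s(dist, s, c):
    num = sum(p for (_, s_, c_), p in dist.items() if s_ == s and c_ == c)
    den = sum(p for (_, s_, c_), p in dist.items() if s_ == s)
    return num / den

observed = cond_c_given_s(joint(model), 1, 1)               # P(C=1 | S=1)
intervened = cond_c_given_s(joint(cut(model, "S")), 1, 1)   # after cutting S
```

With these numbers the observational conditional is 13/25 while the conditional in the cut network is 2/5, and applying `cut` twice gives the same network as applying it once, matching the idempotence noted above.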
6 The Comb Factorisation
In order to do that, we want to work in a setting where \(\varvec{t} :S \rightarrow T\) can be isolated and ‘extracted’ from (8). What is left behind is a stochastic matrix with a ‘hole’ where \(\varvec{t}\) has been extracted. To define ‘morphisms with holes’, it is convenient to pass from SMCs to compact closed categories (see e.g. [24]). \(\mathsf {Stoch} \) is not itself compact closed, but it embeds into \(\mathsf {Mat}(\mathbb R^{\scriptscriptstyle +}) \), whose morphisms are all matrices over the non-negative real numbers. \(\mathsf {Mat}(\mathbb R^{\scriptscriptstyle +}) \) has a (self-dual) compact closed structure; that means, for any set A there is a ‘cap’ \(\cap :A \otimes A \rightarrow I\) and a ‘cup’ \(\cup :I \rightarrow A \otimes A\), which satisfy the ‘yanking’ equations. As matrices, caps and cups are defined by \(\cap _{ij} = \cup ^{ij} = \delta _i^j\). Intuitively, they amount to ‘bent’ identity wires. Another aspect of \(\mathsf {Mat}(\mathbb R^{\scriptscriptstyle +}) \) that is useful to recall is the following handy characterisation of the subcategory \(\mathsf {Stoch} \).
Lemma 6.1
A morphism \(\varvec{f} :A \rightarrow B\) in \(\mathsf {Mat}(\mathbb R^{\scriptscriptstyle +}) \) is a stochastic matrix (thus a morphism of \(\mathsf {Stoch} \)) if and only if (3) holds.
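The matrix description of caps and cups, and the reason they live outside \(\mathsf {Stoch} \), can be checked directly. The sketch below assumes the column-stochastic convention (columns index inputs) and reads the finality equation as ‘discarding the output of a map equals discarding its input’, i.e. every column sums to 1; all helper names are hypothetical.

```python
def matmul(g, f):
    """Sequential composition g∘f as a matrix product."""
    return [[sum(g[i][k] * f[k][j] for k in range(len(f))) for j in range(len(f[0]))]
            for i in range(len(g))]

def kron(m, n):
    """Parallel composition m ⊗ n as a Kronecker product."""
    return [[x * y for x in m_row for y in n_row] for m_row in m for n_row in n]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def cup(n):
    """I -> A ⊗ A: a column vector with entries delta_ij."""
    return [[int(i == j)] for i in range(n) for j in range(n)]

def cap(n):
    """A ⊗ A -> I: a row vector with entries delta_ij."""
    return [[int(i == j) for i in range(n) for j in range(n)]]

def discard(n):
    """A -> I: the row vector of ones."""
    return [[1] * n]

def is_stochastic(f):
    """Non-negative, and discarding after f equals discarding before f."""
    nonneg = all(x >= 0 for row in f for x in row)
    return nonneg and matmul(discard(len(f)), f) == discard(len(f[0]))

n = 2
# One of the yanking equations: (cap ⊗ id) ∘ (id ⊗ cup) = id.
yank = matmul(kron(cap(n), identity(n)), kron(identity(n), cup(n)))
```

Here `yank` equals the identity matrix, whereas `cup(n)` fails the stochasticity check because its single column sums to n rather than 1.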
A suitable notion of ‘stochastic map with a hole’ is provided by a comb. These structures originate in the study of certain kinds of quantum channels [3].
Definition 6.2
This definition extends inductively to n-combs, where we require that discarding the rightmost output yields a diagram (omitted here) built from some \((n-1)\)-comb \(\varvec{f}'\). However, for our purposes, restricting to 2-combs will suffice.
Importantly, for generic \(\varvec{f}\) and \(\varvec{g}\) of \(\mathsf {Stoch} \), there is no guarantee that forming the composite (11) in \(\mathsf {Mat}(\mathbb R^{\scriptscriptstyle +}) \) yields a valid \(\mathsf {Stoch} \)-morphism, i.e. a morphism satisfying the finality Eq. (3). However, if \(\varvec{f}\) is a 2-comb and \(\varvec{g}\) is a \(\mathsf {Stoch} \)-morphism, Eq. (9) enables a discarding map plugged into the output \(B_2\) in (11) to ‘fall through’ the right side of \(\varvec{f}\), which guarantees that the composed map satisfies the finality equation for discarding. See [12] for the explicit diagram calculation.
With the concept of 2-combs in hand, we can state our factorisation result.
Theorem 6.3
Proof
The construction of \(\varvec{f}\) and \(\varvec{g}\) mimics the one of c-factors in [26], using string diagrams and (diagrammatic) disintegration. We first use \(\varvec{\omega }\) to construct maps \(\varvec{a} : I \rightarrow A, \varvec{b} : A \rightarrow B\), \(\varvec{c} : A \otimes B \rightarrow C\), then construct \(\varvec{f}\) using \(\varvec{a}\) and \(\varvec{c}\) and construct \(\varvec{g}\) using \(\varvec{b}\). For the full proof, including uniqueness, see [12].\(\square \)
Note that Theorem 6.3 generalises the normal disintegration property given in Proposition 2.3. The latter is recovered by taking \(A := I\) (or \(C := I\)) above.
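The construction outlined in the proof can be sketched numerically. The following Python code is a sketch under assumptions that the text above does not spell out: the state \(\varvec{\omega }\) is a full-support joint distribution on binary A, B, C with made-up weights, the comb is taken to have type \(B \rightarrow A \otimes C\) with entries \(P(a)P(c|a,b)\), and the channel is \(P(b|a)\). All variable names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

BITS = (0, 1)

# Hypothetical full-support joint state omega on (A, B, C).
weights = [1, 2, 3, 4, 5, 6, 7, 8]
omega = {abc: Fr(w, sum(weights))
         for abc, w in zip(product(BITS, repeat=3), weights)}

# Step 1: disintegrate omega into a : I -> A, b : A -> B, c : A ⊗ B -> C.
p_a = {a: sum(omega[(a, b, c)] for b in BITS for c in BITS) for a in BITS}
p_ab = {(a, b): sum(omega[(a, b, c)] for c in BITS) for a in BITS for b in BITS}
chan_b = {(b, a): p_ab[(a, b)] / p_a[a] for a in BITS for b in BITS}
chan_c = {(c, a, b): omega[(a, b, c)] / p_ab[(a, b)]
          for a, b, c in product(BITS, repeat=3)}

# Step 2: the comb f is built from a and c, the channel g from b.
comb_f = {(a, c, b): p_a[a] * chan_c[(c, a, b)] for a, b, c in product(BITS, repeat=3)}
g = chan_b

def plug(comb, channel):
    """Insert a channel A -> B into the hole of the comb, giving a state on (A, B, C)."""
    return {(a, b, c): comb[(a, c, b)] * channel[(b, a)]
            for a, b, c in product(BITS, repeat=3)}

# Comb condition: after discarding C, the A output no longer depends on the input B.
comb_condition = all(sum(comb_f[(a, c, b)] for c in BITS) == p_a[a]
                     for a in BITS for b in BITS)

# Plugging a different channel into the same comb still yields a normalised state.
uniform_channel = {(b, a): Fr(1, 2) for a in BITS for b in BITS}
other_state = plug(comb_f, uniform_channel)
```

Plugging `g` back into `comb_f` reproduces `omega` exactly, and replacing `g` by another channel keeps the total probability at 1, which illustrates why the comb condition matters for composites being stochastic.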
7 Returning to the Smoking Scenario
Note that this conclusion depends entirely on the particular observational data that we picked. For a different interpretation of \(\varvec{\omega }\) in \(\mathsf {Stoch} \), one might conclude that there is no causal connection, or even that smoking decreases the chance of getting cancer. Interestingly, all three cases can arise even when a naïve analysis of the data shows a strong direct correlation between S and C. To see and/or experiment with these cases, we have provided the Python code^{2} used to perform these calculations. See also [19] for a pedagogical overview of this example (using traditional Bayesian network language) with some sample calculations.
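For readers who want to experiment, here is a small self-contained sketch of this kind of calculation. It is not the authors' code, and the numbers are made up rather than taken from the paper: it assumes a hidden common cause H of S and C, a mediating variable T with S → T → C, and all variables binary. The interventional probability is computed twice, once from the observational state on (S, T, C) alone, using the comb-style factorisation with \(P(s)P(c|s,t)\) and \(P(t|s)\), and once from the full model with S cut.

```python
from fractions import Fraction as Fr
from itertools import product

BITS = (0, 1)

def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1 - p_one

# Hypothetical ground-truth model, including the hidden variable H.
P_H1 = Fr(1, 2)
P_S1 = {0: Fr(1, 5), 1: Fr(4, 5)}                    # P(S=1 | h)
P_T1 = {0: Fr(1, 10), 1: Fr(9, 10)}                  # P(T=1 | s)
P_C1 = {(0, 0): Fr(1, 10), (0, 1): Fr(3, 10),
        (1, 0): Fr(1, 2), (1, 1): Fr(7, 10)}         # P(C=1 | h, t)

# Observational state on (S, T, C): H is marginalised out.
omega = {(s, t, c): sum(bern(P_H1, h) * bern(P_S1[h], s)
                        * bern(P_T1[s], t) * bern(P_C1[(h, t)], c) for h in BITS)
         for s, t, c in product(BITS, repeat=3)}

p_s = {s: sum(omega[(s, t, c)] for t in BITS for c in BITS) for s in BITS}
p_st = {(s, t): sum(omega[(s, t, c)] for c in BITS) for s in BITS for t in BITS}

def from_observations(s, c):
    """P(c | do(s)) computed from omega only."""
    total = Fr(0)
    for t in BITS:
        t_given_s = p_st[(s, t)] / p_s[s]
        # Comb with its S output discarded: sum over s2 of P(s2) P(c | s2, t).
        rest = sum(p_s[s2] * omega[(s2, t, c)] / p_st[(s2, t)] for s2 in BITS)
        total += t_given_s * rest
    return total

def from_cut_model(s, c):
    """P(c | do(s)) computed from the full model with the box for S cut."""
    return sum(bern(P_H1, h) * bern(P_T1[s], t) * bern(P_C1[(h, t)], c)
               for h in BITS for t in BITS)

naive = sum(omega[(1, t, 1)] for t in BITS) / p_s[1]   # P(C=1 | S=1)
```

With these hypothetical numbers the two interventional computations agree exactly, while the naive conditional probability differs from both.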
8 The General Case for a Single Intervention
While we applied the comb decomposition to a particular example, this technique applies essentially unmodified to many examples where we intervene at a single variable (called X below) within an arbitrary causal structure.
Theorem 8.1
Proof
This is general enough to cover several well-known sufficient conditions from the causality literature, including single-variable versions of the so-called front-door and back-door criteria, as well as the sufficient condition based on confounding paths given by Pearl and Tian [26]. As the latter subsumes the other two, we will say a few words about the relationship between the Pearl/Tian condition and Theorem 8.1. In [26], the authors focus on semi-Markovian models, where the only latent variables have exactly two observed children and no parents. If we write \(A \leftrightarrow B\) whenever two observed variables are connected by a latent common cause, then one can characterise confounding paths as the transitive closure of \(\leftrightarrow \). They go on to show that the interventional distribution corresponding to cutting X is computable whenever there are no confounding paths connecting X to one of its children.
We can compare this to the form of expression \(\omega \) in Eq. (14). First, note this factorisation implies that all boxes which take X as an input must occur as subdiagrams of g. Hence, any ‘confounding path’ connecting X to its children would yield at least one (uncopied) wire from \(f_1\) to g, hence it cannot be factored as (14). Conversely, if there are no confounding paths from X to its children, then we can place the boxes involved in any other confounding path either entirely inside of g or entirely outside of g and obtain factorisation (14). Hence, restricting to semi-Markovian models, the no-confounding-path condition from [26] is equivalent to ours. However, Theorem 8.1 is slightly more general: its formulation does not rely on the causal structure \(\omega \) being semi-Markovian.
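The graph-theoretic side of the confounding-path condition, as paraphrased above, is easy to check mechanically. The sketch below assumes a semi-Markovian model given by a map from each observed variable to its children together with a list of \(\leftrightarrow \) links; the function names and the example graphs are hypothetical.

```python
def confounded_component(latent_links, start):
    """All variables reachable from `start` via the transitive closure of <->."""
    neighbours = {}
    for u, v in latent_links:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def no_confounding_path_to_children(children, latent_links, x):
    """True when no confounding path connects x to any of its children."""
    return confounded_component(latent_links, x).isdisjoint(children.get(x, ()))

# Front-door shape: S -> T -> C with a latent common cause of S and C.
front_door = no_confounding_path_to_children({"S": {"T"}, "T": {"C"}}, [("S", "C")], "S")

# Directly confounded shape: S -> C with a latent common cause of S and C.
confounded = no_confounding_path_to_children({"S": {"C"}}, [("S", "C")], "S")
```

For the front-door shape the check succeeds, while for the directly confounded shape it fails, in line with the two smoking scenarios discussed earlier.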
9 Conclusion and Future Work
This paper takes a fresh, systematic look at the problem of causal identifiability. By clearly distinguishing syntax (string diagram surgery and identification of comb shapes) and semantics (comb disintegration of joint states) we obtain a clear methodology for computing interventional distributions, and hence causal effects, from observational data.
A natural next step is moving beyond single-variable interventions to the general case, i.e. situations where we allow interventions on multiple variables which may have some arbitrary causal relationships connecting them. This would mean extending the comb factorisation Theorem 6.3 from a 2-comb and a channel to arbitrary n-combs. This seems to be straightforward, via an inductive extension of the proof of Theorem 6.3. A more substantial direction of future work will be the strengthening of Theorem 8.1 from sufficient conditions for causal identifiability to a full characterisation. Indeed, the related condition based on confounding paths from [26] is a necessary and sufficient condition for computing the interventional distribution on a single variable. Hence, it will be interesting to formalise this necessity proof (and more general versions, e.g. [10]) within our framework and investigate, for example, the extent to which it holds beyond the semi-Markovian case.
While we focus exclusively on the case of taking models in \(\mathsf {Stoch}\) in this paper, the techniques we gave are posed at an abstract level in terms of composition and factorisation. Hence, we are optimistic about their prospects to generalise to other probabilistic (e.g. infinite discrete and continuous variables) and quantum settings. In the latter case, this could provide insights into the emerging field of quantum causal structures [6, 9, 18, 22, 23], which attempts in part to replay some of the results coming from statistical causal reasoning, but where quantum processes play a role analogous to stochastic ones. A key difficulty in applying our framework to a category of quantum processes, rather than \(\mathsf {Stoch}\), is the unavailability of ‘copy’ morphisms due to the quantum no-cloning theorem [27]. However, a recent proposal for the formulation of ‘quantum common causes’ [1] suggests a (partially-defined) analogue to the role played by ‘copy’ in our formulation constructed via multiplication of certain commuting Choi matrices. Hence, it may yet be possible to import results from classical causal reasoning into the quantum case just by changing the category of models.
Acknowledgements
FZ acknowledges support from EPSRC grant EP/R020604/1. AK would like to thank Tom Claassen for useful discussions on causal identification criteria.
References
 1. Allen, J.M.A., Barrett, J., Horsman, D.C., Lee, C.M., Spekkens, R.W.: Quantum common causes and quantum causal models. Phys. Rev. X 7, 031021 (2017)
 2. Bonchi, F., Sobociński, P., Zanasi, F.: Deconstructing Lawvere with distributive laws. J. Log. Algebr. Meth. Program. 95, 128–146 (2018)
 3. Chiribella, G., D’Ariano, G.M., Perinotti, P.: Quantum circuit architecture. Phys. Rev. Lett. 101, 060401 (2008)
 4. Cho, K., Jacobs, B.: Disintegration and Bayesian inversion, both abstractly and concretely (2017). arxiv.org/abs/1709.00322
 5. Clerc, F., Danos, V., Dahlqvist, F., Garnier, I.: Pointless learning. In: Esparza, J., Murawski, A.S. (eds.) FoSSaCS 2017. LNCS, vol. 10203, pp. 355–369. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54458-7_21
 6. Costa, F., Shrapnel, S.: Quantum causal modelling. New J. Phys. 18(6), 063032 (2016)
 7. Fong, B.: Causal theories: a categorical perspective on Bayesian networks. Master’s thesis, University of Oxford (2012). arxiv.org/abs/1301.6201
 8. Gutoski, G., Watrous, J.: Toward a general theory of quantum games. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pp. 565–574. ACM (2007)
 9. Henson, J., Lal, R., Pusey, M.F.: Theory-independent limits on correlations from generalized Bayesian networks. New J. Phys. 16(11), 113043 (2014)
 10. Huang, Y., Valtorta, M.: On the completeness of an identifiability algorithm for semi-Markovian models. Ann. Math. Artif. Intell. 54(4), 363–408 (2008)
 11. Huang, Y., Valtorta, M.: Pearl’s calculus of intervention is complete. CoRR, abs/1206.6831 (2012)
 12. Jacobs, B., Kissinger, A., Zanasi, F.: Causal inference by string diagram surgery. CoRR, abs/1811.08338 (2018)
 13. Jacobs, B., Zanasi, F.: A predicate/state transformer semantics for Bayesian learning. Electr. Notes Theor. Comput. Sci. 325, 185–200 (2016)
 14. Jacobs, B., Zanasi, F.: The logical essentials of Bayesian reasoning. CoRR, abs/1804.01193 (2018)
 15. Kissinger, A., Uijlen, S.: A categorical semantics for causal structure. In: 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, Reykjavik, Iceland, 20–23 June 2017, pp. 1–12 (2017)
 16. Lawvere, F.W.: Ordinal sums and equational doctrines. In: Eckmann, B. (ed.) Seminar on Triples and Categorical Homology Theory. LNM, vol. 80, pp. 141–155. Springer, Heidelberg (1969). https://doi.org/10.1007/BFb0083085
 17. Lawvere, F.W.: Functorial semantics of algebraic theories. Proc. Natl. Acad. Sci. U.S.A. 50(5), 869 (1963)
 18. Leifer, M.S., Spekkens, R.W.: Towards a formulation of quantum theory as a causally neutral theory of Bayesian inference. Phys. Rev. A 88, 052130 (2013)
 19. Nielsen, M.: If correlation doesn’t imply causation, then what does? http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does. Accessed 15 Nov 2018
 20. Pearl, J.: Causality: Models, Reasoning and Inference. Cambridge University Press, Cambridge (2000)
 21. Perinotti, P.: Causal structures and the classification of higher order quantum computations (2016)
 22. Pienaar, J., Brukner, Č.: A graph-separation theorem for quantum causal models. New J. Phys. 17(7), 073020 (2015)
 23. Ried, K., Agnew, M., Vermeyden, L., Janzing, D., Spekkens, R.W., Resch, K.J.: A quantum advantage for inferring causal structure. Nat. Phys. 11, 1745–2473 (2015)
 24. Selinger, P.: A survey of graphical languages for monoidal categories. In: Coecke, B. (ed.) New Structures for Physics. LNP, vol. 813. Springer, Heidelberg (2011)
 25. Shpitser, I., Pearl, J.: Identification of joint interventional distributions in recursive semi-Markovian causal models. In: Proceedings of the National Conference on Artificial Intelligence, vol. 21, p. 1219. AAAI Press/MIT Press, Menlo Park/Cambridge (1999/2006)
 26. Tian, J., Pearl, J.: A general identification condition for causal effects. In: Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, 28 July–1 August 2002, Edmonton, Alberta, Canada, pp. 567–573 (2002)
 27. Wootters, W.K., Zurek, W.H.: A single quantum cannot be cloned. Nature 299(5886), 802–803 (1982)
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.