Staged trees and asymmetry-labeled DAGs

Bayesian networks are a widely-used class of probabilistic graphical models capable of representing symmetric conditional independence between variables of interest using the topology of the underlying graph. For categorical variables, they can be seen as a special case of the much more general class of models called staged trees, which can represent any type of non-symmetric conditional independence. Here we formalize the relationship between these two models and introduce a minimal Bayesian network representation of the staged tree, which can be used to read conditional independences in an intuitive way. A new labeled graph termed asymmetry-labeled directed acyclic graph is defined, whose edges are labeled to denote the type of dependence existing between any two random variables. We also present a novel algorithm to learn staged trees which only enforces a specific subset of non-symmetric independences. Various datasets are used to illustrate the methodology, highlighting the need to construct models which more flexibly encode and represent non-symmetric structures.


Introduction
Probabilistic graphical models give an intuitive and efficient representation of the relationships existing between random variables of interest. Bayesian networks (BNs) (see e.g. Darwiche, 2009) are the most commonly used graphical model and have been applied in a variety of real-world applications. One of the main limitations of BNs is that they can only represent symmetric conditional independences, which in practice can be too restrictive.
For this reason, Boutilier et al. (1996) introduced the notion of context-specific independence, meaning that independences hold only for specific values, or contexts, of the conditioning variables. Extensions of BNs encoding context-specific independences are usually defined by associating a tree representation to each vertex of the network (Cano et al., 2012; Friedman and Goldszmidt, 1996; Talvitie et al., 2019), by labeling the edges (Pensar et al., 2015; Hyttinen et al., 2018), or by using some alternative approach (Chickering et al., 1997; Geiger and Heckerman, 1996; Poole and Zhang, 2003). In recent years there has been a growing interest in formalizing context-specific independence (Corander et al., 2019; Shen et al., 2020) and in generalizing other graphical models with non-symmetric dependencies (Nyman et al., 2016; Pensar et al., 2017).
With the exception of Jaeger et al. (2006) and Pensar et al. (2015), BNs embellished with context-specific independence lose their intuitiveness since all the model information cannot be succinctly represented in a unique graph. Staged trees (Smith and Anderson, 2008; Collazo et al., 2018) are probabilistic graphical models that, starting from an event tree, represent non-symmetric conditional independence statements via a coloring of its vertices. Coloring has recently been found to provide a valuable embellishment to other graphical models (Højsgaard and Lauritzen, 2008; Massam et al., 2018).
As demonstrated by Smith and Anderson (2008) and Duarte and Solus (2021a), every BN can be represented as a staged tree. However, the class of staged tree models is much more general and can represent not only symmetric, but also context-specific, partial and local independences (Pensar et al., 2016). Furthermore, a wide array of methods to efficiently investigate real-world applications have been introduced for staged trees, including user-friendly software (Carli et al., 2022), inferential routines (Görgen et al., 2015), structural learning (Freeman and Smith, 2011), dealing with missing data (Barclay et al., 2014), causal reasoning (Thwaites et al., 2010) and identification of equivalence classes (Görgen et al., 2018), to name a few. Such techniques are in general not available for other graphical models embedding non-symmetric independences, thus making staged trees a viable as well as efficient option for applied analyses.
Our first contribution is a deeper study of the relationship between BNs and staged trees. We introduce a minimal BN representation of a staged tree which embeds all its symmetric conditional independences. Importantly, this allows us to introduce a criterion to identify all symmetric conditional independences implied by the model, which has proven to be a very challenging task (Thwaites and Smith, 2015).
Reading non-symmetric independences directly from the staged tree is even more challenging. Our second contribution is a novel definition of classes of dependence among variables and the introduction of methods to identify the appropriate class from the staged tree. The presence or absence of edges in a BN encodes either (conditionally) full dependence or independence between two variables. However, the flexibility of the staged tree enables us to model, and consequently identify, intermediate relationships between variables, namely context-specific, partial or local ones (Pensar et al., 2016).
As a result, our third contribution is the definition of a new class of directed acyclic graphs (DAGs), termed asymmetry-labeled DAGs (ALDAGs), by coloring edges according to the type of relationship existing between the corresponding variables. Learning algorithms for ALDAGs, which use any structural learning algorithm for staged trees (see e.g. those included in the R package stagedtrees, Carli et al., 2022), are discussed below and applied to various datasets. Our fourth contribution is the definition of a new visualization of dependence, called the dependence subtree, which shows how a variable is related to only those that have a direct effect on it, namely its parents in the associated ALDAG. The use of such a tool is showcased in our data applications below.
Structural learning of generic staged trees is hard, due to the explosion of the model search space as the number of variables increases (see e.g. Duarte and Solus, 2021b). For this reason, recent research has focused on sub-classes of staged tree models: Carli et al. (2020) defined naive staged trees, which have the same number of parameters as a naive BN over the same variables; Leonelli and Varando (2022b) considered simple staged trees, which have a constrained type of partitioning of the vertices; Leonelli and Varando (2022a) introduced k-parents staged trees, which limit the number of variables that can have a direct influence on another; Duarte and Solus (2021b) defined CStrees, which only embed symmetric and context-specific types of independence. Our last contribution is the introduction of a novel algorithm, called context-specific backward hill-climbing (CSBHC), to learn staged trees whose staging is restricted. In particular, the proposed algorithm learns staged trees whose corresponding ALDAGs have a restricted subset of labels, those corresponding to conditional independences of the context-specific type only.

Bayesian Networks and Conditional Independence
Let G = ([p], F) be a directed acyclic graph (DAG) with vertex set [p] = {1, . . ., p} and edge set F. Let X = (X_i)_{i∈[p]} be categorical random variables with joint mass function P, and let Pa(i) denote the parent set of i in G. The ordered Markov condition implies conditional independences of the form

X_i ⊥⊥ X_{[i-1] \ Pa(i)} | X_{Pa(i)},   (1)

which are equivalent to equalities of conditional probabilities

P(x_i | x_{[i-1]}) = P(x_i | x_{Pa(i)}), for all x ∈ X.   (2)

It is customary to label the vertices of a BN so as to respect the topological order of G, i.e.
a linear ordering of [p] for which only pairs (i, j) where i appears before j in the order can be in the edge set. Of course, there can be multiple permutations of [p] which respect the topological order. Henceforth, we assume that the natural ordering of the positive integers [p] respects the topological order of the BN.
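The topological-order condition above can be made concrete with a short check; the following is a minimal sketch (the function name and example graph are ours, not from the paper):

```python
def is_topological(order, edges):
    """True iff every directed edge (u, v) has u appearing before v in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)

# DAG on [4] whose natural ordering 1, 2, 3, 4 is topological
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
print(is_topological([1, 2, 3, 4], edges))  # True
print(is_topological([1, 3, 2, 4], edges))  # True: topological orders are not unique
print(is_topological([2, 1, 3, 4], edges))  # False: edge (1, 2) is violated
```

The second call illustrates the remark that multiple permutations may respect the topological order.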
To illustrate our methodology, we use throughout the paper the Titanic dataset (Dawson, 1995), which provides information on the fate of the Titanic passengers.

Non-Symmetric Conditional Independence
BNs have the capability of expressing only symmetric conditional independences of the form in (1) and (2). The most common non-symmetric extension of conditional independence is the so-called context-specific independence, which is often represented by associating a tree to each vertex of a BN (Boutilier et al., 1996). Let A, B and C be three disjoint subsets of [p].
We say that X_A is context-specifically independent of X_B given the context x_C ∈ X_C if

P(x_A | x_B, x_C) = P(x_A | x_C), for all x_A ∈ X_A and x_B ∈ X_B.   (3)

Pensar et al. (2016) introduced a more general definition of non-symmetric conditional independence called partial conditional independence. We say that X_A is partially conditionally independent of X_B in the domain D_B ⊆ X_B given the context x_C ∈ X_C if

P(x_A | x_B, x_C) = P(x_A | x'_B, x_C), for all x_A ∈ X_A and x_B, x'_B ∈ D_B.   (4)

Clearly, (3) and (4) coincide if D_B = X_B. Furthermore, the sample space X_B must contain more than two elements for a non-trivial partial conditional independence to hold.
A final condition is the so-called local conditional independence, first discussed in Chickering et al. (1997). For i ∈ [p] and A ⊂ [p] such that A ∩ {i} = ∅, local conditional independence expresses equalities of probabilities of the form

P(x_i | x_A) = P(x_i | x'_A),   (5)

for all x_i ∈ X_i and two given x_A, x'_A ∈ X_A. Notice that, in terms of generality, (3) is a special case of (4), which is in turn a special case of a collection of equalities of the form (5).
Condition (5) simply states that some conditional probability distributions are identical, where no discernible patterns as in (3) and (4) can be detected.
Unlike any other probabilistic graphical model, the class of staged trees that we review next is able to graphically represent and formally encode any of the types of conditional independence defined in (2)-(5).
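To make the context-specific condition (3) concrete, the following hedged sketch checks it on a toy conditional probability table; all names, shapes and numbers are illustrative, not from the paper:

```python
def context_specific(cpt, a_vals, b_vals, c):
    """Check X_A ⊥⊥ X_B | X_C = c as in (3): the conditional distribution
    of X_A must be identical for every value x_B in the given context c.
    cpt[(a, b, c)] = P(X_A = a | X_B = b, X_C = c)."""
    ref = [cpt[(a, b_vals[0], c)] for a in a_vals]
    return all([cpt[(a, b, c)] for a in a_vals] == ref for b in b_vals)

# Toy binary table: independence holds in context c = 0 but not in c = 1
cpt = {
    (0, 0, 0): 0.3, (1, 0, 0): 0.7, (0, 1, 0): 0.3, (1, 1, 0): 0.7,
    (0, 0, 1): 0.2, (1, 0, 1): 0.8, (0, 1, 1): 0.6, (1, 1, 1): 0.4,
}
print(context_specific(cpt, [0, 1], [0, 1], 0))  # True
print(context_specific(cpt, [0, 1], [0, 1], 1))  # False
```

A partial independence (4) would be checked in the same way over a proper subset D_B of the values of X_B.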

Staged Trees
Unlike BNs, whose graphical representation is a DAG, staged trees visualize conditional independence by means of a colored tree. Let (V, E) be a directed, finite, rooted tree with vertex set V, root node v_0 and edge set E. For v ∈ V, let E(v) ⊆ E be the set of edges emanating from v, and let C be a set of labels.
An X-compatible staged tree is a triple T = (V, E, θ), where (V, E) is a rooted directed tree and θ is an edge labeling. The first two conditions of the definition construct a rooted tree where each root-to-leaf path, or equivalently each leaf, is associated to an element of the sample space X. Then a labeling θ of the edges of such a tree is defined, where labels are pairs with one element from the set C and the other from the sample space X_i of the corresponding variable X_i in the tree. By construction, X-compatible staged trees are such that two vertices can be in the same stage if and only if they correspond to the same sample space. The staging can equivalently be represented by a vertex labeling κ : V → C, as in Figure 2. This representation of the labeling θ over vertices is equivalent to that over edges, whilst being more interpretable, and is henceforth used. For the Titanic staged tree in Figure 2 there are 29 internal vertices, and the staging is shown by the vertex coloring.

The parameter space associated to an X-compatible staged tree T = (V, E, θ) with labeling θ : E → L is defined in (6) as the collection of probability vectors attached to the edges emanating from each internal vertex. Equation (6) thus defines a class of probability mass functions over the edges emanating from any internal vertex, coinciding with conditional distributions of the associated variable given the path from the root. Let l_T denote the leaves of a staged tree T. Given a vertex v ∈ V, there is a unique path in T from the root v_0 to v, denoted as λ(v). The depth of a vertex v ∈ V equals the number of edges in λ(v). For any path λ in T, let E(λ) = {e ∈ E : e ∈ λ} denote the set of edges in the path λ.
Definition 2. The staged tree model M_T associated to the X-compatible staged tree T = (V, E, θ) is the set of joint mass functions P such that, for each x ∈ X, the atomic probability P(x) equals the product of the labels along the edges E(λ) of the root-to-leaf path λ identified by x.

Therefore, staged tree models are such that atomic probabilities are equal to the product of the edge labels in root-to-leaf paths and coincide with the usual factorization of mass functions via recursive conditioning.
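The factorization of atomic probabilities along root-to-leaf paths can be sketched on a toy two-variable staged tree; the staging and numbers below are ours, chosen so that the shared stage encodes the symmetric independence X_2 ⊥⊥ X_1:

```python
from itertools import product

# Toy staged tree over two binary variables X1, X2. Vertices v1 and v2 are
# in the same stage s1, so the two conditional distributions of X2 coincide.
stage_of = {"v0": "s0", "v1": "s1", "v2": "s1"}
theta = {"s0": {0: 0.4, 1: 0.6}, "s1": {0: 0.9, 1: 0.1}}

def atomic_prob(x1, x2):
    """P(x1, x2) as the product of edge labels along the root-to-leaf path."""
    v = "v1" if x1 == 0 else "v2"  # vertex reached after observing X1 = x1
    return theta[stage_of["v0"]][x1] * theta[stage_of[v]][x2]

print(round(atomic_prob(0, 0), 2))  # 0.36, i.e. 0.4 * 0.9
total = sum(atomic_prob(a, b) for a, b in product([0, 1], repeat=2))
print(round(total, 10))  # 1.0: atomic probabilities sum to one
```

Splitting s1 into two distinct stages would instead make the distribution of X_2 depend on X_1.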
Conditional independence is formally modeled and represented in staged trees via the labeling θ. As an illustration, consider the staged tree in Figure 2 for the Titanic dataset. The staged tree in Figure 2, embedding several non-symmetric conditional independences, gives a better representation of the data than the BN in Figure 1. Indeed, the BIC of the staged tree can be computed as 10440.39, whilst that of the BN is larger and equal to 10502.28 (see Görgen et al., 2022, for a discussion of using the BIC for staged trees).
This example illustrates the capability of staged trees to graphically represent any type of non-symmetric conditional independence. Although such independences can be read directly from the tree via visual inspection, it becomes challenging to detect them as the size of the tree increases. Below we formalize how to assess the type of conditional independence existing between pairs of random variables.

Staged Trees and Bayesian Networks
Although the relationship between BNs and staged trees was already formalized by Smith and Anderson (2008), we introduce here an implementable routine to transform a DAG to its equivalent staged tree.
Assume X is topologically ordered with respect to a DAG G and consider the X-compatible staged tree T_G whose staging is defined by the parent sets of G: two vertices at depth i - 1 are in the same stage if and only if the corresponding pre-histories agree on the coordinates in Pa(i). As an illustration, Figure 3 reports the tree T_G associated to the BN in Figure 1. Since the variables Class, Gender and Survived are fully connected in the BN, the associated staged tree is such that vertices at depth one and two are in their own individual stages.
The only symmetric conditional independence embedded in the BN is represented by joining pairs of vertices at depth three (associated to the variable Age) in the same stage. Clearly, the staging of the staged tree representing a BN in Figure 3 exhibits a lot more symmetry than the one in Figure 2, which can represent a wide array of non-symmetric independences.
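The staging rule behind T_G can be sketched as follows; the encoding of stages is hypothetical, but the rule is the one above: two pre-histories share a stage exactly when they agree on the parents of the next variable.

```python
def bn_stage(parents, i, x_prefix):
    """Stage of the vertex reached by the pre-history x_{[i-1]} in the tree
    T_G of a BN: vertices share a stage iff they agree on Pa(i).
    parents: dict mapping each variable to the set of its parents in G."""
    return (i, tuple(x_prefix[j - 1] for j in sorted(parents.get(i, ()))))

parents = {2: {1}, 3: {2}}  # toy BN 1 -> 2 -> 3
# At depth two, the stage of variable 3 depends only on the value of X2:
print(bn_stage(parents, 3, (0, 1)) == bn_stage(parents, 3, (1, 1)))  # True
print(bn_stage(parents, 3, (0, 0)) == bn_stage(parents, 3, (0, 1)))  # False
```

The first comparison encodes the symmetric independence X_3 ⊥⊥ X_1 | X_2 of the toy chain.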
Our first contribution is the solution of the following inverse problem: given an X-compatible staged tree T = (V, E, θ), find the corresponding DAG G. This DAG cannot represent, in general, the same staged tree model, since BNs cannot represent non-symmetric conditional independences. Nevertheless, we prove that we can retrieve a minimal DAG, in a sense that we formalize next. A proof of the result is in the supplementary material.
Proposition 1. Let T = (V, E, θ) be an X-compatible staged tree, with κ : V → C the vertex labeling that defines θ. Let G_T = ([p], F_T) be the DAG with vertex set [p] and whose edge set F_T includes the edge (k, i), k < i, if and only if there exist x, x' ∈ X_{[i-1]} such that x_j = x'_j for all j ≠ k and κ(x) ≠ κ(x').

A staged tree T is therefore a sub-model of the resulting G_T, which embeds the same set of symmetric conditional independences. The BN G_T is minimal in the sense that it includes the smallest number of edges among all possible BNs that include M_T as a sub-model.
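A brute-force sketch of the edge-set check in Proposition 1 follows; the encoding of the staging as a function kappa over pre-histories is our own illustrative choice:

```python
from itertools import product

def minimal_dag_edges(spaces, kappa):
    """Edge set F_T of the minimal DAG G_T: (k, i) is an edge iff two
    pre-histories x, y in X_{[i-1]} differing only in coordinate k are
    assigned different stages.
    spaces: list of value lists for X_1, ..., X_p (variables named 1..p);
    kappa(i, x): stage of the vertex reached by the pre-history x."""
    edges = set()
    for i in range(2, len(spaces) + 1):
        for x in product(*spaces[: i - 1]):
            for k in range(1, i):
                for v in spaces[k - 1]:
                    y = x[: k - 1] + (v,) + x[k:]
                    if y != x and kappa(i, x) != kappa(i, y):
                        edges.add((k, i))
    return edges

spaces = [[0, 1], [0, 1]]
print(minimal_dag_edges(spaces, lambda i, x: "same"))  # set(): X2 ⊥⊥ X1, no edge
print(minimal_dag_edges(spaces, lambda i, x: x))       # {(1, 2)}: stages differ
```

A constant kappa joins all vertices at a given depth into one stage and hence yields the empty graph, as expected.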
The models M_T and M_{G_T} are equal if and only if T embeds only symmetric conditional independences. As an illustration, consider the staged tree in Figure 2. It can be shown using Proposition 1 that the associated BN G_T is complete, and therefore M_T must be a strict sub-model of M_{G_T}. Conversely, if the staged tree in Figure 3 is transformed into a BN, then using Corollary 1 the resulting BN must be the one in Figure 1.
Importantly, Theorem 1 gives a novel criterion to read symmetric conditional independence statements from a staged tree, by transforming it into a BN whose structure represents the same equalities of the form in (2). Conditional independence statements in the staged tree can then be read from the associated BN using the d-separation criterion (see e.g. Darwiche, 2009). For instance, the staged tree in Figure 2 does not embed any symmetric conditional independence, since the associated BN is complete.
The supplementary material gives a detailed implementation of both conversion algorithms, from BN to staged tree and vice versa.

Non-Symmetric Dependence and DAGs
Proposition 1 identifies if there is a dependence between two random variables in an X-compatible staged tree T and, in such a case, draws an edge in G_T. However, the staged tree carries a lot more information about the type of relationship existing between the two variables. In this section we introduce methods to label the edges of G_T so as to depict some of the information about the non-symmetric independences of T in G_T.

Classes of Statistical Dependence
First we need to characterize the type of dependence existing between two random variables that are joined by an edge in a DAG G.
Definition 3. Let P be the joint mass function of X and let P be Markov with respect to a DAG G = ([p], F). For each (j, i) ∈ F, letting X_C denote the other parents of X_i, we say that the dependence of X_i on X_j is of class:

• context, if there is a context-specific independence of the form (3) between X_i and X_j for some context x_C ∈ X_C;

• partial, if there is a partial conditional independence of the form (4) between X_i and X_j for some domain D_j ⊂ X_j and context x_C ∈ X_C, and X_i and X_j are not context-specifically independent given the same context x_C;

• local, if none of the above hold and a local independence of the form (5) between X_i and X_j holds;

• total, if none of the above hold.

Notice that if the class of dependence between X_i and X_j is context or partial, then there may also be local independence statements as in (5) involving these two variables. Similarly, the dependence between X_i and X_j can be both context and partial with respect to two different contexts; in this case we label it context/partial. On the other hand, if their class of dependence is local then, by definition, there are no context-specific or partial equalities.
Proposition 1 paves the way to assess the class of dependence existing between X_i and X_j. In particular, one has to check if there are equalities of the form (8) for all or some values of the conditioning variables and, if so, to which class they correspond. A discussion of the implementation of such checks is in the supplementary material. As illustrated in Section 6, these can be performed quickly, although all combinations of ancestral variables have to be considered.
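One possible implementation of such checks, on a single conditional probability table and with hypothetical names, is sketched below; the actual routines in the supplementary material work directly on the staged tree:

```python
def dependence_class(cpt, i_vals, j_vals, c_vals):
    """Classify the dependence of X_i on X_j as in Definition 3, with C the
    remaining conditioning variables. cpt[(xi, xj, c)] = P(xi | xj, c)."""
    dist = lambda xj, c: tuple(cpt[(xi, xj, c)] for xi in i_vals)
    context = partial = False
    for c in c_vals:
        n_distinct = len({dist(xj, c) for xj in j_vals})
        if n_distinct == 1:
            context = True        # full independence of X_j in context c
        elif n_distinct < len(j_vals):
            partial = True        # equality over a proper subset of X_j only
    if context and partial:
        return "context/partial"
    if context:
        return "context"
    if partial:
        return "partial"
    # local: some equality across arbitrary (xj, c) pairs, with no pattern
    flat = [dist(xj, c) for xj in j_vals for c in c_vals]
    return "local" if len(set(flat)) < len(flat) else "total"

cpt = {  # toy binary table: X_i ⊥⊥ X_j in context c = 0 only
    (0, 0, 0): 0.3, (1, 0, 0): 0.7, (0, 1, 0): 0.3, (1, 1, 0): 0.7,
    (0, 0, 1): 0.2, (1, 0, 1): 0.8, (0, 1, 1): 0.6, (1, 1, 1): 0.4,
}
print(dependence_class(cpt, [0, 1], [0, 1], [0, 1]))  # context

cpt_total = dict(cpt)  # break the equality in context c = 0
cpt_total[(0, 1, 0)], cpt_total[(1, 1, 0)] = 0.5, 0.5
print(dependence_class(cpt_total, [0, 1], [0, 1], [0, 1]))  # total
```

Since X_j is binary here, no non-trivial partial independence can arise, consistently with the remark after (4).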

Asymmetry-Labeled DAGs
An edge in a BN represents, by construction, a total dependence between two random variables. However, the flexibility of staged trees allows us to assess whether such a dependence belongs to any of the other classes introduced in Definition 3. This observation leads us to define a new graphical representation, which we term the asymmetry-labeled DAG (ALDAG), where edges are colored depending on the type of relationship between variables.
Formally, let G be a DAG with edge set F, and let L_A = {context, partial, context/partial, local, total} be the set of edge labels marking the type of dependence.
Definition 4. An ALDAG is a pair (G, ψ), where G = ([p], F) is a DAG and ψ is a function from the edge set of G to L_A, i.e. ψ : F → L_A. We say that a joint mass function P is compatible with an ALDAG (G, ψ) if P is Markov to G and additionally P respects all the edge labels given by ψ; that is, for each (j, i) ∈ F, the dependence of X_i on X_j is of class ψ(j, i).
Henceforth, we represent the labeling via a coloring of the edges of the ALDAG. Standard BNs have an ALDAG representation where all edges have label 'total'. Notice that standard features of BNs are also valid over ALDAGs: for instance, the already-mentioned d-separation criterion as well as fast probability propagation algorithms.
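Since d-separation carries over to ALDAGs unchanged, it is worth recalling how it can be tested; the sketch below uses the classical moralized ancestral graph construction and is not the paper's implementation:

```python
def d_separated(edges, x, y, z):
    """Check X ⊥⊥ Y | Z in a DAG via the moralized ancestral graph:
    restrict to ancestors of {x, y} ∪ z, marry co-parents, drop directions,
    delete z, and test whether x and y are disconnected."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # ancestral set (including the query nodes themselves)
    anc, stack = set(), [x, y, *z]
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, ()))
    # moralize within the ancestral set
    nbr = {n: set() for n in anc}
    for u, v in edges:
        if u in anc and v in anc:
            nbr[u].add(v)
            nbr[v].add(u)
    for v in anc:
        ps = [u for u in parents.get(v, ()) if u in anc]
        for a in ps:
            for b in ps:
                if a != b:
                    nbr[a].add(b)
    # undirected reachability from x, avoiding z
    seen, stack = set(), [x]
    while stack:
        n = stack.pop()
        if n in seen or n in z:
            continue
        seen.add(n)
        stack.extend(nbr[n])
    return y not in seen

dag = [(1, 3), (2, 3), (3, 4)]
print(d_separated(dag, 1, 2, []))   # True: the collider at 3 blocks the path
print(d_separated(dag, 1, 2, [3]))  # False: conditioning on the collider opens it
print(d_separated(dag, 1, 2, [4]))  # False: 4 is a descendant of the collider
```

The three queries illustrate the collider behaviour that makes d-separation non-trivial.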
ALDAGs share features with the labeled DAGs of Pensar et al. (2015), but they differ in two critical aspects: first, labeled DAGs can only embed context-specific independence, whilst ALDAGs represent any type of asymmetric independence; second, labeled DAGs specifically report the contexts over which independences hold, whilst ALDAGs do not. There are two reasons behind this: on one hand, the specific independences in ALDAGs can be read from the associated staged tree; on the other, for applications with a larger number of variables the required contexts are often too complex to be reported within the DAG.
As an illustration of ALDAGs, consider the staged tree for the Titanic data in Figure 2 which, using Proposition 1, is transformed into the ALDAG in Figure 4 (edge coloring: red for context, blue for partial, green for local). Although the ALDAG does not carry all the information stored in the staged tree, which is quite complex to read, it intuitively describes the classes of dependence existing among the random variables. The blue edges denote that there is partial dependence between Class and any other variable. The red edges denote that Age has a context dependence with Gender, whilst the green edge denotes that there is only a local dependence between the two variables it joins, and therefore there is no specific pattern guiding the equalities of probabilities between these two variables. Such extended forms of dependence better describe the fate of the Titanic passengers since, as already noticed, the BIC of the associated staged tree is smaller than that of the best scoring BN.

Constructing ALDAGs
ALDAGs can be obtained from estimated staged trees with the following routine: (i) learn a staged tree model T from data, using for instance any of the algorithms in stagedtrees; (ii) transform T into G_T as in Proposition 1; (iii) assign a label to each edge of G_T by checking which equalities in (8) hold in T. Steps (ii) and (iii) are implemented in the stagedtrees R package using the algorithms in the supplementary material.
The most critical and computationally expensive step of learning an ALDAG is the staged tree learning step (i). There is now a large literature on learning staged trees from data (Carli et al., 2020, 2022; Cowell and Smith, 2014; Freeman and Smith, 2011; Leonelli and Varando, 2022b; Silander and Leong, 2013). Here we consider the available algorithms implemented in the stagedtrees R package (Carli et al., 2022). In particular, we will use the following algorithms, which work with a fixed order of the variables (see Carli et al., 2022, for more details):

• a hill-climbing (HC) algorithm which at each step either joins or splits vertices of the tree in stages by optimizing a model score (usually the BIC);

• a backward hill-climbing (BHC) algorithm which, at each step, can only join stages together by optimizing a model score.
Furthermore, any of the above mentioned algorithms can be used within the dynamic programming approach of Cowell and Smith (2014) to also choose an optimal order of the variables (Leonelli and Varando, 2021).
An ALDAG can also be obtained as a refinement of the DAG of a BN, by first transforming the BN into its equivalent staged tree and then refining its staging. The use of these algorithms in practice is extensively illustrated in Section 6 below.

Searching Context-Specific Independences
The flexibility of staged trees to represent any type of non-symmetric independence has two major drawbacks: on the one hand, reading independences from the tree can become complex; on the other, learning trees from data can be computationally very expensive.
To address these two difficulties, we introduce here a new heuristic search for the stage structure, motivated by our new definition of the ALDAG. In particular, we consider a backward hill-climbing algorithm that, for each variable, iteratively adds context-specific independence relationships to optimize a given score (e.g. the BIC). At each step, the algorithm searches all possible additional context-specific conditional independences of the form X_i ⊥⊥ X_j | x_C, where C = [i] \ {i, j} and thus x_C ∈ X_C is a context specified by all variables preceding X_i in the tree except X_j. Complete details about the algorithm can be found in the supplementary material.
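To convey the flavour of a backward search driven by the BIC, the sketch below greedily joins stages of a single variable whenever pooling their counts improves the score. It is a deliberate simplification: the actual CSBHC restricts candidate joins to those corresponding to context-specific independences, and all names below are ours.

```python
import math
from itertools import combinations

def bic(stages):
    """BIC for one variable: stages maps a stage label to the vector of
    observed counts of that variable's values, pooled over the stage."""
    n = sum(sum(cs) for cs in stages.values())
    ll = sum(c * math.log(c / sum(cs)) for cs in stages.values() for c in cs if c > 0)
    k = sum(len(cs) - 1 for cs in stages.values())
    return -2 * ll + k * math.log(n)

def backward_join(stages):
    """Greedy backward move: repeatedly join the pair of stages whose
    pooled counts most improve the BIC, until no join helps."""
    stages = {k: list(v) for k, v in stages.items()}
    while len(stages) > 1:
        best, best_trial = bic(stages), None
        for a, b in combinations(stages, 2):
            trial = {k: v for k, v in stages.items() if k not in (a, b)}
            trial[f"{a}+{b}"] = [x + y for x, y in zip(stages[a], stages[b])]
            score = bic(trial)
            if score < best:
                best, best_trial = score, trial
        if best_trial is None:
            break
        stages = best_trial
    return stages

# Two near-identical contexts get joined; the clearly different one stays apart
counts = {"c0": [50, 50], "c1": [48, 52], "c2": [10, 90]}
result = backward_join(counts)
print(sorted(result))  # ['c0+c1', 'c2']
```

The example shows the core trade-off: joining saves one parameter (a log n penalty) at the price of a small log-likelihood loss.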
Therefore, the ALDAG of a staged tree learned with the CSBHC algorithm can only have context and partial (and context/partial) edge labels. As an example, in Figure 5 we report the tree learned with CSBHC for the Titanic dataset together with its associated ALDAG. It can be seen that the ALDAG shares some features with the one in Figure 4, but in this case the local edge does not appear. The tree in Figure 5 has a BIC of 10479, which is worse than that of the generic staged tree (BIC = 10440), but still better than that of the BN (BIC = 10502), again highlighting the need for models embedding non-symmetric independences.

Applications
We now consider a variety of datasets commonly used in the probabilistic graphical models literature. First, we carry out an experiment to assess the performance of ALDAGs as well as the complexity of our routines. Then we consider two additional real-world applications to further illustrate the capabilities of staged trees and ALDAGs.

Computational Experiment
Nine datasets which are either available in R packages or downloaded from the UCI repository are considered. For each dataset, a DAG is first learned by optimizing the BIC score (using a tabu greedy search, Scutari, 2010), and then both the BHC and the proposed CSBHC algorithms are used to refine the DAG into staged trees. The learned staged trees are then transformed into ALDAGs using Algorithm 2 given in the supplementary material. The results, summarized in Tables 1 and 2, suggest the following:

• the refined staged tree provides a better fit than the standard BN in all datasets considered, thus highlighting the need to consider models which embed asymmetric conditional independences to untangle complex dependence structures;

• only a small fraction of the edges learned via BN structural search algorithms are actually related to a symmetric dependence between variables: all ALDAGs have a much larger number of edges with a label which is not total;

• the construction of the ALDAG does not impose a computational burden with respect to the staged tree model selection step, which is, in all cases but one, the most computationally expensive task. Furthermore, the experiment shows that the methods are efficiently implemented even for a medium-large number of variables.

Aspects of Everyday Life
We next illustrate the use of staged trees to uncover dependence structures using data from the 2014 survey "Aspects of Everyday Life" collected by ISTAT, the Italian National Institute of Statistics. A staged tree over this dataset is learned as follows. First, we learn a staged tree with the hill-climbing algorithm (optimizing the BIC score), considering all possible orders of the variables but life grade (L), which we fix as the last variable of the tree. The resulting staged tree is plotted in Figure 7 (left).
The staging of the life grade variable reveals a complex dependence pattern from which interesting conclusions can be drawn. For instance, conditionally on whether individuals believe in people, life grade does not depend on gender for those that practice sports and have friends they can count on (stages v_37-v_40). Similarly, conditionally on whether individuals believe in people, the distribution of life grade is the same for male individuals that practice sports who either have friends or are unsure about it (stages v_37, v_38, v_41, v_42).
It can also be noticed that gender has almost no effect on whether individuals believe in others, the only dependence existing for individuals that practice sports and are unsure about having friends (stages v_19-v_20). These are just a few of the many conclusions that can be drawn from the tree, and a full explanation is beyond the scope of this analysis.
Although the staged tree in Figure 7 can still be visually inspected, it is already rather extensive, with 72 leaves and 45 internal vertices. Its associated ALDAG in Figure 6 (middle) provides a compact summary of the dependence structure and shows that all variables are related to each other according to different types of dependence.
An alternative tree is learned with the CSBHC algorithm over the same dataset and variable ordering and is reported in Figure 7 (right). The tree has a staging which is a lot more symmetric than that of the generic staged tree. For instance, it states that life grade is independent of gender and of the availability of friends, conditionally on whether individuals believe in people and conduct sport activity (stages v_33-v_44). Also, the staging of the vertices v_9-v_20 shows that, given a specific level of sports activity and availability of friends, gender does not affect whether an individual believes in people. The associated ALDAG in Figure 6 (right) is therefore not complete and has a missing edge from G to B. It can also be noticed that edges are only of type total or context. Compared to the standard BN, both ALDAGs show additional patterns of dependence that can be retrieved because the assumption of symmetric dependence was relaxed.
Notice that both the generic staged tree and the alternative staged tree obtained with CSBHC provide a better description of the dependence structure of the data, since their BICs (251648 and 251673, respectively) are lower than that associated to the BN.

Enterprise Innovation
We next consider data from the 2012 Italian enterprise innovation survey, again collected by ISTAT (ISTAT, 2015). The survey reports information about medium-sized Italian companies as well as their involvement with innovation in the three-year period 2010-2012. The aim of the analysis is to assess which factors related to innovation are connected with changes in the company revenue.
Out of the many questions available in the survey, we consider 14 factors that could influence the revenue of an enterprise.The variables considered are summarized in Table 3. Instances with a missing answer were dropped, resulting in 8938 answers to the survey.
Notice that in this situation it is infeasible to study the staged tree directly, since it would have more than 100k leaves, making it impossible to visualize its staging.
Therefore, we follow the alternative strategy outlined in Section 5.3 of creating an ALDAG as a refinement of a BN. Thus, we first learn a BN from data, reported in Figure 8 (ignoring the edge coloring). Interestingly, the BN suggests that the only two factors that have a direct influence on the change of revenue of a company (GROWTH) are the number of employees (EMP) and whether it carried out innovations of other types, meaning not product or service innovations, in the past three years (INPD).
The resulting BN is refined into an ALDAG using the backward hill-climbing algorithm, which can only join stages together, and is reported in Figure 8. Of the original 35 edges, only five are still of type total, representing a symmetric dependence. All other edges are colored, indicating that there are types of dependence in the data which cannot be represented symmetrically. This is confirmed by the BIC of the ALDAG, which is equal to 133689, much lower than that of the BN (134311).
Since GROWTH is independent of all other variables conditionally on EMP and INPD, and since our interest is in assessing which factors are relevant for the change of revenue, we can construct a tree with these three variables only, deleting all those that are conditionally independent. We call such a tree the dependence subtree. It is reported in Figure 9 for GROWTH and the ALDAG in Figure 8, and shows the staging of the variable GROWTH using only its parents in the associated ALDAG. The staging tells us that for larger companies the probability of revenue change does not depend on other innovations, and it is the same as for medium-sized companies that invested in other innovations (stages v_7-v_9). Medium-sized companies that did not invest in other innovations have the same probability of revenue change as small ones that did invest in other innovations (stages v_5-v_6).
Importantly, larger enterprises and medium-sized ones that invested in other innovations have the largest probability of increasing revenue (0.61). On the other hand, smaller companies that did not invest in other innovations are more likely to see their revenue decrease, since their probability of increasing it is only 0.47.
An algorithm for constructing the dependence subtree is based on a simple variation of those given in the supplementary material. Dependence subtrees are extremely powerful since they allow us to visualize the dependence structure of GROWTH by means of the small tree in Figure 9, without having to investigate the full staged tree with more than 100k leaves.
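A minimal sketch of the enumeration underlying a dependence subtree is given below; the names echo the innovation example but the encoding is ours, and the full construction also carries over the staging of the target variable:

```python
from itertools import product

def dependence_subtree_leaves(aldag_parents, spaces, target):
    """Leaves of the dependence subtree of `target`: one leaf per joint
    value of its ALDAG parents, ignoring all other variables."""
    pa = sorted(aldag_parents[target])
    return [dict(zip(pa, xs)) for xs in product(*(spaces[v] for v in pa))]

# GROWTH has parents EMP and INPD in the ALDAG, so its dependence subtree
# has only 3 x 2 = 6 leaves, however large the full staged tree is.
spaces = {"EMP": ["small", "medium", "large"], "INPD": ["no", "yes"]}
leaves = dependence_subtree_leaves({"GROWTH": {"EMP", "INPD"}}, spaces, "GROWTH")
print(len(leaves))  # 6
```

The contrast between 6 leaves and the 100k-leaf full tree is exactly what makes the visualization tractable.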

Discussion
Staged trees are a flexible class of models that can represent highly non-symmetric relationships. This richness has the drawback that independences are often difficult to assess and visualize intuitively through the tree graph. In this paper we introduced methods that summarize both the symmetric and non-symmetric relationships learned from data via structural learning, by transforming the tree into a DAG. As a result, we introduced a novel class of graphs extending DAGs by labeling their edges. Our data applications showed the superior fit to data of such models as well as the information they can provide in real domains.
The new DAG edge labeling is based on the identification of the class of dependence. A different possibility would be to define a dependence index between any two variables which measures how different their relationship is from total dependence/independence. By learning a staged tree from data, we could then label the edges of a BN with such indexes. The definition of such models is the focus of current research.
This work further provides a first criterion to read any symmetric conditional independence from a staged tree. Algorithms to assess whether generic non-symmetric conditional independence statements hold still need to be developed. Here we have provided an intermediate solution to this problem by characterizing whether a non-symmetric independence exists or not. We plan to provide a conclusive solution to non-symmetric independence queries in future work.

Figure 1: Learned BN for the Titanic dataset.

Figure 2: A staged tree compatible with (Class, Gender, Survived, Age), learned over the Titanic dataset.

Figure 2 reports an example of an X-compatible staged tree model for the Titanic dataset learned with the R package stagedtrees. The coloring given by the function κ is shown in the vertices and each edge (•, (x_1, ..., x_i)) is labeled with x_i. The edge labeling θ can be read from the graph by combining the text label and the color of the emanating vertex.
The fact that v1 and v2 are in the same stage represents the partial independence Gender ⊥⊥ Class | {1st, 2nd}. Considering vertices at depth two, the green and yellow staging again represents partial conditional independences. More interesting is the blue staging of the vertices v5 and v10, which implies P(S = s | Female, 3rd) = P(S = s | Male, 1st), i.e. the probability of survival for females travelling in third class is the same as that of males travelling in first class. Such a statement is a generic local conditional independence. Considering the last level, we can notice a very non-symmetric staging structure. As an illustration, consider the top four vertices v25, v26, v27 and v28 belonging to the same stage. This implies the context-specific independence Age ⊥⊥ Survived, Gender | Class = Crew.
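A context-specific independence of this kind can be verified mechanically from the staging: the statement Age ⊥⊥ Survived, Gender | Class = Crew holds if and only if every parent configuration containing Class = Crew is assigned to the same stage. The sketch below is illustrative; the stage assignment is a toy one, not the staging learned in the paper.

```python
from itertools import product

def context_specific_indep(staging, context):
    """True iff every parent configuration extending `context` (a dict)
    is assigned to the same stage. `staging` is a list of
    (configuration dict, stage id) pairs."""
    stages = {stage for cfg, stage in staging
              if all(cfg[k] == v for k, v in context.items())}
    return len(stages) == 1

classes = ["1st", "2nd", "3rd", "Crew"]
genders = ["Male", "Female"]
survived = ["Yes", "No"]

# toy staging over (Class, Gender, Survived): all Crew configurations
# are merged into one stage, every other configuration gets its own stage
staging, stage_id = [], 0
for c, g, s in product(classes, genders, survived):
    cfg = {"Class": c, "Gender": g, "Survived": s}
    if c == "Crew":
        staging.append((cfg, "crew-stage"))
    else:
        staging.append((cfg, f"s{stage_id}"))
        stage_id += 1

print(context_specific_indep(staging, {"Class": "Crew"}))  # True
print(context_specific_indep(staging, {"Class": "1st"}))   # False
```

The same check, run over all contexts of a conditioning set, is what distinguishes context-specific from partial and local independences.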

Figure 3: The staged tree representation of the BN in Figure 1 for the Titanic dataset.

Figure 4: An ALDAG for the Titanic dataset constructed from the staged tree in Figure 2.
The ALDAG is equipped with a set of edge labels indicating the class of dependence. Given a DAG G, the following steps implement such a refinement: (i) transform G into the staged tree T_G using any of the topological orders of variables; (ii) run a backward hill-climbing algorithm using T_G as starting model and obtain a new tree T; (iii) transform T into G_T and apply the edge labeling. The resulting ALDAG has an edge set which is either equal to or a subset of the edge set of G. Furthermore, the edge set is now labeled, denoting the classes of dependence.
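The labeling step (iii) can be illustrated with a coarse rule: for an edge X → Y, inspect the staging of Y within each context of Y's remaining parents and record whether the values of X are merged or split. The function below is a simplified sketch under our own assumptions (a staging stored as a dict from parent configurations to stage ids); it is not the paper's exact definition, and in particular it does not separate the total from the partial class.

```python
from itertools import product

def label_edge(staging, x, other_parents, levels):
    """Coarsely classify the dependence of Y on X. `staging` maps tuples
    (value of x, *values of other_parents) to stage identifiers."""
    merged_contexts = split_contexts = 0
    for ctx in product(*(levels[p] for p in other_parents)):
        stages = {staging[(vx,) + ctx] for vx in levels[x]}
        if len(stages) == 1:
            merged_contexts += 1   # X does not change the stage here
        else:
            split_contexts += 1    # X changes the stage here
    if split_contexts == 0:
        return "none"              # X never matters: edge could be dropped
    if merged_contexts == 0:
        return "total/partial"     # X matters in every context (a finer
                                   # check would separate the two classes)
    return "context"               # X matters only in some contexts

levels = {"X": [0, 1], "Z": [0, 1]}
# toy staging for Y over (X, Z): X is irrelevant when Z = 0
staging = {(0, 0): "a", (1, 0): "a",
           (0, 1): "b", (1, 1): "c"}
print(label_edge(staging, "X", ["Z"], levels))  # context
```

In the toy staging, merging the two X-values in the context Z = 0 while splitting them in Z = 1 is precisely what yields the context label on the edge X → Y.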

Figure 5: A staged tree learned with the proposed CSBHC algorithm over the Titanic dataset and its associated ALDAG. The edge coloring is: red - context; blue - partial; purple - context/partial; black - total.
Institute of Statistics) (ISTAT, 2014). The survey collects information from the Italian population on a variety of aspects of their daily lives. For the purpose of this analysis we consider five of the many questions asked in the survey: do you practice sports regularly? (S = yes/no); do you have friends you can count on? (F = yes/no/unsure); do you believe in people? (B = yes/no); what is your gender? (G = male/female); what grade would you give to your life? (L = low/medium/high). Instances with a missing answer were dropped, resulting in 38156 answers to the survey. Our aim is to analyse how various factors affect the life grade of the Italian population. A BN with the variable life grade (L) as downstream variable is learned using the hill-climbing function (optimizing the BIC score) in the bnlearn package, blacklisting all outbound edges from L. The learned DAG is reported in Figure 6 (left). This embeds the symmetric conditional independence L, B, F ⊥⊥ G | S: given the level of sports activity, gender has no effect on life grade, on trust in people and on the availability of friends. The learned BN has a BIC of 251781.
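The BIC score driving the hill-climbing search decomposes over the nodes of the DAG, so each node's contribution can be computed from the counts of the node given its parents. The sketch below illustrates this with toy data and hypothetical variable names echoing the survey (S, L); it follows the common convention BIC = loglik - (d/2) log N with d the number of free parameters, and is not bnlearn's code.

```python
from collections import Counter
from math import log, prod

def node_bic(data, node, parents, levels):
    """BIC contribution of one categorical node given its parents.
    `data` is a list of dicts mapping variable names to observed values."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # maximised multinomial log-likelihood from the observed counts
    loglik = sum(c * log(c / marg[cfg]) for (cfg, _), c in joint.items())
    # free parameters: (levels of node - 1) per parent configuration
    d = (len(levels[node]) - 1) * prod(len(levels[p]) for p in parents)
    return loglik - 0.5 * d * log(n)

# toy data in which sports activity (S) is informative about life grade (L)
data = ([{"S": "yes", "L": "high"}] * 30 + [{"S": "yes", "L": "low"}] * 10
        + [{"S": "no", "L": "high"}] * 10 + [{"S": "no", "L": "low"}] * 30)
levels = {"S": ["yes", "no"], "L": ["low", "high"]}

# adding the edge S -> L improves the score on this toy data
print(node_bic(data, "L", ["S"], levels) > node_bic(data, "L", [], levels))  # True
```

Hill-climbing simply proposes single-edge additions, deletions and reversals, keeping a move whenever the sum of such node scores increases; blacklisting an edge just removes the corresponding moves from consideration.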

Figure 6: BN for the aspects of everyday life data (left) as well as ALDAGs over the same data associated to the staged trees learned with BHC (center) and CSBHC (right). The edge coloring is: red - context; blue - partial; violet - context/partial; green - local; black - total.

Figure 7: Staged tree learned using the hill-climbing algorithm (left) and the CSBHC algorithm (right) over the aspects of everyday life data with variables' order S, F, G, B, L.

Figure 8: BN (without considering edge coloring) and ALDAG for the enterprise innovation survey data. The edge coloring is: red - context; blue - partial; violet - context/partial; green - local; black - total.

Figure 9: The dependence subtree associated to the ALDAG in Figure 8 for the variable GROWTH. The variable order is (EMP, INPD, GROWTH).

Table 1: Results of the data experiments. ALDAG-* columns report the number of edges by type (total, context, partial, context/partial, local) in the two ALDAGs obtained by refinement of the DAG with the BHC and CSBHC algorithms respectively.

Table 2: Elapsed time (in seconds) for the data experiments. The DAG column reports the seconds to estimate the DAG; the columns BHC and CSBHC report the seconds needed to refine the DAG to a staged tree with the BHC and the CSBHC algorithms respectively; lastly, the ALDAG column contains the time to build the ALDAG from the staged trees.

Table 3: Variables from the 2012 ISTAT enterprise innovation survey.