Exceptional contextual subgraph mining
Abstract
Many relational data result from the aggregation of several individual behaviors described by some characteristics. For instance, a bike-sharing system may be modeled as a graph where vertices stand for bikeshare stations and connections represent bike trips made by users from one station to another. Stations and trips are described by additional information such as the description of the geographical environment of the stations (business vs. residential area, closeness to POI, elevation, urbanization density, etc.), or properties of the bike trips (timestamp, user profile, weather, events and other special conditions about the trip). Identifying highly connected components (such as communities or quasi-cliques) in this graph provides interesting insights into global usages but does not capture mobility profiles that characterize a subpopulation. To tackle this problem we propose an approach rooted in exceptional model mining to find exceptional contextual subgraphs, i.e., subgraphs generated from a context or a description of the individual behaviors that is exceptional (behaves in a different way) compared to the whole augmented graph. The dependency between a context and an edge is assessed by a \(\chi ^2\) test and the weighted relative accuracy measure is used to only retain contexts that strongly characterize connected subgraphs. We present an original algorithm that uses sophisticated pruning techniques to restrict the search space of vertices, context refinements, and edges to be considered. An experimental evaluation on synthetic data and two real-life datasets demonstrates the effectiveness of the proposed pruning mechanisms, as well as the relevance of the discovered patterns.
Keywords
Attributed graphs, Exceptional Model Mining, Subgroup discovery, Supervised pattern mining
1 Introduction
Providing tools and methods to discover new actionable insights into heterogeneous data is widely considered to be one of the most important challenges of data science, especially in the data mining and machine learning communities. A natural way to handle and understand such complex data is to model them as graphs, a powerful mathematical abstraction that makes it possible to support a large variety of analyses in a generic way. This partially explains why graph mining has generated considerable interest in terms of both fundamental and applied research. A striking feature is its ability to allow better understanding of social interactions and to provide support for many tasks such as social recommendations (Jiang et al. 2012), community discovery (Girvan and Newman 2002), social influence propagation (Goyal et al. 2013), and link prediction (Bringmann et al. 2010).
In real-world phenomena, vertices and edges are often characterized by attributes. It is also very common that these graphs are dynamic, with vertex and edge attributes evolving through time. The design of effective graph mining methods to discover actionable insights in such graphs is therefore a current challenge, to derive new knowledge about the underlying rules that govern networks (Sun and Han 2012). The last decade has witnessed intense growth in the analysis of dynamic graphs, especially from two main research tracks: (a) the study of the properties that describe the topology of the graph (de Melo et al. 2011; Tong et al. 2008), and (b) the extraction of specific subgraphs to describe the graph evolution (Berlingerio et al. 2009; Robardet 2009; You et al. 2009). Surprisingly, the simultaneous consideration of the dynamics of the graph structure and the additional vertex and edge properties has not been given much attention. In this paper, we move towards this new direction.
Table 1 Example of data: (a) Bikeshare station attributes, (b) User attributes, (c) Bike trip attributes and (d) Augmented graph corresponding to those data

(a) Bikeshare stations
Station  Type of area
A  Bars
B  Bars
C  Bars
D  Bars
E  Residential

(b) User characteristics
UID  Gender  Age
\(u_1\)  F  20
\(u_2\)  M  23
\(u_3\)  F  45
\(u_4\)  M  50
\(u_5\)  F  30

(c) Bike trip characteristics
MID  Departure  Arrival  UID  Time  Weather
\(m_{1}\)  A  B  \(u_1\)  Day  Rainy
\(m_{2}\)  A  B  \(u_2\)  Night  Windy
\(m_{3}\)  A  B  \(u_3\)  Night  Cloudy
\(m_{4}\)  A  B  \(u_4\)  Day  Windy
\(m_{5}\)  A  B  \(u_5\)  Night  Rainy
\(m_{6}\)  B  C  \(u_1\)  Night  Cloudy
\(m_{7}\)  B  C  \(u_1\)  Night  Windy
\(m_{8}\)  B  C  \(u_1\)  Night  Rainy
\(m_{9}\)  B  D  \(u_2\)  Night  Cloudy
\(m_{10}\)  B  D  \(u_1\)  Night  Windy
\(m_{11}\)  B  D  \(u_1\)  Night  Cloudy
\(m_{12}\)  C  D  \(u_1\)  Night  Rainy
\(m_{13}\)  C  D  \(u_2\)  Night  Rainy
\(m_{14}\)  D  E  \(u_1\)  Night  Cloudy
\(m_{15}\)  D  E  \(u_2\)  Night  Windy
\(m_{16}\)  D  E  \(u_3\)  Day  Rainy
\(m_{17}\)  D  E  \(u_3\)  Night  Windy
\(m_{18}\)  D  E  \(u_4\)  Night  Rainy

(d) Augmented graph

The considered data are made of a collection of connections between nodes characterized by a set of attributes. This rich dataset is a multigraph, which can be envisaged as a transactional database anchored to a graph. In other words, each connection is recorded as a transaction containing attributes and associated to the edge along which the connection occurred. A context, i.e., a set of conditions on the transaction attributes, is used as a selection operator that identifies the subgroup of supporting transactions. A so-called contextual subgraph is derived from this subgroup of connections as the graph in which each edge is weighted by the number of its transactions that support the context. We propose to use a generalization mechanism on the contexts and to exploit it to identify exceptional contextual subgraphs, that is, contextual subgraphs whose weights are abnormally large in comparison to the most general contextual graph (the one containing all connections). Such exceptional subgraphs are of interest as most of the transactions that are associated to their edges in the whole graph support the context. For example, on the data of Table 1, the proposed method identifies connected stations that are travelled in the same context. Figure 1b represents the contextual subgraph that corresponds to the stations that are visited by young people (age in \([20,23]\)) at night. The number of trips that satisfy the context on each edge can be used as a support measure (see the weights on the edges) but this measure is not sufficient to evaluate how strongly the context is related to these edges, in contrast to all other movements occurring in this context. To that end, we use the Weighted Relative Accuracy measure (WRAcc) to only retain contexts whose accuracy on the edge is markedly higher than the one obtained by the most general context on this edge. Figure 1c represents the subgraph of locations visited by young people at night whose edges have a positive WRAcc value.
The most specific context associated to this graph also includes the attribute Type of area = {bars}. The affinity of a context to an edge is also statistically assessed by a \(\chi ^2\) test.

The main contributions of this paper are the following:

The definition of exceptional contextual subgraph patterns in dynamic attributed graphs as an instance of the EMM framework.

The design of an efficient algorithm, COSMIc, that exploits several constraints, even those that are neither monotonic nor anti-monotonic, to identify such subgraphs.

A quantitative and qualitative empirical study. We report on the evaluation of the efficiency and the effectiveness of the algorithm on two real-world dynamic attributed graphs.
The rest of this paper is organized as follows. We review the related work in Sect. 2. We then formally define the notions of augmented graph and contextual subgraphs and introduce the exceptional contextual subgraph problem as an instance of EMM in Sect. 3. Section 4 describes an exhaustive algorithm, COSMIc, that differs from the beam search usually employed in EMM methods. We report a thorough empirical study of the algorithm COSMIc with synthetic data in Sect. 5 before comparing it to competing approaches (Sect. 6) and showing the usefulness of our approach with two real-world scenarios (Sect. 7). Section 8 concludes.
2 Related work
Finding descriptions of subpopulations for which the distribution of a single predefined target value is significantly different from the distribution in the whole data is a problem that has been widely studied in subgroup discovery (SD) (Lavrac et al. 2004; Novak et al. 2009). Finding subgroups of objects for which a more accurate and robust model of multiple target values can be learned/built, instead of considering the whole data, has then been introduced as Exceptional Model Mining (EMM) (Leman et al. 2008; Duivesteijn et al. 2016). In this framework, depicted in Fig. 2, there are two types of attributes, those used to characterize the subgroups (i.e., the object description), and others employed to evaluate the subgroup quality (i.e., the targets). Subgroups of interest are selected based on the quality of a model evaluated on the targets [e.g., classifier (Leman et al. 2008), Bayesian Networks (Duivesteijn et al. 2010), encoding based on Minimum Description Length (van Leeuwen 2010)]. The combination of large description and target spaces, as well as the use of non-monotonic measures, requires the adoption of heuristic search methods such as beam search.
The exceptional contextual subgraph mining problem that we propose here belongs to this “exceptional subgroups” framework together with EMM and SD. In our case, the data consist of a collection of transactions anchored to a graph, the subgroups are then sets of transactions described by queries over the attributes. These subgroups can be naturally projected onto the edges of the graph, and we look for those that exhibit particular distributions, that is, those that induce a subgraph where they are particularly heavily represented. Specifically, the exceptionality of a subgroup depends on the existence of a subgraph where transactions from the subgroup are overrepresented compared to the remaining transactions and the remaining edges. By exploiting the connectivity of the subgraph, we are able to dynamically reduce the target search space, and propose an exact algorithm that performs successful extractions where heuristic techniques fail, as demonstrated in Sect. 6. To the best of our knowledge, the only EMM approach that uses exhaustive search has been proposed in Lemmerich et al. (2012), which adapts FP-trees to handle a number of counting-based measures for unstructured targets. Our approach can be viewed as an extension of such works.
The exceptional contextual subgraph mining problem is also related to augmented graph mining, where graphs have additional information on vertices or edges. Several settings have been considered so far, as detailed in the next paragraphs.
Vertex-attributed graphs. In a pioneering work, Moser et al. (2009) propose a method to find dense homogeneous subgraphs, i.e., subgraphs whose vertices share a large set of attributes. Similar to that work, Günnemann et al. (2010) present a method based on subspace clustering and dense subgraph mining to extract non-redundant subgraphs that are homogeneous with respect to vertex attributes. Silva et al. (2012) extract pairs of dense subgraphs and Boolean attribute sets such that the Boolean attributes are strongly associated with the dense subgraphs. Similarly, Mougel et al. (2013) introduce the problem of mining maximal homogeneous clique sets. Khan et al. (2010) design a probabilistic approach to both construct the neighborhood of a vertex and propagate information into this neighborhood. Following the same motivation, Sese et al. (2010) extract (not necessarily dense) subgraphs with common itemsets. Prado et al. (2013) propose to mine the graph topology of a large attributed graph by finding regularities among vertex descriptors. Interestingly, in a recent work Atzmueller et al. (2016) use a subgroup discovery approach to mine descriptions of communities, treating the communities as an (aggregated) target.
Edge-attributed graphs. Existing approaches use edge information to define a similarity measure on edges in order to identify subgraphs or communities. In the proposal by Qi et al. (2012), edges are considered similar according to their associated collections of labels. Similarly, Bonchi et al. (2012) find clusters of edges such that edges of a cluster have the same labels. Berlingerio et al. (2013) propose multidimensional network analysis, where connections between vertices belong to different dimensions (e.g. cities can have both train and plane connections) and extend a number of network measures to multidimensional graphs. In this approach, two vertices connected by edges from different dimensions are considered to be more strongly connected, whereas in our exceptional contextual subgraphs framework, dimensions are not presupposed but inferred based on high weighted relative accuracy. In the multi-layer coherent subgraph approach called MiMag, Boden et al. (2012) use numerical labels on vertices to assess edges’ similarity in different layers of the graph. Vertices connected by edges with similar weights induce quasi-cliques. There is again a conceptual shift with our proposal: MiMag might consider similar edges that are not very typical for a context/layer. There is also the semantic difference that all edges in an exceptional contextual subgraph match the associated context while being typical, whereas MiMag may group very different contexts, misled by some similar behaviors of distinct subgroups. We found evidence of both effects in experiments reported in Sect. 6.
Dynamic graphs. Various approaches have been proposed to characterize the graph evolution, either by focusing on some topological properties (Tong et al. 2008), or by means of patterns/rules that are much more meaningful, identifying local interpretable substructures of interest. Borgwardt et al. (2006) introduce the problem of mining frequent subgraphs in dynamic graphs, i.e. isomorphic graphs that appear in consecutive timestamps. Lahiri and Berger-Wolf (2008) also extract frequent subgraphs but at periodic or near-periodic timestamps. Inokuchi and Washio (2010) define frequent induced subgraph subsequences, i.e. subgraph subsequences whose isomorphic occurrences appear frequently in a graph sequence collection. Prado et al. (2013) extract spatiotemporal patterns in a sequence of planar graphs. Robardet (2009) proposes an algorithm to extract evolving patterns, i.e. pseudo-cliques which appear in consecutive timestamps with slight evolutions. Ahmed and Karypis (2011) mine the evolution of conserved relational states, i.e. sequences of time-conserved patterns at consecutive timestamps. Yang et al. (2013) devise an algorithm to identify the most frequently changing component. You et al. (2009) compute graph rewriting rules that describe the evolution between consecutive graphs. These rules are then abstracted into patterns representing the dynamics of graphs. Berlingerio et al. (2009) extract patterns based on frequency and derive evolution rules to solve prediction problems in Bringmann et al. (2010). All these works only focus on the graph structure and do not consider attributes related to the vertices and/or the edges.
Dynamic attributed graphs. Desmier et al. (2013) define a new pattern domain that relies on the graph structure and the temporal evolution of the attribute values. It makes it possible to discover subgraphs of small diameter whose vertex attributes follow the same trends. Kaytoue et al. (2014) devise an algorithm to characterize local structure changes in a sequence of vertex-attribute trends. While considering attributes on vertices and edges, exceptional contextual subgraphs also offer the opportunity to analyse the dynamics of relational data, when transactions associated to edges are timestamped.
3 The problem of exceptional contextual subgraph mining
In the following, we present the notion of augmented graph, in which exceptional contextual graphs are looked for. We describe the pattern domain incrementally. First, contexts are introduced, and several mappings (derivation operators) are used to define contextual graphs. We then introduce two evaluation measures used to filter uninteresting edges out of such graphs. Finally, the problem of mining exceptional contextual subgraphs is formally stated.
3.1 Deriving contextual subgraphs
3.1.1 Augmented graphs
The data we are interested in consist of a set of entities and a collection of connections between pairs of these entities, augmented with rich heterogeneous data about the entities and the circumstances of the connections. For instance, in Table 1, the entities can represent bikeshare stations in a city with connections corresponding to bike trips made by users from one station to another. Additional details are available: The entities, i.e. stations, are geolocated and can be associated to additional information characterizing their location (business vs. residential area, closeness to POI, elevation, urbanization density, etc.) (see Table 1a). The connections, i.e. bike trips, are timestamped and can be augmented with the profile of the user, weather, events and other special conditions about the trip (see Table 1b, c). This rich dataset is a multigraph, which can be viewed as a transactional database anchored to a graph. In other words, each connection is recorded as a transaction containing attributes (the join of Table 1a, b and c) and associated to a source and a target entity that form the directed edge^{1} along which the connection occurred. This type of data is called an augmented graph and is formally defined below.
Definition 1
(Augmented graph) Let R be a relation whose schema is denoted \(S_R=[R_1,\dots , R_p]\). Each attribute \(R_i\) takes values in \(\mathbf{dom }(R_i)\) that is either nominal, if there is no order relation among attribute modalities, or numerical. A transaction \(t\in R\) of this relation is a tuple \((t_1,\dots ,t_p)\) with \(t_i\in \mathbf{dom }(R_i)\). An augmented graph \(G=(V,E,T,{\textsc {Edge}})\) consists of a set V of vertices, a set \(E\subseteq V\times V\) of edges, a set T of transactions, and a function that maps a transaction to its edge: \({\textsc {Edge}}\, :\, T\rightarrow E\).
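As an illustration, Definition 1 can be mirrored by a small data structure. The following Python sketch is only one possible encoding: the class and field names (`AugmentedGraph`, `edge_of`, `add_transaction`) are hypothetical, and the attribute order (Age, Gender, Time, Weather) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AugmentedGraph:
    """G = (V, E, T, Edge): vertices, directed edges, transactions and the Edge map."""
    vertices: set = field(default_factory=set)
    edges: set = field(default_factory=set)            # E, a subset of V x V
    transactions: dict = field(default_factory=dict)   # T: transaction id -> attribute tuple
    edge_of: dict = field(default_factory=dict)        # Edge: transaction id -> (source, target)

    def add_transaction(self, tid, source, target, attrs):
        """Record one connection as a transaction anchored to the edge (source, target)."""
        self.vertices.update((source, target))
        self.edges.add((source, target))
        self.transactions[tid] = tuple(attrs)
        self.edge_of[tid] = (source, target)


# Three trips of Table 1c joined with the user table; the attribute order
# (Age, Gender, Time, Weather) is an assumption of this sketch.
g = AugmentedGraph()
g.add_transaction("m1", "A", "B", (20, "F", "Day", "Rainy"))
g.add_transaction("m2", "A", "B", (23, "M", "Night", "Windy"))
g.add_transaction("m6", "B", "C", (20, "F", "Night", "Cloudy"))

# Two distinct edges carry three transactions: the data form a multigraph.
print(len(g.edges), len(g.transactions))
```

Storing the `Edge` function as a dictionary keeps the multigraph view (several transactions per edge) and the simple graph view (the set `edges`) side by side.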
EMM extends classical subgroup discovery (the discovery of subgroups described by a few conditions on their attributes and whose target attribute somehow deviates from the norm) to the case where several target attributes are considered and used to derive a model. A subgroup is thus deemed interesting when its associated model is substantially different from the model on the whole dataset. In such a framework, our problem, which is illustrated in Fig. 3, can be depicted as follows: The data (i.e., the augmented graph) consist of a collection of transactions (or records) composed of attributes and associated to an edge of the graph. A description, which is here called a context, is used to select transactions that support it. This set of transactions is then projected onto a so-called contextual graph, on which the interestingness or exceptionality of the context is evaluated. Hence, edges correspond to multiple targets and the contextual graph plays the same role as the model in EMM.
3.1.2 Contextual graphs
Let us first define a context, that is, the description of a set of transactions.
Definition 2

(Context) Let \(S\subseteq T\) be a set of transactions. The context that describes S is the tuple \(C=(C_1,\dots ,C_p)\) where, for each attribute \(R_i\):

\(C_i=a\), with \(a\in \mathbf{dom }(R_i)\), iff \(R_i\) is nominal and \(\forall (t_1,\ldots ,t_i,\ldots ,t_p)\in S,\, t_i=a\)

\(C_i=\star _i\), with \(\star _i\) a new symbol representing the whole set \(\mathbf{dom }(R_i)\), iff \(R_i\) is nominal and there exist two transactions \(t,t^\prime \in S\) such that \(t_i\not =t^\prime _i\).

\(C_i=[a,b]\), with \(a=\min \lbrace t_i\mid (t_1,\ldots ,t_i,\ldots ,t_p)\in S\rbrace \) and \(b=\max \lbrace t_i\mid (t_1,\ldots ,t_i,\ldots ,t_p)\in S\rbrace \) iff \(R_i\) is numerical.

A transaction \(t=(t_1,\dots ,t_p)\) is covered by (or satisfies) a context C, denoted \(t\preceq C\), iff for each attribute \(R_i\) one of the following holds:

\(t_i=C_i=a\), with \(a\in \mathbf{dom }(R_i)\) and \(R_i\) nominal

\(t_i\) is any of \(\mathbf{dom }(R_i)\), with \(C_i=\star _i\) and \(R_i\) nominal

\(a\le t_i\le b\), with \(C_i=[a,b]\) and \(R_i\) numerical.
It is important to note that a context covers a set of transactions. Each transaction is attached to an edge, so a set of transactions induces a subgraph. In a dual way, given an arbitrary subgraph, one can retrieve the set of transactions attached to its edges, and the most specific context that covers all these transactions. For convenience, we will use the following mappings between the different views of an augmented graph (illustrated in Fig. 4).
Definition 3

The mapping \(M_{C\rightarrow {}T}\) takes a context C as argument and returns the set of transactions that are covered by C, \(M_{C\rightarrow {}T}\) \((C) = \{t\in T\mid t\preceq C\} \subseteq T\). With arguments C and S this mapping returns the subset of transactions of \(S \subseteq T\) that are covered by C: \(M_{C\rightarrow {}T}\) \((C,S) = \{t\in S\mid t\preceq C\} \subseteq S\). For example, \(M_{C\rightarrow {}T}\) \((Age\in [23,45],\star ,\, Time\in \{Night\}, \, \star , type~of~area \in \{Bars\})= \{m_2,m_3,m_9,m_{13}\}\).

The mapping \(M_{T\rightarrow {}G}\) takes a set of transactions \(S \subseteq T\) and returns the subgraph consisting of the edges to which these transactions are attached: \(M_{T\rightarrow {}G}\) \((S) = \bigcup _{t \in S} {\textsc {Edge}}(t)\).

\(M_{C\rightarrow {}G}\) \(=\) \(M_{T\rightarrow {}G}\) \(\circ \) \(M_{C\rightarrow {}T}\) is the composition of the two operators introduced above.

The mapping \(M_{T\rightarrow {}C}\) is the counterpart of \(M_{C\rightarrow {}T}\). It takes a set of transactions \(S \subseteq T\) and returns the most specific context that covers all transactions in S. For example, \(M_{T\rightarrow {}C}\) \((\{m_2,m_3\})=(Age\in [23,45],\star ,\, Time\in \{Night\}, \, \star , type~of~area \in \{Bars\})\).

The mapping \(M_{G\rightarrow {}T}\) associates to an edge, or to a set of edges, the set of transactions that are attached to it. It is the counterpart of \(M_{T\rightarrow {}G}\).

\(M_{G\rightarrow {}C}\) \(=\) \(M_{T\rightarrow {}C}\) \(\circ \) \(M_{G\rightarrow {}T}\) is the composition of the two operators introduced above.
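The mappings of Definition 3 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names, the `STAR` marker and the attribute order (Age, Gender, Time, Weather, Type of area) are assumptions, and the data are restricted to five trips of Table 1 that link stations located in a Bars area.

```python
STAR = "*"  # stands for the symbol \star_i (the whole nominal domain)

# Assumed attribute order: (Age, Gender, Time, Weather, Type of area).
NUMERICAL = (True, False, False, False, False)

# Five trips of Table 1: transaction id -> (edge, attribute tuple).
T = {
    "m1": (("A", "B"), (20, "F", "Day", "Rainy", "Bars")),
    "m2": (("A", "B"), (23, "M", "Night", "Windy", "Bars")),
    "m3": (("A", "B"), (45, "F", "Night", "Cloudy", "Bars")),
    "m9": (("B", "D"), (23, "M", "Night", "Cloudy", "Bars")),
    "m13": (("C", "D"), (23, "M", "Night", "Rainy", "Bars")),
}


def covers(context, attrs):
    """True iff the attribute tuple satisfies every condition of the context."""
    for cond, value, numerical in zip(context, attrs, NUMERICAL):
        if numerical:
            low, high = cond
            if not low <= value <= high:
                return False
        elif cond != STAR and cond != value:
            return False
    return True


def m_c_to_t(context, tids=None):
    """M_{C->T}: transactions (optionally restricted to tids) covered by the context."""
    candidates = T if tids is None else tids
    return {tid for tid in candidates if covers(context, T[tid][1])}


def m_t_to_g(tids):
    """M_{T->G}: edges to which the given transactions are attached."""
    return {T[tid][0] for tid in tids}


def m_g_to_t(edges):
    """M_{G->T}: transactions attached to the given edges."""
    return {tid for tid, (edge, _) in T.items() if edge in edges}


def m_t_to_c(tids):
    """M_{T->C}: most specific context covering all given transactions (tids non-empty)."""
    columns = zip(*(T[tid][1] for tid in tids))
    context = []
    for values, numerical in zip(columns, NUMERICAL):
        if numerical:
            context.append((min(values), max(values)))
        else:
            distinct = set(values)
            context.append(distinct.pop() if len(distinct) == 1 else STAR)
    return tuple(context)


c = m_t_to_c({"m2", "m3"})
print(c)                      # most specific context of m2 and m3
print(sorted(m_c_to_t(c)))    # transactions of this sample covered by that context
print(m_t_to_g(m_c_to_t(c)))  # M_{C->G} as the composition of the two mappings
```

The composed mappings \(M_{C\rightarrow {}G}\) and \(M_{G\rightarrow {}C}\) need no code of their own: they are obtained by chaining the functions above.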
By coupling these notions of augmented graph and context, we define a contextual subgraph as the projection of an augmented graph on a context, i.e. a graph whose edges are weighted by the number of their associated transactions that satisfy C:
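Definition 4

(Contextual subgraph) Given an augmented graph G and a context C, the contextual subgraph \(G_C\) is the weighted graph with edge set \(E_C\) and weight function \(W_C\) such that:

\(W_C\, :\, E_C\rightarrow {\mathbb {R}}\) with \(W_C(e)=\vert M_{C\rightarrow {}T}(C,M_{G\rightarrow {}T}(e))\vert \), the number of transactions associated to e that satisfy C,

\(E_C = \lbrace e\in E\mid W_C(e) > 0\rbrace \).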
For example, Fig. 1b shows the contextual subgraph of the context \((Age \in [20,23], \star ,Time\in \{Night\},\star ,\star )\).
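The weights of this contextual subgraph can be recomputed from Table 1. The sketch below is an illustration under assumptions: the names `AGE`, `TRIPS` and `contextual_weights` are hypothetical, and only the two constrained attributes (Age and Time) are kept since the other components of the context are \(\star \).

```python
from collections import Counter

# Users of Table 1b: id -> age (gender is not constrained by the context below).
AGE = {"u1": 20, "u2": 23, "u3": 45, "u4": 50, "u5": 30}

# Trips m1..m18 of Table 1c as (departure, arrival, user, time); weather is
# left out because the context below does not constrain it.
TRIPS = [
    ("A", "B", "u1", "Day"), ("A", "B", "u2", "Night"), ("A", "B", "u3", "Night"),
    ("A", "B", "u4", "Day"), ("A", "B", "u5", "Night"),
    ("B", "C", "u1", "Night"), ("B", "C", "u1", "Night"), ("B", "C", "u1", "Night"),
    ("B", "D", "u2", "Night"), ("B", "D", "u1", "Night"), ("B", "D", "u1", "Night"),
    ("C", "D", "u1", "Night"), ("C", "D", "u2", "Night"),
    ("D", "E", "u1", "Night"), ("D", "E", "u2", "Night"), ("D", "E", "u3", "Day"),
    ("D", "E", "u3", "Night"), ("D", "E", "u4", "Night"),
]


def contextual_weights(age_range, time):
    """W_C(e) for the context (Age in age_range, *, Time = time, *, *)."""
    low, high = age_range
    weights = Counter()
    for dep, arr, user, trip_time in TRIPS:
        if low <= AGE[user] <= high and trip_time == time:
            weights[(dep, arr)] += 1
    return dict(weights)  # edges of weight 0 are absent, i.e. they are not in E_C


# W_* : weights of the most general context (every transaction is counted).
w_all = dict(Counter((dep, arr) for dep, arr, _, _ in TRIPS))
w_young_night = contextual_weights((20, 23), "Night")
print(w_young_night)
```

On these data the five edges of the graph all keep a positive weight, so the context alone does not discard any edge; this is why additional measures are needed to assess how typical the context is for each edge.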
3.1.3 Closed contexts
It may happen that some contexts map exactly to the same set of transactions: for \(C^1\) and \(C^2\) two different contexts, it is possible that \(M_{C\rightarrow {}T}\)(\(C^1\)) \(=\) \(M_{C\rightarrow {}T}\)(\(C^2\)) which implies that \(M_{C\rightarrow {}G}\) \((C^1)\) \(=\) \(M_{C\rightarrow {}G}\)(\(C^2\)). By using an appropriate order relation, it is possible to avoid this redundancy by considering only closed contexts.
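Definition 5

(Order relation on contexts) A context \(C^1\) is more specific than a context \(C^2\), denoted \(C^1\preceq C^2\), iff:

\(C_i^2=\star _i\) or \(C_i^1=C_i^2=a\in \mathbf{dom }(R_i)\), for all nominal attributes \(R_i\),

\([a_i^1,b_i^1]\subseteq [a_i^2,b_i^2]\) with \(C^1_i=[a_i^1,b_i^1]\) and \(C^2_i=[a_i^2,b_i^2]\), for all numerical attributes \(R_i\).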
As such, instead of enumerating all contexts, it is enough to only enumerate the closed ones: The closure operator maps any context to the unique most specific one with the same image \(M_{C\rightarrow {}T}\).
Definition 6
(Closed context) A context C is closed iff \(\forall C^\prime \) such that \(M_{C\rightarrow {}T}\)(C)\(=\) \(M_{C\rightarrow {}T}\) \((C^\prime )\), \(C\preceq C^\prime \). Thus, \(M_{T\rightarrow {}C}\)(\(M_{C\rightarrow {}T}\)(\(C^\prime \))) returns the closed pattern of \(C^\prime \) and is called the closure operator.
The proof that \(M_{T\rightarrow {}C}\) \(\circ \) \(M_{C\rightarrow {}T}\) is a closure operator is omitted as it is a well-known notion in the pattern mining and formal concept analysis fields.
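The effect of the closure operator can be observed on a toy example. The following Python sketch assumes only two attributes (Age, Time) and four transactions whose values are borrowed from Table 1; the function names are hypothetical.

```python
STAR = "*"

# Toy transactions with attributes (Age, Time); values borrowed from Table 1.
T = [(20, "Night"), (23, "Night"), (45, "Night"), (50, "Day")]


def covered(context):
    """M_{C->T}: transactions satisfying the context ((low, high), time_or_STAR)."""
    (low, high), time = context
    return [t for t in T if low <= t[0] <= high and time in (STAR, t[1])]


def most_specific(transactions):
    """M_{T->C}: tightest age interval and common time value (or STAR)."""
    ages = [t[0] for t in transactions]
    times = {t[1] for t in transactions}
    return ((min(ages), max(ages)), times.pop() if len(times) == 1 else STAR)


def closure(context):
    """M_{T->C} o M_{C->T}; assumes the context covers at least one transaction."""
    return most_specific(covered(context))


c = ((20, 45), STAR)   # Age in [20, 45], any time
closed = closure(c)
print(closed)          # the Time component is specialized to "Night"
```

Both contexts select the same three transactions, so only the closed one needs to be enumerated; applying `closure` a second time leaves it unchanged.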
3.2 Deriving exceptional contextual graphs
In pattern mining, it is usual to evaluate the interestingness of a pattern by well-chosen measures. To judge the strength of the dependency between a context and a derived graph (or each edge), we propose to use two evaluation measures: Pearson’s chi-squared test of independence (Pearson 1900) and the Weighted Relative Accuracy measure.
3.2.1 \(\chi ^2\) Test of independence
To evaluate the dependency between a context C and an edge e, we consider the proportion of transactions associated to e that satisfy the context and propose to statistically assess this value by means of Pearson’s chi-squared test of independence (Pearson 1900). This test determines whether or not the context appears significantly more often in the transactions of e than in the whole set of transactions of the augmented graph.
Contingency tables O and E

(a) Contingency table O of events \(\mathbf {C}\) and \({\mathbf {e}}\)
Event  \({\mathbf {e}}\)  \(\overline{{\mathbf {e}}}\)  Total
\(\mathbf {C}\)  \(W_C(e)\)  \(\sum _{x\in E} W_C(x)-W_C(e)\)  \(\sum _{x\in E} W_C(x)\)
\(\overline{\mathbf {C}}\)  \(W_\star (e)-W_C(e)\)  \(\sum _{x\in E} W_\star (x)-W_\star (e)-\sum _{x\in E} W_C(x)+W_C(e)\)  \(\sum _{x\in E} W_\star (x)-\sum _{x\in E} W_C(x)\)
Total  \(W_\star (e)\)  \(\sum _{x\in E}W_\star (x)-W_\star (e)\)  \(\sum _{x\in E}W_\star (x) = \vert T\vert \)

(b) Contingency table E under the null hypothesis
Event  \({\mathbf {e}}\)  \(\overline{{\mathbf {e}}}\)  Total
\(\mathbf {C}\)  \(W_\star (e)\times \frac{\sum _{x\in E}W_C(x)}{\sum _{x\in E}W_\star (x)}\)  \(\left( \sum _{x\in E} W_\star (x)-W_\star (e)\right) \times \frac{\sum _{x\in E}W_C(x)}{\sum _{x\in E}W_\star (x)}\)  \(\sum _{x\in E}W_C(x)\)
\(\overline{\mathbf {C}}\)  \(W_\star (e)\times \left( 1-\frac{\sum _{x\in E} W_C(x)}{\sum _{x\in E} W_\star (x)}\right) \)  \(\left( \sum _{x\in E} W_\star (x)-W_\star (e)\right) \times \left( 1-\frac{\sum _{x\in E} W_C(x)}{\sum _{x\in E} W_\star (x)}\right) \)  \(\sum _{x\in E} W_\star (x)-\sum _{x\in E} W_C(x)\)
Total  \(W_\star (e)\)  \(\sum _{x\in E} W_\star (x)-W_\star (e)\)  \(\sum _{x\in E} W_\star (x) = \vert T\vert \)
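To make the two tables concrete, the sketch below builds O and E from four counts and evaluates the usual Pearson statistic, i.e. the sum over the four cells of (observed - expected)^2 / expected. The function name and the example counts are assumptions: the counts are those obtained on Table 1 for the edge from B to C and the context "age in [20, 23], at night", and the sketch assumes that no expected count is zero.

```python
def chi2_statistic(w_c_e, w_c_total, w_all_e, w_all_total):
    """Pearson statistic for the 2x2 tables O and E of a context C and an edge e.

    w_c_e   = W_C(e)        w_c_total   = sum over x of W_C(x)
    w_all_e = W_*(e)        w_all_total = sum over x of W_*(x) = |T|
    """
    observed = [
        [w_c_e, w_c_total - w_c_e],
        [w_all_e - w_c_e, w_all_total - w_all_e - w_c_total + w_c_e],
    ]
    p_context = w_c_total / w_all_total            # proportion of transactions in C
    col_totals = [w_all_e, w_all_total - w_all_e]  # margins of e and not-e
    expected = [
        [col * p_context for col in col_totals],
        [col * (1 - p_context) for col in col_totals],
    ]
    return sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )


# 95th percentile of the chi-squared distribution with 1 degree of freedom.
CHI2_CRITICAL_005 = 3.841

x2 = chi2_statistic(w_c_e=3, w_c_total=11, w_all_e=3, w_all_total=18)
print(round(x2, 3), x2 > CHI2_CRITICAL_005)
```

With only 18 transactions the statistic stays below the critical value, and several expected counts are too small for the chi-squared approximation to be reliable; the toy data merely illustrate how the cells are filled.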
3.2.2 The weighted relative accuracy measure
In the \(\chi ^2\) test of independence, the rejection of the null hypothesis can be due to either a very large or a very low value of \(\vert M_{C\rightarrow {}T}(C,M_{G\rightarrow {}T}(e))\vert \). We distinguish these two cases thanks to an additional measure, based on the Weighted Relative Accuracy measure.
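The sign of such a measure is what separates over-representation from under-representation. The following sketch assumes the textbook form of the weighted relative accuracy, P(C) x (P(e | C) - P(e)), estimated from transaction counts; it is an illustration under that assumption and may differ in detail from the exact expression used by the authors. The counts are those derived from Table 1 for the context "age in [20, 23], at night".

```python
def wracc(w_c_e, w_c_total, w_all_e, w_all_total):
    """Textbook WRAcc: P(C) * (P(e | C) - P(e)), estimated from transaction counts."""
    p_context = w_c_total / w_all_total
    return p_context * (w_c_e / w_c_total - w_all_e / w_all_total)


# Edge weights derived from Table 1: W_* (all trips) and W_C (young people at night).
W_ALL = {("A", "B"): 5, ("B", "C"): 3, ("B", "D"): 3, ("C", "D"): 2, ("D", "E"): 5}
W_CTX = {("A", "B"): 1, ("B", "C"): 3, ("B", "D"): 3, ("C", "D"): 2, ("D", "E"): 2}

total_all = sum(W_ALL.values())
total_ctx = sum(W_CTX.values())
scores = {e: wracc(W_CTX[e], total_ctx, W_ALL[e], total_all) for e in W_ALL}

# Edges on which the context is over-represented.
positive = sorted(e for e, s in scores.items() if s > 0)
print(positive)
```

Under this assumption, the edges with a positive score only involve stations B, C and D, which are all located in a Bars area in Table 1a.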
3.2.3 Exceptional contextual graphs
Until now, we presented how to derive contextual subgraphs of an augmented graph and introduced two measures to assess the significance of their edges. It remains to filter out the insignificant edges to obtain so-called exceptional contextual subgraphs. We formalize this with the following definition.
Definition 7
3.3 Deriving exceptional contextual connected components
We have defined the notion of the Exceptional Contextual Graph and how to derive instances of it and evaluate the affinity of a (closed) context to an edge (with \(\chi ^2\) test and WRAcc measure). Taking into account the topology of the subgraph associated to a context is also of interest. Its connectivity can be understood by examining its connected components. As numerical measures describing these connected components, we use the number of vertices and the number of edges. We also evaluate the global quality of the edges of each connected component by the sum of the individual WRAcc measures. We can now define precisely the kind of patterns that we are looking for, called Exceptional Contextual Connected Components, or simply Exceptional Contextual (Sub)Graphs for the sake of simplicity.
Problem 1
As such, computing the whole collection of patterns requires one to enumerate the closed contexts and apply the different filtering and pruning operations as explained in the next section.
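The connected components on which these patterns rely, together with their numbers of vertices and edges, can be obtained with a union-find structure. The sketch below is an assumption-laden illustration: it ignores edge directions (weak connectivity), and the edge from X to Y is a hypothetical addition used only to obtain a second component.

```python
def connected_components(edges):
    """Group directed edges into weakly connected components (direction ignored)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Merge the endpoints of every edge.
    for u, v in edges:
        parent[find(u)] = find(v)

    # Collect the edges of each component under its representative vertex.
    components = {}
    for u, v in edges:
        components.setdefault(find(u), []).append((u, v))
    return list(components.values())


edges = [("B", "C"), ("B", "D"), ("C", "D"), ("X", "Y")]
comps = connected_components(edges)
for comp in comps:
    vertices = {v for edge in comp for v in edge}
    print(len(vertices), len(comp), sorted(comp))  # |vertices|, |edges|, edge list
```

Summing a per-edge quality score over each returned edge list would give the global quality of the component.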
4 Algorithm
The theoretical search space of exceptional contextual subgraph patterns contains all possible combinations of contexts and subgraphs. Considering that contexts are ordered by \(\preceq \) and subgraphs by the inclusion of their set of edges, the pattern set is structured as a semilattice bounded by \(\{\star ,G_{\star }\}\). As contexts and subgraphs are linked by the mappings \(M_{C\rightarrow {}T}\), \(M_{T\rightarrow {}G}\), \(M_{G\rightarrow {}T}\) and \(M_{T\rightarrow {}C}\), we can enumerate one and derive the other one. In our proposed algorithm, named COSMIc,^{2} contexts are enumerated first and the associated subgraph is updated all along the enumeration process. Upper bounds and other pruning techniques are used to reduce the search space size, as explained in the following.
4.1 COSMIc principle
COSMIc enumerates contexts in a depth-first search manner. Its pseudocode is given in Algorithm 1. Given the pattern \((C,G_C)\) that is currently explored, the algorithm returns all the specializations of C that are exceptional contextual subgraphs. If all the attributes have been instantiated (line 2), the connected components of \(G_C\) are considered (line 3) and the function CheckConstraints (line 4) is called: It returns true iff \((C,CC_C)\) satisfies all the constraints of Definition 7 and Problem 1. In that case, the pattern is output (line 5).
If the attribute \(R_i\) can still be specialized in the context (lines 6–31), a new context \(C^\prime \) is generated: If \(R_i\) is symbolic, a loop over the values of \(\mathbf{dom }(R_i)\cup \lbrace \star _i\rbrace \) (line 8) lists all the possible specializations \(C^\prime \) of C on \(R_i\) (line 9). Then, the transactions of \(G_C\) that do not satisfy \(C^\prime \) are removed (line 10). The closure F of \(C^\prime \) is computed in line 11. If \(C^\prime \) is closed (line 12), the function Pruning (detailed in the next subsection) is called (line 13) to prune all the edges and connected components that are guaranteed not to satisfy the constraints for any context that is a specialization of \(C^\prime \). If \(G_{C^\prime }\) is not empty (line 14), \((C^\prime ,G_{C^\prime })\) is recursively enumerated (line 15) to generate all valid exceptional contextual subgraph patterns.
From lines 16 to 31, we consider the case where \(R_i\) is numerical. Enumerating all possible contexts consists of listing all intervals whose endpoints occur in the relation R. Let \(\mathbf{dom }_{R_i}=(v_i^1,\ldots ,v_i^m)\) be the ordered set of values that appear for attribute \(R_i\) in relation R. The function next (analogously previous) provides access to the following (analogously preceding) value of the one given as parameter. To enumerate all intervals included in \(\mathbf{dom }(R_i)\) once and only once, we generate, from each interval [a, b], two intervals [a, previous(b)] and [next(a), b], the first one [a, previous(b)] being generated only if its left endpoint a has not been increased so far (see the test line 2, with variable left retrieved from the stack in line 19). The generated intervals are pushed onto the stack (lines 29 and 31) and the loop from lines 18 to 31 is reiterated until the last interval has been considered.
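The interval enumeration described above can be sketched independently of the rest of the algorithm. The Python function below is an illustration and not a transcription of Algorithm 1: the function name and the index-based representation of next and previous are assumptions.

```python
def enumerate_intervals(values):
    """List every interval [a, b] whose endpoints occur in values, each exactly once."""
    vals = sorted(set(values))
    if not vals:
        return []
    intervals = []
    # Stack entries: (index of a, index of b, has the left endpoint been increased?)
    stack = [(0, len(vals) - 1, False)]
    while stack:
        lo, hi, left_moved = stack.pop()
        intervals.append((vals[lo], vals[hi]))
        if lo < hi:
            # [next(a), b] is always generated.
            stack.append((lo + 1, hi, True))
            # [a, previous(b)] is generated only while a has never been increased.
            if not left_moved:
                stack.append((lo, hi - 1, False))
    return intervals


ages = [20, 23, 45, 50, 30]  # ages of Table 1b
intervals = enumerate_intervals(ages)
print(len(intervals), len(set(intervals)))
```

Each interval is reached by a single path: the right endpoint is first decreased as needed, then the left endpoint is increased, which is why the flag on the left endpoint prevents duplicates. For m distinct values this yields m(m+1)/2 intervals.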
This algorithm explores the lattice of symbolic concepts to find the closed ones, and therefore benefits from the developments and optimizations that have been published in the data mining literature for that problem setting. Since we search for strictly closed contexts, the algorithm is exposed to the same sensitivity to noise as existing closed pattern mining algorithms. Extending the algorithm with the capability to mine noise-tolerant contexts (Besson et al. 2006) remains for future work.
4.2 The Pruning function
The Pruning function (see Algorithm 2) is based on two pruning mechanisms. The first one (lines 3 to 5) consists of removing individual edges. The constraint \(\vert M_{C\rightarrow {}T}(C,M_{G\rightarrow {}T}(e))\vert > min\_weight\) (constraint (2) in Definition 7) is anti-monotonic and can be used to safely remove edges as soon as they do not satisfy it. Constraint (3) on \(X^2(C,e)\) is not anti-monotonic, but we use an upper bound \(X_{ub}^2(C,e)\), presented below, to remove the edge e from \(G_C\) as soon as we have the guarantee that none of the specializations of C can lead to \(X^2(C,e)> \chi ^2_{0.05}\).
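The edge-removal mechanism can be sketched as below. The sketch makes two assumptions that are not taken from the paper: the upper bound is passed in as a callable (its formula is not reproduced here), and the critical value 3.841 corresponds to a \(\chi ^2\) test at level 0.05 with one degree of freedom. All names and numbers are hypothetical.

```python
def prune_edges(weights, min_weight, chi2_upper_bound, chi2_critical=3.841):
    """Keep only edges that may still satisfy both constraints in a specialization.

    weights          -- dict mapping an edge to its weight in the current context
    chi2_upper_bound -- callable giving an upper bound of X^2 over all
                        specializations of the current context (assumed given)
    chi2_critical    -- assumed critical value (level 0.05, one degree of freedom)
    """
    kept = {}
    for edge, weight in weights.items():
        if weight <= min_weight:
            continue  # anti-monotonic: the weight cannot grow in a specialization
        if chi2_upper_bound(edge) <= chi2_critical:
            continue  # no specialization can pass the dependency test
        kept[edge] = weight
    return kept


# Invented weights and bounds for three edges
weights = {("A", "B"): 12, ("B", "C"): 4, ("C", "D"): 9}
bounds = {("A", "B"): 10.2, ("B", "C"): 50.0, ("C", "D"): 1.7}
remaining = prune_edges(weights, min_weight=6, chi2_upper_bound=bounds.get)
```

Edge (B, C) is dropped by the weight constraint and edge (C, D) by the bound, so only (A, B) survives in this toy example.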
4.2.1 Upper bound for \(X^2(C,e)\)
4.2.2 Upper bound for \({\textsc {WRAcc}}(C,e)\)
Similarly, let us denote by \(y=W_C(e)\), \(x=\max _{z \in E_C} W_C(z)\), \(\alpha =\max _{z \in E_C} W_\star (z)\) and \(\beta =W_\star (e)\). Since \(\alpha \) and \(\beta \) are independent of \(W_C\), the values of x and y uniquely determine the \({\textsc {WRAcc}}(C,e)\) value and we have \({\textsc {WRAcc}}(x,y)=\frac{x}{\alpha }\left( \frac{y}{x}-\frac{\beta }{\alpha }\right) \).
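As a small numerical illustration, the measure can be evaluated with exact fractions. The sketch assumes that WRAcc is read in its usual form, a coverage factor multiplied by a difference of two ratios; the function name is hypothetical, and the values are those of the Time = 'Night' and Gender = 'F' rows of the weight-encoding table in Sect. 6.1.

```python
from fractions import Fraction


def wracc(x, y, alpha, beta):
    """WRAcc(x, y) = (x / alpha) * (y / x - beta / alpha), computed exactly.

    x     -- largest edge weight in the context
    y     -- weight of the edge in the context
    alpha -- largest edge weight in the whole graph
    beta  -- weight of the edge in the whole graph
    """
    x, y, alpha, beta = (Fraction(v) for v in (x, y, alpha, beta))
    return (x / alpha) * (y / x - beta / alpha)


night = wracc(x=4, y=2, alpha=5, beta=2)   # Time = 'Night'
gender = wracc(x=3, y=1, alpha=5, beta=2)  # Gender = 'F'
```

A positive value, as for `night`, indicates an edge that is over-expressed in the context, while a negative value, as for `gender`, indicates an under-expressed edge.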
Property 1
The function \({\textsc {WRAcc}}(x,y)\) is a convex function.
Proof
4.2.3 Upper bound for the sum of \({\textsc {WRAcc}}\)
There is always the risk that weakly expressed edges mean that upper bounds are far too optimistic and do not aid in pruning. We empirically assess this issue, as well as the effect of mining only closed patterns, in Sect. 5.3.
4.3 Discussion
COSMIc performs a complete and non-redundant enumeration of all useful contexts, that is, contexts for which one or several interesting patterns can be found. It is complete as it enumerates all contexts and uses only safe pruning (cf. the upper bounds and the threshold on the anti-monotonic minimum-weight constraint on edges). By definition, closed contexts cannot share exactly the same set of transactions, hence the enumeration is not redundant. This is a known result in the pattern mining literature, hence we omit a proof here.
Each closed context thus induces a different set of transactions (obtained with the \(M_{C\rightarrow {}T}\) operator). However, the filtering of non-exceptional edges may output a redundant pattern. Redundant patterns are filtered out at the end of the algorithm. In practice, however, we rarely observed such a situation in our experiments.
To identify interesting patterns without requiring the end user to set thresholds (on the minimum number of vertices, weights, etc.), we can slightly adapt COSMIc to output the best patterns either w.r.t. a single measure (top-k) or w.r.t. several measures and associated user preferences. For example, the analyst could be interested in patterns maximizing the number of edges and the average edge WRAcc while minimizing the number of vertices, that is, densely connected graphs with a high average quality measure. For that matter, we can simply keep in memory the patterns that are not dominated by others while enumerating them. In other words, we incrementally build the Pareto front given a set of user preferences [so-called skypatterns (Soulet et al. 2011)].
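The incremental construction of the Pareto front can be sketched as follows. The sketch assumes that every pattern is summarized by a tuple of scores oriented so that larger is better (a measure to be minimized, such as the number of vertices, is negated); the function names and the score tuples are hypothetical.

```python
def dominates(p, q):
    """p dominates q if it is at least as good on every score and better on one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def update_front(front, candidate):
    """Return the front after considering one newly enumerated pattern."""
    if candidate in front or any(dominates(kept, candidate) for kept in front):
        return front  # the candidate brings nothing new
    # The candidate enters the front and evicts every pattern it dominates
    return [kept for kept in front if not dominates(candidate, kept)] + [candidate]


# Invented scores: (number of edges, -number of vertices)
front = []
for scores in [(10, -7), (10, -6), (4, -9), (12, -8)]:
    front = update_front(front, scores)
```

Only the non-dominated patterns are kept in memory at any time, so the enumeration itself does not need to change.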
5 Experimental study of COSMIc
In this section, we first propose an artificial data generator that is used in the following to evaluate the performance of COSMIc (Sect. 5.1). Then, we evaluate the ability of COSMIc to recover meaningful exceptional contextual subgraphs embedded into an augmented graph in the presence of noise (Sect. 5.2). Finally, we study the performance of COSMIc (e.g., execution time, pruning effectiveness) by using the generator and varying a single parameter, while controlling the others (Sect. 5.3).
Note also that COSMIc was implemented in Java and that the experiments were run on machines equipped with i7-2600 CPUs @ 3.40GHz and 16GB of main memory, running Ubuntu 12.04 and Java Version 1.6. Algorithm implementations, the data generator and the Vélo’v results exploration are available on a Web page.^{3}
5.1 Artificial augmented graph generator
Since it is notoriously difficult to obtain data for which the ground truth is known, especially for augmented graphs, which have not been much studied so far, we designed an artificial augmented graph generator that makes it possible to evaluate COSMIc in a systematic way.
Table 3 Default parameters used for generating data
Parameter  Description  Default value 

nbVertices  Number of vertices  \(10^4\) 
nbTrans  Number of transactions  \(3\times 10^6\) 
nbAtt  Number of nominal attributes  5 
\(domain_{size}\)  Avg. size of attribute domains  20 
nbPatterns  Number of hidden patterns  5 
patternSize  Avg. number of vertices involved in a hidden pattern  10 
linkProb  Probability of two vertices to be linked  0.2 
weight  Avg. weight of contextual edges in hidden patterns  10 
noiseRate  Probability of a transaction supporting the context to be noisy  0.1 
5.2 Robustness to noise and ability to discover hidden patterns
 \(S_V\) indicates the similarity between the vertices of the two patterns:$$\begin{aligned} S_V(P_d,P_h)=\frac{\vert V_d \cap V_h\vert }{\vert V_d \cup V_h\vert } \end{aligned}$$
 \(S_C\) assesses how similar the context of \(P_d\) is to the context of \(P_h\):$$\begin{aligned} S_C(P_d,P_h)=\frac{\sum _{i=0}^m \delta _1(a^{p_d}_i,a^{p_h}_i)}{\sum _{i=0}^m \delta _2(a^{p_d}_i,a^{p_h}_i)} \end{aligned}$$with$$\begin{aligned} \delta _1(a^{p_d}_i,a^{p_h}_i) = \left\{ \begin{array}{ll} 1 &{} \text{ if } a^{p_d}_i=a^{p_h}_i \\ 0 &{} \text{ otherwise } \end{array}\right. \qquad \delta _2(a^{p_d}_i,a^{p_h}_i) = \left\{ \begin{array}{ll} 1 &{} \text{ if } a^{p_d}_i=a^{p_h}_i \text{ or } a^{p_d}_i=\star _i\\ 2 &{} \text{ otherwise } \end{array}\right. \end{aligned}$$
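Both similarity measures can be computed with a few lines of code. The sketch below assumes that \(S_V\) is a Jaccard index on vertex sets and that contexts are tuples of attribute values in which a hypothetical `STAR` marker encodes \(\star _i\); the example patterns are invented.

```python
STAR = "*"  # hypothetical encoding of an unspecialized attribute


def s_v(v_d, v_h):
    """Jaccard similarity between the vertex sets of two patterns."""
    union = v_d | v_h
    return len(v_d & v_h) / len(union) if union else 0.0


def s_c(c_d, c_h):
    """Context similarity: matching attributes over the sum of delta_2 terms."""
    if not c_d or len(c_d) != len(c_h):
        raise ValueError("contexts must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(c_d, c_h) if a == b)  # sum of delta_1
    # delta_2 is 1 for a match or a wildcard in the discovered context, else 2
    penalty = sum(1 if (a == b or a == STAR) else 2 for a, b in zip(c_d, c_h))
    return matches / penalty


hidden = ("F", "20", "Night")
vertex_score = s_v({1, 2, 3}, {2, 3, 4})
too_general = s_c(("F", STAR, "Night"), hidden)
wrong_value = s_c(("M", "20", "Night"), hidden)
```

With these definitions, a discovered context that is merely more general than the hidden one (2/3 here) is penalized less than a context that contradicts it on one attribute (1/2 here).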
 1. weight\(=\)10 and linkProb\(=\)0.1
 2. weight\(=\)30 and linkProb\(=\)0.2
 3. weight\(=\)40 and linkProb\(=\)0.5
5.3 Performance study
We also use our artificial data generator to study the behavior of COSMIc with regard to several factors: the number of transactions, the number of vertices, the number of attributes and the cardinality of the attribute domains. We generate datasets by varying a single factor, the other ones being fixed. To avoid atypical results due to randomness, we generate 10 datasets for each setting and report the median of the execution times as well as the median number of discovered patterns and their median score S, defined in the previous subsection. In this set of experiments, we use the default values for the generator that are given in Table 3.
In Figs. 10 and 11, we respectively report on the behavior of our algorithm when varying the number of attributes and their domain cardinality. Adding new attributes or increasing the size of the attribute domains results in a larger search space. Therefore, the execution time increases when either the number of attributes or the size of the attribute domains increases. The number of attributes is the more influential factor. Its increase leads to the discovery of larger sets of patterns of lower quality. Notice that even if the quality of the patterns decreases, it remains satisfactory (i.e., greater than 0.6 with 7 attributes). We observe the same phenomenon when we increase the size of the attribute domains.
6 Comparative experiments
In this section, we compare the results obtained by COSMIc with those provided by related approaches (reviewed in Sect. 2), namely the MiMaG approach (Boden et al. 2012), designed for subspace clustering on layered edge-weighted graphs, and an Exceptional Model Mining algorithm (van Leeuwen and Knobbe 2012).
6.1 Comparison to MiMaG
As we discuss in Sect. 2, several approaches for mining edge-attributed graphs have been proposed in the literature. The one closest in goal to our technique, an algorithm called MiMaG, was introduced in Boden et al. (2012). In that framework, graphs exist in several layers, and edges have weights that are determined by those layers, a formulation that is rather close to our own. MiMaG attempts to find quasi-cliques formed by edges with similar weights, and to group layers in which the same nodes are involved in a quasi-clique. We therefore explore to what degree that technique can discover patterns in the data that we use.
 1. How to define graph layers?
 2. How to define edge weights?
Potential graph layers and maximum weights of edges for each layer
Graph layer modeling 1  Graph layer modeling 2  

Gender  Age  Time  Weather  \(\max \limits _{x\in E} W_C(x)\)  Attribute  Value  \(\max \limits _{x\in E} W_C(x)\) 
F  20  Day  Rainy  1  Gender  F  3 
F  20  Night  Cloudy  1  Gender  M  2 
F  20  Night  Windy  1  Age  20  3 
F  20  Night  Rainy  1  Age  23  1 
M  23  Night  Windy  1  Age  30  1 
M  23  Night  Cloudy  1  Age  45  2 
M  23  Night  Rainy  1  Age  50  1 
F  45  Night  Cloudy  1  Time  Day  2 
F  45  Day  Rainy  1  Time  Night  4 
F  45  Night  Windy  1  Weather  Cloudy  2 
M  50  Day  Windy  1  Weather  Rainy  2 
M  50  Night  Rainy  1  Weather  Windy  2 
F  30  Night  Rainy  1 
Possible weight encodings of edge (C,D) for an example context and its component attribute-value pairs
Context  Relative weight  WRAcc 

\(\langle F,20,\hbox {Night},\hbox {Rainy}\rangle \)  1/1  \(\frac{1}{5}\left( \frac{1}{1} - \frac{5}{5}\right) \) 
Gender\(=\)‘F’  1/3  \(\frac{3}{5}\left( \frac{1}{3} - \frac{2}{5}\right) \) 
Age\(=\)20  1/3  \(\frac{3}{5}\left( \frac{1}{3} - \frac{2}{5}\right) \) 
Time\(=\)‘Night’  2/4  \(\frac{4}{5}\left( \frac{2}{4} - \frac{2}{5}\right) \) 
Weather\(=\)‘Rainy’  1/2  \(\frac{2}{5}\left( \frac{1}{2} - \frac{2}{5}\right) \) 
Parameter settings: MiMaG has two parameters: \(\gamma \) influences the degree to which quasi-cliques need to be connected, and w is the tolerance parameter deciding whether edge weights are considered similar or not. There is no clear guidance on how to set those parameters: \(\gamma \in [0,1]\), but anything below 0.5 denotes non-dense cliques (a type of pattern COSMIc can discover). Given that we have normalized weights, we know that \(w \in [0,1]\), but we cannot decide a priori what a good value is. The authors of the original paper evaluate their approach with \(\gamma = 0.5\) and \(w = 0.1\), and we therefore use the same parameters. We use the Java implementation of the algorithm that has been provided to us by the authors.
Performance-wise, Fig. 17 shows that MiMaG is not faster than COSMIc, even while finding a comparable number of patterns overall. Notably, each pattern typically involves five nodes or fewer.
There is arguably an explanation for the inability of MiMaG to group more nodes together: the parameter value \(\gamma =0.5\) means that it is searching for relatively dense subgraphs. The problem is, however, that lowering that value causes MiMaG to encounter memory problems again: for \(\gamma =0.3\), for instance, it crashes when mining data with attribute-value pairs as layers. When mining data with contexts as layers, the process terminates after long running times, yet fails to find any patterns. Using MiMaG to address our problem setting therefore requires significant preprocessing of the data: the tasks that COSMIc performs internally (decomposing contexts and calculating WRAcc) need to be done beforehand. Even then, WRAcc stays static and does not change while attributes are combined, leading to patterns of lower quality. In conclusion, those experiments show that MiMaG can hardly be adapted to solve the problem we consider here, which is quite different from the one addressed by the authors in Boden et al. (2012).
6.2 Comparison to exceptional model mining
As mentioned earlier, our problem is a particular instance of exceptional model mining (EMM) (Leman et al. 2008; Duivesteijn et al. 2016). Therefore, to evaluate the performance of an EMM approach, we instantiate our problem as the closest existing EMM setting that can partially model it. Recall that our dataset is an augmented graph: each transaction takes values for some numerical and categorical attributes and is also associated with an edge. These attribute-value vectors form the object descriptions in EMM. Each edge found in the dataset is a binary target. In this reformulation, an EMM algorithm searches for subgroups \((S, M_{T\rightarrow {}C}(S))\) (or equivalently \((M_{C\rightarrow {}T}(C), C)\)), for any subset of transactions \(S \subseteq T\) and context \(M_{T\rightarrow {}C}(S)\). Subgroups are evaluated with a quality measure, for instance the (weighted) Kullback-Leibler divergence ((W)KL): it measures the difference between the distribution of the target attributes (the distribution of the graph edges) within the subgroup and within the full dataset. The higher the difference, the more exceptional the subgroup. Contexts for which the appearance of edges is exceptional are searched for.
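To make this quality measure concrete, the following sketch computes a weighted Kullback-Leibler score when every edge is treated as an independent binary target. It is an illustration under stated assumptions rather than the measure implemented in DSSD: the weighting by the relative subgroup size, the clamping constant `eps`, and all names and probabilities are choices made here.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between two Bernoulli distributions of parameters p and q."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def weighted_kl(p_sub, p_all, n_sub, n_all):
    """Sum of per-edge divergences, weighted by the relative subgroup size.

    p_sub, p_all -- dicts mapping an edge to its appearance probability in the
                    subgroup and in the whole dataset
    """
    total = sum(bernoulli_kl(p_sub.get(e, 0.0), q) for e, q in p_all.items())
    return (n_sub / n_all) * total


# Invented appearance probabilities for three edges
p_all = {"e1": 0.2, "e2": 0.5, "e3": 0.3}
p_sub = {"e1": 0.6, "e2": 0.4, "e3": 0.0}
score = weighted_kl(p_sub, p_all, n_sub=5, n_all=20)
baseline = weighted_kl(p_all, p_all, n_sub=5, n_all=20)
```

Because the score sums over all edges, many small deviations can add up to a large value, which is one way to picture the dimensionality issue raised in the discussion that follows.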
Example
Consider Table 1 and the context \(C = (\star , Night, Windy)\). We have that \(T = M_{C\rightarrow {}T}(C)= \{ m_2, m_7, m_{10}, m_{15} , m_{17} \}\) and (T, C) is a subgroup. The appearance probability of each edge \(e\in E\) in the subgroup \(p(e\vert T)\) and in the whole dataset \(p(e\vert R)\) are:
Although this modeling partially fits our problem, it suffers from major issues that we discuss below. For some elements of the discussion, we ran several experiments with this modeling and the DSSD algorithm. It performs a beam search through the lattice of subgroups, enabling the discovery of a diverse set of subgroups (van Leeuwen 2010; van Leeuwen and Knobbe 2012), and considers the WKL quality measure. Given that even this heuristic exploration is not able to deal with large graphs, we did not experiment with exhaustive explorations [e.g., SD-Map (Atzmüller and Puppe 2006)].
Subgroup interpretation: Knowing that a subgroup, or context, is exceptional is not enough: we need to know for which edges this is the case. In other words, the targets of the objects within a subgroup induce a weighted subgraph, and selecting which edges are important remains to be done. A solution is to keep only the edges that are over-expressed according to the WRAcc measure (edges whose WRAcc is strictly positive). This can however result in an overabundance of connected components, as well as in subgraphs that are too small, or even in individual edges. Most importantly, the WKL suffers from the curse of dimensionality: when dealing with numerous targets (edges), it is very likely that the best subgroups appear exceptional due to a slight global difference in the distribution of edges, and not to a strong local one that affects only a few edges.^{4} The best contextual graphs may be missed.
7 Case studies with real-world data
We report two case studies showing the actionability of the discovered patterns: (1) On the public bicycle sharing system of Lyon, called Vélo’v, we study the use of the system depending on time of day, user demographic data (age, gender, etc.) and properties of the districts of Lyon; (2) On data gathered from matches of DOTA 2, a tactical multiplayer computer game, we demonstrate the ability of exceptional contextual subgraphs to highlight mobility behaviors.
7.1 Travel patterns in the Vélo’v system
Vélo’v is the bike-sharing system run by the city of Lyon (France) and the company JCDecaux.^{6} There are a total of 348 Vélo’v stations across the city of Lyon. Our Vélo’v dataset contains all the trips collected over a 2-year period (Jan. 2011–Dec. 2012). Each trip includes the bicycle station and the time stamp for both departure and arrival, as well as some basic demographics about the users (gender, age, zip code, country of residence, type of pass). Hence, the Vélo’v stations are the graph vertices (\(\vert V\vert = 348\)), and a directed edge corresponds to the fact that a Vélo’v user checks out a bicycle at a station and returns it at another. In total, there are 164,390 users for which demographics are available and 6.7 million transactions (i.e., movements).
The rapid development of bicycle sharing and renting systems has an impact on urban mobility practices. Studying this impact is crucial for the following reasons: (1) It is important to understand whether and how this new service contributes to the emergence of new mobility trends; (2) This study is multidisciplinary and involves physicists, economists, geographers and sociologists as well as the practitioners directly involved with the bicycle sharing system. Notice that our approach fits well in a multidisciplinary context since the patterns we are interested in are interpretable without data mining expertise; (3) The conclusions of the analysis are of interest for several urban mobility actors (local authorities and private mobility operators). For instance, these conclusions can be transferred to new cities for the deployment of new services.

Number of transactions: To vary this parameter, three subsets of the data have been chosen: the first two weeks of October 2011, denoted as 2weeks, with 312,185 transactions; the full month of October 2011, denoted as october, with 565,065 transactions; and the full dataset, denoted as all, with 6,713,937 transactions.

Number of attributes: In its basic version, the dataset contains the following attributes: \(daytime \in \{ morning, midday, evening, lastmetro, night, other \} \), which denotes specific bike usage (Hamon 2015); the zip code, gender, country, and age of the biker [where \(age \in \{ [14,25], [25,60], {\ge }60 \}\), still according to Hamon (2015)]; as well as the type of pass subscribed to by the user.
In its extended versions, the dataset contains properties of both departure and arrival stations (edge source and target attributes). We use census data provided by the National Institute of Statistics and Economic Studies (INSEE), which gives meaningful information about education, employment, industries, etc. Each station is labeled with some information of the INSEE division whose center is the closest to the bike station (using a Google API). The attributes used are TrainStation, University, Companies, Hotel and Tourism, which are respectively true if the division contains at least one train station, at least one university, at least 10 companies, at least one hotel, and at least one tourism center. In total there are 9 attributes for the basic datasets and 19 for the extended ones.
Figure 21 presents a visualization on the map of Lyon of six patterns that we extracted from different datasets introduced above.^{7} For each, we detail the experimental protocol and propose an interpretation.
While mining the dataset (all, basic) with \(\{daytime=night\}\in C_{root}\), we obtain 45 patterns in 80 s (with parameters \(min\_vertex\_size=2\), \(min\_edge\_size=1\) and \(min\_weight=6\)). Two of these patterns are shown in Fig. 21c and d: The graph associated to \(C_3\) involves three areas known for their nightlife (left-hand side of the figure) and two residential areas with many young inhabitants (on the right). Context \(C_4\) contains the attribute-value pair \(zip code=69005\) and its associated graph displays travels between this area (on the left along the river) and Lyon’s opera as well as the Part-Dieu rail station. The pattern represented in Fig. 21c (resp. d) contains 7 nodes and 10 edges with a WRAcc sum of 0.03 (resp. 6, 10 and 0.05). These patterns may thus identify key stations and demographics that the Vélo’v system operator could target with awareness campaigns on, for example, the dangers of biking at night or after parties.
Finally, we run COSMIc on (all, extended) starting the pattern enumeration with \(\{age\in [14,26]\}\in C_{root}\), thus aiming to get insights into young people’s behaviour. The execution took 70 minutes with \(min\_vertex\_size=15\), \(min\_weight=100\) and \(min\_sum\_wracc = 0.1\). It returned 31 patterns. Two of the patterns obtained are shown in Fig. 21e, f, having, respectively, 18 vertices, 45 edges, and a WRAcc sum of 0.28, and 16 vertices, 39 edges, and a WRAcc sum of 0.3. In the graph associated to \(C_5\), edges link city center areas with the city center campus. Pattern \(C_6\) contains the attribute-value pair \(zip code=69004\), which is the area from which many edges depart (upper left part). The arrivals of these edges are the main components of the University of Lyon, which are spread across the city. Most importantly, in both cases the context hints at the presence of universities in the IRIS attached to each node. One possible interpretation is that these two patterns depict students from the \(4^{th}\) district of Lyon going to their universities. Here again, such hypotheses are valuable for the Vélo’v system operator as they give hints on the behaviour of particular demographics (without specifying them explicitly: the root pattern has just a single attribute instantiated).
7.2 Behavioral mobility patterns in Dota 2
Electronic sport (eSport) is an emerging domain where the most skilled gamers are hired by professional teams, surrounded by sponsors, and compete in international tournaments (Taylor 2012) that are widely followed on live streaming platforms such as Twitch.tv (Kaytoue et al. 2012).^{8} Its development impacts our society: for example, a bill in France is examining the legal status of eSport athletes and tournaments, just as for offline athletes.^{9} Academics and experts in sport analytics are starting to get interested in this emerging topic as well (Von Eschen 2014; Schubert and Drachen 2016). Strategic video games have received much attention from the AI community for an extended period (Ontañón et al. 2013), attention that was renewed after recent announcements from the DeepMind team naming a video game as the next challenge after Go.^{10}
In this context, we study Dota 2, a multiplayer online battle arena video game released in July 2013. Up to February 2015, Dota 2 attracted tournaments totalling US$ 25 million in prize money, becoming one of the most lucrative competitive video games. Just as in sport, players are gathered in a team with coaches and practice as a daily routine. Behavioural data analytics is starting to play a key role in understanding and modeling the opponents and thus in preparing tournaments, here again just as sport analytics does for any athlete or sport team (baseball, soccer and basketball). We assess our methodology by showing that behavioural patterns specific to game conditions and players can be discovered from Dota 2 games. These patterns can be used to understand the behaviour of one or several players (the subgraph) at various stages of the game and under several conditions (the contexts).
Dota 2 and problem settings: A game is played on a map where two teams of five players battle each other in real time. Each team has to defend its own stronghold and destroy the opponent’s one to win. Each player controls a hero that he moves on the map and needs to train by collecting gold, new items and abilities, and by fighting opposing heroes. Figure 22 displays the initial influence zone of both teams. The red team, called the dire (resp. green for the radiant), defends its stronghold at the top right corner (resp. bottom left). Three lanes (top, mid, bot in Fig. 23a), on which defensive towers are set, separate the teams. The players have well-defined roles, depending on the heroes they initially picked and their properties (110 available heroes). One role consists of defending and extending the influence zone in a specific lane, another is to quickly switch lanes to attack by surprise. Knowing that a team only sees controlled map zones, estimating enemies’ positions and triggering team fights at well-chosen times and in well-chosen areas is key to success. As in any traditional sport, professional teams study their soon-to-be opponents. Understanding which specific zones are controlled by a player in some contexts is crucial, and allows teams to adapt their strategy and prepare for the tournaments.
Scenario: Any action performed during a game is stored afterwards in a file (replay), which makes it possible to rewatch the game at any time. Replays and parsing tools are freely available on dotabank.com and the skadistats GitHub. We randomly selected an expert game from The International 3 Eastern Qualifiers, lasting 42 minutes and won by the dire (more information on dota2stat.com, match #199392262). We built different augmented graphs from this game as follows. The map is cut into \(n^2\) non-overlapping squares of equal width/height and each cell of this grid is a vertex \(v \in V\); edges of the graph are hero travel paths (movements between two cells); R is the set of attributes describing the properties of the players’ heroes at the moment of the movement. Game time is measured in ticks (30 ticks per second): there are at most \(T = 30 \times 42 \times 60 \times 10 = 756,000\) movements (as there are 10 players). As such, we work directly on aggregated data by rounding game times to a factor of w seconds and using a grid resolution n. We build two datasets with different resolutions in space and time: DOTA \(_1\) with \(w=600\) and \(n=60\) (which gives 482,937 transactions and 3,475 vertices) and DOTA \(_2\) with \(w=1800\) and \(n=100\) (that is, 482,250 transactions and 11,263 vertices). The datasets also differ by their attributes: transactions are defined on time (discretized game ticks), hero type and team (dire or radiant) for DOTA \(_1\). We add two attributes in DOTA \(_2\): the percentage of remaining life (at zero, the hero dies and waits a time proportional to the current game time before respawning) and the percentage of remaining mana (consumed when using special tactics). A transaction example is \(t =\{Jakiro, [0,600], dire\}\) with edge \(\{(10,32)\}\): a player of the dire team moved his Jakiro from cell 10 to cell 32 in the first 600 s of the game.
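The construction of a transaction from raw replay positions can be sketched as follows. The replay format, the square map of side `map_size`, the row-major cell numbering and all function names are assumptions made for illustration; only the 30 ticks per second, the parameters w and n, and the Jakiro example come from the text.

```python
TICKS_PER_SECOND = 30


def to_cell(x, y, map_size, n):
    """Map a position on a square map to a cell of an n x n grid (row-major)."""
    col = min(int(x * n / map_size), n - 1)
    row = min(int(y * n / map_size), n - 1)
    return row * n + col


def to_window(tick, w):
    """Round a game tick down to the time window of w seconds that contains it."""
    start = (tick // TICKS_PER_SECOND) // w * w
    return (start, start + w)


def make_transaction(hero, team, tick, src_xy, dst_xy, map_size, n, w):
    """Build the (context attributes, edge) pair for one hero movement."""
    attributes = (hero, to_window(tick, w), team)
    edge = (to_cell(*src_xy, map_size, n), to_cell(*dst_xy, map_size, n))
    return attributes, edge


# Upper bound on the number of movements: ticks/s x minutes x s/min x players
max_movements = TICKS_PER_SECOND * 42 * 60 * 10

# Invented coordinates on a hypothetical 6000 x 6000 map, with n = 60 and w = 600
attributes, edge = make_transaction(
    "Jakiro", "dire", tick=4500, src_xy=(1050, 20), dst_xy=(3250, 30),
    map_size=6000, n=60, w=600,
)
```

With these invented coordinates, the sketch yields the attributes (Jakiro, (0, 600), dire) and the edge (10, 32), which mirrors the transaction example given above.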
Experimental results: We run COSMIc on the two datasets, searching for contextual subgraphs having at least 30 nodes and 29 edges. We compute the average and deviation of the WRAcc measure for each pattern. We remove patterns dominated by another on all these dimensions to reduce the number of output patterns (that is, we use a skyline operator seeking to minimize the deviation and maximize the other measures). DOTA \(_1\) produces 77 exceptional contextual subgraphs out of 29 different contexts in 363 s, while DOTA \(_2\) produces 158 exceptional contextual subgraphs out of 124 contexts in 230 s. Figure 23 presents some of the best patterns (highest average WRAcc). In (a), the contexts are \(\{SD, w_2, radiant\}\) and \(\{SD, w_3, radiant\}\): these two connected components show different characteristic zones of the hero ShadowDemon (SD) at two different phases of the game. While this player is aggressive in time window \(w_2\), pushing on the top lane, he mainly walks around his stronghold in the next window. Given that he belongs to the losing team, we can assume that the latter is a defensive pattern. The pattern in Fig. 23b, whose context is \(\{Juggernaut, w_2, dire\}\), characterizes a role mentioned before: it represents quick lane switches to help teammates and attack by surprise. Finally, two exceptional contextual subgraphs of DOTA \(_2\) are given in Fig. 23c, sharing the same context \(\{w_0, radiant, mana \le 25\%\}\). They clearly show mana-backup trajectories: in the early stage, mana is a rare resource, and getting back inside the stronghold allows a player to quickly regain all of his mana, which otherwise increases very slowly outside the stronghold.
8 Conclusion
In this paper, we defined the problem of finding exceptional contextual subgraphs in augmented graphs. This problem has many applications, especially for mobility data analysis such as in location-based social networks, urban data, and recommendation systems. It enables the discovery of connected components that are highly characteristic of specific categories of users and time periods. We showed how an inductive approach rooted in Exceptional Model Mining can answer this challenging problem. This is achieved thanks to an efficient data mining algorithm, COSMIc, that avoids materializing all context/subgraph pairs and benefits from pruning and upper bound computation techniques. We evaluated COSMIc on both synthetic and real-world datasets. For that matter, we designed an augmented graph generator that makes it possible to hide exceptional contextual subgraphs, and showed that COSMIc is able to retrieve the hidden patterns in noisy data and to scale w.r.t. the parameters of the input data (attribute domain size and number, number of transactions and vertices). We compared our approach to the closest existing formalisms and algorithms we could find and discussed how they fail to answer our problem. Finally, we provided two case studies (i) on the analysis of a bike-sharing system, where discovered patterns are helpful for the Vélo’v system operators (e.g., discovering stations and mobility patterns involving young people at night) and (ii) on the analysis of Dota 2 replays, a well-known game in eSport, for which the discovered patterns explain the mobility behaviors of players.
Footnotes
 1. For the sake of simplicity, we use the term edge to refer indifferently to directed or undirected edges without loss of generality.
 2. COSMIc stands for COntextual Subgraph MIning.
 3.
 4. A parallel could be drawn to the case when one has to use biclustering techniques over traditional clustering in the presence of a large number of attributes (Madeira and Oliveira 2004).
 5. Provided by the authors of the DSSD at http://patternsthatmatter.org/dssd/.
 6.
 7. All results can be explored through a user friendly interface http://liris.cnrs.fr/dm2l/projects/graisearch/mlj/.
 8. Recently acquired by Amazon for US$ 970 million.
 9.
 10.
Notes
Acknowledgements
The authors would like to thank the anonymous reviewers for their frank, fruitful, constructive and insightful comments and the authors of the MiMaG and DSSD algorithms for providing us with their prototypes. They also gratefully acknowledge Pierre Houdyer for the development of the pattern visualization platform on the Vélo’v data. This work has been partially supported by the projects GRAISearch (FP7-PEOPLE-2013-IAPP) and VEL’INNOV (ANR INOV 2012).
References
 Ahmed, R., & Karypis, G. (2011). George algorithms for mining the evolution of conserved relational states in dynamic networks. In IEEE ICDM (pp. 1–10).Google Scholar
 Atzmueller, Martin, Doerfel, Stephan, & Mitzlaff, Folke. (2016). Descriptionoriented community detection using exhaustive subgroup discovery. Information Sciences, 329, 965–984.CrossRefGoogle Scholar
 Atzmüller, M., & Puppe, F. (2006). Sdmap—A fast algorithm for exhaustive subgroup discovery. In PKDD, volume 4213 of LNCS (pp. 6–17), Springer.Google Scholar
 Berlingerio, M., Bonchi, F., Bringmann, B., & Gionis, A. (2009). Mining graph evolution rules. In ECML/PKDD (pp. 115–130).Google Scholar
 Berlingerio, Michele, Coscia, Michele, Giannotti, Fosca, Monreale, Anna, & Pedreschi, Dino. (2013). Multidimensional networks: Foundations of structural analysis. World Wide Web, 16(5–6), 567–593.CrossRefGoogle Scholar
 Besson, J., Robardet, C., & Boulicaut, J. (2006) Mining a new faulttolerant pattern type as an alternative to formal concept discovery. In Schärfe, H., Hitzler, P. & Øhrstrøm, P. (eds.), Conceptual Structures: Inspiration and Application, Proceedings of the 14th International Conference on Conceptual Structures, ICCS 2006, Aalborg, Denmark, July 16–21, 2006 volume 4068 of Lecture Notes in Computer Science, (pp. 144–157), Springer.Google Scholar
 Boden, B., Günnemann, S., Hoffmann, H. & Seidl, T. (2012). Mining coherent subgraphs in multilayer graphs with edge labels. In KDD (pp. 1258–1266).Google Scholar
 Bonchi, F., Gionis, A., Gullo, F., & Ukkonen, A. (2012). Chromatic correlation clustering. In KDD (pp. 1321–1329).Google Scholar
 Borgwardt, K. M., Kriegel, H. P. , & Wackersreuther, P. (2006) Pattern mining in frequent dynamic subgraphs. In IEEE ICDM (pp. 818–822).Google Scholar
 Bringmann, Björn, Berlingerio, Michele, Bonchi, Francesco, & Gionis, Aristides. (2010). Learning and predicting the evolution of social networks. IEEE Intelligent Systems, 25(4), 26–35.CrossRefGoogle Scholar
 Das, Mahashweta, Amer-Yahia, Sihem, Das, Gautam, & Yu, Cong. (2011). MRI: Meaningful interpretations of collaborative ratings. Proceedings of the VLDB Endowment, 4(11), 1063–1074.
 Das, Mahashweta, Thirumuruganathan, Saravanan, Amer-Yahia, Sihem, Das, Gautam, & Yu, Cong. (2014). An expressive framework and efficient algorithms for the analysis of collaborative tagging. The VLDB Journal, 23(2), 201–226.
 de Melo, P. O. S. V., Faloutsos, C., & Loureiro, A. A. F. (2011). Human dynamics in large communication networks. In SDM (pp. 968–979), SIAM.
 Desmier, E., Plantevit, M., Robardet, C., & Boulicaut, J. F. (2013). Trend mining in dynamic attributed graphs. In ECML/PKDD (pp. 654–669).
 Duivesteijn, W. (2014). A short survey of exceptional model mining: Exploring unusual interactions between multiple targets. In 2014 International Workshop on Multi-Target Prediction.
 Duivesteijn, Wouter, Feelders, Ad, & Knobbe, Arno J. (2016). Exceptional model mining: Supervised descriptive local pattern mining with complex target concepts. Data Mining and Knowledge Discovery, 30(1), 47–98.
 Duivesteijn, W., Knobbe, A., Feelders, A., & van Leeuwen, M. (2010). Subgroup discovery meets Bayesian networks: An exceptional model mining approach. In Webb, G. I., Liu, B., Zhang, C., Gunopulos, D., & Wu, X. (Eds.), ICDM 2010, The 10th IEEE International Conference on Data Mining, Sydney, Australia, 14–17 December 2010 (pp. 158–167), IEEE Computer Society.
 Girvan, Michelle, & Newman, Mark E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.
 Goyal, Amit, Bonchi, Francesco, Lakshmanan, Laks V. S., & Venkatasubramanian, Suresh. (2013). On minimizing budget and time in influence propagation over social networks. Social Network Analysis and Mining, 3(2), 179–192.
 Günnemann, S., Färber, I., Boden, B., & Seidl, T. (2010). Subspace clustering meets dense subgraph mining. In ICDM (pp. 845–850).
 Hamon, R. (2015). Analysis of temporal networks using signal processing methods: Application to the bike-sharing system in Lyon. PhD thesis, École normale supérieure de Lyon (ENS Lyon).
 Inokuchi, A., & Washio, T. (2010). Mining frequent graph sequence patterns induced by vertices. In SDM (pp. 466–477), SIAM.
 Jiang, M., Cui, P., Liu, R., Yang, Q., Wang, F., Zhu, W., & Yang, S. (2012). Social contextual recommendation. In CIKM (pp. 45–54).
 Kaytoue, M., Pitarch, Y., Plantevit, M., & Robardet, C. (2014). Triggering patterns of topology changes in dynamic graphs. In 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2014, Beijing, China, August 17–20, 2014 (pp. 158–165).
 Kaytoue, M., Silva, A., Cerf, L., Meira Jr., W., & Raïssi, C. (2012). Watch me playing, I am a professional. In WWW (Comp. Vol.) (pp. 1181–1188), ACM.
 Khan, A., Yan, X., & Wu, K. L. (2010). Towards proximity pattern mining in large graphs. In SIGMOD (pp. 867–878), ACM.
 Lahiri, M., & Berger-Wolf, T. Y. (2008). Mining periodic behavior in dynamic social networks. In IEEE ICDM (pp. 373–382).
 Lavrac, Nada, Kavsek, Branko, Flach, Peter A., & Todorovski, Ljupco. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5, 153–188.
 Leman, D., Feelders, A., & Knobbe, A. J. (2008). Exceptional model mining. In ECML/PKDD (pp. 1–16).
 Lemmerich, F., Becker, M., & Atzmueller, M. (2012). Generic pattern trees for exhaustive exceptional model mining. In Flach, P. A., De Bie, T., & Cristianini, N. (Eds.), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2012, Bristol, UK, September 24–28, 2012. Proceedings, Part II, volume 7524 of Lecture Notes in Computer Science (pp. 277–292), Springer.
 Madeira, S. C., & Oliveira, A. L. (2004). Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 24–45.
 Morishita, S., & Sese, J. (2000). Traversing itemset lattice with statistical metric pruning. In PODS.
 Moser, F., Colak, R., Rafiey, A., & Ester, M. (2009). Mining cohesive patterns from graphs with feature vectors. In SDM (pp. 593–604), SIAM.
 Mougel, P. N., Rigotti, C., Plantevit, M., & Gandrillon, O. (2013). Finding maximal homogeneous clique sets. Knowledge and Information Systems, pp. 1–30.
 Novak, Petra Kralj, Lavrač, Nada, & Webb, Geoffrey I. (2009). Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining. Journal of Machine Learning Research, 10, 377–403.
 Ontañón, S., Synnaeve, G., Uriarte, A., Richoux, F., Churchill, D., & Preuss, M. (2013). A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games, 5(4), 293–311.
 Pearson, Karl. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50(302), 157–175.
 Prado, Adriana, Jeudy, Baptiste, Fromont, Élisa, & Diot, Fabien. (2013). Mining spatiotemporal patterns in dynamic plane graphs. Intelligent Data Analysis, 17(1), 71–92.
 Prado, Adriana, Plantevit, Marc, Robardet, Céline, & Boulicaut, Jean-François. (2013). Mining graph topological patterns: Finding covariations among vertex descriptors. IEEE Transactions on Knowledge and Data Engineering, 99, 1.
 Qi, Guo-Jun, Aggarwal, Charu C., Tian, Qi, Ji, Heng, & Huang, Thomas S. (2012). Exploring context and content links in social media: A latent space method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(5), 850–862.
 Robardet, C. (2009). Constraint-based pattern mining in dynamic graphs. In IEEE ICDM (pp. 950–955).
 Schubert, M., & Drachen, A. (2016). Esports analytics through encounter detection. In Proceedings of the MIT Sloan Sports Analytics Conference, 2016.
 Sese, J., Seki, M., & Fukuzaki, M. (2010). Mining networks with shared items. In CIKM (pp. 1681–1684), ACM.
 Silva, Arlei, Meira, Wagner, & Zaki, Mohammed J. (2012). Mining attribute-structure correlated patterns in large attributed graphs. Proceedings of the VLDB Endowment, 5(5), 466–477.
 Soulet, A., Raïssi, C., Plantevit, M., & Crémilleux, B. (2011). Mining dominant patterns in the sky. In 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, December 11–14, 2011 (pp. 655–664).
 Sun, Yizhou, & Han, Jiawei. (2012). Mining heterogeneous information networks: Principles and methodologies. San Rafael: Morgan & Claypool Publishers.
 Taylor, T. L. (2012). Raising the stakes: E-sports and the professionalization of computer gaming. Cambridge: MIT Press.
 Tong, H., Papadimitriou, S., Sun, J., Yu, P. S., & Faloutsos, C. (2008). Colibri: Fast mining of large static and dynamic graphs. In KDD (pp. 686–694).
 van Leeuwen, Matthijs. (2010). Maximal exceptions with minimal descriptions. Data Mining and Knowledge Discovery, 21(2), 259–276.
 van Leeuwen, Matthijs, & Knobbe, Arno J. (2012). Diverse subgroup set discovery. Data Mining and Knowledge Discovery, 25(2), 208–242.
 Von Eschen, A. (2014). Machine learning and data mining in Call of Duty (invited talk). In ECML/PKDD.
 Yang, Y., Yu, J. X., Gao, H., Pei, J., & Li, J. (2013). Mining most frequently changing component in evolving graphs. World Wide Web, pp. 1–26.
 You, Chang Hun, Holder, Lawrence B., & Cook, Diane J. (2009). Learning patterns in the dynamics of biological networks. In KDD (pp. 977–986).