1 Introduction

In ubiquitous and social environments, a variety of heterogeneous multi-relational data is generated, e. g., by sensors and social media. From this data, a set of complex networks can be derived in the form of social interaction networks [2], capturing distinct facets of the interaction space [19]. In that context, local exceptionality detection, based on subgroup discovery and exceptional model mining, provides flexible approaches for data exploration, assessment, and the detection of unexpected and interesting phenomena.

Subgroup discovery [3, 15, 23] is an approach for discovering interesting subgroups, as an instance of local pattern detection [20]. The interestingness is usually defined by a certain property of interest formalized by a quality function. In the simplest case, a binary target variable is considered, where the share of the target in a subgroup is compared to its share in the whole dataset in order to detect (exceptional) deviations. More complex target concepts consider sets of target variables. In particular, exceptional model mining [3, 12] focuses on more complex quality functions. In the context of ubiquitous data and social media, interesting target concepts are given, e. g., by densely connected graph structures (communities) [5], unexpected spatio-semantic distributions [8], or exceptional matches between online-offline relations [13] for behavioral characterization.
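
For the binary case, the comparison of shares can be illustrated with a minimal sketch. It assumes weighted relative accuracy as the quality function, which is one common choice and not necessarily the one used in the cited works; the record fields (`venue`, `active`) and the toy data are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def target_share(records: List[Record], target: str) -> float:
    """Fraction of records for which the binary target is true."""
    if not records:
        return 0.0
    return sum(1 for r in records if r[target]) / len(records)


def wracc(records: List[Record], selector: Callable[[Record], bool], target: str) -> float:
    """Weighted relative accuracy: coverage * (subgroup share - overall share)."""
    subgroup = [r for r in records if selector(r)]
    if not records or not subgroup:
        return 0.0
    coverage = len(subgroup) / len(records)
    return coverage * (target_share(subgroup, target) - target_share(records, target))


# Hypothetical toy data: four contacts made at a conference, four made online.
RECORDS: List[Record] = (
    [{"venue": "conference", "active": True}] * 3
    + [{"venue": "conference", "active": False}]
    + [{"venue": "online", "active": True}]
    + [{"venue": "online", "active": False}] * 3
)

conference_quality = wracc(RECORDS, lambda r: r["venue"] == "conference", "active")
online_quality = wracc(RECORDS, lambda r: r["venue"] == "online", "active")
```

In this toy dataset the subgroup `venue = conference` has a target share above the overall share and thus a positive quality, while `venue = online` receives a negative one.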

This paper focuses on formalizations and applications of subgroup discovery and exceptional model mining in the context of social interaction networks. We summarize recent work on community detection, behavior characterization and spatio-temporal analysis, and efficient implementation (comprising the papers [1, 2, 4–8, 10, 13]). In that way, we provide a compact and structured overview of recent scientific advances in this field, covering specific methods and their applications for analyzing social interactions.

2 Methods

Social interaction networks [2, 17, 18] are user-related social networks in social media that capture social relations inherent in social interactions, social activities, and other social phenomena which act as proxies for social user-relatedness. Therefore, according to the categorization of Wasserman and Faust [22, p. 37 ff.], social interaction networks focus on interaction relations between people as the corresponding actors. This also includes interaction data from sensors and mobile devices, as long as the data is created by real users [1, 2].

In such contexts, exploratory data analysis is an important approach, e. g., for getting first insights into the data. In particular, descriptive data mining aims to uncover certain patterns for characterization and description of the data and the captured relations. Typically, the goal of the methods is not only an actionable model, but also a human-interpretable set of patterns [16].

Subgroup discovery and exceptional model mining are prominent methods for local exceptionality detection that can be configured and adapted to various analytical tasks. Local exceptionality detection especially supports the goal of explanation-aware data mining [9], due to its more interpretable results, e. g., for characterizing a set of data, for concept description, for providing regularities and associations between elements in general, and for detecting and characterizing unexpected situations, e. g., events or episodes. In the following, we summarize approaches and methods for local exceptionality detection on attributed graphs, for behavioral characterization, and spatio-temporal analysis. Furthermore, we address issues of scalability and large-scale data processing.

2.1 Description-Oriented Community Detection

Communities can intuitively be defined as subsets of nodes of a graph with a dense structure in the corresponding subgraph. However, for mining such communities usually only structural aspects are taken into account. Typically, neither a concise nor an easily interpretable community description is provided.

In [5], we focus on description-oriented community detection using subgroup discovery. For providing both structurally valid and interpretable communities, we utilize the graph structure as well as additional descriptive features of the graph's nodes. We aim at identifying communities according to standard community quality measures, while providing characteristic descriptions at the same time. We propose several optimistic estimates of standard community quality functions to be used for efficient pruning of the search space in an exhaustive branch-and-bound algorithm. We present examples of an evaluation using five real-world data sets, obtained from three different social media applications, showing runtime improvements of several orders of magnitude. The results also indicate significant semantic structures compared to the baselines. A further application of this method to the exploratory analysis of social media using geo-references is demonstrated in [2, 6]. A scalable implementation of the described description-oriented community detection approach, i. e., the COMODO algorithm [5], is described in [7]; it is also suited for large-scale data processing utilizing the Map/Reduce framework [11]. With that, we can apply the same method to in-memory datasets as well as to large-scale datasets, supporting efficient processing.
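
The role of optimistic estimates in the branch-and-bound search can be sketched as follows. This is not the COMODO algorithm and does not reproduce the estimates proposed in [5]: as a stated assumption, the quality of a described node set is its number of internal edges minus the number expected under the global edge density, and the optimistic estimate is the deliberately loose bound given by the internal edge count (a refinement can only remove nodes and thus edges). The attributes, graph, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Selector = Tuple[str, str]          # hypothetical (attribute, value) pair
Description = Tuple[Selector, ...]  # conjunction of selectors

# Hypothetical attributed toy graph.
NODES: Dict[str, Dict[str, str]] = {
    "a": {"topic": "ml", "city": "kassel"},
    "b": {"topic": "ml", "city": "kassel"},
    "c": {"topic": "ml", "city": "kassel"},
    "d": {"topic": "ml", "city": "berlin"},
    "e": {"topic": "db", "city": "berlin"},
    "f": {"topic": "db", "city": "kassel"},
}
EDGES: List[Tuple[str, str]] = [
    ("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"),
]
SELECTORS: List[Selector] = [
    ("topic", "ml"), ("topic", "db"), ("city", "kassel"), ("city", "berlin"),
]


def cover(nodes: Dict[str, Dict[str, str]], description: Description) -> Set[str]:
    """Nodes whose attributes satisfy every selector of the description."""
    return {n for n, attrs in nodes.items()
            if all(attrs.get(a) == v for a, v in description)}


def internal_edges(edges: List[Tuple[str, str]], members: Set[str]) -> int:
    return sum(1 for u, v in edges if u in members and v in members)


def quality(nodes, edges, members: Set[str]) -> float:
    """Internal edges minus the count expected under the global edge density."""
    pairs_total = len(nodes) * (len(nodes) - 1) / 2
    density = len(edges) / pairs_total if pairs_total else 0.0
    pairs_sub = len(members) * (len(members) - 1) / 2
    return internal_edges(edges, members) - density * pairs_sub


def optimistic_estimate(edges, members: Set[str]) -> float:
    """Upper bound on the quality of any refinement (subset of members)."""
    return float(internal_edges(edges, members))


def best_description(nodes, edges, selectors: List[Selector], max_depth: int = 2) -> dict:
    state = {"quality": float("-inf"), "description": (), "pruned": 0}

    def expand(description: Description, start: int) -> None:
        for i in range(start, len(selectors)):
            attribute = selectors[i][0]
            if any(a == attribute for a, _ in description):
                continue  # at most one value per attribute
            candidate = description + (selectors[i],)
            members = cover(nodes, candidate)
            if not members:
                continue
            q = quality(nodes, edges, members)
            if q > state["quality"]:
                state["quality"], state["description"] = q, candidate
            if len(candidate) >= max_depth:
                continue
            if optimistic_estimate(edges, members) <= state["quality"]:
                state["pruned"] += 1  # no refinement can beat the current best
                continue
            expand(candidate, i + 1)

    expand((), 0)
    return state


result = best_description(NODES, EDGES, SELECTORS)
```

On this toy graph the search returns the description `topic = ml AND city = kassel`, which covers the triangle of nodes `a`, `b`, `c`, and skips the refinements of two single-selector descriptions whose estimate cannot exceed the best quality found so far. Tighter estimates prune more aggressively while still guaranteeing an exhaustive result.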

2.2 Behavioral Characterization on Social Interaction Networks

Important structures that emerge in social interaction networks are given by subgroups. As outlined above, we can apply community detection to mine both the graph structure and descriptive features, obtaining description-oriented communities. However, we can also analyze subgroups in a social interaction network from a compositional perspective, i. e., neglecting the graph structure. Then, we focus on the attributes of subsets of nodes or on parameters derived from these, e. g., corresponding to roles, centrality scores, etc. In addition, we can also consider sequential data, e. g., for characterization of exceptional link trails, i. e., sequential transitions, as presented in [4].
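
To make the notion of exceptional sequential transitions more concrete, the following minimal sketch estimates first-order transition probabilities from trails and compares a subgroup against the whole dataset. The Manhattan distance over transition rows is an assumption chosen for illustration and is not taken from [4]; the trail data, the `weekend` attribute, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Transitions = Dict[str, Dict[str, float]]


def transition_probabilities(trails: List[List[str]]) -> Transitions:
    """Maximum-likelihood first-order transition probabilities."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return {src: {dst: c / sum(row.values()) for dst, c in row.items()}
            for src, row in counts.items()}


def transition_distance(subgroup: Transitions, overall: Transitions) -> float:
    """Sum of absolute differences over rows of states observed in the subgroup."""
    total = 0.0
    for src, row in subgroup.items():
        reference = overall.get(src, {})
        for dst in set(row) | set(reference):
            total += abs(row.get(dst, 0.0) - reference.get(dst, 0.0))
    return total


# Hypothetical toy trails, each annotated with a descriptive attribute.
TRAILS = [
    {"weekend": False, "trail": ["home", "work", "home"]},
    {"weekend": False, "trail": ["home", "work", "cafe"]},
    {"weekend": True, "trail": ["home", "cafe", "home"]},
    {"weekend": True, "trail": ["home", "cafe", "park"]},
]

overall_model = transition_probabilities([t["trail"] for t in TRAILS])
weekend_model = transition_probabilities([t["trail"] for t in TRAILS if t["weekend"]])
weekend_deviation = transition_distance(weekend_model, overall_model)
```

A larger deviation indicates that the subgroup description (here `weekend = True`) selects trails whose transition behavior differs more strongly from the overall behavior.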

In [1], we discuss a number of exemplary analysis results of social behavior in mobile social networks, focusing on the characterization of links and roles. For that, we describe the configuration, adaptation and extension of the subgroup discovery methodology in that context. In addition, we can analyze multiplex networks by considering the match between different networks, and deviations between the networks, respectively. A description of characteristic (mis-)matches in a multiplex network, for example, is presented in [13] regarding relations between online and offline social interaction networks. Outlining these examples, we demonstrate that local exceptionality detection is a flexible approach for compositional analysis in social interaction networks.

2.3 Exceptional Model Mining for Spatio-Temporal Analysis

Exploratory analysis on ubiquitous data needs to handle different heterogeneous and complex data types. In [2, 8], we present an adaptation of subgroup discovery using exceptional model mining formalizations on ubiquitous social interaction networks. With this adaptation, we can detect locally exceptional patterns, e. g., corresponding to bursts or special events in a dynamic network. Furthermore, we propose subgroup discovery and assessment approaches for obtaining interesting descriptive patterns and provide a novel graph-based analysis approach for assessing the relations between the subgroups in the obtained set. This exploratory visualization approach allows for the comparison of subgroups according to their relations to other subgroups and for the inclusion of further parameters, e. g., geo-spatial distribution indicators. We present and discuss analysis results utilizing a real-world ubiquitous social media dataset.
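
As an illustration of an exceptional model mining style quality function on geo-referenced data, the following sketch scores a subgroup by how strongly its spatial distribution deviates from that of the whole dataset. The measure, a total variation distance between location histograms weighted by the square root of the subgroup size, is an assumption made for this example and is not the formalization of [2, 8]; the fields `tag` and `cell` and the toy posts are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List

Post = Dict[str, str]


def distribution(cells: List[str]) -> Dict[str, float]:
    """Relative frequency of each spatial grid cell."""
    counts = Counter(cells)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def spatial_exceptionality(posts: List[Post], selector: Callable[[Post], bool]) -> float:
    """Size-weighted deviation of the subgroup's spatial distribution."""
    subgroup = [p for p in posts if selector(p)]
    if not subgroup:
        return 0.0
    deviation = total_variation(
        distribution([p["cell"] for p in subgroup]),
        distribution([p["cell"] for p in posts]),
    )
    return sqrt(len(subgroup)) * deviation


# Hypothetical toy data: festival posts cluster in one cell, daily posts spread out.
POSTS: List[Post] = (
    [{"tag": "festival", "cell": "center"}] * 4
    + [{"tag": "daily", "cell": cell} for cell in ("center", "north", "south") * 2]
)

festival_score = spatial_exceptionality(POSTS, lambda p: p["tag"] == "festival")
daily_score = spatial_exceptionality(POSTS, lambda p: p["tag"] == "daily")
```

Here the spatially concentrated `festival` subgroup scores higher than the evenly spread `daily` subgroup, which is the kind of ranking a burst or special event would be expected to produce under these assumptions.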

3 Conclusions and Outlook

Subgroup discovery and exceptional model mining provide powerful and comprehensive methods for knowledge discovery and exploratory analysis in the context of local exceptionality detection. In this paper, we presented corresponding approaches and methods, specifically targeting social interaction networks, and showed how to implement local exceptionality detection on both a methodological and a practical level.

Interesting future directions for adapting and extending local exceptionality detection in social contexts include extended postprocessing and presentation options, e. g., [3]. In addition, extensions to predictive modeling, e. g., link prediction [2, 21], are interesting options to explore. Furthermore, extending the analysis of sequential data in online or offline social contexts, e. g., based on Markov chains as exceptional models [4, 10], or of network dynamics [14], is a further interesting option for future work.