
From Relational Data to Graphs: Inferring Significant Links Using Generalized Hypergeometric Ensembles

  • Giona Casiraghi
  • Vahan Nanumyan
  • Ingo Scholtes
  • Frank Schweitzer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10540)

Abstract

The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. The framework builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.

Keywords

Statistical analysis, Graph theory, Network inference, Statistical ensemble, Relational data, Graph mining, Graph analysis, Network analysis, Social network, Social network analysis, Community structures, Data mining, Social interactions

1 Motivation

Advances in data sensing and collection give rise to an increasing volume of data that capture dyadic relations between elements or actors in social, natural, and technical systems. While it is common to apply graph mining and network analysis to such relational data, it is often questionable whether the application of these techniques is actually justified. Consider, for instance, various forms of time series data, which not only tell us which elements of a complex system are related but also when or in which order relations occur. Such data give rise to temporal networks, which question the application of widely used network-based modeling and data mining techniques [13, 24, 26, 27, 30]. Apart from temporal information, we often have access to data that capture multiple types of relations or interactions. The resulting multi-layer network topologies give rise to complications that threaten standard techniques, e.g., to infer and analyze social networks, detect community structures, or to model and control dynamical processes in networked systems [3, 7, 16, 28, 35].

The challenges outlined above are due to the growing availability of additional information – such as time-stamped, sequential or multi-dimensional relational data – which must be incorporated into network-based techniques to model and analyze relational data. However, we are often confronted with situations in which we lack information that is needed to interpret observed relations. Consider, for instance, data sets that capture the simultaneous presence of two users at the same location, the joint expression of two genes in a DNA microarray, or the co-occurrence of two words in the same document. Each of these observed relations can be due to an underlying social tie, a functional relationship between genes, or a semantic link between two words, or it could simply have occurred by mere chance. Rather than naïvely analyzing such data from the perspective of graphs or networks, we should thus treat them as noisy observations that may or may not indicate true relations between a system’s elements.

Principled and efficient methods to solve this network inference problem are of major importance for the modeling and analysis of social networks, the reconstruction of biological networks, and the mining of semantic structures in information systems. The problem has received significant attention from the data mining and machine learning community, as well as from researchers in graph theory and network science. Especially in the latter community, the problem is commonly addressed using statistical ensembles, i.e., generative stochastic models of graphs that can be used for inference, learning and modeling tasks. A common issue of these techniques is that the underlying statistical ensembles are not analytically tractable, thus requiring time-consuming numerical simulations and Monte-Carlo sampling techniques.

To address this problem, in this short paper we propose generalized hypergeometric ensembles (gHypE), a novel framework of statistical ensembles to infer significant links in relational data. The framework can be viewed as a generalization of the configuration model, which is commonly used to generate random graph topologies with a given sequence of node degrees. Our framework extends this state-of-the-art graph-theoretic approach in two ways. First, it provides analytically tractable probability spaces of directed and undirected multi-edge graphs, eliminating the need for expensive numerical simulations. Second, it allows us to account for known factors that influence the occurrence of interactions, such as known group structures, similarities between elements, or other forms of biases. We demonstrate our framework in two real-world data sets that capture spatio-temporal proximities of actors in a social system. The results show that our framework provides interesting new perspectives for mining and learning in graphs.

2 Background and Related Work

The problem of inferring significant links in relational data has been addressed in a number of works. In the following, we coarsely categorize them into three lines of research.

Applying predictive analytics techniques, a first set of works studied the problem from the perspective of link prediction [17]. In [29], a supervised learning technique is used to predict types of social ties based on unlabeled interactions. The authors of [25] show that tensor factorization techniques make it possible to infer international relations from data that capture how often two countries co-occur in news reports. In [33], a link-based latent variable model is used to predict friendship relations using data on social interactions.

Using the special characteristics of time-stamped social interactions or geographical co-occurrences, a second line of works has additionally accounted for spatio-temporal information. Studying data on time-stamped proximities of students on the MIT campus, the authors of [8] show that the temporal and spatial distribution of proximity events makes it possible to infer social ties with high accuracy. In [5], a model that captures location diversity, regularity, intensity and duration is used to predict social ties based on co-location events. An entropy-based approach taking into account the diversity of interactions’ locations has been used in [22].

Addressing scenarios where neither training data nor spatio-temporal information is available, a third line of works is based on generative models for random graphs. Such models can be used as null models for observed dyadic interactions, which help us to assess whether the relations between a given pair of elements occur significantly more often than expected. Existing works in this area typically rely on standard modeling frameworks, such as exponential random graphs [4, 23], or the configuration model for graphs with given degree sequence or distribution [18]. On the one hand, these approaches provide statistically principled network inference and learning methods for general relational data [2, 12, 19, 32]. On the other hand, the underlying generative models are often not analytically tractable, thus requiring expensive numerical simulations [19, 23]. Proposing a framework of analytically tractable generative models for directed and undirected multi-edge graphs, in this work we close this research gap.

3 Generalized Hypergeometric Ensembles

In the following we introduce our framework step by step. For this, let us first consider a data set consisting of repeated dyadic interactions (ij), which have been observed between two nodes i and j. Such a data set can be represented as a multi-edge, or weighted, network \(G=(V,E)\), where V is a set of n nodes, and \(E \subseteq V \times V\) is a multi-set of (directed or undirected) edges. Let us further define an adjacency matrix \(\hat{\mathbf {A}}\), where entries \(\hat{A}_{ij}\in \mathbb N_{0}\) capture the weight of an edge \((i,j)\in V \times V\), i.e., the multiplicity of an edge (ij) in the multi-set E. For each node \(i \in V\) we further define the (weighted) in-degree \(\hat{k}_{\mathrm {in}}(i) := \sum _{j \in V} \hat{A}_{ji}\) and the (weighted) out-degree \(\hat{k}_{\mathrm {out}}(i) := \sum _{j \in V} \hat{A}_{ij}\).
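As a minimal sketch of these definitions, the following Python snippet counts repeated directed interactions into a weighted adjacency matrix and computes the weighted in- and out-degrees. The function names and the toy interaction list are illustrative assumptions, not part of the original work.

```python
from collections import Counter


def adjacency_from_interactions(interactions, nodes):
    """Count repeated directed interactions (i, j) into a weighted adjacency matrix."""
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    # The multiplicity of each pair (i, j) in the multi-set E becomes the entry A[i][j].
    for (i, j), weight in Counter(interactions).items():
        A[index[i]][index[j]] = weight
    return A


def weighted_degrees(A):
    """Return (k_in, k_out): column sums and row sums of the adjacency matrix."""
    n = len(A)
    k_out = [sum(A[i][j] for j in range(n)) for i in range(n)]
    k_in = [sum(A[j][i] for j in range(n)) for i in range(n)]
    return k_in, k_out


# Hypothetical toy data: five observed interactions among three nodes.
nodes = ["a", "b", "c"]
interactions = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "a"), ("a", "b")]
A_hat = adjacency_from_interactions(interactions, nodes)
k_in, k_out = weighted_degrees(A_hat)
```

In this toy example the pair (a, b) was observed three times, so the corresponding entry of the matrix is 3, and both degree sequences sum to the total number of interactions.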

Rather than directly applying graph mining and learning techniques to such a weighted graph G, in the following we are interested in a crucial question: Which of the links between nodes are significant, i.e., which of the observed weights \(\hat{A}_{ij}\) go beyond what is expected at random, given (i) the total number of observed interactions, and (ii) the number of times individual nodes engage in interactions? To answer this question, we take the common approach of defining a stochastic model that generates a so-called statistical ensemble, i.e., a probability space of graphs. Different from existing approaches, where link weights are assumed to be continuous (e.g. [1, 6]), we are interested in a statistical ensemble that (i) can handle directed and multi-edge graphs, (ii) is analytically tractable, and (iii) thus allows us to assess the significance of links in a theoretically principled way.

Our construction of a statistical ensemble follows the general idea of the Molloy-Reed configuration model, which is to randomly shuffle the topology of a given network G while preserving the observed node degrees. For this, the configuration model generates edges between randomly sampled pairs of nodes in such a way that the exact observed degrees of all nodes are preserved. Different from this approach, we assume a sampling of m multi-edges such that the sequence of expected degrees of nodes is preserved. For this, for each pair of nodes i and j, we first define the maximum number \(\varXi _{ij}\) of multi-edges that can possibly exist between nodes i and j as \(\varXi _{ij} := \hat{k}_{\mathrm {out}}(i) \hat{k}_{\mathrm {in}}(j)\) (cf. [15, 20]). The maximally possible numbers of links between all pairs of nodes can then be conveniently represented in matrix form as \(\mathbf {\Xi } := \left( \varXi _{ij}\right) _{i,j \in V}\).
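The matrix of maximally possible multi-edges is simply the outer product of the out-degree and in-degree sequences. The short sketch below illustrates this with a hypothetical helper `possible_edges` and made-up degree sequences. It keeps the diagonal entries, since the definition above ranges over all pairs in \(V \times V\).

```python
def possible_edges(k_out, k_in):
    """Xi[i][j] = k_out(i) * k_in(j), the maximum number of multi-edges from i to j."""
    return [[ko * ki for ki in k_in] for ko in k_out]


# Hypothetical degree sequences of a directed multi-edge graph with m = 5 edges.
k_out = [3, 1, 1]
k_in = [1, 3, 1]

Xi = possible_edges(k_out, k_in)
m = sum(k_out)
# Summing all entries factorizes into (sum of out-degrees) * (sum of in-degrees) = m * m.
M = sum(sum(row) for row in Xi)
```

Because both degree sequences sum to m in a directed graph, the total number of possible multi-edges equals \(m^2\), which is 25 in this example.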

Our statistical ensemble is then defined by the following sampling procedure: For each pair of nodes ij, we sample edges from a set of \(\varXi _{ij}\) possible multi-edges uniformly at random. This can be viewed as an urn problem [14] where the edges to be sampled are represented by balls in an urn. By representing edges connecting different pairs of nodes (ij) as balls having \(n^{2}=\left| V\times V \right| \) different colors, we obtain an urn with a total of \(M=\sum _{i,j}\varXi _{ij}\) colored balls. With this, the sampling of a network according to our model corresponds to drawing exactly m balls from this urn. Each adjacency matrix \(\mathbf {A}\), with entries \(A_{ij}\) such that \(\sum _{i,j} A_{ij}=m\), corresponds to one particular realization drawn from this ensemble. The probability to draw exactly \(\mathbf {A}=\{A_{ij}\}_{i,j \in V}\) edges between each pair of nodes is given by the multivariate hypergeometric distribution (see footnote 1):
$$\begin{aligned} \Pr (\mathbf {A}) = \genfrac(){0.0pt}0{M}{m}^{-1}\prod _{i,j}\genfrac(){0.0pt}0{\varXi _{ij}}{A_{ij}}. \end{aligned}$$
(1)
For each pair of nodes \(i,j \in V\), the probability to draw exactly \(\hat{A}_{ij}\) edges between i and j is given by the marginal distributions of the multivariate hypergeometric distribution. We thus arrive at a hypergeometric statistical ensemble, which (i) generalizes the configuration model to directed, multi-edge graphs, (ii) has a fixed sequence of expected degrees, and (iii) is analytically tractable. Moreover, it provides a framework to generalize other random graph models like, e.g., the multi-edge version of the Erdös-Rényi model [10], where only n and m are fixed, while there are no constraints on the degree sequence. This corresponds to a definition of \(\mathbf {\Xi }\) with \(\varXi _{ij}=m^{2}/n^{2}=\;\)const. which directly results from \(\left\langle k_{\mathrm {in}}(i) \right\rangle =\left\langle k_{\mathrm {out}}(i) \right\rangle =m/n\).
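To make Eq. 1 and its marginals concrete, the following sketch evaluates both for a tiny directed example with two nodes. It treats the adjacency matrix as a flattened vector, as in footnote 1. The function names and the toy degree sequences are assumptions made for illustration. The example checks by brute-force enumeration that the probabilities sum to one, and that the expected weight of a pair equals \(m\,\varXi _{ij}/M\), which in the directed case reduces to \(\hat{k}_{\mathrm {out}}(i)\hat{k}_{\mathrm {in}}(j)/m\) because \(M=m^{2}\).

```python
from itertools import product
from math import comb, prod


def joint_probability(A, Xi, m):
    """Eq. 1 for a flattened count vector A and flattened capacities Xi."""
    if sum(A) != m:
        return 0.0
    M = sum(Xi)
    # math.comb returns 0 when more edges are requested than are possible.
    return prod(comb(xi, a) for xi, a in zip(Xi, A)) / comb(M, m)


def marginal_probability(a, xi, M, m):
    """Hypergeometric marginal: probability of exactly a edges for one pair."""
    return comb(xi, a) * comb(M - xi, m - a) / comb(M, m)


# Toy example: k_out = [2, 1], k_in = [1, 2], flattened row by row.
Xi = [2 * 1, 2 * 2, 1 * 1, 1 * 2]
m = 3
M = sum(Xi)

# Enumerate every candidate count vector; only those with sum m contribute.
total = sum(joint_probability(A, Xi, m)
            for A in product(range(m + 1), repeat=len(Xi)))

# Expected weight of the pair (0, 1), which has Xi = 4.
expected_01 = sum(a * marginal_probability(a, Xi[1], M, m) for a in range(m + 1))
# Expected out-degree of node 0 is the sum of the expectations in its row.
expected_k_out_0 = m * (Xi[0] + Xi[1]) / M
```

The expected out-degree of node 0 recovers its observed out-degree of 2, which is the sense in which the ensemble preserves the sequence of expected degrees.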
The sampling procedure above gives a stochastic model for weighted, directed graphs in which (i) the expected weighted in- and out-degree sequence is fixed, and (ii) interactions between nodes are generated at random. This provides a null model in which the probability for a particular pair of nodes to be connected by an edge is only influenced by combinatorial effects, and thus only depends on the node degrees. For scenarios where we have additional information on factors that influence the formation of edges, we can further generalize the ensemble above as follows: We introduce a matrix \(\mathbf {\Omega }\) whose entries \(\varOmega _{ij}\) capture relative dyadic propensities, i.e., the tendency of a node i to form an edge specifically to node j. These propensities \(\varOmega _{ij}\) bias the edge sampling process described above. This implies that entry \(\varOmega _{ij}\) only captures the propensity that goes beyond the tendency of a node i to connect to a node j that is due to combinatorial effects, i.e., the in-degree of j and the out-degree of i. In analogy to the urn model, here a biased sampling implies that the probability of drawing balls of a given color (representing all possible edges between a given pair of nodes) does not only depend on their number but also on the respective relative propensities. The probability distribution resulting from such a biased sampling process is given by the multivariate Wallenius’ non-central hypergeometric distribution [11, 31]:
$$\begin{aligned} \Pr (\mathbf {A})=\left[ \prod _{i,j}{\genfrac(){0.0pt}0{\varXi _{ij}}{A_{ij}}}\right] \int _{0}^{1}{\prod _{i,j}{\left( 1-z^{\frac{\varOmega _{ij}}{S_{\mathbf {\Omega }} }}\right) ^{A_{ij}}}dz} \end{aligned}$$
(2)
with \(S_{\mathbf {\Omega }}= \sum _{i,j} \varOmega _{ij}(\varXi _{ij}-A_{ij})\).
Similar to the unbiased sampling described above, the probability to observe a particular number \(\hat{A}_{ij}\) of edges between a pair of nodes i and j can again be calculated from the marginal distribution as
$$\begin{aligned} \begin{aligned} \Pr (&A_{ij}=\hat{A}_{ij}) = \genfrac(){0.0pt}0{\varXi _{ij}}{\hat{A}_{ij}}\genfrac(){0.0pt}0{M-\varXi _{ij}}{m-\hat{A}_{ij}}\cdot \\&\int _{0}^{1} \left[ \left( 1 - z^{ \frac{\varOmega _{ij}}{S_{\mathbf {\Omega }}}} \right) ^{\hat{A}_{ij}} \left( 1-z^{ \frac{\bar{\varOmega }_{\setminus (i,j)}}{S_{\mathbf {\Omega }}}} \right) ^{m-\hat{A}_{ij}} \right] dz \end{aligned} \end{aligned}$$
(3)
where \(\bar{\varOmega }_{\setminus (i,j)} = (M-\varXi _{ij})^{-1}\sum _{(l,m)\in V\times V\backslash (i,j)}{\varXi _{lm}\varOmega _{lm}}\).

Note that for the special case of a uniform dyadic propensity matrix \(\mathbf {\Omega } \equiv \text {const}\), we recover Eq. 1 for the unbiased case, i.e., where all dyadic propensities are identical. We thus obtain a general framework of statistical ensembles which (i) allows us to encode arbitrary a priori tendencies of nodes to interact, and (ii) provides an analytical expression for the probability to observe a given number of interactions between any pair of nodes.
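The biased urn process behind Eq. 2 can also be simulated directly, which may help build intuition even though the point of the framework is that no simulation is needed. The sketch below draws m balls one at a time without replacement, picking a color with probability proportional to its propensity times the number of balls of that color still in the urn. This sequential scheme is the process that gives rise to Wallenius' distribution. The function names, the toy capacities and the propensity values are illustrative assumptions.

```python
import random


def sample_biased_urn(Xi, Omega, m, rng):
    """Draw m balls sequentially; a color's weight is propensity * balls left."""
    remaining = list(Xi)
    A = [0] * len(Xi)
    for _ in range(m):
        weights = [omega * left for omega, left in zip(Omega, remaining)]
        color = rng.choices(range(len(Xi)), weights=weights)[0]
        remaining[color] -= 1  # sampling without replacement
        A[color] += 1
    return A


def mean_counts(Xi, Omega, m, runs, seed):
    """Average the sampled counts per color over many independent realizations."""
    rng = random.Random(seed)
    totals = [0] * len(Xi)
    for _ in range(runs):
        for color, count in enumerate(sample_biased_urn(Xi, Omega, m, rng)):
            totals[color] += count
    return [t / runs for t in totals]


# Flattened toy capacities (k_out = [2, 1], k_in = [1, 2]) and m = 3 draws.
Xi = [2, 4, 1, 2]
m = 3
uniform = mean_counts(Xi, [1.0, 1.0, 1.0, 1.0], m, runs=2000, seed=0)
biased = mean_counts(Xi, [1.0, 5.0, 1.0, 1.0], m, runs=2000, seed=0)
```

With uniform propensities the average count of the second color should be close to the hypergeometric expectation \(m\,\varXi _{ij}/M = 4/3\), while a higher propensity for that color shifts its average upwards.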

4 Inferring Significant Social Ties

In the following, we demonstrate how our framework can be used to infer significant links in two relational data sets: (RM) captures time-stamped proximities between students and faculty at MIT [9] recorded via smart devices. (ZKC) covers frequencies of self-reported encounters between members of a university Karate club collected by Wayne Zachary [34]. We denote the weighted adjacency matrix capturing observed dyadic interactions as \(\mathbf {\hat{A}}\). For a given significance threshold \(\alpha \), we then identify significant links by filtering matrix \(\mathbf {\hat{A}}\) by a threshold \(\Pr (A_{ij} \le \hat{A}_{ij}) > 1 - \alpha \) based on Eq. 3. This can be seen as assigning p-values to dyads (ij), obtaining a high-pass noise filter for entries in the adjacency matrix.
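The filtering step can be sketched as follows for the unbiased special case, i.e., assuming a uniform propensity matrix so that the marginals are ordinary hypergeometric distributions. The threshold follows the criterion stated above. The function names and the toy matrix are assumptions for illustration, and the code uses \(M=m^{2}\), which follows from the definition of \(\varXi _{ij}\) for directed graphs.

```python
from math import comb


def hypergeom_cdf(a_obs, xi, M, m):
    """Pr(A_ij <= a_obs) when m of M balls are drawn and xi of them have color ij."""
    upper = min(a_obs, xi, m)
    favourable = sum(comb(xi, a) * comb(M - xi, m - a) for a in range(upper + 1))
    return favourable / comb(M, m)


def filter_significant(A_hat, alpha=0.01):
    """Keep an observed weight only if Pr(A_ij <= A_hat_ij) > 1 - alpha."""
    n = len(A_hat)
    k_out = [sum(row) for row in A_hat]
    k_in = [sum(A_hat[j][i] for j in range(n)) for i in range(n)]
    m = sum(k_out)
    M = m * m  # sum over all pairs of k_out(i) * k_in(j)
    filtered = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if A_hat[i][j] == 0:
                continue  # only observed links are tested
            xi = k_out[i] * k_in[j]
            if hypergeom_cdf(A_hat[i][j], xi, M, m) > 1 - alpha:
                filtered[i][j] = A_hat[i][j]
    return filtered


# Hypothetical data: two strong links and one weak link between high-degree nodes.
A_hat = [
    [0, 30, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 30],
    [0, 0, 0, 0],
]
filtered = filter_significant(A_hat, alpha=0.01)
```

In this toy matrix the two links of weight 30 lie far above their expected weight of about 15 and are kept, whereas the single interaction between nodes 0 and 3 is far below what their degrees alone would produce and is removed.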
Fig. 1.

Illustration of our approach in the (RM) data set capturing proximity of students and staff at MIT campus. For the observed weighted adjacency matrix (a) and a given significance threshold, our framework allows to establish a high-pass noise filter matrix (b), which can be used to obtain a filtered adjacency matrix containing only significant links (c). A visual comparison of the output of a community detection algorithm on the unfiltered (d) and filtered (f) graphs shows that detected partitions in the filtered one better correspond to ground truth lab affiliations and classes (e). (a) Unfiltered weighted adjacency matrix. (b) High-pass noise filter matrix. (c) Filtered adjacency matrix containing only significant links. (d) Unfiltered graph. (e) Comparison of ground truth lab affiliations (center column) vs. detected communities in the unfiltered (left column) and filtered (right column) graph. (f) Filtered graph.

To illustrate our approach, Fig. 1(a) shows the entries of the (original) adjacency matrix \(\mathbf {\hat{A}}\) for (RM). The high-pass noise filter resulting from our methodology (using \(\alpha =0.01\)) is shown in Fig. 1(b), where black entries correspond to pairs of nodes with non-significant links. The application of this filter to the original matrix yields the noise-filtered matrix shown in Fig. 1(c). While in the full network there are 721,889 observed multi-edges amounting to 2,952 distinct links, after filtering there are 626 \((21.2\%)\) significant links left (617,069 multi-edges, \(85.5\%\) of the original). We validate the benefit of filtering the original interactions in (RM) by comparing the output of a standard community detection algorithm – the degree-corrected block model [21] – in (i) the original, unfiltered graph shown in Fig. 1(d), and (ii) the filtered, significant graph shown in Fig. 1(f). Using known classes of students and affiliations of staff members as ground truth allows us to compare the quality of the community detection. Figure 1(e) shows the set overlaps between the ground truth labels (middle column) and detected partitions in the unfiltered (left column) and filtered graph (right column). Due to the high number of non-significant links in the unfiltered graph, the algorithm only detects three partitions, each spanning multiple labs and classes. In contrast, applying the algorithm to the filtered graph yields six partitions that better capture the ground truth lab and class structure (cf. Fig. 1(e)). As expected, detected partitions do not perfectly correspond to the ground truth, since labs and classes are likely not the only driving force behind observed proximities.
Fig. 2.

Observed (a) and filtered (b) weighted graphs for the (ZKC) data set, capturing encounters between members of a Karate club. The filtered graph shows that most of the observed encounters can be explained by random effects resulting from the club members’ separation into two classes.

A major advantage of gHypEs is that, by specifying a non-uniform matrix \(\mathbf {\Omega }\), we can additionally encode known factors that influence the occurrence of interactions between nodes, while still obtaining an analytically tractable ensemble. In our second illustrative example, we use this to encode the known structure of two separate Karate classes in the (ZKC) data. These two classes naturally influence the frequency of encounters between actors beyond what would be expected “at random”. We incorporate this prior knowledge via a block matrix \(\mathbf {\Omega }\) that assigns higher dyadic propensities to pairs of actors in the same class (cf. [3]). This approach allows us to establish a “random baseline” that accounts for both (i) combinatorial effects due to heterogeneous node degrees, and (ii) the known group structure in the data. Using a significance threshold of \(\alpha =0.01\), for (ZKC) this yields the striking result that only 8 out of 78 observed links are significant (\(\sim 90\%\) of 231 observed multi-edges are filtered out, cf. Fig. 2). In other words, taking into account the partitioning of members into two classes for (ZKC), almost all encounters between club members can simply be explained by random effects. Figure 2 compares the original weighted network, illustrated in Fig. 2(a), and the filtered network, in Fig. 2(b).
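A block propensity matrix of this kind can be built from class labels alone, as in the sketch below. The helper name, the labels and the values 2.0 and 1.0 are purely hypothetical, since the propensities used for (ZKC) are not stated here. Only the ratios between entries matter, because in Eq. 2 the propensities enter through \(\varOmega _{ij}/S_{\mathbf {\Omega }}\), which is unchanged when all entries are rescaled by a common factor.

```python
def block_propensities(labels, within=2.0, between=1.0):
    """Omega[i][j] is `within` for pairs in the same class, else `between`."""
    return [[within if a == b else between for b in labels] for a in labels]


# Hypothetical class membership of five actors in two classes.
labels = ["class_1", "class_1", "class_2", "class_2", "class_2"]
Omega = block_propensities(labels, within=2.0, between=1.0)
```

Setting `within` equal to `between` yields a uniform matrix and thus the unbiased ensemble of Eq. 1.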

5 Conclusion

In this short paper we introduce gHypEs, a broad class of statistical ensembles of graphs that can be used to infer significant links from noisy data. Our work makes three important contributions: First, we provide an analytically tractable statistical model of directed and undirected multi-edge graphs that can be used for inference and learning tasks. Second, the formulation of our ensemble highlights a – to the best of our knowledge – previously unknown relation between random graph theory and Wallenius’ non-central hypergeometric distribution. And finally, different from existing statistical ensembles such as, e.g., the configuration model, our framework can be used to encode prior knowledge on factors that influence the formation of relations. This flexible approach allows for a tuning of the “random baseline”, opening perspectives for a statistically principled network inference that accounts for effects that are not purely random. We thus argue that our work advances the theoretical foundation for the mining of relational data on social systems. It further highlights that principled model selection and hypothesis testing are crucial prerequisites that should precede the application of network-based data mining and modeling techniques.

Footnotes

  1. Note that we do not distinguish between the \(n\times n\) adjacency matrix \(\mathbf {A}\) and the \(n^{2}\times 1\) vector obtained by stacking.

Acknowledgments

The authors acknowledge support from the Swiss State Secretariat for Education, Research and Innovation (SERI), Grant No. C14.0036, the MTEC Foundation project “The Influence of Interaction Patterns on Success in Socio-Technical Systems”, and EU COST Action TD1210 KNOWeSCAPE. The authors thank Rebekka Burkholz, Giacomo Vaccario, and Simon Schweighofer for helpful discussions.

References

  1. Aicher, C., Jacobs, A.Z., Clauset, A.: Learning latent block structure in weighted networks. J. Complex Netw. 3(2), 221–248 (2015). https://academic.oup.com/comnet/article-lookup/doi/10.1093/comnet/cnu026
  2. Anand, K., Bianconi, G.: Entropy measures for networks: toward an information theory of complex topologies. Phys. Rev. E 80, 045102 (2009)
  3. Casiraghi, G.: Multiplex network regression: how do relations drive interactions? arXiv preprint arXiv:1702.02048, February 2017. http://arxiv.org/abs/1702.02048
  4. Cimini, G., Squartini, T., Garlaschelli, D., Gabrielli, A.: Systemic risk analysis on reconstructed economic and financial networks. Sci. Rep. 5(1), 15758 (2015). http://arxiv.org/abs/1411.7613, http://dx.doi.org/10.1038/srep15758, http://www.nature.com/articles/srep15758
  5. Cranshaw, J., Toch, E., Hong, J., Kittur, A., Sadeh, N.: Bridging the gap between physical location and online social networks. In: Proceedings of the 12th ACM International Conference on Ubiquitous Computing, UbiComp 2010, pp. 119–128. ACM, New York (2010)
  6. De Choudhury, M., Mason, W.A., Hofman, J.M., Watts, D.J.: Inferring relevant social networks from interpersonal communication. In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 301–310. ACM, New York (2010)
  7. De Domenico, M., Lancichinetti, A., Arenas, A., Rosvall, M.: Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems. Phys. Rev. X 5(1), 011027 (2015)
  8. Eagle, N., Pentland, A.S., Lazer, D.: Inferring friendship network structure by using mobile phone data. Proc. Nat. Acad. Sci. 106(36), 15274–15278 (2009)
  9. Eagle, N., (Sandy) Pentland, A.: Reality mining: sensing complex social systems. Pers. Ubiquit. Comput. 10(4), 255–268 (2006)
  10. Erdös, P., Rényi, A.: On random graphs I. Publ. Math. Debrecen 6, 290–297 (1959)
  11. Fog, A.: Calculation methods for Wallenius’ noncentral hypergeometric distribution. Commun. Stat. - Simul. Comput. 37(2), 258–273 (2008)
  12. Gemmetto, V., Cardillo, A., Garlaschelli, D.: Irreducible network backbones: unbiased graph filtering via maximum entropy, June 2017. http://arxiv.org/abs/1706.00230
  13. Holme, P.: Modern temporal network theory: a colloquium. Europ. Phys. J. B 88(9), 1–30 (2015)
  14. Jacod, J., Protter, P.E.: Probability Essentials. Springer Science & Business Media, Heidelberg (2003)
  15. Karrer, B., Newman, M.E.J.: Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011)
  16. Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J.P., Moreno, Y., Porter, M.A.: Multilayer networks. J. Complex Netw. 2(3), 203–271 (2014)
  17. Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks. J. Am. Soc. Inform. Sci. Technol. 58(7), 1019–1031 (2007)
  18. Molloy, M., Reed, B.: A critical point for random graphs with a given degree sequence. Random Struct. Algorithms 6(2–3), 161–180 (1995)
  19. Newman, M.E.J., Peixoto, T.P.: Generalized communities in networks. Phys. Rev. Lett. 115, 088701 (2015)
  20. Newman, M.E.J.: Modularity and community structure in networks. Proc. Nat. Acad. Sci. 103(23), 8577–8582 (2006)
  21. Peixoto, T.P.: Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Phys. Rev. E 89, 012804 (2014)
  22. Pham, H., Shahabi, C., Liu, Y.: EBM: an entropy-based model to infer social strength from spatiotemporal data. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, pp. 265–276. ACM (2013)
  23. Robins, G., Pattison, P., Kalish, Y., Lusher, D.: An introduction to exponential random graph (p*) models for social networks. Soc. Netw. 29(2), 173–191 (2007)
  24. Rosvall, M., Esquivel, A.V., Lancichinetti, A., West, J.D., Lambiotte, R.: Memory in network flows and its effects on spreading dynamics and community detection. Nat. Commun. 5, 4630 (2014)
  25. Schein, A., Paisley, J., Blei, D.M., Wallach, H.: Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2015. ACM (2015)
  26. Scholtes, I.: When is a network a network? Multi-order graphical model selection in pathways and temporal networks. In: KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, February 2017, to appear
  27. Scholtes, I., Wider, N., Garas, A.: Higher-order aggregate networks in the analysis of temporal networks: path structures and centralities. Europ. Phys. J. B 89(3), 1–15 (2016). http://link.springer.com/article/10.1140:2016-60663-0
  28. Szell, M., Lambiotte, R., Thurner, S.: Multirelational organization of large-scale social networks in an online world. Proc. Natl. Acad. Sci. 107(31), 13636–13641 (2010)
  29. Tang, J., Lou, T., Kleinberg, J.: Inferring social ties across heterogenous networks. In: Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, WSDM 2012, pp. 743–752. ACM, New York (2012)
  30. Vidmer, A., Medo, M.: The essential role of time in network-based recommendation. EPL (Europhy. Lett.) 116(3), 30007 (2016)
  31. Wallenius, K.T.: Biased Sampling: The Noncentral Hypergeometric Probability Distribution. Ph.D. thesis, Stanford University (1963)
  32. Wilson, J.D., Wang, S., Mucha, P.J., Bhamidi, S., Nobel, A.B.: A testing based extraction algorithm for identifying significant communities in networks. Ann. Appl. Stat. 8(3), 1853–1891 (2014)
  33. Xiang, R., Neville, J., Rogati, M.: Modeling relationship strength in online social networks. In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 981–990. ACM, New York (2010)
  34. Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)
  35. Zhang, Y., Garas, A., Schweitzer, F.: Value of peripheral nodes in controlling multilayer scale-free networks. Phys. Rev. E 93, 012309 (2016). https://journals.aps.org/pre/abstract/10.1103/PhysRevE.93.012309

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Chair of Systems Design, ETH Zürich, Zürich, Switzerland
