Abstract
The inference of network topologies from relational data is an important problem in data analysis. Example applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, and the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. The framework builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method on two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to inferring significant links from relational data, with promising perspectives for the mining of data on social systems.
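To make the idea of assessing link significance concrete, the following is a minimal sketch and not the authors' implementation. It assumes a null model that the abstract does not spell out: an urn holding (out-degree of i) × (in-degree of j) balls for every ordered node pair (i, j), from which as many balls are drawn without replacement as there are observed multi-edges, with uniform sampling propensities and self-loops allowed. Under these assumptions the count for each pair is marginally hypergeometric, so a p-value is an upper-tail probability. The function names `hypergeom_upper_tail` and `link_p_values` are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so infeasible values of x contribute nothing to the sum.
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(max(k, 0), upper + 1)
    )
    return tail / total


def link_p_values(adjacency):
    """For each ordered pair (i, j), return the probability of observing
    at least adjacency[i][j] multi-edges under the assumed urn model."""
    n = len(adjacency)
    out_deg = [sum(row) for row in adjacency]
    in_deg = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    m = sum(out_deg)  # total number of observed multi-edges (draws)
    # ASSUMPTION: number of possible i -> j edges is out_deg[i] * in_deg[j].
    xi = [[out_deg[i] * in_deg[j] for j in range(n)] for i in range(n)]
    urn_size = sum(sum(row) for row in xi)
    return [
        [hypergeom_upper_tail(adjacency[i][j], urn_size, xi[i][j], m)
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: entry [i][j] counts observed interactions i -> j.
    observed = [[0, 5, 1],
                [1, 0, 1],
                [1, 1, 0]]
    for row in link_p_values(observed):
        print(["%.3f" % p for p in row])
```

A small p-value for a pair suggests that its observed interaction count exceeds what the degree-based null model would explain. Exact integer binomials keep the sketch dependency-free but become slow for large edge counts, where a library implementation of the hypergeometric distribution would be preferable.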
Keywords
- Statistical analysis
- Graph theory
- Network inference
- Statistical ensemble
- Relational data
- Graph mining
- Graph analysis
- Network analysis
- Social network
- Social network analysis
- Community structures
- Data mining
- Social interactions
Notes
- 1. Note that we do not distinguish between the \(n\times n\) adjacency matrix \(\mathbf{A}\) and the \(n^{2}\times 1\) vector obtained by stacking its entries.
Acknowledgments
The authors acknowledge support from the Swiss State Secretariat for Education, Research and Innovation (SERI), Grant No. C14.0036, the MTEC Foundation project “The Influence of Interaction Patterns on Success in Socio-Technical Systems”, and EU COST Action TD1210 KNOWeSCAPE. The authors thank Rebekka Burkholz, Giacomo Vaccario, and Simon Schweighofer for helpful discussions.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017). From Relational Data to Graphs: Inferring Significant Links Using Generalized Hypergeometric Ensembles. In: Ciampaglia, G., Mashhadi, A., Yasseri, T. (eds) Social Informatics. SocInfo 2017. Lecture Notes in Computer Science, vol 10540. Springer, Cham. https://doi.org/10.1007/978-3-319-67256-4_11
DOI: https://doi.org/10.1007/978-3-319-67256-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67255-7
Online ISBN: 978-3-319-67256-4
eBook Packages: Computer Science, Computer Science (R0)