From Relational Data to Graphs: Inferring Significant Links Using Generalized Hypergeometric Ensembles
Abstract
The inference of network topologies from relational data is an important problem in data analysis. Example applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, and the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. The framework builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method on two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to inferring significant links from relational data, with interesting perspectives for the mining of data on social systems.
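To make the idea of link significance concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest member of such an ensemble: edges are sampled without replacement and without any edge propensities, the number of possible multi-edges between a source and a target is taken to be the product of the source's out-degree and the target's in-degree, and self-loops are allowed. Under these assumptions the count on a single node pair follows an ordinary hypergeometric distribution. The function names `hypergeom_sf` and `link_pvalues` and the toy data are illustrative only.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    lo = max(k, 0)
    hi = min(successes, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(lo, hi + 1)
    )
    return tail / comb(population, draws)


def link_pvalues(counts):
    """Map each observed (source, target) pair to an over-representation p-value.

    counts: dict mapping (source, target) to a non-negative integer edge count.
    Assumption (not taken from the paper text above): possible edges for a pair
    are out_degree(source) * in_degree(target), and self-loops are allowed.
    """
    out_deg, in_deg = {}, {}
    for (src, dst), w in counts.items():
        out_deg[src] = out_deg.get(src, 0) + w
        in_deg[dst] = in_deg.get(dst, 0) + w
    m = sum(counts.values())  # number of observed multi-edges
    total_possible = m * m    # sum over all pairs of out_deg[i] * in_deg[j]
    return {
        (src, dst): hypergeom_sf(w, total_possible, out_deg[src] * in_deg[dst], m)
        for (src, dst), w in counts.items()
    }


# Toy interaction counts between three actors (made up for illustration).
observed = {
    ("a", "b"): 8,
    ("a", "c"): 1,
    ("b", "c"): 1,
    ("c", "a"): 1,
    ("b", "a"): 1,
}
for pair, p in sorted(link_pvalues(observed).items()):
    print(pair, round(p, 4))
```

A small p-value indicates that a pair interacted more often than the degrees of its endpoints alone would suggest under this null model. In practice a significance threshold and a correction for multiple testing would have to be chosen separately.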
Keywords
Statistical analysis; Graph theory; Network inference; Statistical ensemble; Relational data; Graph mining; Graph analysis; Network analysis; Social network; Social network analysis; Community structures; Data mining; Social interactions
Acknowledgments
The authors acknowledge support from the Swiss State Secretariat for Education, Research and Innovation (SERI), Grant No. C14.0036, the MTEC Foundation project “The Influence of Interaction Patterns on Success in Socio-Technical Systems”, and EU COST Action TD1210 KNOWeSCAPE. The authors thank Rebekka Burkholz, Giacomo Vaccario, and Simon Schweighofer for helpful discussions.