From Relational Data to Graphs: Inferring Significant Links Using Generalized Hypergeometric Ensembles

  • Giona Casiraghi
  • Vahan Nanumyan
  • Ingo Scholtes
  • Frank Schweitzer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10540)

Abstract

The inference of network topologies from relational data is an important problem in data analysis. Example applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, and the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. The framework builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method on two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to inferring significant links from relational data, with promising perspectives for the mining of data on social systems.
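
To make the idea of link significance concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest, unbiased case of a hypergeometric ensemble: the observed m multi-edges are drawn without replacement from an urn that holds k_out(i) * k_in(j) possible edges for each ordered pair (i, j), so each pair's count is marginally hypergeometric. The function names `hypergeom_sf` and `link_pvalues` and the toy matrix `A` are hypothetical, and the generalized ensembles of the paper additionally allow biased sampling, which this sketch omits.

```python
from math import comb


def hypergeom_sf(k, M, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(population M, K marked, n draws)."""
    upper = min(K, n)
    total = comb(M, n)
    # comb() returns 0 when the lower argument exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    tail = sum(comb(K, x) * comb(M - K, n - x) for x in range(k, upper + 1))
    return tail / total


def link_pvalues(A):
    """p-value of each observed count A[i][j] under the unbiased urn (assumption)."""
    n_nodes = len(A)
    k_out = [sum(row) for row in A]                                  # out-degrees
    k_in = [sum(A[i][j] for i in range(n_nodes)) for j in range(n_nodes)]  # in-degrees
    m = sum(k_out)                                                   # observed multi-edges
    M = m * m                                                        # sum of k_out[i] * k_in[j]
    return [
        [hypergeom_sf(A[i][j], M, k_out[i] * k_in[j], m) for j in range(n_nodes)]
        for i in range(n_nodes)
    ]


# Toy directed multi-edge count matrix (illustrative data, not from the paper).
A = [
    [0, 5, 1],
    [1, 0, 1],
    [1, 1, 0],
]
p = link_pvalues(A)
for row in p:
    print(["%.3f" % value for value in row])
```

A small p-value for a pair (i, j) indicates that the observed number of interactions is unlikely to arise from the degrees alone; under this assumption such a pair would be a candidate significant link, subject to a chosen significance threshold.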

Keywords

Statistical analysis; Graph theory; Network inference; Statistical ensemble; Relational data; Graph mining; Graph analysis; Network analysis; Social network; Social network analysis; Community structures; Data mining; Social interactions


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Chair of Systems Design, ETH Zürich, Zürich, Switzerland
