
From Relational Data to Graphs: Inferring Significant Links Using Generalized Hypergeometric Ensembles

Part of the Lecture Notes in Computer Science book series (LNISA, volume 10540)


The inference of network topologies from relational data is an important problem in data analysis. Example applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, and the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. The framework builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method on two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.
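As a rough illustration of what assessing the significance of a link can look like, the sketch below computes, for each ordered pair of nodes, the probability of seeing at least the observed number of edges under a simple urn model. None of its details are stated on this page. It assumes the unbiased case, in which the m observed edges are drawn without replacement from k_out(i) * k_in(j) possible edges per pair, so that each entry of the adjacency matrix is marginally hypergeometric. The function names `hypergeom_sf` and `link_pvalues`, the toy matrix, and the choice to allow self-loops are illustrative assumptions, not the authors' implementation.

```python
from math import comb


def hypergeom_sf(a, M, K, m):
    """P(X >= a) for X ~ Hypergeometric(population M, K marked items, m draws)."""
    total = comb(M, m)
    upper = min(K, m)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    tail = sum(comb(K, x) * comb(M - K, m - x) for x in range(a, upper + 1))
    return tail / total


def link_pvalues(A):
    """p-value of each observed edge count under the assumed unbiased urn model.

    A is a square list of lists of non-negative integer edge counts
    (directed, multi-edge). Self-loops are allowed in this sketch.
    """
    n = len(A)
    k_out = [sum(row) for row in A]
    k_in = [sum(A[i][j] for i in range(n)) for j in range(n)]
    m = sum(k_out)   # number of observed edges
    M = m * m        # sum over all pairs of k_out[i] * k_in[j]
    return [
        [hypergeom_sf(A[i][j], M, k_out[i] * k_in[j], m) for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): 3 nodes, 8 directed interactions.
A = [[0, 5, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = link_pvalues(A)
```

A small entry of `P` would flag a pair whose observed count is unlikely to arise from the degrees alone. Pairs with no observed edges always get a value of 1 here, since the tail then covers every possible outcome.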


  • Statistical analysis
  • Graph theory
  • Network inference
  • Statistical ensemble
  • Relational data
  • Graph mining
  • Graph analysis
  • Network analysis
  • Social network
  • Social network analysis
  • Community structures
  • Data mining
  • Social interactions




  1. Note that we do not distinguish between the \(n\times n\) adjacency matrix \(\mathbf{A}\) and the \(n^{2}\times 1\) vector obtained by stacking.
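The equivalence between the matrix and its stacked vector can be sketched as follows. The note does not say whether rows or columns are stacked, so the row-by-row order used here is an assumption, and the helper names `stack` and `unstack` are hypothetical.

```python
def stack(A):
    """Flatten an n x n matrix (list of lists) into a vector of length n*n.

    Rows are concatenated in order; this ordering is an assumption.
    """
    return [a for row in A for a in row]


def unstack(v, n):
    """Inverse of stack: rebuild the n x n matrix from the length n*n vector."""
    return [v[i * n:(i + 1) * n] for i in range(n)]


A = [[0, 2],
     [1, 0]]
v = stack(A)
```

Because `unstack(stack(A), n)` returns the original matrix, either representation carries the same information, which is why the two can be used interchangeably.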




The authors acknowledge support from the Swiss State Secretariat for Education, Research and Innovation (SERI), Grant No. C14.0036, the MTEC Foundation project “The Influence of Interaction Patterns on Success in Socio-Technical Systems”, and EU COST Action TD1210 KNOWeSCAPE. The authors thank Rebekka Burkholz, Giacomo Vaccario, and Simon Schweighofer for helpful discussions.

Author information


Corresponding author

Correspondence to Giona Casiraghi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017). From Relational Data to Graphs: Inferring Significant Links Using Generalized Hypergeometric Ensembles. In: Ciampaglia, G., Mashhadi, A., Yasseri, T. (eds) Social Informatics. SocInfo 2017. Lecture Notes in Computer Science, vol 10540. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67255-7

  • Online ISBN: 978-3-319-67256-4

  • eBook Packages: Computer Science (R0)