Sampling on Networks: Estimating Eigenvector Centrality on Incomplete Networks

  • Nicolò Ruggeri
  • Caterina De Bacco
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 881)

Abstract

We develop a new sampling method to estimate eigenvector centrality on incomplete networks. Our goal is to estimate this global centrality measure when only a limited amount of data is available. This is the case in many real-world scenarios where data collection is expensive, the network is too large for the available storage capacity, or only partial information is available. The sampling algorithm is theoretically grounded in results from spectral approximation theory. We study the problem on both synthetic and real data and compare its performance against state-of-the-art methods. We show that approximations obtained from such methods are not always reliable and that our algorithm, while preserving computational scalability, improves performance under some relevant error measures.
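To make the setting concrete, the following is a minimal sketch of the estimation problem only, not the sampling algorithm proposed in the paper. It assumes an undirected, unweighted graph stored as a dictionary of neighbour sets, computes eigenvector centrality by power iteration, and then repeats the computation on the subgraph induced by a sample of nodes. The function names, the toy graph, and the fixed sample are all hypothetical choices made for illustration.

```python
def eigenvector_centrality(adj, iters=200, tol=1e-10):
    """Power iteration on an undirected graph given as {node: set_of_neighbours}.

    Returns a dict of centrality scores normalised to unit Euclidean norm.
    """
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Multiply by (A + I) instead of A: the shift leaves the leading
        # eigenvector unchanged but avoids oscillation on bipartite graphs.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = sum(val * val for val in y.values()) ** 0.5
        y = {v: val / norm for v, val in y.items()}
        if max(abs(y[v] - x[v]) for v in nodes) < tol:
            return y
        x = y
    return x


def induced_subgraph(adj, sample):
    """Keep only the sampled nodes and the edges between them."""
    keep = set(sample)
    return {v: adj[v] & keep for v in keep}


# Toy graph (hypothetical): a hub 0 linked to nodes 1..5, plus the edge 1-2.
full = {
    0: {1, 2, 3, 4, 5},
    1: {0, 2},
    2: {0, 1},
    3: {0},
    4: {0},
    5: {0},
}

# Illustrative fixed sample; a real study would draw it with a sampling scheme.
sample = [0, 1, 2, 3]

full_scores = eigenvector_centrality(full)
sub_scores = eigenvector_centrality(induced_subgraph(full, sample))

for v in sorted(sample):
    print(v, round(full_scores[v], 4), round(sub_scores[v], 4))
```

The two score vectors are normalised over different node sets, so they are best compared through rankings or a rescaled error measure rather than entry by entry.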

Keywords

Sampling, Networks, Eigenvector centrality

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Max Planck Institute for Intelligent Systems, Tuebingen, Germany
  2. Università degli Studi di Padova, Padova, Italy
