Abstract
We address the problem of efficiently estimating the value of a centrality measure for a node in a large network, and propose a sampling-based framework in which only a small number of randomly selected nodes are used to estimate the measure. The error estimator we derive is an unbiased estimator of the approximation error, defined as the expectation of the difference between the true and the estimated values of the centrality. We experimentally evaluate the fundamental performance of the proposed framework using the closeness and betweenness centralities on six real-world networks from different domains. The results show that, at a confidence level of \(95\%\), the framework estimates the approximation error more tightly and more precisely than the standard error estimator traditionally used under i.i.d. sampling, even for a small sample size, say \(20\%\) of the total number of nodes.
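The general idea of estimating a distance-based centrality from a small random sample of nodes can be illustrated with a minimal sketch. It is not the estimator derived in the paper. It assumes an undirected, connected, unweighted graph stored as an adjacency dictionary, and it uses the textbook finite-population correction for sampling without replacement as a stand-in for a non-i.i.d. error estimate. The names `bfs_distances`, `estimate_mean_distance`, and `ring_graph` are hypothetical helpers introduced only for this illustration.

```python
import math
import random
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def estimate_mean_distance(adj, target, k, rng):
    """Estimate the mean distance from `target` to all other nodes.

    Samples k nodes without replacement and runs one BFS per sampled node.
    Returns (mean, se_iid, se_fpc):
      se_iid: standard error that treats the sample as i.i.d.
      se_fpc: the same value with the textbook finite-population
              correction (an assumption of this sketch, not the
              paper's estimator).
    """
    others = [u for u in adj if u != target]
    n = len(others)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of other nodes")
    sample = rng.sample(others, k)
    # The graph is undirected, so d(u, target) == d(target, u).
    values = [bfs_distances(adj, u)[target] for u in sample]
    mean = sum(values) / k
    var = sum((x - mean) ** 2 for x in values) / (k - 1)
    se_iid = math.sqrt(var / k)
    se_fpc = se_iid * math.sqrt((n - k) / n)
    return mean, se_iid, se_fpc


def ring_graph(n):
    """Toy undirected ring with n nodes, used only as example input."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    adj = ring_graph(100)
    mean, se_iid, se_fpc = estimate_mean_distance(adj, 0, 20, random.Random(42))
    # Closeness is taken here as the reciprocal of the mean distance.
    print("estimated closeness:", 1.0 / mean)
    print("i.i.d. standard error:", se_iid)
    print("finite-population standard error:", se_fpc)
```

Because the nodes are drawn without replacement, the corrected error shrinks to zero as the sample approaches the whole node set, whereas the i.i.d. standard error does not. This is one simple reason an i.i.d.-based estimate can be looser than necessary at moderate sampling rates.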
Acknowledgments
This material is based upon work supported by JSPS Grant-in-Aid for Scientific Research (C) (No. 17K00314).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Saito, K., Ohara, K., Kimura, M., Motoda, H. (2019). Resampling-Based Framework for Unbiased Estimator of Node Centrality over Large Complex Network. In: Kralj Novak, P., Šmuc, T., Džeroski, S. (eds) Discovery Science. DS 2019. Lecture Notes in Computer Science, vol 11828. Springer, Cham. https://doi.org/10.1007/978-3-030-33778-0_32
DOI: https://doi.org/10.1007/978-3-030-33778-0_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33777-3
Online ISBN: 978-3-030-33778-0
eBook Packages: Computer Science (R0)