Resampling-Based Framework for Unbiased Estimator of Node Centrality over Large Complex Network

Saito, Kazumi; Ohara, Kouzou; Kimura, Masahiro; Motoda, Hiroshi

doi:10.1007/978-3-030-33778-0_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11828)

Included in the following conference series:

International Conference on Discovery Science

Abstract

We address the problem of efficiently estimating the value of a centrality measure for a node in a large network, and propose a sampling-based framework in which only a small number of randomly selected nodes are used to estimate the measure. The error estimator we derive is an unbiased estimator of the approximation error, defined as the expectation of the difference between the true and the estimated values of the centrality. We experimentally evaluate the fundamental performance of the proposed framework using the closeness and betweenness centralities on six real-world networks from different domains, and show that it estimates the approximation error more tightly and more precisely than the standard error estimator traditionally used under i.i.d. sampling, i.e., with a confidence level of \(95\%\) for a small sample size, say \(20\%\) of the total number of nodes.
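
The abstract does not state the estimator itself, so the following is only a minimal sketch of the general idea, not the authors' method. It estimates the average shortest-path distance from a target node (whose reciprocal is one common definition of closeness centrality) from breadth-first searches started at a few randomly chosen pivot nodes. It then reports two error estimates: the usual i.i.d. standard error, and a variant with the textbook finite-population correction for sampling without replacement. The function names, the connected undirected graph, and the use of the finite-population correction are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def estimate_mean_distance(adj, target, n_samples, rng):
    """Estimate the mean distance between `target` and all N nodes.

    Assumes a connected, undirected graph given as {node: [neighbours]}.
    Pivots are drawn without replacement; one BFS is run per pivot.
    Returns (estimate, iid_standard_error, corrected_standard_error).
    """
    nodes = sorted(adj)
    N = len(nodes)
    if not 2 <= n_samples <= N:
        raise ValueError("n_samples must be between 2 and the number of nodes")

    pivots = rng.sample(nodes, n_samples)  # sampling without replacement
    values = [bfs_distances(adj, p)[target] for p in pivots]

    mean = sum(values) / n_samples
    # Sample variance with the (n - 1) denominator.
    s2 = sum((x - mean) ** 2 for x in values) / (n_samples - 1)

    # Standard error that treats the draws as i.i.d.
    iid_se = math.sqrt(s2 / n_samples)
    # Textbook finite-population correction (illustrative assumption,
    # not necessarily the estimator derived in the paper).
    fpc_se = math.sqrt((1 - n_samples / N) * s2 / n_samples)
    return mean, iid_se, fpc_se


if __name__ == "__main__":
    # Hypothetical toy network: a ring of 10 nodes.
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    print(estimate_mean_distance(ring, 0, 4, random.Random(0)))
```

In this sketch the corrected standard error shrinks to zero as the sample approaches the whole node set, whereas the i.i.d. standard error does not. This is one simple way to see why an estimator that accounts for sampling without replacement can give tighter error estimates at sampling rates such as 20%.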

Acknowledgments

This material is based upon work supported by JSPS Grant-in-Aid for Scientific Research (C) (No. 17K00314).

Author information

Authors and Affiliations

Faculty of Science, Kanagawa University, Hiratsuka, Japan
Kazumi Saito
Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
Kazumi Saito
College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Japan
Kouzou Ohara
Department of Electronics and Informatics, Ryukoku University, Kyoto, Japan
Masahiro Kimura
Institute of Scientific and Industrial Research, Osaka University, Suita, Japan
Hiroshi Motoda

Corresponding author

Correspondence to Kouzou Ohara.

Editor information

Editors and Affiliations

Jožef Stefan Institute, Ljubljana, Slovenia
Petra Kralj Novak
Rudjer Bošković Institute, Zagreb, Croatia
Tomislav Šmuc
Jožef Stefan Institute, Ljubljana, Slovenia
Sašo Džeroski

About this paper

Cite this paper

Saito, K., Ohara, K., Kimura, M., Motoda, H. (2019). Resampling-Based Framework for Unbiased Estimator of Node Centrality over Large Complex Network. In: Kralj Novak, P., Šmuc, T., Džeroski, S. (eds) Discovery Science. DS 2019. Lecture Notes in Computer Science, vol 11828. Springer, Cham. https://doi.org/10.1007/978-3-030-33778-0_32

DOI: https://doi.org/10.1007/978-3-030-33778-0_32
Published: 16 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33777-3
Online ISBN: 978-3-030-33778-0
eBook Packages: Computer Science, Computer Science (R0)
