An Empirical Study of the Effect of Noise Models on Centrality Metrics
An important yet little-studied problem in network analysis is the effect of errors introduced while constructing a network. Errors can arise both from the limitations of data collection techniques and from implicit bias in modeling the network. In either case, they alter the network through additional or missing edges, collectively termed noise. Given that network analysis underpins many critical applications, from criminal identification to targeted drug discovery, it is important to evaluate how much this noise affects the analysis results. In this paper, we present an empirical study of how different types of noise affect real-world networks. Specifically, we apply four different noise models to a suite of nine networks, at different levels of perturbation, to test how the ranking of the top-k vertices by centrality changes. Our results show that deletion of edges affects centrality less than addition of edges. Nevertheless, the stability of the ranking depends on all three parameters: the structure of the network, the noise model used, and the centrality metric computed. To the best of our knowledge, this is one of the first extensive studies to conduct both longitudinal (across different networks) and horizontal (across different noise models and centrality metrics) experiments to understand the effect of noise on network analysis.
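The experimental setup described above can be sketched in a few lines. The example below uses a simple uniform noise model (an assumption for illustration; the paper's four noise models are not specified here): each existing edge is deleted with probability p_del and each absent edge is added with probability p_add, and ranking stability is measured as the Jaccard overlap between the top-k degree-centrality vertices before and after perturbation. Graph size, edge density, and perturbation rates are placeholder values.

```python
import random

def adjacency(edges, n):
    """Build an undirected adjacency map from a set of (u, v) edges."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def top_k_by_degree(adj, k):
    """Return the set of k vertices with the highest degree centrality."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])

def perturb(edges, n, p_del, p_add, rng):
    """Uniform noise model (illustrative assumption): delete each existing
    edge with probability p_del, add each absent edge with probability p_add."""
    kept = {e for e in edges if rng.random() >= p_del}
    for u in range(n):
        for v in range(u + 1, n):
            if (u, v) not in edges and rng.random() < p_add:
                kept.add((u, v))
    return kept

def jaccard(a, b):
    """Overlap between two top-k vertex sets, in [0, 1]."""
    return len(a & b) / len(a | b)

# Synthetic Erdos-Renyi-style graph as a stand-in for a real network.
rng = random.Random(0)
n = 30
edges = {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.15}

base_top = top_k_by_degree(adjacency(edges, n), k=5)
noisy_edges = perturb(edges, n, p_del=0.10, p_add=0.01, rng=rng)
noisy_top = top_k_by_degree(adjacency(noisy_edges, n), k=5)
stability = jaccard(base_top, noisy_top)
```

In the actual study this comparison would be repeated over multiple noise levels, noise models, and centrality metrics (e.g., betweenness or closeness instead of degree), averaging over random trials.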
Keywords: Noise models in networks · Centrality metrics · Accuracy of analysis
SB was supported by NSF CCF Awards #1533881 and #1725566.