Abstract
We critically evaluate normalized mutual information (NMI) as an evaluation metric for community detection. NMI exaggerates the leximin method’s performance on weak communities: does leximin, in finding the trivial singletons clustering, truly outperform eight other community detection methods? Three NMI improvements from the literature are AMI, rrNMI, and cNMI. We show equivalences under relevant random models, and for evaluating community detection, we advise one-sided AMI under the \(\mathbb{M}_{\text{all}}\) model (all partitions of \(n\) nodes). This work seeks (1) to drive a conversation on robust measurements and (2) to advocate evaluations that do not give a “free lunch”.
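The concern about the singletons clustering can be illustrated with a small calculation. The following is a minimal sketch, not the paper's experimental setup: it assumes the arithmetic-mean normalization \(2I/(H(\mathcal{C}) + H(\mathcal{T}))\), and the eight-node ground truth and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of the cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (nats) between two labelings of the same nodes."""
    n = len(a)
    size_a, size_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (size_a[x] * size_b[y]))
               for (x, y), c in joint.items())

def nmi(a, b):
    """NMI with the (assumed) arithmetic-mean normalization."""
    denom = entropy(a) + entropy(b)
    return 2.0 * mutual_information(a, b) / denom if denom > 0 else 0.0

# Hypothetical ground truth: two communities of four nodes each.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
singletons = list(range(8))   # every node in its own cluster
one_block = [0] * 8           # every node in the same cluster

nmi_singletons = nmi(truth, singletons)
nmi_one_block = nmi(truth, one_block)
print(nmi_singletons, nmi_one_block)
```

Under these assumptions the singletons clustering scores about 0.5 while the single-block clustering scores 0, even though neither conveys any information about the community structure.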
Notes
1. Negative AMI indicates worse-than-chance clusterings.
2. This shape has alternatively been called a class or decomposition pattern.
3. This abuse of notation blurs the distinction between random models and the spaces over which they define their probabilities. We have restricted ourselves to uniform distributions over the spaces, so we believe that the abuse will not confuse.
4. The bound function in this AMI definition is \(M(\mathcal{C}, \mathcal{T}) = \max_{\mathcal{C}^\prime, \mathcal{T}^\prime} I(\mathcal{C}^\prime, \mathcal{T}^\prime)\). In practice we could use any of the upper bounds from [6], for example Eq. 3, as long as it is an upper bound consistent with the chosen random model (here \(\mathbb{M}_{\text{perm}}\)).
5. The surprising fact that larger graphs were processed faster comes from our column generation scheme: when gridlock splinters the graph into singletons at an early stage, we’ve reached our optimum, and the remaining \(O(N)\) LPs need not be solved.
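The notes on negative AMI and on the bound function can be made concrete with a small calculation. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses the permutation model \(\mathbb{M}_{\text{perm}}\) (cluster sizes fixed, node assignments shuffled), the standard hypergeometric expression for the expected mutual information, and the mean of the two entropies as the upper bound \(M\), which is one common choice rather than the Eq. 3 mentioned above. All function names and example labelings are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of the cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (nats) between two labelings of the same nodes."""
    n = len(a)
    size_a, size_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (size_a[x] * size_b[y]))
               for (x, y), c in joint.items())

def expected_mi_perm(a, b):
    """E[I] with both sets of cluster sizes fixed (permutation model)."""
    n = len(a)
    sizes_a, sizes_b = list(Counter(a).values()), list(Counter(b).values())
    total = 0.0
    for ai in sizes_a:
        for bj in sizes_b:
            # Overlap nij follows a hypergeometric law; nij = 0 adds nothing.
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                p = (math.comb(bj, nij) * math.comb(n - bj, ai - nij)
                     / math.comb(n, ai))
                total += p * (nij / n) * math.log(n * nij / (ai * bj))
    return total

def ami_perm(a, b):
    """AMI = (I - E[I]) / (M - E[I]), with M assumed to be the mean entropy."""
    mi = mutual_information(a, b)
    emi = expected_mi_perm(a, b)
    denom = 0.5 * (entropy(a) + entropy(b)) - emi
    return (mi - emi) / denom if abs(denom) > 1e-12 else 0.0

truth = [0, 0, 0, 0, 1, 1, 1, 1]
ami_same = ami_perm(truth, truth)                        # perfect agreement
ami_singletons = ami_perm(truth, list(range(8)))         # trivial clustering
ami_crossed = ami_perm(truth, [0, 0, 1, 1, 0, 0, 1, 1])  # cuts across truth
print(ami_same, ami_singletons, ami_crossed)
```

In this sketch the singletons clustering receives an AMI of approximately zero, because every permutation of singletons is the same partition and so the observed mutual information equals its expectation. The crossed labeling shares no information with the ground truth and falls below the chance expectation, giving a negative value.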
References
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 2008(10), P10008 (2008)
Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70, 066111 (2004)
Danon, L., Díaz-Guilera, A., Duch, J., Arenas, A.: Comparing community structure identification. J. Stat. Mech. 2005(09), P09008 (2005)
Decelle, A., Krzakala, F., Moore, C., Zdeborová, L.: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84, 066106 (2011)
Dhillon, I.S., Mallela, S., Kumar, R.: A divisive information-theoretic feature clustering algorithm for text classification. JMLR 3, 1265–1287 (2003)
Gates, A.J., Ahn, Y.Y.: The impact of random models on clustering similarity. JMLR 18(87), 1–28 (2017)
Horta, D., Campello, R.J.G.B.: Comparing hard and overlapping clusterings. JMLR 16(1), 2949–2997 (2015)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 1 (1985)
Kingman, J.F.C.: The representation of partition structures. J. London Math. Soc. s2–18(2), 374–380 (1978)
Lai, D., Nardini, C.: A corrected normalized mutual information for performance evaluation of community detection. J. Stat. Mech. (9) (2016)
Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 046110 (2008)
Liu, X., Cheng, H.M., Zhang, Z.Y.: Evaluation of community structures using kappa index and F-score instead of NMI. arXiv (2018)
Matula, D.W., Shahrokhi, F.: Sparsest cuts and bottlenecks in graphs. Discrete Appl. Math. 27(1), 113–123 (1990)
McCarthy, A.D.: Gridlock in networks: the Leximin method for hierarchical community detection. Master’s thesis, Southern Methodist University (2017)
McCarthy, A.D., Chen, T., Ebner, S.: An exact no free lunch theorem for community detection. In: Proceedings of the 8th International Conference on Complex Networks and Their Applications: Complex Networks 2019, Lisbon, Portugal (2019)
Meilă, M.: Comparing clusterings by the variation of information. In: Learning Theory and Kernel Machines (2003)
Newman, M.E.J.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (2006)
Peel, L., Larremore, D.B., Clauset, A.: The ground truth about metadata and community detection in networks. Sci. Adv. 3(5), e1602548 (2017)
Pons, P., Latapy, M.: Computing communities in large networks using random walks. In: Computer and Information Sciences - ISCIS (2005)
Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. PNAS 101(9), 2658–2663 (2004)
Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76, 036106 (2007)
Reichardt, J., Bornholdt, S.: Statistical mechanics of community detection. Phys. Rev. E 74, 016110 (2006)
Romano, S., Bailey, J., Nguyen, V., Verspoor, K.: Standardized mutual information for clustering comparisons: one step further in adjustment for chance. In: Proceedings of 31st International Conference on Machine Learning, vol. 32 (2014)
Romano, S., Vinh, N.X., Bailey, J., Verspoor, K.: Adjusting for chance clustering comparison measures. JMLR 17(1), 4635–4666 (2016)
Rosenberg, A., Hirschberg, J.: V-measure: a conditional entropy-based external cluster evaluation measure. In: Proceedings EMNLP-CoNLL (2007)
Rosvall, M., Axelsson, D., Bergstrom, C.T.: The map equation. Eur. Phys. J. Spec. Top. 178(1), 13–23 (2009)
Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. PNAS 104(18), 7327–7331 (2007)
Shahrokhi, F., Matula, D.W.: The maximum concurrent flow problem. J. ACM 37(2), 318–334 (1990)
Veldt, N., Gleich, D.F., Wirth, A.: A correlation clustering framework for community detection. In: Proceedings of 2018 WWW Conference (2018)
Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of ICML (2009)
Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)
Yang, Z., Algesheimer, R., Tessone, C.J.: A comparative analysis of community detection algorithms on artificial networks. Sci. Rep. 6, 30750 (2016)
Young, J.G., St-Onge, G., Desrosiers, P., Dubé, L.J.: Universality of the stochastic block model. Phys. Rev. E 98, 032309 (2018)
Zhang, J., Chen, T., Hu, J.: On the relationship between Gaussian stochastic blockmodels and label propagation algorithms. J. Stat. Mech. 2015(3), P03009 (2015)
Zhang, P.: Evaluating accuracy of community detection using the relative normalized mutual information. J. Stat. Mech. 2015(11), P11006 (2015)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
McCarthy, A.D., Chen, T., Rudinger, R., Matula, D.W. (2020). Metrics Matter in Community Detection. In: Cherifi, H., Gaito, S., Mendes, J., Moro, E., Rocha, L. (eds) Complex Networks and Their Applications VIII. COMPLEX NETWORKS 2019. Studies in Computational Intelligence, vol 881. Springer, Cham. https://doi.org/10.1007/978-3-030-36687-2_14
DOI: https://doi.org/10.1007/978-3-030-36687-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36686-5
Online ISBN: 978-3-030-36687-2
eBook Packages: Intelligent Technologies and Robotics (R0)