Metrics Matter in Community Detection

McCarthy, Arya D.; Chen, Tongfei; Rudinger, Rachel; Matula, David W.

doi:10.1007/978-3-030-36687-2_14

Part of the book series: Studies in Computational Intelligence (SCI, volume 881)

Included in the following conference series:

International Conference on Complex Networks and Their Applications

Abstract

We critically evaluate normalized mutual information (NMI) as an evaluation metric for community detection. NMI exaggerates the leximin method’s performance on weak communities: Does leximin, in finding the trivial singletons clustering, truly outperform eight other community detection methods? Three NMI improvements from the literature are AMI, rrNMI, and cNMI. We show equivalences under relevant random models, and for evaluating community detection, we advise one-sided AMI under the \(\mathbb{M}_{\text{all}}\) model (all partitions of \(n\) nodes). This work seeks (1) to drive a conversation on robust measurements and (2) to advocate evaluations that do not give a “free lunch”.
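
The abstract's concern about the trivial singletons clustering can be illustrated with a small sketch. It assumes the arithmetic-mean normalization NMI = 2I/(H(C) + H(T)), which may differ from the normalization used in the chapter, and the 16-node instance with four equal communities is hypothetical, not data from the paper.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a clustering given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two clusterings of the same nodes."""
    n = len(a)
    size_a, size_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # contingency table, nonzero cells only
    return sum(
        (n_ij / n) * log(n_ij * n / (size_a[i] * size_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption of this sketch)."""
    denom = entropy(a) + entropy(b)
    # Convention chosen here: two single-block clusterings count as identical.
    return 2 * mutual_information(a, b) / denom if denom > 0 else 1.0


# Hypothetical toy instance: 16 nodes in 4 ground-truth communities of 4.
truth = [node // 4 for node in range(16)]
singletons = list(range(16))  # every node in its own community
one_block = [0] * 16          # all nodes in a single community

nmi_singletons = nmi(singletons, truth)
nmi_one_block = nmi(one_block, truth)
print(nmi_singletons, nmi_one_block)
```

On this instance the singletons clustering scores 2/3 while the single-block clustering scores 0, although neither uses any information about the graph. An adjusted measure subtracts the expected score under a random model, which is the kind of correction the abstract recommends.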

Notes

1. Negative AMI indicates worse-than-chance clusterings.
2. This shape has alternatively been called a class or decomposition pattern.
3. This abuse of notation blurs the distinction between random models and the spaces over which they define their probabilities. We have restricted ourselves to uniform distributions over the spaces, so we believe that the abuse will not confuse.
4. The bound function in this AMI definition is \(M(\mathcal{C}, \mathcal{T}) = \max_{\mathcal{C}^\prime, \mathcal{T}^\prime} I(\mathcal{C}^\prime, \mathcal{T}^\prime)\). In practice we could use any of the upper bounds from [6], for example Eq. 3, as long as it is an upper bound consistent with the chosen random model (here \(\mathbb{M}_{\text{perm}}\)).
5. The surprising fact that larger graphs were processed faster comes from our column generation scheme: When gridlock splinters the graph into singletons at an early stage, we’ve reached our optimum, and the remaining \(O(N)\) LPs need not be solved.
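
Notes 1 and 4 can be read against the usual adjusted-for-chance form of mutual information. The block below is a sketch of that standard form, not a quotation of the chapter's own equation; it assumes the expectation is taken over clusterings drawn from the chosen random model \(\mathbb{M}\), with \(M\) the bound function of Note 4.

```latex
\[
\mathrm{AMI}(\mathcal{C}, \mathcal{T})
  = \frac{I(\mathcal{C}, \mathcal{T})
          - \mathbb{E}_{\mathbb{M}}\!\left[ I(\mathcal{C}^\prime, \mathcal{T}^\prime) \right]}
         {M(\mathcal{C}, \mathcal{T})
          - \mathbb{E}_{\mathbb{M}}\!\left[ I(\mathcal{C}^\prime, \mathcal{T}^\prime) \right]}
\]
```

When \(I(\mathcal{C}, \mathcal{T})\) falls below its expected value under the random model, the numerator is negative, which is the worse-than-chance case of Note 1. The denominator stays positive as long as \(M\) is a valid upper bound for that model, which is the consistency requirement in Note 4.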

References

1. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 2008(10), P10008 (2008)
2. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70, 066111 (2004)
3. Danon, L., Díaz-Guilera, A., Duch, J., Arenas, A.: Comparing community structure identification. J. Stat. Mech. 2005(09), P09008 (2005)
4. Decelle, A., Krzakala, F., Moore, C., Zdeborová, L.: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84, 066106 (2011)
5. Dhillon, I.S., Mallela, S., Kumar, R.: A divisive information-theoretic feature clustering algorithm for text classification. JMLR 3, 1265–1287 (2003)
6. Gates, A.J., Ahn, Y.Y.: The impact of random models on clustering similarity. JMLR 18(87), 1–28 (2017)
7. Horta, D., Campello, R.J.G.B.: Comparing hard and overlapping clusterings. JMLR 16(1), 2949–2997 (2015)
8. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 1 (1985)
9. Kingman, J.F.C.: The representation of partition structures. J. London Math. Soc. s2–18(2), 374–380 (1978)
10. Lai, D., Nardini, C.: A corrected normalized mutual information for performance evaluation of community detection. J. Stat. Mech. (9) (2016)
11. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 046110 (2008)
12. Liu, X., Cheng, H.M., Zhang, Z.Y.: Evaluation of community structures using kappa index and F-score instead of NMI. arXiv (2018)
13. Matula, D.W., Shahrokhi, F.: Sparsest cuts and bottlenecks in graphs. Discrete Appl. Math. 27(1), 113–123 (1990)
14. McCarthy, A.D.: Gridlock in networks: the Leximin method for hierarchical community detection. Master’s thesis, Southern Methodist University (2017)
15. McCarthy, A.D., Chen, T., Ebner, S.: An exact no free lunch theorem for community detection. In: Proceedings of the 8th International Conference on Complex Networks and Their Applications: Complex Networks 2019, Lisbon, Portugal (2019)
16. Meilă, M.: Comparing clusterings by the variation of information. In: Learning Theory and Kernel Machines (2003)
17. Newman, M.E.J.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (2006)
18. Peel, L., Larremore, D.B., Clauset, A.: The ground truth about metadata and community detection in networks. Sci. Adv. 3(5), e1602548 (2017)
19. Pons, P., Latapy, M.: Computing communities in large networks using random walks. In: Computer and Information Sciences - ISCIS (2005)
20. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. PNAS 101(9), 2658–2663 (2004)
21. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76, 036106 (2007)
22. Reichardt, J., Bornholdt, S.: Statistical mechanics of community detection. Phys. Rev. E 74, 016110 (2006)
23. Romano, S., Bailey, J., Nguyen, V., Verspoor, K.: Standardized mutual information for clustering comparisons: one step further in adjustment for chance. In: Proceedings of 31st International Conference on Machine Learning, vol. 32 (2014)
24. Romano, S., Vinh, N.X., Bailey, J., Verspoor, K.: Adjusting for chance clustering comparison measures. JMLR 17(1), 4635–4666 (2016)
25. Rosenberg, A., Hirschberg, J.: V-measure: a conditional entropy-based external cluster evaluation measure. In: Proceedings EMNLP-CoNLL (2007)
26. Rosvall, M., Axelsson, D., Bergstrom, C.T.: The map equation. Eur. Phys. J. Spec. Top. 178(1), 13–23 (2009)
27. Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. PNAS 104(18), 7327–7331 (2007)
28. Shahrokhi, F., Matula, D.W.: The maximum concurrent flow problem. J. ACM 37(2), 318–334 (1990)
29. Veldt, N., Gleich, D.F., Wirth, A.: A correlation clustering framework for community detection. In: Proceedings of 2018 WWW Conference (2018)
30. Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of ICML (2009)
31. Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)
32. Yang, Z., Algesheimer, R., Tessone, C.J.: A comparative analysis of community detection algorithms on artificial networks. Sci. Rep. 6, 30750 (2016)
33. Young, J.G., St-Onge, G., Desrosiers, P., Dubé, L.J.: Universality of the stochastic block model. Phys. Rev. E 98, 032309 (2018)
34. Zhang, J., Chen, T., Hu, J.: On the relationship between Gaussian stochastic blockmodels and label propagation algorithms. J. Stat. Mech. 2015(3), P03009 (2015)
35. Zhang, P.: Evaluating accuracy of community detection using the relative normalized mutual information. J. Stat. Mech. 2015(11), P11006 (2015)

Author information

Authors and Affiliations

Johns Hopkins University, Baltimore, USA
Arya D. McCarthy, Tongfei Chen & Rachel Rudinger
Southern Methodist University, University Park, USA
David W. Matula

Corresponding author

Correspondence to Arya D. McCarthy.

Editor information

Editors and Affiliations

University of Burgundy, Dijon Cedex, France
Hocine Cherifi
Università degli Studi di Milano, Milan, Italy
Sabrina Gaito
University of Aveiro, Aveiro, Portugal
José Fernando Mendes
Universidad Carlos III de Madrid, Leganés, Madrid, Spain
Esteban Moro
Indiana University, Bloomington, IN, USA
Luis Mateus Rocha

Cite this paper

McCarthy, A.D., Chen, T., Rudinger, R., Matula, D.W. (2020). Metrics Matter in Community Detection. In: Cherifi, H., Gaito, S., Mendes, J., Moro, E., Rocha, L. (eds) Complex Networks and Their Applications VIII. COMPLEX NETWORKS 2019. Studies in Computational Intelligence, vol 881. Springer, Cham. https://doi.org/10.1007/978-3-030-36687-2_14

DOI: https://doi.org/10.1007/978-3-030-36687-2_14
Published: 26 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36686-5
Online ISBN: 978-3-030-36687-2
eBook Packages: Intelligent Technologies and Robotics (R0)
