Metrics Matter in Community Detection

  • Conference paper
Complex Networks and Their Applications VIII (COMPLEX NETWORKS 2019)

Part of the book series: Studies in Computational Intelligence ((SCI,volume 881))

Abstract

We critically evaluate normalized mutual information (NMI) as an evaluation metric for community detection. NMI exaggerates the leximin method’s performance on weak communities: Does leximin, in finding the trivial singletons clustering, truly outperform eight other community detection methods? Three NMI improvements from the literature are AMI, rrNMI, and cNMI. We show equivalences under relevant random models, and for evaluating community detection, we advise one-sided AMI under the \(\mathbb {M}_{\text {all}}\) model (all partitions of \(n\) nodes). This work seeks (1) to drive a conversation on robust measurements and (2) to advocate evaluations that do not give a “free lunch”.
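To make the abstract's concern concrete, here is a minimal sketch (not the authors' code) that scores the trivial singletons clustering against a small planted partition. Two choices are assumptions made for illustration: NMI uses arithmetic-mean normalization, and AMI is computed under the permutation model \(\mathbb {M}_{\text {perm}}\) by brute-force enumeration, rather than under the \(\mathbb {M}_{\text {all}}\) model that the abstract recommends. The function names and the six-node example are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log


def entropy(labels):
    """Shannon entropy (nats) of a clustering given as a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (nats) between two label lists of equal length."""
    n = len(a)
    size_a, size_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((nij / n) * log(nij * n / (size_a[i] * size_b[j]))
               for (i, j), nij in joint.items())


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumed, common choice)."""
    return 2 * mutual_information(a, b) / (entropy(a) + entropy(b))


def ami_perm(candidate, truth):
    """AMI under the permutation model, enumerated exactly (tiny n only)."""
    shuffles = list(permutations(candidate))
    expected = sum(mutual_information(s, truth) for s in shuffles) / len(shuffles)
    bound = (entropy(candidate) + entropy(truth)) / 2  # assumed upper bound
    return (mutual_information(candidate, truth) - expected) / (bound - expected)


truth = [0, 0, 0, 1, 1, 1]        # two planted communities of three nodes
singletons = [0, 1, 2, 3, 4, 5]   # trivial clustering: every node alone

print(round(nmi(singletons, truth), 3))        # about 0.558
print(abs(ami_perm(singletons, truth)) < 1e-9)  # True: no better than chance
```

Every shuffle of the singletons clustering produces the same contingency table, so its mutual information equals its expectation under the permutation model and the adjusted score vanishes, while the unadjusted NMI stays well above zero. The enumeration is factorial in the number of nodes, so this only works on toy inputs.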

Notes

  1. Negative AMI indicates worse-than-chance clusterings.

  2. This shape has alternatively been called a class or decomposition pattern.

  3. This abuse of notation blurs the distinction between random models and the spaces over which they define their probabilities. We have restricted ourselves to uniform distributions over the spaces, so we believe that the abuse will not confuse.

  4. The bound function in this AMI definition is \(M(\mathcal {C}, \mathcal {T}) = \max _{\mathcal {C}^\prime , \mathcal {T}^\prime }I(\mathcal {C}^\prime , \mathcal {T}^\prime )\). In practice we could use any of the upper bounds from [6], for example Eq. 3, as long as it is an upper bound consistent with the chosen random model (here \(\mathbb {M}_{\text {perm}}\)).

  5. The surprising fact that larger graphs were processed faster comes from our column generation scheme: When gridlock splinters the graph into singletons at an early stage, we’ve reached our optimum, and the remaining \(O(N)\) LPs need not be solved.
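
For reference, Notes 1 and 4 can be read against the usual chance-adjusted form of AMI. The expression below is the standard template from the adjustment-for-chance literature [6, 30], written here as an assumption about the paper's definition rather than a quotation of it; \(\mathbb {M}\) denotes the chosen random model, and primed symbols are clusterings drawn from it.

```latex
\[
\mathrm{AMI}(\mathcal{C}, \mathcal{T})
  = \frac{I(\mathcal{C}, \mathcal{T})
          - \mathbb{E}_{\mathbb{M}}\!\left[ I(\mathcal{C}^\prime, \mathcal{T}^\prime) \right]}
         {M(\mathcal{C}, \mathcal{T})
          - \mathbb{E}_{\mathbb{M}}\!\left[ I(\mathcal{C}^\prime, \mathcal{T}^\prime) \right]}
\]
```

Provided the bound \(M\) exceeds the expectation, the denominator is positive, so the score is negative exactly when the observed mutual information falls below its expectation under \(\mathbb {M}\). That is the worse-than-chance case of Note 1.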

References

  1. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 2008(10), P10008 (2008)

  2. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70, 066111 (2004)

  3. Danon, L., Díaz-Guilera, A., Duch, J., Arenas, A.: Comparing community structure identification. J. Stat. Mech. 2005(09), P09008 (2005)

  4. Decelle, A., Krzakala, F., Moore, C., Zdeborová, L.: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84, 066106 (2011)

  5. Dhillon, I.S., Mallela, S., Kumar, R.: A divisive information-theoretic feature clustering algorithm for text classification. JMLR 3, 1265–1287 (2003)

  6. Gates, A.J., Ahn, Y.Y.: The impact of random models on clustering similarity. JMLR 18(87), 1–28 (2017)

  7. Horta, D., Campello, R.J.G.B.: Comparing hard and overlapping clusterings. JMLR 16(1), 2949–2997 (2015)

  8. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 1 (1985)

  9. Kingman, J.F.C.: The representation of partition structures. J. London Math. Soc. s2–18(2), 374–380 (1978)

  10. Lai, D., Nardini, C.: A corrected normalized mutual information for performance evaluation of community detection. J. Stat. Mech. (9) (2016)

  11. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 046110 (2008)

  12. Liu, X., Cheng, H.M., Zhang, Z.Y.: Evaluation of community structures using kappa index and F-score instead of NMI. arXiv (2018)

  13. Matula, D.W., Shahrokhi, F.: Sparsest cuts and bottlenecks in graphs. Discrete Appl. Math. 27(1), 113–123 (1990)

  14. McCarthy, A.D.: Gridlock in networks: the Leximin method for hierarchical community detection. Master’s thesis, Southern Methodist University (2017)

  15. McCarthy, A.D., Chen, T., Ebner, S.: An exact no free lunch theorem for community detection. In: Proceedings of the 8th International Conference on Complex Networks and Their Applications: Complex Networks 2019, Lisbon, Portugal (2019)

  16. Meilă, M.: Comparing clusterings by the variation of information. In: Learning Theory and Kernel Machines (2003)

  17. Newman, M.E.J.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (2006)

  18. Peel, L., Larremore, D.B., Clauset, A.: The ground truth about metadata and community detection in networks. Sci. Adv. 3(5), e1602548 (2017)

  19. Pons, P., Latapy, M.: Computing communities in large networks using random walks. In: Computer and Information Sciences - ISCIS (2005)

  20. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. PNAS 101(9), 2658–2663 (2004)

  21. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76, 036106 (2007)

  22. Reichardt, J., Bornholdt, S.: Statistical mechanics of community detection. Phys. Rev. E 74, 016110 (2006)

  23. Romano, S., Bailey, J., Nguyen, V., Verspoor, K.: Standardized mutual information for clustering comparisons: one step further in adjustment for chance. In: Proceedings of 31st International Conference on Machine Learning, vol. 32 (2014)

  24. Romano, S., Vinh, N.X., Bailey, J., Verspoor, K.: Adjusting for chance clustering comparison measures. JMLR 17(1), 4635–4666 (2016)

  25. Rosenberg, A., Hirschberg, J.: V-measure: a conditional entropy-based external cluster evaluation measure. In: Proceedings EMNLP-CoNLL (2007)

  26. Rosvall, M., Axelsson, D., Bergstrom, C.T.: The map equation. Eur. Phys. J. Spec. Top. 178(1), 13–23 (2009)

  27. Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. PNAS 104(18), 7327–7331 (2007)

  28. Shahrokhi, F., Matula, D.W.: The maximum concurrent flow problem. J. ACM 37(2), 318–334 (1990)

  29. Veldt, N., Gleich, D.F., Wirth, A.: A correlation clustering framework for community detection. In: Proceedings of 2018 WWW Conference (2018)

  30. Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of ICML (2009)

  31. Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)

  32. Yang, Z., Algesheimer, R., Tessone, C.J.: A comparative analysis of community detection algorithms on artificial networks. Sci. Rep. 6, 30750 (2016)

  33. Young, J.G., St-Onge, G., Desrosiers, P., Dubé, L.J.: Universality of the stochastic block model. Phys. Rev. E 98, 032309 (2018)

  34. Zhang, J., Chen, T., Hu, J.: On the relationship between Gaussian stochastic blockmodels and label propagation algorithms. J. Stat. Mech. 2015(3), P03009 (2015)

  35. Zhang, P.: Evaluating accuracy of community detection using the relative normalized mutual information. J. Stat. Mech. 2015(11), P11006 (2015)

Author information

Corresponding author

Correspondence to Arya D. McCarthy.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

McCarthy, A.D., Chen, T., Rudinger, R., Matula, D.W. (2020). Metrics Matter in Community Detection. In: Cherifi, H., Gaito, S., Mendes, J., Moro, E., Rocha, L. (eds) Complex Networks and Their Applications VIII. COMPLEX NETWORKS 2019. Studies in Computational Intelligence, vol 881. Springer, Cham. https://doi.org/10.1007/978-3-030-36687-2_14
