Fast Community Detection in Complex Networks with a K-Depths Classifier

Big and Complex Data Analysis

Part of the book series: Contributions to Statistics (CONTRIB.STAT.)

Abstract

We introduce a notion of data depth for recovery of community structures in large complex networks. We propose a new data-driven algorithm, K-depths, for community detection using the L1-depth in an unsupervised setting. We evaluate finite sample properties of the K-depths method using synthetic networks and illustrate its performance for tracking communities on the online social media platform Flickr. The new method significantly outperforms the classical K-means and yields results comparable to those of the regularized K-means. Being robust to low-degree vertices, the new K-depths method is also computationally efficient, requiring up to 400 times less CPU time than the currently adopted regularization procedures based on optimizing the Davis–Kahan bound.
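The abstract does not spell out the K-depths procedure, so the following Python sketch is only a rough illustration of the general idea of depth-based cluster assignment, not the chapter's algorithm. It assumes the equal-weight L1 data depth in the form given by Vardi and Zhang (Proc. Natl. Acad. Sci., 2000) and assumes the points are Euclidean vectors (for a network these might be rows of a spectral embedding, which is also an assumption). The function names `l1_depth` and `reassign_by_depth` are hypothetical.

```python
import numpy as np


def l1_depth(y, points, tol=1e-12):
    """Equal-weight L1 depth of y with respect to the rows of `points`.

    Assumed form: 1 - max(0, ||e_bar(y)|| - f(y)), where e_bar(y) is the
    weighted sum of unit vectors pointing from each data point to y and
    f(y) is the fraction of data points that coincide with y.
    """
    points = np.asarray(points, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = y - points                       # shape (n, d)
    norms = np.linalg.norm(diffs, axis=1)
    away = norms > tol                       # data points distinct from y
    n = len(points)
    mass_at_y = np.sum(~away) / n
    if away.any():
        e_bar = (diffs[away] / norms[away, None]).sum(axis=0) / n
    else:
        e_bar = np.zeros_like(y)
    return 1.0 - max(0.0, float(np.linalg.norm(e_bar)) - mass_at_y)


def reassign_by_depth(points, labels, k, max_iter=20):
    """Illustrative loop: move each point to the cluster in which it is deepest.

    All depths in one pass are computed from the previous labels
    (a synchronous update); the loop stops when labels no longer change.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        depths = np.full((len(points), k), -np.inf)  # empty clusters stay at -inf
        for c in range(k):
            members = points[labels == c]
            if len(members) == 0:
                continue
            depths[:, c] = [l1_depth(p, members) for p in points]
        new_labels = depths.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


# Two well-separated groups, with one point of each group mislabelled at the start.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5],
                [10, 10], [11, 10], [10, 11], [11, 11], [10.5, 10.5]])
start = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1, 1])
print(reassign_by_depth(pts, start, k=2))
```

Because depth depends on the directions to the other points rather than on squared distances, a few extreme points shift it less than they shift a cluster mean; this is one common motivation for depth-based alternatives to K-means, though the sketch makes no claim about the performance figures reported in the abstract.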

Acknowledgements

Authors are grateful to Robert Serfling, Ricardo Fraiman, and Rebecka Jörnsten for motivating discussions at various stages of this paper. Yulia R. Gel is supported in part by the National Science Foundation grant IIS 1633331. This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET) of Canada.

Corresponding author

Correspondence to Yulia R. Gel.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Tian, Y., Gel, Y.R. (2017). Fast Community Detection in Complex Networks with a K-Depths Classifier. In: Ahmed, S. (ed.) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_8
