Fast Community Detection in Complex Networks with a K-Depths Classifier

Tian, Yahui; Gel, Yulia R.

doi:10.1007/978-3-319-41573-4_8


Part of the book series: Contributions to Statistics (CONTRIB.STAT.)


Abstract

We introduce a notion of data depth for recovering community structure in large complex networks. We propose a new data-driven algorithm, K-depths, for community detection using the L₁-depth in an unsupervised setting. We evaluate the finite-sample properties of the K-depths method on synthetic networks and illustrate its performance in tracking communities on the online social media platform Flickr. The new method significantly outperforms classical K-means and yields results comparable to those of regularized K-means. Besides being robust to low-degree vertices, the K-depths method is computationally efficient, requiring up to 400 times less CPU time than the currently adopted regularization procedures based on optimizing the Davis–Kahan bound.
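To make the central ingredient concrete, the sketch below computes an L₁ (spatial) depth with equal weights, in the form usually attributed to Vardi and Zhang (2000), and uses it for a maximum-depth assignment step. It is an illustration under stated assumptions, not the chapter's K-depths procedure: the function names `l1_depth` and `assign_by_depth`, the tolerance `tol`, and the toy point sets are hypothetical, and the full algorithm (initialization, iteration, the embedding of network vertices) is not described in the abstract.

```python
import numpy as np


def l1_depth(y, X, tol=1e-12):
    """L1 (spatial) depth of point y relative to the rows of X.

    Assumes equal weights on the rows of X. The depth is
    1 - max(0, ||e_bar|| - f), where e_bar is the average unit vector
    pointing from the data points toward y and f is the fraction of
    data points that coincide with y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = y - X                          # vectors from each x_i to y
    norms = np.linalg.norm(diffs, axis=1)
    away = norms > tol                     # points that do not coincide with y
    f = 1.0 - away.mean()                  # mass sitting exactly at y
    if not away.any():
        return 1.0                         # every data point equals y
    e_bar = (diffs[away] / norms[away, None]).sum(axis=0) / len(X)
    return 1.0 - max(0.0, float(np.linalg.norm(e_bar)) - f)


def assign_by_depth(y, groups):
    """Index of the group in which y is deepest (hypothetical helper)."""
    depths = [l1_depth(y, g) for g in groups]
    return int(np.argmax(depths))


# Toy example: two small, well-separated groups of points in the plane.
group0 = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
group1 = group0 + 10.0

center_depth = l1_depth([0.0, 0.0], group0)      # center of symmetry
far_depth = l1_depth([100.0, 0.0], group0)       # far outside the group
label_a = assign_by_depth([0.1, 0.1], [group0, group1])
label_b = assign_by_depth([9.8, 10.1], [group0, group1])
```

A point at the center of a symmetric group attains the maximal depth, while a point far outside the group has depth close to zero. That contrast is what allows depth to stand in for distance to a centroid when assigning points to groups.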


Acknowledgements

The authors are grateful to Robert Serfling, Ricardo Fraiman, and Rebecka Jörnsten for motivating discussions at various stages of this paper. Yulia R. Gel is supported in part by the National Science Foundation grant IIS 1633331. This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET) of Canada.

Author information

Authors and Affiliations

University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX, 75080, USA
Yahui Tian & Yulia R. Gel

Corresponding author

Correspondence to Yulia R. Gel.

Editor information

Editors and Affiliations

Department of Mathematics & Statistics, Brock University, St. Catharines, Ontario, Canada
S. Ejaz Ahmed

Cite this chapter

Tian, Y., Gel, Y.R. (2017). Fast Community Detection in Complex Networks with a K-Depths Classifier. In: Ahmed, S. (eds) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_8

DOI: https://doi.org/10.1007/978-3-319-41573-4_8
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41572-7
Online ISBN: 978-3-319-41573-4
eBook Packages: Mathematics and Statistics (R0)