Abstract
We introduce a notion of data depth for the recovery of community structures in large complex networks. We propose a new data-driven algorithm, K-depths, for community detection using the L1-depth in an unsupervised setting. We evaluate finite sample properties of the K-depths method using synthetic networks and illustrate its performance in tracking communities on the online social media platform Flickr. The new method significantly outperforms the classical K-means and yields results comparable to those of the regularized K-means. The K-depths method is robust to low-degree vertices and computationally efficient, requiring up to 400 times less CPU time than the currently adopted regularization procedures based on optimizing the Davis–Kahan bound.
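To make the idea of an L1-depth concrete, the following is a minimal sketch of the sample L1 depth in the formulation commonly attributed to Vardi and Zhang (2000), with equal weights on all observations. It is an illustration only, not the chapter's K-depths implementation: the function names `l1_depth` and `deepest_cluster`, the tolerance `tol`, and the rule "assign a point to the cluster in which it is deepest" are assumptions introduced here.

```python
import numpy as np


def l1_depth(y, X, tol=1e-12):
    """Sample L1 depth of point y with respect to the rows of X.

    Assumes equal weights 1/n on all observations (illustrative choice).
    Values lie in [0, 1]; larger means more central.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(X)

    diffs = y - X                          # vectors from each x_i to y
    norms = np.linalg.norm(diffs, axis=1)
    coincident = norms <= tol              # observations equal to y

    # Average of unit vectors pointing from the non-coincident x_i to y.
    units = diffs[~coincident] / norms[~coincident, None]
    e_bar = units.sum(axis=0) / n

    # Probability mass sitting exactly at y.
    f = coincident.sum() / n

    return 1.0 - max(0.0, np.linalg.norm(e_bar) - f)


def deepest_cluster(y, clusters):
    """Hypothetical assignment rule: index of the cluster where y is deepest."""
    depths = [l1_depth(y, C) for C in clusters]
    return int(np.argmax(depths))


# Two small, well-separated toy clusters (made-up data for illustration).
A = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
B = A + 10.0

center_depth = l1_depth([0.0, 0.0], A)      # centre of a symmetric cloud
far_depth = l1_depth([100.0, 0.0], A)       # point far outside the cloud
label = deepest_cluster([0.1, 0.0], [A, B])
```

In this toy setting the centre of the symmetric cloud attains the maximal depth, a distant point has depth close to zero, and a point near the first cloud is assigned to it. In a network setting, the rows of `X` would typically be vertex embeddings (for example, spectral coordinates) rather than raw coordinates; that choice is likewise an assumption of this sketch.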
Acknowledgements
The authors are grateful to Robert Serfling, Ricardo Fraiman, and Rebecka Jörnsten for motivating discussions at various stages of this work. Yulia R. Gel is supported in part by the National Science Foundation grant IIS 1633331. This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET) of Canada.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Tian, Y., Gel, Y.R. (2017). Fast Community Detection in Complex Networks with a K-Depths Classifier. In: Ahmed, S. (eds) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_8
DOI: https://doi.org/10.1007/978-3-319-41573-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41572-7
Online ISBN: 978-3-319-41573-4
eBook Packages: Mathematics and Statistics (R0)