In this article, we develop a clique-based method for social network clustering. We introduce a new index to evaluate the quality of clustering results and propose an efficient algorithm, based on recursive bipartition, that maximizes an objective function of this index. Because the optimization problem is NP-hard, we approximate a near-optimal solution via an implicitly restarted Lanczos method. One advantage of our algorithm is that the index of each community in the clustering result is guaranteed to exceed a predetermined threshold, p, which is fully controlled by the user. We also address the situation in which p is unknown: a statistical procedure that simultaneously controls under-clustering and over-clustering errors selects a localized threshold for each subnetwork, so that community detection accuracy is optimized. Accordingly, we propose a localized clustering algorithm based on a binary tree structure. Finally, we use stochastic blockmodels to conduct simulation studies that demonstrate the accuracy and efficiency of our algorithms, both numerically and graphically.
Keywords: Clique-score index; Localized clustering algorithm; Modularity; Social network; Spectral analysis; Stochastic blockmodel
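The abstract describes a single bipartition step driven by a leading eigenvector, evaluated on networks simulated from a stochastic blockmodel. The following is a minimal sketch of that idea only, under stated assumptions: the clique-score index is not defined in this text, so Newman's modularity matrix is used as a stand-in objective, and the function names `simulate_sbm` and `spectral_bipartition`, the block sizes, and the edge probabilities are all hypothetical choices made for illustration. SciPy's `eigsh` is used because it wraps ARPACK, whose symmetric solver is an implicitly restarted Lanczos method.

```python
import numpy as np
from scipy.sparse.linalg import eigsh


def simulate_sbm(sizes, p_in, p_out, seed=0):
    """Draw a symmetric 0/1 adjacency matrix from a simple stochastic blockmodel.

    Hypothetical helper: within-block edges appear with probability p_in,
    between-block edges with probability p_out.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    # Sample only the strict upper triangle, then mirror it (no self-loops).
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    adjacency = (upper | upper.T).astype(float)
    return adjacency, labels


def spectral_bipartition(adjacency):
    """Split nodes by the sign of the leading eigenvector.

    Assumption: the modularity matrix stands in for the clique-score
    objective, which is not specified in the abstract.
    """
    degrees = adjacency.sum(axis=1)
    total = degrees.sum()
    modularity_matrix = adjacency - np.outer(degrees, degrees) / total
    # Largest algebraic eigenpair via ARPACK (implicitly restarted Lanczos).
    _, vectors = eigsh(modularity_matrix, k=1, which="LA")
    return (vectors[:, 0] > 0).astype(int)


# Illustrative parameters only: two well-separated blocks of 30 nodes each.
adjacency, truth = simulate_sbm([30, 30], p_in=0.6, p_out=0.05)
split = spectral_bipartition(adjacency)
# The sign of an eigenvector is arbitrary, so compare up to label swapping.
agreement = max(np.mean(split == truth), np.mean(split != truth))
```

A recursive version would apply `spectral_bipartition` again to each resulting subnetwork and stop once a quality criterion is met, which corresponds to the binary tree structure mentioned in the abstract; the stopping rule itself depends on the index and threshold p and is not sketched here.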
The authors would like to thank the Associate Editor and two anonymous reviewers for their insightful comments and suggestions on this manuscript.