Abstract
In this article, we develop a clique-based method for social network clustering. We introduce a new index to evaluate the quality of clustering results, and propose an efficient algorithm based on recursive bipartition to maximize an objective function of the proposed index. The optimization problem is NP-hard, so we approximate a semi-optimal solution via an implicitly restarted Lanczos method. One advantage of our algorithm is that the proposed index of each community in the clustering result is guaranteed to be higher than a predetermined threshold, p, which is chosen entirely by the user. We also account for the situation in which p is unknown. A statistical procedure that simultaneously controls both under-clustering and over-clustering errors is carried out to select a localized threshold for each subnetwork, such that the community detection accuracy is optimized. Accordingly, we propose a localized clustering algorithm based on a binary tree structure. Finally, we use stochastic blockmodels to conduct simulation studies and demonstrate the accuracy and efficiency of our algorithms, both numerically and graphically.
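The recursive bipartition with a user-chosen threshold p can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the proposed index, so plain edge density is used here as a stand-in, and the split uses the sign pattern of the Fiedler vector of the graph Laplacian, computed with a dense eigensolver rather than an implicitly restarted Lanczos method. The names `density`, `fiedler_split`, and `cluster` are hypothetical.

```python
import numpy as np

def density(adj, nodes):
    """Stand-in quality index (an assumption, not the paper's index):
    fraction of possible edges present among `nodes`."""
    k = len(nodes)
    if k < 2:
        return 1.0  # convention: a single node counts as fully connected
    sub = adj[np.ix_(nodes, nodes)]
    return sub.sum() / (k * (k - 1))  # symmetric 0/1 matrix counts each edge twice

def fiedler_split(adj, nodes):
    """Split `nodes` in two by the sign of the Fiedler vector of the subgraph Laplacian."""
    sub = adj[np.ix_(nodes, nodes)]
    lap = np.diag(sub.sum(axis=1)) - sub
    # Dense solver for illustration; a Lanczos-type sparse solver would replace it at scale.
    _, vecs = np.linalg.eigh(lap)
    fiedler = vecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    left = [n for n, x in zip(nodes, fiedler) if x < 0]
    right = [n for n, x in zip(nodes, fiedler) if x >= 0]
    if not left or not right:
        # Degenerate split (e.g. disconnected subgraph): peel off one node instead.
        left, right = nodes[:1], nodes[1:]
    return left, right

def cluster(adj, nodes, p):
    """Recursively bipartition until every community's index reaches the threshold p."""
    if len(nodes) < 2 or density(adj, nodes) >= p:
        return [sorted(nodes)]
    left, right = fiedler_split(adj, nodes)
    return cluster(adj, left, p) + cluster(adj, right, p)

# Toy network: two 5-node cliques joined by a single edge between nodes 4 and 5.
adj = np.zeros((10, 10), dtype=int)
for block in (range(0, 5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1
adj[4, 5] = adj[5, 4] = 1

communities = cluster(adj, list(range(10)), p=0.8)
print(sorted(communities))
```

On this toy network the whole graph has density 42/90, below p = 0.8, so it is split once; each clique has density 1 and is returned as a community. Every returned community meets the threshold by construction, because recursion only stops when the index reaches p or a single node remains.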
Acknowledgements
The authors would like to thank the Associate Editor and two anonymous reviewers for their insightful comments and suggestions on this manuscript.
Cite this article
Ouyang, G., Dey, D.K. & Zhang, P. Clique-Based Method for Social Network Clustering. J Classif 37, 254–274 (2020). https://doi.org/10.1007/s00357-019-9310-5
DOI: https://doi.org/10.1007/s00357-019-9310-5