We consider the problem of detecting communities in a partially observable network, in which some edges cannot be observed. Previous community detection methods are often based solely on the observed connectivity relation and do not explicitly consider this situation. Even when the connectivity relation is only partially observable, profile data about the vertices in the network, if available, can be exploited as auxiliary information. We propose to utilize a graph structure (called a profile graph) constructed from the profile data, together with a simple model that uses both the observed connectivity relation and the profile graph. Furthermore, instead of a hierarchical approach, we propose an embedding approach, based on the modularity matrix of the network structure, that is regularized via the profile graph. Experiments are conducted over two social network datasets, and a comparison with several state-of-the-art methods is reported. The results are encouraging and indicate that this line of research is promising.
The problem of how to define (or learn) a suitable similarity function for the profiles is beyond the scope of this paper.
k = 10 corresponds to sparse graphs, and larger values of k correspond to denser graphs.
As in Newman (2006), since the eigenvector with eigenvalue 1 is a “trivial” solution, it is not utilized in our method.
The value of k was set to 10 by default. However, since walktrap did not work with k = 10 and KL similarity on the IV’04 dataset, the rightmost figure shows the comparison with k = 20.
Basu, S., Davidson, I., & Wagstaff, K. (Eds.) (2008). Constrained clustering: Advances in algorithms, theory, and applications. Chapman & Hall/CRC Press.
Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373–1396.
Chapelle, O., Schölkopf, B., & Zien, A. (Eds.) (2006). Semi-supervised learning. Cambridge: MIT Press.
Chung, F. (1997). Spectral graph theory. Providence: American Mathematical Society.
Cover, T., & Thomas, J. (2006). Elements of information theory. New York: Wiley.
Dhillon, I. S. (1997). A new O(n²) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. PhD thesis, EECS Department, University of California, Berkeley.
Kannan, R., Vempala, S., & Vetta, A. (2004). On clusterings: Good, bad, and spectral. Journal of the ACM, 51(3), 497–515.
Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
Mika, P. (2007). Social networks and the semantic web. New York: Springer.
Newman, M. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3), 036104.
Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10(2), 191–218.
Raghavan, U., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76, 036106.
Ristad, E. (1995). A natural law of succession. Technical Report CS-TR-495-95, Princeton University.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.
Watts, D. J. (2004). Six degrees: The science of a connected age. New York: W. W. Norton.
Yoshida, T. (2010a). A graph model for clustering based on mutual information. In Proc. PRICAI-2010 (LNAI 6230) (pp. 339–350).
Yoshida, T. (2010b). A graph model for mutual information based clustering. Journal of Intelligent Information Systems. doi:10.1007/s10844-010-0132-5.
Yoshida, T. (2010c). Performance evaluation of constraints in graph-based semi-supervised clustering. In Proc. AMT-2010 (LNAI 6335) (pp. 138–149).
We express sincere gratitude to the reviewers for their careful reading of the manuscript and for providing valuable suggestions to improve the paper.
Appendix A: Results with different values of k in k-NN graphs
A.1 Results with different values of k in other methods
As described in Section 3.2, we conducted experiments with various values of k for the k-nearest-neighbor (k-NN) profile graphs on both the Pascal dataset and the IV’04 dataset. Section 3.3 reported the results for k = 10. The results for the other values of k (k = 20, 40, 80) are shown in Figs. 6 and 7: Fig. 6 is for the Pascal dataset, and Fig. 7 is for the IV’04 dataset.
As illustrated in these figures, the results with other values of k are similar to the result with k = 10 in Fig. 2. In particular, as the value of k increases, the results approach those for the fully connected (complete) profile graph in Fig. 1. Furthermore, increasing the value of k deteriorated the performance of LabelPropagation. In addition, LeadingEigenvector showed almost no difference except for α = 1 and r = 1. Thus, these results are consistent with the results and discussion in Section 3.
A.2 Results with different values of k in the proposed method
As in Section A.1, we also evaluated the proposed method with other values of k in the k-NN graph (k = 20, 40, 80). In addition, to define the weights in the profile graphs, we conducted experiments with both cosine similarity and the KL similarity defined in (13). The results are shown in Fig. 8 (Pascal dataset) and Fig. 9 (IV’04 dataset). In these figures, the first row corresponds to cosine similarity and the second row to KL similarity; the left column is for k = 20, the middle for k = 40, and the right for k = 80.
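To make the role of k concrete, the following is a minimal sketch of how a k-NN profile graph with cosine-similarity weights might be constructed. It is not the implementation used in the experiments: the function names `cosine_similarity` and `knn_profile_graph`, the dense-list profile representation, and the symmetrization rule (an edge is kept if either endpoint ranks the other among its k nearest neighbors) are all assumptions made for illustration. The KL similarity of (13) is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for all-zero profiles (assumption)
    return dot / (norm_u * norm_v)


def knn_profile_graph(profiles, k):
    """Build a symmetric weighted k-NN graph as {i: {j: weight}}.

    Assumed rule: edge (i, j) is kept if j is among the k most similar
    vertices to i, or i is among the k most similar vertices to j.
    """
    n = len(profiles)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        sims = [
            (cosine_similarity(profiles[i], profiles[j]), j)
            for j in range(n)
            if j != i
        ]
        # Highest similarity first; ties broken by vertex index.
        sims.sort(key=lambda t: (-t[0], t[1]))
        for weight, j in sims[:k]:
            graph[i][j] = weight
            graph[j][i] = weight
    return graph


# Hypothetical toy profiles: two pairs of near-parallel vectors.
profiles = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
sparse_graph = knn_profile_graph(profiles, k=1)
dense_graph = knn_profile_graph(profiles, k=len(profiles) - 1)
```

With k = 1 the toy graph contains only the two within-pair edges, whereas k = n − 1 yields the complete graph, which mirrors the observation that larger values of k bring the k-NN profile graph closer to the fully connected one.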
As in Section A.1, the results were similar to those for k = 10 (Figs. 3 and 4). In particular, as the value of k increases, the results become similar to those for the fully connected (complete) profile graph with KL similarity, especially in the Pascal dataset (in Fig. 3). On the other hand, in the IV’04 dataset, the results with k = 80 are similar to the results with k = 10. Since the number of users (nodes) is much larger in the IV’04 dataset, a larger value of k would be necessary to observe results similar to those with the fully connected (complete) profile graph.
Yoshida, T. Toward finding hidden communities based on user profile. J Intell Inf Syst 40, 189–209 (2013). https://doi.org/10.1007/s10844-011-0175-2