Toward finding hidden communities based on user profile

Journal of Intelligent Information Systems

Abstract

We consider the community detection problem from a partially observable network structure where some edges are not observable. Previous community detection methods are often based solely on the observed connectivity relation and the above situation is not explicitly considered. Even when the connectivity relation is partially observable, if some profile data about the vertices in the network is available, it can be exploited as auxiliary or additional information. We propose to utilize a graph structure (called a profile graph) which is constructed via the profile data, and propose a simple model to utilize both the observed connectivity relation and the profile graph. Furthermore, instead of a hierarchical approach, based on the modularity matrix of the network structure, we propose an embedding approach which utilizes the regularization via the profile graph. Various experiments are conducted over two social network datasets and comparison with several state-of-the-art methods is reported. The results are encouraging and indicate that it is promising to pursue this line of research.
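As a rough illustration of the modularity-matrix embedding mentioned above, the following sketch computes the standard modularity matrix of Newman (2006), B = A − d dᵀ / (2m), and embeds the vertices with its leading eigenvectors. It is a minimal sketch under stated assumptions, not the method proposed in the paper: the regularization via the profile graph is omitted because its form is not given in this excerpt, and the function names and the toy graph are hypothetical.

```python
import numpy as np

def modularity_matrix(adjacency):
    """Standard modularity matrix B = A - d d^T / (2m) of an undirected graph."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()  # sum of degrees equals 2m for a symmetric A
    return A - np.outer(degrees, degrees) / two_m

def spectral_embedding(B, dim):
    """Embed vertices with the eigenvectors of B that have the largest eigenvalues."""
    eigenvalues, eigenvectors = np.linalg.eigh(B)  # eigenvalues in ascending order
    top = np.argsort(eigenvalues)[::-1][:dim]
    return eigenvectors[:, top]

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

B = modularity_matrix(A)
embedding = spectral_embedding(B, dim=1)

# Split the vertices by the sign of the leading eigenvector.
communities = (embedding[:, 0] > 0).astype(int)
```

On this toy graph the sign of the leading eigenvector separates the two triangles. Which triangle receives label 0 or 1 depends on the arbitrary sign of the eigenvector.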

Notes

  1. The problem of how to define (or learn) the suitable similarity function for the profile is out of the scope of this paper.

  2. http://analytics.ijs.si/~blazf/pvc/data.html

  3. http://iv.slis.indiana.edu/ref/iv04contest/

  4. http://www.tartarus.org/~martin/PorterStemmer

  5. http://web.media.mit.edu/~hugo/montytagger

  6. k = 10 corresponds to sparse graphs, and larger values of k correspond to denser graphs.

  7. As in Newman (2006), since the eigenvector with eigenvalue 1 is a “trivial” solution, it is not utilized in our method.

  8. By default, the value of k was set to 10. However, since walktrap did not work with k = 10 under KL similarity on the IV’04 dataset, the rightmost figure shows the comparison with k = 20.

References

  • Basu, S., Davidson, I., & Wagstaff, K. (Eds.) (2008). Constrained clustering: Advances in algorithms, theory, and applications. Chapman & Hall/CRC Press.

  • Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373–1396.

  • Chapelle, O., Schölkopf, B., & Zien, A. (Eds.) (2006). Semi-supervised learning. Cambridge: MIT Press.

  • Chung, F. (1997). Spectral graph theory. Providence: American Mathematical Society.

  • Cover, T., & Thomas, J. (2006). Elements of information theory. New York: Wiley.

  • Dhillon, I. S. (1997). A new O(n²) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. PhD thesis, EECS Department, University of California, Berkeley.

  • Kannan, R., Vempala, S., & Vetta, A. (2004). On clusterings: Good, bad, and spectral. Journal of the ACM, 51(3), 497–515.

  • Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.

  • Mika, P. (2007). Social networks and the semantic web. New York: Springer.

  • Newman, M. (2006). Finding community structure using the eigenvectors of matrices. Physical Review E, 76(3), 036104.

  • Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. Journal of Graph Algorithms, 10(2), 191–218.

  • Raghavan, U., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76, 036106.

  • Ristad, E. (1995). A natural law of succession. Technical Report CS-TR-495-95, Princeton University.

  • Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.

  • von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.

  • Watts, D. J. (2004). Six degrees: The science of a connected age. W W Norton & Co Inc.

  • Yoshida, T. (2010a). A graph model for clustering based on mutual information. In Proc. PRICAI-2010 (LNAI 6230) (pp. 339–350).

  • Yoshida, T. (2010b). A graph model for mutual information based clustering. Journal of Intelligent Information Systems. doi:10.1007/s10844-010-0132-5.

  • Yoshida, T. (2010c). Performance evaluation of constraints in graph-based semi-supervised clustering. In Proc. AMT-2010 (LNAI 6335) (pp. 138–149).

Acknowledgements

We express sincere gratitude to the reviewers for their careful reading of the manuscript and for providing valuable suggestions to improve the paper.

Author information

Correspondence to Tetsuya Yoshida.

Appendix A: Results with different values of k in k-NN graphs

A.1 Results with different values of k in other methods

As described in Section 3.2, we conducted experiments with various values of k for the k-nearest neighbor (k-NN) profile graphs on both the Pascal dataset and the IV’04 dataset. Section 3.3 reported the results for k = 10. The results for the other values of k (k = 20, 40, 80) are shown in Fig. 6 for the Pascal dataset and in Fig. 7 for the IV’04 dataset.

Fig. 6 Results of a weighted graph model in the Pascal dataset (first row: k = 20, second row: k = 40, third row: k = 80)

Fig. 7 Results of a weighted graph model in the IV’04 dataset (first row: k = 20, second row: k = 40, third row: k = 80)

As these figures illustrate, the results with the other values of k are similar to the result with k = 10 in Fig. 2. In particular, as the value of k increases, the results approach those for the fully connected (complete) profile graph in Fig. 1. Furthermore, increasing the value of k degraded the performance of LabelPropagation. In addition, LeadingEigenvector showed almost no difference except for α = 1 and r = 1. Thus, these results are consistent with the results and discussion in Section 3.

A.2 Results with different values of k in the proposed method

As in Section A.1, we also evaluated the proposed method with other values of k in the k-NN graph (k = 20, 40, 80). In addition, to define the weights in the profile graphs, we conducted experiments with both cosine similarity and the KL similarity defined in (13). The results are shown in Fig. 8 (Pascal dataset) and Fig. 9 (IV’04 dataset). In these figures, the first row corresponds to cosine similarity and the second row to KL similarity; the left column is for k = 20, the middle for k = 40, and the right for k = 80.
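To make the role of k concrete, the following is a minimal sketch of how a k-NN profile graph weighted by cosine similarity could be built. It rests on assumptions not confirmed by this appendix: profiles are given as numeric feature vectors, and an edge is kept when either endpoint selects the other as one of its k nearest neighbors. The function names and the toy profiles are hypothetical, and the KL similarity of (13) is not sketched because its definition is not reproduced here.

```python
import numpy as np

def cosine_similarity_matrix(profiles):
    """Pairwise cosine similarity between the rows of a profile matrix."""
    X = np.asarray(profiles, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # leave all-zero profiles as zero vectors
    unit = X / norms
    return unit @ unit.T

def knn_profile_graph(similarity, k):
    """Weighted k-NN graph: keep edge (i, j) if j is among the k most
    similar vertices to i, or i is among the k most similar to j."""
    S = np.array(similarity, dtype=float)
    n = S.shape[0]
    k = min(k, n - 1)
    np.fill_diagonal(S, -np.inf)  # a vertex is never its own neighbor
    selected = np.zeros((n, n), dtype=bool)
    for i in range(n):
        selected[i, np.argsort(S[i])[::-1][:k]] = True
    selected |= selected.T  # symmetrize (assumed: union of selections)
    np.fill_diagonal(S, 0.0)
    return np.where(selected, S, 0.0)

# Hypothetical toy profiles: users 0 and 1 are alike, users 2 and 3 are alike.
profiles = [[1.0, 0.0, 0.0],
            [0.9, 0.1, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.9, 0.1]]

S = cosine_similarity_matrix(profiles)
W = knn_profile_graph(S, k=1)
```

With k = 1 only the edges between users 0 and 1 and between users 2 and 3 survive. Setting k = n − 1 selects every pair, so the result is the fully connected profile graph weighted by the same similarities.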

Fig. 8 Results of the proposed method in the Pascal dataset (first row: cosine similarity, second row: KL similarity) (left column: k = 20, middle: k = 40, right: k = 80)

Fig. 9 Results of the proposed method in the IV’04 dataset (first row: cosine similarity, second row: KL similarity) (left column: k = 20, middle: k = 40, right: k = 80)

As with the results in Section A.1, the results are similar to those for k = 10 (Figs. 3 and 4). In particular, as the value of k increases, the results become similar to those for the fully connected (complete) profile graph with KL similarity, especially in the Pascal dataset (Fig. 3). On the other hand, in the IV’04 dataset, the results with k = 80 are similar to the results with k = 10. Since the number of users (nodes) is much larger in the IV’04 dataset, a larger value of k would be necessary to observe results similar to those for the fully connected (complete) profile graph.
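The dependence on the number of nodes can be made explicit with a short bound. It is a sketch under one assumption that this appendix does not state: the k-NN profile graph is symmetrized by keeping an edge whenever either endpoint selects the other. Here n denotes the number of vertices and |E| the number of undirected edges.

```latex
% n: number of vertices, k: neighbors selected per vertex (k <= n - 1)
% |E|: undirected edges after symmetrization (assumed: union of selections)
\frac{nk}{2} \;\le\; |E| \;\le\; nk
\quad\Longrightarrow\quad
\frac{k}{n-1} \;\le\; \frac{|E|}{\binom{n}{2}} \;\le\; \min\!\left(1,\ \frac{2k}{n-1}\right)
```

For a fixed k the edge density therefore shrinks roughly as 1/n, so a network with more nodes needs a proportionally larger k before its k-NN profile graph approaches the complete graph.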

Yoshida, T. Toward finding hidden communities based on user profile. J Intell Inf Syst 40, 189–209 (2013). https://doi.org/10.1007/s10844-011-0175-2

  • DOI: https://doi.org/10.1007/s10844-011-0175-2
