Toward finding hidden communities based on user profile

Yoshida, Tetsuya

doi:10.1007/s10844-011-0175-2


Published: 01 September 2011

Volume 40, pages 189–209, (2013)

Journal of Intelligent Information Systems

Abstract

We consider the community detection problem from a partially observable network structure where some edges are not observable. Previous community detection methods are often based solely on the observed connectivity relation and the above situation is not explicitly considered. Even when the connectivity relation is partially observable, if some profile data about the vertices in the network is available, it can be exploited as auxiliary or additional information. We propose to utilize a graph structure (called a profile graph) which is constructed via the profile data, and propose a simple model to utilize both the observed connectivity relation and the profile graph. Furthermore, instead of a hierarchical approach, based on the modularity matrix of the network structure, we propose an embedding approach which utilizes the regularization via the profile graph. Various experiments are conducted over two social network datasets and comparison with several state-of-the-art methods is reported. The results are encouraging and indicate that it is promising to pursue this line of research.
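As a rough illustration of the modularity-matrix ingredient mentioned in the abstract, the following minimal sketch builds the standard modularity matrix B = A - d dᵀ / (2m) of Newman (2006) for a toy graph and embeds the vertices with its leading eigenvector. It does not implement the proposed regularization via the profile graph, whose formulation is not given in this excerpt. The function names and the toy graph are illustrative assumptions.

```python
import numpy as np

def modularity_matrix(adjacency):
    """Standard modularity matrix B = A - d d^T / (2m) of an undirected graph."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    return A - np.outer(degrees, degrees) / two_m

def spectral_embedding(B, dim):
    """Embed vertices with the eigenvectors of B that have the largest eigenvalues."""
    eigenvalues, eigenvectors = np.linalg.eigh(B)  # B is symmetric
    order = np.argsort(eigenvalues)[::-1][:dim]
    return eigenvectors[:, order]

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

B = modularity_matrix(A)
embedding = spectral_embedding(B, dim=1)
# Split the vertices by the sign of the leading eigenvector.
labels = (embedding[:, 0] > 0).astype(int)
```

On this toy graph the sign of the leading eigenvector separates the two triangles; which triangle receives label 0 depends on the arbitrary sign of the eigenvector.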

Notes

The problem of how to define (or learn) the suitable similarity function for the profile is out of the scope of this paper.
http://analytics.ijs.si/~blazf/pvc/data.html
http://iv.slis.indiana.edu/ref/iv04contest/
http://www.tartarus.org/~martin/PorterStemmer
http://web.media.mit.edu/~hugo/montytagger
k = 10 corresponds to sparse graphs, and larger values of k correspond to denser graphs.
As in Newman (2006), since the eigenvector with eigenvalue 1 is a “trivial” solution, it is not utilized in our method.
The value of k was set to 10 by default. However, since walktrap did not work with k = 10 under KL similarity on the IV’04 dataset, the rightmost figure shows the comparison with k = 20.

References

Basu, S., Davidson, I., & Wagstaff, K. (Eds.) (2008). Constrained clustering: Advances in algorithms, theory, and applications. Chapman & Hall/CRC Press.
Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373–1396.
Chapelle, O., Schölkopf, B., & Zien, A. (Eds.) (2006). Semi-supervised learning. Cambridge: MIT Press.
Chung, F. (1997). Spectral graph theory. Providence: American Mathematical Society.
Cover, T., & Thomas, J. (2006). Elements of information theory. New York: Wiley.
Dhillon, I. S. (1997). A new O(n²) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. PhD thesis, EECS Department, University of California, Berkeley.
Kannan, R., Vempala, S., & Vetta, A. (2004). On clusterings: Good, bad, and spectral. Journal of the ACM, 51(3), 497–515.
Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
Mika, P. (2007). Social networks and the semantic web. New York: Springer.
Newman, M. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3), 036104.
Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10(2), 191–218.
Raghavan, U., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76, 036106.
Ristad, E. (1995). A natural law of succession. Technical Report CS-TR-495-95, Princeton University.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.
Watts, D. J. (2004). Six degrees: The science of a connected age. W W Norton & Co Inc.
Yoshida, T. (2010a). A graph model for clustering based on mutual information. In Proc. PRICAI-2010 (LNAI 6230) (pp. 339–350).
Yoshida, T. (2010b). A graph model for mutual information based clustering. Journal of Intelligent Information Systems. doi:10.1007/s10844-010-0132-5.
Yoshida, T. (2010c). Performance evaluation of constraints in graph-based semi-supervised clustering. In Proc. AMT-2010 (LNAI 6335) (pp. 138–149).


Acknowledgements

We express sincere gratitude to the reviewers for their careful reading of the manuscript and for providing valuable suggestions to improve the paper.

Author information

Authors and Affiliations

Graduate School of Information Science and Technology, Hokkaido University, N-14 W-9, Sapporo, 060-0814, Japan
Tetsuya Yoshida

Authors

Tetsuya Yoshida

Corresponding author

Correspondence to Tetsuya Yoshida.

Appendix A: Results with different values of k in k-NN graphs

A.1 Results with different values of k in other methods

As described in Section 3.2, we conducted experiments with various values of k for the k-nearest neighbor (k-NN) profile graphs on both the Pascal dataset and the IV’04 dataset. Section 3.3 reported the results for k = 10. The results for the other values of k (k = 20, 40, 80) are shown in Figs. 6 and 7: Fig. 6 is for the Pascal dataset, and Fig. 7 is for the IV’04 dataset.

As illustrated in these figures, the results with the other values of k are similar to the results with k = 10 in Fig. 2. In particular, as the value of k increases, the results approach those for the fully connected (complete) profile graph in Fig. 1. Furthermore, increasing the value of k deteriorated the performance of LabelPropagation, whereas LeadingEigenvector showed almost no difference except for α = 1 and r = 1. Thus, these results are consistent with the results and discussion in Section 3.

A.2 Results with different values of k in the proposed method

As in Section A.1, we also evaluated the proposed method with other values of k in the k-NN graph (k = 20, 40, 80). In addition, to define the weights in the profile graphs, we conducted experiments with both cosine similarity and the KL similarity defined in (13). The results are shown in Fig. 8 (Pascal dataset) and Fig. 9 (IV’04 dataset). In these figures, the first row corresponds to cosine similarity and the second row to KL similarity; the left column is for k = 20, the middle for k = 40, and the right for k = 80.
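For readers unfamiliar with k-NN profile graphs, the following minimal sketch shows one common way to build such a graph with cosine-similarity weights. It is an illustration under stated assumptions, not the construction used in the experiments: the profile vectors are hypothetical term-frequency vectors, the function names are invented here, and the symmetrization rule (keep an edge if either endpoint selects the other) is an assumption. The KL similarity of (13) is not reproduced because its definition is not part of this excerpt.

```python
import numpy as np

def cosine_similarity_matrix(profiles):
    """Pairwise cosine similarity between the rows of a profile matrix."""
    X = np.asarray(profiles, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty profiles
    unit = X / norms
    return unit @ unit.T

def knn_profile_graph(profiles, k):
    """Weighted adjacency matrix of a symmetrized k-NN graph (assumed rule: union)."""
    S = cosine_similarity_matrix(profiles)
    n = S.shape[0]
    k = min(k, n - 1)             # a vertex has at most n - 1 neighbors
    np.fill_diagonal(S, -np.inf)  # never select a vertex as its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(S[i])[::-1][:k]  # indices of the k most similar vertices
        W[i, neighbors] = S[i, neighbors]
    return np.maximum(W, W.T)     # keep an edge if either endpoint selected it

# Hypothetical profiles: users 0 and 1 share one topic, users 2 and 3 another.
profiles = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 0]]
W = knn_profile_graph(profiles, k=1)
```

With k = 1 each user is linked only to its most similar user, so the toy graph contains the two edges (0, 1) and (2, 3); larger k adds progressively weaker edges.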

As in Section A.1, the results were similar to those for k = 10 (Figs. 3 and 4). In particular, as the value of k increases, the results approach those for the fully connected (complete) profile graph with KL similarity, most clearly on the Pascal dataset (Fig. 3). On the other hand, on the IV’04 dataset, the results with k = 80 are similar to the results with k = 10. Since the number of users (nodes) is much larger in the IV’04 dataset, a larger value of k would be necessary to observe results similar to those with the fully connected (complete) profile graph.
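The dependence on the number of nodes can be made concrete with a small calculation. The sketch below bounds the edge density of a symmetrized k-NN graph on n vertices, assuming that an edge is kept when either endpoint selects the other. The function name and the vertex counts used are hypothetical; the actual dataset sizes are not stated in this excerpt.

```python
def knn_density_bounds(n, k):
    """Lower and upper bounds on the edge density of a symmetrized k-NN graph.

    Each of the n vertices selects k neighbors. If every selection is mutual
    the graph has n*k/2 edges; if none is mutual it has n*k edges, capped by
    the n*(n-1)/2 edges of the complete graph.
    """
    k = min(k, n - 1)
    max_edges = n * (n - 1) / 2
    lower = (n * k / 2) / max_edges
    upper = min(1.0, (n * k) / max_edges)
    return lower, upper

# Hypothetical sizes: the same k covers a smaller fraction of a larger graph.
small = knn_density_bounds(101, 80)
large = knn_density_bounds(1001, 80)
```

For a fixed k the bounds shrink roughly as k / (n - 1), so a graph with ten times as many nodes needs roughly ten times larger k to reach the same density, which is consistent with the observation above.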

About this article

Yoshida, T. Toward finding hidden communities based on user profile. J Intell Inf Syst 40, 189–209 (2013). https://doi.org/10.1007/s10844-011-0175-2

Received: 26 January 2011
Revised: 11 May 2011
Accepted: 11 August 2011
Published: 01 September 2011
Issue Date: April 2013
DOI: https://doi.org/10.1007/s10844-011-0175-2
