Semi-supervised Clustering on Heterogeneous Information Networks

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8444)

Abstract

Semi-supervised clustering on information networks combines labeled and unlabeled data to improve clustering performance. However, existing semi-supervised clustering methods are all designed for homogeneous networks and do not handle heterogeneous ones. In this work, we propose a semi-supervised clustering approach for heterogeneous information networks, which contain multi-typed objects and links and may therefore carry more useful semantic information. The major challenge in this clustering task is how to handle the multiple relations and diverse semantic meanings in heterogeneous networks. To address this challenge, we introduce the concept of a relation-path to measure the similarity between two data objects of the same type. We then use the labeled information to learn a weight for each relation-path. Finally, we propose SemiRPClus, a complete framework for semi-supervised learning in heterogeneous networks. Experimental results demonstrate clear advantages in effectiveness and efficiency of our framework over the baseline and several state-of-the-art approaches.
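
The abstract does not spell out how relation-path similarities are computed or weighted, so the following is only a minimal sketch of the general idea under stated assumptions, not the SemiRPClus procedure itself. It assumes a toy bibliographic network (authors, papers, venues), two symmetric relation-paths (author-paper-author and author-paper-venue-paper-author), a normalized path-count similarity, and weights fitted by a plain logistic regression on labeled pairs. All data, function names, and parameter values are hypothetical.

```python
import math

# Toy heterogeneous network (hypothetical data): authors, papers, venues.
AUTHOR_PAPER = [  # rows: authors a0..a3, columns: papers p0..p3
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
PAPER_VENUE = [  # rows: papers p0..p3, columns: venues v0..v1
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def path_similarity(half):
    """Similarity along a symmetric relation-path whose first half is `half`.

    counts[i][j] is the number of path instances between objects i and j.
    Assumed normalization: 2 * c_ij / (c_ii + c_jj).
    """
    counts = matmul(half, transpose(half))
    n = len(counts)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = counts[i][i] + counts[j][j]
            sim[i][j] = 2 * counts[i][j] / denom if denom else 0.0
    return sim


def fit_path_weights(sims, labeled, steps=500, lr=0.5):
    """Fit one weight per relation-path with logistic regression (an assumption).

    Each labeled pair is a training example: the features are its per-path
    similarities, the target is 1 if both objects share a label, else 0.
    """
    weights, bias = [0.0] * len(sims), 0.0
    pairs = [(i, j) for i in labeled for j in labeled if i < j]
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(sims), 0.0
        for i, j in pairs:
            x = [s[i][j] for s in sims]
            y = 1.0 if labeled[i] == labeled[j] else 0.0
            z = bias + sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for k, v in enumerate(x):
                grad_w[k] += (p - y) * v
            grad_b += p - y
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(pairs)
    return weights, bias


sim_apa = path_similarity(AUTHOR_PAPER)                         # author-paper-author
sim_apvpa = path_similarity(matmul(AUTHOR_PAPER, PAPER_VENUE))  # author-paper-venue-paper-author
sims = [sim_apa, sim_apvpa]

labeled = {0: 0, 1: 0, 2: 1}  # hypothetical labels: author index -> cluster id
weights, bias = fit_path_weights(sims, labeled)

# Weighted combination of the per-path similarities for all author pairs.
n = len(sim_apa)
combined = [
    [sum(w * s[i][j] for w, s in zip(weights, sims)) for j in range(n)]
    for i in range(n)
]
```

The resulting `combined` matrix could then be passed to any clustering method that accepts pairwise similarities. Which clustering step the framework actually uses is not stated in the abstract.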

Keywords

  • Heterogeneous information network
  • Semi-supervised clustering

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Luo, C., Pang, W., Wang, Z. (2014). Semi-supervised Clustering on Heterogeneous Information Networks. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_45

  • DOI: https://doi.org/10.1007/978-3-319-06605-9_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06604-2

  • Online ISBN: 978-3-319-06605-9

  • eBook Packages: Computer Science, Computer Science (R0)