Coupled Semi-supervised Clustering: Exploring Attribute Correlations in Heterogeneous Information Networks

  • Conference paper
Web and Big Data (APWeb-WAIM 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11641)

Abstract

Heterogeneous Information Networks (HINs) have been widely adopted in various tasks because they model complex network data well. To handle the additional attributes of nodes in an HIN, the Attributed Heterogeneous Information Network (AHIN) was proposed. Clustering on HINs has recently become a hot topic, since it is useful in many applications. Although existing semi-supervised clustering methods for HINs have improved performance to some extent, these models seldom consider the correlations among attributes that typically exist in real applications. To tackle this issue, we propose a novel model, SCAN, for semi-supervised clustering in AHINs. Our model captures the coupling relations between mixed types of node attributes and therefore obtains better attribute similarity. Moreover, we propose a flexible constraint method that leverages supervised information and network information, so the model can adapt to different datasets and clustering objectives. Extensive experiments show that our model outperforms state-of-the-art algorithms.

Notes

  1. http://www.yelp.com/academic_dataset.

  2. https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset.

  3. https://www.mpaa.org/film-ratings/.

Acknowledgement

This work is supported by the National Key Research and Development Program of China (2017YFB0803304), the National Natural Science Foundation of China (No. 61772082, 61806020, 61702296), the Beijing Municipal Natural Science Foundation (4182043), the CCF-Tencent Open Fund, and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Chuan Shi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, J., Xiao, D., Hu, L., Shi, C. (2019). Coupled Semi-supervised Clustering: Exploring Attribute Correlations in Heterogeneous Information Networks. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science, vol 11641. Springer, Cham. https://doi.org/10.1007/978-3-030-26072-9_7

  • DOI: https://doi.org/10.1007/978-3-030-26072-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26071-2

  • Online ISBN: 978-3-030-26072-9

  • eBook Packages: Computer Science, Computer Science (R0)
