Clustering for heterogeneous information networks with extended star-structure

  • Jian-Ping Mei
  • Huajiang Lv
  • Lianghuai Yang
  • Yanjun Li
Article

Abstract

Clustering objects in a heterogeneous information network, where objects of different types are linked to each other, is an important problem in heterogeneous information network analysis. Several existing clustering approaches deal with star-structured information networks, which contain different central-attribute relations. In real applications, homogeneous links between central objects may also be available and useful for clustering. In this paper, we propose a new approach called CluEstar for clustering networks with an extended star-structure (E-Star), which extends the classic star-structure by further including the central–central relation, i.e., links between objects of the central type. In CluEstar, every object has a ranking with respect to each cluster that reflects its within-cluster representativeness and determines the clusters of the objects it is linked to. A novel objective function is proposed for clustering E-Star networks by formulating both central-attribute and central–central links in an efficient way. Results of extensive experimental studies with benchmark data sets show that the proposed approach compares favorably with existing ones, clustering E-Star networks with high quality and good efficiency.
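To make the E-Star setting concrete, the following is a minimal Python sketch of one way a ranking-based scheme over central-attribute and central–central links could look. It is not the CluEstar objective function or update rule, which the abstract does not specify. The function name `cluster_estar`, the weight `alpha`, the simple normalization, and the toy data are all assumptions made for illustration.

```python
def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def cluster_estar(relations, links, init, n_iter=20, alpha=1.0):
    """Toy ranking-based clustering of an E-Star network (illustrative only).

    relations: list of central-attribute matrices, one per attribute type,
               each n_central x n_attribute (lists of lists).
    links:     n_central x n_central matrix of central-central links.
    init:      initial soft memberships of central objects, n_central x k.
    alpha:     hypothetical weight on the central-central links.
    """
    n, k = len(links), len(init[0])
    u = [normalize(row) for row in init]
    for _ in range(n_iter):
        # Rank attribute objects in each cluster by the memberships
        # of the central objects that link to them.
        attr_rank = [
            [normalize([sum(u[i][c] * rel[i][j] for i in range(n))
                        for j in range(len(rel[0]))])
             for c in range(k)]
            for rel in relations
        ]
        # Rank central objects in each cluster by their own membership.
        central_rank = [normalize([u[i][c] for i in range(n)])
                        for c in range(k)]
        # Re-score each central object from the rankings of its neighbours.
        new_u = []
        for i in range(n):
            scores = []
            for c in range(k):
                s = sum(rel[i][j] * rank[c][j]
                        for rel, rank in zip(relations, attr_rank)
                        for j in range(len(rel[i])))
                s += alpha * sum(links[i][j] * central_rank[c][j]
                                 for j in range(n))
                scores.append(s)
            new_u.append(normalize(scores))
        u = new_u
    # Hard assignment: the cluster with the largest membership.
    return [max(range(k), key=lambda c: row[c]) for row in u]


# Toy network: 4 central objects, one attribute type with 4 objects.
relations = [[[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]]
links = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
init = [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6], [0.5, 0.5]]
print(cluster_estar(relations, links, init))  # -> [0, 0, 1, 1]
```

In this toy run, the two central objects that share attributes and a direct link end up in the same cluster. Setting `alpha=0` reduces the sketch to a plain star-structured network with no central–central links.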

Keywords

Clustering; Heterogeneous information network; Multi-type relational data

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61502420 and 61772472) and the Zhejiang Provincial Natural Science Foundation (Grant Nos. LY16F020032 and LY17F020020).

Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
