Similarity enhancement of heterogeneous networks by weighted incorporation of information

  • Regular Paper

Knowledge and Information Systems

Abstract

In many real-world datasets, different aspects of information are combined, so the data are usually represented as heterogeneous graphs whose nodes and edges have different types. Representation learning on heterogeneous networks is an important topic, because embedding methods can extract meaningful details from such networks. In this paper, we introduce a new framework for embedding heterogeneous graphs. Our model relies on weighted heterogeneous networks with star structures that take into account structural and attributive similarity as well as semantic knowledge. The target nodes form the center of the star, and the different attributes of the target nodes form its points. The edge weights are computed from three components: natural language processing of texts, the relationships between different attributes of the dataset, and the co-occurrence of each attribute pair in the target nodes. We strengthen the similarities between target nodes by examining latent connections between attribute nodes, which we find by considering approximate shortest paths between attributes. By transferring the influence of the star's outer components onto the central component, the heterogeneous network is reduced to a homogeneous graph with enhanced similarities. This homogeneous graph can then be embedded to capture similar target nodes. We evaluate our framework on the clustering task and show that our method is more accurate than previous unsupervised algorithms on real-world datasets.
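The abstract does not give the exact formulas, so the following is only a minimal sketch under two stated assumptions. First, the total weight \(w\) is taken to be a convex combination of the relational, textual and joint-presence weights with coefficients \(\alpha _\mathcal {R}, \alpha _\mathcal {T}, \alpha _\mathcal {J}\) (these symbols appear in the notation list, but the combination rule is our assumption). Second, the star graph is collapsed onto its target nodes with a simple two-hop projection through shared attribute nodes. The function names `total_weight` and `project_star` and the toy node names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def total_weight(w_r, w_t, w_j, alpha=(1 / 3, 1 / 3, 1 / 3)):
    """Combine relational, textual and joint-presence weights.

    Assumption: a convex combination with coefficients (alpha_R, alpha_T, alpha_J);
    the notation names the coefficients but not the exact rule.
    """
    a_r, a_t, a_j = alpha
    return a_r * w_r + a_t * w_t + a_j * w_j


def project_star(star_edges):
    """Collapse a star graph onto its target (central) nodes.

    star_edges maps (target, attribute) -> edge weight. Two targets are linked
    when they share an attribute node; the link weight sums, over the shared
    attributes, the product of the two edge weights (an assumed two-hop rule).
    """
    members_by_attr = defaultdict(list)
    for (target, attr), w in star_edges.items():
        members_by_attr[attr].append((target, w))

    similarity = defaultdict(float)
    for members in members_by_attr.values():
        # sorted() gives each target pair a canonical (u, v) order
        for (u, wu), (v, wv) in combinations(sorted(members), 2):
            similarity[(u, v)] += wu * wv
    return dict(similarity)


# Toy star graph: movies as targets, an actor and a genre as attributes (hypothetical data)
edges = {
    ("m1", "actor:A"): 1.0,
    ("m2", "actor:A"): 0.5,
    ("m2", "genre:drama"): 1.0,
    ("m3", "genre:drama"): 1.0,
}
print(project_star(edges))  # {('m1', 'm2'): 0.5, ('m2', 'm3'): 1.0}
```

In the toy example, m1 and m3 share no attribute and stay unlinked. The latent-connection step described in the abstract is what would relate them, if actor:A and genre:drama turn out to be close in the attribute graph.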


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Algorithm 1
Algorithm 2
Fig. 6
Fig. 7

Notes

  1. https://www.imdb.com/.

  2. https://dblp.uni-trier.de/.

  3. developer.twitter.com.

Abbreviations

  • \(\mathcal {N}\): Target set
  • \(\mathcal {A}_i\): Information set
  • \(\mathcal {M}\): Main attribute set
  • \(\mathcal {R}\): Relational attribute set
  • \(\mathcal {T}\): Textual attribute set
  • \(C\): Clustered set
  • \(t_j\): Text object
  • \(\textbf{t}_j\): Word vector of \(t_j\)
  • \(\mathbf {t_j^e}\): Embedded vector of \(t_j\)
  • \(\overrightarrow{\textsf {BERT}}(.)\): BERT embedding function
  • \(\textsf {TF}(.)\): Rank weighted density function
  • \(m_j\): Number of elements of \(t_j\)
  • \(\textrm{t}_i^j\): \(i\)-th word of vector \(\textbf{t}_j\)
  • \(\textrm{x}_{ih}^j\): \(h\)-th element of \(\overrightarrow{\textsf {BERT}}(\textrm{t}_i^j)\)
  • \(\mathcal {B}_h^j\): \(h\)-th element of \(\mathbf {t_j^e}\)
  • \(\mathbb {D}\): Feature space size
  • \(\mathbb {f}(.)\): Term frequency in target set
  • \(\mathbb {H}(.)\): Term frequency in feature space
  • \(\mathbb {L}(.)\): Text length
  • \(G =(\mathcal {V}, \mathcal {E}, \mathcal {W})\): Star heterogeneous graph
  • \(G_c=(\mathcal {V}_c, \mathcal {E}_c, \mathcal {W})\): Core graph
  • \(G_s^i=(\mathcal {V}^{i}_s, \mathcal {E}^{i}_s,\mathcal {W})\): \(\mathcal {M}_i\) shell graph
  • \(\overline{G}_{c}=(\mathcal {V}_c, \mathcal {E}_c, \overline{\mathcal {W}})\): Homogeneous core graph
  • \(V_\mathcal {N}\): Vertex of target set
  • \(V_\mathcal {M}\): Vertex of main attribute set
  • \(E_\mathcal {I}\): Internal link set
  • \(E_\mathcal {O}\): External link set
  • \(d_{xy}\): Euclidean distance between \(x\) and \(y\)
  • \(w_\mathcal {R}(.,.)\): Relational weight
  • \(w_\mathcal {T}(.,.)\): Textual weight
  • \(w_\mathcal {J}(.,.)\): Joint presence weight
  • \(w(.,.)\): Total weight
  • \(p(\mathcal {M}_i)\): \(\mathcal {M}_i\)-path
  • \(\rho \langle .,p(\mathcal {M}_i),.\rangle \): \(\mathcal {M}_i\) auxiliary path
  • \(\rho _i\langle .,.\rangle \): \(\mathcal {M}_i\) shortest path
  • \(\mathcal {R}(G)\): Remapped graph
  • \(\pi (.)\): Remapping function
  • \(W(.), \overline{W}(.)\): Path weight functions
  • \(\mathcal {H}\): Spanner graph
  • \(P_i=\{c_i, \sigma _i, \kappa _i\}\): Truncation parameters
  • \(\alpha =\{\alpha _\mathcal {R}, \alpha _\mathcal {T}, \alpha _\mathcal {J}\}\): Weighting coefficients
  • \(\beta _i\): Main attribute impact factor
  • \(\theta _i\): Scaling parameter
  • \(\mu _i\): Spanner parameter
  • \(\mathbb {N}\): Normalization operator
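The notation lists a spanner graph \(\mathcal {H}\) with spanner parameter \(\mu _i\), and the abstract ties the latent connections to approximate shortest paths between attribute nodes. This preview does not describe how \(\mathcal {H}\) is built, so as an illustration we use the classic greedy t-spanner: edges are scanned in non-decreasing length order, and an edge is kept only if the spanner's current distance between its endpoints exceeds t times the edge length. Every distance in the result is then at most t times the true shortest-path distance. Treating t as the role of \(\mu _i\) is an assumption, and `greedy_spanner` and `dijkstra` are hypothetical helper names.

```python
import heapq
from math import inf


def dijkstra(adj, source, target, cutoff=inf):
    """Shortest-path distance from source to target in adjacency dict adj.

    Returns a value greater than cutoff (possibly inf) when target is
    unreachable or farther than cutoff.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > cutoff:
            break  # every remaining node, target included, is farther than cutoff
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return inf


def greedy_spanner(nodes, edges, t):
    """Greedy t-spanner over (u, v, length) edges with positive lengths.

    Lengths are distances (smaller = closer); similarity weights would first
    need converting, e.g. to 1 / w (an assumption).
    """
    adj = {n: {} for n in nodes}
    for u, v, length in sorted(edges, key=lambda e: e[2]):
        # keep the edge only if the spanner cannot already connect u and v
        # within t times its length
        if dijkstra(adj, u, v, cutoff=t * length) > t * length:
            adj[u][v] = length
            adj[v][u] = length
    return adj


attrs = ["a", "b", "c"]
attr_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.5)]
h = greedy_spanner(attrs, attr_edges, t=2.0)
print(h)                      # {'a': {'b': 1.0}, 'b': {'a': 1.0, 'c': 1.0}, 'c': {'b': 1.0}}
print(dijkstra(h, "a", "c"))  # 2.0 (true distance 1.5; stretch bound 2 * 1.5 = 3.0)
```

With t = 2 the direct a–c edge is dropped because the path through b is already short enough; with t = 1 the spanner keeps every edge needed for exact distances.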

Author information

Corresponding author

Correspondence to Fatemeh Baharifard.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Baharifard, F., Motaghed, V. Similarity enhancement of heterogeneous networks by weighted incorporation of information. Knowl Inf Syst 66, 3133–3156 (2024). https://doi.org/10.1007/s10115-023-02050-x

  • DOI: https://doi.org/10.1007/s10115-023-02050-x
