Similarity enhancement of heterogeneous networks by weighted incorporation of information

Baharifard, Fatemeh; Motaghed, Vahid

doi:10.1007/s10115-023-02050-x


Regular Paper
Published: 27 January 2024

Volume 66, pages 3133–3156 (2024)

Knowledge and Information Systems

Fatemeh Baharifard &
Vahid Motaghed


Abstract

In many real-world datasets, different aspects of information are combined, so the data is usually represented as heterogeneous graphs whose nodes and edges have different types. Representation learning on heterogeneous networks is an important topic, because embedding methods can extract useful details from such networks. In this paper, we introduce a new framework for embedding heterogeneous graphs. Our model relies on weighted heterogeneous networks with star structures that take into account structural and attributive similarity as well as semantic knowledge. The target nodes form the center of the star, and the different attributes of the target nodes form its points. The edge weights are calculated from three aspects: natural language processing of texts, the relationships between different attributes of the dataset, and the co-occurrence of each attribute pair in target nodes. We strengthen the similarities between target nodes by examining the latent connections between attribute nodes, which we find by considering approximate shortest paths between the attributes. By applying the side effect of the star components to the central component, the heterogeneous network is reduced to a homogeneous graph with enhanced similarities. This homogeneous graph can then be embedded to capture similar target nodes. We evaluate our framework on the clustering task and show that our method is more accurate than previous unsupervised algorithms on real-world datasets.
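The final steps described in the abstract (finding latent attribute connections through approximate shortest paths, then reducing the star to a homogeneous graph over target nodes) can be illustrated with a minimal Python sketch. It is not the authors' procedure. Approximate distances here come from the classic greedy t-spanner of Althöfer et al., chosen as one standard technique: scan edges by increasing length and keep an edge only if the spanner built so far has no path within t times its length. The similarity rule w(a,x) * w(b,y) / (1 + d(x,y)) and all names (`greedy_spanner`, `collapse_star`, `t`) are hypothetical.

```python
import heapq
import math


def dijkstra(adj, source):
    """Shortest-path distances from `source` in an undirected weighted adjacency dict."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def greedy_spanner(edges, t):
    """Greedy t-spanner: scan (length, u, v) edges by increasing length and keep an
    edge only if the spanner built so far has no u-v path of length <= t * length."""
    adj = {}
    for length, u, v in sorted(edges):
        if dijkstra(adj, u).get(v, math.inf) > t * length:
            adj.setdefault(u, {})[v] = length
            adj.setdefault(v, {})[u] = length
    return adj


def collapse_star(target_attrs, attr_edges, t=2.0):
    """Hypothetical reduction of a star graph to pairwise target similarities.

    target_attrs: {target: {attribute: edge weight}} (the star's spokes).
    attr_edges:   [(length, attr_a, attr_b)] latent attribute connections.
    Each attribute pair (x, y) adds w(a, x) * w(b, y) / (1 + d(x, y)), where d is the
    spanner distance: at least the true shortest path and at most t times it."""
    spanner = greedy_spanner(attr_edges, t)
    cache = {}

    def d(x, y):
        if x == y:
            return 0.0
        if x not in cache:
            cache[x] = dijkstra(spanner, x)
        return cache[x].get(y, math.inf)  # unreachable -> contribution becomes 0

    sim = {}
    targets = sorted(target_attrs)
    for i, a in enumerate(targets):
        for b in targets[i + 1:]:
            s = sum(wa * wb / (1.0 + d(x, y))
                    for x, wa in target_attrs[a].items()
                    for y, wb in target_attrs[b].items())
            if s > 0:
                sim[(a, b)] = s  # edge weight of the homogeneous target graph
    return sim


if __name__ == "__main__":
    edges = [(1.0, "x", "y"), (1.0, "y", "z"), (3.0, "x", "z")]
    stars = {"n1": {"x": 1.0}, "n2": {"y": 1.0}, "n3": {"z": 1.0}}
    print(collapse_star(stars, edges))
```

With attribute edges x-y and y-z of length 1 and x-z of length 3, the spanner with t = 2 drops x-z, and targets attached only to x and z still become similar through the two-hop path. Running Dijkstra once per candidate edge keeps the sketch short but scales poorly; larger graphs would need a faster spanner construction.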


Notes

Abbreviations

\(\mathcal {N}\) :: Target set
\(\mathcal {A}_i\) :: Information set
\(\mathcal {M}\) :: Main attribute set
\(\mathcal {R}\) :: Relational attribute set
\(\mathcal {T}\) :: Textual attribute set
\(C\) :: Clustered set
\(t_j\) :: Text object
\(\textbf{t}_j\) :: Word vector of \(t_j\)
\(\mathbf {t_j^e}\) :: Embedded vector of \(t_j\)
\(\overrightarrow{\textsf {BERT}}(.)\) :: BERT embedding function
\(\textsf {TF}(.)\) :: Rank weighted density function
\(m_j\) :: Number of elements of \(t_j\)
\(\textrm{t}_i^j\) :: i-th word of vector \(\textbf{t}_j\)
\(\textrm{x}_{ih}^j\) :: h-th element of \(\overrightarrow{\textsf {BERT}}(\textrm{t}_i^j)\)
\(\mathcal {B}_h^j\) :: h-th element of \(\mathbf {t_j^e}\)
\(\mathbb {D}\) :: Feature space size
\(\mathbb {f}(.)\) :: Term frequency in target set
\(\mathbb {H}(.)\) :: Term frequency in feature space
\(\mathbb {L}(.)\) :: Text length
\(G =(\mathcal {V}, \mathcal {E}, \mathcal {W})\) :: Star heterogeneous graph
\(G_c=(\mathcal {V}_c, \mathcal {E}_c, \mathcal {W})\) :: Core graph
\(G_s^i=(\mathcal {V}^{i}_s, \mathcal {E}^{i}_s,\mathcal {W})\) :: \(\mathcal {M}_i\) shell graph
\(\overline{G}_{c}=(\mathcal {V}_c, \mathcal {E}_c, \overline{\mathcal {W}})\) :: Homogeneous core graph
\(V_\mathcal {N}\) :: Vertex of target set
\(V_\mathcal {M}\) :: Vertex of main attribute set
\(E_\mathcal {I}\) :: Internal link set
\(E_\mathcal {O}\) :: External link set
\(d_{xy}\) :: Euclidean distance of x, y
\(w_\mathcal {R}(.,.)\) :: Relational weight
\(w_\mathcal {T}(.,.)\) :: Textual weight
\(w_\mathcal {J}(.,.)\) :: Joint presence weight
\(w(.,.)\) :: Total weight
\(p(\mathcal {M}_i)\) :: \(\mathcal {M}_i\)-path
\(\rho \langle .,p({{\mathcal {M}}_i}),.\rangle \) :: \({\mathcal {M}}_i\) auxiliary path
\(\rho _i\langle .,.\rangle \) :: \({\mathcal {M}}_i\) shortest path
\(\mathcal {R}(G)\) :: Remapped graph
\(\pi (.)\) :: Remapping function
\(W(.), \overline{W}(.)\) :: Path weight function
\(\mathcal {H}\) :: Spanner graph
\(P_i=\{c_i, \sigma _i, \kappa _i\}\) :: Truncation parameters
\(\alpha =\{\alpha _\mathcal {R}, \alpha _\mathcal {T}, \alpha _\mathcal {J}\}\) :: Weighting coefficients
\(\beta _i\) :: Main attribute impact factor
\(\theta _i\) :: Scaling parameter
\(\mu _i\) :: Spanner parameter
\(\mathbb {N}\) :: Normalization operator
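The notation defines a textual embedding \(\mathbf {t_j^e}\) whose h-th element \(\mathcal {B}_h^j\) is built from the elements \(\textrm{x}_{ih}^j\) of per-word BERT vectors and the weighting function \(\textsf {TF}(.)\), plus three edge weights \(w_\mathcal {R}\), \(w_\mathcal {T}\), \(w_\mathcal {J}\) combined through coefficients \(\alpha\) into a total weight \(w\). The exact formulas are not shown in this preview, so the Python sketch below only illustrates these ingredients under stated assumptions: the embedding is a weighted mean of per-word vectors (arbitrary weights stand in for TF), the textual weight is cosine similarity, the joint presence weight is the fraction of target nodes containing both attributes, and the total weight is a linear combination. All function names are hypothetical, and the short hand-written vectors replace real BERT outputs.

```python
import math
from itertools import combinations


def text_embedding(word_vectors, word_weights):
    """Weighted mean of per-word vectors: B_h = sum_i w_i * x_ih / sum_i w_i.
    Assumption: `word_weights` stand in for the paper's TF(.) weighting."""
    total = sum(word_weights)
    dim = len(word_vectors[0])
    return [sum(w * vec[h] for w, vec in zip(word_weights, word_vectors)) / total
            for h in range(dim)]


def cosine(u, v):
    """Assumed textual weight w_T: cosine similarity of two embedded texts."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def joint_presence(targets):
    """Assumed joint presence weight w_J: fraction of target nodes whose attribute
    set contains both attributes of the pair (pairs keyed in sorted order)."""
    counts = {}
    for attrs in targets.values():
        for pair in combinations(sorted(attrs), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: c / len(targets) for pair, c in counts.items()}


def total_weight(w_r, w_t, w_j, alpha=(1 / 3, 1 / 3, 1 / 3)):
    """Assumed total weight: w = a_R * w_R + a_T * w_T + a_J * w_J."""
    a_r, a_t, a_j = alpha
    return a_r * w_r + a_t * w_t + a_j * w_j


if __name__ == "__main__":
    e1 = text_embedding([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
    e2 = text_embedding([[1.0, 0.0]], [1.0])
    papers = {"p1": {"KDD", "graph"}, "p2": {"KDD", "graph"}, "p3": {"ICML", "graph"}}
    w_j = joint_presence(papers)[("KDD", "graph")]
    # 0.5 is a made-up relational weight w_R for illustration only
    print(e1, cosine(e1, e2), w_j, total_weight(0.5, cosine(e1, e2), w_j))
```

Keeping the coefficients summing to 1 keeps the total weight on the same scale as its three components when each lies in [0, 1]; whether the paper normalizes this way is not stated here.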


Author information

Fatemeh Baharifard and Vahid Motaghed contributed equally to this work.

Authors and Affiliations

School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
Fatemeh Baharifard & Vahid Motaghed


Corresponding author

Correspondence to Fatemeh Baharifard.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Baharifard, F., Motaghed, V. Similarity enhancement of heterogeneous networks by weighted incorporation of information. Knowl Inf Syst 66, 3133–3156 (2024). https://doi.org/10.1007/s10115-023-02050-x


Received: 16 May 2023
Revised: 29 November 2023
Accepted: 14 December 2023
Published: 27 January 2024
Issue Date: May 2024
DOI: https://doi.org/10.1007/s10115-023-02050-x
