Abstract
Prediction tasks over the nodes and edges of real-world networks, such as node classification and link prediction, depend on learned feature representations. Recent research in representation learning has made significant progress by combining deep neural networks with different network walking methods. However, existing feature learning approaches pay little attention to capturing sufficient global information while also capturing the diversity of connectivity patterns in the network. We propose an algorithmic framework, with two variants called Global2vec and Global2vec-PMI, for learning vector feature representations of nodes in networks. With this network embedding method, we learn a mapping of nodes to a low-dimensional feature space that maximizes the likelihood of preserving both the network neighborhood information and the global statistics of nodes. Both the global and local statistics of the nodes are used to model the loss function. We demonstrate the efficiency of our methods relative to existing state-of-the-art techniques on multi-label classification and link prediction across several real-world networks. For multi-label classification on the BlogCatalog, PPI and Flickr datasets, measured by Micro-F1 and Macro-F1 scores, Global2vec outperforms the compared methods in all cases, and Global2vec-PMI outperforms them in most cases. For link prediction, Global2vec-PMI is generally better with Euclidean and Manhattan distances, while Global2vec performs better with the other distance metrics. Across all distance metrics, the highest area under the curve (AUC) scores are mostly obtained by the proposed global co-occurrence statistics-based methods. In conclusion, our work offers an efficient way to learn vector representations of different network structures.
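To make the idea of global co-occurrence statistics concrete, the following is a minimal sketch of how node co-occurrence counts and pointwise mutual information (PMI) scores could be gathered from random walks. It is not the Global2vec or Global2vec-PMI implementation: the uniform random walks, the window size, the toy graph and all function names (`random_walks`, `cooccurrence_counts`, `pmi`) are assumptions chosen for illustration only.

```python
import math
import random
from collections import defaultdict


def random_walks(adj, num_walks, walk_length, seed=0):
    """Generate uniform random walks starting from every node (assumed walk strategy)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in sorted(adj):
            walk = [start]
            # Stop early only if the current node has no neighbors.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def cooccurrence_counts(walks, window):
    """Count symmetric co-occurrences of node pairs at most `window` steps apart."""
    counts = defaultdict(float)
    for walk in walks:
        for i, u in enumerate(walk):
            for j in range(i + 1, min(i + window + 1, len(walk))):
                v = walk[j]
                counts[(u, v)] += 1.0
                counts[(v, u)] += 1.0
    return counts


def pmi(counts):
    """PMI(u, v) = log( p(u, v) / (p(u) * p(v)) ), estimated from the counts."""
    total = sum(counts.values())
    marginal = defaultdict(float)
    for (u, _), c in counts.items():
        marginal[u] += c
    return {
        (u, v): math.log(c * total / (marginal[u] * marginal[v]))
        for (u, v), c in counts.items()
    }


# Hypothetical toy graph given as adjacency lists.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj, num_walks=10, walk_length=20)
counts = cooccurrence_counts(walks, window=3)
scores = pmi(counts)
```

In a sketch like this, the raw counts play the role of global statistics and the PMI scores are a normalized alternative; either table could then serve as the regression target of an embedding loss, in the spirit of co-occurrence-based word embedding models.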
Data availability
Enquiries about data availability should be directed to the authors.
Acknowledgements
Fan Ye was supported by the Natural Science Foundation of Anhui Province of China (under grant 1908085MF187) and the Key Natural Science Fund of the Department of Education of Anhui Province of China (under grant KJ2018A0011).
Funding
This work was supported by the Natural Science Foundation of Anhui Province of China (under grant 1908085MF187) and the Key Natural Science Fund of the Department of Education of Anhui Province of China (under grant KJ2018A0011).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Ye, F. Co-occurrence statistics-based global and local feature learning for graph networks. Soft Comput 27, 11319–11328 (2023). https://doi.org/10.1007/s00500-023-08665-0
DOI: https://doi.org/10.1007/s00500-023-08665-0