Abstract
Link prediction in co-word networks is a quantitative method widely used to predict the research trends and directions of disciplines, and it has attracted extensive attention from academia and industry. Most existing methods for predicting co-word network links rely only on the topology of the network and ignore the characteristics of its nodes. This paper proposes an approach that exploits the semantic information of network nodes to improve link prediction in co-word networks. Our work involves three major tasks. First, we propose a new semantic feature of network nodes (based on the original technology). Second, we build multiple ground-truth data sets consisting of literature from the Information Science and Library Science, Blockchain, and Primary Health Care fields. Third, to validate the effectiveness of the new feature and prior ones, we carry out extensive prediction experiments on the data sets we constructed. The results show that the new predictive models with semantic information achieve an overall accuracy above 80% and an Area Under Curve (AUC) above 0.7, which indicates the effectiveness and stability of the new feature across different feature sets and algorithm sets.
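As a rough illustration of the idea summarized above, the following minimal sketch pairs one topological score with one semantic score for a candidate keyword pair and evaluates ranked scores with AUC. It is not the authors' implementation: the abstract does not specify the semantic feature, so the Jaccard coefficient, the cosine similarity over per-keyword vectors, the toy keyword lists, and the helper names (`build_coword_graph`, `pair_features`, `auc`) are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def build_coword_graph(docs):
    """Map each keyword to the set of keywords it co-occurs with in any document."""
    graph = {}
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def jaccard(graph, a, b):
    """Topological score: shared neighbors divided by all neighbors of a and b."""
    na, nb = graph.get(a, set()), graph.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def cosine(u, v):
    """Semantic score: cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pair_features(graph, vectors, a, b):
    """Feature tuple (topological, semantic) for one candidate link."""
    return jaccard(graph, a, b), cosine(vectors[a], vectors[b])


def auc(scores, labels):
    """Chance that a random positive pair outscores a random negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy keyword lists standing in for the author keywords of four papers (hypothetical data).
docs = [
    ["blockchain", "bitcoin"],
    ["blockchain", "smart contract"],
    ["bitcoin", "smart contract"],
    ["bitcoin", "mining"],
]
graph = build_coword_graph(docs)

# Hypothetical per-keyword semantic vectors, e.g. topic weights; values are invented.
vectors = {
    "blockchain": [0.7, 0.3],
    "bitcoin": [0.6, 0.4],
    "smart contract": [0.2, 0.8],
    "mining": [0.9, 0.1],
}

topo, sem = pair_features(graph, vectors, "blockchain", "mining")
```

In a supervised setting, tuples such as `(topo, sem)` would be fed to a classifier together with labels indicating whether the link appears in a later time window, and `auc` would be applied to the classifier's scores on held-out pairs.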
Availability of data and materials
The authors confirm that all data and materials support the published claims and comply with field standards. The data of this study are available on request by email: zhouliang_bnu@163.com.
Code availability
The authors confirm that all software applications and custom code support the published claims and comply with field standards.
Change history
09 June 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11192-022-04422-6
Funding
This work was supported by the Fundamental Research Funds for the Central Universities, Sichuan University (Grant No. skbsh201808).
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
The original online version of this article was revised: in the original version, a few references were published incorrectly.
About this article
Cite this article
Xiong, T., Zhou, L., Zhao, Y. et al. Mining semantic information of co-word network to improve link prediction performance. Scientometrics 127, 2981–3004 (2022). https://doi.org/10.1007/s11192-021-04247-9
DOI: https://doi.org/10.1007/s11192-021-04247-9