Mining semantic information of co-word network to improve link prediction performance

Scientometrics

A Correction to this article was published on 09 June 2022

This article has been updated

Abstract

Link prediction in co-word networks is a quantitative method widely used to predict the research trends and directions of disciplines, and it has attracted extensive attention from both academia and industry. Most existing methods for predicting co-word network links rely only on the topology of the network and ignore the characteristics of its nodes. This paper proposes an approach that exploits the semantic information of network nodes to improve link prediction in co-word networks. Our work involves three major tasks. First, a new semantic feature of network nodes (based on the original technology) is proposed. Second, multiple ground-truth data sets are built from literature in the Information Science and Library Science, Blockchain, and Primary Health Care fields. Third, to validate the effectiveness of the new feature and of prior ones, extensive prediction experiments are carried out on the data sets we constructed. The results show that the new predictive models with semantic information achieve an overall accuracy above 80% and an Area Under Curve above 0.7, which indicates that the new feature is effective and stable across different feature sets and algorithm sets.
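The idea of pairing a topological feature with a semantic node feature can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the toy corpus, the `topic_vectors` values, the choice of the Jaccard coefficient as the topological feature, and the cosine similarity of node vectors as the semantic feature are all hypothetical stand-ins, since the abstract does not specify the feature definition. The `auc` helper computes Area Under Curve as the share of positive/negative pairs that are ranked correctly.

```python
from itertools import combinations
from math import sqrt


def build_coword_network(documents):
    """Map each keyword to the set of keywords it co-occurs with."""
    neighbors = {}
    for keywords in documents:
        for kw in keywords:
            neighbors.setdefault(kw, set())
        for a, b in combinations(sorted(set(keywords)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def jaccard(neighbors, a, b):
    """Topological feature: shared neighbors divided by all neighbors."""
    union = neighbors[a] | neighbors[b]
    return len(neighbors[a] & neighbors[b]) / len(union) if union else 0.0


def cosine(u, v):
    """Semantic feature (assumed): cosine similarity of two node vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def auc(scores, labels):
    """Chance that a random true link outscores a random non-link (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy corpus: each inner list holds the keywords of one paper.
documents = [
    ["blockchain", "smart contract", "bitcoin"],
    ["blockchain", "privacy"],
    ["link prediction", "co-word network"],
    ["co-word network", "topic model"],
]

# Hypothetical semantic vectors (for example, topic distributions) per keyword.
topic_vectors = {
    "bitcoin": [0.9, 0.1],
    "privacy": [0.8, 0.2],
    "link prediction": [0.2, 0.8],
    "topic model": [0.1, 0.9],
}

network = build_coword_network(documents)


def pair_features(a, b):
    """Feature vector for a candidate link: [topological, semantic]."""
    return [jaccard(network, a, b), cosine(topic_vectors[a], topic_vectors[b])]


features = pair_features("bitcoin", "privacy")
```

In a supervised setting, feature vectors like `features` would be fed to a classifier trained on links that did or did not appear in a later time window, and `auc` would then be applied to the classifier's scores on held-out pairs.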

Availability of data and materials

All data and materials support the published claims and comply with field standards. The data for this study are available on request by email at zhouliang_bnu@163.com.

Code availability

All software applications and custom code support the published claims and comply with field standards.

Funding

This work is supported by Fundamental Research Funds for the Central Universities, Sichuan University (Grant No. skbsh201808).

Author information

Corresponding author

Correspondence to Liang Zhou.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

The original online version of this article was revised: in the original version, a few references were published incorrectly.

About this article

Cite this article

Xiong, T., Zhou, L., Zhao, Y. et al. Mining semantic information of co-word network to improve link prediction performance. Scientometrics 127, 2981–3004 (2022). https://doi.org/10.1007/s11192-021-04247-9

  • DOI: https://doi.org/10.1007/s11192-021-04247-9
