A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks

Vital, Adilson; Amancio, Diego R.

doi:10.1007/s11192-022-04484-6

Published: 12 August 2022

Volume 127, pages 6011–6028, (2022)
Scientometrics

Abstract

Understanding the evolution of paper and author citations is of paramount importance for the design of research policies and evaluation criteria that can promote and accelerate scientific discoveries. Recently, many studies on the evolution of science have been conducted in the context of the emergent Science of Science field. While many studies have probed the link prediction problem in citation networks, only a few works have analyzed the temporal nature of link prediction in author citation networks. In this study, we compared the performance of 10 well-known local network similarity measurements with four machine learning models to predict future links in author citation networks. Unlike traditional link prediction methods, our approach takes the temporal nature of the predicted links into account. Our analysis revealed that the Jaccard coefficient was among the most relevant measurements. The preferential attachment measurement, conversely, displayed the worst performance. We also found that extending local measurements to their weighted versions did not significantly improve the performance of predicting citations. Finally, we found that XGBoost and neural network approaches summarizing the information from all 10 considered similarity measurements provided the highest AUC performance and competitive precision values.
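To make the comparison in the abstract concrete, the following minimal sketch scores unconnected node pairs with two of the local measurements named above (the Jaccard coefficient and preferential attachment) and evaluates each ranking with AUC. The toy network, the set `future_links`, and the function names are all hypothetical, and the graph is treated as undirected for simplicity; none of this is taken from the study's data or code.

```python
from itertools import combinations

# Hypothetical toy network (not data from the study), stored as an
# undirected adjacency map: node -> set of neighbours.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def jaccard(graph, u, v):
    """Shared neighbours divided by all neighbours of u or v."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0

def preferential_attachment(graph, u, v):
    """Product of the two node degrees."""
    return len(graph[u]) * len(graph[v])

def auc(positive_scores, negative_scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positive_scores
        for n in negative_scores
    )
    return wins / (len(positive_scores) * len(negative_scores))

# Candidate links are the node pairs not yet connected.
candidates = [(u, v) for u, v in combinations(sorted(graph), 2) if v not in graph[u]]

# Assumed ground truth: pairs that become linked in a later time window.
future_links = {("a", "e"), ("b", "d")}

for name, score in [("Jaccard", jaccard), ("Pref. attachment", preferential_attachment)]:
    pos = [score(graph, u, v) for u, v in candidates if (u, v) in future_links]
    neg = [score(graph, u, v) for u, v in candidates if (u, v) not in future_links]
    print(f"{name}: AUC = {auc(pos, neg):.3f}")
```

In a supervised setting such as the one the abstract describes, the per-pair scores from several measurements would instead be stacked into a feature vector and passed to a classifier; that step is omitted here.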

References

Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211–230.
Amancio, D. R., Nunes, M. G. V., Oliveira, O. N., Jr., Pardo, T. A. S., Antiqueira, L., & Costa, L. F. (2011). Using metrics from complex networks to evaluate machine translation. Physica A: Statistical Mechanics and its Applications, 390(1), 131–142.
Amancio, D. R., Nunes, M. G. V., Oliveira, O. N., Jr., & Costa, L. d. F. (2012). Using complex networks concepts to assess approaches for citations in scientific papers. Scientometrics, 91(3), 827–842.
Amancio, D. R., Oliveira, O. N., Jr., & Costa, L. d. F. (2012). Three-feature model to reproduce the topology of citation networks and the effects from authors’ visibility on their h-index. Journal of Informetrics, 6(3), 427–434.
Amancio, D. R., Comin, C. H., Casanova, D., Travieso, G., Bruno, O. M., Rodrigues, F. A., & Costa, L. F. (2014). A systematic comparison of supervised classifiers. PLoS ONE, 9(4), e94137.
Amancio, D. R., Oliveira, O. N., Jr., & Costa, Ld. F. (2015). Topological-collaborative approach for disambiguating authors’ names in collaborative networks. Scientometrics, 102(1), 465–485.
Bai, X., Xia, F., Lee, I., Zhang, J., & Ning, Z. (2016). Identifying anomalous citations for objective evaluation of scholarly article impact. PloS One, 11(9), e0162.
Bai, X., Zhang, F., & Lee, I. (2019). Predicting the citations of scholarly paper. Journal of Informetrics, 13(1), 407–418.
Bai, X., Zhang, F., Ni, J., Shi, L., & Lee, I. (2020). Measure the impact of institution and paper via institution-citation network. IEEE Access, 8, 548–555.
Barabási, A., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., & Vicsek, T. (2002). Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3–4), 590–614. https://doi.org/10.1016/s0378-4371(02)00736-7
Bornmann, L., & Daniel, HD. (2008). What do citation counts measure? a review of studies on citing behavior. Journal of documentation
Bradley, A. P. (1997). The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145–1159.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Chacon, X. S., Silva, T. C., & Amancio, D. R. (2020). Comparing the impact of subfields in scientific journals. Scientometrics, 125(1), 625–639.
Chen, S., Dang, D., Macy, R., & Rockwell, C. (2019). Link prediction on the patent citation network. https://crockwell.github.io/data/LP_patent.pdf
Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp 785–794
Cui, P., Wang, X., Pei, J., & Zhu, W. (2018). A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering, 31(5), 833–852.
Daud, A., Ahmed, W., Amjad, T., Nasir, JA., Aljohani, NR., Abbasi, RA., & Ahmad, I. (2017). Who will cite you back? reciprocal link prediction in citation networks. Library Hi Tech
Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and roc curves. In: Proceedings of the 23rd international conference on Machine learning, pp 233–240
Edwards, M. A., & Roy, S. (2017). Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environmental Engineering Science, 34(1), 51–61.
Eom, Y. H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLoS ONE, 6(9), e24926.
Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S., Petersen, A. M., Radicchi, F., Sinatra, R., Uzzi, B., Vespignani, A., Waltman, L., Wang, D., & Barabási, A. L. (2018). Science of science. Science. https://doi.org/10.1126/science.aao0185
Hennemann, S., Rybski, D., & Liefner, I. (2012). The myth of global science collaboration-collaboration patterns in epistemic communities. Journal of Informetrics, 6(2), 217–225.
Hug, S. E., & Brändle, M. P. (2017). The coverage of Microsoft Academic: Analyzing the publication output of a university. CoRR abs/1703.05539
Hung, S. W., & Wang, A. P. (2010). Examining the small world phenomenon in the patent citation network: a case study of the radio frequency identification (rfid) network. Scientometrics, 82(1), 121–134.
Jain, A., Mao, J., & Mohiuddin, K. (1996). Artificial neural networks: a tutorial. Computer, 29(3), 31–44. https://doi.org/10.1109/2.485891
Katz, J. (1994). Geographical proximity and scientific collaboration. Scientometrics, 31(1), 31–43.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1995, 1137–1145.
Krumov, L., Fretter, C., Müller-Hannemann, M., Weihe, K., & Hütt, M. T. (2011). Motifs in co-authorship networks and their relation to the impact of scientific publications. The European Physical Journal B, 84(4), 535–540.
Lande, D., Fu, M., Guo, W., Balagura, I., Gorbov, I., & Yang, H. (2020). Link prediction of scientific collaboration networks based on information retrieval. World Wide Web pp 1–19
Li, W., Aste, T., Caccioli, F., & Livan, G. (2019). Reciprocity and impact in academic careers. EPJ Data Science, 8(1), 20.
Liu, X. F., Chen, H. J., & Sun, W. J. (2021). Adaptive topological coevolution of interdependent networks: Scientific collaboration-citation networks as an example. Physica A: Statistical Mechanics and its Applications, 564, 125518.
Lü, L., & Zhou, T. (2010). Link prediction in weighted networks: The role of weak ties. EPL (Europhysics Letters), 89(1), 18001.
Lü, L., & Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6), 1150–1170.
Martinčić-Ipšić, S., Močibob, E., & Perc, M. (2017). Link prediction on twitter. PLoS ONE, 12(7), 1–21. https://doi.org/10.1371/journal.pone.0181079
Milojević, S. (2013). Accuracy of simple, initials-based methods for author name disambiguation. Journal of Informetrics, 7(4), 767–773.
Molléri, J. S., Petersen, K., & Mendes, E. (2018). Towards understanding the relation between citations and research quality in software engineering studies. Scientometrics, 117(3), 1453–1478.
Newman, M. (2001). Clustering and preferential attachment in growing networks. Physical Review E, 64(2), 025102.
Nie, Z., Liu, Y., Yang, L., Li, S., & Pan, F. (2021). Construction and application of materials knowledge graph based on author disambiguation: Revisiting the evolution of lifepo4. Advanced Energy Materials p 2003580
Nielsen, M. A. (2015). Neural networks and deep learning (Vol. 25). San Francisco, CA: Determination Press.
Nielsen, M. W., & Andersen, J. P. (2021). Global citation inequality is on the rise. Proceedings of the National Academy of Sciences, 118(7), 2012208118.
Noble, W. S. (2006). What is a support vector machine? Nature Biotechnology, 24(12), 1565–1567.
Parnas, D. L. (2007). Stop the numbers game. Communications of the ACM, 50(11), 19–21.
Powell, W. W., White, D. R., Koput, K. W., & Owen-Smith, J. (2005). Network dynamics and field evolution: The growth of interorganizational collaboration in the life sciences. American Journal of Sociology, 110(4), 1132–1205.
Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80(5), 056103.
Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-validation. Encyclopedia of Database Systems, 5, 532–538.
de Sá, H., & Prudencio, R. (2011). Supervised link prediction in weighted networks. In: Neural Networks (IJCNN), The 2011 International Joint Conference on, IEEE, pp 2281–2288
Sebo, P., de Lucia, S., & Vernaz, N. (2021). Accuracy of pubmed-based author lists of publications and use of author identifiers to address author name ambiguity: a cross-sectional study. Scientometrics pp 1–15
Shibata, N., Kajikawa, Y., & Sakata, I. (2012). Link prediction in citation networks. Journal of the American Society for Information Science and Technology, 63(1), 78–85.
Silva, F. N., Amancio, D. R., Bardosova, M., Costa, Ld. F., & Oliveira, O. N., Jr. (2016). Using network science and text analytics to produce surveys in a scientific topic. Journal of Informetrics, 10(2), 487–502.
Silva, F. N., Tandon, A., Amancio, D. R., Flammini, A., Menczer, F., Milojević, S., & Fortunato, S. (2020). Recency predicts bursts in the evolution of author citations. Quantitative Science Studies, 1(3), 1298–1308.
Stella, M. (2019). Modelling early word acquisition through multiplex lexical networks and machine learning. Big Data and Cognitive Computing, 3(1), 10.
Stella, M. (2020). Multiplex networks quantify robustness of the mental lexicon to catastrophic concept failures, aphasic degradation and ageing. Physica A: Statistical Mechanics and Its Applications, 554, 124382.
Vital, Jr A., & Amancio, DR. (2021). A comparative analysis of local network similarity measurements: application to author citation networks. arXiv:2103.13946
Wang, K., Shen, Z., Huang, C., Wu, C. H., Dong, Y., & Kanakia, A. (2020). Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 1(1), 396–413.
Wang, M., Yu, G., & Yu, D. (2008). Measuring the preferential attachment mechanism in citation networks. Physica A: Statistical Mechanics and its Applications, 387(18), 4692–4698.
Wang, P., Xu, B., Wu, Y., & Zhou, X. (2014). Link prediction in social networks: the state-of-the-art
Wright, RE. (1995). Logistic regression.
Wuestman, M. L., Hoekman, J., & Frenken, K. (2019). The geography of scientific citations. Research Policy, 48(7), 1771–1780.
Yegnanarayana, B. (2009). Artificial neural networks. Delhi: PHI Learning Pvt. Ltd.
Zhang, G., Ding, Y., & Milojević, S. (2013). Citation content analysis (cca): A framework for syntactic and semantic analysis of citation content. Journal of the American Society for Information Science and Technology, 64(7), 1490–1503.
Zhang, L., & Ban, Z. (2020). Author name disambiguation based on rule and graph model. In: CCF International Conference on Natural Language Processing and Chinese Computing, Springer, pp 617–628
Zhou, T., Lü, L., & Zhang, Y. C. (2009). Predicting missing links via local information. The European Physical Journal B, 71(4), 623–630.

Acknowledgements

A preprint version of this manuscript is available at arXiv (Vital and Amancio 2021). D.R.A. acknowledges financial support from São Paulo Research Foundation (FAPESP Grant No. 2020/06271-0) and CNPq-Brazil (Grant No. 304026/2018-2 and 311074/2021-9). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

Author information

Authors and Affiliations

Institute of Mathematics and Computer Science, Department of Computer Science, University of São Paulo, São Carlos, São Paulo, Brazil
Adilson Vital & Diego R. Amancio

Corresponding author

Correspondence to Diego R. Amancio.

Ethics declarations

Conflict of interest

The authors have no conflict of interest to declare.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 279 kb)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vital, A., Amancio, D.R. A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks. Scientometrics 127, 6011–6028 (2022). https://doi.org/10.1007/s11192-022-04484-6

Received: 21 March 2022
Accepted: 01 August 2022
Published: 12 August 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s11192-022-04484-6
