Detecting and analyzing missing citations to published scientific entities

Lin, Jialiang; Yu, Yao; Song, Jiaxin; Shi, Xiaodong

doi:10.1007/s11192-022-04334-5

Published: 06 May 2022

Scientometrics, Volume 127, pages 2395–2412 (2022)

Abstract

Proper citation is of great importance in academic writing, for it enables knowledge accumulation and maintains academic integrity. However, citing properly is not an easy task. For published scientific entities, the ever-growing volume of academic publications and over-familiarity with terms easily lead to missing citations. To address this, we design a dedicated method, Citation Recommendation for Published Scientific Entity (CRPSE), based on the co-occurrences between published scientific entities and in-text citations within the same sentences written by previous researchers. Experimental results show that our method is effective in recommending the source papers of published scientific entities. We further conduct a statistical analysis of missing citations among papers published in prestigious computer science conferences in 2020. In the 12,278 papers collected, 475 published scientific entities in computer science and mathematics are found to have missing citations. Many entities mentioned without citations are found to be well-accepted research results. The papers that proposed these entities were published a median of 8 years earlier, which can be considered the time frame for a published scientific entity to develop into a well-accepted concept. For published scientific entities, we call for accurate and full citation of their source papers, as required by academic standards.
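The co-occurrence idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' CRPSE implementation, which the abstract does not specify in detail; it only assumes that each sentence from prior papers has already been reduced to a pair of (entities mentioned, papers cited in that sentence). The function names `build_cooccurrence`, `recommend` and `find_uncited_mentions`, as well as the toy paper identifiers, are hypothetical.

```python
from collections import Counter, defaultdict


def build_cooccurrence(sentences):
    """Count how often each entity shares a sentence with each cited paper.

    `sentences` is an iterable of (entities, cited_papers) pairs, one per
    sentence. This input format is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for entities, cited_papers in sentences:
        for entity in set(entities):
            for paper in set(cited_papers):
                counts[entity][paper] += 1
    return counts


def recommend(counts, entity, top_k=1):
    """Return the papers most often cited alongside `entity`."""
    ranked = counts.get(entity, Counter()).most_common(top_k)
    return [paper for paper, _ in ranked]


def find_uncited_mentions(sentences):
    """Return entities that appear in at least one sentence with no citation."""
    uncited = set()
    for entities, cited_papers in sentences:
        if not cited_papers:
            uncited.update(entities)
    return uncited


# Toy corpus with hypothetical paper identifiers.
corpus = [
    (["LSTM"], ["hochreiter1997"]),
    (["LSTM", "Adam"], ["hochreiter1997", "kingma2015"]),
    (["Adam"], ["kingma2015"]),
    (["LSTM"], []),  # entity mentioned without any citation
]

counts = build_cooccurrence(corpus)
for entity in sorted(find_uncited_mentions(corpus)):
    print(entity, "->", recommend(counts, entity))
```

In this toy corpus, "LSTM" appears once without a citation, and the paper it most often co-occurs with elsewhere is suggested as its likely source. A real system would also need entity recognition and a way to handle ambiguous names, which this sketch leaves out.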

Notes

https://app.dimensions.ai/discover/publication.
Data was obtained in November of 2021.
https://www.tensorflow.org/.
https://github.com/tensorflow.
The examples in the figure are created for better illustration, not real examples from S2ORC.
https://github.com/explosion/spaCy.
https://www.kaggle.com/rtatman/english-word-frequency.
https://api.semanticscholar.org/.
Papers with parsing errors are excluded.

Acknowledgements

We would like to thank Yingmin Wang for help in improving the mathematical expressions. We are grateful to Li Lei, Xun Zhou, Lei Lin and Meizhen Zheng for their help with data processing. We also thank the two anonymous reviewers for their valuable comments. Special and heartfelt gratitude goes to the first author’s wife, Fenmei Zhou, for her understanding and love. Her unwavering support and continuous encouragement made this research possible.

Funding

This work is partly funded by the 13th Five-Year Plan project "Artificial Intelligence and Language" of the State Language Commission of China (Grant No. WT135-38).

Author information

Authors and Affiliations

School of Informatics, Xiamen University, Xiamen, China
Jialiang Lin, Yao Yu, Jiaxin Song & Xiaodong Shi

Corresponding author

Correspondence to Xiaodong Shi.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

About this article

Cite this article

Lin, J., Yu, Y., Song, J. et al. Detecting and analyzing missing citations to published scientific entities. Scientometrics 127, 2395–2412 (2022). https://doi.org/10.1007/s11192-022-04334-5

Received: 01 June 2021
Accepted: 25 February 2022
Published: 06 May 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11192-022-04334-5
