A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies

Iqbal, Sehrish; Hassan, Saeed-Ul; Aljohani, Naif Radi; Alelyani, Salem; Nawaz, Raheel; Bornmann, Lutz

doi:10.1007/s11192-021-04055-1

Published: 23 June 2021

Volume 126, pages 6551–6599, (2021)

Scientometrics

Sehrish Iqbal¹,
Saeed-Ul Hassan¹ (ORCID: orcid.org/0000-0002-6509-9190),
Naif Radi Aljohani²,
Salem Alelyani³,⁴,
Raheel Nawaz⁵ &
Lutz Bornmann⁶

2803 Accesses
29 Citations

Abstract

In-text citation analysis is one of the most frequently used methods in research evaluation. We are seeing significant growth in citation analysis through bibliometric metadata, primarily due to the availability of citation databases such as the Web of Science, Scopus, Google Scholar, Microsoft Academic, and Dimensions. Due to better access to full-text publication corpora in recent years, information scientists have gone far beyond traditional bibliometrics by tapping into advancements in full-text data processing techniques to measure the impact of scientific publications in contextual terms. This has led to technical developments in citation classifications, citation sentiment analysis, citation summarisation, and citation-based recommendation. This article aims to narratively review the studies on these developments. Its primary focus is on publications that have used natural language processing and machine learning techniques to analyse citations.

Acknowledgements

The authors (Salem Alelyani and Saeed-Ul Hassan) are grateful for the financial support received from King Khalid University for this research under Grant No. R.G.P2/100/41.

Author information

Authors and Affiliations

Department of Computer Science, Information Technology University, 346-B, Ferozepur Road, Lahore, Pakistan
Sehrish Iqbal & Saeed-Ul Hassan
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
Naif Radi Aljohani
Center for Artificial Intelligence (CAI), King Khalid University, PO Box 9004, Abha, 61413, Kingdom of Saudi Arabia
Salem Alelyani
College of Computer Science, King Khalid University, PO Box 9004, Abha, 61413, Kingdom of Saudi Arabia
Salem Alelyani
Department of Operations, Technology, Events and Hospitality Management, Manchester Metropolitan University, Manchester, United Kingdom
Raheel Nawaz
Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstraße, 8, 80539, Munich, Germany
Lutz Bornmann

Corresponding author

Correspondence to Saeed-Ul Hassan.

About this article

Cite this article

Iqbal, S., Hassan, SU., Aljohani, N.R. et al. A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies. Scientometrics 126, 6551–6599 (2021). https://doi.org/10.1007/s11192-021-04055-1

Received: 06 August 2020
Accepted: 19 May 2021
Published: 23 June 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11192-021-04055-1
