A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies

Scientometrics

Abstract

In-text citation analysis is one of the most frequently used methods in research evaluation. Citation analysis based on bibliometric metadata has grown significantly, primarily because of the availability of citation databases such as the Web of Science, Scopus, Google Scholar, Microsoft Academic, and Dimensions. With better access to full-text publication corpora in recent years, information scientists have gone far beyond traditional bibliometrics, drawing on advances in full-text data processing to measure the impact of scientific publications in contextual terms. This has led to technical developments in citation classification, citation sentiment analysis, citation summarisation, and citation-based recommendation. This article presents a narrative review of studies on these developments, focusing primarily on publications that have used natural language processing and machine learning techniques to analyse citations.

Acknowledgements

The authors (Salem Alelyani and Saeed-Ul Hassan) are grateful for the financial support received from King Khalid University for this research under Grant No. R.G.P2/100/41.

Author information

Corresponding author

Correspondence to Saeed-Ul Hassan.

About this article

Cite this article

Iqbal, S., Hassan, SU., Aljohani, N.R. et al. A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies. Scientometrics 126, 6551–6599 (2021). https://doi.org/10.1007/s11192-021-04055-1

  • DOI: https://doi.org/10.1007/s11192-021-04055-1
