Using the full-text corpus of more than 75,000 research articles published in seven PLOS journals, this paper proposes a natural language processing approach for identifying the function of citations. Citation contexts are characterized by the frequency of the n-grams that co-occur near the citations. Results show that the most frequent linguistic patterns found in citation contexts vary according to where the citation appears in the IMRaD structure of scientific articles. The presence of negative citations also depends on this structure. This methodology offers new perspectives for locating these discursive forms according to the rhetorical structure of scientific articles, and will lead to a better understanding of the use of citations in scientific articles.
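The idea of counting n-grams near citations, separately for each IMRaD section, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy sentences, the choice of the whole citation sentence as the context window, the bracketed-number marker format, and the function names (`tokenize`, `ngrams`, `ngram_counts_by_section`) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import re

# Hypothetical toy data: (IMRaD section, sentence containing a citation marker).
CONTEXTS = [
    ("Introduction", "It has been shown that citations cluster early [1]."),
    ("Introduction", "It has been shown that reviews attract citations [2]."),
    ("Methods", "Samples were prepared as described previously [3]."),
    ("Discussion", "These results are in contrast to earlier reports [4]."),
]

def tokenize(sentence):
    """Lowercase the sentence, drop bracketed citation markers, keep word tokens."""
    # Assumption: citations appear as bracketed numbers such as [1] or [2, 3].
    without_markers = re.sub(r"\[\d+(?:\s*[,-]\s*\d+)*\]", " ", sentence)
    return re.findall(r"[a-z]+", without_markers.lower())

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts_by_section(contexts, n=3):
    """Count n-grams of citation sentences separately for each section."""
    counts = defaultdict(Counter)
    for section, sentence in contexts:
        counts[section].update(ngrams(tokenize(sentence), n))
    return counts

counts = ngram_counts_by_section(CONTEXTS, n=3)
for section, counter in counts.items():
    # Show the two most frequent trigrams found in each section.
    print(section, counter.most_common(2))
```

Comparing the most frequent n-grams across sections is what would reveal section-specific patterns; on a real corpus the context window and n would need to be chosen and justified explicitly.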
We thank Benoit Macaluso of the Observatoire des Sciences et des Technologies (OST), Montreal, Canada, for harvesting and providing the PLOS dataset.
Cite this article
Bertin, M., Atanassova, I., Sugimoto, C.R. et al. The linguistic patterns and rhetorical structure of citation context: an approach using n-grams. Scientometrics 109, 1417–1434 (2016). https://doi.org/10.1007/s11192-016-2134-8
Keywords
- Content citation analysis
- Discursive patterns
- Citation function
- Rhetorical structure