Volume 109, Issue 3, pp 1417–1434

The linguistic patterns and rhetorical structure of citation context: an approach using n-grams

  • Marc Bertin
  • Iana Atanassova
  • Cassidy R. Sugimoto
  • Vincent Larivière


Using the full-text corpus of more than 75,000 research articles published in seven PLOS journals, this paper proposes a natural language processing approach for identifying the function of citations. Citation contexts are characterized by the frequency of the n-grams that co-occur near the citations. Results show that the most frequent linguistic patterns found in citation contexts vary according to their location in the IMRaD structure of scientific articles. The presence of negative citations also depends on this structure. This methodology offers new ways to locate these discursive forms within the rhetorical structure of scientific articles, and will lead to a better understanding of how citations are used in scientific articles.
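The core idea of counting n-grams that occur near citations, grouped by IMRaD section, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the citation-marker format, tokenization, window size, or n, so the numeric bracket markers such as `[3]`, the five-word window on each side, and the hypothetical helper `citation_context_ngrams` are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# ASSUMPTION: citations appear as numeric bracket markers such as [3] or [3, 5].
CITATION = re.compile(r"\[\d+(?:\s*[,-]\s*\d+)*\]")
# Simple lowercase word tokenizer (hyphenated words kept whole).
WORD = re.compile(r"[a-z]+(?:-[a-z]+)*")


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def citation_context_ngrams(sections, n=2, window=5):
    """Hypothetical helper: count n-grams within `window` words on each side
    of every citation marker, grouped by section label.

    `sections` maps a section label (e.g. "Introduction") to its text.
    """
    counts = defaultdict(Counter)
    for label, text in sections.items():
        for match in CITATION.finditer(text):
            left = WORD.findall(text[:match.start()].lower())[-window:]
            right = WORD.findall(text[match.end():].lower())[:window]
            # n-grams are built separately on each side so none spans the marker.
            for side in (left, right):
                counts[label].update(ngrams(side, n))
    return counts


if __name__ == "__main__":
    sections = {
        "Introduction": "Previous studies have shown that citations cluster [1]. "
                        "As shown in [2], this holds.",
        "Methods": "We used the approach proposed by [3] for parsing.",
    }
    for label, counter in citation_context_ngrams(sections).items():
        print(label, counter.most_common(3))
```

Comparing the most frequent n-grams per section (for instance "proposed by" in Methods versus "as shown in" in the Introduction) is one simple way to see how citation-context vocabulary might shift across the IMRaD structure; a real study would need a more robust citation parser and section detector than these regular expressions.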


Keywords: Content citation analysis; IMRaD; Discursive patterns; Citation function; Rhetorical structure; n-grams



We thank Benoit Macaluso of the Observatoire des Sciences et des Technologies (OST), Montreal, Canada, for harvesting and providing the PLOS dataset.


  1. Athar, A., & Teufel, S. (2012). Context-enhanced citation sentiment detection. In Proceedings of the 2012 conference of the North American chapter of the Association for Computational Linguistics: Human language technologies (pp. 597–601).Google Scholar
  2. Bertin, M., & Atanassova, I. (2014). A study of lexical distribution in citation contexts through the IMRaD standard. In Proceedings of the first workshop on bibliometric-enhanced information retrieval co-located with 36th European conference on information retrieval (ECIR 2014), Amsterdam, The Netherlands.Google Scholar
  3. Bertin, M., Atanassova, I., Gingras, Y., & Larivière, V. (2016). The invariant distribution of references in scientific articles. Journal of the Association for Information Science and Technology, 67(1), 164–177.CrossRefGoogle Scholar
  4. Bertin, M., Atanassova, I., Lariviere, V., & Gingras, Y. (2013). The distribution of references in scientific papers: An analysis of the IMRaD structure. In Proceedings of the 14th International Society of Scientometrics and Informetrics Conference (Vol. 1, pp. 591–603). Vienna, Austria.Google Scholar
  5. Bloch, J. (2010). A concordance-based study of the use of reporting verbs as rhetorical devices in academic papers. Journal of Writing Research, 2, 219–244.CrossRefGoogle Scholar
  6. Boyack, K. W., Klavans, R., Small, H., & Ungar, L. (2012). Characterizing emergence using a detailed micro-model of science: Investigating two hot topics in nanotechnology. In Proceedings of PICMET'12: Technology Management for Emerging Technologies (pp. 2605–2611).Google Scholar
  7. Boyack, K. W., Small, H., & Klavans, R. (2013). Improving the accuracy of co-citation clustering using full text. Journal of the American Society for Information Science and Technology, 64(9), 1759–1767.CrossRefGoogle Scholar
  8. Bradshaw, S. (2003). Reference directed indexing: Redeeming relevance for subject search in citation indexes. In International Conference on Theory and Practice of Digital Libraries (pp. 499–510). Berlin: Springer.Google Scholar
  9. Callahan, A., Hockema, S., & Eysenbach, G. (2010). Contextual cocitation: Augmenting cocitation analysis and its applications. Journal of the American Society for Information Science and Technology, 61(6), 1130–1143.Google Scholar
  10. Catalini, C., Lacetera, N., & Oettl, A. (2015). The incidence and role of negative citations in science. Proceedings of the National Academy of Sciences of the United States of America, 112(45), 13823–13826.CrossRefGoogle Scholar
  11. Cavnar, W. B., Trenkle, J. M., et al. (1994). N-gram-based text categorization. In Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval, Ann Arbor MI (Vol. 48113(2), pp. 161–175).Google Scholar
  12. Chang, Y. W. (2013). A comparison of citation contexts between natural sciences and social sciences and humanities. Scientometrics, 96(2), 535–553.CrossRefGoogle Scholar
  13. Chubin, D. E., & Moitra, S. D. (1975). Content analysis of references: Adjunct or alternative to citation counting? Social Studies of Science, 5, 423–441.CrossRefGoogle Scholar
  14. Cozzens, S. E. (1985). Comparing the sciences: Citation context analysis of papers from neuropharmacology and the sociology of science. Social Studies of Science, 15, 127–153.CrossRefGoogle Scholar
  15. Cronin, B. (1981). The need for a theory of citing. Journal of Documentation, 37(1), 16–24.CrossRefGoogle Scholar
  16. Cronin, B. (1984). The citation process. The role and significance of citations in scientific communication. London: Taylor Graham.Google Scholar
  17. de Solla Price, D. J. (1963). Little Science, Big Science. New York: Columbia University Press.Google Scholar
  18. Ding, Y., Liu, X., Guo, C., & Cronin, B. (2013). The distribution of references across texts: Some implications for citation analysis. Journal of Informetrics, 7(3), 583–592.CrossRefGoogle Scholar
  19. Elkiss, A., Shen, S., Fader, A., Erkan, G., States, D., & Radev, D. (2008). Blind men and elephants: What do citation summaries tell us about a research article? Journal of the American Society for Information Science and Technology, 59(1), 51–62.CrossRefGoogle Scholar
  20. Frost, C. (1979). The use of citations in literary research: Preliminary classification of citation functions. Library Quarterly, 49(4), 399–414.CrossRefGoogle Scholar
  21. Fujiwara, T., & Yamamoto, Y. (2015). Colil: A database and search service for citation contexts in the life sciences domain. Journal of Biomedical Semantics. doi: 10.1186/s13326-015-0037-x.
  22. Garfield, E. (1964). Can citation indexing be automated? In Statistical association methods for mechanized documentation, symposium proceedings (Vol. 1, pp. 189–192). Washington, DC: National Bureau of Standards, Miscellaneous Publication 269.Google Scholar
  23. Garfield, E., et al. (1972). Citation analysis as a tool in journal evaluation. American Association for the Advancement of Science. Retrieved from
  24. Gilbert, G. N. (1977). Referencing as persuasion. Social Studies of Science, 7(1), 113–122.CrossRefGoogle Scholar
  25. Giles, C. L., Bollacker, K. D., & Lawrence, S. (1998). CiteSeer: An automatic citation indexing system. In E. Witten, R. Akseyn, & F. M. Shipman III (Eds.), Digital libraries 98: The third ACM conference on digital libraries (pp. 89–98). New York: ACM Press.CrossRefGoogle Scholar
  26. Gipp, B., & Beel, J. (2009). Identifying related documents for research paper recommender by CPA and COA. Paper presented at the proceedings of international conference on education and information technology, Berkeley.Google Scholar
  27. Halevi, G., & Moed, H. F. (2013). The thematic and conceptual flow of disciplinary research: A citation context analysis of the journal of informetrics, 2007. Journal of the American Society for Information Science and Technology, 64(9), 1903–1913.CrossRefGoogle Scholar
  28. Hargens, L. L. (2000). Using the literature: Reference networks, reference contexts, and the social structure of scholarship. American Sociological Review, 65(6), 846–865.CrossRefGoogle Scholar
  29. Hopper, P. (1987). Emergent grammar. In Annual Meeting of the Berkeley Linguistics Society (Vol. 13, pp. 139–157).Google Scholar
  30. Hyland, K. (1999). Academic attribution: Citation and the construction of disciplinary knowledge. Applied Linguistics, 20(3), 341–367.Google Scholar
  31. Jörg, B. (2008). Towards the nature of citations. In Poster proceedings of the 5th international conference on formal ontology in information systems (FOIS 2008).Google Scholar
  32. Kaplan, N. (1965). The norms of citation behavior: Prolegomena to the footnote. American Documentation, 16(3), 179–184.CrossRefGoogle Scholar
  33. Leydesdorff, L. (1998). Theories of citation? Scientometrics, 43(1), 5–25. doi: 10.1007/BF02458391.CrossRefGoogle Scholar
  34. Lipetz, B. A. (1965). Improvement of the selectivity of citation indexes to science literature through inclusion of citation relationship indicators. American Documentation, 16(2), 81–90.CrossRefGoogle Scholar
  35. Liu, M. (1993). Progress in documentation the complexities of citation practice: A review of citation studies. Journal of Documentation, 49(4), 370–408.Google Scholar
  36. Liu, S., & Chen, C. (2012). The proximity of co-citation. Scientometrics, 91(2), 495–511.MathSciNetCrossRefGoogle Scholar
  37. Liu, S., Chen, C., Ding, K., Wang, B., Xu, K., & Lin, Y. (2014). Literature retrieval based on citation context. Scientometrics, 101(2), 1293–1307.CrossRefGoogle Scholar
  38. Marici, S., Spaventi, J., Pavicic, L., & Pifat-Mrzljak, G. (1998). Citation context versus the frequency counts of citation histories. Journal of the American Society for Information Science, 49(6), 530–540.CrossRefGoogle Scholar
  39. Mei, Q., & Zhai, C. (2008). Generating impact-based summaries for scientific literature. Paper presented at the proceedings of ACL ‘08, Columbus.Google Scholar
  40. Merton, R. K. (1957). Priorities in scientific discovery: A chapter in the sociology of science. American Sociological Review, 22(6), 635–659.Google Scholar
  41. Merton, R. K. (1961). Singletons and multiples in scientific discovery: A chapter in the sociology of science. Proceedings of the American Philosophical Society, 105(5), 470–486.Google Scholar
  42. Merton, R. K. (1988). The Matthew Effect in Science, II: Cumulative advantage and the symbolism of intellectual property. Isis, 79(4), 606–623.Google Scholar
  43. Moed, H. F. (2005). Citation analysis in research evaluation, information science and knowledge management. Berlin: Springer.Google Scholar
  44. Mohammad, S., Dorr, B., Egan, M., Hassan, A., Muthukrishan, P., Qazvinian, V., et al. (2009). Using citations to generate surveys of scientific paradigms. Paper presented at the proceedings of human language technologies: The 2009 annual conference of the North American chapter of the Association for Computational Linguistics, Boulder.Google Scholar
  45. Moravcsik, M. J., & Murugesan, P. (1975). Some results on the function and quality of citations. Social Studies of Science, 5(1), 86–92.CrossRefGoogle Scholar
  46. Nakov, P. I., Schwartz, A. S., & Hearst, M. A. (2004). Citances: Citation sentences for semantic analysis of bioscience text. Paper presented at the SIGIR 2004 workshop on search and discovery in bioinformatics, Sheffield.Google Scholar
  47. Nanba, H., & Okumura, M. (1999). Towards multi-paper summarization using reference information. Paper presented at the 16th international joint conference on artificial intelligence, Stockholm.Google Scholar
  48. Nenkova, A., & McKeown, K. (2011). Automatic summarization. Foundations and Trends in Information Retrieval, 5(2–3), 103–233.CrossRefGoogle Scholar
  49. O’Connor, J. (1982). Citing statements: Computer recognition and use to improve retrieval. Information Processing and Management, 18, 125–131.CrossRefGoogle Scholar
  50. Peroni, S., & Shotton, D. (2012). FaBiO and CiTO: Ontologies for describing bibliographic resources and citations. Web Semantics: Science, Services and Agents on the World Wide Web, 17, 33–43.CrossRefGoogle Scholar
  51. Sakita, T. I. (2002). Reporting discourse, tense, and cognition. Oxford: Elsevier.Google Scholar
  52. Schneider, J. W. (2006). Concept symbols revisited: Naming clusters by parsing and filtering of noun phrases from citation contexts of concept symbols. Scientometrics, 68(3), 573–593.CrossRefGoogle Scholar
  53. Siddharthan, A., & Teufel, S. (2007). Whose idea was this, and why does it matter? Attributing scientific work to citations. In Proceedings of HLT-NAACL (pp. 316–323).Google Scholar
  54. Sieweke, J. (2014). Peirre Bourdieu in management and organization studies—A citation context analysis and discussion of contributions. Scandinavian Journal of Management, 30(4), 532–543.CrossRefGoogle Scholar
  55. Small, H. G. (1978). Cited documents as concept symbols. Social Studies of Science, 8(3), 327–340.Google Scholar
  56. Small, H. G. (1979). Co-citation context analysis: Relationship between bibliometric structure and knowledge. Proceedings of the American Society for Information Science, 16, 270–275.Google Scholar
  57. Small, H. G. (1982). Citation context analysis. Progress in Communication Sciences, 3, 287–310.Google Scholar
  58. Small, H. G. (2004). On the shoulders of Robert Merton: Towards a normative theory of citation. Scientometrics, 60(1), 71–79.Google Scholar
  59. Small, H. G. (2011). Interpreting maps of science using citation context sentiments: A preliminary investgation. Scientometrics, 87(2), 373–388.CrossRefGoogle Scholar
  60. Spiegel-Rösing, I. (1977). Science studies: Bibliometric and content analysis. Social Studies of Science, 7, 97–113.CrossRefGoogle Scholar
  61. Sugimoto, C. R. (Ed.). (2016). Theories of informetrics and scholarly communication (p. 426). Berlin: De Gruyter Mouton.Google Scholar
  62. Sula, C. A., & Miller, M. (2014). Citations, contexts, and humanistic discourse: Toward automatic extraction and classification. Literary and Linguistic Computing, 29(3), 452–464.CrossRefGoogle Scholar
  63. Swales, J. (1986). Citation analysis and discourse analysis. Applied Linguistics, 7(1), 39–56.CrossRefGoogle Scholar
  64. Teufel, S., Siddharthan, A., & Tidhar, D. (2006). Automatic classification of citation function. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (pp. 103–110). Sydney, Australia: Association for Computational Linguistics.Google Scholar
  65. Teufel, S., Siddharthan, A., & Tidhar, D. (2009). An annotation scheme for citation function. In Proceedings of the 7th SIGdial workshop on discourse and dialogue (pp. 80–87). Association for Computational Linguistics.Google Scholar
  66. Voos, H., & Dagaev, K. (1976). Are all citations equal? Or did we op cit your idem? Journal of Academic Librarianship, 6(1), 19–21.Google Scholar
  67. White, H. D. (2004). Citation analysis and discourse analysis revisited. Applied Linguistics, 25(1), 89–116.CrossRefGoogle Scholar
  68. Willett, P. (2013). Readers’ perceptions of authors’ citation behavior. Journal of Documentation, 69(1), 145–156.CrossRefGoogle Scholar
  69. Zhao, D., & Strotmann, A. (2014). In-text author citation analysis: Feasibility, benefits, and limitations. Journal of the Association for Information Science and Technology, 65(11), 2348–2358.Google Scholar

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2016

Authors and Affiliations

  1. Centre Interuniversitaire de Recherche sur la Science et la Technologie (CIRST), Université du Québec à Montréal, Montreal, Canada
  2. Centre de Recherche en Linguistique et Traitement Automatique des Langues “Lucien Tesnière”, Université de Bourgogne Franche-Comté, Besançon, France
  3. School of Informatics and Computing, Indiana University Bloomington, Bloomington, USA
  4. École de bibliothéconomie et des sciences de l’information, Université de Montréal, Montreal, Canada
  5. Observatoire des Sciences et des Technologies (OST), Centre Interuniversitaire de Recherche sur la Science et la Technologie (CIRST), Université du Québec à Montréal, Montreal, Canada
