The linguistic patterns and rhetorical structure of citation context: an approach using n-grams

Scientometrics

Abstract

Using the full-text corpus of more than 75,000 research articles published by seven PLOS journals, this paper proposes a natural language processing approach for identifying the function of citations. Citation contexts are assigned based on the frequency of n-gram co-occurrences located near the citations. Results show that the most frequent linguistic patterns found in the citation contexts of papers vary according to their location in the IMRaD structure of scientific articles. The presence of negative citations also depends on this structure. This methodology offers new perspectives for locating these discursive forms according to the rhetorical structure of scientific articles, and should lead to a better understanding of the use of citations in scientific articles.
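As a rough illustration of the idea summarized above, the following minimal sketch counts word n-grams in sentences that contain a citation and groups the counts by section. It is not the authors' implementation: the bracketed-number citation pattern, the (section, sentence) input format, the function names, and the sample sentences are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import re

# Assumed citation marker: bracketed numbers such as [12] or [3,4].
CITATION = re.compile(r"\[\d+(?:\s*[,–-]\s*\d+)*\]")
# Simplified tokenizer (assumption): runs of lowercase ASCII letters.
TOKEN = re.compile(r"[a-z]+")


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_context_ngrams(sentences, n=3):
    """Count n-grams in citing sentences, grouped by section label.

    `sentences` is an iterable of (section, sentence) pairs; this input
    format is hypothetical and chosen only for the example.
    """
    counts = defaultdict(Counter)
    for section, sentence in sentences:
        if not CITATION.search(sentence):
            continue  # keep only sentences that contain a citation
        # Drop the citation markers themselves, then tokenize.
        text = CITATION.sub(" ", sentence).lower()
        counts[section].update(ngrams(TOKEN.findall(text), n))
    return counts


# Invented sample sentences, not taken from the PLOS corpus.
sample = [
    ("Introduction", "It has been shown that X regulates Y [1]."),
    ("Introduction", "It has been shown that Z binds W [2,3]."),
    ("Methods", "Cells were cultured as previously described [4]."),
    ("Methods", "No citation appears in this sentence."),
]
counts = count_context_ngrams(sample, n=3)
print(counts["Introduction"].most_common(3))
print(counts["Methods"].most_common(3))
```

Comparing the most frequent n-grams per section in this way gives a first impression of how citing phrases differ between, for example, the Introduction and the Methods. A real study would also need sentence segmentation, section detection, and a citation pattern matched to the corpus markup.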

Acknowledgments

We thank Benoit Macaluso of the Observatoire des Sciences et des Technologies (OST), Montreal, Canada, for harvesting and providing the PLOS dataset.

Author information

Corresponding author

Correspondence to Iana Atanassova.

About this article

Cite this article

Bertin, M., Atanassova, I., Sugimoto, C.R. et al. The linguistic patterns and rhetorical structure of citation context: an approach using n-grams. Scientometrics 109, 1417–1434 (2016). https://doi.org/10.1007/s11192-016-2134-8

  • DOI: https://doi.org/10.1007/s11192-016-2134-8
