The linguistic patterns and rhetorical structure of citation context: an approach using n-grams

Abstract

Using the full-text corpus of more than 75,000 research articles published by seven PLOS journals, this paper proposes a natural language processing approach for identifying the function of citations. Citation contexts are assigned based on the frequency of n-gram co-occurrences located near the citations. Results show that the most frequent linguistic patterns found in the citation contexts of papers vary according to their location in the IMRaD structure of scientific articles. The presence of negative citations is also dependent on this structure. This methodology offers new perspectives to locate these discursive forms according to the rhetorical structure of scientific articles, and will lead to a better understanding of the use of citations in scientific articles.
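
As a rough illustration of the idea summarized above, the following minimal sketch counts word n-grams in citation sentences and groups them by IMRaD section. It is not the authors' implementation: the function names, the whitespace tokenizer, the choice of trigrams, and the three sample sentences are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts_by_section(contexts, n=3):
    """Count n-grams per section.

    `contexts` is an iterable of (section, sentence) pairs, where the
    sentence is the text surrounding a citation (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for section, sentence in contexts:
        # Naive lowercase whitespace tokenization; an assumption, a real
        # pipeline would likely use a proper tokenizer.
        tokens = sentence.lower().split()
        counts[section].update(ngrams(tokens, n))
    return counts


# Invented sample data, for illustration only.
contexts = [
    ("Introduction", "it has been shown that [1]"),
    ("Introduction", "it has been shown that citations vary [2]"),
    ("Methods", "as described previously [3]"),
]

counts = ngram_counts_by_section(contexts, n=3)
for section, counter in counts.items():
    print(section, counter.most_common(3))
```

Comparing the most frequent n-grams per section in this way is one simple means of seeing how citation phrasing differs between, for example, the Introduction and the Methods sections.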

Acknowledgments

We thank Benoit Macaluso of the Observatoire des Sciences et des Technologies (OST), Montreal, Canada, for harvesting and providing the PLOS dataset.

Corresponding author

Correspondence to Iana Atanassova.

About this article

Cite this article

Bertin, M., Atanassova, I., Sugimoto, C.R. et al. The linguistic patterns and rhetorical structure of citation context: an approach using n-grams. Scientometrics 109, 1417–1434 (2016). https://doi.org/10.1007/s11192-016-2134-8

Keywords

  • Content citation analysis
  • IMRaD
  • Discursive patterns
  • Citation function
  • Rhetorical structure
  • n-grams