Language use reflects scientific methodology: A corpus-based study of peer-reviewed journal articles

Scientometrics

Abstract

Recently, philosophers of science have argued that the epistemological requirements of different scientific fields lead necessarily to differences in scientific method. In this paper, we examine possible variation in how language is used in peer-reviewed journal articles from various fields to see if features of such variation may help to elucidate and support claims of methodological variation among the sciences. We hypothesize that significant methodological differences will be reflected in related differences in scientists’ language style.

This paper reports a corpus-based study of peer-reviewed articles from twelve separate journals in six fields of experimental and historical sciences. Machine learning methods were applied to compare the discourse styles of articles in different fields, based on easily extracted linguistic features of the text. Features included function word frequencies, as often used in computational stylistics, as well as lexical features based on systemic functional linguistics, which affords rich resources for comparative textual analysis. We found that the style of writing in the historical sciences is indeed readily distinguishable from that of the experimental sciences. Furthermore, the most significant linguistic features of these distinctive styles are directly related to the methodological differences that philosophers of science posit between historical and experimental sciences, lending empirical weight to their contentions.
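
As a rough illustration of what a function word frequency feature might look like, the following minimal Python sketch computes the relative frequency of a few function words in a text. The word list, the tokenizer, and the name `function_word_frequencies` are assumptions made for this example only; the abstract does not specify the actual feature set, preprocessing, or learning algorithm used in the study.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list chosen for illustration;
# it is not the list used in the study.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "is", "was", "may", "could")


def function_word_frequencies(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in `text`."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in vocabulary}
    counts = Counter(tokens)
    return {word: counts[word] / total for word in vocabulary}


sample = "The fossil record suggests that the basin may have flooded."
features = function_word_frequencies(sample)
for word, freq in features.items():
    if freq > 0:
        print(f"{word}: {freq:.2f}")
```

One such frequency vector per article could then serve as input to a standard text classifier; which classifier, and how it is configured, is left open here.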

Author information

Corresponding author

Correspondence to Shlomo Argamon.

About this article

Cite this article

Argamon, S., Dodick, J. & Chase, P. Language use reflects scientific methodology: A corpus-based study of peer-reviewed journal articles. Scientometrics 75, 203–238 (2008). https://doi.org/10.1007/s11192-007-1768-y

  • DOI: https://doi.org/10.1007/s11192-007-1768-y
