Introduction
Corpus linguistics is, broadly speaking, the application of “big data” to the science of linguistics. Traditional linguistic analysis [caricatured by Fillmore (1992) as “armchair linguistics”] relies on native-speaker intuition and introspection. Corpus linguistics, by contrast, relies on large samples of language to analyze quantitatively how linguistic items are distributed. The field has therefore tended to focus on what computers can easily measure and count, such as words, phrases, and word-based grammar, rather than on more abstract concepts such as discourse or formal syntax. With the advent of high-powered computers and the growing availability of machine-readable texts, it has become a major force in modern linguistic research.
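As a minimal illustration of what quantitatively analyzing the distribution of words and phrases can mean in practice, the sketch counts word forms and adjacent word pairs (bigrams) in a tiny sample. It also normalizes counts per million tokens so that corpora of different sizes can be compared. The sample sentence, the regular-expression tokenizer, and the function names are illustrative assumptions, not a standard corpus tool. Real corpora are far larger and use more careful tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only).

    Deliberately naive (an assumption for illustration): punctuation is dropped,
    so sentence boundaries are lost.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word form occurs."""
    return Counter(tokens)


def bigram_frequencies(tokens):
    """Count adjacent word pairs, a simple stand-in for 'phrases'.

    Because the tokenizer ignores sentence boundaries, pairs may span two sentences.
    """
    return Counter(zip(tokens, tokens[1:]))


def per_million(count, total):
    """Normalize a raw count to occurrences per million tokens."""
    return count * 1_000_000 / total


if __name__ == "__main__":
    # Hypothetical toy "corpus" used only for illustration.
    sample = (
        "The corpus is large. The corpus is annotated, "
        "and the annotated corpus is searchable."
    )
    tokens = tokenize(sample)
    words = word_frequencies(tokens)
    pairs = bigram_frequencies(tokens)

    print(len(tokens))            # 14
    print(words.most_common(3))   # [('the', 3), ('corpus', 3), ('is', 3)]
    print(pairs.most_common(2))   # [(('corpus', 'is'), 3), (('the', 'corpus'), 2)]
    print(per_million(words["corpus"], len(tokens)))
```

Raw counts grow with corpus size, which is why the per-million normalization is a common way to compare frequencies across samples of different lengths.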
History
The use of corpora for language analysis long predates computers. Theologians were making Biblical concordances in the eighteenth century, and Samuel Johnson started a tradition that is followed to this day (most famously by the Oxford English Dictionary)...
Further Readings
Fillmore, C. J. (1992). “Corpus linguistics” or “computer-aided armchair linguistics”. In J. Svartvik (Ed.), Directions in corpus linguistics: Proceedings of Nobel Symposium 82, 4–8 August 1991 (pp. 35–60). Berlin: Mouton de Gruyter.
Juola, P. (2006). Authorship attribution. Foundations and Trends in Information Retrieval, 1(3), 233–334.
Kennedy, G. (1998). An introduction to corpus linguistics. London: Longman.
Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence: Brown University Press.
McEnery, T., & Hardy, A. (2012). Corpus linguistics: Method, theory, practice. Cambridge: Cambridge University Press.
Meyer, C. F. (2002). English corpus linguistics: An introduction. Cambridge: Cambridge University Press.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Juola, P. (2018). Corpus Linguistics. In: Schintler, L., McNeely, C. (eds) Encyclopedia of Big Data. Springer, Cham. https://doi.org/10.1007/978-3-319-32001-4_523-1
DOI: https://doi.org/10.1007/978-3-319-32001-4_523-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32001-4
Online ISBN: 978-3-319-32001-4
eBook Packages: Springer Reference Business and Management; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences