Abstract
In this chapter, we briefly discuss the statistical approaches commonly used to process and analyze a text corpus and to obtain data that can be considered statistically reliable for making general or specific observations about the patterns of occurrence and the distribution of linguistic elements in a corpus. We also show how, depending on the nature of the texts in a corpus, the patterns of quantitative analysis may differ from those of qualitative analysis, although in the long run the two types of analysis may be combined to give a clear picture of the linguistic phenomenon under scrutiny. Finally, we give a short history of the use of statistical methods and techniques in corpus analysis before and after the introduction of digital corpora, and describe how descriptive, inferential, and evaluative approaches can be combined in corpus analysis, linguistic investigation, and the drawing of inferences.
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Dash, N.S., Ramamoorthy, L. (2019). Statistical Studies on Language Corpus. In: Utility and Application of Language Corpora . Springer, Singapore. https://doi.org/10.1007/978-981-13-1801-6_4
DOI: https://doi.org/10.1007/978-981-13-1801-6_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1800-9
Online ISBN: 978-981-13-1801-6
eBook Packages: Social Sciences (R0)