Statistical Studies on Language Corpus

Utility and Application of Language Corpora

Abstract

In this chapter, we briefly discuss the various statistical approaches normally used to process and analyze a text corpus and to obtain data that can be considered statistically reliable for making general or specific observations about the patterns of occurrence and distribution of linguistic elements in a corpus. We also show how, depending on the nature of the text in a corpus, quantitative analysis may differ from qualitative analysis, although in the long run the two may be combined to give a clear picture of the linguistic phenomenon under scrutiny. Finally, we give a short history of the use of statistical methods and techniques in corpus analysis before and after the introduction of digital corpora, and describe how descriptive, inferential, and evaluative approaches can be combined in corpus analysis, linguistic investigation, and the drawing of inferences.

Author information

Correspondence to Niladri Sekhar Dash.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Dash, N.S., Ramamoorthy, L. (2019). Statistical Studies on Language Corpus. In: Utility and Application of Language Corpora. Springer, Singapore. https://doi.org/10.1007/978-981-13-1801-6_4

  • DOI: https://doi.org/10.1007/978-981-13-1801-6_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1800-9

  • Online ISBN: 978-981-13-1801-6

  • eBook Packages: Social Sciences, Social Sciences (R0)
