Quantitative Analysis of the Morphological Complexity of Malayalam Language

  • Conference paper
Text, Speech, and Dialogue (TSD 2020)

Abstract

This paper presents a quantitative analysis of the morphological complexity of the Malayalam language. Malayalam is a Dravidian language spoken in India, predominantly in the state of Kerala, by about 38 million native speakers. Malayalam words undergo inflection, derivation and compounding, leading to an infinitely extending lexicon. In this work, the morphological complexity of Malayalam is quantitatively analyzed on a text corpus containing 8 million words. The analysis is based on three parameters: type-token growth rate (TTGR), type-token ratio (TTR) and moving average type-token ratio (MATTR). The values of these parameters obtained in the current study are compared with those reported for other morphologically complex languages.
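The abstract names the three measures without defining them. The following is a minimal sketch of how such measures are commonly computed, not the authors' implementation. It assumes whitespace tokenization and the usual definitions: TTR as distinct word forms over total tokens, and MATTR as the mean TTR over all sliding windows of a fixed length. For TTGR it assumes one plausible reading, the curve of cumulative type counts against token counts. The function names, the `step` and `window` parameters, and the toy token list are illustrative assumptions.

```python
from collections import Counter


def type_token_ratio(tokens):
    """TTR: number of distinct tokens (types) divided by total tokens."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def type_growth(tokens, step):
    """Cumulative (token_count, type_count) pairs, sampled every `step` tokens.

    Assumed reading of type-token growth: how fast new types appear
    as more running text is read. The final token is always sampled.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    seen = set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            points.append((i, len(seen)))
    return points


def mattr(tokens, window):
    """MATTR: mean TTR over every contiguous window of `window` tokens."""
    if window <= 0 or window > len(tokens):
        raise ValueError("window must be in 1..len(tokens)")
    counts = Counter(tokens[:window])
    total = len(counts) / window
    n_windows = 1
    for i in range(window, len(tokens)):
        # Slide the window: drop the oldest token, add the newest.
        outgoing = tokens[i - window]
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
        counts[tokens[i]] += 1
        total += len(counts) / window
        n_windows += 1
    return total / n_windows


# Toy placeholder tokens, not corpus data.
tokens = "a b a c a b d a".split()
ttr = type_token_ratio(tokens)
growth = type_growth(tokens, step=4)
moving = mattr(tokens, window=4)
print(ttr, growth, moving)
```

Because plain TTR falls as a text grows, it is only comparable across corpora of equal size; averaging over fixed-length windows, as MATTR does, removes that dependence on text length.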

Author information

Corresponding author

Correspondence to Kavya Manohar.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Manohar, K., Jayan, A.R., Rajan, R. (2020). Quantitative Analysis of the Morphological Complexity of Malayalam Language. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284. Springer, Cham. https://doi.org/10.1007/978-3-030-58323-1_7

  • DOI: https://doi.org/10.1007/978-3-030-58323-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58322-4

  • Online ISBN: 978-3-030-58323-1

  • eBook Packages: Computer Science, Computer Science (R0)
