Abstract
This paper presents a quantitative analysis of the morphological complexity of the Malayalam language. Malayalam is a Dravidian language spoken in India, predominantly in the state of Kerala, with about 38 million native speakers. Malayalam words undergo inflection, derivation and compounding, which leads to a lexicon that can extend indefinitely. In this work, the morphological complexity of Malayalam is quantitatively analyzed on a text corpus containing 8 million words. The analysis is based on three parameters: type-token growth rate (TTGR), type-token ratio (TTR) and moving average type-token ratio (MATTR). The parameter values obtained in the current study are compared with those reported for other morphologically complex languages.
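The three measures named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. Several choices are assumptions made for illustration: the tokenizer (Unicode NFC normalization followed by whitespace splitting), the default MATTR window of 500 tokens, the sampling step, and the reading of TTGR as the slope of a vocabulary growth curve (new types per token between sample points). TTR is the number of distinct word forms (types) divided by the number of tokens. MATTR averages the TTR of every window of a fixed number of consecutive tokens, so its value does not drift with corpus size the way plain TTR does.

```python
import unicodedata
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: NFC-normalize so that canonically equivalent
    spellings count as one type, then split on whitespace."""
    return unicodedata.normalize("NFC", text).split()


def ttr(tokens: list[str]) -> float:
    """Type-token ratio: distinct word forms divided by total tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty token list")
    return len(set(tokens)) / len(tokens)


def mattr(tokens: list[str], window: int = 500) -> float:
    """Moving-average TTR: mean TTR over every run of `window` consecutive
    tokens. The default window size is an assumption, not the paper's setting.
    Texts no longer than one window fall back to plain TTR."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(tokens) <= window:
        return ttr(tokens)
    counts = Counter(tokens[:window])
    type_sum = len(counts)  # number of types in the first window
    n_windows = len(tokens) - window + 1
    for i in range(window, len(tokens)):
        # Slide the window one token: drop tokens[i - window], add tokens[i].
        dropped = tokens[i - window]
        counts[dropped] -= 1
        if counts[dropped] == 0:
            del counts[dropped]
        counts[tokens[i]] += 1
        type_sum += len(counts)
    # Every window has the same length, so mean TTR = total types / (windows * size).
    return type_sum / (n_windows * window)


def type_growth(tokens: list[str], step: int = 1000) -> list[tuple[int, int]]:
    """Vocabulary growth curve: (tokens seen, types seen) sampled every
    `step` tokens and at the end of the text."""
    seen: set[str] = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve


def growth_rates(curve: list[tuple[int, int]]) -> list[float]:
    """Assumed TTGR reading: new types per token between consecutive samples."""
    rates = []
    prev_tokens, prev_types = 0, 0
    for n_tokens, n_types in curve:
        rates.append((n_types - prev_types) / (n_tokens - prev_tokens))
        prev_tokens, prev_types = n_tokens, n_types
    return rates


if __name__ == "__main__":
    toks = tokenize("a b a c b d")
    print(ttr(toks), mattr(toks, window=3), growth_rates(type_growth(toks, step=2)))
```

For a morphologically rich language such as Malayalam, where one lemma surfaces as many inflected and compounded forms, one would expect the growth curve to keep rising steeply and TTR and MATTR to stay comparatively high. The sliding-window update keeps MATTR linear in corpus size, which matters for a corpus of millions of tokens.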
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Manohar, K., Jayan, A.R., Rajan, R. (2020). Quantitative Analysis of the Morphological Complexity of Malayalam Language. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284. Springer, Cham. https://doi.org/10.1007/978-3-030-58323-1_7
DOI: https://doi.org/10.1007/978-3-030-58323-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58322-4
Online ISBN: 978-3-030-58323-1
eBook Packages: Computer Science, Computer Science (R0)