Abstract
Native language acquisition is one of the first processes undertaken by the human brain in infancy. Linguists have long been interested in identifying the method the human brain adopts to acquire a native language. Word segmentation is one of the most important tasks in acquiring a language. Statistical learning has been employed as one of the earliest strategies to mimic the way an infant learns to segment a large number of different words. It is desirable that language learnability theories be universal and work on most, if not all, languages. In the present work, we analyze the learnability of Hindi, the most popular Indian language, using ideal (universal) and constrained Bayesian learner models. We analyze the learnability of the language using unigram and bigram approaches, considering words, syllables, and phonemes as the smallest unit of the language. We demonstrate that Bayesian inference is indeed a viable cross-linguistic strategy and that it works well for Hindi too.



Acknowledgements
This research is partially supported by project SMDP-C2SD-ERP-1000110086 of the Department of Electronics and Information Technology, Ministry of Communication & IT, Government of India, at Malaviya National Institute of Technology (MNIT), Jaipur. We thank MNIT’s computer labs for help in setting up the experiments and LNMIIT’s GPU services for the simulations used to obtain the results.
Ethics declarations
Funding
The research is not funded by any external project or agency other than LNMIIT Jaipur and MNIT Jaipur, India. It is partially supported by project SMDP-C2SD-ERP-1000110086 of the Department of Electronics and Information Technology, Ministry of Communication & IT, Government of India, at Malaviya National Institute of Technology (MNIT), Jaipur.
Conflict of Interest
The work is supported only by the LNM Institute of Information Technology, Jaipur, and Malaviya National Institute of Technology (MNIT), Jaipur, India. The details are given in the Funding section. We have no conflict of interest to disclose.
About this article
Cite this article
Saini, S., Sahula, V. Language Learnability Analysis of Hindi: A Comparison with Ideal and Constrained Learning Approaches. J Psycholinguist Res 48, 947–960 (2019). https://doi.org/10.1007/s10936-019-09641-2
DOI: https://doi.org/10.1007/s10936-019-09641-2

