Language Learnability Analysis of Hindi: A Comparison with Ideal and Constrained Learning Approaches

Journal of Psycholinguistic Research

Abstract

Native language acquisition is one of the first processes the human brain undertakes in infancy. Linguists have long been interested in identifying the method by which the brain acquires a native language. Word segmentation is one of the most important tasks in acquiring a language. Statistical learning has been proposed as one of the earliest strategies that infants can use to segment words. It is desirable that theories of language learnability be universal and work for most, if not all, languages. In the present work, we analyze the learnability of Hindi, the most widely spoken Indian language, using ideal (universal) and constrained Bayesian learner models. We analyze learnability using unigram and bigram approaches, taking words, syllables, and phonemes as the smallest unit of the language. We demonstrate that Bayesian inference is indeed a viable cross-linguistic strategy and works well for Hindi too.
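To make the idea of scoring a segmentation with a unigram Bayesian model more concrete, the following is a minimal sketch. It assumes a Chinese-restaurant-process (Dirichlet process) unigram model with syllables as the basic unit, which is a common formulation in Bayesian word segmentation but is not confirmed by the abstract as the authors' exact model. The function names, the parameters `alpha` and `p_boundary`, and the romanized toy syllables are all hypothetical, and utterance-boundary probabilities are left out for brevity.

```python
import math
from collections import Counter


def base_log_prob(word, n_units, p_boundary=0.5):
    """Log of the base probability P0(word) for a word given as a tuple of units.

    Assumes units are drawn uniformly from n_units types and word length is
    geometric with stopping probability p_boundary (both are assumptions).
    """
    k = len(word)
    return (
        (k - 1) * math.log(1.0 - p_boundary)
        + math.log(p_boundary)
        - k * math.log(n_units)
    )


def unigram_log_prob(words, n_units, alpha=1.0, p_boundary=0.5):
    """Log probability of a word sequence under a CRP unigram model."""
    counts = Counter()
    total = 0.0
    for i, word in enumerate(words):
        p0 = math.exp(base_log_prob(word, n_units, p_boundary))
        # Predictive probability: reuse a seen word or generate it from P0.
        total += math.log((counts[word] + alpha * p0) / (i + alpha))
        counts[word] += 1
    return total


# Hypothetical toy data: romanized syllables, not taken from the paper's corpus.
segmented = [("me", "ra"), ("ghar",), ("me", "ra"), ("naam",)]
unsegmented = [("me", "ra", "ghar"), ("me", "ra", "naam")]
n_units = 4  # distinct syllable types in the toy data

print(unigram_log_prob(segmented, n_units))
print(unigram_log_prob(unsegmented, n_units))
```

With these toy inputs, the segmentation that reuses the word "me ra" receives a higher score than treating each utterance as a single word, which is the kind of preference such a learner relies on. A bigram variant would condition each word on the preceding word instead of on overall word counts.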

Acknowledgements

This research is partially supported by the project SMDP-C2SD-ERP-1000110086, Department of Electronics and Information Technology, Ministry of Communication & IT, Government of India, at Malaviya National Institute of Technology (MNIT), Jaipur. We thank MNIT's computer labs for help in setting up the experiment and LNMIIT's GPU services for the simulations used to obtain the results.

Author information

Corresponding author

Correspondence to Sandeep Saini.

Ethics declarations

Funding

The research is not funded by any external project/agency other than the LNMIIT Jaipur and MNIT Jaipur, India. This research is partially supported by the project under SMDP-C2SD-ERP-1000110086, Department of Electronics and Information Technology, Ministry of Communication & IT, Government of India at Malaviya National Institute of Technology (MNIT), Jaipur.

Conflict of Interest

The work is supported only by the LNM Institute of Information Technology, Jaipur, and Malaviya National Institute of Technology (MNIT), Jaipur, India. Details are given in the Funding section. We have no conflict of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Saini, S., Sahula, V. Language Learnability Analysis of Hindi: A Comparison with Ideal and Constrained Learning Approaches. J Psycholinguist Res 48, 947–960 (2019). https://doi.org/10.1007/s10936-019-09641-2

  • DOI: https://doi.org/10.1007/s10936-019-09641-2
