Application of Linguistic Knowledge in Factored Language Modeling for Hindi Language

  • Arun R. Babhulgaonkar
  • Shefali P. Sonavane
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1025)

Abstract

A language model estimates which words are more or less likely to be generated in a conversation in a natural language. N-gram language modeling is the pioneering technique for constructing language models; it considers only the preceding words to predict the upcoming word. Factored language modeling is a formalism that also takes into account other linguistic knowledge about each word, such as gender, number, part of speech, and stem, along with the word itself, to predict the next word in a sentence. This paper discusses the effect of various combinations of linguistic features of words on the predictability of the next word in a Hindi sentence. It also discusses how the use of linguistic features decreases perplexity by 31.71% compared with the perplexity of a baseline N-gram language model.
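To make the terms in the abstract concrete, the following is a minimal sketch of a baseline bigram model and its perplexity, plus a helper for the relative perplexity reduction. It is an illustration only and not the paper's experimental setup: the function names, the add-one smoothing, and the toy tokens are all assumptions introduced here. A factored model would condition on additional factors (for example part of speech or stem) rather than on the previous word alone.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences (hypothetical helper)."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])                  # every token that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))  # (previous word, word) pairs
    # Words the model may predict: all training words plus the end marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed P(word | prev); the smoothing choice is an assumption."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def perplexity(sentences, contexts, bigrams, vocab):
    """Perplexity = 2 ** (negative average log2 probability per predicted token)."""
    log_sum, n = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded[:-1], padded[1:]):
            log_sum += math.log2(bigram_prob(prev, word, contexts, bigrams, vocab))
            n += 1
    return 2 ** (-log_sum / n)


def relative_reduction(baseline_ppl, new_ppl):
    """Percentage by which new_ppl is lower than baseline_ppl."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl


# Toy placeholder tokens, not data from the paper.
train = [["a", "b"], ["a", "c"]]
contexts, bigrams, vocab = train_bigram(train)
ppl = perplexity([["a", "b"]], contexts, bigrams, vocab)
```

Lower perplexity means the model is less surprised by the test text. A reported reduction such as 31.71% corresponds to `relative_reduction(baseline_ppl, factored_ppl)` in this notation, where both arguments would come from real evaluations.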

Keywords

N-gram, Factored language model (FLM), Perplexity

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of Computer Science & Engineering, Walchand College of Engineering, Sangli, India
  2. Department of Information Technology, Walchand College of Engineering, Sangli, India
