
Long-Distance Continuous Space Language Modeling for Speech Recognition

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)


Abstract

N-gram language models have long been the most frequently used language models, as they are easy to build and require minimal effort to integrate into different NLP applications. Despite their popularity, n-gram models suffer from several drawbacks, such as poor generalization to words unseen in the training data, limited adaptability to new domains, and a focus on only short-distance word relations. Continuous parameter space LMs were introduced to overcome these problems. In these models, words are treated as vectors of real numbers rather than as discrete entities. As a result, semantic relationships between words can be quantified and integrated into the model, and infrequent words can be modeled using more frequent words that are semantically similar. In this paper we present a long-distance continuous language model based on latent semantic analysis (LSA). In the LSA framework, the word-document co-occurrence matrix, which records how many times a word occurs in a given document, is commonly used; the word-word co-occurrence matrix has also been used in many previous studies. In this research, we introduce a different representation of the text corpus by proposing long-distance word co-occurrence matrices, which capture co-occurrences between words at different distances in the corpus. By applying LSA to these matrices, the vocabulary words are mapped into a continuous vector space, and each word is represented by a continuous vector that preserves word order and position within sentences. We use tied-mixture HMM modeling (TM-HMM) to robustly estimate the LM parameters and word probabilities. Experiments on the Arabic Gigaword corpus show improvements in perplexity and speech recognition results over the conventional n-gram.
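The abstract's core construction can be sketched in a few lines: build one word-word co-occurrence matrix per distance d, stack them so that position information is preserved, and apply a truncated SVD (the LSA step) to obtain continuous word vectors. This is only a minimal illustration of the general technique on a toy corpus, not the paper's actual pipeline; the vocabulary, distances, and rank below are arbitrary choices for the example.

```python
import numpy as np

def distance_cooccurrence(corpus, vocab, d):
    """Count how often word j appears exactly d positions after word i."""
    idx = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        for k in range(len(sent) - d):
            M[idx[sent[k]], idx[sent[k + d]]] += 1
    return M

def lsa_vectors(M, rank):
    """LSA step: truncated SVD; rows of U * S are continuous word vectors."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank] * S[:rank]

# Toy corpus standing in for the training text.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
vocab = sorted({w for s in corpus for w in s})

# Stack the matrices for distances 1..3 side by side, so each word's row
# encodes which words follow it at each distance (order/position preserved).
M = np.hstack([distance_cooccurrence(corpus, vocab, d) for d in (1, 2, 3)])
vectors = lsa_vectors(M, rank=2)  # one low-dimensional vector per word
```

In this sketch, words that tend to be followed by the same words at the same distances (e.g. "cat" and "dog") end up with similar vectors, which is what lets rare words borrow statistics from frequent, semantically similar ones.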




Corresponding author

Correspondence to Mohamed Talaat.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Talaat, M., Abdou, S., Shoman, M. (2015). Long-Distance Continuous Space Language Modeling for Speech Recognition. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_41


  • DOI: https://doi.org/10.1007/978-3-319-18117-2_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18116-5

  • Online ISBN: 978-3-319-18117-2
