
Long-Distance Continuous Space Language Modeling for Speech Recognition

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)


Abstract

N-gram language models have long been the most frequently used language models, as they are easy to build and require minimal effort to integrate into different NLP applications. Despite their popularity, n-gram models suffer from several drawbacks, such as poor generalization to words unseen in the training data, limited adaptability to new domains, and a focus on only short-distance word relations. Continuous parameter space LMs were introduced to overcome these problems. In these models, words are treated as vectors of real numbers rather than as discrete entities. As a result, semantic relationships between words can be quantified and integrated into the model, and infrequent words can be modeled using more frequent words that are semantically similar. In this paper we present a long-distance continuous language model based on latent semantic analysis (LSA). In the LSA framework, the word-document co-occurrence matrix, which records how many times a word occurs in a given document, is commonly used; the word-word co-occurrence matrix has also been used in many previous studies. In this research, we introduce a different representation of the text corpus by proposing long-distance word co-occurrence matrices, which capture co-occurrences between words at different distances in the corpus. By applying LSA to these matrices, the vocabulary words are mapped into a continuous vector space, and each word is represented by a continuous vector that preserves word order and position within sentences. We use tied-mixture HMM modeling (TM-HMM) to robustly estimate the LM parameters and word probabilities. Experiments on the Arabic Gigaword corpus show improvements in perplexity and speech recognition results over the conventional n-gram.
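The abstract's core construction can be sketched in a few lines: build one word-word co-occurrence matrix per distance d, stack them so that position information is preserved, and apply a truncated SVD (the LSA step) to obtain continuous word vectors. This is only a minimal illustration of the general technique on a toy corpus, not the paper's actual pipeline; the vocabulary, distances, and rank below are arbitrary choices for the example.

```python
import numpy as np

def distance_cooccurrence(corpus, vocab, d):
    """Count how often word j appears exactly d positions after word i."""
    idx = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        for k in range(len(sent) - d):
            M[idx[sent[k]], idx[sent[k + d]]] += 1
    return M

def lsa_vectors(M, rank):
    """LSA step: truncated SVD; rows of U * S are continuous word vectors."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank] * S[:rank]

# Toy corpus standing in for the training text.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
vocab = sorted({w for s in corpus for w in s})

# Stack the matrices for distances 1..3 side by side, so each word's row
# encodes which words follow it at each distance (order/position preserved).
M = np.hstack([distance_cooccurrence(corpus, vocab, d) for d in (1, 2, 3)])
vectors = lsa_vectors(M, rank=2)  # one low-dimensional vector per word
```

In this sketch, words that tend to be followed by the same words at the same distances (e.g. "cat" and "dog") end up with similar vectors, which is what lets rare words borrow statistics from frequent, semantically similar ones.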




Corresponding author

Correspondence to Mohamed Talaat.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Talaat, M., Abdou, S., Shoman, M. (2015). Long-Distance Continuous Space Language Modeling for Speech Recognition. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_41


  • DOI: https://doi.org/10.1007/978-3-319-18117-2_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18116-5

  • Online ISBN: 978-3-319-18117-2
