Abstract
N-gram language models have long been the most frequently used language models, as they are easy to build and require minimal effort to integrate into different NLP applications. Despite their popularity, n-gram models suffer from several drawbacks: they generalize poorly to words unseen in the training data, adapt poorly to new domains, and capture only short-distance word relations. To overcome these problems, continuous parameter space LMs were introduced. In these models, words are treated as vectors of real numbers rather than discrete entities. As a result, semantic relationships between words can be quantified and integrated into the model, and infrequent words can be modeled using more frequent, semantically similar ones. In this paper we present a long-distance continuous language model based on latent semantic analysis (LSA). In the LSA framework, the word-document co-occurrence matrix is commonly used to record how many times each word occurs in a given document; word-word co-occurrence matrices have also been used in many previous studies. In this research, we introduce a different representation of the text corpus by proposing long-distance word co-occurrence matrices, which capture co-occurrences between words at various distances in the corpus. By applying LSA to these matrices, the words of the vocabulary are mapped into a continuous vector space: each word is represented by a continuous vector that preserves word order and position in the sentence. We use tied-mixture HMM modeling (TM-HMM) to robustly estimate the LM parameters and word probabilities. Experiments on the Arabic Gigaword corpus show improvements in perplexity and speech recognition results compared to the conventional n-gram model.
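The core construction described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it builds one word-word co-occurrence matrix per separation distance (the proposed long-distance matrices) over a tiny toy corpus, then applies truncated SVD, the standard LSA step, to project each vocabulary word into a low-dimensional continuous space. The function names, the toy corpus, and the choice of embedding dimension are all illustrative assumptions.

```python
import numpy as np

def cooccurrence_matrix(sentences, vocab, distance):
    """Count, for each ordered word pair (w1, w2), how often w2 occurs
    exactly `distance` positions after w1 in the corpus."""
    idx = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for i in range(len(sent) - distance):
            w1, w2 = sent[i], sent[i + distance]
            if w1 in idx and w2 in idx:
                C[idx[w1], idx[w2]] += 1
    return C

def lsa_embed(C, k):
    """Project the rows of a co-occurrence matrix into a k-dimensional
    continuous space via truncated SVD (the LSA step)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :k] * s[:k]  # one k-dimensional vector per vocabulary word

# Toy corpus (illustrative only)
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]
vocab = ["the", "cat", "dog", "sat", "on", "mat", "rug"]

# One matrix per separation distance d = 1..3 (the "long-distance" matrices)
mats = {d: cooccurrence_matrix(sentences, vocab, d) for d in (1, 2, 3)}

# 2-dimensional continuous word vectors from the distance-1 matrix
vecs = lsa_embed(mats[1], k=2)
print(vecs.shape)  # (7, 2)
```

Because a separate matrix is kept for each distance, the resulting vectors retain positional information that a single bag-of-words word-document matrix would discard, which is what distinguishes this representation from classical LSA.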
© 2015 Springer International Publishing Switzerland
Cite this paper
Talaat, M., Abdou, S., Shoman, M. (2015). Long-Distance Continuous Space Language Modeling for Speech Recognition. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science(), vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_41
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2