Impact of the Approaches Involved on Word-Graph Derivation from the ASR System
Finding the most likely sequence of symbols given a sequence of observations is a classical pattern recognition problem. It is frequently approached by means of the Viterbi algorithm, which finds the most likely sequence of states within a trellis given a sequence of observations. The Viterbi algorithm is widely used in automatic speech recognition (ASR) to find the expected sequence of words given the acoustic utterance, even though it provides a suboptimal result. Word-graphs (WGs) are also frequently provided as the ASR output as a means of obtaining alternative hypotheses, hopefully more accurate than the one provided by the Viterbi algorithm. The trouble is that WGs can grow to a computationally prohibitive size. The aim of this work is to fully describe a specific, computationally affordable method for obtaining a WG given the input utterance. The paper focuses on the underlying approaches and their influence on both the spatial cost and the performance.
Keywords: lattice, word-graphs, automatic speech recognition
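As an illustration of the trellis search mentioned in the abstract, the following is a minimal sketch of the textbook Viterbi algorithm in the log domain. It is not the method described in the paper: the function name `viterbi`, the two-state toy model (`"A"`, `"B"`), the observation symbols (`"x"`, `"y"`, `"z"`) and all probabilities are hypothetical and chosen only to keep the example small.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # trellis[t][s]: best log-probability of any path ending in state s at time t
    trellis = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # backptr[t][s]: predecessor of s on that best path
    backptr = [{}]
    for t in range(1, len(observations)):
        trellis.append({})
        backptr.append({})
        for s in states:
            # Pick the predecessor that maximizes the accumulated score.
            prev, score = max(
                ((p, trellis[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            trellis[t][s] = score + log_emit[s][observations[t]]
            backptr[t][s] = prev
    # Backtrack from the best final state.
    last = max(states, key=lambda s: trellis[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return trellis[-1][last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical toy model (assumed values, not taken from the paper).
states = ["A", "B"]
log_start = to_log({"A": 0.6, "B": 0.4})
log_trans = to_log({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}})
log_emit = to_log({
    "A": {"x": 0.5, "y": 0.4, "z": 0.1},
    "B": {"x": 0.1, "y": 0.3, "z": 0.6},
})

score, path = viterbi(["x", "y", "z"], states, log_start, log_trans, log_emit)
print(path, math.exp(score))
```

The search keeps only one predecessor per state and time step, so it returns a single best state path. A word-graph, by contrast, retains several competing hypotheses, which is where the spatial cost discussed in the abstract comes from.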