Finding the most likely sequence of symbols given a sequence of observations is a classical pattern recognition problem. It is frequently approached by means of the Viterbi algorithm, which finds the most likely sequence of states within a trellis given a sequence of observations. The Viterbi algorithm is widely used in automatic speech recognition (ASR) to find the expected sequence of words given the acoustic utterance, even though it provides a suboptimal result. Word-graphs (WGs) are also frequently provided as the ASR output as a means of obtaining alternative hypotheses, hopefully more accurate than the one provided by the Viterbi algorithm. The trouble is that WGs can grow in a computationally very inefficient manner. The aim of this work is to fully describe a specific, computationally affordable method for obtaining a WG given the input utterance. The paper focuses on the underlying approaches and their influence on both the spatial cost and the performance.
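As an illustration of the trellis search mentioned in the abstract, the following is a minimal sketch of the textbook Viterbi recursion for a discrete hidden Markov model. It is not the decoder used in the paper: the function names, the toy states, and all probabilities are hypothetical and chosen only to keep the example small.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    All model parameters are log-probabilities stored in plain dicts.
    """
    # trellis[t][s]: best log-probability of any path ending in state s at time t
    trellis = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(observations)):
        column, pointers = {}, {}
        for s in states:
            # Pick the predecessor state that maximises the path score into s
            prev, score = max(
                ((p, trellis[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            column[s] = score + log_emit[s][observations[t]]
            pointers[s] = prev
        trellis.append(column)
        backptr.append(pointers)
    # Trace back from the best final state to recover the single best path
    last = max(states, key=lambda s: trellis[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return trellis[-1][last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model (not taken from the paper)
states = ["Healthy", "Fever"]
start = {"Healthy": 0.6, "Fever": 0.4}
trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

best_log_prob, best_path = viterbi(
    ["normal", "cold", "dizzy"], states,
    to_log(start), to_log(trans), to_log(emit),
)
print(best_path, math.exp(best_log_prob))
```

On this toy model the best path is Healthy, Healthy, Fever, with probability of about 0.01512. The sketch keeps only one back-pointer per trellis node, so every competing hypothesis is discarded; a word-graph, by contrast, retains alternative paths, which is where the growth in spatial cost discussed in the abstract comes from.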
- automatic speech recognition
This work has been partially funded by the Spanish Ministry of Science and Innovation under the Consolider Ingenio 2010 programme (MIPRCV CSD2007-00018) and SD-TEAM project (TIN2008-06856-C05-01); and by the Basque Government (under grant GIC10/158 IT375-10).
© 2011 Springer-Verlag Berlin Heidelberg
Justo, R., Pérez, A., Torres, M.I. (2011). Impact of the Approaches Involved on Word-Graph Derivation from the ASR System. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds) Pattern Recognition and Image Analysis. IbPRIA 2011. Lecture Notes in Computer Science, vol 6669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21257-4_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21256-7
Online ISBN: 978-3-642-21257-4