Impact of the Approaches Involved on Word-Graph Derivation from the ASR System

  • Raquel Justo
  • Alicia Pérez
  • M. Inés Torres
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6669)

Abstract

Finding the most likely sequence of symbols given a sequence of observations is a classical pattern recognition problem. It is frequently approached by means of the Viterbi algorithm, which finds the most likely sequence of states within a trellis given a sequence of observations. The Viterbi algorithm is widely used within the automatic speech recognition (ASR) framework to find the expected sequence of words given the acoustic utterance, even though it provides a suboptimal result. Word-graphs (WGs) are also frequently provided as the ASR output as a means of obtaining alternative hypotheses, hopefully more accurate than the one provided by the Viterbi algorithm. The trouble is that WGs can grow in a computationally very inefficient manner. The aim of this work is to fully describe a specific, computationally affordable method for obtaining a WG from the input utterance. The paper focuses on the underlying approaches and their influence on both the spatial cost and the performance.
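
As an illustration of the trellis search mentioned in the abstract, the following is a minimal sketch of the textbook Viterbi recursion for a discrete hidden Markov model. It is not the word-graph derivation method of the paper, and it is far simpler than an ASR decoder. The function name `viterbi`, the helper `to_log`, and the two-state toy model with its probabilities are all assumptions introduced here for illustration only.

```python
import math


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM."""
    # trellis[t][s]: best log-probability of any path ending in state s at time t
    trellis = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(observations)):
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at time t
            prev = max(states, key=lambda p: trellis[t - 1][p] + log_trans[p][s])
            scores[s] = (trellis[t - 1][prev] + log_trans[prev][s]
                         + log_emit[s][observations[t]])
            pointers[s] = prev
        trellis.append(scores)
        backptr.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: trellis[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return trellis[-1][last], path


# Hypothetical toy model (values chosen for illustration, not taken from the paper)
toy_states = ["Healthy", "Fever"]
toy_start = {s: math.log(p) for s, p in {"Healthy": 0.6, "Fever": 0.4}.items()}
toy_trans = to_log({"Healthy": {"Healthy": 0.7, "Fever": 0.3},
                    "Fever": {"Healthy": 0.4, "Fever": 0.6}})
toy_emit = to_log({"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
                   "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}})

best_score, best_path = viterbi(["normal", "cold", "dizzy"],
                                toy_states, toy_start, toy_trans, toy_emit)
```

The sketch keeps a single back-pointer per trellis node, so only one hypothesis survives at each state. A word-graph, by contrast, retains alternative hypotheses, which is why its size becomes a concern.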

Keywords

Lattice; word-graphs; automatic speech recognition



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Raquel Justo (1)
  • Alicia Pérez (1)
  • M. Inés Torres (1)
  1. University of the Basque Country, Leioa, Spain
