
Juicer: A Weighted Finite-State Transducer Speech Decoder

  • Darren Moore
  • John Dines
  • Mathew Magimai Doss
  • Jithendra Vepa
  • Octavian Cheng
  • Thomas Hain
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4299)

Abstract

A major component in the development of any speech recognition system is the decoder. As task complexities and, consequently, system complexities have continued to increase, the decoding problem has become an increasingly significant part of the overall speech recognition system development effort, with efficient decoder design contributing significantly to improving the trade-off between decoding time and search errors. In this paper we present “Juicer” (from transducer), a large vocabulary continuous speech recognition (LVCSR) decoder based on weighted finite-state transducers (WFSTs). We begin with a discussion of the need for open source, state-of-the-art decoding software in LVCSR research and how this led to the development of Juicer, followed by a brief overview of decoding techniques and major issues in decoder design. We present Juicer and its major features, emphasising its potential not only as a critical component in the development of LVCSR systems, but also as an important research tool in itself, being based around the flexible WFST paradigm. We also provide results of benchmarking tests that have been carried out to date, demonstrating that in many respects Juicer, while still in its early development, is already achieving state-of-the-art performance. These benchmarking tests not only demonstrate the utility of Juicer in its present state, but are also being used to guide future development. We therefore conclude with a brief discussion of some of the extensions that are currently under way or being considered for Juicer.

Keywords

  • Speech Recognition
  • Language Model
  • Automatic Speech Recognition
  • Knowledge Source
  • Decoder Architecture
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Darren Moore (1)
  • John Dines (1)
  • Mathew Magimai Doss (1)
  • Jithendra Vepa (1)
  • Octavian Cheng (1)
  • Thomas Hain (2)

  1. IDIAP Research Institute and Ecole Polytechnique Federale de Lausanne (EPFL), Martigny, Switzerland
  2. Department of Computer Science, University of Sheffield, UK
