Multiple-Pass Search Strategies

  • Richard Schwartz
  • Long Nguyen
  • John Makhoul
Part of The Kluwer International Series in Engineering and Computer Science book series (SECS, volume 355)

Abstract

Large-vocabulary speech recognition is computationally very expensive. We explore multi-pass search strategies as a way to reduce computation substantially, without any increase in error rate. We consider two basic strategies: the N-best Paradigm and the Forward-Backward Search. Both of these strategies operate on the entire sentence in (at least) two passes. The N-best Paradigm computes alternative hypotheses for a sentence, which can later be rescored using more detailed and more expensive knowledge sources. We present and compare many algorithms for finding the N-best sentence hypotheses, and suggest which are the most efficient and accurate. The Forward-Backward Search performs a time-synchronous forward search that finds all of the words that are likely to end at each frame within an utterance. Then a second, more expensive search can be performed in the backward direction, restricting its attention to those words found in the forward pass.
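
As a rough illustration of the rescoring step in the N-best Paradigm, the sketch below re-ranks a short list of first-pass hypotheses using a second, more expensive score. It is a minimal sketch and not the chapter's algorithm: the names `rescore_nbest` and `toy_knowledge_source`, the `weight` parameter, and all of the scores are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# A hypothesis is (sentence, first-pass log score). Higher is better.
Hypothesis = Tuple[str, float]


def rescore_nbest(
    nbest: List[Hypothesis],
    expensive_score: Callable[[str], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Add a weighted second-pass score to each hypothesis and re-rank.

    The expensive scorer runs only on the N hypotheses kept by the
    first pass, not on the full search space.
    """
    rescored = [
        (sentence, first_pass + weight * expensive_score(sentence))
        for sentence, first_pass in nbest
    ]
    return sorted(rescored, key=lambda hyp: hyp[1], reverse=True)


def toy_knowledge_source(sentence: str) -> float:
    """Hypothetical stand-in for a detailed, costly knowledge source."""
    return 0.0 if "recognize speech" in sentence else -5.0


# Made-up first-pass output: the cheap models rank the wrong sentence first.
nbest = [
    ("wreck a nice beach", -10.0),
    ("recognize speech", -11.0),
    ("wreck an ice peach", -12.5),
]

best_sentence, best_score = rescore_nbest(nbest, toy_knowledge_source)[0]
print(best_sentence, best_score)
```

In this toy case the second-pass score changes the ranking. The sketch only shows where the saving comes from: the costly scorer is applied to N sentences instead of to every path considered during search.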

Keywords

Acoustics 

Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Richard Schwartz (1)
  • Long Nguyen (1)
  • John Makhoul (1)
  1. BBN Corporation, Cambridge, USA
