Word-Spotting and Rejection

  • Jean-Claude Junqua
  • Jean-Paul Haton
Part of The Kluwer International Series in Engineering and Computer Science book series (SECS, volume 341)

Summary

This chapter presents the state of the art in word-spotting and rejection methods. After an introduction to word-spotting, the available algorithms are classified into several categories. This is followed by a description of template-matching word-spotting systems, garbage modeling, and the use of a large-vocabulary recognizer in a word-spotting task. We then address vocabulary-independent word-spotting, performance measures, and rejection. The rejection problem is tied to the notion of a confidence measure, which indicates how well a hypothesis matches the recognized result. Consequently, confidence measures and the related problem of detecting out-of-vocabulary words are also considered.

Keywords

False Alarm; Speech Recognition; Confidence Measure; Speech Recognition System; Unknown Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Jean-Claude Junqua (1)
  • Jean-Paul Haton (2)
  1. Speech Technology Laboratory, USA
  2. CRIN - INRIA, France
