International Journal of Speech Technology, Volume 5, Issue 1, pp 9–22

Phonetic Searching vs. LVCSR: How to Find What You Really Want in Audio Archives

  • Peter S. Cardillo
  • Mark Clements
  • Michael S. Miller
Abstract

A new technique is presented for searching digital audio at the word/phrase level. Unlike previous methods based upon Large Vocabulary Continuous Speech Recognition (LVCSR, with inherent problems of closed vocabulary and high word error rate), phonetic searching combines high speed and accuracy, supports open vocabulary, imposes low penalty for new words, permits phonetic and inexact spelling, enables user-determined depth of search, and is amenable to parallel execution for highly scalable deployment. A detailed comparison of accuracy between phonetic searching and one popular embodiment of LVCSR is presented along with other operating characteristics of the new technique. The current implementation for Digital Media Asset Management (DMAM) is described along with suggested applications in other domains.
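
The abstract does not say how the search is carried out. As a rough illustration of why searching at the phoneme level can tolerate new words and inexact spellings, the following minimal sketch scans a phoneme track for a query phoneme string, allowing a bounded number of phoneme errors. Everything in it is an assumption made for illustration: the function name `find_phonetic_matches`, the ARPAbet-style symbols, the toy track, and the edit-distance scoring (Sellers' approximate substring matching). It is not the authors' algorithm.

```python
# Toy illustration only: approximate search for a phoneme query inside a
# phoneme track. The symbols, the track, and the edit-distance scoring are
# assumptions for this sketch, not the method described in the paper.

def find_phonetic_matches(track, query, max_errors):
    """Return (end_index, errors) for every position in `track` where
    `query` matches a substring ending there with at most `max_errors`
    phoneme insertions, deletions, or substitutions."""
    # prev[j] = fewest edits needed to match query[:j] against some
    # substring of the track ending at the previous position.
    prev = list(range(len(query) + 1))
    hits = []
    for i, phone in enumerate(track):
        curr = [0]  # an empty query prefix matches anywhere at zero cost
        for j, q in enumerate(query, start=1):
            cost = 0 if phone == q else 1
            curr.append(min(prev[j - 1] + cost,  # match or substitution
                            prev[j] + 1,         # extra phoneme in the track
                            curr[j - 1] + 1))    # query phoneme missing
        if curr[-1] <= max_errors:
            hits.append((i, curr[-1]))
        prev = curr
    return hits


# Hypothetical phoneme track for "the cat sat" and a query for "cat".
track = "DH AH K AE T S AE T".split()
query = "K AE T".split()

exact_hits = find_phonetic_matches(track, query, max_errors=0)
loose_hits = find_phonetic_matches(track, query, max_errors=1)
```

Because the query is just a phoneme string, no word list is consulted, so a term missing from a recognizer's vocabulary can still be searched for. In this sketch, raising `max_errors` returns more candidate positions at the cost of more false alarms; that knob belongs to the sketch and is not taken from the paper.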

Keywords: phonetic searching; large vocabulary continuous speech recognition (LVCSR); digital media asset management (DMAM)

Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Peter S. Cardillo (1)
  • Mark Clements (1)
  • Michael S. Miller (1)
  1. Fast-Talk Communications, Inc., Atlanta, USA