Investigating the Global Semantic Impact of Speech Recognition Error on Spoken Content Collections

  • Martha Larson
  • Manos Tsagkias
  • Jiyin He
  • Maarten de Rijke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5478)

Abstract

Errors in speech recognition transcripts have a negative impact on the effectiveness of content-based speech retrieval and present a particular challenge for collections containing conversational spoken content. We propose a Global Semantic Distortion (GSD) metric that measures the collection-wide impact of speech recognition error on spoken content retrieval in a query-independent manner. We deploy our metric to examine the effects of speech recognition substitution errors. First, we investigate frequent substitutions, cases in which the recognizer habitually mis-transcribes one word as another. Although habitual mistakes have a large global impact, the long tail of rare substitutions has a more damaging effect. Second, we investigate semantically similar substitutions, cases in which the word spoken and the word recognized do not diverge radically in meaning. Similar substitutions are shown to have slightly less global impact than semantically dissimilar substitutions.



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Martha Larson (1)
  • Manos Tsagkias (2)
  • Jiyin He (2)
  • Maarten de Rijke (2)
  1. Information and Communication Theory Group, EEMCS, Delft University of Technology, Delft, The Netherlands
  2. ISLA, University of Amsterdam, Amsterdam, The Netherlands
