Search of Spoken Documents Retrieves Well Recognized Transcripts

  • Conference paper
Advances in Information Retrieval (ECIR 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

This paper presents a series of analyses and experiments on spoken document retrieval systems: search engines that retrieve transcripts produced by speech recognizers. Results show that transcripts that match queries well tend to be recognized more accurately than transcripts that match a query less well. This result has been described in past literature; however, until now no study or explanation of the effect has been provided. This paper provides such an analysis, showing a relationship between word error rate and query length. The paper extends past research by increasing the number of recognition systems tested and by showing the effect in an operational speech retrieval system. Potential future lines of enquiry are also described.
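Word error rate (WER), the recognition-accuracy measure the abstract relates to query length, is conventionally the number of word substitutions (S), deletions (D) and insertions (I) needed to turn a recognizer's output into a reference transcript, divided by the number of reference words (N): WER = (S + D + I) / N. The sketch below computes it with a standard Levenshtein alignment over words. The function name `word_error_rate` and the plain whitespace tokenization are illustrative assumptions, not the paper's own scoring setup; real evaluations usually normalize text (case, punctuation, numbers) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N for a hypothesis transcript against a reference.

    Assumption for illustration: words are whitespace-separated tokens and
    no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr

    # Minimum number of edits divided by reference length.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` returns 1/6 (about 0.167), since one of six reference words was deleted. Because insertions are counted against the reference length, WER can exceed 1.0 when a recognizer outputs many spurious words.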



Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Sanderson, M., Shou, X.M. (2007). Search of Spoken Documents Retrieves Well Recognized Transcripts. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_45

  • DOI: https://doi.org/10.1007/978-3-540-71496-5_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science, Computer Science (R0)