Abstract
This paper presents four novel techniques for open-vocabulary spoken document retrieval: a method to detect slots that possibly contain a query feature; a method to estimate occurrence probabilities; a technique that we call collection-wide probability re-estimation; and a weighting scheme that takes advantage of the fact that long query features are detected more reliably. These four techniques were evaluated on the TREC-6 spoken document retrieval test collection to determine how much they improve retrieval effectiveness over a baseline retrieval method. The results show that retrieval effectiveness can be improved considerably despite the large number of speech recognition errors.
Cite this article
Wechsler, M., Munteanu, E. & Schäuble, P. New Approaches to Spoken Document Retrieval. Information Retrieval 3, 173–188 (2000). https://doi.org/10.1023/A:1026512724855