Information Retrieval from Unsegmented Broadcast News Audio

Johnson, Sue E.; Jourlin, Pierre; Spärck Jones, Karen; Woodland, Philip C.

doi:10.1023/A:1011312708732

Published: July 2001

International Journal of Speech Technology, Volume 4, pages 251–268 (2001)

Sue E. Johnson¹, Pierre Jourlin², Karen Spärck Jones² & Philip C. Woodland³

Abstract

This paper describes a system for retrieving relevant portions of broadcast news shows starting with only the audio data. A novel method of automatically detecting and removing commercials is presented and shown to increase the performance of the system while also reducing the computational effort required. A sophisticated large vocabulary speech recogniser which produces high-quality transcriptions of the audio and a window-based retrieval system with post-retrieval merging are also described.

Results are presented using the 1999 TREC-8 Spoken Document Retrieval data for the task where no story boundaries are known. Experiments investigating the effectiveness of all aspects of the system are described, and the relative benefits of automatically eliminating commercials, enforcing broadcast structure during retrieval, using relevance feedback, changing retrieval parameters and merging during post-processing are shown.

An Average Precision of 46.8%, when duplicates are scored as irrelevant, is shown to be achievable using this system, with the corresponding word error rate of the recogniser being 20.5%.
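
The scoring convention mentioned above, where duplicates are scored as irrelevant, can be illustrated with a minimal sketch. It assumes each retrieved item has already been mapped to the ID of the reference story it falls in; that mapping, the function name average_precision and the story IDs are hypothetical and are not taken from the paper. Only the first hit on a relevant story is credited, and any later hit on the same story is treated as irrelevant.

```python
from typing import Hashable, Sequence, Set


def average_precision(
    ranked_story_ids: Sequence[Hashable],
    relevant_ids: Set[Hashable],
) -> float:
    """Non-interpolated average precision for a single query.

    ranked_story_ids: story ID for each retrieved item, best first
                      (hypothetical labelling, assumed to be given).
    relevant_ids:     IDs of the stories judged relevant to the query.
    """
    if not relevant_ids:
        return 0.0

    seen = set()          # stories already returned higher in the ranking
    hits = 0              # relevant stories credited so far
    precision_sum = 0.0   # sum of precision values at each credited hit

    for rank, story_id in enumerate(ranked_story_ids, start=1):
        is_duplicate = story_id in seen
        seen.add(story_id)
        # A repeated story still occupies a rank but earns no credit.
        if not is_duplicate and story_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank

    return precision_sum / len(relevant_ids)


# Two relevant stories; the second retrieved item repeats story "s1".
with_duplicate = average_precision(["s1", "s1", "s2", "s3"], {"s1", "s3"})
# The same ranking after the duplicate has been merged away.
merged = average_precision(["s1", "s2", "s3"], {"s1", "s3"})
```

In this toy ranking the duplicate pushes the second relevant story from rank 3 to rank 4, so the score with the duplicate is lower than the score after merging. This is one plausible reason why merging overlapping windows after retrieval could help under this scoring rule.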

Author information

Authors and Affiliations

Engineering Department, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK
Sue E. Johnson
Computer Laboratory, University of Cambridge, Pembroke Street, Cambridge, CB2 3QG, UK
Pierre Jourlin & Karen Spärck Jones
Engineering Department, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK
Philip C. Woodland

About this article

Cite this article

Johnson, S.E., Jourlin, P., Jones, K.S. et al. Information Retrieval from Unsegmented Broadcast News Audio. International Journal of Speech Technology 4, 251–268 (2001). https://doi.org/10.1023/A:1011312708732

Issue Date: July 2001
DOI: https://doi.org/10.1023/A:1011312708732
