Information Retrieval from Unsegmented Broadcast News Audio

International Journal of Speech Technology

Abstract

This paper describes a system for retrieving relevant portions of broadcast news shows starting with only the audio data. A novel method of automatically detecting and removing commercials is presented and shown to increase the performance of the system while also reducing the computational effort required. A sophisticated large vocabulary speech recogniser which produces high-quality transcriptions of the audio and a window-based retrieval system with post-retrieval merging are also described.
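As a rough illustration of what a window-based retrieval system with post-retrieval merging might look like, the sketch below cuts a show into overlapping fixed-length windows and then merges retrieved windows from the same show that overlap in time. The abstract does not give the window length, the shift, the merging rule or the scoring, so the 30-second length, 15-second shift, `max_gap` threshold, keep-the-best-score rule and all names here are assumptions for illustration only, not the authors' settings.

```python
from dataclasses import dataclass


@dataclass
class Window:
    show: str      # identifier of the broadcast the window comes from
    start: float   # start time in seconds
    end: float     # end time in seconds
    score: float = 0.0  # retrieval score (assigned by some retriever, not shown)


def make_windows(show, duration, length=30.0, shift=15.0):
    """Cut a show into overlapping windows (length and shift are assumed values)."""
    windows = []
    t = 0.0
    while t < duration:
        windows.append(Window(show, t, min(t + length, duration)))
        t += shift
    return windows


def merge_retrieved(ranked, max_gap=0.0):
    """Merge retrieved windows from the same show that overlap or lie within
    max_gap seconds of each other, keeping the highest score of each group."""
    merged = []
    for w in sorted(ranked, key=lambda w: (w.show, w.start)):
        last = merged[-1] if merged else None
        if last and last.show == w.show and w.start - last.end <= max_gap:
            last.end = max(last.end, w.end)
            last.score = max(last.score, w.score)
        else:
            merged.append(Window(w.show, w.start, w.end, w.score))
    # Return the merged segments as a ranked list, best score first.
    return sorted(merged, key=lambda w: w.score, reverse=True)
```

Merging in this way means a single news story that spans several windows is returned once rather than several times, which matters when duplicate hits are scored as irrelevant.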

Results are presented using the 1999 TREC-8 Spoken Document Retrieval data for the task where no story boundaries are known. Experiments investigating the effectiveness of all aspects of the system are described, and the relative benefits of automatically eliminating commercials, enforcing broadcast structure during retrieval, using relevance feedback, changing retrieval parameters and merging during post-processing are shown.

An Average Precision of 46.8%, when duplicates are scored as irrelevant, is shown to be achievable using this system, with the corresponding word error rate of the recogniser being 20.5%.
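To make the phrase "duplicates are scored as irrelevant" concrete, the following minimal sketch computes non-interpolated average precision for a single query, where a second hit on an already-retrieved story counts as a miss. It assumes each retrieved item has already been mapped to the identifier of the reference story it falls in (or `None` if it falls outside any story, for example in a commercial). That mapping and the function name are illustrative assumptions, not the official TREC-8 scoring procedure.

```python
def average_precision(ranked_story_ids, relevant_ids):
    """Average precision for one query; repeated hits on a story are irrelevant."""
    seen = set()
    hits = 0
    precision_sum = 0.0
    for rank, story in enumerate(ranked_story_ids, start=1):
        # Only the first retrieval of a relevant story counts as a hit.
        if story in relevant_ids and story not in seen:
            hits += 1
            precision_sum += hits / rank
        seen.add(story)
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0
```

For example, with the ranked list `["s1", "s1", "s2", None, "s3"]` and relevant stories `{"s1", "s3"}`, the second `"s1"` contributes nothing, so the later hit on `"s3"` is credited with precision 2/5 rather than 3/5.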

Cite this article

Johnson, S.E., Jourlin, P., Jones, K.S. et al. Information Retrieval from Unsegmented Broadcast News Audio. International Journal of Speech Technology 4, 251–268 (2001). https://doi.org/10.1023/A:1011312708732

  • DOI: https://doi.org/10.1023/A:1011312708732
