Information Retrieval from Unsegmented Broadcast News Audio

  • Sue E. Johnson
  • Pierre Jourlin
  • Karen Spärck Jones
  • Philip C. Woodland

Abstract

This paper describes a system for retrieving relevant portions of broadcast news shows starting with only the audio data. A novel method of automatically detecting and removing commercials is presented and shown to increase the performance of the system while also reducing the computational effort required. A sophisticated large vocabulary speech recogniser which produces high-quality transcriptions of the audio and a window-based retrieval system with post-retrieval merging are also described.
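
As a rough illustration of window-based retrieval with post-retrieval merging, the sketch below cuts a time-stamped transcript into overlapping windows and then folds overlapping retrieved windows from the same show into a single result. The function names, the 30-second window length, the 15-second shift and the overlap-based merging rule are all assumptions made for this example; the abstract does not state the system's actual parameters or merging criteria.

```python
def make_windows(words, length=30.0, shift=15.0):
    """Cut a transcript into overlapping fixed-length windows.

    words: list of (start_time_in_seconds, token) pairs.
    length and shift are hypothetical values, not the paper's settings.
    Returns a list of (window_start, window_end, tokens) tuples.
    """
    if not words:
        return []
    last_time = max(t for t, _ in words)
    windows = []
    start = 0.0
    while start <= last_time:
        end = start + length
        tokens = [w for t, w in words if start <= t < end]
        if tokens:  # skip windows that contain no recognised words
            windows.append((start, end, tokens))
        start += shift
    return windows


def merge_ranked(ranked, max_gap=0.0):
    """Merge retrieved windows after ranking (assumed merging rule).

    ranked: list of (show_id, start, end) in descending score order.
    A window that overlaps, or lies within max_gap seconds of, a
    higher-ranked result from the same show is folded into that result,
    which keeps its rank and grows to cover both time spans.
    """
    merged = []
    for show, start, end in ranked:
        for i, (m_show, m_start, m_end) in enumerate(merged):
            close = start <= m_end + max_gap and end >= m_start - max_gap
            if show == m_show and close:
                merged[i] = (m_show, min(m_start, start), max(m_end, end))
                break
        else:
            merged.append((show, start, end))
    return merged


if __name__ == "__main__":
    transcript = [(0.0, "a"), (10.0, "b"), (20.0, "c"), (40.0, "d")]
    print(make_windows(transcript))
    print(merge_ranked([("s1", 15, 45), ("s2", 0, 30), ("s1", 0, 30)]))
```

Merging after ranking, rather than before, lets each window be scored independently and leaves the decision about which windows belong to one story until the retrieval evidence is available.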

Results are presented using the 1999 TREC-8 Spoken Document Retrieval data for the task where no story boundaries are known. Experiments investigating the effectiveness of all aspects of the system are described, and the relative benefits of automatically eliminating commercials, enforcing broadcast structure during retrieval, using relevance feedback, changing retrieval parameters and merging during post-processing are shown.

An Average Precision of 46.8%, when duplicates are scored as irrelevant, is shown to be achievable using this system, with the corresponding word error rate of the recogniser being 20.5%.
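
To make the scoring condition concrete, the following is a minimal sketch of non-interpolated average precision in which a repeated hit on an already-retrieved story is scored as irrelevant. It assumes each returned item has already been mapped to the ID of the reference story that contains it; the function name and the toy data are hypothetical, and this is not the official evaluation software.

```python
def average_precision(ranked_story_ids, relevant_story_ids):
    """Average precision with duplicate hits scored as irrelevant.

    ranked_story_ids: story IDs of the returned items, best first
        (assumed to be pre-mapped from retrieved time points).
    relevant_story_ids: set of story IDs judged relevant to the query.
    """
    if not relevant_story_ids:
        return 0.0
    seen = set()
    hits = 0
    precision_sum = 0.0
    for rank, story in enumerate(ranked_story_ids, start=1):
        # Only the first hit on a relevant story counts; later hits on
        # the same story occupy a rank but add nothing.
        if story in relevant_story_ids and story not in seen:
            hits += 1
            precision_sum += hits / rank
        seen.add(story)
    return precision_sum / len(relevant_story_ids)


if __name__ == "__main__":
    relevant = {"a", "b", "c"}
    print(average_precision(["a", "b", "x", "c"], relevant))
    print(average_precision(["a", "a", "b", "x", "c"], relevant))
```

In the second call the duplicate hit on story "a" pushes the remaining relevant stories one rank lower, so the score drops. This is one reason merging overlapping windows before scoring can matter when story boundaries are unknown.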

Keywords: spoken document retrieval, automatic speech recognition, story segmentation, commercial detection, information retrieval

Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Sue E. Johnson (1)
  • Pierre Jourlin (2)
  • Karen Spärck Jones (2)
  • Philip C. Woodland (3)
  1. Engineering Department, University of Cambridge, Cambridge, UK
  2. Computer Laboratory, University of Cambridge, Cambridge, UK
  3. Engineering Department, University of Cambridge, Cambridge, UK
