The Cambridge University Multimedia Document Retrieval Demo System

  • A. Tuerk
  • S.E. Johnson
  • P. Jourlin
  • K. Spärck Jones
  • P.C. Woodland

Abstract

The Cambridge University Multimedia Document Retrieval (CU-MDR) Demo System is a web-based application that allows the user to query a database of radio broadcasts that are available on the Internet. The audio from several radio stations is downloaded and transcribed automatically. This gives a collection of text and audio documents that can be searched by a user. The paper describes how speech recognition and information retrieval techniques are combined in the CU-MDR Demo System and shows how the user can interact with it.

Keywords: information retrieval, spoken document retrieval, speech recognition

Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • A. Tuerk (1)
  • S.E. Johnson (1)
  • P. Jourlin (2)
  • K. Spärck Jones (2)
  • P.C. Woodland (1)

  1. Cambridge University Engineering Department, Cambridge, UK
  2. Cambridge University Computer Laboratory, Cambridge, UK
