
The ISL View4You Broadcast News Transcription System

International Journal of Speech Technology

Abstract

In this paper, we introduce the Interactive Systems Laboratories multimedia data indexing and retrieval system 'View4You'. The main components of the system, namely the segmenter, the speech recognizer and the information retrieval engine, are described in detail.

In the View4You system, public television newscasts are recorded daily. The newscasts are automatically segmented, and an index is created for each segment by means of automatic speech recognition. Users can query the system in natural language, and the system returns a list of segments ranked by their relevance to the query. By selecting a segment, the user can watch the corresponding part of the news show on his or her computer screen.
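The retrieval step described above, ranking automatically transcribed segments against a free-text query, can be sketched as follows. The abstract does not say which ranking function View4You uses; this sketch assumes Okapi BM25 with commonly used default parameters (k1 = 1.2, b = 0.75) and naive whitespace tokenization. The class and method names (`BM25Index`, `search`) and the example transcripts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenization (an assumption: no stemming, no stopwords).
    return text.lower().split()


class BM25Index:
    """Hypothetical Okapi BM25 index over per-segment transcripts."""

    def __init__(self, segments, k1=1.2, b=0.75):
        # segments: mapping of segment id -> (automatic) transcript text
        self.k1, self.b = k1, b
        self.docs = {sid: Counter(tokenize(text)) for sid, text in segments.items()}
        self.lengths = {sid: sum(c.values()) for sid, c in self.docs.items()}
        self.n = len(self.docs)
        self.avgdl = sum(self.lengths.values()) / self.n if self.n else 0.0
        # Document frequency: number of segments containing each term
        self.df = Counter()
        for counts in self.docs.values():
            self.df.update(counts.keys())

    def idf(self, term):
        # Non-negative BM25 idf variant: log(1 + (N - df + 0.5) / (df + 0.5))
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, sid, query_terms):
        counts, dl = self.docs[sid], self.lengths[sid]
        total = 0.0
        for term in query_terms:
            tf = counts.get(term, 0)
            if tf == 0:
                continue  # absent term contributes nothing; tf > 0 implies avgdl > 0 below
            norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total

    def search(self, query):
        # Return (segment id, score) pairs, most relevant first
        terms = tokenize(query)
        scores = {sid: self.score(sid, terms) for sid in self.docs}
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


segments = {
    "seg1": "the chancellor met union leaders in berlin",
    "seg2": "heavy snow closed roads in bavaria",
    "seg3": "union leaders threatened a strike over wages",
}
index = BM25Index(segments)
for sid, s in index.search("union strike"):
    print(sid, round(s, 3))
```

For the query "union strike", `seg3` (matching both terms) ranks first, `seg1` (matching only "union") second, and `seg2` scores zero. In a real system the transcripts would come from the speech recognizer, so recognition errors directly change which terms each segment can match.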

Several end-to-end evaluations on real-world data, using questions from naive users, are described. By substituting each component of the system with a perfect (manually simulated) one, the effect of that component's imperfections on the end-to-end result can be determined. We show that the information retrieval component has the largest impact on system performance, followed by the segmentation. The quality of the speech recognizer is shown to matter comparatively little, as long as its error rate stays below approximately 25%.
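The 25% threshold above refers to the recognizer's error rate. Assuming this is the standard word error rate (substitutions, deletions and insertions divided by the number of reference words), it can be computed with a word-level edit distance, sketched below. The function name `word_error_rate`, the constant `WER_THRESHOLD`, and the example sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


# Approximate threshold quoted in the abstract, below which recognizer quality
# was found to matter comparatively little for end-to-end retrieval.
WER_THRESHOLD = 0.25

wer = word_error_rate("the union called a strike", "the union call a strike today")
print(f"WER = {wer:.0%}, below threshold: {wer < WER_THRESHOLD}")
```

Here one substitution ("called" → "call") plus one insertion ("today") over five reference words gives a WER of 0.4, above the threshold. Because insertions are counted, WER can exceed 1.0 for very poor hypotheses.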


Cite this article

Kemp, T., Weber, M. & Waibel, A. The ISL View4You Broadcast News Transcription System. International Journal of Speech Technology 4, 177–191 (2001). https://doi.org/10.1023/A:1011348306007
