The ISL View4You Broadcast News Transcription System

Kemp, T.; Weber, M.; Waibel, A.

doi:10.1023/A:1011348306007

Published: July 2001

Volume 4, pages 177–191, (2001)

International Journal of Speech Technology

Abstract

In this paper, we introduce the Interactive Systems Laboratories multimedia data indexing and retrieval system 'View4You'. The main components of the system, namely the segmenter, the speech recognizer and the information retrieval engine, are described in detail.

In the View4You system, public television newscasts are recorded on a daily basis. The newscasts are automatically segmented and an index is created for each of the segments by means of automatic speech recognition. The user can query the system in natural language. The system returns a list of segments which is sorted by relevance with respect to the user query. By selecting a segment, the user can watch the corresponding part of the news show on his or her computer screen.
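The final step of this workflow, returning segments sorted by relevance to the query, can be illustrated with a minimal sketch. The abstract does not specify the retrieval model, so the TF-IDF weighting with cosine similarity used here is an assumption chosen for brevity, not the method of the View4You retrieval engine. The function names, segment identifiers, and sample transcripts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace. A real system would also
    # stem words and remove stop words (assumption: omitted here).
    return text.lower().split()


def rank_segments(segments, query):
    """Return (segment_id, score) pairs, best match first.

    segments: dict mapping a segment id to its recognized transcript.
    Scoring is TF-IDF cosine similarity (an illustrative choice).
    """
    docs = {sid: Counter(tokenize(text)) for sid, text in segments.items()}
    n = len(docs)
    # Document frequency: in how many segments each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

    def vector(counts):
        # Terms never seen in any segment are dropped from the query.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vector(Counter(tokenize(query)))
    scores = [(sid, cosine(q, vector(counts))) for sid, counts in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical transcripts standing in for speech recognizer output.
segments = {
    "seg1": "the chancellor met the french president in berlin",
    "seg2": "heavy rain caused flooding in southern germany",
    "seg3": "the football league season ended on saturday",
}
ranking = rank_segments(segments, "flooding in germany")
```

Here `ranking` lists `seg2` first because it shares the most query terms, while `seg3` shares none and scores zero. Because the index is built from recognizer output, recognition errors on query terms lower the score of an otherwise relevant segment.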

Several end-to-end evaluations on real-world data, using questions from naive users, are described. By substituting each component of the system in turn with a perfect (manually simulated) one, the effect of that component's imperfection on the end-to-end result can be determined. We show that the information retrieval component has the largest impact on system performance, followed by the segmentation. The quality of the speech recognizer, as long as its error rate is below approximately 25%, is shown to be of relatively minor importance.
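The abstract states the recognizer threshold as an "error rate" without naming the metric. Assuming it refers to the word error rate commonly used for speech recognizers, the quantity can be computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is illustrative only; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion in 8 reference words.
reference = "the newscasts are recorded on a daily basis"
hypothesis = "the newscast are recorded on daily basis"
wer = word_error_rate(reference, hypothesis)
```

In this example two of the eight reference words are wrong, giving a word error rate of 0.25, which corresponds to the approximate threshold mentioned in the abstract.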

Author information

Authors and Affiliations

ISL Interactive Systems Laboratories, University of Karlsruhe, Am Fasanengarten 5, 76129, Karlsruhe, Germany
T. Kemp, M. Weber & A. Waibel

About this article

Cite this article

Kemp, T., Weber, M. & Waibel, A. The ISL View4You Broadcast News Transcription System. International Journal of Speech Technology 4, 177–191 (2001). https://doi.org/10.1023/A:1011348306007

Issue Date: July 2001
DOI: https://doi.org/10.1023/A:1011348306007
