Abstract
In this paper, we introduce the Interactive Systems Laboratories multimedia data indexing and retrieval system 'View4You'. The main components of the system, namely the segmenter, the speech recognizer and the information retrieval engine, are described in detail.
In the View4You system, public television newscasts are recorded on a daily basis. The newscasts are automatically segmented and an index is created for each of the segments by means of automatic speech recognition. The user can query the system in natural language. The system returns a list of segments which is sorted by relevance with respect to the user query. By selecting a segment, the user can watch the corresponding part of the news show on his or her computer screen.
Several end-to-end evaluations on real-world data, using questions from naive users, are described. By substituting each component of the system in turn with a perfect (manually simulated) one, the effect of that component's imperfection on the end-to-end result can be determined. We show that the information retrieval component has the largest impact on system performance, followed by the segmentation. The quality of the speech recognizer, as long as its error rate is below approximately 25%, is shown to be of only relatively minor importance.
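To make the retrieval step concrete (transcribed segments returned in order of relevance to a user query), the following is a minimal sketch only. It assumes a plain TF-IDF overlap score, which is a stand-in chosen for brevity and not the retrieval engine described in the paper. The function names, segment identifiers and transcripts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_segments(query, transcripts):
    """Return (segment_id, score) pairs sorted by descending relevance.

    Assumption: relevance is a simple TF-IDF overlap between the query
    terms and each segment transcript; this is illustrative only.
    """
    docs = {seg_id: Counter(tokenize(text)) for seg_id, text in transcripts.items()}
    n_docs = len(docs)

    # Document frequency: in how many segments each term occurs.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    query_terms = set(tokenize(query))
    scores = {}
    for seg_id, counts in docs.items():
        length = sum(counts.values()) or 1  # guard against empty transcripts
        score = 0.0
        for term in query_terms:
            if term in counts:
                idf = math.log(1 + n_docs / df[term])
                score += (counts[term] / length) * idf
        scores[seg_id] = score

    # Python's sort is stable, so ties keep their original insertion order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical transcripts standing in for speech recognizer output.
segments = {
    "seg1": "the parliament voted on the new tax law today",
    "seg2": "heavy rain caused flooding in the south",
    "seg3": "the football final ended with a late goal",
}
ranking = rank_segments("flooding after heavy rain", segments)
```

Here `ranking` lists the weather segment first because it is the only transcript sharing terms with the query. In a real system the transcripts would contain recognition errors, which is one reason the paper evaluates each component's contribution separately.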
Kemp, T., Weber, M. & Waibel, A. The ISL View4You Broadcast News Transcription System. International Journal of Speech Technology 4, 177–191 (2001). https://doi.org/10.1023/A:1011348306007