Topic Indexing of TV Broadcast News Programs
This paper describes a topic segmentation and indexing system for TV broadcast news programs spoken in European Portuguese. The system is integrated into an alert system for the selective dissemination of multimedia information, developed within the scope of a European project. The goal of this work is to improve the retrieval of specific spoken documents that have been automatically transcribed using speech recognition. Our segmentation algorithm is based on simple heuristics related to anchor detection. Indexing is based on hierarchical concept trees (a thesaurus) containing 22 main thematic domains, for which hidden Markov models and topic language models were created. Ongoing experiments on multiple-topic indexing are also described, in which a confidence measure based on the likelihood ratio test is used as the hypothesis test.
Keywords: Automatic Speech Recognition, Speaker Identification, Broadcast News, News Program, Report Segment
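The multiple-topic indexing idea mentioned in the abstract can be illustrated with a small sketch. It is not the system described in the paper: it assumes add-one smoothed unigram topic language models in place of the paper's hidden Markov models, a background model trained on all stories as the alternative hypothesis, and a hand-picked threshold. The toy stories, the topic names, and every function name below are hypothetical.

```python
import math
from collections import Counter

def train_unigram(docs, vocab, alpha=1.0):
    """Return log-probabilities of an add-alpha smoothed unigram model."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}

def log_likelihood(story, model):
    """Sum of word log-probabilities; out-of-vocabulary words are skipped."""
    return sum(model[w] for w in story if w in model)

def topic_scores(story, topic_models, background):
    """Per-word log-likelihood ratio of each topic model against the background."""
    n = max(1, sum(1 for w in story if w in background))
    bg = log_likelihood(story, background)
    return {t: (log_likelihood(story, m) - bg) / n
            for t, m in topic_models.items()}

def index_story(story, topic_models, background, threshold=0.0):
    """Assign every topic whose confidence score exceeds the threshold."""
    scores = topic_scores(story, topic_models, background)
    return sorted(t for t, s in scores.items() if s > threshold)

# Hypothetical toy training stories (assumption, not data from the paper).
TRAIN = {
    "sports": [["goal", "match", "team"], ["team", "win", "match"]],
    "economy": [["bank", "market", "euro"], ["market", "tax", "bank"]],
}
VOCAB = sorted({w for docs in TRAIN.values() for doc in docs for w in doc})
TOPIC_MODELS = {t: train_unigram(docs, VOCAB) for t, docs in TRAIN.items()}
BACKGROUND = train_unigram([d for docs in TRAIN.values() for d in docs], VOCAB)

if __name__ == "__main__":
    print(index_story(["team", "match", "goal"], TOPIC_MODELS, BACKGROUND))
    print(index_story(["team", "match", "bank", "market"],
                      TOPIC_MODELS, BACKGROUND, threshold=-0.25))
```

Normalizing the ratio by story length keeps scores comparable across short and long stories. Stories that mix topics score lower against each individual model, so in this sketch multi-topic assignment depends on choosing a more permissive threshold, which would have to be tuned on held-out data.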