Cross-Language Access to Recorded Speech in the MALACH Project
The MALACH project seeks to help users find information in vast multilingual collections of untranscribed oral history interviews. This paper introduces the goals of the project and focuses on supporting access by users who are unfamiliar with the interview language. It begins with a review of the state of the art in cross-language speech retrieval; approaches that will be investigated in the project are then described. Czech was selected as the first non-English language to be supported, so results of an initial experiment with Czech/English cross-language retrieval are reported.
Keywords: Speech Recognition, Query Term, Word Error Rate, Language Pair, Topic Detection