Cross-Language Access to Recorded Speech in the MALACH Project

  • Douglas W. Oard
  • Dina Demner-Fushman
  • Jan Hajič
  • Bhuvana Ramabhadran
  • Samuel Gustman
  • William J. Byrne
  • Dagobert Soergel
  • Bonnie Dorr
  • Philip Resnik
  • Michael Picheny
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2448)

Abstract

The MALACH project seeks to help users find information in vast multilingual collections of untranscribed oral history interviews. This paper introduces the goals of the project and focuses on supporting access by users who are unfamiliar with the interview language. It begins with a review of the state of the art in cross-language speech retrieval; approaches that will be investigated in the project are then described. Czech was selected as the first non-English language to be supported, so results of an initial experiment with Czech/English cross-language retrieval are reported.

Keywords

Speech Recognition; Query Term; Word Error Rate; Language Pair; Topic Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Douglas W. Oard (1)
  • Dina Demner-Fushman (1)
  • Jan Hajič (2)
  • Bhuvana Ramabhadran (3)
  • Samuel Gustman (4)
  • William J. Byrne (5)
  • Dagobert Soergel (1)
  • Bonnie Dorr (1)
  • Philip Resnik (1)
  • Michael Picheny (3)

  1. University of Maryland, College Park, USA
  2. Charles University, Praha 1, Czech Republic
  3. IBM T. J. Watson Research Center, Yorktown Heights, USA
  4. Survivors of the Shoah Visual History Foundation, Los Angeles, USA
  5. Johns Hopkins University, Baltimore, USA
