
Interlingual Annotation for MT Development

  • Florence Reeder
  • Bonnie Dorr
  • David Farwell
  • Nizar Habash
  • Stephen Helmreich
  • Eduard Hovy
  • Lori Levin
  • Teruko Mitamura
  • Keith Miller
  • Owen Rambow
  • Advaith Siddharthan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3265)

Abstract

MT systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We pursue work on the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways. We have established a distributed, well-functioning research methodology, designed a preliminary interlingua notation, created annotation manuals and tools, developed a test collection in six languages with associated English translations, annotated some 150 translations, and designed and applied various annotation metrics. We describe the data sets being annotated and the interlingual (IL) representation language, which uses two ontologies and a systematic theta-role list. We present the annotation tools built and outline the annotation process. Following this, we describe our evaluation methodology and conclude with a summary of issues that have arisen.
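The abstract mentions annotation metrics without defining them. As an illustration only, the following sketch computes Cohen's kappa, a standard chance-corrected measure of agreement between two annotators on a classification task, applied to theta-role labels that two annotators might assign to the same arguments. The `cohen_kappa` helper and the role labels are hypothetical; the paper's actual metrics and theta-role inventory may differ.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same, non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    # If both annotators used one identical label throughout, kappa is undefined;
    # returning 1.0 here is a convention assumed for this sketch.
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical theta-role labels from two annotators for the same six arguments.
annotator_1 = ["agent", "theme", "goal", "agent", "theme", "instrument"]
annotator_2 = ["agent", "theme", "source", "agent", "patient", "instrument"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # prints 0.586
```

Here the annotators agree on 4 of 6 items (observed agreement 2/3), while their label distributions predict 7/36 agreement by chance, giving a kappa of 17/29. Raw percentage agreement alone would overstate reliability when a few role labels dominate, which is why chance-corrected measures of this kind are commonly used for annotation studies.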

Keywords

Natural Language Processing, Machine Translation, Semantic Representation, Question Answering, Thematic Role
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Florence Reeder, Mitre Corporation
  • Bonnie Dorr, University of Maryland
  • David Farwell, New Mexico State University
  • Nizar Habash, University of Maryland
  • Stephen Helmreich, New Mexico State University
  • Eduard Hovy, University of Southern California
  • Lori Levin, Carnegie Mellon University
  • Teruko Mitamura, Carnegie Mellon University
  • Keith Miller, Mitre Corporation
  • Owen Rambow, Columbia University
  • Advaith Siddharthan, Columbia University

Personalised recommendations