
Machine Translation, Volume 9, Issue 3–4, pp 221–250

From syntactic encodings to thematic roles: Building lexical entries for interlingual MT

  • Bonnie J. Dorr
  • Joseph Garman
  • Amy Weinberg

Abstract

Our goal is to construct large-scale lexicons for interlingual MT of English, Arabic, Korean, and Spanish. We describe techniques that predict salient linguistic features of a non-English word from the features of its English gloss (i.e., translation) in a bilingual dictionary. Although these techniques are not exact, owing to imprecise glosses and cross-linguistic variation, they can augment an existing dictionary with reasonable accuracy and thus save significant time. We have conducted two experiments that demonstrate the value of these techniques. The first tested the feasibility of building a database of thematic grids for over 6500 Arabic verbs based on a mapping between English glosses and the syntactic codes in the Longman Dictionary of Contemporary English (LDOCE) (Procter, 1978). We show that it is more efficient and less error-prone to hand-verify the automatically constructed grids than to build the thematic grids by hand from scratch. The second experiment tested the automatic classification of verbs into a richer semantic typology based on Levin (1993), from which we can derive a more refined set of thematic grids. In this second experiment, we show that a brute-force, non-robust technique achieves 72% accuracy for semantic classification of LDOCE verbs; we then show that a more robust technique based on fine-tuned statistical correlations can approach this yield. We further suggest that this yield could be raised by taking into account linguistic factors such as polysemy and positive and negative constraints on the syntax-semantics relation. We conclude that, while human intervention will always be necessary for the construction of a semantic classification from LDOCE, such intervention is significantly reduced as more knowledge about the syntax-semantics relation is introduced.
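The gloss-based projection behind the first experiment can be pictured with a toy sketch: look up the English glosses of a foreign verb, collect the syntactic codes of those glosses, and map each code to a candidate thematic grid for a human to verify. Everything in it is an illustrative assumption: the code-to-grid table, the tiny English lexicon, the transliterated Arabic entries, and the function name `candidate_grids` are hypothetical and do not reproduce the authors' actual mapping or the full LDOCE code set.

```python
# Hypothetical mapping from LDOCE-style grammar codes to thematic grids.
# Codes and role labels are illustrative assumptions, not the paper's table.
CODE_TO_GRID = {
    "I": ("agent",),                      # intransitive
    "T1": ("agent", "theme"),             # transitive with a noun-phrase object
    "D1": ("agent", "theme", "goal"),     # double object
    "T5": ("agent", "proposition"),       # transitive with a that-clause
}

# Toy English lexicon: verb -> set of codes (invented for illustration).
ENGLISH_CODES = {
    "write": {"I", "T1", "D1"},
    "say": {"T1", "T5"},
}

# Toy bilingual dictionary: transliterated Arabic verb -> English glosses
# (hypothetical entries, for illustration only).
BILINGUAL = {
    "kataba": ["write"],
    "qaala": ["say"],
}


def candidate_grids(foreign_verb):
    """Project thematic grids onto a foreign verb via its English glosses.

    Returns a sorted list of candidate grids. An empty list means no
    prediction could be made, so the entry must be built by hand.
    """
    grids = set()
    for gloss in BILINGUAL.get(foreign_verb, []):
        for code in ENGLISH_CODES.get(gloss, set()):
            grid = CODE_TO_GRID.get(code)
            if grid is not None:  # skip codes with no known grid
                grids.add(grid)
    return sorted(grids)


if __name__ == "__main__":
    for verb in ("kataba", "qaala"):
        print(verb, candidate_grids(verb))
```

Because inexact glosses can introduce spurious or missing codes, the output is best treated as a candidate list for hand verification rather than a final lexical entry, which mirrors the abstract's point that verifying generated grids is cheaper than writing them from scratch.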

Keywords

lexical acquisition, interlingual MT, thematic grids, Arabic lexicon, semantic verb classes, syntactic codes, Longman's Dictionary


References

  1. Alshawi, H. 1989. Analysing the Dictionary Definitions. In B. Boguraev and T. Briscoe, editors, Computational Lexicography for Natural Language Processing. Longman, London, pages 153–169.
  2. Boguraev, B. and T. Briscoe. 1989. Utilising the LDOCE Grammar Codes. In B. Boguraev and T. Briscoe, editors, Computational Lexicography for Natural Language Processing. Longman, London, pages 85–116.
  3. Dorr, B.J. 1993. Machine Translation: A View from the Lexicon. MIT Press, Cambridge, MA.
  4. Dorr, B.J., J. Hendler, S. Blanksteen, and B. Migdalof. 1994. Use of LCS and Discourse for Intelligent Tutoring: On Beyond Syntax. In M. Holland, J. Kaplan, and M. Sams, editors, Intelligent Language Tutors: Balancing Theory and Technology. Lawrence Erlbaum Associates, Hillsdale, NJ.
  5. Dorr, B.J. and D. Jones. 1995. Automatic Extraction of Semantic Classes from Syntactic Information in Online Resources. Technical Report UMIACS/CS TR, Institute for Advanced Computer Studies, University of Maryland, College Park, MD.
  6. Dorr, B.J., D. Lin, J. Lee, and S. Suh. 1994. A Paradigm for Non-head-driven Parsing: Parameterized Message-Passing. In Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK.
  7. Farwell, D., L. Guthrie, and Y. Wilks. 1993. Automatically Creating Lexical Entries for ULTRA, a Multilingual MT System. Machine Translation, 8(3).
  8. Fillmore, C.J. 1968. The Case for Case. In E. Bach and R.T. Harms, editors, Universals in Linguistic Theory. Holt, Rinehart, and Winston, pages 1–88.
  9. Fontenelle, T. and J. Vanandroye. 1989. Retrieving Ergative Verbs from a Lexical Data Base. Dictionaries, 11:11–39.
  10. Grimshaw, J. 1990. Argument Structure. MIT Press, Cambridge, MA.
  11. Gruber, J.S. 1965. Studies in Lexical Relations. Ph.D. thesis, Information Science, Massachusetts Institute of Technology, Cambridge, MA.
  12. Jackendoff, R.S. 1983. Semantics and Cognition. MIT Press, Cambridge, MA.
  13. Jackendoff, R.S. 1990. Semantic Structures. MIT Press, Cambridge, MA.
  14. Levin, B. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, IL.
  15. Lin, D., B.J. Dorr, J. Lee, and S. Suh. 1994. A Parameter-Based Message-Passing Parser for MT of Korean and English. In Proceedings of the Association for MT in the Americas Conference on Partnerships in Translation Technology, pages 149–156, Columbia, MD.
  16. Lonsdale, D., T. Mitamura, and E. Nyberg. 1995. Acquisition of Large Lexicons for Practical Knowledge-Based MT. Machine Translation, 9(3).
  17. Montemagni, S. and L. Vanderwende. 1992. Structural Patterns vs. String Patterns for Extracting Semantic Information from Dictionaries. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pages 546–552, Nantes, France.
  18. Pesetsky, D. 1982. Paths and Categories. Unpublished Ph.D. dissertation, MIT.
  19. Pinker, S. 1989. Learnability and Cognition: The Acquisition of Argument Structure. MIT Press, Cambridge, MA.
  20. Procter, P. 1978. Longman Dictionary of Contemporary English. Longman, London.
  21. Sanfilippo, A. and V. Poznanski. 1992. The Acquisition of Lexical Knowledge from Combined Machine-Readable Dictionary Resources. In Proceedings of the Applied Natural Language Processing Conference, pages 80–87, Trento, Italy.
  22. Weinberg, A., J. Garman, J. Martin, and P. Merlo. 1994. Principle-Based Parser for Foreign Language Training in German and Arabic. In M. Holland, J. Kaplan, and M. Sams, editors, Intelligent Language Tutors: Balancing Theory and Technology. Lawrence Erlbaum Associates, Hillsdale, NJ.
  23. Wilks, Y., D. Fass, C.M. Guo, J.E. McDonald, T. Plate, and B.M. Slator. 1989. A Tractable Machine Dictionary as a Resource for Computational Semantics. In B. Boguraev and T. Briscoe, editors, Computational Lexicography for Natural Language Processing. Longman, London, pages 85–116.
  24. Wilks, Y., D. Fass, C.M. Guo, J.E. McDonald, T. Plate, and B.M. Slator. 1990. Providing Machine Tractable Dictionary Tools. Machine Translation, 5(2):99–154.

Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • Bonnie J. Dorr, Department of Computer Science and UMIACS, University of Maryland, College Park
  • Joseph Garman, Department of Linguistics, University of Maryland, College Park
  • Amy Weinberg, Department of Linguistics and UMIACS, University of Maryland, College Park
