Machine Translation, Volume 12, Issue 4, pp 271–322

Large-Scale Dictionary Construction for Foreign Language Tutoring and Interlingual Machine Translation

  • Bonnie J. Dorr


This paper describes techniques for automatic construction of dictionaries for use in large-scale foreign language tutoring (FLT) and interlingual machine translation (MT) systems. The dictionaries are based on a language-independent representation called “lexical conceptual structure” (LCS). A primary goal of the LCS research is to demonstrate that synonymous verb senses share distributional patterns. We show how the syntax–semantics relation can be used to develop a lexical acquisition approach that contributes both toward the enrichment of existing online resources and toward the development of lexicons containing more complete information than is provided in any of these resources alone. We start by describing the structure of the LCS and showing how this representation is used in FLT and MT. We then focus on the problem of building LCS dictionaries for large-scale FLT and MT. First, we describe authoring tools for manual and semi-automatic construction of LCS dictionaries; we then present a more sophisticated approach that uses linguistic techniques for building word definitions automatically. These techniques have been implemented as part of a set of lexicon-development tools used in the MILT FLT project.

lexical acquisition, foreign language tutoring, interlingual MT, semantic verb classes, syntactic codes





Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Bonnie J. Dorr (1)
  1. Department of Computer Science and UMIACS, University of Maryland, College Park, U.S.A.
