Hybrid Natural Language Generation from Lexical Conceptual Structures

Machine Translation

Abstract

This paper describes Lexogen, a system for generating natural-language sentences from Lexical Conceptual Structure, an interlingual representation. The system has been developed as part of a Chinese–English Machine Translation (MT) system; however, it is designed to be used for many other MT language pairs and natural language applications. The contributions of this work include: (1) development of a large-scale Hybrid Natural Language Generation system with language-independent components; (2) enhancements to an interlingual representation and associated algorithm for generation from ambiguous input; (3) development of an efficient reusable language-independent linearization module with a grammar description language that can be used with other systems; (4) improvements to an earlier algorithm for hierarchically mapping thematic roles to surface positions; and (5) development of a diagnostic tool for lexicon coverage and correctness and use of the tool for verification of English, Spanish, and Chinese lexicons. An evaluation of Chinese–English translation quality shows comparable performance with a commercial translation system. The generation system can also be extended to other languages and this is demonstrated and evaluated for Spanish.

Cite this article

Habash, N., Dorr, B. & Traum, D. Hybrid Natural Language Generation from Lexical Conceptual Structures. Machine Translation 18, 81–128 (2003). https://doi.org/10.1023/B:COAT.0000020960.27186.18
