Abstract
This paper describes Lexogen, a system for generating natural-language sentences from Lexical Conceptual Structure, an interlingual representation. The system has been developed as part of a Chinese–English Machine Translation (MT) system; however, it is designed to be used for many other MT language pairs and natural language applications. The contributions of this work include: (1) development of a large-scale Hybrid Natural Language Generation system with language-independent components; (2) enhancements to an interlingual representation and associated algorithm for generation from ambiguous input; (3) development of an efficient reusable language-independent linearization module with a grammar description language that can be used with other systems; (4) improvements to an earlier algorithm for hierarchically mapping thematic roles to surface positions; and (5) development of a diagnostic tool for lexicon coverage and correctness and use of the tool for verification of English, Spanish, and Chinese lexicons. An evaluation of Chinese–English translation quality shows comparable performance with a commercial translation system. The generation system can also be extended to other languages, and this is demonstrated and evaluated for Spanish.
Cite this article
Habash, N., Dorr, B. & Traum, D. Hybrid Natural Language Generation from Lexical Conceptual Structures. Machine Translation 18, 81–128 (2003). https://doi.org/10.1023/B:COAT.0000020960.27186.18
DOI: https://doi.org/10.1023/B:COAT.0000020960.27186.18