Hybrid Natural Language Generation from Lexical Conceptual Structures

Habash, Nizar; Dorr, Bonnie; Traum, David

doi:10.1023/B:COAT.0000020960.27186.18

Published: June 2003

Volume 18, pages 81–128, (2003)

Machine Translation

Nizar Habash¹,
Bonnie Dorr¹ &
David Traum²

Abstract

This paper describes Lexogen, a system for generating natural-language sentences from Lexical Conceptual Structure, an interlingual representation. The system has been developed as part of a Chinese–English Machine Translation (MT) system; however, it is designed to be used for many other MT language pairs and natural language applications. The contributions of this work include: (1) development of a large-scale Hybrid Natural Language Generation system with language-independent components; (2) enhancements to an interlingual representation and associated algorithm for generation from ambiguous input; (3) development of an efficient reusable language-independent linearization module with a grammar description language that can be used with other systems; (4) improvements to an earlier algorithm for hierarchically mapping thematic roles to surface positions; and (5) development of a diagnostic tool for lexicon coverage and correctness and use of the tool for verification of English, Spanish, and Chinese lexicons. An evaluation of Chinese–English translation quality shows comparable performance with a commercial translation system. The generation system can also be extended to other languages and this is demonstrated and evaluated for Spanish.

Author information

Authors and Affiliations

Institute for Advanced Computer Studies, University of Maryland, College Park, MD, 20742, USA
Nizar Habash & Bonnie Dorr
University of Southern California Institute for Creative Technologies, 13274 Fiji Way, Marina del Rey, CA, 90292, USA
David Traum

About this article

Cite this article

Habash, N., Dorr, B. & Traum, D. Hybrid Natural Language Generation from Lexical Conceptual Structures. Machine Translation 18, 81–128 (2003). https://doi.org/10.1023/B:COAT.0000020960.27186.18
