Machine Translation, Volume 18, Issue 2, pp 81–128

Hybrid Natural Language Generation from Lexical Conceptual Structures

  • Nizar Habash
  • Bonnie Dorr
  • David Traum
Article

Abstract

This paper describes Lexogen, a system for generating natural-language sentences from Lexical Conceptual Structure, an interlingual representation. The system has been developed as part of a Chinese–English Machine Translation (MT) system; however, it is designed to be used for many other MT language pairs and natural language applications. The contributions of this work include: (1) development of a large-scale Hybrid Natural Language Generation system with language-independent components; (2) enhancements to an interlingual representation and associated algorithm for generation from ambiguous input; (3) development of an efficient reusable language-independent linearization module with a grammar description language that can be used with other systems; (4) improvements to an earlier algorithm for hierarchically mapping thematic roles to surface positions; and (5) development of a diagnostic tool for lexicon coverage and correctness and use of the tool for verification of English, Spanish, and Chinese lexicons. An evaluation of Chinese–English translation quality shows comparable performance with a commercial translation system. The generation system can also be extended to other languages and this is demonstrated and evaluated for Spanish.

Keywords: Hybrid Natural Language Generation, Multilingual Natural Language Generation, interlingua, Lexical Conceptual Structure

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Nizar Habash (1)
  • Bonnie Dorr (1)
  • David Traum (2)
  1. Institute for Advanced Computer Studies, University of Maryland, College Park, USA
  2. University of Southern California Institute for Creative Technologies, Marina del Rey, USA
