Highly-Inflected Language Generation Using Factored Language Models

  • Eder Miranda de Novais
  • Ivandré Paraboni
  • Diogo Takaki Ferreira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6608)

Abstract

Statistical language models based on n-gram counts have been shown to successfully replace grammar rules in standard 2-stage (or ‘generate-and-select’) Natural Language Generation (NLG). In highly-inflected languages, however, the amount of training data required to cope with n-gram sparseness may simply be unobtainable, and the benefits of a statistical approach become less obvious. In this work we address the issue of text generation in a highly-inflected language by making use of factored language models (FLMs) that take morphological information into account. We present a number of experiments involving the use of simple FLMs applied to various surface realisation tasks, showing that FLMs may implement 2-stage generation with results that are far superior to those of standard n-gram models alone.
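
To make the generate-and-select idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy corpus of Portuguese-like tokens, each annotated with a gender/number tag, and a deliberately crude backoff scheme: a candidate is scored by its word bigram when that bigram was seen in training, and otherwise by an add-one smoothed bigram over morphological tags. The names `CORPUS`, `SLOTS`, `BACKOFF_PENALTY` and `FLOOR` are illustrative assumptions, and the scores are not normalised probabilities; real FLMs use more careful backoff and smoothing.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy corpus: each token is a (word, morph_tag) pair, where the
# tag encodes gender and number (e.g. "FS" = feminine singular).
CORPUS = [
    [("o", "MS"), ("menino", "MS"), ("bonito", "MS")],
    [("a", "FS"), ("menina", "FS"), ("bonita", "FS")],
    [("os", "MP"), ("meninos", "MP"), ("bonitos", "MP")],
    [("a", "FS"), ("casa", "FS"), ("nova", "FS")],
]
TAGS = ["MS", "FS", "MP", "FP"]
BACKOFF_PENALTY = 0.4  # arbitrary weight applied when backing off (assumption)
FLOOR = 1e-4           # score of an unseen bigram in the word-only model

# Collect bigram and history counts for both factors (word and tag).
word_bigrams, word_hist = Counter(), Counter()
tag_bigrams, tag_hist = Counter(), Counter()
for sent in CORPUS:
    for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
        word_bigrams[(w1, w2)] += 1
        word_hist[w1] += 1
        tag_bigrams[(t1, t2)] += 1
        tag_hist[t1] += 1


def pair_score(prev, cur, use_factors):
    """Log-score of `cur` following `prev`; both are (word, tag) pairs."""
    (w1, t1), (w2, t2) = prev, cur
    if word_bigrams[(w1, w2)] > 0:
        return math.log(word_bigrams[(w1, w2)] / word_hist[w1])
    if use_factors:
        # Unseen word bigram: back off to the add-one smoothed tag bigram.
        p_tag = (tag_bigrams[(t1, t2)] + 1) / (tag_hist[t1] + len(TAGS))
        return math.log(BACKOFF_PENALTY * p_tag)
    return math.log(FLOOR)


def sequence_score(seq, use_factors):
    return sum(pair_score(p, c, use_factors) for p, c in zip(seq, seq[1:]))


def select(slots, use_factors):
    """Stage 1 overgenerates every combination; stage 2 keeps the best one."""
    return max(product(*slots), key=lambda s: sequence_score(s, use_factors))


# Candidate inflected forms for each position of the phrase to be realised.
SLOTS = [
    [("o", "MS"), ("a", "FS"), ("os", "MP"), ("as", "FP")],
    [("casa", "FS"), ("casas", "FP")],
    [("bonito", "MS"), ("bonita", "FS"), ("bonitos", "MP"), ("bonitas", "FP")],
]

best = select(SLOTS, use_factors=True)
print(" ".join(word for word, _ in best))  # prints "a casa bonita"
```

In this toy setting the word bigram "casa bonita" never occurs in training, so with `use_factors=False` all four adjective forms tie after "a casa" and the choice is arbitrary. With the tag factor enabled, the feminine singular agreement pattern seen elsewhere in the corpus breaks the tie, which is the kind of sparseness relief the abstract attributes to morphological information.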

Keywords

Text Generation, Surface Realisation, Language Modelling

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Eder Miranda de Novais (1)
  • Ivandré Paraboni (1)
  • Diogo Takaki Ferreira (1)
  1. School of Arts, Sciences and Humanities, University of São Paulo (USP / EACH), São Paulo, Brazil