Highly-Inflected Language Generation Using Factored Language Models

  • Eder Miranda de Novais
  • Ivandré Paraboni
  • Diogo Takaki Ferreira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6608)


Statistical language models based on n-gram counts have been shown to successfully replace grammar rules in standard 2-stage (or ‘generate-and-select’) Natural Language Generation (NLG). In highly-inflected languages, however, the amount of training data required to cope with n-gram sparseness may simply be unobtainable, and the benefits of a statistical approach become less obvious. In this work we address text generation in a highly-inflected language by using factored language models (FLMs), which take morphological information into account. We present a number of experiments in which simple FLMs are applied to various surface realisation tasks, showing that FLMs can implement 2-stage generation with results far superior to those of standard n-gram models alone.
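The following is a minimal, hypothetical sketch of the contrast described above. It is not the authors' system. It uses a toy Portuguese-style corpus, chosen only because gender and number agreement make the sparseness problem easy to see. Each token is a bundle of two factors: a surface word and a morphological tag. The tag set, the corpus, the candidate list and the names `BigramModel`, `word_score` and `factored_score` are all invented for illustration.

A candidate realisation is ranked in one of two ways:

* by an add-one smoothed word-bigram model alone;
* by that model plus a bigram model over the tag factor.

This additive combination is a deliberately simplified stand-in for an FLM. It is not the generalized parallel backoff used in FLM toolkits.

```python
from collections import Counter
from math import log

# Hypothetical training data: each token is a bundle of two factors,
# (surface word, morphological tag). Tags are invented for illustration,
# e.g. "N.f.sg" = feminine singular noun.
CORPUS = [
    [("a", "ART.f.sg"), ("casa", "N.f.sg"), ("branca", "ADJ.f.sg")],
    [("o", "ART.m.sg"), ("carro", "N.m.sg"), ("novo", "ADJ.m.sg")],
    [("as", "ART.f.pl"), ("casas", "N.f.pl"), ("novas", "ADJ.f.pl")],
]
BOS = ("<s>", "<s>")  # sentence-start symbol for both factors


class BigramModel:
    """Add-one smoothed bigram model over a single factor stream."""

    def __init__(self, sequences):
        self.history = Counter()  # counts of each symbol as a bigram history
        self.bigrams = Counter()
        vocab = set()
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.history[prev] += 1
                self.bigrams[(prev, cur)] += 1
            vocab.update(seq)
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen symbols

    def logprob(self, seq):
        return sum(
            log((self.bigrams[(p, c)] + 1) / (self.history[p] + self.vocab_size))
            for p, c in zip(seq, seq[1:])
        )


def stream(sentence, factor):
    """Extract one factor (0 = word, 1 = tag), prefixed by the start symbol."""
    return [tok[factor] for tok in [BOS] + sentence]


word_lm = BigramModel([stream(s, 0) for s in CORPUS])
tag_lm = BigramModel([stream(s, 1) for s in CORPUS])


def word_score(sentence):
    """Standard n-gram ranking: surface words only."""
    return word_lm.logprob(stream(sentence, 0))


def factored_score(sentence):
    """Simplified factored ranking: word factor plus morphological-tag factor."""
    return word_score(sentence) + tag_lm.logprob(stream(sentence, 1))


# Generate-and-select: a hypothetical generator proposes candidate
# realisations (tags taken from a lexicon), and the model selects one.
CANDIDATES = [
    [("a", "ART.f.sg"), ("casa", "N.f.sg"), ("novo", "ADJ.m.sg")],
    [("a", "ART.f.sg"), ("casa", "N.f.sg"), ("nova", "ADJ.f.sg")],
]

best = max(CANDIDATES, key=factored_score)
print(" ".join(w for w, _ in best))  # a casa nova
print(word_score(CANDIDATES[0]) == word_score(CANDIDATES[1]))  # True
```

Neither "casa novo" nor "casa nova" occurs in the toy data, so the word model alone cannot tell the two candidates apart. The tag factor breaks the tie, because the agreement pattern N.f.sg followed by ADJ.f.sg was seen in training.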


Keywords: Text Generation; Surface Realisation; Language Modelling




Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Eder Miranda de Novais (1)
  • Ivandré Paraboni (1)
  • Diogo Takaki Ferreira (1)
  1. School of Arts, Sciences and Humanities, University of São Paulo (USP / EACH), São Paulo, Brazil
