Surface Realisation Using Factored Language Models and Input Seed Features

  • Cristina Barros
  • Elena Lloret
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10633)

Abstract

The Natural Language Generation research field needs to move towards the design and development of flexible and adaptive techniques capable of producing language automatically for any domain, language, and purpose. In light of this, the aim of this paper is to study the suitability of factored language models for the surface realisation stage, presenting an almost fully language-independent statistical approach. Its main novelty is that it can be adapted to generate text for different purposes or domains through an input seed feature that guides the entire generation process. In the context of this research, the input seed is a phoneme, and our goal is to generate a full, meaningful sentence that maximises the number of words containing that phoneme. We experimented with different factors, including lemmas and part-of-speech tags, on top of a trigram language model. The analysis carried out over several configurations of our proposed approach showed improvements of 47% and 40% in the number of meaningful generated sentences, with respect to traditional language models, for English and Spanish, respectively.
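As a rough illustration of the idea summarised above, the sketch below trains a toy trigram model over a single token factor with naive backoff and biases greedy generation towards words containing a seed "phoneme" (approximated here by a letter). The corpus, function names, and scoring bonus are all our own illustrative assumptions; the paper's factored language models combine several factors (words, lemmas, POS tags) with generalised parallel backoff, which this minimal sketch does not attempt to reproduce.

```python
import math
from collections import defaultdict

# Toy sketch (our own illustration, NOT the paper's implementation):
# each token is a factored tuple (word, lemma, POS). We train a trigram
# model over one factor with simple backoff, then generate greedily while
# rewarding candidate words that contain a seed "phoneme" (a letter here).

CORPUS = [
    [("the", "the", "DT"), ("big", "big", "JJ"),
     ("dog", "dog", "NN"), ("barks", "bark", "VBZ")],
    [("the", "the", "DT"), ("bad", "bad", "JJ"),
     ("bug", "bug", "NN"), ("bites", "bite", "VBZ")],
]

def train(corpus, factor=0):
    """Count 1- to 3-grams over the chosen factor (0 = surface word)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        toks = ["<s>", "<s>"] + [t[factor] for t in sent] + ["</s>"]
        for i in range(2, len(toks)):
            for n in (1, 2, 3):
                counts[tuple(toks[i - n + 1:i])][toks[i]] += 1
    return counts

def prob(counts, hist, w):
    """Back off from trigram to bigram to unigram history."""
    for h in (hist, hist[1:], ()):
        total = sum(counts[h].values())
        if total and counts[h][w]:
            return counts[h][w] / total
    return 1e-6  # tiny floor for unseen words

def generate(counts, vocab, seed_phoneme, bonus=2.0, max_len=10):
    """Greedy decoding with a log-probability bonus for seed-bearing words."""
    hist, out = ("<s>", "<s>"), []
    for _ in range(max_len):
        def score(w):
            s = math.log(prob(counts, hist, w))
            return s + bonus if seed_phoneme in w else s
        best = max(sorted(vocab | {"</s>"}), key=score)
        if best == "</s>":
            break
        out.append(best)
        hist = (hist[1], best)
    return out

counts = train(CORPUS, factor=0)
vocab = {t[0] for s in CORPUS for t in s}
print(generate(counts, vocab, seed_phoneme="b"))  # ['the', 'bad', 'bug', 'bites']
```

With the seed "b", the bonus steers the decoder towards the sentence whose words carry that letter; training on `factor=1` or `factor=2` instead would model lemmas or POS tags, the kind of factor choice the paper evaluates.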

Keywords

Natural Language Generation · Surface realisation · Statistical approach · Seed feature · Factored language models

Acknowledgment

This research work has been partially funded by the Generalitat Valenciana through the project “DIIM2.0: Desarrollo de técnicas Inteligentes e Interactivas de Minería y generación de información sobre la web 2.0” (PROMETEOII/2014/001); partially funded by the Spanish Government through projects TIN2015-65100-R and TIN2015-65136-C2-2-R; and by the project “Análisis de Sentimientos Aplicado a la Prevención del Suicidio en las Redes Sociales (ASAP)”, funded by Ayudas Fundación BBVA a equipos de investigación científica.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Software and Computing Systems, University of Alicante, Alicante, Spain