
Assessing the Trade-Off between System Building Cost and Output Quality in Data-to-Text Generation

Chapter in: Empirical Methods in Natural Language Generation (EACL 2009, ENLG 2009)

Abstract

Data-to-text generation systems tend to be knowledge-based and manually built, which limits their reusability and makes them time- and cost-intensive to create and maintain. Methods for automating (part of) the system-building process exist, but do such methods risk a loss in output quality? In this paper, we investigate the cost/quality trade-off in generation system building. We compare six data-to-text systems which were created by predominantly automatic techniques against six systems for the same domain which were created by predominantly manual techniques. We evaluate the systems using intrinsic automatic metrics and human quality ratings. We find that there is some correlation between the degree of automation in the system-building process and output quality (more automation tending to mean lower evaluation scores). We also find that there are discrepancies between the results of the automatic evaluation metrics and those of the human evaluation experiments. We discuss caveats in assessing system-building cost and the implications of the discrepancies between automatic and human evaluation.



Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Belz, A., Kow, E. (2010). Assessing the Trade-Off between System Building Cost and Output Quality in Data-to-Text Generation. In: Krahmer, E., Theune, M. (eds) Empirical Methods in Natural Language Generation. EACL 2009, ENLG 2009. Lecture Notes in Computer Science, vol 5790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15573-4_10

  • DOI: https://doi.org/10.1007/978-3-642-15573-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15572-7

  • Online ISBN: 978-3-642-15573-4

  • eBook Packages: Computer Science, Computer Science (R0)
