Assessing the Trade-Off between System Building Cost and Output Quality in Data-to-Text Generation

Belz, Anja; Kow, Eric

doi:10.1007/978-3-642-15573-4_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5790)

Abstract

Data-to-text generation systems tend to be knowledge-based and manually built, which limits their reusability and makes them time- and cost-intensive to create and maintain. Methods for automating (part of) the system building process exist, but do such methods risk a loss in output quality? In this paper, we investigate the cost/quality trade-off in generation system building. We compare six data-to-text systems which were created by predominantly automatic techniques against six systems for the same domain which were created by predominantly manual techniques. We evaluate the systems using intrinsic automatic metrics and human quality ratings. We find that there is some correlation between degree of automation in the system-building process and output quality (more automation tending to mean lower evaluation scores). We also find that there are discrepancies between the results of the automatic evaluation metrics and the human-assessed evaluation experiments. We discuss caveats in assessing system-building cost and implications of the discrepancies in automatic and human evaluation.

Author information

Authors and Affiliations

Natural Language Technology Group, School of Computing, Mathematical and Information Sciences, University of Brighton, Brighton, BN2 3PB, UK
Anja Belz & Eric Kow

Editor information

Editors and Affiliations

Faculty of Humanities, Department of Communication and Information Sciences (DCI), Tilburg University, P.O.Box 90153, 5000 LE, Tilburg, The Netherlands
Emiel Krahmer
Human Media Interaction (HMI), Department of Electrical Engineering, Mathematics and Computer Science (EEMCS), University of Twente, P.O. Box 217, 7500 AE, Enschede, The Netherlands
Mariët Theune

About this chapter

Cite this chapter

Belz, A., Kow, E. (2010). Assessing the Trade-Off between System Building Cost and Output Quality in Data-to-Text Generation. In: Krahmer, E., Theune, M. (eds) Empirical Methods in Natural Language Generation. EACL ENLG 2009. Lecture Notes in Computer Science (LNAI), vol 5790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15573-4_10

DOI: https://doi.org/10.1007/978-3-642-15573-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15572-7
Online ISBN: 978-3-642-15573-4
eBook Packages: Computer Science, Computer Science (R0)