Abstract
This paper describes the First Challenge on Generating Instructions in Virtual Environments (GIVE-1). GIVE is a shared task for generation systems which give real-time natural-language instructions to users in a virtual 3D world. These systems are evaluated by connecting users and NLG systems over the Internet. We describe the design and results of GIVE-1 as well as the participating NLG systems, and validate the experimental methodology by comparing the results collected over the Internet with results from a more traditional laboratory-based experiment.
© 2010 Springer-Verlag Berlin Heidelberg
Koller, A. et al. (2010). The First Challenge on Generating Instructions in Virtual Environments. In: Krahmer, E., Theune, M. (eds) Empirical Methods in Natural Language Generation. EACL ENLG 2009. Lecture Notes in Computer Science, vol 5790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15573-4_16
DOI: https://doi.org/10.1007/978-3-642-15573-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15572-7
Online ISBN: 978-3-642-15573-4
eBook Packages: Computer Science (R0)