Abstract
The Conversational Intelligence Challenge (ConvAI) is a competition of non-goal-oriented dialogue systems (chatbots). It aims at (1) improving state-of-the-art chatbots and (2) creating an evaluation setup that allows unbiased manual and automatic evaluation and comparison of chatbots. The task of the second ConvAI competition is small talk about common topics such as hobbies, work, family, and pets.
This report describes the human-to-bot dialogues collected during ConvAI2. We analyse this data and compare it with dialogues from the first ConvAI (discussion of Wikipedia articles). We found that the task of ConvAI2 is both more engaging for users and less challenging for chatbots than the task of the first ConvAI. Our comparison of the performance of paid workers and volunteers demonstrated that paid workers generate dialogues of better quality and score chatbots higher. However, to bring the competition closer to real-world cases of chatbot usage, the task should be more engaging for volunteers.
Notes
- 1. The datasets are available online at http://convai.io/data/.
Acknowledgements
The work was supported by the National Technology Initiative and PAO Sberbank, project ID 0000000007417F630002. The authors are also grateful to Olga Megorskaya and other members of the Yandex.Toloka team for their help with setting up the data collection.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Logacheva, V., Malykh, V., Litinsky, A., Burtsev, M. (2020). ConvAI2 Dataset of Non-goal-Oriented Human-to-Bot Dialogues. In: Escalera, S., Herbrich, R. (eds) The NeurIPS '18 Competition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-29135-8_11
DOI: https://doi.org/10.1007/978-3-030-29135-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29134-1
Online ISBN: 978-3-030-29135-8
eBook Packages: Computer Science, Computer Science (R0)