ConvAI2 Dataset of Non-goal-Oriented Human-to-Bot Dialogues

Logacheva, Varvara; Malykh, Valentin; Litinsky, Aleksey; Burtsev, Mikhail

doi:10.1007/978-3-030-29135-8_11


Part of the book series: The Springer Series on Challenges in Machine Learning (SSCML)

Abstract

The Conversational Intelligence Challenge (ConvAI) is a competition of non-goal-oriented dialogue systems (chatbots). It aims at (1) improving state-of-the-art chatbots and (2) creating an evaluation setup that allows unbiased evaluation and comparison of chatbots, both manual and automatic. The task of the second ConvAI competition is small talk about common topics such as hobbies, work, family, and pets.

This report describes the human-to-bot dialogues collected during ConvAI2. We analyse this data and compare it with dialogues from the first ConvAI, whose task was the discussion of Wikipedia articles. We found that the ConvAI2 task is both more engaging for users and less challenging for chatbots than the task of the first ConvAI. Our comparison of paid workers and volunteers showed that paid workers generate dialogues of better quality and score chatbots higher. However, to bring the competition closer to real-world cases of chatbot usage, the task should be more engaging for volunteers.

Notes

1. The datasets are available online at http://convai.io/data/
2. https://www.mturk.com/
3. https://toloka.yandex.ru/
4. http://parl.ai/
5. https://telegram.org
6. https://messenger.com
7. http://deephack.me/chat

Acknowledgements

The work was supported by the National Technology Initiative and PAO Sberbank, project ID 0000000007417F630002. The authors are also grateful to Olga Megorskaya and other members of the Yandex.Toloka team for their help with setting up the data collection.

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology, Moscow, Russia
Varvara Logacheva, Valentin Malykh, Aleksey Litinsky & Mikhail Burtsev

Corresponding author

Correspondence to Mikhail Burtsev.

Editor information

Editors and Affiliations

Universitat de Barcelona and Computer Vision Center, Barcelona, Spain
Sergio Escalera
Amazon (Berlin), Berlin, Germany
Ralf Herbrich

About this paper

Cite this paper

Logacheva, V., Malykh, V., Litinsky, A., Burtsev, M. (2020). ConvAI2 Dataset of Non-goal-Oriented Human-to-Bot Dialogues. In: Escalera, S., Herbrich, R. (eds) The NeurIPS '18 Competition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-29135-8_11

DOI: https://doi.org/10.1007/978-3-030-29135-8_11
Published: 30 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29134-1
Online ISBN: 978-3-030-29135-8
eBook Packages: Computer Science, Computer Science (R0)
