ConvAI2 Dataset of Non-goal-Oriented Human-to-Bot Dialogues

  • Conference paper
The NeurIPS '18 Competition

Abstract

The Conversational Intelligence Challenge (ConvAI) is a competition of non-goal-oriented dialogue systems (chatbots). It aims at (1) improving state-of-the-art chatbots and (2) creating an evaluation setup that allows unbiased manual and automatic evaluation and comparison of chatbots. The task of the second ConvAI competition is small talk about common topics such as hobbies, work, family, and pets.

This report describes the human-to-bot dialogues collected during ConvAI2. We analyse this data and compare it with dialogues from the first ConvAI (discussion of Wikipedia articles). We found that the ConvAI2 task is both more engaging for users and less challenging for chatbots than the task of the first ConvAI. Our comparison of paid workers and volunteers showed that paid workers generate dialogues of better quality and score chatbots higher. However, to bring the competition closer to real-world chatbot usage, the task should be more engaging for volunteers.

Notes

  1. The datasets are available online at http://convai.io/data/.

  2. https://www.mturk.com/.

  3. https://toloka.yandex.ru/.

  4. http://parl.ai/.

  5. https://telegram.org.

  6. https://messenger.com.

  7. http://deephack.me/chat.

Acknowledgements

The work was supported by the National Technology Initiative and PAO Sberbank project ID 0000000007417F630002. The authors are also grateful to Olga Megorskaya and other members of the Yandex.Toloka team for their help with setting up the data collection.

Author information

Corresponding author

Correspondence to Mikhail Burtsev.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Logacheva, V., Malykh, V., Litinsky, A., Burtsev, M. (2020). ConvAI2 Dataset of Non-goal-Oriented Human-to-Bot Dialogues. In: Escalera, S., Herbrich, R. (eds) The NeurIPS '18 Competition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-29135-8_11

  • DOI: https://doi.org/10.1007/978-3-030-29135-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29134-1

  • Online ISBN: 978-3-030-29135-8

  • eBook Packages: Computer Science, Computer Science (R0)
