Abstract
The first Conversational Intelligence Challenge was conducted over 2017, with the finals held at the NIPS conference. The challenge was aimed at evaluating the state of the art in non-goal-driven dialogue systems (chatbots) and at collecting a large dataset of human-to-machine and human-to-human conversations manually labelled for quality. We established a task for formal human evaluation of chatbots that tests a chatbot's capabilities in topic-oriented dialogue. Instead of traditional chit-chat, participating systems and humans were given the task of discussing a short text. Ten dialogue systems participated in the competition. The majority of them combined multiple conversational models, such as question answering and chit-chat systems, to make conversations more natural. The evaluation of the chatbots was performed by human assessors. Almost 1,000 volunteers were attracted, and over 4,000 dialogues were collected during the competition. The final dialogue-quality score of the best bot was 2.7, compared to 3.8 for humans. This demonstrates that current technology can support dialogue on a given topic, but with quality significantly lower than that of humans. To close this gap, we plan to continue the experiments by organising the next conversational intelligence competition. This future work will benefit from the data we collected and the dialogue systems that were made available after the competition presented in this paper.
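To make the reported quality figures concrete, the following is a minimal illustrative sketch of how per-dialogue quality labels of the kind collected in the challenge could be averaged into a per-system score. It is not the competition's actual evaluation code; the record format and the field names ("system", "quality") are assumptions for illustration only.

```python
# Illustrative sketch: average hypothetical per-dialogue quality ratings
# (e.g. integer scores given by human assessors) into one score per system.
# The record layout and field names are assumed, not the ConvAI dataset schema.
from collections import defaultdict
from statistics import mean

dialogues = [
    {"system": "best_bot", "quality": 3},
    {"system": "best_bot", "quality": 2},
    {"system": "human",    "quality": 4},
    {"system": "human",    "quality": 4},
]

# Group ratings by the system that produced the dialogue.
ratings_by_system = defaultdict(list)
for record in dialogues:
    ratings_by_system[record["system"]].append(record["quality"])

# Report the mean rating and dialogue count for each system.
for system, ratings in sorted(ratings_by_system.items()):
    print(f"{system}: {mean(ratings):.1f} over {len(ratings)} dialogues")
```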
Notes
Unfortunately, we were not able to collect any more dialogues during the round.
Acknowledgements
Participation of MB, VL and VM was supported by the National Technology Initiative and PAO Sberbank, project ID 0000000007417F630002.