What Went Wrong and Why? Diagnosing Situated Interaction Failures in the Wild

  • Sean Andrist
  • Dan Bohus
  • Ece Kamar
  • Eric Horvitz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10652)


Effective situated interaction hinges on the well-coordinated operation of a set of competencies, including computer vision, speech recognition, and natural language understanding, as well as higher-level inferences about turn taking and engagement. Systems often rely on a set of hand-coded and machine-learned components organized into several sensing and decision-making pipelines. Given their complexity and inter-dependencies, developing and debugging such systems can be challenging. “In-the-wild” deployments outside of controlled lab conditions bring further challenges due to unanticipated phenomena, including unexpected interactions such as playful engagements. We present a methodology for assessing performance, identifying problems, and diagnosing the root causes and influences of different types of failures on the overall performance of a situated interaction system functioning in the wild. We apply the methodology to a dataset of interactions collected with a robot deployed in a public space inside an office building. The analyses identify and characterize multiple types of failures, their causes, and their relationship to overall performance. We employ models that predict overall interaction quality from various combinations of failures. Finally, we discuss lessons learned with such a diagnostic methodology for improving situated systems deployed in the wild.
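To illustrate the kind of analysis the abstract describes, the sketch below estimates how individual failure types relate to overall interaction quality across logged sessions. It is a minimal, hypothetical example, not the paper's actual pipeline: the failure-type names, the session data, and the `failure_impact` helper are all illustrative assumptions.

```python
# Hypothetical sketch: relating failure types to overall interaction quality.
# Each record stands in for one logged interaction session, with the set of
# failure types observed and a quality rating in [0, 1]. All data and names
# here are illustrative, not taken from the paper's deployment.
from statistics import mean

sessions = [
    {"failures": {"speech_reco"},                "quality": 0.4},
    {"failures": {"speech_reco", "turn_taking"}, "quality": 0.2},
    {"failures": set(),                          "quality": 0.9},
    {"failures": {"engagement"},                 "quality": 0.6},
    {"failures": set(),                          "quality": 0.8},
    {"failures": {"turn_taking"},                "quality": 0.5},
]

def failure_impact(sessions, failure_type):
    """Mean quality drop in sessions where `failure_type` occurred,
    relative to sessions where it did not."""
    with_f = [s["quality"] for s in sessions if failure_type in s["failures"]]
    without_f = [s["quality"] for s in sessions if failure_type not in s["failures"]]
    return mean(without_f) - mean(with_f)

for ft in ["speech_reco", "turn_taking", "engagement"]:
    print(f"{ft}: quality drop {failure_impact(sessions, ft):.2f}")
```

On real logs, a simple contrast like this would be a first diagnostic pass; the paper's predictive models of quality from failure combinations would go further, but this shows the basic shape of tracing overall performance back to individual failure types.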


Keywords: Situated interaction · Human-robot interaction · Dialog systems · Integrative AI · Failure diagnosis



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Sean Andrist, Dan Bohus, Ece Kamar, Eric Horvitz (Microsoft Research, Redmond, USA)
