Assessing Agreement in Human-Robot Dialogue Strategies: A Tale of Two Wizards
The Wizard-of-Oz (WOz) method is a common experimental technique in virtual agent and human-robot dialogue research for eliciting natural communicative behavior from human partners when full autonomy is not yet possible. In the first phase of our research reported here, wizards play the role of dialogue manager, standing in for the robot’s dialogue processing. We describe a novel step within WOz methodology that incorporates two wizards and control sessions: the wizards function much like corpus annotators, making independent judgments on how the robot should respond when receiving the same verbal commands in separate trials. We show that inter-wizard discussion after the control sessions, and the reconciled protocol adopted for the follow-on pilot sessions, successfully changed wizard behaviors and significantly aligned their strategies. We conclude that, without control sessions, we would have been unlikely to achieve both the natural diversity of expression that comes with multiple wizards and a better protocol for modeling an automated system.
Keywords: Natural language dialogue · Human-robot communication
1 Introduction
Providing dialogue capabilities to robots will enable them to become effective teammates with humans in many collaborative tasks, such as search-and-rescue operations and reconnaissance. We propose a multi-phase plan to achieve the goal of fully automated, natural communication between humans and robots, leveraging recent advances in virtual agent dialogue. In the first phase, we conduct exploratory data collection in tasks where naïve humans provide spoken instructions to a robot, but a wizard experimenter stands in for the robot’s communications intelligence. The wizard may use free response to reply to the spoken dialogue commands, but does so only in text form through a chat window. A second phase automates some of the wizard labor: instead of free response, the wizard uses an interface that generalizes command handling and response generation based on dialogue observed in the first phase. In a third and final phase, the wizard will be “automated away,” with a dialogue manager trained from second-phase wizard decisions made with the specialized interface. Our approach resembles that taken with the virtual agent SimSensei [1].
This paper focuses on research from the first phase, where we explore how best to encourage natural diversity in communication strategies used by the naïve human, while imposing some guidelines for consistent strategies in the wizard’s communications so that dialogue processing is tractable but also natural. We present findings from conducting control sessions, a novel contribution to the Wizard-of-Oz methodology that turns the focus of experimentation to the wizard. We explore the possible diversity in communicative strategies for two individuals playing the wizard role. All other aspects of the interaction, such as the experimenters and environment context, are held constant.
2 Collaborative Exploration Domain
In order to bootstrap the robot’s envisioned capabilities of automated language processing and navigation, we employ the Wizard-of-Oz method. Figure 1 presents our first-phase setup. A Dialogue Manager wizard (DM-Wizard) listens to the Commander’s speech and decides whether to prompt for clarification. If an instruction is deemed executable in the current context, the DM-Wizard passes a constrained, text version of it to the Robot Navigator (RN), an experimenter who teleoperates the robot. The DM-Wizard and RN both see the same map and photos requested by the Commander. In addition, the DM-Wizard and RN see a live video feed from the robot, which facilitates a shared, accurate understanding of the robot’s environment.
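As a rough illustration, the DM-Wizard's routing decision described above can be sketched as follows. The function and field names are our own hypothetical choices for exposition, not part of the study's software:

```python
from dataclasses import dataclass

@dataclass
class WizardDecision:
    """One outgoing DM-Wizard message: either a constrained command to the
    Robot Navigator (RN) or a clarification back to the Commander."""
    recipient: str  # "RN" or "Commander"
    text: str

def route_instruction(instruction: str, executable: bool,
                      clarification: str = "Which object do you mean?") -> WizardDecision:
    """Hypothetical sketch: executable instructions are forwarded to the RN
    in constrained text form; problematic ones trigger a clarification."""
    if executable:
        # Constrained, text version of the Commander's spoken instruction
        return WizardDecision("RN", instruction.lower().strip())
    return WizardDecision("Commander", clarification)
```

For example, `route_instruction("Move to the doorway", True)` yields a message addressed to the RN, while an ambiguous instruction marked non-executable yields a clarification request addressed to the Commander.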
We trained two experimenters to be DM-Wizards across a series of pre-pilot and pilot study sessions. There are several benefits to multiple DM-Wizards: we can collect variation in their decisions, assess their consistency, and identify opportunities for aligning their behavior. This motivated us to conduct control sessions, where we substituted the naïve Commander with an experimenter who communicated a pre-defined list of about 70 navigational commands, many of which were problematic and unseen in past data collection, to each DM-Wizard in separate trials.
To analyze the variation in the DM-Wizards’ responses, each message from the DM-Wizard to the Commander in the control sessions was annotated with dialogue-moves: the types of actions available to the DM-Wizard in the communication protocol [2]. Validation of the set of dialogue-moves was performed on two dialogues (99 DM to Commander messages), annotated independently by the first three authors, with up to three dialogue-moves per message. We calculated agreement using Krippendorff’s \(\alpha \) with the MASI distance metric [3], which allows for partial agreement between sets. Agreement between all three annotators was high (\(\alpha =0.92\)).
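For readers unfamiliar with these measures, the following minimal sketch shows how the MASI distance and Krippendorff's \(\alpha \) can be computed for set-valued annotations. It follows the standard published definitions rather than the authors' actual tooling (libraries such as NLTK ship comparable implementations):

```python
from itertools import combinations

def masi_distance(a, b):
    """MASI distance between two label sets: 1 - Jaccard * monotonicity.
    Monotonicity is 1 for identical sets, 2/3 for a subset relation,
    1/3 for overlap without subsumption, 0 for disjoint sets."""
    a, b = frozenset(a), frozenset(b)
    if not a and not b:
        return 0.0
    jaccard = len(a & b) / len(a | b)
    if a == b:
        m = 1.0
    elif a <= b or b <= a:
        m = 2 / 3
    elif a & b:
        m = 1 / 3
    else:
        m = 0.0
    return 1.0 - jaccard * m

def krippendorff_alpha(units, distance=masi_distance):
    """Krippendorff's alpha for a list of units, each a list of the values
    assigned to that unit by its annotators. alpha = 1 - D_o / D_e."""
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)
    # Observed disagreement: average pairwise distance within each unit
    d_o = sum(
        2 * sum(distance(a, b) for a, b in combinations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: average pairwise distance over all values pooled
    values = [v for u in units for v in u]
    d_e = 2 * sum(distance(a, b) for a, b in combinations(values, 2)) / (n * (n - 1))
    return 1.0 - d_o / d_e if d_e else 1.0
```

With MASI, two annotators who pick overlapping but unequal dialogue-move sets incur only partial disagreement, which is why it suits messages carrying up to three dialogue-moves.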
Table 1. DM-Wizard dialogue-moves to the Commander for control and subsequent pilots.
Control Sessions. We observe some marked differences in the strategies taken by the wizards: half of DM-Wizard1’s dialogue-moves provided feedback, compared to only a third for DM-Wizard2. Feedback is defined broadly as dialogue-moves that acknowledge a Commander’s conversational move or an action (often completion of a request). For example, DM-Wizard1 used the feedback sent, indicating each time that a requested photo was sent to the Commander. Meanwhile, DM-Wizard2 used more describe moves: general statements detailing the situation, including the environment, plans, or actions. Describe moves constituted 41% of DM-Wizard2’s dialogue-moves, compared to 24% for DM-Wizard1. These results suggest that DM-Wizard1 took a strategy of actively providing feedback, while DM-Wizard2 echoed back situations and plans. Proportions of clarify and request-info dialogue-moves were predictably similar given that both DM-Wizards faced the same number of problematic instructions.
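The proportions reported above are simple normalized counts over each wizard's annotated dialogue-moves. A small sketch, using made-up labels rather than the study's data:

```python
from collections import Counter

def move_proportions(moves):
    """Proportion of each dialogue-move type among a wizard's annotations."""
    counts = Counter(moves)
    total = sum(counts.values())
    return {move: count / total for move, count in counts.items()}

# Illustrative annotations only (not the paper's counts):
dm1_moves = ["feedback"] * 5 + ["describe"] * 2 + ["clarify"] * 3
```

Here `move_proportions(dm1_moves)` would report feedback at 0.5 of the dialogue-moves, mirroring how the per-wizard percentages in Table 1 are derived.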
Post-control Adjudication. After both DM-Wizards had completed the control session, they met to discuss the results. Many of the challenging commands given in the control sessions revealed a lack of complete and shared understanding of the robot’s capabilities and how requests for help from the robot should be handled. This discussion session also revealed that the basic strategies taken by DM-Wizards could be generalized: the DM-Wizards agreed that providing simpler feedback-type evidence of the robot’s status was more efficient than using more detailed describe moves.
Post-control Pilot Sessions. DM-Wizard decisions in the final two pilots, conducted after the control sessions, indicate improved agreement (see Table 1). As a direct result of adjudication, both DM-Wizards used feedback more often, both in raw counts and as a proportion of their dialogue-moves. In particular, status updates such as done and sent increased from the control to the post-control pilot sessions. Notably, other DM-Wizard behaviors did not seem to be affected by the control sessions and the ensuing discussion and guideline updates. This indicates that the control sessions facilitated a “surgical strike,” precisely changing only extremely divergent behaviors.
The Wizard-of-Oz (WOz) method is useful for eliciting natural human communication and readily permits variation based on the individual playing the wizard role. In this research, the wizard operates as the robot’s dialogue processing, typing responses and clarifications to a human Commander for the purpose of exploratory data collection. We introduced control sessions, a novel contribution to WOz methodology that supports multiple wizards. Discussions between the wizards after the control sessions successfully impacted their behaviors and aligned their strategies. Without control sessions, we would have been unlikely to achieve both the natural diversity of expression that comes with multiple wizards and a better protocol for modeling an automated system.
The effort described here is supported by the U.S. Army. Any opinion, content or information presented does not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.
1. DeVault, D., Artstein, R., Benn, G., Dey, T., Fast, E., Gainer, A., Georgila, K., Gratch, J., Hartholt, A., Lhommet, M., et al.: SimSensei Kiosk: a virtual human interviewer for healthcare decision support. In: Proceedings of AAMAS (2014)
2. Marge, M., Bonial, C., Byrne, B., Cassidy, T., Evans, A.W., Hill, S.G., Voss, C.: Applying the Wizard-of-Oz technique to multimodal human-robot dialogue. In: Proceedings of RO-MAN (2016)
3. Passonneau, R.: Measuring agreement on set-valued items (MASI) for semantic and pragmatic annotation. In: Proceedings of LREC (2006)
4. Roque, A., Leuski, A., Rangarajan, V., Robinson, S., Vaswani, A., Narayanan, S., Traum, D.: Radiobot-CFF: a spoken dialogue system for military training. In: Proceedings of Interspeech (2006)