Abstract
This paper describes the development of a monitoring system for the NAO humanoid robot to support spoken presentations. The dialog between the human and the robot is based on both a dialog list stored by the robot and the dynamic retrieval of information from an external topic map server. The stereoscopic 3D capability of a see-through wearable binocular-type display, i.e., the EPSON MOVERIO BT-200, is used to allow simultaneous observation of the robot and the retrieved information, and the distance between the side-by-side images is controlled to create a converged image overlaid on the robot’s body. This image shift method is examined using simple line images with the see-through display, and the effect of the pictorial cues of a real object on the generated images is discussed.
1 Introduction
Studies on education aided by humanoid robots with cognitive functions [1] have attracted interest for years. A humanoid robot has worked as a teaching assistant controlled by an instructional design tool in primary education [2]. An important characteristic of a humanoid robot for education is social interactivity with learners. A socially interactive humanoid robot in an educational environment can increase the efficiency of learning [3]. A fuzzy control system for robot communication has proven effective in promoting self-efficacy in language learning [4]. A combination of an educational robot and multimedia learning materials has proven beneficial for increasing student motivation [5]. Furthermore, interactions with humanoid robots increase human creativity [6].
This study introduces the NAO humanoid robot (Aldebaran Robotics, SoftBank Group) for a principal’s speech and presentation at an elementary school. NAO has programmable gesture and dialog capabilities, which enable cognitive interaction with humans based on recognition functionalities for speech, faces, and objects [7]. Educational researchers have utilized NAO for the instruction and care of children with autism [8–10]. The expressive and affective behaviors of the robot improve communication and reinforce learning [11]. Furthermore, future smart environments and ambient intelligence are expected to produce witty humor [12].
Educational and therapeutic use of humanoid robots typically involves interactive relationships between humans and humanoid robots. In contrast, stage speeches and presentations are essentially one-way forms of communication, i.e., the audience is passive. A multi-robot system has performed Manzai, a Japanese-style comedy talk show usually performed by two people, as a passive social medium [13]. Hayashi et al. used a network to facilitate communication between the robots performing Manzai, rather than direct speech recognition between them, because the sensing and recognition system was inadequate for Japanese-style comedy performance. The humanoid NAO’s recognition capability is satisfactory because the words it must recognize are pre-registered or pre-downloaded before the corresponding dialog occurs.
In this study, we examine a pilot system with the humanoid NAO that downloads keywords and related dialogs from an external e-learning server in which data are interconnected semantically and structured on the basis of topic map technology [14, 15]. To engage in dialog with the robot, the human presenter must know which topic words the robot has downloaded. For this purpose, the downloaded list of topic words is shown on a web page generated by the Internet server of NAO’s operating system. The human presenter checks the list of candidate words using a see-through wearable display connected to an Android device during the dialog. A stereoscopic 3D display was used to allow simultaneous observation of the humanoid robot and the retrieved information. This paper describes a simple stereoscopic 3D vision method to display the topic texts at an appropriate depth.
2 Method
Topic Map-based e-Learning Server.
The author has created the “Everyday Physics on Web” (EPW) e-learning portal. The EPW system is based on topic map technology [16, 17]. Topic maps (ISO/IEC JTC1/SC34) represent information using a “topic,” “association,” and “occurrence.” Topics represent subjects that are interconnected by various types of associations. An occurrence is a specific type of association that connects a topic and the actual information resources, such as text and web pages. The networked structures of topics are referred to as topic map ontology. Topic maps enable rich and flexible indexing based on semantics and increase the findability of information. Since the knowledge structure can be edited in the topic map tier, a topic maps-based web system offers good extensibility and manageability.
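The topic/association/occurrence model described above can be illustrated with a minimal sketch. The class, topic, and association-type names below are illustrative only, not the actual EPW schema.

```python
# Minimal sketch of the "topic / association / occurrence" building blocks
# described above. All names here are illustrative, not the actual EPW schema.
from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    occurrences: list = field(default_factory=list)   # links to actual resources
    associations: list = field(default_factory=list)  # (association type, Topic)

def associate(a, b, assoc_type):
    """Interconnect two topics with a typed association, in both directions."""
    a.associations.append((assoc_type, b))
    b.associations.append((assoc_type, a))

# Hypothetical fragment of a physics topic map:
pendulum = Topic("pendulum", occurrences=["text: period of a simple pendulum"])
gravity = Topic("gravity")
associate(pendulum, gravity, "related-to")

# Topic names reachable from "pendulum" via its associations:
related = [t.name for _, t in pendulum.associations]
```

Navigating from a topic to its associated topics, as in the last line, is the operation that later lets the robot offer a list of related topic words.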
The EPW topic map was built using the “Ontopia” [17] topic maps server. Ontopia has its own topic map query language, “tolog,” and the navigator framework, which consists of a set of tolog tag libraries, can generate JavaServer Pages. In addition, Ontopia has a web service interface, i.e., the Topic Maps Remote Access Protocol (TMRAP), which enables retrieval of topic map fragments from a remote topic maps server. By enabling TMRAP with EPW, one can utilize the topic map of the EPW server on the client. One can perform tolog queries from the client to retrieve any element of the EPW server.
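A TMRAP retrieval from a client reduces to an HTTP request carrying a tolog query. The sketch below builds such a request URL; the endpoint path and parameter names are assumptions for illustration, so the actual Ontopia TMRAP interface should be consulted for the real request format.

```python
# Sketch of retrieving a topic map fragment over TMRAP from a client.
# The endpoint path ("tmrap/get-tolog") and parameter names are ASSUMED for
# illustration; consult the Ontopia TMRAP documentation for the real interface.
from urllib.parse import urlencode

def tmrap_url(base, tolog_query, syntax="application/x-tm+xml"):
    """Build a TMRAP request URL carrying a tolog query."""
    params = urlencode({"tolog": tolog_query, "syntax": syntax})
    return f"{base}/tmrap/get-tolog?{params}"

url = tmrap_url("http://tm.u-gakugei.ac.jp/epw",
                'select $T from topic-name($T, $N), value($N, "pendulum")?')
# fragment = urllib.request.urlopen(url).read()  # requires access to the server
```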
See-through Wearable Display.
A see-through type binocular wearable display, i.e., the EPSON MOVERIO BT-200 (Fig. 1a), was used to monitor the topics of the humanoid robot’s talk. The BT-200 has Wi-Fi and Bluetooth connectivity. In this study, the mirroring capability over Wi-Fi was used to show the PC display on the wearable display. In addition, the binocular dual displays work as a stereoscopic side-by-side 3D viewer. In 2D mode, the display provides a 960 × 540 pixel area; in 3D mode, two side-by-side 480 × 540 pixel areas are each stretched to the whole area and shown to the left and right eyes, respectively. The distance between the left and right displays is 65 mm. In the 3D display mode, the image appears to be located approximately 4.5 m in front of the user.
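The relation between a horizontal image shift and the perceived depth can be sketched from this geometry. The eye separation (65 mm) and default image distance (approximately 4.5 m) come from the text; the virtual screen width below is an assumption used only to convert a pixel shift into millimetres on the virtual image plane.

```python
# Sketch of the convergence geometry of the side-by-side display.
# EYE_SEP_MM and IMAGE_DIST_MM come from the text; the virtual screen width
# is an ASSUMPTION for converting pixels to millimetres at 4.5 m.
EYE_SEP_MM = 65.0
IMAGE_DIST_MM = 4500.0
ASSUMED_SCREEN_WIDTH_MM = 1770.0  # hypothetical width of the virtual image
DISPLAY_WIDTH_PX = 960            # full-width pixels seen by each eye in 3D mode

def perceived_depth_mm(disparity_px):
    """Depth of the fused image for a horizontal disparity p = x_right - x_left.

    Rays from the two eyes through the displayed points intersect at
    d = e * D / (e - p); p < 0 (crossed disparity, i.e., the right-eye image
    shifted left) pulls the image closer than the virtual screen.
    """
    p_mm = disparity_px * ASSUMED_SCREEN_WIDTH_MM / DISPLAY_WIDTH_PX
    return EYE_SEP_MM * IMAGE_DIST_MM / (EYE_SEP_MM - p_mm)
```

Zero disparity leaves the image on the virtual screen at 4.5 m; under the assumed screen width, a crossed shift of about 150 display pixels brings it to roughly 0.9 m, i.e., near a robot standing in front of the presenter. Reducing the distance between the side-by-side windows makes the disparity more crossed.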
Test Application.
A simple Adobe Flash application was created, using the Papervision3D library, to test stereoscopic 3D (S3D) text representation in this environment. The application shows a “Y-letter,” i.e., three lines connected at their ends, in a plane parallel to the image plane (or screen), as shown in Fig. 1b. The same “Y-letter” lines were drawn at the centers of the side-by-side windows on a black background. These windows can be moved horizontally by pushing on-screen buttons. When this side-by-side display of the “Y-letter” is represented in the 3D mode, the left and right images exhibit a parallax, and the image appears at a distance of approximately 4.5 m from the user by default. The parallax of the image is changed by moving the right or left window. The application runs in the Flash player on a PC, and its display is mirrored to and controlled from the BT-200.
Test Process.
We must be able to see the retrieved text while looking at and talking to the humanoid robot. Thus, the accommodation point of the user’s eyes is primarily at the position of the robot. To observe the robot and the projected text simultaneously, the S3D position of the text must be moved to the appropriate position (i.e., depth) of the robot. For this purpose, the distance between the side-by-side windows of the application was reduced. When accommodation and convergence agree at a position, the image is seen in clear focus. Some related applications have been published on the MOVERIO application site [18, 19].
To test the effect of convergence on the cognition of the depth position and a possible illusion occurring during the display of the “Y-letter,” two tests were conducted. The participants were asked to hold and focus on a cube as a target object, as shown in Fig. 2 (Tables 1 and 2).
Test 1 checked whether the depth of the line image could be changed by moving only the right-side image. In addition, to examine how the projected image merges with the real scene, we observed the illusion in the lines caused by the pictorial cues of the cube edges.
In test 2, the participants were asked for their impressions of the different ways of converging the side-by-side images. We compared their impressions of the motion of the line images with and without the cube near the lines.
NAO Humanoid Robot.
An application for the NAO humanoid robot was created using its development environment, “Choregraphe.” Note that the “NAOqi” programming framework is capable of using external service APIs. In this study, TMRAP URL requests were sent to the EPW server to obtain a candidate list of topic names. When the person speaks one of the listed items and it is recognized by the speech recognition system, the dialog occurrence and topic list associated with the recognized topic are again requested from the EPW. NAO then speaks the dialog occurrence and, via speech recognition, waits for the person to speak. Thus, the person needs to know the candidate list of words retrieved by NAO. To this end, when NAO obtains the topic name list, it generates a web page for the list on its internal website. The person requests this page from NAO’s URL, as if looking into the robot’s mind. In this manner, the dialog between the humanoid and human can develop dynamically using the knowledge structure of the EPW topic map.
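The retrieve-recognize-speak cycle described above can be sketched as a single function. The four callables stand in for the TMRAP requests and the NAOqi speech services; their names and signatures are illustrative, not the actual NAOqi API.

```python
# Sketch of one dialog cycle. fetch_topics, fetch_occurrence, say, and
# listen_for are PLACEHOLDERS for the TMRAP requests and NAOqi speech calls.
def dialog_step(topic, fetch_topics, fetch_occurrence, say, listen_for):
    """One cycle: download candidates, recognize a choice, speak its occurrence."""
    candidates = fetch_topics(topic)       # TMRAP request for related topic names
    # (NAO also publishes `candidates` on its internal web page at this point,
    # so the presenter can check the list on the wearable display.)
    chosen = listen_for(candidates)        # speech recognition over the list
    if chosen is None:
        return None                        # nothing recognized; end of dialog
    say(fetch_occurrence(chosen))          # speak the dialog occurrence text
    return chosen                          # becomes the topic of the next cycle

# Stub run with a tiny fake knowledge base in place of the EPW server:
spoken = []
topics = {"pendulum": ["gravity"], "gravity": []}
texts = {"gravity": "Gravity pulls the pendulum back toward equilibrium."}
nxt = dialog_step("pendulum",
                  fetch_topics=topics.get,
                  fetch_occurrence=texts.get,
                  say=spoken.append,
                  listen_for=lambda cands: cands[0] if cands else None)
```

Because the returned topic feeds the next call, repeating `dialog_step` walks the dialog along the associations of the topic map.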
Manzai and Comic Frame.
Japanese comic performance has a common framework: Furi, Boke, Tsukkomi, and Ochi. Furi is a proposal or suggestion of an interest, topic, or atmosphere for the talk. Boke is speaking funny lines, and Tsukkomi is responding to the funny lines to make them impressive; Ochi is the punch line that concludes the performance. The flow of Furi, Boke, Tsukkomi, and Ochi is a traditional framework of Japanese comic performance, such as Manzai. The speaking system with the humanoid robot in this work has the same structure. First, the human provides Furi from the topic list retrieved by the robot. Then, the robot presents the dialog occurrence, which is the main content of the talk. Finally, the human responds to the content shown by the robot. The human can then restart the frame by choosing one of the related topics. If the talk is humorous, it is considered an effective realization of Manzai.
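The mapping of the comic frame onto the two speakers can be encoded as an ordered sequence of steps. This is an illustrative encoding of the description above, not code used by the system.

```python
# The comic frame as realized by the system, per the description above
# (illustrative encoding; role names follow the text).
MANZAI_FRAME = [
    ("Furi",     "human", "proposes a topic from the robot's retrieved list"),
    ("Boke",     "robot", "speaks the dialog occurrence, the main content"),
    ("Tsukkomi", "human", "responds to the content shown by the robot"),
]
# Ochi closes the performance; alternatively, the human restarts the frame
# by choosing one of the related topics.
roles = [step[0] for step in MANZAI_FRAME]
```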
3 Results and Discussion
3.1 Topic Maps-Based Dialog
We conducted speech demonstrations, as shown in Fig. 3, based on the combination of topic map retrieval and the original dialog box of Choregraphe. The dialog box provides an exchange of rather humorous words, whereas the topic map talk was conducted to obtain knowledge about particular words. The human mediates these knowledgeable parts within an entertaining discussion.
NAO and the principal’s talks began in autumn 2015. Many elementary school students appear to look forward to NAO’s talks with the principal. From autumn 2015, a second-grade class began a postal-system role-play activity at school. From then until January 2016, the principal received 46 letters from students; 70% of the letters referred to NAO. In particular, the knowledgeable phrases that NAO spoke often encouraged the students to giggle or laugh. Students might feel a sense of incongruity when they see the humorous and expressive NAO speak intellectual words, which the principal then praises.
Verbal communication is preferable for human–robot interaction to communication mediated by a computer. However, speech recognition as an interface is far from ideal. Thus, at least for now, humans require a visualization of the robot’s “knowledge” or “brain.” In addition, such a relationship is consistent with the human’s Tsukkomi role. NAO is not remotely controlled by, for example, a PC. In this sense, a see-through wearable display with information retrieval from the robot’s brain is preferable to remote control by a PC.
3.2 See-Through Stereoscopic 3D Rendering in Space
Twenty-three subjects aged 19 to 21 years participated in the stereoscopic 3D line rendering experiments. Only two were female. If the participant wore glasses, they were asked to wear the see-through display over their glasses.
Figure 4 shows the percentage of participants who recognized a change in the position of the lines. Most participants felt that the lines were placed in the real space, and their positions shifted back and forth around the real cube held in front of their faces. However, a few participants saw the lines as split or shifted in the opposite direction. This was observed when the lines were shifted around the cube at a fixed position in space. Note that the shift motion was controlled by the experimenter rather than the participant. A more detailed investigation of the combination of accommodation and convergence for motion is required.
Figures 5a and b show the percentage of participants who observed a 3D illusion in the projected lines. The “Y-letter” lines were drawn on a vertical plane parallel to the eyes so that the shape itself contains no parallax. In addition, since the shape is symmetric, it provides particularly few cues for an illusion to emerge. Nevertheless, a possible visual illusion is that the junction of the three lines appears to rise toward the observer or, conversely, to cave inward.
Figure 5a shows the results when the lines were observed around the real cube, which normally appears in 3D. Most participants felt that the lines were on a plane. However, as the lines moved around the cube, a few participants observed an illusion.
Figure 5b shows the results when the lines were overlaid on three edges of the real cube. The percentage of participants who observed the raising illusion increased significantly. Furthermore, as the lines shifted back and forth, the illusions decreased. In addition, it is intriguing that the rate of the cave-shaped illusion increased when the lines were observed on the real cube.
In test 1, the back and forth motion of the lines was generated by moving only the window of the right image of the lines to the left, while the left window was fixed. Even such asymmetric manipulation allows perception of motion in the depth direction. Only a few participants commented about the sense of asymmetric shift. Then, in test 2, the participants were asked if they could observe the right (for left window shift) or left (right shift) component of each shift with and without the cube.
Figure 6 shows the percentage of participants who observed right or left displacement of the lines. Even when both windows were moved simultaneously, 20–40% of the participants indicated that they could observe the right and left displacements separately (e.g., a zigzag-like motion) or that the lines appeared split. Note that a few participants reported visual fatigue after this test. Thus, frequent and quick manipulation of the convergence might be visually uncomfortable.
4 Conclusion
Human–robot paired dialog was performed using both a dialog list and dynamic retrieval of information from a topic map server. A binocular-type see-through wearable display, i.e., the EPSON MOVERIO BT-200, was used to monitor the information retrieval.
The stereoscopic 3D display was used to allow simultaneous observation of the robot and the retrieved information. The distance between the side-by-side images was changed to control the convergence and allow the image to be observed at an arbitrary depth in the field of view.
References
Vernon, D., von Hofsten, C., Fadiga, L.: A Roadmap for Cognitive Development in Humanoid Robots, Cognitive Systems Monographs, vol. 11. Springer-Verlag, Berlin Heidelberg (2010)
Chin, K.-Y., Wu, C.-H., Hong, Z.-W.: A humanoid robot as a teaching assistant for primary education. In: Fifth International Conference on Genetic and Evolutionary Computing, pp. 21–24 (2011)
Brown, L-V., Howard, A.M.: Engaging children in math education using a socially interactive humanoid robot. In: 13th IEEE-RAS International Conference on Humanoid Robots, pp. 183–188 (2013)
Yorita, A., Botzheim, J., Kubota, N.: Self-efficacy using fuzzy control for long-term communication in robot-assisted language learning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5708–5715 (2013)
Chin, K.-Y., Hong, Z.-W., Chen, Y.-L.: Impact of using an educational robot-based learning system on students’ motivation in elementary education. IEEE Trans. Learn. Tech. 7(4), 333–345 (2014)
Jo, D., Lee, J.-g., Lee, K.C.: Empirical analysis of changes in human creativity in people who work with humanoid robots and their avatars. In: Zaphiris, P., Ioannou, A. (eds.) LCT 2014, Part I. LNCS, vol. 8523, pp. 273–281. Springer, Heidelberg (2014)
Miskam, M.A., Shamsuddin, S., Yussof, H., Omar, A.R., Muda, M.Z.: Programming platform for NAO robot in cognitive interaction applications. In: IEEE International Symposium on Robotics and Manufacturing Automation, pp. 141–146 (2014)
Diaz, M., Nuno, N., Saez-Pons, J., Pardo, D.E., Angulo, C.: Building up child-robot relationship for therapeutic purposes: from initial attraction towards long-term social engagement, pp. 927–932 (2011)
Miskam, M.A., Shamsuddin, S., Abdul Samat, M.R, Yussof, H., Ainudin, H.A., Omar, A.R.: Humanoid robot NAO as a teaching tool of emotion recognition for children with autism using the android app. In: International Symposium on Micro-Nano Mechatronics and Human Science, pp. 1–5 (2014)
Bingbing, L.L., Chen, I-M., Goh, T.J., Sung, M.: Interactive robots as social partner for communication care. In: IEEE International Conference on Robotics and Automation, pp. 2231–2236 (2014)
Addo, I.D., Ahamed, S.I.: Applying affective feedback to reinforcement learning in ZOEI, a comic humanoid robot. In: IEEE International Symposium on Robot and Human Interactive Communication, pp. 423–428 (2014)
Nijholt, A.: Humor techniques: from real world and game environments to smart environments. In: Streitz, N., Markopoulos, P. (eds.) DAPI 2015. LNCS, vol. 9189, pp. 659–670. Springer, Heidelberg (2015)
Hayashi, K., Kanda, T., Miyashita, T., Ishiguro, H., Hagita, N.: Robot Manzai–Robots’ conversation as a passive social medium. In: IEEE-RAS International Conference on Humanoid Robots, pp. 456–462 (2005)
Tsuchida, S., Yumoto, N., Matsuura, S.: Development of augmented reality teaching materials with projection mapping on real experimental settings. In: Stephanidis, C. (ed.) HCI 2014, Part II. CCIS, vol. 435, pp. 177–182. Springer, Heidelberg (2014)
ISO/IEC 13250:2002, Topic Maps
Pepper, S.: The TAO of Topic Maps. http://www.ontopia.net/topicmaps/materials/tao.html#d0e632
“Everyday Physics on Web”. http://tm.u-gakugei.ac.jp/epw/
Matsuura, S., Naito, M.: Subject-centric computing. In: Fourth International Conference on Topic Maps Research and Applications, TMRA 2008, Leipziger Beiträge zur Informatik, Band XII, pp. 247–260 (2008)
“Ontopia”. http://www.ontopia.net/
“MOVERIO Apps Market”. https://moverio.epson.com/jsp/pc/pc_application_list.jsp
Yonemura, T.: Personal correspondence (2016)
Acknowledgments
This study was funded in part by a Grant-in-Aid for Scientific Research (C) 15K00912 from the Ministry of Education, Culture, Sports, Science and Technology, Japan.
Cite this paper
Matsuura, S. (2016). Use of See-Through Wearable Display as an Interface for a Humanoid Robot. In: Antona, M., Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Interaction Techniques and Environments. UAHCI 2016. Lecture Notes in Computer Science(), vol 9738. Springer, Cham. https://doi.org/10.1007/978-3-319-40244-4_41