
Long-Term Cohabitation with a Social Robot: A Case Study of the Influence of Human Attachment Patterns


This paper presents the methodology, setup, and results of a study involving long-term cohabitation with a fully autonomous social robot. During the experiment, three people with different attachment styles (as defined by John Bowlby) each spent ten days with an EMYS type robot installed in their own apartment. It was hypothesized that the attachment patterns represented by the test subjects influence the interaction. In order to provide engaging, non-schematic actions suitable for the experiment requirements, the existing robot control system was modified, which allowed EMYS to become an effective home assistant. Experiment data was gathered using the robot’s state logging utility (during the cohabitation period) and in-depth interviews (after the study). Based on the analyzed data, it was concluded that the satisfaction stemming from prolonged cohabitation and the assessment of the robot’s operation depend on the user’s attachment style. The results lead to the first guidelines for personalizing robot behavior for users with different attachment patterns. The study confirmed the readiness of the EMYS robot for satisfying, autonomous, long-term cohabitation with users.


It would seem that the growing importance of human–machine interaction, and the corresponding decline in the intensity of direct personal contact observed in recent years, is an irreversible process. This is why a lot of effort is put into HRI (human–robot interaction) studies, in which social robots play the roles of personal assistants, trainers, therapists, or teachers. Social robots have proven to be a useful tool in motivating children to exercise and eat healthily, reducing anxiety or subjectively perceived pain, and relieving symptoms of various disorders [15, 17, 20, 23, 24, 31, 35, 38,39,40, 42].

Universal access to the Internet has contributed to human addiction to the mass media, which now seem permanently etched into the landscape of modern society. They form the foundation of a complex system of social communication and to a large extent organize and shape our lives, affecting aspects such as our intellect, emotions, feelings, and social attitudes. The human need for constant access to information can be used to stimulate human–robot interaction by letting the machine become an interface to various data sources. The robot should deliver the information in a natural way and in line with human expectations. This can have a positive impact on the user’s level of attachment, a factor that is highly desirable in social robotics.

The implementation of long-term experiments is very complex and therefore rare. A key difficulty in such studies is avoiding schematic robot behavior and the resulting loss of user engagement during interaction. Robot reliability is also an important issue in long-term HRI studies, since one critical failure can permanently ruin a study sample. In addition, the preparation and extensive testing of complex experiment scenarios is extremely time consuming. One way to deal with these difficulties is the often overused Wizard of Oz method [30, 32], in which a robot is controlled remotely. Unfortunately, depriving the robot of autonomy causes the loss of information about the technical aspects of the interaction, without which it is impossible to further the development of robots and verify their current capabilities [7]. Researchers therefore continue to seek methods of establishing and maintaining long-term relations between humans and robots.

In this paper, we will describe an experiment devised to study how satisfaction stemming from contact with a robot during extended cohabitation will depend on a person’s attachment style. The theory of attachment, developed by John Bowlby and summarized between 1969 and 1980 in his “Attachment and Loss” trilogy [3,4,5], indicates that children are biologically predisposed to form attachments. He postulated that this kind of behavior is instinctive and triggered whenever proximity with the mother is threatened. Therefore, the relationship between a developing child and their primary caregivers has various repercussions that manifest themselves later in life. The styles are associated with the tendency to react in accordance with definite emotional (changeable emotions, variability of moods), cognitive (the way of interpreting the ambient world), and behavioral (reaction to other people and events) schemes. There are three attachment styles:

  • Secure style: individuals with this style establish healthy relations with others based on trust and a sense of security, and are not afraid of losing a relationship. Such people are direct, empathetic, openly communicate their emotions, and respect the needs of others.

  • Anxious-ambivalent style: individuals with this style tend to be overly dependent in their relationships and desperately try to maximize the time spent with their loved ones, whose absence causes a fear of losing them. They are prone to resentment when they do not receive enough attention from their partners.

  • Avoiding style: individuals with this style avoid direct contact with people and are not particularly interested in forming relationships. They tend to be loners, more focused on practical matters.

Further studies indicate that attachment style is reflected in how people treat, and become attached to, even inanimate objects [29, 34]. A social robot possessing human characteristics and operating as a home assistant may therefore become a subject of human attachment. Bowlby’s theory has already been considered in the context of social robotics [37]; however, the research conducted so far has focused on implementing attachment behavior in the robots themselves [21, 22, 25].

It is believed that a robot’s emotional makeup and behavior should depend on who the user is. Robots accompanying anxious-ambivalent people should therefore be more eager to initiate contact and be more visible and present in the lives of their users. On the other hand, the robot should initiate less contact with avoidant individuals. It can be assumed that for people with a high level of security and trust, this aspect of the robot’s behavior will be less important. It is also likely that anxious-ambivalent people are more prone to get angry at the robot’s shortcomings (an inadequate set of dialog options, a limited number of functions, etc.) and irritated by its operation, because their need for attention is not satisfied.

The novel experiment methodology described in this paper allowed us to study two very important research subjects. The first is the verification of long-term, autonomous operation of a social robot. A positive outcome of this verification is a first step toward introducing social robots as useful home assistants. The data collected during the research allowed us to deepen our knowledge about designing, creating, and controlling social robots. It also provided a lot of useful information about user expectations regarding life with a robotic personal assistant. The second aspect, rooted in psychology and social studies, is strongly tied to the robot’s behavior. We were able to witness how people with different attachment styles establish a relation with a social robot during cohabitation, and whether and how they become emotionally attached to it.

Experiment Setup

The main research platform utilized in the experiment was the expressive EMYS type head [27] coupled with the underlying control system. The platform’s capabilities underwent major upgrades in both hardware and software. The motivation for this was twofold. Firstly, the robot needed to be adapted to a home environment so that its various mechanical components (cables, speakers, etc.) would not distract the user from the robot’s face during interaction, which was found to be of great importance in previous studies [9]. Secondly, EMYS’ capabilities had to be expanded to provide means of weaving non-repetitive information into the experiment scenario in order to make it more engaging for the study participants.


The EMYS (EMotive headY System) head, used as the basis for the hardware platform, was developed during the FP7 LIREC project as part of the anthropomorphic FLASH platform [16, 28]. The head consists of three horizontal disks (imitating forehead and chin movements) mounted on a neck, and possesses a set of eyeballs that can rotate and pop out. Before the study described in this paper, both robots had been thoroughly verified in a series of studies which proved that they are engaging and widely accepted interaction partners [9]. EMYS’ appearance allows him to bypass the uncanny valley issue while retaining the ability to recognizably express basic human emotions [27, 36].

When operating as a standalone platform, EMYS usually requires additional components (speakers, a Kinect sensor, etc.). He therefore required modifications to function properly in a home environment. Most importantly, the audio system of the robot was modified: a set of speakers was mounted inside the bottom disc of the robot. This required the robot’s power supply system to be modified accordingly; a dedicated DC/DC converter was installed along with a 6 W stereo audio amplifier. Additionally, a Kinect sensor was permanently attached to EMYS’ base, and the base itself was made sturdier to increase the stability of the robot during rapid movements.

The master PC controller used during the study was a Mac mini, i7-3615QM, 16GB RAM, 256GB SSD drive. The whole hardware platform can be seen in Fig. 1.

Fig. 1
figure 1

Research platform utilized in the study

Control System Overview

The control system that EMYS utilized during the study is open-source software based on Gostai Urbi [19] and compliant with the three-layer architecture paradigm [18].

The lowest layer of the control system consists of modules called UObjects which enable the programmer to access motion controllers, sensors and external software. These modules, written in C++, can be dynamically loaded into Urbi engine, which takes care of their orchestration (e.g. scheduling and parallelization of tasks).

The middle layer of the control architecture is responsible for the implementation of competencies, which define what tasks the robot will be able to perform. The competencies are usually implemented as sets of functions delivered by the low-level modules. Urbi delivers a scripting language called urbiscript that gives the programmer a quick way of creating and synchronizing complex behaviors. The designed control system enables access to the robot hardware and competencies in a unified manner, using a tree structure called robot. This makes the API implemented in the middle layer more convenient to use and helps maintain the modularity of the software. Elements of the robot structure are grouped based on their role, e.g. audio, video, head, emotions, network. The middle layer also contains the competency manager, which decides how specific behaviors should be realized depending on the current physical setup of the robot (the head can operate as a standalone robot or be combined with a torso, arms, and a mobile platform, and every combination requires a different set of scripts in the competency manager).
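The unified tree structure described above can be sketched as follows. This is a minimal Python illustration of the idea, not the actual urbiscript API; all node and function names here are hypothetical stand-ins for the real low-level module calls.

```python
# Sketch of a unified "robot" tree grouping hardware access and
# competencies by role (audio, video, head, network, ...).

class Node:
    """A namespace node in the robot tree (e.g. robot.audio, robot.video)."""
    def __init__(self, **children):
        self.__dict__.update(children)

def make_robot():
    # Hypothetical leaf functions standing in for low-level module calls.
    return Node(
        audio=Node(say=lambda text: f"saying: {text}"),
        video=Node(take_photo=lambda: "photo.jpg"),
        head=Node(move=lambda yaw, pitch: (yaw, pitch)),
        network=Node(send_mail=lambda to, body: f"mail to {to}"),
    )

robot = make_robot()
print(robot.audio.say("hello"))  # every competency reached via one tree
```

The point of the tree is that higher layers address every capability through one root object, so swapping a low-level module does not change the middle-layer API.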

Fig. 2
figure 2

Three-layer architecture implementation overview

Finally, the highest layer should contain a dedicated decision system responsible for utilizing the behaviors provided by the middle layer with respect to a specific application. This layer can contain anything from a simple remote controller to a complex system simulating human mind.

An overview of the three-layer control system implementation is presented in Fig. 2. A full list of all low-level modules can be found in Table 1. A more complete description of the control system can be found in [26, 28] and in the on-line documentation [11].


In the middle layer, various low-level functions (provided by the modules loaded into the control system) are combined to form competencies. Competencies are then further organized inside the competency manager (a part of the middle layer). From a technical point of view, the manager comprises a set of scripts which utilize the existing substructures of the robot structure (video, audio, head, etc.) in order to implement application-specific competencies. The manager therefore needs to be extended whenever the existing competency set is not sufficient to fulfill the tasks required by the current scenario. However, the existing competencies should never be modified unless the physical setup of the robot changes.

Sending an e-mail is an example of a complex function implemented in the competency manager. A message is sent when such an action is requested by the user via the highest layer of the control system. The user declares whether to include a voice message and a photo. This request is then passed on to the competency manager, which uses the Kinect sensor to take a photo using a function from the video structure and then uses functions from the audio structure to record a voice message and convert it to mp3 format. The last step is to compose the e-mail, add the attachments, and address the message. These steps are realized by various functions from the network structure. The diagram of this competency is presented in Fig. 3.
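The final composition step of this competency can be sketched in Python with the standard library. This is only an illustration of assembling a message with photo and voice attachments; the sender address and file names are made up, and actual capture, mp3 conversion, and SMTP delivery are omitted.

```python
from email.message import EmailMessage

def compose_mail(to_addr, photo_bytes=None, voice_mp3=None):
    """Sketch of the e-mail competency's last step: compose the message
    and add the optional photo and voice attachments (sending omitted)."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = "emys@example.org"   # hypothetical robot address
    msg["Subject"] = "Message from EMYS"
    msg.set_content("Sent by your robot assistant.")
    if photo_bytes is not None:
        msg.add_attachment(photo_bytes, maintype="image", subtype="jpeg",
                           filename="photo.jpg")
    if voice_mp3 is not None:
        msg.add_attachment(voice_mp3, maintype="audio", subtype="mpeg",
                           filename="voice.mp3")
    return msg

mail = compose_mail("user@example.org", photo_bytes=b"\xff\xd8fake")
print(len(mail.get_payload()))  # body plus one attachment
```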

Table 1 List of all low-level modules (asterisk marks modules that were used in the experiment scenario)

Another, much simpler example of a competency manager function is generating utterances. The user can use the competency manager to customize the way the robot speaks, e.g. when the utilized scenario includes an emotional component. In such a situation, the manager function can retrieve the current emotional state of the robot from one of the emotion simulation modules and then use functions from the audio structure to modify the parameters of the utterance (volume, pitch, rate) accordingly.
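The mapping from emotional state to utterance parameters might look like the following sketch. The preset values and function names are illustrative assumptions, not the study's actual configuration.

```python
# Sketch: map the robot's emotional state onto TTS parameters.
SPEECH_PRESETS = {
    "neutral": {"volume": 1.0, "pitch": 1.0, "rate": 1.0},
    "angry":   {"volume": 1.2, "pitch": 1.1, "rate": 1.2},
    "sad":     {"volume": 0.8, "pitch": 0.9, "rate": 0.8},  # slower, quieter
}

def utterance_params(emotion):
    """Return TTS parameters for the given emotion, defaulting to neutral."""
    return SPEECH_PRESETS.get(emotion, SPEECH_PRESETS["neutral"])

print(utterance_params("sad")["rate"])  # → 0.8
```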

Fig. 3
figure 3

Diagram of competency responsible for sending e-mails

The competency manager allows the programmer to create reusable dialog functions. One such example can be observed when the robot asks the user about the genre of music to be played from the on-line radio. Calling the manager function causes the following operations:

  • utterance is generated to ask the user for the genre,

  • speech recognition system is prepared,

  • file containing the grammar for possible user answers is loaded,

  • speech recognition is activated.

Afterwards, the system waits for the user to utter one of the phrases specified within the loaded grammar file. This particular file defines syntax for representing grammars used by the speech recognition engine. It specifies the words and patterns to be listened for by a speech recognizer. The syntax can be presented in XML form (grxml) [41]. The recognized sentence is then returned by the function. If the user says something that is not allowed by the grammar or is silent for a specified time, the function returns information about the timeout.
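A small grammar of this kind, and the phrase list the recognizer derives from it, can be sketched as follows. The grammar content is illustrative (this is not the study's actual grammar file, and the study's grammars were in Polish); the parsing code simply extracts the words the recognizer would listen for.

```python
import xml.etree.ElementTree as ET

# Illustrative SRGS grammar in grxml form for the music-genre question.
GRXML = """\
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         root="genre">
  <rule id="genre" scope="public">
    <one-of>
      <item>flamenco</item>
      <item>jazz</item>
      <item>rock</item>
    </one-of>
  </rule>
</grammar>
"""

ns = {"g": "http://www.w3.org/2001/06/grammar"}
root = ET.fromstring(GRXML)
phrases = [item.text for item in root.findall(".//g:item", ns)]
print(phrases)  # → ['flamenco', 'jazz', 'rock']
```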


Emotions are a key element of the design of a social robot. They play a vital role, ensuring more natural communication and the much needed variation of behaviors [1]. During the study, EMYS’ emotion simulation system was based on the WASABI project [2]. WASABI builds on the PAD theory, in which emotions are represented in a 3D space whose dimensions are pleasure, arousal, and dominance. The system is enriched with various additional components such as internal dynamics and secondary emotions.

In order for the emotion simulation system to operate properly, individual emotions needed to be placed within the PAD space. These have been modeled as regions contained within spheres with a set origin and radius. The initial configuration was based on the presets provided with WASABI itself. However, since these were tailored towards short-term studies, it was necessary to modify them to achieve a more stable robot behavior in a long-term setting. The final configuration of the emotional space is shown in Table 2.
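The sphere-based regions can be sketched as follows. The origins and radii below are made-up values for illustration, not the configuration from Table 2; the code just checks which emotion spheres contain the robot's current PAD point.

```python
import math

# Sketch: classify a PAD point into emotion regions modeled as spheres.
EMOTIONS = {
    # name: ((pleasure, arousal, dominance), radius) -- illustrative values
    "happy":   (( 0.8,  0.5,  0.5), 0.4),
    "angry":   ((-0.6,  0.6,  0.6), 0.4),
    "sad":     ((-0.6, -0.4, -0.5), 0.4),
    "fear":    ((-0.6,  0.4, -0.6), 0.4),
    "neutral": (( 0.0,  0.0,  0.0), 0.3),
}

def classify(pad):
    """Return the emotions whose sphere contains the given PAD point."""
    hits = []
    for name, (origin, radius) in EMOTIONS.items():
        if math.dist(pad, origin) <= radius:
            hits.append(name)
    return hits

# High dominance separates anger from fear at similar pleasure-arousal.
print(classify((-0.6, 0.5, 0.55)))  # → ['angry']
```

Note how the dominance coordinate alone distinguishes anger from fear here, which is exactly why the dominance component of each impulse matters, as discussed below.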

Table 2 Robot emotions in PAD space

The finite-state machine (FSM) operating in the highest layer generates various impulses influencing the robot’s emotional state. Two parameters are associated with each emotional impulse. The first specifies how positive or negative a certain stimulant is; e.g. reporting bad weather was related to a negative impulse, while seeing that the user is around generated a positive one. The second parameter directly changes the robot’s dominance value. It is related to the amount of control that the robot perceives to have over the stimulant causing the emotional impulse. For example, if the robot could not retrieve something from the Internet due to connectivity problems, the emotional impulse would have low dominance, but if the robot failed at recognizing a color that he had learned earlier, the impulse would have a high dominance component. Selecting a proper value for a dominance impulse (in accordance with the situation) is very important, since it enables differentiating between emotions that are similar in the pleasure–arousal subspace, such as sadness/fear and anger.

The emotional state of the robot is not directly utilized in the highest level of the control system. However, the emotion simulation system can be used to enrich robot behaviors by modifying competencies according to the emotional state of the robot (e.g. movements can be more rapid when the robot is angry, he can speak more slowly when he is sad, etc.). The emotional system uses as stimuli the various events occurring in the highest level of the control system (usually generated by the robot’s sensors, the operation of modules, and undertaken actions). A diagram of the emotional system is presented in Fig. 4. It is worth noting that since the study was conducted in Polish, the UANEW and USentiWordnet modules (based on databases of English words) did not influence the emotion simulation system.

Fig. 4
figure 4

Diagram of the emotion simulation system

The robot is endowed with a personalized character, achieved via a configuration file for the underlying emotion simulation system which contains several parameters controlling, e.g., how fast the robot gets bored or how stable the emotional states are. It is worth noting that some configurations may cause an “emotional instability” in which every emotional impulse causes the robot to switch uncontrollably between emotional states.

Home Companion Scenario

The improvement of EMYS’ competencies allowed us to create a complex scenario with the robot operating as a home assistant. During the study, the highest layer of the control system was implemented as a finite-state machine in Gostai Studio (dedicated software for FSM creation). Each FSM state represents a set of scripts realizing a particular robot function (e.g. checking the weather, see Fig. 5). Transitions between states are triggered mainly by the user’s instructions. Each recognizable command depends on the currently loaded grammar file and can be composed in a number of ways to enable a more natural conversation. For example, to turn on flamenco music, the user can assemble an instruction from the sections delivered by the grammar file to say both 1—“turn on flamenco” and 2—“flamenco please”, as shown in Fig. 6.
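The command-driven state machine pattern can be sketched as below. The states and trigger phrases are illustrative; the real FSM was built graphically in Gostai Studio with urbiscript actions attached to each state.

```python
# Minimal sketch of a command-driven finite-state machine.
class FSM:
    def __init__(self, start):
        self.state = start
        self.transitions = {}  # (state, command) -> next state

    def add(self, state, command, next_state):
        self.transitions[(state, command)] = next_state

    def handle(self, command):
        # Unknown commands leave the state unchanged.
        self.state = self.transitions.get((self.state, command), self.state)
        return self.state

fsm = FSM("idle")
fsm.add("idle", "check weather", "weather")
fsm.add("weather", "done", "idle")
fsm.add("idle", "turn on flamenco", "radio")

print(fsm.handle("check weather"))  # → weather
print(fsm.handle("done"))           # → idle
```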

From the study participants’ point of view, the main goal of this scenario was to teach the robot to recognize various colors. The learning competency was developed using the k-nearest neighbors (k-NN) classification algorithm. Teaching was achieved by showing the robot a uniformly colored sample held in the right hand or asking him to look at the clothes on the user’s torso. The user then told the robot what color that particular object or article of clothing was. After the robot was shown a color enough times, he was able to learn it. EMYS could be taught to distinguish up to 21 different colors in total. The user could check EMYS’ knowledge at any stage by playing a similar game in which the user again showed an object to the robot, but this time it was EMYS who named the color. He then asked the user to confirm his guess.
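The color-learning step can be sketched with a plain k-NN classifier over color samples. The RGB feature space, the value of k, and the sample values below are assumptions for illustration; the paper does not specify the actual features or parameters used.

```python
import math

# Sketch: k-nearest-neighbors color naming over labeled RGB samples.
def knn_predict(samples, query, k=3):
    """samples: list of ((r, g, b), label). Returns the majority label
    among the k training samples nearest to the query color."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

samples = [
    ((250, 10, 10), "red"), ((240, 30, 20), "red"), ((200, 0, 0), "red"),
    ((10, 10, 250), "blue"), ((30, 20, 240), "blue"), ((0, 0, 200), "blue"),
]
print(knn_predict(samples, (220, 15, 15)))  # → red
```

Each teaching interaction adds another labeled sample, so the classifier becomes more reliable as the user repeats the game.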

Fig. 5
figure 5

A part of experiment FSM

Fig. 6
figure 6

Example utterance

To stimulate the interaction between the user and EMYS, the robot was equipped with a wide array of functions that enabled him to assist the user in everyday tasks. EMYS could connect to the Internet and browse various news services, check weather forecasts and the TV schedule, and play back radio streams. The robot could also serve as an alarm clock, send and receive e-mails with audio and photo attachments, and use Google Calendar to remind the user of what they had planned for the coming days. Finally, he was connected to the Facebook account of the study participant and enabled access to most of the service’s content and functionality. In a sense, EMYS became an interface to the external world, enabling the study participants to use the above-mentioned media in a more natural and comfortable manner.

Throughout the interaction, the user could control the robot with gestures. For example, any action of the robot, such as reading a particularly long post or piece of news, could be stopped by making a clicking gesture (seen in Fig. 7). Another example was regulating the volume of the music being played back by raising the right hand, closing the fist, and moving it up and down. All of the study participants received a user guide detailing how to interact with EMYS [10], i.e. how to teach the robot, what his additional functions are, how to construct commands, and how to use gestures to control the robot.

Fig. 7
figure 7

Gesture example


The designed experiment was split into three ten-day sessions, each with a different participant representing a different attachment style. During the whole study, EMYS operated fully autonomously. In order to ensure that the interaction between the robot and the user followed the most natural course possible, the experiment was set up at the study participants’ apartments instead of a structured, tightly controlled environment. This crucial methodological assumption introduced a number of technical problems. Firstly, the robot had to operate reliably and without interruption for ten days. Secondly, there was the matter of participant privacy. Each of the participants had to agree to the terms and conditions of the study, as well as the data gathering methods, by signing a consent form.


The main aim of the recruitment process was to find three participants who represented different attachment styles, using prepared on-line questionnaires. The recruitment of the study participants was based on the snowball sampling approach: the questionnaire, disseminated through the robot’s Facebook profile, was filled out at first by people known to the researchers, who then shared the information about the recruitment with their friends, and so on. In all, 53 people participated in the recruitment process. In order to assess their attachment style, a dedicated questionnaire was developed, based on the Adult Attachment Scale (AAS) [8] as well as the Eysenck Personality Questionnaire (EPQ) [13]. The experiment relied on deception: the real goal was to remain unknown to the participants (they were told that the aim of the experiment was for them to teach the robot to recognize colors). Therefore, the AAS and EPQ questionnaires were modified to not contain intimate personal questions (attachment style is defined by the intimate relation between a person and their parents). Instead, questions related to the attachment between the survey participants and inanimate objects and devices were introduced. The available literature suggests this is a viable approach [29, 34]. The first version of the survey contained 120 items. This list was appraised by four independent competent judges (2 male, 2 female), which led to the questionnaire being substantially reduced and modified. The final list contained 45 items: 15 for each of the attachment patterns. All items were rated from 1 (“I do not agree at all”) to 5 (“I fully agree”), which means that the total possible number of points for each attachment style ranged from 15 to 75.
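The scoring of the 45-item questionnaire follows directly from the description above: 15 items per style, each rated 1 to 5, giving a per-style total in the 15-75 range. A minimal sketch (the answer values are invented for illustration):

```python
# Sketch: score the 45-item questionnaire (15 items per style, rated 1-5).
STYLES = ("secure", "anxious_ambivalent", "avoiding")

def score(answers):
    """answers: dict mapping style -> list of 15 ratings (1-5).
    Returns the per-style totals, each in the range 15-75."""
    totals = {}
    for style in STYLES:
        ratings = answers[style]
        assert len(ratings) == 15 and all(1 <= r <= 5 for r in ratings)
        totals[style] = sum(ratings)
    return totals

answers = {
    "secure": [5] * 10 + [4] * 5,
    "anxious_ambivalent": [2] * 15,
    "avoiding": [3] * 15,
}
print(score(answers))  # secure: 70, anxious_ambivalent: 30, avoiding: 45
```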

Table 3 Recruitment statistics

Statistics of the answers gathered by the on-line survey are presented in Table 3. As can be seen, the majority of candidates represented the secure attachment style, which reflects the general distribution in society. The dominance of secure-style candidates was undoubtedly reinforced by the fact that people with other attachment styles are less willing to participate in studies that intrude on their privacy. Because of this, finding participants with the anxious-ambivalent and avoiding styles was a great challenge during recruitment.

In accordance with the aforementioned procedure, three female participants were selected based on the highest difference in survey scores between the dominant attachment style and the two others. Their survey results were as follows (S—secure, AA—anxious-ambivalent, A—avoiding):

  • participant with secure attachment style: S: 62, AA: 31, A: 36,

  • participant with anxious-ambivalent attachment style: S: 38, AA: 42, A: 36,

  • participant with avoiding attachment style: S: 38, AA: 35, A: 44.
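The selection rule described above, picking the candidate whose dominant-style score exceeds the other two by the largest margin, can be sketched directly on the three selected participants' scores:

```python
# Sketch: find the dominant attachment style and its margin over the
# second-highest score for a candidate's survey results.
def dominance_margin(scores):
    """scores: dict style -> points. Returns (dominant_style, margin)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_style, top), (_, second) = ranked[0], ranked[1]
    return top_style, top - second

print(dominance_margin({"S": 62, "AA": 31, "A": 36}))  # → ('S', 26)
print(dominance_margin({"S": 38, "AA": 42, "A": 36}))  # → ('AA', 4)
print(dominance_margin({"S": 38, "AA": 35, "A": 44}))  # → ('A', 6)
```

As the margins show, the secure-style participant was by far the most clear-cut case, consistent with the recruitment difficulties noted above.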

All of them were aged 25–30, had higher education, and had no technical background. Additionally, all of the participants lived alone and were therefore the only ones to interact with the robot.

Setup at Test Participants’ Apartment

Before the study could commence, certain preliminary steps had to be taken at the participants’ apartments. Firstly, the robot needed to be placed appropriately within the living quarters. Important factors to consider were the size of the room (which determined how close the user could get to the robot), the height and accessibility of EMYS’ mounting position, the lighting conditions (which affected the Kinect sensor), how much time the participant spent inside that room, etc. Care was taken to provide a uniform environment across all the participants’ apartments. In each case there was sufficient space for the user to interact with the robot without obstruction. After the robot was mounted, he had to be connected to the Internet, either by logging into the participant’s Wi-Fi network or using an LTE modem. Finally, all the necessary services had to be configured, which included logging into the user’s Facebook and Gmail accounts and setting up the mailing list and calendar. This step was extremely sensitive, as it required the experiment participants to share their private account passwords with the researchers.

Data Gathering Tools

There were two main methods of gathering the experiment data. During the study, vital robot data was stored in a logfile at 1 s intervals. This allowed us to determine the following parameters: the frequency and mean time of each interaction, the actions that the user demanded of the robot, time spent teaching colors, time spent verifying the robot’s knowledge of colors, the quality of EMYS’ knowledge, the number of properly understood commands, the robot’s emotional state (anger, happiness, boredom, etc.), and the duration of EMYS’ sleep. Moreover, the log contained data about the user from the audio and video systems (silhouette, face, movement, recognized utterances, and sound direction). Vast amounts of data were gathered for each of the participants over the course of the ten days of cohabitation. A custom C# WPF application was created to process this data and compute all of the indicators mentioned above.

The second method of data acquisition was the interviews carried out with the participants after the study. After the end of the ten-day period of residence with the robot, the participants underwent both paper-and-pencil (PAPI) and in-depth individual interviews (IDI). The PAPI survey was divided into two parts. The first, referred to as the robot assessment, dealt with the participants’ opinions of EMYS and his behavior during the experiment (assigning various traits to the robot, appraising statements about his behaviors) and contained 30 questions. The second part of the PAPI study, referred to as the functionality assessment, contained 20 questions which dealt with the appraisal of the robot’s reliability of operation using statements (e.g. “The robot executes commands well”, “The robot possesses good communication skills”, “The robot’s programming is in line with my needs”). These were rated on a scale from 1 to 4, where: 1—needs improvement, 2—needs minor tweaks, 3—works fine, 4—works great. The IDI study was meant to gather the participants’ reviews of cohabitation with the robot, including their first impressions, the changes in their attitude towards EMYS, his likeness to a human being, etc. A similar approach was adopted in a previous short-term study [9].


The aim of the experiment was to investigate the effect of attachment style on the course of human–robot interaction. It was hypothesized that satisfaction from prolonged contact with the robot would depend on a person’s attachment style. Three experiment participants were chosen, each with a different attachment style:

  • secure (P1),

  • anxious-ambivalent (P2),

  • avoiding (P3).

Robot Data

It was observed that for all of the participants the robot’s emotional state was stable, and the average time for which a given emotion persisted was similar (neutral—35%, angry—21%, happy—34%, other emotions less than 5%), as shown in Fig. 8. For test subject P3, there was a 20 percentage point advantage of the state of pleasure over anger. This was presumably caused by the shorter time (compared to P1 and P2) spent on teaching and verifying knowledge of colors; mistakes made by the robot during this task were the main source of negative impulses influencing its emotions.

Fig. 8
figure 8

Robot emotions (average of all three participants)

Another parameter is the time and the number of individual events during the interaction with the robot. An event is defined as each individual action undertaken by the robot, or one undertaken by the user and registered by the robot (e.g. asking a question, recognizing an answer, providing the user with information). The values over the ten-day period are as follows: P1—271 min/1578 events, P2—261 min/1213 events, P3—175 min/728 events. Of major significance is also the average and maximum duration of a single interaction (defined as the time interval between the start and end of a group of events, as long as the time elapsed between two consecutive events is less than 2 min): P1—3/40 min, P2—2/19 min, P3—2/18 min.
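The 2-minute grouping rule used to segment the event log into interactions can be sketched as follows. The timestamps are invented example data; the real logs were processed by the custom C# WPF application mentioned earlier.

```python
# Sketch: group logged event timestamps (seconds) into interactions,
# splitting whenever the gap between consecutive events reaches 2 min.
GAP = 120  # seconds

def interactions(timestamps):
    """Return (start, end) pairs, one per interaction."""
    if not timestamps:
        return []
    spans = []
    start = prev = timestamps[0]
    for t in timestamps[1:]:
        if t - prev >= GAP:
            spans.append((start, prev))
            start = t
        prev = t
    spans.append((start, prev))
    return spans

events = [0, 30, 60, 300, 320, 1000]
print(interactions(events))  # → [(0, 60), (300, 320), (1000, 1000)]
```

The per-participant averages and maxima quoted above follow by taking the mean and maximum of the span durations.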

The next set of parameters to be considered were the durations for which the functions were utilized and their contribution (as a percentage) to the total time of interaction. Since the users were told that the main aim of the experiment was to teach the robot colors, all participants showed the greatest activity during this task. The durations for this task were as follows: P1—51%/73 min, P2—55%/60 min, P3—40%/29 min. For P1, the subsequent functions are: overseeing the alarm clock (turning it on, off, or snoozing)—15%/21 min, managing the Facebook profile—14%/20 min, and handling the e-mail account—8%/12 min. For participant P2: Facebook—15%/16 min, news—14%/15 min, e-mail—9%/10 min. For P3: alarm clock—21%/15 min, news—16%/12 min, e-mail—14%/10 min. All subjects also utilized the robot for weather checking: P1 used it once per day, while respondents P2 and P3 on average every other day. The above analysis does not take into account the time spent listening to Internet radio with the help of the robot. This activity does not directly involve the user’s attention, and therefore there is a large disproportion between the duration of this and other activities (P1 listened to the radio for 3 h, P2—34 h, P3—11 h). The total time each user spent on each of the robot’s functions is shown in Fig. 9. It is worth noting that Fig. 9 is not controlled for total interaction time, in order to show the differences between the participants. Fig. 10 contains the duration of usage of each function as a percentage of the total interaction time.

Fig. 9
figure 9

Total usage of robot functions over the course of the study

Fig. 10
figure 10

Usage of robot functions among participants. a P1, b P2, c P3

Analysis of the data recorded by EMYS’ video system showed that all of the participants were facing the robot for about 38% of the cohabitation time (regardless of whether there was interaction between them or not). The average distance between the user and the robot over the course of the ten-day cohabitation was stable. Only participant P1 approached the robot when interacting—the average distance decreased by 20 cm.

Of interest for future analyses is the change in observed total daily interaction time over the course of the experiment. This parameter allows conclusions to be drawn about the amount of attention paid to the robot (and how it changes over time), which is essential in long-term studies. Participant P2, the only one who at that time had no professional commitments, was observed to spend a fixed amount of time with the robot, averaging 30 min a day. In the case of P1 and P3, due to flexible working hours, daily interaction time varied significantly (e.g. 5–100 min for P1).

Interview Data

Another source of data is the opinions of participants, collected after the study using questionnaires and in-depth interviews. The participants unanimously stated that the set of features of the robotic assistant was very useful to them. They made use of these features regularly, which served to maintain a constant frequency of interaction and a high level of satisfaction with the robot. Participants indicated that the emotions that EMYS expressed were easy to interpret (based solely on the facial expression), which significantly improved the communication and raised the overall assessment of the robot’s behavior. The intensity of the emotions was rated 7–8 on a scale from 0 to 10. In addition, they found that a robot possessing a humanoid appearance and able to communicate in a natural fashion is more user-friendly than the virtual assistants implemented in various mobile devices (e.g. Siri). All of the participants reported that they started missing the robot after the experiment ended.

The respondents also evaluated the robot in terms of his functionality and proper operation. During in-depth interviews (IDI), respondents indicated the functions of the robot that they found to be the most useful. One of EMYS’ advantages pointed out by the participants was the native support for voice commands, which made the robot “hands-free”, as opposed to various touch devices. P1 reported that the functions they found most useful were news and weather checking as well as handling e-mails. They also often used the radio, and mentioned that the alarm clock was extremely effective. Furthermore, they stated that for the duration of the experiment, EMYS nearly removed their need for using a smartphone. During functionality assessment (part 2 of PAPI), this participant reported the highest overall rating of the robot’s performance—3.85 points. They awarded the robot the maximum number of points (4), with the exception of the following statements (which were awarded 3 points): “The robot is able to identify a problem”, “The robot is able to choose the best way of solving a problem”, “The robot is able to learn a large number of colors”. Participant P2 considered news checking, managing e-mails and radio playback to be the most convenient. They appreciated the possibility of gesture control—e.g. adjusting the music volume. However, they evaluated the robot much more negatively than P1—only 3.15 points. The lowest ratings (2 points) were awarded to the following statements: “The robot executes commands well” and “The robot is always ready to execute commands.” P3 positively described news and weather checking as well as radio playback, and motivated this with their simplicity and the reduced user involvement. Participant P3 also mentioned that they were positively surprised by the dialog capabilities of the robot and his ability to recognize gestures. Just like P1, during functionality assessment they rated the robot highly, giving him 3.7 points. The statement “The robot is able to learn a large number of colors” was given the lowest rating (2 points).

The participant representing the secure attachment style (P1) observed that the robot’s emotions affected their mood. They attributed human character traits to the robot: open, friendly, patient. P1 was the only one to admit that during the experiment they gradually started treating the robot like a human being, attributing more human qualities to the robot than he actually had. They recalled that sometimes, after getting back home, they wanted to compensate EMYS for the long absence by spending time together. Minor technical glitches that the robot experienced throughout the experiment did not discourage the participant. They taught the robot the largest number of colors and were the only participant for whom a decrease in the distance to the robot during interaction was observed. P1 was also the only one to notice that EMYS’ emotions were reflected in the tone and rate of his speech. When asked about their opinion on the average time they spent daily with the robot, they replied that it was about 2 h, and the time they declared they would like to be able to spend with EMYS was 4–6 h.

The participant representing the anxious-ambivalent attachment style (P2) was critical of the technical aspects of the robot’s operation—its imperfections constituted a major obstacle, causing frustration and concern about the future operation of the robot. The presence of a robot at home made the participant uneasy because of the fear of potentially being observed. They reported that they had treated EMYS like a pet, whose expression of emotions required some time to learn. The robot was reported as being “helpful but moody”. The average time spent daily with the robot was thought to be 2–3 h, and the desired time was 3–4 h.

The participant representing the avoidant attachment style (P3) reported having felt an emotional distance to the robot at the beginning of the study. Over the course of the experiment, the perceived acceptance level was reported to have grown. However, the participant did not claim that the robot could be said to “like” his user or behave like them. The human traits that P3 felt could be attributed to EMYS were: helpful, friendly, overly direct. The participant was easily discouraged by failures during the color learning/recognition exercises and was the only one to report a lack of satisfaction with this part of the interaction. In their opinion, the robot was often sad and angry, which is not supported by the actual data recorded during the experiment. The respondent realistically assessed the capabilities of such devices, treated the robot like a machine, and, after the study had ended, missed only his functions. Asked about the average time spent daily with the robot, they thought it was around 30 min and declared they would like to spend 1 h with him.


Based on the gathered data, it can be concluded that all of the users managed to become attached either to the robot itself or to his functions. However, the level of satisfaction with the interaction and the opinions about the robot varied significantly between the participants, which confirms the assumption that the robot’s emotional equipment and character should be personalized for each of them. All of the subjects perceived the emotions the robot expressed differently (with respect to their type and intensity), while the data recorded during the course of the study show that the emotions generated by the control system had a similar distribution for all participants.

The securely attached participant perceived EMYS’ ability to reflect the user’s state. This resulted in attributing to the robot human qualities which in fact he did not have. Interaction with the robot and teaching him colors were reported as causing considerable joy. This is confirmed by the experiment data—P1 spent the longest time of all of the participants actively interacting with the robot (not counting listening to the radio).

The participant representing the anxious-ambivalent attachment style confirmed the assumptions formulated before the study commenced and paid the greatest attention to the technical imperfections of the robot (e.g. problems with recognizing commands). These aspects caused anxiety and anger that were not observed in any of the other participants when they were faced with the same technical glitches. P2 also mentioned that the robot should show more initiative and be the first to start interactions, e.g. by greeting the user after they return home. This may be associated with the constant need for contact with another person, typical of people with this attachment style.

The participant with the avoidant style, despite being satisfied with the operation of the robot and his functions, treated him with the greatest distance and spent the least time with him. This confirmed the assumption that the robot would be treated instrumentally and that a user representing this attachment style would be driven to interact with the robot only by a direct need for his functions. This participant’s expectations towards the robot included the possibility to personalize the robot’s functions, e.g. by adding custom blogs and RSS channels, enabling TV streaming, etc. P3 stressed that in order to be considered life-like, the robot would have to be able to remember and recall past events.


One of the major achievements of the study was carrying out trials involving the social robot EMYS, controlled by the proposed system in a fully autonomous manner. To the authors’ knowledge, this is the only study where a robot was successfully operating in a long-term (10 days) scenario at the participants’ residence while maintaining a constant (or even increasing in the case of P1) level of user attention and engagement.

A novelty in the field of HRI was the verification of the effects that the attachment style has on the interaction and the process of building a relationship between a user and personal robot. The studies were observational and qualitative, and the conclusions were obtained based on three samples. Therefore the findings presented in this paper should be treated as guidelines and directions for further research. It seems necessary to perform more studies on a statistically significant sample. This would lead to clear conclusions regarding the satisfaction stemming from prolonged contact with a robot and would allow the adjustment of the intensity level of generated emotions depending on the user’s attachment style. However, even the results described in this paper are useful in terms of personalizing a robot’s behavior in order to increase user satisfaction depending on their attachment style.

Finally, it is worth noting that the study participants had no trouble indicating possible new features for the robot (checking public transport connections, searching for recipes, personalization of information sources) which could significantly increase his usefulness. The conducted experiment confirmed that endowing robots with functions more often associated with mobile devices adds a new, “human” quality to human–machine communication.


  1. Becker C, Kopp S, Wachsmuth I (2007) Why emotions should be integrated into conversational agents. Wiley, London, pp 49–68

  2. Becker-Asano C (2008) WASABI: affect simulation for agents with believable interactivity. Ph.D. thesis, University of Bielefeld, Bielefeld

  3. Bowlby J (1969) Attachment and loss: attachment. Basic Books, New York

  4. Bowlby J (1972) Attachment and loss: separation: anxiety and anger. Basic Books, New York

  5. Bowlby J (1980) Attachment and loss: loss, sadness and depression. Basic Books, New York

  6. Bradley MM, Lang PJ (1999) Affective norms for English words: instruction manual and affective ratings. Technical report

  7. Breazeal C, Kidd CD, Thomaz AL, Hoffman G, Berlin M (2005) Effects of nonverbal communication on efficiency and robustness in human–robot teamwork. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 383–388

  8. Collins NL, Read SJ (1990) Adult attachment, working models, and relationship quality in dating couples. J Pers Soc Psychol 58(4):644–663

  9. Dziergwa M, Frontkiewicz M, Kaczmarek P, Kędzierski J, Zagdańska M (2013) Study of a social robot’s appearance using interviews and a mobile eye-tracking device. In: Social robotics, lecture notes in computer science, vol 8239. Springer International Publishing, pp 170–179

  10. EMYS (2015) User guide (written in Polish)

  11. EMYS, FLASH (2015) Online documentation

  12. Esuli A, Sebastiani F (2006) SENTIWORDNET: a publicly available lexical resource for opinion mining. In: Proceedings of the 5th conference on language resources and evaluation (LREC’06), pp 417–422

  13. Eysenck HJ, Eysenck SBG (1975) Manual of the Eysenck Personality Questionnaire. Hodder and Stoughton, London

  14. Fellbaum C (1998) WordNet: an electronic lexical database. The MIT Press, Cambridge

  15. Ferrari E, Robins B, Dautenhahn K (2009) Therapeutic and educational objectives in robot assisted play for children with autism. In: The 18th IEEE international symposium on robot and human interactive communication, RO-MAN, pp 108–114

  16. FLASH (2015) Social robot

  17. Fridin M, Yaakobi Y (2011) Educational robot for children with ADHD/ADD. In: Architectural design, international conference on computational vision and robotics, Bhubaneswar, India

  18. Gat E (1998) On three-layer architectures. In: Kortenkamp D, Bonnasso RP, Murphy R (eds) Artificial intelligence and mobile robots. AAAI Press, Cambridge

  19. Gostai (2015) Urbi

  20. Han J, Jo M, Jones V, Jo JH (2008) Comparative study on the educational use of home robots for children. J Inf Process Syst 4(4):159

  21. Hiolle A, Bard K, Canamero L (2009) Assessing human reactions to different robot attachment profiles. In: The 18th IEEE international symposium on robot and human interactive communication, RO-MAN 2009, pp 251–256

  22. Hiolle A, Cañamero L, Davila-Ross M, Bard KA (2012) Eliciting caregiving behavior in dyadic human–robot attachment-like interactions. ACM Trans Interact Intell Syst 2(1):3:1–3:24

  23. Iacono I, Lehmann H, Marti P, Robins B, Dautenhahn K (2011) Robots as social mediators for children with Autism—a preliminary analysis comparing two different robotic platforms. In: IEEE international conference on development and learning (ICDL), vol 2, pp 1–6

  24. Janssen JB, Wal CC, Neerincx MA, Looije R (2011) Motivating children to learn arithmetic with an adaptive robot game. In: Social robotics, lecture notes in computer science, vol 7072. Springer, Berlin, pp 153–162

  25. Kaplan F (2001) Artificial attachment: Will a robot ever pass Ainsworth’s strange situation test? In: Proceedings of second IEEE-RAS international conference on humanoid robots (humanoids), Tokyo, Japan, pp 99–106

  26. Kędzierski J (2014) System sterowania robota społecznego (written in Polish). Ph.D. thesis, Wrocław University of Technology, Wrocław

  27. Kędzierski J, Muszynski R, Zoll C, Oleksy A, Frontkiewicz M (2013) Emys—emotive head of a social robot. Int J Soc Robot 5(2):237–249

  28. Kędzierski J, Kaczmarek P, Dziergwa M, Tchoń K (2015) Design for a robotic companion. Int J Humanoid Robot 12(01):1550007

  29. Keefer LA, Landau MJ, Rothschild ZK, Sullivan D (2012) Attachment to objects as compensation for close others’ perceived unreliability. J Exp Soc Psychol 48(4):912–917

  30. Kelley JF (1984) An iterative design methodology for user-friendly natural language office information applications. ACM Trans Inf Syst 2(1):26–41

  31. Keren G, Ben-David A, Fridin M (2012) Kindergarten assistive robotics (KAR) as a tool for spatial cognition development in pre-school education. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1084–1089

  32. Maulsby D, Greenberg S, Mander R (1993) Prototyping an intelligent agent through wizard of Oz. In: Conference on human factors in computing systems, ACM Press, New York, NY, USA, CHI ’93, pp 277–284

  33. Mehrabian A, Russell JA (1974) An approach to environmental psychology. The MIT Press, Cambridge

  34. Norris JI, Lambert NM, Nathan DeWall C, Fincham FD (2012) Can’t buy me love?: Anxious attachment and materialistic values. Pers Individ Differ 53(5):666–669

  35. Okita SY, Ng-Thow-Hing V, Sarvadevabhatla R (2009) Learning together: ASIMO developing an interactive learning partnership with children. In: 18th IEEE international symposium on robot and human interactive communication, RO-MAN, pp 1125–1130

  36. Ribeiro T, Paiva A (2012) The illusion of robotic life: principles and practices of animation for robots. In: Proceedings of the seventh annual ACM/IEEE international conference on human–robot interaction, ACM, New York, NY, USA, HRI’12, pp 383–390

  37. Richardson K (2015) An anthropology of robots and AI: annihilation anxiety and machines. Routledge, New York

  38. Robins B, Dautenhahn K, Boekhorst R, Billard A (2004) Effects of repeated exposure of a humanoid robot on children with autism. In: Keates S, Clarkson J, Langdon P, Robinson P (eds) Designing a more inclusive world. Springer, Berlin, pp 225–236

  39. Saerbeck M, Schut T, Bartneck C, Janse MD (2010) Expressive robots in education: varying the degree of social supportive behavior of a robotic tutor. In: SIGCHI conference on human factors in computing systems, ACM Press, New York, NY, USA, CHI’10, pp 1613–1622

  40. Tanaka F, Matsuzoe S (2012) Children teach a care-receiving robot to promote their learning: field experiments in a classroom for vocabulary learning. J Hum Robot Interact 1(1)

  41. W3C (2015) Speech recognition grammar specification version 1.0

  42. Wainer J, Dautenhahn K, Robins B, Amirabdollahian F (2010) Collaborating with Kaspar: using an autonomous humanoid robot to foster cooperative dyadic play among children with autism. In: 10th IEEE-RAS international conference on humanoid robots (humanoids), pp 631–638

  43. Warriner AB, Kuperman V, Brysbaert M (2013) Norms of valence, arousal, and dominance for 13,915 English lemmas. Behav Res Methods 45(4):1191–1207



This research was supported by Grant No. 2012/ 05/N/ST7/01098 awarded by the National Science Centre of Poland.

Author information


Corresponding author

Correspondence to Michał Dziergwa.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Dziergwa, M., Kaczmarek, M., Kaczmarek, P. et al. Long-Term Cohabitation with a Social Robot: A Case Study of the Influence of Human Attachment Patterns. Int J of Soc Robotics 10, 163–176 (2018).


  • Social robot
  • Robot control
  • Emotions
  • Long-term study
  • HRI
  • Attachment theory