1 Introduction

Our research goal was not to introduce yet another candidate into the world of Siri, Cortana, Google Now and other voice agents. Instead, we attempted to design a natural, humanlike voice assistant from scratch. By modulating language style, speech parameters and visuals, it is possible to impart quasi-human characteristics to new voice assistants. Earlier research in this area includes explorations of the personality of a robotic assistant for television [1], a robot for giving advice [2], and the inclusion of emotions in a conversational agent [3]. Beyond this existing work, our contribution lies in incorporating the user's perspective as an important element in formulating a personality for the voice avatar and in creating its dialogues and expressions. We decomposed the design problem into four parts: (i) Communication Style, (ii) Personality, (iii) Speech/Dialogue, and (iv) Appearance and Non-verbal Gestures. Contextual interviews, survey questionnaires and participatory design sessions were conducted to understand the problem from the user's point of view. We studied human personal assistants to understand their communication style and their behavior with their boss. We then used a personality elicitation survey to formulate the personality attributes desired by users. This was followed by a study of the linguistic and speech characteristics of a user-created dialogue library for a voice assistant across different scenarios. Users' direct perceptions of these were taken into account through a co-creation activity. The overall scope of the research is illustrated in Fig. 1.

Fig. 1. Illustration of the scope of research

2 Defining the Communication Style

In the first phase, our interview sessions with seven professional assistants focused mainly on identifying the assistants' personality traits, understanding how they handle various situations, and observing the explicit cues they exhibit during conversation. We characterized the social behavior of the assistants through our observations of emotional, behavioral and functional aspects. The outcomes of the assistant interviews are summarized in Fig. 2.

Fig. 2. Illustration of outcome from Assistant Interviews

We also observed typical verbal expressions, such as intonations and exclamations, and non-verbal expressions, such as gaze aversion, facial expressions, head tilts and nods, exhibited by the assistants. In terms of speech constructs, while listening to others the assistants used intonations and discourse markers to signal their level of understanding, though such cues should not be repeated too frequently. While speaking, the assistants signaled their confidence in a response through intonations and discourse markers.

3 Decoding the Personality

Our next objective was to formulate an appropriate, user-desired personality for the voice assistant. Our definition of voice assistant personality consisted of 9 attributes covering various aspects of personality. We derived these from prior work [4, 5] and added a few attributes relevant to voice assistants, such as Spontaneous, Inventiveness and Similarity, based on insights from the earlier personal assistant interviews. Next, we conducted a Personality Elicitation session with 30 participants, consisting of a brief questionnaire and a face-to-face survey discussion, to understand the importance and desirability of each attribute for a voice assistant. Our intent was to quantify how much of each trait was desirable to a user. The evolved personality based on the 9 attributes is illustrated in Fig. 3. Some interesting findings emerged from the personality analysis; for example, users preferred their voice assistant to Control them to some extent, but would prefer not to be fully Dependent on it. This suggests that users may wish to see it as a Companion rather than as an Assistant.

Fig. 3. Evolved personality from elicitation activity

4 Developing Speech and Dialogue

In the next phase, we conducted crowdsourcing and co-creation sessions with 32 participants, including language enthusiasts and avid readers of literature, to create natural dialogues for the voice assistant in various situations. The intent was to extract common underlying patterns in the participants' responses. Participants were given a set of user queries, each paired with the corresponding response from an existing voice assistant application, covering various usage scenarios. They were asked to rewrite each voice assistant response so that it sounded natural and matched the way they would prefer to hear it. Along with their wording, participants could specify speech variables such as pitch, speech rate and stress. Figure 4 lists all the goal parameters used for the co-creation activity.

Fig. 4. Goal parameters for the co-creation activity

5 Creating Appearance and Gestures

As the next step, a few of the raw dialogues from the earlier co-creation session were given to 6 animation designers, who generated visuals for the gestural aspects. This helped us formulate various gestural cues such as smiles, frowns, eyebrow shapes, head nods, head positions and body postures [5].

6 Conclusions

The design guidelines, dialogues and personality definitions that evolved from our research are being used as a framework for creating a natural voice assistant avatar. Going forward, we plan to extend this research by exploring the notion of Voice as a Companion, beyond being an Assistant.