Control of Avatar’s Facial Expression Using Fundamental Frequency in Multi-user Voice Chat System
An automatic facial expression control algorithm for a CG avatar, based on the fundamental frequency of the user's utterance, is proposed to facilitate multi-party casual chat in a multi-user virtual-space voice chat system. The proposed method exploits the common tendency of the voice fundamental frequency to reflect emotional activity, especially the strength of delight. This study simplifies the facial expression control problem by limiting the expression to the strength of delight, because the expression of delight appears to be the most important for facilitating casual chat. The difficulty in using the fundamental frequency is that it varies with intonation as well as with emotion; using the raw fundamental frequency therefore changes the avatar's expression too erratically. To suppress the influence of intonation, the Emotional Point by emotional Activity (EPa) was defined as the moving average of the normalized fundamental frequency. The strength of delight in the avatar's facial expression was linearly controlled using EPa, based on the Facial Action Coding System (FACS). The duration of the moving average was set to five seconds experimentally. However, the moving average delays the avatar's behavior, and the delay is particularly serious for response utterances. To compensate for this delay, the Emotional Point by Response (EPr) was defined using the initial voice volume of the response utterance. EPr was calculated only for response utterances, i.e., utterances immediately following another user's utterance. The ratio of EPr to EPa was determined experimentally as one to one. The proposed automatic avatar facial expression control algorithm was implemented on the previously developed virtual-space multi-user voice chat system, and a subjective evaluation was performed with ten subjects.
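The combination of a slow emotion estimate (EPa) with a fast response cue (EPr) described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the per-speaker F0 normalization, the frame rate, the `gain` scale factor, and the clamping to [0, 1] for driving FACS action-unit intensity are all assumptions; only the five-second window and the 1:1 mixing ratio come from the abstract.

```python
# Sketch of EPa/EPr-based delight control (assumed details noted below).
from collections import deque

class EmotionPointEstimator:
    def __init__(self, frame_rate_hz=10.0, window_sec=5.0):
        # Five-second moving-average window (abstract); frame rate assumed.
        self.window = deque(maxlen=int(frame_rate_hz * window_sec))

    def update_epa(self, normalized_f0):
        """EPa: moving average of the normalized fundamental frequency."""
        self.window.append(normalized_f0)
        return sum(self.window) / len(self.window)

def epr_from_onset_volume(initial_volume, is_response, gain=1.0):
    """EPr: derived from the initial volume of a response utterance;
    zero for non-response utterances. `gain` is a hypothetical scale factor."""
    return gain * initial_volume if is_response else 0.0

def delight_strength(epa, epr, w_epa=0.5, w_epr=0.5):
    """Mix EPa and EPr at the 1:1 ratio reported in the abstract; the
    clamp to [0, 1] (for a FACS action-unit intensity) is an assumption."""
    return min(1.0, max(0.0, w_epa * epa + w_epr * epr))
```

In this sketch, EPa changes smoothly because each new F0 frame shifts the five-second average only slightly, while EPr jumps immediately at the onset of a response utterance, which is how the delay of the moving average is compensated.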
Each subject, seated in a separate room, was asked to chat with an experimental partner using the system for four minutes and to answer four questions on a Likert scale. Throughout the experiments, the subjects reported a better impression of the automatic control of facial expression according to the utterances. Facial control using both EPa and EPr demonstrated better performance in terms of naturalness, favorability, familiarity, and interactivity than the fixed facial expression, EPa-alone, and EPr-alone conditions.