Introduction and Motivation
For many years the prototypical interactive system has been the personal computer (PC), which offers a screen, a keyboard and a mouse as the interface for human–machine interaction. Human face-to-face communication, on the other hand, involves spoken language, gestures and facial expressions on the transmitting side, and listening as well as lip- and expression reading on the perceiving side. With the advances in computation, and the consequent pervasion of computer systems into our daily lives, an interest has arisen in designing more ‘natural’ interfaces to the computer. An obvious approach is to develop interactive systems that mirror human–human interaction (Krämer and Bente 2002). Such interfaces should recognize spoken language, gestures and facial expressions. Ideally, the system would also respond in a similar way, making a body, or at least a face, necessary. This approach has led, for example, to the development of so-called Embodied Conversational Agents (ECAs), which aim to eliminate the need to learn special strategies for human–computer interaction (Xiao et al. 2002).