[HUGE]: Universal Architecture for Statistically Based HUman GEsturing

  • Karlo Smid
  • Goranka Zoric
  • Igor S. Pandzic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4133)


We introduce a universal architecture for a statistically based HUman GEsturing (HUGE) system, which produces and uses statistical models of facial gestures based on any kind of inducement. By inducement we mean any signal that occurs in parallel with the production of gestures in human behaviour and that may be statistically correlated with the occurrence of gestures, e.g. spoken text, the audio signal of speech, or bio signals. In the training phase, the correlation between the inducement signal and the gestures is used to build a statistical model of gestures from a training corpus consisting of gesture sequences and the corresponding inducement data sequences. In the runtime phase, raw, previously unseen inducement data is used to trigger (induce) the real-time gestures of the agent based on the previously constructed statistical model. We present the general architecture and implementation issues of our system, and further clarify it through two case studies. We believe that this universal architecture is useful for experimenting with various kinds of potential inducement signals and their features, and for exploring the correlation of such signals or features with gesturing behaviour.
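
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `GestureModel`, its methods, the inducement state names (`new_word`, `punctuation`) and the gesture labels are all hypothetical, and a simple conditional-frequency table stands in for whatever statistical model the paper actually builds.

```python
import random
from collections import Counter, defaultdict


class GestureModel:
    """Toy conditional-frequency model of P(gesture | inducement state).

    Hypothetical illustration only; the names and the model type are assumptions.
    """

    def __init__(self):
        # Maps each inducement state to a count of the gestures seen with it.
        self.counts = defaultdict(Counter)

    def train(self, corpus):
        """Training phase: count gestures per inducement state.

        `corpus` is an iterable of (inducement_state, gesture) pairs.
        """
        for state, gesture in corpus:
            self.counts[state][gesture] += 1

    def probabilities(self, state):
        """Return the estimated gesture distribution for one inducement state."""
        seen = self.counts.get(state)
        if not seen:
            return {}
        total = sum(seen.values())
        return {gesture: n / total for gesture, n in seen.items()}

    def induce(self, state, rng=random):
        """Runtime phase: sample a gesture for a new inducement state.

        Returns None when the state never occurred in the training corpus.
        """
        probs = self.probabilities(state)
        if not probs:
            return None
        gestures = list(probs)
        weights = [probs[g] for g in gestures]
        return rng.choices(gestures, weights=weights, k=1)[0]


# Hypothetical training corpus of (inducement state, gesture) pairs.
corpus = [
    ("new_word", "eyebrow_raise"),
    ("new_word", "eyebrow_raise"),
    ("new_word", "nod"),
    ("punctuation", "blink"),
]
model = GestureModel()
model.train(corpus)
```

Calling `model.induce("new_word")` samples a gesture in proportion to how often it accompanied that state in the toy corpus. Swapping the inducement feature (text, audio, bio signal) only changes how the state labels are derived, which is the kind of experimentation the architecture is meant to support.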


Keywords: Speech Signal; Application Program Interface; Inducement State; Training Corpus; Facial Animation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Karlo Smid (1)
  • Goranka Zoric (2)
  • Igor S. Pandzic (2)

  1. Ericsson Nikola Tesla, Zagreb
  2. Faculty of Electrical Engineering and Computing, Zagreb University, Zagreb
