Cicero - Towards a Multimodal Virtual Audience Platform for Public Speaking Training

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8108)


Public speaking performances are characterized not only by the presented content, but also by the presenter's nonverbal behavior, such as gestures, tone of voice, vocal variety, and facial expressions. In this work, we seek to identify automatic nonverbal behavior descriptors that correlate with expert assessments of behaviors characteristic of good and bad public speaking performances. We present a novel multimodal corpus recorded with a virtual audience public speaking training platform. Lastly, we utilize the behavior descriptors to automatically approximate the overall assessment of the performance using support vector regression in a speaker-independent experiment, and we yield promising results approaching human performance.
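The prediction setup described in the abstract — regressing expert assessments onto automatic nonverbal behavior descriptors with support vector regression, evaluated speaker-independently — can be sketched as follows. This is a minimal illustration using scikit-learn, not the paper's implementation: the data is synthetic, and the number of speakers, recordings, and descriptors are placeholder assumptions. A speaker-independent evaluation means every clip of a given speaker is withheld from the training folds that predict it, which leave-one-speaker-out cross-validation enforces via grouping.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data: 10 speakers x 4 recordings, 6 nonverbal
# behavior descriptors per recording (e.g. vocal variety, gesture rate
# -- placeholder names, not the paper's actual feature set).
n_speakers, n_clips, n_feats = 10, 4, 6
X = rng.normal(size=(n_speakers * n_clips, n_feats))
speakers = np.repeat(np.arange(n_speakers), n_clips)

# Synthetic "expert rating" loosely driven by the descriptors plus noise.
y = X @ rng.normal(size=n_feats) + 0.3 * rng.normal(size=len(X))

# Standardize descriptors, then fit an RBF-kernel support vector regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))

# Leave-one-speaker-out: each fold withholds *all* clips of one speaker,
# so predictions are always made for speakers unseen during training.
pred = cross_val_predict(model, X, y, groups=speakers, cv=LeaveOneGroupOut())

# Agreement between predicted and "expert" ratings.
r = np.corrcoef(y, pred)[0, 1]
print(f"speaker-independent correlation r = {r:.2f}")
```

Grouping by speaker is the design choice that matters here: a plain shuffled split would leak speaker identity across folds and overstate how well the descriptors generalize to new presenters.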


Keywords: Virtual Reality · Behavioral Modification · Multimodal Perception · Public Speaking Training




Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. FBK-IRST, Trento, Italy
  2. Institute for Creative Technologies, University of Southern California, Los Angeles, USA
