Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data

  • Nicolas d’Alessandro
  • Joëlle Tilmanne
  • Maria Astrinaki
  • Thomas Hueber
  • Rasmus Dall
  • Thierry Ravet
  • Alexis Moinet
  • Huseyin Cakmak
  • Onur Babacan
  • Adela Barbulescu
  • Valentin Parfait
  • Victor Huguenin
  • Emine Sümeyye Kalaycı
  • Qiong Hu
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 425)


Abstract

This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our goal for this workshop was to bring technologies currently used in speech recognition and synthesis to a new level, by making them the core of a new HMM-based mapping system. We investigated the idea of statistical mapping, more precisely how Gaussian Mixture Models and Hidden Markov Models can be used for the realtime, reactive generation of new trajectories from input labels, and for realtime regression in a continuous-to-continuous use case. As a result, we developed several proofs of concept, including an incremental speech synthesiser, software for exploring stylistic spaces for gait and facial motion in realtime, a reactive audiovisual laughter synthesiser, and a prototype demonstrating the realtime reconstruction of lower-body gait motion strictly from upper-body motion, while preserving its stylistic properties. This project was also the opportunity to formalise HMM-based mapping, to integrate several of these innovations into the Mage library, and to explore the development of a realtime gesture recognition tool.
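The continuous-to-continuous regression mentioned in the abstract follows the general principle of GMM-based statistical mapping: a Gaussian mixture is trained on joint input-output frames, and each new input frame is mapped to its conditional expectation under that joint model. The following minimal Python sketch illustrates the principle with a hand-set two-component mixture over 1-D input and output; all numbers and names are illustrative assumptions, not taken from the project's code.

```python
import numpy as np

# Sketch of GMM-based regression (continuous-to-continuous mapping):
# model the JOINT density p(x, y) with a Gaussian mixture, then map a
# new input x to the conditional expectation E[y | x].

# A hand-set 2-component joint GMM over (x, y), both 1-D here.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 1.0],    # [mu_x, mu_y] of component 1
                  [2.0, -1.0]])  # [mu_x, mu_y] of component 2
covs = np.array([[[1.0, 0.6],    # joint covariance of component 1
                  [0.6, 1.0]],
                 [[1.0, -0.4],   # joint covariance of component 2
                  [-0.4, 1.0]]])

def gmm_regress(x):
    """Map an input frame x to E[y | x] under the joint GMM."""
    resp = np.zeros(len(weights))   # responsibilities p(k | x)
    cond = np.zeros(len(weights))   # per-component conditional means
    for k in range(len(weights)):
        mu_x, mu_y = means[k]
        s_xx, s_xy = covs[k][0, 0], covs[k][0, 1]
        # Marginal likelihood of x under component k (1-D Gaussian).
        resp[k] = (weights[k]
                   * np.exp(-0.5 * (x - mu_x) ** 2 / s_xx)
                   / np.sqrt(2 * np.pi * s_xx))
        # Conditional mean of y given x for component k.
        cond[k] = mu_y + s_xy / s_xx * (x - mu_x)
    resp /= resp.sum()
    return float(resp @ cond)
```

Near the mean of one component, the output is dominated by that component's local linear regression, and between components the mapping blends smoothly, which is what makes this family of models attractive for stylistic interpolation.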


Keywords

Statistical Modelling · Hidden Markov Models · Motion Capture · Speech · Singing · Laughter · Realtime Systems · Mapping


References

  1. Mori, M.: The Uncanny Valley. Energy 7(4), 33–35 (1970)
  2. Mori, M.: The Uncanny Valley (K. F. MacDorman & N. Kageki, Trans.). IEEE Robotics & Automation Magazine 19(2), 98–100 (2012)
  3. Dutoit, T.: An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers (1997)
  4. Raux, A., Black, A.W.: A Unit Selection Approach to F0 Modelling and its Applications to Emphasis. In: IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 700–705 (December 2003)
  5. Lindemann, E.: Music Synthesis with Reconstructive Phrase Modelling. IEEE Signal Processing Magazine 24(2), 80–91 (2007)
  6. Fechteler, P., Eisert, P., Rurainsky, J.: Fast and High Resolution 3D Face Scanning. In: IEEE International Conference on Image Processing, vol. 3, pp. 81–84 (2007)
  7. Menache, A.: Understanding Motion Capture for Computer Animation and Video Games. Morgan Kaufmann Publishers (2000)
  8. d’Alessandro, N.: Realtime and Accurate Musical Control of Expression in Voice Synthesis. PhD Thesis, University of Mons (November 2009)
  9. Maestre, E., Blaauw, M., Bonada, J., Guaus, E., Perez, A.: Statistical Modelling of Bowing Control Applied to Violin Sound Synthesis. IEEE Transactions on Audio, Speech, and Language Processing 18(4), 855–871 (2010)
  10. Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., Kitamura, T.: Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000), vol. 3, pp. 1315–1318 (2000)
  11. Dutreve, L., Meyer, A., Bouakaz, S.: Feature Points Based Facial Animation Retargeting. In: Proceedings of the 2008 ACM Symposium on Virtual Reality Software and Technology, pp. 197–200 (2008)
  12. Hunt, A., Wanderley, M., Paradis, M.: The Importance of Parameter Mapping in Electronic Instrument Design. Journal of New Music Research 32(4), 429–440 (2003)
  13. Tokuda, K., Oura, K., Hashimoto, K., Shiota, S., Takaki, S., Zen, H., Yamagishi, J., Toda, T., Nose, T., Sako, S., Black, A.W.: HMM-based Speech Synthesis System (HTS)
  14. Tilmanne, J., Moinet, A., Dutoit, T.: Stylistic Gait Synthesis Based on Hidden Markov Models. EURASIP Journal on Advances in Signal Processing 2012(72) (2012)
  15. Urbain, J., Cakmak, H., Dutoit, T.: Evaluation of HMM-Based Laughter Synthesis. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), pp. 7835–7839 (2013)
  16. Astrinaki, M., d’Alessandro, N., Picart, B., Drugman, T., Dutoit, T.: Reactive and Continuous Control of HMM-based Speech Synthesis. In: IEEE Workshop on Spoken Language Technology (December 2012)
  17. Astrinaki, M., Moinet, A., Yamagishi, J., Richmond, K., Ling, Z.-H., King, S., Dutoit, T.: Mage - Reactive Articulatory Feature Control of HMM-Based Parametric Speech Synthesis. In: Proceedings of the 8th ISCA Speech Synthesis Workshop, SSW 8 (September 2013)
  18. Hueber, T., Bailly, G., Denby, B.: Continuous Articulatory-to-Acoustic Mapping using Phone-Based Trajectory HMM for a Silent Speech Interface. In: Proceedings of Interspeech, ISCA (2012)
  19. Kay, S.M.: Fundamentals of Statistical Signal Processing: Detection Theory, vol. 2. Prentice Hall PTR (1998)
  20. Stylianou, Y., Cappé, O., Moulines, E.: Continuous Probabilistic Transform for Voice Conversion. IEEE Transactions on Speech and Audio Processing 6(2), 131–142 (1998)
  21. Kain, A.B.: High Resolution Voice Transformation. PhD Thesis, OGI School of Science and Engineering, Oregon Health & Science University (2001)
  22. Toda, T., Black, A.W., Tokuda, K.: Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory. IEEE Transactions on Audio, Speech, and Language Processing 15(8), 2222–2235 (2007)
  23. Astrinaki, M., Moinet, A., Wilfart, G., d’Alessandro, N., Dutoit, T.: Mage Platform for Performative Speech Synthesis
  24. Kominek, J., Black, A.W.: CMU Arctic Databases for Speech Synthesis. Tech. Rep., Language Technologies Institute, School of Computer Science, Carnegie Mellon University (2003)
  25. Imai, S., Sumita, K., Furuichi, C.: Mel Log Spectrum Approximation (MLSA) Filter for Speech Synthesis. Electronics and Communications in Japan, Part I 66(2), 10–18 (1983)
  26. Tokuda, K., Nankaku, Y., Toda, T., Zen, H., Yamagishi, J., Oura, K.: Speech Synthesis Based on Hidden Markov Models. Proceedings of the IEEE 101(5) (2013)
  27. Sundberg, J.: The Science of the Singing Voice. Northern Illinois University Press (1987)
  28. Titze, I.R.: Nonlinear Source-Filter Coupling in Phonation: Theory. J. Acoust. Soc. Am. 123, 2733–2749 (2008)
  29. Babacan, O., Drugman, T., d’Alessandro, N., Henrich, N., Dutoit, T.: A Comparative Study of Pitch Extraction Algorithms on a Large Variety of Singing Sounds. In: Proceedings of ICASSP (2013)
  30. Babacan, O., Drugman, T., d’Alessandro, N., Henrich, N., Dutoit, T.: A Quantitative Comparison of Glottal Closure Instant Estimation Algorithms on a Large Variety of Singing Sounds. In: Proceedings of ICASSP (2013)
  31. Tilmanne, J., Ravet, T.: The Mockey Database
  32. IGS-190, Animazoo website
  33. Baumann, T., Schlangen, D.: Recent Advances in Incremental Spoken Language Processing. In: Interspeech 2013 Tutorial 1 (2013)
  34. Oura, K.: An Example of Context-Dependent Label Format for HMM-Based Speech Synthesis in English. In: HTS-demo_CMU-ARCTIC-SLT (2011)
  35. Urbain, J., Cakmak, H., Dutoit, T.: Evaluation of HMM-based Laughter Synthesis. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP (2013)
  36. Urbain, J., Cakmak, H., Dutoit, T.: Automatic Phonetic Transcription of Laughter and its Application to Laughter Synthesis. In: Proceedings of the 5th Biannual Humaine Association Conference on Affective Computing and Intelligent Interaction (2013)
  37. Kawahara, H.: Straight, Exploitation of the Other Aspect of Vocoder: Perceptually Isomorphic Decomposition of Speech Sounds. Acoustical Science and Technology 27(6) (2006)
  38. Drugman, T., Wilfart, G., Dutoit, T.: A Deterministic Plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis. In: Proceedings of Interspeech (2009)
  39. Tilmanne, J., Dutoit, T.: Continuous Control of Style and Style Transitions through Linear Interpolation in Hidden Markov Model Based Walk Synthesis. In: Gavrilova, M.L., Tan, C.J.K. (eds.) Transactions on Computational Science XVI. LNCS, vol. 7380, pp. 34–54. Springer, Heidelberg (2012)
  40. Fanelli, G., Gall, J., Romsdorfer, H., Weise, T., Van Gool, L.: Acquisition of a 3D Audio-Visual Corpus of Affective Speech. IEEE Transactions on Multimedia 12(6), 591–598 (2010)
  41. Bailly, G., Govokhina, O., Elisei, F., Breton, G.: Lip-Synching Using Speaker-Specific Articulation, Shape and Appearance Models. EURASIP Journal on Audio, Speech, and Music Processing 2009(5) (2009)
  42. Barbulescu, A., Hueber, T., Bailly, G., Ronfard, R.: Audio-Visual Speaker Conversion Using Prosody Features. In: International Conference on Auditory-Visual Speech Processing (2013)
  43.
  44.
  45. University of Cambridge: The Hidden Markov Model Toolkit (HTK)
  46. Puckette, M.: Pure Data
  47. Lieberman, Z., Watson, T., Castro, A., et al.: openFrameworks
  48. Astrinaki, M., Moinet, A., d’Alessandro, N., Dutoit, T.: Pure Data External for Reactive HMM-based Speech and Singing Synthesis. In: Proceedings of the 16th International Conference on Digital Audio Effects, DAFx 2013 (September 2013)
  49. Astrinaki, M., d’Alessandro, N., Reboursiere, L., Moinet, A., Dutoit, T.: Mage 2.0: New Features and its Application in the Development of a Talking Guitar. In: Proceedings of the 13th International Conference on New Interfaces for Musical Expression, NIME 2013 (May 2013)

Copyright information

© IFIP International Federation for Information Processing 2014

Authors and Affiliations

  • Nicolas d’Alessandro (1)
  • Joëlle Tilmanne (1)
  • Maria Astrinaki (1)
  • Thomas Hueber (2)
  • Rasmus Dall (3)
  • Thierry Ravet (1)
  • Alexis Moinet (1)
  • Huseyin Cakmak (1)
  • Onur Babacan (1)
  • Adela Barbulescu (2)
  • Valentin Parfait (1)
  • Victor Huguenin (1)
  • Emine Sümeyye Kalaycı (1)
  • Qiong Hu (3)

  1. Numediart Institute for New Media Art Technology, University of Mons, Belgium
  2. GIPSA-lab, UMR 5216, CNRS / INP / UJF / Stendhal University, Grenoble, France
  3. Centre for Speech Technology Research, University of Edinburgh, Scotland, UK
