RAMCESS 2.X framework—expressive voice analysis for realtime and accurate synthesis of singing

  • Nicolas d’Alessandro
  • Onur Babacan
  • Baris Bozkurt
  • Thomas Dubuisson
  • Andre Holzapfel
  • Loic Kessous
  • Alexis Moinet
  • Maxime Vlieghe
Original Paper


This paper presents the work carried out on the second version of the RAMCESS singing synthesis framework. The main improvement is the integration of new algorithms for expressive voice analysis, especially for separating the glottal source from the vocal tract. The realtime synthesis modules have also been refined. These elements have been integrated into an existing digital instrument, the HandSketch 1.x, a bi-manual controller. Finally, this digital instrument is compared to existing systems.


Speech processing · Glottal source estimation · Realtime singing synthesis · Digital instrument design


Copyright information

© OpenInterface Association 2008

Authors and Affiliations

  • Nicolas d’Alessandro (1)
  • Onur Babacan (2)
  • Baris Bozkurt (2)
  • Thomas Dubuisson (1)
  • Andre Holzapfel (3)
  • Loic Kessous (4)
  • Alexis Moinet (1)
  • Maxime Vlieghe (1)

  1. Circuit Theory & Signal Processing Laboratory, Faculté Polytechnique, Mons, Belgium
  2. Electrical and Electronics Engineering Dpt, Izmir Institute of Technology, Izmir, Turkey
  3. Computer Science Dpt, University of Crete, Heraklion, Greece
  4. LIMSI-CNRS, Université Paris XI, Paris, France