Realtime and accurate musical control of expression in singing synthesis

Journal on Multimodal User Interfaces

Abstract

In this paper, we describe a complete computer-based musical instrument allowing realtime synthesis of expressive singing voice. The expression results from the continuous action of a performer through a gestural control interface. In this context, expressive features of the voice are discussed. New realtime implementations of a spectral model of glottal flow (CALM) are described. These interactive modules are then used to identify and quantify voice quality dimensions. Experiments are conducted to develop a first framework for voice quality control. The representation of the vocal tract and the control of several vocal tract movements are explained, and a solution is proposed and integrated. Finally, some typical controllers are connected to the system and expressivity is evaluated.

Cite this article

D’Alessandro, N., Woodruff, P., Fabre, Y. et al. Realtime and accurate musical control of expression in singing synthesis. J Multimodal User Interfaces 1, 31–39 (2007). https://doi.org/10.1007/BF02884430

  • DOI: https://doi.org/10.1007/BF02884430
