Journal on Multimodal User Interfaces, Volume 1, Issue 1, pp. 31–39

Realtime and accurate musical control of expression in singing synthesis

  • Nicolas D’Alessandro
  • Pascale Woodruff
  • Yohann Fabre
  • Thierry Dutoit
  • Sylvain Le Beux
  • Boris Doval
  • Christophe d’Alessandro

Abstract

In this paper, we describe a full computer-based musical instrument that allows realtime synthesis of expressive singing voice. Expression results from the continuous action of a performer through a gestural control interface. In this context, the expressive features of the voice are discussed. New realtime implementations of a spectral model of glottal flow, the Causal/Anticausal Linear Model (CALM), are described. These interactive modules are then used to identify and quantify voice quality dimensions, and experiments are conducted to develop a first framework for voice quality control. The representation of the vocal tract and the control of several vocal tract movements are explained, and a solution is proposed and integrated into the system. Finally, some typical controllers are connected to the system and its expressivity is evaluated.

Keywords

Singing voice, Voice synthesis, Voice quality, Glottal flow models, Gestural control, Interfaces

Copyright information

© OpenInterface Association 2007

Authors and Affiliations

  • Nicolas D’Alessandro (1)
  • Pascale Woodruff (1)
  • Yohann Fabre (1)
  • Thierry Dutoit (1)
  • Sylvain Le Beux (2)
  • Boris Doval (2)
  • Christophe d’Alessandro (2)

  1. TCTS Lab, Faculté Polytechnique de Mons, Belgium
  2. LIMSI-CNRS, Université Paris XI, Orsay, France