Singing Like a Tenor without a Real Voice

  • Jochen Feitsch
  • Marco Strobel
  • Christian Geiger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8253)


We describe a multimedia installation that gives users the experience of singing like a tenor from the early 20th century. The user shapes vowels with her mouth but does not produce any sound; the mouth shape is recognized and tracked by a depth-sensing camera, and the corresponding vowel is synthesized using a dedicated formant-based sound analysis. Arm gestures are recognized and used to determine the pitch and volume of the artificially generated voice. The synthesized voice is additionally modified by acoustic filters to sound like a singing voice played back on an old gramophone. The installation also scans the user's face to create an individual 3D model of a tenor character, which is used to visualize the user's performance.
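The formant-based vowel synthesis described above can be sketched as a glottal-like source passed through resonant filters. The following minimal sketch is an illustrative assumption, not the authors' implementation: the formant values for /a/ follow the classic Peterson & Barney measurements, and the filter design, sample rate, and source waveform are choices made here for brevity.

```python
import numpy as np

# Formant center frequencies and bandwidths (Hz) for the vowel /a/,
# after Peterson & Barney (1952). Illustrative values, not the
# installation's actual parameters.
VOWEL_A = [(730, 90), (1090, 110), (2440, 140)]

def resonator(x, freq, bw, sr):
    """Two-pole IIR resonator acting as a single formant filter."""
    r = np.exp(-np.pi * bw / sr)            # pole radius from bandwidth
    theta = 2.0 * np.pi * freq / sr         # pole angle from center frequency
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    g = 1.0 - r                             # rough gain normalization
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = g * x[n]
        if n >= 1:
            y[n] -= a1 * y[n - 1]
        if n >= 2:
            y[n] -= a2 * y[n - 2]
    return y

def sing_vowel(pitch_hz, formants=VOWEL_A, sr=8000, dur=0.2):
    """Synthesize a short vowel: sawtooth source -> sum of formant filters."""
    t = np.arange(int(sr * dur)) / sr
    source = 2.0 * ((t * pitch_hz) % 1.0) - 1.0   # sawtooth at the sung pitch
    out = sum(resonator(source, f, bw, sr) for f, bw in formants)
    return out / np.max(np.abs(out))              # normalize to [-1, 1]

samples = sing_vowel(220.0)   # an /a/ sung at A3
```

In the installation itself, the pitch would be derived from the tracked arm gesture and the formant set from the recognized mouth shape, with the gramophone character added by further filtering downstream.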


Keywords: Face Tracking, Mouth Shape, Body Tracking, Sound Synthesis, Singing Voice





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Jochen Feitsch¹
  • Marco Strobel¹
  • Christian Geiger¹
  1. Department of Media, University of Applied Sciences Düsseldorf, Germany
