A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion

  • Lei Xie
  • Helen Meng
  • Zhi-Qiang Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4274)


This paper proposes a novel approach to a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings for a base language (English). Given a Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. We then map it to an English phoneme transcription via a translingual mapping scheme that involves symbol mapping and time alignment from Cantonese syllables to English phonemes. With the phoneme transcription, the input speech, and the audio-visual models for English, an EM-based conversion algorithm is adopted to generate mouth animation parameters associated with the input Cantonese audio. We have carried out audio-visual syllable recognition experiments to objectively evaluate the proposed talking face. Results show that the visual speech synthesized by the Cantonese talking face can effectively increase the accuracy of Cantonese syllable recognition under noisy acoustic conditions.
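The translingual mapping step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the mapping table or the alignment rule, so the table entries, the `Segment` type, the `map_syllables` function, and the choice to split each syllable's duration evenly among its phonemes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A labelled time span in seconds."""
    label: str
    start: float
    end: float


# Hypothetical symbol-mapping table (Cantonese syllable -> English phonemes).
# The entries are illustrative only and are not taken from the paper.
SYLLABLE_TO_PHONEMES = {
    "nei5": ["n", "ey"],
    "hou2": ["hh", "ow"],
}


def map_syllables(syllables, table=SYLLABLE_TO_PHONEMES):
    """Convert time-stamped syllables into time-stamped phonemes.

    Symbol mapping: look up each syllable in the table.
    Time alignment (assumed rule): divide the syllable's duration
    evenly among its mapped phonemes.
    """
    phonemes = []
    for syl in syllables:
        targets = table[syl.label]
        step = (syl.end - syl.start) / len(targets)
        for i, ph in enumerate(targets):
            start = syl.start + i * step
            # Pin the last phoneme to the syllable boundary so that
            # rounding error cannot open a gap between syllables.
            is_last = i == len(targets) - 1
            end = syl.end if is_last else syl.start + (i + 1) * step
            phonemes.append(Segment(ph, start, end))
    return phonemes


if __name__ == "__main__":
    recognized = [Segment("nei5", 0.0, 0.4), Segment("hou2", 0.4, 1.0)]
    for seg in map_syllables(recognized):
        print(f"{seg.label}\t{seg.start:.2f}\t{seg.end:.2f}")
```

In this sketch, the resulting phoneme segments cover the same time span as the recognized syllables, which is the property a downstream audio-to-visual conversion step would need in order to stay synchronized with the input audio.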


Keywords: Visual Speech; Facial Animation; Translingual Mapping; Conversion Scheme; Time Alignment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lei Xie (1)
  • Helen Meng (1)
  • Zhi-Qiang Liu (2)
  1. Human-Computer Communications Laboratory, Dept. of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Hong Kong
  2. School of Creative Media, City University of Hong Kong, Hong Kong
