Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness

  • Kang Liu
  • Joern Ostermann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5702)


Image-based modeling has proven very successful for creating realistic facial animations. Applications with dialog systems, such as e-Learning and customer information services, can integrate facial animations with synthesized speech on websites to improve human-machine communication. However, downloading the database of 11,594 mouth images (about 120 MB in JPEG format) used by the talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, key mouth images are identified by clustering algorithms, and similar mouth images are discarded. Second, the clustered key mouth images are further compressed with JPEG. Clustering algorithms based on MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree), and LBG are developed and evaluated. Our experiments demonstrate that the LBG-based clustering algorithm reduces the number of mouth images and that JPEG compression then shrinks the database to 8 MB. The minimized database generates facial animations in CIF format without loss of naturalness and fulfills the needs of talking heads for Internet applications.
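As a quick check of the download figures above, the sketch below computes raw transfer time. It assumes that "kBps" means kilobytes per second, that 1 MB = 1000 kB, and that protocol overhead is ignored; the function name is hypothetical. Under these assumptions the full 120 MB database needs 800 s (about 13 minutes, consistent with the quoted "about 15 minutes"), while the 8 MB minimized database needs under a minute.

```python
def download_time_seconds(size_mb: float, rate_kbytes_per_s: float) -> float:
    """Raw transfer time; assumes decimal units (1 MB = 1000 kB) and no protocol overhead."""
    return size_mb * 1000 / rate_kbytes_per_s


full = download_time_seconds(120, 150)      # full mouth-image database
minimized = download_time_seconds(8, 150)   # after clustering + JPEG
print(f"full: {full:.0f} s ({full / 60:.1f} min), minimized: {minimized:.0f} s")
# prints: full: 800 s (13.3 min), minimized: 53 s
```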

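One common way to cluster with a minimum spanning tree is sketched below; it is not necessarily the authors' exact procedure. It builds the MST over all mouth images with Prim's algorithm, using the distance between image feature vectors as the link weight, then cuts the k-1 heaviest links so that each remaining subtree is one cluster. The feature vectors (for example, PCA coefficients of each mouth image), the Euclidean link weight, and the function names are assumptions for illustration.

```python
import math


def prim_mst(features):
    """Prim's algorithm on the complete graph of feature vectors.

    Returns the n-1 MST edges as (i, j, weight), with Euclidean link weights (assumption).
    """
    n = len(features)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from each vertex to the tree
    parent = [-1] * n       # tree vertex at the other end of that link
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the non-tree vertex with the cheapest link
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(features[u], features[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(features, k):
    """Cut the k-1 heaviest MST links; each remaining subtree is one cluster."""
    n = len(features)
    kept = sorted(prim_mst(features), key=lambda e: e[2])[: n - k]
    root = list(range(n))  # union-find forest over images

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for i, j, _ in kept:
        root[find(i)] = find(j)
    labels, ids = [], {}
    for i in range(n):
        labels.append(ids.setdefault(find(i), len(ids)))  # relabel roots as 0..k-1
    return labels
```

Cutting MST links in this way is equivalent to single-linkage clustering, which is known to chain similar items together and can produce very unbalanced clusters.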

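RSST was introduced for image segmentation: it recursively merges the two regions joined by the shortest link and then recomputes the link weights of the merged region. The toy sketch below transfers that idea to mouth images on a complete graph, which is our assumption, as is the link weight (size-weighted squared distance between cluster centroids, one common choice). The function name is hypothetical, and the naive pair search is roughly cubic in the number of images, so it only suits small sets.

```python
import math


def rsst_clusters(features, k):
    """Toy RSST-style merging: repeatedly merge the two clusters joined by the
    shortest link until k clusters remain."""
    cents = {i: list(map(float, f)) for i, f in enumerate(features)}  # cluster centroids
    sizes = {i: 1 for i in cents}
    members = {i: [i] for i in cents}

    def link_weight(a, b):
        # assumed link weight: size-weighted squared centroid distance
        na, nb = sizes[a], sizes[b]
        return na * nb / (na + nb) * math.dist(cents[a], cents[b]) ** 2

    while len(cents) > k:
        ids = sorted(cents)
        a, b = min(((p, q) for x, p in enumerate(ids) for q in ids[x + 1:]),
                   key=lambda pair: link_weight(*pair))
        na, nb = sizes[a], sizes[b]
        # merge b into a; links touching a are recomputed on the next search
        cents[a] = [(na * ca + nb * cb) / (na + nb) for ca, cb in zip(cents[a], cents[b])]
        sizes[a] = na + nb
        members[a] += members.pop(b)
        del cents[b], sizes[b]
    labels = [0] * len(features)
    for lab, c in enumerate(sorted(cents)):
        for i in members[c]:
            labels[i] = lab
    return labels
```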

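The LBG algorithm (Linde, Buzo, Gray) designs a vector-quantization codebook by starting from the global centroid, splitting every codeword into two perturbed copies, and refining with nearest-neighbor and centroid (Lloyd) iterations until the distortion stops improving. Because a codeword is an average rather than a real photograph, the sketch then keeps, per codeword, the nearest actual mouth image as the key image. This selection rule, the feature vectors, the additive split offset, and the function names are our assumptions, not the paper's confirmed method.

```python
import math
import statistics


def nearest(v, points):
    """Index of the point closest to v (Euclidean)."""
    return min(range(len(points)), key=lambda j: math.dist(v, points[j]))


def lbg(features, n_codewords, eps=0.01, tol=1e-4, max_iter=100):
    """LBG codebook design by repeated splitting; n_codewords should be a power of two."""
    dim = len(features[0])
    codebook = [[statistics.fmean(f[d] for f in features) for d in range(dim)]]
    # additive split offset (assumption): eps times the per-dimension spread
    delta = [eps * statistics.pstdev([f[d] for f in features]) for d in range(dim)]
    while len(codebook) < n_codewords:
        # split every codeword into two slightly perturbed copies
        codebook = [[c + s * e for c, e in zip(cw, delta)] for cw in codebook for s in (1, -1)]
        prev = math.inf
        for _ in range(max_iter):
            # Lloyd step 1: assign every image to its nearest codeword
            assign = [nearest(f, codebook) for f in features]
            distortion = statistics.fmean(
                math.dist(f, codebook[a]) ** 2 for f, a in zip(features, assign))
            # Lloyd step 2: move each codeword to its cell centroid (empty cells keep theirs)
            for j in range(len(codebook)):
                cell = [f for f, a in zip(features, assign) if a == j]
                if cell:
                    codebook[j] = [statistics.fmean(col) for col in zip(*cell)]
            if prev - distortion <= tol * distortion:
                break  # relative improvement below tol
            prev = distortion
    return codebook


def key_images(features, codebook):
    """Keep one real mouth image per codeword: the image nearest to it."""
    return sorted({nearest(cw, features) for cw in codebook})
```

For example, `key_images(features, lbg(features, 256))` would keep at most 256 real images (the codebook size here is only illustrative), which would then be JPEG-compressed as the second minimization step. Pure-Python distance loops are slow for 11,594 images, so a vectorized version would be needed in practice.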




Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Kang Liu (1)
  • Joern Ostermann (1)

  1. Institut für Informationsverarbeitung, Leibniz Universität Hannover, Hannover, Germany
