Abstract
This paper proposes inverting a joint Audio-Visual Hidden Markov Model to estimate visual information from speech data in a speech-driven, MPEG-4 compliant facial animation system. The inversion algorithm is derived for the general case of full covariance matrices for the audio-visual observations. System performance is evaluated for both full and diagonal covariance matrices. Experimental results show that full covariance matrices are preferable, since they achieve performance similar to that of diagonal matrices with a less complex model. The experiments are carried out using audio-visual databases compiled by the authors.
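The role of the covariance structure can be illustrated with a minimal sketch. It is not the inversion algorithm derived in the paper; it only assumes a single HMM state whose audio and visual features are scalar and jointly Gaussian, and it computes the conditional mean of the visual feature given the audio feature. The function name `conditional_visual_mean` and all parameter values are hypothetical, chosen for illustration.

```python
def conditional_visual_mean(audio, mu_a, mu_v, var_a, cov_av):
    """Return E[v | a] for one jointly Gaussian state.

    audio  : observed scalar audio feature a
    mu_a   : state mean of the audio feature
    mu_v   : state mean of the visual feature
    var_a  : state variance of the audio feature (must be positive)
    cov_av : audio-visual cross-covariance (0.0 for a diagonal model)
    """
    if var_a <= 0:
        raise ValueError("audio variance must be positive")
    # Standard Gaussian conditioning: shift the visual mean by the
    # audio deviation, scaled by cov_av / var_a.
    return mu_v + (cov_av / var_a) * (audio - mu_a)


# Hypothetical state parameters, for illustration only.
MU_A, MU_V, VAR_A = 1.0, 0.5, 4.0

# Full covariance: the audio observation shifts the visual estimate.
full = conditional_visual_mean(3.0, MU_A, MU_V, VAR_A, cov_av=2.0)

# Diagonal covariance: the cross term is zero, so within this state
# the estimate stays at the visual mean whatever the audio value is.
diag = conditional_visual_mean(3.0, MU_A, MU_V, VAR_A, cov_av=0.0)

print(full, diag)
```

Under these assumptions, a diagonal model can only vary the visual estimate by moving between states, whereas a full covariance model also adapts it within a state. This is one plausible reading of why comparable performance might be reached with a less complex model.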
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Terissi, L.D., Gómez, J.C. (2008). Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_9
DOI: https://doi.org/10.1007/978-3-540-88190-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88189-6
Online ISBN: 978-3-540-88190-2
eBook Packages: Computer Science, Computer Science (R0)