Abstract
Predictive coding of video has traditionally used information from previous video frames to construct an estimate of the current frame. The difference between the original and estimated signals is then transmitted so that the receiver can fully reconstruct the original video frame. In this paper, we explore a new algorithm for coding the shape of a person’s lips in a head-and-shoulders video sequence. The algorithm uses the same predictive coding loop, but instead of estimating the lip image from previous video frames with motion compensation, it forms the estimate from the associated acoustic data. Because the acoustic data is also transmitted, the receiver can reconstruct the video with very little side information. We describe our predictive coding system and analyze methods for converting the acoustic data into visual estimates.
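The coding loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the acoustic-to-visual mapping, the lip representation, or the quantizer, so everything named here is an assumption made for illustration: lip shape is taken to be a short vector of parameters, `predict_lips` is a hypothetical linear mapping from audio features, and the residual is coded with a uniform scalar quantizer of step `step`. The sketch also assumes the encoder and decoder see identical audio features, so both form the same estimate.

```python
from typing import List, Sequence

Vector = List[float]


def predict_lips(audio: Sequence[float],
                 weights: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical audio-to-visual mapping (assumed linear, not from the paper).

    Each row of `weights` yields one estimated lip parameter.
    """
    return [sum(w * a for w, a in zip(row, audio)) for row in weights]


def encode_frame(lips: Sequence[float], audio: Sequence[float],
                 weights: Sequence[Sequence[float]], step: float) -> List[int]:
    """Encoder: estimate lips from audio, then quantize the prediction error."""
    estimate = predict_lips(audio, weights)
    residual = [x - e for x, e in zip(lips, estimate)]
    # Uniform scalar quantizer (an assumption); only these indices are sent
    # as side information alongside the audio.
    return [round(r / step) for r in residual]


def decode_frame(indices: Sequence[int], audio: Sequence[float],
                 weights: Sequence[Sequence[float]], step: float) -> Vector:
    """Decoder: rebuild the same estimate from the audio and add the residual."""
    estimate = predict_lips(audio, weights)
    return [e + i * step for e, i in zip(estimate, indices)]


if __name__ == "__main__":
    weights = [[0.5, 0.25], [0.0, 1.0]]   # illustrative mapping coefficients
    audio = [2.0, 4.0]                    # illustrative acoustic features
    lips = [2.3, 3.9]                     # illustrative lip parameters
    step = 0.25

    sent = encode_frame(lips, audio, weights, step)
    print(sent, decode_frame(sent, audio, weights, step))
```

When the audio-based estimate is close to the true lip parameters, the quantized residual is mostly zeros or small integers, which is the sense in which little side information is needed. In this sketch the reconstruction error per parameter is bounded by half the quantizer step.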
Copyright information
© 1996 Plenum Press, New York
About this chapter
Cite this chapter
Rao, R.R., Chen, T. (1996). Cross-Modal Predictive Coding for Talking Head Sequences. In: Wang, Y., Panwar, S., Kim, SP., Bertoni, H.L. (eds) Multimedia Communications and Video Coding. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0403-6_37
DOI: https://doi.org/10.1007/978-1-4613-0403-6_37
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-8036-8
Online ISBN: 978-1-4613-0403-6
eBook Packages: Springer Book Archive