Abstract
Video frame interpolation is a computer vision technique for synthesizing intermediate frames between two consecutive frames. It has been used extensively for video upsampling, video compression, and video rendering. We present a previously unexplored application of frame interpolation: joining separate phoneme videos to generate speech videos. Such videos can be used for speech entrainment and to create lip-reading exercises. We propose an end-to-end convolutional neural network with a U-Net architecture that learns optical flow and generates intermediate frames between two different phoneme videos. The model is evaluated with quantitative image-quality measures, the Structural Similarity Index (SSIM) and the peak signal-to-noise ratio (PSNR), and performs well, with an SSIM score of 0.870 and a PSNR score of 33.844.
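The abstract does not say how the two scores were computed, so the following is only a minimal sketch of the metrics themselves. It assumes 8-bit grayscale frames flattened into lists of pixel values, and it computes SSIM over a single window covering the whole frame, which is a simplification of the usual locally windowed, averaged SSIM. The function names `psnr` and `global_ssim` and the sample pixel values are hypothetical.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(candidate):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error between the ground-truth and generated frame.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames have no noise
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(reference, candidate, max_value=255.0):
    """SSIM with one window spanning the whole frame (assumed simplification)."""
    if len(reference) != len(candidate):
        raise ValueError("frames must have the same number of pixels")
    n = len(reference)
    mu_x = sum(reference) / n
    mu_y = sum(candidate) / n
    var_x = sum((x - mu_x) ** 2 for x in reference) / n
    var_y = sum((y - mu_y) ** 2 for y in candidate) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(reference, candidate)) / n
    # Standard stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical pixel values standing in for a true and a generated middle frame.
true_frame = [10, 50, 90, 130]
generated_frame = [12, 48, 91, 128]
print(psnr(true_frame, generated_frame))
print(global_ssim(true_frame, generated_frame))
```

Higher is better for both metrics: PSNR is unbounded above and becomes infinite for identical frames, while SSIM reaches 1 only when the two frames match exactly.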
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Agarwal, S., Saxena, V., Singal, V., Aggarwal, S. (2021). Deep Learning-Based Computer Aided Customization of Speech Therapy. In: Choudhary, A., Agrawal, A.P., Logeswaran, R., Unhelkar, B. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 778. Springer, Singapore. https://doi.org/10.1007/978-981-16-3067-5_36
DOI: https://doi.org/10.1007/978-981-16-3067-5_36
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3066-8
Online ISBN: 978-981-16-3067-5
eBook Packages: Computer Science, Computer Science (R0)