Abstract
In this paper, we propose a novel nonparametric regression model that generates virtual humans from still images for applications in next-generation (NG) environments. The model automatically synthesizes deformed character shapes by combining kernel regression with elliptic radial basis functions (ERBFs) and locally weighted regression (LOESS). Kernel regression with ERBFs represents the deformed character shapes and creates lively animated talking faces. To preserve patterns within the shapes, LOESS fits the details with local control. The results show that our method effectively simulates plausible movements for character animation, including body movement simulation, novel view synthesis, and expressive facial animation synchronized with input speech. The proposed model is therefore especially suitable for intelligent multimedia applications that involve virtual human generation.
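To illustrate the local-control idea behind LOESS, the following is a minimal one-dimensional sketch, not the implementation used in this paper. It assumes tricube weights and a local linear fit, which are common LOESS choices; the function name `loess_point` and the parameter `frac` are hypothetical.

```python
import numpy as np

def loess_point(x0, x, y, frac=0.5):
    """Estimate y at x0 with a locally weighted linear fit (illustrative only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Number of nearest neighbours that take part in the local fit.
    k = max(3, int(np.ceil(frac * n)))
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]
    h = dist[idx].max()  # local bandwidth: distance to the farthest neighbour
    if h == 0.0:
        return float(y[idx].mean())
    # Tricube weights (an assumed choice): close to 1 near x0, 0 at distance h.
    w = (1.0 - (dist[idx] / h) ** 3) ** 3
    # Weighted least squares for a line centred at x0; the intercept is the estimate.
    A = np.column_stack([np.ones(len(idx)), x[idx] - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return float(beta[0])

# Usage: smooth noisy samples of a curve at a few query points.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 50)
ys = np.sin(2.0 * np.pi * xs) + 0.1 * rng.standard_normal(xs.size)
smoothed = [loess_point(q, xs, ys, frac=0.3) for q in (0.25, 0.5, 0.75)]
```

A smaller `frac` uses fewer neighbours per fit, so the estimate follows local detail more closely at the cost of more sensitivity to noise.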
Acknowledgements
This work was supported in part by the National Science Council, Republic of China, under grant NSC 98-2221-E-009-123-MY3. We would like to thank Prof. Sang-Soo Yeo and the reviewers for their helpful suggestions.
Appendix 1 Hyper radial basis functions (HRBFs)
An HRBF is computed using the Mahalanobis distance, which is defined in matrix form as follows:
where \( \sigma_N^2 \) denotes the covariance of the multidimensional Gaussian rather than a single variance. An HRBF differs from a standard RBF insofar as each axis of the input space \( \chi \subseteq \ell_2^N \) (the space of square summable sequences of length N) has a separate smoothing parameter, i.e., a separate scale on which differences along this axis are viewed. It is worth mentioning that RBF kernels map the input space onto the surface of an infinite-dimensional hypersphere. Note that N = 2 in the arbitrary directional ERBF kernel corresponds to analyzing the data distribution along the major axis and the minor axis of an ellipse. Equation (1) is constructed along the orientation of the arbitrary directional ERBF, that is, along its major and minor axes.
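As a rough illustration of the kernel described in this appendix, the sketch below evaluates a Gaussian of the squared Mahalanobis distance \( (\mathbf{x}-\mathbf{c})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x}-\mathbf{c}) \) for N = 2. It assumes that the covariance \( \Sigma \) is obtained by rotating a diagonal matrix of per-axis variances by an angle `theta`, and that the exponent carries a factor of 1/2. These conventions, and the names `ellipse_cov`, `mahalanobis_sq`, and `hrbf`, are assumptions for illustration rather than the paper's notation.

```python
import numpy as np

def ellipse_cov(sigma_major, sigma_minor, theta=0.0):
    """Covariance of a 2D Gaussian whose major axis is rotated by theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    # Columns of R are the directions of the major and minor axes.
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([sigma_major ** 2, sigma_minor ** 2]) @ R.T

def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance (x - center)^T cov^{-1} (x - center)."""
    d = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    # Solve cov @ z = d instead of forming the inverse explicitly.
    return float(d @ np.linalg.solve(cov, d))

def hrbf(x, center, cov):
    """Gaussian kernel value; the 1/2 factor in the exponent is an assumed convention."""
    return float(np.exp(-0.5 * mahalanobis_sq(x, center, cov)))

# Usage: the kernel decays more slowly along the major axis than along the minor axis.
cov = ellipse_cov(sigma_major=2.0, sigma_minor=1.0, theta=0.0)
along_major = hrbf([2.0, 0.0], [0.0, 0.0], cov)
along_minor = hrbf([0.0, 2.0], [0.0, 0.0], cov)
```

With equal values for `sigma_major` and `sigma_minor` the kernel reduces to a standard isotropic RBF, which is the single-variance case that the text contrasts against.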
Cite this article
Chou, YF., Shih, ZC. A nonparametric regression model for virtual humans generation. Multimed Tools Appl 47, 163–187 (2010). https://doi.org/10.1007/s11042-009-0412-7
DOI: https://doi.org/10.1007/s11042-009-0412-7