3D facial feature and expression computing from Internet image or video
Large-scale multimedia datasets such as Internet image and video collections provide new opportunities to understand and analyze human actions, among which facial performance is one of the most interesting types. In this paper, we present an automatic system for reconstructing detailed facial performances. Many existing facial performance reconstruction systems rely on data captured in controlled environments with densely spaced cameras and lights. In contrast, our system reconstructs detailed facial geometry from a single image or a monocular video sequence with unknown lighting. To achieve this, we first track sparse 2D and 3D features simultaneously, then reconstruct the low-frequency facial geometry through a 2D-3D feature trajectory fusion optimization, which we formulate as a linear problem that can be solved efficiently. Finally, we use a per-pixel shape-from-shading algorithm to estimate fine-scale geometric details such as wrinkles, further improving reconstruction fidelity. We demonstrate the accuracy of our system with reconstruction results on both single images and monocular video sequences.
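To illustrate why a 2D-3D fusion step can reduce to a linear problem, the following is a minimal sketch and not the formulation used in this paper. It assumes a linear shape model (a mean shape plus a weighted sum of basis shapes), a known 2x3 scaled-orthographic projection, 2D observations for every landmark, and 3D observations for a subset of landmarks. The function name `fuse_2d_3d` and all parameter names are hypothetical.

```python
import numpy as np

def fuse_2d_3d(mean, basis, proj, obs_2d, obs_3d, idx_3d, weight_3d=1.0):
    """Solve for linear shape coefficients from 2D and 3D feature constraints.

    Illustrative assumptions (not taken from the paper):
      mean   : (N, 3) mean landmark positions
      basis  : (K, N, 3) linear shape basis
      proj   : (2, 3) known scaled-orthographic projection
      obs_2d : (N, 2) tracked 2D feature positions
      obs_3d : (M, 3) tracked 3D feature positions
      idx_3d : (M,) landmark indices that have 3D observations
    """
    K = basis.shape[0]

    # 2D constraints: proj @ (mean_i + sum_k w_k * basis[k, i]) = obs_2d_i
    A2 = np.einsum("ab,knb->nak", proj, basis).reshape(-1, K)
    b2 = (obs_2d - mean @ proj.T).reshape(-1)

    # 3D constraints: mean_j + sum_k w_k * basis[k, j] = obs_3d_j
    A3 = np.transpose(basis[:, idx_3d, :], (1, 2, 0)).reshape(-1, K)
    b3 = (obs_3d - mean[idx_3d]).reshape(-1)

    # Stack both sets of rows; sqrt(weight) scales the 3D residuals so that
    # weight_3d multiplies their squared error in the least-squares objective.
    s = np.sqrt(weight_3d)
    A = np.vstack([A2, s * A3])
    b = np.concatenate([b2, s * b3])

    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    shape = mean + np.tensordot(w, basis, axes=1)  # (N, 3) fused shape
    return w, shape
```

Because both kinds of constraint are linear in the coefficients `w`, they stack into a single least-squares system. A system that also estimates camera pose or enforces temporal smoothness along trajectories would need additional terms that this sketch omits.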
Keywords: 3D understanding of multimedia data; Image/video based 3D face acquisition; 2D & 3D facial feature computing
This work is supported by the National Key R&D Program of China (2017YFB1002702).