
1 Introduction

Many advanced digital technologies have been developed that enrich our lives, and the smartphone is one of the most fascinating among them. Thanks to the smartphone, it is no surprise that people can talk to or see others who live far away, whenever and wherever they are. Since nearly everyone now carries a smartphone capable of handling video calls, video chats and teleconferences are easy to hold. Consequently, we have more chances to talk to someone face-to-face through digital devices.

When it comes to video conferencing, most people want to look good to others. People have a preferred impression they wish to convey depending on the situation they encounter. For instance, we might want to appear trustworthy in a job interview or when talking with older people, and we want to appear attractive to someone we care about.

The human face conveys important information regarding not only the person’s identity but also impressions of various personal attributes from biological ones such as age and gender to social ones such as personality and attractiveness [12]. If the relationship between the physical parameters representing variations in the face’s appearance and the impression of the corresponding images perceived by humans is formalized in a mathematical model, the possibility of a computer capable of dealing with perceptual or subjective information conveyed by faces could become a reality [13].

Against this background, we propose a new method that automatically transforms facial impression for video chat or teleconferencing while maintaining the identity of the original face. Since we transform the impression with respect to the shape of the original face, the result remains closely similar to the original.

In Sect. 2, we present the framework of our facial impression transformation system, which consists of three stages. In Sects. 3 and 4, we detail the transformation engine and the tracking and warping method for video sequences. Finally, we report experiments and conclude the chapter in Sects. 5 and 6, respectively.

2 Overview

Figure 1 shows an overview of our proposed method. Once we capture a person in the frontal view through a camera, we detect the face region and automatically extract 66 landmark points for facial features such as the two eyebrows, the two eyes, the nose, the lips, and the face boundary. The detected frontal face image and the landmark points from this first stage serve as input to the second stage.

Fig. 1. System flow chart

The key process in our system is the automatic transformation of facial impression (Fig. 2) based on a database that we built for training. It is composed of frontal faces and their facial impression scores, gathered from groups of human raters. We model a facial impression score function with a support vector regressor, trained on data classified by facial shape. Finally, a deformed face is obtained by optimizing this function. In this step we can transform the facial impression as the user wants, while the proposed algorithm preserves the original facial identity.

Fig. 2. Detailed process for automatic transformation of facial impression

In the last step, facial features are detected in the images and tracked across the video sequence. Guided by the facial features detected by the face tracker, the original face is substituted with the deformed face in real time. Our results indicate that the proposed method can transform facial impression and substitute the original face in real time.

3 Impression Transform Engine

3.1 Database Construction and Facial Feature Extraction

The training set consists of 846 frontal portraits of males with neutral expressions. The degree of each impression (e.g., attractiveness, baby-facedness, and aggressiveness) of each face was rated by 10 human raters, and the average rating of a face is its impression score (on a scale of 1 to 7). Since each rater has their own score range and deviation, we normalize scores with the z-score (Eq. 1), calculated from the arithmetic mean and standard deviation of the given rater's data [1].

$$ s'_{k} = \frac{s_{k} - \mu}{\sigma} $$
(1)
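
As a small illustration, the per-rater normalization can be written as follows. This is a sketch only; we assume, purely for illustration, that the raw ratings are stored as an (n_raters, n_faces) NumPy array, a layout the chapter does not specify.

```python
import numpy as np

def normalize_ratings(ratings):
    """Per-rater z-score (Eq. 1), then average over raters.

    `ratings` is assumed to be an (n_raters, n_faces) array of raw
    scores on the 1-7 scale; this layout is illustrative only.
    """
    mu = ratings.mean(axis=1, keepdims=True)    # each rater's mean
    sigma = ratings.std(axis=1, keepdims=True)  # each rater's standard deviation
    z = (ratings - mu) / sigma                  # Eq. (1), applied per rater
    return z.mean(axis=0)                       # one normalized score per face
```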

In our work, we use the constrained local model (CLM) [11] and the Chehra face tracker [10] to extract 66 landmark points (Fig. 3(a)). Since the Chehra face tracker, although an accurate face alignment technique, does not provide feature points for the face boundary, CLM is used to extract the 17 boundary landmark points. The extracted feature points are used to construct a Delaunay triangulation, which builds the mesh (Fig. 3(b)). The triangulation consists of 166 edges, and the lengths of these edges are the components of a 166-dimensional distance vector (Fig. 3(c)). The distance vector is normalized by the square root of the face area.
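
The construction of the distance vector can be sketched with SciPy's Delaunay triangulation as follows. The (66, 2) landmark array layout is our assumption, and the exact number of mesh edges (166 in our configuration) depends on the point set.

```python
import numpy as np
from scipy.spatial import Delaunay

def distance_vector(landmarks):
    """Edge-length distance vector from the (66, 2) landmark array.

    Triangulates the landmarks, collects the unique mesh edges, and
    normalizes the edge lengths by the square root of the face area.
    """
    tri = Delaunay(landmarks)
    edges = set()
    for a, b, c in tri.simplices:
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((min(i, j), max(i, j)))
    edges = sorted(edges)
    lengths = np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                        for i, j in edges])
    # Face area = sum of the areas of all mesh triangles (Sect. 3.1).
    area = 0.0
    for a, b, c in tri.simplices:
        (xa, ya), (xb, yb), (xc, yc) = landmarks[a], landmarks[b], landmarks[c]
        area += 0.5 * abs((xb - xa) * (yc - ya) - (yb - ya) * (xc - xa))
    return lengths / np.sqrt(area), edges
```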

Fig. 3. (a) Facial feature points, (b) the mesh built by Delaunay triangulation, (c) the 166-D distance vector

Table 1. The number of data in each class

The face area is the sum of the areas of all triangles in the mesh. Training samples are grouped into five classes by facial shape with a K-means [8] classifier; Table 1 gives the number of samples in each class. Classifying the samples reduces computation, and the classes are used to build the impression score function (Sect. 3.2). Constructing the function from data whose facial shape is similar to the original face makes the resulting images preserve the original facial identity.
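
A minimal scikit-learn sketch of this grouping is given below; the data matrix is a random placeholder standing in for the real normalized distance vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for the (846, 166) matrix of normalized distance vectors.
X = np.random.rand(846, 166)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_            # shape class of each training face
centers = kmeans.cluster_centers_  # center distance vector per class
```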

3.2 Facial Impression Score Function

The image captured by the user is the original input image. To select a suitable training class for it, we choose the class whose center vector has the smallest Euclidean distance to the distance vector of the original face. The impression score estimation function is then built from the selected class's training samples, which consist of distance vectors and impression scores, using Support Vector Regression (SVR) [4].

SVR is similar to the Support Vector Machine (SVM) but is used for regression. It is an induction algorithm for fitting multidimensional data with various kernels. Suppose we have training data \( \{ \left( {x_{1} ,y_{1} } \right), \ldots ,\left( {x_{l} ,y_{l} } \right)\} \), where \( x \in {\mathbb{R}}^{d} \) and \( y \in {\mathbb{R}} \). In ε-SV regression [1], the goal is to find a function f(x) that deviates by at most ε from the actually obtained targets \( y_{i} \) for all the training data and, at the same time, is as flat as possible. In the linear case, f(x) takes the form

$$ f\left( x \right) = \langle w, x \rangle + b $$
(2)

where \( \langle {\cdot} , {\cdot} \rangle \) denotes the dot product. Flatness in the case of (Eq. 2) means that one seeks a small \( w \), which requires minimizing the Euclidean norm \( \parallel w\parallel^{2} \); this can be written as a convex optimization problem [4]. We use a Radial Basis Function (RBF) kernel, a popular kernel in kernel-based learning algorithms. Model selection was performed by a grid search over the kernel width \( \sigma \), the slack parameter \( C \), and the tube width parameter \( \varepsilon \), and Leave-One-Out Cross-Validation (LOOCV) was used to determine adequate parameters. In (Eq. 2), \( x \) is the 166-dimensional feature vector and \( y \) is its corresponding impression score, so the function above estimates the impression score.
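
The class selection and model fitting described above might look as follows in scikit-learn. This is a sketch: the grid values are illustrative placeholders, not the values used in our experiments.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, LeaveOneOut

def select_class(x, centers):
    """Pick the shape class whose K-means center is nearest to x (Sect. 3.2)."""
    return int(np.argmin(np.linalg.norm(centers - x, axis=1)))

def fit_impression_function(X_class, y_class):
    """Fit the impression score function for one shape class.

    Grid search over C, epsilon and gamma (the RBF width parameter,
    related to sigma) with leave-one-out cross-validation; the grid
    values below are placeholders.
    """
    grid = {"C": [1, 10, 100],
            "gamma": [0.01, 0.1, 1.0],
            "epsilon": [0.05, 0.1, 0.2]}
    search = GridSearchCV(SVR(kernel="rbf"), grid, cv=LeaveOneOut(),
                          scoring="neg_mean_squared_error")
    search.fit(X_class, y_class)
    return search.best_estimator_
```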

3.3 Transformation of Facial Impression

Reduce Dimension and Constrain the Search Space with PCA. We use Principal Component Analysis (PCA) to reduce the dimension of the distance vector from 166 to 30; \( x \) denotes the reduced distance vector. We regularize the objective function (Eq. 3) to constrain the search space to valid human faces. LP(x) (Eq. 4) defines the face space by a multivariate Gaussian distribution. We set \( \alpha \) to 0.3 following experimentation.

$$ E(x) = -f\left( x \right) - \alpha \left( {f\left( x \right) - LP\left( x \right)} \right) $$
(3)
$$ LP\left( x \right) = \sum_{i = 1}^{d} \frac{\left( x_{i} - \bar{\mu }_{i} \right)^{2}}{2\varSigma_{ii}} + \text{const} $$
(4)

In LP(x), \( \bar{\mu } \) is the 30-D mean vector of the training samples in the selected class and \( \varSigma_{ii} \) are the eigenvalues obtained by PCA.
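
Equations (3) and (4) translate directly into code. A sketch in the 30-D PCA space follows, where f is assumed to be the fitted impression score function applied to the reduced vector.

```python
import numpy as np

def LP(x, mu, eigvals):
    """Gaussian face-space prior of Eq. (4) in the 30-D PCA space."""
    return np.sum((x - mu) ** 2 / (2.0 * eigvals))

def E(x, f, mu, eigvals, alpha=0.3):
    """Objective of Eq. (3); f maps a reduced distance vector to a score."""
    fx = f(x)
    return -fx - alpha * (fx - LP(x, mu, eigvals))
```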

Modified Distance Vector. We use Powell's method [2] to seek the modified distance vector. Powell's method is an appropriate optimizer for problems whose optimal solution lies close to the starting point; since the modified distance vector should stay near the original one to maintain close similarity with the original face, it is a suitable choice. The modified distance vector is obtained by minimizing the objective function (Eq. 3), which is based on the impression score function, with the original distance vector as the starting point of the optimization.

$$ f(x') > f(x) $$
(5)

The modified distance vector should satisfy condition (5), which states that the modified face has a higher impression score than the original one.
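
A sketch of this optimization with SciPy's Powell implementation, reusing E from the sketch above; the acceptance test implements condition (5).

```python
from scipy.optimize import minimize

def modify_distance_vector(x0, f, mu, eigvals, alpha=0.3):
    """Minimize E (Eq. 3) with Powell's method from the original vector x0."""
    res = minimize(lambda x: E(x, f, mu, eigvals, alpha), x0, method="Powell")
    # Keep the result only if it raises the impression score (Eq. 5).
    return res.x if f(res.x) > f(x0) else x0
```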

Adjust Landmark Points to the Modified Distance Vector. To transform the facial impression, the landmark points must be moved to new positions according to the modified distance vector. The new landmark points are obtained by minimizing Eq. 6, which reduces the differences between the mesh edge lengths and their corresponding modified distances. The Levenberg-Marquardt algorithm [5, 6] is used to perform this minimization.

$$ D\left( {q_{1} , \ldots ,q_{66} } \right) = \sum_{(i,j)} \alpha_{ij} \left( \parallel q_{i} - q_{j} \parallel^{2} - d_{ij}^{2} \right)^{2} $$
(6)

The variables \( q_{i} \) and \( q_{j} \) are target landmark points, and \( d_{ij} \) is the modified distance corresponding to the edge between \( q_{i} \) and \( q_{j} \). By minimizing this global error we obtain optimal landmark positions.
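
SciPy's Levenberg-Marquardt solver can carry out this minimization. A sketch follows; note that method="lm" requires at least as many residuals as unknowns, which holds here (166 edges vs. 132 coordinates), and the uniform weights stand in for the \( \alpha_{ij} \).

```python
import numpy as np
from scipy.optimize import least_squares

def adjust_landmarks(q0, edges, d, weights=None):
    """Minimize Eq. (6): move the 66 landmarks so that the mesh edge
    lengths match the modified distances d (one per edge)."""
    if weights is None:
        weights = np.ones(len(edges))  # uniform alpha_ij for illustration

    def residuals(q_flat):
        q = q_flat.reshape(-1, 2)
        return np.array([np.sqrt(w) * (np.sum((q[i] - q[j]) ** 2) - dij ** 2)
                         for (i, j), dij, w in zip(edges, d, weights)])

    sol = least_squares(residuals, q0.ravel(), method="lm")
    return sol.x.reshape(-1, 2)
```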

Post-processing. To reduce distortion of the deformed face against the forehead and the background, the modified landmark points on the face boundary are replaced with the original ones. The modified landmark points {\( q_{i} \)} are then translated to the position of the original face so that the deformed face aligns with the background of the video. Post-processing can lower the target impression score, but it is necessary for a realistic result image.

4 Face Tracking and Warping

We must replace the original face with the deformed one in real time. To do so, we detect the face landmark points with the Chehra face tracker and CLM in every frame, and then add the tracked motion to the deformed landmark points to follow the moving face.

We have the set of original face landmark points {\( p_{i} \)}, extracted on the initial frame, and the set of deformed points {\( q_{i} \)} at the initial image position. The set of differences {\( diff_{i} \)} is acquired by subtracting the moving landmark points {\( p'_{i} \)} from the original ones {\( p_{i} \)} (Eq. 7). The set of moving deformed face landmark points {\( q'_{i} \)} is updated at every frame as the sum of {\( q_{i} \)} and {\( diff_{i} \)} (Eq. 8).

$$ diff_{i} = p_{i} - p'_{i} $$
(7)
$$ q'_{i} = q_{i} + diff_{i} $$
(8)

According to {\( p'_{i} \)} and {\( q'_{i} \)}, we perform texture warping with the triangle mesh produced by the Delaunay triangulation. The texture images for warping are taken from each frame to handle changes in illumination and expression.
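
A minimal OpenCV sketch of the per-frame update (Eqs. 7 and 8) and of warping a single mesh triangle, under our assumption that each Delaunay triangle of the tracked mesh (at {\( p'_{i} \)}) is warped to its counterpart in the deformed mesh (at {\( q'_{i} \)}); looping over all triangles produces the substituted face.

```python
import cv2
import numpy as np

def update_deformed(q, p, p_cur):
    """Eqs. (7)-(8): diff_i = p_i - p'_i, then q'_i = q_i + diff_i."""
    return q + (p - p_cur)

def warp_triangle(frame, out, t_src, t_dst):
    """Warp one mesh triangle (3 points each) from the live frame into out."""
    M = cv2.getAffineTransform(np.float32(t_src), np.float32(t_dst))
    warped = cv2.warpAffine(frame, M, (out.shape[1], out.shape[0]))
    mask = np.zeros(out.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, np.int32(t_dst), 255)
    out[mask > 0] = warped[mask > 0]  # copy only pixels inside the triangle
```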

5 Experiment

We have implemented an application for our system. The application automatically detects facial features in the video sequence. The user looks straight at the camera with a neutral expression, and the application captures a frontal image. Next, it computes the transformation of the facial impression and displays the transformed image. The application then tracks the face, and the original face in the video is substituted with the deformed face in real time. Currently, our training samples cover Asian males only. We deformed a face not included in the training samples to enhance its baby-facedness, transforming Fig. 4(a) into (b) based on the training data. The differences between the original and deformed faces are subtle around the eyes, nose, and lips, but these subtle changes have a clear impact on the facial impression (Fig. 5).

Fig. 4. Transformation of facial impression example. (a) Original image [14]. (b) Deformed image (looks much younger)

Fig. 5. Results of facial impression transformation on video sequence images of an Asian male [14]. The top two rows are original images and the bottom two rows are deformed images enhancing baby-facedness.

6 Conclusion

In this paper, we proposed a new system for automatic transformation of facial impression in real time. Since we use a training dataset classified by facial shape, the deformed face stays close to the original face. Because our database is composed of images of Asian males only, we need to gather additional datasets, such as images of females and of other ethnicities. In the future, we will gather more rating scores for various kinds of impressions to broaden the system's use. In addition, we must devise a more sophisticated and objective method for evaluating impression than the current one based on human raters.