Abstract
Automatic recognition of facial expressions is an interesting and challenging research topic in the field of pattern recognition due to applications such as human–machine interface design and developmental psychology. Designing classifiers for facial expression recognition with high reliability is a vital step in this research. This paper presents a novel framework for person-independent expression recognition by combining multiple types of facial features via multiple kernel learning (MKL) in multiclass support vector machines (SVM). Existing MKL-based approaches jointly learn the same kernel weights with an \(l_{1}\)-norm constraint for all binary classifiers, whereas our framework learns one kernel weight vector per binary classifier in the multiclass-SVM with \(l_{p}\)-norm constraints \((p \ge 1)\), which covers both sparse and non-sparse kernel combinations within MKL. We studied the effect of the \(l_{p}\)-norm MKL algorithm on learning the kernel weights and empirically evaluated the recognition results for six basic facial expressions and neutral faces with respect to the value of \(p\). In our experiments, we combined two popular facial feature representations, histogram of oriented gradients and local binary pattern histogram, with two kernel functions, the heavy-tailed radial basis function and the polynomial function. Our experimental results on the CK+, MMI and GEMEP-FERA face databases, as well as our theoretical justification, show that this framework outperforms the state-of-the-art methods and the SimpleMKL-based multiclass-SVM for facial expression recognition.
References
Mehrabian, A., Wiener, M.: Decoding of inconsistent communications. J. Personal. Soc. Psychol. 6(1), 109–114 (1967)
Knapp, M.L., Hall, J.A.: Nonverbal Communication in Human Interaction, 7th edn. Cengage Learning, Wadsworth (2010)
Ekman, P., Friesen, W.: Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto (1978)
Cornelius, R.R.: Theoretical approaches to emotion. In: SpeechEmotion-2000, pp. 3–10 (2000)
Fasel, B., Luettin, J.: Automatic facial expression analysis: a survey. Pattern Recognit. 36(1), 259–275 (2003)
Zeng, Z., Pantic, M., Roisman, G., Huang, T.: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58 (2009)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Kotsia, I., Pitas, I.: Facial expression recognition in image sequences using geometric deformation features and support vector machines. IEEE Trans. Image Process. 16(1), 172–187 (2007)
Shan, C., Gong, S., McOwan, P.: Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis. Comput. 27(6), 803–816 (2009)
Hsu, C., Lin, C.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)
Senechal, T., Rapp, V., Salam, H., Seguier, R., Bailly, K., Prevost, L.: Facial action recognition combining heterogeneous features via multikernel learning. IEEE Trans. Syst. Man Cybern. Part B Cybern. 42(4), 993–1005 (2012)
Zhang, X., Mahoor, M.H., Voyles, R.M.: Facial expression recognition using hessianmkl based multiclass-svm. In: IEEE International Conference on Automatic Face Gesture Recognition and Workshops (FG’13) (2013)
Lanckriet, G., Cristianini, N., Bartlett, P., Ghaoui, L., Jordan, M.: Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 5, 27–72 (2004)
Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1, pp. 886–893 (2005)
Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on featured distributions. Pattern Recognit. 29(1), 51–59 (1996)
Chapelle, O., Haffner, P., Vapnik, V.: Support vector machines for histogram-based image classification. IEEE Trans. Neural Netw. 10(5), 1055–1064 (1999)
Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2001)
Rakotomamonjy, A., Bach, F., Canu, S., Grandvalet, Y., et al.: SimpleMKL. J. Mach. Learn. Res. 9, 2491–2521 (2008)
Chapelle, O., Rakotomamonjy, A.: Second order optimization of kernel parameters. In: Proc. of the NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels (2008)
Valstar, M., Jiang, B., Mehu, M., Pantic, M., Scherer, K.: The first facial expression recognition and analysis challenge. In: IEEE International Conference on Automatic Face Gesture Recognition and Workshops (FG’11), pp. 921–926 (2011)
Zhu, Y., De la Torre, F., Cohn, J., Zhang, Y.: Dynamic cascades with bidirectional bootstrapping for spontaneous facial action unit detection. In: 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII’09), pp. 1–8 (2009)
Chang, Y., Hu, C., Feris, R., Turk, M.: Manifold based analysis of facial expression. Image Vis. Comput. 24(6), 605–614 (2006)
Cootes, T., Taylor, C., Cooper, D., Graham, J., et al.: Active shape models—their training and application. Comput. Vis. Image Underst. 61(1), 38–59 (1995)
Pantic, M., Rothkrantz, L.: Facial action recognition for facial expression analysis from static face images. IEEE Trans. Syst. Man Cybern. Part B Cybern. 34(3), 1449–1461 (2004)
Pantic, M., Patras, I.: Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences. IEEE Trans. Syst. Man Cybern. Part B Cybern. 36(2), 433–449 (2006)
Cootes, T., Edwards, G., Taylor, C.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
Sung, J., Kim, D.: Pose-robust facial expression recognition using view-based 2d + 3d aam. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 38(4), 852–866 (2008)
Cheon, Y., Kim, D.: Natural facial expression recognition using differential-aam and manifold learning. Pattern Recognit. 42(7), 1340–1350 (2009)
Lyons, M., Budynek, J., Akamatsu, S.: Automatic classification of single facial images. IEEE Trans. Pattern Anal. Mach. Intell. 21(12), 1357–1362 (1999)
Wu, T., Bartlett, M., Movellan, J.: Facial expression recognition using gabor motion energy filters. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 42–47 (2010)
Ahonen, T., Hadid, A., Pietikäinen, M.: Face recognition with local binary patterns. Comput. Vis.-ECCV 2004, 469–481 (2004)
Liao, S., Fan, W., Chung, A., Yeung, D.: Facial expression recognition using advanced local binary patterns, Tsallis entropies and global appearance features. In: IEEE International Conference on Image Processing, pp. 665–668 (2006)
Almaev, T.R., Valstar, M.F.: Local gabor binary patterns from three orthogonal planes for automatic facial expression recognition. In: Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), pp. 356–361 (2013)
Wang, X., Han, T., Yan, S.: An hog-lbp human detector with partial occlusion handling. In: IEEE 12th International Conference on Computer Vision, pp. 32–39 (2009)
Li, Z., Imai, J., Kaneko, M.: Facial-component-based bag of words and phog descriptor for facial expression recognition. In: IEEE International Conference on Systems, Man and Cybernetics (SMC’09), pp. 1353–1358 (2009)
Dahmane, M., Meunier, J.: Emotion recognition using dynamic grid-based hog features. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’11), pp. 884–888 (2011)
Bartlett, M., Littlewort, G., Frank, M., Lainscsek, C., Fasel, I., Movellan, J.: Recognizing facial expression: machine learning and application to spontaneous behavior. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2, pp. 568–573 (2005)
Sebe, N., Lew, M., Sun, Y., Cohen, I., Gevers, T., Huang, T.: Authentic facial expression analysis. Image Vis. Comput. 25(12), 1856–1863 (2007)
Wan, S., Aggarwal, J.: Spontaneous facial expression recognition: a robust metric learning approach. Pattern Recognit. 47(5), 1859–1868 (2014)
Yacoob, Y., Davis, L.: Recognizing human facial expressions from long image sequences using optical flow. IEEE Trans. Pattern Anal. Mach. Intell. 18(6), 636–642 (1996)
Essa, I., Pentland, A.: Coding, analysis, interpretation, and recognition of facial expressions. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 757–763 (1997)
Cohen, I., Sebe, N., Garg, A., Chen, L., Huang, T.: Facial expression recognition from video sequences: temporal and static modeling. Comput. Vis. Image Underst. 91(1), 160–187 (2003)
Yeasin, M., Bullot, B., Sharma, R.: Recognition of facial expressions and measurement of levels of interest from video. IEEE Trans. Multimed. 8(3), 500–508 (2006)
Zhang, Y., Ji, Q.: Active and dynamic information fusion for facial expression understanding from image sequences. IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 699–714 (2005)
Shan, C., Gong, S., McOwan, P.: Dynamic facial expression recognition using a bayesian temporal manifold model. In: Proc. BMVC, vol. 1, pp. 297–306 (2006)
Fang, H., Mac Parthaláin, N., Aubrey, A.J., Tam, G.K., Borgo, R., Rosin, P.L., Grant, P.W., Marshall, D., Chen, M.: Facial expression recognition in dynamic sequences: an integrated approach. Pattern Recognit. 47(3), 1271–1281 (2014)
Gönen, M., Alpaydın, E.: Multiple kernel learning algorithms. J. Mach. Learn. Res. 12, 2211–2268 (2011)
Fu, S., Kuai, X., Yang, G.: Multiple kernel active learning for facial expression analysis. Adv. Neural Netw.-ISNN 2011, 381–387 (2011)
Sénéchal, T., Rapp, V., Salam, H., Seguier, R., Bailly, K., Prevost, L.: Facial action recognition combining heterogeneous features via multikernel learning. IEEE Trans. Syst. Man Cybern. Part B Cybern. 42(4), 993–1005 (2012)
Zhang, W., Shan, S., Gao, W., Chen, X., Zhang, H.: Local gabor binary pattern histogram sequence (lgbphs): a novel non-statistical model for face representation and recognition. In: Tenth IEEE International Conference on Computer Vision (ICCV’05), vol. 1, pp. 786–791 (2005)
Sonnenburg, S., Rätsch, G., Schäfer, C., Schölkopf, B.: Large scale multiple kernel learning. J. Mach. Learn. Res. 7, 1531–1565 (2006)
Cortes, C., Mohri, M., Rostamizadeh, A.: \(l_{2}\) regularization for learning kernels. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 109–116 (2009)
Sun, T., Jiao, L., Liu, F., Wang, S., Feng, J.: Selective multiple kernel learning for classification with ensemble strategy. Pattern Recognit. 46(11), 3081–3090 (2013)
Kloft, M., Brefeld, U., Sonnenburg, S., Zien, A.: \(l_{p}\)-norm multiple kernel learning. J. Mach. Learn. Res. 12, 953–997 (2011)
Kloft, M.: Lp-norm multiple kernel learning. Ph.D. dissertation, Berlin Institute of Technology (2011)
Yan, F., Mikolajczyk, K., Kittler, J., Tahir, M.: A comparison of \(l_{1}\) norm and \(l_{2}\) norm multiple kernel SVMs in image and video classification. In: Seventh International Workshop on Content-Based Multimedia Indexing (CBMI’09), pp. 7–12 (2009)
Luenberger, D., Ye, Y.: Linear and Nonlinear Programming. Springer, New York (2008)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2000)
Bach, F.R.: Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9, 1179–1225 (2008)
Canu, S., Grandvalet, Y., Guigue, V., Rakotomamonjy, A.: SVM and kernel methods Matlab toolbox. Perception Systèmes et Information, INSA de Rouen, Rouen, France (2005)
Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended Cohn–Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 94–101 (2010)
Jiang, B., Valstar, M., Martinez, B., Pantic, M.: A dynamic appearance descriptor approach to facial actions temporal modeling. IEEE Trans. Cybern. 44(2), 161–174 (2014)
Zhang, X., Mahoor, M.H., Nielsen, R.D.: On multi-task learning for facial action unit detection. In: IVCNZ, pp. 202–207 (2013)
Zhang, X., Mahoor, M., Mavadati, S., Cohn, J.: A lp-norm mtmkl framework for simultaneous detection of multiple facial action units. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1104–1111 (2014)
Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)
Gehler, P., Nowozin, S.: On feature combination for multiclass object classification. In: IEEE 12th International Conference on Computer Vision, pp. 221–228 (2009)
Roy, K., Kamel, M.: Facial expression recognition using game theory. In: Artificial Neural Networks in Pattern Recognition, pp. 139–150 (2012)
Jain, S., Hu, C., Aggarwal, J.: Facial expression recognition with temporal modeling of shapes. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1642–1649 (2011)
Ramirez Rivera, A., Rojas Castillo, J., Chae, O.: Local directional number pattern for face analysis: face and expression recognition. IEEE Trans. Image Process. 22(5), 1740–1752 (2013)
Gu, W., Xiang, C., Venkatesh, Y., Huang, D., Lin, H.: Facial expression recognition using radial encoding of local Gabor features and classifier synthesis. Pattern Recognit. 45(1), 80–91 (2012)
Pantic, M., Valstar, M., Rademaker, R., Maat, L.: Web-based database for facial expression analysis. In: IEEE International Conference on Multimedia and Expo (ICME’05), pp. 5–8 (2005)
Valstar, M., Pantic, M.: Induced disgust, happiness and surprise: an addition to the MMI facial expression database. In: The Workshop Programme, pp. 65–70 (2010)
Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 532–539 (2013)
Guo, Y., Zhao, G., Pietikäinen, M.: Dynamic facial expression recognition using longitudinal facial expression atlases. Comput. Vis.-ECCV 2012, 631–644 (2012)
Sánchez, A., Ruiz, J.V., Moreno, A.B., Montemayor, A.S., Hernández, J., Pantrigo, J.J.: Differential optical flow applied to automatic facial expression recognition. Neurocomputing 74(8), 1272–1282 (2011)
Littlewort, G., Bartlett, M.S., Fasel, I., Susskind, J., Movellan, J.: Dynamics of facial expression extracted automatically from video. Image Vis. Comput. 24(6), 615–625 (2006)
Valstar, M.F., Jiang, B., Mehu, M., Pantic, M., Scherer, K.: The first facial expression recognition and analysis challenge. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’11), pp. 921–926 (2011)
Valstar, M.F., Mehu, M., Jiang, B., Pantic, M., Scherer, K.: Meta-analysis of the first facial expression recognition challenge. IEEE Trans. Syst. Man Cybern. Part B Cybern. 42(4), 966–979 (2012)
Tariq, U., Lin, K.-H., Li, Z., Zhou, X., Wang, Z., Le, V., Huang, T.S., Lv, X., Han, T.X.: Emotion recognition from an ensemble of features. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’11), pp. 872–877 (2011)
Yang, S., Bhanu, B.: Facial expression recognition using emotion avatar image. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’11), pp. 866–871 (2011)
Micchelli, C.A., Pontil, M.: Learning the kernel function via regularization. J. Mach. Learn. Res. 6, 1099–1125 (2005)
Acknowledgments
This research is partially supported by Grants BCS-1052781 and IIS-1111568 from the National Science Foundation.
Appendices
1.1 Appendix A. Description of MKL-based SVM in multiple kernel spaces
To explain how the MKL-based SVM works, we use the \(p=1\) case as an example. In this case, the optimized kernel combination weights satisfy the constraint \(\sum _{m=1}^{M} d_{m} = 1,\ d_{m} \ge 0\). Given a test example \(x_{0} \in {\mathbb {R}}^{D}\), we rewrite Eq. 5 as follows.
In the above equation, the expression under the brace is the discriminant function used to classify new samples in the canonical binary SVM. In other words, with the MKL-based SVM the label of a sample is determined by a weighted sum of the results obtained from each reproducing kernel Hilbert space (RKHS), which enhances the discriminative power of the classifier.
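The rewritten form of Eq. 5 is not reproduced in this copy. As a minimal sketch of the idea, assume the usual dual form of a binary SVM discriminant in each RKHS, \(f_{m}(x_{0}) = \sum _{i} \alpha _{i} y_{i} k_{m}(x_{i},x_{0}) + b\). When \(\sum _{m} d_{m} = 1\), the weighted sum \(\sum _{m} d_{m} f_{m}(x_{0})\) equals the discriminant of a single SVM with the combined kernel \(\sum _{m} d_{m} k_{m}\). The support vectors, \(\alpha \), \(b\) and \(d\) below are made-up values and not trained parameters. The heavy-tailed RBF form and its parameters are assumptions, and both kernels see the same feature vector for simplicity.

```python
import math

def ht_rbf(x, z, rho=1.0, a=1.0, b=0.5):
    """Heavy-tailed RBF, assumed form exp(-rho * sum_i |x_i^a - z_i^a|^b)."""
    return math.exp(-rho * sum(abs(xi ** a - zi ** a) ** b for xi, zi in zip(x, z)))

def poly(x, z, degree=2, c=1.0):
    """Polynomial kernel (x . z + c)^degree."""
    return (sum(xi * zi for xi, zi in zip(x, z)) + c) ** degree

def svm_discriminant(kernel, X, y, alpha, bias, x0):
    """Canonical binary-SVM discriminant evaluated in one RKHS."""
    return sum(ai * yi * kernel(xi, x0) for xi, yi, ai in zip(X, y, alpha)) + bias

def mkl_discriminant(kernels, d, X, y, alpha, bias, x0):
    """Weighted sum of the per-kernel discriminants (bias is shared, sum(d) == 1)."""
    return sum(dm * svm_discriminant(k, X, y, alpha, bias, x0)
               for k, dm in zip(kernels, d))

def combined_kernel_discriminant(kernels, d, X, y, alpha, bias, x0):
    """Discriminant of one SVM that uses the combined kernel sum_m d_m k_m."""
    def combined(x, z):
        return sum(dm * km(x, z) for km, dm in zip(kernels, d))
    return svm_discriminant(combined, X, y, alpha, bias, x0)

# Hypothetical support vectors, labels, dual coefficients, bias and weights.
X = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]
y = [1, -1, 1]
alpha = [0.7, 1.0, 0.3]
bias = -0.2
d = [0.6, 0.4]            # p = 1 case: non-negative and summing to one
x0 = [0.3, 0.7]           # test example
kernels = [ht_rbf, poly]

score_weighted = mkl_discriminant(kernels, d, X, y, alpha, bias, x0)
score_combined = combined_kernel_discriminant(kernels, d, X, y, alpha, bias, x0)
label = 1 if score_weighted >= 0 else -1
```

The two scores agree only because the weights sum to one, which lets the shared bias be split across the kernels. This is why the \(p=1\) case is convenient for the explanation.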
1.2 Appendix B. Proof of the superiority of MKL-based SVM over canonical binary SVM with single kernel and single type of features
Without loss of generality, our proof is pursued for the case \(1<p<2\). We transform the objective function of Eq. 3 based on Lemma 26 in [82] as:
where \(r=p/(2-p)\).
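The transformed objective is not reproduced in this copy. A variational identity of the kind such a transformation relies on is \(\min _{d} \sum _{m} a_{m}/d_{m} = (\sum _{m} a_{m}^{r/(r+1)})^{(r+1)/r}\) subject to \(\sum _{m} d_{m}^{r} \le 1,\ d_{m} > 0\), for \(a_{m} \ge 0\). With \(r=p/(2-p)\) the right-hand side becomes \((\sum _{m} a_{m}^{p/2})^{2/p}\), a squared \(l_{p}\) block norm when \(a_{m} = \Vert w_{m}\Vert ^{2}\). Whether this matches Lemma 26 of [82] or Eq. 3 term by term is an assumption. The sketch below only checks the identity numerically for made-up values of \(a_{m}\).

```python
import random

def closed_form_minimum(a, r):
    """Right-hand side of the identity: (sum_m a_m^{r/(r+1)})^{(r+1)/r}."""
    return sum(am ** (r / (r + 1)) for am in a) ** ((r + 1) / r)

def optimal_weights(a, r):
    """Minimizer of sum_m a_m / d_m subject to sum_m d_m^r = 1, d_m > 0."""
    scale = sum(am ** (r / (r + 1)) for am in a) ** (1 / r)
    return [am ** (1 / (r + 1)) / scale for am in a]

def weighted_objective(a, d):
    """Left-hand side evaluated at a given weight vector d."""
    return sum(am / dm for am, dm in zip(a, d))

def random_feasible(rng, m, r):
    """Random positive weights rescaled so that sum_m d_m^r = 1."""
    raw = [rng.uniform(0.05, 1.0) for _ in range(m)]
    norm = sum(v ** r for v in raw) ** (1 / r)
    return [v / norm for v in raw]

p = 1.5
r = p / (2 - p)                 # r = 3.0 for p = 1.5
a = [0.4, 2.0, 1.3]             # hypothetical values of ||w_m||^2

d_star = optimal_weights(a, r)
lhs = weighted_objective(a, d_star)
rhs = closed_form_minimum(a, r)
lp_block = sum(am ** (p / 2) for am in a) ** (2 / p)

# No random feasible point should do better than the closed-form minimum.
rng = random.Random(0)
worst_gap = min(weighted_objective(a, random_feasible(rng, len(a), r)) - rhs
                for _ in range(1000))
```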
As described in Sect. 3.3, this convex optimization problem is solved by the two-step method, in which each loop consists of two nested iterations. In the outer iteration, the kernel combination weights are updated while the SVM parameters are held fixed, whereas in the inner iteration the optimization problem of the canonical SVM is solved while the updated kernel combination weights are held fixed. Let \(N_{f}\) be the number of features extracted from each sample and \(N_{k}\) the number of kernel functions used in the \(l_{p}\)-norm MKL-based SVM. We denote the updated kernel combination vector in the \(t\)th loop of the two-step method as follows.
In addition, the SVM discriminant hyperplane obtained in the inner iteration of the \(t\)th loop is denoted by \(w^{(t)}\) and \(w_{0}^{(t)}\).
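The alternation described above can be sketched as follows. This is not the procedure of Sect. 3.3, which is not reproduced here, and two stand-ins are assumed. The inner step solves a bias-free SVM dual by coordinate ascent. The outer step uses the closed-form \(l_{p}\) weight update known from the \(l_{p}\)-norm MKL literature (Kloft et al.), \(d_{m} \propto \Vert w_{m}\Vert ^{2/(p+1)}\), rescaled so that \(\Vert d\Vert _{p} = 1\). The kernels, their parameters and the toy data are all hypothetical.

```python
import math

def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def poly(x, z, degree=2):
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** degree

def gram(kernel, X):
    return [[kernel(a, b) for b in X] for a in X]

def combine(grams, d):
    """Combined Gram matrix sum_m d_m K_m."""
    n = len(grams[0])
    return [[sum(dm * G[i][j] for dm, G in zip(d, grams)) for j in range(n)]
            for i in range(n)]

def solve_svm_dual(K, y, C=1.0, sweeps=200):
    """Inner step (stand-in): bias-free SVM dual solved by coordinate ascent."""
    n = len(y)
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            margin = y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            alpha[i] = min(C, max(0.0, alpha[i] + (1.0 - margin) / K[i][i]))
    return alpha

def block_norms(grams, d, alpha, y):
    """||w_m|| = d_m * sqrt(sum_ij alpha_i y_i alpha_j y_j K_m[i][j])."""
    n = len(y)
    norms = []
    for dm, G in zip(d, grams):
        quad = sum(alpha[i] * y[i] * alpha[j] * y[j] * G[i][j]
                   for i in range(n) for j in range(n))
        norms.append(dm * math.sqrt(max(quad, 0.0)))
    return norms

def update_weights(norms, p):
    """Outer step (assumed closed form): rescaled so that sum_m d_m^p = 1."""
    scale = sum(v ** (2.0 * p / (p + 1.0)) for v in norms) ** (1.0 / p)
    return [v ** (2.0 / (p + 1.0)) / scale for v in norms]

def two_step_mkl(X, y, kernels, p=1.5, C=1.0, loops=10):
    grams = [gram(k, X) for k in kernels]
    d = [len(grams) ** (-1.0 / p)] * len(grams)      # uniform start, ||d||_p = 1
    for _ in range(loops):
        alpha = solve_svm_dual(combine(grams, d), y, C)         # weights fixed
        d = update_weights(block_norms(grams, d, alpha, y), p)  # SVM fixed
    alpha = solve_svm_dual(combine(grams, d), y, C)  # final solve for returned d
    return d, alpha

# Toy XOR-like data with labels in {+1, -1}.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [1, -1, -1, 1]
p = 1.5
d, alpha = two_step_mkl(X, y, [rbf, poly], p=p)
```

Dropping the bias term keeps the inner solver to a few lines. A full SVM solver with the equality constraint on \(\alpha \) would replace `solve_svm_dual` without changing the alternation itself.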
For the canonical binary SVM, we suppose that the \(i\)th feature with the \(j\)th kernel function is utilized. Then, the canonical SVM becomes a special case in the framework of MKL-based SVM, and its corresponding kernel combination vector can be defined as follows.
Further, the learned discriminant hyperplane of canonical SVM is defined based on \({\hat{w}}^{\star }\) and \({\hat{w}}_{0}^{\star }\).
Assuming that \(d^{(1)}\) is initialized as \(\hat{d}\) in the outer iteration of the first loop of the two-step method, the inner iteration of that loop yields \(w^{(1)}={\hat{w}}^{\star }\) and \(w_{0}^{(1)} = {\hat{w}}_{0}^{\star }\). Thereafter, our proof is formulated as follows,
where \(J^{\star }\) is the learned minimum of the objective function in the \(l_{p}\)-norm MKL-based SVM with its corresponding optimum \(d^{\star },w^{\star },w_{0}^{\star }\), and \({\hat{J}}^{\star }\) is the corresponding minimum for the canonical binary SVM with \({\hat{w}}^{\star },{\hat{w}}_{0}^{\star }\).
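The displayed chain of the proof is missing from this copy. A plausible reconstruction from the definitions above, writing \(J(d,w,w_{0})\) for the objective (a notation assumed here), is:

```latex
\[
J^{\star} = J\bigl(d^{\star}, w^{\star}, w_{0}^{\star}\bigr)
  \le J\bigl(d^{(1)}, w^{(1)}, w_{0}^{(1)}\bigr)
  = J\bigl(\hat{d}, \hat{w}^{\star}, \hat{w}_{0}^{\star}\bigr)
  = \hat{J}^{\star}
\]
```

The inequality holds because \(J^{\star }\) is the minimum of the objective, so it cannot exceed the value reached after the first loop. The next equality uses the initialization \(d^{(1)}=\hat{d}\), and the last one is the definition of \({\hat{J}}^{\star }\).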
Based on the above justification, we can naturally extend the conclusion to a more general case. That is:
Suppose there exist a set of basis kernel functions \(S\) (\(S \ne \emptyset \)) and a set of features \(F\) (\(F \ne \emptyset \)). Then, for all \(S^{\prime } \subseteq S\) (\(S^{\prime } \ne \emptyset \)) and \(F^{\prime } \subseteq F\) (\(F^{\prime } \ne \emptyset \)), we obtain \(J^{\star }_{S \times F} \le J^{\star }_{S^{\prime } \times F^{\prime }}\), since \(d^{\star }_{S^{\prime } \times F^{\prime }}\) can be seen as a special case of \(d_{S \times F}\). The subscripts \(S \times F\) and \(S^{\prime } \times F^{\prime }\) denote the kernels and features in use.
More specifically, we conclude that, in terms of the minimized objective value, the MKL-based SVM with multiple kernels and multiple features performs at least as well as the MKL-based SVM with multiple kernels and a single feature, or with a single kernel and multiple features.
1.3 Appendix C. Proof of the superiority of our proposed MKL-based multiclass-SVM over the SimpleMKL-based multiclass-SVM
To be consistent with the SimpleMKL-based multiclass-SVM, we set \(p=1\) in our framework. Then, the only difference between the two methods is the way the kernel combination vectors are updated for multiclass classification tasks, as mentioned in Eqs. 6 and 7. The superiority of our proposed MKL framework for multiclass-SVM lies in the fact that its minimized objective value is never larger than the one obtained using the SimpleMKL-based multiclass-SVM. That is, the hyperplanes derived from our method perform at least as well on the training data.
Suppose that \({\hat{L}}^{\star }\) is the optimal value of the objective function in Eq. 7, and \(d_{u}^{\star }\) is the learned optimum for each binary classifier in our framework. \(L^{\star }\) and \(d^{\star }\) are the corresponding notations for the SimpleMKL-based multiclass-SVM in Eq. 6. Our proof is as follows
since \(L_{u}(d_{u}^{\star }) \le L_{u}({d}^{\star }), \forall u \in \varPhi \).
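The displayed chain is missing from this copy. From the definitions above it presumably reads \({\hat{L}}^{\star } = \sum _{u \in \varPhi } L_{u}(d_{u}^{\star }) \le \sum _{u \in \varPhi } L_{u}(d^{\star }) = L^{\star }\), which is an assumption about the lost equation. The toy check below illustrates the inequality for \(p=1\) and two kernels. The quadratic functions standing in for \(L_{u}\) and their preferred weights are made up and are not SVM objectives.

```python
def simplex_grid(steps=100):
    """Two-kernel weight vectors (t, 1 - t) on the l1 simplex (the p = 1 case)."""
    return [(k / steps, 1.0 - k / steps) for k in range(steps + 1)]

def toy_loss(target):
    """Hypothetical stand-in for L_u(d): a quadratic with its own preferred weights."""
    def loss(d):
        return sum((dm - tm) ** 2 for dm, tm in zip(d, target))
    return loss

# One toy objective per binary classifier u in Phi.
losses = [toy_loss((0.2, 0.8)), toy_loss((0.5, 0.5)), toy_loss((0.9, 0.1))]
grid = simplex_grid()

# Shared weights: a single d minimizes the summed objective (SimpleMKL style).
shared_d = min(grid, key=lambda d: sum(L(d) for L in losses))
shared_total = sum(L(shared_d) for L in losses)

# Per-classifier weights: each L_u is minimized by its own d_u.
per_classifier_d = [min(grid, key=L) for L in losses]
per_classifier_total = sum(L(d_u) for L, d_u in zip(losses, per_classifier_d))
```

Each term satisfies \(L_{u}(d_{u}^{\star }) \le L_{u}(d^{\star })\) because \(d_{u}^{\star }\) is chosen for that classifier alone, so the sum inherits the inequality.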
Cite this article
Zhang, X., Mahoor, M.H. & Mavadati, S.M. Facial expression recognition using \({l}_{p}\)-norm MKL multiclass-SVM. Machine Vision and Applications 26, 467–483 (2015). https://doi.org/10.1007/s00138-015-0677-y
DOI: https://doi.org/10.1007/s00138-015-0677-y