Facial expression recognition using \({l}_{p}\)-norm MKL multiclass-SVM

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

Automatic recognition of facial expressions is an interesting and challenging research topic in the field of pattern recognition due to applications such as human–machine interface design and developmental psychology. Designing classifiers for facial expression recognition with high reliability is a vital step in this research. This paper presents a novel framework for person-independent expression recognition by combining multiple types of facial features via multiple kernel learning (MKL) in multiclass support vector machines (SVM). Existing MKL-based approaches jointly learn the same kernel weights with an \(l_{1}\)-norm constraint for all binary classifiers, whereas our framework learns one kernel weight vector per binary classifier in the multiclass-SVM with \(l_{p}\)-norm constraints \((p \ge 1)\), which considers both sparse and non-sparse kernel combinations within MKL. We studied the effect of the \(l_{p}\)-norm MKL algorithm for learning the kernel weights and empirically evaluated the recognition results for the six basic facial expressions and neutral faces with respect to the value of “\(p\)”. In our experiments, we combined two popular facial feature representations, the histogram of oriented gradients (HOG) and the local binary pattern (LBP) histogram, with two kernel functions, the heavy-tailed radial basis function and the polynomial function. Our experimental results on the CK\(+\), MMI and GEMEP-FERA face databases as well as our theoretical justification show that this framework outperforms the state-of-the-art methods and the SimpleMKL-based multiclass-SVM for facial expression recognition.
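
As a purely illustrative companion to the abstract, the sketch below shows one way the two feature types (HOG and LBP histograms) and the two base kernel families (a heavy-tailed RBF in the spirit of [17] and a polynomial kernel) could be assembled into candidate kernel matrices for an MKL solver. The library calls, parameter values, and helper names are assumptions for illustration and are not the paper's exact settings.

```python
# Hedged sketch (not the paper's code): builds HOG and LBP-histogram features
# and two families of base kernels; all parameter values are illustrative.
import numpy as np
from skimage.feature import hog, local_binary_pattern

def extract_features(face_img):
    """HOG descriptor and uniform-LBP histogram for one grayscale face crop."""
    hog_vec = hog(face_img, orientations=8, pixels_per_cell=(16, 16),
                  cells_per_block=(1, 1))
    lbp = local_binary_pattern(face_img, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return hog_vec, lbp_hist

def heavy_tailed_rbf(X, Z, a=0.5, b=1.0, rho=1.0):
    """Generalized ('heavy-tailed') RBF in the spirit of [17]:
    k(x, z) = exp(-rho * sum_i |x_i^a - z_i^a|^b)."""
    D = np.abs(X[:, None, :] ** a - Z[None, :, :] ** a) ** b
    return np.exp(-rho * D.sum(-1))

def polynomial(X, Z, degree=2):
    """Inhomogeneous polynomial base kernel."""
    return (X @ Z.T + 1.0) ** degree

# Toy usage: base kernel matrices for a small batch of (already cropped) faces.
faces = (np.random.rand(4, 64, 64) * 255).astype(np.uint8)
H, L = zip(*(extract_features(f) for f in faces))
H, L = np.vstack(H), np.vstack(L)
base_kernels = [heavy_tailed_rbf(H, H), polynomial(H, H),
                heavy_tailed_rbf(L, L), polynomial(L, L)]
```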

References

  1. Mehrabian, A., Wiener, M.: Decoding of inconsistent communications. J. Personal. Soc. Psychol. 6(1), 109–114 (1967)

  2. Knapp, M.L., Hall, J.A.: Nonverbal Communication in Human Interaction, 7th edn. Cengage Learning, Wadsworth (2010)

  3. Ekman, P., Friesen, W.: Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto (1978)

  4. Cornelius, R.R.: Theoretical approaches to emotion. In: SpeechEmotion-2000, pp. 3–10 (2000)

  5. Fasel, B., Luettin, J.: Automatic facial expression analysis: a survey. Pattern Recognit. 36(1), 259–275 (2003)

  6. Zeng, Z., Pantic, M., Roisman, G., Huang, T.: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58 (2009)

  7. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  8. Kotsia, I., Pitas, I.: Facial expression recognition in image sequences using geometric deformation features and support vector machines. IEEE Trans. Image Process. 16(1), 172–187 (2007)

  9. Shan, C., Gong, S., McOwan, P.: Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis. Comput. 27(6), 803–816 (2009)

  10. Hsu, C., Lin, C.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)

  11. Senechal, T., Rapp, V., Salam, H., Seguier, R., Bailly, K., Prevost, L.: Facial action recognition combining heterogeneous features via multikernel learning. IEEE Trans. Syst. Man Cybern. Part B Cybern. 42(4), 993–1005 (2012)

  12. Zhang, X., Mahoor, M.H., Voyles, R.M.: Facial expression recognition using HessianMKL based multiclass-SVM. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’13) (2013)

  13. Lanckriet, G., Cristianini, N., Bartlett, P., Ghaoui, L., Jordan, M.: Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 5, 27–72 (2004)

  14. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004)

  15. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1, pp. 886–893 (2005)

  16. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on featured distributions. Pattern Recognit. 29(1), 51–59 (1996)

  17. Chapelle, O., Haffner, P., Vapnik, V.: Support vector machines for histogram-based image classification. IEEE Trans. Neural Netw. 10(5), 1055–1064 (1999)

  18. Scholkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2001)

  19. Rakotomamonjy, A., Bach, F., Canu, S., Grandvalet, Y., et al.: SimpleMKL. J. Mach. Learn. Res. 9, 2491–2521 (2008)

  20. Chapelle, O., Rakotomamonjy, A.: Second order optimization of kernel parameters. In: Proc. of the NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels (2008)

  21. Valstar, M., Jiang, B., Mehu, M., Pantic, M., Scherer, K.: The first facial expression recognition and analysis challenge. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’11), pp. 921–926 (2011)

  22. Zhu, Y., De la Torre, F., Cohn, J., Zhang, Y.: Dynamic cascades with bidirectional bootstrapping for spontaneous facial action unit detection. In: 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII’09), pp. 1–8 (2009)

  23. Chang, Y., Hu, C., Feris, R., Turk, M.: Manifold based analysis of facial expression. Image Vis. Comput. 24(6), 605–614 (2006)

  24. Cootes, T., Taylor, C., Cooper, D., Graham, J., et al.: Active shape models—their training and application. Comput. Vis. Image Underst. 61(1), 38–59 (1995)

  25. Pantic, M., Rothkrantz, L.: Facial action recognition for facial expression analysis from static face images. IEEE Trans. Syst. Man Cybern. Part B Cybern. 34(3), 1449–1461 (2004)

  26. Pantic, M., Patras, I.: Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences. IEEE Trans. Syst. Man Cybern. Part B Cybern. 36(2), 433–449 (2006)

  27. Cootes, T., Edwards, G., Taylor, C.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)

  28. Sung, J., Kim, D.: Pose-robust facial expression recognition using view-based 2D + 3D AAM. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 38(4), 852–866 (2008)

  29. Cheon, Y., Kim, D.: Natural facial expression recognition using differential-AAM and manifold learning. Pattern Recognit. 42(7), 1340–1350 (2009)

  30. Lyons, M., Budynek, J., Akamatsu, S.: Automatic classification of single facial images. IEEE Trans. Pattern Anal. Mach. Intell. 21(12), 1357–1362 (1999)

  31. Wu, T., Bartlett, M., Movellan, J.: Facial expression recognition using Gabor motion energy filters. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 42–47 (2010)

  32. Ahonen, T., Hadid, A., Pietikäinen, M.: Face recognition with local binary patterns. Comput. Vis.-ECCV 2004, 469–481 (2004)

  33. Liao, S., Fan, W., Chung, A., Yeung, D.: Facial expression recognition using advanced local binary patterns, Tsallis entropies and global appearance features. In: IEEE International Conference on Image Processing, pp. 665–668 (2006)

  34. Almaev, T.R., Valstar, M.F.: Local Gabor binary patterns from three orthogonal planes for automatic facial expression recognition. In: Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), pp. 356–361 (2013)

  35. Wang, X., Han, T., Yan, S.: A HOG-LBP human detector with partial occlusion handling. In: IEEE 12th International Conference on Computer Vision, pp. 32–39 (2009)

  36. Li, Z., Imai, J., Kaneko, M.: Facial-component-based bag of words and PHOG descriptor for facial expression recognition. In: IEEE International Conference on Systems, Man and Cybernetics (SMC’09), pp. 1353–1358 (2009)

  37. Dahmane, M., Meunier, J.: Emotion recognition using dynamic grid-based HOG features. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’11), pp. 884–888 (2011)

  38. Bartlett, M., Littlewort, G., Frank, M., Lainscsek, C., Fasel, I., Movellan, J.: Recognizing facial expression: machine learning and application to spontaneous behavior. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2, pp. 568–573 (2005)

  39. Sebe, N., Lew, M., Sun, Y., Cohen, I., Gevers, T., Huang, T.: Authentic facial expression analysis. Image Vis. Comput. 25(12), 1856–1863 (2007)

  40. Wan, S., Aggarwal, J.: Spontaneous facial expression recognition: a robust metric learning approach. Pattern Recognit. 47(5), 1859–1868 (2014)

  41. Yacoob, Y., Davis, L.: Recognizing human facial expressions from long image sequences using optical flow. IEEE Trans. Pattern Anal. Mach. Intell. 18(6), 636–642 (1996)

  42. Essa, I., Pentland, A.: Coding, analysis, interpretation, and recognition of facial expressions. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 757–763 (1997)

  43. Cohen, I., Sebe, N., Garg, A., Chen, L., Huang, T.: Facial expression recognition from video sequences: temporal and static modeling. Comput. Vis. Image Underst. 91(1), 160–187 (2003)

  44. Yeasin, M., Bullot, B., Sharma, R.: Recognition of facial expressions and measurement of levels of interest from video. IEEE Trans. Multimed. 8(3), 500–508 (2006)

  45. Zhang, Y., Ji, Q.: Active and dynamic information fusion for facial expression understanding from image sequences. IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 699–714 (2005)

  46. Shan, C., Gong, S., McOwan, P.: Dynamic facial expression recognition using a Bayesian temporal manifold model. In: Proc. BMVC, vol. 1, pp. 297–306 (2006)

  47. Fang, H., Mac Parthaláin, N., Aubrey, A.J., Tam, G.K., Borgo, R., Rosin, P.L., Grant, P.W., Marshall, D., Chen, M.: Facial expression recognition in dynamic sequences: an integrated approach. Pattern Recognit. 47(3), 1271–1281 (2014)

  48. Gönen, M., Alpaydın, E.: Multiple kernel learning algorithms. J. Mach. Learn. Res. 12, 2211–2268 (2011)

  49. Fu, S., Kuai, X., Yang, G.: Multiple kernel active learning for facial expression analysis. Adv. Neural Netw.-ISNN 2011, 381–387 (2011)

  50. Sénéchal, T., Rapp, V., Salam, H., Seguier, R., Bailly, K., Prevost, L.: Facial action recognition combining heterogeneous features via multikernel learning. IEEE Trans. Syst. Man Cybern. Part B Cybern. 42(4), 993–1005 (2012)

  51. Zhang, W., Shan, S., Gao, W., Chen, X., Zhang, H.: Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition. In: Tenth IEEE International Conference on Computer Vision (ICCV’05), vol. 1, pp. 786–791 (2005)

  52. Sonnenburg, S., Rätsch, G., Schäfer, C., Schölkopf, B.: Large scale multiple kernel learning. J. Mach. Learn. Res. 7, 1531–1565 (2006)

  53. Cortes, C., Mohri, M., Rostamizadeh, A.: \(l_{2}\) regularization for learning kernels. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 109–116 (2009)

  54. Sun, T., Jiao, L., Liu, F., Wang, S., Feng, J.: Selective multiple kernel learning for classification with ensemble strategy. Pattern Recognit. 46(11), 3081–3090 (2013)

  55. Kloft, M., Brefeld, U., Sonnenburg, S., Zien, A.: \(l_{p}\)-norm multiple kernel learning. J. Mach. Learn. Res. 12, 953–997 (2011)

  56. Kloft, M.: \(l_{p}\)-norm multiple kernel learning. Ph.D. dissertation, Berlin Institute of Technology (2011)

  57. Yan, F., Mikolajczyk, K., Kittler, J., Tahir, M.: A comparison of \(l_{1}\)-norm and \(l_{2}\)-norm multiple kernel SVMs in image and video classification. In: Seventh International Workshop on Content-Based Multimedia Indexing (CBMI’09), pp. 7–12 (2009)

  58. Luenberger, D., Ye, Y.: Linear and Nonlinear Programming. Springer, New York (2008)

  59. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2000)

  60. Bach, F.R.: Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9, 1179–1225 (2008)

  61. Canu, S., Grandvalet, Y., Guigue, V., Rakotomamonjy, A.: SVM and kernel methods Matlab toolbox. Perception Systèmes et Information, INSA de Rouen, Rouen, France (2005)

  62. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended Cohn–Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 94–101 (2010)

  63. Jiang, B., Valstar, M., Martinez, B., Pantic, M.: A dynamic appearance descriptor approach to facial actions temporal modeling. IEEE Trans. Cybern. 44(2), 161–174 (2014)

  64. Zhang, X., Mahoor, M.H., Nielsen, R.D.: On multi-task learning for facial action unit detection. In: IVCNZ, pp. 202–207 (2013)

  65. Zhang, X., Mahoor, M., Mavadati, S., Cohn, J.: An \(l_{p}\)-norm MTMKL framework for simultaneous detection of multiple facial action units. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1104–1111 (2014)

  66. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)

  67. Gehler, P., Nowozin, S.: On feature combination for multiclass object classification. In: IEEE 12th International Conference on Computer Vision, pp. 221–228 (2009)

  68. Roy, K., Kamel, M.: Facial expression recognition using game theory. In: Artificial Neural Networks in Pattern Recognition, pp. 139–150 (2012)

  69. Jain, S., Hu, C., Aggarwal, J.: Facial expression recognition with temporal modeling of shapes. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1642–1649 (2011)

  70. Ramirez Rivera, A., Rojas Castillo, J., Chae, O.: Local directional number pattern for face analysis: face and expression recognition. IEEE Trans. Image Process. 22(5), 1740–1752 (2013)

  71. Gu, W., Xiang, C., Venkatesh, Y., Huang, D., Lin, H.: Facial expression recognition using radial encoding of local Gabor features and classifier synthesis. Pattern Recognit. 45(1), 80–91 (2012)

  72. Pantic, M., Valstar, M., Rademaker, R., Maat, L.: Web-based database for facial expression analysis. In: IEEE International Conference on Multimedia and Expo (ICME’05), pp. 5–8 (2005)

  73. Valstar, M., Pantic, M.: Induced disgust, happiness and surprise: an addition to the MMI facial expression database. In: The Workshop Programme, pp. 65–70 (2010)

  74. Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 532–539 (2013)

  75. Guo, Y., Zhao, G., Pietikäinen, M.: Dynamic facial expression recognition using longitudinal facial expression atlases. Comput. Vis.-ECCV 2012, 631–644 (2012)

  76. Sánchez, A., Ruiz, J.V., Moreno, A.B., Montemayor, A.S., Hernández, J., Pantrigo, J.J.: Differential optical flow applied to automatic facial expression recognition. Neurocomputing 74(8), 1272–1282 (2011)

  77. Littlewort, G., Bartlett, M.S., Fasel, I., Susskind, J., Movellan, J.: Dynamics of facial expression extracted automatically from video. Image Vis. Comput. 24(6), 615–625 (2006)

  78. Valstar, M.F., Jiang, B., Mehu, M., Pantic, M., Scherer, K.: The first facial expression recognition and analysis challenge. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’11), pp. 921–926 (2011)

  79. Valstar, M.F., Mehu, M., Jiang, B., Pantic, M., Scherer, K.: Meta-analysis of the first facial expression recognition challenge. IEEE Trans. Syst. Man Cybern. Part B Cybern. 42(4), 966–979 (2012)

  80. Tariq, U., Lin, K.-H., Li, Z., Zhou, X., Wang, Z., Le, V., Huang, T.S., Lv, X., Han, T.X.: Emotion recognition from an ensemble of features. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’11), pp. 872–877 (2011)

  81. Yang, S., Bhanu, B.: Facial expression recognition using emotion avatar image. In: IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG’11), pp. 866–871 (2011)

  82. Micchelli, C.A., Pontil, M.: Learning the kernel function via regularization. J. Mach. Learn. Res. 6, 1099–1125 (2005)

Acknowledgments

This research is partially supported by Grants BCS-1052781 and IIS-1111568 from the National Science Foundation.

Author information

Corresponding author

Correspondence to Mohammad H. Mahoor.

Appendices

1.1 Appendix A. Description of MKL-based SVM in multiple kernel spaces

To explain the functionality of MKL-based SVM, we use the \(p=1\) case as an example. In this case, the optimized kernel combination weights satisfy the constraint \(\sum _{m=1}^{M}{d_{m}} = 1,\ d_{m} \ge 0\). Given a test example \(x_{0} \in {\mathbb {R}}^{D}\), we rewrite Eq. 5 as follows.

$$\begin{aligned} y_{0}&= {\text {sgn}}\left[ \sum _{i=1}^{N}\sum _{m=1}^{M}{\alpha }_{i}y_{i}d_{m}k_{m}(x_{i},x_{0})+w_{0}\right] \\&= {\text {sgn}}\left[ \sum _{m=1}^{M}d_{m}\underbrace{\left( \sum _{i=1}^{N}{\alpha }_{i}y_{i}k_{m}(x_{i},x_{0})+w_{0}\right) }_{\text {single kernel with single feature}}\right] \end{aligned}$$

In the above equation, the expression under the brace is the discriminant function used for classifying new samples in a canonical binary SVM; note that moving \(w_{0}\) inside the sum over \(m\) is valid because \(\sum _{m=1}^{M}{d_{m}}=1\). In other words, with MKL-based SVM the label of a sample is determined by a weighted sum of the discriminant values obtained in each RKHS, which enhances the discriminative power of the classifier.
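
To make this weighted-summation view concrete, the following minimal sketch (an illustration under stated assumptions, not the authors' implementation) evaluates the \(p=1\) decision rule above: each base kernel contributes an ordinary single-kernel SVM discriminant value, and the predicted label is the sign of their \(d_{m}\)-weighted sum. The kernels, toy data, and parameter names are placeholders.

```python
# Hedged sketch of the p = 1 MKL decision rule from Appendix A; the kernels
# and toy data below are illustrative placeholders.
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """A plain RBF base kernel standing in for any k_m(., .)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def poly_kernel(X, Z, degree=2):
    """A polynomial base kernel."""
    return (X @ Z.T + 1.0) ** degree

def mkl_decision(x0, X_sv, y_sv, alpha, w0, kernels, d):
    """y0 = sgn( sum_m d_m * ( sum_i alpha_i y_i k_m(x_i, x0) + w0 ) )."""
    x0 = x0.reshape(1, -1)
    per_kernel = [(alpha * y_sv) @ k(X_sv, x0).ravel() + w0 for k in kernels]
    return np.sign(np.dot(d, per_kernel))

# Toy usage with made-up support vectors, dual coefficients, and weights.
rng = np.random.default_rng(0)
X_sv = rng.normal(size=(5, 10))          # support vectors
y_sv = np.array([1, -1, 1, -1, 1.0])     # their labels
alpha = rng.uniform(0, 1, size=5)        # dual coefficients
d = np.array([0.7, 0.3])                 # kernel weights, summing to 1 (p = 1)
y0 = mkl_decision(rng.normal(size=10), X_sv, y_sv, alpha, 0.1,
                  [rbf_kernel, poly_kernel], d)
```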

1.2 Appendix B. Proof of the superiority of MKL-based SVM over canonical binary SVM with single kernel and single type of features

Without loss of generality, our proof is pursued for the case \(1<p<2\). Based on Lemma 26 in [82], we transform the objective function of Eq. 3 into:

$$\begin{aligned} \mathop {\min }_{d,\Vert d\Vert _{r} \le 1} \mathop {\min }_{w,w_{0},\xi } \quad J(d,w,w_{0},\xi ) = \frac{1}{2}\sum _{m=1}^{M}{\frac{\Vert w_{m}\Vert ^{2}_{2}}{d_{m}}}+C\sum _{i=1}^{N}{\xi _{i}} \end{aligned}$$

where \(r=p/(2-p)\).

As described in Sect. 3.3, this convex optimization problem is solved by the two-step method, in which each loop consists of two nested iterations. In the outer iteration, the kernel combination weights are updated with the SVM parameters held fixed, whereas in the inner iteration the canonical SVM problem is solved with the updated kernel combination weights held fixed. Let \(N_{f}\) be the number of features extracted from each sample and \(N_{k}\) the number of kernel functions used in the \(l_{p}\)-norm MKL-based SVM. We denote the kernel combination vector updated in the \(t\)th loop of the two-step method as follows.

$$\begin{aligned} d^{(t)}&= \left[ \underbrace{d_{1}^{(t)}, \ldots , d_{N_{k}}^{(t)}}_{\text {the 1st feature}}, \ldots , \underbrace{d_{(i-1)N_{k}+1}^{(t)}, \ldots , d_{i N_{k}}^{(t)}}_{\text {the } i\text {th feature}}, \ldots , \underbrace{d_{(N_{f}-1)N_{k}+1}^{(t)}, \ldots , d_{N_{f}N_{k}}^{(t)}}_{\text {the } N_{f}\text {th feature}}\right] ^{T} \\ d^{(t)}&\in {\mathbb {R}}_{+}^{N_{f}N_{k}},\quad {\Vert d^{(t)}\Vert }_{r}=1 \end{aligned}$$

In addition, the SVM discriminant hyperplane obtained in the inner iteration of the \(t\)th loop is denoted by \(w^{(t)}\) and \(w_{0}^{(t)}\).
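
Before turning to the comparison with the canonical SVM, the following sketch illustrates one possible realization of this two-step method for a single binary classifier. It assumes the closed-form weight update of [55] for a constraint \(\Vert d\Vert _{q} \le 1\) in the outer iteration (in the transformed problem above, \(q\) corresponds to \(r=p/(2-p)\)) and an off-the-shelf SVM solver for the inner iteration; it is an illustration of the alternation, not necessarily the update rule used in the paper.

```python
# Hedged sketch of the two-step alternation described above (binary case).
# Inner iteration: canonical SVM on the d-weighted kernel sum.
# Outer iteration: closed-form update d_m ∝ ||w_m||_2^(2/(q+1)) for the
# constraint ||d||_q <= 1, renormalized to unit l_q norm (cf. [55]).
import numpy as np
from sklearn.svm import SVC

def two_step_mkl(kernels, y, q=1.5, C=1.0, n_loops=20):
    M = len(kernels)
    d = np.full(M, M ** (-1.0 / q))                 # feasible start, ||d||_q = 1
    for _ in range(n_loops):
        K = sum(dm * Km for dm, Km in zip(d, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        a = np.zeros(len(y))
        a[svm.support_] = svm.dual_coef_.ravel()    # a_i = alpha_i * y_i
        # ||w_m||_2^2 = d_m^2 * a^T K_m a in the m-th feature space
        sq_norms = np.array([dm ** 2 * a @ Km @ a
                             for dm, Km in zip(d, kernels)])
        d = sq_norms ** (1.0 / (q + 1))
        d /= np.linalg.norm(d, ord=q)
        # The objective J is non-increasing over the loops, as formalized in
        # the chain of inequalities proved below.
    return d, svm
```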

For the canonical binary SVM, suppose that the \(i\)th feature is used with the \(j\)th kernel function. The canonical SVM then becomes a special case of the MKL-based SVM framework, and its corresponding kernel combination vector can be written as follows.

$$\begin{aligned} \hat{d} = \left[ \underbrace{0,0, \ldots ,0, \ldots , 0}_{{d_{1} \sim d_{(i-1)N_{k}+j-1}}}, 1, \underbrace{0,0, \ldots ,0, \ldots , 0}_{{d_{(i-1)N_{k}+j+1} \sim d_{N_{f}N_{k}}}}\right] ^{T} \end{aligned}$$

Further, the discriminant hyperplane learned by the canonical SVM is denoted by \({\hat{w}}^{\star }\) and \({\hat{w}}_{0}^{\star }\).

Assuming that, in the first loop of the two-step method, \(d^{(1)}\) is initialized as \(\hat{d}\) in the outer iteration, the inner iteration of the first loop yields \(w^{(1)}={\hat{w}}^{\star }\) and \(w_{0}^{(1)} = {\hat{w}}_{0}^{\star }\). The proof then proceeds as follows:

$$\begin{aligned} {\hat{J}}^{\star }&= J({\hat{d}}, {\hat{w}}^{\star },{\hat{w}}_{0}^{\star }) = J(d^{(1)},w^{(1)},w_{0}^{(1)})\\&\ge J(d^{(2)},w^{(1)},w_{0}^{(1)}) \ge J(d^{(2)},w^{(2)},w_{0}^{(2)}) \\&\ge \cdots \ge J(d^{\star },w^{\star },w_{0}^{\star }) = J^{\star } \end{aligned}$$

where \(J^{\star }\) is the minimum of the objective function attained by the \(l_{p}\)-norm MKL-based SVM at its optimum \(d^{\star },w^{\star },w_{0}^{\star }\), and \({\hat{J}}^{\star }\), attained at \({\hat{w}}^{\star },{\hat{w}}_{0}^{\star }\), is the corresponding minimum for the canonical binary SVM. Each inequality in the chain holds because the outer iteration minimizes \(J\) over \(d\) with the SVM parameters fixed and the inner iteration minimizes \(J\) over \((w,w_{0})\) with \(d\) fixed; hence \({\hat{J}}^{\star } \ge J^{\star }\).

Based on the above justification, we can naturally extend the conclusion to a more general case. That is:

Suppose there exist a non-empty set of base kernel functions \(S\) and a non-empty set of features \(F\). Then, for every non-empty \(S^{\prime } \subseteq S\) and \(F^{\prime } \subseteq F\), we have \(J^{\star }_{S \times F} \le J^{\star }_{S^{\prime } \times F^{\prime }}\), since \(d^{\star }_{S^{\prime } \times F^{\prime }}\) can be seen as a special case of \(d_{S \times F}\) with zero weights on the kernel–feature pairs outside \(S^{\prime } \times F^{\prime }\). The subscripts \(S \times F\) and \(S^{\prime } \times F^{\prime }\) denote the kernels and features in use.

To be more specific, we conclude that MKL-based SVM with multiple kernels and multiple features performs at least as well as MKL-based SVM with multiple kernels and a single feature, or with a single kernel and multiple features.

1.3 Appendix C. Proof of the superiority of our proposed MKL-based multiclass-SVM over the SimpleMKL-based multiclass-SVM

To be consistent with the SimpleMKL-based multiclass-SVM, we set \(p=1\) in our framework. Then, the only difference between the two methods is the way the kernel combination vectors are updated for multiclass classification tasks, as given in Eqs. 6 and 7. The superiority of our proposed MKL framework for multiclass-SVM lies in the fact that its minimized objective function never exceeds the one obtained with the SimpleMKL-based multiclass-SVM. That is, the hyperplanes derived by our method perform at least as well on the training data.
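
The difference can be made concrete with a short sketch: in a one-vs-one decomposition, the proposed framework solves a separate binary MKL problem for every class pair \(u\) and therefore keeps its own weight vector \(d_{u}\), whereas the SimpleMKL-based multiclass-SVM shares a single \(d\) across all pairs. The helper below, which takes any binary \(l_{p}\)-norm MKL solver (e.g., the Appendix B sketch) as an argument, is an illustration with assumed names and signatures, not the authors' code.

```python
# Hedged sketch: per-pair kernel weights in a one-vs-one multiclass MKL-SVM.
# `binary_mkl` is any binary l_p-norm MKL solver (e.g., the Appendix B sketch);
# names and signatures here are illustrative assumptions.
from itertools import combinations
import numpy as np

def train_one_vs_one_mkl(kernels, y, classes, binary_mkl, p=1.0, C=1.0):
    models = {}
    for u in combinations(classes, 2):                # each binary task u
        idx = np.flatnonzero(np.isin(y, u))           # samples of the two classes
        y_u = np.where(y[idx] == u[0], 1, -1)
        K_u = [K[np.ix_(idx, idx)] for K in kernels]  # restricted Gram matrices
        models[u] = binary_mkl(K_u, y_u, p, C)        # its own weight vector d_u
    return models                                     # one (d_u, classifier) per pair
```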

Suppose that \({\hat{L}}^{\star }\) is the optimal value of the objective function in Eq. 7, and \(d_{u}^{\star }\) is the learned optimum for each binary classifier in our framework. \(L^{\star }\) and \(d^{\star }\) are the corresponding notations for the SimpleMKL-based multiclass-SVM in Eq. 6. Our proof is as follows

$$\begin{aligned} {\hat{L}}^{\star } = \sum _{u \in \varPhi }L_{u}(d_{u}^{\star }) \le \sum _{u \in \varPhi }L_{u}({d}^{\star }) = L^{\star } \end{aligned}$$

since \(d_{u}^{\star }\) minimizes \(L_{u}\) over a feasible set that also contains \(d^{\star }\), and therefore \(L_{u}(d_{u}^{\star }) \le L_{u}({d}^{\star })\) for all \(u \in \varPhi \).

Cite this article

Zhang, X., Mahoor, M.H. & Mavadati, S.M. Facial expression recognition using \({l}_{p}\)-norm MKL multiclass-SVM. Machine Vision and Applications 26, 467–483 (2015). https://doi.org/10.1007/s00138-015-0677-y
