1 Introduction

Numerous face recognition methods have been proposed in recent years [1]. In realistic situations such as video surveillance, face recognition encounters serious challenges, especially the low-resolution problem caused by cameras placed at a distance. Moreover, the low-resolution (LR) problem is often coupled with other effects such as illumination variations. It is therefore desirable to devise a face recognition method that handles both very low resolution and illumination variations.

In the literature, numerous subspace projection methods have been proposed to achieve successful face recognition. Principal component analysis (PCA) [2-6] and independent component analysis (ICA) [7-9] have been reported to be robust in noisy conditions. Linear discriminant analysis (LDA) [3-6] yields better results than PCA in clean conditions and under lighting changes. Moreover, kernel variants such as kernel PCA (KPCA) [10-12] and kernel LDA (KLDA) [12-14] have been presented to achieve better performance by nonlinearly mapping the data from the original space to a very high-dimensional feature space, called the reproducing kernel Hilbert space (RKHS). Through this nonlinear mapping, KPCA and KLDA can exploit high-order statistics, whereas PCA and LDA only utilize the first- and second-order statistics. For highly nonlinear data distributions, these kernel methods are thus more suitable for low-resolution and illumination variation conditions. In addition, geometrically motivated approaches such as locality preserving projection (LPP) [15] and neighborhood preserving embedding (NPE) [16] have shown effectiveness for face recognition.

Recently, the sparse representation classification (SRC) [17, 18] and linear regression classification (LRC) [19] algorithms have been proposed for face recognition. Although SRC-based approaches perform very well in many situations, their execution time is longer than that of LRC-based approaches. To pursue both accuracy and speed, the LRC is a good choice for further investigation. The LRC is based on the assumption that face images from a specific class lie on a class-specific linear subspace [3, 20]. The regression coefficients can be estimated by the least-square method, and the decision is determined by the minimum reconstruction error. Reported experiments have shown that down-sampled low-resolution face images can be used directly for classification. However, as the reported results indicate, the LRC cannot withstand severe illumination variations. In addition, a robust linear regression classification (RLRC) algorithm [21] has been introduced to address robust face recognition, but it does not address low-resolution problems.

The performance of existing methods such as PCA, LDA, LPP, and NPE decreases under low-resolution conditions because of the loss of high-frequency information [22]. Boom et al. [23] showed that face images below 32 × 32 pixels seriously degrade the performance of PCA and LDA. In [24], face images with 20 × 20 and 10 × 10 resolution dramatically deteriorated the recognition performance of a video-based face recognition system compared with 40 × 40 images. In [25], face resolutions below 36 × 48 reduced the expression recognition performance.

Several works have addressed low-resolution face recognition by using super-resolution (SR) methods [26-30]. One approach trains the relationship between the low-resolution (LR) face image and its corresponding high-resolution (HR) image [29]. Another uses canonical correlation analysis to compute coherent features between the LR and HR face images [30]. In both cases, LR and HR image pairs are required for the SR methods.

1.1 Problem statements

In realistic situations, it may be the case that only LR face images are available in the training set, for example when identifying criminals, so it is imperative to handle the situation in which some training individuals are not the same as those in the gallery. In other words, the HR face images of specific persons are not available for modeling the LR-HR relationship or computing the similarity required by SR approaches [26-30]. Thus, how to perform LR face recognition directly, without HR information, is a critical and practical topic.

To cope with illumination variation, several approaches [3, 31-35] have been proposed. For instance, preprocessing methods such as histogram equalization, gamma correction, and the logarithm transform are widely used for illumination normalization. Other methods, including the gradient operation, Gabor filters, and LDA-based approaches, are well-known illumination-invariant methods. However, these methods tend to fail under the LR problem because the important high-frequency details for face recognition are lost.

1.2 Contributions

We propose a novel face recognition algorithm that improves on the limitation of the LRC [19] by embedding the kernel method into linear regression. The key idea is to apply a nonlinear mapping function that maps the original space into a higher-dimensional feature space in which linear regression fits the data better. Moreover, to make the proposed kernel projection feasible, a constrained low-rank approximation [36-38] is proposed to obtain a rank-r singular value approximation. The low-rank approximation is a rank-reduction method that minimizes the difference between a given matrix and its approximation. Simulations carried out on the extended Yale B, FERET, and AR facial databases reveal that the proposed kernel linear regression classification (KLRC) achieves good performance for LR face recognition under variable lighting without any preprocessing. In addition, the proposed algorithm can reconstruct very low-resolution face images under illumination variations with high quality, as measured by image quality assessments.

1.3 Paper outline

The rest of this paper is organized as follows. Section 2 reviews the LRC approach and presents the motivations. Section 3 formulates the proposed KLRC method with a constrained low-rank approximation algorithm. Section 4 shows the comparisons with the related work. Section 5 gives experimental results. Finally, we draw conclusions in Section 6.

2 Background and motivations

2.1 Linear regression classification (LRC)

Assume we have N subjects with p i training images from the ith class, i = 1, 2,…, N. Each grayscale training image is of size a × b pixels and is represented as \( {\boldsymbol{v}}_{i,j}\in {\Re}^{a\times b} \), i = 1, 2,…, N and j = 1, 2,…, p i . Each training image is then transformed into a column vector \( {\boldsymbol{w}}_{i,j}\in {\Re}^{q\times 1} \), where q = a × b. To apply linear regression to estimate the class-specific model, we stack all column vectors w i,j according to their class membership. Hence, for the ith class, we have

$$ {\boldsymbol{W}}_i=\left[{\boldsymbol{w}}_{i,1},\dots, {\boldsymbol{w}}_{i,j},\dots, {\boldsymbol{w}}_{i,{p}_i}\right]\in {\Re}^{q\times {p}_i}, $$
(1)

where each vector w i,j is a column vector of W i . Thus, in the training phase, the ith class is represented by a vector space W i , which is called the regressor for each subject.
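To make the construction of the regressor concrete, the following minimal sketch (in Python with NumPy, which the paper does not prescribe; the function name build_regressor is our own) stacks a list of grayscale images into the matrix W i of (1):

```python
import numpy as np

def build_regressor(images):
    """Stack the p_i training images of one class into the regressor W_i of Eq. (1).

    `images` is assumed to be a list of 2-D grayscale arrays of size a x b;
    each image is flattened into a q x 1 column (q = a * b) and the columns
    are collected into a matrix of shape (q, p_i).
    """
    columns = [np.asarray(img, dtype=np.float64).reshape(-1, 1) for img in images]
    return np.hstack(columns)  # W_i in R^{q x p_i}

# Example: seven 8 x 8 training images of one subject give a 64 x 7 regressor.
W_1 = build_regressor([np.random.rand(8, 8) for _ in range(7)])
```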

Let \( \boldsymbol{y}\in {\Re}^{q\times 1} \) denote an unknown probe image vector obtained by the same vectorization. If y belongs to the ith class, it can be represented as a linear combination of the training images from the ith class as

$$ \begin{array}{cc}\hfill \boldsymbol{y}={\boldsymbol{W}}_{\boldsymbol{i}}{\boldsymbol{\beta}}_{\boldsymbol{i}}+\boldsymbol{e},\hfill & \hfill i=1,2,\dots, N\hfill \end{array}, $$
(2)

where \( {\boldsymbol{\beta}}_i\in {\Re}^{p_i\times 1} \) is the vector of regression parameters and e is an error vector whose elements are independent random variables with zero mean and variance σ 2. The goal of the regression is to find \( {\tilde{\boldsymbol{\beta}}}_i \), which minimizes the residual errors as

$$ \begin{array}{cc}\hfill {\tilde{\boldsymbol{\beta}}}_i= \arg \underset{{\boldsymbol{\beta}}_i}{ \min }{\left\Vert {\boldsymbol{W}}_i{\boldsymbol{\beta}}_i-\boldsymbol{y}\right\Vert}_2^2,\hfill & \hfill i=1,2,\dots, \boldsymbol{N}\hfill \end{array} $$
(3)

The regression coefficients can be solved through the least-square estimation and can be written as a matrix form as

$$ \begin{array}{cc}\hfill {\tilde{\boldsymbol{\beta}}}_i={\left({\boldsymbol{W}}_i^T{\boldsymbol{W}}_i\right)}^{-1}{\boldsymbol{W}}_i^T\boldsymbol{y},\hfill & \hfill i=1,2,\dots, N\hfill \end{array} $$
(4)

The vector of estimated parameters, \( {\tilde{\boldsymbol{\beta}}}_i \), and the predictor, W i , are used to predict the response vector \( {\tilde{\boldsymbol{y}}}_i \) for the ith class as

$$ \begin{array}{cc}\hfill {\tilde{\boldsymbol{y}}}_i={\boldsymbol{W}}_i{\tilde{\boldsymbol{\beta}}}_i,\hfill & \hfill i=1,2,\dots, N\hfill \end{array} $$
(5)

By substituting (4) for \( {\tilde{\boldsymbol{\beta}}}_i \) in (5), the optimal prediction in the least-square sense becomes

$$ \begin{array}{cc}\hfill {\tilde{\boldsymbol{y}}}_i={\boldsymbol{W}}_i{\left({\boldsymbol{W}}_i^T{\boldsymbol{W}}_i\right)}^{-1}{\boldsymbol{W}}_i^T\boldsymbol{y},\hfill & \hfill i=1,2,\dots, N\hfill \end{array} $$
(6)

Theoretically, the above equation can be treated as a class-specific projection [39],

$$ \begin{array}{cc}\hfill {\tilde{\boldsymbol{y}}}_i={\boldsymbol{H}}_i\boldsymbol{y},\hfill & \hfill i=1,2,\dots, N\hfill \end{array}, $$
(7)

where \( {\tilde{\boldsymbol{y}}}_i \) is the projection of y onto the subspace of the ith class by the projection matrix \( {\boldsymbol{H}}_i={\boldsymbol{W}}_i{\left({\boldsymbol{W}}_i^T{\boldsymbol{W}}_i\right)}^{-1}{\boldsymbol{W}}_i^T \). It is noted that the projection matrix is symmetric and idempotent.

The LRC is developed based on the minimum reconstruction error. In other words, if the original vector belongs to the subspace of class i, the predicted response vector \( {\tilde{\boldsymbol{y}}}_i \) will be the closest vector to the original vector. The identity i* is determined by calculating the Euclidean distance between the predicted response vectors and the original vector as

$$ \begin{array}{l}{i}^{*}= \arg \underset{i}{ \min}\begin{array}{cc}\hfill \left\Vert {\tilde{\boldsymbol{y}}}_i-\boldsymbol{y}\right\Vert, \hfill & \hfill i=1,2,\dots, N\hfill \end{array}\\ {}= \arg \underset{i}{ \min}\left\Vert {\boldsymbol{H}}_i\boldsymbol{y}-\boldsymbol{y}\right\Vert \end{array} $$
(8)
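As a hedged illustration of the decision rule in (6)-(8), the sketch below (reusing the build_regressor helper above and calling a least-squares solver instead of the explicit normal-equation inverse, purely for numerical stability) classifies a probe vector by minimum reconstruction error:

```python
import numpy as np

def lrc_classify(y, regressors):
    """LRC decision rule of Eqs. (6)-(8).

    `y` is the flattened probe image (length q) and `regressors` is a list of
    class-specific matrices W_i of shape (q, p_i).  Each class predicts
    y_i = W_i * beta_i, where beta_i is the least-squares solution of
    W_i beta = y, and the class with the smallest reconstruction error
    ||y_i - y|| is returned.
    """
    errors = []
    for W in regressors:
        beta, *_ = np.linalg.lstsq(W, y, rcond=None)  # least-squares beta_i, Eq. (4)
        y_hat = W @ beta                              # predicted response, Eq. (5)
        errors.append(np.linalg.norm(y_hat - y))      # reconstruction error
    return int(np.argmin(errors))                     # identity i*, Eq. (8)
```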

2.2 Motivations

The LRC was developed based on the concept that samples from a specific person lie on a class-specific linear subspace, and it has been demonstrated to achieve good performance for low-resolution face images, but not under severe illumination variations. This is because illumination variations make the data distribution more complicated, so face images captured under variable lighting conditions can render linear subspace approaches inappropriate. In other words, the linear subspace methods fail when the Lambertian assumption underlying the illumination model is violated. In particular, when the low-resolution problem is coupled with illumination variations, linear subspace methods such as PCA and LDA, as well as the linear regression classification (LRC), cannot counteract the problem. In this paper, a kernel linear regression classification (KLRC) with a constrained low-rank approximation is proposed for low-resolution face recognition under illumination variations. With a nonlinear mapping function, the KLRC evaluates the LRC in a higher-dimensional feature space and can achieve good results.

3 KLRC

Assume that the original input space can always be mapped to some higher-dimensional feature space in which the data are distributed linearly. As shown in Fig. 1, the left figure shows that it is difficult to fit the data with a regression line because of the nonlinear data distribution, whereas the right figure shows that the data are easy to fit with a regression plane because a mapping function from R^2 to R^3 makes the distribution linear. Thus, it can be expected that a nonlinear mapping prior to linear regression can overcome the limitation of the LRC under severe illumination variations. A general formulation that solves the problem systematically is discussed in detail later; here, we first introduce the kernel linear regression classification (KLRC) method. The KLRC is also developed based on the assumption that samples from a specific class lie on a linear subspace after a nonlinear mapping. The key is to apply a nonlinear mapping function to the input space and then evaluate the LRC in the higher-dimensional feature space. The dimension of the resulting feature space can be very large. Fortunately, explicit knowledge of the nonlinear mapping function can be avoided by using the kernel trick [40].

Fig. 1

Illustration of a mapping from R^2 to R^3. The left figure shows that it is difficult to fit the data with a regression line because of the nonlinear data distribution, whereas the right figure shows that the data can be fit with a regression plane because the distribution becomes linear in the higher-dimensional space

For the following theoretical derivation, a vector space should be defined as

$$ {\boldsymbol{Z}}_i=\left[\begin{array}{c}\hfill {\boldsymbol{z}}_{i,1}\hfill \\ {}\hfill \vdots \hfill \\ {}\hfill {\boldsymbol{z}}_{i,j}\hfill \\ {}\hfill \vdots \hfill \\ {}\hfill {\boldsymbol{z}}_{i,q}\hfill \end{array}\right]={\boldsymbol{W}}_i, $$
(9)

where each vector z i,j is a row vector of Z i . Specifically, each row vector of Z i is projected from the original space \( {\Re}^{p_i} \) to a high-dimensional space by a nonlinear mapping function \( \begin{array}{cc}\hfill \boldsymbol{\varPhi} \left({\boldsymbol{z}}_{i,j}\right):{\Re}^{p_i}\to {\Re}^f,\hfill & \hfill f>{p}_i\hfill \end{array} \); \( {\Re}^f \) is thus the space spanned by Φ(z i,j ). The projected row vectors can be used for linear regression as

$$ \begin{array}{cc}\hfill \boldsymbol{y}=\boldsymbol{\varPhi} \left({\boldsymbol{Z}}_i\right){\boldsymbol{\beta}}_i,\hfill & \hfill i=1,2,\dots, N\hfill \end{array} $$
(10)

Because of the increase in dimensionality, the mapping function Φ(z i,j ) is handled implicitly by using a kernel function satisfying Mercer’s theorem. Furthermore, by using the dual representation \( {\boldsymbol{\beta}}_i=\boldsymbol{\varPhi} {\left({\boldsymbol{Z}}_i\right)}^T{\boldsymbol{\alpha}}_i \), the linear regression stated in (10) becomes

$$ \begin{array}{cc}\hfill \boldsymbol{y}=\boldsymbol{\varPhi} \left({\boldsymbol{Z}}_i\right)\boldsymbol{\varPhi} {\left({\boldsymbol{Z}}_i\right)}^T{\boldsymbol{\alpha}}_i={\boldsymbol{K}}_i{\alpha}_i,\hfill & \hfill i=1,2,\dots, N,\hfill \end{array} $$
(11)

where the kernel matrix K i is positive semi-definite. Typical kernel functions, which satisfy Mercer’s theorem, include the polynomial kernel and the Gaussian kernel.
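The kernel matrix in (11) can be formed without ever computing Φ explicitly. A minimal sketch is given below; the polynomial degree, offset, and Gaussian width are illustrative defaults, not parameter values taken from the paper:

```python
import numpy as np

def kernel_matrix(Z, kind="gaussian", degree=2, c=1.0, sigma=1.0):
    """Kernel matrix K_i = Phi(Z_i) Phi(Z_i)^T of Eq. (11).

    The rows of Z (= W_i, Eq. (9)) are the vectors z_{i,j}; K[m, n] is the
    kernel evaluated on rows m and n.  The defaults for degree, c, and sigma
    are placeholders, not values reported in the paper.
    """
    G = Z @ Z.T                                   # Gram matrix of inner products
    if kind == "polynomial":
        return (G + c) ** degree
    if kind == "gaussian":
        sq = np.diag(G)
        d2 = sq[:, None] + sq[None, :] - 2.0 * G  # pairwise squared distances
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return G                                      # linear kernel as a fallback
```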

We first perform singular value decomposition (SVD) on the kernel matrix K i as

$$ {\boldsymbol{K}}_i=\boldsymbol{US}{\boldsymbol{V}}^T, $$
(12)

where U and V T are the left and right orthonormal SVD matrices and S = diag{λ 1, λ 2, …, λ g } is a rectangular diagonal matrix whose diagonal contains the singular values sorted in descending order, λ 1 ≥ … ≥ λ k  ≥ … ≥ λ g  ≥ 0. To achieve a robust estimation, we propose a constrained rank-r approximation of K i defined as

$$ {\boldsymbol{K}}_i^r=\boldsymbol{U}{\boldsymbol{S}}^r{\boldsymbol{V}}^T, $$
(13)

where

$$ {\boldsymbol{S}}^r=\mathrm{diag}\left\{{\lambda}_1,{\lambda}_2,\dots, {\lambda}_r,0,\ 0, \dots 0\right\}, $$
(14)

obtained by discarding the (g − r) smallest SVD components. The number of principal SVD components, r, is determined by

$$ r= \arg \underset{k}{ \max}\left\{\forall {\lambda}_k\Big|{\lambda}_k>{\lambda}_{\mathrm{median}}+\mu \left({\lambda}_{\mathrm{median}}-{\lambda}_g\right)\right\}, $$
(15)

where μ is a selected control factor, λ g is the smallest singular value, and \( {\lambda}_{\mathrm{median}} \), the median of the singular values excluding the smallest one, is expressed as

$$ {\lambda}_{{}_{\mathrm{median}}}=\mathrm{median}\left\{\forall {\lambda}_k\Big|k<g\right\} $$
(16)
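A sketch of the constrained low-rank approximation of (12)-(16) follows; the control factor μ is a tunable parameter, and the guard that keeps at least one component is our own safeguard rather than part of the paper:

```python
import numpy as np

def constrained_low_rank(K, mu=1.0):
    """Constrained rank-r approximation of Eqs. (12)-(16).

    The rank r is the number of leading singular values exceeding the
    threshold lambda_median + mu * (lambda_median - lambda_g), where
    lambda_median is the median of all singular values except the smallest
    one (Eq. (16)) and lambda_g is the smallest.  mu is a placeholder value.
    """
    U, s, Vt = np.linalg.svd(K)                   # singular values in descending order
    lam_g = s[-1]
    lam_median = np.median(s[:-1])                # exclude the smallest value, Eq. (16)
    threshold = lam_median + mu * (lam_median - lam_g)
    r = max(int(np.sum(s > threshold)), 1)        # Eq. (15); keep at least one component
    s_r = np.concatenate([s[:r], np.zeros(len(s) - r)])
    K_r = (U * s_r) @ Vt                          # U diag(S^r) V^T, Eq. (13)
    return K_r, U, s_r, Vt, r
```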

After the construction of the constrained low-rank approximation stated in (13), we could obtain a kernel linear regression model for the ith class as

$$ \boldsymbol{y}\approx {\boldsymbol{K}}_i^r{\boldsymbol{\alpha}}_i+\boldsymbol{e} $$
(17)

Then, the kernel linear regression aims to minimize the residual errors as

$$ {\tilde{\boldsymbol{\alpha}}}_i= \arg \underset{{\boldsymbol{\alpha}}_i}{ \min }{\left\Vert {\boldsymbol{K}}_i^r{\boldsymbol{\alpha}}_i-\boldsymbol{y}\right\Vert}_2^2. $$
(18)

The above problem can also be solved by least-square estimation since it has the same form as that stated in (2). After the low-rank approximation, we can use the pseudo-inverse of \( {\boldsymbol{K}}_i^r \) to obtain the least-square solution as

$$ {\tilde{\boldsymbol{\alpha}}}_i={\left({\boldsymbol{K}}_i^r\right)}^{-}\boldsymbol{y}, $$
(19)

where the pseudo-inverse of \( {\boldsymbol{K}}_i^r \) is expressed by

$$ {\left({\boldsymbol{K}}_i^r\right)}^{-}=\boldsymbol{U}{\left({\boldsymbol{S}}^r\right)}^{-}{\boldsymbol{V}}^T, $$
(20)

with

$$ {\left({\boldsymbol{S}}^r\right)}^{-}=\mathrm{diag}\left\{{\lambda}_1^{-1},{\lambda}_2^{-1},\dots, {\lambda}_r^{-1},0,\ 0, \dots 0\right\} $$
(21)

Since \( {\boldsymbol{K}}_i^r{\left({\boldsymbol{K}}_i^r\right)}^{-}\ne \boldsymbol{I} \), it is feasible to compute the minimum reconstruction error between the original vector and the projected vector to determine the classification result.

In the classification phase, the response vector \( {\tilde{\boldsymbol{y}}}_i \) for the ith class can be predicted by

$$ {\tilde{\boldsymbol{y}}}_i={\boldsymbol{K}}_i^r{\tilde{\boldsymbol{\alpha}}}_i. $$
(22)

By substituting (19) in (22), we can obtain

$$ {\tilde{\boldsymbol{y}}}_i={\boldsymbol{P}}_i\boldsymbol{y}, $$
(23)

and obtain a class-specific kernel projection matrix as

$$ {\boldsymbol{P}}_i={\boldsymbol{K}}_i^r{\left({\boldsymbol{K}}_i^r\right)}^{-}, $$
(24)

where \( {\tilde{\boldsymbol{y}}}_i \) is the projection of y onto the kernel subspace of the ith class by the class-specific kernel projection matrix P i . It is noted that \( {\boldsymbol{K}}_i^r{\left({\boldsymbol{K}}_i^r\right)}^{-}\ne \boldsymbol{I} \) is necessary for the KLRC computation.

The KLRC is also developed based on the minimum reconstruction error. In the recognition phase, the identity i* is determined by calculating the Euclidean distance between the predicted response vectors and the original vector as

$$ \begin{array}{l}{i}^{*}= \arg \underset{i}{ \min}\begin{array}{cc}\hfill \left\Vert {\tilde{\boldsymbol{y}}}_i-\boldsymbol{y}\right\Vert, \hfill & \hfill i=1,2,\dots, N\hfill \end{array}\\ {}= \arg \underset{i}{ \min}\left\Vert {\boldsymbol{P}}_i\boldsymbol{y}-\boldsymbol{y}\right\Vert \end{array} $$
(25)
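Combining the pieces, the KLRC classifier of (17)-(25) can be sketched as below, reusing the kernel_matrix and constrained_low_rank helpers given earlier; since K i r is symmetric positive semi-definite, the paper's pseudo-inverse formula (20) is applied directly:

```python
import numpy as np

def klrc_classify(y, regressors, mu=1.0, kind="gaussian"):
    """KLRC decision rule of Eqs. (17)-(25).

    For each class, the kernel matrix K_i is built from Z_i = W_i (Eq. (9)),
    truncated to the constrained rank-r approximation K_i^r, and the probe y
    is projected by P_i = K_i^r (K_i^r)^-.  The identity is the class with
    the smallest reconstruction error ||P_i y - y||.
    """
    errors = []
    for W in regressors:
        K = kernel_matrix(W, kind=kind)                    # K_i, Eq. (11)
        K_r, U, s_r, Vt, r = constrained_low_rank(K, mu)   # K_i^r, Eq. (13)
        s_inv = np.zeros_like(s_r)
        s_inv[s_r > 0] = 1.0 / s_r[s_r > 0]                # (S^r)^-, Eq. (21)
        K_r_pinv = (U * s_inv) @ Vt                        # (K_i^r)^-, Eq. (20)
        P = K_r @ K_r_pinv                                 # projection P_i, Eq. (24)
        errors.append(np.linalg.norm(P @ y - y))           # Eq. (25)
    return int(np.argmin(errors))
```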

4 Comparison with the related works

4.1 Analysis of the regression parameter

To simplify the analysis, we assume that W i is a square matrix. By SVD, we have W i  = U i D i V i T with U i T U i  = I; because W i is square, U i T = U i −1 and U i U i T = I. Similarly, V i V i T = V i T V i  = I. In addition, the linear kernel k(z ij , z ij ) = < z ij , z ij  > = z ij z ij T is used in the KLRC for the theoretical analysis below.

4.1.1 LRC

The goal of the LRC is to find \( {\tilde{\boldsymbol{\beta}}}_i \), which minimizes the residual errors. Statistically, the linear regression model is an unbiased estimator. Also, the variance of the regression parameter vector \( {\tilde{\boldsymbol{\beta}}}_i \) in the linear regression model is expressed as

$$ \begin{array}{l}\mathrm{V}\mathrm{a}\mathrm{r}{\left({\tilde{\boldsymbol{\beta}}}_i\right)}_{\mathrm{LRC}}=E\left\{\left({\tilde{\boldsymbol{\beta}}}_i-{\boldsymbol{\beta}}_i\right){\left({\tilde{\boldsymbol{\beta}}}_i-{\boldsymbol{\beta}}_i\right)}^T\right\}\\ {}\begin{array}{ccc}\hfill \hfill & \hfill \hfill & \hfill \kern2.25em =E\left\{{\left({\boldsymbol{W}}_i^T{\boldsymbol{W}}_i\right)}^{-1}{\boldsymbol{W}}_i^T\boldsymbol{e}{\boldsymbol{e}}^T{\boldsymbol{W}}_i{\left({\boldsymbol{W}}_i^T{\boldsymbol{W}}_i\right)}^{-1}\right\}\hfill \end{array}\\ {}\begin{array}{ccc}\hfill \hfill & \hfill \hfill & \hfill \kern2.25em ={\sigma}^2{\left({\boldsymbol{W}}_i^T{\boldsymbol{W}}_i\right)}^{-1}\hfill \end{array}={\sigma}^2{\left({\boldsymbol{V}}_i{\boldsymbol{D}}_i{{\boldsymbol{U}}_i}^T{\boldsymbol{U}}_i{\boldsymbol{D}}_i{{\boldsymbol{V}}_i}^T\right)}^{-1}\\ {}\begin{array}{ccc}\hfill \hfill & \hfill \hfill & \hfill \kern2.25em =\hfill \end{array}{\sigma}^2{\displaystyle \sum_{j=1}^J\frac{1}{d_{ij}}{\mathbf{v}}_{ij}}{\mathbf{v}}_{ij}^T,\end{array} $$
(26)

where W i  = U i D i V i T by SVD, v ij is the jth column eigenvector of V i , and d ij is the jth eigenvalue corresponding to the v ij .

4.1.2 KLRC

The goal of the KLRC is to find \( {\tilde{\boldsymbol{\alpha}}}_i \), which minimizes the residual errors. Statistically, the kernel linear regression model is also an unbiased estimator since the kernel linear regression model in (17) has the same form as in (2). On the other hand, the variance of the regression parameter vector in the kernel linear regression model is expressed as

$$ \begin{array}{l}\mathrm{V}\mathrm{a}\mathrm{r}{\left({\tilde{\boldsymbol{\alpha}}}_i\right)}_{\mathrm{KLRC}}=E\left\{\left({\tilde{\boldsymbol{\alpha}}}_i-{\boldsymbol{\alpha}}_i\right){\left({\tilde{\boldsymbol{\alpha}}}_i-{\boldsymbol{\alpha}}_i\right)}^T\right\}\\ {}\begin{array}{ccc}\hfill \hfill & \hfill \hfill & \hfill =E\left\{{\left({\boldsymbol{K}}_i^r\right)}^{-1}\boldsymbol{e}{\boldsymbol{e}}^T\right({\left({\boldsymbol{K}}_i^r\right)}^{-1}\hfill \end{array}\left){}^T\right\}\\ {}\begin{array}{ccc}\hfill \hfill & \hfill \hfill & \hfill ={\sigma}^2{\left({\boldsymbol{K}}_i^r{\left({\boldsymbol{K}}_i^r\right)}^T\right)}^{-1}\hfill \end{array}\\ {}\begin{array}{cc}\hfill \hfill & \hfill \begin{array}{cc}\hfill \hfill & \hfill ={\sigma}^2{\left(\left({\boldsymbol{Z}}_i^r{\left({\boldsymbol{Z}}_i^r\right)}^T\right){\left({\boldsymbol{Z}}_i^r{\left({\boldsymbol{Z}}_i^r\right)}^T\right)}^T\right)}^{-1}\hfill \end{array}\hfill \end{array}\\ {}\begin{array}{ccc}\hfill \hfill & \hfill \hfill & \hfill ={\sigma}^2{\left(\left({\boldsymbol{W}}_i^r{\left({\boldsymbol{W}}_i^r\right)}^T\right){\left({\boldsymbol{W}}_i^r{\left({\boldsymbol{W}}_i^r\right)}^T\right)}^T\right)}^{-1}\hfill \end{array}\\ {}\begin{array}{ccc}\hfill \hfill & \hfill \hfill & \hfill ={\sigma}^2{\left({\boldsymbol{U}}_i^r{\boldsymbol{D}}_i^r{\boldsymbol{D}}_i^r{\boldsymbol{D}}_i^r{\boldsymbol{D}}_i^r{\left({\boldsymbol{U}}_i^r\right)}^T\right)}^{-1}\hfill \end{array}\\ {}\begin{array}{ccc}\hfill \hfill & \hfill \hfill & \hfill ={\sigma}^2{\displaystyle \sum_{j=1}^r\frac{1}{d_{ij}^2}{\boldsymbol{u}}_{ij}{\boldsymbol{u}}_{ij}^T},\hfill \end{array}\end{array} $$
(27)

where r < J, \( {\boldsymbol{W}}_i^r={\boldsymbol{U}}_i^r{\boldsymbol{D}}_i^r{{\boldsymbol{V}}_i^r}^T \) by SVD, and d ij is the jth eigenvalue corresponding to the jth eigenvector. Comparing (26) with (27), the variance of the regression parameter vector in the KLRC is smaller than that in the LRC. Therefore, it can be expected that the KLRC provides more reliable regression parameters to the regression model for classification.

5 Experimental results

For verification, we examine the proposed algorithms on facial images down-sampled from the extended Yale B (EYB) [41], AR [42], and FERET [43] face databases. In the experiments, we evaluate the proposed method against low-resolution problems coupled with illumination variations. In this section, all experimental results report the top-1 recognition accuracy (%).

The experiments are designed to evaluate the effectiveness of the proposed method in coping with unseen lighting changes under the LR condition. We compare the proposed methods, KLRC-p and KLRC-g, with PCA+Euclidean, PCA+Mahalanobis [44], KPCA-p, KPCA-g, LDA+Euclidean, LDA+Mahalanobis, KLDA-p, KLDA-g, LRC, RLRC, SRC, LPP, NPE, improved principal component regression (IPCR) [45], unitary regression classification (URC) [46], linear discriminant regression classification (LDRC) [47], and local binary patterns (LBP) [48], where p and g denote the polynomial kernel and the Gaussian kernel, respectively. The PCA-based and LDA-based approaches use 85 % of the dimensionality in the experiments. It should be noted that this paper assumes the corresponding HR face images for the LR face images are not available, as stated in Section 1.1. Hence, existing face recognition systems based on SR methods are not suitable for this problem.

5.1 Experiments on EYB

The EYB contains images of 38 subjects with 9 poses and 64 illuminations per pose. The frontal face images of all subjects under the 64 different illuminations are used for evaluation. The EYB is divided into five subsets based on the angle of the light source direction. As a result, there are 2432 images in total: 266 (7 images per person), 456 (12 images per person), 456 (12 images per person), 532 (14 images per person), and 722 (19 images per person) images in subsets 1 to 5, respectively. All images are cropped, low-pass filtered, and resized to low-resolution images of 8 × 8 pixels, as shown in Fig. 2. Subset 1 is used for training, and the remaining subsets (subsets 2 to 5) are used for testing.
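For reproducibility, a hedged sketch of the down-sampling step is given below; the paper only states that images are cropped, low-pass filtered, and resized, so the Gaussian filter, its width, and the use of scipy.ndimage are our own assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def downsample_face(img, target=(8, 8), sigma=1.0):
    """Low-pass filter and resize a cropped grayscale face to `target` size.

    The Gaussian width sigma and the interpolation order are placeholder
    choices; the paper does not specify the low-pass filter it uses.
    """
    img = np.asarray(img, dtype=np.float64)
    blurred = gaussian_filter(img, sigma=sigma)            # anti-aliasing low-pass
    factors = (target[0] / img.shape[0], target[1] / img.shape[1])
    return zoom(blurred, factors, order=1)                 # bilinear resize
```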

Fig. 2

Ten low-resolution samples obtained from the five subsets of two persons in the EYB face database

First, we investigate the performance under different image resolutions. Figure 3 shows the average recognition rate over the remaining subsets. The results reveal that the proposed KLRC-p and KLRC-g consistently outperform the other methods. Low-resolution face images of 8 × 8 pixels degrade the performance significantly; nonetheless, the proposed KLRC algorithms still perform well under this very low-resolution condition. The results also show that the performance of the PCA-based, LDA-based, and other subspace projection methods drops drastically for resolutions below 16 × 16 pixels. Moreover, it is interesting to point out that KLRC-p and KLRC-g at 8 × 8 pixels significantly outperform the PCA-based and LDA-based approaches at 32 × 32 pixels and achieve recognition rates comparable to those of the LRC-based approaches at 32 × 32 pixels. LBP [48] is a local feature method that can effectively resist illumination variations at 32 × 32 pixels; however, LBP does not perform well under the very low-resolution situation since there is not enough facial information for local feature extraction. This success should be attributed to performing linear regression in the higher-dimensional feature space.

Fig. 3

The recognition rate under different image resolutions on the EYB face database

Then, we focus further on the low-resolution images. As shown in Table 1, KLRC-p and KLRC-g outperform LRC, RLRC, IPCR, URC, LDRC, PCA+Euclidean, PCA+Mahalanobis, kernel-based PCA, LDA+Euclidean, LDA+Mahalanobis, kernel-based LDA, SRC, LPP, NPE, and LBP for low-resolution face recognition under illumination variations. The PCA-based methods are the worst. Note that although it is widely accepted that discriminant-based approaches offer higher robustness to lighting variations than PCA-based approaches [3], the discriminant-based approaches still cannot withstand the low-resolution problem coupled with illumination variations in our experiments. This is because the low-resolution image contains insufficient high-frequency components, which carry the discriminative information for discriminant analysis. In [45], Huang et al. showed that a regression-based method can perform better than a discriminant-based approach for face recognition under illumination variations. The proposed KLRC-p and KLRC-g achieve significant improvement. In addition, although the RLRC performs better than the LRC, it cannot obtain satisfactory performance because the regression is performed in the original linear space.

Table 1 Performance (%) comparisons of different methods on EYB in size of 8 × 8 pixels

5.2 Experiments on AR

For further verification, we conducted experiments on the AR face database. The AR database, built by Martinez and Benavente, contains 3510 mug shots of 135 subjects (76 males and 59 females) with different facial expressions, lighting changes, and partial occlusions. Each subject has 26 images taken in two sessions. The first session, containing 13 images, includes the neutral expression, smile, anger, screaming, different lighting changes, and two realistic partial occlusions with lighting changes. The second session duplicates the first session 2 weeks later.

To evaluate the effectiveness of the proposed approach in coping with variable illumination, only the face images with illumination variations were considered in the experiments. All color face images in the AR database are converted to gray levels, cropped, and down-sampled to 8 × 6 pixels. Note that no face alignment is applied to the cropped face images. As shown in Fig. 4, 120 subjects with 8 face images under illumination variations, including no lighting, left lighting, right lighting, and full lighting, are chosen for evaluation. Training is conducted on the images with no lighting, and the remaining lighting conditions (left lighting, right lighting, and full lighting) are used for testing.

Fig. 4

Eight samples in size of 8 × 6 pixels from the lighting subset of one person from AR database

The experimental results are tabulated in Table 2, which shows that the proposed KLRC-p and KLRC-g attain higher recognition rates than LRC, RLRC, IPCR, URC, LDRC, PCA+Euclidean, PCA+Mahalanobis, kernel-based PCA, LDA+Euclidean, LDA+Mahalanobis, kernel-based LDA, LPP, NPE, and LBP for low-resolution face recognition under illumination variations. From the experimental results, we observe that SRC performs as well as the proposed KLRC, especially in less ill-posed situations. However, the execution time of SRC-based approaches is generally longer than that of LRC-based approaches [49]. The discriminant-based approaches and LRC, RLRC, IPCR, URC, and LDRC do not work well for low-resolution face recognition under illumination variations.

Table 2 Performance (%) comparisons of different methods on AR in size of 8 × 6 pixels

5.3 Experiments on FERET

We further conduct experiments on the FERET face database to evaluate performance under illumination and expression variations, since facial expression changes are inevitable in face recognition. The FERET subset used includes 250 people with four frontal-view images per subject. These 1000 face images with illumination and expression variations are resized to 8 × 6 pixels. Two images of each person are randomly selected for training, and the other two images are used for testing. The experimental results are tabulated in Table 3, which shows that KLRC-p and KLRC-g achieve higher recognition rates than LRC, RLRC, IPCR, URC, LDRC, PCA+Euclidean, PCA+Mahalanobis, kernel-based PCA, LDA+Euclidean, LDA+Mahalanobis, kernel-based LDA, SRC, LPP, NPE, and LBP. From the results, we observe that KLRC-p and KLRC-g work well for low-resolution problems with illumination and expression variations. This is reasonable because the LR face image loses most facial expression information [25]. On the other hand, the lighting variations in FERET are slight, so the improvement is limited.

Table 3 Performance (%) comparisons of different methods on FERET in size of 8 × 6 pixels

6 Conclusions

In this paper, statistical analyses and experimental results have verified that the proposed class-specific kernel linear regression classification performs best for low-resolution face recognition under illumination variations. With the kernel trick, the nonlinear, dimension-increasing mapping function enhances the modeling capability for low resolution and illumination variations. Furthermore, the constrained low-rank approximation has been proposed to perform the rank reduction automatically and make the kernel projection feasible for classification. The comparisons with state-of-the-art methods indicate competitive performance for the proposed KLRC-p and KLRC-g. We have demonstrated that the proposed KLRC-p and KLRC-g perform better than PCA+Euclidean, PCA+Mahalanobis, KPCA-p, KPCA-g, LDA+Euclidean, LDA+Mahalanobis, KLDA-p, KLDA-g, LRC, RLRC, SRC, LPP, NPE, IPCR, URC, LDRC, and LBP for low-resolution face recognition under variable lighting. In summary, KLRC-p and KLRC-g dramatically improve the LRC and provide good robustness for very low-resolution face recognition under severe illumination variations.