1 Introduction

Among all biometric techniques, face recognition is one of the most attractive. Many face recognition methods have been applied to identity authentication and security systems [3, 4, 11, 21, 23]. However, face recognition still faces many challenges, such as varying illumination, facial expressions, poses and environments [10, 13, 14, 33, 35, 42, 54, 57, 65, 68]. To overcome these challenges, many representation-based classification methods (RBCMs) [15, 29, 31, 37, 52, 53, 63, 64, 68] have been proposed, such as sparse representation-based classification (SRC) [52], collaborative representation classification (CRC) [63], the two-phase test sample representation (TPTSR) [53], linear regression classification (LRC) [37], the feature space representation method [61] and an improvement to nearest neighbor classification (INNC) [55]. SRC represents the test sample by an optimal linear combination of the training samples and then assigns the test sample to the class with the minimum deviation. Since SRC obtains the coefficient vector through l1-regularization, it is viewed as an l1 norm-based representation method. Scholars have also proposed l2 norm-based representation methods such as CRC, LRC and TPTSR. Like SRC, CRC represents the test sample by a combination of all training samples, but it computes the coefficient vector through l2-regularization. LRC is closely related to the nearest intra-class space (NICS) method [32]. The major difference between LRC and CRC is that LRC represents the test sample by an optimal linear combination of the training samples of each class; in other words, it establishes a linear system for each class. TPTSR first eliminates the classes that are far from the test sample, and the training samples of the remaining classes are then combined to represent the test sample.

Besides the above-mentioned RBCMs, another way to improve the recognition rate is to use the original training samples and corresponding virtual samples [16, 34, 38, 40, 44, 46, 47] simultaneously to recognize the test sample. In real-world applications of face recognition, because of the limited number of training samples, face recognition methods often suffer from variations in pose, illumination and facial expression. Many methods have been proposed to solve these problems. For example, Xu et al. use the symmetry of the face to generate mirror samples [57], which are then integrated into the original training set to recognize the test samples. Another method of Xu et al. [58] generates a kind of virtual sample by combining multiple descriptions or representations, and the original training samples and virtual samples are then integrated to recognize the test sample. The formula of the multiple representations is defined as Jij = Iij ⋅ (m − Iij), where m = 255 and Iij and Jij stand for the intensities of the pixels at the i-th row and j-th column of the original training sample matrix and the virtual training sample matrix, respectively. Experimental results in [57, 58] demonstrate that such virtual samples are able to reflect possible changes in the pose, illumination and facial expression of a face image. Although virtual samples are able to enlarge the original training set, they might adversely affect the efficiency of the appointed algorithm. Therefore, how to improve the recognition rate with limited training samples is still a hot research topic.
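To make the multiple-representation formula concrete, the following minimal Python sketch generates such a virtual sample from an 8-bit grayscale image. The function name is ours; the only assumption is that the image is stored as a NumPy array.

```python
import numpy as np

def multi_representation_virtual(I, m=255):
    """Virtual sample of [58]: J_ij = I_ij * (m - I_ij), applied
    elementwise to an 8-bit grayscale image I."""
    I = I.astype(np.float64)   # avoid uint8 overflow in the product
    return I * (m - I)
```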

To improve the recognition rate with limited training samples, scholars have proposed fusion theory. Fusion is mainly performed at three levels, i.e., the score level, decision level and feature level. Although decision level fusion is very easy to implement, it cannot fully exploit multisource information. Feature level fusion is able to exploit the most information from the original data, but it needs to overcome the inconsistency of different data sets. Score level fusion [8, 20, 68] comes in three kinds, i.e., transformation-based score fusion [30], classifier-based score fusion [26] and density-based score fusion. Classifier-based score fusion directly combines the scores from different data sources to form a feature vector, and the test sample is then classified in terms of this feature vector. Transformation-based score fusion first transforms the scores of different data sources into a common domain and then classifies the test sample by integrating the normalized scores. Density-based score fusion is able to obtain a higher recognition rate than transformation-based and classifier-based score fusion if the score densities can be evaluated accurately; however, in real-world applications it is hard to use because of the complexity of this evaluation. Previous studies [11, 12, 18, 49] show that if score fusion methods are integrated into RBCMs, especially grouped sparse representation-based classification methods (GSRCMs) [48, 56, 62], the test sample is more likely to be assigned to the correct class.

However, previously proposed GSRCMs only consider the residuals from different kinds of groups and do not consider the relationship between groups. Moreover, in practical applications of face recognition, only a very small number of training samples is available. This motivates us to propose a novel GSRCM to resolve the above problems. To classify the test sample, the proposed method first selects the training samples of the test sample's nearest classes to form the first group. Next, a framework is performed on the first group to generate a kind of virtual sample, and all virtual samples form the second group. Then, SRC is performed on the two groups to generate two residuals for each class. Finally, the two residuals of a class and their distance are fused into the ultimate residual, where the distance is defined as the difference between the reconstructed samples of the class from the first group and the second group.

The proposed method has the following main contributions. Firstly, it exploits the idea of representing the test sample by its nearest neighbor classes, which reduces the side effect of weakly relevant training samples. Secondly, by designing a framework that generates a kind of virtual sample, the challenges of varying poses, illuminations and facial expressions can be partly overcome. Thirdly, for the classification decision, the proposed method not only takes into account the two residuals of a class but also considers a distance of the class, since they all contain discriminant information. Fourthly, a novel weighted fusion approach is proposed to fuse the two residuals and the distance of the class. The fusion approach takes the test sample into account and generates adaptive weights for the different residuals, which helps classify the test sample better.

The rest of this paper is organized as follows. Section 2 gives a brief review of SRC and LRC. Section 3 presents the details of the proposed method. Section 4 analyzes the rationale of the proposed method. Section 5 reports the experiments on five databases. Section 6 concludes the paper.

2 Related works

SRC and LRC are two conventional representation-based methods, and many representation-based classification methods are based on them. Assume that there are c classes and that each class has n training samples. The test sample is denoted by z. Moreover, each sample matrix is converted into a column vector.

2.1 SRC

Suppose that the samples of the i-th class are denoted by xi1, xi2, …, xin, where i = 1, 2, …, c. Let us combine all training samples to form a matrix X = [x11, …, x1n, …, xc1, …, xcn]. According to SRC, we have

$$ \mathbf{X}A\approx z $$
(1)

where A = [a11, …, a1n, …, ac1, …, acn] is the coefficient vector. Next, SRC solves Eq. (1) with l1-norm regularization

$$ {A}^{\prime }=\arg \min {\left\Vert A\right\Vert}_1,\mathrm{s}.\mathrm{t}.{\left\Vert z-\mathbf{X}A\right\Vert}_2^2\le \xi $$
(2)

Here, ξ is a very small positive constant.

Assume that \( {a}_e^{\prime },\dots, {a}_f^{\prime } \) are the coefficients corresponding to the i-th class. Then the representation residual of the i-th class can be written as

$$ {r}_i=\left\Vert z-{\sum}_{t=e}^f{a}_t^{\prime }{x}_{it}\right\Vert $$
(3)

If the minimum residual is obtained for the i-th class, the test sample is classified into the i-th class.

The major difference between CRC and SRC is that CRC solves Eq. (2) with l2-norm regularization instead.
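For concreteness, a minimal Python sketch of the SRC decision rule follows. We solve Eq. (2) in its common unconstrained (Lagrangian) form via scikit-learn's Lasso, so the parameter alpha below stands in for the sparsity trade-off; the function name and data layout are our assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(X, labels, z, alpha=0.01):
    """X: d x N matrix whose columns are (l2-normalized) training samples,
    labels: length-N array of class labels, z: length-d test vector.
    Returns the predicted class and the per-class residuals of Eq. (3)."""
    a = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(X, z).coef_
    residuals = {c: np.linalg.norm(z - X[:, labels == c] @ a[labels == c])
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get), residuals
```

Replacing the l1 penalty with an l2 penalty (e.g. Ridge under the same conventions) turns this sketch into CRC.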

2.2 LRC

LRC is viewed as an l2 norm-based representation method. Let us combine all training samples of the i-th class to form a matrix Xi = [xi1, …, xin]. LRC represents the test sample by a linear combination of the training samples of each class; in other words, it exploits the following model to obtain the representation coefficients of each class.

$$ {\mathbf{X}}_i{B}_i\approx z $$
(4)

where Bi = [bi1, bi2, …, bin] is the coefficient vector of the i-th class. The solution of Eq. (4) can be written as

$$ {B}_i^{\prime }=\left[{b}_{i1}^{\prime },\dots, {b}_{in}^{\prime}\right]={\left({\mathbf{X}}_i^T{\mathbf{X}}_i+\mu \mathbf{I}\right)}^{-1}{\mathbf{X}}_i^Tz $$
(5)

In this case, the residual of the i-th class is

$$ {d}_i=\left\Vert z-{\sum}_{t=1}^n{b}_{it}^{\prime }{x}_{it}\right\Vert $$
(6)

The test sample is assigned to the i-th class if the minimum residual is obtained for the i-th class.
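A corresponding sketch of LRC under the same conventions; np.linalg.solve implements the regularized least-squares solution of Eq. (5).

```python
import numpy as np

def lrc_classify(class_matrices, z, mu=0.01):
    """class_matrices: list of d x n matrices X_i (one per class),
    z: length-d test vector, mu: the small ridge term of Eq. (5).
    Returns the predicted class index and the residuals of Eq. (6)."""
    residuals = []
    for Xi in class_matrices:
        Bi = np.linalg.solve(Xi.T @ Xi + mu * np.eye(Xi.shape[1]), Xi.T @ z)
        residuals.append(np.linalg.norm(z - Xi @ Bi))
    return int(np.argmin(residuals)), residuals
```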

3 The proposed method

3.1 Description of the framework

In this subsection, we show the details of the framework. Let z1 and z2 denote the left-half and right-half vectors of an original face image, and let z0 denote the average of z1 and z2; z1, z2 and z0 are all column vectors. They are obtained as follows. Let Fj (j = 1, 2, …, Q) denote the j-th column of the 2D image matrix F, where Q is an even number, and let Fij denote the pixel at the i-th row and j-th column of F. Then z1, z2 and z0 are defined as

$$ {z}_1=\left[\begin{array}{l}{\mathbf{F}}_1\\ {}{\mathbf{F}}_2\\ {}\vdots \\ {}{\mathbf{F}}_{\frac{Q}{2}}\end{array}\right],{z}_2=\left[\begin{array}{l}{\mathbf{F}}_Q\\ {}{\mathbf{F}}_{Q-1}\\ {}\vdots \\ {}{\mathbf{F}}_{\frac{Q}{2}+1}\end{array}\right],{z}_0=\frac{z_1+{z}_2}{2} $$
(7)

Finally, z1 and z0 are integrated to form a kind of virtual image. The procedure of the framework is shown in Fig. 1.

Fig. 1
figure 1

The flow chart of the procedure
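A minimal Python sketch of Eq. (7) follows. The text does not spell out how z1 and z0 are "integrated", so stacking them into one column vector, which keeps the virtual sample the same length as the original image vector, is our assumption.

```python
import numpy as np

def framework_virtual_sample(F):
    """F: h x Q image matrix with Q even. z1 stacks columns F_1..F_{Q/2},
    z2 stacks the mirrored columns F_Q..F_{Q/2+1}, z0 is their average
    (Eq. (7)); z1 and z0 are then stacked into the virtual sample."""
    h, Q = F.shape
    z1 = F[:, :Q // 2].flatten(order='F')          # left half, column-wise
    z2 = F[:, :Q // 2 - 1:-1].flatten(order='F')   # mirrored right half
    z0 = (z1 + z2) / 2.0
    return np.concatenate([z1, z0])                # assumed integration
```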

3.2 Description of the novel weighted fusion approach

In this subsection, we show the details of the novel weighted fusion approach. Conventional weighted fusion approaches are certainly able to take advantage of data from different sources. However, they assign fixed weights to the different data, and sometimes these fixed weights are not optimal. This motivates us to propose an automatic and adaptive weighted fusion approach. Our weighted fusion approach supposes that there are two training sample sets and that each set has c classes. The residual between a test sample and the i-th class of the first set is denoted by \( {r}_i^1 \), i = 1, 2, …, c. The residual between the test sample and the i-th class of the second set is denoted by \( {r}_i^2 \). Moreover, we define \( {\alpha}_h=\frac{\max \left({r}_1^h,{r}_2^h,\dots, {r}_c^h\right)-\min \left({r}_1^h,{r}_2^h,\dots, {r}_c^h\right)}{q} \), where \( q={\sum}_{t=1}^c{r}_t^h \) and h = 1, 2. The details are as follows.

Step 1: \( {r}_i^1 \) and \( {r}_i^2 \) are normalized to the range of 0 to 1. In this case, \( {r}_i^1 \) and \( {r}_i^2 \) can be rewritten as

$$ {r}_i^1=\left({r}_{\mathrm{max}}^1-{r}_i^1\right)/\left({r}_{\mathrm{max}}^1-{r}_{\mathrm{min}}^1\right) $$
(8)
$$ {r}_i^2=\left({r}_{\mathrm{max}}^2-{r}_i^2\right)/\left({r}_{\mathrm{max}}^2-{r}_{\mathrm{min}}^2\right) $$
(9)
Step 2: \( {r}_1^1,{r}_2^1,\dots, {r}_c^1 \) are sorted in ascending order and the sorted result is recorded as \( {r_1^1}^{\prime}\le {r_2^1}^{\prime}\le \cdots \le {r_c^1}^{\prime } \). Similarly, \( {r}_1^2,{r}_2^2,\dots, {r}_c^2 \) are sorted in ascending order and the sorted result is recorded as \( {r_1^2}^{\prime}\le {r_2^2}^{\prime}\le \cdots \le {r_c^2}^{\prime } \). Let \( w=\left({r_2^1}^{\prime }-{r_1^1}^{\prime}\right)+\left({r_2^2}^{\prime }-{r_1^2}^{\prime}\right) \), \( {w}_1=\frac{{r_2^1}^{\prime }-{r_1^1}^{\prime }}{w} \) and \( {w}_2=\frac{{r_2^2}^{\prime }-{r_1^2}^{\prime }}{w} \). The ultimate residual of the i-th class can then be written as

$$ {f}_i={\alpha}_1{w}_1{r}_i^1+{\alpha}_2{w}_2{r}_i^2 $$
(10)
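A Python sketch of the two steps follows, with r1 and r2 holding the length-c residual arrays of the two sets. One point the text leaves open is whether αh is computed from the raw or the normalized residuals; since αh is defined before Step 1, we compute it from the raw residuals. Eqs. (8) and (9) are implemented exactly as written.

```python
import numpy as np

def adaptive_fusion(r1, r2):
    """Adaptive weighted fusion of subsection 3.2.
    r1, r2: length-c residual arrays of the two training sets.
    Returns the ultimate residuals f_i of Eq. (10)."""
    alpha = lambda r: (r.max() - r.min()) / r.sum()   # alpha_h
    a1, a2 = alpha(r1), alpha(r2)
    # Step 1: normalization, Eqs. (8)-(9) as written
    n1 = (r1.max() - r1) / (r1.max() - r1.min())
    n2 = (r2.max() - r2) / (r2.max() - r2.min())
    # Step 2: gap between the two smallest sorted residuals
    s1, s2 = np.sort(n1), np.sort(n2)
    w = (s1[1] - s1[0]) + (s2[1] - s2[0])
    w1, w2 = (s1[1] - s1[0]) / w, (s2[1] - s2[0]) / w
    return a1 * w1 * n1 + a2 * w2 * n2                # Eq. (10)
```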

3.3 Description of the grouped sparse classification

The details of the proposed method are shown in this subsection. The number of classes is still denoted by c, each class has n training samples, and all sample matrices are converted into column vectors with m components each. The training samples of the i-th class are denoted by xi1, xi2, …, xin, where i = 1, 2, …, c, and the test sample is denoted by z. The details are as follows.

  • First step: We first use a linear combination of the training samples of the original training set to represent the test sample. According to SRC, each class of the training set obtains a residual. Next, the training samples of the K classes with the smallest residuals are selected to form the first group, where K < c.

  • Second step: We perform the framework on the first group to generate the virtual samples, which form the second group. Let us combine all training samples of the first and second groups to form the matrices G1 and G2, respectively. Then SRC is performed on these two groups. In this case, we have

$$ {\mathbf{G}}_1{A}_1\approx z,{\mathbf{G}}_2{A}_2\approx z $$
(11)

where A1 and A2 are the coefficient vectors of the two groups. By solving Eq. (11) according to Eqs. (2) and (3), each of the K selected classes obtains two residuals, and each corresponding training sample obtains two coefficients.

  • Third step: Let ae, …, af and \( {a}_e^{\prime },\dots, {a}_f^{\prime } \) be the coefficients of the training samples of the i-th class in the two groups. The reconstructed samples of the i-th class in the two groups are defined as \( {\sum}_{t=e}^f{a}_t{x}_{it} \) and \( {\sum}_{t=e}^f{a}_t^{\prime }{x}_{it}^v \), where \( {x}_{it}^v \) is the virtual sample of xit. Then, the distance of the i-th class can be written as

$$ {d}_i=\left\Vert {\sum}_{t=e}^f{a}_t{x}_{it}-{\sum}_{t=e}^f{a}_t^{\prime }{x}_{it}^v\right\Vert $$
(12)
  • Fourth step: Let us perform the proposed weighted fusion approach on the two groups. According to the description in subsection 3.2, the ultimate residual of the i-th class of the K selected classes can be written as

$$ {s}_i={\alpha}_1{w}_1{r}_i^1+{\alpha}_2{w}_2{r}_i^2+\left({\alpha}_1-{\alpha}_2\right){d}_i $$
(13)

where \( {r}_i^1 \) and \( {r}_i^2 \) are the two residuals of the i-th class. If si is the minimum residual, the test sample is classified into the i-th class. An end-to-end sketch of this procedure is given below.
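Putting the four steps together, the following Python sketch outlines the whole classification procedure under stated assumptions: X holds all original training samples as columns, Xv holds their virtual counterparts (column t of Xv is the framework applied to the image behind column t of X), solve_l1 is any l1 solver returning a coefficient vector, and adaptive_fusion is the sketch from subsection 3.2.

```python
import numpy as np

def grouped_sparse_classify(X, Xv, labels, z, K, solve_l1):
    """Sketch of subsection 3.3; returns the predicted class of z."""
    classes = np.unique(labels)
    # First step: SRC over all classes, keep the K smallest-residual classes
    a = solve_l1(X, z)
    res0 = np.array([np.linalg.norm(z - X[:, labels == c] @ a[labels == c])
                     for c in classes])
    kept = classes[np.argsort(res0)[:K]]
    sel = np.isin(labels, kept)
    G1, G2, lab = X[:, sel], Xv[:, sel], labels[sel]
    # Second step: SRC on both groups, Eq. (11)
    a1, a2 = solve_l1(G1, z), solve_l1(G2, z)
    r1 = np.array([np.linalg.norm(z - G1[:, lab == c] @ a1[lab == c]) for c in kept])
    r2 = np.array([np.linalg.norm(z - G2[:, lab == c] @ a2[lab == c]) for c in kept])
    # Third step: distance between the two reconstructions, Eq. (12)
    d = np.array([np.linalg.norm(G1[:, lab == c] @ a1[lab == c]
                                 - G2[:, lab == c] @ a2[lab == c]) for c in kept])
    # Fourth step: fused residual of Eq. (13)
    alpha = lambda r: (r.max() - r.min()) / r.sum()
    fused = adaptive_fusion(r1, r2) + (alpha(r1) - alpha(r2)) * d
    return kept[np.argmin(fused)]
```

For example, solve_l1 could be the Lasso call of the SRC sketch in subsection 2.1, i.e. `lambda X, z: Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(X, z).coef_`.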

4 Analysis of the proposed method

4.1 Intuitive rationale of the proposed method

There are four rationales behind the proposed method. Firstly, the test sample is represented by a linear combination of the training samples of its nearest neighbor classes. In representation-based classification methods, different training samples have different effects on the classification decision: a training sample close to the test sample has a great effect on the decision, while a training sample far from the test sample has little effect and may even have a side effect. Therefore, to represent the test sample better, we can eliminate the training samples of the classes that are far from the test sample. In fact, if the test sample is represented by a linear combination of the training samples of all classes, then according to SRC the training samples of its nearest neighbor classes obtain small representation coefficients, which does not enhance the capability of representation. However, if we remove the training samples of the classes that are far from the test sample, the representation coefficients of its nearest neighbor classes are enlarged, and the test sample can be represented better. Secondly, we propose a framework to generate a kind of virtual sample. In real-world applications, we often face variations in pose, illumination and facial expression because of the limited number of training samples, so the simultaneous use of the original training samples and virtual samples is able to represent the test sample better. In our framework, the obtained virtual samples are not far from the original training samples. Moreover, the virtual samples and the original training samples are not linearly correlated; in other words, the virtual samples do not lie in the subspace of the original training set. Hence the generated virtual samples are very helpful for strengthening the capability of representing the test sample. Thirdly, a novel grouped classification method is proposed to classify the test sample. We not only take into account the two residuals of a class in the first and second groups but also consider a distance of the class. The two residuals of a class contain the discriminant information of the test sample. For the distance, let \( \Delta {z}_i^1=z-{\sum}_{t=e}^f{a}_t{x}_{it} \) denote the residual vector of the i-th class from the first group and \( \Delta {z}_i^2=z-{\sum}_{t=e}^f{a}_t^{\prime }{x}_{it}^v \) the residual vector of the i-th class from the second group. Figure 2 shows the relationship of \( \Delta {z}_i^1 \), \( \Delta {z}_i^2 \) and \( {dis}_i={\sum}_{t=e}^f{a}_t{x}_{it}-{\sum}_{t=e}^f{a}_t^{\prime }{x}_{it}^v \). Clearly,

Fig. 2
figure 2

The relationship of \( \Delta {z}_i^1 \), \( \Delta {z}_i^2 \) and disi

$$ {dis}_i=\Delta {z}_i^1-\Delta {z}_i^2={\sum}_{t=e}^f{a}_t{x}_{it}-{\sum}_{t=e}^f{a}_t^{\prime }{x}_{it}^v $$
(14)

It reflects the difference between the reconstructed samples of the class in the first and second groups. In addition, \( \Delta {z}_i^1 \), \( \Delta {z}_i^2 \) and disi form a triangle in the m-dimensional space (m is defined in subsection 3.3). \( \Delta {z}_i^1 \) and \( \Delta {z}_i^2 \) convey the discriminant information between the test sample and the i-th class, so the distance disi simultaneously contains the discriminant information of the test sample in the two groups. If the reconstructed sample of the class in the first group is similar to the one in the second group, the corresponding distance will be very small; in this case, fusing the two kinds of residuals and the distance of the class may yield a smaller ultimate residual, which helps improve the recognition accuracy of the test sample. Fourthly, a novel weighted fusion approach is proposed to fuse the two residuals and the distance of a class. The weighted fusion approach uses a simple way to generate the weights for the different residuals of a class automatically. In other words, it takes each test sample into account and determines adaptive weights for that test sample, which allows the dissimilarity between the test sample and each class to be considered flexibly.

4.2 More analysis of the proposed method

4.2.1 Insight into the advantage of the framework of our method

In this subsection, we compare our method with the method of [54] to demonstrate that the linear combination of the virtual samples of our method is able to represent the test sample better. The method of [54] exploits the symmetry of the face to generate a kind of axis-symmetrical virtual sample. It first divides each training face image into two halves, and the two halves are then used to generate two symmetrical virtual face images. All original training samples and all virtual samples are combined to form the first and second groups, respectively. Then, a conventional RBCM is performed on these two groups to generate two residuals for each class. Finally, the two residuals of a class are fused to classify the test sample. The details of how the method of [54] obtains its virtual samples are shown in Fig. 3. For the comparison, we convert all samples into column vectors and use the Euclidean distance \( d=\sqrt{{\left(x-y\right)}^T\left(x-y\right)} \) to evaluate the similarity between the test sample and the reconstructed virtual samples of the method of [54] and of our method. The reconstructed virtual sample is defined as the linear combination of the virtual samples of the assigned class. In Fig. 4, the first row shows the test samples of the YALEB face database [17] from the first five classes, and the second and third rows show the corresponding reconstructed virtual samples of the method of [54] and of our method. According to the evaluation results in Table 1, the distances between the test samples and the reconstructed virtual samples of our method are smaller than those of the method of [54]; in other words, the reconstructed virtual samples of our method are closer to the test samples. This implies that the integration of the original training samples and the virtual samples of our method is able to recognize the test sample better.

Fig. 3
figure 3

The details of the method [54]

Fig. 4
figure 4

The first, second and third rows are the test samples, the reconstructed virtual samples of the method of [54] and the reconstructed virtual samples of our method, respectively.

Table 1 The Euclidean distances between the test samples and the reconstructed virtual samples of our method and of the method of [54]
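For reference, a small sketch of the quoted comparison: the reconstructed virtual sample of a class is the linear combination of that class's virtual samples with the obtained coefficients, and the Euclidean distance above measures its similarity to the test sample. The function name and argument layout are ours.

```python
import numpy as np

def reconstruction_distance(Vc, coeffs, z):
    """Vc: d x k matrix of one class's virtual samples, coeffs: their
    representation coefficients, z: test vector. Returns the Euclidean
    distance d = sqrt((x - y)^T (x - y)) used in subsection 4.2.1."""
    y = Vc @ coeffs                      # reconstructed virtual sample
    return float(np.sqrt((z - y) @ (z - y)))
```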

4.2.2 The advantage of the weighted fusion approach of our method

It should be noted that RBCMs classify the test sample based on the residual of each class and assign the test sample to the class with the smallest residual; the residual reflects the representation capability of each class for the test sample. In the fourth step of our method, we compare the proposed weighted fusion approach with a fixed-weight fusion approach to demonstrate that the proposed approach is able to classify the test sample better. We first give a toy example. Let us select the test sample from the 5th class of the YALEB face database. The parameter K of our method is set to 19, and the number of training samples of each class is set to 4. For the fixed-weight fusion approach, the coefficients in the fourth step of our method are set to 0.4, 0.4 and 0.2. From Fig. 5, we conclude that the proposed weighted fusion approach obtains smaller residuals than the fixed-weight approach. Clearly, the red line is more discriminative than the blue line: in the red line, it is easy to see that the smallest residual is obtained for the 5th class, whereas in the blue line the residuals of the 1st and 4th classes are approximately on the same horizontal level. Moreover, the fixed-weight fusion approach cannot classify the test sample into the correct class; it classifies the test sample into the 4th class rather than the 5th class. In contrast, our method classifies the test sample into the correct class because the smallest residual is obtained for the 5th class. Hence, compared with the fixed-weight fusion approach, our weighted fusion approach is able to classify the test sample better.

Fig. 5
figure 5

The residuals of the fixed-weight fusion approach and our weighted fusion approach on the YALEB face database

We further use the CMU-PIE face database as an example to demonstrate the advantage of the proposed weighted fusion approach. The test sample is selected from the 2nd class and the parameter K of our method is set to 19. Moreover, we take the first 4 samples of each class as training samples. For the fixed-weight fusion approach, the coefficients in the fourth step of our method are set to 0.4, 0.4 and 0.2. In Fig. 6, it is obvious that the residuals of the proposed weighted approach are smaller than those of the fixed-weight approach, and the residual of the 2nd class is clearly the smallest in the red line. Meanwhile, the fixed-weight approach erroneously classifies the test sample into the 6th class because that class has the smallest residual value, whereas our method classifies the test sample into the 2nd class. This further implies that the representation ability of our method is better than that of the fixed-weight approach.

Fig. 6
figure 6

The residuals of the fixed-weight approach and our method on the CMU-PIE face database

4.2.3 Comparison between our weighted fusion approach and the method of [67]

The method of [67] is an adaptive weighted fusion approach (ADWF) for face recognition. ADWF takes each test sample into account and uses two variables to generate the weights for a class automatically. Suppose that each class has u residuals; the first variable is defined as

$$ {\delta}_k=\frac{sumd-\min \left({r}_1^k,{r}_2^k,\dots, {r}_c^k\right)}{sumd} $$
(15)

where \( sumd={\sum}_{t=1}^u{r}_i^t \) and k = 1, 2, 3, …, u. For the second variable, ADWF first sorts the residuals \( {r}_i^1,{r}_i^2,\dots, {r}_i^u \) in ascending order and the results are recorded as \( {r^{\prime}}_1^1\le {r^{\prime}}_2^1\le \cdots \le {r^{\prime}}_c^1 \), \( {r^{\prime}}_1^2\le {r^{\prime}}_2^2\le \cdots \le {r^{\prime}}_c^2 \), …, \( {r^{\prime}}_1^u\le {r^{\prime}}_2^u\le \cdots \le {r^{\prime}}_c^u \). Then, the second variable is defined as

$$ {w}_k=\frac{{r^{\prime}}_2^k}{w} $$
(16)

where \( w={\sum}_{t=1}^u{r^{\prime}}_2^t \). Finally, the residual of the i-th class is denoted as

$$ {s}_i^{\prime }={\delta}_1{w}_1{r}_i^1+{\delta}_2{w}_2{r}_i^2+\cdots +{\delta}_u{w}_u{r}_i^u $$
(17)
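For comparison, a Python sketch of ADWF as summarized by Eqs. (15)-(17). The quoted definitions mix indices (sumd is written as a sum over \( {r}_i^t \)), so reading sumd as the sum of all residuals is our assumption, as is the array layout; this is one plausible reading of the summary, not necessarily the original ADWF.

```python
import numpy as np

def adwf_fusion(R):
    """ADWF of [67] as summarized in Eqs. (15)-(17). R: c x u array,
    column k holding the residuals r_i^k of all c classes.
    Returns the fused residuals s'_i of Eq. (17)."""
    sumd = R.sum()                          # assumed reading of sumd
    delta = (sumd - R.min(axis=0)) / sumd   # Eq. (15), one delta_k per column
    second = np.sort(R, axis=0)[1, :]       # second-smallest residual per column
    w = second / second.sum()               # Eq. (16)
    return R @ (delta * w)                  # Eq. (17)
```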

Following the above comparison strategy, we compare our weighted fusion approach with ADWF in the fourth step of our method. In this subsection, we select the test sample from the 1st class of the CMU-PIE face database. Meanwhile, we set the parameter μ of our method to 0.01 and the number of training samples of each class to 27. In Fig. 7, the red line shows the residuals of our weighted fusion approach and the blue line shows the residuals of ADWF. It is obvious that the residuals of our weighted fusion approach are smaller than those of ADWF. Meanwhile, in the blue line it is hard to discriminate which class has the smallest residual, whereas in the red line the residual of the 1st class is obviously the smallest. Moreover, ADWF cannot classify the test sample into the correct class, since its smallest residual is obtained for the 3rd class, whereas the smallest residual of our weighted fusion approach is obtained for the 1st class. These results show that the discriminability and representation capability of our weighted fusion approach are stronger than those of ADWF.

Fig. 7
figure 7

The residuals of our weighted fusion approach and ADWF on the CMU-PIE face database

To further demonstrate that our weighted fusion approach can represent the test sample better than ADWF, we use the FERET face database as an example. In this case, the test sample is from the 38th class. Meanwhile, we set the parameter K to 80 and the number of training samples of each class to 3. The comparison result is shown in Fig. 8. Clearly, the residuals of our weighted fusion approach are smaller than those of ADWF. In the red line, it is obvious that the residual of the 38th class is the smallest, but it is hard to discriminate which class has the smallest residual in the blue line. Moreover, our weighted fusion approach classifies the test sample into the correct class, since its smallest residual is obtained for the 38th class, whereas the smallest residual of ADWF is obtained for the 1st class rather than the 38th class. This further shows that our weighted fusion approach is able to represent the test sample better than ADWF.

Fig. 8
figure 8

The residuals of our weighted fusion approach and ADWF on the FERET face database

4.2.4 Insight into the advantage of the distance of our method

In this subsection, we use an example to demonstrate that the distance of a class helps improve the recognition accuracy of test samples. We first select the test sample from the 8th class of the ORL face database. The parameter K and the number of training samples of each class are set to 20 and 4, respectively. The red and blue lines in Fig. 9 show the ultimate residuals generated by our method (which fuses the two residuals and the distance) and by the simple fusion of the two residuals, respectively. In the red line, it is obvious that the ultimate residual of the 8th class is the smallest. However, it is hard to discriminate which class has the smallest residual in the blue line; the residuals of the 25th and 8th classes are approximately at the same level. Moreover, the simple fusion of the two residuals cannot classify the test sample into the correct class, since its smallest fused residual is obtained for the 25th class rather than the 8th class. This implies that our method is more discriminative than the simple fusion of the two residuals.

Fig. 9
figure 9

The fusion results of the two residuals alone and of the two residuals with the corresponding distance on the ORL face database

Next, we use the Libor face database as an example to further demonstrate that the distance of a class helps recognize the test sample better. The test sample is selected from the 12th class. The parameter K and the number of training samples of each class are set to 76 and 4, respectively. Figure 10 shows the residuals between the test sample and the classes of the Libor face database. It is clear that the residual of the 12th class is the smallest in the red line, whereas the smallest residual in the blue line is not obvious. Meanwhile, the test sample cannot be accurately classified by the simple fusion of the two residuals, because the smallest residual is obtained for the 106th class rather than the 12th class. This further implies that the discriminability of our method is stronger than that of the simple fusion of the two residuals.

Fig. 10
figure 10

The fusion results of the two residuals alone and of the two residuals with the corresponding distance of the class on the Libor face database

5 Experimental results and discussion

In this section, the ORL, Georgia Tech, FERET, CMU-PIE and Libor face databases are used to conduct the experiments. SRC, CRC, LRC, coarse to fine K nearest neighbor classification (CFKNNC) [56], the improvement to nearest neighbor classification (INNC), homotopy [7], the primal augmented Lagrangian method (PLAM) [60], the method of [65], the discriminative sparse representation method (DSRM) [59], block-diagonal representation (BDLRR) [66] and the method of [27] are compared with our method. Moreover, we set the parameter μ of our method to 0.01, and the number of iterations of homotopy and PLAM is set to 10. The parameter K varies because the number of classes differs across datasets; in our experiments, we find that the improvement in recognition accuracy of our method is obvious when K approximately equals one-half or one-third of the number of classes. Finally, assuming that the number of correctly classified test samples is pr and that there are C classes with n test samples each, the accuracy is defined as \( accuracy=\frac{p_r}{C\times n}\times 100\% \).

5.1 Experiments on the ORL face database

We first conduct the experiments on a subset of the ORL face database [41]. The subset has 400 images from 40 classes and each class has 10 images; the size of each image is 30 × 30. Some training and test images of the first three classes are shown in Fig. 11. We used the first 1, 2, 3 and 4 images of each class as training images and the remaining samples as test images, and K in our method is set to 20. The experimental results are shown in Table 2. Our method performs better than SRC, LRC, CRC, CFKNNC, INNC, homotopy, PLAM, the method of [54], DSRM, BDLRR and the method of [27]. For example, when the number of training images per class is 4, the accuracies of SRC, LRC, CRC, CFKNNC, INNC, homotopy, PLAM, the method of [54], DSRM, BDLRR, the method of [27] and the proposed method are 90%, 85%, 89.17%, 82.52%, 81.25%, 89.75%, 86.25%, 77.50%, 92.17%, 90.42%, 84.17% and 92.50%, respectively. The accuracies of all other methods are lower than that of the proposed method.

Fig. 11
figure 11

Some training and test images of the first three classes

Table 2 The accuracies (%) on the ORL face database

5.2 Experiments on the Georgia Tech face database

In this subsection, we conducted the experiments on the Georgia Tech face database [19], which has 750 images of 50 faces with 15 images per face. The first 1, 2, 3 and 4 images of each face were selected as training images and the remaining samples as test images. Meanwhile, we set the parameter K to 15. Figure 12 shows some training and test images of the Georgia Tech face database. We compared SRC, LRC, CRC, CFKNNC, INNC, homotopy, the method of [54], PLAM and the method of [27] with our method and obtained the experimental results shown in Table 3. From Table 3, we conclude that the recognition rates of our method are higher than those of the eleven competing methods.

Fig. 12
figure 12

Some training images and test images of Georgia Tech face database

Table 3 The accuracies (%) on the Georgia Tech face database

5.3 Experiments on the FERET face database

Next, the FERET face database [39] is used to conduct the experiments. This subset has 200 classes and each class has 7 images. We took the first 1, 2, 3 and 4 images of each class as training samples and the remainder as test samples, and we set the parameter K to 100. Figure 13 shows the first three faces of the FERET face database. From Table 4, we conclude that when the number of training samples is 3, INNC obtains the highest recognition rate; however, our method performs better than the other eleven methods when the number of training samples is 1, 2 or 4.

Fig. 13
figure 13

The first three faces of the FERET face database

Table 4 The accuracies (%) on the FERET face database

5.4 Experiments on the CMU-PIE face database

Then, a subset of the CMU-PIE face database [43] with 3332 images is used to conduct the experiments; each face has 68 images. We took the first 1, 2, 3 and 4 images of each face as training images and the rest as test images, and the parameter K of our method was set to 20. Figure 14 shows some images of the CMU-PIE face database and Table 5 shows the experimental results. From Table 5, we conclude that when the number of training samples is 1, the recognition rate of DSRM is the highest; however, the proposed method performs better than the eleven competing methods when the number of training samples is 2, 3 or 4.

Fig. 14
figure 14

Some images of CMU-PIE face database

Table 5 The accuracies (%) on the CMU-PIE face database

5.5 Experiments on the Libor face database

Finally, we use the Libor face database (http://cswww.essex.ac.uk/mv/allfaces/index.html) to conduct the experiments. The Libor face database was designed and maintained by Dr. Libor Spacek; it has 3040 face images and each face has 20 images. The first 1, 2, 3 and 4 images of each face were taken as training samples and the remaining samples of the face as test samples, and we set the parameter K to 50. Some training and test samples of the Libor face database are shown in Fig. 15. From Table 6, we conclude that when the number of training samples is 4, the recognition accuracy of DSRM is the highest; however, our method performs better than the eleven competing methods when the number of training samples is 1, 2 or 3.

Fig. 15
figure 15

Some training samples and test samples of Libor face database

Table 6 The accuracies (%) on the Libor face database

5.6 Discussion and analysis

The above-mentioned competing methods can be categorized into five groups. SRC, LRC and CRC are traditional sparse representation-based classification methods. INNC and CFKNNC are based on the nearest neighbor classification method (NNC) [12]. Different from the traditional sparse representation-based classification methods, homotopy and PLAM exploit a greedy approximation strategy to solve the sparse representation problem, so they are regarded as iterative classification methods. The methods in [27, 54] belong to the grouped sparse representation-based classification methods. DSRM and BDLRR are the most advanced related methods published in recent years.

From the above experimental results, in most cases our method obtains higher recognition accuracy than the other competing methods. In fact, our method can be viewed as an improved SRC. Compared with the three traditional sparse representation-based classification methods and the two improved NNCs, the improvement in recognition accuracy of our method is obvious; the maximum improvement is greater than 23%. Although homotopy and PLAM may obtain higher recognition accuracies than our method if the number of iterations is increased, their running time increases accordingly. Like our method, the methods in [27, 54] both exploit the original training set to generate a kind of virtual samples, and then a grouped sparse strategy and a weighted fusion approach are used to classify the test sample. Moreover, in the step of fusing the residuals of a class, the method of [27] uses the inner product of the residual vectors of a class to reflect the relations between the different groups. However, both methods are outperformed by our method, which indicates that our analysis in Section 4 is reasonable. For DSRM and BDLRR, when the number of training samples is small, i.e., less than 4, the recognition accuracies of our method are higher in most cases; however, DSRM and BDLRR can beat our method when each class has a large number of training samples.

In recent years, deep learning [36, 45] has become a research hot spot in face recognition. However, deep learning needs a large number of training samples to train the model; if the number of training samples is very limited, a deep model may overfit. In this case, deep learning is not suitable for databases with a small number of samples. Different from deep learning, our method performs well on small databases.

Our method uses the pixels of the face image as features to classify the test sample, so its time complexity increases accordingly if the size of the face image is large. Moreover, the framework of our method exploits the axis-symmetrical property of the face to generate a kind of virtual samples, so if a face image is not symmetrical, the framework may generate a misshapen face image. Fortunately, previous studies [50, 51] show that image feature extraction can greatly reduce the dimensionality of the data without reducing the recognition rate. Therefore, our method may obtain good recognition accuracy with lower time complexity if it is combined with efficient feature extraction methods.

Meanwhile, previous studies [1, 2, 25] show that image features can be used in steganography [24, 28] and digital watermarking [9]. Hence, the integration of our method with an appointed feature extraction method might be used in security certification [21, 22] or information hiding [5, 6].

6 Conclusion

In this paper, we proposed a novel grouped representation-based classification method. The proposed method first combines all training samples of the nearest neighbor classes of the test sample to form the first group. Next, a framework is performed on the first group to generate a kind of virtual samples, which are combined to form the second group. Then, the proposed method performs SRC on these two groups to obtain two residuals for each class. Finally, a novel weighted fusion approach fuses the two residuals and the distance of a class to recognize the test sample. The two residuals contain the discriminant information of the test sample in the first and second groups, respectively, and the distance simultaneously contains the discriminant information of the test sample in both groups. Subsection 4.2.1 shows that the virtual samples of our method are able to reflect possible changes in the pose, facial expression and illumination of a face image well. Moreover, according to the figures in subsections 4.2.2 and 4.2.3, our method has a strong representation ability for classifying the test sample. Experimental results on the ORL, Georgia Tech, FERET, CMU-PIE and Libor face databases support these conclusions.