We used three datasets for the evaluation: the Labeled Faces in the Wild (LFW) dataset , the Amsterdam Library of Object Images (ALOI) , and ETL9B. This section describes the experimental setting and results for each dataset.
LFW is a database of celebrity face images collected from Yahoo! News. It contains 13,233 images of 5749 subjects. We used the version called “LFW-a” , whose images were cropped and normalized to 250 × 250 pixels by a commercial face detector. We used 482 subjects for learning M and G and 1198 different subjects for the gallery and probe, so the learning subjects did not overlap with the gallery and probe subjects. We learned M and G with either 50 or 482 subjects. With 50 subjects, the number of images per subject was fixed to 22; with 482 subjects, the number of images varied by subject and averaged 22 per subject. For each subject, one image was used for the gallery and one for the probe. We evaluated the computational time and recognition rate while increasing the number of gallery and probe subjects from 100 to 1000 in steps of 100.
We extracted features following Cao’s method : nine feature points were fixed as shown in Fig. 1, and the SIFT descriptor  was extracted at these points at three scales (2, 6, and 10). The extracted descriptors were concatenated, and their dimensionality was reduced to 100 by principal component analysis (PCA). For recognition, we used a 1-nearest-neighbor classifier, with BDH  as the ANNS method.
Figure 2 shows the recognition rates and average search times of the proposed method with 1000 gallery subjects as the BDH parameters were varied. Both the recognition rate and the speed depend on the BDH parameters; we therefore ran the experiments with many parameter settings and report the best recognition rate.
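The LFW feature pipeline (concatenate the SIFT descriptors from nine points at three scales, reduce to 100 dimensions with PCA, classify with 1-NN) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes standard 128-dimensional SIFT descriptors (so 9 × 3 × 128 = 3456 dimensions before PCA), uses random arrays in place of real descriptors, and uses exhaustive 1-NN search, which corresponds to the "without ANNS" setting. It does not learn M and G and does not reproduce BDH; the helper names are hypothetical.

```python
import numpy as np

# Assumed sizes: 9 fixed points x 3 SIFT scales x 128-D SIFT descriptors.
N_POINTS, N_SCALES, SIFT_DIM = 9, 3, 128
PCA_DIM = 100

def concat_descriptors(desc):
    """Flatten (n_images, 9, 3, 128) descriptors into (n_images, 3456) vectors."""
    return desc.reshape(desc.shape[0], -1)

def fit_pca(X, n_components=PCA_DIM):
    """Return the mean and the top principal axes of X (rows are samples)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, axes):
    """Project centered vectors onto the principal axes."""
    return (X - mean) @ axes.T

def nn1(gallery, gallery_labels, probes):
    """Exhaustive 1-nearest-neighbor labels (squared Euclidean distance)."""
    d2 = ((probes[:, None, :] - gallery[None, :, :]) ** 2).sum(axis=2)
    return gallery_labels[np.argmin(d2, axis=1)]

rng = np.random.default_rng(0)
# Hypothetical learning set (random stand-ins for real SIFT descriptors).
learn = concat_descriptors(rng.normal(size=(200, N_POINTS, N_SCALES, SIFT_DIM)))
mean, axes = fit_pca(learn)

# Hypothetical gallery of 10 subjects; probes are slightly perturbed copies.
gallery_raw = concat_descriptors(rng.normal(size=(10, N_POINTS, N_SCALES, SIFT_DIM)))
labels = np.arange(10)
gallery = project(gallery_raw, mean, axes)
probes = project(gallery_raw + 0.01 * rng.normal(size=gallery_raw.shape), mean, axes)

pred = nn1(gallery, labels, probes)
print("rank-1 recognition rate:", (pred == labels).mean())
```

PCA is fitted only on the learning subjects and then applied to the gallery and probe, mirroring the disjoint learning and evaluation subjects described above.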
We compared the proposed method with the face recognition method proposed in , called local generic representation (LGR). LGR aims to improve recognition accuracy when only a single image per person is available for the gallery and probe. The best recognition rate reported for LGR on LFW is 30.4%, obtained with 50 subjects . We used only 50 subjects for learning in LGR because LGR requires the same number of images for every learning subject. Owing to memory limitations, we could evaluate LGR only up to 400 gallery subjects. We also compared the proposed method with the fast face recognition method proposed in , which uses PCA-SIFT for image representation and BDH for search. That work reported a 100% cumulative recognition rate with a 139-ms search time on its own 5-million-image database .
All experiments were conducted on a PC with an Intel Xeon E5-4627 v2 (3.30 GHz) processor and 8 GB of RAM running the Debian 4.9.2-10 operating system, using a single processor core. We measured the search time for every query and report the average search time per query. The search time excludes feature extraction and learning.
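The timing protocol (time only the search call for each query, then average over all queries) can be sketched as below. This is an illustrative sketch, not the authors' measurement code: the features are assumed to be precomputed so that extraction and learning fall outside the timed region, and the exhaustive `linear_search` function and the random gallery are hypothetical stand-ins for the actual search method and data.

```python
import time
import numpy as np

def average_search_time(search, queries):
    """Mean wall-clock time of search(q) per query, in seconds.

    Only the search call is timed; features are assumed to be precomputed,
    so feature extraction and learning are excluded from the measurement.
    """
    total = 0.0
    for q in queries:
        start = time.perf_counter()
        search(q)
        total += time.perf_counter() - start
    return total / len(queries)

rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 100))   # hypothetical precomputed gallery features
queries = rng.normal(size=(50, 100))     # hypothetical precomputed probe features

def linear_search(q):
    """Exhaustive nearest-neighbor search: index of the closest gallery vector."""
    return int(np.argmin(((gallery - q) ** 2).sum(axis=1)))

avg = average_search_time(linear_search, queries)
print(f"average search time: {avg * 1e3:.3f} ms/query")
```

Timing each call separately, rather than the whole loop, keeps per-query bookkeeping outside the measured interval.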
The recognition rates and average search times are shown in Fig. 3. In Fig. 3a, “without ANNS” denotes the proposed method with the ANNS step removed, “BDH+PCA-SIFT” denotes the method proposed in , and the number following each label is the number of subjects used for learning. The proposed method achieved the same recognition rate as “without ANNS” in Fig. 3a, indicating that it recognizes subjects without losing accuracy. It also achieved a higher recognition rate than LGR and “BDH+PCA-SIFT,” indicating that it attains satisfactory accuracy for the face recognition task.
In Fig. 3b, the proposed method and “without ANNS” used the parameters learned with 482 subjects. Figure 3b shows that the proposed method is about 24,000 times faster than LGR, 88 times faster than “without ANNS,” and about 1600 times faster than “BDH+PCA-SIFT.”
ALOI is an object image database consisting of 110,250 color images of 1000 small objects. We used the part of ALOI called “ALOI-VIEW,” whose images were taken from 72 viewing directions by rotating each object in the plane in 5-degree steps. It contains 1000 objects and 72,000 images in total (72 views × 1000 objects), each 384 × 288 pixels. To select images for learning the parameters, we sampled each object's views at fixed rotation intervals: every 180 degrees (0+k180), 90 degrees (0+k90), 45 degrees (0+k45), 20 degrees (0+k20), and 10 degrees (0+k10). The sampled images were also used as the gallery, and the remaining images were used as probes in the recognition process. We set the number of objects in the gallery and probe to 100, 200, 500, 700, and 1000, and evaluated the computational time and recognition rate for each.
For image representation, we used the bag-of-features model with SIFT features. To describe the whole image, we sampled feature points every 5 pixels horizontally and vertically and extracted SIFT descriptors at these points, with the SIFT scales fixed to 20 and 30. We set the number of visual words to 300 based on preliminary experiments. For classification, we used a soft-voting k-nearest-neighbor (k-NN) classifier in which each vote was weighted by its similarity score, with k = 100.
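The "0+k<interval>" settings determine how many of the 72 views per object go to learning and the gallery and how many are left as probes. The sketch below computes that split under one assumption not spelled out in the text: that a view's rotation angle equals 5° times its view index, starting at 0°. The function name `split_views` is hypothetical.

```python
VIEW_STEP_DEG = 5                 # ALOI-VIEW: one image every 5 degrees
N_VIEWS = 360 // VIEW_STEP_DEG    # 72 views per object

def split_views(interval_deg):
    """Return (gallery_angles, probe_angles) for the '0+k<interval>' setting.

    Gallery/learning views are the angles 0, interval, 2*interval, ... < 360;
    every other 5-degree view of the object becomes a probe.
    """
    gallery = list(range(0, 360, interval_deg))
    probe = [a for a in range(0, 360, VIEW_STEP_DEG) if a % interval_deg != 0]
    return gallery, probe

for interval in (180, 90, 45, 20, 10):
    g, p = split_views(interval)
    print(f"0+k{interval}: {len(g)} learning/gallery views, {len(p)} probe views per object")
```

Under this assumption, the five settings give 2, 4, 8, 18, and 36 learning/gallery views per object, with the remaining 70, 68, 64, 54, and 36 views used as probes.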
We compared the proposed method with a multiclass SVM . The SVM used the same training and probe data as the proposed method, with a linear kernel. We also compared it with a k-NN classifier using the Euclidean distance. These experiments were run on the same PC as the LFW experiments.
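The soft-voting k-NN classifier used for ALOI and the Euclidean k-NN baseline differ mainly in how the retrieved neighbors vote. The sketch below contrasts the two. It is illustrative only: cosine similarity is a hypothetical stand-in for the proposed method's similarity score, the baseline is assumed to use an unweighted majority vote (the text does not specify its voting rule), and the toy uses k = 5 on five 2-D vectors rather than k = 100 on 300-word histograms.

```python
import numpy as np

def soft_vote_knn(gallery, labels, query, k, similarity):
    """Soft-voting k-NN: each of the k most similar gallery items votes for
    its label with a weight equal to its similarity score."""
    scores = similarity(gallery, query)        # one score per gallery item
    top = np.argsort(-scores)[:k]              # indices of the k highest scores
    votes = {}
    for i in top:
        votes[labels[i]] = votes.get(labels[i], 0.0) + scores[i]
    return int(max(votes, key=votes.get))

def cosine_similarity(gallery, query):
    """Hypothetical stand-in for the method's learned similarity score."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return g @ q

def euclidean_knn(gallery, labels, query, k):
    """Baseline (assumed): unweighted majority vote among the k nearest items."""
    d2 = ((gallery - query) ** 2).sum(axis=1)
    top = np.argsort(d2)[:k]
    values, counts = np.unique(labels[top], return_counts=True)
    return int(values[np.argmax(counts)])

# Toy non-negative "histograms": two close class-0 items, three distant class-1 items.
gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])
labels = np.array([0, 0, 1, 1, 1])
query = np.array([1.0, 0.05])

soft = soft_vote_knn(gallery, labels, query, k=5, similarity=cosine_similarity)
base = euclidean_knn(gallery, labels, query, k=5)
print(soft, base)
```

With a large k, the weighted vote lets a few highly similar neighbors outweigh a larger number of weakly similar ones; in the toy, the two near-duplicates of the query win the soft vote, while the unweighted majority picks the more numerous class.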
Figure 4 shows the recognition rates and search times. Figure 4a shows that the proposed method achieved a higher recognition rate than the other methods when the same learning data were used. In Fig. 4b, the proposed method was faster than the other methods in almost all settings. The exception was the 0+k180 and 0+k90 learning data with 100 objects, for which the multiclass SVM was faster than the proposed method. However, when the number of objects increased to more than 200, the proposed method was faster than the multiclass SVM, which indicates that the proposed method scales better than the multiclass SVM.
ETL9B is a database of handwritten Japanese characters; it is a binarized version of ETL9  and consists of 3036 characters with 200 images per character. Each character image is 64 × 63 pixels. We used the first 100 images of each character for learning and the gallery, and the remaining 100 for the probe. To evaluate the proposed method, we set the number of characters in the gallery and probe to 100, 500, 1000, 2000, and 3036. We represented the images in two ways: by resizing them to 16 × 16 pixels and concatenating the pixel values into a 256-dimensional vector, and by directional element features , which are 196-dimensional. The recognition method was the same as for ALOI, and the evaluation PC was the same as for LFW. We compared the proposed method with a Euclidean distance-based k-NN classifier.
Figure 5 shows the recognition rate and average search time. In Fig. 5a, the proposed method shows the same recognition rate as the method without ANNS, as it did for LFW and ALOI. Figure 5b shows that the computational time increased sublinearly with the number of characters. Overall, we confirmed that the proposed method was much faster than existing methods while achieving recognition accuracy no worse than theirs.
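The 16 × 16 pixel representation used for ETL9B can be sketched as below. The text does not say how the images were resized, so this sketch assumes box (area) averaging over a 16 × 16 grid of cells; the function names, the (rows, columns) = (64, 63) orientation, and the random demo image are hypothetical. The directional element features are not reproduced here.

```python
import numpy as np

OUT = 16  # output grid size: 16 x 16 = 256-dimensional feature

def box_resize(img, out=OUT):
    """Downsample a 2-D image to out x out by averaging the pixels in each cell.

    Cell boundaries come from evenly spaced edges, so a 64 x 63 image gives
    4-row cells and 3- or 4-column cells.
    """
    h, w = img.shape
    r_edges = np.linspace(0, h, out + 1).astype(int)
    c_edges = np.linspace(0, w, out + 1).astype(int)
    # Sum pixels within each row band, then within each column band.
    sums = np.add.reduceat(np.add.reduceat(img, r_edges[:-1], axis=0),
                           c_edges[:-1], axis=1)
    counts = np.outer(np.diff(r_edges), np.diff(c_edges))  # pixels per cell
    return sums / counts

def pixel_feature(img):
    """Concatenate the 16 x 16 resized pixel values into a 256-D vector."""
    return box_resize(img.astype(np.float64)).ravel()

def split_samples(n_per_char=200, n_learn=100):
    """Per-character split: first 100 samples for learning/gallery, rest for probe."""
    idx = np.arange(n_per_char)
    return idx[:n_learn], idx[n_learn:]

rng = np.random.default_rng(0)
binary_img = (rng.random((64, 63)) < 0.2).astype(np.uint8)  # hypothetical binarized image
feature = pixel_feature(binary_img)
print(feature.shape)
```

For a binarized image, each feature component is the fraction of foreground pixels in its cell, so the vector stays in [0, 1].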