Binarized eigenphases applied to limited memory face recognition systems
Most algorithms proposed for face recognition involve a considerable amount of computation and therefore cannot be used on devices with limited memory. In this paper, we propose a novel solution for efficient face recognition on systems that use low-memory devices. The new technique applies principal component analysis to the binarized phase spectrum of the Fourier transform of the covariance matrix constructed from the MPEG-7 Fourier Feature Descriptor vectors of the images. The binarization step applied to the phases brings several advantages to the system. It will be shown that the proposed technique increases the face recognition rate and, at the same time, achieves substantial savings in computational time when compared to other known systems. Experiments on two independent databases of face images are reported to demonstrate the effectiveness of the proposed technique.
Keywords: Face recognition, Limited memory, PCA, MPEG-7
1 Introduction
In the last few years, researchers in the area of face recognition have proposed numerous techniques that achieve high recognition rates [1, 2, 3, 4, 5, 6, 22, 23, 24]. Despite these significant advances, face recognition has yet to achieve the level of practicality required for many commercial and industrial applications. With the increased commercial interest in portable devices and in real-time face recognition systems, the need for more practical, cost-effective and low-power implementations has increased. Typical face recognition algorithms that have found wide application in fields such as banking and airport security may not be executable on memory-constrained devices whose central processing unit (CPU) runs at no more than 100–400 MHz. At such low processor speeds, many face recognition approaches face serious difficulties, and the limited memory of these devices places a further burden on practical implementations. This implementation issue therefore needs special treatment and more investigation. Moreover, to the best of the authors' knowledge, few published works propose schemes for devices with limited memory [7, 8, 9]. Lee et al. presented a face authentication algorithm that uses features chosen by genetic algorithms as the input vector to a support vector machine. Ng et al. proposed another algorithm that copes with illumination and pose variations by using a boosting algorithm to determine which pose variations of the face are challenging and to bootstrap them into filter synthesis. One of the main challenges of face recognition algorithms is the considerable amount of computation involved, which eventually slows down the system significantly. For example, in surveillance systems, where face recognition plays an important role in reliable security, wireless data transmission and the low memory requirement of the transmitted data are of great interest.
Most appearance-based methods for face recognition, which treat the face image as a whole, depend on calculating the eigenvalues and eigenvectors of the covariance matrix of a system representing the face space [10, 11, 12, 13, 14]. The time required for these calculations is considerable. Consequently, for low-memory devices, we cannot simply apply principal component analysis (PCA), or other known face recognition methods such as linear discriminant analysis (LDA), directly to the face images.
In this paper, we describe a new approach to face recognition that can be implemented and loaded on small, portable devices. The proposed approach extends the well-known MPEG-7 Fourier Feature Descriptor (FFD) algorithm. The main aim behind this process is to dramatically reduce the size of the space needed to represent the face features while keeping the recognition performance as high as possible. We thus propose a novel face recognition technique for systems with limited memory.
It will be shown that the new technique improves the face recognition rate when compared to the MPEG-7 FFD vector method in particular, as well as to other approaches such as the fisherface LDA method, the direct application of the eigenphase method to the face space, and the PCA method.
This paper is organized as follows. A brief description of the MPEG-7 FFD is given in Sect. 2. The formulation of the proposed technique is presented and discussed in Sect. 3. Section 4 presents the results of testing the new technique on two independent databases: the Olivetti Research Ltd. (ORL) database and the xm2vts database (CVSSP, University of Surrey). Concluding remarks are given in Sect. 5.
2 MPEG-7 Fourier feature descriptor
The MPEG committee is primarily known for the successful development of a series of video compression standards: MPEG-1, MPEG-2, and MPEG-4. The objective of MPEG-7, formally named “Multimedia content description interface”, is to describe the content of multimedia data so that it can be efficiently searched, accessed, transformed or adapted for use by any device and to support different applications.
In addition, many face descriptors for MPEG-7 have been proposed for face retrieval in video streams. MPEG-7 is flexible in that improved algorithms can replace previous ones; the standard is therefore not frozen in time.
MPEG-7 uses the FFD vector to represent the facial features of an image. The descriptor represents the projection of a face vector onto a set of basis vectors that span the space of possible face vectors. The face recognition feature set is extracted from normalized face images, each of size 56 × 46. The FFD is a single small vector derived from two feature vectors of a normalized face image: a Fourier spectrum vector x_1^f and a multi-block Fourier amplitude vector x_2^f.
At this point, it is useful to investigate how the important elements, or features, of the FFD vectors are distributed, so that the important features can be emphasized and the relatively less important ones neglected. We propose to achieve this objective by transforming these vectors into another domain, namely the frequency domain. The construction of the face recognition system from the FFD vectors in the frequency domain is explained next in Sect. 3.
3 Binarized eigenphases of the MPEG-7 FFD vectors
Note that Eq. 3 represents the covariance matrix in the spatial domain and that its size is reduced to 63 × 63. Oppenheim et al. showed that the phase angle of the Fourier transform retains most of the information about the image. More recently, Savvides et al. proposed a method based on this frequency domain concept. They demonstrated analytically that the eigenvectors of a face space can be obtained in the frequency domain and that these frequency domain vectors are equivalent to the ones obtained in the space domain using principal component analysis. Further, as the phase information retains most of the intelligibility features of an image, the image variation was modelled by keeping only the complex phase spectrum of the image. Note that, in general, changes in the images affect the magnitude more than the phase. This effect can be reduced by eliminating the magnitude and using only the phase. Since much of the noise, distribution, distortion, and image corruption is noticeable in the magnitude part of the spectrum of the image, discarding the magnitude reduces the weaknesses of the image and keeps only the discriminative features present in the phase part of the Fourier transform. Furthermore, the magnitudes of the spectra of the signals tend to fall off at higher frequencies, and many recognizable characteristics of the images lie at these higher frequencies. In this regard, the eigenphase approach was very successful and very tolerant to illumination variations in the images. Following this approach, it is expected that our proposed binarized eigenphase scheme will outperform the FFD approach.
Low memory storage: each pixel is represented with no more than 1 bit instead of 8 bits.
Simple processing: the algorithms are in most cases much simpler than those applied to grey-level images.
Enhancing the performance of the system, as will be seen in Sect. 4.
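The storage advantage listed above can be illustrated with a short sketch. It assumes that the binarized data is a 63 × 63 array of 0/1 values (the size of the covariance matrix in this work) and uses a random array as a stand-in; the variable names are illustrative only and are not part of the original method.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a 63 x 63 binarized matrix (values 0 or 1); random for illustration only.
binary = rng.integers(0, 2, size=(63, 63), dtype=np.uint8)

as_bytes = binary.nbytes                # one byte (8 bits) per entry
packed = np.packbits(binary.ravel())    # 8 entries per byte, zero-padded at the end
# Unpack and drop the padding bits to confirm that no information is lost.
restored = np.unpackbits(packed)[: binary.size].reshape(binary.shape)

print(as_bytes, packed.nbytes)          # 3969 497
```

With 3,969 entries, the packed form needs 497 bytes instead of 3,969, which is the eightfold saving implied by moving from 8 bits to 1 bit per entry.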
A further significant improvement of the proposed system is obtained by reducing the dimensionality of the system in order to cope with the system constraints and requirements. Since each feature adds to the computational burden in terms of processing and storage, the application of the PCA at the final step further reduces the dimensionality of our system.
A set of M′ eigenvectors (M′ much smaller than M) associated with the largest eigenvalues is sufficient for the recognition process. The experiments showed that M′ = 10 is sufficient.
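The processing chain described in this section (covariance matrix of the FFD vectors, Fourier transform, phase extraction, binarization, PCA) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the binarization rule (1 where the phase is non-negative, 0 otherwise), the explicit symmetrization of the spectrum, and the function names `binarized_eigenphase_basis` and `pattern_vector` are all assumptions introduced here, and random data stands in for real FFD vectors.

```python
import numpy as np

def binarized_eigenphase_basis(ffd, n_components=10):
    """Return (mean, basis) computed from FFD vectors of shape (M, d), d = 63 here."""
    mean = ffd.mean(axis=0)
    centered = ffd - mean
    cov = centered.T @ centered / ffd.shape[0]      # d x d covariance matrix
    spectrum = np.fft.fft2(cov)                     # 2-D Fourier transform
    # The spectrum of a real symmetric matrix is symmetric; averaging with the
    # transpose only removes floating-point round-off asymmetry.
    spectrum = (spectrum + spectrum.T) / 2
    phase = np.angle(spectrum)                      # keep the phase, drop the magnitude
    binary = (phase >= 0).astype(float)             # ASSUMED binarization rule
    eigvals, eigvecs = np.linalg.eigh(binary)       # valid because binary is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    return mean, eigvecs[:, order]

def pattern_vector(x, mean, basis):
    """Project one FFD vector onto the retained eigenvectors."""
    return basis.T @ (x - mean)

rng = np.random.default_rng(0)
ffd_vectors = rng.normal(size=(100, 63))            # stand-in for M = 100 FFD vectors
mean, basis = binarized_eigenphase_basis(ffd_vectors, n_components=10)
omega = pattern_vector(ffd_vectors[0], mean, basis)
print(basis.shape, omega.shape)                     # (63, 10) (10,)
```

Whatever the exact binarization rule, the eigen-decomposition operates on a 63 × 63 matrix regardless of the image size, which is where the computational saving discussed in Sect. 4 comes from.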
Many experiments were performed to evaluate the performance of the proposed system. We carried out experiments on two independent databases: the ORL and the xm2vts. Both sets include a number of images for each person, with variations in pose, expression and lighting. The ORL set includes 400 images of 40 different individuals, each represented by 10 images. The system was trained using five images of each person from this set and tested using the other five images.
For the xm2vts set, we used 2,360 images of 295 different individuals, each represented by 8 different images. These images were taken at four different sessions, with two shots at each session. The xm2vts uses a standard protocol, referred to as the Lausanne protocol, which was defined for the task of verification. The features of the observed person are compared with the stored features corresponding to the claimed identity, and the system decides whether the identity claim is true or false on the basis of a similarity score. The subjects whose features are stored in the system database are called clients, whereas a person who claims a false identity is called an imposter. According to the Lausanne protocol, the database is split into three groups: the training group, the evaluation group, and the testing group. We trained our system with the images from the first two sessions (4 images), used the images from the third session for evaluation (2 images), and used the images from the fourth session for testing (2 images). The evaluation set is used to find the threshold that determines whether a person is accepted or rejected. The xm2vts database images were taken at different sessions (different days), so the experiments on this database test the robustness of the proposed system to variations over time. Different sessions mean different hairstyles, different clothes and different “moods”.
In order to draw solid conclusions and to show the robustness of our system, we carried out three stages of experiments on the xm2vts database. The first stage used the 1,180 images taken in shot 1, and the second stage used the other 1,180 images taken in shot 2. For stages 1 and 2, we trained our system using 2 images per person, used one image for evaluation, and used one image for testing. The final stage used all 2,360 images of the 295 individuals: we trained our system with the 4 images from the first two sessions, used the 2 images from the third session for evaluation, and used the 2 images from the fourth session for testing.
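The role of the evaluation set in fixing the accept/reject threshold can be sketched as follows. The selection criterion is not stated in the text; this sketch assumes the threshold is chosen where the false acceptance rate and the false rejection rate are closest on the evaluation data, and it assumes distance scores (smaller means more similar). The function names `pick_threshold` and `verify` and the sample scores are hypothetical.

```python
import numpy as np

def pick_threshold(client_dist, impostor_dist):
    """Pick the distance threshold where FAR and FRR are closest (assumed criterion)."""
    client_dist = np.asarray(client_dist, dtype=float)
    impostor_dist = np.asarray(impostor_dist, dtype=float)
    candidates = np.sort(np.concatenate([client_dist, impostor_dist]))
    best_t, best_gap = candidates[0], np.inf
    for t in candidates:
        frr = np.mean(client_dist > t)       # true clients wrongly rejected
        far = np.mean(impostor_dist <= t)    # imposters wrongly accepted
        gap = abs(far - frr)
        if gap < best_gap:                   # keep the first threshold with the smallest gap
            best_t, best_gap = t, gap
    return best_t

def verify(distance, threshold):
    """Accept the identity claim when the distance does not exceed the threshold."""
    return bool(distance <= threshold)

# Hypothetical evaluation-set distances, for illustration only.
threshold = pick_threshold([0.1, 0.2, 0.3], [0.8, 0.9, 1.0])
print(threshold, verify(0.25, threshold), verify(0.85, threshold))
```

The threshold is fixed on the evaluation images only and then applied unchanged to the test images, so the test results are not tuned on the data they report.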
The following steps were carried out on both database sets:
The images were normalized to a size of 56 × 46.
The MPEG-7 algorithm was applied to these sets, and the FFD for each one of the images was calculated.
The eigenvectors and eigenvalues of the (63 × 63) covariance matrix were calculated, and the M′ eigenvectors corresponding to the largest eigenvalues were chosen. M′ = 10 was selected in the experiments.
For each known individual, the class vector Ωk was calculated by averaging the pattern (weight) vectors Ω of the learning images, calculated from the original FFD vector of each individual.
For each new face image to be identified, its pattern vector Ω was found and the distances ɛk to each known class were calculated. The class vector Ωk with the minimum distance ɛk represents the input face.
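The last two steps above (class vectors and minimum-distance identification) can be sketched as follows. The distance measure is not specified in the text, so the Euclidean distance is assumed here; the function names `class_vectors` and `classify` and the toy pattern vectors are illustrative only.

```python
import numpy as np

def class_vectors(patterns, labels):
    """Average the pattern vectors of each individual to obtain the class vectors."""
    classes = np.unique(labels)
    means = np.stack([patterns[labels == c].mean(axis=0) for c in classes])
    return classes, means

def classify(omega, classes, means):
    """Return the class with the minimum distance to omega, and that distance."""
    distances = np.linalg.norm(means - omega, axis=1)   # ASSUMED Euclidean distance
    k = int(np.argmin(distances))
    return classes[k], distances[k]

# Toy two-dimensional pattern vectors for two individuals (illustration only).
patterns = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
classes, means = class_vectors(patterns, labels)
label, distance = classify(np.array([1.0, 1.0]), classes, means)
print(label, distance)   # 0 1.0
```

With M′ = 10, each individual is stored as a single 10-element class vector, and identification costs one distance computation per known individual.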
The new system achieved a 93.5% recognition rate when applied to the xm2vts database, while under the same conditions the MPEG-7 face recognition method achieved 89.5%.
The other experiment tested the proposed technique under different circumstances, using the ORL face database. This database includes images with different poses, different illuminations, different expressions (open or closed eyes, smiling or non-smiling), and different facial details (glasses or no glasses), and some of the images were taken at different times. Examples of the ORL database images used are shown in Fig. 3b. The proposed technique achieved 95.5% correct classification, while under the same conditions the MPEG-7 face recognition method achieved 91.5% correct classification.
[Table: The recognition rates of five different methods. Column: New method (%)]
The importance of the binarization step in the proposed method should be emphasized at this point. Without the binarization step, the performance of the system degrades by about 2% for both databases. The binarization step therefore enhances the performance of the system in addition to saving memory.
From Figs. 15, 16, 17, and 18, one can see that the performance of all five methods degrades when they are applied to more practical images. However, our proposed technique still outperforms the other ones, as expected.
One of the major advantages of our proposed method is the computation time it saves. A closer look at the computations of the face recognition methods (except for the MPEG-7 method) reveals that the limiting factor is the large size of the covariance matrix whose eigenvectors and eigenvalues must be found.
In the PCA, LDA, and eigenphase methods, the program spends most of its time calculating the large covariance matrix. The numbers of multiplications and summations needed to construct the covariance matrix can be approximated as M × M × N² and M × M × (N² − 1), respectively, where M is the number of images used to build the covariance matrix and N × N is the image size (assuming a square image). For our new method, these numbers are 63 × 63 × M and 63 × 63 × (M − 1). For example, if the image size is N × N = 32 × 32 and the number of images is M = 100, the number of multiplications and summations needed to construct the covariance matrix using the PCA, LDA, or eigenphase methods is of order O(10⁷), whereas it is of order O(10⁵) for the new method. Note that the MPEG-7 method does not compute the covariance matrix and the corresponding eigenvectors at the same level, and consequently its computational complexity cannot be compared with that of the previous methods.
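The operation counts quoted above can be checked directly from the stated formulas. The short sketch below evaluates them for the example in the text (N = 32, M = 100); the function name `covariance_op_counts` is illustrative only.

```python
def covariance_op_counts(M, N, d=63):
    """Return ((mults, sums) for the image-based methods, (mults, sums) for the new method)."""
    direct = (M * M * N**2, M * M * (N**2 - 1))   # PCA, LDA, eigenphase on N x N images
    proposed = (d * d * M, d * d * (M - 1))       # d x d covariance of the FFD vectors
    return direct, proposed

direct, proposed = covariance_op_counts(M=100, N=32)
print(direct)     # (10240000, 10230000)
print(proposed)   # (396900, 392931)
```

For this example, the image-based methods need about 1.0 × 10⁷ multiplications against about 4.0 × 10⁵ for the new method, a reduction by a factor of roughly 26.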
[Table: Summary of calculations and consumed time for a certain example. Note: τa = time]
5 Concluding remarks
In this paper, a novel method is proposed for efficient face recognition that can be implemented in systems with limited memory and low-speed processors. Given the recent rapid advances in technology, face recognition techniques that run on devices with small memory and show robust performance deserve more investigation. Although appearance-based methods (such as the PCA and the fisherface) have been proposed for face recognition tasks, they are burdened by the significant amount of time taken to calculate the eigenvectors of the covariance matrix, a problem that we have overcome in our new technique. The new technique combines the advantages of the MPEG-7 FFD vectors, the frequency domain, the binarization process, and the PCA. The MPEG-7 FFD reduces the huge dimensionality of the system and provides compact vectors of small memory size for further processing. The frequency domain provides the phase content of the images, where the intelligibility features reside. The binarization process emphasizes the important features of the covariance system by enhancing the high-frequency components. Finally, the PCA is applied to the system for further dimensionality reduction. As the covariance matrix of the face space in our system is represented efficiently (63 × 63), the calculations related to the eigenvectors of the covariance matrix are dramatically reduced. Moreover, the binarization step of the new system enhances performance and provides a better recognition rate than other known techniques when applied to two independent face image databases.
The authors would like to thank the reviewers for the constructive suggestions and valuable comments.
- 6. Cai J, Goshtasby A (1999) Detecting human faces in color images. Image Vis Comput 63–75
- 8. Ng C, Savvides M, Khosla PK (2005) Real-time face verification system on a cell-phone using advanced correlation filters. Automatic identification advanced technologies, fourth IEEE workshop on 57–62, New York, USA
- 9. Zaeri N, Mokhtarian F, Cherri A (2006) Fast face recognition techniques for small and portable devices. Proceedings of the IEEE, Italy
- 13. Kim HC, Kim D, Bang SY (2002) Face retrieval using 1st- and 2nd-order PCA mixture model. International conference on image processing, Rochester, NY
- 14. Kong H, Li X, Wang J-G, Teoh EK, Kambhamettu C (2005) Discriminant low-dimensional subspace analysis for face recognition with small number of training samples. British machine vision conference (BMVC), Oxford, UK, 5–9 September 2005
- 15. Savvides M, Vijaya Kumar BVK, Khosla PK (2004) Eigenphases vs. eigenfaces. IEEE-17th international conference on pattern recognition
- 17. Lipinski L, Yamada A (2003) MPEG-7 Face recognition technique, international organization for standardization, ISO/IEC JTC1/SC29/WG11, coding of moving pictures and audio, NPEG03/N6188
- 18. Kamei T, Yamada A (2002) Report of core experiment on Fourier spectral PCA based face description, ISO/IEC JTC1/SC21/WG11 M8277, Fairfax, VA
- 20. Messer K, Matas J, Kittler J, Luettin J, Maitre G (1999) XM2VTSDB: The extended M2VTS database. Second international conference on audio and video-based biometric person authentication
- 21. Phillips P, Wechsler H, Huang J, Rauss P (1998) The FERET database and evaluation procedure for face-recognition algorithms. J Image Vis Comput 295–306
- 22. Moghaddam B, Jebara T, Pentland A (1999) Bayesian modeling of facial similarity. In: Kearns MJ, Solla SA, Cohn DA (eds) Advances in neural information processing system 11. MIT Press, Cambridge, pp 910–916