Understanding unconventional preprocessors in deep convolutional neural networks for face identification
Deep convolutional neural networks have achieved huge successes in application domains such as object and face recognition. The performance gain is attributed to different facets of the network architecture, such as the depth of the convolutional layers, the activation function, pooling, batch normalization, and forward and backward propagation. However, very little emphasis is placed on the network's preprocessing module. In this paper, the preprocessing module is therefore varied across different preprocessing approaches, while the other facets of the deep network architecture are kept constant, to investigate the contribution preprocessing makes to the network. The commonly used preprocessors, data augmentation and normalization, are termed conventional preprocessors. The others are termed unconventional preprocessors: color space converters; grey-level resolution preprocessors; full-based and plane-based image quantization; Gaussian blur; and illumination normalization and insensitive feature preprocessors. To keep the network parameters fixed, a CNN with transfer learning is employed. The aim is to transfer knowledge from the high-level feature vectors of the Inception-V3 network to offline-preprocessed LFW target data; the features are then trained using a SoftMax classifier for face identification. The experiments show that the discriminative capability of deep networks can be improved by preprocessing the RGB data with some of the unconventional preprocessors before feeding it to the CNN. For best performance, however, the right setup of preprocessed data with augmentation and/or normalization is required. In summary, preprocessing data before it is fed to the deep network is found to increase the homogeneity of neighborhood pixels, even at reduced bit depth, which also improves storage efficiency.
Keywords: Deep convolutional neural networks · Face identification · Preprocessing · Transfer learning
Humans have an intuitive ability to effortlessly analyze, process and store face information for the purposes of identification and authentication. This ability extends to the recognition of face images even at low resolution. However, since the inception of convolutional neural networks (CNNs), a class of deep machine learning algorithms, machines have been developed that perform face identification and verification tasks at a level of efficiency comparable to humans.
The ability of intelligent machines to perform recognition tasks successfully depends on the CNN architecture and the format of the input data. While there is an arsenal of research on the architecture of deep networks, there is far less focus on the input data of the CNN. This may be due to the perception that the network only needs the raw face images in RGB format to extract and learn relevant features for discerning between faces, without prior processing. Meanwhile, some recognition applications may present the network with scenarios where the information for recognition is not sufficiently represented. For instance, the size of the dataset might be insufficient for training a neural network, the distribution of the data may vary, degradation due to noise or grey-level resolution might be a problem, or the color space might differ. Also, there may be cases of intra-person variation resulting from pose differences and/or lighting. Additionally, there may be inter-person similarity, where faces of persons of different classes closely resemble one another. The latter is more typical within a face space than in an object space, but it is not within the scope of this study.
Typically, the size of the dataset for training a CNN is highly significant for making sense of the intricate patterns in the data. Compared to traditional machine learning algorithms that employ handcrafted feature extraction, the amount of data input to the CNN must be large, as is the case for ImageNet. To address the small sample size problem, data augmentation methods such as translation, rotation, scaling and reflection are often employed in the literature [5, 6, 7, 8]. Another common practice is data normalization for reducing variation in the distribution of data: the mean is subtracted from each pixel of the input data, and the result is divided by the standard deviation of the data. One study of the object classification problem first applied zero-mean and one-standard-deviation normalization to the training data, followed by zero component analysis to enhance the edges of the features fed to the CNN; this showed significant improvement over the raw input format. These practices are widely used and are known to substantially improve CNN performance in face recognition tasks. Another, less popular, preprocessing strategy is color space conversion. Reddy et al. trained CNN models on different color spaces to investigate the influence on network performance; their results show that the best color format to serve as input to CNN models is the raw RGB format. Though this study is in the context of object recognition, it appears relevant to face recognition. On the other hand, the robustness of CNNs to samples degraded by noise is gaining interest. Degradation can take another form: image grey-level resolution. This impacts the visual quality of images due to reduced bit depth and has been seen to affect performance [12, 13].
However, reduced bit depth can be useful for real-world applications [14, 15] of face recognition on mobile devices requiring small memory usage and processing. And for this reason, it is a significant problem to be studied.
The problems of dissimilarity due to pose differences and lighting are particularly significant for hand-crafted feature extraction models. With preprocessing approaches such as face alignment, contrast enhancement and feature preprocessing employed prior to facial feature extraction, face recognition classifiers showed considerable increases in accuracy [16, 17]. Consequently, the effect of some of the commonly used preprocessors was investigated on deep networks. An interesting characteristic of preprocessing in the CNN pipeline is seen in work where raw input images of a general classification problem (places, things and objects) were run through a local contrast normalization preprocessor before being passed to the network; the preprocessor significantly improved the model's performance compared to the raw image format. Similarly, illumination normalization and contrast enhancement are seen to enhance the performance of deep networks for face recognition on the Extended Yale face dataset by a large margin, and [19, 20] show that contrast enhancement does improve CNN accuracies. Pitaloka et al. studied different preprocessors, including contrast enhancement and additive noise; their results show that preprocessing actually improves recognition accuracy, with remarkable improvements of 20.37% and 31.33% over the original raw input data observed for histogram equalization and noise addition, respectively, on facial expression datasets. Hu et al. applied and studied various feature preprocessors: large- and small-scale features (LSSF), Difference-of-Gaussian (DoG), and Single Scale Retinex (SSR), applied to face images fed to the CNN model. Their work showed that a 10.85% increase over the recognition accuracy on the original input data can be achieved.
In all these studies, the input data, whether raw or preprocessed, were trained on the CNN and the model accuracies were validated and tested. However, the future of deep learning should encompass solving large search engine problems. Search engines, such as Google, house collections of data on various entities: faces, objects, places, things, and so on. It is therefore unclear how deep convolutional models trained on specific recognition task domains translate to practical search engine usage, given that the data format may have changed. Another challenge is that, for a search of a given person's face, the engine should output matching faces of that individual. For cases such as this, transfer learning may be useful. The search engine can house a pool of data on features such as faces, objects, places, and/or things, which vary extensively and are captured at different camera sensitivities. Therefore, it is worth investigating whether preprocessors are relevant when knowledge of general classification feature model parameters is transferred from a pretrained model to a specific new classification problem like face identification. The contributions of this paper are as follows:
- Empirical analysis of the preprocessing module in deep networks with knowledge transfer, to demonstrate the performance of the network when the data format of the target domain differs from the source domain in color space, grey-level resolution, or lighting.
- Exhaustive evaluation of the conventional preprocessing methods against the unconventional preprocessing methods, to investigate the best setup for preprocessors in deep networks.
- Demonstration of an effective preprocessing strategy for input images to deep networks: plane-based quantization and Gaussian blurring. Quantization increases the homogeneity of neighboring pixels, and blurring retains the relevant features; both use a reduced bit depth, which improves storage efficiency. In the literature, by contrast, quantization is usually fashioned inside the CNN architecture, and Gaussian blur is used to mimic degraded probe samples.
2 The framework
The raw RGB face images are preprocessed using data augmentation by translation, data alignment by deep funneling, and normalization by the zero-mean and one-standard-deviation method. In addition, the following preprocessors are applied: Hue-Saturation-Value (HSV), CIELAB and YCbCr color space converters; image quantizers; and illumination normalization and insensitive feature preprocessors, namely histogram equalization (HE), rgb-gamma encoding in the log domain (rgbGELog), local contrast normalization (LN), LSSF-based illumination normalization, and the complete face structural pattern (CFSP). These preprocessors individually address assumptions a–f, respectively, associated with the dataset.
2.1.1 Data augmentation
Training from scratch or from a pretrained network requires a good number of samples per class for the CNN to generalize well to the given class. Data augmentation has been used as an artificial means to grow the size of the training data. Different transformation approaches are commonly used: translation, rotation, scaling and reflection [5, 6, 7, 8], each contributing differently to CNN performance; for interested readers, the work by Paulin et al. exemplifies the performance of each transformation. Since data augmentation is commonly adopted by the deep learning community, it is deemphasized in this work. Therefore, only a translation operation is performed, such that the data are translated by [+30, −30] pixels to create four additional faces per class, shifted to the left, right, top and bottom.
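The four-copy translation augmentation described above can be sketched as follows; this is a minimal illustration (zero-filling the vacated border is our assumption, since the text does not say how empty pixels are handled):

```python
import numpy as np

def translate(img: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift an image by (dy, dx) pixels, zero-filling the vacated border."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def augment_by_translation(img: np.ndarray, shift: int = 30) -> list:
    """Create the four shifted copies (left, right, up, down) described in
    the text; shift=30 matches the [+30, -30]-pixel translation."""
    return [translate(img, dy, dx)
            for dy, dx in ((0, -shift), (0, shift), (-shift, 0), (shift, 0))]
```

Each original image thus yields four extra samples per class, quintupling the effective training set.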
2.1.2 Data alignment and normalization
A common problem of real-world face image data is that face appearance varies drastically across large poses, from a person's frontal view to their profile. Face alignment is used to improve the performance of handcrafted feature extraction algorithms and is currently applied to the input faces of deep networks to set face data of multiple pose variations to a canonical pose. Since removing pose variability significantly improves recognition performance, this work continues the trend by using images aligned with the deep funneling method [26, 27]. Also, to make convergence of the network faster while training, zero-mean and one-standard-deviation normalization is adopted; this ensures the input data exhibit the same distribution.
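The zero-mean, one-standard-deviation normalization above amounts to a one-liner; this sketch normalizes over a whole array (per-channel variants are equally common):

```python
import numpy as np

def normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Zero-mean, unit-standard-deviation normalization.
    eps guards against division by zero on constant inputs."""
    return (x - x.mean()) / (x.std() + eps)
```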
2.1.3 Color space conversion
In computer vision, color spaces other than RGB are somewhat robust to lighting changes. Therefore, following on from prior work, this study evaluates color space preprocessors such as Hue-Saturation-Value (HSV), YCbCr and CIELAB. In YCbCr, Y represents luminance, while Cb and Cr represent the chrominance components; in CIE L*a*b*, L is luminance, while a* and b* are the green–red and blue–yellow color components. The analysis is not only focused on whether the RGB performance is improved, but also addresses the following question: is it possible for a CNN trained on RGB input data to transfer its knowledge to input data of a different color space?
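For illustration, the RGB-to-YCbCr conversion can be written directly from the ITU-R BT.601 full-range matrix; this is a common definition, and an assumption here, since the paper does not state which YCbCr variant it used:

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an RGB image (floats in [0, 1]) to YCbCr using the
    ITU-R BT.601 full-range matrix. Y carries luminance; Cb and Cr
    carry the blue- and red-difference chrominance."""
    m = np.array([[ 0.299,     0.587,     0.114   ],
                  [-0.168736, -0.331264,  0.5     ],
                  [ 0.5,      -0.418688, -0.081312]])
    ycc = rgb @ m.T
    ycc[..., 1:] += 0.5  # centre the chroma channels in [0, 1]
    return ycc
```

A pure white pixel maps to maximum luminance with neutral chroma, which is a quick sanity check on the matrix.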
2.1.4 Image degradation
Image blur is another form of degradation that reduces image quality. Gaussian blur is a popular blurring technique in the field of image processing. Its importance is heightened in handcrafted feature-based face recognition, but it may seem redundant in a CNN architecture, since it operates as a convolutional kernel just like the network's own layers. However, we postulate that it might enhance learning in CNN-based face recognition as it does in handcrafted feature-based face recognition. For this reason, we explore Gaussian blur as a preprocessing module that serves to retain features of significance from fine to coarse, and see how it fits within a CNN architecture.
The Gaussian blur is achieved using Gaussian kernels with σ = 3, 7, 11 to simulate fine to coarse features. Unlike [12, 13], which considered blur as a degradation property of probe samples, the Gaussian blur here is applied to both the target and probe data of the LFW validation set, just like the other preprocessors used in this paper.
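A sketch of such a blur preprocessor, built from a normalized Gaussian kernel; the FFT convolution with wrap-around padding is a simplification chosen for brevity (a production pipeline would likely use reflective padding):

```python
import numpy as np

def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 2-D Gaussian kernel; radius 3*sigma covers ~99.7% of mass."""
    r = int(3 * sigma)
    ax = np.arange(-r, r + 1)
    g = np.exp(-ax**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Blur a single-channel image via FFT convolution (same output size,
    circular boundary). Apply per channel for RGB images."""
    k = gaussian_kernel(sigma)
    kh, kw = k.shape
    pad = np.zeros(img.shape, dtype=float)
    pad[:kh, :kw] = k
    # Centre the kernel at the origin so the output is not shifted.
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad)))
```

Because the kernel sums to one, a blurred image preserves overall brightness, which matches the intent of retaining coarse features rather than darkening or amplifying them.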
2.1.5 Illumination normalization and insensitive feature preprocessing
For face images acquired at different spectral bands, spatial variation in the sensitivity of camera systems is likely to occur. To minimize this variability across face images of the same class, rgbGELog and the commonly used LSSF illumination normalization technique are employed. Other approaches, such as LN and CFSP, involve the extraction of illumination-insensitive features; these feature preprocessors mostly enhance edges rather than low-level features. To output a color image with LSSF, HE, LN and CFSP, a color version of each preprocessor is used: given an RGB image, each channel plane is processed individually.
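As an example of the per-plane processing just described, histogram equalization applied independently to each RGB channel might look like the following sketch (our illustration, not the authors' implementation; it assumes non-constant uint8 planes):

```python
import numpy as np

def equalize_channel(ch: np.ndarray) -> np.ndarray:
    """Histogram-equalize one uint8 plane via its cumulative distribution."""
    hist = np.bincount(ch.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # map CDF to [0, 1]
    return np.rint(cdf[ch] * 255).astype(np.uint8)

def equalize_rgb(img: np.ndarray) -> np.ndarray:
    """Apply HE to each RGB plane individually, as described in the text."""
    return np.stack([equalize_channel(img[..., c]) for c in range(3)], axis=-1)
```

Equalizing each plane independently spreads the intensity histogram of every channel across the full range, which is what gives HE its contrast-enhancing, partly illumination-insensitive character.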
2.2 Convolutional neural networks
Current trends in the application of deep CNNs show that their possibilities in the real world are endless. A remarkable attribute of deep networks is their ability, given sufficient training data, to process raw pixel data directly, extract and learn deep structural features for discrimination, and generalize incredibly well to new data. More remarkable is that the deep structural features of the network can be transferred irrespective of the domain [28, 29]. For this reason, transfer learning is explored.
2.3 Transfer learning
A deep CNN architecture designed by Szegedy et al., denoted Inception-V3, was used. The fact that the Inception-V3 model is trained on ImageNet, a huge dataset of 1.2 million generalized data samples and 1000 distinct class labels (of faces, objects, places, things, animals, etc.), makes it a good fit for the objective of this study. It is commonly presented in the literature that transfer learning is mainly for situations where training data are insufficient. However, a more significant property of interest in this study is the transfer of knowledge from one (source) domain to an almost unidentical (target) domain. By unidentical it is meant that the data format of the target set may change: it may be of a different color space, reduced grey-level resolution, varying lighting, etc. Work investigating the rich features of the CNN at different layers showed that the lower layers respond to edge-like features, while succeeding layers combine lower-layer and more abstract features, which are finally merged at the higher layers as global features. This is likened to the recognition ability of the human visual cortex, which processes parts of the face individually and puts them together as a global feature to make sense of a person's identity. Related work also showed that the output of the last layer, comprising the high-level feature vectors of a pretrained CNN, generalizes better to a new target dataset than fine-tuning some layers of the network. It is for these reasons that the high-level feature vectors of the Inception-V3 model are useful to this study's face search problem.
3 Experimental setup
For a better understanding of the experimentation carried out in this study, the data, the transfer model settings and the evaluation strategy are presented in the succeeding subsections.
The LFW dataset is commonly used for modelling real-world data. It is well known for intra-class variability resulting from pose, illumination and expression problems. The images are in RGB format and comprise 13,233 face images of 5,749 individuals. The face search problem, with respect to the objective of this study, does not necessarily demand huge data for classifier training. Therefore, only individuals with over 50 images are considered, to enable the classifier to generalize well to the new data.
Each of the 1456 deep-funneled face samples, belonging to 10 person classes, is an RGB image of 250 × 250 resolution, but is resized to 299 × 299 because the pretrained network was trained on images of size 299 × 299 × 3. After resizing, the images are further scaled to the range [0, 1].
3.2 Training the classifier
The training was implemented using Inception-V3 pretrained on ImageNet in a standard software package, TensorFlow-Slim. The lower layers of the network are frozen, while the last (high-level feature) layer of the network is used as provided by the freely available source code. This layer is believed to resemble the human global assembly of individually processed face parts, and is therefore transferred for training new weights and biases on the face search data. Computation was performed on an Intel® Core™ i7-7500U CPU with 4 logical processors. The dataset was split into training, validation and testing sets containing 70%, 5% and 25% of the images, respectively. The validation set controls the training process, while the final accuracy is determined on the test set. The Adam optimizer was used with an exponentially decayed learning rate, from 0.003 initially down to 0.0001 (stepped down every 29 iterations).
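The step-down exponential decay described above can be sketched as a small schedule function. The per-step decay factor below is an assumption, since the text only gives the initial rate, the 0.0001 floor, and the 29-iteration step size:

```python
def stepped_lr(iteration: int, lr0: float = 0.003, floor: float = 1e-4,
               decay: float = 0.94, step: int = 29) -> float:
    """Exponential step-down learning-rate schedule: the rate is multiplied
    by `decay` every `step` iterations and clipped at `floor`. The 0.94
    decay factor is illustrative, not taken from the paper."""
    return max(floor, lr0 * decay ** (iteration // step))
```

The schedule is piecewise constant: the rate holds for 29 iterations, then drops by the decay factor, eventually settling at the floor.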
Performance is evaluated using the Top-1 accuracy metric. Since the data augmentation and normalization preprocessors are commonly applied to aligned data by the CNN research community, this study terms them conventional preprocessors. Consequently, they become the basis for evaluating the unconventional preprocessors in a CNN, under four setups: with_augmentation (WA), without_augmentation (NA), with_normalization (WN), and without_normalization (NN). The performance of each unconventional preprocessor is reported under these categories: color space conversion; illumination normalization and insensitive feature preprocessors; and grey-level resolution degradation. Finally, we present the unconventional preprocessors' performance alongside state-of-the-art methods.
4 Results and discussion
Here, the applicability of unconventional preprocessing alongside conventional preprocessing methods in transfer networks, for a specific target problem such as face search, is observed and reported. The results are discussed in four stages. Stage one: the general performance of the preprocessors, encompassing color space conversion and the illumination normalization and insensitive feature preprocessors. Stage two: a streamlined report on the grey-level degradation preprocessors; here, quantization from 2^24 grey levels down to 7, 6, 5 and 4 levels, and Gaussian kernels with σ = 3, 7, 11, are presented, together with a further report on the accuracy and loss of some preprocessors in this category. Stage three: the face search problem, depicted through the performance of each unconventional preprocessor for a search of a given person's identity. Stage four: a comparison with state-of-the-art results.
4.1 Preprocessor performance in transfer network
[Table: RGB versus preprocessor performance in transfer networks. Columns: WA and WN (%), WA and NN (%), NA and WN (%), NA and NN (%). Rows cover the color space conversion preprocessors and the illumination normalization and insensitive methods, including local contrast normalization and large- and small-scale features. The accuracy values are not recoverable here.]
Surprisingly, the performance of the color space preprocessors showed that input format does not hinder knowledge transfer. That is to say, no matter the input format of the data in a transfer setting, the transferred features still generalize well to new target data. CIELAB was by far the worst color format across the board: a difference of 28.9772% is observed against the RGB format in the augmented and normalized data experiment.
This might be based on the fact that it captures only structural features. Though the RGB has the highest identification accuracy, the YCBCR remained consistent across all the experiments. On average, it outperformed the raw RGB format by a 3.7692% margin.
On the other hand, LSSF appears to perform best within the illumination normalization and insensitive feature preprocessor category. The others performed similarly, except for CFSP, which does not seem promising in a transfer network. The ranking of illumination preprocessors is LSSF first, then LN, followed by HE.
Additionally, it is obvious that different preprocessors react differently with data augmentation and normalization. The raw RGB, HE, CFSP, LN and LSSF perform better when data is normalized and augmented. The YCBCR is best with only data augmentation and rgbGELog is best with only normalized data.
4.2 Grey-level preprocessor performance in transfer network
[Table: Comparing the performance of RGB data and grey-level preprocessors in a transfer network. Columns: WA and WN (%), WA and NN (%), NA and WN (%), NA and NN (%). The accuracy values are not recoverable here.]
In the first experiment, with data augmentation and normalization, quantization to 7 levels using the full-image-based approach outperforms its other counterparts as well as the plane-based quantization approach. The same holds for data quantized to 4 levels with no augmentation and normalization. In the second and third experiments, the 5-level quantization preprocessors performed almost identically whether augmentation or normalization was used. However, the 7-level quantization performs significantly well when the data are normalized.
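The two quantization schemes above can be sketched with a uniform quantizer. Treating "full-based" as one quantizer fitted to the whole image's dynamic range and "plane-based" as a separate quantizer per color plane is our interpretation, not a definition given in the paper:

```python
import numpy as np

def quantize(x: np.ndarray, levels: int, lo: float, hi: float) -> np.ndarray:
    """Uniformly quantize values in [lo, hi] to `levels` grey levels,
    reconstructing each bin at its midpoint."""
    step = (hi - lo) / levels
    idx = np.clip(((x - lo) / step).astype(int), 0, levels - 1)
    return lo + (idx + 0.5) * step

def full_based(img: np.ndarray, levels: int) -> np.ndarray:
    """One quantizer fitted to the full image's range (our interpretation)."""
    return quantize(img, levels, img.min(), img.max())

def plane_based(img: np.ndarray, levels: int) -> np.ndarray:
    """A separate quantizer per color plane (our interpretation)."""
    return np.stack([quantize(img[..., c], levels,
                              img[..., c].min(), img[..., c].max())
                     for c in range(img.shape[-1])], axis=-1)
```

After quantization to 7 levels, each plane carries at most 7 distinct values, so neighboring pixels collapse onto shared values; this is the increased neighborhood homogeneity, at reduced bit depth, that the paper attributes its gains to.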
4.3 The transfer network face search problem
Interestingly, the LSSF, HE, YCbCr, rgbGELog, and full- and plane-based quantization preprocessors were equally useful and competitive at retrieving the right identity for a query image. However, to attain the best performance from any of these preprocessors, each should be used with the conventional preprocessing (augmentation and/or normalization) it responds best to.
The face search experiment also reveals that the raw RGB format only performs well when the data size is increased by augmentation and the distribution of the data is normalized; otherwise, it fails to perform favorably. The LSSF, YCbCr, rgbGELog, and HE maintained good accuracies independent of normalization and/or augmentation. The LSSF in particular performed better than raw RGB in identity retrieval.
4.4 On the state-of-the-art results
The accuracy of the unconventional-preprocessor-driven network is compared with state-of-the-art results on the LFW dataset, particularly to show that preprocessors can contribute to the learning of features that matter to deep networks if properly explored. However, the comparison is limited to deep networks that utilize outside data for training, although the methods under consideration differ in the number of LFW validation samples, evaluation protocols, preprocessors and CNN architectures.
[Table: Unconventional preprocessors compared with state-of-the-art results on LFW. Each row lists a method's training data (images, identities) and preprocessing: Facebook (4.4 M, 4 K); CelebFaces+ (0.2 M, 10 K); VGGFace (2.6 M, 2.6 K) with alignment, crop, PCA; CASIA-WebFace (0.49 M, 10 K); MS-Celeb-1M (3.5 M, 31 K); MS-Celeb-1M + CASIA-WebFace (5 M, 100 K) and CASIA-WebFace (0.5 M, 10 K) with face synthesis from facial parts; VGGFace (1.8 M, 1 K) with Gaussian blur, σ = 3 and σ = 11, on the probe set; and our ImageNet-based (1.2 M, 1 K) setups with patch-based quantization to 7 levels, Gaussian blur with σ = 3 and σ = 11, and augmentation. The accuracy values are not recoverable here.]
Our preprocessor-based deep network differs in goal from the prior work, which sought to understand the performance of a deep network model when probed with images degraded by blur from camera settings. That work applied Gaussian blur of different sigma levels only to the probe set, and its results clearly showed that blurred images severely reduce the accuracy of a deep network. We, on the other hand, applied Gaussian blur with the goal of preprocessing to retain features of significance from fine to coarse, to see how it fits within a CNN architecture. From our experiment it is evident that Gaussian blur is a promising preprocessor for deep networks. The performance of blur with σ = 3 over σ = 11, graphed in Fig. 5, confirms our earlier analysis of performance consistency.
Contrary to what is commonly believed, the preprocessing module has proved significant in deep learning frameworks. In this paper, the various facets of the CNN architecture were kept constant while the preprocessing module was varied across different preprocessing algorithms.
The HE, full-based and plane-based quantization, rgbGELog, and YCbCr preprocessors (the unconventional preprocessors) showed that the discriminative capability of deep networks can be improved by preprocessing the raw RGB data prior to feeding it to the network. However, the best performance of these preprocessors was achieved by considering, under various preprocessing setups, data augmentation and/or normalization (the conventional preprocessors). Even though the raw RGB format performed well, quantizing a 2^24 grey-level image to 7 levels and Gaussian blur with σ = 3 outperformed RGB, achieving above 72% and 99% accuracy, respectively, with data normalization and no augmentation. These were found to be effective preprocessing strategies in deep networks because they both increase the homogeneity of neighborhood pixels while utilizing a reduced bit depth for better storage efficiency.
The authors received no specific funding for this work.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
- 1.Dakin SC, Watt RJ (2009) Biological bar codes in human faces. J Vis 9:1–10
- 4.Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
- 5.Gudi et al (2015) Deep learning based FACS action unit occurrence and intensity estimation. In: 11th IEEE international conference and workshops on automatic face and gesture recognition, pp 1–5
- 6.Khorrami P, Paine T, Huang T (2015) Do deep neural networks learn facial action units when doing expression recognition? In: Proceedings of the IEEE international conference on computer vision workshops, pp 19–27
- 7.Mollahosseini A, Chan D, Mahoor MH (2016) Going deeper in facial expression recognition using deep neural networks. In: Proceedings applications of computer vision, pp 1–10
- 9.Pal KK, Sudeep KS (2016) Preprocessing for image classification by convolutional neural networks. In: IEEE international conference on recent trends in electronics, information and communication technology, pp 1778–1781
- 10.Reddy KS, Singh U, Uttam PK (2017) Effect of image colourspace on performance of convolution neural networks. In: IEEE international conference on recent trends in electronics, information and communication technology, pp 2001–2005
- 11.Dodge S, Karam L (2016) Understanding how image quality affects deep neural networks. In: 8th international conference on quality of multimedia experience, pp 1–6
- 13.Karahan S, Yildirum MK, Kirtac K, Rende FS, Butun G, Ekenel HK (2016) How image degradations affect deep CNN-based face recognition? In: IEEE international conference in biometrics special interest group, pp 1–5
- 14.Wu J et al (2016) Quantized convolutional neural networks for mobile devices. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 4820–4828
- 19.Ghazi MM, Ekenel HK (2016) A comprehensive analysis of deep learning-based representation for face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 34–41
- 20.Dosovitskiy A, Springenberg JT, Riedmiller M, Brox T (2014) Discriminative unsupervised feature learning with convolutional neural networks. In: Advances in neural information processing systems, pp 766–774
- 25.Paulin M (2014) Transformation pursuit for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3646–3653
- 26.Huang G, Mattar M, Lee H, Learned-Miller EG (2012) Learning to align from scratch. In: Advances in neural information processing systems, pp 764–772
- 27.Huang G, Mattar M, Lee H, Learned-Miller EG (2008) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In: Workshop on faces in 'Real-Life' Images: detection, alignment, and recognition
- 28.Patricia N, Caputo B (2014) Learning to learn, from transfer learning to domain adaptation: a unifying perspective. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1442–1449
- 30.Szegedy C (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
- 31.Deng J (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings computer vision and pattern recognition, pp 248–255
- 32.Oquab M, Bottou L, Laptev I, Sivic J (2014) Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1717–1724
- 33.Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Advances in neural information processing systems, pp 3320–3328
- 35.Xia X, Xu C, Nan B (2017) Inception-v3 for flower classification. In: International conference on image, vision and computing, pp 783–787
- 36.Taigman Y, Yang M, Ranzato M, Wolf L (2014) DeepFace: closing the gap to human-level performance in face verification. In: CVPR, pp 1701–1708
- 37.Parkhi MO, Vedaldi A, Zisserman A et al (2015) Deep face recognition. BMVC 1:6–17
- 38.Liu W, Wen Y, Yu Z, Yang M (2016) Large-margin softmax loss for convolutional neural networks. In: ICML, pp 507–516
- 39.Liu W, Wen Y, Yu Z, Li M, Raj B, Song L (2017) SphereFace: deep hypersphere embedding for face recognition. In: CVPR, pp 5690–4699
- 40.Zheng Y, Pal DK, Savvides M (2018) Ring loss: convex feature normalization for face recognition. In: CVPR, pp 5089–5097
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.