2D and 3D Palmprint and Palm Vein Recognition Based on Neural Architecture Search

Palmprint recognition and palm vein recognition are two emerging biometrics technologies. In the past two decades, many traditional methods have been proposed for palmprint recognition and palm vein recognition and have achieved impressive results. In recent years, in the field of artificial intelligence, deep learning has gradually become the mainstream recognition technology because of its excellent recognition performance. Some researchers have tried to use convolutional neural networks (CNNs) for palmprint recognition and palm vein recognition. However, the architectures of these CNNs have mostly been developed manually by human experts, which is a time-consuming and error-prone process. In order to overcome some shortcomings of manually designed CNNs, neural architecture search (NAS) technology has become an important research direction of deep learning. NAS aims to automate the design and tuning of deep learning models, a problem at the intersection of optimization and machine learning, and it represents the future development direction of deep learning. However, up to now, NAS technology has not been well studied for palmprint recognition and palm vein recognition. In this paper, in order to investigate the problem of NAS-based 2D and 3D palmprint recognition and palm vein recognition in depth, we conduct a performance evaluation of twenty representative NAS methods on five 2D palmprint databases, two palm vein databases, and one 3D palmprint database. Experimental results show that some NAS methods can achieve promising recognition results. Remarkably, among the evaluated NAS methods, ProxylessNAS achieves the best recognition performance.


Introduction
In today's digital and intelligent society, more and more application scenarios need to authenticate people's identity effectively. Biometric technology is considered to be one of the most effective solutions for personal authentication. The so-called biometrics refers to the technology that uses the human body's physical or behavioral characteristics to identify individuals through image processing, computer vision, pattern recognition and other techniques. Generally speaking, face recognition, fingerprint recognition and iris recognition are the three most successful biometric technologies and have been widely used. However, different biometric technologies have their own advantages and disadvantages. In other words, no single biometric technology can meet the needs of all applications of personal authentication. Therefore, academic and industrial circles are developing different biometric technologies to meet the application requirements of different scenarios.
In recent years, palmprint recognition and palm vein recognition have become two new biometric recognition technologies, which have attracted great attention [1−4]. Palmprint recognition refers to technology that conducts personal authentication based on the palm skin images of human hands. According to the resolution and data type of the palmprint image, palmprint recognition can be divided into 2D palmprint recognition and 3D palmprint recognition. Furthermore, 2D palmprint recognition can be further divided into low-resolution palmprint recognition and high-resolution palmprint recognition. High-resolution palmprint recognition is generally used for forensic purposes, while low-resolution palmprint recognition and 3D palmprint recognition are mainly used for civilian purposes. Palm vein recognition refers to technology that uses palm vein images captured under near-infrared light for personal authentication. Palm vein recognition is also mainly used for civilian purposes. Since palmprint and palm vein images are both collected from the palm, and their recognition methods are similar to some extent, some researchers study them simultaneously. In this paper, we only pay attention to civilian use of biometrics technology, so we mainly study 2D low-resolution palmprint recognition, 3D palmprint recognition, and palm vein recognition. In the rest of this paper, for the sake of convenience, we will write 2D low-resolution palmprint recognition as 2D palmprint recognition.
Researchers have proposed many effective methods for 2D and 3D palmprint recognition and palm vein recognition, which can be divided into two groups, i.e., traditional methods and deep learning-based methods. Generally, traditional methods are based on hand-crafted features and traditional machine learning techniques. Different from traditional methods, deep learning can automatically learn features from images, videos or texts. The highly flexible architecture of deep learning can learn directly from the original data, and the prediction accuracy will be improved after more data are obtained. Nowadays, deep learning has become one of the most important technologies in the field of artificial intelligence. In recent years, the explosive progress made in computer vision, speech recognition, natural language processing, robotics and other fields almost all depend on deep learning technology [5−8] .
In the field of biometrics, especially in face recognition, deep learning has become the most mainstream technology [9]. The convolutional neural network (CNN) is one of the most important branches of deep learning. For image-based biometrics technologies, the CNN is the most commonly used deep learning technique [9]. To date, many classic CNNs have been proposed and have achieved impressive results for many recognition tasks. The success of these CNNs is mainly attributed to the automation of the feature engineering process: A layered feature extractor learns from data in an end-to-end manner. With this success, there is a growing demand for architecture engineering, and more and more complex neural architectures are designed manually. That is, currently employed architectures have mostly been developed by human experts, which is a time-consuming and error-prone process. In order to overcome some shortcomings of manually designed CNNs, neural architecture search (NAS) technology has become an important research direction of deep learning [10−14]. The core idea of NAS is to use a search algorithm to find the neural network structure needed to solve the problem at hand. NAS aims to automate the tuning of deep learning models, a problem at the intersection of optimization and machine learning. The concept of NAS was first proposed by Zoph and Le [15] at the International Conference on Learning Representations (ICLR) in 2017, and has become a fundamental and active research direction of deep learning.
With the continuous improvement of deep learning network architecture and the increasing amount of data, the recognition accuracy of deep learning in different biometrics tasks is also increasing. For example, in the field of face recognition, the recognition accuracy of deep learning has far exceeded the traditional hand-crafted algorithms; thus, deep learning has successfully promoted the large-scale application of face recognition technology. However, in the fields of 2D and 3D palmprint recognition and palm vein recognition, the related research based on deep learning is still preliminary. A lot of researchers have used some classic CNNs or manually designed CNNs for 2D and 3D palmprint recognition and palm vein recognition. Nevertheless, up to now, NAS technology has not been well studied for 2D and 3D palmprint recognition and palm vein recognition. Because NAS technology represents the future development direction of deep learning, it is vital to systematically investigate the recognition performance of NAS methods for 2D and 3D palmprint recognition and palm vein recognition. To this end, we conduct the performance evaluation of NAS methods on 2D and 3D palmprint recognition and palm vein recognition in this paper. Particularly, twenty representative NAS methods are selected and exploited for performance evaluation.
It should be noted that the samples within the above databases are captured in two different sessions at certain time intervals. If the training samples are only from the first session, and the test samples are from the second session, we call this experimental mode the "separate data mode". If the training samples are from both sessions, we call this experimental mode the "mixed data mode". In traditional recognition methods, some samples captured in the first session are usually used as training sets, while all the samples captured in the second session are used as the test set. Therefore, the experiments of those traditional recognition methods were usually conducted in the "separate data mode". However, in existing deep learning-based palmprint recognition and palm vein recognition methods, the experiments were usually conducted in the "mixed data mode". Thus, it is easy to obtain a high recognition accuracy. In this paper, we will conduct experiments in both "separate data mode" and "mixed data mode" to observe the recognition performance of representative NAS methods in these two different modes.
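The two protocols above can be illustrated with a deliberately small sketch (the sample records and field names below are synthetic, invented purely for illustration); the only difference between the modes is how training and test sets are drawn from the two sessions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sample records; in a real database each record would be an
# image with a subject label and the session in which it was captured.
samples = [{"id": i, "session": 1 + i % 2} for i in range(20)]

def separate_mode(samples):
    """Train only on first-session samples; test on all second-session samples."""
    train = [s for s in samples if s["session"] == 1]
    test = [s for s in samples if s["session"] == 2]
    return train, test

def mixed_mode(samples, train_ratio=0.5):
    """Train and test sets both drawn from both sessions (the easier protocol)."""
    idx = rng.permutation(len(samples))
    k = int(len(samples) * train_ratio)
    return [samples[i] for i in idx[:k]], [samples[i] for i in idx[k:]]
```

Because the mixed data mode lets the model see second-session appearance variations during training, it tends to yield higher accuracy than the separate data mode.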
The main contributions of our work are as follows.
1) We briefly summarize some important NAS methods, which can help readers better understand the development history of NAS technology.
2) We conduct a performance evaluation of representative NAS methods for 2D and 3D palmprint and palm vein recognition. To the best of our knowledge, this is the first time such an evaluation has been conducted. In the field of biometrics in particular, this is also the first work to evaluate the recognition performance of representative NAS methods.
3) We evaluate the performance of representative NAS methods on the Hefei University of Technology cross-sensor palmprint database. This is the first time that palmprint recognition across different devices has been investigated using NAS technology.
4) We investigate the recognition performance of NAS methods in both the "separate data mode" and the "mixed data mode".
The rest of this paper is organized as follows. Section 2 presents the related work. Section 3 briefly introduces NAS technology. Section 4 introduces the selected NAS methods in detail. Section 5 introduces the 2D and 3D palmprint and palm vein databases used for evaluation. Extensive experiments are conducted and reported in Section 6. Section 7 offers the concluding remarks.

Traditional 2D palmprint recognition methods
For 2D palmprint recognition, researchers have proposed many traditional methods. Kong et al. [1], Zhang et al. [2], Fei et al. [3], and Zhong et al. [4] have published several surveys on traditional palmprint recognition methods. As shown in Fig. 1, these traditional methods can be classified into different subcategories, such as palm line-based, texture-based, orientation coding-based, correlation filter-based, and subspace learning-based. The palm line is the primary feature of the palmprint, so some researchers have tried to extract palm lines for palmprint recognition. However, due to the complexity of palmprint images, it is still difficult to extract palm lines accurately. Palmprint images contain obvious texture features, and researchers have therefore proposed many texture-based palmprint recognition methods. These methods usually exploit sparse descriptors, dense descriptors, or other texture representations, such as Gabor and wavelet features. Thus, texture-based methods can be further divided into three subtypes, i.e., Gabor and wavelet-based methods, dense texture descriptor-based methods, and sparse texture descriptor-based methods. Notably, some dense texture descriptors have achieved promising recognition results. A palmprint contains many palm lines, and each line has its own orientation. Orientation features are insensitive to variations such as illumination changes; thus, orientation is a robust palmprint feature. Many orientation coding-based methods have been proposed, which have high accuracy and fast matching speed. Generally, orientation coding-based methods first detect the orientation of each pixel, then encode the orientation number into a bit string, and finally exploit the Hamming distance for matching. Recently, correlation filter-based methods have also been successfully used for biometrics, and they likewise offer high accuracy and fast matching speed.
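The coding-and-matching stage of such methods can be sketched as follows. This is a toy illustration, not a faithful implementation of any published coder, and it omits the oriented filter bank that would normally produce the per-pixel orientation indices (here random maps stand in for that winner-take-all output):

```python
import numpy as np

rng = np.random.default_rng(0)

N_ORIENT = 6  # orientation coding schemes typically quantize into 6 directions

def encode_orientations(orient_idx):
    """Pack per-pixel orientation indices (0..N_ORIENT-1) into 3 binary
    bit-planes, so that matching reduces to cheap bitwise comparisons."""
    return np.stack([(orient_idx >> b) & 1 for b in range(3)]).astype(np.uint8)

def hamming_distance(code_a, code_b):
    """Normalized Hamming distance between two codes (0 = identical)."""
    return float(np.mean(code_a != code_b))

# Usage: two orientation maps of a 32x32 ROI.
probe = rng.integers(0, N_ORIENT, size=(32, 32))
gallery = probe.copy()
gallery[:4, :4] = (gallery[:4, :4] + 1) % N_ORIENT  # perturb one corner

print(hamming_distance(encode_orientations(probe), encode_orientations(probe)))    # 0.0
print(hamming_distance(encode_orientations(probe), encode_orientations(gallery)))  # small, nonzero
```

Real coders (e.g., competitive-code variants) use angular rather than plain bitwise distances, but the bit-string representation is what makes matching fast.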
Subspace learning has been one of the important techniques for pattern recognition. Some subspace learning-based methods have been used for palmprint recognition. However, the recognition performance of subspace learning-based methods is sensitive to illumination changes and other image variations.

Traditional 3D palmprint recognition methods
Fei et al. [23] surveyed traditional 3D palmprint recognition methods. Generally, 3D palmprint data preserves the depth information of the palm surface. The originally captured 3D palmprint data consists of small positive or negative floating-point values. For practical feature extraction, the original 3D palmprint data is usually transformed into grey-level values. To this end, the original data is usually converted into curvature-based representations to facilitate the design of recognition algorithms. The two most important curvatures are the mean curvature (MC) and the Gaussian curvature (GC), and their corresponding images are the mean curvature image (MCI) and the Gaussian curvature image (GCI).
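The MCI/GCI idea can be sketched as follows. This is a generic differential-geometry implementation for a depth map z = f(x, y); the exact smoothing and grey-level normalization used in the 3D palmprint literature may differ:

```python
import numpy as np

def to_grey(c):
    """Min-max normalize a curvature map to an 8-bit grey-level image."""
    lo, hi = float(c.min()), float(c.max())
    return np.round(255.0 * (c - lo) / (hi - lo + 1e-12)).astype(np.uint8)

def curvature_images(depth, spacing=1.0):
    """Compute mean-curvature (MCI) and Gaussian-curvature (GCI) grey-level
    images from a depth map, plus the raw curvature maps H and K."""
    fy, fx = np.gradient(depth, spacing)   # first-order partial derivatives
    fyy, _ = np.gradient(fy, spacing)      # second-order derivatives
    fxy, fxx = np.gradient(fx, spacing)
    g = 1.0 + fx**2 + fy**2
    H = ((1 + fy**2) * fxx - 2 * fx * fy * fxy + (1 + fx**2) * fyy) / (2 * g**1.5)
    K = (fxx * fyy - fxy**2) / g**2
    return to_grey(H), to_grey(K), H, K
```

A convenient sanity check: on a spherical cap of radius R, the apex has H ≈ −1/R and K ≈ 1/R².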

Traditional palm vein recognition methods
As shown in Fig. 3, traditional palm vein recognition methods can be divided into the following subcategories: structure-based, texture-based, orientation coding-based, and subspace learning-based. Structure-based methods usually first apply an image segmentation or line detection algorithm and then extract structural features of the palm vein, such as lines, skeletons, points, minutiae, and graphs, for recognition. Thus, structure-based methods can be further divided into three subtypes, i.e., line/skeleton-based methods, point/minutiae-based methods, and graph-based methods. The texture-based, orientation coding-based, and subspace learning-based methods used for palm vein recognition are similar to those used for palmprint recognition.

2D and 3D palmprint recognition and palm vein recognition methods based on deep learning
Many researchers have studied 2D and 3D palmprint recognition and palm vein recognition based on deep learning. Some representative 2D palmprint recognition methods based on deep learning are as follows. Zhang et al. [22] proposed the method of PalmRCNN for palmprint recognition, which is a modified version of Inception-ResNet-V1. Genovese et al. [24] proposed the method of PalmNet, a CNN that tunes palmprint-specific filters through an unsupervised procedure based on Gabor responses and principal component analysis (PCA). Zhong and Zhu [25] proposed an end-to-end method for open-set 2D palmprint recognition by applying a CNN with a novel loss function, i.e., the centralized large margin cosine loss (C-LMCL). In order to solve the problem of palmprint recognition in an uncontrolled and uncooperative environment, Matkowski et al. [26] proposed the end-to-end palmprint recognition network (EE-PRnet), consisting of two main networks, i.e., the ROI localization and alignment network (ROI-LAnet) and the feature extraction and recognition network (FERnet). Zhao and Zhang [27] proposed a deep discriminative representation (DDR) for palmprint recognition. DDR uses several CNNs similar to VGG-F to extract deep features from global and local palmprint images, and uses the collaborative representation-based classifier (CRC) for recognition. Zhao and Zhang [28] presented a joint constrained least-square regression (JCLSR) model with deep local convolution features for palmprint recognition. Zhao et al. [29] also proposed a joint deep convolutional feature representation (JDCFR) methodology for hyperspectral palmprint recognition. Liu and Kumar [30] proposed a generalizable deep learning-based framework for contactless palmprint recognition, in which the network is based on a fully convolutional network that generates deeply learned residual features.
Some representative palm vein recognition algorithms based on deep learning are as follows. Zhang et al. [22] released a new touchless palm vein database and used the method of PalmRCNN for palm vein recognition. Lefkovits et al. [31] applied four CNNs for palm vein identification, including AlexNet, VGG-16, ResNet-50, and SqueezeNet. Thapar et al. [32] proposed the method of PVSNet, in which a Siamese network was trained using a triplet loss. Chantaf et al. [33] applied Inception-V3 and SmallerVGGNet for palm vein recognition. Stanuch et al. [34] proposed a contact-free multispectral palm vein recognition system using a designed CNN, whose architecture comprises ten layers: five convolutional layers, four max-pooling layers, and one dense layer.
In our previous work [35] , we systematically investigated the recognition performance of classic CNNs for 2D and 3D palmprint recognition and palm vein recognition. Seventeen representative and classic CNNs were ex-

A brief introduction of NAS technology
NAS is a sub-field of automated machine learning (AutoML). The goal of NAS is to design a network architecture with the best possible performance, with the least human intervention and limited computing resources. The papers [15] and [36] are considered the pioneering works of NAS. In [15], the network structure obtained by reinforcement learning (RL) achieves very promising accuracy on image classification tasks, which shows that the idea of automated network architecture design is feasible. The development of NAS technology has been very rapid. At the same time, NAS technology is being widely applied to various tasks, such as classification, object detection, semantic segmentation, language modeling, and data augmentation.
Despite the short development history of NAS, there have been many papers published, including five survey papers [10−14] . In 2019, Elsken et al. [10] provided an overview of existing NAS methods and categorized them according to three dimensions: search space, search strategy, and performance estimation strategy. Wistuba et al. [11] provided a formalism that unifies the landscape of existing NAS methods. This formalism can be used to critically examine the different approaches and understand the benefits of the different components that contribute to the design and success of NAS. Wistuba et al. [11] also highlighted some popular misconception pitfalls in the current trends of NAS technology. Ren et al. [12] provided a new perspective of NAS technology: Starting with an overview of the characteristics of the earliest NAS algorithms, a summary of the problems in these early NAS algorithms, and then giving solutions for subsequent related research work. Ren et al. [12] also conducted a detailed and comprehensive analysis, comparison and summary of existing NAS works and gave possible future research directions. Hu and Yu [13] surveyed NAS technology from a technical view. By summarizing the previous NAS approaches, Hu and Yu [13] drew a picture of NAS from different aspects, including problem definition, search approaches, progress towards practical applications and possible future directions. He et al. [14] compared the performance and efficiency of existing NAS algorithms on the CIFAR-10 and ImageNet datasets and provided an in-depth discussion of different research directions on NAS, including one/two-stage NAS, one-shot NAS, and joint hyperparameter and architecture optimization.
Almost all NAS methods are organized around three components: search space, optimization method, and evaluation method. Fig. 4 shows an abstract illustration of the NAS methods.
1) Search space. The search space is the set of possible neural network architectures. Its design varies with the application scenario, e.g., computer vision tasks versus language modeling tasks. In this sense, NAS does not completely eliminate manual design; rather, it searches over and recombines manually designed structural components, and the number of candidate architectures usually reaches a very large order of magnitude.
2) Optimization method. The optimization method determines how the search space is explored, and a good optimization method often plays a key role. Although many optimization methods exist, they share the same goal of finding a better network architecture, and most are built on classical optimization techniques such as reinforcement learning, evolutionary search, gradient-based optimization, and Bayesian optimization.
3) Evaluation method. The evaluation method estimates the quality of candidate network structures. Common evaluation methods include the full training mode, the partial training mode, and NAS-specific evaluation methods. The full training mode is time-consuming, since it usually requires thorough training of all searched models, while the partial training mode stops the training process early, saving cost and time. Among the NAS-specific evaluation methods, network morphism, weight sharing and hypernetworks are often used as heuristic quality assessments. In general, the partial training mode is typically an order of magnitude cheaper than the full training mode, while NAS-specific evaluation methods are 2−3 orders of magnitude cheaper than the full training mode.
Many NAS methods have been proposed. Based on the NAS survey papers and recent developments, we selected many important NAS methods and listed them in Tables 1−3 according to the publishing year. It can be seen that most NAS papers have been published after 2017 and at top artificial intelligence conferences such as CVPR, ICML, ICCV, ECCV, ICLR, and NeurIPS. In fact, the number of papers on NAS is increasing rapidly. More comprehensive and up-to-date lists of NAS papers can be found at https://github.com/D-X-Y/Awesome-AutoDL and https://www.automl.org.
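The interplay of the three components can be illustrated with a deliberately tiny sketch, using plain random search as the optimization method and a stand-in proxy score in place of actually training candidates (the search space and scoring function below are invented purely for illustration):

```python
import random

# Component 1: the search space, here a trivial grid of hyperparameters.
SEARCH_SPACE = {
    "depth": [8, 14, 20],
    "width": [16, 32, 64],
    "kernel": [3, 5, 7],
}

def sample_architecture(rng):
    """Draw one candidate architecture from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    """Component 3: a fabricated proxy score. A real NAS would partially
    train the candidate (or use weight sharing) and return validation
    accuracy; this toy score peaks at depth=14 and the largest width."""
    return 1.0 / (1 + abs(arch["depth"] - 14)) + arch["width"] / 128

def random_search(n_trials=200, seed=0):
    """Component 2: the optimization method, here plain random search."""
    rng = random.Random(seed)
    return max((sample_architecture(rng) for _ in range(n_trials)), key=evaluate)
```

More sophisticated optimization methods (RL controllers, evolutionary search, gradient-based relaxations) replace `random_search`, and cheaper evaluation methods replace `evaluate`, but the overall loop keeps this shape.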

Selected representative NAS methods for performance evaluation
The classification task is one of the important applications of NAS technology. As can be seen from Tables 1−3, most NAS methods are dedicated to finding a robust classification model. Fig. 5 shows the chronology of representative NAS methods for the classification task. These methods play an important role in the development history of NAS. Here, we briefly introduce them by year of publication.
In 2017, Zoph and Le [15] published the first paper that proposed the concept of NAS. Their work represents a network structure as a variable-length string and learns a good structure through reinforcement learning (RL): a recurrent neural network (RNN) controller generates descriptions of neural network models, and the controller is trained to maximize the accuracy of the generated models.
In 2018, Zoph et al. [44] proposed the method of NASNet. NASNet shifts the search space from searching hyperparameters to searching block (cell) structures, and its accuracy reaches the state of the art (SOTA). Moreover, in this method, Zoph et al. [44] proposed to search on proxy datasets, i.e., small datasets (such as CIFAR-10), and then transfer the found architecture to large datasets (such as ImageNet). Brock et al. [41] proposed the method of SMASH. SMASH uses an auxiliary network to initialize the parameters of different candidate networks, avoiding retraining from scratch and greatly reducing training time. Liu et al. [39] proposed the method of PNASNet to learn CNN structures, which is more efficient than NAS methods based on reinforcement learning and evolutionary algorithms. In particular, a sequential model-based optimization strategy is used in PNASNet. Luo et al. [40] proposed the method of NAONet, a new approach to optimizing network architectures that maps architectures into a continuous vector space. NAONet uses an encoder and a performance predictor to perform gradient-based optimization in this continuous space, finds a new code with higher predicted accuracy, and decodes it back into a network with a decoder.
In 2019, Xie et al. [62] proposed the method of SNAS. SNAS directly optimizes the objective function of the NAS task, jointly optimizing the expected network loss and the expected forward latency, so as to automatically generate hardware-friendly sparse networks. Real et al. [51] proposed the method of AmoebaNet. AmoebaNet adopts the search space of NASNet [44], and its network structure is similar to that of Inception [110]. In AmoebaNet, an ageing-evolution algorithm is used to achieve better results. Pham et al. [61] proposed the method of ENAS, an economical automatic model design method. By forcing all sub-models to share weights, ENAS overcomes the huge and time-consuming computing power requirements of earlier NAS methods and reduces GPU computing time by more than 1 000 times. Cai et al. [66] proposed a NAS method without proxy tasks called ProxylessNAS. ProxylessNAS can directly search architectures for large-scale target tasks, which addresses the large GPU memory consumption and long computation time of previous NAS methods. Liu et al. [67] proposed the method of DARTS for efficient architecture search. Instead of searching in a discrete set of candidate structures, DARTS relaxes the search space to a continuous domain, so that the architecture can be optimized by gradient descent on the validation set performance. The method of FairNAS was proposed by Chu et al. [48]. FairNAS inherits and develops the one-shot paradigm in the NAS community. It argues that fair sampling and training can exploit the potential of every module, and therefore proposes a strict fairness constraint: in every single iteration of the supernetwork, the parameters of each optional operation module of each layer are trained. Dong and Yang [60] proposed the method of GDAS, which uses gradient descent to realize efficient network structure search. GDAS treats the search space as a directed acyclic graph, uses a differentiable sampler to sample substructures, and optimizes the sampler through the validation loss of the sampled structures. Howard et al. [77] proposed the method of MobileNet-V3, a new lightweight network structure based on MobileNet-V2 [111]. It is searched by MNASNet [55] and NetAdapt [45]. MobileNet-V3 contains a large version and a small version to cope with different resource consumption scenarios. Moreover, MobileNet-V3 has been successfully used in object detection and semantic segmentation tasks. Wu et al. [58] proposed a differentiable neural architecture search framework called DNAS, which uses a gradient-based method to optimize the convolutional network structure and avoids exhaustively and independently training each candidate structure.
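Several of the gradient-based methods in this chronology (DARTS and its descendants) share the same core trick: a continuous relaxation in which each edge computes a softmax-weighted mixture of all candidate operations, so that the architecture parameters can be optimized by gradient descent alongside the network weights. It can be sketched with toy one-dimensional "operations" (real implementations mix actual convolutional layers and learn the alpha parameters by backpropagation):

```python
import numpy as np

# Toy candidate operations on one edge of the cell's directed acyclic graph.
CANDIDATE_OPS = [
    lambda x: x,                  # identity / skip connection
    lambda x: np.zeros_like(x),   # the "none" operation
    lambda x: np.maximum(x, 0),   # stand-in for a conv + ReLU branch
]

def mixed_op(x, alpha):
    """Softmax-weighted sum of all candidate operations (one relaxed edge)."""
    w = np.exp(alpha - alpha.max())
    w = w / w.sum()
    return sum(wi * op(x) for wi, op in zip(w, CANDIDATE_OPS))

def discretize(alpha):
    """After the search, keep only the strongest operation on the edge."""
    return int(np.argmax(alpha))
```

When one architecture weight dominates, the mixed edge behaves like that single operation, which is what justifies the final discretization step.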
FBNets, the family of network structures generated by the DNAS search framework, surpass both manually designed and automatically generated state-of-the-art models. Tan et al. [55] proposed an automated mobile NAS method called MNASNet, which explicitly incorporates model latency into the main objective.
The search can identify a model that achieves a good trade-off between accuracy and latency. Chu et al. [53] proposed ScarletNAS, which supports scalability and solves the fairness problem of training scalable hypernetworks in one-shot approaches through a linear equivalent transformation. Tan and Le [47] proposed EfficientNet. They used NAS to search for a baseline network that balances accuracy and FLOPs, and then jointly scaled its depth, width and resolution to obtain a family of stronger EfficientNets. In 2020, Chu et al. [82] designed a mobile-GPU-aware model from a practical standpoint, called MoGA. PC-DARTS was proposed by Xu et al. [84] As an extension of DARTS, PC-DARTS reduces the memory consumption and computation time of network search through partial channel connections. Guo et al. [108] constructed a simplified hypernetwork called Single-Path-SuperNet, which is trained with a uniform path sampling method so that all substructures (and their weights) are trained fully and equally. Based on the trained hypernetwork, the optimal substructure can be quickly found by an evolutionary algorithm, without fine-tuning any substructure. Based on the idea of knowledge distillation, Li et al. [90] proposed the distill neural architecture (DNA) method, which introduces a teacher model to guide the network architecture search. Using supervision from different depths of the teacher model, the original end-to-end search space is divided into blocks so that independent blocks can be trained with weight sharing, which significantly reduces the interference caused by weight sharing. Wan et al. [93] proposed FBNet-V2, which takes both memory and efficiency into account; it uses a masking mechanism for feature-map reuse and effective shape propagation to obtain better accuracy. Guo et al.
[91] studied neural network architectures that can resist adversarial attacks and proposed RobNet. To obtain the large number of networks needed for this study, they used one-shot neural architecture search to train a supernet and then fine-tuned the sub-networks sampled from it.
In this section, we introduce the selected NAS methods in detail as follows.
1) NASNet
NASNet [44] was designed to make the learned structure transferable. The best cell structure is found on the CIFAR-10 dataset, then stacked several times and applied to the ImageNet dataset. In addition, a new regularization technique called scheduled drop path is proposed in this method. Fig. 6(a) shows the controller model architecture for recursively constructing one block of a convolutional cell. Each block requires selecting 5 discrete parameters, each of which corresponds to the output of a softmax layer. Fig. 6(b) shows the architecture of the best convolutional cells with B = 5 blocks identified on CIFAR-10. NASNet has different versions, including NASNet-A, NASNet-B, NASNet-C and NASNet-mobile. NASNet-A gives the best recognition performance, and NASNet-mobile is a lightweight network. In this paper, NASNet-A and NASNet-mobile are used for performance evaluation.
2) SMASH
SMASH [41] trains an auxiliary model, a hypernet, that dynamically generates the weights of a main model with variable architecture, so candidate models can be evaluated during the search. Although the generated weights are worse than those obtained by freely training a fixed network, the relative performance of different networks early in training provides meaningful guidance about their performance in the optimal state. In addition, a memory-bank based network representation mechanism is developed to encode diverse network structures.
3) PNASNet
PNASNet [39] can learn a CNN that matches the previous state of the art while requiring five times fewer model evaluations during architecture search. The starting point of this work is the structured search space proposed by NASNet, in which the task of the search algorithm is to search for a suitable convolutional cell rather than a complete CNN. A cell contains B blocks; a block is a combination operator (such as addition) applied to two inputs (tensors), each of which can be transformed (e.g., using convolution) before combining. Then, according to the size of the training set and the desired running time of the CNN, the cell structure is stacked a certain number of times. This modular design also allows us to easily transfer an architecture from one dataset to another. Fig. 7(a) shows the best cell structure found by PNASNet, consisting of 5 blocks, and Fig. 7(b) shows the construction of CNNs from cells on CIFAR-10 and ImageNet.
4) NAONet
Fig. 8 shows the general framework of NAONet [40]. NAONet consists of three parts: encoder, predictor and decoder. Experimental results showed that the architecture found by NAONet performs well in both the CIFAR-10 image classification task and the Penn Treebank (PTB) language modeling task, and is better than or comparable to previous architecture search methods with significantly reduced computing resources.

5) SNAS
Compared with ENAS, the search optimization of SNAS is differentiable, and its search efficiency is higher [62]. Compared with other differentiable methods such as DARTS, SNAS directly optimizes the objective function of the NAS task, and the searched structures are more robust and efficient across tasks. In addition, since SNAS retains the advantage of stochasticity, Xie et al. [62] further proposed to jointly optimize the expectation of the network loss function and the expectation of the network forward latency, in order to automatically generate hardware-friendly sparse networks.

6) AmoebaNet
AmoebaNet [51] improves the tournament selection method of the genetic algorithm, changing it into an age-based selection method, namely the ageing evolution algorithm, which makes the genetic algorithm prefer young individuals. Experiments show that, under the same hardware conditions, the algorithm searches faster than reinforcement learning and random search. The ageing evolution algorithm has the following six steps: i) P neural network architectures are randomly initialized, trained, and added to a queue to form the population; ii) S neural networks are sampled from the population, and the one with the highest accuracy is selected as the parent; iii) A child architecture is produced by mutating the parent; iv) The child is trained and added to the population, i.e., the rightmost side of the queue; v) The "oldest" neural network in the population, i.e., the leftmost element of the queue, is removed; vi) Go back to Step ii) and repeat for a certain number of cycles.
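The loop above can be sketched as follows; `random_arch`, `mutate` and `evaluate` are hypothetical stand-ins for the real search-space operations, and the queue ordering realizes the age-based removal:

```python
import random
from collections import deque

def ageing_evolution(random_arch, mutate, evaluate, population_size=20,
                     sample_size=5, cycles=100):
    """Sketch of the ageing-evolution loop described above."""
    population = deque()          # queue ordered by age (left = oldest)
    history = []
    # i) initialise P random architectures and train/evaluate them
    while len(population) < population_size:
        arch = random_arch()
        acc = evaluate(arch)
        population.append((arch, acc))
        history.append((arch, acc))
    for _ in range(cycles):
        # ii) sample S candidates and pick the most accurate as parent
        sample = random.sample(list(population), sample_size)
        parent = max(sample, key=lambda pair: pair[1])
        # iii)-iv) mutate the parent, evaluate the child, enqueue it
        child = mutate(parent[0])
        child_acc = evaluate(child)
        population.append((child, child_acc))
        history.append((child, child_acc))
        # v) discard the oldest individual (leftmost in the queue)
        population.popleft()
    # return the best architecture ever evaluated
    return max(history, key=lambda pair: pair[1])
```

On a toy search space (e.g., bit vectors scored by their sum), the loop quickly concentrates on high-scoring individuals while old ones age out.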

7) ENAS
ENAS [61] is an economical automatic model design method beyond NASNet [44]. By forcing all sub-models to share weights, it improves the efficiency of NAS and overcomes the huge computational cost and long running time of the original NAS, reducing GPU time by more than 1 000 times. On the CIFAR-10 dataset, its test error reaches 2.89%, which is similar to that of NASNet (2.65%). Fig. 9 shows the network architecture of ENAS.

8) ProxylessNAS
ProxylessNAS [66] is the first NAS algorithm that directly searches a large design space on the large-scale ImageNet dataset without any proxy task, and the first to customize CNN architectures for specific hardware. Cai et al. [66] combine ideas from model compression (pruning and quantization) with NAS, reduce the computational cost of NAS (GPU time and GPU memory) to the same scale as conventional training while preserving a rich search space, and directly incorporate the hardware performance of the architecture (latency, energy consumption) into the optimization objective. Fig. 10 shows the efficient models optimized for different hardware. Figs. 10(a)−10(c) show the GPU, CPU and mobile models found by ProxylessNAS. The GPU model prefers a shallow and wide structure with early pooling, while the CPU model prefers a deep and narrow structure with late pooling. Layers followed by pooling prefer large and wide kernels; early layers prefer small kernels, and late layers prefer large ones. In this paper, the GPU model and the mobile model of ProxylessNAS are used for evaluation.

9) DARTS
Most architecture search algorithms use reinforcement learning or evolutionary algorithms to search for structures. The search space of such algorithms is discrete, and the search is very time-consuming. The search space of DARTS is continuous, and the search is completed by running a gradient descent algorithm on the validation set. The computational cost of DARTS is several orders of magnitude smaller than that of ordinary architecture search algorithms, but the searched result is still comparable to that of the previous state-of-the-art algorithms. Meanwhile, its generalization ability is also very good: it can be used not only for searching CNN structures but also for searching RNN structures. Fig. 11 shows an overview of DARTS.
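The continuous relaxation can be illustrated with a minimal PyTorch sketch of one edge: the edge output is a softmax-weighted sum over candidate operations, so the architecture parameters receive gradients from the validation loss. The candidate operations below are illustrative, not the exact DARTS operation set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One edge under DARTS-style continuous relaxation: the output is
    a softmax-weighted mixture of all candidate operations, so the
    architecture parameters `alpha` are optimized by gradient descent."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        # one architecture parameter per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```

After the search, the operation with the largest `alpha` on each edge is kept to form the discrete architecture.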

10) FairNAS
FairNAS [48] is a one-shot method in the field of NAS, and it advocates weight sharing. It trains a supernet from beginning to end (only one supernet is trained completely, which is the meaning of one-shot), and each candidate model is a sampled sub-model of the supernet. The advantage is that time-consuming training of each individual model is not needed in order to know its representation ability; this greatly improves the efficiency of NAS and has made one-shot search the mainstream. Nevertheless, the premise of one-shot is that weight sharing is effective, i.e., that model ability can be verified quickly and accurately in this way. The situation resembles the Matthew effect: if the conditions are not good, the search falls into a circular dilemma. FairNAS believes that fair sampling and training can give full play to the potential of each module. After training is completed, a sampled model can directly use the weights in the supernet to obtain a relatively stable performance index on the validation set. This fair algorithm can almost completely maintain the ranking of the models: the models sampled from the supernet and the models trained separately eventually have almost the same ranking. FairNAS has three versions, FairNAS-A, FairNAS-B and FairNAS-C, which are obtained from different search spaces. Fig. 12 shows the architectures of FairNAS-A, B and C.
Fig. 7 Cell structure of PNASNet [39]: (a) The best cell structure found by PNASNet; (b) Employing a similar strategy as [44] when constructing CNNs from cells on CIFAR-10 and ImageNet.
Fig. 9 ENAS′s discovered network from the macro search space for image classification [61].
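The strict fairness constraint, under which every candidate operation of every layer is activated exactly once per supernet iteration, can be illustrated with a small sampling sketch (function and parameter names are ours, not FairNAS's code):

```python
import random

def strict_fairness_paths(num_layers, num_ops, rng=random):
    """One supernet 'iteration' under strict fairness: returns
    `num_ops` single-path models (one op index per layer), built from
    an independent permutation per layer so that every candidate op of
    every layer is trained exactly once in the iteration."""
    perms = [rng.sample(range(num_ops), num_ops) for _ in range(num_layers)]
    # the k-th model takes the k-th entry of each layer's permutation
    return [[perms[l][k] for l in range(num_layers)] for k in range(num_ops)]
```

Training then runs one forward/backward pass per sampled path before a single shared parameter update, so all operations accumulate the same number of updates.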

11) GDAS
GDAS [60] uses gradient descent to search network structures efficiently, and the search space is represented by a directed acyclic graph (DAG). This DAG may contain millions of sub-graphs, each of which is a neural network structure. In order to avoid traversing so many sub-graphs, Dong and Yang [60] use a differentiable sampler to sample architectures and optimize the sampler through the validation loss of the sampled structures. GDAS can find a robust neural network structure in 4 hours on a V100 GPU. GDAS is similar to DARTS, but there are two differences between them. i) How is the search space made differentiable? DARTS applies a softmax over the operation weights to obtain probabilities during joint optimization, and keeps the operation with the maximum probability on each connection between nodes. Dong and Yang [60] instead use the Gumbel-max trick: an argmax selects the transformation function between nodes in forward propagation, while a softmax differentiates the one-hot vector in backward propagation so that gradients can be back-propagated. ii) DARTS jointly searches all operations, which leads to antagonism between operations; the generated weights may offset each other, making optimization difficult. Besides, jointly searching the normal cell and the reduction cell greatly enlarges the search space. In [60], the reduction cell is fixed and only the normal cell is searched; only the functions between the sampled nodes are updated each time, so the search takes only 4 hours on a V100 GPU. Fig. 13 shows the search space of a neural cell represented as a DAG. Different operations (colored arrows) transform one node (square) into its intermediate features (little circles), and each node is the sum of the intermediate features transformed from the previous nodes. As indicated by the solid connections, the neural cell in the proposed GDAS is a sampled sub-graph of this DAG.
Specifically, among the intermediate features between every two nodes, GDAS samples one feature in a differentiable way.
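This hard-forward, soft-backward sampling can be sketched with PyTorch's built-in `gumbel_softmax` as a stand-in for the paper's Gumbel-max trick (the candidate operations below are illustrative):

```python
import torch
import torch.nn.functional as F

def gdas_edge(x, ops, arch_logits, tau=1.0):
    """Differentiable sampling of one operation on an edge, in the
    spirit of GDAS: with hard=True the forward pass uses a one-hot
    Gumbel sample (exactly one op contributes), while the backward
    pass flows through the soft relaxation (straight-through)."""
    weights = F.gumbel_softmax(arch_logits, tau=tau, hard=True)
    return sum(w * op(x) for w, op in zip(weights, ops))
```

Lowering the temperature `tau` during search makes the soft relaxation approach the discrete sample.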

12) FBNet-V1
In FBNet-V1 [58], differentiable neural architecture search (DNAS) is used to find hardware-aware lightweight convolutional networks. The DNAS method represents the whole search space as a hypernetwork, transforms the search for the optimal network structure into finding the optimal distribution over candidate blocks, trains the block distribution by gradient descent, and can select a different block for each layer of the network. In order to estimate the network latency, the actual latency of each candidate block is measured and recorded in advance, so the latency of a network structure can be accumulated directly from the recorded per-block latencies. Fig. 14 shows the visualization of some of the searched architectures.
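The accumulation of recorded per-block latencies can be sketched as a differentiable expected-latency term (the lookup-table values and names below are made up for illustration):

```python
import torch
import torch.nn.functional as F

def expected_latency(arch_params, latency_lut):
    """DNAS-style latency estimate: per-block latencies are measured
    once on the target device and stored in a lookup table, so the
    expected network latency is the sum over layers of the
    softmax-weighted recorded latencies. `latency_lut[l][k]` is the
    measured latency of candidate block k at layer l."""
    total = torch.zeros(())
    for alpha_l, lut_l in zip(arch_params, latency_lut):
        probs = F.softmax(alpha_l, dim=0)
        total = total + (probs * torch.tensor(lut_l)).sum()
    return total
```

Because the estimate is differentiable in the architecture parameters, it can simply be added to the task loss as a latency penalty.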

13) MNASNet
MNASNet [55] is an automated mobile neural architecture search method. It explicitly takes the model's inference latency as one of the main optimization objectives, in order to find a network structure that balances latency and accuracy. In previous work, latency was measured indirectly via an inaccurate proxy such as FLOPs (the number of floating-point operations). MNASNet instead deploys models on mobile devices to directly measure real-world inference latency. In addition, a hierarchical search space is proposed to determine the network structure. The first inspiration of [55] comes from the fact that although MobileNet and NASNet have similar FLOPs (575M vs. 564M), their latencies are quite different (113 ms vs. 183 ms). Second, Tan et al. [55] observed that previous automated methods mainly search for a few types of cells and then stack the same cells repeatedly through the network; this simple search mechanism limits the diversity of layers. The first inspiration gave birth to the idea of multi-objective optimization over latency and accuracy; the second gave birth to the hierarchical decomposition of the search space, which allows layers to be architecturally different while still striking the right balance between flexibility and search space size. Fig. 15 shows an overview of MNASNet.
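MNASNet's multi-objective reward takes the form ACC(m) × [LAT(m)/TAR]^w; a minimal sketch of the commonly used soft form follows, with illustrative exponent values:

```python
def mnasnet_reward(acc, latency_ms, target_ms, alpha=-0.07, beta=-0.07):
    """Soft latency-aware reward in the spirit of MNASNet: accuracy is
    scaled by (latency / target) ** w, so models slower than the
    target are penalised and faster models are mildly rewarded.
    With alpha == beta the trade-off is smooth around the target."""
    w = alpha if latency_ms <= target_ms else beta
    return acc * (latency_ms / target_ms) ** w
```

The controller maximises this reward, so it trades a little accuracy for a large latency improvement but not vice versa.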

14) ScarletNAS
In ScarletNAS [53], an automatic neural architecture search method with a scalability function is proposed. The fairness problem of training scalable hypernetworks in one-shot approaches is solved by a linear equivalent transformation. ScarletNAS uses conv1×1 (without bias/ReLU) + conv to replace identity + conv when training the supernet, which solves the convergence problem of training scalable networks; the introduced conv1×1 (without bias/ReLU) + conv is a linear transformation equivalent to identity + conv. On the ImageNet-1K classification task, it achieves 76.9% top-1 accuracy, the state of the art at the < 390M FLOPs level at the time. ScarletNAS has three versions, ScarletNAS-A, ScarletNAS-B and ScarletNAS-C, whose network complexity decreases in turn; ScarletNAS-A usually obtains the best results. Fig. 16 shows the architectures of ScarletNAS-A, B and C.

15) MoGA
MoGA [82] considers the use of mobile GPUs in real scenes, so the searched models can directly serve mobile vision products. The first novelty of MoGA is being mobile-GPU aware: the model is designed to be sensitive to mobile GPUs from the perspective of practical use. The second comes from an analysis of the MobileNet trilogy: from MobileNet-V1 to MobileNet-V3, accuracy keeps improving, but the number of model parameters keeps increasing, so optimizing the parameter count is worth studying. In addition to the business indicator, top-1 accuracy, the running time of the model on the device side is regarded as the key measure of the model rather than the number of multiply-adds, so multiply-adds are removed from the objective. Moreover, previous methods tried to compress the number of parameters, which is disadvantageous for multi-objective optimization: on the Pareto frontier, improving one objective requires sacrificing another. MoGA regards the parameter count as a representation of model capacity, so models with many parameters but low latency can be obtained by encouraging an increase in parameters instead of enlarging the search range. MoGA has three versions, MoGA-A, MoGA-B and MoGA-C, which have different searched layers. Fig. 17 shows the architectures of MoGA-A, B and C.
16) PC-DARTS
PC-DARTS [84] is an effective channel sampling method in which only part of the channels are sampled into the core of the multi-choice operation. Channel sampling can alleviate the "overfitting" of the hypernetwork and greatly reduce its memory consumption, so the speed and stability of architecture search can be improved by increasing the batch size during training. However, channel sampling leads to inconsistency in the edge selection of the hypernetwork, which increases the disturbance caused by random approximation. To solve this problem, an edge regularization method is proposed, which uses a set of additional edge weight parameters to reduce the uncertainty in search. With these two improvements, the search is faster and the performance more stable. Fig. 18 shows the general framework of PC-DARTS.

17) Single-Path-SuperNet
One-shot is a powerful neural architecture search framework, but its training is relatively complex, and it is not easy to obtain competitive results on large datasets (such as ImageNet). Guo et al. [108] proposed a single-path one-shot model called Single-Path-SuperNet to solve the main challenges in the training process. The core idea of Single-Path-SuperNet is to construct a simplified supernetwork, which is trained according to a uniform path sampling method so that all substructures (and their weights) are trained fully and equally. Based on the trained hypernetwork, the optimal substructure can be quickly searched by an evolutionary algorithm, without fine-tuning any substructure.
18) DNA
DNA [90] addresses two problems: efficiency and effectiveness. It differs from existing neural architecture search algorithms such as RL-based methods, DARTS and one-shot approaches. Based on the idea of knowledge distillation, Li et al. [90] introduce a teacher model to guide the direction of the architecture search. Using supervision from different depths of the teacher model, the original end-to-end search space is divided into blocks so that independent blocks can be trained with weight sharing, which greatly reduces the interference caused by weight sharing while preserving the accuracy of candidate sub-model evaluation and the efficiency of weight sharing. The algorithm can traverse all candidate structures in the search space. DNA has four versions, i.e., DNA-a, DNA-b, DNA-c and DNA-d, which are searched with different parameters. Fig. 19 shows an illustration of DNA. The teacher′s previous feature map is used as input for both the teacher block and the student block. Each cell of the supernet is trained independently to mimic the behavior of the corresponding teacher block by minimizing the l2-distance between their output feature maps. The dotted lines indicate randomly sampled paths in a cell.
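The block-wise supervision can be sketched as follows (function and variable names are ours): both blocks take the teacher's previous feature map as input, and the student minimizes the l2 distance to the teacher block's output.

```python
import torch
import torch.nn as nn

def block_distill_loss(student_block, teacher_block, teacher_prev_feat):
    """Block-wise distillation in the spirit of DNA: the student block
    is trained to mimic the corresponding teacher block by minimising
    the mean squared (l2) distance between their outputs, with the
    teacher's previous feature map as the shared input."""
    with torch.no_grad():                       # teacher is frozen
        target = teacher_block(teacher_prev_feat)
    pred = student_block(teacher_prev_feat)
    return ((pred - target) ** 2).mean()
```

Because each block is supervised independently, candidate paths inside one block can be rated without training the whole network end to end.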

19) FBNet-V2
Several classic NAS methods preceded FBNet-V2, such as DARTS and ProxylessNAS, but they have their own defects: i) memory cost limits the size of the search space; ii) memory cost grows linearly with the number of operations per layer; iii) ProxylessNAS reduces memory cost with a binarized training method, but in a large search space the memory cost is still unbounded and convergence is very slow. FBNet-V2 [93] can greatly expand the search space without increasing memory cost and can maintain fast search in a large search space. The main contributions of FBNet-V2 are: i) a NAS method that is both memory-efficient and fast; ii) a masking mechanism and an effective shape propagation for feature-map reuse; iii) searched networks with very high accuracy.

20) RobNet
To improve the robustness of deep neural networks, existing work focuses on adversarial training algorithms or loss functions. RobNet [91] instead studies robustness from the perspective of the network architecture itself, searching for architectures that can resist attacks. To obtain the large number of networks needed for this study, one-shot neural architecture search is used to train a supernet, and the sub-networks sampled from it are then fine-tuned with adversarial training. The sampled network structures and their robust accuracies provide a rich basis for the study.

2D and 3D palmprint and palm vein databases used for evaluation
In this paper, five 2D palmprint image databases, one 3D palmprint database and two palm vein databases are exploited for performance evaluation, including PolyU II. In Figs. 25 and 26, the three images depicted in the first row were captured in the first session, and the three images depicted in the second row were captured in the second session. Fig. 1 shows one original 3D palmprint sample and four different 2D representations derived from it, including MCI, GCI, ST and CST. PolyU II is a challenging palmprint database because the illumination changes notoriously between the first and second sessions. HFUT CS is also a challenging palmprint database; from Fig. 24, it can be seen that there are some differences between the palmprints captured by different devices.
Fig. 18 Overall framework of PC-DARTS: The upper part is the partial channel connection and the lower part is the edge regularization [84].
Fig. 19 Illustration of the DNA method [90].

Experimental configuration
We selected twenty NAS methods for performance evaluation. As some of the selected methods have different versions, we evaluated multiple versions, including (NASNet-A, NASNet-mobile), (ProxylessNAS, ProxylessNAS-mobile), (FairNAS-A, B and C), (ScarletNAS-A, B and C), (MoGA-A, B and C) and (DNA-a, b, c and d).
Here, we introduce the default configuration of the experiments, including hyperparameters and hardware. Since different networks need different input sizes, the palmprint and palm vein ROI images are up-sampled to a suitable size before being input into the network. In order to enhance the stability of the network, we also added a random flip operation (only during the training phase); that is, a training image is flipped horizontally with a certain probability before being input into the network. We did not initialize the model parameters randomly; instead, we initialized them with the parameters of a model pretrained on the ImageNet or CIFAR dataset. It is worth noting that when an official model pretrained on ImageNet is available, we prefer it; otherwise, we use the model pretrained on CIFAR. The palmprint and palm vein ROI images in the databases are usually grayscale images, i.e., the number of image channels is 1. Since the input of the models is an RGB image, the grayscale channel is copied three times to form an RGB image with 3 channels. The system configuration is as follows: Intel CPU i7-8700 3.20 GHz, NVIDIA GPU GTX 2080, 16 GB memory and the Windows 10 operating system. All evaluation experiments were performed with PyTorch. The cross-entropy loss function and the Adam optimizer were used by default. The batch size was set to 4 and the learning rate to 5×10−5.
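The input pipeline described above can be sketched as follows; the 224×224 target size and 0.5 flip probability are assumptions for illustration, not the exact values used in every experiment:

```python
import torch

def prepare_roi(gray_roi, size=224, training=True, flip_p=0.5):
    """Sketch of the ROI preprocessing described above.
    gray_roi: (H, W) float tensor holding a grayscale palm ROI."""
    x = gray_roi[None, None]                       # -> (1, 1, H, W)
    # up-sample the ROI to the network's expected input size
    x = torch.nn.functional.interpolate(
        x, size=(size, size), mode="bilinear", align_corners=False)
    # random horizontal flip, applied during training only
    if training and torch.rand(()) < flip_p:
        x = torch.flip(x, dims=[3])
    # replicate the single grayscale channel to form a 3-channel input
    return x.repeat(1, 3, 1, 1)
```

At test time `training=False` makes the pipeline deterministic.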

Performance measures
In this paper, both identification and verification experiments are conducted.
Identification is a one-to-many comparison, which answers the question "who is the person?". In this paper, closed-set identification is conducted; that is, every test identity is enrolled in the training set. To obtain identification accuracy, the rank-1 identification rate is used, in which a test image is matched with all templates in the training set, and the label of the most similar template is assigned to the test image. For simplicity, we refer to the rank-1 identification rate as the accuracy recognition rate (ARR).
Fig. 22 Six palmprint ROI images of the HFUT database. The three images in the first row were captured in the first session. The three images in the second row were captured in the second session.
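The rank-1 matching rule can be sketched as follows; the Euclidean distance between feature vectors is our illustrative choice, not necessarily the matcher used by every evaluated method:

```python
import numpy as np

def rank1_arr(test_feats, test_labels, gallery_feats, gallery_labels):
    """Rank-1 identification as described above: each test image is
    matched against every training template, the label of the nearest
    template is assigned, and ARR is the fraction of correct
    assignments."""
    # pairwise distances: (num_test, num_gallery)
    dists = np.linalg.norm(test_feats[:, None] - gallery_feats[None], axis=2)
    pred = gallery_labels[np.argmin(dists, axis=1)]
    return float(np.mean(pred == test_labels))
```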
Verification is a one-to-one comparison, which answers the question "is the person who they claim to be?". In the verification experiments, the equal error rate (EER) is adopted to evaluate the performance of different methods.
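The EER can be computed by sweeping a decision threshold over the genuine and impostor score distributions; the minimal sketch below assumes higher scores mean better matches:

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """EER sketch: sweep a threshold over all observed similarity
    scores and return the operating point where the false-rejection
    rate (FRR) and false-acceptance rate (FAR) are closest."""
    thresholds = np.unique(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # genuine pairs rejected
        far = np.mean(impostor_scores >= t)  # impostor pairs accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return float(eer)
```

Real evaluations typically interpolate the FAR/FRR curves, but the threshold sweep conveys the idea.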

Recognition performance on separate data mode
We first conduct evaluation experiments in the separate data mode. That is, in all databases, all samples captured in the first session are used for training, and all samples captured in the second session are used for testing. We conduct the experiments using all selected NAS methods on all databases. The values of ARR and EER of the selected NAS methods obtained on the 2D palmprint and palm vein databases are listed in Tables 5 and 6, respectively. The values of ARR and EER obtained on the four 2D representations (CST, ST, MCI and GCI) of the 3D palmprint database are listed in Table 7.
From Tables 5−7, we have the following observations: 1) ProxylessNAS achieves the best recognition results on most databases. An exception is the PolyU M_N palm vein database, where FairNAS-A achieves the best recognition results, with an ARR of 100% and an EER of 0.000 1%.
2) ProxylessNAS was proposed in 2019, but its recognition performance is better than that of methods proposed in 2020. This shows that the newest NAS methods do not necessarily outperform older ones. For example, NASNet, proposed in 2016, performs better than some methods proposed in 2019 and 2020, such as DARTS, ENAS and RobNet.
3) PolyU II is a challenging database because the samples captured in the two sessions have noticeable variations, such as illumination change. On this database, the highest ARR, 98.63%, is obtained by ProxylessNAS. However, this is still an unsatisfactory result, which shows that new NAS methods need further study to improve 2D palmprint recognition.
4) HFUT CS is a cross-sensor database and is also challenging. On this database, the highest ARR, 99.78%, is obtained by ProxylessNAS. This is a promising result, and it shows that cross-sensor palmprint recognition based on NAS technology deserves attention in the future.
5) For 3D palmprint recognition, ProxylessNAS achieves 100% ARR and 0.005 7% EER on the MCI representation, which is a very encouraging result. It shows that NAS technology is very promising for 3D palmprint recognition and deserves further study. Among the four 2D representations of 3D palmprint, MCI is the most suitable for NAS-based 3D palmprint recognition.

Recognition performance on mixed data mode
In this section, we conduct evaluation experiments on the mixed data mode. The first image captured in the second session is added to the training data. That is, the training set of each palm contains all images captured in the first session and the first image captured in the second session. The values of ARR and EER of the selected NAS methods obtained from 2D palmprint and palm vein databases are listed in Tables 8 and 9, respectively. The values of ARR and EER obtained from 3D palmprint databases are listed in Table 10.
From Tables 8−10, we have the following observations: 1) The recognition results of all NAS methods in the "mixed data mode" are better than those in the "separate data mode". Although only one image from the second session was added to the training set, the recognition results of all NAS methods improved considerably. This experiment shows that the prediction accuracy of deep learning-based methods improves as more data become available. We can infer that if the networks were trained with data collected in multiple stages, the recognition results of all methods could be further improved significantly.
2) In the "mixed data mode", the three methods with the best recognition performance are ProxylessNAS, ProxylessNAS-mobile and ScarletNAS-A. ProxylessNAS achieves the best recognition results on most databases: on all databases, its ARR is 100%, and its EER is very low.
3) In the "mixed data mode", the convergence speed of the neural network is usually faster, and the number of epochs trained is reduced by nearly half.

Performance comparison with other methods
For 2D and 3D palmprint and palm vein recognition, we compare the performance between NAS methods and other methods, including four traditional methods and four deep learning methods.
For the NAS methods, we select ProxylessNAS and FairNAS-A for performance comparison. Among the different NAS methods, the overall performance of ProxylessNAS is the best, and FairNAS-A achieves the best recognition performance on the PolyU M_N database.
Four traditional and representative palmprint recognition methods are selected for performance comparison, including competitive code (CompC) [16], ordinal code (OrdinalC) [112], robust line orientation code (RLOC) [113] and local line directional pattern (LLDP) [114]. Four deep learning methods are also selected, including PalmNet [24], ResNet [115], MobileNet-V3 and EfficientNet. PalmNet is a deep learning method specially designed for palmprint recognition and has excellent recognition performance. ResNet is a famous CNN and a typical representative of manually designed CNNs. As mentioned above, MobileNet-V3 and EfficientNet are two semi-NAS methods with excellent recognition performance; their recognition performance was evaluated in our previous work.

Performance comparison with other methods in the separate data mode
In the "separate data mode", for all databases, the traditional methods use the four images captured in the first session as the training data and use the images collected in the second session as the test data. For deep learning-based methods, including PalmNet, ResNet, MobileNet-V3, EfficientNet, ProxylessNAS and FairNAS-A, all images collected in the first session are used as the training data, and the remaining second-session images are used as the test data. The comparison results on the 2D palmprint and palm vein databases are listed in Table 11, and the comparison results on the 3D palmprint database are listed in Table 12.
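The "separate data mode" split described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical list of per-image records with `subject` and `session` fields; the field names and file names are illustrative, not from the paper.

```python
# Sketch of the "separate data mode" split: session-1 images form the
# training set and session-2 images form the test set.
def separate_data_split(records):
    train = [r for r in records if r["session"] == 1]
    test = [r for r in records if r["session"] == 2]
    return train, test

# Hypothetical records for one subject (paths are made up).
records = [
    {"path": "s001_sess1_img1.bmp", "subject": 1, "session": 1},
    {"path": "s001_sess1_img2.bmp", "subject": 1, "session": 1},
    {"path": "s001_sess2_img1.bmp", "subject": 1, "session": 2},
    {"path": "s001_sess2_img2.bmp", "subject": 1, "session": 2},
]
train, test = separate_data_split(records)
print(len(train), len(test))  # 2 2
```

Because the two sessions are collected at different times, this split measures how well a method generalizes across acquisition sessions, which is the harder of the two protocols.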
From Tables 11 and 12, we have the following observations: 1) On the PolyU II palmprint database, the four traditional methods (CompC, OrdinalC, RLOC, and LLDP) and one manually designed deep learning method (PalmNet) achieve better recognition performance than the NAS methods. On the PolyU M_B palmprint database, all methods can achieve 100% ARR, and ProxylessNAS achieves the lowest EER, which is 3.34×10⁻⁵%. On the HFUT I palmprint database, PalmNet achieves the highest ARR (100%), and ProxylessNAS achieves the lowest EER (0.040 7%). On the HFUT CS palmprint database, PalmNet achieves the highest ARR (100%), and EfficientNet achieves the lowest EER (0.021 7%). On the PolyU M_N palm vein database, most methods can achieve 100% ARR, and RLOC and FairNAS-A achieve the lowest EER (0.000 1%). On the TJU-PV palm vein database, RLOC achieves the highest ARR (100%), and FairNAS-A achieves the lowest EER (0.064 7%).
2) For 2D palmprint and palm vein recognition, the overall recognition performance of NAS methods is close to that of the traditional methods. On some databases, traditional recognition methods have better recognition performance. On other databases, NAS methods have better recognition performance.
3) For 2D palmprint and palm vein recognition, the overall recognition performance of the NAS methods is close to that of the deep learning-based method PalmNet, a method specially designed for palmprint recognition. On the PolyU II, PolyU M_B, HFUT I and TJU-P databases, PalmNet has better recognition performance than the NAS methods. However, on the HFUT CS database, the recognition performance of PalmNet is very poor.
4) For 2D palmprint and palm vein recognition, the overall recognition performance of the NAS methods is considerably better than that of one representative manually designed CNN method, i.e., ResNet.
5) For 2D palmprint and palm vein recognition, the overall recognition performance of one pure NAS method, i.e., ProxylessNAS, is slightly better than that of EfficientNet and MobileNet-V3.
6) For 3D palmprint recognition, ProxylessNAS achieves 100% ARR on the MCI representation, which is better than the other methods.
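The EER values reported in the observations above are the point where the false-accept rate (FAR) equals the false-reject rate (FRR). A minimal sketch of how an EER can be estimated from genuine and impostor matching scores is shown below; the score values are toy numbers for illustration, not results from the paper, and a similarity convention (higher score = better match) is assumed.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER: sweep all observed scores as thresholds and
    return (FAR + FRR) / 2 at the threshold where |FAR - FRR| is smallest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, best_eer = 1.0, None
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Toy score distributions (hypothetical values).
genuine = np.array([0.9, 0.85, 0.8, 0.95, 0.6])
impostor = np.array([0.2, 0.3, 0.1, 0.75, 0.4])
print(equal_error_rate(genuine, impostor))  # 0.2
```

In practice the genuine and impostor score sets come from all intra-class and inter-class matching pairs of the test set, and real systems report EER values far below this toy example.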

Performance comparison with other methods in the mixed data mode
In the "mixed data mode", for the traditional methods, the four images collected in the first session are used as the training data, together with the first image captured in the second session. The remaining images collected in the second session are exploited as the test data. For ProxylessNAS and FairNAS-A, all the images collected in the first session and the first image captured in the second session are used as the training data, and the remaining second-session images are used as the test data. The comparison results on the 2D palmprint and palm vein databases are listed in Table 13, and the comparison results on the 3D palmprint database are listed in Table 14.
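The "mixed data mode" split can be sketched in the same style. This is an illustrative sketch assuming per-image records ordered by acquisition time within each session; the field names are hypothetical, not from the paper.

```python
# Sketch of the "mixed data mode" split: all session-1 images plus the
# first session-2 image of each subject go to training; the remaining
# session-2 images form the test set.
def mixed_data_split(records):
    train, test, seen = [], [], set()
    for r in records:
        if r["session"] == 1:
            train.append(r)
        elif r["subject"] not in seen:  # first session-2 image per subject
            seen.add(r["subject"])
            train.append(r)
        else:
            test.append(r)
    return train, test

# Hypothetical records for two subjects (paths are made up).
records = [
    {"subject": 1, "session": 1, "path": "a1.bmp"},
    {"subject": 1, "session": 2, "path": "a2.bmp"},
    {"subject": 1, "session": 2, "path": "a3.bmp"},
    {"subject": 2, "session": 2, "path": "b1.bmp"},
    {"subject": 2, "session": 2, "path": "b2.bmp"},
]
train, test = mixed_data_split(records)
print(len(train), len(test))  # 3 2
```

Adding even one second-session image per subject to the training set exposes the model to cross-session variation, which is why this protocol yields the much higher accuracies reported below.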
From Tables 13 and 14, we have the following observations: 1) In the mixed data mode, all methods have achieved outstanding recognition performance. Almost all methods can obtain 100% ARR and very low EER. That is to say, for various methods, it is very easy to obtain high recognition performance by using the mixed data mode.
2) In the mixed data mode, for 2D palmprint and palm vein recognition, the overall recognition performance of NAS methods is close to that of traditional methods and other deep learning-based methods.
3) In the mixed data mode, for 3D palmprint recognition, the method of ProxylessNAS achieves the best performance on the MCI representation.

Conclusions
This paper systematically investigated the recognition performance of representative NAS methods for 2D and 3D palmprint recognition and palm vein recognition. Twenty representative NAS methods were exploited for performance evaluation, including NASNet, SMASH, PNASNet, NAONet, SNAS, AmoebaNet, ENAS, ProxylessNAS and others. We conducted evaluation experiments in both the separate data mode and the mixed data mode. Experimental results showed that, among the different NAS methods, ProxylessNAS achieved the best recognition accuracy. In other words, ProxylessNAS is a very suitable NAS method for 2D and 3D palmprint recognition and palm vein recognition.
In the "separate data mode", for 2D palmprint recognition and palm vein recognition, the overall recognition performance of ProxylessNAS is close to that of the traditional methods, including CompC, OrdinalC, RLOC and LLDP. It is also close to that of the deep learning-based method PalmNet, which is specially designed for palmprint recognition. The overall recognition performance of ProxylessNAS is considerably better than that of one representative manually designed CNN method, i.e., ResNet, and is slightly better than that of EfficientNet and MobileNet-V3. For 3D palmprint recognition, ProxylessNAS achieved 100% ARR on the MCI representation, which is better than the other methods. In the "mixed data mode", almost all methods can obtain 100% ARR and very low EER. For 2D palmprint and palm vein recognition, the overall recognition performance of ProxylessNAS is close to that of the traditional methods and the other deep learning-based methods. For 3D palmprint recognition, ProxylessNAS achieves the best performance on the MCI representation.
This work is the first to conduct a performance evaluation of representative NAS methods for 2D and 3D palmprint and palm vein recognition. Experimental results showed that NAS is an up-and-coming technology for 2D and 3D palmprint and palm vein recognition. In our future work, based on NAS technology, we will try to design new methods to further improve the recognition performance of 2D and 3D palmprint recognition and palm vein recognition.

Acknowledgements
This work was supported by National Science Founda-

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article′s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article′s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.