1 Introduction

In spite of the many advantages offered by biometric recognition with respect to other traditional authentication methods (the well-known motto "forget about PINs or passwords, you are your own key"), biometric systems are also vulnerable to external attacks. As a consequence, the security and privacy offered by biometric recognition systems can be undermined. Given their serious implications, the vulnerabilities of biometric systems to different types of attacks have been the subject of numerous studies over the last decades for different characteristics, including fingerprint [9, 18, 64], face [1], iris [23, 26, 27], voice [3] or multimodal systems [2, 10, 28].

Among other possible points of attack [64], the biometric capture device is probably the most exposed one: the attacker does not need to know any details about the inner modules of the biometric system in order to attack the sensor. To fool the biometric system, he can present the capture device with a Presentation Attack Instrument (PAI), such as a 3D mask [16], a printed finger vein image [76] or a fingerprint overlay [18]. These attacks are known in the literature as Presentation Attacks (PA) [38].

In order to prevent such attacks, Presentation Attack Detection (PAD) methods have been developed to automatically distinguish between bona fide (i.e. real, live or genuine) presentations and access attempts carried out by means of PAIs [49]. Incorporating such countermeasures into biometric systems is crucial, especially in unattended scenarios. Given the importance of increasing the robustness of biometric systems to these attacks, and hence the systems’ security, this area of research has attracted considerable attention within the biometric community in the last decade. In fact, several international projects, like the European Tabula Rasa [70] and BEAT [48] or the more recent US Odin research program [55], deal with these security concerns. In addition, the LivDet liveness detection competition series on iris [79] and fingerprint [80] have been running since 2009. In turn, these initiatives have led to a wide number of publications on PAD methodologies for several biometric characteristics, including iris [19], fingerprint [47, 67], or face [20].

Compared to other biometric characteristics, such as fingerprint or handwritten signature, the use of finger vein for recognition purposes is relatively new: the first commercial applications date back to 2005 by Hitachi Ltd [45]. The first studies on the vulnerability of finger vein recognition systems to presentation attacks were carried out only in 2014 [76]. In this work, Tome et al. showed how a simple print out of a finger vein image could successfully fool the system in up to 86% of the attempts. A similar evaluation was carried out by Tome and Marcel [74] in 2015 for palm vein images, where the success rate of the attacks reached figures as high as 75%. It is hence crucial to protect vein-based systems from these presentation attacks, which, given their simplicity, can be carried out by potentially any individual. This is especially relevant for finger vein, due to the extended use of the corresponding sensors in ATMs (i.e. unsupervised scenarios) in countries as diverse as China, Turkey, Taiwan, or Poland.

These facts call for a joint effort within the biometrics community to develop PAD techniques for vein-based systems. In this context, the first approach, based on Fourier and wavelet transforms, was proposed in 2013 by Nguyen et al. [51]. Two years later, the first competition on finger vein PAD was organised [75], in which three different teams participated. Since then, different PAD approaches have been presented, based on video sequences and motion magnification [60], texture analysis [44, 61, 71], image quality metrics [7] or, more recently, neural networks [52, 59, 63] and image decomposition [58].

All the aforementioned works focus on the detection of printed finger vein images or, in some cases, of replay attacks carried out with digital displays [61]. In all cases, almost perfect error rates are achieved, thereby indicating that such PAIs can be easily detected with current techniques. However, the applications of finger vein-based PAD are not limited to finger vein recognition. In fact, the development of multimodal capture devices which are able to acquire both finger vein images or videos, and finger photos, opens new lines of research [62]: biometric recognition can be based on fingerprints extracted from the photos, and PAD techniques can be developed for the finger vein data. This approach is currently being followed in the BATL project [6] within the US Odin research program [55]: among other sensors, finger vein images are used to detect fingerprint presentation attacks. As with the aforementioned finger vein print outs, it has already been shown that fingerprints can be recovered even from stored ISO templates [18] and then transformed into a PAI which is recognised as a genuine fingerprint. However, most fingerprint PAIs do not account for the blood flow, which is also harder to simulate. On the other hand, the printed finger vein images analysed in the finger vein PAD literature will not be able to fool the fingerprint scanner, as they contain no fingerprint. We can therefore also include a finger vein PAD module in multimodal finger sensors designed for fingerprint recognition, thereby making it harder for an eventual attacker to design a PAI which is able to bypass both sensors.

In this chapter, we will first summarise in Sect. 14.2 the main concepts and evaluation metrics for biometric PAD defined in the recent ISO/IEC 30107 standard [38, 39]. The state of the art in finger vein and fingerprint PAD is subsequently reviewed in Sect. 14.3. We will then describe the multimodal sensor developed in the BATL project and the proposed approach to finger vein-based PAD to detect fingerprint PAIs (Sect. 14.4). The proposed method is evaluated according to the ISO/IEC 30107 standard [39] in Sect. 14.5. The chapter ends with the final discussion and conclusions in Sect. 14.6.

2 Presentation Attack Detection

Presentation attacks are defined within the ISO/IEC 30107 standard on biometric presentation attack detection [38] as the “presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system”. The attacker may aim at impersonating someone else (i.e. impostor) or avoiding being recognised due to black-listing (i.e. identity concealer).

In the following, we include the main definitions presented within the ISO/IEC 30107-3 standard on biometric presentation attack detection—part 3: testing and reporting [39], which will be used throughout the chapter:

  • Bona fide presentation: “interaction of the biometric capture subject and the biometric data capture subsystem in the fashion intended by the policy of the biometric system”. That is, a normal or genuine presentation.

  • Attack presentation/presentation attack: “presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system”. That is, an attack carried out on the capture device to either conceal one’s identity or impersonate someone else.

  • Presentation Attack Instrument (PAI): “biometric characteristic or object used in a presentation attack”. For instance, a silicone 3D mask or an ecoflex fingerprint overlay.

  • PAI species: “class of presentation attack instruments created using a common production method and based on different biometric characteristics”.

In order to evaluate the vulnerabilities of biometric systems to PAs, the following metrics should be used:

  • Impostor Attack Presentation Match Rate (IAPMR): “proportion of impostor attack presentations using the same PAI species in which the target reference is matched”.

  • Attack Presentation Classification Error Rate (APCER): “proportion of attack presentations using the same PAI species incorrectly classified as bona fide presentations in a specific scenario”.

  • Bona Fide Presentation Classification Error Rate (BPCER): “proportion of bona fide presentations incorrectly classified as presentation attacks in a specific scenario”.

Derived from the aforementioned metrics, a global measure can be computed for an easier benchmark across different systems: the Detection Equal Error Rate (D-EER). It is defined as the error rate at the operating point where APCER = BPCER.
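To make these metrics concrete, the following minimal sketch (an illustration of ours, not part of the standard) computes the APCER, BPCER and D-EER from detection scores, assuming that higher scores indicate bona fide presentations and that a single PAI species is evaluated:

```python
import numpy as np

def apcer_bpcer(attack_scores, bona_fide_scores, threshold):
    """APCER: fraction of attack presentations classified as bona fide
    (score >= threshold); BPCER: fraction of bona fide presentations
    classified as attacks (score < threshold)."""
    apcer = float(np.mean(np.asarray(attack_scores) >= threshold))
    bpcer = float(np.mean(np.asarray(bona_fide_scores) < threshold))
    return apcer, bpcer

def detection_eer(attack_scores, bona_fide_scores):
    """D-EER: error rate at the threshold where APCER and BPCER are closest."""
    thresholds = np.unique(np.concatenate([attack_scores, bona_fide_scores]))
    errors = [apcer_bpcer(attack_scores, bona_fide_scores, t) for t in thresholds]
    apcer, bpcer = min(errors, key=lambda e: abs(e[0] - e[1]))
    return (apcer + bpcer) / 2
```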

3 Related Works

In addition to the initial review of the existing works on finger vein PAD presented in the introductory chapter, we first survey those works in detail, further discussing the PAI species analysed and the detection performance achieved (see Sect. 14.3.1). We subsequently summarise in Sect. 14.3.2 the most relevant works on fingerprint PAD, since our main aim is to detect fingerprint PAIs with finger vein images. For more details and a more extensive survey on fingerprint PAD, the reader is referred to [47, 67].

3.1 Finger Vein Presentation Attack Detection

A summary of the most relevant works on finger vein PAD is presented in Table 14.1, classified according to the type of features extracted (handcrafted versus deep learning) and the publication year. In addition, the main performance metrics over the selected databases are reported.

As mentioned in Sect. 14.1, research on finger vein recognition is relatively new. As a direct consequence, the pioneering work on finger vein PAD was published as recently as 2013 [51]. Nguyen et al. proposed the combination of features in both the spatial and frequency domains through the Fourier and two different wavelet transforms (i.e. Haar and Daubechies). They achieved a D-EER as low as 1.5% in their experiments on a self-acquired database comprising both bona fides and a single PAI species: printed finger vein images.

One year later, in 2014, Tome et al. analysed in-depth the vulnerabilities of finger vein recognition systems to PAs, revealing an alarming IAPMR of up to 86% for simple print outs of vein images [76]. This study motivated Tome et al. to organise the first competition on finger vein PAD in 2015 [75]. In addition to the baseline system developed at Idiap, three teams participated, proposing different approaches to detect the PAs, namely: (i) Binarised Statistical Image Features (BSIF), (ii) a monogenic global descriptor capturing local energy and local orientation at a coarse level and (iii) a set of local descriptors including Local Binary Patterns (LBP), Local Phase Quantisation (LPQ), a patch-wise Short-time Fourier transform (STFT) and a Weber Local Descriptor (WLD). In all cases, the final classification was carried out with Support Vector Machines (SVMs), achieving remarkable detection rates with a low complexity. Another byproduct of the competition was the establishment of the Idiap Research Institute VERA Fingervein Database [77] as a benchmark for finger vein PAD (see Table 14.1) with a single PAI species: printed images. This, in turn, motivated the biometrics community to pursue the development of more efficient PAD techniques.

Also in 2015, Raghavendra et al. [60] analysed short video sequences with the aid of Eulerian video magnification [78]. The goal was to amplify the blood flow and thus detect the printed artefacts. They compared the newly proposed method with reimplementations of the algorithms presented in [75] over a self-acquired database: the ACER was reduced 5 to 23 times, thus proving the soundness of the proposed approach. In the same year, Tirunagari et al. proposed the use of Dynamic Mode Decomposition (DMD), a mathematical method developed to extract information from non-linear complex fluid flows [71]. They designed a windowed DMD technique in order to extract micro-texture information from a single image, which is decomposed into its maximum variance at column level and the corresponding residual or noise image. Using SVMs for classification over the VERA DB, they achieved D-EERs that outperform other texture descriptors.

Table 14.1 Summary of the most relevant methodologies for finger vein presentation attack detection. For performance evaluation, the metrics are the ones reported in the articles, where “Acc.” stands for detection accuracy

As for other biometric characteristics, texture patterns have been extensively analysed for finger vein PAD. In addition to the approaches presented in [71, 75], Raghavendra and Busch included a new PAI species in a subsequent work [61]: a smartphone display. In this case, they considered the residual high frequency band extracted from steerable pyramids and an SVM, again achieving ACERs around 3%. The following year, Kocher et al. thoroughly analysed different LBP extensions in [44], finally concluding that the baseline LBP technique performs as well as its “improvements”. Finally, in a combined approach, Qiu et al. used total variation decomposition to divide the finger vein sample into its structural and noise components [58]. Using again LBP descriptors and SVMs, they achieved a perfect detection accuracy with APCER = BPCER = 0% over the VERA DB.

Another approach followed for PAD, in general, is based on the use of image quality assessment [21]. This technique was also analysed by Bhogal et al. in [7] for finger vein. In particular, they considered six different measures and their combinations, achieving a detection accuracy of over 99%.

Finally, in recent years, Deep Learning (DL) has become a thriving topic [33], allowing computers to learn from experience and understand the world in terms of a hierarchy of simpler units. This way, DL has enabled significant advances in complex domains such as natural language processing [69], computer vision [81], biometric recognition in general, and finger vein PAD in particular. In this context, in 2017, Qiu et al. designed a new Convolutional Neural Network (CNN) for finger vein PAD, which they named FPNet [59]. This network achieved a perfect detection accuracy over the VERA DB. In the same year, Nguyen et al. used two different pre-trained models (i.e. AlexNet [46] and VGG-16 [66]) for the same task [52]. After extracting the features with these nets, Nguyen et al. reduced their dimensionality with Principal Component Analysis (PCA) and used SVMs for the final classification. Again, a perfect detection rate over the VERA DB was reported. In a similar fashion, Raghavendra et al. analysed in [63] the use of AlexNet with Linear Discriminant Analysis (LDA) and SVMs for classification purposes, also achieving perfect error rates over a self-acquired database.

3.2 Fingerprint Presentation Attack Detection

The excellent performance of the finger vein PAD methods described above has motivated us to also use finger vein images to detect fingerprint PAIs. However, let us first review the state of the art in fingerprint PAD. Given the vast number of articles studying this problem, we will summarise the most relevant ones for the present study and refer the reader to [47, 67, 72] for more comprehensive reviews.

In general, PAD approaches can be broadly classified into two categories: software-based methods perform a deeper analysis of the captured data to distinguish between bona fide and attack presentations, while hardware-based setups make use of information captured by additional sensors. In contrast to the younger finger vein PAD research field, where only the former have been studied so far, both approaches have been followed for fingerprint PAD. Tables 14.2 and 14.3 provide a summary of the reviewed works, classified into software- and hardware-based approaches. In addition, the number of PAI species and the main performance metrics over the selected databases are reported.

A typical example of software-based approaches is the detection of sweat pores in high-resolution fingerprint images [11, 17, 50]. Sweat pores are not visible in latent fingerprints and, because of their tiny size, it is challenging to include them in artefacts. Therefore, the existence of sweat pores can be utilised as an indicator of a bona fide sample.

Table 14.2 Summary of the most relevant methodologies for software-based fingerprint presentation attack detection. For performance evaluation, the metrics are the ones reported in the articles, where CCR stands for correct classification rate and ACER for average classification error rate
Table 14.3 Summary of the most relevant methodologies for hardware-based fingerprint presentation attack detection. For performance evaluation, the metrics are the ones reported in the articles

Another classical approach, widely applied not only to fingerprint but also to other biometric characteristics, is the extraction of textural information. Nikam and Agarwal [53] were among the first, in 2008, to analyse this kind of approach. On the one hand, they extracted Local Binary Pattern (LBP) histograms to capture textural details. On the other hand, the ridge frequency and orientation information were characterised using wavelet energy features. Both feature sets were fused and the dimensionality reduced with the Sequential Forward Floating Selection (SFFS) algorithm. For classification, the authors utilised a hybrid classifier, formed by fusing three classifiers: a neural network, SVMs and K-nearest neighbours. Over a self-acquired database comprising two different PAI fabrication materials and several mould materials, an overall classification rate of up to 97.4% was reported.

In 2009, the LivDet competition series on fingerprint and iris started on a biennial basis [25]. The datasets provided quickly became the de facto standard for fingerprint PAD evaluations. For instance, Jia et al. [40] continued the research line based on texture information and proposed the use of two different variants of multi-scale LBP in combination with SVMs. Over the LivDet 2011 dataset, their method achieved a D-EER of 7.52%. More recently, Jiang et al. presented another approach to extract LBP features from multiple scales in [41]. In particular, a Gaussian pyramid was constructed from the input samples, and the corresponding LBP histograms, extracted from three different levels, were classified using an SVM. Achieving an ACER of 21% over the LivDet 2013 dataset, this method outperformed the algorithms presented in the competition.

In a more general approach, Galbally et al. [22] use 25 complementary image quality features to detect presentation attacks for face, iris and fingerprint on legacy data. Regarding fingerprint, they compare their approach with other state-of-the-art methods on the LivDet 2009 fingerprint database, which includes three different PAI species. Their results were competitive in 2014 and even outperformed some previously published PAD algorithms on the same dataset. The main advantage of their approach is its independence from the modality; additionally, the method is “simple, fast, non-intrusive, user-friendly, and cheap”.

All the aforementioned approaches focus on the basic scenario where all PAI species in the test set are also included in the training set. However, a more realistic, and challenging, scenario should include additional “unknown attacks”, or PAI species used only for testing purposes. In such a case, the detection performance usually decreases. To tackle this issue, Gonzalez-Soler et al. analysed in [32] the use of the Bag of Words feature encoding approach applied to local keypoint-based descriptors (dense Scale Invariant Feature Transform, SIFT). They compare their detection performance with other existing methods using feature descriptors without encoding schemes, and show an average relative improvement of 25% in terms of the Average Classification Error Rate (ACER, the performance metric used in the LivDet competitions) over LivDet 2011 with respect to the state of the art. In addition, they present, for the first time for the LivDet datasets, a fully ISO-compliant evaluation in terms of APCER and BPCER.

In contrast to the handcrafted approaches mentioned above, most of the newest approaches rely on deep learning. One of the first works directly related to fingerprint PAD based on conventional capture devices (i.e. a software-based method) was carried out by Nogueira et al. [54]. In more detail, the following three CNNs were tested: (i) the pre-trained VGG [66], (ii) the pre-trained AlexNet [46] and (iii) a CNN with randomly initialised weights, trained from scratch. The authors benchmarked the ACER obtained with the networks over the LivDet 2009, 2011 and 2013 databases against a classical state-of-the-art algorithm based on LBP. The best detection performance is achieved using a pre-trained VGG model and data augmentation (average ACER = 2.9%), a clear improvement with respect to LBP (average ACER = 9.6%). It should also be noted that the ACER decreased between 25% and 50% (relative decrease) for all three networks tested when data augmentation was used.

More recently, Chugh et al. presented the current state of the art for the LivDet datasets in [12], evaluating it on multiple publicly available datasets, including three LivDet datasets (2011, 2013, 2015), as well as their own collected and published MSU-FPAD and Precise Biometrics Spoof-Kit (PBSKD) datasets, which include in total 12 PAI species and more than 20,000 samples. The so-called Fingerprint Spoof Buster [12] is a CNN based on MobileNet [35], which is applied to minutiae-centred patches. Splitting the CNN input into patches allows them to train the network from scratch without over-fitting. They evaluate several different test scenarios and outperform other state-of-the-art approaches on the LivDet datasets. In a subsequent work [13], the Fingerprint Spoof Buster’s generalisation capability is analysed by applying a leave-one-out protocol to all 12 PAI species from the MSU-FPAD and PBSKD datasets. They observe that some materials are harder to detect when not included during training and specify an optimised training set comprising six of the twelve PAIs. The testing results in an APCER of 4.7% at a BPCER of 0.2%.

Even if the aforementioned works manage to achieve remarkably low error rates, PAD can also benefit from information captured by additional sensors, like any other pattern recognition task. To that end, some hardware-based approaches utilise different illumination techniques or capture the pulse frequencies. Hengfoss et al. [34] analysed in 2011 the reflections at all wavelengths between 400 and 1650 nm during the blanching effect. This effect appears when the finger is pressed against a surface and the blood is squeezed out due to the compression of the tissue. Furthermore, they utilise pulse oximetry, but admit that this approach takes more time and is thus less desirable for PAD. They manage to correctly distinguish living fingers, cadaver fingers and three PAIs with both methods, and conclude that those dynamic effects (i.e. blanching and pulse) only occur for living fingers. Two years later, Drahansky et al. [15] proposed new optical handcrafted PAD methods based on pulse, colour change under pressure and skin reflection at different wavelengths (470, 550 and 700 nm). These methods were evaluated on a database comprising 150 fingerprints, achieving the best results for the wavelength approach. Additionally, they analysed 11 different skin diseases that could occur on the fingertip. However, the influence on the detection performance was not tested.

Over the last five years, it has been shown that the skin reflection within the Short-wave Infrared (SWIR) spectrum of 900–1700 nm is independent of the skin tone. This fact was first analysed by NIST [14] and later confirmed by Steiner et al. [68] for face PAD. Building upon the work of [68], Gomez-Barrero et al. [29] apply the spectral signature concept first developed for facial images to fingerprint PAD. Their preliminary experiments, over a rather small database, show that most materials, except for orange play doh, respond differently from human skin at the SWIR wavelengths of 1200, 1300, 1450 and 1550 nm. However, with the use of fine-tuned CNNs, even the orange play doh is correctly classified in a subsequent work [73]. In a follow-up study [72], Tolosana et al. benchmark both pre-trained CNN models and a new residual CNN, designed and trained from scratch for PAD purposes, on the same SWIR data. Over a larger dataset including 35 different PAI species and more than 4700 samples, they show that a combination of two different CNNs can achieve a remarkable performance: an APCER around 7% for a BPCER of 0.1%. In addition, the evaluation protocol includes 5 PAI species considered only for testing, thereby proving the soundness of their approach even in the presence of unknown attacks.

Additionally, it has been shown that Laser Speckle Contrast Imaging (LSCI) can be used for PAD purposes [43]. The LSCI technique stems from biomedical applications, where it has been applied to visualise and monitor microvascular blood flow in biological tissues, such as skin and retina [65]. Keilbach et al. capture the blood movement beneath the skin to differentiate living fingers from presentation attacks in [43]. However, the utilised laser also penetrates thin transparent fingerprint overlays, thereby detecting the underlying blood flow and falsely classifying the presentation as a bona fide one. Therefore, for a BPCER of 0.2% (a system focused on user convenience), the APCER increases to 15.5%.

Combining SWIR and LSCI, Hussein et al. [37] use a patch-based CNN to classify multi-spectral samples from both domains. For both techniques, low error rates are reported, and a combined fusion achieves a perfect detection performance over a database comprising 551 bona fides and 227 PAs, including 17 different PAI species.

Further research by Gomez-Barrero et al. [30] applies a score-level fusion method based on handcrafted features to benefit from different domains, including SWIR, LSCI and vein images. Their training set comprises only 136 samples in order to evaluate the approach on 4531 samples in the test set containing 35 different PAI species. The weights for the fusion are computed on 64 samples of the development set. An APCER \(<10\%\) for a BPCER = 0.1% is reported, as well as an APCER of 6.6% for a BPCER = 0.2%, thus yielding secure systems even for very low BPCERs.

Lastly, in a subsequent work by Gomez-Barrero et al. [31], the SWIR CNN approaches proposed in [72] are combined with an enhancement of the handcrafted features extracted from the LSCI data in [43]. This combined approach, tested on the same database comprising 35 different PAI species, shows a clear improvement in the detection capabilities, even if only 2 sets of images are used (i.e. a reduced capture device cost): the D-EER is reduced from 2.7% to 0.5%.

4 Proposed Finger Vein Presentation Attack Detection

As indicated in Sect. 14.1, we will now focus on the development of PAD techniques based on finger vein data, in order to detect fingerprint PAIs. It should be noted that the PAD algorithm processes data that is captured simultaneously with a single capture device for both the finger vein and the fingerprint. Otherwise, if the capture with both sensors were done sequentially, the attacker might exchange the PAI used for fingerprint verification with his bona fide finger for the PAD capture process. Therefore, in this section, we first describe a multimodal capture device which is able to acquire both fingerprint and finger vein images (Sect. 14.4.1). We subsequently present an efficient PAD method applied to the finger vein data in Sect. 14.4.2. Given that some fingerprint overlays may still reveal part of the vein structure, we will focus on texture analysis to detect PAs in a real-time fashion using a single image.

4.1 Multimodal Finger Capture Device

Given the requirement to capture both fingerprint and finger veins, a contact-less multimodal capture device is used to acquire photos of fingerprints as well as finger veins. A diagram of the inner components of the capture device is depicted in Fig. 14.1. As it may be observed, the camera and illumination boards are placed inside a closed box, which includes an open slot in the middle. When the finger is placed there, all ambient light is blocked and therefore only the desired wavelengths are used for the acquisition of the images. In particular, we have used a Basler acA1300-60gm Near-infrared (NIR) camera, which captures \(1280 \times 1024\) px images, with an Edmunds Optics 35mm C Series VIS-NIR Lens. This camera is used for both frontal visible (VIS) light images and NIR finger vein samples (see the following subsections for more details on each individual sensor).

Fig. 14.1
figure 1

Sensor diagram: a box, with a slot in the middle to place the finger, encloses all the components: a single camera, two sets of LEDs for visible (VIS) and NIR illumination and the light guide necessary for the finger vein capture (more details in Sect. 14.4.1.2)

Fig. 14.2
figure 2

Full bona fide samples as they are captured by the camera

Example images as captured by the camera are shown in Fig. 14.2, for both the finger vein and the finger photo acquisition. As can be seen, the central Region of Interest (ROI) corresponding to the open slot where the finger is placed needs to be extracted from the background before the images can be further processed. Given that the finger is always placed over the open slot, and the camera does not move, a simple fixed-size cropping can be applied.
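Since the slot position and the camera are fixed, the cropping reduces to simple array slicing. The following one-line sketch uses hypothetical offsets (x0, y0), which would have to be calibrated once per device, together with the \(830 \times 240\) px ROI size of Fig. 14.4:

```python
def crop_roi(frame, x0=225, y0=390, width=830, height=240):
    """Fixed-size ROI crop; x0 and y0 are illustrative placeholders,
    not the actual device calibration."""
    return frame[y0:y0 + height, x0:x0 + width]
```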

4.1.1 Finger Photo Sensor

The most important requirement for the design of the finger photo sensor is its compatibility with legacy (optical) sensors. In other words, we need to make sure that fingerprints can be extracted from the finger photos captured within the visible wavelengths and subsequently used for verification with Commercial off-the-shelf (COTS) systems. In order to fulfil this requirement, the resolution and focus of the selected camera and lens combination need to be high enough to yield fingerprints with at least the equivalent of 500 dpi resolution. We have therefore chosen the aforementioned Basler and Edmunds Optics components.

To illustrate how the finger photos can be used for fingerprint recognition, Fig. 14.3 shows the captured bona fide sample (Fig. 14.3a). Next to it, the minutiae extracted with the Neurotechnology VeriFinger SDK (Fig. 14.3b), which has been defined as the standard fingerprint recognition SDK within the Odin program, and the corresponding enrolled fingerprint (Fig. 14.3c) are depicted. As it may be observed, the minutiae are correctly detected within the fingerprint area. It should be noted that, if this system is to be used in combination with optical sensors, the finger photo needs to be flipped (left-to-right) before enrolment or comparison.

Fig. 14.3
figure 3

Bona fide finger photos: a visible (VIS) light image, b minutiae extracted with Verifinger and c fingerprint enrolled with Verifinger

4.1.2 Finger Vein Sensor

The finger vein capture device comprises three main components, namely: (i) a NIR light source behind the finger with 20 LEDs of 940 nm, (ii) the corresponding NIR camera and lens and (iii) an elevated physical structure to obtain the adequate amount of light.

It should be noted that, in order to capture high-quality finger vein samples, it is vital to let only the right amount of light intensity penetrate the finger. To achieve the correct amount of light transmission, an elevated physical structure is placed to concentrate the light intensity on the specified area, referred to in Fig. 14.1 as “light guide”. The subject interacts with the sensor by placing a finger on the small gap provided between the NIR light source and the camera. The NIR light source is placed facing the camera so that the emitted NIR light penetrates through the finger. Since the haemoglobin blocks the NIR illumination, the veins appear as darker areas in the captured image. A sample image is depicted in Fig. 14.4, where the veins are clearly visible even before preprocessing the sample.

Fig. 14.4
figure 4

Bona fide finger vein ROI, of size \(830 \times 240\) px

4.2 Presentation Attack Detection Algorithm

As mentioned at the beginning of this section, we will focus on texture analysis of the finger vein samples in order to discriminate bona fide samples from presentation attacks. To that end, we have chosen a combination of Gaussian pyramids and Local Binary Patterns (LBP), referred to as PLBP, which was proposed in [57] as a general descriptor. The main advantage of this texture descriptor lies in the fact that, by extracting the LBP features from the hierarchical spatial pyramids, texture information at different resolution levels can be considered. In fact, the PLBP approach was used in [41] for fingerprint PAD over the LivDet 2013 DB [24], achieving results within the state of the art with only three pyramid levels. In order to analyse the influence of the different pyramid levels, we compare the results using up to 16 pyramid levels.

Fig. 14.5
figure 5

General diagram of the proposed PAD algorithm. From the finger vein photo, the Gaussian pyramid is computed first, then LBP is applied and the corresponding histogram serves as input to the SVM classifier

The flowchart of the proposed method is shown in Fig. 14.5. First, the Gaussian pyramids are computed from the original cropped image or ROI (see Fig. 14.4). Subsequently, LBP images are generated for every pyramid level, resulting in the PLBP images. Then, histograms are computed from the PLBP images and classified with a Support Vector Machine (SVM). Each step is described in more detail in the following paragraphs.

Gaussian pyramids. For multi-resolution analysis, lowpass pyramid transforms are widely used [8]. In particular, the Gaussian blur lowpass filter can be used to down-sample the original image. This step can be repeated to obtain successively smaller images, resembling a pyramid, as depicted in Fig. 14.6. In practice, one pixel of the down-sampled image corresponds to a fixed-size area of the previous pyramid level, thereby losing information the further up we go in the pyramid. However, in our implementation, all levels of the pyramid have the same size, which is obtained by up-sampling the output image in each iteration. As a consequence, the higher level images appear blurrier.
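A minimal sketch of this same-size pyramid construction is given below, assuming OpenCV for the down- and up-sampling (our actual implementation is based on the bob toolkit, so the exact filtering may differ):

```python
import cv2

def same_size_gaussian_pyramid(roi, num_levels=16):
    """Each iteration blurs and halves the image (cv2.pyrDown) and then
    up-samples the result back to the ROI size, so that all levels share
    the same dimensions and higher levels simply appear blurrier."""
    h, w = roi.shape[:2]
    levels = [roi]
    small = roi
    for _ in range(num_levels - 1):
        if min(small.shape[:2]) < 2:  # stop before the image degenerates
            break
        small = cv2.pyrDown(small)
        levels.append(cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR))
    return levels
```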

It should be highlighted that, in our implementation, different pyramids with up to 16 levels are created. This allows us to determine how the PAD performance changes when more levels of the pyramid are used.

Fig. 14.6
figure 6

Illustration of example pyramids for: a Gaussian pyramid of vein images and b LBP images of this Gaussian pyramid

Local Binary Patterns (LBP). Local binary patterns were introduced in [56] as a simple but efficient texture descriptor. Its computational simplicity and greyscale invariance are the most important properties of LBP. The algorithm compares neighbouring pixels and returns the result as a binary number, which is in turn stored as a decimal value. The process is illustrated in Fig. 14.7 for a radius of 1 pixel (\(3 \times 3\) block). It should be noted that the binary representation can also be flipped, and the direction and starting point for reading the binary number do not matter as long as they are fixed for the whole system (otherwise, the extracted features would not be comparable). An example of four selected PLBP images of the bona fide sample shown in Fig. 14.4 is presented in Fig. 14.8.

Fig. 14.7
figure 7

LBP computation: Comparing the central pixel (orange) to each neighbouring pixel results in a binary representation. The binary values are converted to a decimal number, which is stored in the resulting LBP image instead of the original central pixel
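For reference, a vectorised sketch of this basic \(3 \times 3\) LBP operator is given below (equivalent functionality is available off the shelf, e.g. in skimage.feature.local_binary_pattern; the bit order chosen here is one arbitrary but fixed convention):

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 LBP: compare each inner pixel to its 8 neighbours and
    store the resulting 8-bit code as a decimal value in [0, 255]."""
    img = img.astype(np.int32)
    centre = img[1:-1, 1:-1]
    # Neighbours read clockwise from the top-left corner (fixed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(centre)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy,
                        1 + dx:img.shape[1] - 1 + dx]
        code |= (neighbour >= centre).astype(np.int32) << bit
    return code.astype(np.uint8)
```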

Classification. In order to reduce the dimensionality of the feature vector, a greyscale histogram is computed from the resulting LBP images. Subsequently, linear SVMs are used to classify the extracted histograms. These SVMs rely on one main parameter, C, which can be tuned for optimal performance. Intuitively, the C parameter trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly by giving the model the freedom to select more samples as support vectors.

In addition, we benchmark two SVM approaches, as shown in Fig. 14.9 for the simple case of three pyramid levels. On the one hand, we use separate SVMs for each pyramid level (Fig. 14.9a). On the other hand, we utilise a single SVM for all pyramid levels (Fig. 14.9b). Both setups produce one label per pyramid level and then apply a majority vote on the corresponding SVM outputs in order to reach a final decision.
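The following sketch illustrates both setups under some assumptions of ours (scikit-learn as the classifier back-end, 256-bin histograms, and the label convention 1 = attack, 0 = bona fide); the experiments reported below rely on the bob toolkit instead:

```python
import numpy as np
from sklearn.svm import SVC

def level_histograms(plbp_levels, bins=256):
    """One normalised grey-level histogram per PLBP image."""
    return [np.histogram(lv, bins=bins, range=(0, 256), density=True)[0]
            for lv in plbp_levels]

# Setup (a): one linear SVM per pyramid level (Fig. 14.9a).
def train_separate_svms(train_hists, labels, C=1.0):
    n_levels = len(train_hists[0])
    return [SVC(kernel="linear", C=C).fit(
                np.array([h[lv] for h in train_hists]), labels)
            for lv in range(n_levels)]

# Setup (b): a single linear SVM trained on the histograms of all
# levels (Fig. 14.9b); at test time it still emits one label per level.
def train_single_svm(train_hists, labels, C=1.0):
    X = np.array([h for sample in train_hists for h in sample])
    y = np.repeat(labels, len(train_hists[0]))
    return SVC(kernel="linear", C=C).fit(X, y)

def majority_vote(per_level_labels):
    """Final decision: attack (1) if more than half of the levels agree."""
    return int(np.sum(per_level_labels) > len(per_level_labels) / 2)
```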

Fig. 14.8
figure 8

Resulting bona fide LBP images of different Gaussian pyramid levels (i.e. PLBP images)

Fig. 14.9
figure 9

Diagram of the two SVM approaches on the example of 3 pyramid levels

5 Experimental Evaluation

With the aim of analysing the suitability of the proposed method for finger vein-based PAD, several experiments were carried out using an identical experimental protocol. Our training and test sets are completely disjoint in order to avoid biased results. Furthermore, in order to allow reproducibility of the experiments, preprocessing and feature extraction are based on the bob toolkit [4, 5].

5.1 Experimental Set-Up

The captured dataset comprises 766 samples including 542 bona fides and 224 presentation attacks, stemming from 32 different PAI species. The PAs can be classified into three categories, namely: (i) 2D printouts, (ii) full fingers and (iii) overlays, whereby 2D printouts can also be used as an overlay during the presentation. A detailed listing of all PAIs from the database is presented in Table 14.4.

Table 14.4 Listing of all PAI species and the number of samples in parenthesis

All samples were captured within the BATL project with our project partners at the University of Southern California. Note that the project sponsor has indicated that they will make the complete dataset available in the near future such that research results presented in this work can be reproduced.

We have additionally considered two test scenarios (see Table 14.5). The first one uses the same number of bona fides and PAs in the training set (69 samples each). To increase the robustness of the detection of bona fide presentations (i.e. minimise the BPCER), the second scenario adds 35 additional bona fide samples to the training set, thus reducing the test set. The partitioning for both scenarios is shown in Table 14.5. Both approaches, using a single SVM or separate SVMs, are compared using the same training and test sets for each scenario.

Table 14.5 Partitioning of training and test data

In more detail, the training set comprises all different PAIs except for dragon-skin overlays, since this thin and transparent material does not block NIR illumination, as known from previous experiments [30]. As a consequence, all veins are visible and the sample has the same appearance as a bona fide one. Using such samples to train the SVM would thus have a negative impact on its detection accuracy, increasing the BPCER. These PAIs are therefore used only for testing purposes.

In the first scenario, cross-validation is used during training to automatically select the best-fitting C value as the SVM parameter. As suggested by Hsu et al. [36], exponentially growing sequences for C \(\left( 2^x\right) \) were tested within the range \(x=\left\{ -20, ..., 20\right\} \). However, due to the increased number of training samples in the second scenario, and consequently the longer training time required, only the range \(x=\left\{ -20, ..., 8\right\} \) has been used to cross-validate scenario 2.
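A sketch of this model selection step, assuming scikit-learn’s grid search with 5-fold cross-validation (the fold count and the placeholder training data are our assumptions; the real features are the PLBP histograms of the training partition):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Exponentially growing C sequence, as suggested by Hsu et al. [36]:
# scenario 1 searches 2^-20 .. 2^20, scenario 2 only up to 2^8.
param_grid = {"C": [2.0 ** x for x in range(-20, 21)]}

# Placeholder stand-ins for the PLBP histograms and labels of the
# training partition (138 samples, as in scenario 1).
rng = np.random.default_rng(0)
X_train = rng.random((138, 256))
y_train = rng.integers(0, 2, 138)

search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X_train, y_train)
print("best-fitting C:", search.best_params_["C"])
```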

Finally, all results are reported in terms of the APCER and BPCER over the test set (see Sect. 14.2), in compliance with the ISO/IEC 30107-3 standard on biometric presentation attack detection - part 3: testing and reporting [39].

It should be noted that establishing a fair benchmark with previous works in the state of the art is difficult, since this is the first approach to carry out fingerprint PAD based on finger vein samples.

5.2 Results

The results in terms of APCER (dashed) and BPCER (solid) for scenario 1 are plotted in Fig. 14.10, in order to facilitate the visualisation and comparison across different pyramid levels. On the x-axis, the range of pyramid levels is given, while the y-axis shows the error rates (in %). For the single SVM approach (Fig. 14.10a), both error rates reach a minimum when using 6 pyramid levels, namely, BPCER = 3.38% and APCER = 5.81%. On the other hand, for the separate SVM approach (Fig. 14.10b), the minima of both error rates are reached at different levels, namely, BPCER = 2.54% for the fifth level and APCER = 6.45% for the fourth level. This means that, depending on the application at hand (i.e. which error rate should be optimised), different levels may be selected. As may be observed from Fig. 14.10, the error rates of the separate SVMs somewhat stabilise when five or more pyramid levels are used, whereas the single SVM shows many more peaks and no stabilisation.

Regarding the aforementioned decision of prioritising one error rate over the other, it should be taken into account that a low BPCER results in user convenience (i.e. a low number of bona fide presentations will be wrongly rejected). On the other hand, a low APCER will yield a more secure system (i.e. the number of non-detected attacks will be minimised). One of the aims of the Odin program is achieving a low BPCER. To that end, we analyse the second scenario, for which more training samples of the bona fide class are utilised in order to make the classifier more robust. The corresponding plots with the APCER and BPCER for every pyramid level are presented in Fig. 14.11.

Fig. 14.10
figure 10

Percentage of APCER and BPCER of scenario 1 for both SVM classifiers

Fig. 14.11
figure 11

Percentage of APCER and BPCER of scenario 2 for both SVM classifiers

We can observe that the BPCER is significantly lower for all pyramid levels when compared to scenario 1, reaching minimum values of 0.68% for the single SVM and 2.28% for the separate SVMs. At the same time, the APCER stays similar to that of scenario 1, thereby showing the soundness of increasing the number of bona fide samples for training. Additionally, we can see that using only the first four levels produces higher peaks and higher error rates, thus making them unsuitable for PAD purposes. In turn, increasing the number of levels results in a decreasing BPCER, as can be seen for the levels greater than four. Taking into account the pyramid levels five to sixteen, the average APCER is slightly lower for the single SVM approach (10.32% vs. 11.50%), and the average BPCER improves significantly for the single SVM (1.12% vs. 2.87%). Therefore, we may conclude that the single SVM approach achieves a better PAD performance than the separate SVMs, since the training set of the latter is not big enough to train each pyramid level independently of the others. The single SVM receives complementary information when seeing all levels together and is thus able to reach a higher detection performance.

Table 14.6 Comparison of the proposed method to state-of-the-art implementations

A comparison for both scenarios of the single SVM approach (level 7) with other handcrafted state-of-the-art implementations is given in Table 14.6. The Luminosity and MC mean algorithms operate at a very convenient threshold but classify only a fraction of the presentation attacks correctly (APCER = 68.39% and APCER = 43.87%, respectively). The other algorithms use a support vector machine for classification and present lower APCERs. However, in some cases, the BPCER rises to nearly 10%. In particular, the MC histogram achieves an APCER between 12 and 14%, while the BPCER is between 8 and 10%. In contrast, the BSIF implementation results in a BPCER of around 5% at the cost of a higher APCER (26–29%). The results of the plain LBP implementation and the proposed PLBP implementation are identical regarding the APCER but differ in the BPCER. Whereas for scenario 1 LBP provides a better BPCER of 1.9% compared to 4.02%, the proposed PLBP approach reduces its BPCER in scenario 2 to 0.68%, in contrast to 1.14% for LBP. Therefore, we can see that our PLBP algorithm achieves the best results for scenario 2, while it is outperformed by LBP in scenario 1. The score files from all tests in this chapter are freely available.

Even if the results are promising, reaching an APCER \(\approx 10\%\) for a BPCER \(\approx 1\%\), even when unknown attacks (i.e. PAIs only used for testing and not seen by the classifier during training) are considered, there is still room for improvement. In particular, a deeper analysis of the results shows that a remarkable number of misclassified PAIs are transparent overlays made of dragon-skin, silicone, monster latex, school glue or wax. In addition, two types of full fake fingers also managed to deceive the PAD algorithm in some cases, namely, glow-in-the-dark silly putty, and one of the samples acquired from a teal play doh finger. Some samples that were not detected are shown in Fig. 14.12. As we may observe, especially for the dragon-skin (c) and the school glue (f) overlays, the samples are very similar to the bona fide sample shown in Fig. 14.4. In particular, the vein structure can be clearly seen.

Fig. 14.12
figure 12

Examples of undetected PAI species

Finally, Fig. 14.13 shows the 11th level of the PLBP images for (a) a dragon-skin overlay, (b) a teal play doh finger, (c) a school glue overlay and (d) a 3D printed finger with silver coating. Comparing these samples with the bona fide one from Fig. 14.8, we can see the high similarity for the transparent overlays in (a) and (c). However, the teal play doh and the 3D printed finger show different patterns (i.e. the 3D printed finger does not block the NIR light at all, and only the silver-coated part is visible). Hence, the SVMs always correctly classify the 3D printed PAIs, and only one error occurred for the teal play doh samples.

Fig. 14.13
figure 13

Resulting LBP images of different PAIs for 11th Gaussian pyramid level (i.e. PLBP images)

To sum up the findings in this section, we can state that the APCERs of around 10% show the limitations of vein-based still image PAD: thin transparent overlays cannot be detected, since the extracted features look far too similar to the bona fide ones. However, this PAD technique already allows the successful detection of a wide range of PAIs, including full fake fingers and overlays fabricated from materials which block NIR light to a greater extent than human flesh.

6 Summary and Conclusions

Although relatively new in comparison with other biometric characteristics, such as fingerprints or handwritten signatures, finger vein recognition has enjoyed considerable attention within the last decade. As with any other security-related technology, a wider deployment also implies an increase in security and privacy related concerns. This has, in turn, led to the development of countermeasures to prevent, among others, presentation attacks.

In particular, the biometric community has focused on detecting finger vein images or videos presented to the capture device, in contrast to bona fide fingers. Highly accurate PAD methods have been developed in the literature, able to detect these PAIs with perfect error rates.

In parallel, multimodal capture devices able to acquire both finger vein and fingerprint images have been proposed and implemented. In contrast to the finger vein, which is harder to imitate, multiple recipes are available to an eventual attacker in order to carry out a PA and fool a fingerprint-based recognition system. These facts have motivated us to present in this chapter a novel approach to protect fingerprint sensors: finger vein PAD methods which are able to detect fingerprint PAIs.

In more detail, due to the remarkable performance shown by LBP for different tasks, including PAD for several biometric characteristics, we chose this texture descriptor for our work. Even for some challenging PAIs, we can observe with the naked eye that the captured texture has a different appearance from that of the bona fide finger. In addition, different texture details were analysed by utilising Gaussian pyramids and extracting the LBP features from each level of the pyramid. Subsequently, SVMs were utilised for classification purposes.

With a sensor developed for the Odin program, a database comprising 32 different PAIs was acquired and used for the present evaluation. After an extensive experimental evaluation, we found that using a single SVM on the features extracted from all the levels of the pyramid is the best performing approach. This scenario leads to operating points with BPCERs under 1% and an APCER around 10%. The latter shows the main limitation of vein-based still image PAD: thin transparent overlays cannot be detected. However, this PAD technique still allows the successful detection of a wide range of PAIs.

We thus believe that finger vein can be effectively used together with fingerprint, both for a more accurate recognition performance, as shown in previous works, and for PAD purposes. In the end, an attacker who needs to deceive both the fingerprint and the vein sensors will face harder challenges in his path. In the forthcoming months, we will focus on improving the finger vein-based PAD, and on developing combined approaches with the finger photos captured with the sensor.