1 Introduction

Biometrics refers to the automated recognition of individuals based on their biological and behavioural characteristics [1]. Due to their convenience, Face Recognition Systems (FRSs) are widely used for user authentication, e.g. for Automated Border Control (ABC) based on the biometric passport (ePass) as defined in [2]. The main advantage of biometric systems over conventional authentication systems is the unique link established between the Machine Readable Travel Document (MRTD), which contains the biometric reference, and the data subject (i.e. the owner of the electronic passport). However, recent research [3, 4] has revealed a weakness in the passport issuance process that allows manipulated images to be injected into the biometric database, e.g. the ePass. The key deficiency in the passport issuance process lies in the way the facial image of an applicant is processed. In many countries, the applicant provides a printed facial image, which is scanned and then digitally transferred to the passport production site. Thus, an artificial facial image that resembles two or more subjects in both visual appearance and feature representation (see Fig. 1b) can be submitted to the passport issuance authority. If such an image is used for verification, each of the contributing subjects can be verified successfully against it. This type of attack is referred to as a morphed face image attack.

Fig. 1. Example of face morphing

Fig. 2. Scheme for differential morphing detection

The feasibility of this attack was first analyzed in [3, 4] on a dataset of 12 morphed face images using two Commercial Off-The-Shelf (COTS) FRSs, and was recently confirmed in [5, 6] on a larger dataset of 450 morphed images. The datasets used in [3,4,5,6] were generated using GIMP and GAP. In addition, a theoretical framework for estimating the success of morphing attacks on a specific system was presented in [7].

In [5], a first detection system for morphed face images was proposed, based on well-established multi-purpose image descriptors. The method aims at detecting morphed face images without a bona fide reference and is hence referred to as no-reference morphed face image detection in the remainder of this article. A bona fide presentation is defined in ISO/IEC 30107-3 as an “interaction of the biometric capture subject and the biometric data capture subsystem in the fashion intended by the policy of the biometric system” [8]. Among the analyzed feature extractors, Binarized Statistical Image Features (BSIF) [9] with a filter size of \(11 \times 11\) pixels and a 12-bit code performed best on the given dataset.

The morphed face image attack detection algorithm proposed in [5] focuses on digital samples, as used for ePass renewal in New Zealand, where face images are uploaded electronically. However, the application process for the ePass in many countries (e.g. most of the European Schengen states) still requires a printed face image, which is handed over to the public authority office during the application process.

Taking this real-world scenario of printed and scanned face images into account, [6] focuses on the effect of the print-scan process on digitally morphed face images with respect to both FRSs and morphing detection. The print-scan process adds noise and granularity to the face image, which affects the performance of both FRSs and morphed face image detection algorithms. Despite this noise, it was shown in [6] that morphed face images pose a severe threat to face recognition systems even after printing and scanning, and that many well-established multi-purpose image descriptors are not suitable for detecting either digital or printed-and-scanned morphed face images.

Ferrara et al. [10] proposed face demorphing for morph detection, employing a trusted live capture in addition to the questioned sample. Scherhag et al. [11] analyzed multiple general-purpose image feature extractors in this differential scenario. Further, Hildebrandt et al. [12] suggest employing generic image forgery detection techniques, in particular multi-compression anomaly detection, to reliably detect morphed facial images. Kraetzer et al. [13] evaluate the feasibility of detecting facial morphs with keypoint descriptors and edge operators. However, the detection accuracy of the current state of the art is still very limited, and the generalisation capabilities of the detectors remain unexplored.

In this paper, we propose a novel framework for the detection of morphed face images based on facial landmarks. In contrast to previously proposed no-reference methods, we compare a bona fide face image with the passport image to be classified (see Fig. 2). This approach will be referred to as differential morphing detection. The required bona fide face image could be captured either at the ABC gate or during the application process at the public authority office. In either scenario, the capture process can be assumed to be semi-supervised (e.g. via video surveillance) or supervised (e.g. in the passport application office).

The paper is organized as follows: The differential morphed face image detection framework is introduced in Sect. 2. In Sect. 3 we present the new morphed face image database, and the experiments conducted with the framework are described in Sect. 4. Final conclusions are drawn in Sect. 5.

2 Proposed Algorithm

The algorithm is motivated by the observation made in [14] that meaningful landmarks are suitable for face recognition. In addition, landmark-based face recognition systems are robust to ageing, which is an important property for passport scenarios [14]. The position of each landmark of the morphed face image, \(l_m(x_m,y_m)\), lies between the corresponding landmarks, \(l_i(x_i,y_i)\) and \(l_j(x_j,y_j)\), of the two contributing subjects, i and j:

$$\begin{aligned} x_m &= (1-\alpha )\,x_i+\alpha\, x_j, \\ y_m &= (1-\alpha )\,y_i+\alpha\, y_j, \end{aligned} \tag{1}$$

where \(\alpha \) defines the ratio of the contribution of Subject j to the morph, and consequently \(1-\alpha \) describes the contribution of Subject i. It can be assumed that the intra-subject variance of landmarks extracted from bona fide images is smaller than the variance between the landmarks of the morphed image and its contributing subjects. Based on this assumption, two feature extraction methods are designed: (1) distance based, and (2) angle based.
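Eq. (1) and the assumption above can be illustrated with a minimal NumPy sketch. It assumes landmarks are stored as arrays of (x, y) coordinates (shape (68, 2) for dlib's predictor; a toy two-landmark example is used here), and the function name `morph_landmarks` is hypothetical. Because each morphed landmark lies on the segment between the subjects' landmarks, its distance to Subject i is α times, and its distance to Subject j is (1 − α) times, the inter-subject distance of that landmark.

```python
import numpy as np

def morph_landmarks(l_i: np.ndarray, l_j: np.ndarray, alpha: float) -> np.ndarray:
    """Interpolate the landmark sets of subjects i and j according to Eq. (1).

    l_i, l_j: arrays of shape (n_landmarks, 2) holding (x, y) positions.
    alpha: contribution of subject j (0 = pure subject i, 1 = pure subject j).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * l_i + alpha * l_j

# Toy example with two landmarks per subject (hypothetical coordinates).
l_i = np.array([[10.0, 20.0], [30.0, 40.0]])
l_j = np.array([[20.0, 20.0], [30.0, 60.0]])
l_m = morph_landmarks(l_i, l_j, alpha=0.5)  # -> [[15, 20], [30, 50]]

# Distance of every morphed landmark to each contributing subject:
# alpha * |l_j - l_i| to subject i and (1 - alpha) * |l_j - l_i| to subject j.
d_i = np.linalg.norm(l_m - l_i, axis=1)  # -> [5, 10]
d_j = np.linalg.norm(l_m - l_j, axis=1)  # -> [5, 10]
```

With α = 0.5, every morphed landmark is displaced from both subjects by half of their inter-subject distance, which is the kind of offset the features below are meant to pick up.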

Fig. 3. Facial landmarks and angle calculation

  1. Distance based: The landmarks of both images (bona fide image, \(I_b\), and passport image, \(I_p\)) are determined using the facial landmark predictor of dlib [15], which returns the absolute positions of 68 facial landmarks (\(l_0, \dots, l_{67}\)), as depicted in Fig. 3. To make the features robust to scaling, the landmark coordinates are normalized to the range between 0 and 1; the green and yellow dots in Fig. 3 mark the upper-left (0.0, 0.0) and lower-right (1.0, 1.0) boundaries of this normalization. Next, the Euclidean distance of the relative position of each landmark \(l_i\) between the two images \(I_p\) and \(I_b\) is calculated (depicted in red in Fig. 4), resulting in a feature vector of length 2278, which is referred to as distance features.

  2. Angle based: Depending on the face region, pose and expression, the positions of the landmarks vary. Even though the images used in this work are normalized according to ICAO recommendations [2], minor variations in pose and expression cannot be avoided completely. These minor changes in landmark positions affect the distance calculation, thereby decreasing the morphing detection accuracy. Therefore, to obtain a more robust feature extractor, the angle of each landmark, \(l_i\), to a predefined neighbor (chosen to obtain the most discriminative dependencies) is calculated as \(\hat{l_i}\) (depicted in Fig. 3). The corresponding angles of \(I_p\) and \(I_b\) are compared as shown in Fig. 4. To avoid unrealistically high differences when the angles cross the horizontal line, the difference is calculated as:

    $$d(\hat{l^p_i}, \hat{l^b_i}) = \min\left(|\hat{l^p_i} - \hat{l^b_i}|,\; 360^{\circ} - |\hat{l^p_i} - \hat{l^b_i}|\right), \quad i = 0, \dots, 67, \tag{2}$$

    returning a non-negative difference between \(0^\circ \) and \(180^\circ \). The resulting feature vector has a length of 68 and is referred to as angle features.
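Both feature extractors can be sketched in NumPy under assumptions that the text does not settle. The normalization box is assumed to be the bounding box of the 68 detected landmarks. Since 2278 = 68·67/2, the number of unordered landmark pairs, the distance features are interpreted here, as an assumption, as the absolute differences between corresponding pairwise landmark distances in \(I_p\) and \(I_b\). The "predefined neighbor" of each landmark is replaced by a hypothetical `NEIGHBOR` map to the next landmark index, and random coordinates stand in for real dlib output.

```python
import numpy as np

N_LANDMARKS = 68
# Hypothetical neighbour map: the text uses a predefined neighbour per landmark
# without listing it, so this sketch simply uses the next index (cyclically).
NEIGHBOR = (np.arange(N_LANDMARKS) + 1) % N_LANDMARKS
PAIRS = np.triu_indices(N_LANDMARKS, k=1)  # 68 * 67 / 2 = 2278 landmark pairs

def normalize(landmarks):
    """Map (68, 2) landmarks to [0, 1] using their own bounding box (assumed)."""
    lo, hi = landmarks.min(axis=0), landmarks.max(axis=0)
    return (landmarks - lo) / (hi - lo)

def pairwise_distances(landmarks):
    """Euclidean distances between all landmark pairs of one normalized face."""
    norm = normalize(landmarks)
    diff = norm[:, None, :] - norm[None, :, :]
    return np.linalg.norm(diff, axis=2)[PAIRS]

def distance_features(lm_p, lm_b):
    """Assumed variant: absolute change of every pairwise distance, I_p vs. I_b."""
    return np.abs(pairwise_distances(lm_p) - pairwise_distances(lm_b))

def landmark_angles(landmarks):
    """Angle in [0, 360) degrees of the vector from each landmark to its neighbour."""
    vec = landmarks[NEIGHBOR] - landmarks
    return np.degrees(np.arctan2(vec[:, 1], vec[:, 0])) % 360.0

def angle_difference(a_p, a_b):
    """Eq. (2): wrap-around difference, always in [0, 180] degrees."""
    delta = np.abs(a_p - a_b)
    return np.minimum(delta, 360.0 - delta)

def angle_features(lm_p, lm_b):
    """One Eq. (2) difference per landmark between passport and bona fide image."""
    return angle_difference(landmark_angles(lm_p), landmark_angles(lm_b))

rng = np.random.default_rng(0)
lm_b = rng.uniform(0.0, 500.0, size=(N_LANDMARKS, 2))      # stand-in for dlib output on I_b
lm_p = lm_b + rng.normal(0.0, 2.0, size=(N_LANDMARKS, 2))  # slightly shifted I_p
print(distance_features(lm_p, lm_b).shape)  # (2278,)
print(angle_features(lm_p, lm_b).shape)     # (68,)
print(angle_difference(np.array([350.0]), np.array([10.0])))  # [20.]
```

The angle features are unaffected by translation and uniform scaling of the landmarks, so the sketch computes them on the raw coordinates; only the distance features go through the normalization step.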

Fig. 4. Landmark-based feature extraction

During this work, multiple classifiers were tested. It turned out that the classification task cannot be solved well by linear classifiers, e.g. a linear Support Vector Machine (SVM). Therefore, and in order to provide a more comprehensive evaluation, three different classifiers are employed:

  • Random Forest with 500 estimators

  • SVM without kernel (linear SVM)

  • SVM with a Radial Basis Function (RBF) kernel

The best performance was achieved with the SVM with RBF kernel.

3 Databases

The databases created for previous works are either non-public or do not comprise enough independent bona fide images to carry out a fair evaluation. Thus, for this work, a new database was constructed using two different morphing techniques. The new database builds upon the publicly available AR face database [16], which comprises 136 subjects. In order to generate realistic morphs, all frontal faces with neutral expression are selected (see Footnote 1). The selected 493 images of 120 subjects are processed according to the recommendations of the International Civil Aviation Organisation (ICAO) [2]. For the normalization, the landmark detector of dlib [15] is employed to determine the centers of the eyes, based on which the images are aligned so that the eyes lie on a horizontal line. Subsequently, the images are cropped according to [17] and downsized to \(720 \times 960\) pixels, which is 5 times higher than the minimum resolution required for passport images.
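The eye-based alignment step can be sketched as follows. The sketch assumes the standard 0-based indexing of dlib's 68-point (iBUG 300-W) model, in which landmarks 36–41 and 42–47 outline the two eyes, and approximates each eye center as the mean of its six contour points; it only computes the rotation angle and applies it to the landmarks. The actual image warp and the cropping according to [17] are omitted, and all names are hypothetical.

```python
import numpy as np

# 0-based indices of dlib's 68-point (iBUG 300-W) scheme: 36-41 and 42-47
# outline the two eyes.
EYE_A = slice(36, 42)
EYE_B = slice(42, 48)

def eye_centers(landmarks):
    """Approximate each eye center as the mean of its six contour landmarks."""
    return landmarks[EYE_A].mean(axis=0), landmarks[EYE_B].mean(axis=0)

def eye_line_angle(landmarks):
    """Angle in degrees between the inter-eye line and the image x-axis."""
    c_a, c_b = eye_centers(landmarks)
    dx, dy = c_b - c_a
    return float(np.degrees(np.arctan2(dy, dx)))

def rotate_points(points, angle_deg, center):
    """Rotate points by -angle_deg around center, which levels the eye line."""
    theta = np.radians(-angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return (points - center) @ rot.T + center

# Toy landmarks (hypothetical): eyes centered at (200, 300) and (400, 340).
lm = np.zeros((68, 2))
lm[EYE_A] = [200.0, 300.0]
lm[EYE_B] = [400.0, 340.0]
angle = eye_line_angle(lm)                 # about 11.31 degrees
center = np.mean(eye_centers(lm), axis=0)  # midpoint between the eyes
aligned = rotate_points(lm, angle, center)
c_a, c_b = eye_centers(aligned)
print(round(abs(c_a[1] - c_b[1]), 6))      # 0.0
```

In a full pipeline, the same angle and center would drive an affine warp of the image itself before cropping and resizing.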

For the experiments, the subjects are divided into two disjoint subsets, training and testing, each comprising 60 subjects. The first image of each subject is reserved for creating the morphing attacks, while the remaining images are used as bona fide samples. The composition of the database is given in Table 1. The morphs are generated using the dlib landmark detector [15] and Delaunay triangulation and are referred to as OpenCV Morphs.
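The subject split described above can be sketched in plain Python. The input layout (a dict mapping a subject id to that subject's ordered image ids) is hypothetical, as are the seeded random shuffle and the restriction of morph pairs to subjects of the same subset; the text fixes only the 60/60 split and the use of the first image for morph creation.

```python
import random
from itertools import combinations

def split_database(images_by_subject, seed=0):
    """Split subjects into two disjoint halves and separate morph sources.

    images_by_subject: hypothetical dict mapping a subject id to that
    subject's ordered list of image ids. The first image of each subject is
    reserved for morph creation; the remaining images serve as bona fide samples.
    """
    subjects = sorted(images_by_subject)
    random.Random(seed).shuffle(subjects)  # assumed: random, seeded split
    half = len(subjects) // 2
    result = {}
    for name, members in (("train", subjects[:half]), ("test", subjects[half:])):
        members = sorted(members)
        result[name] = {
            "morph_sources": {s: images_by_subject[s][0] for s in members},
            "bona_fide": {s: images_by_subject[s][1:] for s in members},
            # Assumption: morphs only combine subjects from the same subset.
            "morph_pairs": list(combinations(members, 2)),
        }
    return result

# Hypothetical layout: 120 subjects with 4 images each.
db = {f"s{k:03d}": [f"s{k:03d}_{n}" for n in range(4)] for k in range(120)}
subsets = split_database(db)
print(len(subsets["train"]["morph_sources"]), len(subsets["test"]["morph_sources"]))  # 60 60
```

Keeping the subject sets disjoint ensures that no identity seen during training reappears, as a morph contributor or as a bona fide sample, in the testing set.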

Table 1. Composition of created database

4 Experiments and Results

The experiments conducted in this work are evaluated according to the metrics for presentation attack detection defined in ISO/IEC 30107-3 [8]:

  • Attack Presentation Classification Error Rate (APCER): proportion of attack presentations using the same Presentation Attack Instrument (PAI) species incorrectly classified as bona fide presentations in a specific scenario.

  • Bona fide Presentation Classification Error Rate (BPCER): proportion of bona fide presentations incorrectly classified as presentation attacks in a specific scenario.

In addition, the following metrics are reported:

  • BPCER10: BPCER observed at a fixed APCER of 10%.

  • Detection Equal Error Rate (D-EER): the error rate at the operating point where APCER and BPCER are equal.
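The metrics above can be computed from classifier scores with a short NumPy sketch. It assumes higher scores mean "more likely a morph", so a comparison with a score at or above the threshold is classified as an attack presentation. Because the scores are discrete, the D-EER is approximated at the threshold where APCER and BPCER are closest, and BPCER10 is taken as the lowest BPCER among thresholds whose APCER does not exceed 10%. The function names and toy scores are illustrative only.

```python
import numpy as np

def apcer_bpcer(attack_scores, bona_fide_scores, threshold):
    """APCER and BPCER at one threshold (score >= threshold -> attack)."""
    attack = np.asarray(attack_scores, dtype=float)
    bona_fide = np.asarray(bona_fide_scores, dtype=float)
    apcer = np.mean(attack < threshold)      # attacks classified as bona fide
    bpcer = np.mean(bona_fide >= threshold)  # bona fide classified as attacks
    return float(apcer), float(bpcer)

def error_curve(attack_scores, bona_fide_scores):
    """APCER/BPCER for every candidate threshold (all observed scores + inf)."""
    scores = np.concatenate([np.ravel(attack_scores), np.ravel(bona_fide_scores)])
    thresholds = np.append(np.unique(scores), np.inf)
    rates = np.array([apcer_bpcer(attack_scores, bona_fide_scores, t) for t in thresholds])
    return rates[:, 0], rates[:, 1]

def d_eer(attack_scores, bona_fide_scores):
    """Approximate D-EER: mean of APCER and BPCER where they are closest."""
    apcer, bpcer = error_curve(attack_scores, bona_fide_scores)
    k = int(np.argmin(np.abs(apcer - bpcer)))
    return float((apcer[k] + bpcer[k]) / 2.0)

def bpcer_at_apcer(attack_scores, bona_fide_scores, max_apcer=0.10):
    """BPCER10 for max_apcer=0.10: lowest BPCER among thresholds with APCER <= max_apcer."""
    apcer, bpcer = error_curve(attack_scores, bona_fide_scores)
    return float(bpcer[apcer <= max_apcer].min())

attacks = [0.3, 0.6, 0.7, 0.8]     # toy morph-comparison scores
bona_fides = [0.1, 0.2, 0.4, 0.5]  # toy bona fide comparison scores
print(d_eer(attacks, bona_fides))           # 0.25
print(bpcer_at_apcer(attacks, bona_fides))  # 0.5
```

Sweeping every observed score as a threshold traces the same APCER/BPCER pairs that a DET curve plots, so the D-EER and BPCER10 are simply two points read off that curve.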

In addition to the aforementioned metrics, and in order to evaluate the quality of the generated database, the Mated Morph Presentation Match Rate (MMPMR) and the Relative Morph Match Rate (RMMR) [18] were calculated, which indicate the vulnerability of the FRS to the attack. Using the Cognitec FaceVACS-SDK [19], a ProdAvg-MMPMR of 96.7% and a ProdAvg-RMMR of 97.7% were computed for the training and testing sets.
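The idea behind the MMPMR can be shown with a plain-Python sketch of its basic form as assumed here: a morph counts as a successful attack only if its comparison with every contributing subject yields a similarity score above the decision threshold. The ProdAvg variants reported above aggregate over several probe images per subject and are not reproduced; the function name and scores are made-up toy values.

```python
def mmpmr(scores_per_morph, threshold):
    """Basic MMPMR (assumed form): share of morphs accepted for *all* contributors.

    scores_per_morph: one sequence per morph, holding the similarity scores of
    the morph against a probe image of each contributing subject. A morph is
    counted when its lowest score still exceeds the decision threshold.
    """
    if not scores_per_morph:
        raise ValueError("need at least one morph")
    hits = sum(1 for scores in scores_per_morph if min(scores) > threshold)
    return hits / len(scores_per_morph)

# Toy similarity scores (hypothetical), two contributing subjects per morph.
scores = [[0.9, 0.8], [0.7, 0.4], [0.95, 0.6]]
print(mmpmr(scores, threshold=0.5))  # 2 of 3 morphs pass, i.e. about 0.667
```

Taking the minimum over contributors encodes the attack goal: the passport image is only useful to the attackers if it verifies successfully against each of them.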

Because we employ a differential morphing detection scheme, each sample can take part in several comparisons, so we obtain many more comparison scores than samples. The testing set contains 315 bona fide comparisons. The number of morph comparisons, which could be much higher, is limited to 300 to obtain balanced classes.
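How the 300 morph comparisons are selected is not described; the following sketch assumes that a morph comparison pairs a morph (acting as the passport image) with a bona fide image of one of its contributing subjects, and that a seeded uniform random subset is drawn. All identifiers and the toy setup are hypothetical.

```python
import random
from itertools import combinations

def sample_morph_comparisons(morph_contributors, bona_fide_images, n_max=300, seed=0):
    """Enumerate (morph, trusted bona fide image) pairs and cap their number.

    morph_contributors: hypothetical dict morph id -> ids of contributing subjects.
    bona_fide_images: hypothetical dict subject id -> list of bona fide image ids.
    Each morph can be compared with every bona fide image of each contributor,
    so the candidate set grows quickly; a seeded random subset of at most
    n_max pairs keeps the two classes roughly balanced.
    """
    candidates = [
        (morph, image)
        for morph, contributors in morph_contributors.items()
        for subject in contributors
        for image in bona_fide_images[subject]
    ]
    if len(candidates) <= n_max:
        return candidates
    return random.Random(seed).sample(candidates, n_max)

# Hypothetical toy setup: 60 test subjects with three bona fide images each,
# and one morph per unordered subject pair.
subjects = [f"s{k:02d}" for k in range(60)]
bona_fide = {s: [f"{s}_{n}" for n in range(1, 4)] for s in subjects}
morphs = {f"m{idx}": pair for idx, pair in enumerate(combinations(subjects, 2))}
selected = sample_morph_comparisons(morphs, bona_fide)
print(len(selected))  # 300
```

Sampling without replacement avoids duplicate comparisons; with one morph per subject pair, this toy setup would otherwise yield 10620 morph comparisons, far more than the bona fide class.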

Table 2. Performance of the proposed system

Fig. 5. Performance evaluation of the analyzed detection algorithms

The error rates (D-EER and BPCER10) for the classifiers are summarised in Table 2. Furthermore, the Detection Error Tradeoff (DET) curves [20] are depicted in Fig. 5.

In the DET plot (Fig. 5), higher error rates can be observed for the distance features than for the angle features. The linear SVM yields slightly higher error rates than the SVM with RBF kernel, indicating that the problem is non-linear.

The best detection performance at all operating points is achieved by the SVM with RBF kernel, yielding a D-EER of 32.7% and a BPCER10 of 61.7%. The results indicate that some information about the morphing process can be derived from the landmarks, but due to the low overall performance, the algorithms presented in this work are not yet suitable for real-world applications. Thus, our future work will fuse landmark-based information with complementary information derived from the image texture.

5 Conclusions

In this work, a novel approach for morphed face image detection has been presented. Unlike previous detection methods, this work uses, in addition to the submitted passport image, a trusted bona fide facial image of the claimed data subject. Comparing the angles between selected landmarks of the passport image and the bona fide image with an SVM with RBF kernel yields a D-EER of 32.7%. A fusion of both approaches might lead to lower error rates. The results achieved in this work are not yet suited for operational deployment, but they constitute a first step towards reference-based morphed face image detection.