1 Introduction

Authentication plays a vital role in numerous domains, including forensic science, financial security, and physical access control. Biometric features offer a powerful means of authentication, providing ample information for accurate person identification. These features encompass behavioral traits, such as handwritten signatures, voiceprints, and keystroke dynamics, as well as physiological characteristics like fingerprints, hand silhouettes, and blood vessel patterns [46]. While systems based on behavioral traits are considered less intrusive, they exhibit higher variability over time, whereas systems based on physiological characteristics tend to offer greater stability but often involve more intrusive measurement methods.

In this paper, we propose an innovative approach that not only meets the increasing demand for adaptable, efficient, and robust biometric identification methods but also prioritizes explainability, a key aspect from an ethical standpoint. Our method focuses on enhancing the security and reliability of physiological-based biometrics, specifically human retinas and palmprints. These biometric features possess an exceptional level of distinctiveness, even among identical twins, owing to their unique structural patterns, as depicted in Fig. 1. The explainable nature of our approach aligns with ethical AI principles, ensuring transparency and accountability in the identification process, and making it ideal for advanced computer vision techniques.

Vision-based biometric systems leverage image sensors and sophisticated computer vision methods. Despite significant developments in this field, persistent challenges remain, including the need for short distance image acquisition systems, ensuring high-quality images, and the development of dependable and rapid identification methods [41].

Palmprint-based approaches are emerging as reliable identification schemes, offering notable advantages such as low computational cost, high accuracy, and user-friendliness compared to traditional biometric modalities like face, iris, or fingerprint recognition [23, 42, 104]. Remarkably, contactless technology for palmprint image acquisition is already available in the market [64].

In contrast to other biometric features, the uniqueness and difficulty of forging retina patterns are well-documented [2, 60]. Although standard retina acquisition devices may not be entirely non-invasive and require cooperation from the individual being verified [30], recent advancements have made retinal screening systems accessible on smartphones through suitable adapters [35, 67], significantly enhancing accessibility and user-friendliness.

Fig. 1

(First row) Sample retinal images taken from the VARIA data set [72] and (second row) contactless palmprints taken from the IITD data set [48, 49]. Images in b and c are taken from the same person

Our contributions are centered around the introduction of a versatile, learning-free, and explainable method for person identification, applicable with efficiency to both retina and palmprint-based biometric systems. By versatile, we mean that our approach is highly adaptable and can be effectively utilized in various scenarios. It is also learning-free, implying that it does not rely on extensive learning processes or complex optimization algorithms, making it efficient and less dependent on large datasets. Moreover, our approach is explainable, meaning that the reasoning behind its decisions and outcomes can be readily understood, providing transparency and insight into its functionality. The cornerstone of our novelty is the utilization of hierarchical COSFIRE filters [6, 7]. Unlike conventional learning-based methods, our approach harnesses the unique trainable nature of COSFIRE filters without the dependence on extensive datasets or optimization algorithms. Specifically, we configure COSFIRE filters to exhibit selectivity for distinct spatial arrangements of keypoints, such as junctions, characteristic of retina and palm patterns. This approach obviates the need for vessel/palm lines segmentation or image registration, simplifying the identification process. Additionally, our contributions extend to robustness, with our method demonstrating resilience against a decision-based black-box adversarial attack and excelling in partial matching scenarios.

2 Related works

The relevant literature on retina and palmprint identification and verification can be classified into five categories: bitwise-based, feature-based, line-based, hybrid, and deep learning-based methods.

Bitwise codes are obtained by applying banks of phase or directional filters and then encoding the filtered image with a binary scheme. Some of these approaches are also referred to as graph-based methods. In relation to retinal identification, [52] modeled the segmented vessel structure as a graph, so that verification matching is carried out with a graph-based technique that takes into account three distance measures between a pair of graphs. Regarding palmprint biometrics, well-known methods such as Palm Code [98], Competitive Code (CompCode) [47], Ordinal Code [87], and Fusion Code [45] use bitwise approaches. More recently, [38] extracted a block dominant orientation code and block-based histograms of oriented gradients as palmprint features. [11] employed multi-resolution log-Gabor filters, in which the final feature map was constructed from the winning codes of the responses corresponding to the lowest real intensity value among the filtered images. A double-orientation code based on Gabor filters and a nonlinear matching scheme was used in [24, 97]. Histograms of combinations of binarized statistical image features (BSIF), computed from the phase of Gabor filter outputs, have also been used with a k-nearest neighbors classifier. In [58], a discriminant analysis scheme for multiple orientations and scales of the palmprint features was proposed, which used a four-bit code representation and Hamming-distance matching. Bitwise-based approaches are known to be highly computationally intensive owing to the large number of convolutions required by the filter bank.

Feature-based methods usually use some global or local properties, such as the optic disc in retinal fundus images, to describe the given images after some preprocessing [53, 76]. With respect to retina-based biometrics, a neural network recognition system operating on normalized and enhanced retinal images was investigated in [77], and [37] preprocessed the images using an optic disc ring algorithm and extracted SURF and ORB features. Concerning palmprint identification, [50] presented a global approach based on a Passband Discrete Cosine Transform (PBDCT) for dimensionality reduction and feature extraction with Fisher Locality Preserving Projections (FLPP). SIFT feature extraction and matching, together with a matching refinement step, were explored in [94]. Moreover, [38] employed a weighted histogram of oriented gradients for locally selected patterns (WHOG-LSP), and [57] adopted the local line directional pattern (LLDP) descriptor. Texture coding methods generally achieve high recognition rates but require clean palmprint images, which are more easily obtained with contact devices. In general, most feature-based approaches rely on handcrafted features, such as SIFT, which do not necessarily have any perceptual importance and may therefore lose their repeatability property in retinal and palmprint images. This is because handcrafted features may be detected in regions that are not very distinctive and hence may cause incorrect matches.

Line-based methods extract the segmented blood vessel patterns [43, 75, 96] or palm lines using line/edge detection algorithms. Most of the existing methods segment blood vessels prior to retinal identification. Retinal segmentation is a challenging task that has been extensively studied [8, 29, 70, 84, 103]. In [9], the authors obtained feature points using a geometric hashing technique on segmented retinal images. Some approaches also include a registration step in which two images are aligned in order to achieve more stability when only part of the retina image is available [36, 71]. The work in [30] introduced a co-registration step that exploits the information used to segment retinal images via point-set alignment, and in [28] a two-step method was presented, which consists of a multiscale affine registration followed by a multiscale elastic registration. As to line-based palmprint identification, the study in [55] built on [20, 66] by taking into account the direction and thickness of local descriptors and considering a concatenation of histograms. The work in [74] explored the use of the log-Gabor transform and feature dimensionality reduction with kernel discriminant analysis, along with a sparse representation classifier. Line-based approaches rely on vessel or palm line segmentation algorithms, which may also be computationally intensive and sensitive to artifacts caused by pathologies or imperfections of the acquisition systems.

Hybrid methods combine several of the above techniques. For instance, [93] proposed a vascular-based method that removes false blood vessels before feature extraction; textural features such as luminance, contrast, and color information were then extracted. [2] first segmented the vessels and then used end points, bifurcations, and crossings as features together with principal component analysis. The work in [25] embedded the adaptive principal line distance into the low-rank representation of images for palmprint identification. For the same purpose, [100] used a scheme based on collaborative representation [101] whose features came from block-wise statistics of CompCode maps.

Deep learning techniques have become a central focus in image classification and recognition. Following this trend, some recent works on biometric identification make use of deep learning methods [54, 88]. Deep learning has been thoroughly investigated for the detection of retinal pathologies [34] and retinal vessel segmentation [15, 18, 31, 39, 51, 61, 80, 86], and recently also for retina biometric systems [62]. As to palmprint biometric systems, there are some works that use deep learning for the determination of multiscale features [14, 81, 83, 89, 99]. For instance, [102] proposed a neural network based on a stack of restricted Boltzmann machines (RBMs) and a regression layer, while [65] proposed a convolutional neural network (CNN) that used predefined wavelet transform filters instead of learning filters from data. The studies in [17, 90] proposed the application of the eight-layer AlexNet to enhanced images and a Siamese-type CNN architecture for contactless palmprint identification, respectively. Moreover, a double-layer direction method was introduced in [26] for the extraction of the apparent surface layer direction and the latent energy layer direction features of palmprint images. Deep learning methods can achieve comparable or even better performance than conventional palmprint recognition methods [22]. Nonetheless, they usually produce features of high dimensionality and are computationally intensive, often requiring GPUs to speed up computations. Additionally, deep learning-based approaches often suffer from a lack of interpretability, operating as a “black box” whose decision-making process is not easily understood or transparent. This opacity can be a significant drawback, particularly in critical applications where understanding the rationale behind decisions is essential. Our scope is to propose a highly interpretable and effective approach that maintains high performance while ensuring that the methods are transparent and the decision-making process can be easily comprehended. Explainable biometric systems are crucial for building trust, ensuring accountability, and complying with regulatory standards.

We propose a versatile and domain-free method that is effective for both retina and palmprint identification. In our preliminary experiments, we found that a hierarchical combination of trainable COSFIRE filters can be an effective tool for retina recognition [4]. Here, we simplify the method and make it more efficient, and we conduct several experiments to evaluate its versatility (by adding the palmprint biometric application) and its robustness to a decision-based black-box attack [10, 95] and to partial matching. To the best of our knowledge, no single method has been proposed that is suitable for different biometric identification applications. The COSFIRE filters are highly efficient and energy-saving algorithms, especially compared to deep learning approaches. In [69], we provide a theoretical analysis of the number of FLOPs, highlighting their superior computational efficiency.

Unlike handcrafted detectors, whose selectivity is predefined in their implementation, we use trainable COSFIRE filters, which have been found effective in various computer vision applications [5, 6, 27, 82]. A COSFIRE filter can be automatically configured from just one reference image, which is very suitable for biometric systems. It is represented by a set of tuples that describe the dominant keypoints and their mutual spatial arrangement in a given reference image. Therefore, there is no need to segment the blood vessels or the palm lines. For a query image, we apply the hierarchical COSFIRE filters that represent all reference images and assign to it the label (or person) of the COSFIRE filter that achieves the highest similarity score.

3 Method

COSFIRE filters have been demonstrated to be effective in the localization and recognition of patterns of interest in complex scenes. They have been widely used as two-layer architectures in which they take input from low-level filters that are selective for certain orientations or for changes in contrast. In [7] it was demonstrated how COSFIRE filters can be used to provide input to other COSFIRE filters. This mechanism enables the construction of multi-layer or hierarchical COSFIRE filters that allow more tolerance to deformations in the patterns of interest.

Below, we describe the three-layer trainable COSFIRE filters that we use for the representation of retinal or palmprint reference images as well as the similarity function that compares a query image with the reference images. In the bottom layer of the three-layer architecture, we use a bank of orientation-selective Gabor filters. In the second layer, we configure a small and fixed bank of keypoint selective COSFIRE filters. The third layer takes input from the keypoint selective filters of the second layer. Figure 2 illustrates the high-level architecture of the proposed biometric system.

Fig. 2

A high-level architecture of the proposed biometric system

3.1 Two-layer COSFIRE filters

COSFIRE filters are trainable, in that their selectivity is determined by an automatic configuration process that extracts certain properties from some keypoints in a given prototype image. Figure 3 shows the configuration of one such filter with a Y-junction prototype pattern. We follow the most popular configuration strategy used so far, which comprises four main steps, namely convolve-ReLU-detection-description.

In the first step, we convolve the prototype image with a bank of symmetric Gabor filters with eight orientations and one scale. We superimpose the resulting feature maps by taking the maximum response of all Gabor filters at every position. Then, we apply a rectified linear unit (ReLU), also known as half-wave rectification, which sets all negative responses in the aggregated feature map to zero. Next, we use a system of concentric circles, with a given set of radii, around a user-specified point, which is typically the center of the prototype. Then, we detect the local maximum responses in the aggregated feature map along these circles. Finally, we describe each keypoint i by a 4-tuple \((\lambda _i, \theta _i, \rho _i, \phi _i)\), where \(\lambda _i\) and \(\theta _i\) are the scale and orientation parameters, respectively, of the Gabor filter that achieves the maximum response at that keypoint, while \(\rho _i\) and \(\phi _i\) are the distance and polar angle with respect to the center of the prototype. We denote by \(B_\chi = \{(\lambda _i, \theta _i, \rho _i, \phi _i)~|~ i = 1\dots n\}\) a two-layer COSFIRE filter that describes the properties of n keypoints in the given prototype \(\chi\).
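The following Python sketch (using NumPy and scikit-image) illustrates these four configuration steps. The peak-picking helper, the wavelength value, the peak threshold, and the assumption that all circles lie inside the prototype are illustrative choices, not part of our reference implementation.

```python
import numpy as np
from skimage.filters import gabor

def _circular_peaks(values, thr=0.2):
    """Indices of local maxima on a circular sequence, above thr times the global maximum."""
    peaks, n = [], len(values)
    for j in range(n):
        if values[j] > thr * values.max() and \
           values[j] >= values[(j - 1) % n] and values[j] > values[(j + 1) % n]:
            peaks.append(j)
    return peaks

def configure_two_layer(prototype, center, radii, lam=8.0, n_orient=8):
    """Describe the prototype around 'center' as a set of 4-tuples (lambda, theta, rho, phi)."""
    thetas = [i * np.pi / n_orient for i in range(n_orient)]
    # Convolve with a bank of symmetric Gabor filters (one scale, eight orientations) and apply ReLU.
    stack = np.stack([np.maximum(gabor(prototype, frequency=1.0 / lam, theta=t)[0], 0.0)
                      for t in thetas])
    superposition = stack.max(axis=0)            # pixel-wise maximum over all orientations
    cx, cy = center
    tuples = []
    for rho in radii:
        angles = np.linspace(0, 2 * np.pi, 360, endpoint=False) if rho > 0 else np.array([0.0])
        xs = np.round(cx + rho * np.cos(angles)).astype(int)
        ys = np.round(cy - rho * np.sin(angles)).astype(int)
        values = superposition[ys, xs]           # responses sampled along the circle of radius rho
        peak_idx = _circular_peaks(values) if rho > 0 else ([0] if values[0] > 0 else [])
        for j in peak_idx:
            theta = thetas[int(stack[:, ys[j], xs[j]].argmax())]   # dominant Gabor orientation
            tuples.append((lam, theta, float(rho), float(angles[j])))
    return tuples
```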

The response of the configured COSFIRE filter at every position of a given test image is computed by an aggregate function that combines the responses of Gabor filters whose parameters and positions are indicated in the set \(B_\chi\). In practice, we first generate n feature maps, one for each tuple in the set \(B_\chi\), and then apply the aggregate function along every column of the stack of feature maps. Each feature map \(F_i(x,y)\) is obtained by a set of four operations, namely convolve-ReLU-shift-blur. The convolve step involves the linear filtering of a given image by a Gabor kernel with parameters \(\lambda _i\) and \(\theta _i\). Similar to the configuration stage, we then rectify the Gabor feature maps with the ReLU operator. The last two steps, shift and blur, deal with the preferred mutual spatial arrangement of the involved Gabor responses. In order to enable the columnwise operation that aggregates all feature maps, we shift all Gabor responses in the direction opposite to the corresponding polar coordinates \([\rho _i,\pi -\phi _i]\). Moreover, we blur the Gabor response maps in order to allow for some tolerance with respect to the preferred positions. The blurring function is a dilation whose structuring element is a Gaussian function with a standard deviation \(\hat{\sigma }\) that grows linearly with the distance from the support center of the COSFIRE filter: \(\hat{\sigma } = \sigma _0 + \alpha \rho _i\). The parameters \(\sigma _0\) and \(\alpha\) are determined empirically from the training and validation images of the given data sets.

Fig. 3

Configuration example of a two-layer COSFIRE filter. a A prototype pattern of size \(208 \times 242\) pixels. b A set of concentric circles overlaid on the maximum superposition of all Gabor filter response maps to the image in (a). The black spots indicate the keypoints that characterize the local maximum points along the circles. c Graphical representation that shows the structure of the resulting COSFIRE filter. The ellipses illustrate the scale and orientation of the selected Gabor filters and the blobs indicate the tolerance given with respect to the preferred positions

We denote by \(r_{B_\chi }(x,y)\) the aggregate function that combines by geometric mean all blurred and shifted Gabor feature maps \(F_i(x,y)\):

$$\begin{aligned} r_{B_\chi }(x,y) = \bigg (\prod _{i=1}^{|B_\chi |}F_i(x,y)\bigg )^\frac{1}{|B_\chi |} \end{aligned}$$
(1)

This equation is adapted from the research presented in [6]. For a comprehensive understanding of the two-layer COSFIRE filters and further technical details, readers are encouraged to refer to the same cited work.
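For illustration, the sketch below computes this response; the Gaussian-weighted maximum used for the blurring step and the coordinate conventions are our own reading of the description above, and the code is not optimized.

```python
import numpy as np
from skimage.filters import gabor
from scipy.ndimage import shift as nd_shift

def _weighted_max_blur(resp, sigma):
    """Blur as a weighted maximum (dilation with a Gaussian weighting), as described above."""
    r = int(np.ceil(3 * sigma))
    out = np.zeros_like(resp)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            w = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            out = np.maximum(out, w * nd_shift(resp, (dy, dx), order=0))
    return out

def two_layer_response(image, tuples, sigma0=0.67, alpha=0.4):
    """Response map of Eq. (1): convolve-ReLU-shift-blur per tuple, then geometric mean."""
    maps = []
    for lam, theta, rho, phi in tuples:
        resp = np.maximum(gabor(image, frequency=1.0 / lam, theta=theta)[0], 0.0)  # convolve + ReLU
        # Shift so that the response at the keypoint position lands on the filter center;
        # the sign convention depends on the image coordinate system (here the y axis points down).
        dx, dy = rho * np.cos(phi), -rho * np.sin(phi)
        resp = nd_shift(resp, (-dy, -dx), order=1)                                  # shift
        resp = _weighted_max_blur(resp, sigma0 + alpha * rho)                       # blur
        maps.append(resp)
    return np.prod(np.stack(maps), axis=0) ** (1.0 / len(maps))                     # Eq. (1)
```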

3.2 Tolerance to rotation

The COSFIRE filter \(B_\chi\) configured above is selective for patterns that are similar to and in the same orientation of the prototype. In order to have a filter that is selective for the same pattern rotated by a given angle \(\psi\) we form a new filter, which we denote by \(B_{\chi ,\psi }\), by offsetting the orientations \(\theta _i\) and polar angles \(\phi _i\) of all tuples of the original filter \(B_\chi\):

$$\begin{aligned} B_{\chi ,\psi } = \{(\lambda _i,\theta _i+\psi ,\rho _i,\phi _i+\psi )~|~\forall ~ (\lambda _i,\theta _i,\rho _i,\phi _i) \in B_\chi \} \end{aligned}$$
(2)

We denote by \(r_{B_{\chi ,\Psi }}(x,y)\) the rotation-tolerant response by taking the maximum response of a set of COSFIRE filters with orientation preferences given in the set \(\Psi\):

$$\begin{aligned} r_{B_{\chi ,\Psi }}(x,y) = \max _{\psi \in \Psi }\left\{ r_{B_{\chi ,\psi }}(x,y)\right\} \end{aligned}$$
(3)

These two equations are adapted from the work in [6].
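A possible implementation of Eqs. (2) and (3), reusing the two_layer_response sketch given above, is the following:

```python
import numpy as np

def rotate_tuples(tuples, psi):
    """Eq. (2): offset the orientation and the polar angle of every tuple by psi."""
    return [(lam, theta + psi, rho, phi + psi) for lam, theta, rho, phi in tuples]

def rotation_tolerant_response(image, tuples, psis):
    """Eq. (3): pixel-wise maximum of the responses of the rotated versions of the filter."""
    return np.max(np.stack([two_layer_response(image, rotate_tuples(tuples, psi))
                            for psi in psis]), axis=0)

# Example: tolerance to rotations in steps of pi/8 over the full circle.
psis = [i * np.pi / 8 for i in range(16)]
```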

3.3 Three-layer COSFIRE filters

In the two-layer COSFIRE filters described above, we have Gabor filters in the bottom layer, which are selective for simple linear structures and whose responses are combined by geometric mean in the second layer. That architecture is suitable for constructing COSFIRE filters that are selective for moderately complex and rather rigid patterns. Such COSFIRE filters were inspired by the functionality of shape-selective neurons in area V4 of the visual cortex [73]. There is neurophysiological evidence that some neurons in area TEO of the inferotemporal cortex are selective for more complex shapes [13]. They have dendrites connected with shape-selective V4 cells. We use this evidence as a source of inspiration to configure a three-layer COSFIRE architecture that is selective for more complex shapes and more robust to deformations. In the new architecture, the third layer receives input from V4-like COSFIRE filters.

Retinal fundus and palmprint images are characterized by curvilinear structures, with junctions being the most salient features. Our hypothesis is that the mutual spatial arrangement of such junctions can be used as a biometric feature for person identification.

3.3.1 Configuration

We configure two two-layer COSFIRE filters, one that is selective for symmetric Y-junctions, denoted by Y, and one that is selective for lines, denoted by L, with orientation preferences \(\psi \in \{0,\frac{\pi }{8},\dots ,\frac{15\pi }{8}\}\).

For each reference image, we apply the \(Y\)- and \(L\)-selective COSFIRE filters in rotation-tolerant mode. The non-maximum-suppressed response map of each of the two filters is then divided into a \(k\times k\) grid, and the maximum value and its location are extracted from each tile. The list of \(k^2\) values of each filter is then normalized using \(z\)-standardization. Finally, for each tile we configure a new two-layer COSFIRE filter \(\chi _i\) using the local pattern at the location with the maximum \(z\)-value of the Y and L filters in that tile. We denote by \(\hat{R}\) a set of 4-tuples that describes a given image:

$$\begin{aligned} \hat{R} = \{(\chi _i,\psi _i,\hat{\rho }_i,\hat{\phi }_i)~|~i=1\dots k^2\}, \end{aligned}$$
(4)

where \(\chi _i\) and \(\psi _i\) indicate the type and orientation, respectively, of the selected two-layer COSFIRE filter. The polar coordinates \((\hat{\rho }_i,\hat{\phi }_i)\) are determined with respect to the centroid of the involved keypoints, and \(k^2\) is equal to the number of grid tiles in a \(k \times k\) arrangement. Figure 4 shows the configuration of a retina-selective three-layer COSFIRE filter that takes input from five two-layer COSFIRE filters. For better clarity, in this example we use three types of bifurcation-selective filters (instead of the Y and L filters mentioned above) and configure a three-layer COSFIRE filter with only five two-layer filters. In practice, the number of two-layer filters depends on the value of k that determines the size of the grid.
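The sketch below illustrates this configuration procedure on the rotation-tolerant response maps of the Y- and L-selective filters; the tile handling and the omission of the preferred orientation \(\psi _i\) of the winning rotated filter are simplifications for illustration.

```python
import numpy as np

def configure_three_layer(resp_maps, k):
    """resp_maps: {'Y': 2-D array, 'L': 2-D array} of rotation-tolerant two-layer responses."""
    names = list(resp_maps)
    h, w = resp_maps[names[0]].shape
    peaks = {n: [] for n in names}                    # (value, y, x) per tile and filter type
    for ty in range(k):
        for tx in range(k):
            ys = slice(ty * h // k, (ty + 1) * h // k)
            xs = slice(tx * w // k, (tx + 1) * w // k)
            for n in names:
                tile = resp_maps[n][ys, xs]
                iy, ix = np.unravel_index(tile.argmax(), tile.shape)
                peaks[n].append((tile.max(), ys.start + iy, xs.start + ix))
    # z-standardize the k^2 peak values of each filter type so the two types are comparable.
    z = {}
    for n in names:
        v = np.array([p[0] for p in peaks[n]])
        z[n] = (v - v.mean()) / (v.std() + 1e-12)
    # In every tile keep the filter type with the highest z-value and the corresponding location.
    keypoints = []
    for t in range(k * k):
        best = max(names, key=lambda n: z[n][t])
        _, y, x = peaks[best][t]
        keypoints.append((best, y, x))
    # Polar coordinates with respect to the centroid of the selected keypoints (y axis points down).
    cy = np.mean([y for _, y, _ in keypoints])
    cx = np.mean([x for _, _, x in keypoints])
    return [(n, np.hypot(x - cx, y - cy), np.arctan2(cy - y, x - cx)) for n, y, x in keypoints]
```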

Fig. 4

a At the top are the structures of the COSFIRE filters that are configured to be selective for symmetrical Y-junctions and straight lines. These COSFIRE filters are applied to a given retinal image such as the one shown in b, which produces the aggregated feature map illustrated at the top in c. The cross markers at the bottom of (c) indicate the locations at which the bifurcation-selective COSFIRE filters achieve the five strongest responses. (d, left) The enlarged local patterns indicated by the cross markers in (c), and (d, right) the structures of the corresponding bifurcation-selective COSFIRE filters. Finally, e a retina- or person-selective COSFIRE filter is configured by using the mutual spatial arrangement of the identified bifurcations. The ‘*’ marker indicates the centroid of the five bifurcations

3.3.2 Response of a three-layer COSFIRE filter

The response of a three-layer COSFIRE filter is computed by applying the operations filter-ReLU-shift-blur. In the filter step, we apply the concerned two-layer filters \(B_{\chi _i,\psi _i}\). Their response maps are then rectified by the ReLU function. Next, we shift the rectified responses by the vector \((\hat{\rho }_i,\pi -\hat{\phi }_i)\) and blur them by a dilation operation with a Gaussian structuring element of a fixed standard deviation \(\hat{\sigma }\), irrespective of the distance from the support center. Finally, similar to the two-layer COSFIRE filters, we combine all resulting feature maps by geometric mean. As a post-processing step, we suppress all responses that are less than 50% of the maximum. Figure 5 illustrates the application of the retina-selective COSFIRE filter configured in Fig. 4 to a query image that belongs to the same person as the image used for its configuration.
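A compact sketch of this computation, reusing the _weighted_max_blur helper introduced earlier and assuming the (type, rho, phi) keypoint tuples produced by the configuration sketch, is given below.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def three_layer_response(two_layer_maps, R_hat, sigma_hat):
    """two_layer_maps[i]: response map of the i-th selected two-layer filter B_{chi_i, psi_i}."""
    maps = []
    for resp, (_, rho, phi) in zip(two_layer_maps, R_hat):
        dx, dy = rho * np.cos(phi), -rho * np.sin(phi)                 # image coordinates, y axis down
        resp = nd_shift(np.maximum(resp, 0.0), (-dy, -dx), order=1)    # ReLU + shift towards the centroid
        maps.append(_weighted_max_blur(resp, sigma_hat))               # blur with a fixed standard deviation
    out = np.prod(np.stack(maps), axis=0) ** (1.0 / len(maps))         # geometric mean
    out[out < 0.5 * out.max()] = 0.0                                   # suppress responses below 50% of the max
    return out
```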

Fig. 5

Application of a three-layer person-selective COSFIRE filter to a query image. a Structures of the bifurcation-selective filters applied to the given b query image. c The corresponding response maps of the filters in (a). d Blurred and shifted response maps and e the output of the three-layer COSFIRE filter obtained by geometric mean

3.4 Similarity function

For a database of m reference images that belong to s persons, where \(s \le m\) (the same person may have more than one reference image), we apply the m three-layer COSFIRE filters to each given query image and determine the filter that achieves the maximum similarity score. This score is computed by taking the maximum value in the response map and dividing it by the ratio of the sum of the minor axes to the sum of the major axes of all connected components in the response map. The use of the minor-major axis ratio as a weighting is motivated by the fact that the response map of a three-layer COSFIRE filter should be characterized by a single circular blob for a genuine match. The query image is then assigned to the person whose reference retinal or palmprint image was used to configure the filter concerned.
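The following sketch illustrates the weighting using scikit-image region properties; it transcribes the description above literally, so the direction of the weighting (dividing by the axis ratio) should be treated as our reading of the text rather than a definitive implementation.

```python
import numpy as np
from skimage.measure import label, regionprops

def similarity_score(response_map):
    """Peak of the response map weighted by the axis ratio of its connected components."""
    peak = float(response_map.max())
    regions = regionprops(label(response_map > 0))
    if not regions or peak == 0.0:
        return 0.0
    minor = sum(r.minor_axis_length for r in regions)
    major = sum(r.major_axis_length for r in regions)
    if major == 0 or minor == 0:
        return 0.0
    # A genuine match should produce a single, roughly circular blob (minor/major close to 1).
    return peak / (minor / major)   # literal transcription of the weighting described above
```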

3.5 Hyper-parameters

The two-layer COSFIRE filters use three parameters. In the configuration stage, we use a set of radii \(\rho\) for the concentric circles; the maximum \(\rho\) value essentially determines the support radius of the resulting COSFIRE filter. The other two parameters, \(\sigma _0\) and \(\alpha\), are required by the blurring function that is used to compute the intermediate feature maps for each tuple.

For the configuration of three-layer COSFIRE filters, we require the number \(k^2\) of two-layer COSFIRE filters that provide input, and a distance measure \(\eta\) that is used by the non-maximum suppression of the two-layer COSFIRE filter responses. Finally, for the computation of the response of a three-layer COSFIRE filter we use a fixed standard deviation \(\hat{\sigma }\) for the blurring function applied to the intermediate maps. Based on previous work, in our experiments, we use the following parameter values: \(\rho \in \{0,3,8\}\), \(\sigma _0 = 0.67\), and \(\alpha = 0.4\).

4 Evaluation

4.1 Data sets

We evaluated the proposed method on two data sets of retinal images and one data set of palmprints.

We used the retinal data sets VARIA [72] and Retinal Identification Database (RIDB) [93], which are very appropriate for retinal identification since they present different captures of the retinas of the enrolled individuals. These data sets are frequently used for both retinal recognition and vessel segmentation applications.

The VARIA data set comprises 233 images taken from 139 different subjects. The images were acquired with a TopCon non-mydriatic camera NW-100 model at a resolution of 768\(\times\)584 pixels. The optic disc appears centered in the images. Of the 139 subjects, only 59 have at least two different retinal images in the data set. In our experiments, we used the images of those 59 subjects, which amount to 153 images in total. The RIDB data set consists of 100 images of healthy retinas acquired with a TopCon TRC 50EX camera with a resolution of 1504\(\times\)1000 pixels. It includes 20 individuals with five images each.

For palmprint recognition, we used the cropped IIT Delhi Touchless Palmprint Database (IITD) [48, 49]. It consists of 2601 grayscale palmprint images at a resolution of \(150\times 150\) pixels from 230 persons. For each individual, there are five to six images from each of the left and right hands.

4.2 Preprocessing

In order to save computational time, we reduced the size of the retinal images by a factor of four, such that the images of the VARIA data set were resized from a resolution of 768\(\times\)584 to 192\(\times\)146 pixels, and the images of the RIDB data set were resized from a resolution of 1504\(\times\)1000 to 376\(\times\)250 pixels. The images of the IITD palmprint data set were not resized, since the normalized images were already at a low resolution of 150\(\times\)150 pixels. Moreover, we enhanced the images of all data sets with the contrast-limited adaptive histogram equalization (CLAHE) algorithm.
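A minimal preprocessing sketch using OpenCV is shown below; the CLAHE clip limit and tile grid size are illustrative defaults, not values prescribed by our experiments.

```python
import cv2

def preprocess(path, resize_factor=None):
    """Load a grayscale image, optionally downscale it, and apply CLAHE."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if resize_factor is not None:                # e.g. 0.25 for the retinal data sets
        img = cv2.resize(img, None, fx=resize_factor, fy=resize_factor,
                         interpolation=cv2.INTER_AREA)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))   # illustrative settings
    return clahe.apply(img)
```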

4.3 Experiments and results

The performance of the proposed method for retinal and palmprint image identification was rigorously evaluated. For retinal images, we employed a one-versus-all strategy, ensuring that each image was compared against all the others in the data set. In the case of palmprint images, we aligned our methodology with established practices in the field, conducting four distinct sets of experiments. Each set varied in the number of reference images used, and each experiment was repeated five times to account for the randomness in the selection of reference images.

The effectiveness of our method was quantitatively measured using the recognition rate (RR), defined as the percentage of correctly identified query images and calculated as follows:

$$\begin{aligned} \hbox {RR} = \frac{\hbox {TP}}{(\hbox {TP}+\hbox {FP})} \cdot 100 \end{aligned}$$
(5)

where TP represents the number of true positives (instances where the method correctly identifies an image) and FP stands for false positives (instances where the method assigns an incorrect identity). This metric was chosen for its direct relevance to our method’s objective of accurately identifying biometric images. It succinctly captures the method’s effectiveness and makes it straightforward to compare our results with existing approaches.
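For clarity, the metric amounts to the following computation, where every query yields exactly one prediction so that TP + FP equals the number of queries:

```python
# Recognition rate of Eq. (5); every query yields one prediction, so TP + FP = number of queries.
def recognition_rate(predicted_labels, true_labels):
    tp = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return 100.0 * tp / len(true_labels)
```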

Our results, as detailed in Tables 1 and 2, showcase the proposed method’s performance in comparison with existing methods.

Table 1 Performance comparison between our and existing methods on the two data sets of retinal images

4.3.1 Effect of the number of two-layer COSFIRE filters

We evaluated the effect that the number of two-layer COSFIRE filters \(k^2\) used to configure the three-layer COSFIRE filters has on the performance. The plot in Fig. 6 shows that for the VARIA data set we require at least 16 two-layer COSFIRE filters in order to achieve high effectiveness. Note that the VARIA data set is more challenging than RIDB, as its subjects have fewer reference images. The highest performance is reached with 16 and 25 two-layer COSFIRE filters for both data sets.

Table 2 Performance comparison between our and existing methods on the IITD data set
Fig. 6

Recognition rates (RR) achieved by the proposed approach for a varying number of tiles, i.e., of two-layer COSFIRE filters, used in the configuration of the three-layer COSFIRE filters

4.4 Robustness to boundary attack

We evaluate the robustness of the proposed COSFIRE filters against a decision-based black-box attack, namely the boundary attack, which relies merely on the final decision of the model. Our motivation for evaluating the approach against this type of attack is twofold. First, our multi-layer COSFIRE filters are learning-free, and therefore their configuration does not involve any gradient-based optimization; thus, in the absence of a learning algorithm, only black-box attacks can be considered for our approach. Second, the decision-based boundary attack is the most practical in real-world applications, such as biometric authentication, where the internal decision-making process is not accessible; it only requires the output of the model under investigation. This is in contrast to other black-box attacks that use transfer-based or gradient-estimation-based techniques. Although transfer-based attacks do not require the model’s inner workings, they need access to the training data in order to learn a substitute model that generates adversarial examples. Attacks using gradient estimation, on the other hand, need information about the logits of the model. Such attacks can be defended against by robust training on a data set augmented with adversarial examples and by adding stochastic elements, such as dropout. The requirement of training data or logits, therefore, makes these two types of black-box attacks less relevant in practice.

The boundary attack [12] evaluates a sequence of perturbed images to generate an adversarial example. The attack starts by sampling a noisy image from a uniform distribution and iteratively modifies it in the direction of the decision boundary, performing a random walk along the boundary until it finds an adversarial image. An adversarial image is perceptually similar to the original image, but the underlying model fails to classify it correctly. The algorithm is characterized by two main parameters, namely \(\delta\), which is the intensity of the total perturbation, and \(\epsilon\), which controls the size of the steps taken along the boundary towards an adversarial image that is similar to the original image.
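For illustration, a heavily simplified version of this random walk is sketched below; the dynamic step-size adaptation and convergence criteria of [12] are omitted, so the parameters delta and epsilon should be read only as rough analogues of the ones described above.

```python
import numpy as np

def boundary_attack(original, is_adversarial, delta, epsilon, n_steps=5000, seed=0):
    """Simplified decision-based random walk; only the boolean decision is_adversarial() is queried."""
    rng = np.random.default_rng(seed)
    adv = rng.uniform(0.0, 1.0, original.shape)             # start from a misclassified noise image
    while not is_adversarial(adv):
        adv = rng.uniform(0.0, 1.0, original.shape)
    for _ in range(n_steps):
        dist = np.linalg.norm(original - adv)
        # Orthogonal step: a random perturbation of relative size delta, rescaled so that the
        # candidate stays at roughly the same distance from the original image.
        eta = rng.normal(size=original.shape)
        eta *= delta * dist / (np.linalg.norm(eta) + 1e-12)
        candidate = adv + eta
        candidate = original - (original - candidate) * dist / (np.linalg.norm(original - candidate) + 1e-12)
        # Small step towards the original image, of relative size epsilon.
        candidate = np.clip(candidate + epsilon * (original - candidate), 0.0, 1.0)
        if is_adversarial(candidate):                        # keep the move only if still adversarial
            adv = candidate
    return adv
```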

Convolutional neural networks have been found to be vulnerable to adversarial attacks in which the generated adversarial examples are imperceptible to the naked eye [33]. Various previous works have proposed explanations for the vulnerability of deep neural networks. The primary cause of vulnerability of convolutional neural networks has been attributed to their discontinuous, linear nature and the high dimensionality of the input space [32, 33, 91]. Other works ascribe the susceptibility to the difficulty of training robust classifiers due to large sample complexity [79], or regard it as a natural consequence of non-robust yet highly predictive features in data sets [40].

In order to demonstrate the robustness of our approach against the boundary attack, we use the RIDB data set for evaluation. We generate adversarial images from the query images by applying an untargeted boundary attack to the proposed three-layer COSFIRE filters. An untargeted attack does not aim to make the resulting adversarial images belong to a specific class; the aim is only to push the adversarial images off the distribution of the correct class. Specifically, we set the two main parameters as follows: \(\delta =5\times 10^{-4}\) and \(\epsilon =0.8\), which ensures that the algorithm finds the most subtle adversarial images. Figure 7 shows the query images and their most subtle (i.e., with the lowest mean squared error) adversarial examples generated for the COSFIRE filters as soon as the attack converges. We can observe from Fig. 7 that the boundary attack fails to generate imperceptible adversarial images for the COSFIRE filters. In fact, such adversarial images can even be detected by simple histogram analysis. For the sake of completeness, we implemented the following basic technique to demonstrate how such cases can be detected [1]. We take the histogram of the saturation channel of the given image and calculate the percentage of pixels whose saturation is greater than or equal to 0.05. This statistic yields a ROC-AUC of 1, meaning that by selecting an appropriate threshold (0.7 in this case) we can detect all adversarial images generated by this boundary attack.
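The sketch below shows one reading of this detection statistic (the fraction of pixels whose saturation is at least 0.05); the direction of the decision rule and the threshold of 0.7 depend on the data set and are therefore not hard-coded.

```python
import cv2
import numpy as np

def saturation_statistic(bgr_image):
    """Fraction of pixels whose saturation is at least 0.05 (one reading of the text above)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    saturation = hsv[:, :, 1].astype(np.float32) / 255.0   # saturation channel scaled to [0, 1]
    return float(np.mean(saturation >= 0.05))
```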

Fig. 7

Adversarial images generated from RIDB data set using boundary attack for 5 different users. The adversarial images are labeled with the mean squared error (MSE) between them and the corresponding authentic images

In order to make the three-layer COSFIRE filters robust to such adversarial images in real-world applications, we propose a pipeline in which this histogram analysis constitutes the first step, where adversarial images are detected and discarded, while the authentic images are then processed for recognition by the COSFIRE filters in the second step. In this experiment, we extended the test set of 100 authentic query images with another 100 adversarial images generated with the boundary attack. The results show 100% detection of the adversarial samples and 100% recognition of all authentic cases.

4.5 Robustness to partial matching

In order to compare the performance to the work in [30] in other challenging situations, we evaluate the robustness of the proposed approach when the retina and palmprint patterns of the query images are not entirely visible.

Similar to [30], we use random square-shaped regions of the query images with areas equal to 90% and 80% of the full image area. To alleviate the random effect of the selected regions, we repeated the experiments 10 times and computed the average results. In Table 3, we compare the results yielded by our approach against those reported in [30]. In these experiments, all three-layer COSFIRE filters are configured with 25 two-layer COSFIRE filters.
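The partial queries can be generated as in the following sketch; clamping the square side to the image size for non-square images is our own assumption.

```python
import numpy as np

def random_square_crop(image, area_fraction, rng=None):
    """Random square region whose area is approximately area_fraction of the full image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    side = int(round(np.sqrt(area_fraction * h * w)))
    side = min(side, h, w)                      # clamp for non-square images (assumption)
    y0 = int(rng.integers(0, h - side + 1))
    x0 = int(rng.integers(0, w - side + 1))
    return image[y0:y0 + side, x0:x0 + side]
```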

Table 3 Performance of the proposed method for partial matching compared to the results obtained in [30]

5 Discussion

Our research contributes to the advancement in biometric identification, leveraging the novel hierarchical COSFIRE filters. This approach manifests several strengths, which are pivotal to its success. Firstly, it demonstrates exceptional accuracy: in retina datasets, our method achieves perfect classification, a remarkable feat that sets a new benchmark. In palmprint datasets, it substantially improves upon existing state-of-the-art results, underscoring its effectiveness in handling complex biometric patterns. This high level of accuracy is crucial in fields where precision is paramount.

Another major strength is the versatility of our method. It is uniquely adaptable across two biometric systems without necessitating any fine-tuning of parameters. This adaptability enhances its usability in a wide range of applications, making it a versatile tool in biometric analysis. Additionally, the method’s robustness is noteworthy, particularly against incomplete patterns and a sophisticated decision-based black-box adversarial attack. This robustness ensures reliability and security in applications where these attributes are non-negotiable, such as in secure access control systems.

A significant advantage of our approach is its explainable nature. Unlike many complex models that operate as ‘black boxes’, the hierarchical COSFIRE filters are intuitive and transparent, making the decision-making process understandable. This explainability is not only essential for gaining user trust but also aligns with the ethical principles in AI development. The COSFIRE filters, being trainable yet intuitive, offer an explainable model that does not require extensive learning or optimization algorithms. This shares conceptual and methodological parallels with recent advancements in medical imaging, such as the refined pre-trained deep model for cervical spine fracture and dislocation classification [68], the explainable transfer learning-based model for pelvis fracture detection [44], and the innovative Skin-Net architecture for skin lesions classification using multilevel feature extraction and cross-channel correlation [3]. These works highlight the importance of advanced feature extraction and the explainability of deep learning models in critical areas like medical diagnostics, aligning with our study’s focus on explainable and robust biometric analysis.

Moreover, the computational efficiency of our approach, bolstered by scalable computations, presents a significant advantage, especially in scenarios involving large-scale biometric data processing. In particular, the hierarchical COSFIRE filters that we propose share many computations, and therefore the number of computations is not significantly affected by an increasing number of reference images. For instance, the bottom layer of these filters consists of a fixed bank of eight Gabor filters, while the second layer consists of bifurcation- and line-selective filters. Irrespective of the number of reference images and, therefore, the number of COSFIRE filters, the number of computations required by the first and second layers remains constant. It is only in the third layer that the number of computations increases, and only minimally, because each filter encodes a unique mutual spatial arrangement of the involved keypoints. The computations in the third layer, however, involve only the shifting of the feature maps generated in the second layer followed by geometric mean, which are all low-intensity operations. The current implementation of the proposed method does not take advantage of the parallelization capabilities of modern hardware. In particular, the fixed computations of the Gabor and two-layer COSFIRE filters in the first two layers could be computed in parallel and stored in shared memory to be used by the third-layer operations. Such a parallel implementation could further improve the computing time needed for identification, which is particularly important in scenarios involving massive image processing.

While the approach has numerous strengths, it also presents areas for further improvement. The most prominent limitation is its specific applicability to shape-based pattern recognition. While highly effective in this domain, its utility in recognizing more complex or less structured patterns is an area that remains to be explored. Additionally, the current implementation does not fully exploit the parallel processing capabilities of modern hardware, an aspect that could significantly enhance processing speed and efficiency. Furthermore, the potential for performance improvement through automated hyperparameter tuning is a promising avenue that has yet to be fully explored in our research.

In our future work, we aim to address these limitations. We plan to extend the method’s applicability to more complex biometric patterns and harness the full potential of parallel processing technologies. The exploration of automated hyperparameter tuning also presents an opportunity to further refine and enhance the effectiveness of our approach.

5.1 Ethical implications

In addressing the ethical implications of our work with biometric data, we emphasize the importance of privacy and security throughout the identification process, as highlighted in the literature [78, 92]. Our approach is grounded in the guidelines for trustworthy AI, which mandate that systems should be lawful, ethical, and robust [16]. We adhere to the four ethical principles of respect for human autonomy, prevention of harm, fairness, and explicability. The explainable character of our hierarchical COSFIRE filter approach aligns closely with these principles, ensuring transparency and accountability in our method.

The risk of incorrect identification, a critical ethical concern in biometric systems, is minimized in our approach due to its high accuracy and explainability. This mitigates potential harm and aligns with the need for fairness in AI systems. Moreover, the datasets used in our research, being publicly available and anonymized, reduce privacy concerns. However, the application of these algorithms in real-world scenarios necessitates continuous attention to privacy, security, and adherence to regulatory frameworks.

In summary, our research is conducted with a strong commitment to ethical principles, considering the sensitive nature of biometric data and the implications of its application in AI systems.

6 Conclusions

We demonstrated that the proposed three-layer COSFIRE filters are effective for person identification using retina and palmprint biometric analysis. For the retina benchmark data sets, RIDB and VARIA, we achieve perfect classification, and for the palmprint data set IITD we outperform the state-of-the-art. In further experiments, we also demonstrated that the proposed approach is robust to the boundary attack and, to some extent, to incomplete patterns in the query images.

An important contribution of this work is the versatility of the hierarchical COSFIRE filters. Without any fine-tuning, we demonstrated that the proposed approach can be applied to two biometric applications and achieved high-performance results. Such versatility makes them appealing to other applications that require the modeling of sets of moderately complex local features. Moreover, COSFIRE filters are learning-free, in that they only require one example to configure a filter. As a result, they are highly interpretable, as the cause of their output can be easily traced.