1 Introduction

The growing need for security in everyday life brought about by digitalization has prompted the development of reliable, intelligent biometric person identification systems. Biometrics is the measurement and statistical analysis of people's unique physical and behavioral characteristics. The technology is mostly used for identification and access control. Traditional identification methods rely on cards or passwords; these can be compromised when cards are misplaced or stolen, or when passwords are forgotten. Biometric identification technologies that can identify people without relying on what they have or what they remember are therefore in great demand.

The scientific community has tested several biometric modalities for human recognition, including fingerprint, voice, face, iris, signature, retina, palm print, gait, and hand geometry [1]. These characteristics are more trustworthy than traditional security mechanisms, although each has limitations. Voice biometrics, for example, suffers when different persons have similar voices and is affected by health, aging, and the transmission medium. Signature biometrics has its own constraints: with practice, a signature can be forged. Face biometrics is affected by non-uniform lighting, beards, scars, signs of aging, and wounds. Fingerprint biometrics is susceptible to finger contamination with oil, grease, and other substances. The retina also has limitations, since it can be damaged by overexposure to infrared illumination.

Iris biometric systems have garnered increased attention in recent decades because of their distinctiveness and suitability as a biometric authentication method. Using pattern recognition and digital image processing techniques, such a system distinguishes persons based on the texture of the iris. As Figure 1 shows, the iris is an annulus that sits between the pupil and the sclera. It is an externally visible internal organ that is protected by the cornea. According to the literature, a human iris remains stable throughout life, apart from a few minor changes during childhood [2].

Fig. 1

Eye anatomy

The proposed Iris Recognition system consists of the following steps: edge detection, iris segmentation, feature extraction, and classification. In this paper, edge detection is performed with the Canny edge detection technique, segmentation is achieved with the Hough transform for localizing the iris and pupil regions, and then a Convolutional Neural Network (CNN) and the Hamming Distance (HD) are used for feature extraction and classification to increase the accuracy of Iris Recognition.

The rest of the paper is arranged as follows: Sect. 2 reviews the related work used in developing the suggested models. Section 3 explains the phases of the proposed system in detail. First, Sect. 3.1 describes the three datasets. Sub-Sects. 3.2 and 3.3 then describe the hybrid edge detection and segmentation technique applied to images from each of the datasets: CASIA-Iris-Interval, IITD, and MMU. Section 3 ends with feature extraction and classification using CNN and HD. Section 4 presents the experimental results and discussion. Finally, Sect. 5 provides a summary of the entire study.

2 Related Work

This section presents a comprehensive survey of related research covering modern studies of Iris Recognition. Each related work below is a complete Iris Recognition system built around a different classifier. Our proposed system uses some of the datasets and classifiers mentioned in this related work.

Alaslani et al. [3] evaluated learned features extracted from a pre-trained convolutional neural network (CNN) (Alex-Net) followed by a multi-class support vector machine (SVM) to implement Iris Recognition. The SVM is generally thought of as a classification technique, but it can be used for both classification and regression problems, and it handles large numbers of continuous and categorical variables with ease. To separate the classes, the SVM constructs a hyperplane in multidimensional space, iteratively refining it to minimize error. The circular Hough transform (HT) was used to perform the iris segmentation, while the rubber sheet model was used for normalization. The resulting image was fed as input to the CNN. The proposed model was tested on public datasets such as CASIA-Iris-V1, the IITD iris database, CASIA-Iris-Interval, and CASIA-Iris-Thousand. The results showed that the accuracy of the presented system is higher than when extracting features from the normalized images.
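
As a minimal illustration of such a multi-class SVM stage, the following Python sketch trains a one-vs-rest linear SVM on stand-in feature vectors; the 4096-dimensional features, subject count, and train/test split are assumptions for illustration, not values from [3].

```python
import numpy as np
from sklearn.svm import SVC

# Dummy vectors standing in for CNN-extracted iris features:
# 12 subjects x 10 images each, 4096-d (dimensions are assumed).
rng = np.random.default_rng(1)
features = rng.normal(size=(120, 4096))
labels = np.repeat(np.arange(12), 10)
perm = rng.permutation(120)
features, labels = features[perm], labels[perm]

# One-vs-rest linear SVM: a separating hyperplane per class.
clf = SVC(kernel="linear", decision_function_shape="ovr")
clf.fit(features[:96], labels[:96])
print("held-out accuracy:", clf.score(features[96:], labels[96:]))
```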

Yiming et al. [4] presented an HT-based Iris Recognition method. The iris image was first pre-processed, and the iris boundary was then found using the HT in conjunction with Canny edge detection. The annular iris region was normalized into a rectangular area using the rubber sheet method, and a 1D Log-Gabor filter pair was applied. The textural properties of the normalized iris image were extracted, and the final pattern matching was done with a recognition method based on the Hamming Distance (HD) classifier. The CASIA V1.0 iris image database was used as the experimental object, and the simulation was implemented in MATLAB to evaluate the effectiveness of the Iris Recognition algorithm. The experiments demonstrated that Iris Recognition is an applicable and effective technology.

The Radon Transform Thresholding (RTT) and Gradient-based Isolation (GI) approaches were proposed in [5]. The significant characteristics of the pre-processed image were extracted using RTT. GI is a pre-processing approach that exploits the Gradient operator's edge-detection capability and isolates the patterns to obtain the most important iris textures. A feature selection technique based on Binary Particle Swarm Optimization (BPSO) was used to search the feature vector space. The experiments were done on three databases: Phoenix, IITD, and CASIA. The feature extractor's performance was evaluated for a variety of block sizes; as the block size decreased, the total number of blocks in the image grew, increasing the computation time per image. The average testing time for the Phoenix database was 700 ms per image on a PC with an Intel Core i7, 2.4 GHz CPU, and 8 GB RAM, which is a constraint for real-time applications.

Liu et al. [6] used Gaussian-based fuzzy operations to pre-process images, fuzzifying the area outside the border with triangular fuzzy average and triangular fuzzy median smoothing filters. They used the enhanced images to train deep learning (DL) systems, which sped up convergence and improved identification accuracy. As the saliency maps showed, the fuzzified image filters made the images more informative for DL. Many other deep learning applications in image processing, analysis, and prediction may benefit from the suggested fuzzy image operations.

Danlami et al. [7] experimentally evaluated the Legendre wavelet filter against the Gabor wavelet filter. They used the CASIA V4 Distance and Interval subsets, UBIRIS V2, and MMU V2 databases, considering the complete databases except for partially captured and angled images. The researchers used the HT, rubber sheet, and HD methods for segmentation, normalization, and matching, respectively.

Data augmentation is a well-known approach in image processing, particularly in computer vision, for increasing the diversity and quantity of training data by applying random (but realistic) modifications, such as image resizing, rotation, and flipping. This strategy diversifies the existing data, resulting in a better training set and, consequently, a better-trained model; it helps compensate for the scarcity of large per-individual datasets, which otherwise makes developing improved deep learning models for iris identification challenging. Deep convolutional networks and a mixed convolutional network were proposed in [8]. The ADAM optimizer, which computes gradients using adaptive momentum, learned better than Stochastic Gradient Descent with Momentum (SGDM). The hybrid CNN with SVM, in turn, outperformed the plain CNN architecture in accuracy, owing to the multidimensional flexibility of the SVM compared with a fully connected layer. The approach also avoided the handcrafted segmentation used in numerous deep learning pipelines while still delivering comparable results. The study could be extended by investigating other learning optimizers, adding new layers, and tuning further hyper-parameters, which may improve the classifier's performance. There remain gaps in this strategy, however: the reported performance measurements are confined to the IITD database, so the network may fail on other iris datasets, and the hybrid structure requires too much computation. Even though convolutional features are more discriminative, deep structures and huge data samples demand more computation.
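
The following sketch shows the kind of random-but-realistic augmentation described above, using Keras preprocessing layers; the specific transform ranges are illustrative, not those used in [8].

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Random but realistic transforms; ranges are illustrative, not from [8].
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.03),            # roughly +/- 10 degrees
    layers.RandomTranslation(0.05, 0.05),   # small height/width shifts
    layers.RandomZoom(0.1),
])

batch = np.random.rand(8, 240, 320, 1).astype("float32")  # dummy images
augmented = augment(batch, training=True)   # new variants on every call
```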

A sparse-representation iris identification model based on compressive sensing and k-nearest subspaces was presented by Bhateja et al. [9]. The k-nearest subspace technique was used to shortlist the classes and minimize time. The shortlisted candidates were divided into sectors, and a sparse representation was computed for each sector. Three classifiers were used: a k-nearest distance classifier, a sector-based classifier, and a Cumulative Sparse Concentration Index (CSCI)-based classifier. A classifier combination technique based on additive functions was applied, with each classifier's weight learned using a Genetic Algorithm. According to results obtained on several datasets, the technique is very resilient, with a practically nil False Acceptance Rate (FAR).

Nishanth et al. [10] presented a unique Iris Recognition system that improved the recognition rate and reduced the number of features evaluated by performing feature extraction with the Gabor filter and Discrete Cosine Transform, followed by feature selection with Dynamic Binary Particle Swarm Optimization (DBPSO). Binary Particle Swarm Optimization (BPSO) is a discrete form of PSO that changes particle velocities depending on the likelihood that a particle coordinate will flip to 0 or 1. The number of selected features dropped greatly as the number of iterations of the proposed DBPSO algorithm increased, with only a small trade-off in average recognition rate. The relevance of this strategy is that it is dataset-independent and ensures that the number of selected features is reduced regardless of the dataset. The suggested technique achieved an average recognition rate of 96.46% on the IITD dataset and 78.07% on the MMU dataset.
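
A minimal sketch of the BPSO update just described: the real-valued velocity is pushed toward the personal and global best positions, and each bit is then resampled with probability given by a sigmoid of the velocity. The inertia and acceleration constants are typical textbook values, not those of [10].

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def bpso_step(position, velocity, personal_best, global_best, rng,
              w=0.7, c1=1.5, c2=1.5):
    """One BPSO update: the velocity moves toward the best positions,
    then each bit becomes 1 with probability sigmoid(velocity)."""
    r1, r2 = rng.random(position.shape), rng.random(position.shape)
    velocity = (w * velocity
                + c1 * r1 * (personal_best - position)
                + c2 * r2 * (global_best - position))
    position = (rng.random(position.shape) < sigmoid(velocity)).astype(int)
    return position, velocity

rng = np.random.default_rng(0)
n_bits = 64                                # feature-selection mask length
pos = rng.integers(0, 2, n_bits)
vel = np.zeros(n_bits)
best = rng.integers(0, 2, n_bits)          # stand-in personal/global best
pos, vel = bpso_step(pos, vel, best, best, rng)
```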

Tallapragada et al. [11] presented a novel technique for segmenting the partially visible iris region. The suggested segmentation achieved 90% accuracy on the MMU iris dataset and took 1.8 s to segment each iris. Different features were extracted from the segmented iris region and combined into a feature vector, which was classified with a decision tree classifier. A decision tree is a rooted tree whose topmost node is the root and whose terminal nodes are leaves; the decision tree algorithm automatically builds a tree for the provided dataset such that the error is as small as possible. The classifier attempts to find a decision tree T that optimizes a given cost function over the labeled sample collection; given a tree, it searches for the best class for each sample in the dataset.
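
For illustration, the sketch below fits a decision tree on stand-in iris feature vectors; the feature dimensions, subject counts, and tree depth are assumptions, not values from [11].

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Stand-in feature vectors for segmented iris regions (values assumed).
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 32))             # 10 subjects x 10 samples
y = np.repeat(np.arange(10), 10)

# The tree greedily splits on features to minimize impurity (the cost
# function), growing from the root down to leaf class labels.
tree = DecisionTreeClassifier(criterion="gini", max_depth=8).fit(X, y)
print(tree.predict(X[:3]), "depth:", tree.get_depth())
```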

Innovative strategies were suggested in [12]: contrast enhancement using top-hat and bottom-hat filters to sharpen the gradient between brighter and darker pixels, and a DWT + DCT feature extractor to identify the important features. A feature selection approach based on BPSO was used to explore the feature space for the optimum feature subset. A full Iris Recognition system was presented for improved recognition performance. Experimental findings on two benchmark iris datasets, IITD and MMU, showed that the suggested approaches performed well.

A multi-unit feature-level fusion strategy for iris-based biometric systems was described in [13], with the goal of improving identification accuracy even for poorly segmented iris images. The available research makes it clear that greater attention should be paid to the pre-processing and segmentation stages for an iris-based biometric system to become trustworthy and accurate. First, Daugman's Integro-Differential Operator was applied to unconstrained eye images, and the iris region of interest was recovered without removing noise components such as eyelid and eyelash occlusions and specular reflections. Probabilistic Principal Component Analysis (PPCA), a feature selection technique that handles missing values in badly segmented iris images, was then applied to provide a high identification rate, and the multi-unit feature-level fusion methodology was introduced to increase feature selection accuracy. When tested on the MMU dataset, the suggested technique achieved an 83.3% identification rate even for incorrectly segmented iris images.

3 The Proposed Model for Iris Recognition

In this work, the proposed Iris Recognition system includes the following phases: (1) reading the dataset, (2) edge detection, (3) localization and segmentation, and finally (4) feature extraction and classification. The novelty of the proposed system is the hybridization of the two techniques, edge detection and segmentation: the image is first edge-detected and then segmented to find all the edges, boundaries, and circles in the eye image. The output of these phases then goes into the feature extraction phase. This combination of edge detection and segmentation assists the feature extraction and classification phase, yielding higher accuracy than using only one of the techniques before classification. Each phase is discussed in the following sub-sections.

Fig. 2

Sample images for a CASIA, b IIT Delhi, and c MMU, respectively

3.1 Dataset

Three datasets are used to build this model: CASIA-Iris-Interval V4, IIT Delhi (IITD), and MMU; all images are greyscale. The image resolutions for the CASIA, IITD, and MMU datasets are 640 × 480, 320 × 240, and 320 × 240 pixels, respectively. Figure 2 shows sample images from these datasets.

3.2 Edge Detection

Edge detection is an image processing method used to determine the edges of objects inside images. It works by considering the intensity variation that exists across one or more regions of an image. Edge detection is a very common problem in applications such as computer vision and image processing; it is easy to see why we rely on it for activities like depth perception and recognizing objects in our field of view. The Canny edge detection algorithm [14] is one of the most widely used edge detection algorithms. It is used in the proposed model to detect the image's edges, which aids in the identification of objects, mostly circles. Canny's algorithm uses the first derivative of intensity for edge detection: a value of 0 is assigned in regions where the intensity does not change, and a value of 1 in regions of rapid intensity change. The algorithm proceeds in stages. First, in the smoothing stage, the image is blurred with a Gaussian filter to remove noise. Then the image gradient is computed, and locations of largest gradient magnitude are marked as candidate edges. Next comes the non-maximum suppression stage, in which only local maxima of the gradient are kept as edges, followed by a double-threshold approach with hysteresis to select the probable edges. Finally, a binary image is obtained, in which each pixel is either an edge or a non-edge. This binary edge map can be thought of as a set of edge curves that can be represented as polygons in the image domain. Edge detection applied to the datasets used here is shown in Fig. 3.
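
The following sketch runs this pipeline with OpenCV, whose cv2.Canny implements the gradient, non-maximum suppression, and hysteresis stages; the file name, blur kernel, and thresholds are illustrative assumptions, not the paper's exact settings.

```python
import cv2

# Load a greyscale iris image (file name is illustrative).
eye = cv2.imread("eye.bmp", cv2.IMREAD_GRAYSCALE)

# Smoothing stage: Gaussian filter to remove noise before differentiation.
blurred = cv2.GaussianBlur(eye, (5, 5), 1.4)

# cv2.Canny performs gradient computation, non-maximum suppression, and
# double-threshold hysteresis; 50/150 are the low/high thresholds.
edges = cv2.Canny(blurred, 50, 150)

cv2.imwrite("edges.png", edges)  # binary map: edge vs. non-edge pixels
```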

Fig. 3

a Canny edge detection on CASIA. b Canny edge detection on IITD. c Canny edge detection on MMU

3.3 Localization and Segmentation

In general, a human eye has both dark and bright intensity regions. Because of the presence of such dark regions, pupillary border extraction is considered a difficult process, as the grey-level intensities in these regions are often close to one another. Fortunately, these regions may be distinguished by their geometrical characteristics as well as the substantial compactness of the pupil region. Both segmentation and localization are applied to a coarse iris region after the Canny edge detection phase.

Image segmentation is an image processing method that divides an image into portions sharing similarities, based on their attributes and qualities; in other words, it is commonly used to detect objects and boundaries in images. In this study, the Hough circle transform segmentation technique was used [15]. The circular Hough transform is applied twice: first to determine the iris/sclera boundary from the whole eye, and again to determine the pupil/iris boundary from the iris area. For each edge point, the circular Hough transform constructs circles in Hough space with varying radii; the highest point in Hough space corresponds to the radius and center coordinates of the circle best delineated by the edge points. Compared to lines, circles are more simply represented, since the circle parameters map directly to the parameter space. As noted in the preceding section, Canny's method was used to identify the image's edges and help identify objects, usually circles, using the Hough transform. Once the edges were identified, shapes with a partial or whole circumference were recognized to locate the iris. Even when only a partial circumference was visible, owing to eyelashes, eyelids, or spectacles, it was the most prominent shape within the image, so identifying the edges made recognizing circumferences for iris detection straightforward. Table 1 shows the localization and segmentation phases applied to the three datasets.
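
A sketch of the two-pass circular Hough transform with OpenCV follows. Note that cv2.HoughCircles applies Canny internally (param1 is the high threshold), so it takes the greyscale image directly; the radius ranges and other parameters are illustrative and would need tuning per dataset.

```python
import cv2
import numpy as np

eye = cv2.imread("eye.bmp", cv2.IMREAD_GRAYSCALE)
smooth = cv2.medianBlur(eye, 5)

# First pass: outer iris/sclera boundary (larger radii).
iris = cv2.HoughCircles(smooth, cv2.HOUGH_GRADIENT, dp=1, minDist=200,
                        param1=150, param2=30, minRadius=80, maxRadius=150)
# Second pass: inner pupil/iris boundary (smaller radii).
pupil = cv2.HoughCircles(smooth, cv2.HOUGH_GRADIENT, dp=1, minDist=200,
                         param1=150, param2=30, minRadius=20, maxRadius=60)

if iris is not None and pupil is not None:
    ix, iy, ir = (int(v) for v in np.round(iris[0, 0]))
    px, py, pr = (int(v) for v in np.round(pupil[0, 0]))
    # Keep only the annulus between the two circles as the iris region.
    mask = np.zeros_like(eye)
    cv2.circle(mask, (ix, iy), ir, 255, -1)   # fill the iris disc
    cv2.circle(mask, (px, py), pr, 0, -1)     # carve out the pupil
    segmented = cv2.bitwise_and(eye, mask)
```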

Table 1 Localization and segmentation phases applied to different datasets

3.4 Feature Extraction and Classification

The extraction of the most discriminating features from an iris pattern is the most critical stage in the Iris Recognition process: the feature extraction phase has the greatest influence on the recognition rate when matching two iris templates. A CNN is used in this study to extract features from the segmented iris image. To compare the similarity of two iris templates, a matching metric is needed; this metric determines whether the two templates belong to the same person or to different people. Herein, either CNN or HD is applied in the feature extraction and classification phases; the input to the CNN or HD is the iris image after it has been edge-detected and segmented.

3.4.1 Convolutional Neural Network (CNN)

Two sub-datasets are used to create the CNN model: one for training and the other for testing. The training dataset is organized into sub-folders, each containing iris images of one person, in which the CNN model searches for unique features. This dataset is usually much larger than the testing set; moreover, the larger the training dataset, the higher the quality of the trained model's output [13].

To discover different features, the machine uses the training dataset, and layers aid in the extraction of features. Here, a layer denotes a specific operation that transforms the image into a different shape, size, color, or appearance using pixels much smaller than the real image. Many layers are applied to images of the same category, and the results are stored in the CNN network; the stored features are based on the likelihood of several features from the training dataset being repeated. A generic CNN layout, built on convolutional layers and refined through experimental analysis, is presented in this work. Achieving an efficient and effective iris representation requires a good CNN architecture. To this end, greyscale iris images with a resolution of 320 × 240 pixels are fed into the network, which is deep, with a large number of convolutional layers.

In the CNN, each convolutional layer (conv1 to conv5) is followed by batch normalization. The network is built by stacking consecutive convolutional layers, with pooling generally done after every two of them: the first two pooling operations are performed after pairs of convolutional layers, whereas an additional pooling (pool3) is performed immediately after conv5. Very small convolution kernels of size 3 × 8 are used (stride of 2, 'same' padding). Throughout the network, max-pooling is done with stride 2 across a 2 × 2-pixel window. The top three layers are fully connected: each output neuron is connected to all inputs. A softmax classifier receives the output of the last fully connected layer. The learning rate is set to 0.01 for all datasets and then dropped by a factor of 10 when the validation error rate stops improving. Training uses 15 epochs with shuffling; running more than 15 epochs did not yield satisfactory accuracy. The Rectified Linear Unit (ReLU) is applied as the activation function in all hidden layers. During training, Stochastic Gradient Descent (SGD) is used to optimize the network, and the back-propagation algorithm is used to calculate the gradients.
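
The sketch below is a Keras model consistent with this description (conv1 to conv5 each followed by batch normalization and ReLU, pool1 to pool3, three fully connected layers ending in softmax, SGD at learning rate 0.01); the filter counts and dense-layer sizes are not given in the text and are assumptions. Back-propagation of gradients is handled by Keras during fit.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_bn(x, filters):
    # Conv -> batch normalization -> ReLU; the 3 x 8 kernel, stride 2,
    # and 'same' padding follow the description in the text.
    x = layers.Conv2D(filters, (3, 8), strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

def build_iris_cnn(num_classes: int) -> tf.keras.Model:
    inp = layers.Input(shape=(240, 320, 1))       # 320 x 240 greyscale iris
    x = conv_bn(inp, 32)                          # conv1
    x = conv_bn(x, 64)                            # conv2
    x = layers.MaxPooling2D(2, strides=2)(x)      # pool1
    x = conv_bn(x, 128)                           # conv3
    x = conv_bn(x, 128)                           # conv4
    x = layers.MaxPooling2D(2, strides=2)(x)      # pool2
    x = conv_bn(x, 256)                           # conv5
    x = layers.MaxPooling2D(2, strides=2)(x)      # pool3
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)   # fc1 (size assumed)
    x = layers.Dense(256, activation="relu")(x)   # fc2 (size assumed)
    out = layers.Dense(num_classes, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training as described: model.fit(x, y, epochs=15, shuffle=True, ...)
```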

3.4.2 Hamming Distance (HD)

The Hamming distance is used as a second method for feature extraction and classification, determining whether two templates are created from different irises or from the same iris.

The HD value results from comparing the bits of two templates X and Y, and equals the ratio of the number of differing bits to the total number of bits in the template. In the formula below, ⊕ is the XOR operation, N is the length of the feature code, and Xj and Yj are the j-th bits of the two template feature codes.

$$HD=\frac{1}{N}\sum_{j=1}^{N}\left({X}_{j}\oplus {Y}_{j}\right)$$

After the image is edge-detected and segmented, a feature template and an associated noise mask are created.

This template is compared to all of the enrolled final templates, and a Hamming distance (HD) is calculated. Because iris templates created from different irises are wholly distinct, their bit patterns are essentially random, and the HD between them should be greater than or equal to 0.35, whereas the HD between templates of the same iris should be close to 0 [4]. Our model predicts that the HD for iris images from the same eye lies between 0.14 and 0.35, and for iris images from different eyes between 0.36 and 0.56. The HD range we employed was determined by the experiments described in the next section.
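
A minimal sketch of HD matching over binary templates, applying the formula and decision rule above; the template length and flipped-bit count are illustrative.

```python
import numpy as np

def hamming_distance(x: np.ndarray, y: np.ndarray) -> float:
    """HD = (1/N) * sum over j of (x_j XOR y_j) for binary codes."""
    return np.count_nonzero(x != y) / x.size

# Illustrative binary templates; real codes come from the segmented iris.
rng = np.random.default_rng(0)
template_a = rng.integers(0, 2, size=2048)
template_b = template_a.copy()
template_b[:400] ^= 1                     # flip ~20% of the bits

hd = hamming_distance(template_a, template_b)
# Decision rule used here: HD <= 0.35 -> same eye, otherwise different.
print(f"HD = {hd:.3f} ->", "same iris" if hd <= 0.35 else "different iris")
```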

4 Experimental Results

The performance of the proposed systems is compared to previous literature based on recognition accuracy, as shown in Tables 2, 3, and 4. The iris image is used as input for feature extraction, and classification is performed using CNN or HD. Testing was performed on three publicly available datasets: IITD, CASIA-Iris-Interval V4, and MMU. The entire proposed Iris Recognition scheme was implemented in MATLAB. For the experiments, 600 JPEG iris images from the CASIA dataset at 640 × 480, 639 BMP iris images from the IITD dataset at 320 × 240, and 45 BMP images from the MMU V1 dataset at 320 × 240 are used. All accuracies are computed by dividing the number of correctly matched samples (true positives and true negatives) by the total number of image samples in each dataset [13].
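
As a sketch, this accuracy computation reduces to the fraction of correctly matched samples over all samples; the predicted and true subject identities below are illustrative.

```python
import numpy as np

def accuracy(predicted: np.ndarray, actual: np.ndarray) -> float:
    # Correct matches divided by the total number of samples.
    return float(np.mean(predicted == actual))

predicted = np.array([3, 1, 4, 1, 5, 9, 2, 6])
actual    = np.array([3, 1, 4, 0, 5, 9, 2, 6])
print(f"{accuracy(predicted, actual):.2%}")   # 87.50%
```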

Table 2 Accuracy of the CASIA-Iris-Interval dataset
Table 3 Accuracy of the IITD dataset
Table 4 Accuracy of the MMU dataset

The proposed CNN classifier, tested on the CASIA-Iris-Interval dataset, achieved 91.56%, compared with [3], which achieved 89% accuracy using Alex-Net and SVM, as shown in Table 2. Compared to other feature extraction models, the Alex-Net model is quite shallow, making it difficult to learn features from image sets and requiring more time to attain higher accuracy. The HD classifier applied to the same dataset achieved 94.88% accuracy, compared to [4], which used the same classifier; our maximum HD threshold was 0.35 versus their 0.45, and a smaller HD is known to have a higher probability of indicating a correct match. Two kinds of filters were examined for HD-based matching in [7]: the Gabor filter is a widely used approach for feature extraction, but the Legendre wavelet filter, a linear texture analysis filter with nearly the same properties, is designed around the order of its polynomials, giving it an advantage over the Gabor filter. Comparing the two on the efficiency and accuracy of their operations showed a considerable improvement for the Legendre wavelet filter; even so, they achieved lower accuracy than our enhanced system on the MMU and CASIA datasets. The Radon transform and gradient-based isolation techniques used in [5] achieved 84.17% and 95.93% on the CASIA and IITD datasets, respectively, compared to our CNN results of 91.56% on CASIA and 96.56% on IITD.

On the IITD dataset, the proposed classifiers achieved 96.56% and 94.3%, higher than the following techniques, as shown in Table 3. The hybrid technique (CNN + KNN) [8] achieved 86%; CNN's performance with KNN was unsatisfactory because KNN uses all the data in the sample space, making classification harder with a bigger dataset. The Dynamic Binary Particle Swarm Optimization results in [10] were somewhat close to our CNN results on IITD, but on the MMU dataset they achieved 78.07%, lower than our classifiers on the same dataset, as shown in Table 4. On the MMU dataset (Table 4), the proposed CNN classifier achieved 98.01%, compared to [11, 12], and [7]; our neural network architecture contains five convolutional layers to improve upon the previously mentioned techniques. For the same dataset, HD obtained 94.96% accuracy, compared to [13], whose accuracy was low because of inaccurate segmentation of the iris images, a very important phase in Iris Recognition.

Figures 4, 5, and 6 show the training and validation processes using CNN on the CASIA-Interval, IITD, and MMU datasets. As described in Sect. 3.4.1, two datasets are needed to build a CNN system: one for training and one for testing. The network searches for distinctive characteristics in each category using the training images; the layers aid in extracting characteristics, and the stored features are based on the probability that several characteristics from the training dataset are repeated. When an image is evaluated, the same layers are applied to it and the result is analyzed to decide which category best matches its features.

Fig. 4

Accuracy of 91.56% of the CASIA-Iris-Interval dataset

Fig. 5

Accuracy of 96.56% of the IITD dataset

Fig. 6

Accuracy of 98.01% of the MMU dataset

All our accuracies are high owing to the integration of the previous phases, namely edge detection using the Canny algorithm and segmentation using the HT, in addition to the feature extraction phase using CNN. The classifier achieves high accuracy when the main phases of Iris Recognition are accurate, especially segmentation and feature extraction. Our hybrid technique of Canny edge detection and HT followed by feature extraction helped the classifiers achieve higher accuracy.

5 Conclusion

Iris Recognition is a difficult problem in the imaging environment. We have therefore proposed a hybrid technique for improving iris identification performance in a noisy imaging environment and for increasing Iris Recognition rates on the CASIA-Iris-Interval, IITD, and MMU datasets. Accuracies are high owing to the interconnection of the successive steps: edge detection using Canny's algorithm, segmentation using the Hough transform, and feature extraction with CNN or HD. When the primary steps of iris identification, particularly segmentation and feature extraction, are precise, the classifier achieves high accuracy. Our hybrid technique of edge detection and segmentation, followed by feature extraction, helped the classifiers achieve greater accuracy. HD attained high accuracy compared to previous work because the iris image was processed accurately in the edge detection and segmentation phases. The highest HD accuracy is 94.88%, obtained on CASIA-Iris-Interval, and the CNN achieved 98.01% on the MMU dataset. In terms of recognition accuracy, the proposed system exceeds the compared ones.