
1 Introduction

The sclera is the white region of the eye surrounding the iris. It contains randomly distributed blood vessels, which have recently been investigated as a biometric trait for personal identification in eye recognition systems [5, 7]. The first step of an eye recognition system based on the extraction of the sclera vessels is sclera segmentation. However, uncontrolled poses, multiple iris gaze directions, and illumination variations make sclera segmentation challenging. For these reasons, some sclera vessel identification methods adopt manual sclera segmentation [9, 10].

Fig. 1. Examples of images with different issues

Several methods for sclera segmentation have been proposed in the literature. A semi-automatic technique is proposed in [4], where the output of an automatic clustering method is refined by manual intervention. Some fully automatic methods, based on color image segmentation in the HSV color space, are robust to illumination variations and to occluded sclera regions [13, 14]. An adaptive approach to sclera segmentation based on active contours is presented in [3]. The same authors propose a sclera segmentation method based on a fusion technique that processes each pixel of the eye image in different color spaces [2]. A method based on a multiple classifier architecture is presented in [12].

In this paper, a fully automatic, unsupervised sclera segmentation method is proposed. The approach is designed to cope with different iris gaze directions and illumination variations in eye images. It yields a correct segmentation regardless of whether the sclera appears divided into several parts, only one part of the sclera is visible, or the sclera completely surrounds the iris (see Fig. 1).

The color image is first converted into a gray level image in which the sclera is highlighted. Next, the image is binarized by means of clustering and adaptive thresholding. For each foreground component of the binary image, a score is computed taking into account the shape and geometric features of the component. Finally, the sclera is detected by analyzing the scores and the relative positions of the candidate components.

Our algorithm was submitted to the Sclera Segmentation and Eye Recognition Benchmarking Competition (SSRBC 2017) [1, 6]. The dataset of SSRBC 2017 included eye images acquired in the visible spectrum and having a high degree of variation in illumination and eye pose.

The rest of the paper is organized as follows: in Sect. 2, the proposed method is presented; Sect. 3 describes the obtained results; and finally in Sect. 4 some conclusions are drawn.

2 The Method

The sclera is a white area, so its pixels have high intensity in each channel of the RGB color space. Skin pixels can also exhibit this property, but their intensity in the red channel (R) is generally higher than in the blue (B) and green (G) channels. To discriminate the color information associated with the sclera from that characterizing skin regions, the color input image is converted into a gray level image by merging the three channels with a low contribution from the red channel.

The input image I is decomposed into the R, G and B channels, and the intensities of each channel C are mapped into a normalized range by means of a non-linear transformation based on a quasi-sigmoid function defined in [8]:

$$\begin{aligned} C'=128\left( 1+\frac{1-b^{\frac{C}{1.5\mu }}}{ab^{\frac{C}{1.5\mu }}+1}\right) \end{aligned}$$
(1)

with a and b being pre-defined constants, \(a=2+\sqrt{3}\) and \(b=7-4\sqrt{3}\), respectively. The choice of the \(\mu \) value plays a fundamental role: it prevents intensity values greater than \(\mu \) from being predominantly mapped onto the high end of the normalized range. We assume:

$$\begin{aligned} \mu = min(nR,nG,nB) \end{aligned}$$
(2)

where

$$\begin{aligned} \begin{array}{ll} nR = mean(R) + \sigma (R)/2 \\ nG = max(G) + \sigma (G)/2 \\ nB = max(B) + \sigma (B)/2 \\ \end{array} \end{aligned}$$

where \(\sigma (R)\), \(\sigma (G)\) and \(\sigma (B)\) represent the standard deviations of the intensity values in the R, G and B channels, respectively. Since the mean value is used for R, whereas the maximum value is used for G and B, the red channel contributes with a low weight to the calculation of \(\mu \). Next, a gray level image \(I_Q\) is obtained by combining the normalized channels \(R'\), \(G'\) and \(B'\) according to the following formula:

$$\begin{aligned} I_Q = B' + G' - R' \end{aligned}$$
(3)

Finally, a full-scale histogram stretch of \(I_Q\) to the range [0, 255] is performed. Note that the red channel \(R'\) is subtracted in the combination of the normalized channels of I so as to highlight the sclera. Figure 2 shows, for a running example, the difference between the gray level image obtained with a classic weighted average conversion to grayscale and the one computed according to (3).
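To make the conversion concrete, the following Python/NumPy sketch reproduces Eqs. (1)-(3) and the final histogram stretch (the authors' implementation is in Matlab; the function names and the small guard on \(\mu \) are ours):

```python
import numpy as np

def quasi_sigmoid(C, mu, a=2 + np.sqrt(3), b=7 - 4 * np.sqrt(3)):
    """Non-linear mapping of a channel C driven by the statistic mu (Eq. 1)."""
    t = b ** (C / (1.5 * mu))
    return 128.0 * (1.0 + (1.0 - t) / (a * t + 1.0))

def sclera_gray(rgb):
    """Convert an RGB eye image into the gray level image I_Q of Eq. 3."""
    R, G, B = (rgb[..., k].astype(float) for k in range(3))
    # mu is the minimum of the channel statistics of Eq. 2 (guarded against 0)
    mu = max(min(R.mean() + R.std() / 2,
                 G.max() + G.std() / 2,
                 B.max() + B.std() / 2), 1e-6)
    Rn, Gn, Bn = (quasi_sigmoid(C, mu) for C in (R, G, B))
    IQ = Bn + Gn - Rn                        # red channel subtracted (Eq. 3)
    # full-scale histogram stretch to [0, 255]
    IQ = 255.0 * (IQ - IQ.min()) / (IQ.max() - IQ.min() + 1e-12)
    return IQ.astype(np.uint8)
```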

Fig. 2. (a) Running example, (b) gray level image according to a classic weighted average method, and (c) gray level image according to (3). (Color figure online)

The second step of the method applies an incremental clustering algorithm [11] to divide \(I_Q\) into a set of regions characterized by gray levels with similar information content. For each pair of gray levels \(q_i\) and \(q_j\) in \(I_Q\), a distance measure \(d(q_i,q_j)\) is computed by summing the following terms:

  • the actual difference between \(q_i\) and \(q_j\) defined as \(log(|q_i-q_j|+1)\);

  • the weighted difference between \(q_i\) and \(q_j\), given by

    $$\begin{aligned} 1/|I_Q|\cdot (|w(q_i)q_i - w(q_j)q_j|) \end{aligned}$$
    (4)

    where \(w(q_i)\) (\(w(q_j)\)) is the number of occurrences of \(q_i\) (\(q_j\)) in \(I_Q\); and

  • the joint sparsity index defined as:

    $$\begin{aligned} \sigma (q_i, q_j)=\frac{min\{\sigma (q_i), \sigma (q_j)\}}{max\{\sigma (q_i), \sigma (q_j)\}} \end{aligned}$$
    (5)

    where \(\sigma (q_i)\) (\(\sigma (q_j)\)) is the standard deviation of the coordinates of the pixels with a gray level \(q_i\) (\(q_j\)).

The assignment criterion is based on thresholding the distance measure, which has to be smaller than a fixed value t; in our experiments t has been set to 1.5. The incremental clustering algorithm produces h clusters (with \(0<h<255\)), such that the index assigned to a cluster increases with the gray levels it contains. A new image \(I'_Q\) is obtained by assigning to each pixel p, with gray level \(g_i\) in \(I_Q\), the label of the cluster containing \(g_i\). Figure 3a shows \(I'_Q\) in false color.
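A minimal sketch of the distance measure and of a simple incremental assignment over the sorted gray levels is given below. The exact clustering procedure of [11] is not reproduced: here each gray level is compared only with the previous one, and a new cluster is opened whenever the distance exceeds t; interpreting \(\sigma (q)\) as the spread of the concatenated pixel coordinates is also our assumption.

```python
import numpy as np

def level_stats(IQ):
    """Occurrence w(q) and coordinate spread sigma(q) for every gray level q."""
    stats = {}
    for q in np.unique(IQ):
        ys, xs = np.nonzero(IQ == q)
        spread = np.concatenate([ys, xs]).astype(float).std()
        stats[int(q)] = (len(ys), spread)
    return stats

def distance(qi, qj, stats, n_pixels):
    """Distance d(q_i, q_j): sum of the three terms defined above."""
    wi, si = stats[qi]
    wj, sj = stats[qj]
    d1 = np.log(abs(qi - qj) + 1)                               # gray level difference
    d2 = abs(wi * qi - wj * qj) / n_pixels                      # weighted difference (Eq. 4)
    d3 = min(si, sj) / max(si, sj) if max(si, sj) > 0 else 0.0  # joint sparsity (Eq. 5)
    return d1 + d2 + d3

def cluster_gray_levels(IQ, t=1.5):
    """Cluster-label image I'_Q: labels grow with the gray levels they include."""
    stats = level_stats(IQ)
    levels = [int(q) for q in np.unique(IQ)]
    label = {levels[0]: 0}
    for prev, q in zip(levels[:-1], levels[1:]):
        new_cluster = distance(prev, q, stats, IQ.size) > t
        label[q] = label[prev] + (1 if new_cluster else 0)
    lut = np.zeros(256, dtype=np.int32)       # lookup table over gray levels
    for q, lbl in label.items():
        lut[q] = lbl
    return lut[IQ]
```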

Thresholding \(I'_Q\) allows non-sclera pixels to be assigned to the background. In particular, each pixel p of \(I'_Q\) is assigned to the foreground if its value is larger than \(t'\), where \(t'=avg(I'_Q)+0.75\cdot \sigma (I'_Q)\). Let \(I''_Q\) be the resulting binarized image (see Fig. 3b).
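A one-line sketch of this binarization, reusing the cluster-label image produced by the previous sketch:

```python
import numpy as np

def binarize_clusters(cluster_labels):
    """Foreground mask I''_Q: pixels above t' = avg + 0.75 * sigma of I'_Q."""
    t_prime = cluster_labels.mean() + 0.75 * cluster_labels.std()
    return cluster_labels > t_prime
```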

Fig. 3. (a) Clustered image, (b) binary image, and (c) candidate connected components. The green component has the highest score value; the blue component falls inside the region of interest (ROI). (Color figure online)

In what follows, only the connected components of the foreground pixels are taken into account.

The third step of the method involves the computation of a score for each connected component \(F_k\). Such a score is computed on the basis of the following parameters:

  • the Proximity of \(F_k\) to the center of the image computed as:

    $$\begin{aligned} \alpha (F_k)= 1/2\cdot \delta / \sqrt{W^2+H^2} \end{aligned}$$

    where \(\delta \) is the Euclidean distance between the center of \(F_k\) and the center of the image. The values W and H are the width and height of the image.

  • the Density \(\omega (F_k)\) computed as the ratio between the area of \(F_k\) and the area of the convex hull of \(F_k\).

  • the Compactness of \(F_k\) defined as:

    $$\begin{aligned} \gamma (F_k)= 1/|F_k|\cdot (\sum _{i,j}{g(i,j)/bg(i,j)}) \end{aligned}$$

    where g(i,j) is the gray level in \(I_Q\) of the pixel p of \(F_k\) with coordinates (i,j), while

    $$\begin{aligned} bg(i,j) = \left\{ \begin{array}{ll} g(i,j)+ \alpha (F_k) &amp; \text {if } p \text { is a border pixel}\\ \alpha (F_k) &amp; \text {otherwise} \end{array} \right. \end{aligned}$$

    Note that, before the computation of \(\gamma (F_k)\), the holes included in \(F_k\) are filled.

Finally, the score of \(F_k\) is given by:

$$\begin{aligned} s(F_k)=\gamma (F_k)\cdot \omega (F_k) \end{aligned}$$
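The score of a single connected component can be sketched as follows; convex_hull_image (scikit-image) and the SciPy morphology routines stand in for the corresponding operations, the epsilon guards the degenerate case \(\alpha (F_k)=0\), and taking \(|F_k|\) as the area of the hole-filled component is our reading of the formula:

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_fill_holes
from skimage.morphology import convex_hull_image

def component_score(mask, IQ):
    """Score s(F_k) = gamma(F_k) * omega(F_k) for one component mask."""
    H, W = mask.shape
    ys, xs = np.nonzero(mask)
    # proximity alpha: distance of the component centre from the image centre
    delta = np.hypot(ys.mean() - H / 2.0, xs.mean() - W / 2.0)
    alpha = 0.5 * delta / np.sqrt(W ** 2 + H ** 2)
    # density omega: component area over convex hull area
    omega = mask.sum() / max(convex_hull_image(mask).sum(), 1)
    # compactness gamma, computed on the hole-filled component
    filled = binary_fill_holes(mask)
    border = filled & ~binary_erosion(filled)
    g = IQ.astype(float)
    bg = np.where(border, g + alpha, alpha) + 1e-12
    gamma = np.mean(g[filled] / bg[filled])
    return gamma * omega
```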

The foreground connected component of \(I''_Q\) with the highest score is considered as belonging to the sclera (the green connected component in Fig. 3c). Let \(F_m\) be the selected connected component. Since in an eye image the sclera can be divided into several regions, as in the running example, a further selection of the foreground connected components is performed. Precisely, only the connected components \(F_k\), with \(k\ne m\), such that \(s(F_k)> 0.70\cdot s(F_m)\) are candidates to belong to the sclera (the blue and red connected components in Fig. 3c). A region of interest (ROI) is then selected so that its bottom and top coincide with the lowest and highest border pixels of \(F_m\), respectively (see Fig. 3c), while its width equals the width of the image. Any candidate component \(F_k\) having at least one pixel within the ROI is considered as belonging to the sclera (the blue connected component in Fig. 3c). It is worth noting that all parameters have been set experimentally so as to obtain the best segmentation performance.
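The final selection, including the ROI test and the hole filling mentioned just below, might look as follows (component_score refers to the previous sketch; all names are illustrative):

```python
import numpy as np
from scipy.ndimage import label, binary_fill_holes

def select_sclera(binary, IQ):
    """Merge the best-scoring component with candidates falling inside the ROI."""
    lbl, n = label(binary)
    if n == 0:
        return binary
    masks = [lbl == k for k in range(1, n + 1)]
    scores = [component_score(m, IQ) for m in masks]
    m_best = int(np.argmax(scores))
    ys, _ = np.nonzero(masks[m_best])
    top, bottom = ys.min(), ys.max()           # ROI spans F_m vertically, full width
    sclera = masks[m_best].copy()
    for k, (m, s) in enumerate(zip(masks, scores)):
        if k == m_best or s <= 0.70 * scores[m_best]:
            continue
        if np.any(m[top:bottom + 1, :]):       # candidate has a pixel inside the ROI
            sclera |= m
    return binary_fill_holes(sclera)           # final hole filling
```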

Finally, a process to fill the holes of the detected sclera regions is performed.

3 Results

The experiments have been performed on an Intel Core i7-6900K @ 3.2 GHz with 8 GB of RAM. Our Matlab implementation of the method requires 0.34 s to segment an eye image with a resolution of 1295\(\,\times \,\)576 pixels.

Some results of our method can be observed in Fig. 4, where each row shows, from left to right, the input image, the ground truth and the result of our method.

An evaluation of the proposed method has been performed by a team of the SSRBC 2017 organization [1, 6] in terms of precision and recall. Precision was considered the most important measure for the performance evaluation. The results of the competition are shown in Table 1.

Fig. 4. Results of our method: (a) original image, (b) ground truth, (c) results of the proposed method.

Table 1. Results of SSRBC 2017

Our algorithm obtained a satisfactory result in terms of precision and was ranked second in the competition.

The tuning dataset provided by SSRBC 2017 to the participants contained 120 eye images from 30 individuals. The images were acquired under different illumination conditions and contained noise such as reflections or occluded sclera regions. On this dataset we obtained 95.47% accuracy, 98.80% specificity, 82.32% recall and 90.36% precision. Although the performance of our algorithm on the whole SSRBC 2017 dataset proved to be lower than that obtained on the tuning dataset, the results of our experiments encourage future developments of the proposed method, in particular by improving the analysis of the scores and positions of the foreground components. We also underline that the proposed approach is completely unsupervised, so it does not require training.

4 Conclusions

In this work, we have presented an unsupervised sclera segmentation method for visible-spectrum eye images. The method is based on gray level clustering, applied after a suitable conversion of the RGB image into a gray-scale image in which the sclera is highlighted. Feature extraction and score computation are then used to detect the connected components representing the sclera. The method provides results that are satisfactory both from a qualitative point of view and in terms of precision. The proposed approach was ranked \(2^{nd}\) in SSRBC 2017.