Introduction

Detecting the face and its components, in particular the eyes, has been one of the challenging problems in machine vision over the last two decades, and many researchers are working on potential solutions. Detection of the eye state has many practical uses, including security and safety systems, control systems, driver behaviour monitoring and human–computer interaction (HCI). A number of techniques have been presented for face and eye detection as well as tracking. These methods rely on different features such as composition, shape and colour, or a combination of them. One method for face detection uses skin colour, which has proved efficient in a wide range of applications [1–6]. Another uses facial features extracted by principal component analysis, which is widely used in face detection and recognition systems [7]. Motion analysis has also been used to find the head and the face in video frames [8]. It is also possible to use recursive nonparametric discriminant analysis (RNDA) to extract features of the eyes and pass them to a classifier for detection [9].

Other techniques have also been suggested to detect and classify the states of the eyes using distinctive features such as the corners of the eyes and the coloured sections of the iris. Variance filter analysis can also be used for eye detection and tracking, with vertical projection used to find the exact location. When distinctive features are used to detect the eyes and their states, the corners of the eyes and the edges of the iris, together with its centre, are identified to locate the upper eyelid. The distance between the iris and the upper eyelid then establishes whether the eye is open or closed and how far it is closed. A method based on line-by-line analysis of the face area has been considered for eye blink detection [10, 11]. Here, the face area is scanned and analysed line by line from top to bottom to locate the position of the eyes. After the eyes are detected, their profile is matched against a database of known eye states to determine the current state. Optical flow is then used to analyse subsequent frames for detection of eyelid motion. A very fast and reliable method for face detection in colour pictures is based on AdaBoost. This method introduces three original ideas, the first of which is the representation of pictures as integral pictures so that the required features can be calculated very quickly [12–14].

Instead of working directly with light intensities, the method uses Haar-like features. With integral pictures, the processing speed increases significantly because fewer calculations are needed: each Haar-like feature can be calculated at any scale and any location in constant time. The second idea is to reduce the number of features using AdaBoost. In any picture, the number of Haar-like features is much larger than the number of pixels, so focusing on a small number of critical features largely improves the performance. Feature selection is achieved by modifying AdaBoost so that each weak classifier is constrained to depend on one single feature. The selection of each new weak classifier in the boosting process can then be viewed as a feature selection step. This provides an effective generalised algorithm for learning [15–17]. The third idea is to focus on the areas of the picture that have important features. By combining increasingly complex classifiers in a cascade structure, the speed of the detector is increased, since the cascade determines in which areas of the picture objects may occur and reserves the more complex processing for these regions. By keeping the false-negative rate of this filter very low, almost all objects pass through it [18].
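The constant-time evaluation of Haar-like features can be illustrated with a minimal sketch. It assumes a greyscale picture stored as a NumPy array; the function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the choice of a horizontal two-rectangle feature are illustrative and are not taken from the cited works.

```python
import numpy as np

def integral_image(img):
    """Summed-area table, padded with a zero row and column at the top/left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

# Toy 4 x 4 picture used only to exercise the functions.
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

Once the table is built, the cost of `rect_sum` does not depend on the rectangle size, which is why a feature can be evaluated at any scale and location in constant time.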

The rest of the paper is organised as follows. The proposed high-resolution detection approach and its experimental results are presented in “The proposed high-resolution detection approach and its experimental results”. Finally, the outcomes are summarised in “Conclusion”.

The proposed high-resolution detection approach and its experimental results

The proposed high-resolution detection approach is illustrated schematically in Fig. 1. Designing a system capable of automatic, real-time detection of the eyes and their states under different lighting conditions is difficult. In two-dimensional pictures, the eyes appear in many varied states and with features that depend on the emotional state of the subject. This research introduces an accurate and practical approach to detect the eyes and their states. To reduce the number of calculations involved in this analysis, principal component analysis (PCA) is used in association with an artificial neural network.

Fig. 1 The schematic diagram of the proposed high-resolution detection approach

The output of the PCA is passed to the artificial neural network, which classifies each eye as open or closed.

The database

To compare the effectiveness of different algorithms, a systematic and complete database is first required. It should consist of pictures of eyes taken from various angles under different lighting conditions in two states, open and closed. Because no readily available database existed for this purpose, the team created its own database of pictures. The present database consists of 640 pictures taken from the eyes of 160 subjects in both open and closed states. The subjects comprise 135 men and 25 women of different ages, 27 of whom wore spectacles. The pictures are taken in full colour at a resolution of \(640 \times 480\) pixels covering the whole face. Using MATLAB, they are converted to greyscale pictures of the eyes only at a resolution of \(81 \times 81\) pixels. As shown in Fig. 2, these are classified into two classes, open and closed eyes.

The feature extraction and the artificial neural network training

To realize the proposed approach, feature extraction and artificial neural network training are implemented as follows. The pictures are split into two groups based on the state of the eyes, open and closed. From each group, 80 % (256 pictures) are randomly selected for training and the remaining 20 % (64 pictures) for testing. Because the two subsets are disjoint, no picture is used for both training and testing. Using the PCA, the eigenvectors and eigenvalues of the pictures are extracted. Based on these values, 260 features are selected and passed to the artificial neural network. A multi-layer perceptron (MLP) neural network is employed to carry out the classification. Hence, the input matrix of the MLP for training consists of 512 rows and 260 columns. The output layer of the MLP consists of two neurons, which encode the open and closed states of the eyes as a binary code.
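The split and the PCA projection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the pictures are replaced by random placeholder arrays, the PCA is computed through a singular value decomposition of the centred training matrix, and the helper names (`split_80_20`, `fit_pca`, `project`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data standing in for the real eye pictures (assumption):
# 320 pictures per class, each flattened from 81 x 81 pixels.
n_per_class, n_pixels = 320, 81 * 81
open_eyes = rng.random((n_per_class, n_pixels))
closed_eyes = rng.random((n_per_class, n_pixels))

def split_80_20(samples, rng):
    """Shuffle one class and split it into disjoint 80 % / 20 % parts."""
    order = rng.permutation(len(samples))
    cut = len(samples) * 4 // 5
    return samples[order[:cut]], samples[order[cut:]]

open_train, open_test = split_80_20(open_eyes, rng)
closed_train, closed_test = split_80_20(closed_eyes, rng)
X_train = np.vstack([open_train, closed_train])  # 512 rows
X_test = np.vstack([open_test, closed_test])     # 128 rows

def fit_pca(X, n_components):
    """Return the training mean and the leading principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Project pictures onto the principal directions."""
    return (X - mean) @ components.T

# The PCA is fitted on the training pictures only and reused for the test set.
mean, components = fit_pca(X_train, 260)
F_train = project(X_train, mean, components)  # 512 x 260 input matrix
F_test = project(X_test, mean, components)    # 128 x 260 input matrix
```

Fitting the projection on the training pictures alone keeps the test pictures unseen, which matches the disjoint split described in the text.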

To determine the optimal hidden layer configuration and learning rate of the MLP, the imperialist competitive algorithm (ICA), particle swarm optimisation (PSO) and the genetic algorithm (GA) are used. The data are normalised by a sigmoid function to the range between 0 and 1. The MLP learns the appropriate biases and weights from the training sample of 512 pictures. The neural network parameters used for the simulation are tabulated in Table 1. With each epoch the mean square error (MSE) decreases, as illustrated in Fig. 3, and the true-positive rate against the false-positive rate, in the form of the receiver operating characteristic (ROC) curve, is shown in Fig. 4. As expected, when the pictures used in training are given as the input, the outcomes for each class are 100 % accurate. In testing, the input matrix has 128 rows and 260 columns. The output for each row is a value between 0 and 1, as detailed above.
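A minimal sketch of an MLP of this kind, with sigmoid units, two output neurons and a mean square error recorded per epoch, is given below. The data are synthetic, and the hidden layer size, learning rate, weight initialisation and number of epochs are placeholder assumptions; the values actually used are those of Table 1, selected by the ICA, PSO and GA, and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in for the PCA features (assumption): two separable clusters.
n, d, hidden = 200, 10, 8
X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, d)),
               rng.normal(1.0, 1.0, (n // 2, d))])
# Binary code on two output neurons: closed -> [1, 0], open -> [0, 1].
T = np.vstack([np.tile([1.0, 0.0], (n // 2, 1)),
               np.tile([0.0, 1.0], (n // 2, 1))])

# Placeholder hyperparameters, not the tuned values from the paper.
W1 = rng.normal(0.0, 0.5, (d, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.5, (hidden, 2))
b2 = np.zeros(2)
lr, epochs = 0.5, 300

mse_history = []
for epoch in range(epochs):
    # Forward pass through one hidden layer.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    err = Y - T
    mse_history.append(float(np.mean(err ** 2)))

    # Backward pass: gradients of the MSE up to the 1/n factor applied below.
    dY = err * Y * (1.0 - Y)
    dH = (dY @ W2.T) * H * (1.0 - H)

    # Full-batch gradient descent update of weights and biases.
    W2 -= lr * (H.T @ dY) / n
    b2 -= lr * dY.mean(axis=0)
    W1 -= lr * (X.T @ dH) / n
    b1 -= lr * dH.mean(axis=0)

# The predicted class is the output neuron with the larger activation.
predicted = Y.argmax(axis=1)
```

Plotting `mse_history` against the epoch index gives a curve of the same kind as the MSE plot referred to in the text.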

Fig. 2 The samples of the database

Table 1 The parameters of the artificial neural network

Fig. 3 The outcome of the MSE

Fig. 4 The outcome of the ROC

The statistics for the tests are tabulated in Table 2. The class denotes the state of the eye, with 1 being closed and 2 being open. The performance measures are the sensitivity or true-positive rate (SEN − TPR), the specificity or true-negative rate (SPC − TNR), the accuracy (ACC) and the positive predictive value (PPV), defined as:

$$\begin{aligned}&\mathrm{SEN}-\mathrm{TPR}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}=\frac{\mathrm{TP}}{P}\end{aligned}$$
(1)
$$\begin{aligned}&\mathrm{SPC}-\mathrm{TNR}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}=1-\mathrm{FPR}\end{aligned}$$
(2)
$$\begin{aligned}&\mathrm{ACC}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}\end{aligned}$$
(3)
$$\begin{aligned}&\mathrm{PPV}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} \end{aligned}$$
(4)

where TP, TN, FP and FN are the numbers of true-positive, true-negative, false-positive and false-negative predictions, respectively, \(P=\mathrm{TP}+\mathrm{FN}\) is the number of actual positives and \(\mathrm{FPR}=\mathrm{FP}/(\mathrm{FP}+\mathrm{TN})\) is the false-positive rate.
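Equations (1) to (4) translate directly into code. The sketch below uses a hypothetical helper, `classification_metrics`, and the counts passed to it are made-up values for illustration only; they are not the entries of Table 2.

```python
def classification_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (1)-(4) from the four confusion-matrix counts."""
    return {
        "SEN": tp / (tp + fn),                   # Eq. (1), true-positive rate
        "SPC": tn / (tn + fp),                   # Eq. (2), true-negative rate
        "ACC": (tp + tn) / (tp + tn + fp + fn),  # Eq. (3), accuracy
        "PPV": tp / (tp + fp),                   # Eq. (4), positive predictive value
    }

# Hypothetical counts for a 128-picture test set (not results from the paper).
metrics = classification_metrics(tp=60, tn=62, fp=2, fn=4)
```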

Table 2 The confusion matrix for each class

The face detection

Face detection is an important step of the proposed high-resolution detection approach. To reduce the effect of different lighting conditions, the usual red–green–blue (RGB) colour model is replaced by the normalised red–green (RG) colour model. The transformation is given by:

$$\begin{aligned} r=R/(R+G+B)\end{aligned}$$
(5)
$$\begin{aligned} g=G/(R+G+B) \end{aligned}$$
(6)
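Equations (5) and (6) can be applied to a whole picture at once. The following sketch assumes an RGB picture stored as a NumPy array with the colour channels on the last axis; the function name `normalised_rg` and the handling of black pixels, for which \(R+G+B=0\), are assumptions of this example.

```python
import numpy as np

def normalised_rg(rgb):
    """Return the normalised r and g channels of an RGB picture."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=-1)
    # Black pixels have R + G + B = 0; mapping them to r = g = 0 is an assumption.
    safe_total = np.where(total == 0, 1.0, total)
    r = rgb[..., 0] / safe_total
    g = rgb[..., 1] / safe_total
    return r, g

# The same surface colour at two brightness levels, plus a black pixel.
picture = np.array([[[100, 50, 50], [200, 100, 100], [0, 0, 0]]], dtype=np.uint8)
r, g = normalised_rg(picture)
```

The first two pixels differ only in brightness and receive the same normalised values, which illustrates why this colour model reduces the effect of lighting.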

In this method, the geometric properties of the face are used to determine five points, corresponding to the two eyes, the two nostrils and the mouth. A sample of the pictures evaluated with the proposed approach is illustrated in Fig. 5, and the eye regions extracted from them and passed to the MLP for processing are shown in Fig. 6.

Fig. 5 The samples of the pictures

Fig. 6 The samples of the eyes extraction

The investigated outcomes

Four people are first chosen, and a video of them is taken in various states, with one or both eyes open or closed, to cover all the possible combinations. Sample outcomes of the proposed approach are illustrated in Fig. 7. In these pictures, the system has identified the eye regions by drawing a frame around each eye. It has also identified the state of each eye by colour coding, with red for closed and green for open.

Fig. 7 The results of the detection

The proposed approach for the detection of the eyes and their states is then evaluated on 25 new subjects using continuous video frames. Each video has 1000 frames, and each subject is asked to blink 15 times during the recording. The light source is placed above the head of the subjects. The results for four randomly selected subjects are tabulated in Table 3.
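The text does not state how blinks are counted from the per-frame eye states. One plausible way, shown here purely as an assumption, is to count each run of consecutive closed frames as one blink; the function `count_blinks` and its `min_closed_frames` parameter are hypothetical.

```python
def count_blinks(states, min_closed_frames=1):
    """Count runs of consecutive 'closed' frames of at least the given length."""
    blinks, run = 0, 0
    for state in states:
        if state == "closed":
            run += 1
        else:
            if run >= min_closed_frames:
                blinks += 1
            run = 0
    # A run that is still in progress at the end of the video also counts.
    if run >= min_closed_frames:
        blinks += 1
    return blinks

# Toy sequence of per-frame classifier outputs (not data from the paper).
frames = ["open"] * 3 + ["closed"] * 2 + ["open"] * 4 + ["closed"] + ["open"]
total = count_blinks(frames)
long_only = count_blinks(frames, min_closed_frames=2)
```

Raising `min_closed_frames` suppresses single-frame misclassifications at the cost of missing very short blinks.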

Table 3 The results of the proposed algorithm

The results indicate that the proposed approach achieves high performance. The results are also compared with two benchmarks, the methods of Sirohey and Lalonde. The comparison is tabulated in Table 4, where the outcomes confirm that the proposed approach compares favourably with the benchmarks.

Table 4 The comparison of the proposed approach with the two benchmarks

Conclusion

A real-time high-resolution detection approach is presented in this research to detect the human eyes and their states through an intelligence-based representation. The proposed approach is able to distinguish between the open and closed states. It involves a number of processing steps, including detection of the face through the AdaBoost technique and identification of the areas of the mouth, the left and right nostrils and the eyes. The features of the open and closed eyes are extracted through principal component analysis and classified by the artificial neural network to identify the states of the eyes. As further work, the approach could be developed by linking the coloured frames around the left and right eyes to the corresponding left and right mouse clicks for use in various applications. It is also possible to track the position of the iris and link it to the movement of the pointer on the screen. Both ideas could improve human–computer interaction.