An automated classification framework for glaucoma detection in fundus images using ensemble of dynamic selection methods

Glaucoma is an irreversible optic neuropathy in which damage to the optic nerve head, mainly caused by increased intra-ocular pressure, leads to vision loss. Retinal fundus photography assists the ophthalmologist in detecting glaucoma, but manual assessment is subjective and time-consuming. Computational methods such as image processing and machine learning classifiers can aid computer-based glaucoma detection, which helps in mass screening. In this context, the proposed method develops an automated glaucoma detection system in the following steps: (i) pre-processing by segmenting the blood vessels using a directional filter; (ii) segmenting the region of interest using statistical features; (iii) extracting clinical and texture-based features; and (iv) developing ensembles of classifier models using dynamic selection techniques. The proposed method is evaluated on two publicly available datasets and on 300 fundus images collected from a hospital. The best results are obtained using an ensemble of random forests with the META-DES dynamic ensemble selection technique: on the hospital dataset, the average specificity, sensitivity and accuracy for glaucoma detection are each 100%. For the RIM-ONE dataset, the average specificity, sensitivity and accuracy are 100%, 93.85% and 97.86%, respectively, and for the Drishti dataset they are 90%, 100% and 97%, respectively. The quantitative results and comparative study indicate the ability of the developed method; thus, it can be deployed in mass screening and as a second opinion for the ophthalmologist during glaucoma diagnosis.


Introduction
Glaucoma is a leading cause of irreversible blindness, and its manifestation is unknown until it reaches an advanced stage. Hence, a periodic eye checkup is the sole way of detecting the disease and preventing further blindness. Glaucoma is defined as a progressive optic neuropathy that damages the structural appearance of the optic nerve head, also known as the optic disk (OD). The major cause of glaucoma is a decrease in outflow of the intra-ocular fluid, called the aqueous humor, in the eye [1][2][3]. Glaucoma prevalence is likely to increase to 112 million worldwide by 2040 [3]. Fundus photography of the optic nerve head is a non-invasive way used by ophthalmologists for observing changes caused by glaucoma, such as the cup-to-disk ratio (CDR) and the loss in the neuro-retinal rim (NRR) area [3,4]. The ratio of the area of the optic cup (OC) to the area of the OD is known as the CDR [1]. The NRR area is the area present between the OD and the OC; thinning of the NRR area is also observed as glaucoma advances [1]. Figure 1 shows the fundus image of a normal eye with parts labeled. Manual assessment of fundus images is subject to inter- and intra-observer variations. Hence, the development of an automated detection system which processes fundus images for quantifying glaucoma is of great advantage in mass screening and provides a second opinion for the ophthalmologist during diagnosis.

Literature review
The recent methods described in the literature for development of a glaucoma detection system can be divided into segmentation-and non-segmentation-based methods.
Segmentation-based methods involve extraction of clinical features such as CDR and NRR area. Dela et al. [5] applied active contours and the Hough transform in the red channel to segment the OD. The displacement of vessels in the OD is estimated using the chessboard metric to detect glaucoma. The method obtained an accuracy of 92% for a hospital dataset of 67 images. Although the method developed a unique feature, the vessel displacement, it lacked the main clinical feature, CDR. Ashish et al. [6] used an adaptive threshold obtained from pixel intensities for segmentation of OD and OC. Further, CDR and NRR features are calculated, and the accuracy obtained is 94% for a hospital dataset of 67 images. The segmentation method is not designed to exclude artifacts, which in some images are segmented as part of the OD. Soorya et al. [7] developed a method that tracks the bends of the vessels inside the OD in order to obtain the OC contours. The OD contour is obtained using thresholding and point-contour joining. The method used CDR as a feature for glaucoma detection and obtained an average accuracy of 97% for 225 images collected from a local hospital. Although the method achieved good accuracy, the OC contours can be detected only in images having high vessel contrast. Pardha et al. [8] performed OD segmentation using a region-based active contour and OC segmentation using clustering. The method is tested on 59 images obtained from a hospital and obtained average dice coefficients of 97% and 87% for OD and OC, respectively; glaucoma detection accuracy is not reported. Kasu et al. [9] segmented OD and OC using fuzzy C-means (FCM) and the Otsu thresholding method. CDR and energy-based wavelet features were further estimated for glaucoma detection. The obtained accuracy for glaucoma detection is 97% using an artificial neural network (ANN).
Although the method achieved good detection results on 86 images obtained from a hospital, its OD and OC segmentation algorithm needs to be evaluated for clinical implementation. Soltani et al. [10] used Canny edge detection for obtaining the OD and OC contours. A fuzzy engine is developed which considers CDR along with the patient's health data for classifying the fundus images as normal or glaucoma. The obtained classification accuracy is 96%. Chia et al. [11] developed a fully convolutional network to obtain the OD and OC contours. Along with the CDR feature, the patients' health data was considered for glaucoma detection. The method is implemented on a hospital dataset of 2554 fundus images, and the accuracy of correct classification is 91%. The main limitations of this method are the need for a large dataset for training the model and the resizing of the images in order to reduce the computation time. Julian et al. [12] segmented OD and OC by developing a framework based on a convolutional neural network (CNN). The outputs of the filters are trained using a softmax logistic model and subjected to convex hull and graph cut operations for the final segmentation; CDR is then estimated for glaucoma detection. The method achieved dice coefficients of 97% and 87% for OD and OC, respectively, and is evaluated on the Drishti dataset, which is publicly available for research. Although the segmentation results achieved are good, parameter optimization remains an issue for reducing the computational complexity. Perdoma et al. [13] developed a three-step CNN model. In the first step, a CNN with 15 layers is developed for OD and OC segmentation. In the second step, a CNN with 12 layers is designed for extracting morphometric features from the segmented OD and OC. In the third step, a CNN is trained for classifying the fundus images based on the features extracted. The method obtained a classification accuracy of 89% on the Drishti dataset. Sevastopolsky et al.
[14] developed an OD and OC segmentation method based on U-Net, which consists of a contracting and an expansive path. The CNN architecture is used to build the contracting path, and the image information is merged in the expansive path. The dice coefficients for OD and OC are 94% and 85%, respectively. The segmented OD and OC are used to compute CDR; the glaucoma detection accuracy is not reported. Thakur et al. [15] developed a hybrid model consisting of adaptive FCM and a level set for segmenting OD and OC. The accuracies of OD and OC segmentation for the Drishti dataset are 93% and 92%, respectively. The main limitation of this method is over- and under-segmentation for low-contrast fundus images; the accuracy of glaucoma detection is not reported. Cheng et al. [16] used super-pixel-based classification of OD and OC. The OD boundary is initialized using features based on histograms and statistics. For the OC boundary, local information is used along with histograms and statistics. The area under the curve achieved for glaucoma detection is 0.80 on a hospital dataset of 650 images. Civit et al. [53] selected U-Net as the segmentation network and developed functions that implement a generalized U-Net adapted to execution on a tensor processing unit for cloud-based services. Tulsani et al. [54] segmented OD and OC using a custom U-Net++ architecture, which minimizes the loss of context and local image information by employing encoder, decoder and skip connections. The study shows that U-Net is effective for segmentation even for small datasets. For glaucoma detection, the method achieved accuracies of 94% and 91% for the Drishti and RIM datasets, respectively. However, a U-Net model executes repeated convolutions for feature extraction and restoration and thus requires many trainable parameters.
Non-segmentation-based methods involve extraction and classification of features based on how the color or intensity of the pixels is spatially distributed in the image. Singh et al. [17] localized the OD using bit-plane analysis and extracted wavelet features. Evolutionary attributes and principal component analysis (PCA) are used as feature selection methods. The obtained accuracy is 94% using a support vector machine (SVM) classifier for 63 images collected from a local hospital. The accuracy can be further improved by adding clinical features. Maheshwari et al. [18] extracted features such as Kapur entropy, Rényi entropy, fractal dimension and Yager entropy. A least-squares SVM is used for classifying the extracted features. The accuracy obtained is 95% for a hospital dataset of 488 images. Kevin et al. [19] developed a glaucoma detection model using higher-order spectra (HOS) features. Linear discriminant analysis is used as a feature reduction method. These features are used to train naïve Bayes (NB) and SVM classifiers. The method was tested on 272 images collected from a hospital and achieved an accuracy of 92%. The detection accuracy can be further improved by using clinical features such as CDR. Rajendra et al. [20] extracted features such as kurtosis, Kapur entropy, energy, mean, Rényi entropy, Shannon entropy and variance from the Gabor transform coefficients. These features are ranked using the t-test. The method obtained a classification accuracy of 93% on 510 images collected from a hospital. Haleem et al. [21] developed an image feature model which uses vascular convergence to locate the OD. The localized OD is used to extract Gaussian, wavelet, gradient and Gabor features, which are used to train an SVM classifier. The method obtained an accuracy of 94% on the RIM dataset, which is publicly available. The redundant features are reduced since they are extracted from the OD. Dua et al.
[22] trained LibSVM and sequential minimal optimization classifiers with wavelet-based features. SVM achieved a higher accuracy of 93% on 63 images collected from a hospital. The method needs to be tested on a large dataset for clinical implementation. Raghavendra et al. [23] proposed an 18-layer CNN model which maps the pixels in a hierarchical form to classify the fundus images. The model is trained on a hospital dataset of 1426 images and achieved an accuracy of 98%. Although the method achieved a good classification accuracy, it cannot be generalized on a small number of images, as it requires a large number of images for training. Gour et al. [24] proposed a glaucoma detection model which uses histogram-based gradient features and gradient information scale for capturing the shape features. A total of 1448 features are extracted, and the prominent features are selected using PCA. The method obtained an accuracy of 79% on the Drishti dataset. The accuracy can be further improved by using clinical features. Akram et al. [25] localized the OD in the red channel by using a Laplacian of Gaussian filter. The vascular density information is considered, and multivariate classification with m-mediods is used. An accuracy of 91% is obtained for 462 images. Mookiah et al. [26] proposed a model which uses the Radon transform and histogram equalization as pre-processing methods. Discrete wavelet transform and HOS features are extracted and used to train an SVM classifier. The method obtained an accuracy of 95% on 60 images. The method developed a glaucoma risk index, which aided in good classification accuracy; however, it needs to be tested on a large dataset for clinical implementation. Raja et al. [27] developed a hybrid swarm optimization method. The features are extracted from a hyper-analytic wavelet transformation to preserve the phase information. Classification is performed using an SVM classifier with a radial basis function.
A group search optimizer with ranging and area scanning features is embedded in the particle swarm framework for better detection. The method obtained an accuracy of 95% on the RIM dataset.
The above-discussed methods report good results for glaucoma detection. However, some limitations still exist: (i) over- and under-segmentation can cause a significant difference in clinical features such as CDR; (ii) there is a lack of methods that consider both the features based on clinical evaluation (CDR and NRR) and the features based on texture (color and intensity) for classifying the fundus images; and (iii) different classifiers make different errors, and hence developing ensembles of classifiers needs to be explored for glaucoma detection. The proposed methodology aims to overcome these limitations.

Proposed method
The proposed method for glaucoma detection is evaluated on three datasets. Here, in the datasets used, the term "annotations" refers to the class (normal or glaucoma) and also to the disk and cup boundary masks. (1) The images are collected from Kasturba Medical College (KMC), Manipal, Karnataka, India. The annotations for all the 300 (glaucoma-205, normal-95) fundus images are given by the ophthalmologist. The images are captured using a Zeiss FF450 plus fundus camera with a resolution of 2588 × 1958 pixels. The data collection has been approved by the KMC ethics committee. (2) Drishti [28] is an online dataset which is publicly available. The annotations for all the 101 (glaucoma-70, normal-31) fundus images have been provided. The fundus images have a resolution of 2896 × 1944 pixels. (3) RIM version 3 [29] is an online dataset which is publicly available. The annotations for all the 124 (glaucoma-39, normal-85) fundus images have been provided. The fundus images have a resolution of 1072 × 1424 pixels.
The proposed methodology for automated glaucoma detection consists of the following steps: pre-processing, segmentation methods for OD and OC, extraction of features and classification. Figure 2 shows the flow design of the proposed framework for detection of glaucoma.

Pre-processing
The segmentation is often hindered by the presence of blood vessels. Hence, the blood vessels are detected and excluded from the fundus image. In the RGB fundus image, the blood vessels appear most evident in the green channel; hence, the green channel is selected based on the literature [55]. The noise in the green channel is removed by applying a 2D Gaussian smoothing kernel with σ = 4, which results in a Gaussian-filtered image. On a set of 20 images, different values of the standard deviation were evaluated, and σ = 4 was found to give the most appropriate blood vessel detection. In order to enhance the contrast of the blood vessels, a linear structuring element of size 150 is considered for angle orientations varying in steps of 45° from 0° to 360°. The responses are summed up, and dilation and erosion operations are performed. This response image is subtracted from the Gaussian-filtered image in order to enhance the irregularly distributed blood vessels. The image is then subjected to Otsu thresholding. Inpainting is performed using the Mumford-Shah method [30], which gives the blood-vessel-excluded fundus image. Figure 3 illustrates the blood vessel extraction and exclusion.
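The pre-processing steps above can be sketched as follows. This is a simplified, scaled-down illustration, not the authors' implementation: the line element is shorter than the 150-pixel one in the text, a mean fill stands in for Mumford-Shah inpainting, and Otsu's threshold is hand-rolled on a coarse histogram.

```python
import numpy as np
from scipy import ndimage

def exclude_vessels(green, sigma=4, line_len=15):
    """Enhance dark, elongated vessels in the green channel, threshold
    them with Otsu, and fill the vessel pixels (naive inpainting)."""
    smooth = ndimage.gaussian_filter(green.astype(float), sigma)
    # Morphological closings with line elements at several orientations:
    # a closing fills thin dark structures, so (closing - image) is large
    # exactly on vessel pixels. Orientations beyond 180 deg duplicate lines.
    response = np.zeros_like(smooth)
    for angle in range(0, 180, 45):
        se = _line_element(line_len, angle)
        closed = ndimage.grey_dilation(smooth, footprint=se)
        closed = ndimage.grey_erosion(closed, footprint=se)
        response += closed - smooth
    mask = response > _otsu(response)          # vessel mask
    filled = smooth.copy()
    filled[mask] = np.mean(smooth[~mask])      # crude stand-in for inpainting
    return filled, mask

def _line_element(length, angle_deg):
    """Boolean line-shaped structuring element at the given orientation."""
    t = np.deg2rad(angle_deg)
    se = np.zeros((length, length), bool)
    c = length // 2
    for r in np.linspace(-c, c, 2 * length):
        x, y = int(round(c + r * np.cos(t))), int(round(c + r * np.sin(t)))
        if 0 <= x < length and 0 <= y < length:
            se[y, x] = True
    return se

def _otsu(img, bins=64):
    """Otsu's threshold: maximize between-class variance over a histogram."""
    hist, edges = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = p[:i].sum(), p[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (p[:i] * centers[:i]).sum() / w0
        m1 = (p[i:] * centers[i:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, centers[i]
    return best_t
```

On a synthetic image with one dark horizontal band, the oriented closings perpendicular to the band dominate the summed response, so the band is picked out by the threshold.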

OD segmentation
The pre-processed image is used for segmentation of OD and OC. The OD appears more prominent in the red channel of the RGB image [5,6,8,9,15]. The red channel is considered, and a statistical feature, namely the absolute mean, is computed from the red channel and subtracted in an iterative manner. The number of iterations considered is three; the procedure is repeated for different iteration numbers, and the best results are retained and reported. For the third iterative image, the Prewitt operator is used to compute the edges having high intensity. A circle-finder operation is performed to determine all the possible circles. For this purpose, the minimum radius of OD considered is 2.5 mm [31], which is 9.5 in terms of pixel distance. The minimum radius is taken as 10 pixels, and the maximum radius is taken as twice the minimum radius, i.e., 20 pixels. A circle is defined as

(x − a)^2 + (y − b)^2 = r^2, (1)

where r is the radius and the circle center is defined by (a, b). For r values in the range (10, 20), the candidate circle centers are obtained using (2) and (3):

a = x − r cos θ, (2)

b = y − r sin θ, (3)

where x and y are the edge coordinates defined by the Prewitt operator and θ is the angle swept around each edge point. This gives all the possible circles. The circumference points are obtained by varying the angle in steps of 45° from 0° to 360°. The mean of the circumference points is subtracted from the iterative image. This further reduces the background variability and enhances the OD region. This image is then subjected to a threshold operation.
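The circle-finder over Eqs. (1)-(3) amounts to a small Hough-style accumulator: every edge pixel votes for the centers of all circles of radius r that could pass through it. A minimal sketch, where the 5° angular step and the accumulator shape are hypothetical choices:

```python
import numpy as np

def hough_circles(edge_points, r_min=10, r_max=20, shape=(100, 100)):
    """Each edge pixel (x, y) votes for candidate centres
    (a, b) = (x - r cos t, y - r sin t) for every radius r in
    [r_min, r_max]; the most-voted (a, b, r) is returned."""
    thetas = np.deg2rad(np.arange(0, 360, 5))
    best, best_votes = None, -1
    for r in range(r_min, r_max + 1):
        acc = np.zeros(shape, dtype=int)
        for (x, y) in edge_points:
            a = np.round(x - r * np.cos(thetas)).astype(int)
            b = np.round(y - r * np.sin(thetas)).astype(int)
            ok = (a >= 0) & (a < shape[0]) & (b >= 0) & (b < shape[1])
            np.add.at(acc, (a[ok], b[ok]), 1)   # accumulate votes
        idx = np.unravel_index(acc.argmax(), shape)
        if acc[idx] > best_votes:
            best_votes, best = acc[idx], (idx[0], idx[1], r)
    return best
```

For edge points sampled on a true circle, the votes coincide at the true center for the correct radius, which is why the maximum of the accumulator recovers it.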
To determine the threshold value, a decision tree classifier based on the Iterative Dichotomiser 3 (ID3) algorithm [32] is used. A set of 20 images obtained after the third iteration of absolute mean computation and subtraction is considered from the RIM dataset. The images are overlapped with their corresponding binary masks, and the mean intensity values of the background and the region of interest (OD) are obtained. These intensity values are given to a decision tree with a single split, which uses the ID3 algorithm. ID3 gives the boundary value that minimizes the entropy over all possible boundaries, and this is used as the threshold value to obtain the binary mask of OD. The threshold value obtained for OD segmentation is not specific to a particular dataset, and hence it can be used for other fundus datasets. An eccentricity threshold value of 0.6 helps to eliminate any noisy pixels present in the binary mask of OD; this threshold is obtained by examining different values on 20 images [33]. The final resulting binary mask is the region of OD. Figure 4 explains the segmentation of OD.
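The ID3-based thresholding is, in effect, a single-split decision stump over intensity values; a sketch assuming binary labels (0 = background, 1 = OD):

```python
import numpy as np

def id3_threshold(values, labels):
    """Scan candidate boundaries between sorted intensity values and keep
    the one that minimises the weighted class entropy of the two sides,
    as a one-split ID3 tree would."""
    def entropy(y):
        if len(y) == 0:
            return 0.0
        p = np.bincount(y, minlength=2) / len(y)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]
    best_t, best_h = v[0], np.inf
    for i in range(1, len(v)):
        t = (v[i - 1] + v[i]) / 2          # midpoint candidate boundary
        h = (i * entropy(y[:i]) + (len(v) - i) * entropy(y[i:])) / len(v)
        if h < best_h:
            best_h, best_t = h, t
    return best_t
```

With well-separated background and OD intensities, the returned boundary falls in the gap between the two groups, matching the "minimum-entropy boundary" described above.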

OC segmentation
For OC segmentation, the green channel is considered because the OC appears more prominent in the green channel [8,9,15]. In order to reduce background variation in the green channel, successive computation and subtraction of the absolute mean are performed in an iterative manner. The number of iterations considered is three, and the resulting image shows the high-intensity pixels that belong to the OD and OC regions. In order to retain only the pixels belonging to the OC region, all the absolute means computed in the three iterations are added and the result is subtracted from the absolute mean of the last iteration. In order to sharpen the edges belonging to the OC region, the resulting image is subjected to successive computation and subtraction of the standard deviation, performed for two iterations. This gives a new channel with increased contrast of the pixels belonging to the OC region. The new channel is subjected to K-means clustering, which groups the high-intensity pixels into a single cluster and converges quickly. Clustering is performed as follows: (1) The number of initial centroids k is set to 4. This is evaluated using the silhouette criterion [34] to avoid randomness in selecting the initial number of clusters; the silhouette coefficient of each point measures the similarity of the point to its own cluster compared with the other clusters [34]. (2) The distance from each pixel P(x, y) to each centroid {C1, C2, C3, C4} is computed as the squared Euclidean distance d = (Pi − Cn)^2, where n is the cluster number and i = 1, 2, …, N, in which N is the number of pixels. Pi is assigned to the cluster with the smallest distance. (3) After assigning all the pixels, the centroids are recalculated. (4) Steps 2 and 3 are repeated until there is no change in the centroids. (5) The cluster whose average pixel intensity value is greater than 3 is segmented as the region belonging to OC.
Figure 5 explains the segmentation of OC.
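The clustering step, with silhouette-based selection of k, might look as follows using scikit-learn. The "brightest cluster" rule here stands in for the paper's intensity criterion (average value greater than 3), whose scale depends on the preceding iterative subtractions; the k range searched is also an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def segment_cup(channel, k_range=(2, 6)):
    """Cluster pixel intensities with K-means, choosing k by the
    silhouette criterion, and return a mask of the brightest cluster."""
    X = channel.reshape(-1, 1).astype(float)
    best_k, best_s, best_labels = None, -2.0, None
    for k in range(k_range[0], k_range[1] + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        # Subsample for speed; silhouette compares intra- vs inter-cluster
        # distance for each point.
        s = silhouette_score(X, labels, sample_size=min(1000, len(X)),
                             random_state=0)
        if s > best_s:
            best_s, best_k, best_labels = s, k, labels
    means = [X[best_labels == c].mean() for c in range(best_k)]
    cup = (best_labels == int(np.argmax(means))).reshape(channel.shape)
    return cup, best_k
```

On a channel with one bright, compact region against a darker background, the silhouette criterion settles on two clusters and the bright one is returned as the OC candidate.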

Feature extraction
Using the segmented binary mask of OD and OC, clinical features, namely CDR and NRR area, are obtained. From the fundus image, features, namely: gray-level co-occurrence matrix (GLCM)-based features, texture directionality feature extracted from N + 1 directional difference of Gaussian filters, Gabor features, Hu-invariant moments and color features, are extracted. Each of these features is explained in the following.

CDR
CDR is defined as given in (4). A CDR value of 0.3 is considered normal. Using the binary images of the segmented OD and OC [8], the areas of white pixels are estimated and the CDR is obtained. The CDR value is used as a feature for glaucoma classification.

CDR = Area of segmented OC / Area of segmented OD. (4)
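Eq. (4) reduces to a pixel count over the two binary masks:

```python
import numpy as np

def cup_to_disk_ratio(od_mask, oc_mask):
    """CDR as in Eq. (4): ratio of OC area to OD area, each measured as
    the count of white (True) pixels in the segmented binary mask."""
    od_area = np.count_nonzero(od_mask)
    oc_area = np.count_nonzero(oc_mask)
    return oc_area / od_area if od_area else float("nan")
```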

NRR area
The area present between OC and OD is the region of NRR.
The changes in the NRR area are estimated using the inferior, superior, nasal and temporal (ISNT) rule. According to the ISNT rule, the NRR is in decreasing order of thickness around the quadrants (I > S > N > T). Figure 6 shows the ISNT quadrants in the NRR area for a normal eye. The NRR area ratio defined in (5) is used to verify the ISNT rule. A normal eye has an NRR ratio greater than 1, and for a glaucomatous eye the NRR ratio is close to 1 or less than 1 [8,9,21].

NRR = (sum of areas in inferior and superior quadrants) / (sum of areas in nasal and temporal quadrants). (5)

GLCM
GLCM, defined by Haralick, is a statistical approach that helps in examining image texture by considering the spatial relationship of the pixels [35]. In order to achieve rotation invariance, the 13 GLCM features [35] are computed over four offset directions and averaged, and the averaged values are used as features for glaucoma classification.
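A minimal hand-rolled GLCM, accumulated over four offset directions as described above; only three of the 13 Haralick features (contrast, energy, homogeneity) are written out, and the 8-level quantization is an illustrative choice:

```python
import numpy as np

def glcm_features(img, levels=8, offsets=((0, 1), (1, 1), (1, 0), (1, -1))):
    """Accumulate a grey-level co-occurrence matrix over the four offsets
    (an approximation to rotation invariance), normalise it, and derive
    a few Haralick-style statistics."""
    q = np.minimum((img.astype(float) / img.max() * levels).astype(int),
                   levels - 1)
    glcm = np.zeros((levels, levels))
    H, W = q.shape
    for dy, dx in offsets:
        # Pair each pixel with its neighbour at offset (dy, dx).
        a = q[max(dy, 0):H + min(dy, 0), max(dx, 0):W + min(dx, 0)]
        b = q[max(-dy, 0):H + min(-dy, 0), max(-dx, 0):W + min(-dx, 0)]
        np.add.at(glcm, (a.ravel(), b.ravel()), 1)
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    return {
        "contrast": np.sum(glcm * (i - j) ** 2),
        "energy": np.sum(glcm ** 2),
        "homogeneity": np.sum(glcm / (1.0 + np.abs(i - j))),
    }
```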

Invariant moments feature
Invariant moments help in describing image texture and shape and contribute significantly to image analysis and pattern recognition. Hu constructed seven invariant moments from the second- and third-order central moments according to the algebraic invariants [36]. In constructing the seven Hu-invariant moments, the central moments eliminate the impact of image translation, normalization removes the influence of image scaling, and rotation invariance is achieved by polynomial construction. Hence, to obtain translation, rotation and scaling invariance, we extract the seven Hu-invariant moments as defined in [36][37][38]. These seven Hu-invariant moments are used as features for glaucoma classification.
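The construction above can be sketched from normalized central moments; for brevity only the first four of Hu's seven invariants are written out:

```python
import numpy as np

def hu_moments(img):
    """First four Hu invariants. Central moments remove translation,
    the normalisation eta_pq = mu_pq / m00^(1 + (p+q)/2) removes scale,
    and the polynomial combinations provide rotation invariance."""
    img = img.astype(float)
    y, x = np.indices(img.shape)
    m00 = img.sum()
    cx, cy = (x * img).sum() / m00, (y * img).sum() / m00

    def eta(p, q):
        mu = ((x - cx) ** p * (y - cy) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        e20 + e02,
        (e20 - e02) ** 2 + 4 * e11 ** 2,
        (e30 - 3 * e12) ** 2 + (3 * e21 - e03) ** 2,
        (e30 + e12) ** 2 + (e21 + e03) ** 2,
    ])
```

Translating a shape within the image leaves all four values unchanged, which is exactly the property the section relies on.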

Gabor features
In a 2D plane, the impulse response of a Gabor function [39] is given as

f(x, y) = exp{−(1/2)[x^2/σx^2 + y^2/σy^2]} cos(2πμx), (6)

where μ is the radial frequency of the Gabor function, and σx and σy are the spreads of the Gaussian envelope along the x- and y-axes, respectively. The filtered image is obtained as in (7):

I_pq(x, y) = I(x, y) * f_pq(x, y), (7)

where * denotes the 2D convolution operation and p, q index the S scales and L orientations. The Gabor features GF_pq are obtained as the average output of I_pq(x, y), i.e., S × L (8 × 8 = 64) values for each segmented window. Out of these, three features, namely (i) the maximum of the Gabor features (G_max), (ii) the minimum of the Gabor features (G_min) and (iii) the range (G_max − G_min), are used to represent the texture, since they are rotation invariant.
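A sketch of the Gabor filter bank of Eqs. (6)-(7); the kernel size, the envelope spreads, and the frequency spacing across the eight scales are hypothetical choices not stated in the text:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(mu, theta, sigma_x=2.0, sigma_y=2.0, size=15):
    """Real 2D Gabor kernel: a Gaussian envelope modulated by a cosine
    of radial frequency mu, rotated by theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
    return env * np.cos(2 * np.pi * mu * xr)

def gabor_features(img, scales=8, orientations=8):
    """Average |response| per filter over the S x L bank, then keep the
    rotation-invariant summaries G_max, G_min and their range."""
    feats = []
    for p in range(scales):
        mu = 0.05 * (p + 1)                      # hypothetical frequency ladder
        for q in range(orientations):
            theta = q * np.pi / orientations
            resp = convolve2d(img, gabor_kernel(mu, theta), mode="same")
            feats.append(np.abs(resp).mean())
    g_max, g_min = max(feats), min(feats)
    return g_max, g_min, g_max - g_min
```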

Texture directionality feature extracted from N + 1 directional difference of Gaussian filters
The Tamura directionality equation is used to compute the texture directionality [41]. Unlike the existing Tamura directionality feature, which uses the Prewitt operator for edge detection, the proposed method uses a difference of Gaussians, which produces a sharpened image whose edges have increased contrast compared with the Prewitt operator. Directional filters are a prominent descriptor of texture in image analysis; they smooth the image while retaining edge information [40]. The impulse response of a directional filter is given by (8):

D_i(x, y) = G_1(x, y) − G_2(x, y), (8)

where G_k(x, y) is a Gaussian filter given by (9):

G_k(x, y) = C_k exp[−(x′^2/2σ_xk^2 + y′^2/2σ_yk^2)], (9)

where C_k is a normalizing constant and (x′, y′) are related to (x, y) by a rotation of amplitude θ_i as in (10) and (11):

x′ = x cos θ_i + y sin θ_i, (10)

y′ = −x sin θ_i + y cos θ_i. (11)

The parameters σ_xk and σ_yk are chosen such that the second filter is directional and the first filter is isotropic or less directional. In order to better enhance the directional patterns in fundus images, a difference of Gaussians is chosen. The rotation parameter θ decides the number of directional filters N; θ varies from 0 to π in steps of π/N, and N = 15 gives a step size of 12°. It is observed that a step size smaller than 12° increases the processing time without significant changes in the response image, while a step size greater than 12° results in incorrect values. The output of the filter bank is expressed as N + 1 filtered images I_i; since θ varies between 0 and π with a step size of 12°, there are 16 directional Gaussian filters. In order to synthesize the final image F, a maximization is performed for each pixel as given in (12):

F(x, y) = max_i I_i(x, y). (12)

The outputs after applying the difference of Gaussian filters are ΔH and ΔV, the horizontal and vertical derivatives, which are further used in the Tamura directionality equation given in (15).

The edge at a pixel is a vector with magnitude |ΔG| given in (13) and direction θ given in (14):

|ΔG| = (|ΔH| + |ΔV|)/2, (13)

θ = arctan(ΔV/ΔH) + π/2, (14)

where |ΔH| and |ΔV| indicate the horizontal and vertical changes in intensity, respectively.

The histogram of directionality H_D is obtained by quantizing θ (0 ≤ θ < π) and counting the pixels whose magnitude is greater than a given threshold. If the histogram has n peaks, then w_p is the window of bins from the previous valley to the next valley and φ_p is the angular position of the peak in w_p. With H_D(φ) the bin height at angular position φ, the texture directionality D, based on the sharpness of H_D [41], is calculated as given in (15):

D = 1 − r · n · Σ_p Σ_{φ∈w_p} (φ − φ_p)^2 · H_D(φ), (15)

where r is a normalizing factor related to the quantization of φ.
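A simplified sketch of the directionality computation, with a difference of Gaussians supplying the edges as the section proposes. The Gaussian scales, bin count, and magnitude threshold are illustrative, and a single dominant peak stands in for the multi-peak sum of Eq. (15):

```python
import numpy as np
from scipy import ndimage

def tamura_directionality(img, bins=16, mag_thresh=1.0, r=1.0):
    """Edge magnitude/direction per Eqs. (13)-(14) on a difference of
    Gaussians, then the sharpness of the direction histogram, Eq. (15).
    Strongly oriented texture gives values near 1."""
    img = img.astype(float)
    dog = ndimage.gaussian_filter(img, 1.0) - ndimage.gaussian_filter(img, 2.0)
    dh = np.diff(dog, axis=1)[:-1, :]            # horizontal change
    dv = np.diff(dog, axis=0)[:, :-1]            # vertical change
    mag = (np.abs(dh) + np.abs(dv)) / 2.0        # Eq. (13)
    theta = (np.arctan2(dv, dh) + np.pi / 2.0) % np.pi   # Eq. (14)
    sel = mag > mag_thresh                       # keep strong edges only
    hist, edges = np.histogram(theta[sel], bins=bins, range=(0, np.pi))
    hd = hist / max(hist.sum(), 1)
    centers = (edges[:-1] + edges[1:]) / 2
    phi_p = centers[hd.argmax()]                 # single-peak simplification
    return 1.0 - r * bins * np.sum((centers - phi_p) ** 2 * hd)  # Eq. (15)
```

An image of vertical stripes concentrates all edge directions in one histogram bin and scores close to 1, while noise spreads the histogram and scores much lower.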

Color features
The change in color is significant in pattern analysis. Three color models, namely RGB, CIE L*a*b* and HSV, are used to capture the color information present within an image, as they represent color differences in complementary ways.
There are no pre-requisite criteria for choosing a color model. Hence, the color features are extracted from the RGB, CIE L*a*b* and HSV color spaces. From the three color models, nine color channels are obtained, and from each channel six statistical features, namely skewness, variance, standard deviation, average, entropy and energy, are extracted. This leads to 6 × 9 = 54 color features.
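The per-channel statistics can be computed as below; for brevity only the three RGB channels are used here, whereas the text extracts the same six statistics from all nine channels of the three color models (6 × 9 = 54 features):

```python
import numpy as np
from scipy.stats import skew, entropy

def color_features(rgb):
    """Skewness, variance, standard deviation, mean, histogram entropy
    and energy for each channel of an (H, W, C) image."""
    feats = []
    for c in range(rgb.shape[2]):
        ch = rgb[..., c].ravel().astype(float)
        hist, _ = np.histogram(ch, bins=32)
        p = hist / hist.sum()
        feats += [skew(ch), ch.var(), ch.std(), ch.mean(),
                  entropy(p[p > 0], base=2),   # Shannon entropy of histogram
                  np.sum(p ** 2)]              # energy of histogram
    return np.array(feats)
```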

Classification
The features extracted are used by the classifier to predict the glaucomatous versus normal class. Different classifiers make different errors; to overcome this, an ensemble of classifiers can be created to give a more accurate decision. The proposed method overcomes these errors by creating ensembles of classifiers using dynamic selection techniques [43]. Dynamic selection methods either select an ensemble of competent classifiers, termed dynamic ensemble selection (DES), or a single classifier, termed dynamic classifier selection (DCS) [42,44].
In the proposed method, instead of selecting one single classifier for the whole dataset, dynamic selection is preferred because it dynamically selects the most suitable classifier from a pool of classifiers for every test sample. This makes the classification more flexible and efficient, since each test sample has a different pattern. Hence, dynamic selection is more suitable for handling imbalanced data and finding patterns in biomedical images. Dynamic selection-based classification also reduces the risk of overfitting and improves generalization.
The main consideration related to dynamic selection classifier is the hyperparameters. The hyperparameters should be chosen appropriately, as this plays a crucial role in improving classification accuracy.
For classification, two models are created: The first classification model is created by a homogeneous ensemble of random forest classifiers, and the second classification model is created by an ensemble of heterogeneous classifiers. In the first classification model, a pool of 100 classifiers is considered, which is a homogeneous ensemble of random forest classifiers. Random forest classifiers are considered since they are made more diverse by the use of random samples and help in better predictive performance [46,47]. In order to train the random forests used as the pool of classifiers, by experimenting with different values, the maximum depth of each tree is set to 5 so that it can estimate probabilities. In the second classification model, the system generates a pool of heterogeneous classifiers built by bagging and composed of different classification models, namely: perceptron, Gaussian naïve Bayes, k-NN, Gaussian SVM and decision trees. The diversity is achieved by the intrinsic properties of each classifier [42,48]. For these two classification models, four dynamic selection techniques, namely overall local accuracy (OLA), multiple classifier behavior (MCB), meta-learning for dynamic ensemble selection (META-DES) and dynamic ensemble selection performance (DES-P) [42,48,49], are applied for the selection of the classifiers.
The OLA and MCB techniques belong to DCS method. The META-DES and DES-P techniques belong to DES method [42,48,49]. For application of the dynamic selection techniques, the region of competence is determined as the set of nearest neighbors of the test sample in the training samples [42,[48][49][50]. The appropriate size of neighborhood is decided by experimenting on one dataset by considering several dynamic selection techniques, and the best value is reported [42,[48][49][50]. Hence, the proposed method uses five nearest neighbors of the test sample from the training set to precisely define the region of competence for the test sample.
In OLA, for each base classifier, the level of competence is computed as its accuracy of classification obtained in its region of competence and the classifier having the highest level of competence is selected in order to classify the test sample.
In MCB, the behavior knowledge space is used to filter and preselect from the region of competence. Then, the competence of the base classifier in the resulting region of competence is computed as its accuracy of classification. A single classifier is used for classification of the test sample, if its competence level is higher than all the base classifiers present in the pool. Otherwise, majority voting is used for determining the class of the test sample.
In META-DES, for each of the base classifiers five sets of meta-features are extracted, namely neighbor classification, posterior probability, overall local accuracy, output profile classification and classifier confidence [49]. The major advantage of meta-learning is that the meta-features encode multiple different criteria, which helps in estimating the level of competence of the base classifier. The meta-classifier is used to predict whether the base classifier is competent enough to classify the input test sample. A multilayer perceptron consisting of ten hidden neurons is used as the meta-classifier. The meta-classifier is trained with the meta-feature vectors in the training phase, and training is stopped if there is no improvement in the performance for five consecutive epochs. The base classifiers judged competent by the meta-classifier are retained, and their outputs are aggregated by majority voting to estimate the class of the test sample. In order to handle tie-breaking, the class with the highest posterior probability is chosen as the class of the test sample.
In DES-P, the competence of each base classifier is estimated as the difference between its accuracy in the region of competence and the performance of the random classifier, i.e., a classification model that assigns classes at random with equal probabilities. The base classifiers whose competence exceeds that of the random classifier are chosen, and their outputs are aggregated by majority voting to estimate the class of the test sample. A more detailed explanation of these selection techniques is found in [42,48-50]. The two classification models are tested on the 171 feature sets obtained from the KMC, RIM and Drishti databases.
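A minimal sketch of the DES-P rule, assuming a pool of already-trained classifiers exposed as predict functions (names and toy data are illustrative, not from the paper). For the two-class glaucoma/normal task the random classifier's accuracy is 1/2.

```python
# Minimal sketch of DES-P: keep only the base classifiers whose local
# accuracy exceeds that of the random classifier (1/L for L
# equiprobable classes), then vote by majority. Illustrative only.
from collections import Counter

def desp_predict(pool, x, neighbors, n_classes=2):
    """neighbors: (sample, label) pairs forming the region of
    competence of the test sample x."""
    random_acc = 1.0 / n_classes
    selected = []
    for clf in pool:
        local_acc = sum(clf(s) == y for s, y in neighbors) / len(neighbors)
        if local_acc > random_acc:        # competence above random guessing
            selected.append(clf)
    if not selected:                      # fall back to the whole pool
        selected = list(pool)
    votes = Counter(clf(x) for clf in selected)
    return votes.most_common(1)[0][0]

always_one = lambda x: 1
thr_half = lambda x: int(x[0] > 0.5)
thr_third = lambda x: int(x[0] > 0.3)
neighbors = [((0.1,), 0), ((0.9,), 1), ((0.8,), 1)]
print(desp_predict([always_one, thr_half, thr_third], (0.2,), neighbors))
```

Here all three toy classifiers beat random locally, and the two threshold classifiers outvote the constant one for the test sample.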

Results
Seventy percent of the data is used for training and 30 percent for testing the classification models. The results reported are the average performance values of the classifiers over three iterations, using the classification metrics sensitivity, specificity and accuracy [17-27]. These metrics are defined in Eqs. (16)-(18):

Sensitivity = TP/(TP + FN) (16)

Specificity = TN/(TN + FP) (17)

Accuracy = (TP + TN)/(TP + TN + FP + FN) (18)
where TP = true positive, TN = true negative, FP = false positive and FN = false negative. TP indicates that the classifier correctly predicts that the image shows glaucoma, and TN that it correctly predicts that the image is normal. FP indicates that the classifier predicts glaucoma although the correct label is normal, and FN that the classifier predicts normal although the correct label is glaucoma. Table 1 gives the results of classification using the homogeneous ensemble of random forest with dynamic selection methods for the KMC dataset, and Table 2 the results using the heterogeneous ensemble of classifiers. Figure 7 illustrates the receiver operating characteristic (ROC) curve for the best result, obtained by the ensemble of random forest using the META-DES and DES-P dynamic selection methods for the KMC dataset. Table 3 gives the results of classification using the homogeneous ensemble of random forest with dynamic selection methods for the RIM dataset, and Table 4 the results using the heterogeneous ensemble of classifiers. Figure 8 illustrates the ROC curve for the best result, obtained by the ensemble of random forest using the META-DES and OLA dynamic selection methods for the RIM dataset. Table 5 gives the results of classification using the homogeneous ensemble of random forest with dynamic selection methods for the Drishti dataset, and Table 6 the results using the heterogeneous ensemble of classifiers. Figure 9 illustrates the ROC curve for the best result, obtained by the ensemble of random forest using the META-DES and DES-P dynamic selection methods for the Drishti dataset.
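The three metrics of Eqs. (16)-(18) follow directly from the confusion-matrix counts, with glaucoma as the positive class (the function name and example counts below are illustrative):

```python
# Sensitivity, specificity and accuracy from confusion-matrix counts,
# with glaucoma as the positive class.
def classification_metrics(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)                 # Eq. (16)
    specificity = tn / (tn + fp)                 # Eq. (17)
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (18)
    return sensitivity, specificity, accuracy

# e.g. 45 glaucoma images correctly detected, 5 missed,
# 40 normal images correctly detected, 10 flagged as glaucoma:
print(classification_metrics(tp=45, tn=40, fp=10, fn=5))  # (0.9, 0.8, 0.85)
```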

From Tables 1, 2, 3, 4, 5 and 6, it can be observed that the two classification models developed with the four dynamic selection techniques produce good classification results. Figure 10 illustrates the best-performance frequency of the dynamic selection methods for the two classification models. As shown in Fig. 10, the classifier model found to be most effective for classifying the normal and glaucoma classes is the ensemble of random forest with META-DES. This is because META-DES uses several information sources (meta-features) while performing the dynamic selection scheme, and it therefore classifies better than all the other dynamic selection methods. A stratified k-fold cross-validation with k = 10 is performed for the best result obtained with the META-DES technique using the ensemble of random forest; the average 10-fold cross-validation accuracy for the KMC, Drishti and RIM datasets is 93%, 90% and 91%, respectively. The area under the curve (AUC) gives the degree of separability between the normal and glaucoma classes; a good classification model has an AUC near 1. The AUC for the best results obtained with META-DES using the ensemble of random forest for the KMC, Drishti and RIM datasets is 0.99, 0.99 and 0.94, respectively.
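Stratified k-fold index generation can be sketched as below. The round-robin assignment of each class's samples to folds is an assumption for illustration; the point is only that every fold preserves the normal/glaucoma class balance.

```python
# Minimal sketch of stratified k-fold test-index generation: samples
# of each class are dealt round-robin across folds so that every fold
# keeps the overall class proportions. Illustrative only.
from collections import defaultdict

def stratified_kfold_indices(labels, k=10):
    """Return k lists of test indices, each preserving class balance."""
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for idxs in by_class.values():
        for j, i in enumerate(idxs):      # round-robin within each class
            folds[j % k].append(i)
    return folds

labels = [0] * 6 + [1] * 4               # 6 normal, 4 glaucoma
folds = stratified_kfold_indices(labels, k=2)
print([sorted(labels[i] for i in f) for f in folds])  # each fold: 3 normal, 2 glaucoma
```

Averaging the test accuracy over the k folds yields the cross-validated accuracies reported above.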

Generalization ability of the two classifier models
The generalization ability of the two classifier models is assessed by concatenating the three datasets (KMC, RIM-ONE and Drishti) into a single dataset. Tables 7 and 8 present the results of classification using the ensemble of random forest and the ensemble of heterogeneous classifiers with the DCS and DES methods, respectively.

Comparative analysis
A comparative analysis of glaucoma detection against existing methods that consider the RIM and Drishti datasets is given in Tables 9 and 10, respectively. The best results reported for the proposed method on the RIM and Drishti datasets belong to the ensemble of random forest using the META-DES dynamic ensemble selection technique. Most of the methods reported in Tables 9 and 10 extract spatial information of pixels based on intensity or texture; including clinical features based on the OD can further improve detection accuracy, although under- and over-segmentation can greatly affect the clinical features. Extracting both domain (clinical) and texture-based features, as done in the proposed method, helps improve detection performance.

Conclusion
A framework for computer-based automated detection of glaucoma is developed. As a pre-processing step, the blood vessels are detected and segmented to accurately determine the region of interest. The use of statistical features for segmentation enhances the OD and OC regions, thereby reducing the background variation. The threshold for OD segmentation is obtained using a decision tree classifier, which increases the accuracy of OD segmentation even for images where the OD is surrounded by exudates. The determined threshold value is not specific to a particular dataset and results in efficient segmentation. Feature extraction includes both clinical and image texture features. For classification, two robust ensembles of classifier models using dynamic selection approaches (DCS and DES) are developed, and the classifier with the highest competence level is used to determine the class of the test sample. The proposed framework is developed on three glaucoma datasets, and its performance is illustrated by the evaluation parameters. The best results are obtained using the ensemble of random forest with the META-DES dynamic ensemble selection technique: the average specificity, sensitivity and accuracy for glaucoma detection are 100%, 100% and 100% for the KMC dataset; 100%, 93.85% and 97.86% for the RIM-ONE dataset; and 90%, 100% and 97% for the Drishti dataset, respectively.
The main reason for META-DES performing better than the other dynamic selection methods is that its meta-classifier uses meta-features to determine the capability of the classifiers forming the ensemble. A typical single-classifier strategy uses one classifier to generalize over the full test dataset. In a general ensemble of classifiers without dynamic selection, several classifiers are used, and one suitable classifier is chosen to classify the entire test set. In dynamic ensemble models, flexibility is provided by dynamically assigning a collection of classifiers to each test sample.
Based on performance in the competence region, a suitable ensemble of classifiers is selected for each test sample. This strategy accomplishes two goals: (i) it redistributes the ensemble of classifiers for each test sample, preventing the whole test set from being over-generalized by a single classifier; (ii) the ensemble assigned to each test sample is chosen based on performance in its neighborhood, which aids in selecting a suitable ensemble for the test data. As a result, dynamic ensemble selection algorithms can improve the performance of classifiers.
The comparative analysis indicates that the developed method performs better than the methods reported in the literature for the respective datasets. Hence, the method can be deployed as a second opinion during glaucoma screening and in mass glaucoma detection.
The proposed classification method can be further enhanced by using prominent feature selection methods. For comprehensive glaucoma analysis, automated glaucoma detection algorithms can also be developed using optical coherence tomography images, and additional clinical features can be explored for grading different stages of glaucoma.
Funding Open access funding provided by Manipal Academy of Higher Education, Manipal.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.