1 Introduction

Glaucoma is one of the predominant causes of visual disability globally and accounts for more than 12% of overall blindness [1]. It is a chronic eye disease that leads to irreversible vision loss if it is not detected and treated at an early stage. It is an optic neuropathy whose signs are visible within the macula and optic disc [2]. According to the World Health Organization (WHO), glaucoma cases were projected to rise to 76 million by 2020 [3], which corresponds to a global occurrence of about 3-5% among people aged 40-80 years. Glaucoma progressively damages the optic nerve by degenerating the nerve fibers, which causes visual impairment leading to blindness [4]. Fig. 1 shows the severity levels of glaucoma in fundus images.

Fig. 1

An example of various glaucoma severity levels: (a) normal disc, (b) mild, (c) moderate, (d) severe

Glaucoma is primarily classified into two types based on the increased intraocular pressure (IOP): open-angle and angle-closure glaucoma. In both types, the outflow of the fluid termed aqueous humor (AH) is obstructed, which raises the IOP within the eye and affects the optic nerve head (ONH). Most existing studies rely on three common methods to diagnose glaucoma, i.e., IOP measurement, visual field testing, and ONH assessment. Early detection and continuous screening may lessen the blindness rate by up to 50% [5], but manual screening is tedious and time-consuming. Therefore, an automated method is essential for the detection of glaucoma.

Computer-aided diagnosis (CAD) is a cost-effective technique for early-stage glaucoma detection in retinal fundus images. It is important to develop a CAD system for glaucoma diagnosis to assist ophthalmologists in the screening process. Machine learning (ML)-based glaucoma diagnosis systems have achieved accuracies of 90%-98% using handcrafted features and different classifier types, as given in Table 1 [6,7,8,9,10,11,12,13,14,15,16,17]. Usually, the cup-to-disc ratio (CDR) is evaluated by applying feature extraction techniques such as the wavelet transform [12, 13], thresholding [18, 19], or higher-order spectral transforms [15, 16] to OD images. These manually extracted features are then fed into ML classifiers such as the Support Vector Machine (SVM), Artificial Neural Network (ANN), k-nearest neighbor (kNN), and Random Forest (RF). Although ML-based methods have attained state-of-the-art results, manual feature extraction and selection are time-consuming and depend on the ophthalmologist's subjective judgment.

Table 1 State-of-the-art ML-based techniques for Glaucoma Diagnosis

Recently, deep learning (DL) has emerged as one of the most widely used approaches for tasks such as image classification [20], natural language processing [21], and medical image analysis [22]. The convolutional neural network (CNN) is a class of DL models that is commonly used for image classification [23].

Maheshwari et al. [24] developed a glaucoma diagnosis system that applies local binary pattern (LBP)-based data augmentation to retinal fundus images. They first extracted the red, green, and blue channels of the fundus images separately and then applied LBP to augment each channel. Finally, a fusion-based technique combined the decisions of the corresponding CNN models. A glaucoma diagnosis system based on optic disc and cup localization was introduced in [25]. It uses an ML-based system for the segmentation of the optic disc and cup and a CNN-based system for glaucoma diagnosis. The method, evaluated on two public retinal datasets, DRISHTI and RIM-ONE, achieved an accuracy of 0.96. Kim et al. proposed automatic diagnosis and localization of glaucoma using deep learning. Their model was evaluated on a private dataset collected from Samsung Medical Center and achieved a high diagnostic accuracy of 96%. Moreover, they developed a web-based system, Medinoid, for automatic diagnosis and localization of glaucoma [26]. Cerentini et al. [27] proposed a glaucoma identification system based on GoogLeNet. They first extracted the region of interest (ROI) and then applied a sliding window approach for glaucoma classification. The authors report 82.20% accuracy. Diaz-Pinto et al. [28] used various transfer learning-based models (VGG-16, VGG-19, Inception-V3, ResNet-50, and Xception) to detect glaucoma in retinal images. They evaluated their approach on five public datasets and achieved an average accuracy of 73.5% and an AUC of 96%. A hybrid of a convolutional neural network and a recurrent neural network for automatic glaucoma detection was proposed in [29]. This method showed significant performance in extracting both spatial and temporal features from fundus images. The hybrid CNN/RNN method was evaluated on 1810 fundus images and achieved an average accuracy of 94%.

Serte et al. [30] proposed an ensemble of graph-based saliency and a CNN for automatic glaucoma detection. Their approach employed graph-based saliency to crop the optic disc, and classification was then performed with a convolutional neural network. The method was evaluated on a public dataset comprising 1542 retinal images and achieved an accuracy of 0.88. A novel TWEEC model for glaucoma diagnosis based on deep learning models was presented in [31]. The model is designed to extract anatomical features of the optic disc and adjoining blood vessels. The wavelet approximation and spatial fundus image sub-bands are used as the input of the model, and the results are compared with state-of-the-art CNN models. A visual saliency thresholding (VST) method for extracting the optic disc along with ROI generation was proposed in [32]. The saliency parameters are used for optic disc detection and compared with other segmentation techniques such as Otsu thresholding and region growing. The method was evaluated on the DRISHTI-GS1 public dataset and attained promising results.

Although CNN-based systems perform feature extraction and classification automatically, training a CNN requires a large dataset. To overcome this limitation, transfer learning is used with models such as AlexNet, ResNet, and VGGNet, which have shown significant performance on the large ImageNet dataset with over 1000 classes [33]. In this paper, we propose a novel deep learning-based multitask model, ODGNet. The proposed model is composed of two major steps: optic disc localization and glaucoma detection. In the first phase, a saliency map is used to determine salient regions for OD localization based on a deep learning cascading localization method. In the second phase, three CNN models (AlexNet, VGG-16, and ResNet-34) are used via transfer learning to classify the extracted optic disc as normal or glaucomatous. As alternatives to these three pre-trained models, a shallow CNN and other variations of VGGNet and ResNet are also investigated. The proposed ODGNet is evaluated on five public retinal fundus datasets: ORIGA [34], HRF [35], DRIONS-DB [36], DR-HAGIS [37], and RIM-ONE [38]. In contrast to most established detection techniques, the proposed method employs a saliency map combined with a shallow CNN for accurate OD localization, and its transfer learning-based classification achieves state-of-the-art performance in terms of accuracy, sensitivity, specificity, and AUC.

The main contributions of the paper are as follows:

  • A novel ODGNet method is proposed for appropriate OD localization and glaucoma diagnosis.

  • A deep learning-based cascading localization method is introduced which employs a saliency map for OD localization.

  • Three CNN models (AlexNet, VGGNet-16, and ResNet-34) are used via transfer learning to classify the localized OD into normal or glaucomatous.

  • ODGNet is evaluated on five public retinal datasets: ORIGA, HRF, DRIONS-DB, DR-HAGIS, and RIM-ONE.

  • The proposed model obtained the highest diagnostic performance on the ORIGA dataset with 95.75%, 94.90%, and 94.75% for accuracy, specificity, and sensitivity, respectively, and outperformed the baseline methods.

2 Materials and methods

The schematic of the ODGNet architecture is shown in Fig. 2. The proposed model comprises two main components, i.e., OD localization and glaucoma classification. First, the salient region is extracted via a saliency map and passed to a shallow CNN model for faster and cost-effective OD localization. Then, the extracted OD region is fed to pre-trained deep learning models (AlexNet, VGGNet, and ResNet) to differentiate between healthy and glaucomatous images. The steps are explained in the subsequent sections.

Fig. 2

Detailed description of the proposed framework for OD localization and glaucoma classification. The OD localization is performed with a saliency map combined with a shallow CNN model, and the glaucoma classification is performed via the transfer learning-based method VGGNet with the given hyperparameter settings. The performance of the proposed model is evaluated using various performance metrics

2.1 Saliency map based OD localization

Saliency represents the distinctive features of an image, such as pixels or regions that stand out from their neighbors [39]. The detection of salient regions within the image plays an important role in object recognition and segmentation. The detected salient regions produce saliency maps with well-defined object boundaries. In this research, we detected salient regions using a frequency-tuned approach as in [40].

In the frequency-tuned approach, low-level features of color and luminance are used to compute the saliency in images, which yields high-quality saliency maps and is easy to implement. The saliency map makes the salient region more prominent than the surrounding regions, as depicted in Fig. 3. Because the saliency map emphasizes regions that appear brighter to human vision, the object with the highest intensity is taken to be the OD region. However, the saliency map can falsely detect the OD because of noise factors such as bright lesions or the bright fringe of the retina. Such noise is always possible because of pathological changes in retinal fundus images. OD localization using saliency maps alone yields 91%, 92%, and 94% accuracy on the HRF, DRIONS-DB, and ORIGA datasets, respectively. Thus, we applied a shallow CNN to the salient region to determine whether the selected region is the OD or not.
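As a rough illustration of the frequency-tuned idea, the sketch below computes saliency as the per-pixel distance between a lightly blurred image and the mean feature vector of the whole image. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, a 5-tap binomial filter stands in for the Gaussian blur, and the input is assumed to be already converted to a color space that separates luminance and color.

```python
import numpy as np


def binomial_blur(channel):
    """Blur one 2-D channel with a separable 5-tap binomial kernel."""
    kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    height, width = channel.shape
    padded = np.pad(channel, 2, mode="edge")
    # Horizontal pass: weighted sum of five column-shifted views.
    horizontal = sum(kernel[i] * padded[:, i:i + width] for i in range(5))
    # Vertical pass: weighted sum of five row-shifted views.
    return sum(kernel[i] * horizontal[i:i + height, :] for i in range(5))


def frequency_tuned_saliency(image):
    """Return an (H, W) saliency map for an (H, W, C) float image."""
    image = np.asarray(image, dtype=float)
    # Mean feature vector over all pixels (one value per channel).
    mean_vector = image.reshape(-1, image.shape[2]).mean(axis=0)
    blurred = np.stack(
        [binomial_blur(image[..., c]) for c in range(image.shape[2])], axis=-1
    )
    # Saliency is the Euclidean distance to the mean feature vector.
    return np.linalg.norm(blurred - mean_vector, axis=-1)
```

A bright, compact structure such as the OD receives high saliency values under this measure, but so does any bright lesion, which is why a second verification step is needed.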

2.2 Shallow CNN-based OD region classification

The convolutional neural network (CNN) is the most prominent deep learning model and has shown great strength in computer vision tasks such as image classification [26]. CNNs are typically used to handle high-dimensional inputs and to learn and classify hierarchical features directly from the image without human intervention. Features extracted automatically through end-to-end learning have achieved state-of-the-art accuracy in various fundus image classification tasks, e.g., diabetic retinopathy detection [41] and cataract classification [42]. The CNN architecture is composed of several layer types, including convolution, max-pooling, and fully connected layers. We preferred a shallow network over a deep network because of its simplicity, shorter computation time, and comparable accuracy for OD feature representation and region classification.

Although a CNN can learn features directly from the input image, scanning the whole fundus image takes more time and incurs extra computational cost. We therefore first extracted the salient region and then used the shallow CNN to determine whether that region of interest is the OD, which provides fast OD detection with promising accuracy. The saliency map highlights the most visible region of the fundus image, but it sometimes marks a region other than the OD because of low image quality or pathological changes. The shallow CNN classifies the salient region as OD or non-OD. A sliding window approach is used to train the shallow CNN: the window slides over the whole image to select patches with and without the OD. If the salient region is classified as non-OD, the saliency map targets the next most salient region. The saliency map combined with the shallow CNN achieved localization accuracies of 99.37%, 98.83%, 98.40%, 98.05%, and 99.05% on the ORIGA, DRIONS-DB, DR-HAGIS, HRF, and RIM-ONE datasets, respectively. Fig. 3 shows the OD localization based on the saliency map combined with the shallow CNN.
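The cascade described above (take the most salient region, verify it, and fall back to the next most salient region on rejection) can be sketched as follows. This is an illustrative sketch only: `cascade_localize`, the `classify_patch` callable standing in for the trained shallow CNN, and the `patch` and `max_tries` parameters are hypothetical names and values that do not come from the paper.

```python
import numpy as np


def cascade_localize(image, saliency, classify_patch, patch=16, max_tries=5):
    """Return (top, left, height, width) of the first verified region, or None.

    classify_patch is any callable that returns True when a cropped patch
    contains the optic disc; here it stands in for the shallow CNN.
    """
    sal = np.array(saliency, dtype=float)  # work on a copy
    height, width = sal.shape
    half = patch // 2
    for _ in range(max_tries):
        # Candidate: window centred on the current saliency maximum.
        row, col = np.unravel_index(np.argmax(sal), sal.shape)
        top = int(np.clip(row - half, 0, max(height - patch, 0)))
        left = int(np.clip(col - half, 0, max(width - patch, 0)))
        crop = image[top:top + patch, left:left + patch]
        if classify_patch(crop):
            return (top, left, crop.shape[0], crop.shape[1])
        # Rejected: suppress this window so the next maximum is tried.
        sal[top:top + patch, left:left + patch] = -np.inf
    return None
```

Because only a handful of candidate windows are classified, the CNN never has to scan the full image, which is the source of the speed-up claimed for the combined method.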

Fig. 3

Visual analysis of OD localization based on the saliency map alone and on the saliency map combined with the shallow CNN. The second row shows the saliency map-based OD localization, while the third row shows the OD localization obtained with the saliency map combined with the CNN

2.3 Glaucoma classification

Various deep learning-based studies have been proposed over the last few years for the automatic detection and classification of glaucoma [26, 43, 44]. In this study, we investigated three pre-trained CNNs for automatic glaucoma diagnosis: AlexNet [45], VGGNet [46], and ResNet [47]. The baseline VGGNet and ResNet use the Stochastic Gradient Descent (SGD) optimizer, which exhibits strongly fluctuating error and slow convergence. To overcome these issues, we uniformly applied the Adaptive Moment Estimation (Adam) optimizer to all the employed networks. Furthermore, we tested various hyperparameter configurations for these transfer learning-based methods. The details of these hyperparameter configurations are given in Table 2.
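For readers unfamiliar with the optimizer, the sketch below shows the standard Adam update rule on a toy one-dimensional problem. It only illustrates how Adam rescales each step by running estimates of the gradient mean and variance; the learning rate and the quadratic objective are assumptions chosen for the demonstration, not the settings of Table 2.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)              # bias correction for step t >= 1
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


def minimize_quadratic(start=5.0, steps=1000, lr=0.05):
    """Minimize f(x) = x**2 with Adam, starting from `start`."""
    x, m, v = float(start), 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * x  # derivative of x**2
        x, m, v = adam_step(x, grad, m, v, t, lr=lr)
    return x
```

The first step has magnitude close to `lr` regardless of the gradient scale, which is one reason Adam tends to fluctuate less than plain SGD when gradients vary widely between layers.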

Table 2 Hyperparameters configurations to use Transfer Learning-based Methods

2.3.1 AlexNet

AlexNet [45] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It attained a top-5 accuracy of 84.7%, surpassing the second-best method at 73.8%. The AlexNet architecture comprises 8 learned layers (5 convolutional and 3 fully connected layers), and data augmentation was used to enlarge the training dataset. The ReLU activation function was employed to mitigate the vanishing gradient problem, and dropout layers were added to address overfitting.

2.3.2 VGGNet

VGGNet [46] was developed by the Visual Geometry Group for object recognition; it won the localization task and was runner-up in the classification task of the ILSVRC in 2014. There are different variations of VGGNet based on the number of network layers. The common variations are VGG-16 and VGG-19, and among them, VGG-16 has better top-1 accuracy. The basic VGG-16 uses a fixed 3×3 kernel size for all convolution layers, unlike AlexNet, which uses larger kernels of 5×5 and 11×11. Replacing a 5×5 or an 11×11 kernel with a stack of 3×3 kernels covering the same receptive field reduces the number of trainable weights by 28% and 62.8%, respectively.
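The 28% and 62.8% figures can be reproduced with a short calculation. The sketch below assumes they refer to the per-channel weight count when one k×k kernel is replaced by (k − 1)/2 stacked 3×3 kernels with the same receptive field; this reading matches the quoted numbers but is an interpretation, and the function name is hypothetical.

```python
def stacked_3x3_saving(kernel):
    """Fractional weight saving of stacked 3x3 kernels vs one kernel x kernel filter."""
    # n stacked 3x3 layers have a receptive field of 2n + 1.
    layers = (kernel - 1) // 2
    stacked_weights = layers * 3 * 3
    single_weights = kernel * kernel
    return 1.0 - stacked_weights / single_weights


for k in (5, 11):
    print(f"{k}x{k}: {stacked_3x3_saving(k) * 100:.1f}% fewer weights")
```

For a 5×5 kernel this gives 18 weights instead of 25, and for an 11×11 kernel 45 instead of 121.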

2.3.3 ResNet

The residual neural network (ResNet) [47] won the 2015 ILSVRC for object localization and classification. ResNet introduced skip connections (residual blocks), repeated throughout the network, to address the vanishing gradient problem. There are different variations of ResNet based on the number of layers, such as ResNet-18, ResNet-50, and ResNet-101. In this study, we employed ResNet-34 because of its lower validation error and to limit the number of trainable parameters. ResNet-34 obtained a 7.40% top-5 error rate on the ImageNet dataset.
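The effect of a skip connection can be seen in a few lines. The sketch below uses dense layers instead of convolutions to keep it short, so it is a simplified stand-in for a ResNet block rather than the actual architecture; `residual_block`, `w1`, and `w2` are hypothetical names.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between."""
    out = relu(x @ w1)
    out = out @ w2
    # Skip connection: the input is added back before the final activation.
    return relu(out + x)
```

Even when the learned branch contributes nothing (all-zero weights), the input still passes through the block, so stacking many such blocks does not block the flow of signal or gradient.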

3 Dataset and experiments

In this paper, we used five public datasets to evaluate the performance of the ODGNet method. A total of 958 public retinal fundus images (665 healthy and 293 glaucomatous) were collected from these sources. The details of these public datasets are given in Table 3.

Table 3 Retinal Images Dataset distribution into Normal and Glaucomatous Images

3.1 Datasets

  1. ORIGA (Online Retinal Images for Glaucoma Analysis): It is one of the largest retinal fundus image datasets for glaucoma detection [34]. This dataset has been used as a standard dataset in various recent studies. The dataset was obtained from the Singapore Eye Research Center and comprises 650 images (168 from glaucoma-affected people and 482 from healthy people). Besides the ground truth fundus images, it also provides manually segmented disc and cup regions for all the images. The cup-to-disc ratio (CDR) and the normal or glaucomatous label of each image are also given with this dataset.

  2. HRF (High-Resolution Fundus): This retinal image dataset was acquired from the Ophthalmology Department, Friedrich-Alexander University, Germany [35]. The dataset provides 15 images each for normal, glaucomatous, and diabetic retinopathy (DR) cases, along with gold-standard vessel segmentation traced by experts.

  3. DRIONS-DB (Digital Retinal Images for Optic Nerve Segmentation Database): It is a public retinal image dataset for ONH segmentation [36]. The dataset was collected from Ophthalmology Service, Spain, and comprises 110 retinal images (50 normal and 60 glaucomatous images). This dataset also provides independent ONH contours verified by two retinal experts using software tools.

  4. DR HAGIS (Diabetic Retinopathy Hypertension Age-related Macular Degeneration and Glaucoma Images): This dataset comprises 40 images taken for a DR screening program by Health Intelligence, Sandbach, UK [37]. The first subgroup of the dataset, which consists of 10 retinal images, is for glaucoma analysis. The dataset also provides masks and ground truth images annotated by expert graders.

  5. RIM-ONE (Retinal Images for Optic Nerve Evaluation): It is a retinal image database specially designed for glaucoma analysis [38]. The dataset was developed jointly by three hospitals in Spain. RIM-ONE contains 169 non-mydriatic fundus images and their ONH images (118 normal, 40 glaucomatous, and 11 ocular hypertension images). The dataset also provides manual ONH segmentation by 5 experts as a gold standard.
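As a consistency check, the per-dataset counts listed above can be tallied against the stated totals of 665 healthy and 293 glaucomatous images. The sketch rests on several assumptions that the text does not state explicitly: that ORIGA contributes 482 healthy images (650 minus 168 glaucomatous), that only the normal and glaucomatous HRF images are used, that DR HAGIS contributes only its 10-image glaucoma subgroup, and that the 11 ocular hypertension images of RIM-ONE are excluded.

```python
# Per-dataset image counts; the inclusion choices are assumptions (see lead-in).
DATASETS = {
    "ORIGA": {"healthy": 482, "glaucoma": 168},
    "HRF": {"healthy": 15, "glaucoma": 15},
    "DRIONS-DB": {"healthy": 50, "glaucoma": 60},
    "DR HAGIS": {"healthy": 0, "glaucoma": 10},
    "RIM-ONE": {"healthy": 118, "glaucoma": 40},
}


def totals(datasets):
    """Return (healthy, glaucoma, all) image counts summed over datasets."""
    healthy = sum(d["healthy"] for d in datasets.values())
    glaucoma = sum(d["glaucoma"] for d in datasets.values())
    return healthy, glaucoma, healthy + glaucoma


print(totals(DATASETS))
```

Under these assumptions the tally reproduces the 665, 293, and 958 figures quoted at the start of this section.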

3.2 Implementation

We ran the experiments for the proposed ODGNet model on an Intel E5-2620 CPU and an NVIDIA Tesla M2090 GPU, using Python 3 and the Keras library running on top of TensorFlow.

3.3 Performance evaluation

Evaluation metrics such as accuracy (ACC), sensitivity (Sn), specificity (Sp), precision, F1-score, and area under the curve (AUC) are used to measure the performance of the proposed method on the retinal image datasets. Sensitivity and specificity represent the proportion of correctly identified glaucomatous and normal images, respectively. Accuracy is the ratio between correctly identified images (either normal or glaucomatous) and the total number of images. The F1-score is the harmonic mean of precision and recall. The receiver operating characteristic (ROC) curve is obtained by plotting the true positive rate (TPR) against the false positive rate (FPR), which equals one minus the true negative rate (TNR). The area under the curve (AUC) summarizes the prediction ability of the proposed model [13]. The mathematical expressions of these metrics are given as follows:

$$\begin{aligned}&\text{Recall}=\text{Sensitivity (Sn)}= \frac{TP}{TP+FN} \end{aligned}$$
(1)
$$\begin{aligned}&\text{Specificity (Sp)}= \frac{TN}{TN+FP} \end{aligned}$$
(2)
$$\begin{aligned}&\text{Precision}= \frac{TP}{TP+FP} \end{aligned}$$
(3)
$$\begin{aligned}&\text{F1-Score}= 2\cdot \frac{\text{Precision}\cdot \text{Sn}}{\text{Precision}+\text{Sn}} \end{aligned}$$
(4)
$$\begin{aligned}&\text{Accuracy (ACC)}= \frac{TP+TN}{TP+FP+TN+FN} \end{aligned}$$
(5)

Here TP, FP, FN, and TN denote true positives, false positives, false negatives, and true negatives, respectively. TP and TN are the glaucomatous and normal images correctly predicted as glaucomatous and normal, respectively. Likewise, FP and FN are the normal and glaucomatous images incorrectly classified as glaucomatous and normal, respectively.
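Equations (1) to (5) translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names; the AUC helper uses the pairwise-ranking definition (the probability that a glaucomatous image scores higher than a normal one), which is one standard way to compute the area under the ROC curve and is not necessarily how the authors computed it. The confusion matrix counts in the usage line are made-up example values, not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute Eqs. (1)-(5) from confusion matrix counts."""
    sensitivity = tp / (tp + fn)                       # Eq. (1), also recall
    specificity = tn / (tn + fp)                       # Eq. (2)
    precision = tp / (tp + fp)                         # Eq. (3)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (4)
    accuracy = (tp + tn) / (tp + fp + tn + fn)         # Eq. (5)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts for illustration only.
print(classification_metrics(tp=40, fp=5, fn=10, tn=45))
```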

4 Results and discussions

Our proposed method uses two stages to improve model generalizability. In the first stage, the OD region is localized using a saliency map combined with a shallow CNN model. In the second stage, three pre-trained models (AlexNet, ResNet, and VGGNet) are used via transfer learning to classify the retinal fundus image as normal or glaucomatous. As alternatives to these three pre-trained models, a shallow CNN and other variations of VGGNet and ResNet are also investigated. Although a CNN performs automatic feature extraction and classification directly from the input image, applying it to the whole image is comparatively slow and has a high computational cost. The visual saliency map combined with the CNN provides faster and more accurate OD localization.

The experimental results of these methods on the different datasets are given in Table 4. The proposed saliency map-based localization combined with the shallow CNN is more promising than the baseline methods on the five public datasets. The saliency map-based OD localization combined with various methods is shown in Fig. 4.

Table 4 The Experimental Analysis of Localization Accuracy Evaluated on Five Public Datasets
Fig. 4

Saliency map-based localization combined with various methods, including SVM, AlexNet, ResNet, VGGNet, and shallow CNN. The region of interest (ROI) extracted by the ensemble of the saliency map and shallow CNN is used for glaucoma diagnosis

After extracting ROI, the glaucoma classification is performed by employing transfer learning methods, i.e., AlexNet, ResNet, and VGGNet. The statistical measures such as accuracy (Acc), sensitivity (Sen), specificity (Spe), precision (Prc), F1-score, and AUC are used to measure the proposed method’s performance evaluated on five public retinal datasets, as mentioned in Table 5.

Table 5 Performance Evaluation of Transfer Learning Methods on Extracted ROI

The proposed method’s effectiveness in terms of accuracy, sensitivity, and specificity for these public retinal datasets is illustrated in Fig. 5. Our method shows the most promising results on the ORIGA dataset, with the highest values of 95.75%, 94.90%, and 94.75% for accuracy, specificity, and sensitivity, respectively.

Fig. 5

The effectiveness of the proposed glaucoma diagnosis system for ORIGA, DRIONS-DB, DR-HAGIS, HRF, and RIM-ONE

As alternatives to the proposed ensemble of the saliency map with shallow CNN and the VGGNet-16 model, we investigated various transfer learning (TL) methods and their variations. The comparative analysis of these TL-based methods and their variations is shown in Fig. 6. The proposed model achieved the highest accuracy of 95.75% on the ORIGA dataset. Fig. 7 shows the system’s confusion matrices using the shallow CNN and VGGNet-16 on the five public datasets.

Fig. 6

Accuracy comparison of various transfer learning methods and their variations evaluated on the ORIGA dataset. The proposed ensemble of the saliency map, shallow-CNN, and VGGNet-16 yields better accuracy

Fig. 7

Confusion matrix of the proposed system evaluated on ORIGA, DRIONS-DB, DR-HAGIS, HRF, and RIM-ONE, respectively

To measure the effectiveness of the proposed model, we compared our results with some state-of-the-art methods. A comparative study of baseline methods for glaucoma diagnosis evaluated on the ORIGA dataset is given in Table 6. The graphical representation of this comparison is shown in Fig. 8.

Table 6 Obtained results for ORIGA dataset using various methods and variations represented in accuracy
Fig. 8

Accuracy comparison of the system

5 Conclusion

In this paper, a novel ODGNet method has been proposed for OD localization and glaucoma classification. For optic disc localization, a saliency map roughly determines the salient region, and then a shallow CNN is used to differentiate between OD and non-OD regions. The localized OD regions are fed to transfer learning models such as AlexNet, ResNet, and VGGNet for glaucoma diagnosis. The experimental results indicate that the performance of the proposed ODGNet improves when the saliency map is combined with the shallow CNN. The proposed approach yields 95.75% accuracy and can assist ophthalmologists by reducing the burden of mass screening. In the future, we intend to integrate both handcrafted and automatic features for glaucoma classification and to extend the proposed study to other retinal diseases.