Abstract
Deep neural networks have achieved promising results for automatic glaucoma detection on fundus images. Nevertheless, the intrinsic discrepancy across glaucoma datasets is challenging for data-driven neural network approaches. This discrepancy leads to a domain gap that degrades model performance and generalization capability. Existing domain adaptation-based transfer learning methods mostly fine-tune pretrained models on target domains to reduce the domain gap. However, this feature learning-based adaptation is implicit, and it is not an optimal solution for transfer learning on diverse glaucoma datasets. In this paper, we propose a mixup domain adaptation (mixDA) method that bridges domain adaptation with domain mixup to improve model performance across divergent glaucoma datasets. Specifically, the domain adaptation reduces the domain gap of glaucoma datasets in transfer learning in an explicit manner. Meanwhile, the domain mixup further minimizes the risk of outliers after domain adaptation and improves the model generalization capability. Extensive experiments show the superiority of our mixDA on several public glaucoma datasets. Moreover, our method outperforms state-of-the-art methods by a large margin on four glaucoma datasets: REFUGE, LAG, ORIGA, and RIM-ONE.
1 Introduction
Glaucoma is a group of eye conditions that damage the optic nerve and impair vision, and it is mainly diagnosed by ophthalmologists examining fundus images. Since glaucoma is an optic nerve-related disease, most existing works diagnose it by automatically computing the optic cup-to-disk ratio with deep neural networks [1]. Researchers have also found that the ganglion cells and nerve fibers [2, 3] are strongly related to glaucoma in its early stage, which provides new diagnostic indicators for glaucoma. Figure 1 illustrates a glaucoma fundus image with marked features and a comparison of healthy vision and glaucoma vision.
Recently, many glaucoma datasets have been released to the research community [4, 5] to accelerate the development of data-driven deep neural methods for glaucoma detection. However, most of the available glaucoma datasets are small, because collecting and annotating medical images is more challenging than collecting and annotating natural images. MT-UDA improves model performance by introducing binocular correlation in diabetic severity grading [6]. In addition, some medical images cannot be released at all due to privacy protection. As a result, some glaucoma datasets have only tens of glaucoma samples, e.g., DRHAGIS [7] and HRF [8] contain 10 and 15 glaucoma samples in total, respectively. The recently proposed REFUGE [4] dataset also has only 40 glaucoma samples in its training set. Thus, it is not easy to apply traditional machine learning algorithms in these low-resource data scenarios. Some works employ transfer learning to alleviate the difficulty of training deep models with limited data. However, this transfer learning paradigm is still hard-coded fine-tuning that does not consider the domain discrepancies across glaucoma datasets.
Another challenge is the domain divergence across datasets, since different optometrists collect those datasets with diverse devices, such as REFUGE [4] with the Zeiss Visucam-500, RIM-ONE [9] with the Kowa WX-3D, and ACRIMA [10] with the Topcon TRC. Those differences directly result in intrinsic discrepancies among the fundus images of different glaucoma datasets, including image quality, lightness variations, resolution discrepancy, and viewpoint changes. All those discrepancies enlarge the domain gap between the aforementioned glaucoma datasets. Existing works point out that a stable network input with a similar data distribution helps the convergence of deep models and benefits their prediction performance [11, 12]. In contrast, the domain divergence of glaucoma datasets increases the instability of deep model inputs and degrades model performance; this problem generally exists across glaucoma datasets and is mostly neglected in glaucoma detection research.
Transfer learning is widely used in low-resource learning scenarios and also reduces the domain gap between the source and target domains [13]. Specifically, some domain adaptation works reduce the domain discrepancies by learning a shared feature space to represent multiple domains (e.g., feature sharing [14], domain confusion [15, 16]). Similarly, feature disentanglement methods reduce the domain gap from another angle, by learning a domain-specific feature representation for each domain [17]. Nevertheless, all those approaches conduct domain adaptation through an implicit learning paradigm, which makes it hard to guarantee the adaptation explicitly. Moreover, the aforementioned learning-based adaptation methods neglect the outlier samples in the adapted domains, which tend to be classified incorrectly. Therefore, a method that effectively conducts transfer learning on outliers is urgently needed for glaucoma detection.
To address the aforementioned issues of transfer learning in the glaucoma detection task, we propose mixup domain adaptation (mixDA), which bridges the domain gaps across glaucoma datasets in an explicit manner. Figure 2 shows the overview of our mixDA, which integrates domain adaptation with domain mixup into one framework with an enhanced outlier-learning capability. Generally, in transfer learning the source domains have significant gaps to the target domain, with distinct fundus images and divergent data distributions. Figure 2 also presents the domain gap by visualizing the data distributions of glaucoma datasets. To reduce the gap between a source domain and the target domain, the domain adaptation (DA) module of our mixDA transforms the data from the source domain to the target domain in an explicit manner, which avoids the implicit adaptation of feature-learning-based approaches. Moreover, our mixDA improves the hard-coded transfer learning paradigm by integrating mixup into domain adaptation: it mixes the original data with the adapted data into a new mixup sample to enhance generalization.
Fig. 2 Overview of mixup domain adaptation (mixDA). Our mixDA consists of two key modules: domain adaptation (DA) and domain mixup (Mixup). The pipeline of distributions illustrates how the data change after each module. Here, “Source-1” is the data distribution of the ORIGA training set, “Source-2” is the data distribution of the REFUGE training set, and “Target” is the data distribution of the REFUGE validation set
To further address the low-resource data issue of deep models in transfer learning, our mixDA extends the vanilla mixup [18] from an inter-domain to a cross-domain fashion. Moreover, mixDA formulates the mixup of DA and the cross-domain mixup in one uniform domain mixup. This generalized domain mixup enhances the learning capability of our model on the outliers (vicinal samples) in the adapted distribution. Correspondingly, Fig. 2 shows that outliers still exist between the adapted source domain and the target domain; these are also called vicinal samples or adversarial samples in [19]. Our domain mixup therefore works in a cross-domain fashion, mixing the vicinal samples with other samples to improve the model generalization capability. Unlike hard-coded fine-tuning, which tunes directly on the target domain and neglects the domain gap, our domain mixup fills the small gaps that remain between the adapted domain and the target domain in a soft manner. Benefiting from this generalized learning capability, our domain mixup not only reduces the domain adaptation discrepancies but also bridges the domain gaps of glaucoma datasets by smooth gap filling. Lastly, our mixDA is a backbone-free approach, so the model performance can be further enhanced with stronger backbones.
We summarize the main contributions as follows:
1. Mixup domain adaptation (mixDA) replaces the implicit adaptation of existing transfer learning paradigms with an explicit domain adaptation to improve model performance on diverse glaucoma detection datasets.
2. mixDA unifies the inter-domain and cross-domain mixup in one uniform fashion, which reduces the domain gap across glaucoma datasets and enhances the generalization performance of the model by minimizing the vicinal risk of adapted domain outliers.
3. Extensive experiments show the superiority of our mixDA over state-of-the-art glaucoma detection baselines on several public glaucoma datasets.
2 Related works
2.1 Glaucoma detection
Ophthalmologists generally diagnose glaucoma by manually measuring the optic cup-to-disk ratio on fundus images. Some early glaucoma detection works employ deep neural networks to automatically learn the changes of the optic cup and optic disk to help ophthalmologists diagnose glaucoma [20]. Later works focus on the optic disk area (region of interest, ROI) instead of the whole original image. To further exploit this prior knowledge, an intuitive solution is the two-stage detection paradigm, which first segments the ROI from the fundus image and then employs an advanced deep learning backbone to classify glaucoma on the cropped fundus image. Concretely, [1] employs DeepLab and MobileNet as its feature encoder and segmentation module, respectively, to perform glaucoma detection. Inspired by the attentive networks for diabetic retinopathy grading [21, 22], AMNet [23] similarly utilizes a Faster-RCNN as its segmentation module to crop the optic disk area, and then multiplies the segmented mask with the fundus image as an attentive glaucoma detection approach. As a result, more and more glaucoma detection works follow the two-stage paradigm in their pipelines. Other works approach the problem from different angles: SenBr [24] proposes a multi-branch network that distinguishes the difficulty of training data in order to pay more attention to learning hard samples, and EGDCL learns the hard samples with curriculum learning to relieve the data bias issue in glaucoma datasets [25]. Recently, some ophthalmologists found that, outside of the optic disk area, retinal cells undergoing apoptosis [2] and the retinal nerve fiber layer [3] are also strongly related to glaucoma in its early stage. Those findings reveal that the ROI (optic nerve head area) is not the sole indicator for glaucoma detection, a fact that is mostly neglected in previous research.
Moreover, the data discrepancy across different glaucoma datasets and the small number of training samples further increase the challenge of the glaucoma detection task.
2.2 Transfer learning
Transfer learning is a sub-category of machine learning that transfers the knowledge learned in a source domain to a target domain. Pretrained models, which obtain knowledge by training on a source dataset, have greatly improved model performance in various downstream tasks [16, 26, 27, 28]. Domain gaps typically exist between the source and target domains and significantly impact model performance. Fortunately, domain adaptation in transfer learning can alleviate the influence of these domain gaps. One representative domain adaptation approach is universal feature-based transfer learning, which assumes that different domains can be represented by a group of universal features that bridge the gap between the source and target domains. Specifically, DAN [14] builds a deep adaptive neural network with a domain-shared encoder and a domain alignment decoder in its transfer learning pipeline. AdaBN extends the DAN feature alignment from specific decoder layers to the whole network to further reduce the domain learning gap [29, 30]. Moreover, partial alignment methods relax the feature alignment and consider the intrinsic domain discrepancy to improve their downstream tasks, e.g., feature partial alignment adaptation [31], memory-assistant discriminative learning [32], and selective feature alignment [33].
Besides, other works address the domain gaps of transfer learning from different angles. Concretely, DANN [34] proposes inverse gradient learning to reduce the discrimination between the source and target datasets. Adversarial-based transfer learning has achieved great success in domain adaptation [35, 36]; it generates adversarial data samples to confuse the discriminator and thereby helps the deep model align different domains. Some works also try to improve domain adaptation through partial alignment [37] under practical conditions [38]. Similarly, the reconstruction-based method [39] replaces data generation with data reconstruction in its transfer learning pipeline. Unlike the learning-based methods that conduct domain adaptation implicitly, FDA [40] performs domain adaptation in the frequency spectrum instead of through feature alignment, by swapping the low-frequency spectra of the source and target domains. However, all the aforementioned transfer learning methods neglect the vicinal risk (outliers) after domain adaptation.
3 Methodology
In this section, we present our proposed method, termed mixup domain adaptation (mixDA), which mainly consists of two parts: domain adaptation and domain mixup. For the domain adaptation, we employ Fourier domain adaptation or histogram-matching domain adaptation to explicitly adapt the source fundus images to the target domain. Our domain adaptation then mixes each adapted sample with a sample of the same category into a new sample to improve the hard-coded transfer learning. For the domain mixup, we mix the adapted source domains with the target domain, which increases the model generalization capability on outlier data points and minimizes the vicinal risk [18]. More importantly, we formulate the inter-domain mixup of domain adaptation and the cross-domain mixup of domain mixup in one uniform framework in our mixDA. In the following, we first present the domain gap across glaucoma datasets and then introduce the domain adaptation and domain mixup of our mixDA separately.
3.1 Domain gap
Suppose we are given a source dataset \({\mathcal {D}}^S = \{({\textbf{x}}_i^S, {\textbf{y}}_i^S)\}_{i=1}^{N_S}\), where \({\textbf{x}}^S \in {\mathbb {R}}^{H\times W \times 3}\) is a fundus image from the source dataset and \({\textbf{y}}^S \in \{0,1\}\) is the label associated with \({\textbf{x}}^S\). Similarly, we define the target domain dataset \({\mathcal {D}}^R = \{ ({\textbf{x}}_i^R, {\textbf{y}}_i^R)\}_{i=1}^{N_R}\). To view the domain gaps straightforwardly, we visualize the data distributions across glaucoma datasets with kernel density estimation of the probability density, which is computed by:
where h is the smoothing parameter of the kernel K, and \({\textbf{x}}\) is the point at which the density \({\mathcal {F}}_h\) is estimated. Here, we visualize the data distributions of the glaucoma training datasets by computing the mean of the image samples.
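The density estimate described above has, in its standard form, \({\mathcal {F}}_h({\textbf{x}}) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{{\textbf{x}} - {\textbf{x}}_i}{h}\right)\) for n samples \({\textbf{x}}_i\). The following is a minimal sketch of such a visualization, not the authors' implementation: it assumes a Gaussian kernel, assumes that each image is summarized by its mean intensity, and uses random arrays as hypothetical stand-ins for two datasets. The function names and the bandwidth value are illustrative.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, h):
    """Evaluate a Gaussian kernel density estimate on the points in `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distances (x - x_i) / h, shape (len(grid), len(samples)).
    u = (grid[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * h)


def image_means(images):
    """Summarize each H x W x 3 image by its mean intensity (an assumption)."""
    return np.array([np.mean(img) for img in images])


rng = np.random.default_rng(0)
# Hypothetical stand-ins for two datasets with different overall brightness.
dataset_a = [rng.normal(0.35, 0.05, size=(8, 8, 3)) for _ in range(50)]
dataset_b = [rng.normal(0.60, 0.05, size=(8, 8, 3)) for _ in range(50)]

grid = np.linspace(0.0, 1.0, 501)
density_a = gaussian_kde_1d(image_means(dataset_a), grid, h=0.02)
density_b = gaussian_kde_1d(image_means(dataset_b), grid, h=0.02)
```

Plotting `density_a` and `density_b` over `grid` would show two separated peaks, which is the kind of gap between datasets that the text refers to.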
From Fig. 3, we can clearly observe domain gaps among the 12 public glaucoma datasets, which depart from the primary principle of learning theory that all training and prediction samples should follow a consistent distribution. Most datasets follow a Gaussian-like distribution, except IEEE1450 (1450) and DCGAN. All these domains lie apart from each other, and the gaps hinder the generalization performance and transfer learning of deep models. Our mixup domain adaptation (mixDA) is therefore proposed to address the negative impacts of domain gaps in the glaucoma detection task.
3.2 Domain adaptation
Domain adaptation (DA) is the first step in the pipeline of our mixDA and aims to align the data distribution of the source domain with that of the target domain. In other words, all samples from different domains are explicitly adapted to the target domain distribution to provide a stable input for training deep models. Concretely, our mixDA employs two off-the-shelf methods, Fourier Domain Adaptation (FDA) [40] and Histogram-matching Domain Adaptation (HDA) [41], as its domain adaptation backbones. Besides, our mixup domain adaptation introduces the inter-domain mixup to increase model generalization capability and reduce the vicinal risk on vicinal samples outside of the target domain distribution.
A significant data distribution gap causes a model to perform well on the source dataset but poorly on distinct target datasets. Domain adaptation is a straightforward solution that keeps the inputs in a consistent distribution. Concretely, our mixDA introduces mixup Fourier domain adaptation (mFDA) and mixup histogram-matching domain adaptation (mHDA) to reduce the domain gaps. Here, we introduce mFDA first.
Fig. 4 The pipeline of mixup Fourier domain adaptation (mFDA). “FFT” and “IFFT” are the fast Fourier transform and the inverse fast Fourier transform. “\({\textbf{x}}\)”, “\(\check{{\textbf{x}}}\)”, and “\({\hat{{\textbf{x}}}}\)” denote the source image, the FDA-adapted image, and the HDA-adapted image, respectively. “\({\mathcal {M}}_{\alpha }\)” is the mixup parameter of mFDA
Figure 4 shows the pipeline of mFDA from the source domain to the target domain with mixup. The fast Fourier transform (FFT) maps the fundus image to its frequency representation \(F({\textbf{x}}(m,n))\), which is computed for each color channel as follows,
where \(i^2 = -1\). Then, we can compute the amplitude spectrum (\({\mathcal {F}}_{A}\)) and phase spectrum (\({\mathcal {F}}_{P}\)) by,
The amplitude spectrum and phase spectrum store the features of relative brightness and object boundaries, respectively. Changing the lightness affects the image information less than changing the object boundaries [42], and fundus images also differ in lightness across glaucoma datasets. To relieve this lightness issue, our mFDA conducts domain adaptation by transferring the amplitude spectrum information from the target domain to the source domain with a mask ratio of \(M =0.1\):
Then, we map the adapted amplitude spectrum and the original phase spectrum back to a fundus image with the inverse Fourier transform (\({\mathcal {F}}^{-1}\)), as follows,
The last step of mFDA is its inter-domain mixup (Eq. 7), which mixes samples from the same category,
where \({\textbf{x}}^{*} \in \{{\textbf{x}}; \check{{\textbf{x}}}; {\hat{{\textbf{x}}}} \}\), and \({\mathcal {M}}_{\alpha } \in (0,1)\).
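The mFDA steps described above (per-channel FFT, transfer of the target amplitude into the source spectrum, inverse FFT with the original phase, and a final mixup) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the mask is assumed to be a centered low-frequency square window whose side is roughly the ratio \(M = 0.1\) of the shorter image side; the mixup is assumed to be the convex combination \({\mathcal {M}}_{\alpha }{\textbf{x}}^{*}_i + (1-{\mathcal {M}}_{\alpha }){\textbf{x}}^{*}_j\); and the function names and random test images are hypothetical.

```python
import numpy as np


def fourier_domain_adapt(source, target, ratio=0.1):
    """Replace the low-frequency amplitude of `source` with that of `target`.

    Both images are float arrays with the same shape (H, W, C).
    """
    # Per-channel 2-D FFT over the two spatial axes.
    fft_src = np.fft.fft2(source, axes=(0, 1))
    fft_tgt = np.fft.fft2(target, axes=(0, 1))
    amp_src, phase_src = np.abs(fft_src), np.angle(fft_src)
    amp_tgt = np.abs(fft_tgt)

    # Shift the zero frequency to the center so the mask is a central square.
    amp_src = np.fft.fftshift(amp_src, axes=(0, 1))
    amp_tgt = np.fft.fftshift(amp_tgt, axes=(0, 1))

    h, w = source.shape[:2]
    half = int(np.floor(min(h, w) * ratio / 2))  # assumed half-width of the mask
    ch, cw = h // 2, w // 2
    rows = slice(ch - half, ch + half + 1)
    cols = slice(cw - half, cw + half + 1)
    amp_src[rows, cols] = amp_tgt[rows, cols]

    # Recombine the adapted amplitude with the original source phase.
    amp_src = np.fft.ifftshift(amp_src, axes=(0, 1))
    adapted = np.fft.ifft2(amp_src * np.exp(1j * phase_src), axes=(0, 1))
    return np.real(adapted)


def mixup_same_category(image_a, image_b, weight):
    """Convex combination of two images that share the same label."""
    return weight * image_a + (1.0 - weight) * image_b


rng = np.random.default_rng(0)
source = rng.uniform(0.0, 0.5, size=(32, 32, 3))  # hypothetical dark source image
target = rng.uniform(0.5, 1.0, size=(32, 32, 3))  # hypothetical bright target image

adapted = fourier_domain_adapt(source, target, ratio=0.1)
mixed = mixup_same_category(source, adapted, weight=0.5)
```

Because the swapped window contains the zero-frequency component, the adapted image takes on the mean brightness of the target while keeping the phase, and hence the structure, of the source.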
In contrast to mFDA, which conducts adaptation on the amplitude spectrum, mixup histogram-matching domain adaptation (mHDA) operates on the image histogram. Let \(P_r\) denote the probability density function of the source domain,
where \(N({\textbf{x}}(m,n))\) denotes the number of pixels \({\textbf{x}}(m,n)\) with value \(r_j\). We can then compute the cumulative distribution function (\({\mathcal {S}}({\textbf{x}}_j)\)) of the source domain as follows,
Similarly, we define the probability density function \(P_z(r_j)\) and the cumulative distribution function \({\mathcal {G}}({\textbf{z}}_j)\) of the target domain. mHDA then bridges the domain gap between source and target by equating the source and target cumulative distribution functions:
Thereby, the domain adaptation transformation of our mHDA is computed by,
In this way, all pixels of the fundus images are mapped to the target domain distribution by Eq. 11. In the last step, mHDA follows mFDA and mixes the inter-domain samples of the same category.
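The mapping of mHDA sends each source intensity to the target intensity that has the same cumulative probability, i.e., \({\textbf{z}} = {\mathcal {G}}^{-1}({\mathcal {S}}({\textbf{x}}))\). A minimal sketch of this histogram matching is given below; it assumes per-channel matching against a single target image and approximates \({\mathcal {G}}^{-1}\) by linear interpolation over the empirical target CDF. The function names and the random test images are illustrative and are not taken from [41].

```python
import numpy as np


def match_histogram_channel(source, target):
    """Map `source` intensities so that their CDF follows that of `target`."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    tgt_values, tgt_counts = np.unique(target.ravel(), return_counts=True)

    # Empirical cumulative distribution functions S (source) and G (target).
    src_cdf = np.cumsum(src_counts) / source.size
    tgt_cdf = np.cumsum(tgt_counts) / target.size

    # Approximate G^{-1}(S(r)) by interpolating over the target CDF.
    mapped = np.interp(src_cdf, tgt_cdf, tgt_values)
    return mapped[src_idx].reshape(source.shape)


def match_histogram(source, target):
    """Apply histogram matching independently to each color channel."""
    channels = [
        match_histogram_channel(source[..., c], target[..., c])
        for c in range(source.shape[-1])
    ]
    return np.stack(channels, axis=-1)


rng = np.random.default_rng(0)
source = rng.uniform(0.0, 0.5, size=(16, 16, 3))  # hypothetical source image
target = rng.uniform(0.5, 1.0, size=(16, 16, 3))  # hypothetical target image
matched = match_histogram(source, target)
```

The mapping is monotone, so the rank order of the source pixels is preserved while their intensity distribution follows the target.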
3.3 Domain mixup
After the domain adaptation of mFDA and mHDA, the fundus images of the source domain are generally adapted to the target domain. However, most glaucoma datasets have limited training data, and some discrepant samples remain outside the target domain distribution; these are called vicinal samples or adversarial examples. To address the low-resource issue and minimize the vicinal risk of these discrepancies, our mixDA introduces the domain mixup, which further improves the learning capability on vicinal samples by mixing different domains.
From Fig. 5, we can observe that the vanilla mixup method conducts mixup on the target domain images only, which we call the inter-domain mixup. Unlike the vanilla mixup, mFDA/mHDA and mixDA perform cross-domain mixup, which mixes data samples from different domains (e.g., domain 1 mixed with the target domain). Note that mFDA/mHDA only mix samples of the same category (the same colored y). In contrast, our mixDA is a generalized version that performs cross-domain mixing not only across different domains but also across different categories. Moreover, our domain mixup formulates mFDA and mHDA together with the cross-domain mixup in one uniform computation, as follows,
where \({\textbf{x}}^{*} \in \{ \check{{\textbf{x}}}; {\hat{{\textbf{x}}}}; {\textbf{x}}\}\) and \({\textbf{y}}^{*}\) is the label corresponding to \({\textbf{x}}^{*}\). The equations show that the domain mixup of our mixDA mixes samples, together with their labels, across different categories rather than only within the same category (i.e., \({\textbf{x}}^{*}_i\) and \({\textbf{x}}^{*}_j\) can each be either a glaucoma sample or a non-glaucoma sample). More importantly, the mixed data also serve as new augmented data that relieve the low-resource data issue of glaucoma datasets.
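A minimal sketch of such a cross-domain, cross-category mixup is given below. It assumes the usual mixup form, a convex combination of two samples and of their one-hot labels, and it assumes that the mixing weight is drawn from a Beta(α, α) distribution as in the vanilla mixup [18]. The function name, the value of α, and the toy inputs are hypothetical.

```python
import numpy as np


def domain_mixup(x_i, y_i, x_j, y_j, alpha=0.2, rng=None):
    """Mix two samples and their one-hot labels with a shared weight."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1] (assumed Beta prior)
    x_mix = lam * x_i + (1.0 - lam) * x_j
    y_mix = lam * y_i + (1.0 - lam) * y_j
    return x_mix, y_mix, lam


rng = np.random.default_rng(0)
adapted_source = np.zeros((4, 4, 3))  # hypothetical adapted source image
target = np.ones((4, 4, 3))           # hypothetical target image
glaucoma = np.array([0.0, 1.0])       # one-hot label: glaucoma
healthy = np.array([1.0, 0.0])        # one-hot label: non-glaucoma

x_mix, y_mix, lam = domain_mixup(adapted_source, glaucoma, target, healthy, rng=rng)
```

Because the labels are mixed with the same weight as the images, a glaucoma image blended with a non-glaucoma image receives a soft label, which a cross-entropy loss with soft targets can consume.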
To illustrate the differences between our domain mixup, transfer learning, and vanilla mixup, Fig. 6 compares the three. Specifically, transfer learning directly fine-tunes the source-pretrained model on the target domain in a hard-coded way, without considering the existing domain gap. The vanilla mixup tries to reduce the domain gap by mixing samples of the source and target domains, but the gap remains large if the source and target distributions are far apart. Our mixDA first bridges the domain gap by domain adaptation and then conducts the domain mixup to further reduce the vicinal discrepancies between the adapted and target domains.
4 Experiments
4.1 Datasets
We evaluate our mixDA on 12 public glaucoma datasets, which are summarized in Table 1. Some glaucoma datasets contain images at multiple resolutions, of which we list only one.
From Table 1, we can observe huge discrepancies across glaucoma datasets. In detail, the volume and partition differ: some glaucoma datasets are low-resource ones with only around a hundred samples, e.g., HRF (High Resolution Fundus), DRHAGIS (DRH), ACRIMA, and DRISHTI (DRISHTI-GS). Moreover, the image resolutions vary across datasets. Since the original fundus images have large resolutions, some datasets are preprocessed by cropping the original image to a smaller resolution that retains only the ROI, e.g., HPD (Harvard Processed Data), RIM (RIM-ONE), and ACRIMA. Last but not least are the intrinsic differences of the fundus images, such as lighting, image processing, and camera hardware, which further increase the domain gaps of glaucoma datasets. All those data discrepancies increase the challenge of glaucoma detection with data-driven deep models.
4.2 Results
This section reports the state-of-the-art performance of our mixup domain adaptation on four public glaucoma datasets (REFUGE, LAG, ORIGA-light, and RIM-ONE). Following previous glaucoma detection works [4, 10], we employ accuracy, sensitivity, specificity, and area under the curve (AUC) as our evaluation metrics. Note that the data distributions of glaucoma datasets are imbalanced (e.g., REFUGE contains 40 glaucoma cases vs. 360 healthy cases). Therefore, most research works [4, 50, 51, 52] employ AUC as their main evaluation metric on imbalanced datasets instead of the more sensitive metrics (accuracy, specificity, and sensitivity). In our experiments, we also report accuracy, specificity, and sensitivity for the readers’ reference.
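To make the remark on imbalance concrete, the sketch below computes the four metrics with NumPy and applies them to a hypothetical classifier that labels every case of a REFUGE-like split (40 glaucoma vs. 360 healthy) as healthy. The function names, the 0.5 threshold, and the constant scores are assumptions for illustration; AUC is computed here as the fraction of positive-negative pairs that are ranked correctly, with ties counted as one half.

```python
import numpy as np


def confusion_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed decision threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score, dtype=float) >= threshold
    tp = np.sum(y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc_score(y_true, y_score):
    """AUC as the probability that a positive case outranks a negative one."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (pos.size * neg.size)


# Hypothetical degenerate classifier: every case receives the score 0.0.
y_true = np.array([1] * 40 + [0] * 360)
y_score = np.zeros(400)

metrics = confusion_metrics(y_true, y_score)
auc = auc_score(y_true, y_score)
```

Such a degenerate classifier reaches an accuracy of 0.9 although its sensitivity is 0 and its AUC is 0.5, which illustrates why AUC is preferred as the main metric on imbalanced data.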
We first evaluate mixDA on REFUGE, which has a distinct domain gap between its training and validation sets. To improve glaucoma detection performance on REFUGE, some existing works introduce multi-task learning (Masker, AMNet), feature fusion (FusionBr, SenBr), and model ensembles (EnsembleTL, Masker) to improve their model learning capability. Meanwhile, the two-stage glaucoma detection methods introduce the prior knowledge that glaucoma is strongly related to the optic nerve area. Thereby, the two-stage methods (SDSIRC, CUHKMED) conduct feature learning on the ROI-cropped fundus images instead of the original ones. Furthermore, the SOTA method VRT employs an attentive neural network for discriminative learning and achieves the best performance among existing methods on REFUGE. Following the official REFUGE setting, we summarize the experimental results as follows,
Table 2 summarizes the performance of existing SOTA methods on REFUGE. With the help of domain adaptation and mixup, our mixDA achieves the best performance on REFUGE with an AUC score of 0.9901. Unlike the aforementioned methods, which improve model performance by feature fusion or ROI cropping, our mixDA focuses on the intrinsic dataset issue of domain gaps and on mixup learning. Meanwhile, our mixDA employs extra datasets (i.e., LAG and DCGAN) in domain mixup learning and conducts multi-task learning, which further improves the AUC score from the previous SOTA of 0.9885 to 0.9901. This gain comes at the cost of a decline in sensitivity, because REFUGE is an imbalanced dataset on which sensitivity fluctuates strongly. Overall, our mixDA is a better solution for datasets with domain discrepancy under the official AUC criterion.
The fundus images of the LAG dataset are cropped to a unified resolution around the optic nerve head. LAG is also the largest of the glaucoma datasets listed in Table 1. In the LAG experiments, our mixDA employs the cropped DCGAN dataset as the extra dataset in domain mixup learning. The detailed results are reported as follows,
From the results in Table 3, our mixDA also achieves the best performance on LAG among the SOTA methods, with an AUC score of 0.9953. Compared with REFUGE, where the leading SOTA methods score between \(0.97 \sim 0.98\), the leading methods on LAG reach higher AUC scores of around 0.99. Among the baselines, EGDCL introduces adaptive curriculum learning for unbiased glaucoma diagnosis, while Auxiliary-PSD and Transductive are teacher-student learning models; all of them surpass an AUC of 0.99. The intuitive reason is that LAG has many more training samples and similar data distributions in its training and test sets. Compared with DCGAN pretrained on the same dataset, our generalized mixDA improves the AUC by \(2\%\). Moreover, we also evaluate mixDA with different sources, i.e., REFUGE and ORIGA, where mixDA achieves a competitive accuracy and AUC of 0.9720 and 0.9941, respectively. This performance further verifies the strong learning capability of mixDA, even when trained on different sources.
For the ORIGA dataset, we found that previous works use different evaluation settings, so their results cannot be compared directly. To fill this gap and provide a summarized baseline, we follow previous works on ORIGA [61, 62, 51] and evaluate in three settings: two random partitions and 10-fold cross-validation. Among the baselines, DCNN and ReconstructNN are early glaucoma works that introduce deep neural networks into glaucoma classification. Holistic+Local, SVM, and SVM+SMOTE are classical machine learning solutions. M-Net, joint U-Net, and M-Net+PT are all based on U-shaped neural networks for glaucoma classification. All detailed experimental results are reported in Table 4.
From the results in Table 4, we can observe that the AUC scores are generally below 0.90, which indicates that ORIGA is a more challenging dataset than REFUGE and LAG. With the help of domain mixup, our mixDA achieves consistently superior performance in all experimental settings. Specifically, in the first setting with only 99 training samples, our mixDA, with the help of transfer learning on extra sources, improves by \(5\%\) over the previous works DCNN and ReconstructNN. The second setting uses more samples for training, which helps M-Net achieve a higher AUC score of 0.8508, but this is still far behind our 0.8857. The last setting is 10-fold cross-validation, in which our mixDA is consistently superior to SVM+SMOTE.
Following the setting of EGDCL, we evaluate mixDA on RIM-ONE-R1. Unlike LAG, which is cropped around the optic nerve head area, RIM-ONE only crops the area near the optic disk, with a relatively smaller resolution.
Table 5 summarizes the different methods on RIM-ONE. With backbones of low generalization capability, DENet and GON are limited to 0.574 and 0.681, respectively. MCL-NET and DCNN raise the performance above 0.8 with the help of advanced models but remain inferior to the generative model AG-CNN at 0.916. EGDCL introduces adaptive curriculum learning and pushes the AUC score to 0.976. Unlike EGDCL, our mixDA considers not only the domain adaptation but also the vicinal samples (also called adversarial samples in generative models [18]), which further improves the AUC by \(2\%\) to 0.9933. Furthermore, the accuracy, sensitivity, and specificity all surpass those of the other methods on RIM-ONE.
5 Discussions
In this section, we explore mixDA from different aspects: ablation study, transferability, generalization performance, mixDA variants, the impact of backbones, etc. We also provide more experimental details in our supplementary information.
5.1 Ablation study
We first conduct the ablation study of mixDA on REFUGE and ORIGA datasets. Concretely, there are four settings: the baseline (ResNeST50), domain adaptation (DA), domain mixup (Mixup), and mixDA with both DA and Mixup. The detailed results are reported in Table 6.
For the REFUGE dataset, we can clearly observe that the baseline performs poorly in AUC and sensitivity, because REFUGE is an unbalanced dataset with only 40 glaucoma samples and 360 healthy samples in its test set. The baseline model therefore easily overfits to the healthy category in the classification task and yields a low sensitivity score. As a domain gap exists between the training set and the validation/test sets, the DA setting conducts Fourier domain adaptation from the training set to the validation set, which improves the sensitivity slightly. However, the improvement is not significant because the intrinsic domain gap remains. After introducing domain mixup with LAG and ORIGA, the sensitivity, specificity, and AUC all improve. Finally, our mixDA with both DA and Mixup further improves the AUC score to 0.9901.
Unlike REFUGE, the ORIGA training and test sets have similar data distributions. Accordingly, introducing the extra dataset in the DA setting yields a \(10\%\) improvement over the baseline setting. Meanwhile, the Mixup setting with the extra dataset improves by \(5\%\) over the baseline. The last setting behaves similarly to REFUGE: mixDA with both DA and Mixup achieves the best performance, with a \(12\%\) improvement over the baseline model.
5.2 Transferability of glaucoma datasets
We define transferability as the AUC performance of a pretrained model on unseen glaucoma datasets. To provide an intuitive transferability overview across different glaucoma datasets, we fix all experimental hyperparameters, disable data augmentation, and adjust only the batch size according to the dataset volume. From the results in Table 7, we summarize our findings as follows:
The first finding is that datasets with similar distributions benefit from each other. Specifically, REFUGE has a data distribution similar to those of DRH and DRISHT, so its transferability scores on DRH and DRISHT are higher than on other datasets (i.e., G1020, RIM). Correspondingly, the models pretrained on DRH and DRISHT achieve high transferability scores on REFUGE. Moreover, the IEEE1450 dataset has a distinct distribution, which gives it poor transferability across the evaluation datasets. Besides the data distribution, we also find that using original (uncropped) fundus images is an important factor for transferability. From the transferability comparison, models pretrained on an original fundus dataset perform well on other original fundus datasets (e.g., ORIGA to REFUGE, G1020 to DRH) and even on cropped ones (e.g., DRISHT to LAG, ORIGA to LAG).
Last but not least, the transferability of pretrained models is strongly related to the source data size. A small data size limits model learning and enlarges the model bias, which hurts transferability. Representative low-resource datasets are HRF (22) and DRH (20), which perform well on their own data but poorly on unseen datasets. Overall, we find that these three aspects (data distribution, original versus cropped fundus images, and source data size) have an important impact on domain mixup learning in mixDA.
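The transferability evaluation described above amounts to filling a source-by-target matrix of AUC scores. The sketch below shows one way to compute such a matrix with a rank-based AUC. The function names, the scalar toy "images", and the two scoring functions are hypothetical stand-ins; they are not the pretrained models or the datasets of Table 7.

```python
def auc_score(labels, scores):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def transferability_matrix(models, datasets):
    """models: name -> scoring function; datasets: name -> (images, labels)."""
    return {
        (src, tgt): auc_score(labels, [score(x) for x in images])
        for src, score in models.items()
        for tgt, (images, labels) in datasets.items()
    }


# Toy data: each "image" is a single number, each "model" a hypothetical scorer.
datasets = {
    "A": ([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]),
    "B": ([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0]),
}
models = {
    "pretrained_on_A": lambda x: x,
    "pretrained_on_B": lambda x: 1.0 - x,
}
matrix = transferability_matrix(models, datasets)
for key, value in sorted(matrix.items()):
    print(key, value)
```

Because AUC depends only on the ranking of scores, it needs no decision threshold, which makes scores comparable across datasets with different class ratios.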
5.3 Adaptation comparison
Our mixup domain adaptation offers two strategies in its domain adaptation (DA) module: mixup Fourier domain adaptation (mFDA) and mixup histogram domain adaptation (mHDA). We evaluate both strategies on the category-imbalanced REFUGE and the category-balanced LAG, and we further explore them on different source domains. The results are reported as follows.
Table 8 compares mFDA and mHDA trained on different source domains. In terms of AUC, mFDA surpasses mHDA on three source domains, whereas mHDA surpasses mFDA on two. In contrast, mHDA is more stable than mFDA on the category-imbalanced REFUGE dataset: its sensitivities are more consistent across most source domains. The best score is achieved with LAG as the source domain, but at the cost of a collapsed sensitivity. To keep the sensitivity from collapsing on the low-resource category, mixDA introduces a non-collapsed source. Our mixDA therefore employs ORIGA, instead of DCGAN, together with LAG to conduct domain mixup on REFUGE.
Compared with the unbalanced REFUGE, both adaptation strategies perform stably on the balanced LAG. From the results in Table 9, mFDA achieves slightly better AUC scores than mHDA on most source domains. The reason is that LAG is a category-balanced dataset with a larger training data volume than most public glaucoma datasets.
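For readers unfamiliar with Fourier domain adaptation, the sketch below shows the plain technique of Yang and Soatto (cited in the reference list), on which the Fourier strategy builds: the low-frequency amplitude of a source image is replaced by that of a target image while the source phase is kept. This section does not spell out how mFDA adds mixup on top of it, so the code should be read as generic FDA on a single-channel image under assumed parameters (`beta`, image size), not as the authors' implementation.

```python
import numpy as np


def fourier_domain_adaptation(source, target, beta=0.05):
    """Swap the low-frequency amplitude of `source` with that of `target`.

    source, target: 2-D float arrays of identical shape (one image channel).
    beta: fraction of the shorter side that sets the half-width of the swapped window.
    """
    fft_src = np.fft.fft2(source)
    fft_tgt = np.fft.fft2(target)
    amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
    amp_tgt = np.abs(fft_tgt)

    # Move the zero-frequency component to the centre of the spectrum.
    amp_src = np.fft.fftshift(amp_src)
    amp_tgt = np.fft.fftshift(amp_tgt)

    h, w = source.shape
    b = int(np.floor(min(h, w) * beta))
    ch, cw = h // 2, w // 2
    window = (slice(ch - b, ch + b + 1), slice(cw - b, cw + b + 1))
    amp_src[window] = amp_tgt[window]

    # Undo the shift and rebuild the image from the new amplitude and the old phase.
    amp_src = np.fft.ifftshift(amp_src)
    adapted = np.fft.ifft2(amp_src * np.exp(1j * pha_src))
    return np.real(adapted)


rng = np.random.default_rng(0)
src = rng.random((8, 8))   # toy stand-in for a source fundus channel
tgt = src + 0.3            # toy target: same content, brighter
out = fourier_domain_adaptation(src, tgt, beta=0.05)
print(src.mean(), tgt.mean(), out.mean())
```

In this toy case the window covers only the zero-frequency term, so the adapted image keeps the source structure and takes on the mean brightness of the target. A colour image would be processed one channel at a time.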
5.4 Generalization capability on glaucoma detection
In this part, we conduct three experiments to evaluate the generalization performance of mixDA. First, we evaluate the domain generalization performance on the unseen dataset LAG. Then, we evaluate the model on the diverse dataset DCGAN, which consists of six public glaucoma datasets (ACRIMA, Drishti-GS, RIM-ONE, HRF, ORIGA, and sjchoi86-HRF). Finally, we evaluate on additional public glaucoma datasets (1450, G1020, and HPD).
Our mixDA also works well in unsupervised domain generalization, where the model is trained only on the source datasets and tested on the unseen dataset. Compared with the unsupervised method SAIL, which is trained on a private dataset pri-RFG, our unsupervised mixDA (u-mixDA) achieves superior performance by a large margin. The main reason is that u-mixDA benefits from the domain mixup module, which is pretrained on the larger and more diverse glaucoma dataset DCGAN.
To verify the generality of mixDA on diverse datasets, we evaluate it on six different glaucoma datasets that are mostly low-resource with limited data size. We follow the setting of SS-DCGAN [69], which splits the combined dataset DCGAN into \(70\%\) for training and \(30\%\) for testing. From the results in Table 10, most baselines achieve an F-score around 0.81 with limited training samples. SS-DCGAN addresses this low-resource issue by introducing semi-supervised learning on a large extra fundus dataset, which improves the AUC and F1 scores to 0.9017 and 0.8429. Unlike SS-DCGAN, our mixDA directly performs domain mixup learning on the LAG glaucoma dataset to enhance the model learning capability, yielding \(3\%\) improvements over SS-DCGAN on both AUC and F1 score. Together, these two settings verify that mixDA performs well in both the unsupervised learning setting and the diverse dataset setting.
Besides, we also evaluate mixDA on more public glaucoma datasets: 1450, G1020, and HPD. By default, the dataset partition follows the comparable baselines for 1450 and G1020, and HPD is split half and half. From the results in Table 11, mixDA achieves the best performance on all three datasets. Specifically, mixDA and the baseline method both perform well on 1450. On G1020, the performance of all baselines drops, while our mixDA still outperforms the baselines in 6-fold cross-validation. On the last dataset, HPD, we choose ResNet50 and ResNeST50 as the baselines, and our mixDA consistently improves the F1 score over its backbones. In particular, mixDA improves the F1 score of the ResNet50 backbone from 0.8550 to 0.8704, which verifies that mixDA is a backbone-free method whose performance can be boosted with stronger backbones.
5.5 Backbone comparison
As mixDA is a backbone-free approach, we evaluate it with different backbones on the glaucoma datasets REFUGE, LAG, and HPD. Seven neural networks are employed as backbones of mixDA, which can be categorized as transformer-based (VIT and COAT), ResNet-based (ResNet, SEResNeXt, and ResNeST), and typical convolutional (Xception and EfficientNet) backbones.
As shown in Fig. 7, the ResNet-based backbones perform better than the others on the LAG and HPD datasets. On REFUGE, however, only ResNeST achieves the best performance. Meanwhile, VIT also performs well on different glaucoma datasets with the help of its attentive transformer layers, whereas the transformer-based COAT does not work well on the glaucoma detection tasks. After analyzing the training process, we found that VIT and COAT, with their huge numbers of parameters, over-fit more easily than ResNet-based backbones on glaucoma datasets with limited training size, which greatly limits their performance. Xception and EfficientNet can converge to lower losses, but their performance is not comparable to that of ResNeST. Therefore, our mixDA selects the ResNet-based ResNeST and SEResNeXt as the default backbones in all glaucoma detection tasks.
5.6 One-stage and two-stage variants of mixDA
Since the optic cup-disk ratio is the key indicator for diagnosing glaucoma, most works train models on the optic nerve head area or optic disk area as their region of interest (ROI) instead of on the whole original fundus image. One straightforward solution is to segment the ROI and conduct model training and prediction on the cropped fundus image (crop). Another solution is to apply attentive domain adaptation to the background area while preserving the ROI area obtained by optic disk/cup segmentation (add). We call both of these two-stage solutions because they need an extra ROI segmentation step. In contrast, the one-stage solution conducts training and prediction directly on the original fundus images, since glaucoma is also related to the ganglion cells and nerve fiber outside the ROI area. A detailed evaluation of these paradigms on the REFUGE dataset is summarized in Table 12.
From the variant comparison, both two-stage solutions achieve higher AUC scores than the one-stage solution in the vanilla setting and the “+DA” (domain adaptation) setting. We attribute this to the “crop” and “add” processes, which keep glaucoma-related information and help the deep model learn glaucoma-specific information instead of learning from the original fundus image without prior information. However, the performance of both two-stage solutions drops in the “+Mixup” setting, even below their vanilla setting. The reason is that the Mixup module introduces extra data that differ greatly from the processed (cropped or added) fundus images, and these differences degrade the two-stage solutions. Although the combination of DA and Mixup alleviates this drop, the two-stage solutions remain inferior to the one-stage solution. This comparison further verifies the importance of domain distribution alignment and the effectiveness of our mixDA in the glaucoma detection task.
5.7 Error analysis
In this part, we conduct an error analysis of mixDA. Traditionally, the main reasons for prediction errors are the vicinal risk and model over-fitting, which mislead the deep model into making wrong predictions. Here we point out two further causes of prediction errors: challenging cases and poor image quality.
Figure 8 shows some challenging failure cases in which the deep model makes wrong predictions with high confidence scores. On the one hand, these samples are borderline cases for glaucoma diagnosis according to the optic cup-disk ratio. On the other hand, multiple deep models make the same wrong predictions on such samples with high confidence. In contrast, poor image quality is an intuitive issue: it provides insufficient information and leads deep models to wrong predictions. We will devote more effort to improving mixDA on these failure cases in future work.
6 Conclusion
In this work, we proposed a novel mixup domain adaptation method (termed mixDA) for glaucoma detection. Domain mixup and domain adaptation are the two key modules of mixDA; they help the deep model learn from a consistent data distribution and generalize to the vicinal samples (outliers). We conducted extensive experiments in which mixDA achieved competitive performance on 12 public glaucoma datasets and new SOTA performance on the REFUGE, LAG, ORIGA, and RIM-ONE datasets. Moreover, we discussed mixDA from different aspects, such as the ablation study, domain transferability, generalization performance, mixDA variants, the impact of backbones, and error analysis.
Data availability
All the data that support the findings of this study are available from REFUGE [4], G1020 [5], HPD [43], HRF [44], LAG [45], 1450 [46], DRH [7], ORIGA [10], DCGAN [47], RIM [9], ACRIMA [48], DRISHTI [49] in “https://refuge.grand-challenge.org”, “https://paperswithcode.com/datasets”, “http://ieee-dataport.org/2169”, and “http://www.cvblab.webs.upv.es/project/acrima_en/”.
References
Sreng S, Maneerat N, Hamamoto K, Win KY (2020) Deep learning for optic disc segmentation and glaucoma diagnosis on retinal images. Appl Sci 10(14):4916
Normando EM, Yap TE, Maddison J, Miodragovic S, Bonetti P, Almonte M, Mohammad NG, Ameen S, Crawley L, Ahmed F et al (2020) A cnn-aided method to predict glaucoma progression using darc (detection of apoptosing retinal cells). Expert Rev Mol Diagn 20(7):737–748
Kiyota N, Shiga Y, Omodaka K, Pak K, Nakazawa T (2021) Time-course changes in optic nerve head blood flow and retinal nerve fiber layer thickness in eyes with open-angle glaucoma. Ophthalmology 128(5):663–671
Orlando JI, Fu H, Breda JB, van Keer K, Bathula DR, Diaz-Pinto A, Fang R, Heng P-A, Kim J, Lee J et al (2020) Refuge challenge: a unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Medical Image Anal 59:101570
Bajwa MN, Singh GAP, Neumeier W, Malik MI, Dengel A, Ahmed S (2020) G1020: A benchmark retinal fundus image dataset for computer-aided glaucoma detection. In: 2020 International joint conference on neural networks (IJCNN). IEEE, pp 1–7
Qian P, Zhao Z, Chen C, Zeng Z, Li X (2021) Two eyes are better than one: exploiting binocular correlation for diabetic retinopathy severity grading. In: 2021 43rd Annual international conference of the IEEE engineering in medicine & biology society (EMBC). IEEE, pp 2115–2118
Holm S, Russell G, Nourrit V, McLoughlin N (2017) Dr hagis-a fundus image database for the automatic extraction of retinal surface vessels from diabetic patients. J Med Imaging 4(1):014503
Odstrcilik J, Kolar R, Budai A, Hornegger J, Jan J, Gazarek J, Kubena T, Cernosek P, Svoboda O, Angelopoulou E (2013) Retinal vessel segmentation by improved matched filtering: evaluation on a new high-resolution fundus image database. IET Image Process 7(4):373–383
Fumero F, Alayón S, Sanchez JL, Sigut J, Gonzalez-Hernandez M (2011) Rim-one: an open retinal image database for optic nerve evaluation. In: 2011 24th International symposium on computer-based medical systems (CBMS). IEEE, pp 1–6
Zhang Z, Yin FS, Liu J, Wong WK, Tan NM, Lee BH, Cheng J, Wong TY (2010) Origa-light: an online retinal fundus image database for glaucoma analysis and research. In: 2010 Annual international conference of the IEEE engineering in medicine and biology. IEEE, pp 3065–3068
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. PMLR, pp 448–456
Kolesnikov A, Beyer L, Zhai X, Puigcerver J, Yung J, Gelly S, Houlsby N (2020) Big transfer (bit): general visual representation learning. In: Computer vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 Aug 2020, Proceedings, Part V 16. Springer, pp 491–507
Sun S, Shi H, Wu Y (2015) A survey of multi-source domain adaptation. Inf Fusion 24:84–92
Long M, Cao Y, Wang J, Jordan M (2015) Learning transferable features with deep adaptation networks. In: Bach F, Blei, D (eds) Proceedings of the 32nd international conference on machine learning. Proceedings of machine learning research, vol 37. PMLR, Lille, France, pp 97–105. https://proceedings.mlr.press/v37/long15.html
Tzeng E, Hoffman J, Darrell T, Saenko K (2015) Simultaneous deep transfer across domains and tasks. In: Proceedings of the IEEE international conference on computer vision, pp 4068–4076
Chen C, Li K, Wei W, Zhou JT, Zeng Z (2021) Hierarchical graph neural networks for few-shot learning. IEEE Trans Circuits Syst Video Technol 32:2177–2186
Bousmalis K, Trigeorgis G, Silberman N, Krishnan D, Erhan D (2016) Domain separation networks. Adv Neural Inf Process Syst 29:343–351
Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2018) mixup: beyond empirical risk minimization. In: International conference on learning representations
Chapelle O, Weston J, Bottou L, Vapnik V (2001) Vicinal risk minimization. In: Advances in neural information processing systems, pp 416–422
Chen X, Xu Y, Yan S, Wong DWK, Wong TY, Liu J (2015) Automatic feature learning for glaucoma detection based on deep learning. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 669–677
Zhao Z, Zhang K, Hao X, Tian J, Chua MCH, Chen L, Xu X (2019) Bira-net: bilinear attention net for diabetic retinopathy grading. In: 2019 IEEE international conference on image processing (ICIP). IEEE, pp 1385–1389
Zhao Z, Chopra K, Zeng Z, Li X (2020) Sea-net: squeeze-and-excitation attention net for diabetic retinopathy grading. In: 2020 IEEE international conference on image processing (ICIP). IEEE, pp 2496–2500
Yang G, Li F, Ding D, Wu J, Xu J (2021) Automatic diagnosis of glaucoma on color fundus images using adaptive mask deep network. In: Lokoč J, Skopal T, Schoeffmann K, Mezaris V, Li X, Vrochidis S, Patras I (eds) MultiMedia modeling. Springer, Cham, pp 99–110
Yu S, Zhou H-Y, Ma K, Bian C, Chu C, Liu H, Zheng Y (2020) Difficulty-aware glaucoma classification with multi-rater consensus modeling. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 741–750
Zhao R, Chen X, Chen Z, Li S (2020) Egdcl: an adaptive curriculum learning framework for unbiased glaucoma diagnosis. In: European conference on computer vision. Springer, pp 190–205
Zhao Z, Zeng Z, Xu K, Chen C, Guan C (2021) Dsal: deeply supervised active learning from strong and weak labelers for biomedical image segmentation. IEEE J Biomed Health Inform
Li T, Bo W, Hu C, Kang H, Liu H, Wang K, Fu H (2021) Applications of deep learning in fundus images: a review. Med Image Anal 69:101971
Zhao Z, Xu K, Li S, Zeng Z, Guan C (2021) Mt-uda: towards unsupervised cross-modality medical image segmentation with limited source labels. In: International conference on medical image computing and computer-assisted intervention. Springer, Cham, pp 293–303
Maria Carlucci F, Porzi L, Caputo B, Ricci E, Rota Bulo S (2017) Autodial: automatic domain alignment layers. In: Proceedings of the IEEE international conference on computer vision, pp 5067–5075
Chen C, Li K, Teo SG, Zou X, Li K, Zeng Z (2020) Citywide traffic flow prediction based on multiple gated spatio-temporal convolutional neural networks. ACM Trans Knowl Discov Data (TKDD) 14(4):1–23
Li L, Wan Z, He H (2020) Dual alignment for partial domain adaptation. IEEE Trans Cybern 51:3404–3416
Yan M, Chen C, Du J, Peng X, Zhou JT, Zeng Z (2021) Memory-assistant collaborative language understanding for artificial intelligence of things. IEEE Trans Ind Inform. https://doi.org/10.1109/TII.2021.3100397
Fu Y, Zhang M, Xu X, Cao Z, Ma C, Ji Y, Zuo K, Lu H (2021) Partial feature selection and alignment for multi-source domain adaptation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 16654–16663
Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky V (2016) Domain-adversarial training of neural networks. The journal of machine learning research 17(1):2096–2030
Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Sankaranarayanan S, Balaji Y, Castillo CD, Chellappa R (2018) Generate to adapt: aligning domains using generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8503–8512
Cao Z, Ma L, Long M, Wang J (2018) Partial adversarial domain adaptation. In: Proceedings of the European conference on computer vision (ECCV)
Tang H, Jia K (2020) Discriminative adversarial domain adaptation. Proc AAAI Conf Artif Intell 34(04):5940–5947. https://doi.org/10.1609/aaai.v34i04.6054
Hoffman J, Tzeng E, Park T, Zhu J-Y, Isola P, Saenko K, Efros A, Darrell T (2018) Cycada: cycle-consistent adversarial domain adaptation. In: International conference on machine learning. PMLR, pp 1989–1998
Yang Y, Soatto S (2020) Fda: Fourier domain adaptation for semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4085–4095
Bhabatosh C et al. (1977) Digital image processing and analysis, PHI Learning Pvt. Ltd., London, pp 1–999
Xu Q, Zhang R, Zhang Y, Wang Y, Tian Q (2021) A fourier-based framework for domain generalization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14383–14392
Wang X (2019). Raw-Processed data. Harvard Dataverse. https://doi.org/10.7910/DVN/WVESCH
Budai A, Odstrcilik J, Kolar R, Hornegger J, Jan J, Kubena T, Michelson G (2011) A public database for the evaluation of fundus image segmentation algorithms. Investig Ophthalmol Vis Sci 52(14):1345–1345
Li L, Xu M, Wang X, Jiang L, Liu H (2019) Attention based glaucoma detection: a large-scale database and cnn model. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10571–10580
Song W (2020) 1450 Fundus images with 899 glaucoma data and 551 normal data. IEEE Dataport. https://doi.org/10.21227/4bcp-2z21
Diaz-Pinto A, Colomer A, Naranjo V, Morales S, Xu Y, Frangi AF (2019) Retinal image synthesis and semi-supervised learning for glaucoma assessment. IEEE Trans Med Imaging 38(9):2211–2218. https://doi.org/10.1109/TMI.2019.2903434
Andres D, Sandra M, Valery N, Thomas K, Jose MM, Navea A (2019) Cnns for automatic glaucoma assessment using fundus images: an extensive validation. Biomed Eng Online 18:2–19
Sivaswamy J, Krishnadas SR, Datt Joshi G, Jain M, Syed Tabish AU (2014) Drishti-gs: retinal image dataset for optic nerve head(onh) segmentation. In: 2014 IEEE 11th international symposium on biomedical imaging (ISBI), pp 53–56. https://doi.org/10.1109/ISBI.2014.6867807
Wu J, Yu S, Chen W, Ma K, Fu R, Liu H, Di X, Zheng Y (2020) Leveraging undiagnosed data for glaucoma classification with teacher-student learning. In: Martel AL, Abolmaesumi P, Stoyanov D, Mateus D, Zuluaga MA, Zhou SK, Racoceanu D, Joskowicz L (eds) Medical image computing and computer assisted intervention - MICCAI 2020. Springer, Cham, pp 731–740
Zhao X, Guo F, Mai Y, Tang J, Duan X, Zou B, Jiang L (2019) Glaucoma screening pipeline based on clinical measurements and hidden features. IET Image Process 13(12):2213–2223
Bajwa MN, Malik MI, Siddiqui SA, Dengel A, Shafait F, Neumeier W, Ahmed S (2019) Two-stage framework for optic disc localization and glaucoma classification in retinal fundus images using deep learning. BMC Med Inform Dec Mak 19(1):1–16
Fu H, Li F, Xu Y, Liao J, Xiong J, Shen J, Liu J, Zhang X (2020) For iChallenge-GON study group: a retrospective comparison of deep learning to manual annotations for optic disc and optic cup segmentation in fundus photographs. Trans Vis Sci Technol 9(2):33–33. https://doi.org/10.1167/tvst.9.2.33
Gunasinghe H, McKelvie J, Koay A, Mayo M (2021) Comparison of pretrained feature extractors for glaucoma detection. In: 2021 IEEE 18th international symposium on biomedical imaging (ISBI), pp 390–394. https://doi.org/10.1109/ISBI48211.2021.9434082
Fu H, Cheng J, Xu Y, Zhang C, Wong DWK, Liu J, Cao X (2018) Disc-aware ensemble network for glaucoma screening from fundus image. IEEE Trans Med Imaging 37(11):2493–2501. https://doi.org/10.1109/TMI.2018.2837012
Chen X, Xu Y, Kee Wong DW, Wong TY, Liu J (2015) Glaucoma detection based on deep convolutional neural network. In: 2015 37th Annual international conference of the IEEE engineering in medicine and biology society (EMBC), pp 715–718. https://doi.org/10.1109/EMBC.2015.7318462
Li Z, He Y, Keel S, Meng W, Chang RT, He M (2018) Efficacy of a deep learning system for detecting glaucomatous optic neuropathy based on color fundus photographs. Ophthalmology 125(8):1199–1206
Li L, Xu M, Wang X, Jiang L, Liu H (2019) Attention based glaucoma detection: a large-scale database and cnn model. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10571–10580
Zhao R, Li S (2020) Multi-indices quantification of optic nerve head in fundus image via multitask collaborative learning. Med Image Anal 60:101593. https://doi.org/10.1016/j.media.2019.101593
Al Ghamdi M, Li M, Abdel-Mottaleb M, Shousha MA (2019) Semi-supervised transfer learning for convolutional neural networks for glaucoma detection. In: ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 3812–3816. https://doi.org/10.1109/ICASSP.2019.8682915
Chen X, Xu Y, Kee Wong DW, Wong TY, Liu J (2015) Glaucoma detection based on deep convolutional neural network. In: 2015 37th Annual international conference of the IEEE engineering in medicine and biology society (EMBC), pp 715–718. https://doi.org/10.1109/EMBC.2015.7318462
Fu H, Cheng J, Xu Y, Wong DWK, Liu J, Cao X (2018) Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. IEEE Trans Med Imaging 37(7):1597–1605
Xu Y, Lin S, Wong DWK, Liu J, Xu D (2013) Efficient reconstruction-based optic cup localization for glaucoma screening. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 445–452
Cheng J, Liu J, Xu Y, Yin F, Wong DWK, Tan N-M, Tao D, Cheng C-Y, Aung T, Wong TY (2013) Superpixel classification based optic disc and optic cup segmentation for glaucoma screening. IEEE Trans Med Imaging 32(6):1019–1032. https://doi.org/10.1109/TMI.2013.2247770
Li A, Cheng J, Wong DWK, Liu J (2016) Integrating holistic and local deep features for glaucoma classification. In: 2016 38th Annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 1328–1331
Bao Y, Wang J, Li T, Wang L, Xu J, Ye J, Qian D (2021) Self-adaptive transfer learning for multicenter glaucoma classification in fundus retina images
Alghamdi HS, Tang HL, Waheeb SA, Peto T (2016) Automatic optic disc abnormality detection in fundus images: a deep learning approach. In: Ophthalmic medical image analysis international workshop. University of Iowa, Iowa, pp 17–24. https://doi.org/10.17077/omia.1042
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Pinto AYD (2019) Machine learning for glaucoma assessment using fundus images. PhD thesis, Universitat Politècnica de València
Song WT, Lai I-C, Su Y-Z (2021) A statistical robust glaucoma detection framework combining retinex, cnn, and doe using fundus images. IEEE Access 9:103772–103783. https://doi.org/10.1109/ACCESS.2021.3098032
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. All authors contributed to the study conception and design. The first draft of the manuscript was written by Ming Yan, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Appendices
Appendix A the details of cross-validation
This part presents the detailed cross-validation results on the ORIGA and G1020 datasets. The “index” denotes the fold number in cross-validation, and “AVG” denotes the mean value of each evaluation metric. Besides “AUC,” which is our main evaluation metric because, unlike accuracy, specificity, and sensitivity, it is not sensitive to imbalanced data distributions, we also report the F1 score for reference.
Following the evaluation setting of previous work [52, 65], Table 13 shows the 10-fold cross-validation results on the ORIGA dataset. The sensitivity score is good in most folds, except for the last two, whose poor sensitivity scores of 0.4375 and 0.1875 lower our final results. In the remaining folds, our approach performs stably and well.
Following the evaluation setting of previous work [51, 56], Table 14 shows the 6-fold cross-validation results on the G1020 dataset. From the overall performance of our mixDA on G1020, we find that the G1020 dataset is more challenging than the other glaucoma datasets (e.g., ORIGA, LAG).
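As a reminder of how k-fold cross-validation partitions a dataset, the sketch below generates train/test index lists for k folds. It uses contiguous, unshuffled folds and a made-up sample count; how the folds were actually built in the cited protocols is not stated here, so both choices are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical dataset of 103 samples split into 10 folds.
folds = list(k_fold_indices(103, 10))
for index, (train_idx, test_idx) in enumerate(folds, start=1):
    print(index, len(train_idx), len(test_idx))
```

Each sample lands in exactly one test fold, so the per-fold metrics can be averaged into the “AVG” row. With small folds, a single weak fold can noticeably lower that average.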
Appendix B quantitative comparison
We conduct a quantitative evaluation on REFUGE and LAG in this part. Three evaluation trials are performed with different hyper-parameter settings. Figure 9 plots the detailed results.
Comparing the quantitative results on REFUGE and LAG, the most significant difference lies in the sensitivity scores. As REFUGE is a category-imbalanced dataset, our mixDA easily over-fits to the large healthy category, which leads to a relatively lower sensitivity for glaucoma detection than on the category-balanced LAG. On the remaining evaluation metrics, performance on LAG is more robust than on REFUGE. This is why papers in this area use AUC, rather than sensitivity or specificity, as the final evaluation metric for performance comparison. Overall, our mixDA achieves high AUC performance on both REFUGE and LAG.
Appendix C domain gap in REFUGE
Figure 10 compares the data distributions of fundus images to show the domain gap. From the fundus samples, we can clearly observe differences within REFUGE itself, between its training set and validation set. Our mixDA therefore employs domain adaptation (mFDA or mHDA) to reduce this gap. On the right of Fig. 10, most of the distribution gap is reduced after the adaptation of mixDA, although some vicinal samples (outliers) remain; these are further reduced by the domain mixup module of mixDA.
Appendix D glaucoma datasets
Figure 11 presents samples from different glaucoma datasets. We can clearly observe differences across datasets in image resolution (e.g., G1020 vs LAG), lightness (e.g., HRF vs DRISHT), contrast, angle of aspect, etc. We also provide their train and test set data distributions for reference. Note that REFUGE even has a domain gap between its own train and test sets. Moreover, 1450 has a shifted distribution owing to the presence of fluorescent images, and DCGAN is a combination of different datasets that does not follow a Gaussian-like distribution.
Appendix E fundus image illustration
In Fig. 12, we compare fundus images produced by the two domain adaptation strategies, mFDA and mHDA. The two strategies conduct explicit domain adaptation from different aspects: mHDA enhances the contrast of the fundus image, while mFDA adapts the texture of the original image. The detailed experimental results of the two adaptation strategies are reported in Table 8 and Table 9.
Figure 13 illustrates fundus images for the two-stage variants (“Crop” and “Add”) of our mixDA, which require an extra segmentation step. As glaucoma is related to the optic cup-disk ratio, the two-stage “Crop” keeps only the ROI area instead of the large-resolution fundus image. In contrast, the two-stage “Add” retains the large-resolution fundus image: it conducts domain adaptation on the background area and preserves the original fundus image in the ROI area. The detailed experimental results are reported in Table 12.
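As a rough illustration of histogram-based adaptation, the sketch below implements classical histogram matching, which remaps the intensities of a source image so that their cumulative distribution follows that of a target image. Whether mHDA uses exactly this mapping, and how it combines it with mixup, is not stated in this appendix; the code is therefore a generic example with hypothetical function names and toy pixel lists.

```python
from bisect import bisect_left


def cdf(pixels, levels=256):
    """Cumulative distribution of integer intensities in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    running, out = 0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out


def match_histogram(source, target, levels=256):
    """Remap source intensities so their CDF follows the target CDF."""
    src_cdf = cdf(source, levels)
    tgt_cdf = cdf(target, levels)
    # For each source level, pick the smallest target level with at least the same CDF value.
    lut = [min(bisect_left(tgt_cdf, src_cdf[v]), levels - 1) for v in range(levels)]
    return [lut[p] for p in source]


source = [0, 0, 1, 1]        # toy low-contrast image (flattened)
target = [10, 10, 20, 20]    # toy image with a wider intensity spread
matched = match_histogram(source, target)
print(matched)
```

In the toy case the two source levels are stretched onto the two target levels, which is the kind of contrast change the text attributes to histogram-based adaptation.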
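The difference between the two variants can be sketched with nested lists standing in for images. The function names, the bounding box, and the binary mask are hypothetical; in practice they would come from the optic disk/cup segmentation step, which is not reproduced here.

```python
def crop_roi(image, box):
    """'Crop' variant: keep only the ROI; box = (top, left, height, width)."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def add_roi(original, adapted, mask):
    """'Add' variant: adapted pixels on the background, original pixels where mask is 1."""
    return [
        [o if m else a for o, a, m in zip(row_o, row_a, row_m)]
        for row_o, row_a, row_m in zip(original, adapted, mask)
    ]


original = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
adapted = [[0] * 4 for _ in range(4)]  # stand-in for a domain-adapted image
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]  # hypothetical ROI mask

cropped = crop_roi(original, (1, 1, 2, 2))
added = add_roi(original, adapted, mask)
print(cropped)
print(added)
```

“Crop” discards everything outside the ROI, while “Add” keeps the full image size and only protects the ROI pixels from adaptation.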
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yan, M., Lin, Y., Peng, X. et al. mixDA: mixup domain adaptation for glaucoma detection on fundus images. Neural Comput & Applic (2023). https://doi.org/10.1007/s00521-023-08572-3
DOI: https://doi.org/10.1007/s00521-023-08572-3