
1 Introduction

Glaucoma is the second leading cause of blindness worldwide, as well as the foremost cause of irreversible blindness [1]. Although there is no cure, early detection and treatment can reduce the rate of blindness. Digital retinal imaging is widely used for glaucoma screening because it is fast and accurate. However, manual assessment by trained clinicians is not suitable for large-scale screening. Hence, it is essential to design a reliable early detection system for glaucoma screening.

Generally, besides elevated intraocular pressure [2], the risk factors assessed for diagnosing glaucoma from retinal images include the rim-to-disc area ratio, the optic disc diameter, and the vertical cup-to-disc ratio (CDR) [3]. Among these factors, CDR is considered the essential measurement and is widely accepted by clinicians. According to clinical experience, a larger CDR indicates a higher risk of glaucoma. To obtain an accurate CDR, precise segmentation of the optic disc (OD) and optic cup (OC) (see Fig. 1) is essential. In our work, we propose an effective optic disc localization method, and with the OD location results, an end-to-end deep neural network called U-Net+CP+FL is proposed to segment the OD and OC simultaneously. The main contributions of our work are as follows:

Fig. 1.

Retinal fundus image and cropped ROI region. The region enclosed by the blue dotted ellipse is the optic disc (OD), and the region enclosed by the green one is the optic cup (OC). The region between the OD and OC boundaries is the neuroretinal rim. The cup-to-disc ratio (CDR) is defined as the ratio of the vertical diameter of the OC (VCD) to the vertical diameter of the OD (VDD). (Color figure online)

(1) A new optic disc localization algorithm based on a sliding window is proposed in this paper. The algorithm uses intensity information and blood vessels to locate the optic disc. Experimental results show that the optic disc can be effectively localized under various conditions.

(2) A modification of the U-Net network named U-Net+CP+FL, which introduces a concatenating path in the encoder path, is proposed to segment the optic disc and cup simultaneously. The concatenating path provides feature map sharing and multi-scale inputs from all previous layers, which helps segment the OD and OC simultaneously.

(3) To segment the OD and OC simultaneously and independently, we propose a multi-label fusion loss function which combines a weighted binary cross-entropy loss with a Dice coefficient term. The proposed loss function can deal with the data imbalance problem, which is essential for the segmentation of the OC.

The structure of the paper is as follows. We review existing methods related to OD/OC localization and segmentation in Sect. 2. In Sect. 3, we describe the proposed algorithm in detail. Specifically, the framework of our method is first given. Then, the proposed OD localization method is described in Subsect. 3.2. Next, the proposed network architecture for OD and OC segmentation is presented in Subsect. 3.3, and post-processing after segmentation and CDR calculation are described in Subsect. 3.4. In Sect. 4, subjective and quantitative evaluations are performed to verify the effectiveness of the proposed method. Finally, we conclude the paper in Sect. 5.

2 Related Works

Accurate CDR calculation depends on precise segmentation of the OD and OC. In this section we mainly review related work on OD and OC segmentation. In [4], image gradients were applied to extract the optic disc contour. A superpixel method was also applied to OD and OC segmentation [5]. In [6], morphological operations were used to segment the optic cup. After removing the blood vessels, Babu et al. [7] employed fuzzy C-means clustering and the wavelet transform to extract the OC. A common limitation of these algorithms is their strong dependence on hand-crafted features, which are susceptible to varying imaging conditions.

Few researchers segment the OD and OC jointly. For example, Yin et al. [8] employed a statistical model-based method for both OD and OC segmentation. In [9], a general energy function and a structure constraint, together with graph-cut optimization, enable precise OD and OC segmentation. However, these algorithms segment the two structures sequentially rather than simultaneously.

To address the above problems, an effective and robust OD localization method is first proposed to crop the OD region as the ROI. Then the OD and OC are segmented from the ROI with the proposed U-Net+CP+FL, which improves U-Net with a concatenating path and a multi-label fusion loss function.

3 Our CDR Measurement Algorithm

3.1 Algorithm Framework

The proposed CDR measurement algorithm consists of three main phases: (i) OD localization and ROI extraction, (ii) Simultaneous segmentation of OD and OC, and (iii) CDR calculation, as shown in Fig. 2. These phases are further divided into several steps as follows:

Fig. 2.

Framework of our work

(i) OD localization and ROI extraction: OD localization consists of brightest-region extraction, blood vessel extraction, and confidence calculation of the sliding window. First, morphological processing is applied for both brightest-region and blood vessel extraction. Second, a sliding window is employed to find the OD location based on the fused retinal image. Finally, the ROI region is cropped for OD and OC segmentation.

(ii) Simultaneous segmentation of OD and OC: U-Net+CP+FL is proposed to segment the OD and OC simultaneously. Specifically, a U-shape network is adopted as the main architecture, and multi-label outputs with a fusion loss function are employed for better segmentation results. Besides, a novel concatenating path is introduced along the encoder path, which provides multi-scale inputs and feature map reuse, resulting in better segmentation results, especially for the OC.

(iii) CDR calculation: Once the OD and OC are segmented, post-processing such as erosion and dilation eliminates isolated points, which are mostly noise. Besides, ellipse fitting is employed to smooth the boundaries of the segmentation results. Finally, CDR is calculated as the ratio of the vertical OC diameter to the vertical OD diameter.

3.2 OD Localization and ROI Extraction

In the retinal image, the OD and OC occupy only small portions, which makes them hard to segment. To handle this problem, we propose a novel OD localization algorithm that combines intensity information with the blood vessels to localize the center of the OD through a sliding window. The sub-image cropped around the center of the OD is taken as the ROI region, so the segmentation of the OD and OC operates on the cropped ROI. Figure 3 depicts the flowchart of the OD localization algorithm. There are three key steps for localizing the optic disc: image enhancement and brightest-region extraction, blood vessel extraction, and confidence calculation of the sliding window. We discuss these three steps in detail below.

Fig. 3.

Our OD localization algorithm flowchart. The intermediate steps are shown as red blocks and the key steps are shown as blue blocks. (Color figure online)

Step 1: Image Enhancement and Brightest Region Extraction.

Due to the various imaging conditions, morphological processing is applied to the input retinal image (see Fig. 4(a)) to enhance it and to extract the brightest pixels of the fundus. The top-hat transformation (GT) is used to enhance bright objects of interest on a dark background (see Fig. 4(b)), and the bottom-hat transformation (GB) enhances dark objects of interest on a bright background (see Fig. 4(c)). Thus, the enhanced gray-level retinal image (F′) is defined as:

Fig. 4.

Key steps for OD localization. (a) Input retinal image. (b) Bottom-hat transformation result. (c) Top-hat transformation result. (d) Enhanced retinal image by bottom-top-hat transformation. (e) Brightest region of retinal image. (f) Extracted blood vessels. (g) Fusion image which combined enhanced retinal image with the blood vessels. (h) Our OD localization result.

$$ F^{\prime} = F + G_{T} - G_{B} $$
(1)

As can be seen in Fig. 4(d), the OD region is clearly enhanced, and the contrast of the gray-level retinal image is improved as well. Since the OD corresponds to the brightest region of the retinal image, the pixels larger than 6.5% of the maximum pixel value are taken as candidate OD pixels, as shown in Fig. 4(e).
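A minimal OpenCV sketch of this step is given below. The structuring-element size is an illustrative assumption, and the interpretation of the 6.5% threshold (keeping the brightest fraction of pixels) is our reading of the text, not a value confirmed by the paper.

```python
import cv2
import numpy as np

def enhance_and_extract_bright_region(gray, kernel_size=25, top_fraction=0.065):
    """Bottom-top-hat enhancement (Eq. 1) followed by brightest-region extraction."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    top_hat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)      # G_T: bright details
    bottom_hat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel) # G_B: dark details

    # F' = F + G_T - G_B, computed in a wider type to avoid uint8 overflow
    enhanced = np.clip(gray.astype(np.int16) + top_hat.astype(np.int16)
                       - bottom_hat.astype(np.int16), 0, 255).astype(np.uint8)

    # Keep only the brightest fraction of pixels as OD candidates (assumed reading)
    threshold = (1.0 - top_fraction) * float(enhanced.max())
    bright_mask = (enhanced >= threshold).astype(np.uint8)
    return enhanced, bright_mask
```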

Step 2: Blood Vessel Extraction.

For blood vessel extraction, Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to enhance the blood vessels in the green channel of the input retinal image. Then, the bottom-top-hat transformation is employed to extract the blood vessels. Since the intensity of the blood vessels is generally lower than that of the background, the vessels can be extracted as the difference between the bottom-hat and top-hat transformations. In addition, median filtering is performed to eliminate salt-and-pepper noise from the vessel extraction result. The resulting vessel map Fvessel is shown in Fig. 4(f). This process can be written as:

$$ F_{vessel} = G_{B} - G_{T} $$
(2)
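A rough sketch of this step using OpenCV follows; the CLAHE clip limit, structuring-element size, and median-filter size are illustrative choices, not parameters reported in the paper.

```python
import cv2

def extract_vessels(bgr_image, kernel_size=15, median_size=5):
    """Extract a blood-vessel map from the green channel of a fundus image (Eq. 2)."""
    green = bgr_image[:, :, 1]
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    green_eq = clahe.apply(green)                                    # contrast-limited equalization

    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    top_hat = cv2.morphologyEx(green_eq, cv2.MORPH_TOPHAT, kernel)   # G_T
    bottom_hat = cv2.morphologyEx(green_eq, cv2.MORPH_BLACKHAT, kernel)  # G_B

    vessels = cv2.subtract(bottom_hat, top_hat)                      # F_vessel = G_B - G_T
    vessels = cv2.medianBlur(vessels, median_size)                   # suppress salt-and-pepper noise
    return vessels
```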

Step 3: Confidence Calculation of the Sliding Window.

To locate the OD quickly and effectively, a sliding window is used to scan three feature maps: the brightest region of the gray-level retinal image, the blood vessels, and the fusion image which combines the brightest region with the blood vessels, as shown in Fig. 4(g). Let f(i), f(bv) and f(ibv) denote the score of each sliding window on the three feature maps: the intensity map i, the blood vessel map bv, and the intensity & blood vessel map ibv. In addition, min-max normalization is applied to the window scores of each feature map to scale them between 0 and 1. The final score S of each window is then the mean of f(i), f(bv) and f(ibv).

Finally, the sliding window with the maximum score is taken as the location of the OD, as shown in Fig. 4(h).
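The following sketch illustrates the sliding-window scoring; the window size, stride, the use of the per-window sum as the score, and the simple additive fusion of the two maps are assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(x):
    return (x - x.min()) / (x.max() - x.min() + 1e-8)

def locate_od(bright_mask, vessel_map, window=200, stride=20):
    """Return the (row, col) center of the highest-scoring sliding window."""
    intensity = min_max_normalize(bright_mask.astype(np.float32))
    vessels = min_max_normalize(vessel_map.astype(np.float32))
    maps = [intensity, vessels, intensity + vessels]    # third map plays the role of the fusion image

    h, w = intensity.shape
    centers, per_map_scores = [], [[] for _ in maps]
    for r in range(0, h - window + 1, stride):
        for c in range(0, w - window + 1, stride):
            centers.append((r + window // 2, c + window // 2))
            for k, m in enumerate(maps):
                per_map_scores[k].append(m[r:r + window, c:c + window].sum())

    # Normalize the window scores of each map to [0, 1], then average them (final score S)
    scores = np.mean([min_max_normalize(np.array(s)) for s in per_map_scores], axis=0)
    return centers[int(np.argmax(scores))]
```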

Once the OD is located, a square region containing the OD is extracted from the retinal image as the ROI. In our work, all ROI regions have the same size, equal to 1.5 times the maximum OD diameter, where the maximum OD diameter is computed from the OD masks of retinal images in the existing dataset before OD localization. Experiments on test images show that our method effectively places the OD inside the ROI region. An illustrative example is shown in Fig. 5.
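A small sketch of the ROI cropping, assuming roi_size has already been set to 1.5 times the maximum OD diameter measured from existing masks:

```python
def crop_roi(image, od_center, roi_size):
    """Crop a square ROI of side roi_size centered on the located OD, clipped to the image."""
    r, c = od_center
    half = roi_size // 2
    r0 = max(min(r - half, image.shape[0] - roi_size), 0)
    c0 = max(min(c - half, image.shape[1] - roi_size), 0)
    return image[r0:r0 + roi_size, c0:c0 + roi_size]
```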

Fig. 5.

OD localization result and cropped ROI region

3.3 Simultaneous Segmentation of OD and OC

Inspired by the U-Net [10] and DenseNet [11] architectures, we propose a novel architecture called U-Net+CP+FL which consists of a U-shape convolutional architecture, a concatenating path, and a fusion loss function, as shown in Fig. 6. As can be seen in the figure, the network includes three components: (i) a U-shape deep convolutional neural network architecture, (ii) a concatenating path, i.e., an additional connection design between encoder layers, and (iii) a multi-label output layer with a fusion loss function.

Fig. 6.

Our proposed network architecture

3.3.1 U-Shape Network Architecture

The U-shape network is an effective and powerful fully convolutional neural network for biomedical image segmentation, even on small datasets. The network consists of an encoder path, a decoder path, and skip connections between them.

The encoder path is responsible for feature extraction and consists of convolutional blocks, each applying batch normalization (BN), ReLU activation and convolution in sequence. Max-pooling is employed to reduce the resolution of the feature maps. The decoder path is the reverse of the encoder path and is trained to reconstruct the input image resolution. To recover the resolution of the feature maps, deconvolution is employed in each decoder layer that matches a pooling layer in the encoder path. Finally, the output of the final decoder layer is fed to a multi-label classifier.

The skip connection is a crucial design in encoder-decoder networks. The skip architecture relays the intermediate feature maps from each encoder layer to the matched decoder layer, which not only helps reconstruct the input image resolution but also alleviates the vanishing gradient problem.

3.3.2 Concatenating Path

Inspired by DenseNet, we introduce new connections between encoder layers, called the concatenating path, which provides feature map sharing and multi-scale inputs along the encoder path. Along the concatenating path, the input of the current layer is the concatenation of the previous pooling output and a correspondingly resized copy of the input. Thus, each encoder layer receives feature maps not only from the previous layer but also from the input layer and the semantic information of all preceding layers, which amounts to multi-scale inputs and feature map sharing. Experimental results show that the proposed network improves segmentation accuracy.
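The sketch below shows one possible PyTorch reading of an encoder stage with the concatenating path; the channel widths, network depth, and bilinear resizing of the input are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """BN -> ReLU -> Conv, applied twice, as in the encoder blocks."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.block(x)

class EncoderWithConcatPath(nn.Module):
    """Each stage receives the pooled features of the previous stage
    concatenated with a resized copy of the network input (concatenating path)."""
    def __init__(self, in_ch=3, widths=(32, 64, 128)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.stages.append(ConvBlock(prev + in_ch, w))  # + in_ch for the resized input
            prev = w
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        feats, cur = [], x
        for stage in self.stages:
            resized = F.interpolate(x, size=cur.shape[-2:], mode='bilinear',
                                    align_corners=False)
            cur = stage(torch.cat([cur, resized], dim=1))   # concatenating path
            feats.append(cur)                               # kept for decoder skip connections
            cur = self.pool(cur)
        return feats, cur
```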

3.3.3 Multi-label Loss Function

The OD and OC occupy only small parts of the retinal image (see Fig. 1), so overfitting is prone to occur even when training on the cropped ROI region. In U-Net+CP+FL, we combine the weighted binary cross-entropy loss with the Dice coefficient as the objective function, where the Dice coefficient term effectively relieves the data imbalance problem. For the proposed network, multi-label output means that each pixel may belong to the OD and/or the OC independently, which also helps to mitigate the data imbalance problem. The multi-label loss function is defined as:

$$ L(p,g) = - \sum_{i = 1}^{N} \sum_{c = 1}^{K} w^{c} \left( g_{i}^{c} \log (p_{i}^{c}) + \frac{2\, p_{i}^{c}\, g_{i}^{c}}{(p_{i}^{c})^{2} + (g_{i}^{c})^{2}} \right) $$
(3)

where \( p_{i}^{c} \) denotes the probability that pixel i belongs to class c, and \( g_{i}^{c} \) denotes the ground-truth label of pixel i. In our experiments, pixels can belong to the OD or the OC independently, so K is set to 2. In Eq. (3), \( w^{c} \) is a trade-off weight that balances the contributions of the OD and OC. Since both the OD and OC are important for glaucoma diagnosis, we set \( w^{c} \) to 0.5.
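A short PyTorch sketch of Eq. (3) is shown below; the sigmoid on the logits, the smoothing constant, and the division by the batch size are implementation assumptions.

```python
import torch

def fusion_loss(logits, targets, class_weights=(0.5, 0.5), eps=1e-6):
    """Multi-label fusion loss of Eq. (3).

    logits, targets: tensors of shape (batch, 2, H, W), one channel each for
    OD and OC; each pixel may belong to OD and/or OC independently."""
    probs = torch.sigmoid(logits)
    loss = 0.0
    for c, w in enumerate(class_weights):
        p, g = probs[:, c], targets[:, c]
        ce = g * torch.log(p + eps)                        # cross-entropy term
        dice = (2 * p * g) / (p ** 2 + g ** 2 + eps)       # Dice term
        loss = loss - w * (ce + dice).sum()
    return loss / logits.shape[0]
```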

3.4 CDR Calculation

To achieve accurate CDR measurement, post-processing of the segmentation result mitigates the effects of noise and uneven boundaries. Most isolated points can be eliminated by erosion and dilation operations. Since another distinct feature of the OD is its elliptical shape, we then use least-squares optimization to fit the segmented OD contour with an ellipse, where the contour pixels are extracted with a Canny edge detector. The centroid and the long/short-axis lengths of the OD obtained by ellipse fitting are used to overlay an ellipse on the input image as the final OD segmentation. The same operations are applied to the OC. Finally, CDR is calculated as the ratio of the vertical OC diameter (VCD) to the vertical OD diameter (VDD), as shown in Fig. 1.
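An illustrative post-processing sketch using OpenCV follows; the morphological kernel size and Canny thresholds are assumed values, not parameters taken from the paper.

```python
import cv2
import numpy as np

def vertical_diameter(binary_mask):
    """Clean a binary mask, fit an ellipse to its contour, and return the
    vertical extent of the fitted (smoothed) region."""
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(binary_mask.astype(np.uint8), cv2.MORPH_OPEN, kernel)  # erosion + dilation
    edges = cv2.Canny((mask * 255).astype(np.uint8), 50, 150)
    points = np.column_stack(np.nonzero(edges))[:, ::-1].astype(np.float32)        # (x, y) contour pixels
    ellipse = cv2.fitEllipse(points)                       # least-squares ellipse fit

    smoothed = np.zeros_like(mask)
    cv2.ellipse(smoothed, ellipse, 1, -1)                  # filled, smoothed region
    rows = np.where(smoothed.any(axis=1))[0]
    return rows.max() - rows.min() + 1

def compute_cdr(od_mask, oc_mask):
    """CDR = vertical cup diameter (VCD) / vertical disc diameter (VDD)."""
    return vertical_diameter(oc_mask) / vertical_diameter(od_mask)
```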

4 Experimental Results

In this section, we present our experimental results. First, we evaluate OD localization, simultaneous segmentation of the OD and OC, and CDR calculation by subjective evaluation. Next, a quantitative evaluation of the CDR measurement is carried out to verify the effectiveness of the proposed method. The proposed method is implemented in Python 3.5. The experiments are performed on a PC with a 3.40 GHz Intel CPU and an Nvidia Titan Xp GPU.

4.1 Subjective Evaluation

4.1.1 OD Locating Performance

ORIGA [12] and DRISHTI-GS1 [13] are widely used retinal datasets containing 650 and 101 fundus images, respectively. We evaluate our OD localization method on both datasets. The OD is treated as correctly localized when the predicted OD location falls inside the actual optic disc. The statistical results for OD localization are shown in Table 1.

Table 1. Performance validation of OD localization on different retinal datasets

From Table 1, one can clearly see that the proposed OD localization method achieves high accuracy (100% and 99.1%, respectively) on the two public databases, and the running speed is relatively fast: the proposed method takes only 0.3 s on average to process a retinal image of size 1755 × 2048. This speed could be further improved with a GPU-based parallel implementation.

4.1.2 Segmentation Performance

For the subjective evaluation, we compare the proposed U-Net+CP+FL with other methods, including U-Net [10] and M-Net with polar transformation (M-Net+PT) [14]. We choose these methods because U-Net, M-Net+PT and the proposed U-Net+CP+FL are all deep neural network based methods, and M-Net+PT is regarded as one of the best OD and OC segmentation methods at present. Figure 7 shows the OD and OC segmentation results of the different methods. We can clearly see that the proposed U-Net+CP+FL achieves the best segmented boundaries.

Fig. 7.

Performance comparison of OD and OC Segmentation with different methods. (a) Cropped ROI region. (b) Ground truth. (c) Segmentation result obtained by U-Net. (d) M-Net+PT segmentation result. (e) Proposed method.

Note that the OD boundaries obtained by the different methods are all similar. However, OC segmentation is a more difficult task: the OC boundary obtained by M-Net+PT is rough and irregular, which misleads the calculation of CDR, while U-Net produces a smooth but oversized OC. In our results, the OC boundary is not only smooth but also relatively accurate, so CDR measurement and glaucoma diagnosis benefit considerably from these results.

4.2 Quantitative Evaluation

Since only 50 retinal images with segmentation ground truth are available in the DRISHTI-GS1 database, there is too little data for training, let alone testing. Therefore, the quantitative evaluation of OD and OC segmentation is conducted only on the ORIGA dataset, and the CDR measurements obtained by the proposed method are compared with those of the ophthalmologist. In our experiments, 500 retinal images are randomly selected for training and 150 for testing.

4.2.1 Segmentation Performance Evaluation

We use the overlap score S and the average accuracy AVG_ACC to evaluate the segmentation performance. The two indices are defined as

$$ S = \frac{{{\text{Area}}(g \cap p)}}{{{\text{Area}}(g \cup p)}} $$
(4)
$$ AVG\_ACC = \frac{{{\text{sensitivity}} + {\text{specificity}}}}{2} $$
(5)

In Eq. (4), g and p denote the ground truth and the segmented mask, respectively, and Area(·) denotes the area of a region. Apart from S, we also adopt the index AVG_ACC, which combines sensitivity (true positive rate) and specificity (true negative rate), defined as follows:

$$ \text{sensitivity} = \frac{TP}{TP + FN} $$
(6)
$$ \text{specificity} = \frac{TN}{TN + FP} $$
(7)

In Eqs. (6) and (7), TP, TN, FP and FN are true positives, true negatives, false positives and false negatives respectively.
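A small NumPy sketch of Eqs. (4)-(7) on binary masks, included only to make the metric definitions concrete:

```python
import numpy as np

def overlap_score(g, p):
    """S = Area(g ∩ p) / Area(g ∪ p) for binary ground truth g and prediction p (Eq. 4)."""
    g, p = g.astype(bool), p.astype(bool)
    return np.logical_and(g, p).sum() / np.logical_or(g, p).sum()

def average_accuracy(g, p):
    """AVG_ACC = (sensitivity + specificity) / 2 (Eqs. 5-7)."""
    g, p = g.astype(bool), p.astype(bool)
    tp = np.logical_and(g, p).sum()
    tn = np.logical_and(~g, ~p).sum()
    fp = np.logical_and(~g, p).sum()
    fn = np.logical_and(g, ~p).sum()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```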

Several state-of-the-art OD/OC segmentation methods, namely the superpixel method [5], U-Net [10], and M-Net+PT [14], are also compared with the proposed method. In addition, two improved networks proposed in our work are included: U-Net+Fusion Loss (U-Net+FL) and U-Net+Concatenating Path (U-Net+CP). Among these methods, Superpixel, U-Net+FL, U-Net+CP and U-Net+CP+FL are used with ellipse fitting, while U-Net and M-Net+PT are not.

Table 2 shows the segmentation results of the different methods on the ORIGA dataset. The results show that, compared with U-Net, the superpixel method achieves better segmentation for both the OD and OC. M-Net+PT, which introduces side-output layers and a polar transformation, makes large strides in segmentation compared with the original U-Net. However, the polar transformation depends strongly on precise OD localization, and a localization failure causes irregular reconstructed segmentation results. In contrast, our network is trained directly on the ROI region with image augmentation (i.e., image translation and rotation) and is therefore not sensitive to the OD localization result. The proposed U-Net+CP+FL achieves the best results for most measurements, such as Sdisc, AVG_ACCdisc, Scup, and δCDR. Moreover, U-Net+FL and U-Net+CP, the two modified versions of U-Net proposed in our work, achieve comparable or even better results (e.g. AVG_ACCcup), which demonstrates that the concatenating path and the fusion loss introduced in our work both contribute to OD and OC segmentation.

Table 2. Performance comparison of different methods on ORIGA dataset

4.2.2 CDR Measurement Evaluation

We evaluate the CDR performance with the absolute CDR error, defined as δCDR = |CDRg − CDRP|, where CDRg denotes the ground truth from a trained clinician and CDRP is the CDR calculated by the proposed method. From Table 2, we can conclude that the proposed method achieves the smallest CDR error among the compared methods. The smaller error of the calculated CDR shows that the boundaries obtained by the proposed U-Net+CP+FL network are much finer.

Furthermore, the distributions of CDR for glaucoma and non-glaucoma cases are illustrated in Fig. 8. We can clearly see that the overall distribution of the calculated CDR is close to that of the ophthalmologist, especially in the inter-quartile range. Besides, the inter-quartile ranges of the glaucoma and non-glaucoma groups are completely separated, which means that CDR can serve as an important clinical measurement for glaucoma diagnosis. In summary, the CDR performance of the proposed method is close to expert level. Observations on other test images also confirm this conclusion.

Fig. 8.

Box plots for CDR of ophthalmologist and proposed method in test cases.

5 Conclusion

A novel CDR measurement method is proposed in this paper. The proposed method first uses morphological operations and a sliding window to locate the OD and extract the ROI. Then, an end-to-end deep neural network called U-Net+CP+FL, which consists of a U-shape convolutional architecture, a novel concatenating path and a multi-label fusion loss function, is proposed to segment the OD and OC simultaneously. Based on the segmentation results, the CDR value can be calculated effectively. The proposed method has several advantages over existing algorithms. First, its OC segmentation is more accurate than that of existing methods. Second, it automatically and simultaneously segments the OD and OC in an end-to-end way without any user interaction. Finally, it combines traditional image processing techniques with deep learning to achieve better results.

However, the proposed algorithm also has some limitations and may produce invalid results in some situations. For example, when the OD is surrounded by parapapillary atrophy (PPA), the PPA blurs the OD boundary, which may lead to over-segmentation of both the OD and OC regions. Nevertheless, we provide a new way to solve the CDR calculation problem, and the results are quite successful in most cases. Therefore, the proposed method could be suitable for automatic glaucoma analysis in a variety of clinical settings. In the future, we plan to build our own fundus image dataset to further validate the effectiveness of the proposed method.