Introduction

According to a World Health Organization report, the number of diabetic patients worldwide increased from 108 million in 1980 to 422 million in 2014, and the worldwide prevalence of diabetes among adults (> 18 years of age) increased from 4.7% in 1980 to 8.5% in 2014 [1]. Voigt et al. reported that 25.8% of diabetic patients have complications of retinopathy (nonproliferative 20.2%; proliferative 4.7%; unclassified 0.7%; blindness 0.1%) [2]. Early treatment, compared with deferral of photocoagulation, was associated with a small reduction in the incidence of severe visual loss [3]; however, regular fundus examination by an ophthalmologist is impractical and costly for many diabetic patients. Furthermore, diabetic retinopathy carries a large cost burden, and the financial impact may be even more severe for the many patients with this complication who live in developing countries [4].

Recently, image processing based on deep learning, a type of machine learning algorithm, has attracted attention because of its accuracy, and its application to medical imaging is being actively studied [5,6,7]. Image-based diagnosis with deep learning has already been reported in ophthalmology [8,9,10,11]. In addition, the advent of wide-angle fundus cameras, such as the ultrawide-field scanning laser ophthalmoscope (Optos 200Tx; Optos plc, Dunfermline, UK), known as Optos, has made it possible to capture a wide range of the fundus simply and noninvasively [12,13,14]. In the present study, we assessed the accuracy of deep learning applied to ultrawide-field fundus images for detecting treatment-naïve proliferative diabetic retinopathy (PDR).

Methods

Dataset

The procedures in the present study conformed to the tenets of the Declaration of Helsinki. Informed consent was obtained from the subjects after they understood the study’s nature and possible consequences.

The study dataset comprised 378 fundus images: 132 images from patients with treatment-naïve PDR and 246 images from normal subjects without fundus diseases, extracted from the clinical databases of the ophthalmology departments of Saneikai Tsukazaki Hospital and Tokushima University Hospital between April 1, 2011, and March 30, 2018. These images were reviewed by three retinal specialists, who assessed the presence of PDR using mydriatic slit-lamp binocular indirect ophthalmoscopy, and were registered in an analytical database. All patients underwent Optos imaging and ultrawide-field fluorescein angiography (FA) (Fig. 1). The levels of diabetic retinopathy were graded from the retinal images using the Early Treatment Diabetic Retinopathy Study (ETDRS) severity scale [3].

Fig. 1

A representative fundus image obtained by ultrawide-field scanning laser ophthalmoscopy. The presence of proliferative diabetic retinopathy (PDR) is seen on the ultrawide-field color fundus image (A) and on fluorescein angiography (B)

In this study, we used K-fold cross-validation [15]. Briefly, the image data were divided into K groups; K-1 groups were used as training data, and the remaining group was used as validation data. This process was repeated until each group had served once as the validation dataset. In the present study, we divided the data into nine groups. The images in the training dataset were augmented by brightness adjustment, gamma correction, histogram equalization, noise addition, and inversion, which increased the amount of training data 18-fold. The deep convolutional neural network (DCNN) model, detailed below, was created and trained with the preprocessed image data.
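As an illustration only (not the authors' original code), the following sketch shows how the nine-fold split and 18-fold training-set augmentation described above could be implemented. Here, `augment_image` and `build_model` are hypothetical helpers standing in for the augmentation operations and the DCNN described in the next section.

```python
# Sketch of 9-fold cross-validation with training-set augmentation.
# `augment_image` (hypothetical) returns the 18 augmented variants of one
# image (brightness, gamma, histogram equalization, noise, inversion);
# `build_model` (hypothetical) returns a compiled Keras model.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(images, labels, build_model, augment_image, n_splits=9):
    """images: (N, 192, 256, 3) array; labels: (N,) array of 0 (non-DR) / 1 (PDR)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    fold_metrics = []
    for train_idx, val_idx in skf.split(images, labels):
        # Augment only the training fold (18x more data).
        x_train, y_train = [], []
        for img, lab in zip(images[train_idx], labels[train_idx]):
            for aug in augment_image(img):
                x_train.append(aug)
                y_train.append(lab)
        model = build_model()
        model.fit(np.asarray(x_train), np.asarray(y_train),
                  epochs=40, batch_size=32, verbose=0)
        fold_metrics.append(model.evaluate(images[val_idx], labels[val_idx],
                                           verbose=0))
    return fold_metrics
```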

Deep learning model and training the model

We implemented a deep learning model based on the VGG-16 DCNN (Fig. 2), a type of DCNN that automatically learns local image features and generates a classification model [16,17,18]. The original Optos images were 3900 × 3072 pixels; for analysis, we resized all input images to 256 × 192 pixels. The RGB input values ranged from 0 to 255; therefore, we normalized them to the range 0–1 by dividing by 255.
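A minimal preprocessing sketch under these assumptions, using Pillow for resizing (the file name is illustrative):

```python
# Resize an Optos image (originally 3900 x 3072 pixels) to 256 x 192 and
# scale pixel values from 0-255 to 0-1, as described above.
import numpy as np
from PIL import Image

def preprocess(path, size=(256, 192)):
    img = Image.open(path).convert("RGB").resize(size, Image.BILINEAR)
    return np.asarray(img, dtype=np.float32) / 255.0   # shape (192, 256, 3)

x = preprocess("optos_example.png")   # illustrative file name
```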

Fig. 2

Overall architecture of the VGG-16 model. The deep convolutional neural network (DCNN) used ImageNet parameters: the weights of blocks 1–4 were fixed, while block 5 and the fully connected layers were fine-tuned

The VGG-16 comprised five blocks and three fully connected layers. Each block comprised several convolutional layers followed by a max-pooling layer, which decreases position sensitivity and improves generic recognition [19]. After flattening the output of block 5, there were two fully connected layers: the first removed the spatial information from the extracted features, and the last was a classification layer that applied the softmax function to the feature vectors acquired from the previous layers for binary classification. To improve generalization performance, we applied dropout, masking the output of the first fully connected layer with a probability of 25%. Fine-tuning was used to increase the learning speed and optimize performance even with limited data [20, 21]. We used parameters pretrained on ImageNet: blocks 1–4 were fixed, and block 5 and the fully connected layers were trained. The weights of block 5 and the fully connected layers were updated using momentum SGD (learning rate = 0.001, momentum = 0.9), a stochastic gradient descent optimization algorithm [22, 23]. Of the 40 deep learning models obtained from 40 learning cycles, the one with the highest accuracy on the test data was selected as the final model. To build and evaluate the model, Keras (https://keras.io/ja/) was run on TensorFlow (https://www.tensorflow.org/) in Python.
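A hedged sketch of this fine-tuned VGG-16 classifier in Keras follows; the width of the first fully connected layer is an assumption, as the text does not specify it, and the rest mirrors the configuration described above.

```python
# Fine-tuned VGG-16 sketch: ImageNet weights, blocks 1-4 frozen, block 5
# and the fully connected head trainable, 25% dropout after the first FC
# layer, 2-way softmax output, momentum SGD (lr = 0.001, momentum = 0.9).
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD

def build_model(input_shape=(192, 256, 3)):
    base = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
    for layer in base.layers:
        if not layer.name.startswith("block5"):
            layer.trainable = False              # freeze blocks 1-4

    x = Flatten()(base.output)
    x = Dense(256, activation="relu")(x)         # first FC layer (width assumed)
    x = Dropout(0.25)(x)                         # 25% dropout, as in the text
    outputs = Dense(2, activation="softmax")(x)  # PDR vs. non-DR

    model = Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer=SGD(learning_rate=0.001, momentum=0.9),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```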

Outcome

Receiver operating characteristic (ROC) curves were created based on the deep learning models’ abilities to discriminate between PDR and non-DR images. These curves were evaluated using area under the curve (AUC), sensitivity, and specificity.

Statistical analysis

Student’s t test was used to compare age, whereas Fisher’s exact test was used to compare the ratios of men to women and of right to left eye images. The 95% confidence intervals (CIs) of the AUCs were obtained as follows. Images judged to exceed a threshold were defined as positive for PDR, and an ROC curve was created; we thus obtained nine models and nine ROC curves. For the AUC, a 95% CI was calculated by assuming a normal distribution and using the mean and standard deviation of the AUCs of the nine ROC curves. For sensitivity and specificity, we used the optimal cutoff values, defined as the points on each ROC curve at which sensitivity and specificity were both closest to 100% [24]. The ROC curves were calculated using scikit-learn, and the CIs for sensitivity and specificity were determined using SciPy. The other statistical analyses were performed using SPSS version 22 software (IBM, Armonk, New York, USA). A two-sided P value of < 0.05 was considered statistically significant.
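As an illustration of the CI calculation described above (not the authors' code), the following sketch computes per-fold AUCs with scikit-learn and a normal-approximation 95% CI with SciPy:

```python
# Per-fold ROC/AUC with scikit-learn and a normal-approximation 95% CI
# across the nine folds, following the procedure described above.
import numpy as np
from scipy import stats
from sklearn.metrics import roc_curve, auc

def auc_confidence_interval(fold_labels, fold_scores, confidence=0.95):
    """fold_labels/fold_scores: per-fold true labels (1 = PDR, 0 = non-DR)
    and predicted PDR probabilities."""
    aucs = []
    for y_true, y_score in zip(fold_labels, fold_scores):
        fpr, tpr, _ = roc_curve(y_true, y_score)
        aucs.append(auc(fpr, tpr))
    mean, sd = np.mean(aucs), np.std(aucs, ddof=1)
    low, high = stats.norm.interval(confidence, loc=mean, scale=sd)
    return mean, (low, high)
```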

Data availability

The Optos image datasets analyzed during the present study are available from the corresponding author upon request.

Heatmap creation

The heatmap illustrates which regions of the image the DCNN focused on when making its classification (Fig. 3). The heatmap was generated using gradient-weighted class activation mapping (Grad-CAM) [25]; the first convolutional layer of block 3 was designated as the gradient layer, and ReLU was specified as the backprop modifier.
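The description above (a designated gradient layer plus a ReLU backprop modifier) suggests a keras-vis-style utility; as a rough equivalent only, the following generic Grad-CAM sketch in TensorFlow/Keras takes the gradient of the PDR score with respect to the block3_conv1 feature maps of the model sketched earlier:

```python
# Generic Grad-CAM sketch (not the authors' code): gradient of the PDR
# class score with respect to "block3_conv1", globally averaged to weight
# the feature maps, then ReLU-rectified into a heatmap.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, layer_name="block3_conv1", class_index=1):
    """image: preprocessed array of shape (192, 256, 3); class 1 = PDR."""
    grad_model = tf.keras.Model(
        model.input, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis])
        score = preds[:, class_index]                  # PDR class score
    grads = tape.gradient(score, conv_out)             # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))       # channel-wise importance
    cam = tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1)
    cam = tf.nn.relu(cam)[0].numpy()                   # keep positive evidence
    return cam / (cam.max() + 1e-8)                    # normalized heatmap
```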

Fig. 3

Heatmap superimposed on the fundus photograph. The red color represents the areas on which the deep neural network concentrated

Results

In total, 132 PDR images from 94 patients (mean age 55.3 ± 12.5 years; 90 men and 42 women; 69 left fundus images and 63 right fundus images) and 246 non-DR images from 199 patients (mean age 55.2 ± 13.9 years; 161 men and 85 women; 127 left fundus images and 119 right fundus images) were analyzed. No significant differences were detected between the two groups in terms of age, sex, or left–right eye image ratio (Table 1).

Table 1 Patient demographics

Performance of the DCNN

For PDR diagnosis, the deep learning model had a sensitivity of 94.7% (95% CI 90.6–96.9%), a specificity of 97.2% (95% CI 92.4–99.2%), and an AUC of 0.969 (95% CI 0.935–0.971) (Fig. 4).

Fig. 4

Representative ROC curve of the deep learning model

Discussion

In this study, we investigated the efficacy of a deep learning method in identifying referable treatment-naïve PDR based on 132 fundus photographs. The deep learning algorithm showed a high sensitivity of 94.7%, a high specificity of 97.2%, and an AUC of 0.969 for the detection of treatment-naïve PDR. We focused on treatment-naïve PDR only because it may require immediate treatment. Even though the algorithm's diagnosis was based on color photographs alone, its results were comparable to assessments made by retinal specialists using both color fundus images and FA.

Deep learning has previously been examined at all stages of diabetic retinopathy with good results [8, 26,27,28,29]; however, those studies used conventional fundus cameras that image only the posterior pole. In this study, we used a wide-angle fundus camera, because diabetic retinopathy is an important disease that can affect both the posterior region and the periphery of the retina. Lesions located predominantly outside the seven standard ETDRS fields [30] are defined as predominantly peripheral lesions (PPLs), and the extent of these PPLs is associated with retinopathy progression [31, 32]. Therefore, the type of camera used is important.

A drawback of the present study was that we did not examine diabetic maculopathy, which causes visual disturbances and can also be diagnosed using deep learning, as reported by Gulshan et al. [8]. The ability of such algorithms to detect vision-threatening diabetic retinopathy is important to evaluate, and the software's sensitivity is especially important to determine. Deep learning has typically required tens of thousands of images to learn the presence or absence of a diagnosis; however, the number of available treatment-naïve PDR cases was limited. Therefore, further studies are needed to assess whether diabetic retinopathy can be staged appropriately.

At present, making a final diagnosis based on images alone might not be accurate or prudent. We believe that image-based diagnosis should only be used to confirm a doctor's diagnosis. However, in developing countries, where the number of physicians may be limited, remote image diagnosis may be especially useful. Kanjee et al. reported that remote diagnosis was cost-effective, noting a further reduction in medical expenses when automatic diagnosis was available [33]. To address the most urgent medical problems in the world in an efficient, timely, and cost-effective manner, all available resources are needed. Therefore, introducing artificial intelligence into the medical field is timely, welcome, and needed.

Although combining a DCNN with Optos images can provide good results, this approach is not superior to an in-person medical examination. In-person examination by an ophthalmologist remains indispensable for a definitive diagnosis. Furthermore, both conventional angiography and optical coherence tomography angiography, performed by a retinal specialist, are essential to confirm a qualitative diagnosis, assess treatment effects, and provide follow-up observations.

Conclusion

Treatment-naïve PDR could be diagnosed using an approach that combines wide-angle fundus camera images and deep learning.