Key Summary Points

Why carry out this study?

Diabetic macular edema (DME) is a leading cause of vision loss in people with diabetes.

Automated retinal image analysis systems (ARIAS) using artificial intelligence (AI) have shown promise in screening for DME, but performance in monitoring DME is unknown.

What was learned from the study?

ARIAS’ performance in DME detection was lower in subgroups such as older patients and those with a history of inactive DME.

ARIAS is effective for screening naïve DME cases but has limitations in monitoring inactive DME.

Additional imaging such as OCT is still needed when ARIAS is used for DME surveillance in patients with a history of the disease.

Introduction

Individuals with diabetes mellitus (DM) are prone to various health complications, which can significantly impact their quality of life and increase mortality rates [1]. One of the major threats to vision in patients with uncontrolled or long-standing diabetes is diabetic retinopathy (DR), with diabetic macular edema (DME) being the leading cause of central visual loss in these individuals [2]. The prevalence of DME ranges from 4.2 to 7.9% in patients with type 1 DM and from 1.4 to 12.8% in type 2 DM [3].

Treatment for DME typically involves the use of intravitreal anti-vascular endothelial growth factor (VEGF) agents and intravitreal steroids administered at regular intervals until the macula is dry, followed by progressive spacing of treatments [4, 5]. Regardless of the specific treatment received, regular follow-up is crucial, as DME may persist or recur [6]. Spectral domain-optical coherence tomography (SD-OCT) plays a vital role in the early detection of intraretinal or subretinal fluid, guiding tailored treatment management. Prompt identification and treatment of recurrent DME are essential to prevent vision loss [7].

Recent advancements in computing power, the availability of large datasets, and the accessibility of machine learning and neural network frameworks have greatly improved the automated grading of retinal images [8]. Automated retinal image analysis systems (ARIAS) with artificial intelligence (AI) capabilities have emerged, enabling accurate identification of DR and DME without the need for human graders [9,10,11,12,13,14,15]. The diagnostic accuracy of ARIAS for DR and DME detection is comparable to that of expert graders [16], leading to their implementation in nationwide screening programs [17].

Researchers have conjectured the possibility of utilizing ARIAS for the decentralized surveillance of patients with inactive sight-threatening DR or DME [18]. However, it remains unclear whether ARIAS' diagnostic performance would be consistent in such cases, where retinal lesions may be present in the absence of active disease. In this study, our focus was specifically on DME. We aimed to estimate the sensitivity and specificity of ARIAS in detecting active DME, regardless of the patient’s treatment status. Additionally, we sought to identify demographic and clinical factors that may influence the performance of ARIAS in DME detection.

Methods

We conducted a cross-sectional study involving adult patients (≥ 18 years old) with DM who were prospectively recruited at the Medical Retina Unit of the Department of Ophthalmology at San Raffaele Hospital in Milan, Italy, between April 2022 and January 2023. The participants were referred for screening or management of DME. Patients with concomitant retinal conditions that could potentially confound DME detection, such as age-related macular degeneration, were excluded. Both eyes of eligible patients were included in the study.

All investigations were conducted following the principles outlined in the Declaration of Helsinki for research involving human subjects. The study received approval from the San Raffaele Hospital Ethics Committee under the protocol name “OCTA-MIMS” (97/INT/2021), and written informed consent was obtained from all subjects.

Data Collection and Imaging

We collected demographic information, slit-lamp examination findings, and the medical and ocular history of each participant, including the type and duration of DM, glycated hemoglobin (HbA1c) levels, and details of previous ocular treatments. Retinal imaging was performed using a confocal LED fundus camera (iCare DRSplus, Centervue, Padua, Italy) after pupil dilation. The imaging protocol included two retinal photographs covering a field of 45 × 40 degrees, with one centered on the optic disc and the other on the macula. Additionally, ultra-widefield fundus photography (Silverstone, Optos, CA) and SD-OCT (Spectralis, Heidelberg Engineering, Heidelberg, Germany) were performed for each eye.

Imaging Analysis

The retinal images acquired with the LED fundus camera were processed using a commercially available, FDA-approved ARIAS [19] (EyeArt program V2.1.0, Eyenuk Inc, Woodland Hills, CA, USA). This cloud-based software provides disease classification (presence or absence) based on the detection of DR signs and assigns a score for DR severity according to the international clinical diabetic retinopathy (ICDR) severity scale. It also indicates the presence or absence of DME signs.

DR severity was graded on ultra-widefield fundus photographs, which also allowed assessment of the retinal periphery, and was categorized as no DR, non-proliferative DR (NPDR) (mild, moderate, or severe), or proliferative DR (PDR) using the Early Treatment Diabetic Retinopathy Study (ETDRS) scale. The presence of hard exudates (discrete white-yellow deposits in the posterior pole) and microaneurysms (localized capillary outpouchings in the macular area) was recorded after digitally zooming in on the macular region.

SD-OCT scans were used to assess the presence of active DME [18], defined as intraretinal and/or subretinal fluid involving the central subfield zone (center-involving DME), not involving the central subfield zone (non-center involving DME), or a combination of both (diffuse DME). The grading was performed by two trained readers (L.L.F. and C.R.). Additionally, the presence of an epiretinal membrane (ERM) was evaluated, and the central macular thickness (CMT) was automatically measured using the Spectralis software. Cases where assessment was not feasible were treated as missing values.

Inactivated DME was characterized by a documented history of the condition, evidenced through prior treatments or OCT scans, coupled with the absence of DME in the current OCT scan.

Statistical Analysis

Statistical analyses were conducted using R version 4.2.2 and SAS v9.3 (SAS Institute Inc, Cary, NC, USA). The sample size calculation was based on assuming a sensitivity of 90% and a DME prevalence of 18% [20] using the epi.ssdxsesp function from the R epiR package. The sample size of 301 eyes provided 95% confidence to estimate the sensitivity within 0.08 of the true population value [21].
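The sample size reasoning above can be illustrated with a short sketch. It assumes the standard normal-approximation formula for estimating sensitivity to a given absolute precision, scaled by prevalence; whether `epi.ssdxsesp` computes exactly this internally is an assumption here, and the function name `sample_size_for_sensitivity` is hypothetical. With the inputs stated in the text (sensitivity 90%, prevalence 18%, precision 0.08, 95% confidence), this formula gives the same figure of 301 eyes.

```python
import math
from statistics import NormalDist


def sample_size_for_sensitivity(sensitivity, prevalence, half_width, conf_level=0.95):
    """Total sample size needed to estimate sensitivity within +/- half_width.

    Assumed formula (normal approximation):
        n_diseased = z^2 * Se * (1 - Se) / d^2
        n_total    = n_diseased / prevalence
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n_diseased = z ** 2 * sensitivity * (1 - sensitivity) / half_width ** 2
    # Only a fraction `prevalence` of recruited eyes is expected to have DME
    return math.ceil(n_diseased / prevalence)


# Inputs taken from the Statistical Analysis paragraph
n_eyes = sample_size_for_sensitivity(sensitivity=0.90, prevalence=0.18, half_width=0.08)
print(n_eyes)
```

Because the required number of diseased eyes is divided by the prevalence, a lower assumed prevalence would increase the total sample size proportionally.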

Demographic and clinical characteristics were summarized as means ± standard deviations (SD) or frequencies and proportions (%). Logistic regression models, with the patient identification number as a random effect to account for the inclusion of both eyes from some patients, were used to compare features between eyes with active DME and those without active DME. The agreement in disease status between the two eyes of the same patient was assessed using the kappa statistic (κ), which measures how far the observed proportion of agreement exceeds the proportion of agreement expected by chance, relative to the maximum possible excess. Kappa values range from +1 (perfect agreement) to -1 (perfect disagreement) [22].
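For reference, the conventional definition of Cohen's kappa is written out below, with $p_o$ the observed proportion of agreement and $p_e$ the proportion expected by chance (these symbols are introduced here for illustration and do not appear in the original text). The second line rearranges the formula to show the chance agreement implied by the inter-eye values reported in the Results (79% agreement, κ = 0.51). The result is approximate, since both inputs are rounded.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
\qquad\Longrightarrow\qquad
p_e = \frac{p_o - \kappa}{1 - \kappa}
    \approx \frac{0.79 - 0.51}{1 - 0.51}
    \approx 0.57
```

In other words, roughly 57% agreement between fellow eyes would be expected by chance alone, which is why a raw agreement of 79% corresponds to a kappa of only about 0.5.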

A generalized estimating equations (GEEs) approach from the GENMOD SAS procedure was employed to estimate the sensitivity and specificity of ARIAS. The ARIAS results served as the outcome variable, and the presence of active DME on SD-OCT was the predictor [23]. The 95% confidence intervals (CIs) were calculated, accounting for potential inter-eye correlations [23]. The positive predictive value (PPV) and negative predictive value (NPV) were determined based on the observed prevalence of active DME in the study group.

Logistic regression models were used to identify factors associated with true positives (i.e., DME labeled as present by ARIAS with intraretinal and/or subretinal fluid on SD-OCT) in the subset of eyes with active DME on SD-OCT. Inter-eye correlations were not corrected for in these models, as the analysis focused on eye-level ARIAS performance. Similar analyses were conducted to identify factors associated with false positives (i.e., DME labeled as present by ARIAS with no intraretinal and/or subretinal fluid on SD-OCT) in the subset of eyes without active DME on SD-OCT. The optimal cut-point for numerical covariates was determined using the cutpointr R package. Odds ratios (OR), 95% CI, and corresponding p-values were reported. An alpha level of 0.10 was used to assess statistical significance.

Results

A total of 399 eyes from 205 patients with DM were initially collected. After excluding eyes with retinal co-morbidities (24 eyes) and those with low-quality or missing SD-OCT scans (58 eyes), a total of 298 eyes from 154 patients were included in the study. Among these, 144 patients contributed both eyes to the study. Inter-observer agreement was 100%.

Prevalence and Clinical Characteristics of DME

Out of the included eyes, 73 (24%) had a history of DME, and 50 eyes had received previous intravitreal treatments. Sixty-two eyes had active DME on SD-OCT at the study visit, while 11 had inactivated DME. An additional 30 eyes were first diagnosed with DME during the study visit, resulting in a total of 92 eyes (31%) from 64 patients with active DME on SD-OCT. There was a high agreement (79%) between the two eyes of patients regarding the presence or absence of DME, with a kappa value of 0.51 (95% CI 0.35–0.66) (Table 1).

Table 1 Inter-eye agreement in active DME status from SD-OCT examination (N = 144 patients contributing with both eyes to the study pool)

The demographic and clinical characteristics of the study eyes are presented in Table 2. Patients with active DME were older on average compared to those without DME (p = 0.03). They were also more likely to have type 2 DM (p = 0.007) and higher HbA1c levels (p < 0.001). The presence of cataracts or prior cataract surgery was more common in eyes with active DME (p = 0.001). The severity of DR was significantly worse in eyes with DME, with higher proportions of moderate NPDR, severe NPDR, or PDR (p < 0.001). Fundus examination revealed a higher frequency of hard exudates (p < 0.001), microaneurysms (p < 0.001), and ERM (p = 0.07) in eyes with active DME. The CMT was also significantly higher in the active DME group (p < 0.001).

Table 2 Demographic and clinical characteristics of the study eyes stratified by the presence of active DME

SD-OCT and AI Grading Agreement

Among the 298 paired SD-OCT and ARIAS gradings, there was an 84% agreement on DME status (Table 3). The rate of false positives was 11%, while the rate of false negatives was 5%.

Table 3 Cross-tabulation of active DME status from SD-OCT and ARIAS at the eye-level (N = 298 eyes from 154 patients)

ARIAS Performance Indices

In the entire eye pool, the sensitivity of ARIAS for detecting active DME was 82.61% (95% CI 72.37–89.60), with a specificity of 84.47% (95% CI 78.34–89.10). The PPV was 70.37% (95% CI 60.82–78.77), and the NPV was 91.58% (95% CI 86.68–95.11). The overall test accuracy was 84%, with a misclassification rate of 16%. Excluding eyes with a history of DME and no active DME at the study visit (as would routinely happen in a screening setting) increased the specificity and sensitivity of ARIAS to 87.69% and 86%, respectively. Figure 1 shows four example cases: a true positive, a true negative, a false positive, and a false negative.
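The point estimates above can be checked with a minimal sketch. The cell counts of Table 3 are not reproduced in this text, so the 2 × 2 counts below are an assumption: they are back-calculated from the reported percentages, the 92 eyes with active DME, and the 206 eyes without. Only the point estimates are reproduced; the confidence intervals in the paper come from a GEE model that accounts for inter-eye correlation and are not recomputed here.

```python
# Assumed 2 x 2 counts, reconstructed from the reported percentages
# (92 eyes with active DME on SD-OCT, 206 eyes without; 298 in total)
tp, fn = 76, 16    # active DME: flagged / not flagged by ARIAS
fp, tn = 32, 174   # no active DME: flagged / not flagged by ARIAS
total = tp + fn + fp + tn

sensitivity = tp / (tp + fn)    # proportion of active DME eyes flagged
specificity = tn / (tn + fp)    # proportion of dry eyes not flagged
ppv = tp / (tp + fp)            # probability of active DME given a positive result
npv = tn / (tn + fn)            # probability of no active DME given a negative result
accuracy = (tp + tn) / total    # overall agreement with SD-OCT

for name, value in [
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("PPV", ppv),
    ("NPV", npv),
    ("Accuracy", accuracy),
]:
    print(f"{name}: {100 * value:.2f}%")
```

With these counts, false positives and false negatives make up about 11% and 5% of the 298 eyes, respectively, in line with the agreement figures reported for Table 3.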

Fig. 1 A series of cases that were found to be true positive (A), true negative (B), false positive (C), and false negative (D) for active diabetic macular edema (DME) detection

Factors Associated with True Positives on ARIAS Test

The analysis of factors associated with true positives was conducted in the subset of 92 eyes with active DME on SD-OCT. Younger age (60.6 ± 11.1 vs. 68.4 ± 11.6 years, OR = 0.94 for each year, 95% CI 0.88–0.99, p = 0.01), shorter duration of DM (15 ± 11.2 vs. 26.9 ± 9.46 years, OR = 0.92 for each year, 95% CI 0.85–0.97, p = 0.006), advanced DR stage (53% vs. 19% with severe NPDR or PDR; OR = 8.89 vs. mild NPDR, 95% CI 1.91–80.9, p = 0.045), presence of hard exudates (95% vs. 69%, OR = 8.18 if present, 95% CI 1.90–37.9, p = 0.005), and microaneurysms (97% vs. 69%, OR = 16.8 if present, 95% CI 3.21–128.2, p = 0.002) were associated with a higher chance of active DME detection by ARIAS. Previous diagnosis of DME and previous DME treatments did not affect ARIAS detection performance (Table 4).

Table 4 Distribution of demographic and clinical characteristics associated with true positives and false negatives on ARIAS test

Among the eyes with active DME that ARIAS did not flag for DME (false negatives), 11 (eight with moderate NPDR and three with severe NPDR or PDR) would still be classified as referable retinopathy. Only four eyes in the no DR or mild NPDR category were completely missed.

Factors Associated with False Positives on ARIAS Test

The analysis of factors associated with false positives was conducted in the subset of 206 eyes without active DME on SD-OCT. Longer DM duration (23.4 ± 12.6 vs. 16.1 ± 12.4 years, OR = 1.05 for each year, 95% CI 1.01–1.08, p = 0.01), worse DR severity (63% vs. 20% with moderate NPDR, OR = 5.90 vs. mild NPDR, 95% CI 1.81–26.8; 25% vs. 2% with severe NPDR or PDR; OR = 20.7 vs. mild NPDR, 95% CI 4.25–132.5, p < 0.001), a history of DME (28% vs. 2%, OR = 19.4 if present, 95% CI 5.18–94.2, p < 0.001), and the presence of hard exudates (44% vs. 5%, OR = 16 if present, 95% CI 6.07–45.4, p < 0.001), microaneurysms (91% vs. 30%, OR = 22.5 if present, 95% CI 7.57–96.9, p < 0.001), or ERM (13% vs. 4%, OR = 3.39 if present, 95% CI 0.84–12.0, p = 0.06) were associated with a higher chance of false positive results by ARIAS. A higher CMT was also associated with false positives, with an optimal cut-point of 291 μm (53% vs. 28%, OR = 2.97 if CMT ≥ 291 μm, 95% CI 1.38–6.49, p = 0.005) (Table 5).

Table 5 Distribution of demographic and clinical characteristics associated with true negatives and false positives on ARIAS test

Discussion

Traditionally, DR and DME screening and monitoring have relied on dilated fundus examinations and OCT conducted by trained ophthalmologists. However, the advent of newer imaging modalities, including stereoscopic imaging, nonmydriatic cameras, and mobile phone fundus cameras, has revolutionized the landscape of eyecare for patients with DR. These technologies facilitate decentralized care delivery through telemedicine [24] and virtual clinics [25], enabling ophthalmologists to assess patient conditions remotely by examining fundus or OCT images, thereby reducing the need for in-person consultations.

While these imaging techniques have been helpful, image grading by humans is subjective, time-consuming, and requires specialized training. Moreover, despite the increasing accessibility of OCT, its availability remains limited in many primary care settings, especially in resource-constrained areas. To address the increasing workload for eyecare services due to the growing diabetic population and uneven resource distribution, automatic algorithms like ARIAS have been developed for the recognition of DR and DME. These algorithms have demonstrated efficiency, cost-effectiveness, and reproducibility in screening for sight-threatening DR on a large scale [16, 26]. Among them, the EyeArt software has shown a sensitivity of over 96% in diagnosing referable DR [17, 27,28,29], regardless of various factors such as age, ethnicity, dilation status of the lens or pupil, and the type of fundus camera used [30]. ARIAS tools have been validated and approved for screening purposes, with previous studies conducted in treatment-naïve patients with DR. However, the performance of ARIAS in detecting inactivated sight-threatening DR, such as reabsorbed DME or inactive PDR, remains to be determined.

To alleviate the burden of frequent ophthalmic visits, the EMERALD study in the UK explored whether nonmedical staff in community-based settings could follow up with patients having inactivated sight-threatening DR. The study found that nonmedical graders had a sensitivity of 97% and a specificity of 31% in detecting active DME compared to face-to-face encounters with ophthalmologists [18]. The authors suggested implementing this alternative pathway in standard care to increase hospital capacity and improve cost-effectiveness. If ARIAS could detect the reactivation of PDR or recurrence of DME, it would further support the adoption of less burdensome surveillance strategies for the healthcare system.

The present study investigated the diagnostic performance of ARIAS in detecting active DME in a cohort of patients with diabetes referred to a tertiary medical retina center for either DME screening or management. The patients in the study were heterogeneous in terms of DR stage, DME history, and treatment status. ARIAS demonstrated excellent sensitivity and specificity for DME, as previously reported [28]. However, its diagnostic performance decreased when patients with a history of inactivated DME were included in the study. Some eyes were misclassified as not having DME despite the presence of intraretinal or subretinal fluid on SD-OCT (false negatives), while others were classified as having DME even though their macula was dry (false positives).

Detecting active DME in non-stereoscopic fundus images is inherently challenging and relies on surrogate biomarkers like hard exudates, microaneurysms, or hemorrhages. Most true positive cases had either hard exudates or microaneurysms in the macular area. However, 17% of eyes with active DME lacked these biomarkers, indicating a weak correlation between surrogate biomarkers and the presence of fluid on SD-OCT [31]. Older age and longer disease duration were predictors of lower sensitivity since fundus lesions suggestive of DME tend to spontaneously reduce over time [32], leading to higher false negative rates. Notably, 11 of the eyes with a false negative DME result (eight with moderate NPDR and three with severe NPDR or PDR) were still correctly identified as having referable retinopathy.

ARIAS demonstrated a specificity of 84%. Longer disease duration and worse DR severity were associated with higher rates of false positives, where chronic damage to the macula, such as macular ischemia [33], might be misinterpreted as active DME. Microaneurysms in the fovea and hard exudates were common in false positive eyes. Additionally, subclinical macular thickening with CMT higher than the optimal cut-point of 291 μm was associated with a high false positive rate by ARIAS for DME recognition [34]. Further research is needed to explore other macular features that may act as confounders for DME and evaluate their impact on ARIAS performance [35].

As OCT technology becomes increasingly available, the integration of AI for the analysis of OCT images may significantly enhance diagnostic precision. AI's ability to detect subtle changes indicative of early DME—changes that might be missed in manual or fundus examinations—exemplifies its potential to augment traditional diagnostic methods. The widespread availability of OCT scanners is undeniably beneficial, and when paired with AI, this duo can transform DME screening into a more efficient and accessible process. Nonetheless, while embracing this technological synergy, it is imperative to rigorously assess AI performance across varied clinical settings. This ensures that AI acts as a supportive tool, enriching rather than supplanting the invaluable expertise of healthcare professionals.

The study has some limitations, including its cross-sectional design, which precludes assessing ARIAS repeatability and consistency over time. The reported PPV and NPV for DME detection should be interpreted carefully as they are influenced by disease prevalence. Selection criteria, such as the exclusion of patients with age-related macular degeneration, mean that the study population may not fully represent the population seen in a screening setting. The study did not differentiate between different treatments for DME, so the effect of specific molecules on the misclassification rate could not be determined. Lastly, the study focused on white patients, and the findings may not apply to other ethnicities.
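The dependence of PPV and NPV on prevalence can be made concrete with a small sketch based on Bayes' theorem. It uses the sensitivity and specificity reported in the Results and assumes, for illustration only, that both stay constant across settings (an assumption the subgroup findings of this study call into question). The lower prevalence values are the bounds of the type 2 DM range quoted in the Introduction, used here as hypothetical screening scenarios; the function name `predictive_values` is also hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given disease prevalence via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


SENS = 0.8261          # reported sensitivity for active DME
SPEC = 0.8447          # reported specificity for active DME
STUDY_PREV = 92 / 298  # eyes with active DME in the study pool

study_ppv, study_npv = predictive_values(SENS, SPEC, STUDY_PREV)

# Hypothetical scenarios: study cohort versus lower-prevalence settings
scenarios = [
    ("Study cohort", STUDY_PREV),
    ("Upper bound, type 2 DM", 0.128),
    ("Lower bound, type 2 DM", 0.014),
]
for label, prev in scenarios:
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"{label}: prevalence {prev:.1%}, PPV {ppv:.1%}, NPV {npv:.1%}")
```

At the study prevalence, the sketch recovers the reported PPV and NPV. As prevalence falls, PPV drops and NPV rises, which is why the values from a tertiary referral cohort should not be carried over directly to a screening population.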

Conclusions

In conclusion, although ARIAS demonstrated high sensitivity for DME, its diagnostic performance varied in specific subgroups. Sensitivity was lower in older patients and in those without classic DME-related funduscopic lesions, whereas specificity was lower in eyes with inactivated DME. Therefore, ARIAS cannot be solely relied upon for DME surveillance, and additional imaging with SD-OCT is still necessary. Multi-center studies or external validation would strengthen these findings and support the broader implementation of ARIAS in clinical practice.