Identifying the severity of diabetic retinopathy by visual function measures using both traditional statistical methods and interpretable machine learning: a cross-sectional study

Aims/hypothesis
To determine the extent to which diabetic retinopathy severity stage may be classified using machine learning (ML) and commonly used clinical measures of visual function together with age and sex.

Methods
We measured the visual function of 1901 eyes from 1032 participants in the Northern Ireland Sensory Ageing Study, deriving 12 variables from nine visual function tests. Missing values were imputed using chained equations. Participants were divided into four groups using clinical measures and grading of ophthalmic images: no diabetes mellitus (no DM), diabetes but no diabetic retinopathy (DM no DR), diabetic retinopathy without diabetic macular oedema (DR no DMO) and diabetic retinopathy with DMO (DR with DMO). Ensemble ML models were fitted to classify group membership for three tasks, distinguishing (A) the DM no DR group from the no DM group; (B) the DR no DMO group from the DM no DR group; and (C) the DR with DMO group from the DR no DMO group. More conventional multiple logistic regression models were also fitted for comparison. An interpretable ML technique was used to rank the contribution of visual function variables to predictions and to disentangle associations between diabetic eye disease and visual function from artefacts of the data collection process.

Results
The performance of the ensemble ML models was good across all three classification tasks, with accuracies of 0.92, 1.00 and 0.84, respectively, for tasks A–C, substantially exceeding the accuracies for logistic regression (0.84, 0.61 and 0.80, respectively). Reading index was highly ranked for tasks A and B, whereas near visual acuity and Moorfields chart acuity were important for task C. Microperimetry variables ranked highly for all three tasks, but this was partly due to a data artefact (a large proportion of missing values).

Conclusions/interpretation
Ensemble ML models predicted status of diabetic eye disease with high accuracy using just age, sex and measures of visual function. Interpretable ML methods enabled us to identify profiles of visual function associated with different stages of diabetic eye disease, and to disentangle associations from artefacts of the data collection process. Together, these two techniques have great potential for developing prediction models using untidy real-world clinical data.

Supplementary Information
The online version of this article (10.1007/s00125-023-06005-3) contains peer-reviewed but unedited supplementary material.


Interpretable machine learning: clustering
A secondary aim of the analysis was to investigate how our models handled missing values in the dataset. Specifically, we wanted to know whether the machine learning (ML) models were using blocks of missingness in making predictions, and whether these artefacts could be disentangled using interpretable ML techniques. This is a relatively common challenge in datasets of this type, and so we included microperimetry and the other variables with substantial missingness (Matrix perimetry and Moorfields chart acuity).

Methods
In addition to the global measures of variable importance based on SHAP values, we used K-means clustering of SHAP values to identify clusters of eyes in which predictions were made for similar reasons. The number of clusters for each task was determined by visual inspection of elbow plots. We paid particular attention to variation in the proportion of imputed values in each cluster, aiming to identify the clusters with the greatest data support as those least likely to be influenced by artefacts of imputation.
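The clustering step above can be sketched as follows. This is an illustrative example only, not the study's code: the SHAP-value matrix is synthetic, and a minimal K-means implementation stands in for a library routine. The within-cluster sum of squares (WCSS) printed for each candidate k is the quantity inspected on an elbow plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SHAP-value matrix: one row per eye, one column per
# predictor (e.g. reading index, microperimetry sensitivity, age).
# Real values would come from explaining the fitted ensemble model.
shap_values = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(50, 4))
    for c in (-1.0, 0.0, 1.0)  # three latent clusters for illustration
])

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means: returns cluster labels and the WCSS."""
    r = np.random.default_rng(seed)
    centres = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each eye to its nearest cluster centre.
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster empties.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    wcss = sum(((X[labels == j] - centres[j]) ** 2).sum() for j in range(k))
    return labels, wcss

# Elbow-plot data: WCSS drops sharply up to the true cluster count,
# then flattens; the "elbow" is chosen by visual inspection.
for k in range(1, 7):
    _, wcss = kmeans(shap_values, k)
    print(k, round(wcss, 1))
```

Clustering SHAP values rather than the raw measurements groups eyes by why the model classified them as it did, which is what allows two clusters with the same predicted class to be distinguished.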

Results
For task A we identified eight clusters of eyes, in two of which eyes were predicted to be DM no DR, but for different reasons in each. For example, the positive SHAP values for reading index, microperimetry central 5 points mean sensitivity, age and Moorfields chart acuity for cluster Y indicated that these variables contributed most strongly to the prediction of membership in the "DM no DR" group (Figure ESM1). Cluster Y was characterised by below-average values for reading index, central 5 points sensitivity and Moorfields chart acuity, and above-average age (Figure ESM2). In contrast, for cluster V microperimetry average sensitivity contributed most strongly to a "DM no DR" prediction, followed by age and reading index. Cluster V was characterised by above-average values for the microperimetry variables and average values for age and reading index. The other tasks likewise highlighted distinct profiles of visual function measurements (clusters of eyes) that produced similar model predictions. In task B, eyes in four of eight clusters were predicted to be DR no DMO; in task C, two of five clusters were predicted to be DR with DMO.

Discussion
There was substantial variation in the proportion of imputed values across clusters, indicating that missingness itself was clustered. For example, the global proportion of imputed values for microperimetry for task A was 55%, yet in clusters Y and V the proportion of imputed values for microperimetry was >60% and <50%, respectively. Some of the patterns in these clusters are biologically plausible (e.g. below-average reading index being associated with increased probability of being DM no DR rather than no DM), so imputation may have mirrored the pattern across the other visual function variables rather than driving the predictions. However, it is impossible to determine whether the imputed values are 'correct', so we interpret predictions from clusters with a larger proportion of imputed values with more caution, as there is less data support for the patterns detected. In clusters with fewer imputed values (e.g. cluster V), more attention should be given to the influential variables with few imputed values, such as reading index and Moorfields chart acuity.
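The per-cluster check described above can be sketched as follows. The cluster labels and imputation mask here are invented for illustration; in practice the mask would record which values were filled in by the chained-equations imputation, and the comparison against the global proportion flags clusters whose SHAP patterns rest mainly on imputed data.

```python
import numpy as np

# Hypothetical inputs: a cluster label per eye and a boolean mask
# marking which microperimetry values were imputed (True = imputed).
labels  = np.array([0, 0, 0, 1, 1, 1, 1, 1])
imputed = np.array([True, True, False, False, True, False, False, False])

# Proportion of imputed values within each cluster, compared with the
# global proportion to decide how cautiously to read each cluster.
global_prop = imputed.mean()  # 3/8 = 37.5% here
for c in np.unique(labels):
    prop = imputed[labels == c].mean()
    flag = "interpret with caution" if prop > global_prop else "better data support"
    print(f"cluster {c}: {prop:.0%} imputed ({flag})")
```

With these toy inputs, cluster 0 (67% imputed) would be read cautiously while cluster 1 (20% imputed) has better data support, mirroring the contrast drawn between clusters Y and V above.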

Figures

Figure ESM1. Influence of variables on classifications for two clusters in which the same classifications were made for different reasons. Positive SHAP values indicate increased probability of being DM no DR.

Figure ESM2. Distribution of visual function measurements for two clusters in which the same classifications were made for different reasons.

Tables

Table ESM1. Distribution of eyes by diabetes and retinopathy status and data source.

Table ESM2. Classification performance by model type and diabetes and retinopathy status classification task. TN = True Negative, FN = False Negative, FP = False Positive, TP = True Positive.