Introduction

Diabetic retinopathy is the most common complication in diabetes, affecting 97% of patients with long-term type 1 diabetes [1, 2]. In 2020 it was estimated that 103 million adults worldwide had diabetic retinopathy, and this number is projected to rise to 160 million by 2045 [3].

Diabetic retinopathy is an ischaemic ocular disease [4] that is initially characterised by retinal neurodegeneration [5] and the vascular formation of retinal microaneurysms and haemorrhages (Fig. 1). At later stages, cotton wool spots, venous beading and intraretinal microvascular abnormalities may emerge. Continuous hypoxia and associated release of vascular endothelial growth factor (VEGF) may result in proliferative diabetic retinopathy (PDR) [6], with the formation of fragile retinal vessels at the vitreous interface. This is a sight-threatening complication that may lead to vitreous haemorrhage and tractional retinal detachment [7]. While traditional therapeutic options for PDR include panretinal photocoagulation [7] and vitrectomy [8], it has recently been demonstrated that intravitreal therapy with VEGF inhibitors such as aflibercept [9] and ranibizumab [10] is also a viable option, although long-term data are lacking.

Fig. 1

Emerging stages of diabetic retinopathy according to the International Clinical Diabetic Retinopathy and Diabetic Macular Edema Disease Severity Scales [21]. In general, treatment is indicated for patients with sight-threatening diabetic retinopathy (PDR or DME), while regular and continuous diabetic retinopathy screening is needed for those without diabetic retinopathy or with non-PDR.

Oxidative stress and hyperglycaemia may also induce VEGF-mediated breakdown of the inner blood–retinal barrier, with subsequent formation of diabetic macular oedema (DME) [11]. For many years, macular photocoagulation was considered the gold standard treatment for DME [12], but in the last decade this has been largely replaced by intravitreal VEGF inhibitors [13], which often provide a better gain in visual acuity, but also come at a higher financial cost and require a median of 13 injections within the first 5 years of treatment [14].

Screening is an established concept used to identify disease in people without apparent symptoms. In 1968, the WHO published a widely adopted set of criteria that should be met if screening is to be considered for a given disorder [15]. These include that (1) the condition should be an important health problem; (2) there should be an accepted treatment and the prognosis should be favourably adjusted if this is given at a presymptomatic stage; (3) facilities for diagnosis and treatment should be available; (4) a recognisable latent stage should exist; (5) a suitable and acceptable test should be available; and (6) the cost of case finding should be economically feasible for society.

In Europe and most other developed countries, these criteria are met for diabetic retinopathy screening. Specifically, (1) sight-threatening diabetic retinopathy (STDR) affects 28.5 million people worldwide [3]; (2) retinal photocoagulation, intravitreal therapy or surgery effectively reduces visual loss in STDR [7, 12, 13]; (3) ophthalmic care is available in most healthcare systems in Europe and to some extent throughout the world; (4) PDR is often asymptomatic prior to the advancement to severe visual loss caused by vitreous haemorrhage or tractional retinal detachment; (5) fundus photography can effectively identify diabetic retinopathy [16, 17]; and (6) diabetic retinopathy screening and treatment is a highly cost-effective health investment for society [18].

Clinical implementation of diabetic retinopathy screening

Recent position papers published by the International Council of Ophthalmology [19] and the ADA [20] have provided comprehensive evidence on diabetic retinopathy screening. One of the key perspectives was the recommendation to use a simple scale for diabetic retinopathy grading. The International Clinical Diabetic Retinopathy and Diabetic Macular Edema Disease Severity Scales (ICDR) have been adopted most frequently, as originally proposed by Wilkinson et al [21]. In contrast to the originally adapted Early Treatment Diabetic Retinopathy Study (ETDRS) scale [16], the ICDR scale contains only five levels, which makes it much easier to use in a clinical setting.

While ETDRS seven-field photography has traditionally been regarded as the gold standard method for diabetic retinopathy classification [16], retinal imaging with fewer than seven fields or using wide-field fundus photography has an acceptable level of performance in general and may be less time-consuming for patients and improve patient comfort [22]. Another consideration is to include optical coherence tomography (OCT) in diabetic retinopathy screening, given that three-dimensional retinal scans detect DME more easily than two-dimensional fundus photography [23]. In the UK screening programme, Mackenzie et al demonstrated that the inclusion of OCT made it possible to rule out 42.1% of cases in which DME had been suspected based on fundus photography alone [24]. While the inclusion of OCT in diabetic retinopathy screening might not be economically feasible in all countries, it is increasingly being implemented in ophthalmic care and is often integrated in fundus cameras. Including OCT in diabetic retinopathy screening in patients suspected of DME, as has been implemented in Denmark [25], may reduce the number of false positives.

While fundus photography is considered the gold standard for diabetic retinopathy screening, dilated fundoscopy is still used in some settings. This procedure has the advantage of a lower cost and, in the hands of trained ophthalmologists, sometimes provides a better visualisation of the retina with higher magnification in patients with blurry ocular media, vitreous opacities or insufficient pupil dilation. However, it has a lower performance than mydriatic fundus photography [26], is often more time-consuming and does not provide retinal images for clinical records.

With regard to the successful implementation of diabetic retinopathy screening, it is a concern that the number of ophthalmologists is limited, with substantial variation between countries. While Europe has a mean of 18.03 ophthalmologists per 1000 patients with STDR, the corresponding number in Africa is only 0.91 [27]. One way to deal with this shortage would be for non-ophthalmologists, such as nurses, optometrists or physicians, to perform retinal imaging and/or diabetic retinopathy grading [28]. Given access to high-quality retinal cameras and sufficient training in diabetic retinopathy grading, these tasks could potentially be outsourced to such healthcare professionals.

Finally, substantial evidence has demonstrated that diabetic retinopathy screening at fixed annual intervals is no longer cost-effective for most patients [29]; rather, individualised screening based on risk can extend the screening interval by almost three times [30] and at the same time reduce the number of screening episodes by 40% [31]. Conversely, for patients with high-risk characteristics such as advanced diabetic retinopathy, systemic dysregulation or pregnancy, shorter intervals are needed to detect potential STDR before irreversible visual loss occurs.

According to the St Vincent Declaration of 1990, European countries should ‘reduce the risk of visual impairment due to diabetic retinopathy by systematic programmes of screening reaching at least 80% of the population’ [32]. Since 1990, meetings of national representatives have been held at regular intervals to discuss progress and exchange experiences. Most recently, the WHO European Region recommended diabetic retinopathy screening for all people with diabetes and provided guidance on overcoming obstacles to implementation [33].

The largest national diabetic retinopathy screening programme is located in the UK, with most parts of the Diabetic Eye Screening Programme implemented in 2003 [34]. Diabetic retinopathy screening is offered to all those with diabetes aged 12 or above, and those with pre-PDR or early maculopathy are referred to a hospital eye service or digital surveillance clinic for surveillance and more frequent screening. While 2-year screening intervals for low-risk individuals have been recommended [35], this has been implemented only in Scotland, with annual screening performed elsewhere. As a partial consequence of the successful implementation of the UK screening programme, Liew et al demonstrated that, for the first time in half a century, diabetes was no longer the leading cause of blindness in working age adults in England and Wales [36].

Other European countries have also successfully implemented diabetic retinopathy screening. Iceland was among the first, with a national programme established as far back as 1980 [37]. Since then, some other European countries, including Denmark, Finland, Ireland and Sweden, have introduced national programmes [25, 38, 39], and some have partially launched or are on the brink of implementing programmes [39]. In Asia, national guidelines on diabetic retinopathy screening are available for 11 of 50 countries, but less than half of the population is covered and full details of screening programmes have been presented for only two countries [40].

Handheld mobile devices

The need to travel vast distances and lack of access to retinal cameras are both important barriers to the clinical implementation of diabetic retinopathy screening in rural areas and low-income countries. Low-cost, handheld mobile devices and ocular telehealth programmes are two potential solutions to these obstacles.

In a systematic review of commercially available handheld fundus cameras, Palermo et al evaluated five studies assessing diabetic retinopathy [41]. Compared with traditional non-portable retinal cameras, the pooled sensitivity and specificity for the detection of diabetic retinopathy using handheld devices were 87% and 95%, respectively. However, it was not reported whether the results were also robust for STDR, which is critical not to miss. Another potential concern is the rate of ungradable images. Piyasena et al demonstrated in 700 patients that this rate was substantially higher for non-mydriatic images (43.4%) than for mydriatic images (12.8%) [42], indicating that image quality is another important barrier to clinical implementation. These concerns were addressed to some extent by Zhang et al, who reported rates of 86–94% for gradable non-mydriatic retinal images along with sensitivities and specificities for STDR detection of 64–88% and 71–90%, respectively [43].

Another prerequisite for the successful implementation of handheld mobile devices in diabetic retinopathy screening is the availability of trained healthcare professionals to analyse the images produced. Artificial intelligence (AI) has the potential to address this issue. In a systematic review of four studies on AI technology by Sheikh et al, the pooled sensitivity and specificity for detecting referable diabetic retinopathy (moderate non-PDR or worse, with or without DME) were 97.9% and 85.9%, respectively [44]. Sensitivity was thus higher than the corresponding value for detecting any diabetic retinopathy (89.5%), although specificity was lower (92.4%).

Ocular telehealth programmes

Ocular telemedicine also has the potential to improve the use of resources in diabetic retinopathy screening. This could address, in particular, the lack of trained healthcare professionals, as reported by Gibson et al, who found that one in four US counties did not have any ophthalmologists or optometrists [45].

The concept would involve having a central grading centre that would receive retinal images from local clinics. In a systematic review, Horton et al identified a number of factors that could improve the quality of such settings, including imaging a sufficient number and sufficient sizes of retinal fields, undertaking mydriatic examinations and stereoscopic imaging, and using licensed eye care providers to perform evaluations at the reading centres [46]. From a financial point of view, Avidor et al demonstrated that telemedicine-based diabetic retinopathy screening could lead to substantial cost savings, particularly in low-income countries and rural populations [47], and Nguyen et al reported that such a screening programme would save the Singaporean healthcare community 29.4 million Singapore dollars over a lifetime horizon [48].

In Denmark, telemedicine in diabetic retinopathy screening has been combined with evaluations of other micro- and macrovascular complications in a single visit screening programme for the complications of diabetes at selected hospital-based centres. The rationale is to improve compliance for patients by reducing the number of diabetes-related visits to healthcare clinics. With all examinations (including diabetic retinopathy screening) performed at local diabetes clinics (and telemedicine-based evaluation of diabetic retinopathy at regional reading centres), local diabetologists are able to gain real-time access to all relevant information (i.e. demographics, HbA1c, blood pressure, cholesterol and level of diabetic retinopathy and other vascular complications), enabling optimised treatment and care, which has often been difficult because of a lack of communication between different sectors of the healthcare community (e.g. ophthalmologists and diabetologists).

Deep learning

As diabetic retinopathy screening is demanding and time-consuming for healthcare professionals, the use of AI has been proposed as a potential solution to enhance diagnostic performance and save human resources.

As a first-generation system, machine learning (ML) has been validated for some time in diabetic retinopathy grading. These early systems have been based on automated recognition of diabetic retinopathy-related lesions such as retinal microaneurysms and haemorrhages. Recognition is based on well-defined inputs for disease classification (e.g. shape, colour and expected locations of particular lesions) and ultimately translates into an overall diabetic retinopathy category.

ML-based systems have demonstrated a high sensitivity of 87–95% for diabetic retinopathy classification [49], which is important, as the main purpose of diabetic retinopathy screening is to avoid missing those with STDR. On the other hand, automated diabetic retinopathy screening is based on the concept of saving resources, which is not necessarily the case for such algorithms, as they are generally limited by modest specificities of 50–69% [49]. This results in a high number of false positives, which limits the cost-effectiveness of implementing such systems.
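The effect of a modest specificity on the number of false positives can be made concrete with a short calculation. The following is a minimal sketch only: the sensitivity and specificity are picked from within the ranges quoted above, whereas the prevalence of 10% and the cohort size of 1000 are assumptions chosen purely for illustration and are not taken from the cited studies.

```python
def screening_outcomes(n_screened, prevalence, sensitivity, specificity):
    """Expected counts when a screening test is applied to n_screened people."""
    diseased = n_screened * prevalence
    healthy = n_screened - diseased
    true_pos = diseased * sensitivity
    false_neg = diseased - true_pos
    true_neg = healthy * specificity
    false_pos = healthy - true_neg
    # Positive predictive value: share of screen-positives who have the disease.
    ppv = true_pos / (true_pos + false_pos)
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
        "ppv": ppv,
    }


# Assumed inputs: 1000 people screened, 10% prevalence (illustrative only),
# sensitivity 90% and specificity 60% (within the ranges quoted in the text).
result = screening_outcomes(1000, 0.10, 0.90, 0.60)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed inputs, screen-positive results in people without disease outnumber those in people with disease by four to one, which illustrates why specificity drives the workload of human graders downstream.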

Deep learning (DL) by convolutional neural networks (CNNs) is a new generation of ML that is particularly suitable for automated image analysis (Fig. 2) [50]. In contrast to traditional ML, DL requires much less human guidance, as it is not based on the formation of hand-crafted features. Thus, training of a DL-based algorithm for classifying diabetic retinopathy, for example, requires only a ground truth-labelled set of data (e.g. fundus photographs graded for diabetic retinopathy by human experts). After training, the CNN is able to reliably label unknown data.

Fig. 2

Fundus photographs of eyes with PDR (a) and fovea-involving haemorrhages and hard exudates indicating diabetic macular oedema (b), with corresponding DL-based annotated retinal lesions ([c] for image [a] and [d] for image [b]) including retinal microaneurysms (green), haemorrhages (magenta), cotton wool spots (yellow), intraretinal microvascular abnormalities (cyan), new vessels (blue) and panretinal photocoagulation scars (purple). The DL-based model was developed at the University of Southern Denmark, Odense, Denmark, using a variation of the U-net architecture equipped with an Inception v3 encoder pretrained on ImageNet.

A CNN resembles the mammalian brain, which has multiple layers of neurons to process sensory inputs. In CNNs, initial layers recognise well-defined features such as edges, lines and colours, while succeeding layers enable identification of more complex patterns. There is a fully connected network of hidden layers between the input and output layers, and mathematical processes such as convolution and pooling are used to enable the software to apply different levels of importance to the features of the images (Fig. 3).

Fig. 3

Structure of a CNN constructed to classify the level of diabetic retinopathy using multiple connected layers.
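The two operations named above, convolution and pooling, can be illustrated on a toy example. This is a simplified sketch in plain Python and not the implementation used by any of the systems discussed here: the 6 × 6 'image', the hand-written vertical-edge kernel and the function names are all assumptions made for illustration. In a real CNN the kernel values are learned during training rather than specified by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding and sum the products
    (the cross-correlation form used in CNN convolution layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image: dark left half (0) and bright right half (1), i.e. a vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-written kernel that responds to dark-to-bright transitions.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

edges = convolve2d(image, kernel)   # 4 x 4 feature map
pooled = max_pool2d(edges)          # 2 x 2 summary of the feature map
print(edges)
print(pooled)
```

The feature map is non-zero only where the kernel straddles the edge, and pooling condenses this response into a smaller grid that succeeding layers can combine into more complex patterns.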

Classification of diabetic retinopathy

In a pivotal paper, Gulshan et al used 128,175 retinal images to build a DL-based algorithm, which was able to detect moderate or worse diabetic retinopathy with a sensitivity and specificity of more than 90% [51]. This was confirmed by Ting et al, who also showed that DL had a similar efficacy in detecting other ocular diseases such as possible glaucoma and age-related macular degeneration [52]. Along with others [53], these studies demonstrate that DL can effectively reduce the high number of false positives, which has been an obstacle to the implementation of traditional ML.

In 2018, the IDx-DR device (Digital Diagnostics, USA) became the first autonomous AI-based diagnostic system to obtain approval from the US Food and Drug Administration; it met all predefined superiority endpoints for classification of more than mild diabetic retinopathy, with a sensitivity, specificity and imageability rate of 87.2%, 90.7% and 96.1%, respectively [54]. A subsequent study showed that classifications provided by the system compared well with those provided by three independent retinal specialists in a cohort of 1415 people with type 2 diabetes [55].

With the substantial variation in diabetic retinopathy screening systems in different healthcare systems, detection of moderate diabetic retinopathy or worse may not always be a suitable threshold for referral. For example, in countries such as Denmark, where referral is only needed for patients with STDR, this could lead to a 90% false positive rate [56]. In such countries, it will be pivotal to classify the exact stage of diabetic retinopathy (e.g. according to the ICDR scale) and, in particular, detect end-stages such as PDR. While this has not been addressed extensively, Tang et al demonstrated proof of concept that it is in fact possible to detect and localise retinal neovascularisation in diabetic retinopathy [57].
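The mismatch between an algorithm threshold of moderate diabetic retinopathy or worse and a referral policy restricted to STDR can be sketched as follows. The five ICDR levels are those of the scale described earlier, but the integer codes, the function names and the cohort are hypothetical and invented for illustration; the resulting share is not an estimate of the figure reported in [56].

```python
from enum import IntEnum


class ICDR(IntEnum):
    """Five ICDR severity levels; the integer codes are an assumption."""
    NO_DR = 0
    MILD_NPDR = 1
    MODERATE_NPDR = 2
    SEVERE_NPDR = 3
    PDR = 4


def flagged_by_algorithm(level):
    """Algorithm output threshold: moderate non-PDR or worse."""
    return level >= ICDR.MODERATE_NPDR


def needs_referral_stdr(level, has_dme):
    """Programme policy in which only STDR (PDR or DME) is referred."""
    return level == ICDR.PDR or has_dme


# Hypothetical cohort of (ICDR level, DME present); invented for illustration.
cohort = (
    [(ICDR.NO_DR, False)] * 6
    + [(ICDR.MILD_NPDR, False)] * 3
    + [(ICDR.MODERATE_NPDR, False)] * 4
    + [(ICDR.MODERATE_NPDR, True)]
    + [(ICDR.SEVERE_NPDR, False)] * 2
    + [(ICDR.PDR, False)]
)

flagged = [(lvl, dme) for lvl, dme in cohort if flagged_by_algorithm(lvl)]
unnecessary = [pair for pair in flagged if not needs_referral_stdr(*pair)]
unnecessary_share = len(unnecessary) / len(flagged)
print(f"{len(unnecessary)} of {len(flagged)} flagged patients do not have STDR")
```

Grading the exact ICDR level, rather than a single binary output, would allow each programme to apply its own referral rule to the same algorithm result.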

As wide-field imaging is increasingly used in diabetic retinopathy screening, it will also be important to test if the images produced are suitable for DL. Tang et al confirmed that this is the case for a DL system used for the detection of referable diabetic retinopathy in 9392 wide-field images from individuals from various geographic regions, which had accuracies of more than 90% [58].

While most studies on DL have addressed the classification of advanced levels of diabetic retinopathy, which may lead to referral and treatment, it would also be valuable to identify patients without diabetic retinopathy, as they comprise the vast majority of patients [59] and can safely undergo extended screening intervals [60]. This is, however, a more difficult task, as retinal microaneurysms are tiny and account for less than 0.5% of all pixels in retinal images [61]. Hence, so far this task has been performed with only a moderate sensitivity of 57% [61].

Detection of DME

In contrast to DL models for classification of diabetic retinopathy, detection of DME relies on segmentation to identify relevant lesions such as hard exudates or macular cysts at a pixel level. Consequently, ground truth annotation depends on every pixel of the image, as opposed to classification tasks, which require only an overall grading of the image.
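The difference between the two kinds of ground truth can be shown with a toy example. In this sketch, the image-level label is a single value, whereas the segmentation label is a binary mask with one entry per pixel. Agreement between a predicted and an annotated mask is summarised here with the Dice coefficient, which is a commonly used overlap measure chosen for illustration and not necessarily the metric used in the studies cited in this section. The masks and names are hypothetical.

```python
# Classification: one label for the whole image.
image_label = "DME present"

# Segmentation: one label per pixel (1 = lesion, 0 = background).
annotated_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
predicted_mask = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]


def dice_coefficient(predicted, truth):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    intersection = sum(
        p * t
        for pred_row, truth_row in zip(predicted, truth)
        for p, t in zip(pred_row, truth_row)
    )
    total = sum(map(sum, predicted)) + sum(map(sum, truth))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * intersection / total


score = dice_coefficient(predicted_mask, annotated_mask)
print(f"Dice coefficient: {score:.2f}")
```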

De Fauw et al elegantly demonstrated that DL was indeed able to identify DME and other causes of macular oedema in three-dimensional OCT images [62]. Using 14,884 scans, it was possible to construct an algorithm that, in many ways, had a lower error rate than a group of retinal specialists and trained optometrists. This was confirmed by Tang et al, who used 73,746 OCT scans to construct a multitask CNN to classify different types of DME, with an area under the receiver operating characteristic curve of >0.93 [63].
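The area under the receiver operating characteristic curve can be read as the probability that a randomly chosen scan with disease receives a higher score than a randomly chosen scan without disease. The sketch below computes it by direct pairwise comparison, which is equivalent to the usual curve-based definition; the scores and labels are invented for illustration and the function name is an assumption.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by pairwise comparison of positives and negatives.

    labels: 1 for disease present, 0 for disease absent.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs for six scans (three with and three without DME).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC: {auc:.3f}")
```

In this toy set, eight of the nine disease/no-disease pairs are ranked correctly. A value of 0.5 corresponds to chance and 1.0 to perfect ranking.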

Clinical integration and future perspectives

Even though remote diabetic retinopathy screening using portable fundus cameras was launched in Australia in 2000 [64], integration and implementation of in silico experiments have advanced slowly in the field of diabetic retinopathy. While ML-based diabetic retinopathy screening has been implemented to some extent in countries such as Scotland [65] and Portugal [66], nationally embraced DL algorithms are still lacking.

With regard to integration in clinical settings, it is paramount to acknowledge some of the challenges and outstanding issues. For DL-based classification of diabetic retinopathy, it is conceptually challenging that the diagnostic ability of the system does not depend on the traditional identification of diabetic retinopathy-associated lesions but rather on general pattern recognition, which is not necessarily understandable by human graders. This ‘black box’ phenomenon could obstruct clinical implementation, as clinicians may feel uncomfortable if they are not able to understand the rationale behind a given prediction [67]. Consequently, it would also be difficult to identify and correct any potential bias in the system.

Generalisability is another important outstanding issue. While DL-based algorithms have often produced impressive results, they are often trained on a specific dataset. Hence, integration in other settings would also require robustness in the populations of those settings, as diabetic retinopathy phenotypes often differ according to region and ethnicity [3]. Likewise, algorithms are often developed using high-quality retinal images, which may not be truly representative of real-life populations. This may lead to a high number of ungradable images, which would limit the financial gain of AI-based diabetic retinopathy screening.

The potential cost saving of grading images using AI-based diabetic retinopathy screening should also be balanced against the risk of unnecessary referrals of screen-positive patients. An example is the referral of those who have already been sufficiently treated for PDR. Such eyes would automatically be classified as screen positive by most algorithms, even though further treatment is not necessarily indicated. Conversely, there might be a risk of missing eyes with active PDR, as training data for algorithms have often been skewed towards lower levels of diabetic retinopathy. For instance, only 1.1–1.4% of the images in the dataset of Gulshan et al were classified as PDR, without any further distinction between active and inactive (i.e. photocoagulated) disease [51].

While the risk of missing STDR has consistently been demonstrated to be low using AI algorithms, it is also essential to detect other serious ocular diseases, such as glaucoma, age-related macular degeneration, choroidal melanoma and retinal detachment. These aspects have been successfully addressed by some [52] but not all DL-based algorithms. Even so, a number of rare retinal diseases may be missed by automated screening, as it is difficult to train neural networks to include all potential retinal pathologies. A possible solution would be algorithms trained to recognise retinal images without any pathology. These could then be used as an initial filter, leaving retinal images with diabetic retinopathy or other retinal pathologies to be screened by a human grader as a second step.
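The two-step approach described above could be organised as a simple triage rule. The sketch below assumes a hypothetical algorithm that outputs, for each image, a probability that no pathology is present; the threshold of 0.95, the image identifiers and the probabilities are invented for illustration and would need to be set and validated clinically.

```python
def triage(images, normal_threshold=0.95):
    """Split images into those cleared automatically and those sent to a grader.

    images: iterable of (image_id, probability_of_no_pathology) pairs.
    Only images the algorithm confidently considers normal are cleared;
    everything else, including uncertain cases, goes to a human grader.
    """
    auto_cleared, for_human_grading = [], []
    for image_id, p_normal in images:
        if p_normal >= normal_threshold:
            auto_cleared.append(image_id)
        else:
            for_human_grading.append(image_id)
    return auto_cleared, for_human_grading


# Hypothetical algorithm outputs.
images = [
    ("img-001", 0.99),
    ("img-002", 0.40),
    ("img-003", 0.96),
    ("img-004", 0.90),
]
cleared, to_grade = triage(images)
print("Cleared automatically:", cleared)
print("Sent to human grader:", to_grade)
```

With such a filter, the algorithm does not need to recognise every rare pathology; it only needs to be reliable when it declares an image normal.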

Although clinically implemented AI-based diabetic retinopathy screening, integration of mobile devices and ocular telemedicine are still in their infancy, the technology is evolving rapidly and there is little doubt that these concepts will be able to optimise diabetic retinopathy screening within a few years (see text box). Along with nationally implemented, systematic diabetic retinopathy screening programmes, embracing this technology will be a giant step towards the prevention of visual loss and blindness in diabetes.
