1 Introduction

The sense of sight is one of the most important senses in humans, and it is responsible for producing vision. According to a report by the World Health Organization (WHO) in 2012, the number of individuals with visual impairment (VI) was estimated to be 285 million (Pascolini and Mariotti 2012). Among this population, a significant proportion of 246 million were observed to have low vision (LV), while 39 million were identified as being blind. Moreover, the number of visually impaired individuals had increased to 2.2 billion by 2019 (see Figure 1). Out of these 2.2 billion cases, approximately one billion patients could have been cured if the disease had been identified in its early stages (World Health Organization, accessed December 24, 2022). The leading causes of visual impairment and blindness are ocular diseases such as trachoma, diabetic retinopathy, corneal opacities, and glaucoma. Retinal diseases progress slowly, and their symptoms are often not apparent to the subject in the early stages. Moreover, in the United States, fifty percent of patients were found to be unaware of their retinal conditions (Tham et al. 2014). Due to the high correlation between retinal diseases and blindness, considerable clinical research is being conducted toward screening retinal diseases in the early stages. Additionally, over the past decade, researchers have introduced computer-aided screening methods that help clinicians objectively extract retinal lesions and retinal layers, enabling reliable and accurate diagnosis of retinal diseases and helping to prevent avoidable blindness.

In this paper, we present a review of the clinical pathophysiologies related to different retinal diseases and how they are examined using various modalities. Furthermore, we provide insight into the development of artificial intelligence (AI) methods specifically designed to screen and grade retinal diseases using different retinal modalities, to aid ophthalmologists in their clinical practice.

Fig. 1

Global prevalence of people with vision impairment and preventable diseases, as depicted by WHO statistics

1.1 Contributions

The main contributions of this paper are two-fold:

  1. To the best of our knowledge, this is the first attempt to compile a comprehensive review of clinical and machine learning methods related to screening and grading different retinal diseases, such as age-related macular degeneration (AMD), diabetic retinopathy (DR), diabetic macular edema (DME), and glaucoma.

  2. This paper presents the detailed acquisition principles of fundus and OCT modalities in ophthalmology and how these non-invasive imaging schemes are used in clinical practice for visualizing retinal abnormalities in the early stages.

The rest of the paper is organized as follows: Section 2 presents the related works. Section 3 presents the methodology used to procure and discuss the state-of-the-art works that satisfied the eligibility criteria. Section 4 details the anatomy of the eye. Section 5 presents a detailed discussion of retinal imaging modalities. Section 6 discusses the most commonly occurring retinal diseases. Section 7 presents the benefits of AI for health economics. Section 8 discusses the involvement of digital technologies in ophthalmology. Section 9 presents the clinical studies related to screening retinal diseases. Section 10 discusses the technical studies related to designing AI models for screening retinal diseases. Section 11 highlights the advanced deep learning models that have recently been proposed to robustly and reliably screen retinal diseases. Section 12 sheds light on the retinal image datasets that have been publicly released to facilitate the research community working in retinal image analysis. Lastly, Section 13 presents a detailed discussion of the retinal image analysis works introduced over the past decade, and also sheds light on future directions that can be pursued by researchers working in this field. Apart from this, the complete organizational framework of the paper is shown in Figure 2.

Fig. 2

The organizational framework of this review paper

Fig. 3

The timeline of the major works that have been done by the researchers over the past decade to design robust AI models for screening different retinal diseases from the multi-modal imagery

2 Related works

Retinal image analysis is a widely researched topic in which clinicians and researchers have rigorously worked on proposing novel methods to screen retinal diseases, especially in the early stages. Most of these works formulate clinical and machine learning/AI-based methods for screening specific retinal diseases and analyze the pros and cons of different modalities for performing accurate retinal image analysis. A detailed discussion of all of these works is presented in the subsequent sections. Furthermore, a detailed timeline, demonstrating the major contributions of researchers toward designing robust AI models for screening different retinal diseases across multi-modal imagery, is shown in Figure 3.

2.0.1 Disease-specific studies

Nicholson et al. (2013) performed a systematic review to analyze the pathogenesis of central serous chorioretinopathy (CSR). The study (Khalil et al. 2014) presented a survey on detecting glaucomatous changes through fundus images, discussing preprocessing, feature extraction, feature selection, and machine learning (ML) techniques. Gupta and Karandikar (2015) reported a survey study analyzing automated techniques for DR diagnosis; a comparison was made between algorithms that detect various structural changes through fundus images, and a total of 13 studies were included in the survey. The review study (Das et al. 2016) analyzed diabetic macular edema (DME) management in Indian subjects. Muramatsu et al. (2018) presented a survey on the treatment of DME in Japanese subjects. A clinical and technical review of glaucoma diagnosis through fundus and OCT images was presented in (Naveed et al. 2017). Pead et al. (2019) presented a review study that evaluated ML and deep learning (DL) techniques for automated drusen detection in the context of AMD. The paper included only those studies that detected drusen in color fundus photography; a total of 14 articles were reviewed, and only the ML and DL methods presented in those studies were compared. Araki et al. (2019) presented a survey analyzing the effect of steroids on Japanese CSR subjects. Another clinical review (Van Rijssen et al. 2019) investigated the different treatments for CSC, including photodynamic therapy, laser treatment, and pharmacology. The survey (Lakshminarayanan et al. 2021) covered the five-year period from 2016 to 2021 and investigated automated techniques, including ML and DL approaches, for the detection of DR in fundus and OCT images; a total of 114 papers from the open literature were comprehensively reviewed. Another review study (Abdullah et al. 2021) compared automated ML techniques for detecting structural changes in fundus images; in addition, the authors discussed various fundus-related datasets (public and private). Sarki et al. (2020) comprehensively reviewed state-of-the-art approaches for the detection of diabetic and glaucomatous changes through fundus images, exploring image processing, ML, and DL techniques; the authors also reported the available datasets. The paper (Bala et al. 2021b) presented a clinical and technical survey of glaucoma diagnosis, reporting DL techniques for detecting pathological changes in fundus and OCT images. The study (Shahriari et al. 2022) discussed how artificial intelligence (AI) is being used to screen, diagnose, and categorize DME.

2.0.2 Modality-specific studies

Abramoff et al. (2010) presented a clinical review of retinal imaging trends; the paper also summarized the most prevalent causes of blindness, which include AMD, DR, and glaucoma, and covered both 2-D fundus imaging and 3-D OCT imaging techniques. Another study (Das and Malathy 2018) presented a clinical review of fundus images for detecting retinal diseases. The study (Kafieh et al. 2013) is modality-specific, reviewing image segmentation methods for processing retinal OCT images; the OCT segmentation approaches were classified into categories such as A-scan, B-scan, active contour, AI methods, 3D graphs, and 3D OCT volumetric approaches. Baghaie et al. (2015) reported the major issues related to OCT image analysis; more specifically, different techniques for noise reduction, image segmentation, and registration were discussed. Usman et al. (2017) provided an exhaustive review of various classes of image processing and computer vision techniques for detecting glaucoma, DR, and pathological myopia. The authors also reported the causes, symptoms, and pathological alterations of these diseases in OCT images, which can aid in the development of automated systems for the detection of retinal disorders; after an exhaustive examination and evaluation of the various methods, performance was assessed in terms of algorithmic precision. Khan et al. (2019) presented another modality-specific survey, covering automated techniques for extracting retinal vessels in fundus images. The techniques are categorized into supervised and unsupervised groups: supervised approaches are further classified into ensemble classification and neural network-based approaches, whereas unsupervised techniques are grouped into four classes, namely matched filtering, mathematical morphology, multi-scale techniques, and region-growing methods. A valuable comparison was made among the techniques reported on the publicly available datasets. In the article (Nuzzi et al. 2021), a clinical review of state-of-the-art applications of AI in ophthalmology was reported, which helps clinicians gain an overview of growing trends. The paper (Badar et al. 2020) was modality-specific, focusing on DL techniques for retinal analysis through fundus images; the review includes automated disease classification methods based on retinal pathological landmarks, evaluated using accuracy, F score, sensitivity, specificity, and area under the ROC curve on publicly available datasets. Angiography has gained popularity in the field of ophthalmology for the diagnosis of ocular diseases. Boned-Murillo et al. (2022) presented a survey study related to OCT-A in diabetic subjects, reporting deep learning techniques for the detection of retinal vascularization. Stolte and Fang (2020) performed a comprehensive survey of DR diagnosis covering both clinical and technical aspects; the paper also described the publicly available fundus and OCT datasets, and ML and DL frameworks were reviewed for the detection and classification of DR, although the fundus-related literature was reviewed more critically than the OCT and OCT-A literature. A systematic review of clinical and technical studies across many disorders and modalities is represented as a chord diagram (Figure 4), illustrating the relationships between the different categories.

Although there are many survey articles that report either modality-specific (Vujosevic et al. 2023; Ye et al. 2023) or disease-specific studies (Iannucci et al. 2023; Srivastava et al. 2023), to the best of our knowledge, there is no comprehensive survey article that covers the clinical and machine learning methods related to screening and grading different retinal diseases, such as age-related macular degeneration (AMD), diabetic retinopathy (DR), diabetic macular edema (DME), and glaucoma, across the different retinal examination modalities (Table 1).

Table 1 Summary of review studies related to retinal diseases
Fig. 4

Chord diagram showing the interactions between categories in a systematic review of clinical and technical studies across multiple diseases and modalities. The diagram highlights the relationships between subcategories of diseases (e.g., glaucoma, DR, AMD, DME, Cataract, and CSR) and modalities (e.g., Fundus, OCT, OCT-A, Adaptive optics, and FFA). The thickness of the chords represents the strength of the connections between the categories, with thicker chords indicating stronger connections.

3 Methods

3.1 Timeline

This paper presents a comprehensive review of the works related to retinal image analysis which have been completed in the past decade (i.e., from January 2013 to January 2023).

3.2 Eligibility criteria

The criteria we followed for including works in this paper are: (1) presence of clinical and experimental findings related to retinal diseases, such as DR, glaucoma, AMD, and DME; (2) formulation of machine learning models to extract retinal layers and biomarkers related to retinopathy; (3) development of machine learning models to detect retinal abnormalities from multiple modalities, such as fundus photography, fundus fluorescein angiography (FFA), optical coherence tomography (OCT), and OCT angiography (OCT-A). The main exclusion criterion was studies not related to the above-mentioned diseases and modalities.

3.3 Search strategy

The works presented in this paper were retrieved from internet sources, scientific reports, conferences, and journal articles. The included articles were searched in public repositories such as PubMed, Science Direct, IEEE Xplore Digital Library, Springer Link, and Google Scholar. The search was carried out with a combination of different keywords, such as diabetic retinopathy (DR), glaucoma, age-related macular degeneration (AMD), automated detection, mathematical retinal modeling, machine learning, deep learning, and advanced deep learning schemes. The search criteria were intentionally kept broad in order to encompass all peer-reviewed articles and reports that potentially meet the eligibility criteria.

3.4 Study selection process

Articles found in the primary search were evaluated for eligibility for inclusion in the review based on their relevance to the research question or topic. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, as shown in Figure 5.

3.5 Data extraction and synthesis

Following a thorough reading and summarization of the chosen publications, the key points and arguments from each paper were extracted and synthesized in a separate file. The following findings were extracted from the selected studies: pathological associations from clinical or experimental articles, and techniques, datasets, and results from the technical literature. Results include evaluation metrics such as accuracy, sensitivity, specificity, precision, recall, F1 score, and intersection over union. However, some of the discussed studies have limited data, as we could not access the full papers.
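For reference, the sketch below (our own illustration, not code from any reviewed study) shows how the above metrics are derived from a binary confusion matrix; for segmentation tasks, the same formulas are applied pixel-wise.

```python
# Minimal sketch: screening metrics from binary labels and predictions.
import numpy as np

def evaluation_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    sensitivity = tp / (tp + fn)                # recall / true positive rate
    specificity = tn / (tn + fp)                # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "iou": tp / (tp + fp + fn),             # Jaccard index
    }

print(evaluation_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```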

Fig. 5
figure 5

PRISMA flowchart for the inclusion of studies in the review of retinal diseases. The flowchart outlines the process of study selection and inclusion in the review, including the number of articles retrieved, screened, excluded, and included at each step

4 Eye anatomy

Human vision exists due to the eyes. The eye is a sensory organ that perceives visual information and sends it to the brain for interpretation. The eye is among the most complicated structures in the human body; it is spherical in shape and consists of three layers (Willoughby et al. 2010). The outermost layer of the eye is made up of the sclera and cornea (see Figure 6). The sclera is composed of connective tissues, which help maintain the eye's shape and protect the whole eyeball. The anterior-most part of the eye is the cornea, a protective transparent membrane that covers the iris and pupil. The human cornea has an average horizontal and vertical diameter of 11.5mm and 10.5mm, respectively. Beneath the sclera lies the choroid, which contains blood vessels that provide oxygen and nourishment to the whole eyeball (Hassan et al. 2016b). Moreover, the third and innermost layer of the eyeball is the retina, which contains the light-sensitive tissues responsible for producing vision. The retina comprises two main regions, i.e., the macular region (also known as the macula of the retina) and the peripheral region (Hassan et al. 2018a). Light enters the eye through the pupil, is focused by the biconvex lens, and lands on the retina, where the macular region uses the rod and cone cells in the macular center (called the fovea) to produce central vision. The peripheral region is responsible for producing side vision (Raja et al. 2020c). The other parts of the eye's middle layer are the choroid, the ciliary body, and the iris. The ciliary body supports the lens and also produces the aqueous humor. The movement of the pupil is regulated by the iris, which controls the amount of light entering the eye through contraction and relaxation (Hassan et al. 2015). Furthermore, the retina has ten layers that help translate visual data into the electrical signals that are sent to the brain. These electrical signals are transmitted from the retina to the brain through the optic nerve, situated near the optic nerve head (ONH) region of the peripheral retina.

Fig. 6

A Represents the visual system of a human eye, consisting of the eye, optic nerve, and visual cortex (Gilbert et al. 2019); B Showcases the structure of the human eye, which consists of three layers (Hugh Davson et al. 2022)

The visual information from the retina is transmitted to the brain in the form of neural impulses. The brain interprets the visual information from these neural impulses to perceive the context of objects. There are ten layers in the retina (see Figure 7), and each layer is responsible for a certain function, such as transforming light into electrical signals. The inner limiting membrane (ILM), which is made up of astrocytes and Müller cells, is the first retinal layer beneath the vitreous body. Retinal ganglion cells (RGCs) with axons make up the retina's second layer, also known as the retinal nerve fiber layer (RNFL). The 1.5 million retinal ganglion cell axons in the human eye converge at the optic nerve head (ONH), travel through the inner and outer neural canals, and finally exit the eye and enter the brain (Medeiros et al. 2012). The lamina cribrosa (LC) and Bruch's membrane opening (BMO) refer to the inner and outer neural canals, respectively. The LC is the innermost layer of the ONH; it is a network of capillaries that nourish the RGCs and a 3D network of elastic, porous connective tissues (Park and Park 2013). Its fenestrated trabeculae create a pathway for the egress of RGC axons and vascular tissue. The ganglion cell layer (GCL) follows the RNFL and contains the bodies of the ganglion cells. The inner plexiform layer (IPL) is situated posterior to the GCL and encompasses the synaptic connections between the dendrites of the ganglion, amacrine, and bipolar cells. The fifth retinal layer is the inner nuclear layer (INL), which consists of the cell bodies of the amacrine, bipolar, and horizontal cells. The next layer of the retina is the outer plexiform layer (OPL), which consists of a dense network of neuronal synapses between the dendrites of the horizontal and bipolar cells (from the INL) and the photoreceptor cells. The outer nuclear layer (ONL) consists of the rod and cone nuclei responsible for visual phototransduction. The human retina contains approximately 7 million cones and 75–150 million rods. Cones correspond to photopic vision, whereas rods are responsible for scotopic vision. Cones are concentrated in the fovea, while rods are distributed throughout the retina, with the exception of the fovea. Cones and rods undergo a chemical transformation that transmits electrical impulses to the nerves. Initially, the signals travel through the bipolar and horizontal cells, then the amacrine and ganglion cells, and finally, the optic nerve fibers to the brain. These neural layers are responsible for processing the incoming picture data: the rods supply the low-light signals, while the cones provide the point-wise data used to identify more complex features, including shapes, colors, contrasts, and motion. The outer limiting membrane (OLM) separates the photoreceptor cells' nuclei from their inner segments, and each photoreceptor cell's inner and outer segments lie in the IS/OS layer. The retinal pigment epithelium (RPE), located between the IS/OS layer and the choroid, is the outermost layer of the retina.

Fig. 7

The retinal layers within the human eye

5 Retinal imaging modalities

Several examination schemes have been proposed to identify and track the advancement of retinal diseases. Fundoscopy, also known as fundus photography, is a primary non-invasive retinal imaging scheme employed by ophthalmologists to examine the retinal fundus. The key areas of interest that can be evaluated through fundus photography are the macula, the optic disc region, and the peripheral and central retina. The fundus image, however, does not reveal any information about the pathological changes and early disease development within the retinal layers. For this purpose, imaging techniques like OCT/OCT-A, scanning laser polarimetry (SLP), and confocal scanning laser ophthalmoscopy (CSLO) are typically used in clinical practice. In contrast to CSLO, SLP, and other examination methods, OCT provides assessments of both the retinal layers and the ONH. Moreover, OCT has seen widespread usage as an imaging tool for identifying structural retinal abnormalities and tracking the progression of retinal diseases. A detailed discussion of some of the commonly used retinal imaging modalities is presented in the subsequent sections. Apart from this, a detailed summary of these retinal modalities is reported in Table 2.

5.1 Fundus imagery

Fundus photography is used to analyze the fundus of the retina (Tran et al. 2012). In fundus photography, specialized fundus cameras are utilized, which comprise an intricate microscope attached to a flash-enabled camera. The principle of fundus cameras is based on the concept of monocular indirect ophthalmoscopy (Fundus Photography Overview, 2022). A fundus camera typically shows a 30\(^{\circ }\) to 50\(^{\circ }\) view of the retinal fundus region with a wide-angle lens.

Fundus photography can be conducted utilizing chromatic filters or specialized contrast agents such as fluorescein and indocyanine green (Fundus Photography Overview, 2022). A color fundus photograph (CFP) is acquired when the retina is illuminated by white light. A filter is utilized during red-free fundus photography so that superficial lesions and certain vascular anomalies inside the retina and the surrounding tissue can be clearly observed. Fundus images are widely used by ophthalmologists for screening retinal diseases such as DR, glaucoma, AMD, and DME (Son et al. 2022). However, as the fundus photograph is two-dimensional, it does not provide visualization of the retinal cross-section or the retinal layers.

5.2 OCT imagery

OCT is a non-invasive imaging modality that employs low-coherence light to produce high-resolution cross-sectional retinal images (Drexler and Fujimoto 2008). Apart from ophthalmology, OCT imagery also has a wide range of clinical applications in cardiology, dermatology, oncology, and gastroenterology (Adam et al. 2007). OCT imagery is rapidly becoming an essential tool for obtaining a 3-D cross-sectional representation of the retina and is, therefore, the most commonly used retinal examination method in clinical practice. The Michelson interferometer is the apparatus used to acquire OCT scans; it measures the sample's spatial location not through the passage of time but by means of light waves (in the near-infrared spectrum). The utilization of a superluminescent diode as the source is preferred in OCT systems due to its broadband spectrum. Moreover, the coherence length of the emitted light determines the axial resolution of the OCT scans. There are two common approaches to acquiring OCT imagery. The first is time-domain OCT, dubbed TD-OCT. In TD-OCT, the depth range of the apparatus is sampled one point at a time by shifting the location of the reference mirror in order to create a longitudinal scan, also known as an axial scan (A-scan). However, in order to achieve an A-scan, the reference mirror must be displaced mechanically one cycle at a time. The utilization of the Fourier transform facilitated the transition from conventional TD-OCT to the spectral-domain OCT (SD-OCT) implementation. In SD-OCT, a spectrometer is used instead of a single detector to quantify the spectral modulations caused by interference between the reference reflection and the sample reflection, while the reference mirror is kept fixed (Huang 2009). A Fourier transform is then used to convert the spectral modulations to depth information in order to obtain the A-scans. The working schematics of TD-OCT and SD-OCT are shown in Figure 8 (A and B), respectively.
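As a rough illustration of the SD-OCT principle (a minimal numerical sketch under idealized assumptions of our own choosing: uniform sampling in wavenumber, no dispersion, and two point reflectors; it does not model any specific device), the depth profile of an A-scan can be recovered by applying an inverse Fourier transform to the spectral interferogram:

```python
# Illustrative SD-OCT A-scan reconstruction via inverse FFT.
import numpy as np

n_pixels = 2048                              # spectrometer pixels (assumed)
k = np.linspace(7.5e6, 8.5e6, n_pixels)      # wavenumber grid in 1/m (~800 nm band)
depths = np.array([150e-6, 320e-6])          # two sample reflectors (m)
reflectivity = np.array([0.8, 0.4])

# Spectral interferogram: DC term plus one cosine fringe per reflector.
spectrum = 1.0 + sum(r * np.cos(2 * k * z) for r, z in zip(reflectivity, depths))

# Remove DC, apply a window to suppress side lobes, then inverse FFT.
fringes = (spectrum - spectrum.mean()) * np.hanning(n_pixels)
a_scan = np.abs(np.fft.ifft(fringes))[: n_pixels // 2]   # keep positive depths

print("strongest depth bins:", np.argsort(a_scan)[-2:])  # peaks mark the reflectors
```

Deeper reflectors produce faster spectral fringes, which is why the Fourier transform maps fringe frequency directly to depth without any mechanical scanning of the reference mirror.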

Fig. 8

Illustration of the time-domain and spectral-domain OCT principles. A In TD-OCT, the reference mirror moves across a distance. B In SD-OCT, the reference mirror remains fixed, and a spectrometer detects the spectral variation

5.3 Optical coherence tomography angiography

Optical coherence tomography angiography (OCT-A) is another non-invasive retinal examination modality. It is a dye-free, OCT-based imagery that provides volumetric visualization of the retinal and choroidal vasculature (Park et al. 2016). OCT-A relies on repeated scans of the same area to identify movement. In 2006, Makita et al. (2006) first described OCT-A utilizing an SD-OCT device with an acquisition rate of 18.7 kHz. With further improvements in OCT hardware, higher-quality OCT angiograms can be generated with fewer image artifacts.

OCT-A is a novel variant of OCT; it has the capability to generate 3D angiograms of the retina and choroid with high resolution. Additionally, it can detect sub-retinal neovascular blood vessels. To analyze structural changes in various retinal diseases, ophthalmologists are increasingly turning their attention toward OCT-A. The main advantage of OCT-A over other modalities is its ability to visualize the retinal microvasculature, which greatly helps in screening retinal diseases like choroidal neovascularization (CNV) (Roisman and Goldhardt 2017), glaucoma (Igarashi et al. 2017), and DR (Schaal et al. 2019).
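Conceptually, the motion contrast underlying OCT-A can be illustrated by the inter-scan decorrelation of repeated B-scans: static tissue produces nearly identical signals across repeats, whereas flowing blood decorrelates them. The sketch below is our own simplified amplitude-decorrelation illustration (assuming co-registered, amplitude-only B-scans), written in the spirit of published OCT-A algorithms rather than the method of any particular device:

```python
# Simplified OCT-A motion contrast via inter-B-scan amplitude decorrelation.
import numpy as np

def decorrelation_map(bscans):
    """bscans: array of shape (N, H, W), N repeated co-registered B-scans
    of the same retinal slice. Returns an (H, W) map where static tissue
    yields values near 0 and flowing blood yields values near 1."""
    bscans = np.asarray(bscans, dtype=float)
    n = bscans.shape[0]
    d = np.zeros(bscans.shape[1:])
    for i in range(n - 1):
        a, b = bscans[i], bscans[i + 1]
        d += 1.0 - (2 * a * b) / (a**2 + b**2 + 1e-12)  # pairwise decorrelation
    return d / (n - 1)

# Toy usage: 4 repeats of a 64x64 B-scan with fluctuating "flow" in one patch.
scans = np.ones((4, 64, 64))
scans[:, 20:30, 20:30] += np.random.rand(4, 10, 10)
d = decorrelation_map(scans)
print(d[25, 25] > d[5, 5])  # True: the fluctuating region decorrelates
```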

5.4 Multi-modal AI systems

The field of ophthalmology has made great strides recently with the development of multi-modal AI systems for retinal image processing, which are transforming the diagnosis and treatment of retinal diseases. Through the integration of data from multiple imaging modalities, including fundus photography, angiography, and optical coherence tomography (OCT), these systems provide a more nuanced and comprehensive view of the complex structures found within the human retina (Zhang 2023). These holistic approaches not only aid in the early diagnosis of retinal disorders but also permit a more exact characterization of diseases such as diabetic retinopathy, glaucoma, and age-related macular degeneration.
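To make the integration idea concrete, the sketch below is our own illustrative example (a hypothetical late-fusion design with arbitrary layer sizes, not an architecture from the cited works) in which fundus and OCT inputs are encoded by separate branches and their features are concatenated before classification:

```python
# Illustrative two-branch late-fusion classifier for fundus + OCT inputs.
import torch
import torch.nn as nn

class MultiModalNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        def encoder():  # small CNN encoder, one instance per modality
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fundus_enc = encoder()
        self.oct_enc = encoder()
        self.head = nn.Linear(32 + 32, n_classes)  # fuse by concatenation

    def forward(self, fundus, oct_scan):
        fused = torch.cat([self.fundus_enc(fundus), self.oct_enc(oct_scan)], dim=1)
        return self.head(fused)

# Toy usage with random single-channel 128x128 inputs for both modalities.
model = MultiModalNet()
logits = model(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 128, 128))
print(logits.shape)  # torch.Size([2, 4])
```

Late fusion is only one design choice; intermediate (feature-level) or attention-based fusion is also common when the modalities are spatially registered.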

Moreover, an important development in the diagnosis and monitoring of retinal diseases is the increased accuracy of multi-modal retinal layer segmentation methods. These multi-modal AI systems are excellent at picking up minute details within the retinal architecture, which enables a more thorough evaluation of the underlying pathology. Furthermore, these systems exhibit a noteworthy capacity to detect early indicators of pathology that clinicians may miss under heavy workloads. Similarly, the integration of multi-modal data can significantly help doctors give a reliable retinal diagnosis and enables more individualized treatment regimens catered to the needs of each patient (Wang 2022).

Furthermore, a proactive approach to maintaining retinal health is facilitated by the predictive powers of multi-modal AI systems. These methods help in predicting the course of a disease and its response to treatment by evaluating longitudinal data and identifying minute changes over time. These multi-modal AI systems also facilitate patient care while supporting current attempts to provide preventive measures against retinal disorders (Li 2024). All things considered, current advancements in multi-modal AI systems for retinal image processing are extremely promising for promoting a new era of precision medicine in ophthalmology and boosting both clinical practice and research.

Table 2 Summary of imaging modalities used in clinical practice for the screening, diagnosis, and progression monitoring of various ocular diseases

6 Major retinal diseases

6.1 Diabetic retinopathy

Diabetic Retinopathy (DR) is a pathological condition of the eye that results from chronically elevated blood glucose levels, leading to abnormalities in the retina (Klein et al. 1984). DR is a leading cause of visual impairment and blindness all over the world. It is a chronic and degenerative disease that poses a significant challenge due to its asymptomatic nature during the early stages. The severity of DR is determined by the quantity and types of lesions observable on the retinal surface. The human retina comprises diverse constituents, including blood vessels, the fovea, the macula, and the optic disc (OD). DR is commonly categorized into two stages: non-proliferative DR (NPDR) and proliferative DR (PDR). NPDR is characterized by the impairment of blood vessels within the retina, leading to the leakage of fluid onto the retinal surface (Crick and Khaw 2003), which results in the swelling and moistening of the retina. NPDR may present with various manifestations of retinopathy, including microaneurysms (MAs), hemorrhages (HMs), exudates (EXs, both hard and soft), and intra-retinal microvascular abnormalities (IRMA) (Robert 1995). PDR is a severe form of DR in which new aberrant blood vessels sprout in various parts of the retina, potentially causing complete blindness. As shown in Figure 9, the NPDR lesions can be MAs, HMs, or EXs. MAs are the earliest detectable indication of DR, and they form when fluid leaks out of the retina's tiny blood capillaries; they are small, round, and red in color. The breakdown of MA walls results in HMs. Blot HMs are larger red lesions, while dot HMs appear as bright red dots (Sjølie et al. 1997). EXs are yellow spots on the retina caused by the leakage of blood containing lipids and proteins. If the lipid accumulation is on or close to the macula, it can result in permanent blindness. Both MAs and HMs are classified as dark lesions, while EXs are considered bright lesions (Robert 1995).

Fig. 9

A fundus scan of a DR subject, showing MAs, EXs, and HMs

6.2 Age-related macular degeneration

Age-related Macular Degeneration (AMD) is a chronic retinal condition that typically impacts both eyes and arises from a metabolic disorder (de Jong et al. 2020). The condition manifests within the macula, the region of the retina that is central to visual acuity. The etiology of this type of maculopathy, which ranks second in terms of prevalence, remains incompletely elucidated. Experts believe that macular degeneration develops when there is an issue with the extremely high-energy metabolic processes that occur in the retina's sensory cells. The body has evolved to handle these reactions and eliminate the metabolic byproducts; if the body is unable to process these compounds, however, they accumulate in the form of drusen. These deposits prevent the retina from getting enough oxygen and nutrients. Drusen growth behind the retina causes age-related macular degeneration, which typically affects elderly people. Due to the thinning or atrophy of the RPE layer caused by the drusen, the subject's central vision becomes blurred, and straight lines within normal vision appear distorted. Pathological AMD symptoms on fundus and OCT images are depicted in Figure 10. AMD can cause significant visual defects or even irreversible loss of central vision, but it does not cause complete blindness on its own (Seltman 2021). The clinical classification of AMD divides the condition into two sub-types: dry AMD and wet AMD. In dry AMD, also known as non-exudative AMD, drusen form under the retina. In the early stages of the disease, small drusen deposits do not impair vision; nevertheless, they do promote RPE atrophy and the creation of scars, both of which contribute to the gradual dimming and distortion of central vision as the disease advances. If dry AMD is not treated, it may progress to wet AMD, also known as exudative AMD. Wet AMD, characterized by choroidal neovascularization, occurs when aberrant blood vessels in the choroid leak fluid and blood into the retina near the macula. The fluid leakage causes blind spots and a wavy appearance of straight lines (Table 3).

Fig. 10

A Fundus scan with drusen, B corresponding foveal B-scan with intra-retinal fluid (IRF), sub-retinal fluid (SRF), and RPE atrophy

Table 3 Summary of the symptoms, stages, and severity of different retinal diseases

6.3 Glaucoma

Glaucoma is a multifaceted and intricate retinal condition that can result in irreversible vision loss if not treated in time. Glaucoma is typically attributed to elevated intraocular pressure (IOP) exceeding 24 mm Hg, although it can also manifest in eyes with IOP levels within the normal range of less than 20 mm Hg. The elevation of intraocular pressure within the anterior chamber is attributed to the obstruction of fluid outflow or a narrowing of the drainage angle. When there is an obstruction in the trabecular meshwork, fluid accumulates in the anterior chamber, resulting in increased pressure on the posterior chamber. The nerve fibers are pressurized by the vitreous body, leading to the eventual loss of ganglion cells. This results in the thinning of the ganglion cell complex (GCC) and the enlargement of the optic cup, as depicted in Figure 11. The detection of glaucoma is facilitated by utilizing the thickness profiles of the RNFL, GCL, and IPL layers, which together constitute the GCC, as illustrated in Figure 11 (B).
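As an illustration of how such thickness profiles are derived in practice, the sketch below is our own minimal example (it assumes the layer boundaries have already been segmented from an OCT B-scan; the function name, array layout, and axial resolution are all hypothetical choices): the GCC thickness per A-scan is simply the axial distance between the top RNFL boundary and the bottom IPL boundary.

```python
# Minimal GCC thickness computation from pre-segmented OCT boundaries.
import numpy as np

def gcc_thickness_um(rnfl_top, ipl_bottom, axial_res_um=3.9):
    """rnfl_top, ipl_bottom: (W,) arrays giving the row index (in pixels)
    of each boundary surface per A-scan column. Returns the GCC
    (RNFL + GCL + IPL) thickness in micrometres for each A-scan."""
    return (np.asarray(ipl_bottom) - np.asarray(rnfl_top)) * axial_res_um

# Toy example: a uniform 25-pixel-thick GCC (~97.5 um) across 512 A-scans.
profile = gcc_thickness_um(np.full(512, 100), np.full(512, 125))
print(profile.mean())  # thinning relative to normative values flags glaucoma
```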

Fig. 11

The structural changes due to glaucoma are shown in the fundus and OCT scans

6.4 Pathologic myopia

Pathologic myopia, commonly referred to as degenerative or malignant myopia, is an advanced and progressive type of nearsightedness distinguished by the elongation of the eyeball (Ohno-Matsui 2021). This elongation results in several structural alterations within the eye, such as the thinning of the sclera (the outer white covering of the eye) and the stretching of the retina. The importance of pathologic myopia resides in its capacity to induce permanent harm to the eye and, ultimately, result in loss of vision. Pathologic myopia, unlike regular myopia, cannot be easily corrected with glasses or contact lenses and poses a greater risk to the health of the eyes (Ikuno 2020).

Myopic macular degeneration is a significant consequence that can arise from pathologic myopia. This problem arises when retinal thinning and elongation result in the development of anomalous blood vessels in the macula, the central region of the retina responsible for sharp vision. These atypical blood vessels have the potential to leak fluid and blood, resulting in the formation of scar tissue and substantial impairment of vision (Ikuno 2020). Individuals with pathologic myopia may experience a progressive deterioration of their central vision, leading to difficulties with skills such as reading and recognizing faces (Ikuno 2020).

Apart from this, the elongation of the eyeball in pathologic myopia can cause detachment of the retina from the underlying tissue, which is a medical emergency. Untreated retinal detachment can lead to lifelong blindness due to a sudden and severe impairment of vision. The retinal alterations linked to pathologic myopia render individuals more vulnerable to this vision-endangering condition (Xu 2019).

Early detection and management are crucial in addressing the challenges posed by pathologic myopia. Regular eye examinations, especially for individuals with a family history of severe myopia, can help monitor the progression of the condition and implement interventions to minimize its impact (Ohno-Matsui 2021). Treatment options may include corrective lenses, prescription medications, or surgical procedures, depending on the severity of the condition. The significance of understanding and addressing pathologic myopia lies in the potential to preserve vision and prevent the devastating consequences that can lead to blindness (Xu 2019).

6.5 Retinopathy of prematurity (ROP)

Retinopathy of Prematurity (ROP) is a serious retinal condition that primarily affects premature newborn babies, especially those who are born before 31 weeks of gestation and weigh less than 1500 g at birth (Smith 2021). The cause of this disease is the incomplete development of the blood vessels within the retina. In a full-term pregnancy, retinal blood vessel growth happens in the womb and is typically finished by the time the baby is born (Garcia 2020). However, the development of these vessels is disrupted in premature infants, which results in the onset of ROP.

Two phases are usually involved in the evolution of ROP. In the first phase, referred to as the vaso-proliferative phase, the undeveloped blood vessels cause the body to respond by growing new, aberrant vessels. While this phase aims to supply more oxygen, it frequently leads to the creation of flimsy, aberrant blood vessels that may hemorrhage or detach the retina. The second phase, known as the fibrovascular proliferation phase, is characterized by the growth of scar tissue, which, if ignored, may result in significant visual impairment or even blindness (Jones 2022).

Low birth weight, premature birth, supplemental oxygen therapy, and other conditions that upset the delicate equilibrium of oxygen delivery to the developing retina are risk factors for ROP. In order to mitigate the possible effects of ROP, prompt screening and management are essential (Jones 2022). The eyes of premature babies should be routinely examined by ophthalmologists, and in order to prevent further issues and preserve vision, treatment options may include surgery or laser therapy (Smith 2021). ROP remains a major problem despite advancements in diagnosis and treatment, highlighting the need for specialist care for premature newborns who are at risk of developing this disorder.

7 Benefits of AI on health economics and on screening retinal diseases

The use of artificial intelligence (AI) models in the field of health economics can fundamentally transform the identification and prediction of retinal diseases, which in turn can significantly reduce the socioeconomic consequences of blindness worldwide. By utilizing AI algorithms to precisely screen retinal disorders in their early phases, healthcare organizations might greatly diminish the financial strain linked to complex instances that require more extensive and costly treatments (Ting 2017). Utilizing AI in the screening of retinal diseases enables prompt detection of anomalies, offering a cost-efficient approach to implementing preventative measures and therapeutic interventions prior to the escalation of symptoms.

An important socioeconomic advantage of AI models in retinal disease screening is their ability to improve access to healthcare services, particularly for underserved populations (Keel et al. 2019). AI technologies can be deployed in remote or resource-constrained regions, where conventional healthcare infrastructure may be deficient. The widespread availability of early detection tools allows people from various socioeconomic backgrounds to access timely and effective interventions (Ting 2017). This helps prevent a significant number of cases of blindness, which could otherwise lead to higher healthcare expenses and societal challenges.

Moreover, the economic benefits also include a long-term decrease in the healthcare costs linked to blindness. AI-based retinal screening enables prompt intervention, which can result in more manageable treatment strategies and improved patient outcomes. Consequently, the burden on healthcare systems, such as the costs associated with rehabilitation, long-term care, and disability assistance, is reduced. The socio-economic impact is dual, as it has the ability to enhance both individual welfare and the overall effectiveness of healthcare resource distribution (Keel et al. 2019).

Nevertheless, it is crucial to tackle ethical considerations, data privacy problems, and possible disparities in access to AI-powered healthcare solutions. To fully realize the socioeconomic benefits of breakthroughs in retinal disease screening, it is vital to ensure the fair distribution of AI technology and promote appropriate deployment strategies (Ting 2017). As we explore the convergence of AI and health economics, it is crucial to prioritize inclusion and ethical considerations to ensure that the positive impact on blindness prevention is accessible and useful to various groups globally (Ting 2017).

8 Involvement of digital technologies in ophthalmology

The international health care organizations aim to provide action-oriented, results-driven approaches for advancing health equity by improving the quality of care provided to minority and other underserved communities (Hill-Briggs et al. 2021). Healthcare organizations have increasingly acknowledged the presence of healthcare disparities across race/ethnicity and socioeconomic status, but significantly fewer have made health equity for diverse patients a genuine priority (Chin 2016). The lack of financial incentives is a major barrier to achieving health equity. The focus of healthcare organizations is now to report clinical findings based on race, ethnicity, and socioeconomic status in order to provide preventive care and primary care facilities all over the world. Social determinants of health (SDOH) have emerged as a primary focus of intervention in the quest for health equity as the healthcare system shifts toward a greater emphasis on population health outcomes and value-based treatment. Clinical ophthalmology studies have recently shifted their focus to SDOH in order to better understand and promote opportunities for community health improvement.

9 Current clinical practices in screening retinal diseases

In recent years, extensive research in clinical settings has been carried out to improve clinical diagnostic capabilities, including the consideration of risk factors, phenotypes, therapies, and drug management strategies for treating retinal diseases. A review of the current clinical practices being followed to effectively screen the major retinal diseases, such as DR, AMD, and glaucoma, is presented in the subsequent sections.

9.1 Current clinical practices in screening DR

There are additional social and economic costs resulting from diabetes patients' inability to work. Understanding and reducing the effects of SDOH in diabetes is a top priority due to the disease's incidence, economic costs, and disproportionate population burden (Haire-Joshu and Hill-Briggs 2019; Hill et al. 2013). Hill-Briggs et al. (2021) presented a systematic review discussing the associations of SDOH with diabetes risk and outcomes, as well as the results of programs designed to improve SDOH and their effect on diabetes outcomes; the article also provides a brief introduction to key terms and various SDOH frameworks. Apart from this, blindness due to DR is mostly caused among the 20-to-60-year age group. Toward this end, one study (Mistry et al. 2022) assessed the etiologic factors in a birth cohort and technology use among children, and evaluated the drug management of Type-II diabetes in adolescents. Another study (Pacaud et al. 2016) used an international database to characterize the population of children with various forms of diabetes (non-type I); it was concluded that Type-II diabetes is more common but still difficult to diagnose worldwide. However, better management and outcomes for patients with uncommon kinds of diabetes may be achieved through collaboration with eye hospitals and clinics. In addition to this, another study (Tosur and Philipson 2022) summarized the history of maturity-onset diabetes of the young (MODY) and how it can be effectively treated.

The lipopolysaccharide (LPS) found on the outer membrane of gram-negative bacteria is responsible for triggering the host's immune system and leading to systemic inflammation and cellular apoptosis. Patients with advanced diabetes have been reported to have high serum LPS levels, most likely as a result of intestinal permeability and dysbiosis. Consequently, there is substantial evidence that systemic LPS challenge is closely linked to the prognosis of DR. Even though the underlying molecular mechanisms are not yet fully explored, LPS-related events in the retina may exacerbate the vasculopathy and neurodegeneration of DR. Qin and Zou (2022) presented a review focusing on how LPS affects the development of DR, especially its effects on the blood-retina barrier and on glial activation. They also summarize recent improvements in therapeutic strategies for blocking the effects of LPS, which could be used to treat DR with good clinical promise. It has been suggested that intestinal dysbiosis plays a contributing role in the development of type 2 diabetes (T2D) (Sharma and Tripathi 2019). The review study (Yang et al. 2021) provides an overview of the gut microbiota in T2D and associated diseases, focusing on its possible features and molecular pathways in relation to intestinal barrier breakdown, metabolic abnormalities, and chronic inflammation. The authors concluded by summarizing a therapeutic strategy for improving the malignant progression of T2D and related disorders through intestinal microecology, with an emphasis on influencing gut bacteria. The goal of the study (Pasini et al. 2019) was to find out how long-term exercise affects the gut flora and leaky gut in people with stable T2D. Exercise helps control blood sugar levels by changing the gut microbiota and its functions; this indicates an additional mechanism of exercise and suggests that boosting gut flora could be a key part of tailor-made treatments for T2D. The putative roles of pyroptosis-signaling pathways in the pathophysiology and impact of DR development are discussed in detail in the review study (Al Mamun et al. 2021), which also briefly describes pharmacological drugs that might be useful in the future treatment and management of DR.

The vascular endothelial growth factor (VEGF) family consists of the five ligands for the VEGF receptor (VEGFR): VEGF-A, -B, -C, -D, and the placental growth factor (PlGF). VEGF-A binds VEGFR1 and VEGFR2, while VEGF-B and PlGF only bind VEGFR1. Even though a lot of research has been done on VEGFR2 to figure out its main role in retinal diseases, recent work has shown that VEGFR1 and its family of ligands are also important, playing a role in microinflammatory cascades, vascular permeability, and angiogenesis in the retina (Uemura et al. 2021). VEGFR1 signaling alone leads to the pathological changes seen in DR, retinopathy of prematurity, retinal vascular occlusions, and AMD. Anti-VEGF medicines have shown remarkable clinical efficacy in various diseases, and their effect on modulating VEGFR1 signaling remains a fertile area for future research. Upregulation of VEGF-A in the diabetic eye has been linked to DR progression. The study (Singh et al. 2019) presented a review of anti-VEGF treatments for DR that have been approved for use in the USA. An improvement of two steps on the DR severity scale developed for the Early Treatment Diabetic Retinopathy Study is regarded as clinically meaningful; after one year of medication with ranibizumab or aflibercept, about one-third of individuals with DR and DME obtain this level of improvement. Another study (Huang et al. 2022a) presents novel concepts for the prevention and treatment of DR.

Research is ongoing to find the association of diabetes and its risk factors with other medical conditions. Xiong et al. (2022b) investigated the IOP changes and acute angle closure (AAC) risk in diabetic patients after pupil dilatation; diabetic patients were at a reduced risk of acquiring AAC after pupil dilatation, and increased post-dilation IOP was associated with lower pre-dilation IOP. The study (Kjærsgaard et al. 2022) was presented to determine whether or not DR is linked to, and indicative of, primary open-angle glaucoma (POAG); no significant links were found between DR and either the prevalence or incidence of POAG. The purpose of the study (Vergroesen et al. 2022) was to assess whether diabetes medication is linked to the prevalent eye disorders of AMD, OAG, and cataract, as well as to evaluate these diseases' cumulative lifetime risks in a large cohort study. The findings of the cohort analysis indicate that diabetes medication was not connected with cataracts, despite the fact that diabetes itself was clearly associated with cataracts. Metformin treatment was associated with a lower risk of OAG, and other diabetes medications were associated with a lower risk of AMD; interventional clinical trials are required to demonstrate the efficacy of such treatment. Other published studies (Cui et al. 2022; Zhang et al. 2022f; Jiang et al. 2022; Yongpeng et al. 2022; Cao et al. 2022b; Kulshrestha et al. 2022; Peled et al. 2022; Eton et al. 2022) have also investigated the connection between diabetes and other ocular disorders. Researchers have been looking into the effects of commonly occurring comorbidities like diabetes as a result of the rapidly spreading coronavirus disease 2019 (COVID-19) pandemic. Although diabetes does not appear to raise the incidence of COVID-19 infection, it has been shown that hyperglycemia of any degree predisposes to worse outcomes, including more severe respiratory involvement, ICU admissions, the requirement for ventilators, and mortality. Infection with COVID-19 has also been linked to the development of new-onset diabetes and hyperglycemia, as well as a worsening of glycemic control in pre-existing diabetes (Xiong et al. 2022b). Researchers have hypothesized that this is related to the virus damaging the pancreas directly, the body's stress response to infection, and the use of diabetogenic medicines such as corticosteroids to treat severe COVID-19 infections. Patients diagnosed with mild COVID-19 may continue to take the majority of diabetes drugs, while switching to insulin is the treatment of choice for those diagnosed with severe conditions. Diabetes and periodontal disease both exhibit the same pattern of inflammation: both of these diseases, if not addressed, can cause a cytokine storm, which spreads pro-inflammatory substances all over the body (Stoica et al. 2022). Periodontitis has recently been considered the sixth complication of diabetes, and the most recent studies point to an undeniable relationship between these two disorders. Recent scientific research suggests that better glucose control in diabetes patients may be possible if their periodontal health is managed by appropriate and timely medication. New evidence of central visual system damage in diabetes patients was revealed in a recently published study (Chen et al. 2022b): diabetes can cause damage to the peripheral sensory organs and the central visual system, which can result in decreased color vision.
Adhesive capsulitis (AC) occurs more frequently and lasts longer in diabetic patients compared to patients with idiopathic AC. Gordon et al. (2022) presented a study whose goal was to determine how gene expression differs in AC with and without diabetes mellitus. The analysis of RNA-sequencing data showed that 66 genes were differentially expressed between nondiabetic and diabetic patients with AC, while only three genes were differentially expressed between control patients with and without diabetes. In addition, 286 genes were found to have differential expression in patients with idiopathic AC, while 61 genes were found to have differential expression in patients with diabetic AC. The newly identified genes provide an explanation for the dissimilarities in disease progression and offer potential therapeutic targets that could lead to alternative treatment strategies for the two groups. This study presented the use of ribonucleic acid (RNA) sequencing and analytics to examine gene expression in alveolar bone in healthy and diabetic subjects. The study (Zhu et al. 2020) was presented to investigate candidate genes involved in T2D. The Gene Expression Omnibus (GEO) database was used to obtain the gene expression profile GSE26168, and differentially expressed genes (DEGs) were obtained using the online tool GEO2R. Metascape was used for annotation, visualization, and comprehensive discovery, and for performing Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) term enrichment analyses. Cytoscape was used to build the protein-protein interaction (PPI) network of the DEGs and to identify prospective genes and important pathways. A total of 981 DEGs were identified in T2D, including 301 upregulated and 680 downregulated genes, and six potential genes (PIK3R1, RAC1, GNG3, GNAI1, CDC42, and ITGB1) were selected based on the DEGs' PPI network. There are other studies investigating the genes affecting diabetes (Sufyan et al. 2021; Lei et al. 2021; Pujar et al. 2022; Prashanth et al. 2021; Dieter et al. 2021; Chen et al. 2022c; Oraby et al. 2022). The study (Nair et al. 2020) aims to find a robust new set of symptoms so that DR screening can be done automatically; a new symptomatic instrument based on data-driven deep learning was built and tested for automated DR detection. The system used a shading technique on fundus images and ranked them based on whether or not they exhibited DR, allowing for the easy identification of clinically relevant cases for referral. While the complications of DR have been extensively studied, less attention has been given to the impact of diabetes on ocular surface health. While diabetic keratopathy can be a serious threat to one's eyesight, it can also serve as a diagnostic and therapeutic window for other diabetic systemic problems. The review article (Bu et al. 2022) discussed the current knowledge of diabetic ocular surface illness, which includes neuropathy, dry eye, and other corneal morphological alterations. The authors also addressed several topics that have received less attention in the existing literature, including problems of the ocular surface in pre-diabetic stages as well as variations in ocular surface pathology between human diabetics and animal models of diabetes. In addition, the authors highlighted recent breakthroughs in experimental models of diabetic ocular surface problems.
Finally, the most recent approaches to the diagnosis, therapy, and monitoring of ocular surface diseases caused by diabetes were analyzed, and future research prospects were described, including the recent development of a technique known as protein microarrays, which has the potential to be utilized in the diagnosis and management of diabetic ocular surface disease. Traditional dilated ophthalmoscopy has been used for the initial screening of diabetic symptoms. However, as DR is a progressive disease, fundus examination does not provide details of the structural changes within the retina. Technological improvements in retinal imaging have allowed for more accurate diagnosis and treatment of DR. The review study (Saleh et al. 2022) discussed the several imaging techniques that can be used to diagnose, detect, and grade DR. Schreur et al. (2022) reviewed the current imaging modalities (CFP, OCT, OCT-A, FFA, UWFP) for DR diagnosis; it was suggested that integrating data from multiple imaging techniques could lead to more precise diagnosis, treatment planning, and disease monitoring.

Summary DR is the leading cause of blindness in people of working age, and its burden is growing as the number of people with diabetes rises. Imaging modalities including CFC, OCT, and OCT-A are used for DR screening and diagnosis. Existing treatments for DR target inflammation, angiogenesis, and oxidation, but they are not effective enough to cure the disease completely. Researchers are therefore working to identify new genes and risk factors for DR.

9.2 Current clinical practices in screening AMD

The aging of the population has contributed in recent years to the rise in the number of patients diagnosed with ocular disorders. AMD is one of the most prevalent of these and, if left untreated, can result in complete blindness. Toward this end, the purpose of the study (Heesterbeek et al. 2020) was to review previous research on the phenotypic, demographic, environmental, genetic, and molecular risk factors for the development of AMD. The progression of the disease has been measured in different ways. Although vision loss appears to be a natural way to measure the progression of AMD in natural history studies or clinical trials, visual acuity is often unsuitable as an endpoint because vision loss can take years to occur. To account for this, most AMD studies have relied on anatomical endpoints to track the disease's development over short periods of time. Geographic atrophy (GA) and choroidal neovascularization (CNV) are the two main anatomical markers used to diagnose late AMD (Schaal et al. 2016). GA, also known as dry AMD, is characterized by the loss of photoreceptors, RPE, and choriocapillaris, which results in a gradual loss of vision over time. CFP, FAF, and OCT are the three main imaging modalities utilized for GA detection. On CFP, it can be difficult to spot early signs of GA development and to establish the margins of GA reliably, whereas FAF and OCT imaging are better suited for this purpose and are more likely to produce accurate results. In CNV, also known as neovascular AMD (nAMD), new vessels expand into the retina, causing subsequent leakage and/or haemorrhage, which can lead to serous RPE detachment accompanied by a rapid loss of vision, and finally a scar in the macula that threatens the patient's vision. CFP, OCT, and FA can identify exudative nAMD through fluid leakage and haemorrhaging; however, in some circumstances, a CNV can be detected even before exudation occurs using indocyanine green angiography (ICGA) and OCT-A imaging (Treister et al. 2018). This is possible because ICGA and OCT-A imaging are more sensitive to CNVs than CFP, OCT, and FA. Drusen can be a result of aging or an early sign of AMD, depending on their number, size, shape, distribution, and morphology. The goal of the study (Domalpally et al. 2022) was to determine the prevalence of drusen outside the macula as well as their contribution to the development of AMD. Drusen size, area, and placement were analysed from the macular grid using 30-degree, wide-angle, colour photographs from the third baseline field. Comparisons were made between drusen found outside the macula and those found inside. It was observed that drusen outside the macula are common in eyes with AMD and occur more often as the number of drusen in the macula grows. Extramacular drusen do not pose an additional risk, beyond previously recognized risk factors, for the progression of intermediate AMD to late AMD. The study (Salehi et al. 2022) presented a systematic review and meta-analysis suggesting that patients with AMD have significantly reduced values for several OCT measurements, including subfoveal CT, average pRNFL thickness, and average macular GCC thickness, compared to the general population. Quantifying the relative ellipsoid zone reflectivity (rEZR) could serve as a structural surrogate measure of early AMD disease development (Saßmannshausen et al. 2022).
Pigmentary abnormalities, the existence of reticular pseudodrusen (RPD), and the volume of the retinal pigment epithelial drusen complex (RPEDC) were examined in relation to the rEZR using linear mixed-effects models. The results of this investigation demonstrated a connection between the rEZR and the existence of iAMD high-risk characteristics as well as increasing disease severity. Hyperreflective foci (HF) seen on OCT scans are associated with ectopic RPE and hence represent a risk factor for the development of advanced AMD (Cao et al. 2021). It was observed that HF is not a predictor but rather a marker of disease severity. The process of function gain and loss begins with individual RPE cells in the in-layer and extends to all aberrant phenotypes. The presence of evidence for RPE transdifferentiation, which may have been caused by ischemia, lends support to the concept of an epithelial–mesenchymal transition.

The pathophysiology and etiology of AMD are heavily dependent on inflammation. Humanin G (HNG) is a mitochondrially derived peptide (MDP) that has been shown to be cytoprotective in AMD and able to defend against the mitochondrial and cellular stress caused by damaged mitochondria in AMD. The purpose of the study (Nashine et al. 2022) was to evaluate the hypothesis that the levels of inflammation-related marker proteins are higher in AMD and that treatment with HNG lowers the levels of those proteins. It was observed that HNG acts to decrease inflammatory protein production in stressed or injured cells, which may play a role in the development of AMD. It is important to highlight that HNG has no deleterious effects on cells that are healthy and maintain proper homeostasis. The study (Bhandari et al. 2022) was presented to determine whether patients who underwent incident cataract surgery were at an increased risk of acquiring late-stage AMD. Late AMD was characterized by the presence of neovascular AMD or geographic atrophy seen on annual stereoscopic fundus scans or documented in medical records, including intravitreous injections of medication intended to inhibit the effects of vascular endothelial growth factor. It was concluded that, among participants with up to 10 years of follow-up, cataract surgery did not raise the chance of developing late AMD. The objective of the study (Chua et al. 2022) was to investigate the correlations between air pollution and self-reported cases of AMD as well as in vivo measurements of retinal layer thicknesses. Greater self-reported AMD was associated with greater exposure to PM2.5, while differences in retinal layer thickness were associated with PM2.5, PM2.5 absorbance, PM10, NO2, and NOx. Polypoidal choroidal vasculopathy is common in Asia and is considered a form of neovascular AMD. In a similar vein, cardiovascular disease (CVD), which is also a complex condition associated with aging, is a main cause of morbidity and mortality. Previous work (Ikram et al. 2012; Hu et al. 2010) has shown that patients with AMD have a higher risk of cardiovascular disease, suggesting a "common soil." Smoking, poor diet, and a lack of physical activity are all risk factors for cardiovascular disease that also contribute to the development of AMD (Mauschitz and Finger 2022).

According to a review of several cohort studies in the general population, high levels of physical activity appear to be protective against the onset of early AMD (Mauschitz et al. 2022). These findings confirm that physical activity is a modifiable risk factor for AMD and can help guide future efforts to minimize the public health burden of this condition. A number of pharmacologic treatments are available for neovascular AMD; however, there is currently no authorized therapy that appreciably slows the progression of dry AMD. Both dry AMD and neovascular AMD have unmet medical needs related to the development of viable treatment options. In light of these findings, it is clear that innovative methods of drug delivery are required to enhance the pharmacological effect and drug concentration at the target areas. The study (Jiménez-Gómez et al. 2022b) summarised the pathophysiology of AMD and the existing therapy options, concentrating on the emerging ocular sustained drug delivery techniques undergoing clinical trials. Although there is currently no cure for AMD, its symptoms can be suppressed. Current treatments for AMD fall into four categories: device-based, anti-inflammatory drug, anti-vascular endothelial growth factor, and natural product treatment (Cho et al. 2022). All of these treatments come with side effects, but early AMD therapy combined with natural products has many benefits because it can stop RPE cell apoptosis at safe doses. Death of RPE cells is associated with oxidative stress, inflammation, and carbonyl stress, as well as a lack of essential cell components. Certain natural products possess anti-oxidant, anti-inflammatory, and anti-carbonylation properties. Candidate AMD medicines derived from natural products reduce RPE cell death effectively and have the potential to be utilized as medication for preventing early (dry) AMD. RPE cell transplantation aims to arrest or reverse vision loss by preventing the death of photoreceptor cells. It is regarded as one of the most promising stem cell therapy applications in the field of regenerative medicine. Recent studies have focused on transplanting RPE cells produced from human pluripotent stem cells (hPSC) (O'Neill et al. 2020). Early clinical trial data indicate that transplantation of RPE cells produced from hPSCs is safe and can enhance vision in AMD subjects. Unfortunately, the techniques currently employed to generate hPSC-RPE cells for clinical studies are inefficient. Delivering RPE cells on a thin porous membrane for better integration into the retina is one way to enhance transplantation outcomes; another is to control immune rejection and inflammatory reactions. In the article (Cohn et al. 2021), the authors summarised the most important findings from pre-clinical studies on how different laser interventions might induce beneficial changes in the RPE, Bruch's membrane, and choriocapillaris. As laser technology has progressed toward short-pulse, non-thermal delivery, such as the nanosecond laser, the most important takeaways from clinical trials of laser treatment for AMD have been summarised. Another study (Csaky et al. 2022) discussed the different treatment approaches for AMD.

Summary AMD is one of the most prevalent ocular disorders and, if left untreated, can lead to total blindness. CFP, FAF, OCT, and OCT-A are the imaging modalities utilized for AMD diagnosis and progression tracking; however, OCT has been the most widely used by ophthalmologists to detect structural changes due to AMD. Advancements in multimodal imaging and functional testing tools, as well as continuous exploration of important disease pathways, have set the stage for future well-conducted randomized trials using nanosecond and other subthreshold short-pulse lasers in AMD.

9.3 Current clinical practices in screening glaucoma

The functional and anatomical changes that occur in glaucomatous eyes can be described in detail using today's technologies for evaluating the disease's activity. However, there is still a need for innovative diagnostic tools that can diagnose glaucoma earlier and more precisely (Wu et al. 2022). Glaucoma has been identified through screening tests, and even though therapy was associated with a lower risk of glaucoma development, there is still no evidence that treatment improves visual outcomes and quality of life (Chou et al. 2022). The study (Aspberg et al. 2021) was conducted to evaluate how population screening affects the rate of blindness caused by OAG; it represents the longest-ever follow-up of an OAG screening project, spanning more than 20 years. According to the findings, the prevalence of bilateral blindness in the screened population dropped by 50%. The study (Munteanu et al. 2022) assessed risk factors and various symptom indicators between POAG patients and non-glaucoma patients (NG), as well as between POAG with high intraocular pressure and with normal intraocular pressure, in tertiary preventive care. Only age (F = 2.381, df = 40, p = 0.000) remained statistically significant after controlling for potential confounders such as gender, place of residence, and marital status. The most common forms of pediatric glaucoma and their diagnosis and treatment were reviewed based on the Childhood Glaucoma Research Network (CGRN) classification (Karaconji et al. 2022). These include juvenile open-angle glaucoma (JOAG) and primary congenital glaucoma (PCG). In addition, other causes of glaucoma linked to non-acquired ocular anomalies (Peters anomaly, Axenfeld-Rieger anomaly, and aniridia) and systemic disease (neurofibromatosis, Sturge-Weber syndrome) were investigated. Early diagnosis of structural changes paves the way for earlier therapy and results in slower disease progression. Screening for glaucoma through tonometry has significant false positive and false negative detection rates, whereas screening that includes an assessment of the optic disc is likely to identify the majority of glaucoma incidences. The study (Karvonen et al. 2020) evaluated the screening capacities of OCT, scanning laser polarimetry (GDx), and scanning laser ophthalmoscopy (Heidelberg Retinal Tomograph, HRT) and found that all tools performed quite similarly. Since the accuracy of each of the evaluated parameters was moderate, screening with these parameters alone does not produce reliable results. The prospective study (Yu et al. 2016) demonstrated that OCT event- and trend-based progression analysis programmes compare well with linear mixed modeling (without relying on a normative database) and detect progression earlier than SAP. Damage to the RNFL could be detected with OCT prior to the onset of visual field abnormalities on SAP, suggesting that RNFL thickness assessment is a useful screening tool for glaucoma (Kuang et al. 2015). Vazquez et al. (2021) summarized the findings of current studies that concentrate on the relevance of OCT parameters in the diagnosis and monitoring of glaucoma. It has been shown that the ONH, RNFL, and macular parameters have significant diagnostic ability. According to Wanza et al. (2010), the maximum allowable difference in RNFL thickness between two visits is 4 µm; thinning of more than 4 µm is classified as a statistically significant progressive change from baseline. The study (Aksoy et al. 2020) was presented to evaluate the accuracy of SD-OCT segmentation software in differentiating early glaucoma from ocular hypertension and healthy eyes. In addition, a comparison of macular layer thicknesses between early glaucoma, ocular hypertension, and healthy eyes was performed. It was concluded that analyses of the pRNFL and macular segmentation can work together to provide a more accurate early diagnosis of glaucoma. The efficacy of SD-OCT RNFL thickness measurements in glaucoma diagnosis was evaluated (Mansoori et al. 2011), and results showed that SD-OCT could be helpful for identifying glaucoma patients in the elderly. Progressive RNFL loss is a more sensitive marker than GCIPL loss in patients with early to moderate glaucoma (Hammel et al. 2017). However, in more advanced stages, GCIPL remains above the measurement floor, making macular analysis the more promising method for diagnosing progression (Bowd et al. 2017). In addition to GCIPL, metrics related to the ONH can be used to monitor progression in the advanced stages (Chen 2009).

Despite developments in imaging technology, perimetry still plays a vital role in the diagnosis and management of glaucoma. The review article (Prager et al. 2021) highlighted recent developments in perimetry methods and illustrated improvements in collecting and analyzing visual field data. The diagnosis and characterization of glaucomatous field damage have been significantly aided by the application of artificial intelligence in research settings. In addition, tablet-based techniques and virtual reality headsets show potential for the screening and remote monitoring of glaucoma patients. Research has shown that the LC plays a crucial role in the pathophysiology of glaucoma development and progression and is thus considered an anatomic site of glaucomatous optic nerve injury (Czerpak et al. 2021). The most significant findings were the decrease in LC thickness, posterior LC displacement, and the presence of localized defects (Bastelica et al. 2022). In vivo evaluation of LC features in both normal and glaucomatous eyes has become possible with the development of high-resolution OCT devices, most notably enhanced depth imaging OCT (EDI-OCT) and swept-source OCT (SS-OCT). The study (Kim et al. 2022) investigated whether the LC curve changes when IOP falls after eye drops in normal tension glaucoma (NTG) subjects. It was concluded that topical glaucoma treatment resulted in a reduction in IOP from 15.7 ± 2.5 mm Hg at baseline to 11.2 ± 1.7 mm Hg. There are other clinical studies (Kim et al. 2019a; Czerpak et al. 2022; Bastelica et al. 2022; Guan et al. 2022; Glidai et al. 2022; Mochida et al. 2022) in the literature highlighting the significance of the LC for glaucoma diagnosis and progression tracking. Despite the advancement in imaging technology, accurate analysis of the LC remains challenging (Andrade et al. 2022; Kim et al. 2020).

Increased IOP and/or glaucomatous optic neuropathy have been linked to a wide range of systemic diseases, including renal disease and hemodialysis, neurologic disorders, primary familial amyloidosis, endocrine disorders, vascular disease, collagen vascular disease, hematologic disorders, irradiation, systemic viral disease, and dermatologic disorders (Funk et al. 2022). An evaluation of the systemic illness causing the elevated IOP is necessary. Wändell et al. (2022) presented a study intended to examine the prevalence of OAG among people in Region Stockholm in relation to other somatic comorbidities. Higher fully adjusted ORs (95% confidence intervals) were found for women and men, respectively, with cancer, 1.175 (1.120–1.233) and 1.106 (1.048–1.166); hypertension, 1.372 (1.306–1.440) and 1.243 (1.179–1.311); and diabetes, 1.138 (1.074–1.207) and 1.216 (1.148–1.289). It was concluded that glaucoma is more likely to develop in people who have certain somatic disorders, most notably diabetes, hypertension, and cancer. In addition, the risk of glaucoma is higher in neighborhoods with higher socioeconomic status than in neighborhoods with lower socioeconomic status. Ro et al. (2022) investigated the risk of OAG in the 12 years following a diagnosis of chronic kidney disease (CKD) using a nationally representative cohort. The results showed that CKD is a major contributor to the development of OAG and that the risk of OAG increases with the severity of CKD. The purpose of the study (Kolli et al. 2022) was to determine whether the genetic risk for POAG influences the correlations between cardiopulmonary diseases and glaucoma indicators. The history of common cardiopulmonary conditions and cardiopulmonary measurements were analyzed in the UK Biobank, together with the history of glaucoma. The prevalence of diabetes (17.5% vs 6.5%), CKD (6.7% vs 2.0%), and dyslipidaemia (31.2% vs 18.3%) were all greater in glaucoma patients than in controls (adjusted p < 0.0013 for each) within decile 1. A contrast test p-value for difference below 0.05 indicates that the extent of the connection between glaucoma and diabetes, CKD, and glycated haemoglobin varies between deciles 1 and 10. The study (Mauschitz et al. 2022) assessed retinal layers as biomarkers for brain atrophy, investigating the relationship between segmented retinal layers and various cerebral parameters from magnetic resonance imaging (MRI). Relationships between retinal measurements and volumetric brain measures, as well as fractional anisotropy (FA) as a marker of the microstructural integrity of white matter (WM), were analyzed using multiple linear regression. Inner retinal volumes were correlated with total brain and GM volumes, and even more strongly with WM volumes and FA. It was found that both the inner and outer retina were linked to hippocampal size, whereas the outer retina was most strongly associated with GM volume.

Wang et al. (2022c) reviewed recent advances in the genetics of POAG. The study discussed how recent developments in research methods have led to the discovery of new risk genes, as well as how subsequent biological investigations could be conducted to define how the risk represented by a genetic sequence variant manifests itself in patients. By analysing single-cell transcriptomes with Smart-Seq2, new regeneration-related genes were found that substantially increase axon regeneration (Li et al. 2022a). Among these, Anxa2 is the most powerful because of the synergistic effect it has with its receptor tPA in Pten-deletion-induced axon regeneration. In a clinically relevant model of glaucoma, Anxa2, its downstream effector ILK, and Mpp1 significantly protect RGC somata and axons and preserve vision. Asefa et al. (2022) prioritized the genes most likely to be "causal" and sought to uncover the functional properties and underlying biological pathways of POAG candidate genes. They drew on data from the GERA and UK Biobank cohorts to analyze the genetic risk factors for POAG. Systematic gene-prioritization analyses were performed based on nearest genes, co-regulation analysis, epigenomic data, transcriptome-wide association studies, and nonsynonymous single-nucleotide polymorphisms. The study identified 142 priority genes, of which 64 were novel for POAG. Supported by at least four different lines of evidence, the genes given the highest priority were BICC1, AFAP1, and ABCA1. Another review study (Aboobakar and Wiggs 2022) summarized the genetic relationships between various types of glaucoma and the potential roles these genes play in disease pathogenesis. Other studies related to glaucoma risk genes are presented in the literature (Hamel et al. 2022; Milanowski et al. 2022; Choquet et al. 2022; He et al. 2022b). Glaucoma treatment has been challenging because ocular barriers have inherent mechanisms that impede the entry of ophthalmic medicines. Several carriers (inorganic, polymeric, hydrogel, and contact lens-based) with specialized chemical and physical properties have been intensively investigated as potential solutions. The review article (Patel et al. 2022) summarized the latest developments in ocular delivery formulations with a particular emphasis on glaucoma, including the different types of nanocarriers and delivery routes. IOP can be lowered with the new Rho kinase inhibitor netarsudil/latanoprost FDC by enhancing trabecular outflow (Asrani et al. 2020; Brubaker et al. 2020). A very promising platform for the treatment of glaucoma with simultaneous protection of the ocular surface would be the combination of hypotensive liposomal formulations with osmoprotective agents (González-Cela-Casamayor et al. 2022). Drug delivery systems for the ocular surface, such as contact lenses and nanotechnology, are currently under study as potential sustained-release (SR) therapeutics. There is growing interest in using aqueous gels prepared with hydrophilic polymers (hydrogels) and based on stimuli-responsive polymers for the treatment of numerous ocular disorders (Akulo et al. 2022). Because of their chemical structure, they are able to incorporate a wide variety of ophthalmic medications, allowing them to achieve optimal therapeutic doses while also providing more clinically relevant time courses (weeks or months, as opposed to hours and days).
This will inevitably result in a reduction in dose frequency, which will improve patient compliance and clinical outcomes. Glaucoma is a chronic disease that may respond well to gel technologies used as drug-delivery methods and as antifibrotic therapy during and after surgery (Fea et al. 2022). Petchyim et al. (2022) investigated bleb-related infections, including their symptoms, causes, treatments, and effects. Pain and redness were the primary symptoms patients experienced when they had a bleb-related infection, and nearly 25% of patients had experienced some kind of eye injury in the past. Patients who display symptoms or engage in undesired behaviors that have the potential to result in bleb infection should be identified, and both treatment and education should be provided to them. The study (Mikula et al. 2022) provided the first direct proof that glaucoma can be treated with noninvasive femtosecond laser trabeculotomy (FLT). A related study (Mikula et al. 2021) examined the safety and efficacy of FLT in reducing IOP in a perfused anterior segment model. The findings suggested that FLT treatment can result in a considerable reduction in IOP in a perfusion model, indicating that it could be a viable noninvasive treatment option for POAG. Ex-Press shunt implantation, canaloplasty, and viscocanalostomy are alternative surgical techniques that show promising equivalence but need more research to accurately evaluate discrepancies in outcomes (Siesky et al. 2022). In addition to differences in treatment results, social disparities can also affect clinical care in the form of decreased adherence, choice, and access. Adherence to glaucoma medications is a significant issue in the management of glaucoma patients, as up to fifty percent of patients fail to obtain the desired treatment benefits. The overarching objective of the study (Zaharia et al. 2022) was to draw connections between the various approaches to gauging glaucoma patients' propensity to take their prescribed medications and the interventions designed to improve adherence.

Summary The damage caused by glaucoma is considered permanent and cannot be completely remedied. Nevertheless, the advancement of the disease can be decelerated by means of pharmaceutical interventions, laser therapy, or surgical procedures, which may help mitigate additional visual impairment. Various modalities have been used to detect structural and functional damage due to glaucoma, including fundus photography, GDx, scanning laser ophthalmoscopy, SAP, OCT, and OCT-A. However, OCT is the modality most widely used by ophthalmologists for the structural analysis of glaucomatous damage. It has been shown that the ONH, RNFL, and macular characteristics all have substantial diagnostic ability. LC analysis with OCT has great potential in glaucoma management if some of the current constraints are resolved, particularly those relating to image acquisition.

10 Use of AI models in screening retinal diseases

Manual identification of retinal lesions in fundus, OCT, and OCT-A scans is a time-consuming and subjective task, resulting in intra- and inter-observer variations. To overcome this limitation, AI algorithms are now used in clinical practice to assist ophthalmologists in effectively screening retinal diseases, and research in this direction is ongoing to improve the accuracy and robustness of these automated methods. To ensure the best readability, we have divided the literature related to the identification of different retinal diseases into three groups based on the employed techniques: traditional image processing, ML, and DL models.

10.1 Traditional image processing schemes

Optic disc and cup-based methods Ophthalmologists widely use fundus images for the initial screening of various retinal diseases such as DR, AMD, DME, and glaucoma. The bright circular region in the center of the human retina is characterized as the optic disc (OD) in a fundus scan. Locating the OD accurately is a crucial stage in computer-assisted glaucoma and DR diagnosis. OD detection is challenging in fundus analysis when other bright spots appear on the retina or when the images were not captured in a well-controlled setting. In the paper (Kose and Ikibacs 2011), simple statistical methods were suggested for finding the OD and macula and for determining the diameter of the OD and its distance to the macula. A weighted-distance method was used to enlarge the healthy parts of a retinal image. Qureshi et al. (2012) proposed an ensemble algorithm that automatically detects the OD and macula in fundus images. The feature set was based on pyramidal decomposition, edge detection, an entropy filter, the Hough transformation, and a uniform sample grid. Experimental findings and analyses were presented on three publicly accessible databases, Diaretdb0, Diaretdb1, and DRIVE, where the combined algorithm achieved average Euclidean errors of 29.64, 24.26, and 26.80, respectively. The study (Usman et al. 2014) proposed a method based on classic image processing techniques to localize the OD. To eliminate gaps and erroneous regions, a threshold is first applied to the red plane of the fundus image, and morphological operations are then performed. A connected-component labeling algorithm was employed to assign labels to the objects of the binary image. Adaptive histogram equalization and a Laplacian of Gaussian (LoG) kernel were applied to enhance the bright regions within the image. The thresholded image is then subjected to morphological opening to eliminate noisy regions. Quantitative evaluation of the proposed system was performed on the publicly available datasets DRIVE, STARE, and DiaretDB, achieving accuracies of 100%, 97.50%, and 95.85%, respectively. Approximate Nearest Neighbor Field (ANNF) maps are often used in computer vision and graphics to solve problems including noise removal, image completion, and retargeting. Ramakanth and Babu (2014) extended the application of ANNF maps to medical image analysis and, more particularly, the detection of the OD in fundus images. Their ANNF algorithm, FeatureMatch, was employed to determine the similarity between a reference OD image and the query image, yielding a list of the query patches closest to the patches in the reference image. For OD detection, a probability map was constructed from the distribution of these patches in the query image. Five publicly accessible databases (DIARETDB0, DIARETDB1, DRIVE, STARE, and MESSIDOR) were used to evaluate the suggested methodology. Kao et al. (2014) proposed a method that employed a vessel-free area and an adaptive Gaussian template for fovea center detection in retinal images. The center of the OD is localized using template matching. Next, the disc–fovea axis is defined by scanning the vessel-free region. Finally, the fovea center is identified by matching the fovea template. The centers of the OD and fovea at various image resolutions were identified using adaptive Gaussian templates. For the DIARETDB0, DIARETDB1, and MESSIDOR databases, the proposed method found the fovea with an accuracy of 93.1%, 92.1%, and 97.8%, respectively.
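The red-plane thresholding pipeline described above (Usman et al. 2014) can be sketched in a few lines of Python. The snippet below is a minimal illustration of that style of OD localization, not the authors' implementation; OpenCV is assumed, and the percentile threshold and kernel size are illustrative choices.

```python
# Minimal optic-disc localization sketch, loosely following the pipeline
# described in Usman et al. (2014). Thresholds and kernel sizes are
# illustrative assumptions, not the authors' tuned values.
import cv2
import numpy as np

def localize_od(fundus_bgr):
    """Return an (x, y) estimate of the optic-disc center, or None."""
    red = fundus_bgr[:, :, 2]                        # OD is brightest in the red plane
    # Enhance bright regions with adaptive histogram equalization.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(red)
    # Threshold the brightest pixels (top ~2% of intensities).
    t = np.percentile(enhanced, 98)
    binary = (enhanced >= t).astype(np.uint8)
    # Morphological opening removes small noisy blobs and vessel fragments.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    # Connected-component labeling; keep the largest bright component.
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    if n < 2:
        return None                                   # nothing detected
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    cx, cy = centroids[largest]
    return int(cx), int(cy)
```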
The OD and OC were extracted from fundus images using adaptive thresholding for glaucoma diagnosis (Issac et al. 2015). An automatic method (Hu et al. 2017) was proposed that combined color difference information and vessel bend information to determine the OC boundary from fundus images. Xiong and Li (2016) proposed an OD localization method that can accurately localize the OD even when the retinal image contains pathological abnormalities. The extracted features included vessel direction, edges, intensity, and luminous region size. The proposed approach achieved an accuracy of 100% on the DRIVE, 95.8% on the STARE, 99.2% on the DIARETDB0, and 97.8% on the DIARETDB1 database. Wavelet feature extraction was combined with optimized genetic feature selection to segment the OD for glaucoma diagnosis through fundus images (Singh et al. 2016). Panda et al. (2017) developed an OD localization method incorporating three retinal vascular visual cues: global vessel symmetry, local vessel symmetry, and vessel component count. The initial OD center is determined by utilizing the skeletal image component with the highest concentration of major blood vessels. The proposed technique was effective for ocular disease images with different symptoms, such as bright lesions, hemorrhages, and twisted blood vessels. The study (Mahmood and Lee 2022) proposed a technique based on color and blur analysis for accurate detection and localization of the OD. To improve the visibility of the OD, the fundus image was transformed into the Lab color space. The extended maximum transform and directional blur were used to extract OD candidates accurately, and a radial blur was applied to isolate the OD from the remaining candidates. Zaaboub et al. (2022) proposed an algorithm for OD segmentation in fundus scans. In the first stage, the OD was located by performing 1) a preprocessing step, 2) vessel removal, and 3) a geometric analysis delineating the OD position; the OD contour was then accurately completed using candidate boundary points. Ten different public databases and one local database were employed to test the algorithm; on RIM-ONE and IDRiD, accuracies of 98.06% and 99.71% were achieved, respectively.

Blood vessels extraction methods Computer-aided pathology systems rely heavily on blood vessel detection in retinal images for the early screening and diagnosis of ocular diseases such as retinal detachment, DR, and DME. Numerous studies (Ravichandran and Raja 2014; Liao et al. 2014; Ali et al. 2017; Alhussein et al. 2020) in the literature utilized histogram and enhancement techniques for vessel segmentation. Ravichandran and Raja (2014) developed an enhancement technique that incorporated histogram matching and Gabor filtering. The method first applied a region-based histogram equalizer to the retinal image, then used a 2D Gabor filter to further improve the appearance of the vessels. In the paper (Liao et al. 2014), a novel approach to the enhancement of retinal vessels was proposed. Initially, a multi-scale top-hat transformation was used to extract the best high-contrast and low-contrast image features. The optimal bright image features are then added to the image, and the optimal dim image features are subtracted, for a preliminary quality enhancement. As shown by the results on the DRIVE and STARE databases, the proposed technique efficiently boosted contrast and improved the finer features of the retinal vessels. Ali et al. (2017) proposed a method to detect retinal vessels in fundus images by combining automatic thresholding and the Gabor Wavelet (GW). The green channel was extracted and used to generate a Gabor feature image by utilizing the GW. The final vessel output is generated by combining two vessel-enhanced images, each of which has been converted to a binary image by automatic thresholding. The algorithm achieved an accuracy of 94.53% on the DRIVE dataset. A method that performed retinal blood vessel analysis using more traditional techniques was proposed by Toptaş and Hanbay (2021). The model extracted pixel-based features and grouped them into five categories: gradient, morphological, edge detection, statistical, and Hessian matrix. Each pixel is assigned an 18-D feature vector, which is fed into a neural network. The system accuracy was calculated to be 96.18% for DRIVE and 94.56% for STARE. The normalized first- and second-order derivatives of a Hessian matrix were used in the study (Yang and Cheng 2014) to segment medical images. The Hessian matrix's eigenvalues stand in for luminance data, while the eigenvector of the smallest eigenvalue reveals the orientation of the lines. A novel Hessian matrix-based vessel enhancement measure was presented in the study (Jerman et al. 2016), addressing the issues with existing Hessian methods, including insufficient responses to vessels of variable intensities and scales as well as at vessel bifurcations. Using the eigenvalues produced by the Hessian matrix at two different scales, the study (Alhussein et al. 2020) developed an unsupervised segmentation method to extract thick and thin vessels. The CLAHE technique was employed to enhance the contrast of retinal images, and an improved version of the particle swarm optimization (PSO) algorithm was used for contextual region tuning of CLAHE. A morphological filter and a Wiener filter were employed to remove noise. To extract thick and thin vessels, the eigenvalues of the Hessian matrix were calculated at two different scales. Global Otsu thresholding was applied to intensity-transformed images and enhanced images of thick vessels, while ISODATA local thresholding was applied to enhanced images of thin vessels.
Area, eccentricity, and solidity were used as region parameters in a post-processing step. On the publicly available CHASE_DB1 and DRIVE datasets, the proposed framework showed sensitivities of 77.76% and 78.51% and accuracies of 95.05% and 95.59%, respectively. The study (Madathil and Padannayil 2022) introduced a Morphological Closing-based Dynamic Mode Decomposition (MC-DMD) method for enhancing retinal vessels that is both effective and robust. The proposed algorithm uses mathematical morphology to create the input channel for the DMD system, which separates the retinal images into their vessel and non-vessel features. The proposed method was assessed on three publicly available datasets: DRIVE, STARE, and HRF.
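As a concrete illustration of the Hessian eigenvalue analyses discussed above, the sketch below applies scikit-image's multiscale Frangi vesselness filter to the green channel and binarizes the response with Otsu thresholding. It is a generic stand-in for these methods rather than a reproduction of any cited pipeline; the scale range is an assumption.

```python
# Hessian-based vessel enhancement in the spirit of the multiscale
# eigenvalue analyses discussed above (Jerman et al. 2016; Alhussein et
# al. 2020), using scikit-image's Frangi vesselness filter.
import numpy as np
from skimage import exposure
from skimage.filters import frangi, threshold_otsu

def segment_vessels(fundus_rgb):
    """Binary vessel map from an RGB fundus image (H x W x 3 uint8)."""
    green = fundus_rgb[:, :, 1].astype(float) / 255.0   # vessels contrast best in green
    green = exposure.equalize_adapthist(green)           # CLAHE-style contrast boost
    # Small sigmas respond to thin vessels, larger sigmas to thick ones;
    # black_ridges=True because vessels are darker than the background.
    vesselness = frangi(green, sigmas=np.arange(1, 6), black_ridges=True)
    return vesselness > threshold_otsu(vesselness)
```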

Other techniques for vessel enhancement in retinal images include the visual adaptation model (Wang et al. 2021d), bi-orthogonal wavelet transform with bilateral filtering (Bala and Maik 2021a), retinex theory and the dark channel prior method (Zhang et al. 2022a), luminosity and contrast enhancement (Kumar and Bhandari 2022), morphological operations (Ashanand and Kaur 2022), a graph-based method (Zhao et al. 2015), the multiscale fractional anisotropic tensor (Alhasson et al. 2018), and statistical feature-based transformation (Mahapatra and Agrawal 2021).

Retinal layers and lesions extraction methods The automated identification of retinal boundaries is an area of significant research interest due to its ability to offer a reliable, measurable, and unbiased evaluation of retinal lesions. Several automated algorithms for retinal layer segmentation have been suggested in the scholarly literature. Eight inner retinal boundaries were retrieved by Kromer et al. (2017); before segmenting the layers, median filtering was performed for preprocessing, and curve regularization was then utilized. Duan et al. (2018) presented a model that used groupwise curve alignment to extract the retinal layers in OCT volumes. In (Roychowdhury et al. 2013), seven sub-retinal layers were automatically segmented using a high-pass iterative filter, and a new denoising method tailored specifically to OCT images was introduced. Using gradient information and shortest path search, Yang et al. (2010) devised a fast and accurate automated segmentation system to extract nine intra-retinal layers. Niu et al. (2014) developed an algorithmic technique for the automated segmentation of six retinal layers, utilizing a correlation smoothness constraint and dual gradient information. The construction of the edge map was followed by the application of a convolution operator to obtain the gradient map, and the removal of outliers was facilitated by imposing smoothness constraints on spatial correlation.
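The gradient-plus-shortest-path idea underlying several of these methods (e.g., Yang et al. 2010) can be illustrated with a simple dynamic program: each column of the B-scan contributes one boundary pixel, and transitions are restricted to neighbouring rows to enforce smoothness. The sketch below is a toy single-boundary version of that strategy under these assumptions, not any published algorithm.

```python
# Sketch of a gradient-plus-shortest-path boundary search, the core idea
# behind graph-based layer segmentation. A dynamic program finds, per
# column, the row minimizing a cumulative vertical-gradient cost.
import numpy as np

def trace_boundary(bscan):
    """Return one row index per column tracing a dark-to-bright boundary."""
    grad = np.gradient(bscan.astype(float), axis=0)
    cost = -grad                                   # low cost where the gradient is strong
    rows, cols = cost.shape
    acc = cost.copy()
    # Left-to-right accumulation; each pixel connects to the three nearest
    # rows in the previous column (smoothness constraint). np.roll wraps at
    # the image borders, a minor artifact acceptable in this sketch.
    for c in range(1, cols):
        prev = acc[:, c - 1]
        best_prev = np.minimum(np.minimum(np.roll(prev, 1), prev),
                               np.roll(prev, -1))
        acc[:, c] = cost[:, c] + best_prev
    # Backtrack from the cheapest endpoint, staying within one row per step.
    boundary = np.empty(cols, dtype=int)
    boundary[-1] = int(np.argmin(acc[:, -1]))
    for c in range(cols - 2, -1, -1):
        r = boundary[c + 1]
        lo, hi = max(r - 1, 0), min(r + 1, rows - 1)
        boundary[c] = lo + int(np.argmin(acc[lo:hi + 1, c]))
    return boundary
```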

Several automated segmentation algorithms have been suggested in the literature for extracting retinal layers and subsequently measuring their thickness. Mayer et al. (2010) presented an algorithmic approach for the quantification of the RNFL in images obtained through SD-OCT. Gradient and local smoothing techniques were utilized to minimize the energy function used for segmenting the retinal layers. The study calculated the average thickness of the RNFL for individuals with normal vision (94.1 ± 11.7 µm) and those diagnosed with glaucoma (65.3 ± 15.7 µm). A study (Kafieh et al. 2015) generated a thickness map of eleven retinal layers from SD-OCT images of normal individuals without any known ocular abnormalities; segmentation of the retinal layers was based on edge statistics rather than contextual information. The average thickness of the RNFL, GCL-IPL, and GCC in the macula was calculated using a graph-based approach (Gao et al. 2014), which estimated an RNFL thickness of 36.5 µm in healthy people and 26.7 µm in glaucoma patients. Three different OCT devices were used to test and compare the Iowa Reference Algorithms from the Iowa Institute for Biomedical Imaging, which are used for automatic intra-retinal layer segmentation and image scaling (Terry et al. 2016). Twenty-five healthy volunteers were scanned twice for macular volume using a 3D-OCT 1000 (Topcon), a Cirrus HD-OCT (Zeiss), and a non-commercial long-wavelength (1040 nm) OCT. Using the Iowa Reference Algorithms, the average thickness of 10 intra-retinal layers was calculated for the fovea, inner ring, and outer ring of the ETDRS field of view. The Iowa Reference Algorithms accurately segmented all 10 intra-retinal layers and showed higher repeatability than the onboard software. With fixed-AEL scaling, the algorithm gave significantly different thickness values for the three OCT devices (P < 0.05). An automated approach was suggested to segment and estimate the thickness of the ILM, inner/outer segment junction, and RPE layers in an OCT image, and an online platform was made available for this purpose (Ometto et al. 2019). Motamedi et al. (2019) aimed to establish normative data for macular RNFL, GCL-IPL, and INL thickness; the obtained measurements for these parameters were 39.53 ± 3.57 µm, 70.81 ± 4.87 µm, and 35.93 ± 2.34 µm, respectively. Another study (Abdellatif et al. 2019) investigated alterations in the thickness of the outer retinal layer in individuals of varying ages, utilizing SD-OCT images of subjects deemed to be within normal limits.
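Once boundaries are available, thickness quantification of the kind reported in these studies reduces to a per-column subtraction scaled by the device's axial resolution. A minimal sketch follows; the boundary-array convention and the µm-per-pixel value are assumptions for illustration, not parameters of any cited device or algorithm.

```python
# Converting segmented boundaries into a thickness estimate. Boundary
# arrays (one row index per A-scan, upper boundary above lower) and the
# axial pixel pitch are assumed inputs; 3.87 µm/pixel is illustrative only.
import numpy as np

def mean_layer_thickness(upper, lower, axial_um_per_px=3.87):
    """Mean thickness in micrometres between two layer boundaries."""
    thickness_px = lower.astype(float) - upper.astype(float)
    if np.any(thickness_px < 0):
        raise ValueError("boundaries cross: check segmentation order")
    return float(thickness_px.mean() * axial_um_per_px)
```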

Babu et al. (2012) proposed an algorithm for glaucoma diagnosis with an improved correlation coefficient. In order to measure the CDR, the retinal nerve head vitreal boundary (RV) and the choroid nerve head boundary (RC) were separated. The RV and RC boundaries were identified using multilevel thresholding and wavelet transform techniques. The accuracy was 92%, and the findings came extremely close to the gold-standard values. Nithya and Venkateswaran (2015) compared segmentation methods for OCT- and fundus-based glaucoma diagnosis. Four normal and eight glaucoma images were included, and fundus and OCT images of the same patient were used to determine the CDR. Cup and disc regions in a fundus image were segmented using hill climbing, fuzzy c-means clustering, and region growing, while RPE and RNFL segmentation was used to determine the cup and disc diameter in OCT images. Fundus and OCT CDR results were compared with clinical standards, and fuzzy c-means clustering had the lowest performance error in the experiments. Zhang et al. (2015) proposed an automated model to segment and quantify CME with macular hole (MH) in 3D OCT scans. The model consisted of three stages: denoising, flattening, and the segmentation of intra-retinal layers. Next, adaptive boosting and kernel graph cut were used for the segmentation and subsequent fine segmentation of intra-retinal CME. The model was evaluated on 3D OCT scans from 18 CME and MH subjects and achieved an accuracy and false positive volume fraction of 84.6% and 1.7%, respectively. Sugruk et al. (2014) presented a model for the detection of AMD and DME. The model extracted the RPE layer from macular OCT scans to diagnose AMD, whereas cysts were extracted from the macular pathology to diagnose DME. They reported a success rate of 100% for AMD cases and 86.6% for DME. Chiu et al. (2015) proposed a kernel regression-based classification model to identify retinal layer boundaries and fluids within the retina. The classification estimates were then used to refine the extracted retinal boundaries within a graph theory and dynamic programming framework. The model was evaluated on 110 B-scans from 10 subjects with severe DME pathology and achieved a mean Dice coefficient of 0.78. Wang et al. (2016) developed a model to discriminate between AMD, DME, and healthy macula scans. The Correlation-based Feature Subset (CFS) selection algorithm was used to filter the linear configuration pattern (LCP) features extracted from the OCT images. The overall accuracy for the three classes was 99.3% for the best model, based on the sequential minimal optimization (SMO) approach. Rashno et al. (2017) proposed a framework based on neutrosophic transformation and graph-based shortest path search for the extraction of fluid-filled cyst segments from OCT scans. After undergoing a neutrosophic transformation, an image was divided into three zones: true, indeterminate, and false. Noise in an image was represented by the indeterminate set, whereas the true set was obtained using a gamma-correction technique. The ILM, RPE, OPL, and ISM layers were extracted using a graph shortest path search. Using the ILM and RPE, a target region of interest (ROI) was created, from which fluid-filled regions were automatically extracted using a cluster-based segmentation algorithm.
Moreover, they were able to reach a sensitivity of 67.3% on the Duke Dataset-II, 88.8% on the Optima dataset, and 76.7% on their own dataset. Khalil et al. (2018) developed a technique that segments retinal layers to calculate the CDR for glaucoma diagnosis. Delineation of the ILM and RPE was used for the cup-diameter calculation (CDC) and disc-diameter calculation (DDC), respectively, employing contour, interpolation, and thickness-value estimation techniques.
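For reference, once cup and disc regions are available from any of the segmentation methods above, the vertical CDR reduces to a ratio of pixel extents. The following minimal sketch assumes binary masks as input; the 0.6 suspicion threshold mentioned in the comment is a common clinical rule of thumb rather than a value from the cited studies.

```python
# A minimal vertical cup-to-disc ratio (CDR) computation from binary cup
# and disc masks; mask-producing segmentation happens upstream.
import numpy as np

def vertical_cdr(cup_mask, disc_mask):
    """Vertical CDR = cup height / disc height (both in pixels)."""
    def vertical_extent(mask):
        rows = np.where(mask.any(axis=1))[0]
        return int(rows[-1] - rows[0] + 1) if rows.size else 0
    disc_h = vertical_extent(disc_mask)
    if disc_h == 0:
        raise ValueError("empty disc mask")
    return vertical_extent(cup_mask) / disc_h

# A CDR above roughly 0.6 is commonly treated as suspicious for glaucoma.
```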

10.2 Machine learning schemes

This section presents techniques based on classical ML models, such as linear regression, logistic regression, random forest, support vector machine (SVM), and XGBoost, for the identification and segmentation of the significant biomarkers of DR, DME, AMD, and glaucoma.

Detection of retinal lesions through fundus scans DR is a serious threat to sight and must be diagnosed and treated early to prevent permanent vision loss. MAs are the earliest symptom of DR, and their diagnosis is crucial. The study (Akram et al. 2013) proposed a three-stage methodology to discover MAs early by utilizing filter banks. The technique began by identifying and extracting all potential MA candidate regions from the retinal image. The system then created a feature vector for each candidate region based on shape, color, intensity, and statistical variables to determine the true MA regions. To increase classification accuracy, a hybrid classifier combined the Gaussian mixture model (GMM), the SVM, and an extension of the multimodel mediod-based modeling approach in an ensemble. The model was evaluated on the publicly available datasets DIARETDB0 and DIARETDB1. Akram et al. (2014) introduced a method for identifying and categorizing NPDR lesions. The proposed system involved preprocessing, the extraction of candidate lesions, the creation of a feature set, and classification. Candidates for the various NPDR signs (MAs, HMs, and EXs) were extracted, and a feature set was created for each lesion based on its characteristics. The true lesions are found and labeled with a hybrid classifier based on a weighted combination of multivariate m-Mediods and a GMM. Based on the types, number, and locations of lesions, the system categorized retinal images into different stages of NPDR. The proposed model was evaluated on four datasets, DRIVE, STARE, MESSIDOR, and DIARETDB, achieving accuracies of 95%, 97.5%, 98.90%, and 95.05%, respectively. Huda et al. (2019) proposed a classification model for DR diagnosis based on decision trees, logistic regression, and SVM. Jebaseeli (2021) developed a system that classified DR and analyzed disease severity with high accuracy. The Adaptive Histogram Equalization (AHE) technique was used for image enhancement; a Hopfield Neural Network technique then simultaneously segmented the boundaries and determined the width of the vessels in the fundus scan. The model was tested using a local dataset in addition to publicly available datasets (DRIVE, STARE, MESSIDOR, HRF, and DRIONS). Retinal blood vessel analysis on fundus images can provide a variety of significant biomarkers of retinal disease. DR is one of the ocular diseases that can be detected by analyzing the blood vessels in the retina. Deciphering cardiovascular illness from a retinal fundus image requires a careful examination of the vascular tree, for which bifurcations and crossings of blood vessels must be located. Using COSFIRE filters, the study (Azzopardi and Petkov 2013) presented a method for automatically detecting vascular bifurcations in segmented fundus images. The COSFIRE filter's output was determined by taking the geometric mean of the weighted responses of specifically chosen blurred and shifted Gabor filters. The algorithm was evaluated on the DRIVE and STARE datasets: a recall of 97.88% and a precision of 96.94% were achieved on forty fundus scans from the DRIVE dataset, while twenty manually segmented images from the STARE dataset yielded a recall of 97.32% and a precision of 96.04%. Manoj et al. (2013) proposed a technique that used features based on orientation gradient vector fields, morphological transformations, and Gabor filter responses to extract the retinal vasculature in order to diagnose retinal disorders.
A vector in a 9-D feature space describes each pixel in the retinal image, and neural network classifiers, namely a Feed Forward Backpropagation Neural Network, a Multi-Layer Perceptron, and a Radial Basis Function network, are used to categorize those pixels. The method was evaluated on DRIVE, STARE, and MESSIDOR and achieved accuracies of 96.23%, 95.83%, and 95.41%, respectively. Strisciuglio et al. (2016) proposed a robust method based on a bank of selective B-COSFIRE filters to segment blood vessels in fundus images. Features were chosen automatically for maximum flexibility and can be customized for a variety of other applications. The efficacy of several distinct selection approaches, based on the principles of machine learning and information theory, was analyzed and compared.
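The candidate-region pipelines above share a common skeleton: per-region shape and intensity features feed a classifier. The sketch below illustrates that skeleton with scikit-image region properties and a simple averaged GMM–SVM ensemble; it approximates, rather than reproduces, the hybrid classifier of Akram et al. (2013), and the feature list and component counts are illustrative assumptions.

```python
# Candidate-region feature pipeline: per-region shape and intensity
# features feed a classifier; the GMM-SVM ensemble is approximated here
# by averaging the two models' posterior probabilities.
import numpy as np
from skimage.measure import label, regionprops
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def region_features(binary_candidates, green_channel):
    feats = []
    for r in regionprops(label(binary_candidates), intensity_image=green_channel):
        feats.append([r.area, r.eccentricity, r.solidity,
                      r.mean_intensity, r.max_intensity - r.min_intensity])
    return np.array(feats)

# X: features of labeled training candidates (numpy array);
# y: numpy array, 1 = true lesion, 0 = spurious candidate.
def fit_hybrid(X, y):
    svm = SVC(probability=True).fit(X, y)
    gmm_pos = GaussianMixture(n_components=2).fit(X[y == 1])
    gmm_neg = GaussianMixture(n_components=2).fit(X[y == 0])
    def predict_proba(Xn):
        p_svm = svm.predict_proba(Xn)[:, 1]
        # GMM posterior via the log-likelihood ratio of the two class models.
        llr = gmm_pos.score_samples(Xn) - gmm_neg.score_samples(Xn)
        p_gmm = 1.0 / (1.0 + np.exp(-llr))
        return 0.5 * (p_svm + p_gmm)
    return predict_proba
```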

Glaucoma affects the RNFL, which results in an increased CDR, a clinically significant parameter for glaucoma diagnosis and screening. Computational analysis of parameters such as the CDR, cup area, and rim area is made possible by fundus imaging, greatly assisting in identifying glaucoma. The fundus image enables analysis of the OC and OD; however, the edges of the OC are not very clear. Because of this, it is very hard to segment the OC accurately, and the performance of OD segmentation also needs to be improved. Akram et al. (2015) presented a unique feature set-based diagnostic system for the automated identification of glaucoma using fundus images. The system comprised the following modules: preprocessing, detection of the region of interest based on the autonomously segmented OD, feature extraction, and classification. The robust OD localization method of Usman et al. (2014) was employed. With the use of a 2-D Gabor wavelet and subsequent thresholding-based vessel segmentation, the vascular pattern is made more visible. Following ROI extraction, several features (CDR, rim-to-disc ratio, mean intensity, standard deviation, energy, and gradient) were extracted to create a detailed representation of the feature space. Local Fisher discriminant analysis (LFDA) was performed for supervised enhancement of the features. The retinal images were classified as normal or glaucoma using the m-Mediods model of normality. The performance of the proposed system was evaluated using publicly available (DRIVE, DiaretDB, Drions, HEI-MED, HRF, MESSIDOR) and locally gathered fundus databases. An automated technique (Mvoulana et al. 2019) was proposed for glaucoma diagnosis from fundus scans. First, the method segmented the OD by combining a brightness criterion and a template-matching technique. Next, texture-based and model-based methods were employed to segment the OD and OC accurately. Finally, glaucoma screening is achieved through the calculation of the CDR, which allows for differentiation between healthy and glaucomatous individuals. The publicly accessible DRISHTI-GS1 dataset was used to evaluate the proposed method, which achieved 98% accuracy. The study (Mohamed et al. 2019) proposed an automatic glaucoma screening model based on superpixel classification. The preprocessing steps were noise removal and illumination correction; input images were then aggregated into superpixels by Simple Linear Iterative Clustering (SLIC). The statistical pixel-level (SPL) technique extracted image attributes from each superpixel based on histogram data and textural information. The extracted features are then fed into an SVM to classify each superpixel into OD, OC, blood vessel, and background regions. The model was tested on the RIM-ONE dataset and achieved an accuracy and sensitivity of 98.6% and 92.3%, respectively. Rehman et al. (2019) employed region-based statistical and textural features to detect and localize the OD in fundus images. Highly discriminative features were selected using the mutual information criterion, and four benchmark classifiers, SVM, RF, AdaBoost, and RusBoost, were compared. The RF classifier showed more competitive results than the other classifiers, achieving accuracies of 99.3%, 98.8%, and 99.3% on the DRIONS, MESSIDOR, and ONHSD datasets, respectively. DME is an ocular condition in which fluid rich in fat drains out of damaged blood vessels and is deposited near the macula, causing blurred central vision. The study (Akram et al. 2014) proposed a novel approach for macula detection utilizing a rich feature set and a GMM-based classifier. The method was evaluated on the DRIVE and STARE databases and achieved accuracies of 100% and 95.4%, respectively. In ophthalmology, a portable and cost-effective computer-aided diagnosis system can be achieved through the use of the d-Eye lens, which can be attached to a smartphone (Elloumi et al. 2018; Elloumi et al. 2021). Mrad et al. (2022) proposed an automated technique for glaucoma screening specifically designed for smartphone-captured fundus images (SCFIs). The first challenge was to design an algorithm that achieves a high level of accuracy even with moderate-quality SCFIs; the second was to make the detection process computationally cheap and resource-effective so that the method can run on a smartphone. To do so, the central concept was to infer glaucoma from vessel displacement inside the OD, since the vascular tree can still be well characterized in SCFIs. The vessel tree is therefore divided into quadrants according to the ISNT rule, and the centroid of the vessel distribution in each quadrant is determined. The resulting feature vector was fed into an SVM classifier to diagnose glaucoma accurately, as sketched below.
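A rough illustration of the quadrant-centroid feature: given a binary vessel mask and the OD centre, the sketch below bins vessel pixels into four angular sectors and returns the per-sector centroid offsets. The sector orientation (which side is nasal versus temporal depends on eye laterality) and the exact angular bounds are simplifying assumptions, not the published definition of Mrad et al. (2022).

```python
# Per-sector vessel centroids around the OD centre, a rough stand-in for
# the ISNT-quadrant vessel-displacement feature described above.
import numpy as np

def isnt_centroids(vessel_mask, od_center):
    """8-D feature vector: (x, y) vessel centroid offset per sector."""
    cy, cx = od_center
    ys, xs = np.nonzero(vessel_mask)
    # Angle measured with image-up positive; 0 deg taken as temporal for a
    # right eye (an assumption; laterality must be handled upstream).
    ang = np.degrees(np.arctan2(cy - ys, xs - cx)) % 360
    bounds = {"T": (315, 45), "S": (45, 135), "N": (135, 225), "I": (225, 315)}
    feats = []
    for lo, hi in bounds.values():
        sel = ((ang >= lo) | (ang < hi)) if lo > hi else ((ang >= lo) & (ang < hi))
        if sel.any():
            feats.extend([xs[sel].mean() - cx, ys[sel].mean() - cy])
        else:
            feats.extend([0.0, 0.0])
    return np.array(feats)

# The resulting vector would be fed to an SVM, as in the original pipeline.
```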

Detection of retinal lesions through OCT scans An automatic technique based on graph theory, dynamic programming, and SVM was presented in the study (Srinivasan et al. 2014a), with which seven to ten retinal layers were effectively extracted. Septiarini et al. (2018) proposed an RNFL segmentation model based on a co-occurrence matrix; the model employed 160 and 40 fundus images for training and testing, respectively, and achieved an accuracy of 94.52%. The study (Zang et al. 2019) performed an analysis of retinal layers and capillary plexuses from OCT and OCT-A scans by segmenting the optic disc and retinal layers; a neural network and a graph search technique were combined to segment the OD. The study (Hassan and Raja 2016) proposed an automated algorithm for detecting ME using directional gradients of the candidate OCT scan. Three features were extracted from the computed gradients and used to train a linear discriminant analysis classifier to separate ME from healthy scans. The proposed system was tested on 30 OCT B-scans, yielding a sensitivity of 100% and a specificity of 86.67%. In the work (Abhishek et al. 2014), an automated segmentation approach was described for detecting the intra-retinal layers in OCT images that are significant for edema detection. The method located the RPE layer, detected the shape of the drusen, and finally employed a binary classification to distinguish between AMD and DME scans. Experimental results showed that AMD and DME were classified with an accuracy of 87.5%. Srinivasan et al. (2014b) developed an algorithm for the detection of AMD and DME based on multiscale histograms of oriented gradient descriptors, with a supervised SVM used for the final classification task. The classifier successfully identified all cases of AMD and DME and 86.67% of normal patients. The quantitative classification of AMD and normal eyes through retinal OCT images was presented in (Farsiu et al. 2014). The model semi-automatically segmented the RPE, drusen, and retina. A map of "normal" non-AMD thickness was created by registering and averaging thickness maps from control participants. Five automated classifiers were generated based on a generalized linear model regression framework, and the best classifier achieved an area under the curve (AUC) greater than 0.99. Khalid et al. (2017) proposed a model for the diagnosis of retinal epithelial detachment (RE), CSR, and AMD based on a multilayered SVM. The model was evaluated on 2819 OCT images (1437 healthy, 640 RE, and 742 CSR) from 502 patients across two datasets, achieving an accuracy of 99.92%, a sensitivity of 100%, and a specificity of 99.86%. The study (Hassan et al. 2016a) presented an automated SVM-based framework to classify ME and CSR from OCT images. A total of 30 labeled images (10 ME, 10 CSR, and 10 healthy) were utilized for training the model using five features (two derived from cyst fluids inside the retinal layers and three extracted from the thickness profiles of the sub-retinal layers). A total of 90 TD-OCT images (30 ME, 30 CSR, and 30 healthy) from 73 patients were used to evaluate the algorithm, which correctly identified 88 of 90 cases (97.77% accuracy, 100% sensitivity, and 93.33% specificity). Hassan et al. (2016b) utilized coherent tensors to develop an automated method to segment and evaluate the subretinal layers in OCT scans; an SVM classifier was then employed to make an ME prediction based on the subretinal layers of the candidate images.
Seventy-one OCT images were obtained locally from 64 patients, of whom 15 were ME patients and 49 were healthy subjects. Overall, the model distinguished between ME patients and healthy subjects with an accuracy of 97.78%. The model proposed in (Rathore et al. 2021) predicts whether DR is present based on the number of exudates visible in retinal fundus images. Several steps were taken to detect the exudates, including scaling, removal of the blue channel, feature extraction with local binary patterns (LBP), and classification of the images via SVM.
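As an illustration of the LBP-plus-SVM pipeline just described, the following sketch assumes a scikit-image/scikit-learn stack; the function name and the parameter choices (`P`, `R`, channel handling) are illustrative rather than taken from the cited paper:

```python
# Minimal sketch: drop the blue channel in favour of the green one, compute a
# local-binary-pattern histogram, and classify DR vs. healthy with an SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(rgb: np.ndarray, P: int = 8, R: float = 1.0) -> np.ndarray:
    green = rgb[..., 1].astype(np.float64)  # exudates contrast best in green
    lbp = local_binary_pattern(green, P, R, method="uniform")
    # "uniform" LBP yields integer codes in [0, P+1], hence P+2 histogram bins.
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# With stacked histograms X and DR labels y (hypothetical data):
# clf = SVC(kernel="rbf").fit(X_train, y_train)
```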

10.3 Deep learning (DL) schemes

DL computer models have recently made significant advances in various fields, such as computer vision, speech recognition, genomics, drug discovery, and ophthalmology. A DL algorithm can automatically learn complex structures from large datasets without explicit feature extraction; however, a large amount of data is required to train a DL model that generalizes well. We have divided the DL literature into three categories: segmentation models, segmentation-based classification models, and classification models. The segmentation section includes studies that only extract different biomarkers from fundus/OCT/OCT-A scans for various ocular diseases, the segmentation-based classification models perform classification based on the identified biomarkers, and the classification section includes studies that classify scans without any segmentation step.

10.3.1 Segmentation

Retinal Lesions & Optic Cup/Disc Segmentation from Fundus Scans Sevastopolsky (2017) proposed a universal method for automatically segmenting the OD and OC based on the U-Net CNN. CLAHE was used as a preprocessing step to equalize contrast. The model was evaluated on the publicly available DRIONS-DB, DRISHTI-GS, and RIM-ONE v.3 databases and achieved IOUs of 0.89, 0.75, and 0.69, respectively. A CNN model was developed and trained to automatically and simultaneously segment the OD, fovea, and blood vessels (Tan et al. 2017). Fundus images were normalized, and three channels were extracted and fed into the model. On the DRIVE dataset, the model correctly classified 92.68% of the ground truths; the best single-image accuracy was 94.54% and the worst was 88.85%. The study (Al-Bander et al. 2018) proposed a DL model for segmenting the OC and OD. The model was based on DenseNet, a fully convolutional network with a symmetric U-shaped topology that enables pixel-wise classification. The CDR for glaucoma diagnosis was then estimated along two axes using the projected OD and OC boundaries. The model was evaluated on five publicly available datasets: ORIGA, DRIONS-DB, Drishti-GS, ONHSD, and RIM-ONE. The results showed the model achieved better segmentation, and it was suggested that it could also be used to detect various other retinal lesions. The majority of current ML segmentation techniques rely on manual segmentation of the disc, and the annotation of pixel-level optic disc masks is a time-consuming task that invariably introduces inter-subject variance. To address this issue, Xiong et al. (2022) proposed an automatic Bayesian U-Net with weak labels and Hough-transform-based annotations to segment the OD from fundus images. An expectation-maximization approach alternated between estimating the OD mask and updating the weights of the Bayesian U-Net to optimize the model. Another study (Lu et al. 2019) presented a weakly supervised learning approach based on a modified CNN to segment the OD in fundus images. Image-level labels and bounding-box labels were employed to guide segmentation. The enhanced-constraint CNN method was combined with the GrabCut method to construct a more refined foreground segmentation map from image-level labels, which was then used as "ground truth" for the subsequent training step. A weak loss function was used to constrain the output size of a modified U-Net model during training. The model was evaluated on the RIM-ONE and DRISHTI-GS databases. Li et al. (2018a) proposed a DL algorithm, a 22-layer network with 11 inception modules, for recognizing glaucomatous optic neuropathy (GON) from color fundus images. A total of 70,000 fundus images were randomly acquired from LabelMe (LabelMe, accessed Nov 25, 2022). The results showed the model achieved an AUC of 0.986 with a specificity of 92.0% and a sensitivity of 95.6%. Sun et al. (2018) proposed a method based on deep object detection networks to segment the OD from retinal fundus images, finding the OD border by transforming the predicted bounding box into a vertical, non-rotated ellipse. Using Faster R-CNN as the object detector, the method outperformed state-of-the-art algorithms in OD segmentation on the ORIGA dataset. The study (Sun et al. 2022) presented a neural architecture search (NAS) in a two-level nested U-shaped structure. The segmentation model achieved an average dice of 92.88% on the REFUGE dataset.
The model was validated on Drishti-GS and GAMMA and obtained dice scores of 92.32% and 92.11%, respectively. Shankaranarayana et al. (2019) proposed a DL framework to estimate monocular retinal depth from a fundus image. To handle the sparsity of labeled data, the deep network was pretrained using a pseudo-depth reconstruction technique, which proved more effective than denoising-based pretraining. A fully convolutional guided network then used the depth map together with the fundus image to perform OD and OC segmentation. The model was evaluated on three datasets: ORIGA, RIM-ONE r3, and DRISHTI-GS. The study (Tian et al. 2020) used a multi-scale CNN to extract feature maps; for the segmentation task, the GCN requires the feature maps to be appended to graph nodes. The model was tested on the REFUGE dataset, where the Dice similarity coefficients (DSC) for OD and OC were 0.97 and 0.95, respectively. A DL model, DDSC-Net (densely connected depthwise separable convolution network) (Liu et al. 2021a), was proposed for OD and OC segmentation based on multi-category semantic segmentation. To achieve better segmentation results, the model utilized an image-pyramid input and depthwise separable convolutional layers. The model was evaluated on two publicly available datasets, Drishti-GS and REFUGE, where DDSC-Net outperformed GL-Net by 0.70% in disc coefficient on Drishti-GS and pOSAL by 0.79% on REFUGE. Wang et al. (2019) developed a coarse-to-fine DL architecture based on a classical CNN, the U-Net, to accurately segment the OD. The network used two distinct sets of inputs during training: color fundus images and their corresponding grayscale vessel-density maps. The model fused the data using an overlap technique to locate a local image patch (the disc candidate region), which was then fed into the U-Net for further segmentation. On a dataset of 2978 test images, the model achieved an average IoU of 0.89 and DSC of 0.93. Surendiran et al. (2022) developed a modified recurrent neural network (mRNN) with a fully convolutional network (FCN) for the extraction and segmentation of the OD and OC; the FCN generated feature maps for the intra- and inter-slice contexts, whereas the RNN attended more to the inter-slice context. A novel multi-task learning method, JOINED, was proposed for joint OD, OC, and fovea detection (He et al. 2022a). To make the most of the information provided by the distance from each image pixel to the landmarks of interest, a distance-prediction branch was built in addition to the segmentation and detection branches. The JOINED pipeline has two stages. At the coarse stage, a joint segmentation and detection module performed coarse OD/OC segmentation and generated a heatmap indicating the location of the fovea. The ROI was then cropped for fine processing, and the coarse-stage predictions were used as extra information to improve performance and speed up convergence. The model was evaluated on the publicly available GAMMA, PALM, and REFUGE datasets. Although many DL methods have shown promising results in OD and OC segmentation, correctly segmenting the OC boundary while keeping the computational cost low remains difficult. To address this issue, the study (Wei et al. 2022) proposed a robust Multiscale Feature Extraction with Depthwise Separable Convolution network (RMSDSC-Net) that trades off performance against computational cost.
The basic building blocks were the Multiscale Input (MSI), Dilated Convolution Block (DCB), Depthwise Separable Convolution Unit (DSCU), and External Residual Connection (ERC). The MSI helps mitigate the loss of detailed feature representations caused by the network's pooling layers. The DSCU and DCB modules improve segmentation performance and computational efficiency by preserving higher-level semantic features while minimizing the loss of spatial information from fine image details. Finally, the ERC was placed between the encoding and decoding layers to reduce feature degradation as much as possible. The model achieved Dice coefficients of (0.978, 0.919) and (0.965, 0.910) for OD and OC segmentation on the DRISHTI-GS and REFUGE databases, respectively. Garifull et al. (2021) developed a model for segmenting DR lesions based on a Bayesian baseline. The model treated the parameters of a CNN as random variables and used a stochastic variational dropout approximation to quantify uncertainty. On the IDRiD dataset, the method achieved an AUC of 0.84 for hemorrhages, 0.641 for hard exudates, 0.593 for soft exudates, and 0.484 for microaneurysms. State-of-the-art models cannot achieve significant segmentation results without sufficient pixel-level annotated training data. To address this shortcoming, Lui et al. (2019b) proposed a semi-supervised conditional GAN-based approach for joint OD and OC segmentation. The model comprised a segmentation net, a generator, and a discriminator, which learned mappings between fundus images and segmentation maps; both labeled and unlabeled data were used to further enhance segmentation performance. Extensive trials demonstrated improved results on the ORIGA and REFUGE datasets for segmenting the optic disc and cup. Jiang et al. (2019) proposed a multi-label DL model (GL-Net) that incorporated GANs, with a generator and a discriminator. In the generator, skip connections facilitated the fusion of low- and high-level feature information, which minimizes the downsampling factor and prevents excessive loss of feature information. L1 distance and cross-entropy were used as loss functions to improve segmentation accuracy, and the model was verified on the DRISHTI-GS1 dataset. Another study (Son et al. 2019) presented an OD and blood-vessel segmentation model based on GANs. Results showed better performance in blood-vessel segmentation on the DRIVE and STARE datasets; however, OD segmentation on the DRIONS-DB, RIM-ONE, and Drishti-GS datasets did not show statistically significant increases in AU-ROC. Table 4 summarizes the various DL models for the segmentation of the OD.
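Most of the segmentation models above are built on encoder-decoder networks in the U-Net family. As a point of reference, the following is a compact PyTorch sketch of such a network with three output channels (background, OD, OC); it is a generic illustration, not the architecture of any single cited study:

```python
# A two-level U-Net-style encoder-decoder with skip connections for
# pixel-wise OD/OC segmentation (illustrative architecture only).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self, n_classes=3):  # background, optic disc, optic cup
        super().__init__()
        self.enc1, self.enc2 = conv_block(3, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)                                  # per-pixel logits

# logits = MiniUNet()(torch.randn(1, 3, 256, 256))  # -> shape (1, 3, 256, 256)
```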

Table 4 Summarizing the different DL-based studies that segmented the OD

Blood Vessels Segmentation from Fundus Scans The paper (Wang et al. 2015) introduced a supervised approach for retinal blood vessel segmentation that combined a CNN and a random forest. The CNN served as a trainable hierarchical feature extractor, while the ensemble random forest acted as the classifier, so the method automatically learned features from the raw images and combined the benefits of feature learning with a classical classifier. Two publicly available retinal image databases, DRIVE and STARE, were used to evaluate the model, which achieved accuracies of 97.67% and 98.13%, respectively. To improve feature recognition in retinal images, a unique method was proposed (Fang et al. 2015) that first applied a DL method for vessel segmentation to produce a probability map of the image; the multi-scale Hessian response on this probability map was then used to detect landmarks. Melinscak et al. (2015) proposed a CNN-based model for classifying retinal vasculature. The network had four convolutional and max-pooling layers and two fully connected layers; the ReLU activation function was used in the convolutional layers and softmax in the last fully connected layer. The model achieved an average accuracy of 94.66% on the DRIVE dataset. The study (Fu et al. 2016) proposed a DL model that utilized CNNs to generate a vessel probability map. The model was a modification of the holistically nested edge detection (HED) network (Xie and Tu 2015). Probability maps from the output layers were combined into a single probability map, and conditional random fields (CRFs) were utilized for exact localization of the vascular boundaries, with mean-field approximation of the CRF distribution yielding maximum posterior marginal inference. The method achieved accuracies of 94.70% and 95.45% on the DRIVE and STARE datasets, respectively. Liskowski and Krawiec (2016) proposed a DL model for vessel segmentation. Preprocessing was based on global contrast normalization and zero-phase whitening, and data augmentation was performed to increase the number of images. Six distinct CNN variants were constructed: PLAIN, GCN, ZCA, AUGMENT, NO-POOL, and BALANCED. The segmentation models were verified on the DRIVE, STARE, and CHASE databases. A supervised method (Jiang et al. 2018) was proposed based on a pre-trained fully convolutional network through transfer learning. The method reduces the complex challenge of retinal vascular segmentation from full-size image segmentation to regional vessel-element detection, and unsupervised post-processing techniques were applied to further enhance the final result. Extensive testing on the DRIVE, STARE, CHASE DB1, and HRF databases showed accuracies of 95.93%, 96.53%, 95.91%, and 96.62%, respectively. Hu et al. (2018) proposed a model based on a CNN and fully connected CRFs, with essentially two phases to the segmentation procedure. First, a multiscale CNN architecture with an enhanced cross-entropy loss function generated the inter-image probability map; the multiscale network was built by merging the feature map of each intermediate layer to acquire more detailed knowledge of the retinal vessels, and the proposed cross-entropy loss concentrated on learning the difficult cases while placing less weight on the easier samples.
Second, the final binary segmentation was obtained by applying CRFs, which exploited spatial context by considering the interactions among all pixels in the fundus image. The DRIVE and STARE public datasets were used to test the efficacy of the method, which achieved accuracies of 95.33% and 96.32%, respectively. The study (Wang et al. 2019) presented a model based on a Dense U-Net and a patch-based learning strategy for retinal vessel segmentation. Training patches were obtained using a random extraction strategy, the Dense U-Net was used as the training network, and a random transformation technique augmented the training data. The segmented image was recovered using an overlapping-patches sequential reconstruction technique. The DRIVE and STARE public datasets were used to test the model, which achieved accuracies of 95.11% and 95.38%, respectively. By balancing losses with stacked deep FCNs, Park et al. (2020) proposed a novel conditional generative adversarial network, termed M-GAN, for retinal vessel segmentation. An M-generator with deep residual blocks provided enhanced segmentation, while an M-discriminator with a deeper network enabled faster adversarial training. In particular, a multi-kernel pooling block was included between the stacked layers to support scale invariance for vessels of varying sizes. The M-generator used down-sampling layers to collect relevant data for feature extraction and up-sampling layers to generate the segmented retinal blood vessel images. Preprocessing used automated color equalization (ACE) to improve the visibility of retinal vessels, and post-processing used a Lanczos resampling approach to smooth vessel branching and reduce false negatives. The method was verified on the DRIVE, STARE, HRF, and CHASE-DB1 datasets and achieved accuracies of 97.06%, 98.76%, 97.61%, and 97.36%, respectively. The study (Boudegga et al. 2021) proposed a U-shaped DL design with lightweight convolution blocks to maintain good performance while decreasing computational complexity. A series of preprocessing and data augmentation techniques was proposed to improve the quality of the retinal image and the information gleaned from the blood vessels. The method was evaluated on the DRIVE and STARE databases and produced a better compromise between the vessel detection rate and detection time, with average accuracies of 97.80% and 98% in 0.59 s and 0.48 s per fundus image, respectively. Wang et al. (2021a) proposed a model for fine retinal vessel segmentation by integrating a Nested U-Net with patch learning. Training samples containing fine retinal vessels were generated efficiently with a custom extraction approach, giving the model a significant advantage in segmenting these fine structures, and high-resolution feature maps were sent directly from the encoder to the decoder network. The model was trained with k-fold cross-validation, predictions were made on testing samples, and the final retinal vasculature was reconstructed using a sequential approach. The model was evaluated on the DRIVE and STARE datasets and achieved accuracies of 95.12% and 96.41%, respectively.
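Several of the pipelines above share two preparatory steps: contrast enhancement of the green channel (often with CLAHE, as in other studies in this section) and random patch extraction for training. A minimal sketch, assuming an OpenCV/NumPy stack and an 8-bit BGR input image:

```python
# CLAHE contrast enhancement of the green channel followed by random patch
# extraction, as commonly used before training patch-based vessel segmenters.
import cv2
import numpy as np

def preprocess(fundus_bgr: np.ndarray) -> np.ndarray:
    green = fundus_bgr[..., 1]  # vessels are most visible in the green channel
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(green)

def random_patches(img, mask, n=1000, size=48, seed=0):
    """Extract n aligned (image, label) patches of side `size`."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    ys = rng.integers(0, h - size, n)
    xs = rng.integers(0, w - size, n)
    return (np.stack([img[y:y + size, x:x + size] for y, x in zip(ys, xs)]),
            np.stack([mask[y:y + size, x:x + size] for y, x in zip(ys, xs)]))
```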

According to the properties of the retinal vessels in fundus scans, a residual CNN-based retinal vessel segmentation model was presented (Xu et al. 2021). The encoder-decoder structure was built by joining the low-level and high-level feature graphs, and dilated convolution was integrated into the pyramid pooling; an improved residual attention module and a deep supervision module were also used. Results on the DRIVE and STARE datasets demonstrate that the algorithm can segment the entire retinal vessel tree, including vessel stems and terminals, with accuracies of 95.90% and 96.88% and specificities of 98.85% and 97.85%, respectively. Deng and Ye (2022) proposed a vessel segmentation model based on multi-scale attention with a residual mechanism, D-Mnet (Deformable convolutional M-shaped Network), combined with an improved PCNN (Pulse-Coupled Neural Network). The model was built on an encoder-decoder structure and integrated the enhanced PCNN to bring together the benefits of supervised and unsupervised learning. Comparative verification on the publicly available DRIVE, STARE, CHASE-DB1, and HRF databases yielded accuracies of 96.83%, 97.32%, 97.14%, and 96.68%, respectively. Kar et al. (2022a) proposed a segmentation model based on a generative adversarial network (GAN) and several loss functions to detect retinal vessels accurately. The CLAHE method was used in the preprocessing stage to improve the contrast of blood vessels. The GAN architecture combined a segmentation network (the generator) and a classification network (the discriminator). An inception module detected fine vessel segments by extracting their properties at multiple scales, while the discriminator was made up of two layers of stacked self-attention networks and a layer of position-wise fully connected feed-forward networks inferring a binary output; the transformer's attention mechanism effectively discriminates and stores global and local information. The DRIVE, STARE, CHASE DB1, HRF, ARIA, IOSTAR, and RC-SLO databases were used to test the robustness and effectiveness of the method. Chen et al. (2022a) proposed the Patches Convolution Attention-based Transformer UNet (PCAT-UNet), a U-shaped network with a transformer-based convolution branch, in which skip connections fused deep and shallow features. The model captures global dependencies and underlying feature-space characteristics, overcoming the problems of insufficient retinal microvessel feature extraction and low sensitivity caused by pixels being too easily predicted as background. PCAT-UNet was evaluated on three publicly accessible retinal vasculature datasets, DRIVE, STARE, and CHASE DB1, achieving accuracies of 96.22%, 97.96%, and 98.12% and sensitivities of 85.76%, 87.03%, and 84.93%, respectively. The study (Zhang et al. 2022d) presented a framework that introduced edge-aware flows into a U-Net encoder-decoder architecture to steer retinal vessel segmentation, making the segmentation more sensitive to the fine edges of capillaries. Using features taken from the encoder path, an edge-gated flow with gated convolution learns to highlight vessel edges and then exports the resulting edge prediction.
To further improve the segmentation results, the edge-downsampling flow extracted edge features from the edge prediction output and re-fed them into the decoder path. On the publicly available DRIVE, STARE, and CHASEDB1 datasets, the technique outperformed the state-of-the-art U-Net baseline by 0.0056, 0.0026, and 0.0047, respectively. Most segmentation approaches still fall short in fine-vessel detection, however, mainly because of information loss caused by repeated pooling operations and the insufficient processing of local context features by skip connections. To solve this problem, a retinal vessel segmentation network named ResDO-UNet (Liu et al. 2023a) was proposed, based on an encoder-decoder architecture, to offer an automatic end-to-end detection strategy for fundus photographs. Together with a depth-wise over-parameterized convolutional layer (DO-conv), a residual DO-conv (ResDO-conv) network served as the backbone to obtain robust context features and improve feature extraction. Furthermore, a pooling fusion block (PFB) implemented nonlinear fusion pooling, combining the benefits of max pooling and average pooling to mitigate the information loss caused by repeated pooling, while an attention fusion block (AFB) addressed the insufficient processing of local context information by skip connections. The model was evaluated on the DRIVE, STARE, and CHASE_DB1 datasets. In the study (Qu et al. 2023), a fundus-image vessel segmentation model (TP-Net) was proposed, consisting of three modules: a main path, a sub-path, and a multi-scale feature aggregation module (MFAM). The main path identifies the retinal vessel trunk, while the sub-path accurately tracks fine vessel information. In the main path, a three-layer lightweight backbone network was designed around retinal vessel features, and a global feature selection mechanism (GFSM) automatically selected the features most significant for the segmentation task. In the sub-path, an edge feature extraction approach and an edge loss function improved the network's ability to capture edge information. Finally, MFAM combined the main- and sub-path predictions, eliminating background disturbances while keeping edge features, leading to improved vessel segmentation. TP-Net was tested on the DRIVE, STARE, and CHASE-DB1 datasets. A new lightweight segmentation model, Wave-Net (Liu et al. 2023), was proposed for accurate vessel segmentation in fundus images. The skip connections of the original U-Net were replaced with a detail enhancement and denoising block (DED) to improve segmentation precision; the DED reduced the impact of semantic information loss on thin vessels, learned more about micro-structures and fine features, and narrowed the semantic gap. Additionally, a multi-scale feature fusion block (MFF) was created to fuse cross-scale contexts for multi-scale vessel detection. The model was evaluated on the DRIVE, CHASEDB1, and STARE datasets and achieved F1 scores of 0.8254, 0.8349, and 0.8140, respectively.
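Since the studies above report overlapping sets of pixel-level metrics (accuracy, sensitivity, specificity, Dice/F1), a small NumPy sketch of their standard definitions for binary vessel masks may be a useful reference:

```python
# Standard pixel-level evaluation measures for a binary vessel segmentation,
# computed from the confusion-matrix counts of a prediction against ground truth.
import numpy as np

def vessel_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt); tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt); fn = np.sum(~pred & gt)
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),               # recall on vessel pixels
        "specificity": tn / (tn + fp),
        "dice_f1":     2 * tp / (2 * tp + fp + fn),  # Dice coincides with F1
    }
```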

Table 5 summarizes the studies that performed segmentation of blood vessels in fundus images. Various other DL models for retinal vessel segmentation in fundus images can be found in the literature, such as Bridge-Net (Zhang et al. 2022e), CSAUNet (Huang et al. 2022c), DilUnet (Huang et al. 2022c), Staircase-Net (Sethuraman and Palakuzhiyil Gopi 2022), DCCMED-Net (Budak et al. 2020), CcNet (Feng et al. 2020), MD-Net (Shi et al. 2021), NFN+ (Wu et al. 2020), and DF-Net (Yin et al. 2022).

Table 5 Summarizing the DL models for the segmentation of blood vessels in fundus images

Retinal Lesions Segmentation from OCT Scans

OCT has been utilized extensively for scanning the retina to detect various ocular diseases, and significant indicators for a wide variety of ocular disorders can be found in the retinal layers. A multi-scale, end-to-end CNN architecture was proposed to delineate the choroidal borders (Sui et al. 2017). The method worked well across various global and local scales, with pixel data used to directly update the corresponding graph-edge weights; tests on 912 OCT images showed that the system performed best when using learned graph-edge weights. Gopinath et al. (2017) proposed a model that combined a CNN with a Long Short-Term Memory (LSTM) network to extract the retinal layers from an OCT image; the model's pixel-wise mean absolute error was 1.30 ± 0.48. Fang et al. (2017) presented a framework (CNN-GS) that integrated CNN and graph search techniques for automated layer-by-layer retinal delineation. The CNN was used to extract features of a specific retinal layer, and the probability maps produced by the CNN were then processed with a graph search approach to identify the retinal boundaries. A supervised model (Xiang et al. 2018) was developed for segmenting layers and neovascularization. For the neural network classifier, spatial features (3), gray-level features (7), and layer-like features (14) were extracted, and multi-scale bright and dark layer detection filters were employed to enhance retinal layer pathologies. To refine the retinal layers, a graph search algorithm was utilized in which node weights were computed from the extracted layers; 42 SD-OCT images of AMD patients were used to validate the model. Hu et al. (2019) proposed a model that combined a multiscale CNN (MCNN) and graph search to extract the retinal layers in OCT images. Multiscale features of the retinal layers were first extracted to generate probability maps. To reduce the likelihood of the network mistaking background for target, the model used location information to differentiate between foreground and background pixels; finally, an enhanced graph search technique delineated the retinal layers from the probability maps. Mariottoni et al. (2020) introduced an algorithm capable of determining RNFL thickness without segmentation, trained on conventional RNFL thickness values obtained from SD-OCT images. In the study (Wei and Peng 2020), the priority of the mutex relationship among retinal layers was considered, and a new loss function, the mutex dice loss (MDL), was introduced. In addition, a novel FCN-based model utilizing depth max pooling (DMP) was proposed to segment fluids and retinal layers in SD-OCT images. A shortest-path algorithm (DL-SP) (Mishra et al. 2020) was proposed for automatically identifying the retinal layers relevant to drusen and reticular pseudodrusen (RPD) in OCT images. A U-Net model generated probability maps, which were combined with pixel-to-pixel edge weights measured from the gradient in the z-direction. The model was evaluated on 1000 images and achieved absolute mean differences for RPD and drusen of 0.75±1.99 pixels (2.92±7.74 µm) and 1.53±1.47 pixels (5.97±5.74 µm), respectively. A segmentation model, DeepRetina (Li et al. 2020), was proposed to segment the retinal layers.
Xception65 extracted feature maps that were then fed into an atrous spatial pyramid pooling module to obtain multiscale information. The method was validated using 280 OCT volumes (40 B-scans per volume) and achieved an IOU of 0.90 and a sensitivity of 92.15%. The paper (Li et al. 2021) presented a novel two-stage approach that uses a graph convolutional network (GCN) to identify all nine retinal layers and the OD in OCT images. A multi-scale global reasoning module was integrated into the U-shaped neural network between the encoder and the decoder to exploit prior anatomical knowledge. The method was validated on the Duke SD-OCT dataset, achieving a dice score and pixel accuracy of 0.820 ± 0.001 and 0.830 ± 0.002, respectively. The work (Sousa et al. 2021) presented a method for segmenting the ILM, RPE, and BM in OCT images of healthy and intermediate-AMD subjects. Two DL networks, U-Net and DexiNed, were used, and the results showed average absolute errors of 0.49, 0.57, and 0.66 for the ILM, RPE, and BM, respectively. In the paper by He et al. (2021b), a unified DL framework was introduced that directly models the distribution of surface positions, generating topologically correct, continuous, and smooth surfaces in a single feed-forward operation. An embedded residual recurrent network (ERR-Net) (Hu et al. 2021) was developed based on graph search for coarse-to-fine retinal layer delineation. In addition to resolving the gradient issue introduced by depth, ERR-Net encapsulates the image's global spatial structure, with graph search used to refine the retinal boundaries. The model was evaluated on the Duke, University of Miami, and AREDS2 datasets. Parra-Mora and da Silva Cruz (2022) introduced a novel FCN architecture, dubbed LOCTSeg, to segment different diagnostic markers in OCT images. LOCTSeg is a lightweight model designed to balance performance and efficiency. Two publicly available benchmarking datasets, AROI (1136 images) and HCMS (1715 images), were used to assess the model, and the evaluation showed Dice score improvements of 3% on AROI and 1% on HCMS. The study (Viedma et al. 2022) proposed Mask R-CNN for segmenting retinal layers from OCT images. A CNN extracted feature maps from the images, which were fed into a region proposal network (RPN); bounding boxes (anchors) were then generated over each feature map. The RPN separates these anchors into a foreground class (positive anchors), located in areas reflecting object features, and a background class (negative anchors), located outside those objects. The study (Man et al. 2023) investigated different U-Net models combined with VGG and ResNet backbones for segmenting the retinal layers and compared their accuracy; results showed that the VGG16 U-Net (VGG16-Unet) performed better than U-Net and U-net++.
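Many of the layer-segmentation methods above follow the same recipe: a CNN produces a per-pixel boundary-probability map, and a graph search or shortest-path step traces a smooth boundary through it. The following dynamic-programming sketch illustrates that second step in NumPy; the cost function and the smoothness constraint (`max_step`) are illustrative choices, not those of any specific cited paper:

```python
# Trace one retinal layer boundary through a column-wise probability map by
# dynamic programming: accumulate path costs left-to-right, limiting the
# vertical step between neighbouring columns, then backtrack the cheapest path.
import numpy as np

def trace_boundary(prob: np.ndarray, max_step: int = 2) -> np.ndarray:
    """prob: HxW boundary-probability map; returns one row index per column."""
    H, W = prob.shape
    cost = -np.log(prob + 1e-8)          # high probability -> low cost
    acc = cost.copy()                    # accumulated cost; column 0 is the base case
    back = np.zeros((H, W), dtype=int)   # backpointers for path recovery
    for x in range(1, W):
        for y in range(H):
            lo, hi = max(0, y - max_step), min(H, y + max_step + 1)
            prev = acc[lo:hi, x - 1]
            k = int(np.argmin(prev))
            acc[y, x] = cost[y, x] + prev[k]
            back[y, x] = lo + k
    path = np.empty(W, dtype=int)        # backtrack from the cheapest end point
    path[-1] = int(np.argmin(acc[:, -1]))
    for x in range(W - 1, 0, -1):
        path[x - 1] = back[path[x], x]
    return path
```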

The study (Wilkins et al. 2012) presented a CNN model based on a U-Net autoencoder architecture to detect intraretinal fluid (IRF) in OCT images. The OCT images were collected between 2006 and 2016 at the Ophthalmology Department, University of Washington; 934 B-scans were used for training and 355 B-scans for validation. Karri et al. (2017) proposed an algorithm for identifying retinal pathologies in OCT images. A pre-trained CNN, GoogLeNet, was fine-tuned to increase its prediction capability, and salient responses were identified during prediction to understand the properties of the learned filters. Subjects with dry AMD, DME, and normal eyes were considered in the study. Roy et al. (2017) proposed a DL model (ReLayNet) to segment the retinal layers and fluid masses in OCT scans. The model was evaluated on the publicly available Duke dataset and provided more accurate layer-thickness estimates than competing graph-based techniques. ReLayNet extracted the ILM, NFL-IPL, INL, OPL, ONL-ISM, ISE, and OS-RPE layers and the cumulative retinal fluids, with dice coefficients of 0.99, 0.90, 0.94, 0.87, 0.84, 0.93, 0.92, 0.90, 0.99, and 0.77, respectively. Schlegl et al. (2018) developed a semantic segmentation model to extract IRF and sub-retinal fluid (SRF) from 1200 OCT volumes acquired with Zeiss Cirrus and Heidelberg Spectralis OCT machines; the model extracted IRF and SRF with mean accuracies of 94.0% and 92.0%, respectively. Agari et al. (2019) developed an encoder-decoder model for the multitask problem of drusen segmentation. To segment the RPE and BM, a single decoder was employed for each target class instead of training one multiclass model, and links between each class-specific branch and the decoder were added to further enhance regularization. To improve OCT image segmentation performance, Y-Net (Farshad et al. 2022) was proposed, an architecture that fuses frequency-domain characteristics with the image domain. Y-Net performed better than U-Net, increasing the fluid segmentation dice score by 13% and the overall dice score by 1.9%. Hsu et al. (2022) proposed a DL model that segmented the IRF, SRF, and ellipsoid zone (EZ) in OCT images and, in addition, correlated the extracted features with visual acuity. The modified U-Net was trained on 127 manually annotated scans from 50 patients and validated on 38 scans from 16 patients. For IRF and SRF, the model obtained Sørensen-Dice coefficients of 0.80 and 0.89, respectively. The study (Philippi et al. 2023) employed a transformer-based technique to automatically detect and isolate retinal lesions in SD-OCT scans. The approach combined the data-efficient training of CNNs with the efficient long-range feature extraction and aggregation capabilities of vision transformers. The Swin UNEt TRansformers (Swin-UNETR) network (Hatamizadeh et al. 2022) was used, a segmentation network tailored to the challenges of medical image analysis. A private dataset of 3842 SD-OCT images, manually classified by specialists at the Franziskus Eye Center in Muenster, was used for evaluation; UNet3+ achieved the highest mean dice score of 0.508, whereas Swin-UNETR-24 obtained the second-best score of 0.457. Wang et al. (2022a) proposed a novel technique for segmenting 10 retinal layers in OCT images, including intraretinal fluid.
A fan filter was utilized to minimize the impact of vessel shadows and fluid regions in an OCT image, thereby enhancing the linear information pertaining to the retinal borders. A random forest classifier was used to predict the retinal boundaries. By combining the techniques of boundary redirection (SR) and similarity correction (SC), the model performed boundary tracking to identify the retinal layers. The method was evaluated on OCT images from 415 healthy subjects and 482 DME patients (Table 6).
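Several of the layer- and fluid-segmentation networks above (ReLayNet among them) optimize a cross-entropy term combined with a soft Dice term; the sketch below uses unweighted cross-entropy for brevity, whereas the published losses are often class-weighted:

```python
# Combined cross-entropy + soft Dice loss for multi-class OCT segmentation.
import torch
import torch.nn.functional as F

def combined_loss(logits, target, ce_weight=1.0, dice_weight=1.0, eps=1e-6):
    """logits: (N, C, H, W); target: (N, H, W) integer class labels."""
    ce = F.cross_entropy(logits, target)
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = (2 * inter + eps) / (denom + eps)   # per-class soft Dice
    return ce_weight * ce + dice_weight * (1 - dice.mean())
```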

Table 6 Summarizing the segmentation models based on DL techniques for retinal lesions from OCT scans

10.3.2 Segmentation-driven classification methods

Techniques Based on Fundus Scans The study (Lim et al. 2015) proposed a model, CNN-FE, to extract feature-enhanced inputs that highlight disc pallor while suppressing the effects of vessel kinking and blood-vessel obstruction in fundus images. Pixel-level probability maps constructed by the CNN went through a robust refinement process that takes into account prior knowledge of retinal morphology, and confidence in the validity of the segmentation was estimated by analyzing the probability maps. Finally, the extracted cup and disc borders were used to estimate the CDR. The MESSIDOR and SEED-DB datasets were used to evaluate the model; the overall screening performance of CNN-FE was higher (AUC = 0.847) than that of the reconstruction-based method (AUC = 0.838). A DL model was proposed for directly screening glaucoma in fundus images based on image-relevant information (Fu et al. 2018). Four deep streams at different levels were defined: a global image stream, a segmentation-guided network, a local disc region stream, and a disc polar transformation stream; the output probabilities of each stream were combined to produce the final output. The model was evaluated on two glaucoma datasets (SCES and SINDI) and showed better performance than other state-of-the-art methods. An improved U-Net CNN model was proposed to segment the OD and OC from fundus images (Joshua et al. 2019) and evaluated on the DRISHTI-GS and RIM-ONE v.3 datasets. Another DL model using Gradient-weighted Class Activation Mapping (Grad-CAM) (Kim et al. 2019b) was proposed for glaucoma diagnosis based on OD localization; tested on fundus images from Samsung Medical Center (SMC), it achieved an accuracy, sensitivity, and specificity of 96%, 96%, and 100%, respectively. Sreng et al. (2020) proposed a model for glaucoma diagnosis from fundus images, employing the DeepLabv3+ architecture and an encoder module with multiple CNNs for segmentation. For glaucoma classification, the model was tested on the RIM-ONE, ORIGA, DRISHTI-GS1, and ACRIMA datasets, with accuracies of 97.37%, 90.00%, 86.84%, and 99.53%, respectively. Tulsani et al. (2021) presented a novel method for detecting glaucoma by segmenting the OD and OC in fundus scans. For the segmentation task, a custom UNET++ model was developed by tuning the hyperparameters, along with a custom loss function useful for addressing the class imbalance that arises from the small size of the ONH. Based on the identified clinical features, the method classified images as glaucomatous or healthy with 96% accuracy, required less training time than state-of-the-art models, and achieved Intersection over Union (IOU) scores of 0.9477 for the OD and 0.9321 for the OC. The model was evaluated on RIM-ONE, DRIONS-DB, and ORIGA, achieving accuracies of 91%, 92%, and 90%, respectively. Abdel-Hamid (2022) proposed a deep convolutional network (TWEEC) that extracted anatomical information on the OD and blood vessels. Spatial retinal images and wavelet subbands were fed into the model as inputs, and TWEEC achieved accuracies of 98.78% and 96.34% for the spatial and wavelet inputs, respectively. A novel multi-task strategy (Hervella et al. 2022) was proposed for identifying glaucoma while segmenting the optic disc and cup, achieving a performance gain by utilizing both pixel-level and image-level labels during training.
Biomarkers such as the CDR were extracted from the segmentation maps predicted alongside the diagnosis. The model performed concurrent segmentation and classification, maximizing the use of shared parameters, and a multi-adaptive optimization technique was employed during training to minimize the need for loss-weighting hyperparameters. CDR-based classification achieved an area under the curve of 94.18% on the REFUGE dataset. The study (Nawaz et al. 2022) developed a DL model for glaucoma diagnosis in which the EfficientNet-B0 feature extractor computed deep features from the suspect samples. The features computed by EfficientNet-B0 were then fed into EfficientDet-D0's Bi-directional Feature Pyramid Network (BiFPN) module, where they were fused repeatedly in a top-down and bottom-up manner, and the class of glaucoma lesions within the localized region was finally predicted. To demonstrate generalizability, cross-dataset validation was performed on the High-Resolution Fundus (HRF) and RIMONE datasets. An OD localization and Glaucoma Diagnosis Network (ODGNet) was presented in the study (Latif et al. 2022). First, a visual saliency map combined with a shallow CNN localized the OD in the images; second, pre-trained transfer learning models (AlexNet, ResNet, and VGGNet) diagnosed glaucoma. The model was evaluated on the publicly available ORIGA, HRF, DRIONS-DB, DR-HAGIS, and RIM-ONE datasets; on ORIGA, ODGNet achieved an accuracy, specificity, sensitivity, and AUC of 95.75%, 94.90%, 94.75%, and 97.85%, respectively. Touahri et al. (2022) proposed a glaucoma diagnosis model for fundus images that first segments the OD and OC and then classifies scans as normal or glaucomatous. The OD region was first segmented to create a ROI, and a U-Net model was developed to obtain the fine-grained segmentation; the model was validated on the publicly available REFUGE dataset. Roshini and Alex (2022) proposed a MultiResUNet architecture for glaucoma diagnosis based on CDR estimation, achieving a mean accuracy of 97.2%.
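Because so many of these pipelines reduce their segmentation maps to the CDR, the following NumPy sketch shows the usual vertical-CDR computation from binary disc and cup masks; the 0.6 glaucoma-suspect cutoff in the comment is a commonly quoted clinical rule of thumb, not a value taken from the cited studies:

```python
# Vertical cup-to-disc ratio from predicted binary masks; a CDR above roughly
# 0.6 is a commonly used glaucoma-suspect cue in the clinical literature.
import numpy as np

def vertical_cdr(disc_mask: np.ndarray, cup_mask: np.ndarray) -> float:
    def height(mask):
        rows = np.nonzero(mask.any(axis=1))[0]   # rows containing the structure
        return 0 if rows.size == 0 else rows.max() - rows.min() + 1
    d = height(disc_mask)
    return height(cup_mask) / d if d else float("nan")
```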

A context encoding network (CE-Net) (Wang and Huang 2022) was developed for segmenting the OD in diabetic retinal images. The model consisted of three modules: 1) an encoder for feature extraction, 2) a context extractor, and 3) a decoder. The context extractor comprised residual multi-kernel pooling (RMP) and an improved dense atrous convolutional block. The model was validated on the Indian Diabetic Retinopathy Image Dataset (IDRID). A model (Zhang et al. 2022b) was proposed for DR diagnosis based on lesion detection in fundus images: the Inception V3 model was adopted for classification, whereas DR grading was performed by identifying different lesions. The Kaggle DR dataset was used for training and testing. Another DL model (Li et al. 2018b) was proposed for the detection of DR (PDR, DME). A total of 106,244 nonstereoscopic retinal images were used to test the model, and for external validation, 35,201 images of 14,520 eyes from population-based cohorts of Malays, Caucasian Australians, and Indigenous Australians were used. On the independent, multiethnic data set, the AUC, sensitivity, and specificity were 0.95, 92.5%, and 98.5%, respectively. A fully patch-based CNN model (Zago et al. 2020) was developed for DR diagnosis via retinal lesion localization, with the use of strides enhancing lesion localization by a factor of 25. Only 28 fundus images (from DIARETDB1) annotated at the pixel level were used for training, and the model was tested on the Messidor dataset, achieving a sensitivity of 94.0% and an AUC of 0.912. The study (Qomariah et al. 2021) presented a unique DL network (MResUNet) that adapts U-Net by replacing its identity-mapping residual units with modified residual units to segment microaneurysms for DR diagnosis. During training, a mean weighted loss function handled the imbalance between background and microaneurysm pixels. Experimental results showed the model outperformed an autoencoder, FCN16, FCN8, and UNet in terms of sensitivity on the IDRID and DiaretDB1 datasets. The paper (Kumari et al. 2022) presented image-processing techniques to extract four key features (microaneurysms, blood vessels, hemorrhages, and exudates) from raw fundus images, followed by a CNN for automatic identification of DR; compared with other models, DenseNet-16 provided the best accuracy when tested on the DRIVE database. Jiwane et al. (2022) developed a DL model (ResNet50) for detecting DR based on soft and hard exudates along with the OD. The paper (Murugan and Roy 2022) presented a CNN model trained on MA and non-MA patches, with a majority-voting method employed to identify MA patches; Retinopathy Online Challenge (ROC) data were used to assess the technique. A three-class semantic segmentation model (Selçuk et al. 2022) was proposed to extract exudates and hemorrhages. A color-space transformation was also applied, and the classic U-Net algorithm was employed to achieve high performance on low-contrast images; the Dice and Jaccard similarity indices for segmentation were close to 0.95. To detect and grade DR in fundus images, the study (Parthiban et al. 2022) introduced a Wavelet Neural Network model trained with EfficientNet and Chicken Swarm Optimization (EN-CSOWNN).
To identify diseased areas in an image, a customized U-Net-based segmentation model was employed. Feature vectors were derived using the EfficientNet model, and class labels were assigned by the wavelet neural network, with the CSO approach used to optimize classification performance by tuning the model's initial parameters. The model was validated on the MESSIDOR dataset with an accuracy of 98.60%. Bisneto et al. (2020) developed a GAN-based DL model to segment the OD for glaucoma diagnosis in fundus images. The ROIs segmented by the GAN were characterized using taxonomic indices, based respectively on the diversity of species and the frequency of individuals, i.e., the range of pixel values and the number of pixels with a given value. The model was evaluated on the RIM-ONE and Drishti-GS public databases and achieved 77.9% accuracy; with modifications, it obtained 100% accuracy and an AUC of 1.

Techniques Based on OCT Scans A DL technique was presented to segment retinal surfaces in OCT volumes and diagnose AMD (Shah et al. 2017). The training data were used to learn a set of features and a transformation, and normal and diseased image surfaces were learned using the same CNN. A total of 40 OCT volumes, 20 from each group, were used to validate the model, and the method outperformed graph-based optimal surface segmentation with convex priors (G-OSC). Shah et al. (2018) developed a CNN model to detect intermediate AMD using multiple retinal surfaces segmented from OCT scans. A single CNN was trained to segment all three retinal layers in one pass in order to classify B-scans as "healthy" or "intermediate AMD"; the model was validated on 3000 B-scans acquired from 50 OCT volumes. Saha et al. (2019) developed a DL method for the automated detection of OCT biomarkers and classification of early AMD. The model automatically detected and classified HFs, hyporeflective foci inside drusen, and subretinal drusenoid deposits from OCT B-scans. The dataset included 19,584 OCT B-scans with at least one eye diagnosed with early or intermediate AMD, acquired from the Doheny Eye Centers. The model detected subretinal drusenoid deposits with an accuracy of 86%, and HFs and hyporeflective foci with accuracies of 89% and 88%, respectively. Fauw et al. (2018) developed a DL model that segments the retinal layers from 3D OCT scans and then uses the extracted information to diagnose retinal diseases; the segmentation network was trained on 877 images, and the classification network, trained on 14,884 maps, achieved an accuracy of 96.4% on the validation set. Chen et al. (2019) developed a DL method for screening early glaucoma that utilized features from both fundus and EDI-OCT images, integrating textural and structural elements from each modality. Once the OC was segmented from the fundus image using brightness compensation, the CDR and textural features were obtained. Each pixel in the OCT image was labeled as anterior LC surface or background using a region-aware method and a residual U-Net architecture; after extracting LC-deformation features with an improved templated local binary pattern, the LC depth and the BMO width were calculated. A CNN model (Raja et al. 2020b) was proposed for glaucoma diagnosis based on CDR estimation. To estimate the CDR, the ILM and RPE layers were first extracted using a CNN, graph search was applied to refine the layers, and missing areas were filled by linear interpolation; finally, the cup and disc borders were determined to calculate the CDR. The model used the Armed Forces Institute of Ophthalmology (AFIO) dataset and achieved an average specificity, sensitivity, and accuracy of 94.07%, 94.6%, and 94.68%, respectively. Hassan et al. (2020) proposed a hybrid convolutional framework (RAG-FW) that extracted multiple retinal lesions (IRF, SRF, HE, drusen, and CA) from OCT scans and utilized them for retinopathy grading. RAG-FW was tested on 43,613 multi-vendor OCT scans and outperformed state-of-the-art solutions by 14.15% in retinal fluid extraction on Duke-II and by 2.02% and 1.24% in retinopathy classification on the Zhang and BIOMISA datasets, respectively (Table 7). Raja et al. (2020) proposed a hybrid convolutional network (RAG-NETv2) for glaucoma diagnosis and grading that utilized the extracted RNFL and GC-IPL regions.
For segmentation, an encoder-decoder architecture was employed; atrous convolution, skip connections, and pyramid pooling allowed the model to retain fine details of the retinal layers. The thickness profiles of the extracted regions were then computed and fed as a feature vector to an SVM for grading the OCT scan. The model was trained and tested on the publicly available AFIO dataset and achieved a mean dice score of 0.8697 for extracting the regions, with an F1 score of 0.9577 and an accuracy of 91.17% for glaucoma diagnosis and grading, respectively. In the study (Smitha and Jidesh 2022), a GAN-based model was proposed for the automated segmentation and classification of OCT B-scans for diagnosing AMD and DME. Handcrafted Gabor features were integrated to improve retinal layer segmentation, and non-local denoising was utilized to remove speckle noise. The model showed better results for OCT image segmentation and classification, with an F1 score of 0.79 and an accuracy of up to 92.42%. A deep ensemble learning model (Moradi et al. 2023) was proposed for early AMD diagnosis based on retinal layer segmentation in OCT images. To automatically annotate 11 retinal boundaries, the model combined a graph-cut method with cubic splines. The refined images were then fed into a deep ensemble model that used a bagged tree with deep learning classifiers. The boundary-refinement-based segmentation model had a much lower overall error rate than OCT Explorer segmentation (1.7% vs 7.8%, p-value = 0.03).
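The thickness-profile features that RAG-NETv2 feeds to its SVM can be sketched as follows, given per-column upper and lower boundary positions from a layer segmentation; the binning scheme and function names are our own illustrative assumptions:

```python
# Build a coarse, fixed-length thickness profile of a segmented layer (e.g.,
# RNFL) and use it as an SVM feature vector for grading, as described above.
import numpy as np
from sklearn.svm import SVC

def thickness_profile(upper: np.ndarray, lower: np.ndarray, n_bins: int = 32):
    """upper/lower: per-column boundary rows (pixels); returns a coarse profile."""
    thickness = (lower - upper).astype(float)   # per-column layer thickness
    bins = np.array_split(thickness, n_bins)
    return np.array([b.mean() for b in bins])   # smoothed, fixed-length vector

# Profiles X from many scans plus grades y (hypothetical) would train:
# clf = SVC().fit(X_train, y_train)
```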

Table 7 Summarizing the segmentation driven classification model based on DL techniques

10.3.3 Classification

Fundus Scans The study (Ahn et al. 2018) suggested that DL algorithms combined with fundus photography can effectively differentiate between normal and glaucomatous subjects, even in the early stages of the disease. From Kim's Eye Hospital, 1542 fundus images were acquired from healthy and glaucomatous eyes; 754 were used for training, 324 for validation, and 464 for testing. A logistic regression model and a CNN were developed, and the GoogleNet Inception v3 model was also fine-tuned on the same dataset. A fundus image is a 3D array (240×240×3), but for logistic regression the images were flattened into a one-dimensional array. The CNN consisted of two convolutional layers with 20×20 and 40×40 patch sizes, a stride of 1, and depths of 16 and 32; a 2×2 patch size with stride 2 was used for max pooling; the fully connected layers had 32 and 64 hidden units; and a 0.5 dropout rate was applied to the convolutional and fully connected layers to avoid overfitting. The logistic model reached a training accuracy of 82.9%, a validation accuracy of 79.5%, and a test accuracy of 77.2%. The transfer-learned GoogleNet Inception v3 model obtained an accuracy and AUROC of 99.7% and 0.99 on the training data, 87.7% and 0.95 on the validation data, and 84.5% and 0.93 on the test data. The CNN's accuracy and AUROC were 92.2% and 0.98 on the training data, 88% and 0.95 on the validation data, and 87% and 0.94 on the test data, respectively. The paper (Zhao et al. 2019) presented a semi-supervised model for glaucoma detection based on CDR estimation without segmentation of the OD and OC. The method directly regresses the CDR value from ONH features using MFPPNet on the fundus image. The technique was tested on the Direct-CSU and public ORIGA glaucoma datasets, achieving an average CDR error of 0.063 and a correlation of about 0.726 with measurements obtained from manual optic disc/cup segmentation by human specialists. On a dataset of 421 fundus images, the estimated CDR values were tested for glaucoma screening and achieved an AUC of 0.905. ImageNet-pretrained models (VGG16, VGG19, InceptionV3, ResNet50, and Xception) were fine-tuned for automatic glaucoma assessment using fundus images (Diaz-Pinto et al. 2019); the Xception model achieved an average AUC of 0.9605 with a 95% confidence range. Gheisari et al. (2021) proposed a combined CNN and RNN that extracted spatial features from fundus images and temporal features from fundus videos, trained with 1810 images and 295 videos. The average F-measure was 79.2% for the basic CNN and 96.2% for the combined model. Other studies (Raghavendra et al. 2018; Gómez-Valverde et al. 2019) reported DL models for glaucoma diagnosis without any explicit segmentation of biomarkers. The study (Gulshan et al. 2016) used DL to develop an algorithm for the automated detection of DR and DME in retinal fundus images. Between May and December 2015, a panel of 54 US-licensed ophthalmologists and ophthalmology senior residents graded a total of 128,175 retinal images for DR and DME. For EyePACS-1, the model had an AUC of 0.991 (95% CI, 0.988-0.993). Another automated DL model (Gargeya and Leng 2017) was developed for DR detection, trained and tested on 75,133 publicly available fundus images from diabetic patients.
Validation was performed on the MESSIDOR 2 and E-Ophtha databases, yielding AUCs of 0.94 and 0.95, respectively. Wang et al. (2018) proposed a DR classification model using transfer learning with AlexNet, VggNet, GoogleNet, and ResNet; tested on the publicly available Kaggle dataset, it achieved a classification accuracy of 95.68%. Qummar et al. (2019) proposed a DL ensemble approach for DR detection that employed five CNN models: Resnet50, Inceptionv3, Xception, Dense121, and Dense169; the results showed the model detected all stages of DR when tested on the Kaggle dataset. A DL method was proposed for feature extraction from fundus images and SVM-based classification of DR (Qomariah et al. 2019). The paper employed a CNN method for DR classification of fundus images, with pre-trained CNN models (AlexNet, VGG-16, and SqueezeNet) yielding classification accuracies of 93.46%, 91.82%, and 94.49%, respectively. The study (Doshi et al. 2020) investigated multiple down-scaling techniques applied prior to feeding image data to a DL network for classification, employing a multi-channel Inception V3 architecture with a unique self-crafted preprocessing phase. de La Torre et al. (2020) presented an interpretable DL classifier for DR detection and grading from fundus scans. The classifier explains its findings by assigning a score to each point in both the hidden space and the input space; these scores, generated by a pixel-wise score propagation model, represent how much each pixel contributes to the overall classification. Four different transfer learning models (VGG16, ResNet50, InceptionV3, and DenseNet121) were used to detect DR from fundus images (Sheikh and Qidwai 2020), with DenseNet121 yielding the best predictions. Karki and Kulkarni (2021) developed a DL model based on EfficientNet for DR classification; in addition, the model graded images as mild, moderate, severe, or PDR. After training on various datasets, the best model attained a quadratic kappa score of 0.924377 on the APTOS test dataset. The work (Deepa et al. 2022) presented a CNN ensemble model (MPDCNN) for accurate fundus-image-based DR identification and grading. In the first phase, each input image was split into four patches and fed into one of two pre-trained CNN models (InceptionV3 and Xception); prior knowledge was derived from the pertinent characteristics located in the shallow and dense layers of the CNN models, so the model learned the crucial details of DR images by combining features from both. In the second phase, the combined probability vectors from the four patches were used to train the network classifier. The ensemble, multi-stage DL approach enhanced DR classification accuracy, achieving 96.2% with fivefold cross-validation. A hybrid method (Butt et al. 2022) was proposed for detecting and classifying DR based on fundus images. The model employed transfer learning with the GoogleNet and ResNet-18 architectures to extract features that were combined into a hybrid feature vector, and several classifiers then categorized the fundus images into binary and multiclass labels. The model was trained and tested on the APTOS, Kaggle, and Aravind Eye Hospital (India) datasets.
For binary classification, the proposed model achieved a maximum accuracy of 97.8%, and for multiclass classification, an accuracy of 89.29%. The paper (Muthukannan 2022) developed a DL model (CNN-MDD) to detect early-stage AMD. A maximum-entropy transformation was applied, and the images were then fed into a CNN, optimized using the flower pollination optimization algorithm (FPOA), for feature extraction; a multiclass SVM classifier identified the disease from the CNN's output. Tested on the Ocular Disease Intelligent Recognition (ODIR) dataset, the model achieved specificity, precision, accuracy, and recall of 95.21%, 98.30%, 95.27%, and 93.3%, respectively. The study (Bhimavarapu and Battineni 2023) presented a DL model with an improved activation function for DR diagnosis from fundus images, reducing loss and processing time. The models were trained and evaluated on the DIARETDB0, DRIVE, CHASE, and Kaggle datasets; on the Kaggle dataset, the ResNet-152 model achieved the highest accuracy of 99.41%. A lightweight CNN (Lu et al. 2023) used transfer learning to classify DR and DME simultaneously; after fivefold cross-validation, the model's average accuracy, precision, recall, and specificity were 96.66%, 96.85%, 99.32%, and 96.63%, respectively. In the article (Adak et al. 2023), transformer-based learning models captured significant parameters of fundus images for a more nuanced understanding of DR severity. Four transformer models were employed to determine DR severity from fundus photographs: the Vision Transformer (ViT), Data-Efficient Image Transformers (DeiT), Bidirectional Encoder representation for image Transformers (BEiT), and Class-Attention in Image Transformers (CaiT). The models were tested on the publicly available APTOS-2019 dataset. Other studies (Rakhlin 2018; Fellah et al. 2023; Moin et al. 2023; Swarnalatha et al. 2023; Elmoufidi and Ammoun 2023) also proposed DL models for DR classification from fundus images.
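Most of the fundus-based classifiers surveyed above follow the same transfer-learning recipe: take an ImageNet-pretrained backbone, replace its classification head, and fine-tune on fundus images. The following minimal PyTorch sketch illustrates that recipe; the ResNet50 backbone, five-grade DR head, and all hyperparameters are illustrative assumptions rather than the configuration of any cited study.

```python
import torch
import torch.nn as nn
from torchvision import models

# Generic fine-tuning step of the kind used by many studies above; the
# backbone, the 5-class DR grading head, and all hyperparameters are
# illustrative assumptions, not the settings of any specific paper.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 5)  # replace the ImageNet head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)           # placeholder fundus batch
labels = torch.randint(0, 5, (4,))             # placeholder DR grades 0-4

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

In practice, the studies above differ mainly in the choice of backbone, the preprocessing applied to the fundus images, and whether the extracted features are passed to a separate classifier such as an SVM.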

OCT Scans A CNN model (An et al. 2019) was developed for glaucoma diagnosis through fundus and OCT images. The OD parameters, RNFL deviation map, macular GCC thickness map, and RNFL thickness map, all calculated by commercial software, were used to train the model. Another study compared feature-based and feature-agnostic techniques (Maetschke et al. 2019) for glaucoma diagnosis: a feature-based method using 22 parameters (calculated by the OCT machine) with a traditional machine learning classifier, and a feature-agnostic framework built on a 3D CNN that classified unsegmented OCT volumes as normal or glaucomatous. To acquire depth information, 3D convolutions were used, which allowed the network to locate the regions significant for diagnosis; the features were derived from raw data using class activation maps. The method failed to detect glaucoma in OCT images from patients over the age of 65 or those with advanced disease. To digitally stain the neuronal and connective tissues of the ONH, a DL framework called DRUNET (Devalla et al. 2018) was presented; it is a U-Net-derived fully convolutional neural network that takes advantage of skip connections. The algorithm successfully separated the peripapillary sclera and the LC, and the model could extract both global (spatial) and local (texture) characteristics of ONH tissues. A shortcoming of the proposed architecture is that it was trained on only 100 OCT images obtained from healthy and glaucoma participants.

Muhammad et al. (2017) proposed a DL model based on AlexNet that utilized transfer learning. The OCT software's measurements of features such as RNFL and GCIPL thickness were fed into the CNN, and the CNN's extracted features were used to train a random forest classifier. Depending on the parameters used, the model's accuracy ranged from 63.7% to 93.1%. Kermany et al. (2018b) proposed a DL model to diagnose DME, CNV, DRUSEN, and normal retinal OCT scans. They used 108,312 OCT images from 4686 patients for training and 1000 scans from 633 subjects for testing, achieving accuracy, sensitivity, and specificity of 96.6%, 97.8%, and 97.4%, respectively. Li et al. (2019a) proposed a deep classification model for choroidal neovascularization (CNV), DME, and DRUSEN on OCT images. The approach used an ensemble of four classification model instances, all based on an enhanced ResNet50. The dataset consisted of 21,357 retinal OCT scans gathered from 2796 adult patients at Shanghai Zhongshan Hospital and Shanghai First People's Hospital between 2014 and 2019. The model's B-scan classification accuracy was 0.973 (95% CI, 0.971\(-\)0.975), its sensitivity was 0.963, and a specificity 95% CI of 0.983\(-\)0.987 was reported. Butola et al. (2020) proposed a CNN model (LightOCT) to classify OCT images as normal, AMD, or DME; LightOCT comprised two convolutional layers and one fully connected layer, and achieved an accuracy greater than 96%. A DL model (Jin et al. 2022) based on a feature-level fusion (FLF) method combined OCT and OCT-A images for the assessment of CNV in AMD; tested on two external datasets, the model achieved an accuracy of 95.5% and an AUC of 0.9796 on multimodal data.

A novel uncertainty-guided semi-supervised model (Sedai et al. 2019) was proposed based on a student-teacher methodology, trained with limited labeled samples and a large number of unlabeled images. First, a teacher segmentation model was trained on the labeled data using Bayesian deep learning. The trained model was then employed to generate soft segmentation labels and an uncertainty map for the unlabeled collection. After soft segmentation, the teacher model's uncertainty was assessed, and the pixel-wise confidence in segmentation quality was used to update the student model. Wang et al. (2021c) trained an adversarial network with little supervision to reconstruct normal OCT scans; at the inference stage, the network reconstructed the abnormal (diseased) images, and the difference between the input and reconstructed scans was used to identify lesion pathologies. Das et al. (2020b) proposed an unsupervised GAN framework for fast and reliable super-resolution without requiring aligned low- and high-resolution pairs. Adversarial learning identified mapping priors to recover the spatial, color, and texture information of high-resolution scans; automated AMD diagnosis using the generated images yielded an improved classification accuracy of 96.54%. Das et al. (2020a) proposed a semi-supervised GAN-based classifier for automated diagnosis using limited labeled data. The two main components of the framework were the generator and the discriminator, and the adversarial learning between them aided the development of a generalizable classifier for predicting degenerative retinal diseases such as AMD and DME.

Research trends have shifted toward analyzing OCT volumes for detecting various retinal diseases. The study (Rasti et al. 2018) proposed a model for classifying abnormal maculae from 3D OCT. The technique evaluated intraretinal layers and lesions without denoising, segmentation, or retinal alignment operations. A two-stage scheme separated abnormal cases from the control group based on adaptive feature learning and diagnostic scoring: first, the cumulative characteristics of 3D volumes were extracted using a wavelet-based CNN that generated B-scan CNN codes in the spatial-frequency domain; second, the derived features were used to score the presence of anomalies in the 3D OCT. The technique was tested on two independent retinal SD-OCT datasets using five-fold cross-validation (CV). The first dataset comprised 3D OCT scans of 30 normal participants and 30 patients with DME acquired with a Topcon instrument; the second, from a Heidelberg device, included 45 subjects, with 15 subjects in each class (AMD, DME, and normal). In the two-class classification problem (dataset 1), the suggested method achieved an average precision of 99.33%; in the three-class problem (dataset 2), it achieved 98.67%. The study (Hassan et al. 2018a) introduced a multilayered CNN combined with structure tensors and Delaunay triangulation that extracted nine retinal and choroidal layers and the macular fluids. The retrieved retinal information was used for automated maculopathy diagnosis and reliable 3D reconstruction of the retinal macula. The model was validated on 41,921 OCT images collected from different vendors and achieved a mean accuracy of 95.27% for extracting retinal layers; for fluid extraction, the reported mean Dice coefficient was 0.90, and the overall accuracy for maculopathy diagnosis was 96.07%. Mantel et al. (2021) proposed a DL model to identify and localize AMD biomarkers such as IRF, SRF, and pigment epithelium detachment (PED); cubic SD-OCT volumes were collected from 117 AMD eyes, and the retinal lesions were manually annotated. A 3D FCN (Li et al. 2019b) based on U-Net was proposed to segment retinal fluid in OCT images; evaluated on a local dataset (75 volumes), it achieved a Kappa coefficient of 98.47%, an accuracy of 99.56%, and an F1 score for retinal fluid of 95.50%. In the paper (Mukherjee et al. 2022), a 3D deep neural network was proposed to segment the retinal layers while ensuring accuracy and smoothness. The model consisted of two separate but complementary networks: (1) a 3D U-Net that performed multi-class voxel labeling of retinal layer surfaces, and (2) a 3D convolutional autoencoder that constrained the 3D U-Net's output and compelled it to estimate a smooth contour.
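As a sketch of the volumetric analysis trend described above, the snippet below classifies a whole OCT cube with 3D convolutions; the tiny architecture, input size, and three-class output are illustrative assumptions, not a reproduction of any reviewed model.

```python
import torch
import torch.nn as nn

# Minimal 3D CNN for volume-level OCT classification, illustrating the shift
# from per-B-scan to volumetric analysis (architecture and shapes are
# illustrative assumptions, not any reviewed model).
class Volume3DNet(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # global pooling over depth/height/width
        )
        self.classifier = nn.Linear(16, n_classes)

    def forward(self, x):              # x: [batch, 1, B-scans, H, W]
        return self.classifier(self.features(x).flatten(1))

net = Volume3DNet()
volume = torch.randn(1, 1, 32, 64, 64)   # toy OCT volume
logits = net(volume)                      # [1, 3] class scores
```

The 3D convolutions aggregate information across adjacent B-scans, which is what allows such models to exploit depth context that per-slice classifiers discard.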

10.4 Screening ROP using AI methods

Recent developments in the diagnosis of Retinopathy of Prematurity (ROP) have centered on utilizing cutting-edge technology, specifically artificial intelligence (AI), to improve early detection efficiency and accuracy (Smith 2021). For proper care of ROP, a potentially blinding disease affecting premature infants, prompt intervention is necessary. Using deep learning algorithms that have been trained on large datasets of retinal scans from premature infants is one noteworthy breakthrough. These algorithms are remarkably good at detecting small retinal abnormalities linked to ROP, which helps physicians diagnose and treat patients early (Smith 2021).

In addition, researchers have also looked into combining various imaging modalities to produce all-encompassing ROP diagnostic tools. For example, combining fundus photography, wide-field imaging, and OCT offers a more comprehensive picture of the vascular anomalies and retinal structure related to ROP (Jones 2022). This multi-modal approach also improves the accuracy of diagnosis and helps customize treatment plans according to each patient’s unique illness characteristics. Furthermore, it enhances our comprehension of the disease’s course and facilitates continuous attempts to improve treatment regimens (Jones 2022).

The use of remote screening and telemedicine in the detection of ROP has also been examined in recent research endeavors (Garcia 2020). Retinal images taken in newborn intensive care units can be safely sent to distant specialists for examination with the help of AI algorithms. This not only solves the scarcity of ROP-specialized ophthalmologists but also makes prompt diagnosis and intervention easier, especially in underprivileged areas. Improved accessibility to specialized care for premature infants at risk of developing this sight-threatening illness is possible with the integration of telemedicine into ROP diagnosis (Garcia 2020).

To improve early identification, accuracy, and access to specialized care, recent work on detecting retinopathy of prematurity highlights the combination of artificial intelligence, multi-modal imaging, and telemedicine. Together, these developments improve the prognosis for premature infants at risk of retinopathy of prematurity (ROP) and mark a major advancement in the management of this serious neonatal disease.

11 Advanced deep learning schemes

This section presents a brief introduction to advanced DL techniques and state-of-the-art methods for the identification and classification of retinal lesions.

11.1 Meta-learning & multi-task learning

Meta-learning is a subfield of DL that focuses on the problem of learning how to learn or learning from previous learning experiences. The goal of meta-learning is to enable a model to quickly adapt to new tasks using only a small amount of data by leveraging the knowledge gained from previously seen tasks. There are several different approaches to meta-learning, each with its own set of advantages and disadvantages.

  • One common approach is to use a neural network as the model and train it on a variety of tasks so that its parameters can be adapted to new tasks quickly. This is done by defining a loss function for the meta-learning process that minimizes the difference between the parameters of the model after adapting to a new task and its parameters after adapting to similar tasks in the past.

  • Metric-based meta-learning, which learns a distance metric in the space of task representations, such that new tasks can be quickly adapted to by finding the most similar tasks to the new task in the learned metric space.

  • Model-based meta-learning methods, which learn a model of the task-generating process and can use this model to adapt to new tasks quickly.

  • Optimization-based meta-learning methods, which learn an optimization algorithm that can quickly adapt to new tasks by using the gradients of the loss function with respect to the model parameters.

All of these methods have been successfully applied to a variety of problems, such as few-shot and one-shot learning, where a model must quickly adapt to new tasks with limited data, as well as reinforcement learning and other areas. Overall, the main idea behind meta-learning is to train a model on a variety of tasks such that it can quickly adapt to new tasks using the knowledge gained from previous tasks, improving its learning efficiency and generalization performance. The meta-learning strategy involves explicitly training the model's parameters so that good generalization performance can be achieved on a new task using a small number of gradient steps and a small amount of training data (Finn et al. 2017), as illustrated in the sketch below. For automated diagnosis of DR in fundus images, an anomaly characterization algorithm (Matta et al. 2023) was developed: a few-shot learning solution in which a CNN trained for common conditions was combined with an unsupervised probabilistic model for detecting rare conditions, exploiting the observation that CNNs map images with the same anomalies to similar features even though they are trained to discriminate between conditions. The algorithm achieved an average AUC of 0.938.
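To make the optimization-based strategy of Finn et al. (2017) concrete, the following PyTorch sketch performs one MAML-style meta-update on a toy two-layer classifier; the synthetic task sampler, shapes, and learning rates are illustrative assumptions, not settings from any reviewed study.

```python
import torch
import torch.nn.functional as F

# Toy functional two-layer classifier (all shapes are assumptions).
w1 = (0.1 * torch.randn(32, 64)).requires_grad_()
b1 = torch.zeros(32, requires_grad=True)
w2 = (0.1 * torch.randn(2, 32)).requires_grad_()
b2 = torch.zeros(2, requires_grad=True)
params = [w1, b1, w2, b2]
meta_opt = torch.optim.Adam(params, lr=1e-3)
inner_lr = 0.01

def forward(x, p):
    return F.linear(F.relu(F.linear(x, p[0], p[1])), p[2], p[3])

def sample_task():
    # Placeholder sampler: support/query batches of 8 random examples.
    return (torch.randn(8, 64), torch.randint(0, 2, (8,))), \
           (torch.randn(8, 64), torch.randint(0, 2, (8,)))

meta_opt.zero_grad()
meta_loss = 0.0
for _ in range(4):                      # small batch of tasks
    (xs, ys), (xq, yq) = sample_task()
    # Inner loop: one differentiable gradient step on the support set.
    grads = torch.autograd.grad(F.cross_entropy(forward(xs, params), ys),
                                params, create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(params, grads)]
    # Outer objective: loss of the adapted parameters on the query set.
    meta_loss = meta_loss + F.cross_entropy(forward(xq, adapted), yq)
(meta_loss / 4).backward()
meta_opt.step()                         # meta-update of the initialization
```

The inner loop keeps the adaptation step differentiable (create_graph=True), so the outer update improves the shared initialization itself rather than any single task's weights.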

11.2 Few-shot learning

A few-shot learning (FSL) model is able to learn and recognize new classes from only a small number of examples. The goal is to train models that generalize well to new classes, even when only a few examples are available for those classes. There are two main approaches to few-shot learning:

  • Meta-learning: This approach involves training a model on a large number of similar tasks so that it can learn to adapt quickly to new tasks. The model learns a general way of learning rather than memorizing the specific training examples.

  • Transfer learning: This approach involves using a pre-trained model on a large dataset and fine-tuning it on the few-shot task. The idea is that the model has already learned useful features from the large dataset that can be useful for the few-shot task.

FSL is often used in tasks such as image classification where there are only a few examples per class. Other real-world applications that use few-shot learning include medical imaging, rare species identification, and speech recognition. Recently, new techniques have been proposed for few-shot learning, such as few-shot learning with attention mechanisms, memory-augmented neural networks, and metric-based learning (see the sketch after this paragraph); these methods have shown promising results on various few-shot learning benchmarks. Yoo et al. (2021) proposed a model based on FSL using a GAN to diagnose rare ocular diseases in OCT images. Before the classifier was trained, a GAN was built to turn normal OCT images into pathological OCT images for each disease; Inception-v3 was then trained on the training dataset, and the final model was tested on a separate test dataset. The study (Mai et al. 2021) modeled the problems caused by a lack of labeled data as student-teacher learning with knowledge distillation (KD). Kim et al. (2017) proposed a model for glaucoma diagnosis in fundus images using FSL. The study (Murugappan et al. 2022) proposed DRNet, a model based on FSL and attention, that performed DR detection and grading. The network used aggregated transformations and class gradient activations to build an attention mechanism that preserves visual representations. Evaluated on the APTOS2019 dataset, the model achieved 99.73% accuracy and 99.82% sensitivity for DR detection, and 98.18% accuracy and 97.41% sensitivity for DR grading. Gulati et al. (2022) used FSL on an iris dataset to detect hemorrhages or microaneurysms in images.
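A minimal sketch of the metric-based family mentioned above is the prototypical-network classification step below, shown for a hypothetical 3-way, 5-shot episode; the embedding network and all tensors are placeholders, not components of any cited model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Prototypical-network step for a 3-way, 5-shot episode (embedding network
# and episode tensors are illustrative placeholders).
embed = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))

support = torch.randn(3, 5, 64)   # [ways, shots, features]
query = torch.randn(10, 64)       # query examples to classify

# Class prototype = mean embedding of each class's support examples.
prototypes = embed(support.view(-1, 64)).view(3, 5, -1).mean(dim=1)  # [3, 16]

# Classify queries by (negative) squared Euclidean distance to prototypes.
dists = torch.cdist(embed(query), prototypes) ** 2                   # [10, 3]
log_probs = F.log_softmax(-dists, dim=1)
pred = log_probs.argmax(dim=1)    # predicted class per query example
```

Because classification reduces to nearest-prototype matching in the learned metric space, new classes can be recognized at test time without updating any network weights.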

11.3 Incremental learning

An incremental learning model continually updates its knowledge from newly acquired data without being retrained from scratch. This allows the model to keep learning and improving its performance over time, making it suitable for scenarios where the data is constantly changing. The model parameters are updated incrementally instead of retraining the entire model on all the data, which reduces the computational resources required for training and allows the model to learn continuously from new data. The approach typically involves dividing the incoming data into mini-batches and using them to update the model parameters through gradient descent or a similar optimization technique. It can be used in various tasks, such as classification, regression, and clustering, and is especially useful when the data is too large to fit into memory or arrives as a stream that must be processed in real time. To improve classification accuracy, the study (Meng and Shin'ichi 2020) presented the "Attribute-Driven Incremental Network" (ADINet), which combined class label prediction with attribute prediction inside an incremental learning framework. Knowledge distillation (KD) was used in image classification to preserve the information of the base classes, and weights were assigned to each image attribute based on its relative importance to improve attribute prediction. An attribute distillation (AD) loss was introduced to preserve base-class attribute information despite the arrival of new classes; repeating this incremental learning process numerous times incurred only a small performance penalty. Hassan et al. (2021) introduced a novel incremental cross-domain adaptation technique through which any deep classification model can gradually learn pathological abnormalities in OCT and fundus imaging with only a small number of training examples. By using a Bayesian multiobjective function, the technique ensures not only that the candidate classification network retains its previously learned knowledge during incremental training but also that it understands the relationships between previously learned pathologies and newly introduced disease categories, so the model can effectively recognize them at the inference stage. The model achieved an overall accuracy and F1 score of 98.26% and 0.98, respectively, when tested on six public datasets. In the study (He et al. 2021a), an incremental learning-based model was proposed for DR lesion segmentation that distills the knowledge of the previous model to enhance the current one. A probability-map alignment scheme combined the previous and current maps and handled the special background class in the segmentation context; using the scheme, the optimized value for the model-based weight was easy to calculate. Knowledge distillation was used to transfer information from the probability map to the current model. A generic distillation-based incremental loss of this kind is sketched below.
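The following sketch shows a generic incremental-learning objective of the kind these studies build on: cross-entropy on the expanded label space plus a knowledge-distillation term that preserves the frozen previous model's responses on the old classes. The temperature, weighting, class counts, and placeholder tensors are illustrative assumptions, not parameters of any cited method.

```python
import torch
import torch.nn.functional as F

# Incremental-learning loss sketch: cross-entropy on new classes plus a
# knowledge-distillation term preserving the old model's soft outputs
# (temperature T, weight alpha, and class counts are assumptions).
T, alpha = 2.0, 0.5

def incremental_loss(new_logits, old_logits, labels, n_old):
    # Standard cross-entropy on the current (old + new) label space.
    ce = F.cross_entropy(new_logits, labels)
    # Distill the frozen previous model's soft outputs over the old classes.
    kd = F.kl_div(F.log_softmax(new_logits[:, :n_old] / T, dim=1),
                  F.softmax(old_logits / T, dim=1),
                  reduction="batchmean") * T * T
    return alpha * ce + (1 - alpha) * kd

# Usage with placeholder tensors: 10 old classes, 2 newly added classes.
new_logits = torch.randn(8, 12, requires_grad=True)
old_logits = torch.randn(8, 10)          # from the frozen previous model
labels = torch.randint(0, 12, (8,))
loss = incremental_loss(new_logits, old_logits, labels, n_old=10)
loss.backward()
```

The distillation term is what prevents catastrophic forgetting: the new model is penalized whenever its predictions on the old classes drift away from those of its predecessor.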

11.4 Contrastive learning

Contrastive learning (CL) is a self-supervised learning method that aims to learn a feature representation of data that separates positive pairs from negative pairs (Tian et al. 2019). The idea is to maximize the similarity between positive pairs while minimizing the similarity between negative pairs, typically by designing a contrastive loss function and optimizing it using gradient descent (Tan et al. 2022); a minimal example of such a loss is sketched at the end of this subsection. The learned representation can then be used for downstream tasks, such as classification or clustering, without the need for labeled data. CL has been applied to various domains, including computer vision and natural language processing, and has shown promising results. Numerous DL-based methods have been proposed that perform better than human analysis at diagnosing retinal disorders. Cross-entropy is widely employed as a loss function in conventional DL model training, but it has recently been found to have certain drawbacks, such as a poor margin that can cause erroneous findings, sensitivity to noisy data, and hyperparameter variability; to address these problems, contrastive learning has been gaining popularity. Islam et al. (2022) proposed a supervised CL model for detecting DR from fundus images. CLAHE was used for image enhancement, and an Xception model was employed as the encoder for representation learning. The supervised CL behavior of the model was interpreted by projecting a 128-D embedding space onto a 2D plane using the t-SNE method. Two publicly available datasets, APTOS and Messidor-2, were used for training and testing. For binary DR classification on the APTOS 2019 dataset, the model achieved an AUC of 98.50% and an accuracy of 98.36%, whereas for five-stage grading it attained an AUC of 93.81% and an accuracy of 84.36%. Tian et al. (2019) developed CL-driven methodologies that constrain the model to learn the difference between new anchor examples and previously acquired positive and negative examples. With the goal of identifying retinal biomarkers in OCT images, a novel contrastive uncertainty network (CUNet) (Liu et al. 2022b) was developed. To improve the network's capacity to distinguish between distinct classes of retinal biomarkers, CUNet employed a proposed CL strategy to strengthen the feature representation of biomarkers; to further enhance the network's sensitivity to the fuzzy boundaries of retinal biomarkers, bounding box uncertainty was proposed and integrated with conventional bounding box regression. In the study (Kaplan and Lensu 2022), a variational autoencoder (VAE)-based technique was developed for generating OCT images of the retina using CL; in a second step, disease-specific OCT images were generated by applying VAEs to the learned embeddings. The diseases were effectively partitioned in the embedding space, and the method produced high-quality images with high-detail spatial resolution. Alam et al. (2022) proposed a CL-based framework with neural style transfer (NST) augmentation to generate models with improved representations for detecting DR in fundus scans; the EyePACS dataset was used for training and evaluation, and clinical data from the University of Illinois, Chicago (UIC) was used for testing. To improve a U-Net's capacity to segment retinal vessels in fundus images, a model (Xu et al. 2022) was presented that uses a local-region and cross-dataset CL strategy without introducing complex network structures.
The main goal was to distinguish the characteristics of pixels that are easily confused with their neighbors within the same local region. The model took full advantage of the global contextual information of the entire dataset, improving the features through a memory bank method, and was evaluated on the DRIVE and CHASE-DB1 datasets. To provide lesion-aware, scanner-independent screening and grading of retinopathy, the study (Hassan et al. 2023) introduced a novel self-supervised segmentation-driven classification pipeline that used a proposed angular contrastive distillation approach to extract retinal lesions. To further improve the framework's diagnostic capabilities, a novel co-attention mechanism was incorporated, allowing the underlying network to concentrate on retinal abnormalities and effectively grade retinal diseases without requiring ground-truth labels. The model was tested on seven publicly available datasets acquired with four different scanners, where it achieved a 9.22% improvement in mean IoU for extracting retinal lesions and a 10.71% improvement in F1 score compared with state-of-the-art solutions for grading retinopathy. To identify and segment biomarkers in OCT scans using only image-level annotations, a weakly supervised network called TSSK-NET (Liu et al. 2023b) was proposed: a teacher-student network with self-supervised CL and knowledge distillation-based anomaly localization. First, a unique pre-training technique based on supervised CL was proposed to teach the model the morphology of normal OCT images. Second, a fine-tuning module was built and a novel hybrid network was proposed; the model employed supervised CL for learning features and cross-entropy loss for learning classes, and the two losses were combined to preserve the various morphologies and improve the encoded feature representation. Finally, a knowledge distillation-based anomaly segmentation method was integrated with the prior model to relieve the difficulty of insufficient supervision. In the study (Holland et al. 2023), AMD progression was analyzed through OCT images in a self-supervised feature space. CL was used to pretrain an encoder, which projects images from longitudinal time series to positions in feature space. This enabled the construction of disease trajectories that were subsequently denoised and divided into clusters; these clusters were associated with OCT biomarkers and were discovered in two datasets encompassing time series of 7912 patients scanned over a period of eight years.
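As a minimal sketch of the contrastive objective these works build on, the function below implements the widely used NT-Xent loss for a batch of two augmented views; the batch size, embedding dimension, and temperature are illustrative assumptions, not values from any study above.

```python
import torch
import torch.nn.functional as F

# NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss
# for two augmented views of the same batch (sizes and temperature are
# illustrative assumptions).
def nt_xent(z1, z2, tau=0.5):
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, d]
    sim = z @ z.t() / tau                                # cosine similarities
    n = z1.size(0)
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    # The positive for sample i is its other augmented view (index i +/- n).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Two views of the same 16 images, embedded to 128-D by some encoder.
z1, z2 = torch.randn(16, 128), torch.randn(16, 128)
loss = nt_xent(z1, z2)
```

Minimizing this loss pulls the two views of each image together in the embedding space while pushing all other images apart, which is the behavior the supervised and self-supervised CL variants above refine for retinal data.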

11.5 Domain adaptation

Domain adaptation is an advanced DL technique that enables models trained on one domain (the source domain) to generalize well to another domain (the target domain) with a different distribution (Liu et al. 2019a). This is especially useful in scenarios where annotated data is scarce in the target domain. The main idea behind domain adaptation is to align the feature representations of the source and target domains so that the model trained on the source domain can effectively generalize to the target domain. This can be achieved by various methods, including fine-tuning the model using a small amount of labeled target data, adversarial training, and feature reconstruction (a minimal adversarial alignment sketch is provided at the end of this subsection). The goal is to reduce the domain shift, i.e., the difference between the source and target domains, so that the model can perform well on both. To facilitate unsupervised domain adaptation for segmenting retinal vessels in fundus images, Zhuang et al. (2019) derived the asymmetrical maximum classifier discrepancy (AMCD) strategy from maximum classifier discrepancy. The model was trained using labeled data and then tested with unlabeled data from the target domain. The three classifiers were trained in an asymmetrical fashion, with one main classifier using only the source examples and the other two assistant classifiers being utilized to maximize the discrepancy on target samples. The model was validated on the DRIVE, STARE, CHASE-DB1, and IOSTAR eye vessel segmentation datasets. Liu et al. (2019a) proposed an unsupervised domain adaptation model, Collaborative Feature Ensembling Adaptation (CFEA), that collaborated adaptation through both adversarial learning and weight ensembling. By ensembling weights during training, the model not only achieved domain invariance but also maintained an exponential moving average of the previous predictions, leading to improved predictions on the unlabeled data. Multiple adversarial losses enabled the extraction of domain-invariant features to confound the domain classifier and simultaneously benefited the ensembling of smoothing weights, all without annotating any sample from the target domain. Song et al. (2020) proposed a domain-adaptive multi-instance learning with attention technique for DR grading. Labeled examples were generated across domains to eliminate irrelevant instances, and a multi-instance learning with attention technique collected the spatial information of highly suspicious lesions and performed DR grading. The model achieved an average accuracy of 76.40% and an AUC of 0.749 when tested on the Messidor dataset. Wang et al. (2021b) developed an unsupervised domain adaptation model based on Faster R-CNN for lesion detection in multi-device retinal OCT images. Both the image-level shift and the instance-level shift were minimized jointly to reduce the domain shift, and to synchronize the changes across all levels, the model used a combination of a domain classifier and a Wasserstein distance critic. Using OCT image data from two separate devices, the model achieved an average accuracy improvement of more than 8% over the technique without domain adaptation and surpassed other comparable domain adaptation methods. Another study (Yang et al. 2020) proposed an unsupervised domain adaptation framework for lesion detection in OCT images; to compel the network to learn device-independent features, global and local adversarial discriminators were developed.
A non-parametric adaptive feature norm was then presented for the global adversarial discriminator in order to stabilize classification in the target domain. The study (Liu et al. 2022) proposed a Collaborative Adversarial Domain Adaptation (CADA) model based on domain adaptation with multi-scale inputs and multiple domain adaptors employed in the feature and output spaces. The information loss caused by the network's pooling layers used for feature extraction can be mitigated with multi-scale inputs. CADA is an interactive paradigm that enables collaborative adaptation through adversarial learning and weight ensembling; the model accomplished domain invariance and generalizability by applying adversarial learning to multi-scale outputs from distinct network layers and by retaining an exponential moving average (EMA) of the historical weights during training. Multiple adversarial losses in the encoder and decoder layers directed the extraction of domain-invariant features without annotating a single sample from the target domain. The model outperformed competing methods on the REFUGE, Drishti-GS, and RIM-ONE-r3 datasets. An innovative approach (Madadi et al. 2022) to glaucoma diagnosis learned representations that included both domain-invariant and domain-specific components in order to capture generic and domain-specific information. Low-rank coding was employed to align the source and target distributions, and progressive weighting was used to correctly transfer source-domain information and mitigate negative knowledge transfer to the target domain. The model was evaluated on OHTS, ACRIMA, and RIM-ONE. Zhang et al. (2022c) proposed an unsupervised domain-adaptive segmentation (CAE-BMAL) model to extract the OD and OC. Initially, a convolutional autoencoder was used to boost the source domain, allowing the model to generalize better; then, to mitigate the effects of complex environments on segmentation, a boundary discrimination branch based on adversarial learning was introduced. The model was validated on three datasets: Drishti-GS, RIM-ONE-r3, and REFUGE. The study (Cao et al. 2022) proposed a unified weakly-supervised domain adaptation framework for DR diagnosis comprising three parts: domain adaptation, a progressive discriminator for individual instances, and multi-instance learning with attention. The method utilized multi-instance learning and an attention mechanism to model the connection between patches and images in the target domain, and it used a combined learning approach that considers data from both the source and target domains. Results on the Messidor dataset showed an average accuracy and AUC of 94.90% and 0.76 for binary-class and 95.80% and 0.74 for multi-class classification, respectively; the model achieved 88.70% accuracy on the EyePACS dataset. Chen et al. (2022d) proposed a segmentation-guided domain-adaptation model for adapting images from various OCT machines into a single image domain, eliminating the time-consuming processes of manually labeling new datasets and retraining the existing network. The study (Hou et al. 2023) addressed image quality enhancement of fundus images in a completely unsupervised manner, using neither paired nor high-quality reference photos.
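A common way to realize the adversarial feature alignment described above is a gradient-reversal layer in the style of DANN; the sketch below is a generic illustration under that assumption, with placeholder sub-networks, and is not the exact mechanism of any cited study.

```python
import torch
import torch.nn as nn

# DANN-style adversarial alignment sketch: the encoder is trained to fool a
# source-vs-target domain discriminator via gradient reversal (all
# sub-networks and tensors are illustrative placeholders).
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # flip gradients entering the encoder

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
classifier = nn.Linear(32, 2)          # disease classifier (source labels)
discriminator = nn.Linear(32, 2)       # source-vs-target domain classifier

x_src, y_src = torch.randn(8, 64), torch.randint(0, 2, (8,))
x_tgt = torch.randn(8, 64)             # unlabeled target-domain batch

f_src, f_tgt = encoder(x_src), encoder(x_tgt)
task_loss = nn.functional.cross_entropy(classifier(f_src), y_src)

feats = torch.cat([f_src, f_tgt])
dom_labels = torch.cat([torch.zeros(8), torch.ones(8)]).long()
dom_loss = nn.functional.cross_entropy(
    discriminator(GradReverse.apply(feats, 1.0)), dom_labels)

(task_loss + dom_loss).backward()      # encoder learns domain-invariant features
```

Because the reversed gradient pushes the encoder to make source and target features indistinguishable while the task head keeps them discriminative for disease labels, no target-domain annotations are needed.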

11.6 Attention-based models

The study (Fang et al. 2019) proposed a novel lesion-aware DL model (LACNN) for retinal OCT image classification. First, a lesion detection network created a soft attention map from the OCT image; a classification network was then used to assign relative importance to the various convolutional layers based on the attention map. An improved U-Net model (Liu et al. 2021c) based on an attention mechanism was developed to identify the fluid region. By bringing together high-level and low-level information, skip connections improved the accuracy of the segmentation outcomes; the loss function combined a weighted binary cross-entropy loss, a Dice loss, and a regression loss (to prevent the problem of converging fluid areas). Another study (Liu et al. 2021b) presented a one-stage attention-based method for retinal OCT image segmentation and classification. Mishra et al. (2021) proposed a model for classifying macular OCT images that used multilevel perturbed spatial attention to extract context-aware diagnostic features. The model classified AMD, DME, and CNV, and the end-to-end trainable architecture required no preprocessing such as region-of-interest extraction, denoising, or retinal flattening. Liu et al. (2020) proposed an enhanced nested U-Net architecture (MDAN-UNet) for end-to-end segmentation of OCT images, evaluated on two publicly available benchmark datasets, Duke DME and RETOUCH. The study (Sun et al. 2020) presented a model to classify OCT volumes for diagnosing macular diseases. The model consisted of three modules: a B-scan feature extractor, a 2D map generator, and a volume-level classifier. The feature extractor was trained and used to construct a 2D map for each OCT volume, and volume-level classifiers (SVMs) classified the 2D feature maps. Five-fold cross-validation showed an average of 98.17% accuracy, 99.26% sensitivity, and 95.65% specificity. The model was trained and tested on a publicly available dataset (Kermany et al. 2018a), achieving an accuracy of 97.79% on the training data and 95.6% on the testing data. Rendering en face OCT of arbitrary retinal layers was made possible by real-time segmentation combined with high-speed OCT volume acquisition, which can improve the success rate of high-quality scans and give surgeons immediate feedback during image-guided procedures. In the study (Borkovkina et al. 2020), researchers used three tiers of optimization to segment the eight retinal layers in OCT in real time: first, a simplified neural network architecture; second, a novel neural network compression approach using TensorRT; and third, dedicated GPU hardware to speed up calculations. The compressed U-NetRT network offered 21 times faster inference than regular U-Net with no loss of accuracy. Kumar and Gupta (2022) developed a model incorporating attention and transfer learning to classify CNV, DME, and drusen in OCT images. An innovative end-to-end multiscale attention-gated network (MAGNet) was proposed (Cazañas-Gordón and da Silva Cruz 2022) for detecting and segmenting retinal layers and macular cystoid edema in OCT images. To deal with class imbalance, MAGNet utilizes a weighted loss methodology and an FCN model that applies attention gates at different scales to perform segmentation (a generic attention gate is sketched at the end of this subsection).
All B-scans were center-cropped along their longest axis to 496-by-496-pixel squares as part of the preprocessing phase. The model was trained and tested on the Duke and HCMS datasets and achieved a mean Dice score of 0.92 ± 0.03. In Li et al. (2023), a multiscale attention-guided fusion network (MAGF-Net) was proposed for vessel segmentation in fundus images. A multiscale attention (MSA) block was proposed for constructing the backbone network in order to capture multiscale contextual variables. To obtain global multiscale contextual information, a feature enhancement (FE) block was designed and integrated into the bottleneck layer. Attention-guided fusion (AGF) blocks combined characteristics from various network levels to maximize the usefulness of both channel information from deep layers and spatial information from shallow layers, and a hybrid feature pooling (HFP) block was utilized for further data retention throughout the downsampling process. The model was validated on three public datasets, CHASE-DB1, DRIVE, and STARE, achieving F1 and accuracy of 0.8329 and 96.77% on DRIVE, 0.8307 and 95.78% on STARE, and 0.8364 and 96.49% on CHASE-DB1, respectively. A multi-scale residual attention network (MRANet) (Yi et al. 2023) based on U-Net was developed to segment retinal vessels in fundus scans. Initially, a multi-level feature fusion (MLF) block was introduced to collect blood vessel information more effectively. Then, variable weights for each fused feature were learned through attention blocks, which preserve more useful feature information while lowering interference from redundant features. Next, a multi-scale residual connection (MSR) block was created to extract features more effectively. Finally, network overfitting was mitigated by incorporating a DropBlock layer. The model achieved an accuracy of 96.98% and an AUC of 0.98 on the DRIVE dataset, and 97.55% and 0.98 on the CHASE-DB1 dataset, respectively.
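The attention-gated designs above share a common building block: a gate that uses a coarse decoder signal to weight encoder features before they pass through a skip connection. Below is a generic additive attention gate in PyTorch; the channel sizes and tensors are illustrative placeholders, not the exact block of any reviewed model.

```python
import torch
import torch.nn as nn

# Generic additive attention gate for a U-Net-style skip connection
# (channel sizes and tensors are illustrative placeholders).
class AttentionGate(nn.Module):
    def __init__(self, ch_skip, ch_gate, ch_mid):
        super().__init__()
        self.w_skip = nn.Conv2d(ch_skip, ch_mid, kernel_size=1)
        self.w_gate = nn.Conv2d(ch_gate, ch_mid, kernel_size=1)
        self.psi = nn.Conv2d(ch_mid, 1, kernel_size=1)

    def forward(self, skip, gate):
        # 'skip' comes from the encoder; 'gate' from the coarser decoder
        # level, upsampled to the skip connection's spatial size.
        g = nn.functional.interpolate(gate, size=skip.shape[2:],
                                      mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.psi(torch.relu(self.w_skip(skip) +
                                                 self.w_gate(g))))
        return skip * attn            # suppress irrelevant spatial regions

gate_block = AttentionGate(ch_skip=64, ch_gate=128, ch_mid=32)
skip_feat = torch.randn(1, 64, 96, 96)   # encoder feature map
gate_feat = torch.randn(1, 128, 48, 48)  # decoder gating signal
gated = gate_block(skip_feat, gate_feat) # same shape as skip_feat
```

The learned attention map rescales the encoder features pixel by pixel, which is how such gates steer the decoder toward lesions or layer boundaries without any explicit localization labels.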

12 Datasets

This section provides information about the datasets of fundus and OCT images. Various datasets, both private and publicly accessible, are present in the literature for both modalities; here we describe only the publicly available ones, which were originally created to serve as benchmark test sets against which various detection algorithms could be evaluated.

12.1 Fundus datasets

There are various publicly available fundus datasets used to detect retinal abnormalities; Table 8 summarizes their features. The DRIVE dataset (DRIVE 2004) consists of 40 fundus images with blood vessel segmentations.

The DIARETDB0 dataset (DiaRetDb0 2007) contains 130 CFPs, 20 of which are considered normal and 110 of which exhibit DR findings (hard and soft EXs, MAs, HMs, and neovascularization). The DIARETDB1 database (DiaRetDb1 2007) contains a total of 89 CFPs; 84 show at least mild NPDR signs, while five are considered normal and show no signs of DR.

The STARE (Structured Analysis of the Retina) dataset (STARE 2000) consists of 400 raw fundus images labeled with 13 categories, together with segmentation annotations of blood vessels, arteries, and the optic nerve for 40, 10, and 80 images, respectively.

DRIONS-DB (DRIONS-DB 2009) consists of 110 unlabeled images with two different segmentations of the OD for each image. The objective of the Messidor project (Messidor 2017) was to conduct a comparative evaluation of segmentation algorithms developed for detecting lesions in fundus images. The dataset contains 1200 fundus images, each labeled with a medical diagnosis: the medical experts provided two diagnoses per image, the retinopathy grade and the risk of DME. DR was classified into six levels of severity, exudation into four levels, and hemorrhage into three levels, and the number of microaneurysms was specified for each image.

RIGA (RIGA 2018) is a dataset of 750 retinal fundus images for glaucoma analysis. It includes OC and OD ground truth for each image; however, a glaucoma diagnosis is not provided. ORIGA (Zhang et al. 2010) consists of 482 healthy and 168 glaucomatous fundus images, together with segmentations of the disc and cup. This dataset was publicly downloadable in 2010 but no longer appears to be publicly accessible.

RIM-ONE (RIMONE 2011) was first made available to the public in 2011. In 2015, 159 stereo fundus images were provided with two ground-truth segmentations of the OD and OC for CDR evaluation; these photos represented healthy individuals and glaucoma patients. The dataset was revised and adapted for a deep learning context in 2020; the new version includes 313 images from healthy subjects and 172 images from glaucoma patients.

Drishti-GS (Drishti-GS 2014) is a dataset with OD and OC segmentations for glaucoma assessment. It comprises 101 monocular fundus images (31 normal and 70 glaucoma images) divided into training and test sets, with four segmentations of the OD and OC for the training set.

ACRIMA (Diaz-Pinto et al. 2019) has 705 labeled fundus photos (309 normal and 396 glaucomatous images). Two glaucoma specialists provided the annotations, and no other clinical evidence was considered while labeling the images.

G1020 (Bajwa et al. 2020) is a large dataset of retinal fundus images for glaucoma diagnosis, containing 1020 images (724 healthy and 296 glaucoma). It provides segmentations of the OD and OC as well as image-level labels.

The REFUGE dataset (REFUGE 2020) includes 1200 fundus images annotated with clinical glaucoma diagnoses and ground-truth segmentations of the OC and OD.

The PAPILA dataset (Kovalyk et al. 2022) includes data from 244 patients (333 healthy and 155 glaucoma images). Each file contains organized data on a single patient's clinical history, as well as segmentations of the OD and OC in both eyes.

In the Kaggle Diabetic Retinopathy (Kaggle-DR) dataset (Kaggle-DR 2015), a total of 88,702 CFPs are available, split between 35,126 training samples and 53,576 test samples. EyePACS contributed the images, which were taken with a wide range of devices in a variety of settings at numerous primary care clinics in California and elsewhere. Images of both the left and right eyes were taken at the same resolution for each individual, and clinicians assessed the severity of DR in each image using the Early Treatment DR Study (ETDRS) scale.

GAMMA (GAMMA 2021) dataset includes 3D OCT and 2D fundus images from 300 patients. Each image in the dataset was annotated with glaucoma grade, macular fovea coordinates, and an optic disc/cup segmentation mask from the fundus image.

Table 8 Summary of the features of publicly available fundus datasets

12.2 OCT datasets

The publicly available OCT datasets are Zhang (Kermany et al. 2018b), Duke-1 (Farsiu et al. 2014), Duke-2 (Chiu et al. 2015), Duke-3 (Srinivasan et al. 2014b), Rabbani (Rasti et al. 2017), BIOMISA (Hassan et al. 2018b), and AFIO (Raja et al. 2020a). Table 9 provides an overview of these datasets.

The Zhang dataset (Kermany et al. 2018b) is one of the most extensive OCT datasets freely accessible to the public. The images were acquired with a Spectralis OCT (Heidelberg Engineering, Germany) at Beijing Tongren Eye Center, the Shanghai First People's Hospital, the California Retinal Research Foundation, Medical Center Ophthalmology Associates, and the Shiley Eye Institute of the University of California San Diego between 1 July 2013 and 1 March 2017. It contains a total of 108,309 OCT images and was primarily developed for screening DME, DRUSEN, CNV, and normal images. Of the 108,309 scans, 51,140 exhibit a normal retina, while 11,348 indicate DME symptoms, 8616 show DRUSEN, and 37,205 depict CNV symptoms.

Duke-1 (Farsiu et al. 2014) consists of 38,400 B-scans from 269 AMD patients and 115 normal subjects, collected with the Bioptigen system. Total retina (TR, between the ILM and the inner aspect of Bruch's membrane) mean and standard deviation maps for individual subjects are provided in the dataset, along with subject-specific mean and standard deviation thickness maps of the RPE and drusen complex.

The Duke-2 dataset (Chiu et al. 2015) was created by the Vision and Image Processing (VIP) group at Duke University and is publicly accessible. Duke-2 was designed to address DME pathologies of varying severity levels, from mild to moderate to severe, and comprises 610 OCT scans obtained from a cohort of ten subjects diagnosed with DME.

The Duke-3 dataset (Srinivasan et al. 2014b) was also created by researchers in Duke University's VIP lab. Fifteen controls, fifteen participants with dry AMD, and fifteen with DME underwent volumetric Spectralis SD-OCT scans.

The Rabbani dataset (Rasti et al. 2017) consists of 4141 OCT B-scans (including 50 normal, 48 dry AMD, and 50 DME OCT volumes) acquired at Noor Eye Hospital in Tehran. The lateral and azimuthal resolutions are not uniform across patients, although the axial resolution is 3.5 μm with a scan dimension of 8.9 × 7.4 mm\(^2\); consequently, the number of A-scans varies between 512 and 768, and the number of B-scans per volume is 19, 25, 31, or 61.

The BIOMISA dataset (Hassan et al. 2018b) was created by the BIOMISA lab at the National University of Sciences and Technology, Pakistan, for investigating retinal layers and lesions and for identifying normal and abnormal retinal conditions such as DME, CSR, AMD, and glaucoma. The dataset contains a total of 5324 scans from 99 people: 657 scans of dry AMD, 2195 scans of DME, 407 scans of wet AMD, 1161 scans of CSR, and 904 scans of normal eyes.

The AFIO dataset (Raja et al. 2020a) was also created at the National University of Sciences and Technology, Pakistan; it contains OCT and fundus images captured using the camera attached to a TOPCON 3D OCT-1000 machine. There are 50 images in the dataset, covering both normal and glaucomatous conditions, and each OCT image has an accompanying annotated fundus image. Glaucoma experts provided labels for the CDR measured from the fundus images. The OCT scans are centered on the optic nerve head (ONH), and an ophthalmologist manually annotated the ILM and the RPE.

Table 9 A detailed summary of the publicly available OCT datasets: Zhang (Kermany et al. 2018b), Duke-1 (Farsiu et al. 2014), Duke-2 (Chiu et al. 2015), Duke-3 (Srinivasan et al. 2014b), Rabbani (Rasti et al. 2017), BIOMISA (Hassan et al. 2018b), and AFIO (Raja et al. 2020a)

13 Discussion and future directions

This paper presents a comprehensive review of AI models proposed over the past decade to screen retinal diseases using different non-invasive retinal modalities, such as fundus photography, OCT, and OCT-A. Fundus photography is a non-invasive retinal examination scheme that captures the fundus of the retina, including the optic disc and blood vessels. Fundus imagery provides a wide-angle view of the retina, making it useful for general screening and documenting retinal diseases. Fundus scans are very good at detecting significant changes in the retina, but they may miss small ones. OCT, in contrast, is a non-invasive imaging technique that provides 3D structural analysis of the retina. OCT scans have a higher resolution than fundus photographs and present a cross-sectional visualization of the retina, which allows objective screening of retinal abnormalities in the early stages. The higher spatial resolution of OCT also enables earlier detection of retinal diseases, when treatment is more likely to be efficacious, and OCT imagery can be used to diagnose and monitor a wide variety of retinal diseases, such as DR, AMD, and glaucoma. OCT-A is a recently introduced retinal examination scheme that allows objective visualization of retinal and choroidal blood flow and microvascular networks. In addition, OCT-A does not require dye injection, making it more comfortable for patients as a non-invasive examination; it is often used to diagnose and monitor retinal diseases such as DR. OCT and OCT-A imagery are often used together to provide a more comprehensive assessment of retinal health. Moreover, many researchers have proposed AI models to screen retinal diseases from multi-modal imagery: screening results from multiple modalities provide more reliable and accurate results as per clinical standards, and the identification and quantification of various biomarkers can produce better diagnosis and progression tracking of different retinal diseases. Manual segmentation can offer high accuracy when performed by expert clinicians, who can precisely outline the boundaries of retinal lesions and differentiate them from normal tissue; however, manual segmentation is subjective and can vary among observers. AI and ML models, on the other hand, can provide consistent and reproducible results with significantly less inter-observer variability. Manual segmentation can also be a time-consuming and tedious task, especially on large-scale datasets, whereas AI methods can process scans more rapidly and save ophthalmologists significant time. AI methods can also be integrated into telemedicine and remote screening programs, and their fusion can greatly improve retinal healthcare. Nevertheless, automated methods may be vulnerable to false positives or false negatives, which can be reduced by tuning the models in strict consultation with expert clinicians through blind testing experiments. Figure 12 reports the studies that use OCT and fundus imagery for screening retinal diseases; from it, we can observe that many studies identified significant biomarkers and diagnosed retinal disorders with high accuracy. The main issues researchers faced in screening retinal diseases from fundus images include inconsistent image quality and blurry, unclear backgrounds.
Fundus image analysis is expanding into new research areas, encompassing multi-modal imaging fusion, combining genetic and clinical data, longitudinal studies, and incorporating other medical imaging modalities. These areas provide opportunities for further advancements and exploration. While research advancements are promising, the translation of AI models from laboratory experimentation to clinical practice is still limited. On the other hand, OCT image analysis is complex, as the images can be affected by various artifacts, such as motion artifacts caused by patient movement during the scan, shadowing artifacts due to tissue irregularities or structures in the path of the OCT beam, speckle noise, and axial or lateral resolution limitations. Correcting or mitigating these artifacts requires advanced algorithms and techniques.

Fig. 12
figure 12

Distribution of fundus and OCT studies reviewed in this survey. Fundus-related studies have a higher publication frequency compared to studies based on OCT analysis

Apart from this, this paper thoroughly examines the utilization of digital image processing, classic machine learning, and deep learning techniques for the segmentation of retinal lesions and the classification of retinal diseases, as shown in Figure 13. Figure 13 also represents the shift in the prominence of these techniques over time, with DL-based methods gaining unprecedented popularity in recent years. One limitation of digital image processing techniques is that they require manual parameter tuning, which can be time-consuming and subjective; the optimal parameters may vary across datasets, leading to potential inconsistencies in the analysis. Image processing methods may also not adapt well to variations in image quality, such as variations in contrast, illumination, or noise characteristics, because they often rely on predefined rules or assumptions that are not universally applicable, and they may struggle with complex cases, such as overlapping or intertwined structures, subtle abnormalities, or severe retinal pathologies. Furthermore, ML models also struggle with generalization to unseen or diverse data, leading to performance degradation in real-world scenarios; they are sensitive to the choice of hyperparameters and can be prone to overfitting if the model complexity is not properly controlled. DL-based methods, on the other hand, can overcome the limitations of traditional image processing and machine learning models. One major advantage of DL methods is their ability to automatically learn hierarchical representations from raw data, eliminating the need for manual feature engineering. Additionally, DL models can be trained in an end-to-end manner, learning the entire task directly from input to output without relying on intermediate steps or handcrafted rules; this streamlines the training and inference processes, making DL models more efficient. Furthermore, DL models have demonstrated superior scalability and performance for the segmentation and classification of retinal lesions. With ongoing advancements in the field of retinal analysis, DL is poised to revolutionize the future of diagnosis, treatment, and monitoring of retinal diseases, offering transformative potential for improved patient care and outcomes. Areas of focus include multi-modality fusion for a comprehensive understanding of retinal pathologies, real-time applications for timely diagnosis, transfer learning and domain adaptation for improved generalization, integration with electronic health records for personalized medicine, uncertainty estimation for risk assessment, and collaborative/federated learning for robust and secure analysis. These advancements have the potential to transform retinal analysis, leading to better diagnosis, personalized treatment, and improved patient outcomes.

Fig. 13
figure 13

The chart illustrates the transformative trend in automated techniques over the last decade, showcasing the shift from traditional image processing methods to the adoption and prominence of deep learning approaches. It highlights the increasing reliance on deep learning algorithms for tasks such as the identification, quantification, and classification of biomarkers for various retinal diseases