A survey on computer aided diagnosis for ocular diseases

Zhang, Zhuo; Srivastava, Ruchir; Liu, Huiying; Chen, Xiangyu; Duan, Lixin; Kee Wong, Damon Wing; Kwoh, Chee Keong; Wong, Tien Yin; Liu, Jiang

doi:10.1186/1472-6947-14-80

Research article
Open access
Published: 31 August 2014

Volume 14, article number 80, (2014)

BMC Medical Informatics and Decision Making

Zhuo Zhang¹,²,
Ruchir Srivastava¹,
Huiying Liu¹,
Xiangyu Chen¹,
Lixin Duan¹,
Damon Wing Kee Wong¹,
Chee Keong Kwoh²,
Tien Yin Wong³ &
…
Jiang Liu¹

Abstract

Background

Computer Aided Diagnosis (CAD), which can automate the detection process for ocular diseases, has attracted extensive attention from clinicians and researchers alike. It not only alleviates the burden on the clinicians by providing objective opinion with valuable insights, but also offers early detection and easy access for patients.

Method

We review ocular CAD methodologies for various data types. For each data type, we investigate the databases and the algorithms to detect different ocular diseases. Their advantages and shortcomings are analyzed and discussed.

Result

We have studied three types of data (i.e., clinical, genetic and imaging) that have been commonly used in existing methods for CAD. The recent developments in methods used in CAD of ocular diseases (such as Diabetic Retinopathy, Glaucoma, Age-related Macular Degeneration and Pathological Myopia) are investigated and summarized comprehensively.

Conclusion

While CAD for ocular diseases has shown considerable progress over the past years, fully automatic CAD systems that embed clinical knowledge and integrate heterogeneous data sources remain clinically important and still show great potential for future breakthroughs.

Background

Patients with ocular diseases are often unaware of the asymptomatic progression of their disease [1] until a later stage, when treatment is less effective in preventing vision impairment [2]. Though regular eye screenings enable early detection and timely intervention for such diseases, they would put a significant strain on limited clinical resources. Computer Aided Diagnosis (CAD) systems, which automate the process of ocular disease detection, are urgently needed to alleviate the burden on the clinicians.

Owing to the fast pace of technological advancements in both hardware and software, many CAD systems have been developed for the diagnosis of ocular diseases over the past years, though most of them are still undergoing evaluation or clinical validation. For example, Fujita et al. [3] discussed an emerging CAD system using retinal fundus images for the detection of glaucoma, diabetic retinopathy (DR) and hypertensive retinopathy. Their project has entered the final stage of development, and commercialized CAD systems ought to appear by its completion.

Though such fully automated systems are not yet on the market, semi-automated and manual computer systems incorporating these CAD systems are relatively widely used, with several clinical publications already reporting on their usage. Examples of the development of such systems include IVAN [4] from University of Wisconsin and more recently SIVA from National University of Singapore [5] for semi-automated vascular analysis. Software packages allowing for processing of data garnered from these systems also exist: ADRES 3.0 by Perumalsamy et al. [6] is used for the grading of DR and has been commercialised and deployed for use in diabetic centres and general physician clinics in India; the Singapore Eye Research Institute has also been running clinical trials for the diagnosis of several ocular diseases (e.g., pathological myopia (PM), DR and age related macular degeneration (AMD)) using a uniform set of ophthalmic image reading and analysis protocols [7].

This survey covers three types of data for CAD systems: clinical data, image based data and genetic data. Clinical data refers to a patient’s demographic information (e.g., age, race etc.) and data acquired from clinical laboratory tests or exams, e.g. intra-ocular pressure (IOP), but excludes data acquired from digital imaging or genomic tests (Section “Result: CAD of ocular diseases based on clinical data”). Image based data refers to images captured using an imaging device for observing the pathology in the affected part of the eye (details are in Section “Imaging modalities”). Genetic information refers to any data obtained from an individual’s DNA, genes or proteins (Section “Result: predicting ocular diseases based on genetic information”). These definitions are specific to this paper and may vary depending on context. Of the three data types, CAD systems using clinical data have already been widely studied in the clinical field [8–10]. As far as CAD using genetic information is concerned, recent advancements in genotyping technology have made individual genetic information more commonly available, but it is still infeasible to utilise genetic information for CAD systems on a large scale at present. Perhaps with time, genetic information will find its rightful place in medicine by supplementing phenotypic clinical data with validated genetic interpretations [11]. We cover genetic data as a possible input to future CAD systems. A considerable amount of the survey is focused on the usage of image based data in CAD systems, as it is by far the most important type of data in ocular disease diagnosis.

There have been surveys on retinal imaging in the area of ocular research [12, 13]. However, a broader literature survey on using CAD for ocular disease diagnosis is lacking. This has motivated us to write a systematic review of recently developed methods for CAD in ocular research.

Methods

In this work, we review research and development on automatic ocular disease diagnosis in the light of three data types, viz. clinical, image and genetic. For each data type, we investigate the algorithms and available databases developed for different ocular diseases. The associated publications were retrieved from two literature databases, PubMed and IEEEXplore. Considering the works which use images as data, to understand the major image modalities used for CAD applications and the trends of research areas, we summarize the statistics of image-based studies conducted on various ocular diseases. We examine the biomedical databases to extract the known genetic information regarding ocular diseases.

The results of the review are presented in three sections: Sections “Result: CAD of ocular diseases based on clinical data” and “Result: CAD of ocular diseases based on imaging” describe the CAD of ocular diseases based on clinical data and ocular imaging respectively. Section “Result: predicting ocular diseases based on genetic information” concerns studies relating genomic informatics to disease prediction. Furthermore, in Section “Discussion” we discuss the observed trends in the field and the possibility of CAD systems based on integrated data sources.

Result: CAD of ocular diseases based on clinical data

One of the pioneering research works on Clinical Decision Support Systems (CDSS), CASNET [14] (causal-associational network), was developed in the late 1970s to assist in the diagnosis of glaucoma. Clinical data used in CASNET covered symptoms reported by the patient, e.g., ‘ocular pain’, ‘decreased visual acuity’ and various eye examination results, e.g. visual acuity, IOP, anterior chamber depth, angle closure, pupil abnormality and corneal edema [15]. CASNET used a descriptive model of the disease process for logical interpretations of clinical findings for glaucoma. The model representing pathophysiological mechanisms had the form of a semantic net with weighted links. It represented early medical expert systems, providing a framework describing the knowledge of expert consultants and simulating various aspects of the cognitive process of clinicians.

In 2002, Chan et al. [16] reported the first implementation of Support Vector Machines (SVM) in glaucoma diagnosis. Clinical data used in the research was the output from Standard Automated Perimetry (SAP), a common computerized visual field test. The authors compared the performance of a number of machine learning algorithms with SAP output. The machine learning algorithms studied included multilayer perceptron (MLP), SVM, Linear and Quadratic Discriminant Analysis (LDA and QDA), Parzen window, mixture of Gaussian (MOG), and mixture of generalized Gaussian (MGG). It was observed that machine-learning-type classifiers showed improved performance over the best indexes from SAP. The authors also discussed the advantage of using feature selection to further improve the classification accuracy with a potential to reduce testing time by diminishing the number of visual field location measurements.

In 2011, Bizios et al. [17] conducted a study investigating data fusion methods and techniques for simple combinations of parameters obtained from SAP and measurements of the Retinal Nerve Fibre Layer Thickness (RNFLT) obtained from Optical Coherence Tomography (OCT) for diagnosis of glaucoma using Artificial Neural Networks. The results showed that the diagnostic accuracy from a combination of fused SAP and OCT data was higher than using either of the two alone. This was the first reported study using fused data for glaucoma diagnosis.

A recent study [18] investigates the relationship between the central corneal thickness (CCT), Heidelberg Retina Tomography II (HRTII) structural measurements and IOP using an innovative non-linear multivariable regression method, in order to define the risk factors in future glaucoma development.

Two recent works on ocular disease diagnosis based on clinical data deserve mention here. Liu et al. [19] developed an automatic glaucoma diagnosis and screening architecture, automatic glaucoma diagnosis through medical imaging informatics (AGLAIA-MII), which combined subjects’ personal data, imaging information from Digital Fundus Photographs (DFPs), and patients’ genome information for glaucoma diagnosis. Features from each data source were extracted automatically. Subsequently, these features were passed to a multiple kernel learning (MKL) framework to generate a final diagnosis outcome. In another work, Zhang et al. [20] proposed a computer-aided diagnosis framework for Pathological Myopia (PM) based on Biomedical and Image Informatics. These heterogeneous data sources contained fundus images, demographic/clinical and genetic data. Their system combined these potentially complementary pieces of information to enhance the understanding of the disease, providing a holistic appreciation of the multiple risk factors as well as improving the diagnostic outcomes. A data-driven approach was proposed to exploit the growth of heterogeneous data sources to improve assessment outcomes.
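To make the idea of combining heterogeneous sources through kernels concrete, the following is a minimal sketch, not the AGLAIA-MII implementation. It builds one Gram matrix per data source and forms a weighted sum of them. In a real MKL framework the weights would be learned together with the classifier; here they are fixed by hand, and the toy feature values, the kernel choices and all function names are hypothetical.

```python
import math

def linear_kernel(X):
    """Gram matrix of dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of Gaussian (RBF) similarities between the rows of X."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]

def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices; non-negative weights keep the result a valid kernel."""
    if len(kernels) != len(weights) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per kernel")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

# Hypothetical toy data: three subjects, one feature block per data source.
clinical = [[0.2, 0.5], [0.9, 0.1], [0.3, 0.4]]   # e.g. scaled age and IOP (assumed)
image = [[1.0], [0.0], [0.8]]                     # e.g. one image-derived score (assumed)
genetic = [[0, 1, 1], [1, 0, 0], [0, 1, 0]]       # e.g. binary marker indicators (assumed)

kernels = [rbf_kernel(clinical), rbf_kernel(image), linear_kernel(genetic)]
weights = [0.5, 0.3, 0.2]                         # fixed by hand, not learned
K = combine_kernels(kernels, weights)
```

The combined matrix `K` could then be passed to any kernel classifier that accepts a precomputed Gram matrix.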

Other less prevalent diseases which are detected using clinical data are briefly explained in the following:

Trachoma:

Most people with trachoma in its initial stages display no signs or symptoms. Clinically, trachoma can be diagnosed by physical examination using magnifiers and a flashlight, or by laboratory testing of a bacterial culture sampled from the eye [21].

Onchocerciasis:

Onchocerciasis is the second leading cause of infectious blindness worldwide. Also called ‘river blindness’, it is a skin and eye disease caused by a parasitic worm and spread by blackflies that breed in fast-flowing water. The two common diagnostic techniques are skin biopsies and serological assays [22].

Clinical databases

There are a number of large scale or population-based eye studies conducted in various countries. For example,

Blue Mountains Eye Study (Australia) [23]
Singapore Malay Eye Study [24]
Singapore Indian Eye Study [25]
Singapore Chinese Eye Study [26]

Many research works on various ocular diseases have been published based on the data collected in these eye studies. However, the data are not publicly available to the research community.

Result: CAD of ocular diseases based on imaging

In ophthalmology, ocular imaging has developed rapidly over more than 100 years and plays a critical role in clinical care and ocular disease management [27]. Large-scale systematic research and development of CAD from radiology and medical images began in the early 1980s. The first report on retinal image analysis was published in 1973, focusing on vessel segmentation [28]. In 1984, Baudoin et al. [29] described an image analysis method for detecting lesions related to DR.

Over the past 20 years, developments in image processing relevant to ophthalmology have paved the way for the development of automated diagnostic systems for many diseases such as DR [30], AMD [31], glaucoma [32] and cataract [33]. These diagnostic systems offer the potential to be used in large-scale screening programs, with significant resource savings, as well as freedom from observer bias and fatigue. This section briefly describes such CAD systems based on ocular imaging. Details are given in Appendix B (Details on methods for disease detection). The imaging modalities used by these systems are first introduced below.

Imaging modalities

Figure 1 shows the anatomy of the eye. The visible parts of the eye include the transparent cornea, the sclera, the iris and the pupil. A ray of light passes through the cornea and anterior chamber, followed by the pupil, the lens and the vitreous, before finally focusing on the retina [12].

Various medical imaging devices have been developed to capture the different parts of the eye. These imaging modalities are developed based on various technologies and the captured images are used to observe various pathological signs. Table 1 lists the anatomical structure(s) and the associated disease(s) each imaging modality is able to observe.

Table 1 Imaging modalities and diseases to observe

Though the eye fundus has been observed since 1850 with the invention of the ophthalmoscope by the German physician Hermann von Helmholtz [34], it was not until the mid-1920s that the Carl Zeiss Company made available the first commercial fundus camera. In the late 1950s fundus photography became ubiquitous in the practice of ophthalmology for general fundus examination and as a means for recording, storing, and indexing images of a patient with relatively simple and affordable equipment [13]. In recent years, other important imaging modalities, such as fluorescein angiography, stereo fundus photography and confocal laser ophthalmoscopy, have appeared to enhance diagnostic and observational capabilities in ophthalmology [35].

Major image modalities used for CAD applications and other research trends are shown in Figure 2. These statistics were obtained by searching the IEEEXplore publication database and demonstrate the trends in research areas and major imaging modalities for ocular research. Figure 2(a) shows the number of publications related to various ocular imaging modalities, while Figure 2(b) shows the number of publications on CAD for ocular diseases using retinal images. The keywords associated with the search are mentioned in the legend of the corresponding figures. It is observed from Figure 2(a) that of all the imaging modalities, DFP has been attracting the most interest. This observation is further substantiated by the distribution of the works surveyed in this paper (Table 2), wherein the works are arranged according to the disease and the associated imaging modality. Note that imaging modalities or diseases with very few associated works have not been included.

Table 2 A distribution of works on CAD of major ocular diseases based on imaging

The possible reasons for this observation are twofold. First, information extracted from the eye fundus could be useful in detecting a variety of diseases such as heart disorders, stroke, hypertension, peripheral vascular disease and DR [13]. Furthermore, the availability of inexpensive fundus imaging cameras makes eye examination simple and cost effective. Another modality which is gaining interest in the research community is OCT. First proposed in 1991 [163], OCT has been widely applied in medical imaging, especially for imaging the eye. The most important advantage of OCT compared with DFP is that it provides quantifiable depth information, enabling a 3D scan of the target part. Therefore it is possible to detect pathologies with topological changes in vivo. Although OCT is a powerful tool [164], in its early years the progress of OCT-based ocular disease detection was constrained by the speed of OCT imaging. Early versions of OCT required a long time to capture an image. In recent years, with the progress of spectral domain OCT (SD-OCT), which needs only 6 seconds to take a high resolution image, OCT-based ocular disease detection methods are increasing in popularity [165]. A brief description of image databases using DFP and OCT is presented in Appendix A (Image databases). In terms of the diseases, the most studied disease is DR, followed by glaucoma and AMD (Figure 2(a)).

The images associated with the above mentioned modalities often need preprocessing to remove noise and improve contrast before they can be analyzed further using CAD methods.

Image preprocessing

Some of the common preprocessing methods are histogram equalization [79, 87], shade correction [88, 89, 96], convolution with a Gaussian mask [97], median filtering [98] and blood vessel removal [105, 106].

Most of the contrast enhancement techniques use histogram equalization [79, 87]. Shade correction is often used to normalize illumination [88, 89, 96]. For noise reduction, the commonly used techniques are convolving with a Gaussian mask [97] or using a median filter [98]. Some of the methods also use blood vessel removal as a preprocessing step, since vessels can be detected as false positives while detecting red lesions, especially MAs [105, 106].
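Two of the steps named above, histogram equalization and median filtering, can be sketched in a few lines. This is a minimal pure-Python illustration, not the implementation used in any of the cited works. The function names `equalize_histogram` and `median_filter3`, the 3×3 window and the assumption of 8-bit grey levels stored as a list of lists are ours.

```python
def equalize_histogram(img, levels=256):
    """Global histogram equalization for a 2D list of ints in [0, levels - 1]."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of grey levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in img]
    scale = (levels - 1) / (n - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in img]

def median_filter3(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(img[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = window[len(window) // 2]
    return out
```

Equalization stretches a narrow intensity range over the full scale, while the median filter removes isolated outlier pixels without blurring edges as strongly as a Gaussian mask would.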

The choice of a suitable preprocessing method depends on the desired effect. Antal and Hajdu [107] experimentally showed that contrast limited adaptive histogram equalization [113] effectively improves local contrast but also introduces noise. Similarly, vessel removal is used to reduce false positives which can be found during red lesion detection. Considering this subjective nature of the preprocessing methods, Antal and Hajdu [107] proposed to choose the best pair of preprocessing and segmentation methods through a fusion algorithm.

The remaining part of this section surveys the works on detecting the major ocular diseases, focusing mainly on DR, PM, AMD and glaucoma since these diseases are investigated more than others. Also, for these diseases, DFP is still the mainstream modality, but OCT is rapidly gaining widespread adoption. Therefore we focus on these two modalities. The works on other diseases, such as cataract and corneal opacity, will be reviewed briefly in Section “Other diseases”.

Diagnostic methods for diseases

This section briefly introduces causes and symptoms for the major ocular diseases, methods of detecting them from images and a brief discussion on the state-of-the-art and possible future directions. More details on the algorithms are mentioned in Appendix B Details on methods for disease detection.

Diabetic retinopathy

Causes and symptoms

DR is a side effect of diabetes which is caused when the blood vessels in the eye start getting blocked due to high sugar content in the blood [166]. Reduced blood supply to the retina can even cause blindness [98]. Symptoms of DR include lesions appearing on the retinal surface. These lesions are visible in a DFP. Figure 3(a) and (b) show the DFPs of a normal eye and a DR affected eye, respectively. DR-related lesions can be categorized into red lesions such as Microaneurysms (MA) and Haemorrhages and bright lesions such as Hard Exudates (HE) and cotton-wool spots (Figure 3(c)). There are a few works which detect other symptoms as well [146].

Detection

Almost all of the work for detecting DR has been performed using DFPs. Most of these approaches detect lesions with special focus on detecting red lesions (Figure 3(d)) especially MAs. MAs receive higher attention since they indicate DR at an early stage [98]. This is important considering that one of the goals for CAD is to provide early detection (Section “Background”). Lesions are detected using morphological operations [114, 167] or image filters [130, 131]. From our study, we could not find any work on detecting lesions from OCT images.
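One morphological operator often used for small dark blobs is the black top-hat, i.e. the morphological closing of an image minus the image itself. The sketch below only illustrates that idea and is not the method of [114, 167]. It assumes a single-channel image, stored as a list of lists, in which red lesions appear darker than the background; the square window, the threshold value and all function names are hypothetical.

```python
def _window_op(img, r, op):
    """Apply op (min or max) over a (2r+1) x (2r+1) square window, clipped at borders."""
    h, w = len(img), len(img[0])
    return [[op(img[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]

def dilate(img, r=1):
    return _window_op(img, r, max)

def erode(img, r=1):
    return _window_op(img, r, min)

def black_tophat(img, r=1):
    """Closing minus image: responds to dark structures smaller than the window."""
    closed = erode(dilate(img, r), r)
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(closed, img)]

def candidate_lesions(img, r=1, threshold=30):
    """Return (row, col) positions whose top-hat response reaches the threshold."""
    response = black_tophat(img, r)
    return [(y, x) for y, row in enumerate(response)
            for x, v in enumerate(row) if v >= threshold]
```

Thin dark vessels also respond to this operator, which is consistent with the use of vessel removal as a preprocessing step to reduce false positives.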

Brief discussion

From the survey of works on DR, it was observed that most of the works have focused on detecting lesions associated with DR. Few works [156] have gone further to convert lesion detection to DR detection. Even for DR detection, most of the works surveyed have presented their results as a binary detection, i.e., whether DR is present or not in an eye. It might be useful to grade the severity of DR as well.

In terms of the approach used, only a few works [157] have attempted to bypass lesion detection and use non-clinical features for DR detection. Future research can focus on filling these gaps.

Glaucoma

Causes and symptoms

Glaucoma is characterized by the progressive degeneration of optic nerve fibres, which leads to structural changes of the optic nerve head, the nerve fibre layer and a simultaneous functional failure of the visual field. As the symptoms only occur when the disease is quite advanced, glaucoma is called the silent thief of sight. Although glaucoma cannot be cured, its progression can be slowed down by treatment. Therefore, timely diagnosis of this disease is important [168, 169].

Detection

Glaucoma diagnosis is typically based on the medical history, intra-ocular pressure and visual field loss tests together with a manual assessment of the Optic Disc (OD) through ophthalmoscopy. OD or optic nerve head is the location where ganglion cell axons exit the eye to form the optic nerve, through which visual information of the photo-receptors is transmitted to the brain. In 2D images, the OD can be divided into two distinct zones; namely, a central bright zone called the optic cup (in short, cup) and a peripheral region called the neuroretinal rim [90]. Glaucoma causes an enlargement of cup region with respect to OD (thinning of neuroretinal rim) called cupping [69]. This is one of the important indicators and various parameters related to cupping have been used to detect glaucoma.

These parameters include vertical cup to disc ratio (CDR) [170], disc diameter [171, 172], ISNT rule [173], peripapillary atrophy (PPA) [174] and notching [175]. The most popular measurement is CDR, which is computed as the ratio of the vertical cup diameter (VCD) to vertical disc diameter (VDD) clinically (Figure 4).
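The definition above, CDR = VCD / VDD, can be computed directly once cup and disc segmentations are available. The following is a minimal sketch under the assumption that both segmentations are given as binary masks (lists of lists of 0/1) on the same pixel grid; the function names are hypothetical and the segmentation step itself is not shown.

```python
def vertical_diameter(mask):
    """Vertical extent, in pixels, of the non-zero region of a binary mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0

def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup to disc ratio: vertical cup diameter / vertical disc diameter."""
    vdd = vertical_diameter(disc_mask)
    if vdd == 0:
        raise ValueError("disc mask is empty")
    return vertical_diameter(cup_mask) / vdd
```

For example, a cup spanning 4 rows inside a disc spanning 8 rows gives a CDR of 0.5. Because the ratio depends on both masks, a segmentation error in either one changes the result.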

Brief discussion

Utilizing DFP and OCT to detect glaucoma are two popular and active directions with OCT having a shorter history. Till now, time-domain OCT and SD-OCT have been widely utilized to perform glaucoma detection [38–40, 44–46, 49]. However, swept-source OCT (SS-OCT) has not been further exploited for the research of glaucoma. For DFP, the combined analysis of stereo DFP and OCT for extracting disc parameters may boost current performance of state-of-the-art algorithms.

Age-related macular degeneration (AMD)

Causes and symptoms

AMD causes vision loss at the central region and blur and distortion at the peripheral region (Figure 5). Depending on the presence of exudates, AMD is classified into dry AMD (non-exudative AMD) and wet AMD (exudative AMD). Dry AMD results from atrophy of the retinal pigment epithelial layer below the retina [176]. It causes vision loss through loss of photoreceptors (rods and cones) in the central part of the retina. The major symptom and also the first clinical indicator of dry AMD is drusen, sub-retinal deposits formed by retinal waste. Wet AMD causes vision loss due to abnormal blood vessel growth (choroidal neovascularization) in the choriocapillaris, through Bruch’s membrane, ultimately leading to blood and protein leakage below the macula. Bleeding, leaking, and scarring from these blood vessels eventually cause irreversible damage to the photoreceptors and rapid vision loss if left untreated. The major symptom of wet AMD is exudation [177].

Detection

AMD can be detected from DFP, OCT, X-ray, and Magnetic Resonance Imaging (MRI). Among them, DFP is perhaps the most widely used one for AMD detection, while OCT is rapidly growing in use. Most of the approaches detecting AMD from DFPs focus on detecting drusen using local thresholding [63, 65], wavelets [63], background modeling [94] and saliency [102] etc. Some of the works have also attempted to bypass drusen detection and directly predict AMD [111, 112, 119, 120, 127, 128, 178]. For OCT-based AMD detection, exudates and edema are easier to observe in OCT images. Retinal layers can be segmented from OCT images, and the texture and thickness of these layers can help in distinguishing normal regions from regions corresponding to exudates [31, 36].

Brief discussion

From the above works, it was observed that although OCT imaging is increasingly prevalent, DFP is still the mainstream image modality for AMD detection and screening. It is an active research avenue. However with the progress of SD-OCT, OCT based AMD detection and screening is emerging as a new area of focus.

Pathological myopia (PM)

Causes and symptoms

As one of the leading causes of blindness worldwide, Pathological myopia (PM) is a type of severe and progressive nearsightedness characterized by changes in the fundus of the eye, due to posterior staphyloma and deficient corrected acuity. PM is different from myopia which is caused by the lengthening of the eyeball. For myopia both environmental and genetic factors have been associated with its onset and progression [179], while PM is primarily a genetic condition [180]. Unlike myopia, PM is accompanied by degenerative changes in the retina, which if left untreated can lead to irrecoverable vision loss. The accurate detection of PM will enable timely intervention and facilitate better disease management to slow down the progression of the disease.

Detection

PM has been detected mostly from DFPs where retinal degeneration is observed in the form of PPA [181, 182]. PPA is the thinning of retinal layers around the optic nerve and is characterized by a pigmented ring like structure around the optic disc. Apart from DFPs, there have been studies to detect PM from OCT images [183]; however, CAD systems for detecting PM from OCT images have not emerged yet.

Brief discussion

Ohno-Matsui et al. [47] analyzed the relationship between the shape of the sclera and the myopic retinochoroidal lesions, and concluded that SS-OCT can provide important information on deformations of the sclera which are related to myopic fundus lesions. Such clinical discoveries provide strong evidence for the use of SS-OCT as a good candidate for future PM-CAD development.

Other diseases

Other major diseases that may lead to blindness include cataract and corneal opacity. Cataract is characterized by cloudiness in the lens, while corneal opacity is characterized by cloudiness in the cornea. CAD research has been conducted for cataract grading rather than detection, using slit lamp images [33]. Grading of cataract severity is essential for cataract surgical planning [184] and an automated grading system offers an objective and efficient solution. Grading is performed by locating the cloudiness and assessing its opacity level [33]. For corneal opacity, there have not been any automatic detection methods reported so far, to the best of our knowledge.

Discussion

Feature extraction plays an essential role in ocular image based CAD systems. From the survey, we observe two broad classes of features used in the ocular CAD systems. Approaches using each one of these are described below:

Approaches using clinical features

Many of the retinal image based CAD systems employ clinical domain knowledge during the feature selection and decision making processes. Such systems focus on identifying disease associated landmarks from images. A number of clinically relevant features can be extracted from the identified landmarks. For example, the following image cues are highly related to glaucoma: large optic CDR [185]; appearance of optic Disc haemorrhage (DH) [186]; thinning of the neuroretinal rim (NRR) or notching of the NRR [175] and presence of PPA [174]. These features based on clinical knowledge can be described as clinical features.

The early efforts in retinal image analysis were focused on optic disc localization. Lowell et al. [187] used specialized template matching to locate the optic disc, followed by a global elliptical and local deformable contour model for disc segmentation. Xu et al. [132] presented a deformable-model-based algorithm for the detection of the optic disc boundary in fundus images. Later efforts addressed optic cup detection. Abramoff et al. [133] analyzed stereo-based DFPs for rim and cup segmentation via pixel feature classification. Wong et al. [188] detected the optic cup using vessel kinking analysis. Joshi et al. [189] proposed a depth discontinuity (in the retinal surface)-based approach to estimate the cup boundary. Based on cup and disc detection, the CDR can be obtained, and CAD systems for automatic glaucoma detection have been developed on this basis [32, 69, 70, 80]. Cheng et al. [73, 190] developed PPA detection algorithms for Pathological Myopia (PM) detection. Liang et al. [104] focused on detecting drusen present in the retina for automatic AMD detection. Other researchers worked on CAD systems for DR based on various vasculature segmentation algorithms, e.g., matched filters [66, 67], vessel tracking [68] or morphological processing [77, 78].

The advantages of using clinical features in CAD systems are clear: the CAD results can be interpreted and presented with clinical meaning. Furthermore, the prior knowledge allows disease detection to be modeled with a small data set, which is critical when the training data is insufficient.

However, the detection models built using clinical features have a number of limitations as mentioned below:

The modeling process is localization or segmentation dependent. For example, [32, 69] detect glaucoma based on optic cup and disc segmentation, so a small error in disc localization may propagate downstream and finally yield an error in detection.
The systems are usually threshold-based or rule-based in the decision making stage, and thus by nature do not produce a quantifiable measurement for the disease detection.
A model built upon prior knowledge may not evolve with the growing available data.
As different diseases may possess different landmark features, a system developed for one disease may not be adaptable to other diseases.
Such systems usually need to learn from manually curated ground truth images, which is not only time consuming but also prone to human error.
Finally and most importantly, detection of one particular disease associated landmark may be neither a necessary nor a sufficient condition for disease detection. For example, [71, 83] proposed to recognize PM based on PPA detection; however, having PPA may or may not imply having PM.

Detecting all the retinal changes in DFPs is much more difficult compared to detecting a particular landmark. Statistical learning based on image feature extraction can be a possible solution to address these challenges. The following section casts light on this possibility.

Approaches using non-clinical features

With an increasing availability of image databases and advances in statistical learning, new CAD systems are shifting to non-clinical features. Non-clinical image features relate to the content of the image such as color, texture and gradient.

Many image feature extraction techniques can be applied to retinal image based CAD systems. Bock et al. [81] used an appearance based approach to quantitatively generate a glaucoma risk index from retina images. Cheng et al. [91] used Focal Biologically Inspired Feature (FBIF) for glaucoma type classification. Wang et al. [191] presented a DFP mosaic algorithm based on Scale-Invariant Feature Transform (SIFT) features [192] to overcome low contrast and geometric distortion between different fields of view of DFPs. Extracted SIFT features were described using vectors to determine the matching feature point pairs between two images. The transformation matrix was then computed from the purified matching points to generate a panoramic picture with a wide field of view, which contains more information and may improve CAD systems. Xu et al. [181] presented a CAD system for PM detection based on SIFT features extracted from a DFP. The system achieved a high AUC value (98.4%) compared to the earlier approaches that detect PM using particular image cues [83].

Another example is the use of superpixels [193, 194]. A superpixel is a perceptually consistent unit with all pixels in a group being similar in color and texture. It reduces the complexity of images from thousands of pixels to only a few hundred superpixels. Algorithms such as Simple Linear Iterative Clustering (SLIC) [195] have been developed to aggregate nearby pixels into superpixels whose boundaries closely match true image boundaries. Many features can be computed from superpixels such as shape, color, location and texture, and they can be used for classification via learning algorithms. Xu et al. [92] presented a superpixel based learning framework based on retinal structure priors for glaucoma detection. The use of superpixels leads to a more descriptive and effective representation than those employed by pixel-based techniques while at the same time yielding significant computational savings over methods based on sliding windows.
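The clustering idea behind SLIC can be sketched in a few lines of pure Python. The version below is a deliberately simplified, grayscale illustration: the published algorithm works on color images in CIELAB space, whereas this sketch clusters on (intensity, row, column) only. The function name `slic_gray` and the parameters `step` and `compactness` are assumptions made for this example.

```python
import math


def slic_gray(image, step, compactness=10.0, n_iter=5):
    """Simplified SLIC-style clustering for a 2-D grayscale image (list of lists)."""
    h, w = len(image), len(image[0])
    # Initialise cluster centres [intensity, row, col] on a regular grid.
    centers = [[float(image[r][c]), float(r), float(c)]
               for r in range(step // 2, h, step)
               for c in range(step // 2, w, step)]
    labels = [[-1] * w for _ in range(h)]
    for _ in range(n_iter):
        best = [[math.inf] * w for _ in range(h)]
        # Assignment: each centre only searches a local window around itself.
        for k, (ci, cr, cc) in enumerate(centers):
            r0, r1 = max(0, int(cr) - step), min(h, int(cr) + step + 1)
            c0, c1 = max(0, int(cc) - step), min(w, int(cc) + step + 1)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    d_color = image[r][c] - ci
                    d_space = math.hypot(r - cr, c - cc)
                    # Combined distance: intensity term plus weighted spatial term.
                    d = math.hypot(d_color, compactness * d_space / step)
                    if d < best[r][c]:
                        best[r][c] = d
                        labels[r][c] = k
        # Update: move each centre to the mean of its assigned pixels.
        sums = [[0.0, 0.0, 0.0, 0] for _ in centers]
        for r in range(h):
            for c in range(w):
                s = sums[labels[r][c]]
                s[0] += image[r][c]
                s[1] += r
                s[2] += c
                s[3] += 1
        for k, (si, sr, sc, n) in enumerate(sums):
            if n:
                centers[k] = [si / n, sr / n, sc / n]
    return labels


# Toy 8x8 image with a vertical edge: dark left half, bright right half.
image = [[0] * 4 + [100] * 4 for _ in range(8)]
labels = slic_gray(image, step=4)
```

On this toy image the superpixels do not straddle the intensity edge, which is the boundary-adherence property described above. Per-superpixel features (mean intensity, position, size) could then be fed to a classifier.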

Non-clinical features can be considered to be associated with a data driven approach, which has shown many advantages over the approach using clinical features. Extracting non-clinical features is followed by learning from the labeled examples, therefore less manual ground truth labeling is needed compared to the approaches using clinical features. As these systems do not rely on particular image landmarks, they avoid the error cascading due to initial segmentation or localization. Non-clinical features are generalized features which make it possible for the system to transfer knowledge learned from one disease to other diseases. Such feature extraction can facilitate learning algorithms such as multi-task learning [196, 197] and transfer learning [198]. Furthermore, since the techniques apply statistical evaluation, the performance of the systems is expected to improve when more data is available. The result of such systems can be a quantifiable score rather than a Yes or No answer, which is particularly useful in clinical assessment. The use of non-clinical features for CAD is a promising area for future CAD systems.

Result: predicting ocular diseases based on genetic information

Genetic information can be used to detect heritable disease related genotypes, mutations or phenotypes for clinical purposes [199]. Ocular diseases are highly heritable, thus genetic information can provide important insights into disease risk and disease prognosis.

Heritability of ocular diseases

Heritability is the proportion of phenotypic variation in a population that is attributable to genetic variation among individuals [200].

According to [201], heritability can be expressed in statistical terms using a linear mixed model, in which the observable characteristics of an organism are represented as a linear function of genetic and environmental factors, namely Phenotype (P) = Genotype (G) + Environment (E). The heritability can then be represented as $H^2 = \mathrm{Var}(G) / \mathrm{Var}(P)$, where $H^2$ represents the heritability due to all genetic effects. Since the beginning of the 20th century, heritability studies have been conducted on numerous diverse biological and psychological human traits. Among these, attempts have been made to estimate the genetic contribution to human longevity and lifespan [202, 203], and a person’s susceptibility to becoming a smoker [204, 205].
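A toy numeric example may help to make the ratio concrete. The sketch below reads G and P in the heritability ratio as variances, consistent with the definition of heritability as a proportion of phenotypic variation. The genetic and environmental values are invented and chosen to be uncorrelated. In practice G is not observed directly and must be estimated, for example from twin or family data.

```python
from statistics import pvariance

# Hypothetical genetic and environmental contributions for four individuals.
genetic = [-1.0, -1.0, 1.0, 1.0]
environment = [-0.5, 0.5, -0.5, 0.5]

# Additive model: P = G + E.
phenotype = [g + e for g, e in zip(genetic, environment)]

var_g = pvariance(genetic)      # variance explained by genetic effects
var_p = pvariance(phenotype)    # total phenotypic variance
h2 = var_g / var_p              # broad-sense heritability
```

Here the genetic variance is 1.0 and the phenotypic variance is 1.25, so 80% of the phenotypic variation in this toy population is attributable to genetic variation.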

In 1992, the first ophthalmic twin study was conducted to investigate the heredity of refractive error [206].

Since then, over 100 articles have been published in the scientific literature examining the genetic contribution to variation in ophthalmic traits. Table 3 summarizes the heritability of various ocular diseases or ocular related phenotypes as reported in the literature. It is observed that the heritability values reported in different studies vary from each other, as the value is population related. The ranges of heritability values are shown in Figure 6, from which it is observed that Central Corneal Thickness is the most heritable trait, PM spans a wider range due to its population dependence, and cataract appears to be a less heritable disease.

Table 3 Heritability for ocular diseases or disease related traits

Knowledgebases of genetic markers for ocular diseases

For the past 20 years, the biomedical research community has spent huge efforts in identifying genetic markers for heritable diseases, through classical linkage studies [231] or more recent genome-wide association studies [232]. The discovered disease related biomarkers include genes, mutations and single-nucleotide polymorphisms (SNPs). Such valuable knowledge has been continuously accumulated in various biomedical databases, which are usually called knowledgebases. This section introduces the knowledgebases highly relevant to this study.

OMIM - Online Mendelian Inheritance in Man
OMIM is a continuously updated catalog of human genes and genetic disorders and traits, with particular focus on the molecular relationship between genetic variation and phenotypic expression [233]. It is thus considered to be a phenotypic companion to the Human Genome Project [234]. As of 8 May 2013, it held more than 14,000 disease related gene entries.
GWAS Catalogue - Catalogue of Published Genome-Wide Association Studies (GWAS)
GWAS is an approach to rapidly scan markers across the complete genomes (DNA) of many people to find genetic variations associated with a particular disease [235]. The first GWAS, published in 2005 [236], was associated with an ocular disease. It investigated AMD and found two SNPs that are significantly associated with AMD. Since then, similar successes have been reported using GWAS to identify genetic variations that contribute to the risk of type 1 diabetes [237], Parkinson’s disease [238], heart disorders [239], obesity [240] etc. The GWAS Catalogue (http://www.genome.gov/gwastudies/) is a collection of GWAS discovered SNPs, hosted by NHGRI (National Human Genome Research Institute). SNP-trait associations listed in the GWAS Catalogue are limited to those with p-values below $1.0 \times 10^{-5}$. As of 8 May 2013, the catalog included 1594 human GWA studies which examined over 200 diseases and identified more than 10,000 disease associated SNPs.
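The inclusion rule of the GWAS Catalogue can be illustrated with a short filter. The SNP identifiers, traits and p-values below are hypothetical placeholders, and the record layout is an assumption made for this sketch rather than the catalogue's actual file format.

```python
P_VALUE_CUTOFF = 1.0e-5  # threshold stated for catalogue inclusion

# Hypothetical association records: (SNP id, trait, p-value).
associations = [
    ("rs_example_1", "AMD", 3.0e-8),
    ("rs_example_2", "Glaucoma", 2.0e-6),
    ("rs_example_3", "Myopia", 4.0e-4),
]

# Keep only associations that are strong enough to be listed.
reported = [snp for snp, _trait, p in associations if p < P_VALUE_CUTOFF]
```

Only the first two hypothetical SNPs pass the cut-off. A SNP with a weaker marginal signal, such as the third one, would not appear in the catalogue even if it contributes to risk jointly with others.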

Ocular disease related SNPs

Figure 7 shows the ocular disease related SNPs found from the OMIM and GWAS Catalogue knowledgebases. There are potentially many uses of these identified SNPs: a better understanding of disease etiology, personalized medicine, new leads for studying underlying biology and risk prediction. From a risk prediction perspective, it is reasonable to average a larger number of predictors, of which some may have (limited) predictive power and some may actually be noise. The idea is that, when added together, the combined small signals result in a signal that is stronger than the noise from the unrelated predictors [241].
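One common way to aggregate many weak predictors is a weighted sum of risk allele counts. The sketch below assumes genotypes coded as 0, 1 or 2 copies of the risk allele and uses invented per-SNP weights. The SNP names and weights are hypothetical and are not taken from OMIM or the GWAS Catalogue.

```python
def genetic_risk_score(genotype, weights):
    """Weighted sum of risk allele counts over the SNPs that have a weight."""
    return sum(weights[snp] * count
               for snp, count in genotype.items()
               if snp in weights)


# Hypothetical per-SNP effect sizes (e.g. log odds ratios).
weights = {"snp_a": 0.20, "snp_b": 0.10, "snp_c": 0.05}

# Risk allele counts (0, 1 or 2) for one hypothetical individual.
genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = genetic_risk_score(genotype, weights)  # 0.2*2 + 0.1*1 + 0.05*0
```

Each SNP contributes only a small amount, but the contributions accumulate across the profile, which is the intuition behind combining many small signals.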

Discovering novel disease related SNPs from large-scale genome wide association study

Computational methods for SNP-trait association studies [242, 243] have been developed. Such methods treat SNPs as individual players in one’s genetic profile. Following these methods, efforts [244–246] have been extended to investigate SNPs that have little effect on disease risk individually but influence the disease risk jointly. This phenomenon is known as epistatic interaction, where the effects of one gene are believed to be modified by one or several other genes. The single-locus and epistasis SNP detection algorithms test individual SNPs or pairs of SNPs without taking into consideration the underlying, intertwined biological mechanisms. In contrast, the real gene-gene interactions participating in biological pathways often involve groups of an arbitrary number of SNPs. To date, exhaustively detecting significant SNP groups of arbitrary size is still computationally infeasible [245].
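The infeasibility of exhaustive search follows from simple counting: the number of candidate SNP groups of size k among n SNPs is the binomial coefficient C(n, k). The sketch below evaluates it for an assumed panel of 500,000 SNPs, a figure chosen only for illustration.

```python
import math

N_SNPS = 500_000  # assumed panel size, for illustration only

# Number of distinct SNP groups that an exhaustive search would have to test.
candidates = {k: math.comb(N_SNPS, k) for k in (1, 2, 3, 4)}

for k, count in candidates.items():
    print(f"groups of size {k}: {count:.3e}")
```

The count grows by roughly five orders of magnitude with each additional SNP in the group, so testing every pair is already expensive and testing every larger group quickly becomes impractical.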

Recently, machine learning, especially sparse learning algorithms, has been introduced for GWAS data analysis. This is intended to tackle the challenge of identifying a group of N potent but intricately correlated SNPs, some of which may not pass the stringent threshold by themselves. Penalized regression based on the Least Absolute Shrinkage and Selection Operator (LASSO) [247] has recently been explored for GWAS analysis. Some researchers [248, 249] have proposed 2-step approaches for genome-wide association analysis, which shortlist a group of marginal predictors using penalized likelihood maximization and then pass them on for higher order interaction detection. Hoggart et al. [250] have proposed a method to simultaneously analyze all SNPs in genome-wide and re-sequencing association studies. D’Angelo et al. [251] have combined LASSO and principal-components analysis for detection of gene-gene interactions in genome-wide association studies. These approaches are not global due to the 2-stage process, and none of them has considered incorporating prior knowledge into the model building. Prior knowledge can be combined into GWAS to improve the power of the association study [252]. It can also model dependencies and moderate the curse of dimensionality.
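To illustrate how an L1 penalty selects a sparse set of predictors, the following sketch fits a LASSO regression by cyclic coordinate descent with soft-thresholding, minimizing (1/2n)·||y − Xβ||² + λ·||β||₁. It is a generic textbook formulation, not the specific procedure of any cited work. The tiny data set stands in for centered genotype codes and is entirely hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for LASSO without an intercept term."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the partial residual.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


# Hypothetical centered predictors for 6 samples and 3 SNPs.
X = [
    [-1.0,  1.0,  1.0],
    [-1.0, -1.0,  1.0],
    [ 0.0,  1.0, -2.0],
    [ 0.0, -1.0, -2.0],
    [ 1.0,  1.0,  1.0],
    [ 1.0, -1.0,  1.0],
]
y = [2.0 * row[0] for row in X]  # the trait depends on the first SNP only

beta = lasso_coordinate_descent(X, y, lam=0.1)
```

In this toy case the coefficients of the two unrelated predictors are driven exactly to zero, while the coefficient of the informative predictor is shrunk slightly below its true value of 2. This shrinkage is the price paid for sparsity.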

Discussion

From the above survey, two major observations were made. First, knowledge acquisition for CAD is shifting from semi-automatic to automatic. Second, heterogeneous data sources are increasingly being integrated. These two trends are discussed in the following subsections.

The trend of semi-automatic to automatic knowledge acquisition

In the 1970s and 80s, research was focused on constructing knowledge-bases from inputs of physicians [253, 254] for CAD tools. Building such systems required a lot of human intervention, e.g. experts’ inputs, and can be considered a ‘semi-automatic’ way of knowledge acquisition. Over the years, the alternative approach of automatic knowledge acquisition, without inputs from clinicians or experts, has become more popular [255, 256]. One such way of knowledge acquisition is to capture patterns in data using non-clinical features (Section “Approaches using non-clinical features”). This approach offers several advantages:

Knowledge-bases derived from datasets are more precise in comparison with knowledge-bases constructed from expert inputs, as the inputs provided by human experts may be vague, due to limited grades of perception [257]. An increased precision of CAD systems will make them more reliable for a mass screening application.
Knowledge-bases constructed using the automated approach capture empirical evidence in the data. This approach aligns with the trend of evidence-based decision making, which emphasizes the use of empirical evidence to make clinical decisions [258].
Medical datasets embed local epidemiological patterns. Hence the derived knowledge-bases can result in more accurate CAD tools, as disease and symptom patterns vary from one region to another [259]. A system learnt using data obtained from a particular region can be expected to be more precise in performing mass screening in that same region. The physician experts on the other hand may not be aware of local trends, especially when they do not have sufficient experience of clinical practice in a particular locality.

The trend of integration of heterogeneous data sources

One of the reasons why CAD tools may be found to have sub-optimal accuracy is that the training data may itself lack some of the attributes that are required for decision making [260]. Combining decision support methodologies that process information stored in different data formats has been shown to improve performance [261]. Apart from laboratory information, attributes extracted from gene profiling data, visual clues from medical images, as well as other sources could be combined and may possibly lead to more satisfactory accuracy.

The advances in technologies related to medical signal acquisition, medical imaging and genotyping have resulted in an increased volume and complexity of collected biological and medical data. This makes it difficult for physicians to parse through the information while providing timely diagnoses and prognoses. Due to its complexity, analysis of such data has been limited to bioinformatics applications [262]. There is a significant need for the development and improvement of computer-aided detection or decision support systems in medicine, and this need is expected to grow in the future.

In the era of information explosion, data from multiple sources are becoming increasingly available. Retinal fundus cameras can be found in numerous primary community healthcare institutions as well as optical shops. With the dramatic reduction in genotyping costs in recent years, it is foreseeable that SNP data can be acquired at low cost and with as much ease as demographic clinical data in the near future. Health screening outreach programs have allowed individuals to access clinical data which was previously hard to access.

Each of these heterogeneous data sources (image features, personal profile data, SNP data) is likely to contain a different perspective on the disease risk of an individual, based on the pathological, environmental and genetic mechanisms of the disease. These perspectives may potentially be complementary and a combination of the data from these independent sources can provide a more comprehensive and holistic assessment of the disease.
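One simple way to combine such complementary perspectives is late fusion, where each data source yields its own risk estimate and the estimates are then merged. The sketch below uses a weighted average and tolerates missing sources. The source names, weights and scores are hypothetical, and the weighted average is only one of many possible fusion rules, not a method proposed in the surveyed literature.

```python
def fuse_risk_scores(scores, weights):
    """Weighted average over the sources that are present (score is not None)."""
    available = {name: s for name, s in scores.items() if s is not None}
    if not available:
        raise ValueError("no data source available")
    total_weight = sum(weights[name] for name in available)
    return sum(weights[name] * s for name, s in available.items()) / total_weight


# Hypothetical per-source risk estimates in [0, 1]; genetic data is missing.
scores = {"image": 0.8, "clinical": 0.6, "genetic": None}
weights = {"image": 1.0, "clinical": 1.0, "genetic": 1.0}

combined = fuse_risk_scores(scores, weights)
```

Handling missing sources explicitly matters in practice, since population studies often hold only partial data for some subjects.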

Integration of different data sources in CAD systems can also help in early detection since some of the early symptoms of the disease may appear in one data source but not the other. Consequently, using just one single source or type of data may be limiting for early detection.

There is no previous work attempting to combine these three types of data for automatic disease detection except [20], mentioned in Section “Result: CAD of ocular diseases based on clinical data”. One possible reason is that only recently has such data become available on a large scale. Another is that researchers working on these heterogeneous data sets usually come from different domains with different foci, e.g. computer vision and image understanding researchers focus on DFP analysis, while bioinformaticians are interested in discovering disease associated SNPs or SNP groups. Effectively combining these data can maximize the information gain and pave the way for a holistic approach to automatic and objective disease detection and screening.

Conversely to the integration of multiple data sources, there is a possibility of using the same image to detect multiple diseases, since many ocular diseases may have common symptoms. Along this line, there are already machine learning algorithms, such as multi-task learning, which aim to solve similar problems. However, to the best of our knowledge, there is currently no work in this direction.

Conclusion

CAD for ocular diseases, which can automate the detection process, has attracted extensive attention from many clinicians and researchers. It not only alleviates the increasing burden on clinicians by providing automatic and objective diagnosis with valuable insights, but also offers early detection and easy access for patients. In this article, we have reviewed in detail the recent progress of methods used in CAD of ocular diseases in the available literature. We investigated three types of data (i.e., clinical, genetic and imaging) that have been commonly used in existing methods for CAD. A number of major ocular diseases including DR, Glaucoma, AMD and PM were also introduced along with existing methods that have been proposed to detect these diseases. The necessity of turning semi-automatic acquisition of domain knowledge into a fully automatic one (which does not require inputs from operators) was examined. The advantages of integrating heterogeneous data sources for ocular disease detection were highlighted. We believe that these two trends are of great importance and deserve further study in the future.

Appendix

A Image databases

This section briefly describes the commonly used databases for each disease. The name of the associated disease is mentioned in brackets after the name of the database.

ORIGA-light (Glaucoma): The ORIGA-light [263] database contains 650 annotated DFPs, including 168 glaucomatous images and 482 randomly selected non-glaucoma images. Each image is tagged with grading information and the manually segmented optic disc and cup.
Erlangen Glaucoma Registry (Glaucoma): The Erlangen Glaucoma Registry [264] includes 861 eyes of 454 Caucasian subjects (239 normal eyes of 121 subjects, 250 ocular hypertensive eyes of 118 patients, 372 eyes of 215 patients with chronic open-angle glaucoma).
The Singapore Malay eye study (SiMES) (Glaucoma): SiMES [24] is a population-based study conducted from 2004 to 2007 to assess the causes and risk factors of blindness and visual impairment in the Singapore Malay community. The study was approved by the institutional review board of Singapore Eye Research Institute. The database contains 3280 subjects, with complete or partial personal data, DFP data and genome information for each subject. The personal data in SiMES contains demographic data such as age, gender and height, ocular examination data, such as IOP and cornea thickness, as well as historical medical data. SiMES examined a population-based, cross-sectional, age stratified, random sample of 3280 Malays (78.7% participation rate) aged 40 to 80 years living in Singapore.
The Singapore Indian Eye Study (SINDI) (Glaucoma): The SINDI [25] is a population-based, cross-sectional study, which was conducted on 3400 Indians aged 40 to 83 years residing in Singapore. Ocular components including axial length (AL), anterior chamber depth (ACD), and corneal radius (CR) were measured by partial coherence interferometry. Refraction was recorded in spherical equivalent (SE). After 502 individuals with previous cataract surgery were excluded, ocular biometric data on 2785 adults were analyzed.
The Singapore Chinese Eye Study (SCES) (Glaucoma): The aims of SCES [26] are to identify the determinants of Anterior Chamber Depth (ACD) and to ascertain the relative importance of these determinants in Chinese persons in Singapore. 1060 Chinese participants were recruited from the Singapore Chinese Eye Study. All subjects underwent AS optical coherence tomography (OCT; Carl Zeiss Meditec, Dublin, CA). Customized software (Zhongshan Angle Assessment Program, Guangzhou, China) was used to measure the AS-OCT parameters. Anterior chamber depth was determined using IOLMaster (Carl Zeiss Meditec). Univariate and multivariate regression analyses were performed to assess the association of ACD with ocular biometric and systemic parameters.
High-Resolution Fundus (HRF) Image Database (Glaucoma): The HRF [265] database has been established by Friedrich-Alexander University Erlangen-Nuremberg (Germany) and the Brno University of Technology (Czech Republic). It contains 15 images of healthy patients, 15 images of patients with DR and 15 images of glaucomatous patients. Binary gold standard vessel segmentation images are available for each image. Masks determining field of view (FOV) are provided for particular datasets. The gold standard data is generated by a group of experts working in the field of retinal image analysis and clinicians from the cooperating ophthalmology clinics.
The Rotterdam Study (Glaucoma): The Rotterdam Study [266] is a prospective population-based cohort study investigating age-related disorders. The study started in 1990 and is still ongoing. The original cohort comprised 7983 participants 55 years or older; ancillary studies were added later on, and in total 14,926 participants have been enrolled. In 2007, OCT scanning of the macular and ONH regions was added to the armamentarium. To determine which regions of the OCT volumes could be segmented in what fraction of subjects, the macula and ONH of 925 consecutive subjects were imaged with the Topcon 3-D OCT-1000 (Topcon, Tokyo, Japan).
DIARETDB0 and DIARETDB1 (DR): These two databases [267, 268] of DFPs contain a wide variety of DR related lesions such as Hemorrhages (H), Microaneurysms, Hard Exudates (HE), Cotton Wool Spots (CWS) or Soft Exudates and Neovascularization. There are 219 images in total, with 25 of them completely normal. The Field of View (FOV) is 50 deg and the image resolution is 1500×1152 pixels. The ground truth is in the form of locations and sizes of the lesions. The major difference between the two databases is that DIARETDB0 has calibration level 0 DFPs, which means that the images are taken with different fundus cameras with unknown camera settings, whereas DIARETDB1 has calibration level 1 DFPs, in the sense that the images are taken with the same fundus camera. DIARETDB0 is therefore expected to have more variation in visual appearance across images than DIARETDB1.
ROC (DR): ROC stands for Retinopathy Online Challenge [269] which is a competition aiming to compare the accuracies of MA detectors on a benchmark database. The database consists of 50 training and 50 testing images. The ground truth consists of the positions of the centers of MAs and irrelevant lesions. Ground truth for the training images is released while that for the test images is kept with the organizers. Participants can submit their detection results through the challenge website and the organizers compute a performance score for the detections.
Messidor (DR): The Messidor database [270] consists of 1200 DFPs containing MAs, Neovascularization and Hemorrhages. The images were acquired using a color video 3CCD camera on a Topcon TRC NW6 non-mydriatic retinograph with a 45 degree FOV. The images are of resolution 1440×960, 2240×1488 or 2304×1536 pixels. The ground truth is in the form of Retinopathy grade from 0 (normal) to 3 (most severe). Similarly, risk of macular edema is marked on a scale from 0 (no risk) to 2 (high risk).
STARE (DR, AMD): STARE (STructured Analysis of the REtina) is a dataset containing images of multiple diseases. It contains 397 DFPs in total and the ground truth is in the form of severity grades for the disease. The images are of resolution 700×605. Of all the images, 62 were labeled as containing drusen: 20 as large many, 13 as large few, 10 as fine many, and 19 as fine few. To the best of our knowledge, it is the first dataset containing drusen labeling. STARE also contains DR related lesions; 91 images are labeled as being affected by DR [75]. It also contains manual labeling of vessels for part of the images.
ARIA (DR, AMD): ARIA was published by St Paul’s Eye Unit of Royal Liverpool University Hospital Trust in the UK. It contains 212 images in total, including 92 with AMD, 61 normal images, and 59 with DR.
AREDS (AMD): The Age-Related Eye Disease Study (AREDS) enrolled 4,757 participants, aged 55-80 years. Among them, 3640 participants had at least early AMD and the other 1117 did not [271].
Thalia-D (AMD): Thalia is a dataset constructed by the iMED group from I²R (Institute of Infocomm Research, Singapore). It consists of 350 images, with 96 labeled as early AMD (drusen) and the others as non-AMD (no drusen). Image resolution is 3072×2048 and the ground truth is in the form of marked drusen boundaries [272].
EUGENDA (AMD): The Euregio genetic database (EUGENDA) is an ongoing project that currently targets AMD. It now contains more than 4000 images, with more than 191 containing drusen (http://www.eugenda.org/).
CAPT (AMD): Complications of Age-Related Macular Degeneration Prevention Trial (CAPT) is a randomized clinical trial to evaluate whether prophylactic laser treatment to the retina can prevent the complications of the advanced stage of AMD. In total, 1052 patients with two high-risk eyes were enrolled. The images collected by CAPT can be used as dataset for automatic AMD detection [273].

Note that for Pathological Myopia, to the best of our knowledge, there have not been many studies on image based CAD. However, there were studies on the prevalence rate of PM [274–277] which used large volumes of DFPs.