1 Introduction

Age-related macular degeneration (AMD) is one of the most common progressive retinal diseases worldwide and is responsible for 8.7% of blindness, particularly in individuals over 60 years of age [1]. It is labeled a "priority eye disease" by the WHO [2]. AMD initially affects the center of the retina, called the macula, the portion of the retinal layer that contains the photoreceptor cells responsible for central vision [3]. Therefore, any damage to the macula ultimately leads to vision loss. AMD does not result in total blindness, but it can make it more difficult to drive, read, recognize people, and perform close-up tasks such as housework or cooking.

AMD can be classified into two types, dry and wet, based on the absence or presence of neovascularization [4]. Dry AMD, also called atrophic AMD, accounts for approximately 85–90% of cases [5]. In dry AMD, the macula becomes thinner and drusen form under the retina. Drusen are buildups of yellowish deposits that cause visual loss which gradually worsens over time depending on their number, size, and location [6]. Dry AMD has three phases: early, intermediate, and late. It usually progresses slowly over several years. Although there is no cure for late dry AMD, there are strategies to maximize the amount of vision the patient still has.

Wet AMD, often known as advanced neovascular AMD, is a less frequent kind of late AMD that typically results in a faster loss of eyesight. Wet AMD can develop from any stage of dry AMD, although it is always considered a late stage. The macula is harmed when aberrant blood vessels develop in the rear of the eye, a process called neovascularization. This choroidal neovascularization builds up beneath the retina, causing blood leaks and macular damage. Approximately 10% to 15% of dry AMD cases develop into the wet form [5]. In wet AMD, the loss of eyesight can happen quickly rather than gradually [7].

Ophthalmologists examine the macula through a dilated pupil to decide whether the patient has AMD and whether it is the dry or wet type [8]. The traditional diagnostic tools for AMD are fundus retinal imaging (FRI) and optical coherence tomography (OCT). OCT is a highly preferred method for ophthalmologists to assess retinal diseases [9]. Figure 1 illustrates the normal eye and the AMD types using OCT images. Ophthalmologists must carefully examine numerous OCT cross sections for each patient, which is time-consuming and requires professional training. In remote regions or places with a lack of medical resources, healthcare technologists act as the primary healthcare providers, and they might not have enough experience to make an accurate diagnosis. Patients often have to wait several weeks for the diagnosis, which delays therapy and consumes considerable labor and social resources [10]. Although AMD cannot be completely cured [11], early detection and treatment can slow its progression. Thus, the usage of recent techniques, especially artificial intelligence (AI), for early detection of eye diseases helps prioritize patients' cases.

Fig. 1 Normal OCT and AMD types

The foundation of medical diagnostics is the analysis of disease images obtained using cutting-edge digital technologies. AI makes it possible to automatically make accurate assessments of medical images, which decreases the workload of physicians, reduces diagnostic errors and turnaround times, and improves performance in the prognosis and detection of various diseases [12, 13].

Recently, AI technology, specifically deep learning (DL), has made significant advances, enabling novel algorithms to classify eye diseases such as AMD. The next section highlights some of the most recent studies that have sparked interest in AMD classification.

2 Related work

To categorize OCT images of AMD and diabetic macular edema (DME) obtained from the Kaggle dataset, Chen et al. [14] employed a convolutional neural network (CNN) using transfer learning models that include GoogleNet22, VGG16, VGG19, ResNet18, ResNet50, and ResNet101. When the proper hyperparameters were used, their experimental results showed that the VGG19, ResNet101, and ResNet50 models performed remarkably well in classifying OCT images. Sotoudeh-Paima et al. [15] presented a multi-scale CNN using a feature pyramid network (FPN) for feature fusion, with VGG16 acting as the encoder. The architecture was tested on the NEH and UCSD datasets. Their framework reached an accuracy of 93.4% ± 1.4%, and the heat maps it generated demonstrated its ability to detect retinal diseases appearing at various scales.

Serener et al. [16] used the transfer learning models ResNet18 and AlexNet to categorize the two types of AMD, dry and wet. Both models were fine-tuned, and the results showed that ResNet18 is superior for this classification, with an accuracy of 99.5% for the dry type and 98.8% for the wet type. Kadry et al. [17] extracted handcrafted features from images, concatenated them with VGG16 features, and then used the Mayfly algorithm as a feature selector. Using the OCTID and iChallenge-AMD databases, this framework served as a binary classifier for AMD. He et al. [18] classified the UCSD and Duke datasets into normal and abnormal types using the local outlier factor (LOF) algorithm. This model relied on a retrained ResNet50 model with an L2-constrained SoftMax loss to extract features. It produced strong results, with an accuracy of 99.87% on the UCSD dataset and 97.56% on the Duke dataset.

Thomas et al. [19] trained a seven-layer multi-scale convolutional neural network on the Mendeley dataset to classify images into AMD and normal cases. This method made it possible to capture more local structures with different filter sizes and achieved an accuracy of 99.73% on the Mendeley dataset, 98.08% on OCTID, 96.66% on Duke, and 97.95% on the NEH dataset. Celebi et al. [20] proposed a modified CapsNet model that incorporated optimized Bayesian non-local means (OBNLM) for speckle noise reduction and data augmentation. The method was evaluated on the Kaggle dataset and achieved an accuracy of 96.39%. Thomas et al. [21] suggested a multi-scale and multi-path CNN with six convolutional layers, assessed using tenfold cross-validation. The suggested CNN was employed as a feature extractor, and the extracted features were fed into various traditional classifiers. The accuracy achieved on the various datasets was 96.66% for the Duke dataset, 98.97% for NEH, 99.74% for AREDS2, and 99.78% for the Kaggle dataset.

An et al. [22] built their model in two steps. First, they fine-tuned the pre-trained VGG16 model to distinguish between normal and AMD images. Second, the model from the first step was transferred and retrained to distinguish between AMD images with and without fluid. The first classification had an accuracy of 99.2%, while the second had an accuracy of 95.1%. Some researchers have relied on extracting the retinal pigmented epithelium (RPE) layer to classify OCT images, as in the following studies. Arabi et al. [23] extracted the RPE layer to identify OCT images as AMD or normal. The average mean value of white pixels for both classes was calculated using a sample of the extracted layer, and the images were classified with 75% accuracy by setting a threshold value and developing a decision rule.

Sharif et al. [24] used graph theory and dynamic programming to extract the RPE layer. This research employed a feature set composed of features taken from the RPE differential signal and the RPE inner segment–outer segment layer. The technique used a support vector machine classifier to detect AMD and normal types with 95% accuracy. Several researchers have also used FRI images to classify AMD types. Heo et al. [25] used the VGG16 model to classify 399 FRI images after preprocessing and augmentation. First, they performed pairwise binary classifications among the three types with fivefold cross-validation, achieving an average accuracy of 91.92% for dry versus normal, 98.13% for wet versus normal, and 91.32% for dry versus wet. Then, the model was employed to classify the three types at once, with an average accuracy of 90.86%. Zapata et al. [26] proposed a CNN model to categorize fundus images as AMD or normal, which achieved an accuracy of 86.3% on private fundus images. Finally, Table 1 summarizes all the previous related works.

Table 1 Previous work on AMD classification

3 Research significance

OCT has become the most widely used imaging modality in ophthalmology. It is a vital tool for baseline retinal evaluation before therapy commences and for monitoring the effectiveness of treatment [27]. DL techniques hold significant promise for revolutionizing the classification of AMD using OCT images due to several factors [28]:

  1. Ophthalmologists can utilize them as a supplementary tool. This could make diagnosis more accessible, particularly where access to specialists is scarce.

  2. DL techniques can identify complex patterns that are invisible to the human eye. This can lead to more accurate AMD classifications at earlier stages, potentially preventing vision loss.

  3. By streamlining the diagnostic process, DL techniques would allow ophthalmologists to concentrate on difficult cases.

  4. The ability of deep learning to recognize complex patterns in retinal images may reveal novel biomarkers, assisting in the diagnosis of AMD [29].

4 Motivation and contributions

Despite recent research efforts, AMD classification using AI still faces some major challenges, which can be listed as follows:

  1. The complexity of the OCT structure makes it hard to extract features with high accuracy.

  2. The high similarity between normal and dry AMD requires the physicians' experience for diagnosis.

  3. Early detection may slow the progression of dry cases to wet ones.

This research aims to propose a complete framework based on convolutional neural networks (CNNs), covering all steps from image acquisition to AMD multi-classification. The proposed deep learning framework classifies retinal OCT images and predicts the presence and severity of AMD using digital image processing. This paper's contributions are summarized below:

  1. To enhance the retinal OCT dataset, a digital image processing technique has been implemented, and the weighted peak signal-to-noise ratio (WPSNR), computed in MATLAB, has been used as a quantitative measure of the enhanced image quality.

  2. Different regularization techniques, such as the L2 regularizer and the early stopping function, have been utilized during training to prevent overfitting.

  3. To achieve high performance in AMD multi-classification, a framework has been built consisting of a MobileNet V1 model fine-tuned with the addition of an LSTM layer and custom fully connected layers.

  4. To validate the proposed approach, a comparative study has been conducted, and the evaluation has been based on seven important performance metrics.

5 Materials

This study used an open-access dataset from Kaggle [30]. This dataset includes 84,495 OCT images divided into four categories: normal (no retinal disease), drusen (dry AMD), choroidal neovascularization (CNV, wet AMD), and diabetic macular edema (DME). Table 2 shows the number of images in each category. Samples of the used dataset are shown in Fig. 2.

Table 2 Set of OCT images of AMD and DME
Fig. 2 Samples of the dataset

The proposed work has been established on two separate platforms. MATLAB (MathWorks) has been used for image preparation and processing. The Kaggle notebook, which provides free access to Nvidia Tesla T4 GPUs, has been used for training and evaluating the proposed model using Python 3.

6 Methods

This section describes in detail the stages of the work to categorize OCT images as normal, dry, and wet AMD. The feasibility of AMD multi-classification has been studied using deep learning. The raw data have been used directly without the need for data augmentation. The main stages of the proposed approach are described in Fig. 3. The dataset has been acquired first. Then, image preparation and processing have been implemented on the dataset. After that, part of the dataset has been used to train the DL model so that it becomes able to classify normal, dry, and wet AMD. Finally, different evaluation metrics have been utilized to validate the proposed model's performance on another part of the dataset.

Fig. 3 Workflow stages of the proposed approach

6.1 Data acquisition

The proposed study used a subset of the Kaggle dataset [30]. This study has focused on AMD retinal disease; therefore, normal, dry, and wet AMD images have only been used. The subset data have been split into 65% for training the proposed model, 15% for the model validation during training, and 20% for testing the model. The size of the used data for the proposed model with cross-validation percentage is shown in Table 3.

Table 3 The size of the used data for the proposed model
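As an illustration, such a 65/15/20 split could be produced as in the following minimal Python sketch; the folder layout, file extension, and use of scikit-learn's train_test_split are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of the 65/15/20 split, assuming the subset images are
# stored in one folder per class; paths and scikit-learn are assumptions.
import glob
from sklearn.model_selection import train_test_split

CLASSES = ["NORMAL", "DRUSEN", "CNV"]  # normal, dry AMD, wet AMD

splits = {"train": [], "val": [], "test": []}
for cls in CLASSES:
    files = sorted(glob.glob(f"OCT/{cls}/*.jpeg"))   # hypothetical path
    # Hold out 20% for testing, then 15/80 of the remainder for validation,
    # so the final proportions are 65/15/20.
    train_val, test = train_test_split(files, test_size=0.20, random_state=42)
    train, val = train_test_split(train_val, test_size=0.15 / 0.80,
                                  random_state=42)
    splits["train"] += [(f, cls) for f in train]
    splits["val"] += [(f, cls) for f in val]
    splits["test"] += [(f, cls) for f in test]

print({name: len(items) for name, items in splits.items()})
```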

6.2 Data preparation

Images have been preprocessed in two steps. In the first step, all images have been resized to 224 × 224 pixels with 3 channels to match the pre-trained models' input shapes. The second step, normalization, rescales every image pixel from the range 0–255 to the range 0–1. This has been accomplished using the Keras image data generator, as shown in Fig. 4.

Fig. 4 Image preparation stages
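A minimal sketch of these two steps with the Keras image data generator is shown below; the directory path and batch size are illustrative assumptions.

```python
# Sketch of the two preparation steps: resizing to 224x224x3 happens in
# flow_from_directory via target_size, and normalization to [0, 1] via
# rescale. The directory layout is a hypothetical assumption.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(rescale=1.0 / 255)      # 0-255 -> 0-1
train_flow = gen.flow_from_directory(
    "data/train",                                # assumed path
    target_size=(224, 224),                      # resize step
    color_mode="rgb",                            # 3 channels
    class_mode="categorical",
    batch_size=32,
)
```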

6.3 Image processing

The image processing system's block diagram is shown in Fig. 5. A histogram has been constructed for the raw data to display the frequency of each gray level in each image. Subsequently, the primary peak value has been selected from each image's histogram and subtracted from the original image. The sections below illustrate the whole system.

Fig. 5 Block diagram of the image processing system

The image subtraction technique has been used to eliminate noise in the used dataset and enhance image quality, since the original images have white spots scattered throughout. The technique first plots the histogram of each image. Then, the most repetitive gray level among the image's pixels is selected according to the pixel weights. Finally, this value is subtracted from all pixel values in the corresponding image. This makes the illumination more uniform and the image darker. Figure 6 shows the result after processing.

Fig. 6 AMD classes with and without processing
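The paper implemented this step in MATLAB; the following NumPy sketch is an equivalent reconstruction of the peak subtraction, with clipping at zero added as an assumption to keep pixel values valid.

```python
# NumPy sketch of the histogram-peak subtraction described above; the
# original step was done in MATLAB, so this is a reconstruction, not the
# authors' code.
import numpy as np

def subtract_histogram_peak(img: np.ndarray) -> np.ndarray:
    """img: grayscale image with uint8 values in 0..255."""
    hist = np.bincount(img.ravel(), minlength=256)  # gray-level histogram
    peak = int(np.argmax(hist))                     # most repetitive gray level
    # Subtract the peak from every pixel, clipping at 0 so the result
    # stays a valid (darker) image.
    return np.clip(img.astype(np.int16) - peak, 0, 255).astype(np.uint8)
```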

Several measures have been used to ensure the quality of the resultant images. As a qualitative measure, a group of specialists at the Ophthalmic Center of Mansoura University has been consulted to confirm that this process has not affected the retinal OCT structure and that the extracted features remain clear.

A quantitative measure has been obtained using the weighted peak signal-to-noise ratio (WPSNR) methodology. This method produces a report on the resultant image quality compared with the original image, taking into account local human visual system (HVS) sensitivity [31]. The WPSNR is defined by Eq. (1):

$$ {\text{WPSNR}} = 10 \log_{10} \left( { \frac{{{\text{MAX}}^{2} }}{{{\text{WMSE}}}} } \right) \left( {{\text{dB}}} \right) $$
(1)

where MAX is the maximum possible intensity value in the image and WMSE is the weighted mean square error. Figure 7 shows the histogram of an original image; examining the histograms revealed one primary peak in all the images.

Fig. 7 Original image histogram with the primary peak

Figure 8 demonstrates that the highest WPSNR, which corresponds to the best image in terms of visual perception, has been obtained by subtracting the histogram peak value of each image from the original. The WPSNR value is above 20 dB, which is considered a good value as mentioned in [32].
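A sketch of Eq. (1) is given below. The exact HVS weighting of [31] is not reproduced; as a stand-in assumption, each pixel's squared error is weighted by a simple local-activity term, and with uniform weights the function reduces to the ordinary PSNR.

```python
# Sketch of Eq. (1). The local-variance-based weights below are an
# illustrative stand-in for the HVS model of [31], not its implementation.
import numpy as np
from scipy.ndimage import uniform_filter

def wpsnr(original: np.ndarray, processed: np.ndarray,
          max_val: float = 255.0) -> float:
    orig = original.astype(np.float64)
    err2 = (orig - processed.astype(np.float64)) ** 2
    # Assumed HVS-like weights: errors in flat regions are weighted more,
    # since the eye is more sensitive to noise there.
    local_var = uniform_filter(orig ** 2, 7) - uniform_filter(orig, 7) ** 2
    weights = 1.0 / (1.0 + np.sqrt(np.maximum(local_var, 0.0)))
    wmse = np.sum(weights * err2) / np.sum(weights)  # weighted MSE
    return 10.0 * np.log10(max_val ** 2 / wmse)      # Eq. (1), in dB
```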

Fig. 8 WPSNR value of the processed image

6.4 Training the deep learning model

6.4.1 CNN architecture and transfer learning

Convolutional neural networks (CNNs) are one of the most common classification architectures. CNNs are particularly effective for image classification because they can learn spatial hierarchies of characteristics such as edges, textures, and shapes, which are critical for recognizing objects in images. Transfer learning with CNNs exploits the fact that CNNs trained on big datasets, such as ImageNet, have learned general features applicable to a wide range of visual tasks. Instead of training a CNN from scratch on a new dataset, transfer learning starts with a previously trained CNN and makes customized adjustments to it on the new dataset.
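In Keras, this pattern looks roughly as follows; freezing the backbone for pure feature extraction is one common starting point, while the paper's own fine-tuning choices are described in the next subsection.

```python
# Generic transfer-learning pattern: start from an ImageNet-pretrained
# backbone instead of random weights, then adapt it to the new task.
from tensorflow.keras.applications import MobileNet

base = MobileNet(weights="imagenet",       # features learned on ImageNet
                 include_top=False,        # drop the 1000-class ImageNet head
                 input_shape=(224, 224, 3))
base.trainable = False                     # freeze for pure feature extraction;
                                           # unfreeze later for fine-tuning
```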

6.4.2 Proposed approach

In the proposed approach, two sub-models have been combined: a transfer learning model, MobileNet V1, constructed with optimized hyperparameters to extract the deep features of the considered images, and an LSTM model for interpreting the features across time steps. The scheme of the proposed approach is depicted in Fig. 9.

Fig. 9 Proposed model structure

The full MobileNet network consists of 30 layers. MobileNet uses depthwise separable convolutions, each made from a sequence of two operations: a depthwise convolution and a pointwise convolution. Depthwise separable convolutions significantly reduce the number of parameters compared to a network of the same depth built with regular convolutions.
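The parameter saving can be verified directly in Keras; the layer sizes below (256 channels, 3 × 3 kernels) are arbitrary illustrative choices.

```python
# Parameter count of a standard 3x3 convolution versus a depthwise separable
# one (depthwise 3x3 + pointwise 1x1), for 256 input and 256 output channels.
from tensorflow.keras import Sequential, layers

standard = Sequential([layers.Conv2D(256, 3, padding="same",
                                     input_shape=(28, 28, 256))])
separable = Sequential([layers.DepthwiseConv2D(3, padding="same",
                                               input_shape=(28, 28, 256)),
                        layers.Conv2D(256, 1)])    # pointwise step
print(standard.count_params())   # 3*3*256*256 + 256 = 590,080
print(separable.count_params())  # (3*3*256 + 256) + (256*256 + 256) = 68,352
```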

As a first step, the resultant images, after preprocessing, have been fed to the model with an input shape of 224 × 224 with 3 channels. Although the pre-trained MobileNet has learned a huge number of image features, fine-tuning has helped it to learn more specific features from the used dataset. Thus, the deep layers of the pre-trained MobileNet have been retrained on the training set of the considered data as a starting point for fine-tuning; after training, the deep features have been extracted. The fine-tuning technique has improved the model's effectiveness on the considered dataset. Several fine-tuning trials, shown in the results section, have proved that the best performance is obtained when all the model layers are unfrozen.

As a second step, the final layer of the MobileNet model, the hypercolumn features, has been fed to one layer of long short-term memory (LSTM). A flatten layer has been added to get a 1D feature vector of dimension 1 × 1 × 4900. Then, a fully connected layer with 1000 neurons has been added, accompanied by a dropout layer with a rate of 0.5. Another fully connected layer with 3 neurons and a SoftMax activation function has been integrated to classify the images.
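A hedged reconstruction of this architecture is sketched below. The reshape to 49 time steps and the 100-unit LSTM are assumptions, chosen because a 7 × 7 × 1024 MobileNet output reshaped to 49 steps and passed through a 100-unit sequence-returning LSTM flattens to exactly the 4900-dimensional vector stated above (49 × 100 = 4900).

```python
# Hedged reconstruction of the proposed MobileNetV1-LSTM model. The reshape
# to 49 time steps and the 100 LSTM units are assumptions consistent with
# the stated 1x1x4900 flattened vector, not confirmed by the paper.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet

base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
base.trainable = True                          # best result: all layers unfrozen

model = models.Sequential([
    base,                                      # 7x7x1024 feature maps
    layers.Reshape((49, 1024)),                # 49 "time steps" for the LSTM
    layers.LSTM(100, return_sequences=True),   # interprets features across steps
    layers.Flatten(),                          # 49 * 100 = 4900-dim vector
    layers.Dense(1000, activation="relu"),     # fully connected, 1000 neurons
    layers.Dropout(0.5),                       # dropout rate of 0.5
    layers.Dense(3, activation="softmax"),     # normal / dry AMD / wet AMD
])
```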

For the training parameters, the proposed model has been compiled with a learning rate of 0.000125. Different optimizers, such as Adam, RMSprop, and SGD, have been used to find which optimizer gains the best performance. As shown in the following sections, the number of epochs employed varied because regularization techniques, such as the early stopping function and the L2 regularizer, were used to reduce overfitting.
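Continuing the model sketch above, the training configuration might look as follows; only the learning rate and the three optimizer names come from the text, the rest is illustrative.

```python
# Training configuration sketch: the stated learning rate with each of the
# three optimizers tried in the paper (Adam gave the best results).
from tensorflow.keras.optimizers import SGD, Adam, RMSprop

optimizers = {
    "adam": Adam(learning_rate=0.000125),
    "rmsprop": RMSprop(learning_rate=0.000125),
    "sgd": SGD(learning_rate=0.000125),
}
model.compile(optimizer=optimizers["adam"],
              loss="categorical_crossentropy",   # Eq. (2)
              metrics=["accuracy"])
```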

6.4.2.1 Regularization

The model learns from the dataset during training and adjusts its weights accordingly. The model's training and validation errors decrease as it generalizes from the input dataset. After a number of epochs, however, the model begins to memorize the data: while the training error keeps decreasing, the validation error grows, and the model overfits. Regularization is the most widely applied mathematical technique for reducing overfitting. Two separate regularization techniques have been used in the proposed work. The first one is the L2 regularizer, which reduces overfitting by penalizing the model through a regularization term added to the model's loss function. The cross-entropy loss function has been selected, defined by Eq. (2):

$$ C = - \sum\limits_{k = 1}^{K} {y_{k} \log (P_{k} )} $$
(2)

where \(P_{k}\) is the predicted value, \(y_{k}\) is the actual value, and \(K\) is the total number of classes. When the regularization term is added to the loss function, the result is defined by Eq. (3):

$$ L_{\lambda }(w) = L(w) + \lambda \left\| w \right\|_{2}^{2} $$
(3)

where \(L_{\lambda }\) is the regularized loss, \(L(w)\) is the unregularized loss function, \(\left\| w \right\|_{2}^{2}\) is the regularization term, the squared L2 norm of the model's weights, and \(\lambda\) is a regularization parameter [33].

The second regularization technique is called early stopping. In this technique, a specific number of epochs (the patience) is set as a threshold [34]. The validation loss is then automatically monitored throughout the training. A validation loss that keeps rising signals the early stopping function, and training stops as soon as the validation loss has failed to improve for more consecutive epochs than the threshold allows. This technique helps regularize the model and saves training time. The early stopping formulations can be found in [34].
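Both techniques map directly onto Keras primitives, as in the following sketch; the λ value and the patience are illustrative assumptions, since the text does not state them.

```python
# Sketch of the two regularization techniques: an L2 penalty on a dense
# layer (Eq. (3)) and early stopping on the validation loss. The lambda
# value (1e-4) and patience (5) are assumed for illustration.
from tensorflow.keras import layers, regularizers
from tensorflow.keras.callbacks import EarlyStopping

dense = layers.Dense(1000, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4))  # adds lambda*||w||^2

early_stop = EarlyStopping(monitor="val_loss",   # watch validation loss
                           patience=5,           # threshold in epochs
                           restore_best_weights=True)
# model.fit(train_flow, validation_data=val_flow, epochs=100,
#           callbacks=[early_stop])
```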

6.4.2.2 Long short-term memory (LSTM)

LSTM is a type of recurrent neural network (RNN) that includes feedback connections. It has been developed to address the vanishing gradient problem of traditional RNNs and has the potential to memorize long-term dependencies. An LSTM cell combines three gates, an input gate \(i_{t}\), a forget gate \(f_{t}\), and an output gate \(o_{t}\), where \(x_{t}\) denotes the current input, \(C_{t}\) and \(C_{t-1}\) denote the new and previous cell states, and \(h_{t}\) and \(h_{t-1}\) are the current and previous outputs. The LSTM equations are numbered from (4) through (9).

$$ i_{t} = \sigma (W_{i} [h_{t - 1} ,x_{t} ] + b_{i} ) $$
(4)
$$ z_{t} = \tanh \left( {W_{z} \left[ {h_{t - 1} ,x_{t} } \right] + b_{z} } \right) $$
(5)
$$ f_{t} = \sigma \left( {W_{f} \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right) $$
(6)
$$ C_{t} = \left( {i_{t} *z_{t} } \right) + \left( {f_{t} *C_{t - 1} } \right) $$
(7)
$$ o_{t} = \sigma \left( {W_{o} \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} } \right) $$
(8)
$$ h_{t} = o_{t} *\tanh \left( {C_{t} } \right) $$
(9)

In these equations, \(W\) represents each LSTM gate's weight matrix, \(b\) represents each gate's bias, and \(\sigma\) represents the logistic sigmoid function. The structure of LSTM cells is shown in Fig. 10.

Fig. 10 Structure of LSTM cells
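For concreteness, Eqs. (4)–(9) can be transcribed directly into NumPy for a single time step; the shapes and weight containers below are purely illustrative.

```python
# Direct NumPy transcription of Eqs. (4)-(9) for one LSTM time step; the
# weight/bias containers are illustrative, not the Keras implementation.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """W: dict of (units, units + input_dim) matrices; b: dict of (units,) biases."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate, Eq. (4)
    z_t = np.tanh(W["z"] @ z + b["z"])          # candidate state, Eq. (5)
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate, Eq. (6)
    c_t = i_t * z_t + f_t * c_prev              # new cell state, Eq. (7)
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate, Eq. (8)
    h_t = o_t * np.tanh(c_t)                    # new output, Eq. (9)
    return h_t, c_t
```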

It has been found that LSTM can enhance a CNN's feature extraction capability: CNNs extract the important features, while LSTM can remember patterns over long ranges [7]. The local and global information of the feature set is encoded by the convolution layers, while the LSTM layer decodes the encoded information. The information is then flattened and passed into a fully connected layer for classification. Thus, the LSTM-CNN layered structure has an edge over conventional CNN classifiers when used for image classification. The LSTM in this research has been implemented using the Keras package with a TensorFlow backend.

7 Results and discussion

Figure 11 shows the steps involved in image preparation and processing. Block (a) represents the original images of each class sample. The original images have been resized to a constant size of 224 × 224 × 3, presented in block (b). After that, images have been normalized in block (c). Finally, the image subtraction technique has been applied to the normalized images to produce the processed images in block (d).

Fig. 11 Steps of image preparation and processing

7.1 Evaluation metrics

Various performance evaluation metrics have been utilized in this study to validate the proposed model. The evaluation results have been stored in the form of a confusion matrix. The confusion matrix is a prominent machine learning classification measure. It represents the predicted values from the model against the actual values for each class [35]. This matrix displays four key elements for testing the model: the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) produced by the proposed model on the test data. Based on these values, different metrics such as the overall accuracy, sensitivity, specificity, precision, F1-score, and the area under the curve can be measured. Their mathematical formulations are given below.

As given in Eq. (10), accuracy is the ratio of correctly predicted samples to the total number of test samples.

$$ {\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}} $$
(10)

As given in Eq. (11), sensitivity measures the effectiveness of the model. It is the ratio of TP samples to the sum of TP and FN samples.

$$ {\text{Sensitivity}}\left( {{\text{Recall}}} \right) = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}} $$
(11)

As given in Eq. (12), specificity is the ratio of correctly predicted negative samples to the total number of negative samples.

$$ {\text{Specificity}} = \frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}} $$
(12)

As given in Eq. (13), precision describes the accuracy of the model's positive predictions. It is the ratio of samples correctly predicted to be positive to the total number of samples predicted to be positive.

$$ {\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}} $$
(13)

As given in Eq. (14), the F1-score is the harmonic mean of recall and precision.

$$ F1 - {\text{score}} = 2*\frac{{{\text{Precision}}*{\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}} $$
(14)

In addition, the area under the receiver operating characteristic curve (ROC curve) is used to show the performance of the proposed model for each class.
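These metrics can be derived from the confusion matrix as in the following sketch; the tiny label arrays are placeholders for the real test-set labels and predictions.

```python
# Sketch of the evaluation step: build the confusion matrix on the test set
# and derive the per-class metrics of Eqs. (10)-(14) with scikit-learn.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2])   # placeholder labels for illustration
y_pred = np.array([0, 1, 1, 1, 2, 2])   # placeholder predictions

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Precision, recall (sensitivity), and F1-score per class:
print(classification_report(y_true, y_pred,
                            target_names=["normal", "dry AMD", "wet AMD"]))

# Specificity per class k: TN / (TN + FP), computed from the matrix.
for k in range(cm.shape[0]):
    tn = cm.sum() - cm[k, :].sum() - cm[:, k].sum() + cm[k, k]
    fp = cm[:, k].sum() - cm[k, k]
    print(f"class {k} specificity: {tn / (tn + fp):.4f}")
```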

Figure 12 shows the different fine-tuning experiments that have been conducted on the MobileNetV1 model with the Adam optimizer. We have used the test set, which consists of 1335 images for normal, 1335 images for dry AMD, and 1335 images for wet AMD. First, we fine-tuned only the last 10 layers without additional layers, resulting in a training accuracy of 97.59% and a testing accuracy of 96.78%. We then fine-tuned all layers, allowing the model to update all weights while training on the dataset, and reached a training accuracy of 99.74% and a testing accuracy of 97.1%. All these results improved when this model was combined with an LSTM and an additional dense layer, achieving 99.98% training accuracy and 98.85% testing accuracy. To summarize, the best technique we have observed is to fine-tune all layers while adding an LSTM layer and a dense layer.

Fig. 12 Several fine-tuning experiments for the proposed model

Figure 13a shows the accuracy results during the training and validation stages while Fig. 13b shows the loss results of training and validation.

Fig. 13 a Training and validation accuracy. b Training and validation loss

To further demonstrate the effectiveness of the proposed model, various CNN models have been deployed to classify the OCT dataset into normal, dry, and wet AMD. MobileNetV1, DenseNet201, and InceptionV3 have each been combined with LSTM. Several optimizers (Adam, SGD, and RMSprop) have been applied to these models to demonstrate the efficacy of each on the AMD classification task. Table 4 records the best precision, accuracy, sensitivity, F1-score, and specificity out of six independent runs of each experiment on the test set.

Table 4 Comparison of using different optimizers with various CNN models

Comparing these experiments, we can observe that the MobileNetV1-LSTM model with the Adam optimizer, the proposed model indicated in bold, produced the best results. The proposed model has been trained over 73 epochs, yielding a testing accuracy of 98.85%, a specificity of 99.09%, a sensitivity ranging from 98.12 to 99.47%, a precision ranging from 98.5 to 99.47%, and an F1-score ranging from 98.34 to 99.47%.

Figure 14 presents the confusion matrices of the previous experiments. As shown, the rows represent the used optimizers (Adam, SGD, and RMSprop) and the columns represent the used models (MobileNetV1, DenseNet201, and InceptionV3 each combined with LSTM).

Fig. 14 Confusion matrices of the experiments

By comparing the matrices, it can be observed that the proposed model, represented by the green matrix, has achieved the best values: 1312 dry AMD test images correctly identified, 1321 wet AMD test images correctly classified, and 1327 images correctly classified as normal.

The ROC curve of the proposed model is plotted in Fig. 15, and the AUC values are very close to 1.00 for all classes. Hence, the proposed model performs well in the classification of AMD using OCT images.

Fig. 15 ROC of the proposed model

7.2 Comparative analysis

A comparative analysis with previous studies is introduced in Table 5 to examine the robustness of the proposed model. As shown, these studies can be divided into four categories:

Table 5 Comparative analysis

The first category contains the studies that used the same Kermany dataset with the same classes of normal, dry, and wet AMD. Sotoudeh-Paima et al. used FPN-VGG16 as a classifier with an accuracy of 93.4% ± 1.4% using the full Kermany dataset, while the proposed model has achieved 98.85% accuracy using only 20,010 OCT images of the same dataset.

The second category contains the studies that used the same Kermany dataset but with different AMD classes. Thomas et al. used multi-scale and multi-path CNN techniques with an SVM classifier, which resulted in an accuracy of 99.78%, taking into consideration that they used this technique for AMD binary classification, not multi-classification, unlike the proposed model. He et al. used ResNet50 to classify the Kermany dataset into binary classes with an accuracy of 99.87%. Yan et al. used ResNet34 to classify 51,140 OCT images of the Kermany dataset into normal, drusen, active CNV, and inactive CNV. This technique produced a maximum sensitivity of 96.5%, which is approximately 3% lower than the proposed model's sensitivity. In the Chen et al. study, ResNet18 was used to classify 46,321 OCT images into dry and wet AMD only, with a maximum accuracy of 99.5%.

The third category contains the studies that used different datasets with the same AMD classes as proposed here. In another study by Chen et al., multi-modality models were used to classify the ZJU dataset with a maximum accuracy of 96.08%, which is approximately 2.5% lower than the proposed model's accuracy. The last category contains the studies that used different datasets with AMD binary classification. Overall, the proposed classification model has achieved higher performance measures in the multi-classification of AMD.

8 Conclusion

In this study, a new methodology has been introduced using two sub-models: the MobileNetV1 and LSTM deep networks. The proposed methodology has been used to classify normal, dry AMD, and wet AMD using optical coherence tomography (OCT) images. The proposed methodology is based on three stages: (1) the data processing stage, (2) the image processing stage, and (3) the deep feature extraction and classification stage, which uses two models, (i) a feature extraction model using MobileNet V1 and (ii) a classifier model using LSTM. A multi-classification with six separate trials has been employed with the proposed methodology. The proposed methodology has achieved higher accuracy when compared with state-of-the-art models on AMD multi-classification using a small part of the Kermany dataset. This research shows that a dataset of small size, around 20,010 images, was sufficient to obtain high accuracy for the multi-classification of AMD. The proposed methodology has achieved 98.85% accuracy, 99.09% sensitivity, and 99.1% specificity. In subsequent work, it is planned to optimize the model's hyperparameters and to focus on the progression of dry AMD to the wet type.