1 Introduction

Age-related macular degeneration (AMD) is one of the most common progressive retinal diseases worldwide and is responsible for 8.7% of blindness, particularly in individuals over 60 years of age [1]. It is labeled a "priority eye disease" by the WHO [2]. AMD initially affects the center of the retina, called the macula, the portion of the retinal layer that contains the photoreceptor cells responsible for central vision [3]. Therefore, any damage to the macula ultimately leads to vision loss. AMD does not result in total blindness, but it can make it more difficult to drive, read, recognize people, and perform close-up tasks such as housework or cooking.

AMD can be classified into two types, dry and wet, based on the absence or presence of neovascularization [4]. Dry AMD, also called atrophic AMD, accounts for approximately 85–90% of cases [5]. In dry AMD, the macula becomes thinner and drusen form under the retina. Drusen are buildups of yellowish deposits that cause visual loss which gradually worsens over time depending on their number, size, and location [6]. Dry AMD has three phases: early, intermediate, and late. It usually progresses slowly over several years. Although there is no cure for late dry AMD, there are strategies to maximize the amount of vision the patient still has.

Wet AMD, often known as advanced neovascular AMD, is a less frequent kind of late AMD that typically results in a faster loss of eyesight. Wet AMD can develop from any stage of dry AMD, although it is always considered a late stage. The macula is harmed when aberrant blood vessels develop in the rear of the eye, a process called neovascularization. This choroidal neovascularization builds up beneath the retina, causing blood leaks and macular damage. Approximately 10% to 15% of dry AMD cases develop into the wet form [5]. In wet AMD, the loss of eyesight can happen quickly rather than gradually [7].

Ophthalmologists examine the macula through a dilated pupil to decide whether the patient has AMD and whether it is the dry or wet type [8]. The traditional diagnostic tools for AMD are fundus retinal imaging (FRI) and optical coherence tomography (OCT). OCT is a highly preferred method for ophthalmologists to assess retinal diseases [9]. Figure 1 illustrates the normal eye and the AMD types using OCT images. Ophthalmologists must carefully examine numerous OCT cross sections for each patient, which is time-consuming and requires professional training. In remote regions or places with a lack of medical resources, healthcare technologists act as the primary healthcare providers, and they might not have enough experience to make an accurate diagnosis. Patients often have to wait several weeks for the diagnosis, which delays therapy and consumes considerable labor and social resources [10]. Although AMD cannot be completely cured [11], early detection and treatment can slow its progression. Thus, the usage of recent techniques, especially artificial intelligence (AI), for early detection of eye diseases helps prioritize patients' cases.

Fig. 1 Normal OCT and AMD types

The foundation of medical diagnostics is the analysis of disease images obtained using cutting-edge digital technologies. AI makes it possible to automatically make accurate assessments of medical images, which decreases the workload of physicians, reduces diagnostic errors and turnaround times, and improves performance in the prognosis and detection of various diseases [12, 13].

Recently, AI technology, specifically deep learning (DL), has made significant advances, enabling novel algorithms to classify eye diseases such as AMD. The next section highlights some of the most recent studies that have sparked interest in AMD classification.

2 Related work

To categorize OCT images of AMD and diabetic macular edema (DME) obtained from the Kaggle dataset, Chen et al. [14] employed a convolutional neural network (CNN) using transfer learning models that include GoogleNet22, VGG16, VGG19, ResNet18, ResNet50, and ResNet101. When the proper hyperparameters were used, their experimental results showed that the VGG19, ResNet101, and ResNet50 models performed remarkably well in classifying OCT images. Sotoudeh-Paima et al. [15] presented a multi-scale CNN using a feature pyramid network (FPN) for feature fusion, with VGG16 acting as the encoder. The architecture was tested on the NEH and UCSD datasets. Their framework reached an accuracy of 93.4% ± 1.4%, and the heat maps it generated demonstrated its ability to detect retinal diseases appearing at various scales.

Serener et al. [16] used the transfer learning models ResNet18 and AlexNet to categorize the two types of AMD, dry and wet. Both models were fine-tuned, and the results showed that ResNet18 is superior for this classification, with an accuracy of 99.5% for the dry type and 98.8% for the wet type. Kadry et al. [17] extracted handcrafted features from images, concatenated them with VGG16 features, and then used the Mayfly algorithm as a feature selector. Using the OCTID and iChallenge-AMD databases, this framework served as a binary classifier for AMD. He et al. [18] classified the UCSD and Duke datasets into normal and abnormal types using the local outlier factor (LOF) algorithm. This model relied on a retrained ResNet50 model with an L2-constrained SoftMax loss to extract features. It produced strong results, with an accuracy of 99.87% on the UCSD dataset and 97.56% on the Duke dataset.

Thomas et al. [19] trained a seven-layer multi-scale convolutional neural network on the Mendeley dataset to classify images into AMD and normal cases. This method made it possible to capture more local structures with different filter sizes and achieved an accuracy of 99.73% on the Mendeley dataset, 98.08% on OCTID, 96.66% on Duke, and 97.95% on the NEH dataset. Celebi et al. [20] proposed a modified CapsNet model that incorporated optimized Bayesian non-local means (OBNLM) for speckle noise reduction and data augmentation. The method was evaluated on the Kaggle dataset and achieved an accuracy of 96.39%. Thomas et al. [21] suggested a multi-scale and multi-path CNN with six convolutional layers, assessed using tenfold cross-validation. The suggested CNN was employed as a feature extractor, and the extracted features were fed into various traditional classifiers. The accuracy achieved on the various datasets was 96.66% for the Duke dataset, 98.97% for NEH, 99.74% for AREDS2, and 99.78% for the Kaggle dataset.

An et al. [22] built their model in two steps. First, they fine-tuned the pre-trained VGG16 model to distinguish between normal and AMD images. Second, the model from the first step was transferred and retrained to distinguish between AMD images with and without fluid. The first classification had an accuracy of 99.2%, while the second had an accuracy of 95.1%. Some researchers have relied on extracting the retinal pigmented epithelium (RPE) layer to classify OCT images, as in the following studies. Arabi et al. [23] extracted the RPE layer to identify OCT images as AMD or normal. The average mean value of white pixels for both classes was calculated using a sample of the extracted layer, and the images were classified with 75% accuracy by setting a threshold value and developing a decision rule.

Sharif et al. [24] used graph theory and dynamic programming to extract the RPE layer. This research employed a feature set composed of features taken from the RPE differential signal and the RPE inner segment–outer segment layer. The technique used a support vector machine classifier to detect AMD and normal types with 95% accuracy. Several researchers have also used FRI images to classify AMD types. Heo et al. [25] used the VGG16 model to classify 399 FRI images after preprocessing and augmentation. First, they performed pairwise binary classifications among the three types with fivefold cross-validation, achieving an average accuracy of 91.92% for dry versus normal, 98.13% for wet versus normal, and 91.32% for dry versus wet. Then, the model was employed to classify the three types at once, with an average accuracy of 90.86%. Zapata et al. [26] proposed a CNN model to categorize fundus images as AMD or normal, which achieved an accuracy of 86.3% on private fundus images. Finally, Table 1 summarizes all the previous related works.

Table 1 Previous work on AMD classification

3 Research significance

OCT has become the most widely used imaging modality in ophthalmology. It is a vital tool for baseline retinal evaluation before therapy commences and for monitoring the effectiveness of treatment [27]. DL techniques hold significant promise for revolutionizing the classification of AMD using OCT images due to several factors [28]:

  1. Ophthalmologists can utilize them as a supplementary tool. This could make diagnosis more accessible, particularly where access to specialists is scarce.

  2. DL techniques can identify complex patterns that are invisible to the human eye. This can lead to more accurate AMD classifications at earlier stages, potentially preventing vision loss.

  3. By streamlining the diagnostic process, DL techniques would allow ophthalmologists to concentrate on difficult cases.

  4. The ability of deep learning to recognize complex patterns in retinal images may reveal novel biomarkers, assisting in the diagnosis of AMD [29].

4 Motivation and contributions

Despite recent research efforts, AMD classification using AI still faces some major challenges, which can be listed as follows:

  1. The complexity of the OCT structure makes it hard to extract features with high accuracy.

  2. The high similarity between normal and dry AMD requires the physicians' experience for diagnosis.

  3. Early detection may slow the progression of dry cases to wet ones.

This research aims to propose a complete framework based on convolutional neural networks (CNNs), covering all steps from image acquisition to AMD multi-classification. The proposed deep learning framework classifies retinal OCT images and predicts the presence and severity of AMD using digital image processing. This paper's contributions are summarized below:

  1. To enhance the retinal OCT dataset, a digital image processing technique has been implemented, and the weighted peak signal-to-noise ratio (WPSNR), computed in MATLAB, has been used as a quantitative measure of the enhanced image quality.

  2. Different regularization techniques, such as the L2 regularizer and the early stopping function, have been utilized during training to prevent overfitting.

  3. To achieve high performance in AMD multi-classification, a framework has been built consisting of a MobileNet V1 model fine-tuned with the addition of an LSTM layer and custom fully connected layers.

  4. To validate the proposed approach, a comparative study has been conducted, and the evaluation has been based on seven important performance metrics.

5 Materials

This study used an open-access dataset from Kaggle [30]. This dataset includes 84,495 OCT images divided into four categories: normal (no retinal disease), drusen (dry AMD), choroidal neovascularization (CNV, wet AMD), and diabetic macular edema (DME). Table 2 shows the number of images in each category. Samples of the used dataset are shown in Fig. 2.

Table 2 Set of OCT images of AMD and DME
Fig. 2 Samples of the dataset

The proposed work has been established on two separate platforms. MATLAB (MathWorks) has been used for image preparation and processing. The Kaggle notebook, which provides free access to Nvidia Tesla T4 GPUs, has been used for training and evaluating the proposed model using Python 3.

6 Methods

This section describes in detail the stages of the work to categorize OCT images as normal, dry, and wet AMD. The feasibility of AMD multi-classification has been studied using deep learning. The raw data have been used directly without the need for data augmentation. The main stages of the proposed approach are described in Fig. 3. The dataset has been acquired first. Then, image preparation and processing have been implemented on the dataset. After that, part of the dataset has been used to train the DL model so that it becomes able to classify normal, dry, and wet AMD. Finally, different evaluation metrics have been utilized to validate the proposed model's performance on another part of the dataset.

Fig. 3 Workflow stages of the proposed approach

6.1 Data acquisition

The proposed study used a subset of the Kaggle dataset [30]. This study has focused on AMD retinal disease; therefore, normal, dry, and wet AMD images have only been used. The subset data have been split into 65% for training the proposed model, 15% for the model validation during training, and 20% for testing the model. The size of the used data for the proposed model with cross-validation percentage is shown in Table 3.

Table 3 The size of the used data for the proposed model
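As an illustration, such a 65/15/20 split could be produced as in the following minimal Python sketch; the folder layout, file extension, and use of scikit-learn's train_test_split are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of the 65/15/20 split, assuming the subset images are
# stored in one folder per class; paths and scikit-learn are assumptions.
import glob
from sklearn.model_selection import train_test_split

CLASSES = ["NORMAL", "DRUSEN", "CNV"]  # normal, dry AMD, wet AMD

splits = {"train": [], "val": [], "test": []}
for cls in CLASSES:
    files = sorted(glob.glob(f"OCT/{cls}/*.jpeg"))   # hypothetical path
    # Hold out 20% for testing, then 15/80 of the remainder for validation,
    # so the final proportions are 65/15/20.
    train_val, test = train_test_split(files, test_size=0.20, random_state=42)
    train, val = train_test_split(train_val, test_size=0.15 / 0.80,
                                  random_state=42)
    splits["train"] += [(f, cls) for f in train]
    splits["val"] += [(f, cls) for f in val]
    splits["test"] += [(f, cls) for f in test]

print({name: len(items) for name, items in splits.items()})
```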

6.2 Data preparation

Images have been preprocessed in two steps. In the first step, all images have been resized to 224 × 224 pixels with 3 channels to match the pre-trained models' input shapes. The second step, normalization, rescales every image pixel from the range 0–255 to the range 0–1. This has been accomplished using the Keras image data generator, as shown in Fig. 4.

Fig. 4 Image preparation stages
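A minimal sketch of these two steps with the Keras image data generator is shown below; the directory path and batch size are illustrative assumptions.

```python
# Sketch of the two preparation steps: resizing to 224x224x3 happens in
# flow_from_directory via target_size, and normalization to [0, 1] via
# rescale. The directory layout is a hypothetical assumption.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(rescale=1.0 / 255)      # 0-255 -> 0-1
train_flow = gen.flow_from_directory(
    "data/train",                                # assumed path
    target_size=(224, 224),                      # resize step
    color_mode="rgb",                            # 3 channels
    class_mode="categorical",
    batch_size=32,
)
```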

6.3 Image processing

The image processing system's block diagram is shown in Fig. 5. A histogram has been constructed for the raw data to display the frequency of each gray level in each image. Subsequently, the primary peak value has been selected from each image's histogram and subtracted from the original image. The sections below illustrate the whole system.

Fig. 5 Block diagram of the image processing system

The image subtraction technique has been used to eliminate noise in the used dataset and enhance image quality, since the original images have white spots scattered throughout. The technique first plots the histogram of each image. Then, the most repetitive gray level among the image's pixels is selected according to the pixel weights. Finally, this value is subtracted from all pixel values in the corresponding image. This makes the illumination more uniform and the image darker. Figure 6 shows the result after processing.

Fig. 6 AMD classes with and without processing
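The paper implemented this step in MATLAB; the following NumPy sketch is an equivalent reconstruction of the peak subtraction, with clipping at zero added as an assumption to keep pixel values valid.

```python
# NumPy sketch of the histogram-peak subtraction described above; the
# original step was done in MATLAB, so this is a reconstruction, not the
# authors' code.
import numpy as np

def subtract_histogram_peak(img: np.ndarray) -> np.ndarray:
    """img: grayscale image with uint8 values in 0..255."""
    hist = np.bincount(img.ravel(), minlength=256)  # gray-level histogram
    peak = int(np.argmax(hist))                     # most repetitive gray level
    # Subtract the peak from every pixel, clipping at 0 so the result
    # stays a valid (darker) image.
    return np.clip(img.astype(np.int16) - peak, 0, 255).astype(np.uint8)
```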

Several measures have been used to ensure the quality of the resultant images. As a qualitative measure, a group of specialists at the Ophthalmic Center of Mansoura University has been consulted to confirm that this process has not affected the retinal OCT structure and that the extracted features remain clear.

A quantitative measure has been obtained using the weighted peak signal-to-noise ratio (WPSNR) methodology. This method produces a report on the resultant image quality compared with the original image, taking into account local human visual system (HVS) sensitivity [31]. The WPSNR is defined by Eq. (1):

$$ {\text{WPSNR}} = 10 \log_{10} \left( { \frac{{{\text{MAX}}^{2} }}{{{\text{WMSE}}}} } \right) \left( {{\text{dB}}} \right) $$
(1)

where MAX is the maximum possible intensity value in the image and WMSE is the weighted mean square error. Figure 7 shows the histogram of an original image; examining the histograms revealed one primary peak in all the images.

Fig. 7 Original image histogram with the primary peak

Figure 8 demonstrates that the highest WPSNR, which corresponds to the best image in terms of visual perception, has been obtained by subtracting the histogram peak value of each image from the original. The WPSNR value is above 20 dB, which is considered a good value as mentioned in [32].
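A sketch of Eq. (1) is given below. The exact HVS weighting of [31] is not reproduced; as a stand-in assumption, each pixel's squared error is weighted by a simple local-activity term, and with uniform weights the function reduces to the ordinary PSNR.

```python
# Sketch of Eq. (1). The local-variance-based weights below are an
# illustrative stand-in for the HVS model of [31], not its implementation.
import numpy as np
from scipy.ndimage import uniform_filter

def wpsnr(original: np.ndarray, processed: np.ndarray,
          max_val: float = 255.0) -> float:
    orig = original.astype(np.float64)
    err2 = (orig - processed.astype(np.float64)) ** 2
    # Assumed HVS-like weights: errors in flat regions are weighted more,
    # since the eye is more sensitive to noise there.
    local_var = uniform_filter(orig ** 2, 7) - uniform_filter(orig, 7) ** 2
    weights = 1.0 / (1.0 + np.sqrt(np.maximum(local_var, 0.0)))
    wmse = np.sum(weights * err2) / np.sum(weights)  # weighted MSE
    return 10.0 * np.log10(max_val ** 2 / wmse)      # Eq. (1), in dB
```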

Fig. 8 WPSNR value of the processed image

6.4 Training the deep learning model

6.4.1 CNN architecture and transfer learning

Convolutional neural networks (CNNs) are one of the most common classification architectures. CNNs are particularly effective for image classification because they can learn spatial hierarchies of characteristics such as edges, textures, and shapes, which are critical for recognizing objects in images. Transfer learning with CNNs exploits the fact that CNNs trained on big datasets, such as ImageNet, have learned general features applicable to a wide range of visual tasks. Instead of training a CNN from scratch on a new dataset, transfer learning starts with a previously trained CNN and makes customized adjustments to it on the new dataset.
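In Keras, this pattern looks roughly as follows; freezing the backbone for pure feature extraction is one common starting point, while the paper's own fine-tuning choices are described in the next subsection.

```python
# Generic transfer-learning pattern: start from an ImageNet-pretrained
# backbone instead of random weights, then adapt it to the new task.
from tensorflow.keras.applications import MobileNet

base = MobileNet(weights="imagenet",       # features learned on ImageNet
                 include_top=False,        # drop the 1000-class ImageNet head
                 input_shape=(224, 224, 3))
base.trainable = False                     # freeze for pure feature extraction;
                                           # unfreeze later for fine-tuning
```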

6.4.2 Proposed approach

In the proposed approach, two sub-models have been combined: a transfer learning model, MobileNet V1, constructed with optimized hyperparameters to extract the deep features of the considered images, and an LSTM model for interpreting the features across time steps. The scheme of the proposed approach is depicted in Fig. 9.

Fig. 9 Proposed model structure

The full MobileNet network consists of 30 layers. MobileNet uses depthwise separable convolutions, each made from a sequence of two operations: a depthwise convolution and a pointwise convolution. Depthwise separable convolutions significantly reduce the number of parameters compared to a network of the same depth built with regular convolutions.
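The parameter saving can be verified directly in Keras; the layer sizes below (256 channels, 3 × 3 kernels) are arbitrary illustrative choices.

```python
# Parameter count of a standard 3x3 convolution versus a depthwise separable
# one (depthwise 3x3 + pointwise 1x1), for 256 input and 256 output channels.
from tensorflow.keras import Sequential, layers

standard = Sequential([layers.Conv2D(256, 3, padding="same",
                                     input_shape=(28, 28, 256))])
separable = Sequential([layers.DepthwiseConv2D(3, padding="same",
                                               input_shape=(28, 28, 256)),
                        layers.Conv2D(256, 1)])    # pointwise step
print(standard.count_params())   # 3*3*256*256 + 256 = 590,080
print(separable.count_params())  # (3*3*256 + 256) + (256*256 + 256) = 68,352
```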

As a first step, the resultant images, after preprocessing, have been fed to the model with an input shape of 224 × 224 with 3 channels. Although the pre-trained MobileNet has learned a huge number of image features, fine-tuning has helped it to learn more specific features from the used dataset. Thus, the deep layers of the pre-trained MobileNet have been retrained on the training set of the considered data as a starting point for fine-tuning; after training, the deep features have been extracted. The fine-tuning technique has improved the model's effectiveness on the considered dataset. Several fine-tuning trials, shown in the results section, have proved that the best performance is obtained when all the model layers are unfrozen.

As a second step, the final layer of the MobileNet model, the hypercolumn features, has been fed to one layer of long short-term memory (LSTM). A flatten layer has been added to get a 1D feature vector of dimension 1 × 1 × 4900. Then, a fully connected layer with 1000 neurons has been added, accompanied by a dropout layer with a rate of 0.5. Another fully connected layer with 3 neurons and a SoftMax activation function has been integrated to classify the images.
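A hedged reconstruction of this architecture is sketched below. The reshape to 49 time steps and the 100-unit LSTM are assumptions, chosen because a 7 × 7 × 1024 MobileNet output reshaped to 49 steps and passed through a 100-unit sequence-returning LSTM flattens to exactly the 4900-dimensional vector stated above (49 × 100 = 4900).

```python
# Hedged reconstruction of the proposed MobileNetV1-LSTM model. The reshape
# to 49 time steps and the 100 LSTM units are assumptions consistent with
# the stated 1x1x4900 flattened vector, not confirmed by the paper.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet

base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
base.trainable = True                          # best result: all layers unfrozen

model = models.Sequential([
    base,                                      # 7x7x1024 feature maps
    layers.Reshape((49, 1024)),                # 49 "time steps" for the LSTM
    layers.LSTM(100, return_sequences=True),   # interprets features across steps
    layers.Flatten(),                          # 49 * 100 = 4900-dim vector
    layers.Dense(1000, activation="relu"),     # fully connected, 1000 neurons
    layers.Dropout(0.5),                       # dropout rate of 0.5
    layers.Dense(3, activation="softmax"),     # normal / dry AMD / wet AMD
])
```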

For the training parameters, the proposed model has been compiled with a learning rate of 0.000125. Different optimizers, such as Adam, RMSprop, and SGD, have been used to find which optimizer gains the best performance. As shown in the following sections, the number of epochs employed varied because regularization techniques, such as the early stopping function and the L2 regularizer, were used to reduce overfitting.
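Continuing the model sketch above, the training configuration might look as follows; only the learning rate and the three optimizer names come from the text, the rest is illustrative.

```python
# Training configuration sketch: the stated learning rate with each of the
# three optimizers tried in the paper (Adam gave the best results).
from tensorflow.keras.optimizers import SGD, Adam, RMSprop

optimizers = {
    "adam": Adam(learning_rate=0.000125),
    "rmsprop": RMSprop(learning_rate=0.000125),
    "sgd": SGD(learning_rate=0.000125),
}
model.compile(optimizer=optimizers["adam"],
              loss="categorical_crossentropy",   # Eq. (2)
              metrics=["accuracy"])
```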

6.4.2.1 Regularization

The model learns from the dataset during training and adjusts its weights accordingly. The model's training and validation errors decrease as it generalizes from the input dataset. After a number of epochs, however, the model begins to memorize the data: while the training error keeps decreasing, the validation error grows, and the model overfits. Regularization is the most widely applied mathematical technique for reducing overfitting. Two separate regularization techniques have been used in the proposed work. The first one is the L2 regularizer, which reduces overfitting by penalizing the model through a regularization term added to the model's loss function. The cross-entropy loss function has been selected, defined by Eq. (2):

$$ C = - \sum\limits_{k = 1}^{K} {y_{k} \log (P_{k} )} $$
(2)

where \(P_{k}\) is the predicted value, \(y_{k}\) is the actual value, and \(K\) is the total number of classes. When the regularization term is added to the loss function, the result is defined by Eq. (3):

$$ L_{\lambda }(w) = L(w) + \lambda \left\| w \right\|_{2}^{2} $$
(3)

where \(L_{\lambda }\) is the regularized loss, \(L(w)\) is the unregularized loss function, \(\left\| w \right\|_{2}^{2}\) is the regularization term, the squared L2 norm of the model's weights, and \(\lambda\) is a regularization parameter [33].

The second regularization technique is called early stopping. In this technique, a specific number of epochs (the patience) is set as a threshold [34]. The validation loss is then automatically monitored throughout the training. A validation loss that keeps rising signals the early stopping function, and training stops as soon as the validation loss has failed to improve for more consecutive epochs than the threshold allows. This technique helps regularize the model and saves training time. The early stopping formulations can be found in [34].
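Both techniques map directly onto Keras primitives, as in the following sketch; the λ value and the patience are illustrative assumptions, since the text does not state them.

```python
# Sketch of the two regularization techniques: an L2 penalty on a dense
# layer (Eq. (3)) and early stopping on the validation loss. The lambda
# value (1e-4) and patience (5) are assumed for illustration.
from tensorflow.keras import layers, regularizers
from tensorflow.keras.callbacks import EarlyStopping

dense = layers.Dense(1000, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4))  # adds lambda*||w||^2

early_stop = EarlyStopping(monitor="val_loss",   # watch validation loss
                           patience=5,           # threshold in epochs
                           restore_best_weights=True)
# model.fit(train_flow, validation_data=val_flow, epochs=100,
#           callbacks=[early_stop])
```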

6.4.2.2 Long short-term memory (LSTM)

LSTM is a type of recurrent neural network (RNN) that includes feedback connections. It has been developed to address the vanishing gradient problem of traditional RNNs and has the potential to memorize long-term dependencies. An LSTM cell combines three gates, an input gate \(i_{t}\), a forget gate \(f_{t}\), and an output gate \(o_{t}\), where \(x_{t}\) denotes the current input, \(C_{t}\) and \(C_{t-1}\) denote the new and previous cell states, and \(h_{t}\) and \(h_{t-1}\) are the current and previous outputs. The LSTM equations are numbered from (4) through (9).

$$ i_{t} = \sigma (W_{i} [h_{t - 1} ,x_{t} ] + b_{i} ) $$
(4)
$$ z_{t} = \tanh \left( {W_{z} \left[ {h_{t - 1} ,x_{t} } \right] + b_{z} } \right) $$
(5)
$$ f_{t} = \sigma \left( {W_{f} \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right) $$
(6)
$$ C_{t} = \left( {i_{t} *z_{t} } \right) + \left( {f_{t} *C_{t - 1} } \right) $$
(7)
$$ o_{t} = \sigma \left( {W_{o} \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} } \right) $$
(8)
$$ h_{t} = o_{t} *\tanh \left( {C_{t} } \right) $$
(9)

In these equations, \(W\) represents each LSTM gate's weight matrix, \(b\) represents each gate's bias, and \(\sigma\) represents the logistic sigmoid function. The structure of LSTM cells is shown in Fig. 10.

Fig. 10 Structure of LSTM cells
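For concreteness, Eqs. (4)–(9) can be transcribed directly into NumPy for a single time step; the shapes and weight containers below are purely illustrative.

```python
# Direct NumPy transcription of Eqs. (4)-(9) for one LSTM time step; the
# weight/bias containers are illustrative, not the Keras implementation.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """W: dict of (units, units + input_dim) matrices; b: dict of (units,) biases."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate, Eq. (4)
    z_t = np.tanh(W["z"] @ z + b["z"])          # candidate state, Eq. (5)
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate, Eq. (6)
    c_t = i_t * z_t + f_t * c_prev              # new cell state, Eq. (7)
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate, Eq. (8)
    h_t = o_t * np.tanh(c_t)                    # new output, Eq. (9)
    return h_t, c_t
```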

It has been found that LSTM can enhance a CNN's feature extraction capability: CNNs extract the important features, while LSTM can remember patterns over long ranges [7]. The local and global information of the feature set is encoded by the convolution layers, while the LSTM layer decodes the encoded information. The information is then flattened and passed into a fully connected layer for classification. Thus, the LSTM-CNN layered structure has an edge over conventional CNN classifiers when used for image classification. The LSTM in this research has been implemented using the Keras package with a TensorFlow backend.

7 Results and discussion

Figure 11 shows the steps involved in image preparation and processing. Block (a) represents the original images of each class sample. The original images have been resized to a constant size of 224 × 224 × 3, presented in block (b). After that, images have been normalized in block (c). Finally, the image subtraction technique has been applied to the normalized images to produce the processed images in block (d).

Fig. 11 Steps of image preparation and processing

7.1 Evaluation metrics

Various performance evaluation metrics have been utilized in this study to validate the proposed model. The evaluation results have been stored in the form of a confusion matrix. The confusion matrix is a prominent machine learning classification measure. It represents the predicted values from the model against the actual values for each class [35]. This matrix displays four key elements for testing the model: the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) produced by the proposed model on the test data. Based on these values, different metrics such as the overall accuracy, sensitivity, specificity, precision, F1-score, and the area under the curve can be measured. Their mathematical formulations are given below.

As given in Eq. (10), accuracy is the ratio of correctly predicted samples to the total number of test samples.

$$ {\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}} $$
(10)

As given in Eq. (11), sensitivity measures the effectiveness of the model. It is the ratio of TP samples to the sum of TP and FN samples.

$$ {\text{Sensitivity}}\left( {{\text{Recall}}} \right) = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}} $$
(11)

As given in Eq. (12), specificity is the ratio of correctly predicted negative samples to the total number of negative samples.

$$ {\text{Specificity}} = \frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}} $$
(12)

As given in Eq. (13), precision describes the accuracy of the model's positive predictions. It is the ratio of samples correctly predicted to be positive to the total number of samples predicted to be positive.

$$ {\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}} $$
(13)

As given in Eq. (14), the F1-score is the harmonic mean of recall and precision.

$$ F1 - {\text{score}} = 2*\frac{{{\text{Precision}}*{\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}} $$
(14)

In addition, the area under the receiver operating characteristic curve (ROC curve) is used to show the performance of the proposed model for each class.
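These metrics can be derived from the confusion matrix as in the following sketch; the tiny label arrays are placeholders for the real test-set labels and predictions.

```python
# Sketch of the evaluation step: build the confusion matrix on the test set
# and derive the per-class metrics of Eqs. (10)-(14) with scikit-learn.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2])   # placeholder labels for illustration
y_pred = np.array([0, 1, 1, 1, 2, 2])   # placeholder predictions

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Precision, recall (sensitivity), and F1-score per class:
print(classification_report(y_true, y_pred,
                            target_names=["normal", "dry AMD", "wet AMD"]))

# Specificity per class k: TN / (TN + FP), computed from the matrix.
for k in range(cm.shape[0]):
    tn = cm.sum() - cm[k, :].sum() - cm[:, k].sum() + cm[k, k]
    fp = cm[:, k].sum() - cm[k, k]
    print(f"class {k} specificity: {tn / (tn + fp):.4f}")
```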

Figure 12 shows the different fine-tuning experiments that have been conducted on the MobileNetV1 model with the Adam optimizer. We have used the test set, which consists of 1335 images for normal, 1335 images for dry AMD, and 1335 images for wet AMD. First, we fine-tuned only the last 10 layers without additional layers, resulting in a training accuracy of 97.59% and a testing accuracy of 96.78%. We then fine-tuned all layers, allowing the model to update all weights while training on the dataset, and reached a training accuracy of 99.74% and a testing accuracy of 97.1%. All these results improved when this model was combined with an LSTM and an additional dense layer, achieving 99.98% training accuracy and 98.85% testing accuracy. To summarize, the best technique we have observed is to fine-tune all layers while adding an LSTM layer and a dense layer.

Fig. 12 Several fine-tuning experiments for the proposed model

Figure 13a shows the accuracy results during the training and validation stages while Fig. 13b shows the loss results of training and validation.

Fig. 13 a Training and validation accuracy. b Training and validation loss

To further demonstrate the effectiveness of the proposed model, various CNN models have been deployed to classify the OCT dataset into normal, dry, and wet AMD. MobileNetV1, DenseNet201, and InceptionV3 have each been combined with LSTM. Several optimizers (Adam, SGD, and RMSprop) have been applied to these models to demonstrate the efficacy of each on the AMD classification task. Table 4 records the best precision, accuracy, sensitivity, F1-score, and specificity out of six independent runs of each experiment on the test set.

Table 4 Comparison of using different optimizers with various CNN models

Comparing these experiments, we can observe that the MobileNetV1-LSTM model with the Adam optimizer, the proposed model indicated in bold, produced the best results. The proposed model has been trained over 73 epochs, yielding a testing accuracy of 98.85%, a specificity of 99.09%, a sensitivity ranging from 98.12 to 99.47%, a precision ranging from 98.5 to 99.47%, and an F1-score ranging from 98.34 to 99.47%.

Figure 14 presents the confusion matrices of the previous experiments. As shown, the rows represent the used optimizers (Adam, SGD, and RMSprop) and the columns represent the used models (MobileNetV1, DenseNet201, and InceptionV3 each combined with LSTM).

Fig. 14 Confusion matrices of the experiments

By comparing the matrices, it can be observed that the proposed model, represented by the green matrix, has achieved the best values: 1312 dry AMD test images correctly identified, 1321 wet AMD test images correctly classified, and 1327 images correctly classified as normal.

The ROC curve of the proposed model is plotted in Fig. 15, and the AUC values are very close to 1.00 for all classes. Hence, the proposed model performs well in the classification of AMD using OCT images.

Fig. 15 ROC of the proposed model

7.2 Comparative analysis

A comparative analysis with previous studies is introduced in Table 5 to examine the robustness of the proposed model. As shown, these studies can be divided into four categories:

Table 5 Comparative analysis

The first category contains the studies that used the same Kermany dataset with the same classes of normal, dry, and wet AMD. Sotoudeh-Paima et al. used FPN-VGG16 as a classifier with an accuracy of 93.4% ± 1.4% using the full Kermany dataset, while the proposed model has achieved 98.85% accuracy using only 20,010 OCT images of the same dataset.

The second category contains the studies that used the same Kermany dataset but with different AMD classes. Thomas et al. used multi-scale and multi-path CNN techniques with an SVM classifier, which resulted in an accuracy of 99.78%, taking into consideration that they used this technique for AMD binary classification, not multi-classification, unlike the proposed model. He et al. used ResNet50 to classify the Kermany dataset into binary classes with an accuracy of 99.87%. Yan et al. used ResNet34 to classify 51,140 OCT images of the Kermany dataset into normal, drusen, active CNV, and inactive CNV. This technique produced a maximum sensitivity of 96.5%, which is approximately 3% lower than the proposed model's sensitivity. In the Chen et al. study, ResNet18 was used to classify 46,321 OCT images into dry and wet AMD only, with a maximum accuracy of 99.5%.

The third category contains the studies that used different datasets with the same AMD classes as proposed here. In another study by Chen et al., multi-modality models were used to classify the ZJU dataset with a maximum accuracy of 96.08%, which is approximately 2.5% lower than the proposed model's accuracy. The last category contains the studies that used different datasets with AMD binary classification. Overall, the proposed classification model has achieved higher performance measures in the multi-classification of AMD.

8 Conclusion

In this study, a new methodology has been introduced using two sub-models: the MobileNetV1 and LSTM deep networks. The proposed methodology has been used to classify normal, dry AMD, and wet AMD using optical coherence tomography (OCT) images. The proposed methodology is based on three stages: (1) the data processing stage, (2) the image processing stage, and (3) the deep feature extraction and classification stage, which uses two models, (i) a feature extraction model using MobileNet V1 and (ii) a classifier model using LSTM. A multi-classification with six separate trials has been employed with the proposed methodology. The proposed methodology has achieved higher accuracy when compared with state-of-the-art models on AMD multi-classification using a small part of the Kermany dataset. This research shows that a dataset of small size, around 20,010 images, was sufficient to obtain high accuracy for the multi-classification of AMD. The proposed methodology has achieved 98.85% accuracy, 99.09% sensitivity, and 99.1% specificity. In subsequent work, it is planned to optimize the model's hyperparameters and to focus on the progression of dry AMD to the wet type.