1 Introduction

There are four kinds of blood cancer [1], classified by cell type, such as acute leukemia (AL) or chronic leukemia (CL), and by source type, such as myeloid (M) or lymphoid (L). Acute lymphoblastic leukemia (ALL) [2] is a severe, fatal leukemia caused by rapidly progressing malignant lymphoid cells. Hematologists recognize ALL using microscopic testing of blood samples through a WBC count [3], which reveals small, spherical, homogeneous blast cells with sparse cytoplasm and nuclei having single nucleoli. However, this microscopic test is an initial manual screening for ALL that often results in errors and changes in the diagnostic process because of the visual homogeneity between normal and ALL cells. Alternatively, ALL can be diagnosed with complex, painful, and invasive procedures such as bone marrow biopsy [4].

According to leukemia statistics [5], 6660 patients were expected to be diagnosed with ALL, and 1560 deaths were estimated from ALL, in 2022. The majority of ALL cases occur in children, yet most ALL deaths occur in adults. As a result, researchers have emphasized that treating ALL is not easy because of the risk of destroying healthy WBCs together with cancerous cells, thereby damaging the patient's immune system [6]. To tackle these issues, the correct diagnosis of leukemia relies on non-invasive methods that analyze ALL microscopic images using computational classifiers to increase the survival rate of affected patients [7].

With the growth of modern diagnostic technology, machine learning approaches are used in conjunction with deep learning approaches, which are less susceptible to errors in classifying ALL images. Automated approaches for leukemia detection have therefore been developed to ease the burden on healthcare professionals of reading microscopic leukemia images and distinguishing normal WBCs from ALL. Using a computer-aided leukemia approach, inexperienced healthcare workers can screen for ALL with a smaller workforce, lower costs, and time savings [7].

Machine learning approaches [8,9,10] are used for ALL prognosis based on image pre-processing and feature extraction. Among these approaches, the most effective common features for ALL classification are color, texture, and statistical features, while the SVM classifier showed superior accuracy in detecting ALL compared with other classifiers. However, machine-learning-based ALL detection is cumbersome, time-consuming, and requires programmer expertise to select appropriate features. Sometimes the selected features describing ALL images are unsuitable for building a preferable classifier model. Also, overfitting is more likely in a classifier trained on a limited dataset, so the result may not be ideal.

Studies [11, 12] have used CNN-based ALL detection approaches and demonstrated that an ensemble model outperforms any single CNN classification model.

The first public database for leukemia is ALL-IDB [13], with two versions used for classifying ALL and normal microscopic images. Putzu et al. [14] demonstrated a machine learning approach for ALL classification with an accuracy of 93% using ALL-IDB1.

The deep learning approach has been extended to employ swarm optimization for ALL differentiation with an accuracy of 90% [15].

Kokeb Dese et al. [16] developed a classification approach for four types of leukemia using SVM on a real microscopic database, with an accuracy of 97.69%.

Another microscopic cell database, C-NMC 2019 [17], provides generic data with unbalanced categories of normal and ALL images.

The ensemble learning method [18] and the ResNeXt model [19] were proposed to classify ALL using the C-NMC 2019 database, with F1-scores of 84% and 87.89%, respectively.

A weighted CNN ensemble [20] was employed to refine the classification of ALL images from the C-NMC 2019 database, reaching an F1-score of 88.6%.

The shortcomings of previous research are summarized as follows:

  1. CNN models need large amounts of data for training, and the available ALL databases are small; applying CNN models to ALL databases is therefore difficult.

  2. Machine learning algorithms achieve low performance on ALL images.

  3. The classes of normal and abnormal blood images in the databases are unbalanced.

  4. Model performance for classifying ALL images varies because of differences in image characteristics, image division, and image pre-processing, as well as the limited number of ALL images; comparative studies between previous and existing models are therefore complex.

  5. Most previous works reported only the F1-score to evaluate the ALL diagnostic model, neglecting other performance metrics.

  6. The combined application of deep learning and machine learning methods in ALL diagnosis has achieved little so far.

  7. Public databases of ALL images are scarcer than those of other cancer images.

  8. The accuracy of existing ALL recognition models is unsatisfactory.

There is still a scientific call for innovative ALL detection algorithms that leverage ensemble learning methods and the feature extraction of machine learning models. This research focuses on solving these shortcomings of previous works on ALL images.

In this aspect, the proposed ensemble model is designed for ALL detection using deep features and MSVM as a classifier on the C-NMC 2019 database. Deep spatial features, or "low-level spatial features," are derived from CNNs, and temporal features, or "high-level temporal features," are derived from BiLSTM [21] and GRU [22], to distinguish between healthy and ALL images accurately.

2 Related work

There have been some trials of automated CNN approaches to handle the classification of ALL and normal cells on the recent C-NMC 2019 database [17], as summarized in Table 1. Notably, the F1-score is commonly used to evaluate the CNN models applied to the imbalanced C-NMC 2019 database.

Table 1 Deep learning models for ALL classification on the C-NMC 2019 database

Recently, Chen et al. [23] applied the Resnet101-9 ensemble model to the C-NMC 2019 database with an accuracy of 85.11% and an F1-score of 88.94% for identifying ALL images.

A weighted ensemble CNN model was developed by Mondal et al. [24] for ALL recognition on the same C-NMC 2019 database, with an F1-score of 89.7% and an accuracy of 88.3%.

Marzahl et al. [25] explored a ResNet18 model for classifying ALL images extracted from the C-NMC 2019 database using normalization and augmentation methods with an F1-score of 87.5%.

Another framework [26] adapted different versions of the ResNeXt model to achieve good performance with an F1-score of 85.7%. The MobileNet-V2 model was applied to the C-NMC 2019 database [27] to achieve an F1-score of 87.0%.

Pan et al. [28] proposed a fine-tuned ResNet trained on the C-NMC 2019 database for normal and ALL image classification.

The highest F1-scores were achieved using the heterogeneity loss function [29] and the NasNetLarge architecture [30] on the C-NMC 2019 database to discern the ALL microscopic images.

Moreover, performance comparisons for ALL recognition on the C-NMC 2019 database are often inapplicable due to variance in image size, image division, evaluation parameters, and image pre-processing, and because only a small percentage of datasets are publicly available. The problem with CNN models is that their good performance depends on a large dataset, yet in reality a large public dataset of ALL images is unavailable.

Therefore, the classification of ALL images within the C-NMC 2019 database remains a major challenge for early-stage diagnosis with better performance.

Considering machine learning approaches on ALL images, the K-nearest neighbor (KNN) algorithm was used to detect normal and blast cells on 108 blood images [31]. Also, the SVM classifier was applied to differentiate normal cells from blast cells on 958 microscopic images [32]. Texture features have been proposed for leukemia classification using SVM with a Gaussian radial basis kernel to achieve good performance [14]. Local binary pattern and geometric texture features have also been suggested for ALL detection using the SVM classifier [33]. For the ALL classification task, the SVM classifier is commonly used to classify blood images into normal and ALL classes [34]. Recently, the MSVM has also been used for ALL classification with 94.6% accuracy [9]. The shortcoming of previous machine learning works on ALL images is their reliance on small datasets, which leads to overfitting in ALL classification models. Additionally, the manual features extracted from ALL images are often inappropriate for categorizing them, and these approaches achieve only limited diagnostic accuracy. Currently, there is a demand for automated features and a large dataset to diagnose ALL accurately.

3 Methodology

3.1 Proposed framework

The proposed ensemble framework for ALL image classification from the C-NMC 2019 database is shown in Fig. 1. This framework consists of four phases: pre-processing; CNN models that extract spatial features from the image data; a GRU-BiLSTM architecture that receives these features for sequence learning; and a classifier. Image pre-processing comprises oversampling and splitting into training and test data. The GRU-BiLSTM architecture consists of a GRU layer with 500 hidden units, two BiLSTM layers with 500 hidden units each, and dropout layers. The dropout layers are applied to relieve overfitting by selecting fewer features. The CNN models are five different networks: ResNet-101, GoogleNet, SqueezeNet, DenseNet-201, and MobileNetV2. Adopting the deep features and test data leads to classification based on MSVM to obtain the final, better performance. For the classification method, two techniques were implemented to classify the pre-processed images: (1) the MSVM classifier, and (2) a fully connected layer (FCL) and softmax layer used as a classifier.

Fig. 1 Proposed ensemble framework for ALL image classification from the C-NMC 2019 database

3.2 Dataset description

The publicly available C-NMC 2019 dataset [17], extracted from The Cancer Imaging Archive (TCIA) and used in the ISBI competition, is designed to identify normal B-lymphoid precursors from ALL cells in 12,528 microscopic images, including 8491 ALL images and 4037 normal images.

This database originates from 101 cases, including 60 infected cases with ALL and 41 normal cases.

This paper uses the training and preliminary testing datasets, but not the final testing dataset. Sample images from the C-NMC 2019 dataset are shown in Fig. 2. Each microscopic image has 450 × 450 pixels, with two imbalanced classes [35].

Fig. 2 Microscopic images from the C-NMC 2019 dataset: A ALL cells and B healthy cells

3.3 Data pre-processing

Remarkably, the microscopic images from the C-NMC 2019 dataset are unbalanced, with more ALL images than normal images in the training set. This unbalanced class distribution biases the classifier toward the ALL class and encourages overfitting.

The original microscopic images were pre-processed using oversampling [36] to remove unbalanced classes in the training dataset before applying them to the network. In this paper, the number of normal-cell images was increased to 8491, equal to the number of ALL images. The resulting 16,982 images were divided using a 9:1 split into training and test sets, respectively: the training dataset contains 7642 images per class, and the testing dataset contains 849 images per class.
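As an illustration, the following is a minimal MATLAB sketch of this pre-processing step, assuming the ALL and normal image paths have already been collected into cell arrays allFiles and normalFiles (these names are illustrative, not from the paper):

```matlab
% Hedged sketch: random oversampling of the minority (normal) class
% followed by a stratified 9:1 split, as described in Sect. 3.3.
nALL = numel(allFiles);                              % 8491 ALL images
idx  = randsample(numel(normalFiles), nALL, true);   % sample with replacement
files  = [allFiles(:); normalFiles(idx)];
labels = categorical([repmat("ALL", nALL, 1); repmat("normal", nALL, 1)]);

cv = cvpartition(labels, 'HoldOut', 0.1);            % stratified 9:1 split
trainFiles = files(training(cv));  trainLabels = labels(training(cv));
testFiles  = files(test(cv));      testLabels  = labels(test(cv));
```

With 8491 images per class, this split yields the stated 7642 training and 849 test images per class.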

3.4 Spatial feature extraction based on CNN, temporal feature extraction based on GRU-BiLSTM architecture, and softmax layer as a classifier

A CNN can learn to acquire low-level spatial features but cannot learn sequential correlations or acquire high-level temporal features. On the other hand, a recurrent neural network (RNN) is a type of neural network specifically designed for sequence learning, that is, the realization of temporal features. RNNs are able to handle time-series data by applying recurrent hidden states, whose activation at each time step in a sequence depends on the activation of the prior time step(s) [37]. Moreover, several studies have discovered that the standard RNN faces the vanishing gradient problem when the data contain long-term correlations [38]. To address this problem, two prominent types of RNNs have been proposed: the LSTM [39] and the GRU [40]. Notably, many studies have revealed that the LSTM is more successful than the GRU. Although the GRU [41] is a revised variant of the LSTM, their working procedures are broadly identical. The GRU is faster than the LSTM because it relies on less memory and fewer training parameters; in contrast, the LSTM is time-consuming, especially for large data. A bidirectional development of the LSTM is the BiLSTM [21]. Therefore, a hybrid technique based on these two favorable networks is introduced to merge their strengths into a single one. The primary objective of this paper is to show that the combined strength of GRU and BiLSTM can be used for ALL image classification. This work extracted spatial features from the ALL training images using CNNs to decrease network training complexity by reducing parameter sizes and input dimensions. A combined CNN-GRU-BiLSTM deep learning strategy is presented to extract spatial-temporal features and learn the dependency between features, and a softmax function is finally used for classification, as shown in Fig. 3. The spatiotemporal features were extracted from the training images using six layers: CNN, a GRU layer with 500 hidden units, a BiLSTM layer with 500 hidden units, a dropout layer of 0.25, a second BiLSTM layer with 500 hidden units, and a second dropout layer of 0.25, as shown in Fig. 1. All the CNN features generated by the five CNN models were used to produce the feature vectors for each model, as shown in Table 2.
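For concreteness, a hedged MATLAB sketch of the spatial-feature step follows; it taps the global average pooling layer of a pre-trained network ('avg_pool' is the layer name in MATLAB's densenet201, the name differs per model, and imdsTrain is an assumed imageDatastore):

```matlab
% Sketch under stated assumptions: extract deep spatial features from
% a pre-trained CNN for later sequence learning and classification.
net = densenet201;                           % one of the five CNN models
inputSize = net.Layers(1).InputSize;         % e.g., 224 x 224 x 3
augTrain  = augmentedImageDatastore(inputSize(1:2), imdsTrain);
spatialFeats = activations(net, augTrain, 'avg_pool', 'OutputAs', 'rows');
% spatialFeats: one row of pooled CNN features per training image
```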

Fig. 3 Proposed GRU-BiLSTM architecture for the deep learning predictive network with a softmax layer

Table 2 Description of utilized features according to five CNN models and their corresponding feature vectors for each image

This study used five pre-trained CNN models as below:

  • ResNet-101 architecture [42] consists of 101 layers, but it suffers from complexity and time cost.

  • GoogleNet architecture [43] consists of 22 layers.

  • SqueezeNet architecture [44] consists of 18 layers and is based on a smaller network with few parameters.

  • DenseNet201 architecture [45] consists of 201 layers. The DenseNet architecture is more powerful than the ResNet architecture, but it needs more memory in the training step.

  • MobileNetV2 architecture [46] focuses on an inverted residual model to filter the effective features.

The GRU [22] algorithm is based on the RNN [47]. In an RNN, gradient problems and high computation time arise because it cannot learn long-term dependencies. An RNN produces a hidden state \(h_t\) and an output \(o_t\), as shown in Eqs. (1) and (2):

$$h_{t} = f\left( {W_{h} h_{t - 1} + W_{{{\text{hx}}}} x_{t} + b_{h} } \right)$$
(1)
$$o_{t} = f\left( {W_{0} h_{t} + b_{0} } \right)$$
(2)

where f is the activation function on all nodes in the RNN, \(h_t\) is the hidden state, t is the time step, W is a weight, and b is a bias. To reduce the computation time of the RNN, the GRU is used to reduce the features extracted from the CNN models by using the update gate (\(z_t\)), reset gate (\(r_t\)), and current memory gate (\(\tilde{h}_{t}\)), as in Eqs. (3), (4), and (5):

$$z_{t} = \sigma \left( {W_{XZ } X_{t } + W_{0Z} O_{t - 1 } + b_{z} } \right)$$
(3)
$$r_{t} = \sigma \left( {W_{Xr } X_{t } + W_{0r} O_{t - 1 } + b_{r} } \right)$$
(4)
$$\tilde{h}_{t} = f\left( {W_{h} X_{t} + W_{hx} \left( {r_{t} \odot h_{t - 1} } \right) + b_{h} } \right)$$
(5)

where \(W_{XZ}\), \(W_{Xr}\), and \(W_{h}\) are weights of the input vector, while \(W_{0Z}\), \(W_{0r}\), and \(W_{hx}\) are weights of the preceding time step, and \(b_z\), \(b_r\), and \(b_h\) are biases. The update gate determines how much prior information must be transferred to the future; its function in the GRU is similar to that of the output gate in the LSTM. The reset gate determines how much prior information should be forgotten; its function in the GRU is similar to that of the input and forget gates in an LSTM. The current memory gate reduces the impact that prior information has on the present information being transferred to the future.
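To make Eqs. (3)-(5) concrete, the following is a minimal MATLAB sketch of a single GRU step. The final blend \(h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t\) is the standard GRU state update, which the text does not list explicitly, and the weight/bias structs are illustrative:

```matlab
% One GRU time step implementing Eqs. (3)-(5) plus the standard state
% update; W.* are input weights, U.* recurrent weights, b.* biases.
function h = gruStep(x, hPrev, W, U, b)
sigmoid = @(a) 1 ./ (1 + exp(-a));
z = sigmoid(W.z * x + U.z * hPrev + b.z);            % update gate, Eq. (3)
r = sigmoid(W.r * x + U.r * hPrev + b.r);            % reset gate, Eq. (4)
hTilde = tanh(W.h * x + U.h * (r .* hPrev) + b.h);   % candidate state, Eq. (5)
h = (1 - z) .* hPrev + z .* hTilde;                  % standard GRU blend
end
```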

The output of the GRU layer is used as input to the BiLSTM [48], which is based on a forward LSTM and a backward LSTM to recognize features in both the forward and backward directions. The LSTM is based on long-term memory using three gates: the input gate (\(i_t\)), forget gate (\(f_t\)), and output gate (\(g_t\)), as in Eqs. (6), (7), and (8):

$$i_{t} = \sigma \left( {W_{Xi } X_{t } + W_{hi} h_{t - 1 } + b_{i} } \right)$$
(6)
$$f_{t} = \sigma \left( {W_{Xf } X_{t } + W_{hf} h_{t - 1 } + b_{f} } \right)$$
(7)
$$g_{t} = \tanh \left( {W_{Xg} X_{t} + W_{hg} h_{t - 1} + b_{g} } \right)$$
(8)

where \(W_{Xi}\), \(W_{Xf}\), and \(W_{Xg}\) are weights of the input vector, while \(W_{hi}\), \(W_{hf}\), and \(W_{hg}\) are weights of the preceding time step, and \(b_i\), \(b_f\), and \(b_g\) are biases.
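A companion sketch for Eqs. (6)-(8) follows; the cell-state update \(c_t\) and the hidden-state output, which the text does not list, are taken from the standard LSTM formulation, where the paper's \(g_t\) plays the role of the candidate cell input:

```matlab
% One LSTM time step: Eqs. (6)-(8) plus the standard cell/hidden-state
% updates; W.* are input weights, U.* recurrent weights, b.* biases.
function [h, c] = lstmStep(x, hPrev, cPrev, W, U, b)
sigmoid = @(a) 1 ./ (1 + exp(-a));
i = sigmoid(W.i * x + U.i * hPrev + b.i);   % input gate, Eq. (6)
f = sigmoid(W.f * x + U.f * hPrev + b.f);   % forget gate, Eq. (7)
g = tanh(W.g * x + U.g * hPrev + b.g);      % candidate input, Eq. (8)
o = sigmoid(W.o * x + U.o * hPrev + b.o);   % output gate (standard LSTM)
c = f .* cPrev + i .* g;                    % cell-state update
h = o .* tanh(c);                           % hidden state
end
```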

Finally, 1000 combined features were generated from each image using the six stacked layers and stored in a feature vector.

The proposed GRU-BiLSTM architecture for the deep learning predictive network includes four blocks, as shown in Fig. 3. The first block provides the deep features from the CNN. The second block consists of six layers producing temporal features based on the GRU-BiLSTM architecture: a sequence input layer, a GRU layer with 500 hidden units, a BiLSTM layer with 500 hidden units, a dropout layer of 0.25, a second BiLSTM layer with 500 hidden units, and a second dropout layer of 0.25. The third block performs classification using an FCL with a softmax layer. The last block is the output layer, which is used to measure the classifier's performance.
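A minimal MATLAB (Deep Learning Toolbox) sketch of the second and third blocks follows, assuming 500 hidden units per recurrent layer as stated above; the input feature length is model-dependent (Table 2), and 1920 is used here as an assumed example corresponding to DenseNet-201's pooled features:

```matlab
featDim    = 1920;   % assumed CNN feature length per image (model-dependent)
numClasses = 2;      % ALL vs. normal
layers = [
    sequenceInputLayer(featDim)
    gruLayer(500, 'OutputMode', 'sequence')      % GRU, 500 hidden units
    bilstmLayer(500, 'OutputMode', 'sequence')   % first BiLSTM
    dropoutLayer(0.25)
    bilstmLayer(500, 'OutputMode', 'last')       % second BiLSTM
    dropoutLayer(0.25)
    fullyConnectedLayer(numClasses)              % FCL (third block)
    softmaxLayer
    classificationLayer];
```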

3.5 Feature extraction based on CNN-BiLSTM architecture and MSVM as a classifier

All 1000-element feature vectors extracted from the CNN-BiLSTM architecture are used as input to the MSVM classifier [49], as shown in Fig. 4. The first and fourth blocks in Fig. 4 are the same as in Fig. 3.

Fig. 4 Proposed GRU-BiLSTM architecture for the deep learning predictive network with the MSVM classifier

In Fig. 4, the second block consists of a sequence input layer, a GRU layer with 500 hidden units, a BiLSTM layer with 500 hidden units, a dropout layer of 0.25, and a second BiLSTM layer with 500 hidden units. The third block is the MSVM classifier.

This classifier is based on the Vapnik–Chervonenkis (VC) dimension of the multiclass binary learning model. In this paper, the MSVM classifier employs a one-against-all approach to classify data points from the \(n\) classes in the dataset, using a kernel function [49]. The MSVM classifier also showed good results on ALL images in previous work [50].
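A hedged MATLAB sketch of this classification step follows, assuming XTrain and XTest hold one 1000-element deep feature vector per row; the Gaussian kernel is an assumption, since the text does not name the kernel:

```matlab
% One-against-all multiclass SVM (MSVM) on the deep feature vectors.
template = templateSVM('KernelFunction', 'gaussian');   % assumed kernel
mdl = fitcecoc(XTrain, trainLabels, ...
    'Learners', template, 'Coding', 'onevsall');        % one-vs-all MSVM
predLabels = predict(mdl, XTest);                       % test-set labels
```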

4 Empirical results

The proposed model was implemented in MATLAB 2020a on a computer with 750 GB RAM and an Intel® Core™ i9 processor. The proposed model was applied to the C-NMC 2019 dataset [17], which contains 8491 ALL images and 4037 healthy images. To alleviate the class imbalance problem in the C-NMC 2019 database, the proposed model initiated the oversampling process to balance normal and ALL cells in the training phase. In this paper, the training dataset had 7642 images for ALL cells and 7642 images for healthy cells, while the testing dataset had 849 images for ALL cells and 849 images for healthy cells.

The main objective is to identify the CNN models compatible with BiLSTM [21] and GRU that provide good performance for ALL image classification.

The results of the proposed model were divided into two groups. Group (1) used five CNN models, two BiLSTM layers, and a GRU layer for feature extraction, with MSVM as the classifier. Group (2) used the same five CNN models, two BiLSTM layers, and GRU layer for feature extraction, with an FCL and softmax layer as the classifier instead of MSVM.

4.1 Performance metrics

The quality of both proposed models in classifying ALL and normal images is measured by five metrics: accuracy, sensitivity, specificity, precision, and F1-score, as presented in Eqs. (9)-(13). Accuracy is the number of ALL and healthy images correctly categorized, divided by the total number of images in the test set. Sensitivity is the number of correctly categorized ALL images divided by all actual ALL images. Specificity is the number of correctly categorized healthy images divided by all actual healthy images. Precision is the accuracy of categorizing an image as ALL. The F1-score is the harmonic mean of sensitivity and precision.

$${\text{Accuracy}} = \frac{{\mathop \sum \nolimits_{i = 1 }^{c} \frac{{{\text{TP}}_{i} + {\text{TN}}_{i} }}{{{\text{TP}}_{i} + {\text{FN}}_{i} + {\text{FP}}_{i} + {\text{TN}}_{i} }} }}{c}$$
(9)
$${\text{Sensitivity }} = \frac{{\mathop \sum \nolimits_{i = 1 }^{c} \frac{{{\text{TP}}_{i} }}{{{\text{TP}}_{i} + {\text{FN}}_{i} }} }}{c}$$
(10)
$${\text{Specificity}} = \frac{{\mathop \sum \nolimits_{i = 1 }^{c} \frac{{{\text{TN}}_{i} }}{{{\text{TN}}_{i} + {\text{FP}}_{i} }} }}{c}$$
(11)
$${\text{Precision}} = \frac{{\mathop \sum \nolimits_{i = 1}^{c} \frac{{{\text{TP}}_{i} }}{{{\text{TP}}_{i} + {\text{FP}}_{i} }} }}{c}$$
(12)
$${\text{F1-score}} = 2\,\frac{{{\text{Precision}} \times {\text{Sensitivity}}}}{{{\text{Precision}} + {\text{Sensitivity}}}}$$
(13)

where c is the number of classes and i indexes the class; TP indicates the number of correctly categorized ALL images, TN the number of correctly categorized healthy images, FN the number of ALL images wrongly categorized as healthy, and FP the number of healthy images wrongly categorized as ALL.

Also, the confusion matrix is employed to present the findings on the test images for both proposed models.
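The following is a minimal MATLAB sketch of how these metrics can be computed from the test-set confusion matrix, assuming the first category is the positive (ALL) class:

```matlab
% Metrics of Eqs. (9)-(13) for the two-class case, ALL as positive class.
C = confusionmat(testLabels, predLabels);    % rows = true, cols = predicted
TP = C(1,1); FN = C(1,2); FP = C(2,1); TN = C(2,2);
accuracy    = (TP + TN) / sum(C(:));
sensitivity = TP / (TP + FN);
specificity = TN / (TN + FP);
precision   = TP / (TP + FP);
f1score     = 2 * precision * sensitivity / (precision + sensitivity);
```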

4.2 Results of the CNN models, BiLSTM, GRU with MSVM classifier

The first proposed framework extracted the most productive features from the CNN models, BiLSTM, and GRU. The features of the three methods were combined into a single vector, yielding 1000 relevant features per image, which were fed into the MSVM classifier.

The performance of five CNN models, namely ResNet-101, GoogleNet, SqueezeNet, DenseNet-201, and MobileNetV2, for the C-NMC 2019 dataset using deep features from BiLSTM, GRU, and MSVM classifier is discussed. Table 3 summarizes the performance of the CNN models with the MSVM classifier on the C-NMC 2019 dataset using 849 test images for both ALL cells and normal cells.

Table 3 The performance of the CNN models with MSVM classifier on the C-NMC 2019 dataset

The DenseNet-201 model outperformed the other models in evaluating the C-NMC 2019 dataset for ALL detection, achieving 96.29% accuracy, 94.58% sensitivity, 98% specificity, 96.23% F1-score, and 97.93% precision. The MobileNetV2 model obtained results similar to those of the DenseNet-201 model on all the metrics, achieving 96% accuracy, 94.23% sensitivity, 97.76% specificity, 95.92% F1-score, and 97.58% precision.

As for the ResNet-101 model, it obtained 95.76% accuracy, 93.99% sensitivity, 97.53% specificity, 95.68% F1-score, and 97.44% precision. With the MSVM classifier, the SqueezeNet model achieved lower results than the other models on the C-NMC 2019 dataset. Figure 5 displays the confusion matrices of the five CNN models for the classification of ALL images from the C-NMC 2019 dataset.

Fig. 5 Confusion matrices for the proposed models with the MSVM classifier: a MobileNetV2, b ResNet-101, c SqueezeNet, d DenseNet201, and e GoogleNet

The ensemble model with the most accurate findings was developed by combining the deep learning models, BiLSTM [21], and GRU [22] as favorable feature extraction, with MSVM as the classifier.

This proposed ensemble model is feasible for classifying ALL images using a blend of CNN models and machine learning techniques.

4.3 Results of the CNN models, BiLSTM, GRU without MSVM classifier

The second proposed framework extracted the most productive features from the CNN models, BiLSTM, and GRU. The features of the three methods were combined into a single vector, yielding 1000 features per image. The features were then fed into an FCL, the basis of transfer learning, and the FCL output was fed to the softmax activation function. Table 4 summarizes the performance of the CNN models without the MSVM classifier on the C-NMC 2019 dataset using 849 test images for each of the ALL and normal classes.

Table 4 The performance of the CNN models without MSVM classifier on the C-NMC 2019 dataset

The MobileNetV2 model obtained the best results on all the metrics, achieving 92.41% accuracy, 89.75% sensitivity, 95.06% specificity, 92.20% F1-score, and 94.78% precision. The DenseNet201 model obtained results similar to those of the MobileNetV2 model on all the metrics, achieving 92.35% accuracy, 87.51% sensitivity, 97.18% specificity, 91.96% F1-score, and 96.87% precision. With the FCL and softmax classifier, the SqueezeNet model achieved lower results than the other models on the C-NMC 2019 dataset.

The confusion matrix shown in Fig. 6 illustrates the performance of the best model without an MSVM classifier, MobileNetV2, for the classification of ALL images from the C-NMC 2019 dataset.

Fig. 6 Confusion matrix for the proposed model without the MSVM classifier for MobileNetV2

As shown in Fig. 7, training the MobileNetV2 model achieved a high accuracy of 92.41% using the Adam training function, 30 epochs, a learning rate of 0.0001, a mini-batch size of 15, and 28,550 iterations.
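For reference, a hedged MATLAB sketch of the stated training configuration follows; options not mentioned in the text are left at their defaults, and trainSeq/trainLabels are assumed sequence inputs and labels for the layer stack of Sect. 3.4:

```matlab
% Training configuration as reported: Adam, 30 epochs, learning rate
% 1e-4, mini-batch size 15; remaining options assumed default.
opts = trainingOptions('adam', ...
    'MaxEpochs', 30, ...
    'InitialLearnRate', 1e-4, ...
    'MiniBatchSize', 15, ...
    'Plots', 'training-progress');
net = trainNetwork(trainSeq, trainLabels, layers, opts);
```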

Fig. 7 Training progress for the MobileNetV2 model

After testing, the best-performing ALL image classifier is the DenseNet-201 model applied to the C-NMC 2019 dataset within the first proposed framework, which combines CNN, BiLSTM, and GRU [22] feature extraction with MSVM as the classifier.

4.4 Comparison with previous works on the C-NMC 2019 dataset

As mentioned in Table 1, some previous works used the C-NMC 2019 database to test their methods for identifying ALL cells and healthy cells. The best F1-score, 95.2% on the C-NMC 2019 dataset, was reached by Goswami et al. [29]. Previously, Ullah et al. [4] proposed a recent CNN architecture for ALL recognition with an accuracy of 91.1% on the C-NMC 2019 dataset.

Also, aggregation-based deep learning [30] presented an ensemble model of NASNetLarge and VGG19 to classify ALL cells with an accuracy of 96.5%, a sensitivity of 95.9%, a specificity of 96.9%, and an F1-score of 94.6% on the C-NMC 2019 database.

On the other hand, the proposed model proved that the DenseNet-201 model achieved the effective classification of ALL cells with an accuracy of 96.29%, a sensitivity of 94.58%, a specificity of 98%, an F1-score of 96.23%, and a precision of 97.93% on the C-NMC 2019 database.

The limitation of previous works on the C-NMC 2019 database is the variance in model performance for the classification of ALL images. Some researchers reported only the F1-score without accounting for other metrics.

The challenges of using the C-NMC 2019 database are its highly unbalanced training images, low intra-class contrast, and high optical similarity between the two classes [51]. Consequently, previous work on the C-NMC 2019 database rates poorly on ALL images compared with the proposed ensemble model.

A comparison of the proposed model with existing models on the C-NMC 2019 database is provided in Table 5. The previous works used the C-NMC 2019 dataset [17] including the training dataset, preliminary test set, and final test set, whereas the proposed model used only the training dataset and preliminary test set. This study improves on similar studies recently developed in [29, 30] and exceeds the diagnostic accuracy of hematologists.

Table 5 Performance comparisons between the proposed method and other previous work on the C-NMC 2019 database

5 Discussion

This study introduced an innovative method for the classification of ALL and normal microscopic images, providing automated procedures that aid the development of the ensemble model. The availability of public microscopic cell data enables comparable algorithms that help hematologists diagnose ALL cell images [52]. This paper uses the C-NMC 2019 dataset [17] to train and test the proposed classifier for ALL images. After training on 7642 images per class, testing is conducted using 849 images per class.

The empirical results emphasize that the proposed model with an MSVM classifier detects ALL images more efficiently than the proposed model without an MSVM classifier. Regarding the normal/ALL classification in the C-NMC 2019 dataset, the DenseNet-201 model achieved the best accuracy when using BiLSTM and GRU as feature extraction with the MSVM classifier. The F1-score for the DenseNet-201 model is 96.23%, the best performance compared with the previous works shown in Table 1.

Given these findings, no previous work on ALL classification has reached enhancements as significant as the performance of the proposed model.

Also, the proposed framework introduces the new combined features extracted from the CNN, BiLSTM, and GRU, as detailed in Fig. 1. This combination can be utilized to elicit features in many medical image classification tasks, since selecting the most important features supports the highest level of accuracy in a classification task.

Deep learning approaches are becoming popular in medical image diagnostic systems but rely on large training datasets [53], whereas machine learning approaches are limited when training datasets become large. However, CNN models allow features to be extracted from the original image without any preliminary image analysis, eliminating bias during the feature extraction process [54].

Consequently, this study emphasizes applying a large dataset to both CNN models and machine learning algorithms.

So far, there has been no study of a hybrid model for diagnosing ALL images. Therefore, this paper presents a new application of a hybrid model to improve the diagnosis of ALL images.

This proposed model will be a supportive diagnostic tool for ALL image classification on the unbalanced C-NMC 2019 dataset. To ensure credibility, the proposed results should be validated by a hematologist in the future.

6 Conclusion and future works

ALL cell-related leukemia causes organ dysfunction in patients, and it is not easy to interpret ALL microscopic images with manual techniques. This paper has developed an automated ensemble model based on ALL microscopic images to reach superior performance. To build the ensemble model, the output of the deeply embedded features is fed to the MSVM classifier to determine the preferable model. It was experimentally validated that the deep features of CNN, BiLSTM, and GRU are closely related to achieving the best performance in ALL image classification. The proposed ensemble model outperforms the best-performing single model with an F1-score of 96.23% using the MSVM classifier. This model serves as a heuristic approach to deal with the flawed-database dilemma and to choose the right features.

Further studies are desired to verify the accuracy and sensitivity of this proposed model by applying it to different leukemia image databases.