Introduction

Blood is a major constituent of the human body; about 55% of it is a liquid called plasma [1], which allows it to flow freely through blood vessels. The main function of plasma is to carry nutrients, proteins, and hormones to the parts of the body that need them and to remove waste material. According to color, shape, size, texture, and composition, the cellular components of blood are divided into three basic cell types: red blood cells (RBCs, erythrocytes) [2], white blood cells (WBCs, leukocytes) [3], and platelets (thrombocytes) [4]. RBCs are the main constituent of blood samples; their principal component, hemoglobin, gives blood its red color, and their core purpose is to deliver oxygen to the various organs of the human body. When the number of RBCs decreases, less oxygen is carried by the blood, which causes fatigue and weakness. Normally, RBCs number about 4 to 6 million per microliter of blood and make up 40–45% of the total blood volume [5, 6]. WBCs fight germs and protect humans from infection, and are therefore also known as “defender cells”; they range from 4500 to 11,000 per microliter of blood [7, 8]. WBCs make up only 1% of the blood volume, but even a small change has a major effect because the human immune system depends on them. Platelets are responsible for the clotting of blood in case of injury and range from 150,000 to 450,000 per microliter [9]. Because blood cells are so numerous, an excess or deficiency of any type of blood cell causes different health problems such as leukemia [10], sickle cell disease [11], thalassemia [12], and anemia [13]. Leukemia arises when a large number of abnormal WBCs crowd out the healthy RBCs and platelets. Based on its development speed and the cells it affects, doctors classify it into four types: acute, chronic, myelogenous, and lymphocytic leukemia. Leukemia is one of the most common diseases that may lead to death. To reduce the severity of this disease, it is necessary to identify the shapes of immature cells at an early stage, which ultimately reduces the mortality rate of patients. Many scholars have suggested different techniques and algorithms for the recognition, segmentation, and classification of leukemia, but there are still gaps in this domain [14]. The challenges of the existing works act as the motivation for the proposed work.

The proposed model's primary contributions are as follows:

  • A GAN network is developed with selected learning parameters for synthetic data generation to increase the size of the datasets, which ultimately improves the classification accuracy.

  • Deep features are extracted from pre-trained deep models, i.e., DarkNet-53 and ShuffleNet. Optimal features are then selected by PCA and fused serially to create a fused feature vector for the classification of different kinds of WBCs.

  • The classified images are segmented using two improved frameworks: segmentation based on a statistical morphological method and semantic segmentation using a CNN, as detailed in (a) and (b) below.

    a. In the statistical segmentation method, RGB input images are converted to HSV, and morphological operations with a disk-shaped structuring element are employed for WBC segmentation.

    b. A semantic directed acyclic graph (DAG) network is designed, where DeepLab V3+ is utilized with pre-trained ResNet-18 as the backbone for more accurate WBC segmentation results.

The article is organized as follows: “Related work” reviews the recent literature. “Proposed methodology” details the designed methodology. “Result and discussion” covers the experiments and results. “Conclusion” concludes the paper.

Related work

This section presents existing methods that mainly cover leukemia detection, segmentation, classification using deep features, and hand-crafted feature extraction methods [15,16,17,18]. The preprocessing phase enhances the ROI, which directly affects segmentation outcomes [19,20,21,22,23]. In [24], a Wiener filter based on the curvelet transform was utilized to enhance input images and avoid false noisy edges. The authors of [25] proposed three successive preprocessing techniques, namely color distortion, flipping and mirroring of the image, and bounding-box distortion. To improve image quality for segmentation, [26] offered a preprocessing algorithm using partial contrast stretching, k-means clustering, subtractive clustering, and a median filter. In [27], three color components (C and M of the CMYK color space and S of the HSV color space) are preprocessed to create new image components named Cn, Mn, and Sn, from which the best components are selected and fused by PCA for nuclei segmentation of WBCs. A method that transforms the color space from RGB to HSV and uses a weighted cross-entropy loss function for WBC segmentation was proposed in [28]. The deep convolutional generative adversarial network (DCGAN) is a well-known method to increase the number of image samples [29]. In [30], preprocessing was performed by matrix transformation (horizontal and vertical flipping) and a DCGAN to increase the robustness of the training model; the DCGAN was then combined with ResNet for WBC classification. Image format is an issue that frequently arises in image acquisition, detection, and segmentation, so the algorithm in [31] enhances the image by converting the RGB image into YCbCr, adjusting contrast, and smoothing to improve its quality, then applies morphological operations and, as a last step, image digitization [32] for blood cell segmentation. Semantic segmentation of WBCs with preprocessing stages was proposed in [33], where pixel labeling, color space conversion, fusion of pixels, and unity-mask generation were performed. Similarly, [34] enhances the image by converting the RGB image into the LAB and CMYK models and applies a k-means clustering algorithm to detect acute leukemia. In [35], preprocessing is performed to find the ROI, and images are improved by histogram equalization and Wiener filtering. Leukemia prediction using the HOG descriptor and logistic regression (LR) on the ALL-IDB dataset is proposed in [36]. Histogram thresholding and watershed techniques are used in [37] for the extraction of 2852 color and texture features. A comparative analysis of feature extraction techniques, including the color correlogram, color histogram, color co-occurrence matrix, steerable pyramid, wavelet transform, Gabor wavelet, and Tamura texture features, is performed in [38]. SURF is a descriptor used to mine features from colored query images; [39] combined SURF with a genetic algorithm (GA) to improve performance measures and achieved an accuracy of 92% for feature extraction. Other detectors include SIFT [40], FAST [41], and CenSurE [42]. Hand-crafted features can also be fused to obtain better performance measures [43]. Manually examining and analyzing blood cells is time-consuming, subjective, and labor-intensive.
Traditional methods of segmentation, quantitative and qualitative feature extraction, and feature matching [44] for recognition have achieved acceptable accuracy, but they still have limitations due to poor robustness. Efficient classification based on automatic feature extraction is provided by convolutional neural networks [22, 45,46,47,48,49,50]. In deep learning [19, 20, 43, 48, 51, 52], pre-trained models such as AlexNet, GoogLeNet [53], ResNet [54], VGG-16 [33], Inception V3 [55], and many others are used as feature extractors and selectors. Ahmed et al. suggested a method of WBC feature extraction using the powerful CNN architecture VGGNet, where the extracted features are then filtered by the statistically enhanced salp swarm algorithm (SESSA) [56]. The merits and drawbacks of the existing works are listed in Table 1.

Table 1 Merits and demerits of the existing methods

Proposed methodology

The proposed two-phase technique is shown in Fig. 1. The number of input images is increased by applying a GAN model, and the enlarged dataset is supplied to the two-phase model. In phase I, classification is performed using a fusion of deep models. The classified images are passed as input to the next phase, in which the required regions are segmented in two different ways: a statistical segmentation method based on a color-based morphological thresholding approach, and a deep semantic neural network.

Fig. 1 Main steps of the proposed work

Dataset preparation

In this work, two publicly available blood smear image datasets are utilized for the evaluation of the proposed work: LISC and ALL-IDB [60,61,62]. The LISC dataset was collected at the Hematology-Oncology and BMT Research Center in Tehran, Iran. It includes hematological images taken from 400 samples of 100 microscopic slides, together with 250 ground-truth images. The images are in .bmp format with a size of \(720\times 576\) pixels, as shown in Fig. 2. Hematologists divided the dataset into five sub-classes with the following numbers of images: 52 lymphocytes, 48 monocytes, 50 neutrophils, 39 eosinophils, and 53 basophils [63].

Figure 2a shows random blood smear images of the LISC dataset covering its five classes, and Fig. 2b shows the ground-truth images corresponding to the original images. ALL-IDB has two subsets, ALL-IDB1 and ALL-IDB2. The blood smear images were captured with an optical laboratory microscope and a Canon G5 camera. The images are in .jpg format with a resolution of \(2592\times 1944\) pixels, and all of them are three-channel RGB images. ALL-IDB1 contains 107 images acquired from 510 samples, and ALL-IDB2 contains 260 images acquired from 130 blood samples of patients. Both subsets are further divided into two subfolders: normal (healthy) cells and immature/blast cells (leukemia), as shown in Fig. 3. ALL-IDB1 contains 74 images of normal cells and 33 images of blast cells, while ALL-IDB2 contains 130 images of normal cells and 130 images of blast cells.

Fig. 2 Sample images of the LISC dataset. a Normal images. b Ground truth images [63]

Fig. 3 Sample images of the ALL-IDB dataset. a Normal cells. b Blast/immature cells (leukemia) [60,61,62]

Figure 3a shows healthy blood smear images of the ALL-IDB dataset, and Fig. 3b shows images with abnormal growth of WBCs. The LISC dataset consists of 242 images across five sub-types, and ALL-IDB contains 367 images across two sub-types. These images are not sufficient for good classification accuracy; therefore, a GAN is proposed to generate synthetic blood smear images and improve classification performance.

Generative adversarial network (GAN)

A GAN is one of the well-known and significant types of deep-learning networks [64]. It consists of a generator and a discriminator as the front end and back end of the network, as shown in Fig. 4. The generator maps random vector values (noise) into generated samples, while the discriminator tries to distinguish real images of the corresponding classes from the generated synthetic images. The parameters used to configure and train the GAN network are listed in Table 2. Using the GAN deep model, the LISC dataset size increases from 242 to 11,920 images, such that the basophil class contains 2160 images, neutrophil 2488 images, eosinophil 2488 images, lymphocyte 2480 images, and monocyte 2304 images. The ALL-IDB dataset size also increases from 367 to 1187 images using the GAN network; only ALL-IDB2 is augmented to achieve the highest accuracy, i.e., the blast-cell and normal-cell classes contain 390 images each for classification.

Fig. 4 Flow diagram of the GAN training

Table 2 List of used parameters for the training of GAN model

In Table 2, the parameters used for GAN network training are given with their respective values. The GAN is trained using a deep stack of layers, where the input images and random noise are loaded into the network. The size of an input image is \(64\times 64\times 3\). A mini-batch queue is used to manage and process the mini-batches of images, and a “preprocessMiniBatch” function is used to rescale the images to the [-1, 1] range. 10,000 epochs with 30,000 iterations are selected because the dataset used is large and these epochs are sufficient for forward and backward passes over all the training data. To balance the learning of the generator and the discriminator, a flip factor of 30% is used to add noise to the real data labels; this flips 30% of the real labels, i.e., 15% of the total labels, while training the discriminator network. The gradient decay factor is set to 0.5 because it reduces the training error of the model and acts as a regularization strategy. Images produced by the generator are displayed after every 100 iterations. A GPU is required to train the GAN model; the “OutputEnvironment” option is set to “auto”, and the mini-batch outputs are converted to “dlarray” objects. The reported training time of the model is 5 min and 26 s. The GAN comprises two sub-networks, the generator and the discriminator, described below.
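The training configuration described above can be reproduced with the Deep Learning Toolbox along the following lines. This is a minimal sketch, not the authors' code: the dataset folder, mini-batch size, and learning rate are assumptions, while the flip factor, gradient decay factor, and rescaling to [-1, 1] follow Table 2 and the text (minibatchqueue requires a recent toolbox release).

```matlab
% Sketch of the GAN data pipeline; assumptions are noted in the comments.
imds    = imageDatastore('LISC_images', 'IncludeSubfolders', true, ...
                         'LabelSource', 'foldernames');      % path is illustrative
augimds = augmentedImageDatastore([64 64], imds);             % 64x64x3 input images

miniBatchSize = 128;      % assumed
flipFactor    = 0.3;      % flip 30% of real labels while training the discriminator
gradDecay     = 0.5;      % gradient decay factor from Table 2
learnRate     = 2e-4;     % assumed Adam learning rate

% Mini-batch queue that concatenates images and rescales them to [-1, 1],
% matching the tanh output range of the generator.
mbq = minibatchqueue(augimds, ...
    'MiniBatchSize',     miniBatchSize, ...
    'PartialMiniBatch',  'discard', ...
    'MiniBatchFcn',      @preprocessMiniBatch, ...
    'MiniBatchFormat',   'SSCB', ...
    'OutputEnvironment', 'auto');   % uses the GPU automatically when available

function X = preprocessMiniBatch(data)
    % Concatenate the mini-batch and rescale pixel values to [-1, 1].
    X = cat(4, data{:});
    X = rescale(X, -1, 1, 'InputMin', 0, 'InputMax', 255);
end
```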

Generator

The generator takes as input a vector of random values and creates data with a structure similar to the training data, as shown in Fig. 5. In the proposed model, the generator consists of 13 layers, of which the first is a project-and-reshape layer (“projectAndReshapeLayer”) that converts the noise array to a size of \(7\times 7\times 128\). It is followed by three blocks, each consisting of a transposed convolution layer, a ReLU layer, and a batch-normalization layer. Each block uses a stride of 2 with output cropping and a decreasing number of filters, which up-scales the resulting arrays toward the specified output size of \(32\times 32\times 3\). A tanh layer is the network's final layer and generates the output.
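A layer stack of this shape can be assembled as below. This is only an illustrative sketch: projectAndReshapeLayer is a custom layer shipped with the MATLAB GAN example, not a built-in layer, and the filter size, latent-vector length, and filter counts are assumptions; only the 7×7×128 projection, the three transposed-convolution blocks, and the tanh output follow the description above.

```matlab
numLatentInputs = 100;           % assumed length of the random noise vector
filterSize      = 5;             % assumed
projectionSize  = [7 7 128];     % projection size given in the text

layersGenerator = [
    imageInputLayer([1 1 numLatentInputs], 'Normalization', 'none', 'Name', 'noise')
    projectAndReshapeLayer(projectionSize, numLatentInputs, 'proj')   % custom layer
    transposedConv2dLayer(filterSize, 128, 'Name', 'tconv1')
    reluLayer('Name', 'relu1')
    batchNormalizationLayer('Name', 'bn1')
    transposedConv2dLayer(filterSize, 64, 'Stride', 2, 'Cropping', 'same', 'Name', 'tconv2')
    reluLayer('Name', 'relu2')
    batchNormalizationLayer('Name', 'bn2')
    transposedConv2dLayer(filterSize, 32, 'Stride', 2, 'Cropping', 'same', 'Name', 'tconv3')
    reluLayer('Name', 'relu3')
    batchNormalizationLayer('Name', 'bn3')
    transposedConv2dLayer(filterSize, 3, 'Stride', 2, 'Cropping', 'same', 'Name', 'tconv4')
    tanhLayer('Name', 'tanh')];

netG = dlnetwork(layerGraph(layersGenerator));   % trainable generator network
```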

Fig. 5 Layered architecture of the generator model

Discriminator

In the proposed model, the discriminator network consists of 14 layers, of which the first is a dropout layer. The network takes an input image of size \(64\times 64\times 3\) and uses a series of convolution, leaky ReLU, and batch-normalization layers. The dropout probability is set to 0.5 for the real images and the noisy synthetic images produced by the generator. A padding of 2 is applied at each edge of the convolution layers, and a scale of 0.2 is used for each leaky ReLU layer. Finally, a convolution layer is used as the output layer, and the discriminator returns a scalar prediction of real or fake.
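A corresponding discriminator sketch is shown below; the filter size and filter counts are assumptions, while the 64×64×3 input, the 0.5 dropout probability, the padding of 2, the 0.2 leak scale, and the scalar convolutional output follow the description above.

```matlab
filterSize = 5;     % assumed
leakScale  = 0.2;   % scale of each leaky ReLU layer

layersDiscriminator = [
    imageInputLayer([64 64 3], 'Normalization', 'none', 'Name', 'in')
    dropoutLayer(0.5, 'Name', 'drop')
    convolution2dLayer(filterSize, 64,  'Stride', 2, 'Padding', 2, 'Name', 'conv1')
    leakyReluLayer(leakScale, 'Name', 'lrelu1')
    convolution2dLayer(filterSize, 128, 'Stride', 2, 'Padding', 2, 'Name', 'conv2')
    batchNormalizationLayer('Name', 'bn2')
    leakyReluLayer(leakScale, 'Name', 'lrelu2')
    convolution2dLayer(filterSize, 256, 'Stride', 2, 'Padding', 2, 'Name', 'conv3')
    batchNormalizationLayer('Name', 'bn3')
    leakyReluLayer(leakScale, 'Name', 'lrelu3')
    convolution2dLayer(filterSize, 512, 'Stride', 2, 'Padding', 2, 'Name', 'conv4')
    batchNormalizationLayer('Name', 'bn4')
    leakyReluLayer(leakScale, 'Name', 'lrelu4')
    convolution2dLayer(4, 1, 'Name', 'conv5')];   % scalar real/fake score

netD = dlnetwork(layerGraph(layersDiscriminator));   % trainable discriminator network
```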

In Fig. 6, the working flow of the discriminator is presented: real and generated data are passed as input, and the discriminator has to validate the images against their corresponding classes.

Fig. 6 Layered architecture of the discriminator model

Deep feature extraction

Feature extraction is the basic step that machine learning and pattern recognition algorithms use to detect and recognize different types of diseases. In this work, two deep pre-trained CNN models are used, i.e., DarkNet-53 and ShuffleNet. Both are pre-trained on the ImageNet database, which covers 1000 object categories. As a result, these networks have learned rich feature representations that transfer to a wide range of databases.

DarkNet-53 is a CNN model that contains 184 layers, i.e., 53 convolution 2D layers, 52 leaky ReLU layers, 52 batch-normalization layers, 23 addition layers, 1 global average pooling layer, and a softmax layer [65]. DarkNet-53 is trained on the LISC and ALL-IDB databases using the transfer-learning concept. The DarkNet-53 model is loaded first; then the dataset path is set and the images are randomly split into 50% for training and 50% for testing. For the feature vector, the 53rd convolution layer (“Conv53”) is used, and activation is performed utilizing the cross-entropy function. After activation, 1024 features are extracted, giving an output feature vector of dimension \(N\times 1024\), where \(N\) denotes the number of training and testing images. The features extracted by the DarkNet-53 model are arranged in a matrix referred to as feature set 1.

ShuffleNet is built around the channel shuffle operation [66]. It has a global average pooling layer of size \(1\times 1\times 544\) followed by a fully connected layer of dimension 1000 and a softmax layer of size \(1\times 1\times 1000\). The network is a CNN model with 172 layers, i.e., 1 convolution 2D layer, 49 batch-normalization layers, 33 ReLU layers, 48 grouped convolution 2D layers, 13 addition layers, 16 channel-shuffling layers, 3 average pooling 2D layers, 2 depth concatenation layers, 1 global average pooling layer, 1 fully connected layer, and 1 softmax layer.

For the extraction of deep features, the ShuffleNet model is likewise trained on the LISC and ALL-IDB datasets using the transfer-learning concept. The ShuffleNet model is loaded first; then the dataset path is set and the images are randomly split into 50% for training and 50% for testing. On the average pooling layer (node-200) and the fully connected layer (node-202), activation is performed utilizing the cross-entropy function. After activation, 1000 features are extracted, giving an output feature vector of dimension \(N\times 1000\), where \(N\) denotes the number of training and testing images. The features extracted by the ShuffleNet model are arranged in a matrix referred to as feature set 2. A sketch of the extraction step for both models is given below.
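The following minimal sketch assumes the Deep Learning Toolbox model support packages for DarkNet-53 and ShuffleNet are installed. The dataset path is illustrative, and the layer names ('conv53', 'node_202') are taken from the text; exact names may differ between toolbox releases.

```matlab
imds = imageDatastore('ALL_IDB2_GAN', 'IncludeSubfolders', true, ...
                      'LabelSource', 'foldernames');            % path is illustrative
[imdsTrain, imdsTest] = splitEachLabel(imds, 0.5, 'randomized'); % 50/50 split

netDark = darknet53;      % pre-trained DarkNet-53
netShuf = shufflenet;     % pre-trained ShuffleNet

% Resize images to each network's expected input size.
augDark = augmentedImageDatastore(netDark.Layers(1).InputSize(1:2), imdsTrain);
augShuf = augmentedImageDatastore(netShuf.Layers(1).InputSize(1:2), imdsTrain);

% N x 1024 activations from the deep convolutional layer of DarkNet-53 (feature set 1)
featSet1 = activations(netDark, augDark, 'conv53', 'OutputAs', 'rows');

% N x 1000 activations from the final layers of ShuffleNet (feature set 2)
featSet2 = activations(netShuf, augShuf, 'node_202', 'OutputAs', 'rows');
```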

Feature selection

Selecting a subset of features, known as feature selection, reduces the dimensionality of the data. PCA [67] is a method for dimensionality reduction from a d-dimensional space. Because of correlations between variables, much of the variation in the data is redundant; PCA creates new variables called principal components (PCs), which are uncorrelated and ordered by the fraction of the total information each one retains. The PCA function uses the singular value decomposition. The working of PCA in the proposed feature selection method is shown in Fig. 7, and a code sketch follows the steps below:

  • Normalize the data so that all attributes have the same range, using the mean vector Ī.

  • Initialize the \(n\) data vectors: 100 vectors for ALL-IDB1 and 500 vectors for ALL-IDB2 and LISC, drawn from the data spaces of \(N\times 1024\) and \(N\times 1000\).

  • Compute the covariance matrix of the vectors and retain \(k\) principal components, with \(k\le n\).

  • Compute the eigenvectors of the covariance matrix.

  • Select feature values according to their score, projecting from the higher-dimensional space \({N}^{K}\) to the lower-dimensional space \({N}^{R}\).
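A minimal sketch of these steps with the pca function of the Statistics and Machine Learning Toolbox is given below, continuing from featSet1 and featSet2 above. The function performs the mean-centering and singular value decomposition internally; how the selected components are split between the two feature sets is not fully specified in the text, so an equal split is assumed here.

```matlab
k = 250;                              % assumed number of components kept per feature set

[~, score1] = pca(featSet1);          % principal-component scores of the DarkNet-53 features
[~, score2] = pca(featSet2);          % principal-component scores of the ShuffleNet features

selected1 = score1(:, 1:k);           % keep the k highest-variance components
selected2 = score2(:, 1:k);
```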

Fig. 7 Flow diagram of the proposed PCA-based feature selection

In Fig. 7, the working flow of the PCA-based feature selection method is presented: the two feature sets, i.e., feature set 1 and feature set 2, are passed as input, and PCA selects the optimal features, which are then used for classification.

Feature fusion

Feature fusion is an essential step in pattern recognition. Here, different feature vectors are concatenated horizontally to obtain a single final feature vector for disease recognition. The key motivation for this step is to put the information of all descriptors into one feature vector, which can help minimize the error rate. The fusion process used in this work is shown in Fig. 8, which illustrates its internal details.

Fig. 8 Flow diagram of the proposed feature fusion method

In Fig. 8, the fusion of the PCA-selected features is presented. The feature vectors extracted from DarkNet-53 and ShuffleNet are plotted before the fusion process. PCA selects 200 features for ALL-IDB1 and 500 features each for ALL-IDB2 and the LISC dataset. The final fused feature vector for ALL-IDB1 has dimensions \(N\times 201\), while the fused feature vectors for ALL-IDB2 and LISC have dimensions \(N\times 501\), where \(N\) is the number of image observations and one additional column represents the class labels. A minimal sketch of this step follows.
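The sketch below continues from the PCA step and shows the serial (horizontal) concatenation with the label column appended; the use of grp2idx to encode the class labels numerically is an assumption.

```matlab
fusedFeatures = [selected1, selected2];          % serial (horizontal) concatenation
labels        = grp2idx(imdsTrain.Labels);       % numeric class labels
fusedVector   = [fusedFeatures, labels];         % N x (features + 1 label column)
```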

Classification of leukemia and different types of white blood cells

In machine learning, classification is used to predict class labels from the given input data. For leukocyte classification, SVM [68], KNN [69], ensemble, decision tree [70], and Naïve Bayes [71] classifiers are utilized for both the LISC and ALL-IDB datasets, for example as sketched below.
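The following sketch shows how the compared classifiers could be trained on the fused features with the Statistics and Machine Learning Toolbox; fitcensemble with the 'Subspace' method and KNN learners corresponds to the ensemble subspace KNN, and fitcecoc wraps binary SVMs for the multi-class LISC problem. Hyper-parameters are left at their defaults and are assumptions.

```matlab
Y = imdsTrain.Labels;                 % class labels of the training images

mdlEns  = fitcensemble(fusedFeatures, Y, 'Method', 'Subspace', 'Learners', 'knn');
mdlKNN  = fitcknn(fusedFeatures, Y, 'Distance', 'cosine');     % cosine KNN
mdlTree = fitctree(fusedFeatures, Y);                          % decision tree
mdlNB   = fitcnb(fusedFeatures, Y);                            % Naive Bayes
mdlSVM  = fitcecoc(fusedFeatures, Y);                          % multi-class SVM (ECOC)

% Fivefold cross-validation accuracy of the ensemble subspace KNN model.
cvEns = crossval(mdlEns, 'KFold', 5);
acc   = 1 - kfoldLoss(cvEns);
```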

Segmentation of white blood cells

Two proposed strategies, a statistical morphological approach and a semantic segmentation model, are used to segment the images in order to examine the affected region. The statistical approach to segmentation is a traditional technique that requires domain expertise, makes feature extraction difficult, and may lead to inaccurate results, whereas neural-network-based semantic segmentation is an advancement of machine-learning techniques. Semantic segmentation has great potential to produce better results than conventional methods by assigning a label to each pixel and extracting a far richer set of features.

Statistical morphological-based segmentation using color conversion and morphological operation

Because datasets are noisy, unpredictable, and incomplete, preprocessing plays a vital role. To obtain an accurate ROI, the acquired input images must be enhanced, so preprocessing is the first step of segmentation. The proposed color-thresholding method is given in Fig. 9. First, the input image of size \(720\times 576\times 3\) is read from the LISC dataset and resized by a factor of 0.5. The rescaled image is then converted from RGB to HSV, where H (hue) denotes the color, S (saturation) denotes the amount of white mixed with the respective color, and V (value) denotes the brightness of the color. The mathematical formula for the conversion [72] is given below; the R, G, and B values are first divided by 255, giving \({R}^{\prime}\), \({G}^{\prime}\), and \({B}^{\prime}\), to convert the pixel range from [0, 255] to [0, 1].

Fig. 9 Flow diagram of proposed statistical morphology-based segmentation using color conversion and morphological operations

Equation 1 computes the value (brightness) component as the maximum of the normalized R, G, and B channels; it ranges from 0 to 100%.

$$V={C}_{\mathrm{max}}=\mathrm{max}\left({R}^{\prime},{G}^{\prime},{B}^{\prime}\right)$$
(1)

The color space conversion is performed to improve the contrast with respect to the background. A Gaussian smoothing filter with a sigma value of 4 is applied to suppress noise. All three color channels are extracted to apply a color-based threshold with an optimized threshold value of 0.399. After evaluating the threshold conditions, three morphological operations, erosion, opening, and filling, are applied with a disk-shaped structuring element. The erosion and filling results are combined to obtain the final segmentation, as sketched below.
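The pipeline can be sketched with Image Processing Toolbox functions as follows. The sigma of 4, the 0.399 threshold, and the disk-shaped structuring element come from the text; the channel that is thresholded, the disk radius, and the way the erosion and filling results are combined are assumptions.

```matlab
I   = imread('lisc_sample.bmp');        % illustrative 720x576x3 LISC image
I   = imresize(I, 0.5);                 % rescale by a factor of 0.5
hsv = rgb2hsv(I);                       % RGB to HSV conversion

S   = imgaussfilt(hsv(:,:,2), 4);       % Gaussian smoothing (sigma = 4) of the saturation channel
BW  = S > 0.399;                        % color-based threshold (channel choice assumed)

se      = strel('disk', 5);             % disk-shaped structuring element (radius assumed)
BWerode = imerode(BW, se);              % erosion
BWopen  = imopen(BWerode, se);          % opening
BWfill  = imfill(BWopen, 'holes');      % filling
mask    = BWerode | BWfill;             % combination of erosion and filling results
```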

Semantic segmentation using convolutional neural network

The semantic segmentation model uses the DeepLab V3+ network as the base model for segmentation [73], with weights initialized from the pre-trained ResNet-18 network [74], as given in Table 3. ResNet-18 serves as the backbone of DeepLab V3+; it is a compact network well suited to applications with limited processing resources. In the proposed CNN-based semantic segmentation, a DAG network is built from DeepLab V3+ and ResNet-18. The LISC dataset images are loaded using an image datastore containing 200 gray-scale images with their ground truths for training the model. The size of each input image is \(300\times 300\) pixels. The DeepLab V3+ network consists of 206 layers and 227 connections, whereas ResNet-18 returns a layer graph object with 72 layers. The proposed model ends with a softmax layer for segmentation of the blood cells. The depth of the network is exploited by down-sampling and up-sampling the input images during training. A set-up sketch is given below.
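A set-up sketch using the Computer Vision Toolbox function deeplabv3plusLayers is shown below; the dataset paths, class names, label IDs, and training options are placeholders or assumptions, while the 300×300 input size and the ResNet-18 backbone are as reported (the parameters actually used are listed in Table 3).

```matlab
imageSize  = [300 300 3];                      % 3-channel input assumed by the ResNet-18 backbone
classNames = ["WBC" "background"];             % illustrative class names
labelIDs   = [255 0];                          % assumed pixel values in the ground-truth masks

imdsTrain = imageDatastore('LISC_train_images');                         % paths are illustrative
pxdsTrain = pixelLabelDatastore('LISC_train_masks', classNames, labelIDs);

lgraph = deeplabv3plusLayers(imageSize, numel(classNames), 'resnet18');  % DeepLab V3+ DAG network

opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-3, ...              % assumed; see Table 3 for the values actually used
    'MaxEpochs',        30, ...
    'MiniBatchSize',    8, ...
    'Shuffle',          'every-epoch');

netSeg = trainNetwork(pixelLabelImageDatastore(imdsTrain, pxdsTrain), lgraph, opts);
```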

Table 3 List of used parameters for the training of the semantic model

In Table 3, the important parameters are listed with their respective values. The network is returned as a CNN model with 62 convolution 2D layers, 57 ReLU layers, 61 batch-normalization layers, 1 max-pooling layer, 2 transposed convolution 2D layers, 2 depth concatenation layers, 16 addition layers, 2 crop-2D layers, and 1 softmax layer for training. The softmax output is the last layer used for segmentation. Figure 10 shows the encoder-decoder design of the DAG network:

Fig. 10 Proposed semantic segmentation model

In Fig. 10, the overall flow diagram of the CNN-based semantic segmentation using DeepLab V3+ and ResNet-18 is presented.

Result and discussion

The outcomes of the experiments are explained in this section. This study is evaluated through two experiments. Experiment I is implemented to assess the performance of the classification task, and the classified images are segmented using the statistical method and the semantic convolutional neural network in Experiment II. The efficiency of the proposed system is checked in terms of the Jaccard coefficient (JAC), Dice similarity coefficient (DSC), specificity, sensitivity, precision, and accuracy [75] for segmentation. For the comparison and analysis of the proposed classification technique, five different classifiers are used: decision tree, Naïve Bayes, SVM, KNN, and ensemble. For the implementation of the proposed work, MATLAB 2020a is utilized with a GPU. First, classification is performed by partitioning both datasets into a 50% training and 50% testing set for holdout validation. As the LISC dataset is large, training on the fused features is also evaluated with fivefold cross-validation. For the deep features, the cross-entropy activation function is utilized. Second, segmentation is performed using 70% of the data for training and 30% for testing. Performance is evaluated using the Jaccard index, Dice coefficient, intersection over union (IoU), global accuracy, sensitivity (Sey), specificity (Spy), F-measure, and precision (Pn).

Experiment#1 classification of WBCs

For classification, both the LISC and ALL-IDB datasets are used. The fused feature set is passed as input to different classifiers, and the ensemble subspace KNN achieves the highest accuracy, as given in Table 4.

Table 4 ALL-IDB1 dataset results of DarkNet-53 + ShuffleNet

In Table 4, the highest accuracy, obtained by the ensemble subspace KNN classifier, is reported. An overall ACC of 100% is achieved with 1.00 AUC, 1.00 Sey, 1.00 Spy, 1.00 Pn, and a 1.00 F1-score.

In Table 5, the highest accuracy of ensemble subspace KNN is presented. The overall ACC of 100% was achieved with 1.00 AUC, 1.00 Sey, 1.00 Spy, 1.00 Pn, and 1.00 F1-score. Table 6 reports the analysis of the classification on the LISC database.

Table 5 ALL-IDB2 dataset results of DarkNet-53 + ShuffleNet
Table 6 LISC dataset results of DarkNet-53 + ShuffleNet

In Table 6, the highest accuracy, obtained by the cosine KNN classifier, is reported. An overall accuracy of 99.7% is achieved with 1.00 AUC, 0.99 Sey, 0.99 Spy, 0.99 Pn, and a 0.99 F1-score. The fused feature vector produced strong results and achieved the highest accuracy on the GAN-synthesized dataset, as given in Table 7.

Table 7 Results comparison for real and synthesized dataset

In Table 7, the comparison between the real and synthesized datasets is presented; it shows that the synthesized dataset achieves better accuracy with the fused vector than the original data. A brief comparison with existing methods is presented in Table 8.

Table 8 Comparison of the segmentation outcomes

In Table 8, recently proposed algorithms are listed for comparison. Compared with existing classification methodologies, the proposed technique is based on the fusion of pre-trained DarkNet-53 and ShuffleNet features. ShuffleNet provides channel shuffling, which reduces the computational cost, and DarkNet-53 is a very powerful feature extractor; because the blood smear images are blurred and light-colored, the two models are fused, which significantly improves performance.

Experiment#2 segmentation of WBCs

Segmentation is applied to the classified LISC dataset with its ground truths. Because statistical morphology-based segmentation using color transformation and morphological operations is a manual method, all images have to be processed one by one, and no training or testing split is needed. The overall accuracy is given in Table 9 and the segmented images are presented in Fig. 11.

Table 9 Segmentation results of statistical morphology-based segmentation using color conversion and morphological operations

In Table 9, the accuracy measured by the Jaccard and Dice indices is presented; the average accuracy achieved is more than 90% for every class except lymphocytes.
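The per-image scores in Table 9 and Fig. 11 can be computed with the Image Processing Toolbox functions jaccard and dice; mask and gt below stand for a binary segmentation output and its ground truth, and the file names are placeholders.

```matlab
mask = imread('segmented_output.png') > 0;   % binary segmentation result
gt   = imread('ground_truth.bmp')     > 0;   % binary ground-truth mask

jac = jaccard(mask, gt);   % Jaccard index (intersection over union)
dsc = dice(mask, gt);      % Dice similarity coefficient
```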

In Fig. 11, the segmented output images are presented. The similarity is measured by the Jaccard index and the Dice coefficient, and the results are shown next to each output image.

Fig. 11 Segmentation results. a Input image. b Ground truth. c Segmented output. d Jaccard index. e Dice index

For the CNN-based semantic segmentation, the dataset is prepared by dividing it into training and testing sets. The training set (70%) contains 200 images and the testing set (30%) contains 50 images of all types of leukocytes from the LISC dataset. The proposed DeepLab V3+ and ResNet-18 based semantic segmentation achieves a global accuracy of 98.6%, a mean accuracy of 99.1%, a mean IoU of 98.1%, a weighted IoU of 98.0%, and an F1-score of 99.2%. The evaluation results using different performance measures are given in Table 10, and the segmented output is shown in Fig. 12; a sketch of the evaluation step is given below.
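A sketch of how these measures can be obtained for the trained network on the 30% test split is shown below; the dataset paths are placeholders, and netSeg, classNames, and labelIDs continue from the training sketch above.

```matlab
imdsTest = imageDatastore('LISC_test_images');
pxdsTest = pixelLabelDatastore('LISC_test_masks', classNames, labelIDs);

% Run the trained network on the test images and compare against the ground truth.
pxdsPred = semanticseg(imdsTest, netSeg, 'WriteLocation', tempdir);
metrics  = evaluateSemanticSegmentation(pxdsPred, pxdsTest);

disp(metrics.DataSetMetrics)   % GlobalAccuracy, MeanAccuracy, MeanIoU, WeightedIoU, MeanBFScore
```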

Table 10 Segmentation results of the deep semantic model
Fig. 12 Semantic segmentation results. a Input image. b Ground truth. c Segmented output

In Fig. 12, the pink highlight represents the output of the semantic segmentation. The performance of the proposed model is compared with existing work in Table 11:

Table 11 Results comparison

In Table 11, the results achieved by existing techniques are compared with those of the proposed model. One existing work detects WBCs with 88.1% accuracy [79]. A semantic segmentation approach for the recognition and localization of leukocytes using DeepLab V3+ with ResNet-50 is presented in [77]; this algorithm obtained a 98.21% mean accuracy and an 84.22% mean IoU for segmentation. Its accuracy is higher than existing methods, but it is computationally expensive. Another semantic segmentation method, offered in [57], uses the encoder-decoder architecture of DeepLab V3+; its training is based on ResNet-50 as the pre-trained backbone, which is used to overcome the problem of degrading accuracy. Compared with these existing works, this study presents a new improved model based on DeepLab V3+ and ResNet-18, with training performed using fine-tuned learning parameters. The performance of the proposed method is evaluated on three benchmark datasets, achieving a global accuracy of 98.60% and a mean IoU of 98.10% on the LISC database. The obtained quantitative findings show that the suggested methodology outperforms previously published studies.

Conclusion

Accurate and precise segmentation and classification of leukocytes is a difficult process, and the flexible structure of the WBC nucleus complicates the detection of leukemia. To overcome these challenges, a two-phase method is proposed. In the first phase, a GAN network is developed with selected learning parameters for synthetic data generation to increase the size of the datasets. Deep features are extracted from pre-trained deep models, i.e., DarkNet-53 and ShuffleNet. Optimal features are then selected by PCA and fused serially to create a fused feature vector. The fused features are supplied to SVM, KNN, Naïve Bayes, decision tree, and ensemble classifiers, of which the ensemble subspace KNN and cosine KNN provided the best results. The classified images are segmented using two frameworks, i.e., segmentation based on the statistical morphological method and semantic segmentation using DeepLab V3+ with ResNet-18. The classification results are 100.0% for ALL-IDB and 99.70% for the LISC dataset. The statistical morphology-based segmentation is compared with the ground truth and achieved an average accuracy of 85.95% (excluding lymphocytes). The semantic segmentation achieved 0.986 global ACC, 0.991 mean ACC, 0.981 mean IoU, 0.980 weighted IoU, and a 0.992 F1-score. The comparison confirms that the results of the proposed model are better than those of recently published work in this domain. In the future, this work will be extended to explore quantum computing algorithms for more precise detection of WBCs.