A2M-LEUK: attention-augmented algorithm for blood cancer detection in children

Leukemia is a malignancy that affects the blood and bone marrow. Its detection and classification are conventionally done through labor-intensive, specialized methods, and diagnosing blood cancer in children demands high precision and accuracy. This study proposes an approach that combines attention mechanism-based machine learning with image processing techniques for the detection and classification of leukemia cells. The proposed attention-augmented algorithm for blood cancer detection in children (A2M-LEUK) leverages attention mechanisms to improve detection. A2M-LEUK was evaluated on a dataset of blood cell images and achieved Precision = 99.97%, Recall = 100.00%, F1-score = 99.98%, and Accuracy = 99.98%. These results indicate the high accuracy and sensitivity of the proposed approach in identifying and categorizing leukemia, and its potential to reduce the workload of medical professionals and improve the diagnosis and treatment of leukemia in children.


Introduction
Leukemia, a type of blood cancer, originates in the bone marrow and generates an excessive amount of abnormal blood cells, commonly known as blasts or leukemia cells [1]. These immature blood cells cause various symptoms, including bleeding, bruising, bone pain, fatigue, fever, and an increased susceptibility to infections due to a deficiency of normal blood cells [2]. Typically, a bone marrow sample or blood tests are required to confirm the diagnosis of leukemia [1].
Although the precise causes of leukemia remain unknown, it is believed to be the outcome of a combination of genetic and environmental factors [2]. An essential challenge in diagnosis is the accurate identification of malignant leukocytes in the early stages of leukemia using low-cost methods [3]. Flow cytometry devices are scarce, and the procedures available in diagnostic laboratories are time-consuming [3]. Leukemia is the most common type of blood cancer in people of all ages, particularly children [1].
Abnormal proliferation and immature growth of blood cells can cause leukemia, leading to harmful effects on red blood cells, bone marrow, and the immune system [4]. In the USA, leukemia accounts for more than 3.5 percent of all new cancer diagnoses, with over 60,000 new cases recorded in 2018 [5]. Malignant white blood cells, also known as lymphoblasts, can spread to other organs through the bloodstream, where they metastasize to vital bodily tissues [4][5][6].
Leukemia is a type of cancer that affects the blood-forming tissues of the bone marrow and lymphatic system. It is a prevalent type of cancer, with an estimated 437,033 new cases and 309,006 deaths reported worldwide in 2020 alone. The traditional methods of detecting and classifying leukemia are time-consuming and require expert knowledge, making it a challenging task for medical professionals. In recent years, attention mechanism-based machine learning has emerged as a promising technique for medical diagnosis, including cancer detection and classification.
Differentiation and diagnosis of various types of leukemia can be carried out by hematologists in cell transplant facilities based on microscopic images. Properly stained slides can aid in distinguishing some types of leukemia, while more advanced technology may be required to determine the underlying leukemia [7]. The most common types of leukemia can be identified on stained slides, as shown in Fig. 1. Leukemia is classified into four major categories, namely acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), chronic myeloid leukemia (CML), and chronic lymphocytic leukemia (CLL), in addition to several less common types [4].
Acute lymphoblastic leukemia (ALL) causes the bone marrow (spongy tissue in bones) to produce too many immature white blood cells (lymphoblasts). These aberrant cells crowd out healthy red and white blood cells and platelets in the blood and bone marrow, making it harder for the body to fight infection and disease. The disease is called "acute" because it advances rapidly and severely. ALL can swiftly spread to the lymph nodes, liver, spleen, brain, spinal cord, and testicles, and it can be fatal if left untreated. ALL is the most frequent childhood malignancy and the most frequent childhood leukemia. Acute lymphoblastic leukemia in children can usually be cured. In adults it is rarer and harder to treat, although adults can also be cured with treatment.
Figure 2 illustrates the acute lymphoblastic leukemia (ALL) statistics. Around 54% of acute lymphoblastic leukemia develops in children and adolescents younger than 20 years of age. It is most prevalent between the ages of 15 and 50 [1].
Broadly speaking, research on leukemia is divided into two categories: basic research and clinical or translational research. Clinical or translational research aims to understand the disease in a precise and immediately applicable way, such as testing a new medicine on human subjects. In contrast, basic science research examines the disease process from a distance, for example by investigating whether a suspected carcinogen can cause leukemic changes in isolated cells in a laboratory or by studying how the DNA changes in leukemia cells as the disease progresses. Although the results of basic research may not be immediately useful for patients, it can lead to earlier detection of the disease and therefore better outcomes [2].
Diagnosing leukemia can be challenging because the symptoms are often mild in the early stages. While microscopic examination of a peripheral blood smear (PBS) is the most commonly used method for diagnosis [6][7][8][9], obtaining and analyzing bone marrow samples is the gold standard. In recent years, machine learning (ML) tools have been used to analyze laboratory blood smear images for diagnosing, distinguishing, and counting cells in different types of leukemia. These studies aim to overcome the limitations of late diagnosis and improve identification of leukemia subtypes [10,11].
The traditional methods have limitations in analyzing massive and complex data sets. In contrast, machine learning (ML) algorithms [7] have been shown to be an ideal tool for dealing with vast amounts of complex data, making them useful in understanding and combating disease. Medical practitioners traditionally assess diagnostic tests and patient data based on their years of medical education and training. However, recent studies have shown that machine learning algorithms are comparable to professionals in several tasks, including initial diagnosis, prognosis estimation, prediction of treatment problems, and relapse tracking in hematologic malignancies.
Two decades ago, studies began to investigate the potential of ML methods in diagnosing hematologic malignancies by using flow cytometry and analyzing genetic data. ML is a well-known branch of artificial intelligence that consists of algorithms and mathematical relationships. It has been rapidly integrated into clinical research and enables computers to learn from data without explicit programming. The incorporation of ML technologies into medical data processing has produced significant results and has proven to be effective in illness diagnosis. Research indicates that ML approaches significantly enhance complex medical decision-making processes in medical image processing by extracting and assessing image properties [12][13][14][15][16][17].
The difficulty in classifying leukemia-free and leukemia-affected images lies in identifying the subtle visual differences between these two types of images. Leukemia-affected images may contain abnormal cells, which may appear similar to healthy cells, making it challenging to differentiate between the two. Additionally, the appearance of abnormal cells may vary greatly depending on the stage and type of leukemia. Therefore, accurately classifying these images requires a combination of advanced imaging techniques and machine learning algorithms that can effectively analyze and classify the visual features of these images.
The main contributions of this paper are:
1. Proposing a novel approach that leverages attention mechanism-based machine learning in conjunction with image processing techniques for the precise detection and classification of leukemia cells in children.
2. Developing a highly accurate and sensitive algorithm for the detection of leukemia cells in children.
3. Demonstrating the potential of the proposed A2M-LEUK algorithm in improving the diagnosis of leukemia in children, by reducing the workload of medical professionals and improving the accuracy and efficiency of the diagnostic process.
4. Providing a promising approach for accurate and efficient detection and classification of leukemia cells, which could potentially improve the diagnosis and treatment of leukemia in children.
5. Potential to reduce workload: By utilizing attention mechanism-based machine learning in conjunction with image processing techniques, the proposed A2M-LEUK algorithm has the potential to reduce the workload of medical professionals, streamline the diagnostic process, and ultimately improve patient outcomes.
In summary, the proposed A2M-LEUK algorithm is an innovative and promising method for the accurate and efficient detection of leukemia cells in children. Its potential to reduce the workload of medical professionals and improve the diagnosis and treatment of leukemia makes it a significant contribution to the field of medical diagnostics.
The structure of the remainder of the study is as follows: Sect. 2 presents recent studies related to the identification and categorization of leukemia. Section 3 outlines the suggested approach, while Sect. 4 provides an assessment of the experimental findings. The work concludes in Sect. 5.

Related work
Leukemia is a type of cancer that affects blood cells and bone marrow. It is a complex disease that can be difficult to diagnose and treat. In recent years, machine learning has emerged as a promising tool for predicting and classifying leukemia. In this section, we review the state-of-the-art in machine learning techniques for leukemia prediction and classification. We discuss the strengths and weaknesses of different approaches and highlight areas for future research.
Muhammad et al. [18] extracted deep features for enhanced classification using a VGG16 model augmented with efficient channel attention (ECA). The ECA module is intended to reduce the visual overlap between normal cells and leukemic blasts. On the C-NMC dataset, the model reached a diagnostic accuracy of 91.1%. Niranjanja et al. [19] trained the ALLNET model and evaluated its performance after converting the images to the HSI color space and segmenting the white blood cells (WBCs). The findings indicated that the model was 95.54 percent accurate and 95.91 percent sensitive. Sorayya et al. [20] fine-tuned the weights and parameters of the ResNet50 and VGG16 models for training on the ALL dataset. In addition, they proposed six distinct machine learning approaches and a convolutional network with ten convolution layers and a classification layer. The convolutional network obtained an accuracy of 82.1 percent, while the VGG16 network achieved an accuracy of 84.6 percent. Among the machine learning approaches, random forest (RF) provided the best accuracy, 81.72 percent. Rana et al. [21] applied heat mapping and a PCA assessment to the prediction of whole blood cell counts by an artificial neural network (ANN), which improved the accuracy of leukemia sample diagnosis based on morphometric parameters. Once cell population data were entered into a heat map, a cluster was generated. This cluster identified bone marrow from lymphoblastic leukemia. The network achieved an accuracy of 89.4%. Tulasi et al. [22] segmented and categorized ALL cells using the GBHSV-Leuk method, which consists of two stages: the images are first enhanced using a Gaussian blurring filter, and a hue saturation value technique is then used to separate the cell from the rest of the image. The computed accuracy of the GBHSV-Leuk method is 95.41%. Luis et al. [23] created the LeukNet model, which is based on the VGG16 model but with thinner layers. The network parameters were adjusted to analyze the leukemia dataset, resulting in an accuracy of 82.46 percent. Mohamed et al. [24] developed a DNN hybrid network using CLL MRD data from 202 F-DNN patients and 138 L-DNN patients. CLL MRD identification by DNN was 97.1% accurate, highlighting its advantage over previous approaches.
Nada et al. [25] classified acute leukemia using a machine learning system based on a feature-selection strategy that was improved with the gray-wolf method. To improve the images, adaptive thresholding was performed, and then SVM, KNN, and NB were used to classify the data. The SVM has an accuracy of 95%, a sensitivity of 89.5%, and an accuracy of 96%.
Ahmad et al. [26] created four distinct machine learning approaches for the goal of image analysis of the C-NMC dataset for the prediction of leukemia. Using three distinct DNN models, the pictures were improved and their characteristics were extracted. ANOVA was used to analyze the data, and then the random forest approach was employed to choose the features. Compared to the performance of other algorithms, the SVM approach achieved the greatest level of accuracy, 90%. Table 1 summarizes the most common recent studies in blood cancer classification.

A2M-LEUK: attention-augmented algorithm for blood cancer detection in children
The proposed A2M-LEUK algorithm, as shown in Fig. 3, involves the following three main phases:
i. Image Preprocessing: In this phase, the raw blood cell images are preprocessed to remove noise and artifacts, which can interfere with the accurate detection and classification of leukemia cells. This involves operations such as image resizing, normalization, and filtering.
ii. Feature Extraction: Once the images are preprocessed, features are extracted to represent the cell morphology and other characteristics relevant to leukemia classification. These features are calculated using image processing techniques such as edge detection, texture analysis, and shape analysis.
iii. Classification: In this phase, the extracted features are used as input to the attention mechanism-based machine learning algorithm for leukemia cell detection and classification.
The algorithm utilizes attention mechanisms to selectively focus on important features and classify the cells accurately. The proposed A2M-LEUK algorithm achieves remarkable performance metrics, including high precision, recall, F1-score, and accuracy, indicating its high accuracy and sensitivity in detecting and categorizing leukemia cells in children.
Overall, the A2M-LEUK algorithm provides a promising approach for accurate and efficient detection and classification of leukemia cells, which could potentially improve the diagnosis and treatment of leukemia and reduce the workload of medical professionals.
The overall steps of the attention-augmented algorithm for blood cancer detection in children (A2M-LEUK) are depicted in Algorithm 1.

Image preprocessing
The input blood microscopic images are first converted to the RGB color model and then resized to 227 × 227. Data augmentation is then used to compensate for the lack of a large dataset, which deep neural networks need to complete their training and testing phases. The augmentation consists of three operations: translation, reflection, and rotation. In translation, the images are shifted along the X- and Y-axes by values chosen at random from the interval [15, 25]. In reflection, the images are mirrored along the vertical axis. In rotation, the images are rotated right or left by a random angle drawn from a bounded interval with a step equal to five.
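The augmentation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the translation interval is taken to run from 15 to 25 pixels, the rotation bounds (`angle_bounds`) are a hypothetical placeholder because the text does not state them, and all function names are illustrative. Only the rotation angle is sampled here; applying the rotation would normally be left to an image library.

```python
import random


def sample_augmentation(rng, shift_range=(15, 25), angle_bounds=(-30, 30), angle_step=5):
    """Draw one random set of augmentation parameters.

    shift_range is assumed to be the 15-25 interval from the text;
    angle_bounds is a hypothetical placeholder (not given in the source).
    """
    return {
        "dx": rng.randint(*shift_range),   # shift along the X-axis
        "dy": rng.randint(*shift_range),   # shift along the Y-axis
        "flip": rng.random() < 0.5,        # whether to mirror the image
        "angle": rng.randrange(angle_bounds[0], angle_bounds[1] + 1, angle_step),
    }


def translate(image, dx, dy, fill=0):
    """Shift a 2D image (list of rows) by dx columns and dy rows, padding with fill."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def reflect_vertical_axis(image):
    """Mirror a 2D image along its vertical axis (left-right flip)."""
    return [row[::-1] for row in image]
```

A call such as `sample_augmentation(random.Random(0))` yields one parameter set per image; seeding the generator keeps the augmented dataset reproducible.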

Feature extraction
The convolutional neural network (CNN) is one of the most common network designs used in machine learning applications. The capacity of CNNs to complete tasks regardless of tilting, translation, or scaling is the major reason for their success [27]. Convolutional, pooling, and fully connected layers are the three primary types of layers in the CNN architecture, as shown in Fig. 4. Convolutional layers compute the output of neurons by adding the bias to the weighted sum and applying a rectified linear unit (ReLU) as the activation function.
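The computation performed by a convolutional layer, a weighted sum plus a bias passed through ReLU, followed by pooling, can be illustrated with a small pure-Python sketch. The kernel values, the single-channel input, and the function names are assumptions chosen for illustration; the paper does not specify the layer configuration.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Valid 2D convolution (cross-correlation form) followed by ReLU.

    Each output neuron is max(0, bias + sum(kernel * image patch)).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = float(bias)
            for a in range(kh):
                for b in range(kw):
                    total += kernel[a][b] * image[i + a][j + b]
            row.append(max(0.0, total))  # ReLU activation
        out.append(row)
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a square window of the given size."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled
```

In a real network the kernel weights and bias are learned during training, and many kernels are applied in parallel to produce multiple feature maps.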

Classification
As depicted in Algorithm 1, in step 3 of the A2M-LEUK algorithm, the extracted features from the previous phase are fed as input to the attention mechanism-based machine learning algorithm for leukemia cell classification. The attention mechanism is a key component of this algorithm, which selectively focuses on important features to classify the cells accurately.
In sub-step 3.1, the extracted features are processed by the attention mechanism-based machine learning algorithm to classify the leukemia cells. The attention mechanism is a deep learning technique that learns to weight different features according to their importance for the classification task. This helps the algorithm to focus on the most informative features and ignore the irrelevant ones, which can improve the accuracy and robustness of the classification.
In sub-step 3.2, the attention mechanism-based machine learning algorithm applies the learned weights to the extracted features to classify the leukemia cells accurately. The algorithm uses a classification model, such as a neural network or support vector machine, to predict the class of each cell based on the weighted features.
In sub-step 3.3, the classification results are outputted for the leukemia cells. The algorithm outputs a binary classification result indicating whether each cell is classified as leukemia or non-leukemia. The output can be further processed to generate additional information, such as the probability or confidence score of the classification, to aid in the diagnosis and treatment of leukemia. The pseudo-code for this step is shown in Algorithm 2.
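Sub-steps 3.1 to 3.3 can be sketched as follows. This is a hedged illustration of feature-level attention, not a reproduction of Algorithm 2: the softmax scoring, the logistic classifier, and every parameter name (`score_w`, `score_b`, `clf_w`, `clf_b`) are assumptions, and in practice the parameters would be learned from training data.

```python
import math


def softmax(scores):
    """Numerically stable softmax that turns scores into weights summing to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features, score_w, score_b):
    """Sub-step 3.1: one relevance score per feature, normalized with softmax."""
    scores = [w * f + b for f, w, b in zip(features, score_w, score_b)]
    return softmax(scores)


def classify(features, score_w, score_b, clf_w, clf_b, threshold=0.5):
    """Sub-steps 3.2 and 3.3: weight the features, then output label and probability."""
    alpha = attention_weights(features, score_w, score_b)
    weighted = [a * f for a, f in zip(alpha, features)]
    logit = sum(w * x for w, x in zip(clf_w, weighted)) + clf_b
    prob = 1.0 / (1.0 + math.exp(-logit))  # probability of the leukemia class
    label = "leukemia" if prob >= threshold else "non-leukemia"
    return label, prob, alpha
```

Returning the attention weights alongside the label makes it possible to inspect which features drove a given prediction, which is one practical reason to prefer attention over an unweighted feature vector.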

Implementation and experiments
This section describes the implementation of our model, the experiments conducted, and the dataset used.

Dataset
Acute lymphoblastic leukemia (ALL) is the most prevalent form of childhood cancer, accounting for around 25% of all pediatric malignancies [27]. The cells in the dataset have been segmented from microscopic images and are representative of real-world images, since they contain some staining noise and illumination flaws, although these faults were substantially corrected during the acquisition process. Because of morphological similarity, distinguishing immature leukemic blasts from normal cells under the microscope is difficult; therefore, an expert oncologist annotated the ground truth labels. There are a total of 15,135 images from 118 patients, categorized into two groups: normal cell and leukemia blast. Samples from the dataset are shown in Fig. 5.

Implementation and experiments
The proposed method, A2M-LEUK, was implemented in Python, and its performance was evaluated in terms of precision, recall, accuracy, and specificity, which are defined as
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN},$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$
where TP is true positive, TN is true negative, FP is false positive, and FN is false negative. The performance of the proposed method (A2M-LEUK) is compared with the commonly used K-nearest neighbor (K-NN), support vector machine (SVM), random forest (R-Forest), and Naïve Bayes classifiers, as shown in Table 2.
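The evaluation metrics follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions, including the F1-score reported in the results; the function name and the counts in the usage note are hypothetical and are not taken from the paper's experiments.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)            # TP / (TP + FP)
    recall = _ratio(tp, tp + fn)               # TP / (TP + FN), also called sensitivity
    specificity = _ratio(tn, tn + fp)          # TN / (TN + FP)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }
```

For example, with illustrative counts `classification_metrics(tp=8, tn=5, fp=2, fn=1)`, precision is 8/10 and recall is 8/9. Reporting specificity alongside recall matters for an imbalanced dataset, because accuracy alone can hide poor performance on the minority class.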
Bold cells in Table 2 correspond to the best results achieved by our proposed method for the given evaluation metrics. These results indicate that the proposed method (A2M-LEUK) has effectively addressed the challenges posed by the task and has achieved a high level of accuracy in classifying the given dataset. The bold entries in Table 2 also point to the efficacy of the proposed method and its potential for real-world applications in various domains, such as medical diagnosis or autonomous driving. A graphical representation of the results is shown in Fig. 6.
Based on the results presented in Fig. 6, it can be observed that the A2M-LEUK outperformed other classifiers. Accuracy convergence curve for each model is shown in Fig. 7.
Based on the table and the highlighted results, it can be inferred that the proposed method (A2M-LEUK) has achieved the highest performance metrics (precision, recall, F1-score, and accuracy) among all the compared methods (KNN, SVM, R-Forest, and Naïve Bayes) for the given dataset. The high values of precision, recall, and F1-score indicate that the proposed method classifies the dataset effectively, and the accuracy score of 99.98% further supports this. Therefore, it can be concluded that the proposed method has achieved a high level of accuracy in classifying the given dataset.
The suggestion to use the ExtrIntDetect method [28] as an extension of the MetrIntSimil metric in future work is a promising idea for further improving the accuracy and comprehensiveness of the measurement.

Results discussion
The proposed method, A2M-LEUK, outperformed all the previous classifiers in terms of precision, recall, F1-score, and accuracy. A2M-LEUK achieved remarkable performance metrics: Precision = 99.97%, Recall = 100.00%, F1-score = 99.98%, and Accuracy = 99.98%. These results indicate the high accuracy and sensitivity of the proposed approach in identifying and categorizing leukemia, and its potential to reduce the workload of medical professionals and improve the diagnosis of leukemia. In comparison, the other classifiers (KNN, SVM, R-Forest, and Naïve Bayes) achieved lower precision, recall, F1-score, and accuracy. The precision and recall of the proposed algorithm are much higher than those of the other classifiers, indicating its ability to accurately detect and classify leukemia cells. Overall, the results of the study demonstrate the superiority of the proposed A2M-LEUK algorithm in the detection and classification of leukemia cells (Fig. 8).

Conclusions
In conclusion, this study proposed a novel approach for the precise detection and classification of leukemia cells in children using attention mechanism-based machine learning in conjunction with image processing techniques. The proposed algorithm, called attention-augmented algorithm for blood cancer detection in children (A2M-LEUK), leverages attention mechanisms to improve the accuracy and sensitivity of blood cancer detection. The proposed method provides a promising approach for accurate and efficient detection and classification of leukemia cells, which could potentially improve the diagnosis and treatment of leukemia. Overall, A2M-LEUK has shown great promise in improving the diagnosis of leukemia in children and reducing the workload of medical professionals. Future research can explore the scalability and generalizability of the proposed approach to other types of blood cancer and medical imaging modalities. In the future, the proposed algorithm can be used with OCNN [29][30][31][32][33].
Data availability https://www.kaggle.com/datasets/andrewmvd/leukemia-classification/

Conflict of interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Ethical approval There are no ethical conflicts.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.