1 Introduction

Anemia is a global health problem [1]. It particularly affects young children and pregnant women. The World Health Organization estimates that 42% of children under 5 years of age and 40% of pregnant women worldwide are anemic [2].

Anemia is characterized by a decrease in the number of red blood cells or in the hemoglobin level of the blood, and the thresholds used to detect it depend on parameters such as sex, age, pregnancy, and nutrition. Anemia is therefore defined as a hemoglobin value below the appropriate reference range. The measurement of the values related to the cells circulating in the blood is called the complete blood count (hemogram). The complete blood count flags blood values as low or high according to the reference range.

Both the diagnosis and the treatment of anemia are decided by doctors. In order to diagnose anemia accurately, blood tests, radiological images, and other findings must be reviewed by the doctor. Diseases generate large amounts of medical data, from which alternative solutions can be produced, such as detecting diseases at an early stage, prescribing appropriate drugs, and intervening before the disease reaches a critical phase. Consequently, diagnoses for new patients can be supported by the medical data previously obtained from other patients. This helps doctors minimize the margin of error in their diagnoses and supports their decision making. Therefore, the evaluation of data records in health institutions is of great importance for patients and hospitals. However, these processes can be difficult and costly, especially in underdeveloped countries.

There are many studies on designing decision support systems for doctors for new patients by evaluating data records in hospitals with biomedical image processing, biomedical signal processing, biomedical digital data processing, etc. [3,4,5,6]. In image processing studies, medical images (magnetic resonance imaging (MRI), computerized tomography (CT) scans, etc.) have been analyzed and systems have been developed to help doctors make better treatment decisions [7,8,9]. Signal processing studies aim to develop systems that help doctors by analyzing and interpreting medical signals (electrocardiography (ECG), electroencephalography (EEG), etc.) [10, 11]. In studies on digital data processing, digital data (blood count, C-reactive protein (CRP) level, etc.) from patients are usually processed and systems have been developed to help doctors respond faster and more accurately for new patients. In addition to classical methods such as support vector machines, Naïve Bayes, regression, and k-nearest neighbors, artificial intelligence-based methods such as artificial neural networks, deep learning, and random forests have started to be used in studies [12, 13].

Optimization methods have an important place in the solution of engineering problems. Modeling a problem has become an area where optimization methods are frequently used. Finding the model parameters that best represent the problem is a very important step for modeling the problem. For this reason, mathematical modeling is needed in areas such as data analysis, control system design, machine learning, etc. [14,15,16].

Engineering problems are faced with increasing levels of complexity day by day. Classical methods often fail in the optimization of complex systems because of difficulties with high-dimensional problems, local minima, and the fact that many classical methods are designed only for differentiable problems. Therefore, the need for new optimization methods inspired by nature is increasing. These methods, which tend to perform better on complex problems, can deal with non-continuous problems and are less sensitive to local minima [17, 18]. Examples of nature-inspired algorithms frequently used in optimization are crow search optimization (CSO), the chicken swarm algorithm (CSA), JAYA, ant colony optimization, the Harris hawks algorithm (HHA), the artificial bee colony (ABC) algorithm, etc. [19, 20].

The multivariate adaptive regression splines (MARS) method, another mathematical modeling approach preferred for analyzing complex datasets, has frequently been used in prediction, analysis, and classification studies [21]. The studies in which the MARS method has been applied show that this machine learning approach can produce good prediction models for engineering datasets [22,23,24].

These methods may not perform well on every dataset because the datasets used in different studies have different characteristics. Factors such as the properties and number of parameters and the number of patient records in a dataset significantly affect the success of anemia classification methods. It is therefore essential to develop new techniques, because the properties of the studied datasets and the number or size of their parameters may differ [13]. In machine learning, image processing, biomedical applications, robotics, natural language processing, and other fields, both classical and metaheuristic methods have been successfully used to classify data with different parameters and feature structures [25,26,27]. In the literature, methods have been developed from different perspectives and analyzed under various scenarios, such as linear, quadratic, and exponential forms, in order to model the relationships between the parameters in the datasets [28, 29]. The model weight values in the constructed scenarios were calculated by optimizing the proposed methods according to the objective function, and disease classification was performed with the weight values that yielded the lowest error.

For these reasons, new approaches and algorithms need to be developed to predict anemia. There are many data mining methods used in anemia diagnosis in the literature. These are: learning vector quantization neural network (LVQ), k-nearest neighbors (k-NN), multiple linear regression (MLR), logistic regression (LR), fuzzy logic (FL), artificial neural networks (ANN), etc. [30, 31].

In this study, 1732 blood records from the Kaggle database were analyzed using the Harris hawks algorithm (HHA), a nature-inspired evolutionary algorithm, and the MARS algorithm, a classical mathematical modeling method. The proposed methods are analyzed under 6 different scenarios: multilinear form HHA, multilinear quadratic form HHA, multi-exponential form HHA, a first-order MARS model, a second-order MARS model, and a MARS model tuned to obtain the best degree and pruning coefficient. In the 3 MARS models, the pruning parameter and degree values, which have a significant effect on the performance of the MARS method, enable the model to learn different relationships and capture complex structures; in the other 3 models, the effects of the parameters on the classification success of the problem modeled in linear, quadratic, and exponential form were optimized with the HHA method and the most appropriate weight values were obtained. To the best of our knowledge, no anemia classification study has been performed using the MARS method and a parameter estimation method based on mathematical modeling with HHA.

2 Literature review

With the help of artificial neural networks and decision trees developed by genetic programming, an average performance of 90% was obtained in tests performed for the classification of thalassemia (Mediterranean anemia) [32]. In a 2008 study, a decision support system was designed to help physicians with iron deficiency anemia [33]. Anemia (+) and Anemia (−) results were evaluated at the end of the procedure, and the decisions of the decision support system completely coincided with those of the doctors. Serum iron, serum iron-binding capacity, and ferritin were used as parameters in that study, whereas six different blood parameters, namely HGB, RBC, MCH, MCHC, WBC, and HCT, are used in our study. In a study conducted in 2011, anemia prediction and classification were analyzed using data mining techniques; J48 and sequential minimal optimization (SMO) classification methods were applied in Weka, and the C4.5 decision tree algorithm (CDTA) and support vector machine (SVM) were studied [34]. Another study designed a neuro-fuzzy network to determine the level of anemia in a child [35]. With this system, which was developed after statistical measurements, the root mean square of the errors was found to be 0.2743. In 2012, artificial neural networks and an adaptive neuro-fuzzy inference system (ANFIS) were developed to predict iron deficiency anemia based on four laboratory values: mean erythrocyte volume (MCV), mean cellular hemoglobin (MCH), mean cellular hemoglobin concentration (MCHC), and red blood cell count (RBC) [36]. In a study on iron deficiency anemia in women, feedforward networks (FFN), cascade forward networks (CFN), distributed delay networks (DDN), probabilistic neural networks (PNN), and LVQ were used [37]. Another article presents the classification of blood characteristics with a CDTA, a Bayesian classifier, and a multilayer perceptron (MP) [38]. In that study, which classified eighteen thalassemia types with high prevalence in Thailand, the best classification performance was obtained with the Naïve Bayes (NB) classifier, followed by the multilayer perceptron. A further study used machine learning algorithms for the detection of anemia [39]; ANN, SVM, and statistical model methods were applied to the diagnosis of iron deficiency. Classification algorithms such as NB, MP, J48, and SMO were used with the WEKA data mining tool [40], and the J48 decision tree algorithm (JDTA) showed the best performance. Deep learning methodologies were used to increase the performance of white blood cell (WBC) identification systems, and a new WBC recognition system was proposed based on deep learning theory [41]. In a 2018 study, an easy-to-use and inexpensive device was developed to determine the anemia status of patients, preventing patients from having to go to the laboratory frequently and allowing a large number of people to be screened for anemia [42]. A strong correlation was observed between the values estimated by the device and the actual Hb values obtained from blood samples. The k-NN classification algorithm was used to assess anemia status and gave good results, allowing doctors to avoid a significant number of blood tests [42]. In a study conducted in 2019, the effect of biochemistry values on iron deficiency anemia was investigated with k-NN, CDTA, and ANN methods, based on blood values reported in the literature to be relevant to iron deficiency anemia [43].
As a result, the artificial neural network method was found to have the highest performance. In another study, the machine learning algorithms linear discriminant analysis (LDA), classification and regression trees (CART), SVM, random forest (RF), k-NN, and LR were used [44]; the RF algorithm achieved the best classification accuracy. In another study conducted in 2019, a new machine learning method (HEAC—Hemoglobin Estimation and Anemia Classification) was proposed for anemia classification based on blood parameters and compared with other machine learning methods in the literature [45]. Because the symptoms of iron deficiency anemia and β thalassemia are similar, a study conducted in 2020 developed a decision support system to discriminate between them [31]. In the proposed system, LR, k-NN, SVM, extreme learning machine, and regularized extreme learning machine classification algorithms were used. As a result, in the study in which male and female patients were evaluated together, an accuracy of 96.30% was obtained for women and 94.37% for men. In a study conducted in 2021, a structure was proposed to enable the recognition of anemia under clinical practice conditions [46]. ANN, SVM, NB, and ensemble decision tree methods were used as classification algorithms. In another study, two hybrid models combining a genetic algorithm (GA) with the deep learning algorithms (DLA) stacked autoencoder (SAE) and convolutional neural network (CNN) were proposed for the prediction of some types of anemia [13]. When the performances of the proposed algorithms were evaluated, the GA-CNN algorithm performed better. A 2022 study used the synthetic minority over-sampling technique (SMOTE) to reduce the imbalance of an anemia dataset from India [47]. Then, with the help of a decision tree rule-based learning method, rules for the detection of anemia were derived using the original and SMOTE datasets.

When the studies on anemia are examined, it is seen that the use of swarm-based optimization methods is quite limited. This study evaluates the success of the HHA algorithm, one of the metaheuristics that uses swarm intelligence to solve complex optimization problems that cannot be solved by analytical methods, in obtaining weight coefficients that emphasize the importance of the parameters in the dataset, and the success of the MARS method, a classical mathematical modeling method preferred for the analysis of complex datasets, in the classification problem with different degree and pruning parameters. Both methods are tested under three different scenarios, and their success is highlighted by modeling the relationships between the dataset parameters.

3 Material and method

3.1 Dataset and preprocessing

Blood data of 1732 patients from the Kaggle database were used in the study. The dataset consists of 351 patients with anemia and 1381 patients without anemia. As shown in Table 1, the study uses 6 attributes and 2 classes, anemia (1)/healthy (0). For each patient record, the RBC value indicates the number of red blood cells in the blood, the HGB value indicates the amount of hemoglobin, the iron-rich protein stored in red blood cells, the HCT value indicates the volumetric proportion of red blood cells in the blood, the MCV value indicates the average red blood cell volume, the MCH value indicates the average amount of hemoglobin in a single red blood cell, and the MCHC value indicates the average concentration of hemoglobin in a given volume of red blood cells. Each record thus contains these 6 blood components and the corresponding anemia outcome.

Table 1 List of attributes in the dataset
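For orientation, a minimal pandas sketch of how such a dataset could be loaded and inspected is given below. The file name anemia.csv and the column names, including the Result label, are assumptions made for illustration; the paper only describes the Kaggle source and the attribute list of Table 1.

```python
import pandas as pd

# File and column names are assumptions; adapt them to the actual Kaggle export.
df = pd.read_csv("anemia.csv")
features = ["RBC", "HGB", "HCT", "MCV", "MCH", "MCHC"]

print(df["Result"].value_counts())   # expected: 1381 healthy (0) and 351 anemia (1) records
X = df[features].to_numpy()          # 1732 x 6 matrix of blood parameters
y = df["Result"].to_numpy()          # 0/1 anemia labels
```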

3.2 Harris hawks optimization method

The Harris hawks algorithm, presented by Heidari et al. in 2019, works by mathematically modeling the hunting strategy of Harris hawks [48]. A search of the literature shows that the Harris hawks method has been used in many different areas [49,50,51,52]. However, the use of HHA in studies on disease prediction in the health sector has been limited [53,54,55]. Harris hawks move in groups with a leader at their head when hunting rabbits. First, they determine the location of the prey by making reconnaissance flights; they then move on to the hunting stage. The algorithm is population-based and consists of several stages.

3.2.1 Exploration phase

During the exploration phase, the Harris hawks wait and observe. This process continues in a loop: in each cycle, the hawk in the best position provides the best solution according to the position of the prey. While the hawks wander, they perform two different exploration moves, given in Eq. 1. The value of \(q\) in the equation is a probabilistic value that indicates which exploration move will be applied [48].

$$x\left(t+1\right)=\left\{\begin{array}{ll}{x}_{\rm{rand}}(t)-{r}_{1}\left|{x}_{\rm{rand}}(t)-2{r}_{2}x(t)\right|, & q\ge 0.5\\ \left({x}_{\rm{rabbit}}(t)-{x}_{m}(t)\right)-{r}_{3}\left(\text{LB}+{r}_{4}(\text{UB}-\text{LB})\right), & q<0.5\end{array}\right.$$
(1)

In the equation, \(x(t+1)\) is the vector indicating the position of the Harris hawk in the next iteration, \({x}_{\rm{rabbit}}\) is the vector indicating the position of the prey, \({r}_{1}\), \({r}_{2}\), \({r}_{3}\), \({r}_{4}\), and \(q\) are random numbers in (0, 1), and \(x(t)\) is the vector giving the current position of the hawk. \(\text{LB}\) and \(\text{UB}\) are the lower and upper bounds of the positions. A hawk chosen randomly from the population is \({x}_{\rm{rand}}(t)\), and the average position of the current hawk population is \({x}_{m}(t)\). The average position is found using Eq. 2 [3].

$${x}_{m}\left(t\right)=\frac{1}{N}\sum_{i=1}^{N}{x}_{i}(t)$$
(2)

The \(N\) value given in Eq. 2 indicates the number of hawks, while \(t\) is the current iteration.
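As an illustration, the exploration update of Eqs. 1 and 2 can be sketched in Python as follows. The function name and the NumPy population matrix X (one row per hawk) are our own notation for this sketch, not part of the original algorithm description.

```python
import numpy as np

def exploration_step(X, i, x_rabbit, lb, ub, rng):
    """Candidate position of hawk i in the exploration phase (Eqs. 1 and 2)."""
    r1, r2, r3, r4, q = rng.random(5)          # random numbers in [0, 1)
    x_rand = X[rng.integers(len(X))]           # randomly selected hawk of the population
    x_mean = X.mean(axis=0)                    # average position x_m(t) of Eq. 2
    if q >= 0.5:
        return x_rand - r1 * np.abs(x_rand - 2.0 * r2 * X[i])
    return (x_rabbit - x_mean) - r3 * (lb + r4 * (ub - lb))
```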

3.2.2 Transition from exploration to exploitation

After the hawks complete the exploration phase, they perform different attacks according to the energies of their prey. The decrease in the energy of the prey during escape is stated in Eq. 3 [48].

$$E=2{E}_{0}\left(1-\frac{t}{T}\right)$$
(3)

The \(E\) value in Eq. 3 indicates the escaping energy of the prey, \({E}_{0}\) its initial energy, \(t\) the current iteration, and \(T\) the maximum number of iterations.

3.2.3 Exploitation phase

This is the stage where the hawks make a surprise pounce on their prey. In response to the hawks' attack, the rabbits try to escape, and the Harris hawks employ different strategies against these escape attempts. In the algorithm, these strategies are designed in 4 different ways.

The first strategy is called the "soft besiege," where the Harris hawk tries to de-energize its prey by making deceptive leaps (\(r\ge 0.5\), \(|E|\ge 0.5\)). The soft besiege strategy is given in Eqs. 4 and 5 [48].

$$x\left(t+1\right)=\Delta x\left(t\right)-E\left|J{X}_{\rm{rabbit}}(t)-x(t)\right|$$
(4)
$$\Delta x\left(t\right)={x}_{\rm{rabbit}}(t)-x(t)$$
(5)

Considering Eqs. 4 and 5, \(r\) indicates the chance of the prey escaping before the surprise pounce, and \(E\) is the energy of the rabbit. The difference between the position of the rabbit and the hawk's position at iteration \(t\) is \(\Delta x(t)\). The value \(J\) is a random jump strength that changes in each iteration to simulate natural rabbit movement [48].

The second strategy is called the "hard besiege," where the prey's energy is greatly diminished (\(r\ge 0.5\), \(|E|<0.5\)) and the hawk encircles it closely before making the surprise pounce. This situation is shown in Eq. 6 [48].

$$x\left(t+1\right)={x}_{\rm{rabbit}}\left(t\right)-E|\Delta x(t)|$$
(6)

The third strategy is the "soft besiege with progressive rapid dives," which is applied when the prey still has enough energy to escape (\(r<0.5\), \(|E|\ge 0.5\)). The hawks perform a softer siege before the surprise pounce than in the hard besiege and decide on their next move before diving, as shown in Eq. 7 [48].

$$Y={x}_{\rm{rabbit}}\left(t\right)-E|J{x}_{\rm{rabbit}}(t)-x(t)|$$
(7)

In order to decide whether the next dive will be a good move, the hawks compare it with the previous dive. If it is not suitable, they perform a sudden, irregular dive based on Lévy flight, as given in Eq. 8.

$$Z=Y+S\times LF(D)$$
(8)

According to Eq. 8, the problem dimension is \(D\), and \(S\) is a random vector of size \(1\times D\). \(Y\) indicates the position computed according to the prey's decreasing energy, and \(Z\) is the candidate position used to decide whether the hawk will dive toward its prey. \(\text{LF}\) is the Lévy flight function, computed using Eq. 9 [48].

$$\text{LF}(x)=0.01\times \frac{u\times \sigma }{|v|^{\frac{1}{\beta }}},\quad \sigma =\left(\frac{\Gamma (1+\beta )\times \sin \left(\frac{\pi \beta }{2}\right)}{\Gamma \left(\frac{1+\beta }{2}\right)\times \beta \times {2}^{\left(\frac{\beta -1}{2}\right)}}\right)^{\frac{1}{\beta }}$$
(9)

According to Eq. 9, \(u\) and \(v\) are random numbers in (0, 1), and \(\beta\) is a constant set to 1.5. Equations 10 and 11 are used to update the positions of the hawks [48].

$$x(t+1)=Y\quad \text{if}\;\;F(Y)<F(x(t))$$
(10)
$$x(t+1)=Z\quad \text{if}\;\;F(Z)<F(x(t))$$
(11)

The values \(Y\) and \(Z\) in Eqs. 10 and 11 are obtained from Eqs. 7 and 8 [48].

The last strategy is called the "hard besiege with progressive rapid dives," applied when the prey does not have enough energy to escape (\(r<0.5\), \(|E|<0.5\)). The hawk performs a tight siege before the surprise pounce to capture the prey. The hard besiege case is given in Eqs. 12 and 13 [48].

$$x(t+1)=Y^{\prime}\quad \text{if}\;\;F(Y^{\prime})<F(x(t))$$
(12)
$$x(t+1)=Z^{\prime}\quad \text{if}\;\;F(Z^{\prime})<F(x(t))$$
(13)

The values \(Y^{\prime}\) and \(Z^{\prime}\) are determined from Eqs. 14 and 15 [48]:

$$Y^{\prime}={x}_{\rm{rabbit}}(t)-E|J{x}_{\rm{rabbit}}(t)-{x}_{m}(t)|$$
(14)
$$Z^{\prime}=Y^{\prime}+S\times LF(D)$$
(15)

3.3 Multivariate adaptive regression spline (MARS)

The linear regression model is very important in solving many problems. However, real-life problems often have a nonlinear structure, which linear models cannot represent well. Nonparametric regression is used to characterize such structures [56, 57]. When the number of independent variables in the model is large, however, classical nonparametric regression forms are neither practical nor easily interpretable. MARS, developed by Friedman in 1991, is a form of nonparametric regression that is useful for fitting nonlinear multivariate functions and does not suffer from these disadvantages when the number of independent variables is large.

MARS, which is nonparametric and does not assume a functional relationship between dependent and independent variables, has been used in many engineering problems [58, 59]. Instead of a mathematical relationship, it creates a dynamic relationship between cause and effect variables [60]. It builds a flexible regression model using basis functions corresponding to different ranges of independent variable [57, 61]. The model consists of data-driven basis functions and the coefficients associated with these bases. It divides the independent variable values into regions and associates each region with a regression equation.

General MARS model is defined as [61]

$$Y={\beta }_{0}+\sum_{k=1}^{K}{a}_{k}{\beta }_{k}({X}_{t})+{\varepsilon }_{i}$$
(16)

where \(k\) is the knot number, \(K\) is the number of basis functions, \(X\) is the independent variable, \({a}_{k}\) is the coefficient of the \(k\)th basis function, \({\beta }_{0}\) is a constant term, and \({\beta }_{k}({X}_{t})\) denotes the \(k\)th basis function evaluated at the independent variable.

For \(k=1,2,\ldots,K\), the basis function has the following form [61]

$${B}_{m}(x)=\prod_{l=1}^{{L}_{m}}\left[{S}_{l,m}\,({x}_{v(l,m)}-{k}_{l,m})\right]$$
(17)

According to Eq. 17, \({L}_{m}\) is the degree of interaction, \({S}_{l,m}\in \{\pm 1\}\), \({k}_{l,m}\) is the knot value, and \({x}_{v(l,m)}\) is the value of the corresponding independent variable.

The MARS method consists of two steps, a forward step and a backward step. In the forward step, basis functions (BFs) are added to the model sequentially until the maximum number of BFs is reached; the model produced at this stage usually contains BFs that contribute little to the overall performance, so it is more complex than desired and may contain inaccurate terms. The backward step is applied to avoid overfitting by reducing the complexity of the model without disturbing its fit to the data obtained in the forward step. At each step, it removes the BFs that lead to the smallest increase in the residual sum of squares, and finally a best-predicted model is created [62]. This process is called pruning and is controlled by the hyperparameter "nprune." By allowing different shapes of BFs and their interactions, MARS has the capacity to reliably track very complex data structures that are often hidden in high dimensions [63, 64]. Pruning is most commonly done with the generalized cross-validation technique. These operations protect against overfitting by reducing the complexity of the model. The degree of pruning and the choice of which functions or interactions to remove affect the performance of the model.

Another hyperparameter, which controls the degree of the piecewise linear basis function expansion used by the method, is the "degree" parameter. The degree parameter allows the model to learn different relationships: increasing the degree allows the model to learn more complex relationships, and decreasing it restricts the model to simpler ones. For these reasons, the pruning parameter and the degree have a significant impact on the performance of the MARS model.
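To make the roles of the basis functions and the degree hyperparameter concrete, the following minimal NumPy sketch builds a first-order hinge term and a second-order interaction term. The knot values, variable choices, and coefficients are purely illustrative and are not taken from the fitted models reported later.

```python
import numpy as np

def hinge(x, knot, sign=1):
    """Piecewise-linear MARS basis function: max(0, sign * (x - knot))."""
    return np.maximum(0.0, sign * (x - knot))

# Hypothetical HGB and MCV values for three records (illustrative only).
hgb = np.array([10.2, 13.5, 15.1])
mcv = np.array([72.0, 88.0, 95.0])

bf1 = hinge(hgb, 12.0, sign=-1)                     # degree 1: max(0, 12.0 - HGB)
bf2 = hinge(hgb, 12.0, -1) * hinge(mcv, 80.0, -1)   # degree 2: interaction of two hinges
y_hat = 0.3 + 1.8 * bf1 + 0.4 * bf2                 # MARS prediction = weighted sum of basis functions
```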

4 Application of HHA and MARS methods to the anemia prediction problem

In this section, it is stated how the HHA and MARS methods are adapted to the anemia disease prediction problem.

4.1 Adaptation of HHA algorithm to anemia disease problem

Since 6 different blood parameters are used with the Harris hawks method, the feature vector is given as follows.

$$B=\left[{B}_{1},{B}_{2},{B}_{3},{B}_{4},{B}_{5},{B}_{6}\right]$$
(18)

In order to detect anemia with HHA, the results were tested by modeling in 3 different forms. As shown in Table 2, Model-1HHA refers to linear form, Model-2HHA refers to quadratic form and Model-3HHA refers to exponential form.

Table 2 Types of models to be applied HHA

According to Eq. 18, column 1 of the feature vector \(\left({B}_{1}\right)\) is the coefficient of RBC, column 2 \(({B}_{2})\) the coefficient of HGB, column 3 \(\left({B}_{3}\right)\) the coefficient of HCT, column 4 \(({B}_{4})\) the coefficient of MCV, column 5 \(({B}_{5})\) the coefficient of MCH, and column 6 \(({B}_{6})\) the coefficient of MCHC. These coefficients represent the effect sizes of the parameters (RBC, HGB, HCT, MCV, MCH, and MCHC) on the classification. The effects of the parameters on the classification success are optimized with HHA to obtain the most appropriate weight values.

The multiple linear regression model adapted to the blood data is expressed as a combination of the anemia variables:

$$y={B}_{0}+{B}_{1}\text{HGB}+{B}_{2}\text{RBC}+{B}_{3}\text{MCH}+{B}_{4}\text{WBC}+{B}_{5}\text{MCV}+{B}_{6}\text{HCT}={B}_{0}+\sum_{i=1}^{k}{B}_{i}{x}_{i}$$
(19)

The multiple quadratic regression model can be established as follows:

$$\begin{aligned}y={}&{B}_{0}+{B}_{1}\text{HGB}+{B}_{2}\text{RBC}+\cdots +{B}_{6}\text{HCT}\\ &+{B}_{7}\text{HGB}^{2}+{B}_{8}\,\text{HGB}\cdot \text{RBC}+{B}_{9}\,\text{HGB}\cdot \text{MCH}+\cdots +{B}_{12}\,\text{HGB}\cdot \text{HCT}\\ &+{B}_{13}\text{RBC}^{2}+{B}_{14}\,\text{RBC}\cdot \text{MCH}+\cdots +{B}_{17}\,\text{RBC}\cdot \text{HCT}\\ &+{B}_{18}\text{MCH}^{2}+\cdots +{B}_{27}\text{HCT}^{2}\end{aligned}$$
(20)

The multiple exponential regression model can be established as follows:

$$y={B}_{0}+{B}_{1}{e}^{{B}_{7}\text{HGB}}+{B}_{2}{e}^{{B}_{8}\text{RBC}}+{B}_{3}{e}^{{B}_{9}\text{MCH}}+{B}_{4}{e}^{{B}_{10}\text{WBC}}+{B}_{5}{e}^{{B}_{11}\text{MCHC}}+{B}_{6}{e}^{{B}_{12}\text{HCT}}$$
(21)

The \(y\) values in Eqs. 19, 20, and 21 are the anemia values. \({B}_{i}\), with \(0\le i\le 6\), \(0\le i\le 27\), and \(0\le i\le 12\), are the parameters to be determined for Eqs. 19–21, respectively, using the Harris hawks algorithm.

The cost function in multiple linear form is expressed as

$$J(Q)=\frac{1}{N}\sum_{i=1}^{N}(Y-{X}_{i}{B}^{T}{)}^{2}$$
(22)

where \(X\) is the input set, \({X}_{i}\) is the \(i\)th patient record, \(Y\) denotes the label values, and \(N\) is the number of patients in the dataset.

Considering the problem as a data mining parameter optimization task, the model parameters (\(B\)) are estimated with the help of the HHA using the dataset. Within the scope of the study, the data are classified according to whether anemia exists or not.
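As a sketch, the objective of Eq. 22 for the linear model of Eq. 19 can be written as below, assuming the design matrix carries a leading column of ones for \(B_0\). Each hawk position in the HHA corresponds to one candidate weight vector \(B\), and this is the function the algorithm minimizes; the names and the 0.5 decision threshold are our own illustrative choices.

```python
import numpy as np

def linear_cost(B, X, Y):
    """Mean squared error of Eq. 22 for the multiple linear model of Eq. 19.

    B : candidate weight vector [B0, B1, ..., B6] (one hawk position)
    X : N x 7 design matrix (a column of ones followed by the six blood parameters)
    Y : length-N vector of labels (1 = anemia, 0 = healthy)
    """
    residuals = Y - X @ B
    return float(np.mean(residuals ** 2))

# After the HHA run, the best weight vector can be used for classification, e.g. by
# thresholding the model output:  y_pred = (X @ B_best >= 0.5).astype(int)
```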

Application stages of the HHA algorithm for Model-1HHA:

Step 1:

The population size \((N)\) and the maximum number of iterations \(( T)\) are defined. (In the study, \(N=50\) and \(T=250\) were taken.)

Step 2: A matrix \(B\) with as many rows as the population size is generated within the following bounds:

$$( - 5 < B_{0} < 5 \;\;{\text{and}}\;\; - 3 < B_{1} ,B_{2} ...B_{N} < 3)$$

Step 3 The objective function is calculated by considering Eq. 23.

$${\text{J}}=\frac{1}{2}\left({{\text{B}}}_{0}+\sum_{{\text{i}}=1}^{{\text{k}}}{{\text{y}}}_{{\text{i}}}-{{\text{B}}}_{{\text{i}}}{{\text{x}}}_{{\text{i}}}\right)^{2}$$
(23)

Step 4 Each row of matrix \(B\) is evaluated as a candidate rabbit position \({x}_{\rm{rabbit}}\) according to the objective function. Over the iterations, the \({x}_{\rm{rabbit}}\) position that minimizes the objective function is updated.

Step 5 Equation 24 gives the energy of the prey.

$${\text{E}}=2{{\text{E}}}_{0}\left(1-\frac{{\text{t}}}{{\text{T}}}\right)$$
(24)

The initial energy of the prey is compared with the objective function value, and this value is then reduced according to the number of iterations, thereby minimizing the objective function. In the following steps, the parameter values (prey position) change according to the energy of the prey.

Step 6: The position vector is updated (exploration) when the energy of the prey satisfies \(|E|\ge 1\).

Step 7: If \(|E|<1\), one of four different strategies is applied (exploitation phase).

Step 7.1: If (\(|E|\ge 0.5\) and \(r\ge 0.5\)), \({x}_{\rm{rabbit}}\) is updated using the soft besiege.

Step 7.2: If (\(|E|\ge 0.5\) and \(r<0.5\)), \({x}_{\rm{rabbit}}\) is updated using the soft besiege with progressive rapid dives.

Step 7.3: If (\(|E|<0.5\) and \(r\ge 0.5\)), \({x}_{\rm{rabbit}}\) is updated using the hard besiege.

Step 7.4: If (\(|E|<0.5\) and \(r<0.5\)), \({x}_{\rm{rabbit}}\) is updated using the hard besiege with progressive rapid dives.

Step 8 Steps 5–7 are repeated until the maximum number of iterations is reached.

Step 9 The prey position, i.e., the best solution, is obtained.

The accuracy at each fold and the average accuracy obtained by applying tenfold cross-validation for Model-1HHA are given in Table 3. The high accuracy achieved on each fold shows that the model is not affected by the unbalanced dataset.

Table 3 Accuracy values at each fold by Model-1HHA

The accuracy at each fold and the average accuracy obtained by applying tenfold cross-validation for Model-2HHA are given in Table 4. The high accuracy achieved on each fold shows that the model is not affected by the unbalanced dataset.

Table 4 Accuracy values at each fold by Model-2HHA

The accuracy at each fold and the average accuracy obtained by applying tenfold cross-validation for Model-3HHA are given in Table 5. The high accuracy achieved on each fold shows that the model is not affected by the unbalanced dataset.

Table 5 Accuracy values at each fold by Model-3HHA

The coefficients produced at each fold by the HHA method applied to the linear, quadratic, and exponential form models are shown in Tables 6, 7, and 8.

Table 6 Attribute weights at each fold by Model-1HHA
Table 7 Attribute weights at each fold by Model-2HHA
Table 8 Attribute weights at each fold by Model-3HHA

As shown in Tables 6, 7, and 8, the overall model coefficients were created by averaging the weight values produced by Model-1HHA, Model-2HHA, and Model-3HHA at each fold and the average coefficients are presented in Tables 9, 10, and 11, respectively.

Table 9 Average attribute weights by Model-1HHA
Table 10 Average attribute weights by Model-2HHA
Table 11 Average attribute weights by Model-3HHA

In Tables 9, 10 and 11, the HHA algorithm is run for the mathematical model specified in Eqs. 19, 20 and 21 and the weight values that minimize the fitness function given in Eq. 22 are calculated.

4.2 Adaptation of the MARS method to the problem of anemia

The parameters of the anemia models (Eqs. 16 and 17) were determined as defined in Table 12, with Model-1MARS being the first-order model, Model-2MARS the second-order model, and Model-3MARS the model with the best pruning and degree values.

Table 12 Model types to which the MARS method is applied

To examine the performance of the MARS model with different combinations of hyperparameters, tests were performed on three different models. Tenfold cross-validation was used for the validation process. During the experiments, the hyperparameters degree and nprune were tested with different values: in Model-1MARS, \(\text{degree}=1\) and \(\text{nprune}=25\); in Model-2MARS, \(\text{degree}=2\) and \(\text{nprune}=25\); and in Model-3MARS, nprune was varied from 5 to 50 in steps of 5 (\(5:5:50\)) and degree from 1 to 4 \((1:4)\) in order to find the most successful parameter combination. With these hyperparameter combinations, the aim is to capture the behavior of the model over a wide range, as sketched below.
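The paper does not state which software implementation was used; the names degree and nprune suggest the R earth package. As a rough Python analogue, the sketch below assumes the third-party py-earth package, whose max_degree and max_terms parameters play roles comparable to degree and nprune, and thresholds the regression output at 0.5 to obtain class labels. It illustrates the grid search only and is not the authors' code.

```python
import numpy as np
from pyearth import Earth                           # third-party py-earth package (assumed available)
from sklearn.model_selection import StratifiedKFold

# X: (1732, 6) array of blood parameters, y: 0/1 anemia labels (loaded beforehand).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
best_acc, best_params = 0.0, None
for degree in range(1, 5):                          # degree = 1:4
    for nprune in range(5, 55, 5):                  # nprune = 5:5:50
        fold_acc = []
        for tr, te in cv.split(X, y):
            model = Earth(max_degree=degree, max_terms=nprune).fit(X[tr], y[tr])
            y_hat = (model.predict(X[te]) >= 0.5).astype(int)   # threshold regression output
            fold_acc.append((y_hat == y[te]).mean())
        if np.mean(fold_acc) > best_acc:
            best_acc, best_params = float(np.mean(fold_acc)), (degree, nprune)
```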

The accuracy at each fold and the average accuracy obtained by applying tenfold cross-validation for Model-1MARS are given in Table 13. The high accuracy achieved on each fold shows that the model is not affected by the unbalanced dataset.

Table 13 Accuracy values at each fold by Model-1MARS

Before the pruning process, 10 basis functions were generated, and the functions contributing little to model performance were removed from the model. Finally, in order to test the effect of RBC, HGB, HCT, MCV, MCH, and MCHC on anemia, the 9 basis functions generated by the MARS method for Model-1MARS and their weights are presented in Table 14.

Table 14 Degree = 1 Estimated results of the MARS model

The basis functions defined for the model are denoted \(\text{BF}j\), where \(j\) is the index of the basis function, and are presented in Table 14. In line with this information, the Model-1MARS model is as follows:

$$\text{Model-1MARS}=23.60-225.31\times \text{BF1}+237.19\times \text{BF2}-46.91\times \text{BF3}-27.17\times \text{BF4}-8.86\times \text{BF5}+12.05\times \text{BF6}+46.61\times \text{BF7}+6.99\times \text{BF8}$$
(25)

When Eq. 25 is analyzed, it is observed that there is no interaction of the basis functions by taking degree = 1, and simple linear functions are formed. According to Table 14 and Eq. 25, the RBC and WBC parameters were pruned and removed from the model by the MARS method because of their low contribution to the model performance.

The accuracy and average accuracy value obtained at each fold by applying tenfold cross-validation for Model-2 MARS are given in Table 15. The high accuracy achieved on each fold shows that the model is not affected by the unbalanced dataset.

Table 15 Accuracy values at each fold by Model-2MARS

With Model-2MARS, 17 basis functions were generated before the pruning process and the functions with low success rates were removed from the model. Finally, in order to test the effect of RBC, HGB, HCT, MCV, MCH, and MCHC on anemia, 8 basis functions generated by the MARS method for Model-2MARS and their weights are presented in Table 16.

Table 16 Degree = 2 Estimated results of the MARS model

In line with the information in Table 16, the Model-2MARS model is as follows:

$$\text{Model-2MARS}=11.7-4.5\times \text{BF1}-585.3\times \text{BF2}+626.3\times \text{BF3}-45.5\times \text{BF4}+7.2\times \text{BF5}+294.1\times \text{BF6}-268.2\times \text{BF7}$$
(26)

When Eq. 26 is analyzed, the maximum interaction level of the basis functions is 2 because degree = 2 was used. This shows that the model's basis functions are linear and quadratic functions, as shown in Table 16.

\(\text{BF1}\), \(\text{BF2}\), \(\text{BF3},\) and \(\text{BF4}\) form linear functions; \(\text{BF5}\), \(\text{BF6},\) and \(\text{BF7}\) are quadratic functions formed as the product of two basis functions. According to Table 16 and Eq. 26, MCH, WBC, and MCHC parameters were removed from the model by the MARS method because of their low contribution to the model’s performance success.

The accuracy and average accuracy value obtained at each fold by applying tenfold cross-validation for Model-3MARS are given in Table 17. The high accuracy achieved on each fold shows that the model is not affected by the unbalanced dataset.

Table 17 Accuracy values at each fold by Model-3MARS

For Model-3MARS, degree = 1:4 and nprune = seq(5:5:50). As shown in Table 18, a grid containing the performances of the hyperparameters according to the values in these ranges was created.

Table 18 Accuracy values at each fold by the grid containing hyperparameters Degree = 1:4 and nprune = seq(5:5:50)

As a result of the tests, 17 basis functions were generated by Model-3MARS before the pruning process, and the functions contributing little to model performance were removed by pruning. Finally, in order to test the effect of RBC, HGB, HCT, MCV, MCH, and MCHC on anemia, the 5 basis functions generated by the MARS method for Model-3MARS and their weights are presented in Table 19.

Table 19 Degree = 1:4 and nprune = seq(5:5:50) estimated results of the MARS model

In line with the information in Table 19, the Model-3MARS model is as follows:

$$\text{Model-3MARS}=43.5-1107.3\times \text{BF1}+671.4\times \text{BF2}+419.9\times \text{BF3}-123.8\times \text{BF4}$$
(27)

When Eq. 27 is analyzed, the maximum interaction level of the basis functions is set to 2 by taking degree = 2 and nprune = 5. This shows that the model’s basis functions are linear and quadratic functions as shown in Table 19. BF1, BF2, and BF3 are linear functions, while BF4 is a quadratic function which is the product of two basis functions. According to Table 19 and Eq. 27, MCH, WBC, HCT, and MCHC parameters were pruned and removed from the model by the MARS method in the pruning phase since their contribution to the model performance was low.

When data for a new patient are to be analyzed using Eqs. 25–27, classification can be made by entering the required values into the \(\text{BF}j\) basis functions, as sketched below.
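A minimal sketch of this use is given below for the Model-1MARS form of Eq. 25. The hinge knots inside the basis functions are placeholders, since the actual BF definitions are listed in Table 14 and are not reproduced here; only the first two terms are written out, and the 0.5 decision threshold is likewise an assumption.

```python
def hinge(x, knot, sign=1):
    """Piecewise-linear MARS basis function: max(0, sign * (x - knot))."""
    return max(0.0, sign * (x - knot))

def model1_mars(record):
    """Skeleton of Eq. 25; knot values are placeholders to be replaced from Table 14."""
    bf1 = hinge(record["HGB"], 12.0, -1)            # placeholder knot
    bf2 = hinge(record["HGB"], 12.0, +1)            # placeholder knot
    # ... BF3 to BF8 defined analogously from Table 14 ...
    return 23.60 - 225.31 * bf1 + 237.19 * bf2      # + the remaining weighted terms of Eq. 25

new_patient = {"RBC": 4.4, "HGB": 10.8, "HCT": 33.0, "MCV": 75.0, "MCH": 24.0, "MCHC": 31.0}
label = 1 if model1_mars(new_patient) >= 0.5 else 0  # 1 = anemia, 0 = healthy
```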

5 Evaluation

In the study, the two classes were anemia (1) and non-anemia/healthy (0) individuals. A tenfold cross-validation method was used for all proposed models. With this method, the dataset is divided into 10 different subsets; each time, one subset is used as the test set while the remaining subsets are used for training. This approach ensures that each subset serves as the test set exactly once, and the results of all folds are averaged. In this way, the class imbalance in the dataset causes fewer problems during model learning and performance evaluation. In other words, since there are more non-anemia records in the dataset, a model may learn this class better and tend to misclassify the anemia class, which has fewer records. However, with this method, where the representation of each class in the training and test sets is kept compatible with its proportion in the overall dataset, the effect of such situations is minimized.
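A minimal sketch of this evaluation protocol is given below, assuming scikit-learn's StratifiedKFold so that the class proportions of each fold match the overall dataset (the paper does not name a specific library). Here fit_and_predict stands for any of the six models.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def tenfold_accuracy(X, y, fit_and_predict, seed=0):
    """Average accuracy over 10 stratified folds; each fold is used exactly once as the test set."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    fold_acc = []
    for train_idx, test_idx in cv.split(X, y):
        y_pred = fit_and_predict(X[train_idx], y[train_idx], X[test_idx])
        fold_acc.append(float(np.mean(y_pred == y[test_idx])))
    return fold_acc, float(np.mean(fold_acc))
```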

The performances of the classification methods were calculated according to ROC analysis metrics. These metrics reveal how well a model performs in its predictions and allow the results obtained by different methods to be compared; they are frequently used in data mining applications [65].

Confusion matrix for ROC analysis is shown in Table 20.

Table 20 Confusion matrix for ROC

The basic equations for ROC analysis are shown in Eqs. 28–33.

$${\text{Accuracy}}=\frac{{\text{TP}}+{\text{TN}}}{{\text{TP}}+{\text{TN}}+{\text{FP}}+{\text{FN}}}$$
(28)
$${\text{Recall}}-{\text{Sensitivity}}=\frac{{\text{TP}}}{{\text{TP}}+{\text{FN}}}$$
(29)
$$\mathrm{Specificity }= \frac{{\text{TN}}}{{\text{TN}}+{\text{FP}}}$$
(30)
$$\mathrm{Precision }= \frac{{\text{TP}}}{{\text{TP}}+{\text{FP}}}$$
(31)
$${\text{F}}1-{\text{Score}}=\frac{2{\text{Precision}}.{\text{Recall}}}{{\text{Precision}}+{\text{Recall}}}$$
(32)
$${\text{AUC}}=\frac{{\text{TPR}}+{\text{TNR}}}{2}$$
(33)

The purpose of ROC analysis is to compare the performance of the results obtained by various methods and to evaluate them in terms of sensitivity, specificity, precision, F1-score, AUC, and accuracy [66]. The ROC quantities used in the analysis are TP, TN, FP, and FN. TP (true positive) and TN (true negative) indicate correctly classified anemia and healthy records, respectively, while FP (false positive) and FN (false negative) represent the numbers of misclassified records, as computed in the sketch below.
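For reference, the metrics of Eqs. 28–33 can be computed from a binary confusion matrix as in the sketch below (1 denotes anemia). The single-threshold AUC formula follows Eq. 33 and is an approximation of the full ROC-curve AUC.

```python
import numpy as np

def roc_metrics(y_true, y_pred):
    """Metrics of Eqs. 28-33 from the binary confusion matrix (1 = anemia, 0 = healthy)."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    recall = tp / (tp + fn)                     # sensitivity / TPR (Eq. 29)
    specificity = tn / (tn + fp)                # TNR (Eq. 30)
    precision = tp / (tp + fp)                  # Eq. 31
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),          # Eq. 28
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),  # Eq. 32
        "auc": (recall + specificity) / 2,                    # Eq. 33 (single-threshold approximation)
    }
```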

Selecting the appropriate metric for the model is very important for obtaining the desired results. Accuracy, AUC, and F1-score are parameters often used to evaluate model performance. All three are used to evaluate how well a model performs. Accuracy is the most popular metric that determines the percentage of correct predictions. AUC compares the relationship between the true-positive rate (TPR) and false-positive rate (FPR) at different thresholds. Data scientists try to achieve the highest TPR while maintaining the lowest FPR, indicating their success in making correct predictions. In unevenly distributed datasets, it is not enough to measure model success with the accuracy metric alone. F1-score is one of the most widely used metrics for unbalanced datasets [67, 68]. F-score is a measure based on precision and recall values. That is, if recall is high, precision is usually low, and vice versa. At the same time, a high recall value means that a large proportion of instances in the minority class are correctly predicted, while a high precision value indicates that there is a high probability that the predicted minority instances actually belong to the minority class. Higher values for both precision and recall lead to a higher F1-score, indicating that there is not a large difference between the precision and recall values [68, 69].

Although AUC is a very common measure for imbalanced data problems, the F-measure is more suitable for such cases. This is because the minority class is more critical than the majority class, and the F-measure is an indication that the desired method classifies samples in the minority class with a higher accuracy and a lower misclassification rate. In other words, the AUC measure evaluates the overall accuracy of the classifier in both the majority and minority classes, while the F-measure focuses only on the accuracy of the classifier in the minority class [68,69,70].

Since the dataset used in the study is an imbalanced dataset, AUC and F1-score performance comparison is made in the next section, but since other studies in the literature produce results based on the accuracy metric, other ROC metrics are also presented.

6 Experimental results of HHA and MARS

In this study, anemia classification was performed with two different approaches on patient data containing blood values and outcomes. The data used in the study were classified using the two methods given in Tables 2 and 12, and the performance of each method was tested with three different models. For the HHA method, classification was performed with the linear, quadratic, and exponential models; for the MARS method, with a first-order model with pruning coefficient 25, a second-order model with pruning coefficient 25, and a model with degree values between 1 and 4.

The high accuracy values in Tables 3, 4 and 5 for the HHA method and Tables 13, 15 and 17 for the MARS method show that the accuracy of the proposed methods is high in each of the 10 subsets of the dataset. This shows that the proposed methods are not affected by the dataset imbalance.

This section presents the confusion matrices and ROC performance analysis of the methods obtained for the six models. The confusion matrices of the Model-1HHA to Model-3HHA algorithms and the Model-1MARS to Model-3MARS methods are documented in Figs. 1, 2, 3, 4, 5 and 6, respectively, and the ROC performance values are calculated from them.

Fig. 1
figure 1

Confusion matrix of the Model-1HHA

Fig. 2
figure 2

Confusion matrix of the Model-2HHA

Fig. 3
figure 3

Confusion matrix of the Model-3HHA

Fig. 4
figure 4

Confusion matrix of the Model-1MARS

Fig. 5
figure 5

Confusion matrix of the Model-2MARS

Fig. 6
figure 6

Confusion matrix of the Model-3MARS

When the confusion matrices in Figs. 1, 2, 3, 4, 5 and 6 are evaluated, it is seen that a total of 13 patients with Model-1HHA, 12 patients with Model-2HHA, 316 patients with Model-3HHA, 14 patients with Model-1MARS, 12 patients with Model-2MARS and 11 patients with Model-3MARS could not be classified correctly.

Table 21 shows the ROC performance analyses made as a result of testing six different models.

Table 21 ROC performance analyses

In the study, in order to better model the relationships between the anemia disease parameters, 6 different tests were performed at different degrees, emphasizing the interaction of the parameters with each other. The analyses conducted on the different models showed that the classical MARS method and the metaheuristic HHA method achieved very high success in anemia classification when the accuracy metric is considered. However, since model performance metrics such as precision, recall, F1-score, and AUC must be evaluated in addition to accuracy to determine model success on non-uniformly distributed datasets, these metrics were also analyzed.

For Model-1HHA to Model-3HHA, the precision metric, which reflects the ability of the classifier not to label a healthy record as anemia, is high for all 3 models. The recall, F1-score, and AUC metrics are high for Model-1 and Model-2 but low for Model-3. In a dataset where the number of anemia patient records is low, class-based performance matters more than overall accuracy. In other words, although the accuracy value of Model-3 is 81.76%, its recall, the ratio of correctly predicted anemia records to all anemia records, is low because of the small number of anemia records. The F1-score, the harmonic mean of precision and recall, accounts for healthy records misclassified as anemia through the precision component and anemia records misclassified as healthy through the recall component. The AUC metric, which evaluates the accuracy of the classifier in both the minority and majority classes, and the F1-score, which focuses on the accuracy in the minority class, also produced high results for Model-1 and Model-2 but poor results for Model-3, which has an exponential form.

When the results of Model-1MARS to Model-3MARS, where the MARS method is applied, are evaluated, it is seen that the first-order Model-1 and the second-order Model-2 give successful results when nprune = 25 is selected. In the tests performed for Model-3, which searches for the best degree and nprune hyperparameters for classifying the anemia dataset with the MARS method, degree = 2 and nprune = 5 were found. The success of Model-3 is also high.

The cost function changes of the linear, quadratic, and exponential HHA models using the mean squared error (MSE) function are shown in Figs. 7, 8, and 9. The reason why there is no continuous decrease in the curve drawn for Model-1 in Fig. 7 is that the cost curves are calculated by averaging the coefficients. Figure 7 shows that the linear HHA model converges to the optimum result by the end of the 250th iteration.

Fig. 7
figure 7

The cost function of Model-1HHA

Fig. 8
figure 8

The cost function of Model-2HHA

Fig. 9
figure 9

The cost function of Model-3HHA

As in Model-1HHA, the cost values in Fig. 8 are plotted by averaging the coefficients of the parameters at each iteration, which is why there is no continuous decrease. Looking at Fig. 8, the HHA model, which is linear in the coefficients to be optimized, converges to the optimum result.

In Fig. 9, for the exponential form, the cost values are again plotted by averaging the coefficients of the parameters at each iteration, resulting in a fluctuating curve. As in Model-1 and Model-2, the lack of a continuous decrease is due to the coefficients being obtained by averaging in each iteration. It is concluded that the reduction from 130 to 13 parameters is not suitable for a model with exponential behavior. In other words, averaging the parameters to obtain a single general parameter set does not give healthy results in some cases.

These results show that the way the relationships between the parameters are modeled has a significant effect on the model's contribution to the anemia classification problem.

7 Conclusion

The Harris hawks algorithm and the MARS algorithm were used to classify the data in a database where each patient record contains 6 different blood components, namely the RBC, HGB, HCT, MCV, MCH, and MCHC values, together with the corresponding anemia outcome. During classification, the dataset was divided into 10 different subsets using the tenfold cross-validation method, and each of the 10 subsets was used once as the test set. The algorithms were analyzed on 6 different models: multilinear form HHA, multilinear quadratic form HHA, multi-exponential form HHA, a MARS model with first-order terms and pruning coefficient 25, a MARS model with second-order terms and pruning coefficient 25, and a MARS model using the best degree and pruning coefficient.

As a result of the tests performed on the models applying the MARS method based on classical mathematical modeling, it was concluded that they performed well in anemia classification, with accuracies of 99.19%, 99.31%, and 99.36%, respectively, as shown in Table 22. In addition to the accuracy metric, they also performed well on the F1-score and AUC metrics, which are commonly used to analyze imbalanced datasets (Table 21).

Table 22 Related studies using different datasets similar to our study

The accuracies of the parameter estimation method based on mathematical modeling with the HHA method on the anemia classification problem were 99.25%, 99.31%, and 81.76%, respectively. With the average parameter approach, which obtains a general parameter set that best summarizes the problem by averaging the parameter values produced at each fold of the cross-validation, the linear and quadratic models, which are linear in the coefficients to be optimized, give successful results, while the exponential form model, which does not show linear behavior, shows lower success than the other models.

When the results are evaluated, in the prediction process using both HHA and MARS, the targeted outputs were successfully achieved using 6 different blood components in a total of 1732 cases, 351 with anemia, and 1381 without anemia.

According to the performance results, the proposed algorithms classify anemia better than previous methods (Table 22). This shows that our proposed methods for the anemia classification problem perform well in terms of accuracy, F1-score, and AUC. The high performance obtained, together with the high accuracy in each fold of the tenfold cross-validation, shows that the proposed methods are not affected by the imbalance of the dataset.

The results of the study are expected to help medical students and doctors in the anemia classification problem. We believe that the classifier performances of our proposed models will contribute positively to the literature.