Mortality Prediction of Various Cancer Patients via Relevant Feature Analysis and Machine Learning

Breast, lung, prostate, and stomach cancers are the most frequent cancer types globally. Early-stage detection and diagnosis of these cancers remain a challenge. When dealing with cancer patients, physicians must select among various treatment methods, each of which carries risk. Since the risks of treatment may outweigh the benefits, the treatment schedule is critical in clinical decision making. Manually deciding which medications and treatments are going to be successful takes a lot of expertise and can be hard. In this paper, we offer a computational solution to predict the mortality of various types of cancer patients. The solution is based on the analysis of diagnosis, medication, and treatment parameters that can be easily acquired from electronic healthcare systems. A classification-based approach is introduced to predict the mortality outcome of cancer patients. Several classifiers were evaluated on the Medical Information Mart for Intensive Care IV (MIMIC-IV) dataset. Diagnosis, medication, and treatment features were extracted for breast, lung, prostate, and stomach cancer patients, and relevant feature selection was done with Logistic Regression. Best F1 scores were 0.74 for breast, 0.73 for lung, 0.82 for prostate, and 0.79 for stomach cancer. Best AUROC scores were 0.94 for breast, 0.91 for lung, 0.96 for prostate, and 0.88 for stomach cancer. In addition, using only relevant features, results were very similar to the baseline for each cancer type. Using fewer features and a robust machine-learning model, the proposed approach can be easily implemented in hospitals when limited data and resources are available.


Introduction
Cancer is a broad term that encompasses a wide range of illnesses that can affect any region of the body. Malignant tumors and neoplasms are other terms used for it. Cancer develops when normal cells are transformed into tumor cells in a multi-stage process that usually evolves from a pre-cancerous lesion to a malignant tumor. One of the hallmarks of cancer is the rapid emergence of anomalous cells that grow beyond their normal bounds, allowing them to invade other sections of the body and spread to other organs; this is known as metastasis. The most common cause of death among cancer patients is widespread metastasis [1].
In 2020, there were an estimated 18.1 million cancer cases worldwide [2]. Breast and lung cancers were the most frequent cancers globally in 2020, accounting for 12.5% and 12.2% of all new cases, respectively. Prostate and stomach cancers were the fourth and fifth most frequent cancers, accounting for 7.8% and 6.0% of all new cases diagnosed in 2020, respectively [2]. Breast cancer was the most common cancer, with 2,261,419 new cases and 684,996 deaths in 2020 [3]. It was followed by lung cancer, with 2,206,771 new cases and 1,796,144 deaths [4]. Prostate cancer, the fourth most common, had 1,414,259 new cases and 375,304 deaths [5]. Stomach cancer followed with 1,089,103 new cases and 768,793 deaths [6]. As can be seen from these statistics, the mortality of these cancer types is very high.
Early-stage detection of cancer is crucial in order to decrease the mortality rate of patients [7][8][9]. For physicians and researchers, detection and diagnosis of cancer pose a challenge in the literature. Detection of cancerous cells is mainly done by medical imaging and laboratory tests [10,11]. These procedures are time consuming and require a huge workforce.
When dealing with cancer patients, physicians must select among various treatment methods, each of which has a significant risk. The existing therapies for patients with late-stage cancer disease can only provide a small survival chance. In addition, therapies have side effects, which can be worse than the symptoms that are being treated or prevented [12]. The results of some treatments can take up to 3 months to manifest, while adverse effects can appear sooner and last up to 6 weeks. Since the risks of therapy may outweigh the benefits, therapy schedule is critical in making treatment decisions. Overtreatment should be avoided in order to minimize unnecessary toxicities or collateral damage [13]. With the numerous treatment approaches, a timely selected one can increase the treatment's rate of resolution. Precision medicine [14][15][16] has also come forward since susceptibility to drugs may vary for each patient. Like drug response, a patient's comorbidities may affect in-hospital mortality since comorbidities also alter medication and treatment choices.
Manually deciding which medication and treatment will be used to cure cancer takes a lot of time and can be problematic for physicians. One solution to aid physicians in the assessment of cancer patients is machine-learning approaches. Recent studies [7,8,10,11, focused mainly on machine-learning approaches to recognize the main characteristics of cancer patient data since they showed that they have a deep impact on early-stage mortality prediction. Machine-learning methods can analyze and extract key knowledge from patient data. In addition, they can learn from data, and predict desired results much more rapidly. Because of these advantages, machine-learning approaches gained popularity in the cancer research area.
Although machine-learning approaches are quicker than manual decision making, as the number of features increases, computation time and the resources that the model requires expand as well. Therefore, in order to overcome this issue, the number of features that go into models needs to be small. It should be small enough to reduce dimensionality in a reasonable manner and still produce adequate results to be used in prediction tasks.
It has been mentioned in many studies that comorbidities, medication, and treatments of a patient are very important features to assess whether a cancer patient can survive or not [12][13][14][15][16][40][41][42][43][44][45][46][47]. These features can easily be retrieved from the hospital's electronic health database. Since other data, like laboratory measurements or microbiology events, mostly have missing data, require additional preprocessing, and are time consuming to collect, they were not considered in this work.
There are no prior studies on mortality prediction with cancer cases in the MIMIC-IV dataset. The first contribution of this work is therefore the study of mortality prediction for these cancer types on the MIMIC-IV dataset. The main approach in this work is finding easily acquirable features that can be used with machine-learning approaches while keeping the prediction rate as high as possible for in-hospital mortality prediction on various cancer patients. To this end, several machine-learning classifiers with various feature sets have been evaluated. Our approach also investigates how well these features work together with machine-learning approaches for limited amounts of data.
A classifier framework that reduces the burden on doctors and effectively uses easily accessible electronic health data is designed. In this work, using patients' comorbidities, medications, and procedures from the MIMIC-IV dataset, the most significant features are selected with Logistic Regression for in-hospital mortality prediction on various cancer patients. Several experiments are done with different machine-learning models: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, and Multi-Layer Perceptron. The mortality prediction capabilities of these models are evaluated with the F1 Macro-Average and AUROC score metrics.

Related Work
There are many studies in the literature on cancer-related detection solutions. Currently, there are no prior studies on mortality prediction with cancer cases on the MIMIC-IV dataset. This section covers machine-learning applications on various cancer types that use datasets other than MIMIC, other cancer-related works on MIMIC-III, and MIMIC-IV-related studies.
Xie et al. [7] worked on finding diagnostic biomarkers for lung cancer from Chinese patients' plasma metabolites. The dataset they used is from Hubei Taihe Hospital and consists of 110 lung cancer patients and 43 healthy individuals. A combination of six metabolic biomarkers was found to be useful for stage I lung cancer patients, with high AUC. They recommend Naive Bayes for early lung tumor prediction. Raoof et al. [10] analyzed lung cancer patients with Naive Bayes, Support Vector Machine, Logistic Regression, and Artificial Neural Network methods for early detection and mortality prediction. Using UCI MLDb data, data collected from various hospitals, and CT images, they discussed the causes of lung cancer and compared these methods. They pointed out that mortality from lung cancer resulted mostly from cigarette smoking and radon gas. Several other studies [11][17][18][19][20][21] used various machine-learning methods for lung cancer prediction and survival prediction.
Shalini et al. [22] showed hidden patterns with Support Vector Machine, K-Nearest Neighbor, and Decision Tree for breast cancer. The Wisconsin breast cancer dataset from UCI has been used. They also used Deep Learning methods for automatic feature selection and prediction. BI_RADS assessment, tumor size, and shape were found to be related features. Naveen et al. [23] predicted breast cancer by applying feature scaling, cross-validation, and various ensemble machine-learning methods. The Coimbra dataset from UCI was used. With Decision Tree and K-Nearest Neighbor, they achieved the highest accuracy on the training set. They discussed results using a confusion matrix and classification report. Other studies [24][25][26][27][28][29][30][48] applied diverse machine-learning methods to breast cancer prediction and survival prediction. Feature selection has also been performed [31] for breast cancer patients.
For prostate cancer, Revett et al. [32] investigated rough sets using a decision table in order to classify patients. On 502 prostate cancer patients, they achieved 90% accuracy with 91% sensitivity and specificity. Stage, treatment, age, Pf, Hx, Sbp, Dbp, Hg, tumor size, and bone metastases features were taken into account.
Danilatou et al. [8] conducted experiments on in-hospital and after-discharge mortality prediction using machine-learning methods. Their study used 2468 venous thromboembolism patients in the ICU from the MIMIC-III database. Using the automated machine-learning platform JADBIO and Random Forest, they obtained the highest AUC for early mortality; 1471 features were extracted. Other studies conducted on the MIMIC-III dataset were mainly on early mortality prediction for several cancer patients [33][34][35][36]. Other papers [37][38][39] focused on diagnosing breast cancer using physicians' text notes from electronic health records in MIMIC-III.
Nowroozilarki et al. [49] presented a survival analysis method for a real-time mortality warning system that uses the information conveyed by time-varying EHR data. The developed method was BoXHED 2.0, and they used the MIMIC-IV dataset. An AUC-PRC of 0.41 and an AUC-ROC of 0.83 were achieved. Meng et al. [50] conducted analyses on the interpretability, dataset representation, and prediction fairness of deep learning models for in-hospital mortality. They used the MIMIC-IV dataset. Using Long Short-Term Memory, they obtained the highest AUROC score.
As previous studies have shown, multiple machine-learning algorithms have been used to predict cancer and used in mortality prediction for cancer patients. However, for the MIMIC-IV dataset, there are no studies that investigate mortality prediction for these cancer types with this set of features.

General Framework
An open-access healthcare dataset called Medical Information Mart for Intensive Care IV (MIMIC-IV) was used. From this dataset, diagnosis, medication, and treatment features were extracted for breast, lung, prostate, and stomach cancer patients. The framework takes multiple features and uses a classification model to predict mortality. A typical supervised classification model was employed, which consists of training and testing phases (Fig. 1). First, patients' diagnosis, medication, and treatment data were extracted from the dataset and labeled with the patient's mortality status. Next, at the preprocessing stage, null values were excluded, while the remaining values were trimmed and formatted as lowercase in order to group features correctly. A classifier was then trained on these feature vectors, producing a trained machine-learning model. A test sample is then given to the model to predict the mortality event. Lastly, the comparison results are reported using multiple classification evaluation metrics.

Medical Information Mart for Intensive Care (MIMIC) IV Dataset
In this work, the largest publicly available healthcare dataset from the Medical Information Mart for Intensive Care IV (MIMIC-IV v1.0), containing over 500,000 ED visits between 2008 and 2019, was used. It contains real hospital stays for patients admitted to a tertiary academic medical center, Beth Israel Deaconess Medical Center (BIDMC), in Boston, Massachusetts. The MIMIC-IV dataset is an updated and enhanced version of the MIMIC-III dataset created in 2016. MIMIC-IV is divided into three modules: core, hosp, and icu. These components are designed to emphasize their intended function and provenance. The core module holds patient tracking information that is required for any MIMIC-IV data analysis. The data in the hosp module come from the hospital's EHR system. The icu module includes data from the BIDMC's MetaVision clinical information system (iMDSoft) [51].
Fig. 1 Proposed framework for mortality prediction
The dataset contains many tables, which are populated with different kinds of data for each patient. The following tables are selected as the source of base features for this study: diagnoses_icd, emar, procedures_icd, and admissions.

Data Preparation and Inclusion Criteria
The cohort is selected from distinct patients who have breast, lung, prostate, or stomach cancer, coded with either ICD version 9 or 10. There are 3321 diagnoses and 2065 distinct patients for breast cancer, 6677 diagnoses and 3364 distinct patients for lung cancer, 3112 diagnoses and 1971 distinct patients for prostate cancer, and 1146 diagnoses and 583 distinct patients for stomach cancer.
Only the first diagnosis of each patient has been taken into account, and patients who died on the same day as their hospital admission were excluded (Fig. 2). Using the admissions table, this narrows the numbers down to 2037 breast, 3134 lung, 1927 prostate, and 557 stomach cancer patients. No further inclusion criteria have been applied. For this cohort, the numbers of deceased patients are: 92 for breast, 335 for lung, 121 for prostate, and 63 for stomach cancer.
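The cohort counts above imply a strongly imbalanced classification problem. As a quick check, the share of deceased patients per cancer type can be computed directly from the reported numbers; the variable names below are illustrative only.

```python
# Cohort sizes and deceased counts as reported in the text above.
cohort = {"breast": 2037, "lung": 3134, "prostate": 1927, "stomach": 557}
deceased = {"breast": 92, "lung": 335, "prostate": 121, "stomach": 63}

# Fraction of positive (deceased) cases per cancer type.
mortality_rate = {cancer: deceased[cancer] / cohort[cancer] for cancer in cohort}

for cancer, rate in mortality_rate.items():
    print(f"{cancer}: {rate:.1%} deceased")
```

The positive class makes up between about 4.5% and 11.3% of each cohort, so plain accuracy would be a weak indicator of performance on these data.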

Selected Classifiers for Proposed Framework
In the proposed framework, supervised learning with parametric and nonparametric models was used to solve a typical binary classification problem. Parametric models assume a fixed functional form and estimate a fixed set of parameters from the training data. Nonparametric models do not fix the number of parameters in advance; their structure is determined by the training set, part of which may be used directly during prediction. Supervised learning infers a function from a set of labeled training instances. Binary classification is a task that predicts the class label for a given case as true or false [52][53][54][55]. The following classifiers are used in this study: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, and Multi-Layer Perceptron.
Logistic Regression is a probabilistic statistical model that is commonly used to handle classification problems in machine learning. It uses a sigmoid function and a fixed threshold in order to estimate the probability of a class and assign a label. It works well when the dataset is linearly separable and can suffer from overfitting with high-dimensional datasets [52,53]. The reason for choosing Logistic Regression in this study is that it describes the relationship between the variables and the outcome and can predict the probability of the outcome.
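To make the sigmoid-and-threshold step concrete, the following is a minimal sketch in plain Python. The weights, bias, and 0.5 threshold are hypothetical values chosen for illustration; they are not taken from the models fitted in this study.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Estimated probability of the positive class for one feature vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def predict(weights, bias, x, threshold=0.5):
    """Apply a fixed decision threshold to the estimated probability."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

# Hypothetical weights for three binary features (illustration only).
weights, bias = [1.2, -0.7, 0.0], -1.0
patient = [1, 0, 1]  # features 1 and 3 occurred for this patient

p = predict_proba(weights, bias, patient)
label = predict(weights, bias, patient)
print(f"p = {p:.3f}, predicted label = {label}")
```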
Decision Tree is a nonparametric supervised learning method that can find complex nonlinear relationships in the data. It classifies an instance by sorting it down the tree from the root node to a leaf node. At each node, the instance is routed along a branch according to its value for the attribute tested by that node. The most prominent splitting criteria are "gini", which uses Gini impurity, and "entropy", which uses information gain [53,54]. In this study, the Decision Tree is chosen because of its low cost and high classification accuracy.
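The two splitting criteria can be illustrated with a short, self-contained sketch. It evaluates a single candidate split on one binary feature using toy labels; the data and function names are made up for the example and do not come from the study.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_scores(feature, labels):
    """Impurity decrease (Gini) and information gain (entropy) of a binary split."""
    left = [y for f, y in zip(feature, labels) if f == 0]
    right = [y for f, y in zip(feature, labels) if f == 1]
    n = len(labels)

    def weighted(impurity):
        # Size-weighted impurity of the non-empty child nodes.
        return sum(len(part) / n * impurity(part) for part in (left, right) if part)

    return gini(labels) - weighted(gini), entropy(labels) - weighted(entropy)

# Toy data: 1 = deceased, 0 = survived; feature = binary flag.
labels = [0, 0, 0, 1, 1, 1, 0, 0]
feature = [0, 0, 0, 1, 1, 1, 1, 0]
gini_decrease, info_gain = split_scores(feature, labels)
print(f"Gini decrease = {gini_decrease:.4f}, information gain = {info_gain:.4f}")
```

A tree learner would compute such scores for every candidate feature and split on the one with the largest decrease.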
Random Forest is an ensemble method that fits many Decision Tree classifiers, each on a subset of features. By combining majority voting with bootstrap aggregation, it reduces overfitting and improves prediction accuracy. Since feature selection is randomized, it is usually more accurate than a single Decision Tree [54,55]. In this study, the Random Forest is chosen because its training phase is fast and it does not memorize the data.
Support Vector Machine constructs a hyperplane, or a set of hyperplanes, from the data to separate the classes. The hyperplane achieves a good separation by having the maximum distance from the closest training data points of each class. Although it can perform classification, regression, and outlier detection, it is highly sensitive to noisy data [52,54]. The reason for choosing the Support Vector Machine in this study is that it is flexible with different kernel methods and is effective in cases where there are many attributes, such as health data.
Multi-Layer Perceptron is an artificial neural network with a feed-forward architecture. It consists of fully connected hidden layers between the input and output layers and classifies using the neurons' learned weights and activation functions. While creating the model, it adjusts the internal weights with the backpropagation approach [53,55]. In this study, the Multi-Layer Perceptron is chosen because it has a flexible structure that can work well with big data and can give fast predictions.

Feature Extraction and Selection
The following features have been used as binary flags in this work.
It has been stated that the comorbidities of a patient (e.g., hypoxemia, acidosis, and sepsis) are highly related to in-hospital mortality, cancer survival, treatment selection, and usage in clinical applications [40][41][42][43]. With that stated, diagnosis data (i.e., distinct values of icd_code for diagnosis) from the diagnoses_icd (billed ICD-9/ICD-10 diagnoses for hospitalizations) table are used.
Recent studies also point out that using medication (e.g., Morphine Sulfate, Cefazolin, and Pilocarpine) as a feature in drug response prediction, cancer survival, and precision medicine is vital [14-16, 44, 45].
It has been shown that treatment selection (e.g., closed biopsy, venous catheterization, and bronchoscopy) plays an immense role in preventing futile treatment strategies, in cancer survival, and in clinical decision making [12,13,46,47]. Taking these into consideration, procedures data (i.e., distinct values of icd_code for procedures) from the procedures_icd (billed procedures for patients during their hospital stay) table are used. The number of procedures used as treatment is 10,250 for breast, 20,160 for lung, 11,084 for prostate, and 5421 for stomach cancer patients.
Rather than using a specific subset of features like the examples given above, features are grouped and considered as feature groups: diagnosis, medication, and treatment. These feature groups are not specific to one cancer type; they are suitable and common for all cancer types. The features hold discrete values and were handled as binary variables, indicating whether they occurred or not for that cancer type. As a result, no continuous data type has been used. The steps of this approach are explained in the following paragraphs. An extended subset of features is given as "Supplementary Information 1" in order of relevancy.
For each cancer type, all these features have been grouped, which results in a list of unique features for that cancer type (Table 1).
It has been seen that, especially in medication data, similar values have been written in different formats. The same value has been noted down in camel case, in all capitals, in all lowercase, or with white spaces at the end. For diagnoses and treatments, spellings also differed between ICD-9 and ICD-10 entries for the same reason, which resulted in features being grouped incorrectly. In order to group features correctly, the following data cleaning and formatting procedures have been applied: first, null data were excluded; second, data were trimmed by removing unnecessary white spaces at the beginning or at the end; and lastly, data were formatted as lowercase. Since diagnoses and procedures use similar ICD codes, in order to prevent confusion, the prefixes diag-, med-, and pro- have been added to the headers of diagnosis, medication, and procedure features, respectively. No other preprocessing operation has been applied (Fig. 3). For all patients, every feature encountered is put into a map, creating a unique feature dictionary for that type of cancer. Then, by assigning these features as columns for each patient, the frequency of a feature is set if it occurred for that patient; otherwise, it is set to 0.
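The cleaning and prefixing steps described above can be sketched in a few lines of plain Python. The helper names, the code value "1629", and the choice to treat blank strings like nulls are assumptions made for this illustration; they are not taken from the study's implementation.

```python
# Prefixes named in the text; the dictionary keys are illustrative group labels.
PREFIXES = {"diagnosis": "diag-", "medication": "med-", "procedure": "pro-"}

def normalize(value):
    """Trim surrounding whitespace and lowercase; return None for null input.

    Treating blank strings as null is an assumption of this sketch.
    """
    if value is None:
        return None
    cleaned = value.strip().lower()
    return cleaned or None

def to_feature_name(group, value):
    """Build the prefixed feature header, or None if the value is excluded."""
    cleaned = normalize(value)
    return None if cleaned is None else PREFIXES[group] + cleaned

# Hypothetical raw records; "1629" is a made-up code shared by two groups.
raw = [
    ("medication", "Morphine Sulfate "),
    ("medication", "MORPHINE SULFATE"),
    ("diagnosis", "1629"),
    ("procedure", "1629"),
    ("medication", None),
]

features = set()
for group, value in raw:
    name = to_feature_name(group, value)
    if name is not None:
        features.add(name)

print(sorted(features))
```

The two spellings of the medication collapse into one feature, while the identical diagnosis and procedure codes remain distinct because of their prefixes.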
It has been stated that in order to reduce computational load, feature data can be transformed to binary encoding before feeding the model. In addition, it has been shown that one-bit representation of the data results in rapid outcomes and low resource costs [56,57].
Therefore, in order to reduce the computational cost and time spent on model training, the following feature extraction method is applied: if a feature occurred for a patient, its value is set to 1. Lastly, the class variable (i.e., dependent variable or result) in this problem is mortality, which is dead or alive. If the patient is deceased, the result is set to 1; otherwise, it is set to 0. With that, the final dataset is created.
To clarify the preprocessing steps in Fig. 3: first, text-formatting procedures are applied to group identical terms. Second, feature prefixes are added in order to prevent confusion between identical codes from different feature sets. Third, all features for that cancer type are used as columns and patients as rows. Then, if a feature occurred for a patient, the corresponding cell is set to 1. Lastly, the mortality status of each patient is added as a result column to form the final dataset.
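As an illustration of the patient-by-feature layout described above, the sketch below builds a small binary matrix with a mortality label per row. The patient identifiers and feature names are hypothetical, and the function is a simplified stand-in rather than the pipeline used in this work.

```python
def build_binary_matrix(patient_features, mortality):
    """Return (columns, rows, labels) with one 0/1 flag per feature per patient."""
    # Unique feature dictionary for this cancer type, in a stable order.
    columns = sorted({f for feats in patient_features.values() for f in feats})
    index = {name: j for j, name in enumerate(columns)}

    rows, labels = [], []
    for pid in sorted(patient_features):
        row = [0] * len(columns)
        for feature in patient_features[pid]:
            row[index[feature]] = 1  # 1 if the feature occurred at least once
        rows.append(row)
        labels.append(1 if mortality[pid] else 0)  # 1 = deceased, 0 = alive
    return columns, rows, labels

# Hypothetical cleaned and prefixed features per patient.
patient_features = {
    "p1": ["diag-sepsis", "med-morphine sulfate", "med-morphine sulfate"],
    "p2": ["pro-bronchoscopy"],
    "p3": ["diag-sepsis", "pro-bronchoscopy"],
}
mortality = {"p1": True, "p2": False, "p3": False}

columns, X, y = build_binary_matrix(patient_features, mortality)
print(columns)
print(X, y)
```

Repeated occurrences of a feature for the same patient still yield a single flag of 1, which matches the one-bit representation motivated in the text.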
Because of the way it works, Logistic Regression is an effective feature selector. Feature weighting interprets the model coefficients by their relevancy. A coefficient not only shows how important a feature is, but also indicates the direction of its relation as positive or negative. This feature selection approach has been adopted by many studies [58][59][60][61]. In a study that focuses on breast cancer, less important features were eliminated with Logistic Regression [58]. Logistic Regression was also used for biomarker selection and relevant gene selection in lung cancer studies [59,60,62].
In Logistic Regression, the classification result is determined by a probability value between 0 and 1. The weighted sum of the attributes is converted into a probability by the logistic function. The coefficient (β) shows the change in the logarithm of the odds of the result, along with the direction and size of the relationship. The term in the ln() function is the probability of an event divided by the probability of it not happening, and as a whole this is called the log-odds [52]:
$$\ln\left(\frac{P(y=1)}{1-P(y=1)}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \qquad (1)$$
In order to understand how the estimation changes when one of the x attributes is changed by 1 unit, the exp() function is applied to each side; the odds are then multiplied by exp(β) of that attribute. To maximize the log-likelihood, the regression coefficients are iteratively reweighted until the fit converges [54]. A positive coefficient produces an exp(β) greater than 1, while a coefficient with a value of zero produces an exp(β) equal to 1, indicating that the feature does not affect the probability of the result [58].
In this work, feature selection is performed with Logistic Regression for each cancer type. It is seen that the weights of irrelevant features stay around 0. Some features are positively related: when such a feature occurs, it increases the probability of the result. Negatively related features decrease the probability of the result, yet are still related. After relevancy is determined, the Logistic Regression weights need to be put in order, so that relevant features can be fed to the models first. Features are ordered according to weight; negative ones appear at the end.
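The ordering step can be sketched as a simple sort on the fitted coefficients. The coefficient values and feature names below are hypothetical, and the descending sort is one straightforward reading of "ordered according to weight; negative ones appear at the end".

```python
import math

def rank_features(weights):
    """Order features by Logistic Regression weight, largest first.

    Positive weights come first, near-zero weights follow, and negative
    weights end up last.
    """
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)

# Hypothetical fitted coefficients (illustration only).
weights = {
    "diag-sepsis": 1.4,
    "med-cefazolin": -0.6,
    "pro-bronchoscopy": 0.0,
    "diag-acidosis": 0.9,
}

ranked = rank_features(weights)
top_2 = [name for name, _ in ranked[:2]]

# exp(beta) is the multiplicative change in the odds when a flag goes 0 -> 1.
odds_ratios = {name: math.exp(beta) for name, beta in weights.items()}

print(top_2)
print({name: round(ratio, 2) for name, ratio in odds_ratios.items()})
```

Taking the first k entries of the ranked list gives the feature subset that would be fed to a model for a given k.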

Training and Evaluation Setup
For all cancer types, the stated machine-learning algorithms are used. The test size has been chosen as 30%, and the training size as 70%. The created dataset is fed to each model with a fixed random state value so that the results can be reproduced, creating a base result.
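A reproducible 70/30 split with a fixed random state can be sketched as follows. The seed value 42 and the plain (non-stratified) shuffle are assumptions of this example; the text does not state which seed or splitting routine was used.

```python
import random

def train_test_split(rows, labels, test_size=0.3, seed=42):
    """Shuffle indices with a fixed seed and hold out `test_size` of them."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Toy dataset: 10 patients with two binary features each.
rows = [[i % 2, (i // 2) % 2] for i in range(10)]
labels = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = train_test_split(rows, labels)
print(len(X_train), len(X_test))
```

Because the shuffle is seeded, calling the function again with the same arguments returns the same partition, which is what makes the base result repeatable.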
The area under the receiver-operating characteristic curve (AUROC) and the F1 Macro-Average are used as evaluation metrics for binary classification. There are four possible outcomes given a classifier and an instance. A true positive (TP) is a positive instance that is classified as positive; a false positive (FP) is a negative instance that is classified as positive. A true negative (TN) is a negative instance that is classified as negative; a false negative (FN) is a positive instance that is classified as negative. Sensitivity, also referred to as the true-positive rate, is the ratio of positive instances that are correctly classified as positive. Specificity, also referred to as the true-negative rate, is the ratio of negative instances that are correctly classified as negative [52]:
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$
The ROC curve plots the true-positive rate against the false-positive rate, that is, sensitivity versus 1 − specificity. The F1 score is the harmonic (balanced) mean of recall and precision. The F1 Macro-Average score is the unweighted mean of the F1 scores of all N classes [52]:
$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{F1 Macro-Average} = \frac{1}{N} \sum_{i=1}^{N} F1_i$$
where Precision = TP / (TP + FP) and Recall = TP / (TP + FN).
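To show how these quantities relate, the following sketch computes sensitivity, specificity, and the F1 Macro-Average for a binary problem from the four outcome counts. The label vectors are toy data and the function names are chosen for this example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the F1 scores of the positive and negative class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    f1_positive = f1(tp, fp, fn)
    # For the negative class the roles swap: TN acts as TP, FN as FP, FP as FN.
    f1_negative = f1(tn, fn, fp)
    return (f1_positive + f1_negative) / 2

# Toy example with an imbalanced positive class.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
score = macro_f1(y_true, y_pred)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} macro F1={score:.3f}")
```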

Empirical Results
In this part, the stated methods have been analyzed in three ways. First, using all features, the BASE results are collected. BASE corresponds to the baseline results against which the others are compared. They are obtained using all features with all methods, which gives the baseline performance of the models. Second, the same procedures are followed with feature sets that start at 100 features and grow up to the number of possible features for that type of cancer, adding the next 100 features each time. After collecting all results, a result that either surpasses the BASE result or is as close as possible to it is marked as a candidate. Among the candidates, the one with the minimum number of features forms the BEST X result, where X denotes the number of features that produces this best result for that cancer type with that model. Finally, the best 100 features, ordered by their relevancy with Logistic Regression, are chosen. Then, using only these features, the FIRST 100 results are noted. This is to show how well the methods perform with only the minimum feature set.
Results are presented in two parts: the comparison of F1 Macro-Average scores and the comparison of AUROC scores. The performance of all methods used in this work is shown in Tables 2, 3, 4, and 5. In these tables, rows show the models grouped by cancer type, and columns show three types of result set. "BASE (ALL) Features" shows the baseline performance results with all features for that model. "BEST X Features" shows the best performance results compared to the baseline result. The value of X is also shown in order to indicate the number of features in use. Lastly, "FIRST 100 Features" shows the result of the first 100 features, which is the minimum feature set considered. Colored cells indicate the best result among the models for that cancer type in that feature group.
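The BEST X search described above can be sketched as a scan over feature-set sizes. The scores below are invented, and the `tolerance` parameter is a hypothetical stand-in for "as close as possible", which the text does not quantify.

```python
def feature_counts(total, step=100):
    """Feature-set sizes 100, 200, ... ending at the total number of features."""
    counts = list(range(step, total, step))
    counts.append(total)
    return counts

def best_x(scores_by_count, base_score, tolerance=0.01):
    """Smallest feature count whose score surpasses or nearly matches BASE.

    Returns None when no feature count qualifies as a candidate.
    """
    candidates = [count for count, score in scores_by_count.items()
                  if score >= base_score - tolerance]
    return min(candidates) if candidates else None

# Hypothetical scores for one model and one cancer type with 450 features.
sizes = feature_counts(450)
scores = dict(zip(sizes, [0.70, 0.735, 0.74, 0.72, 0.74]))

x = best_x(scores, base_score=0.74)
print(sizes, x)
```

In a full experiment, each score would come from training the model on the top-ranked features of that size and evaluating it on the held-out test set.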
As can be seen in Table 2, using BASE features, the Decision Tree method gave the best results for breast, lung, and prostate cancer. For stomach and lung cancer, Logistic Regression gave the best results. Among all types, prostate cancer gave the highest score, which is 0.82. For all cancer types, FIRST 100 features results were generally close to the BASE results for all methods. The results of the Random Forest and Multi-Layer Perceptron methods surpassed their respective BASE results for breast and lung cancer. The results of the Random Forest and Support Vector Machine methods also surpassed their respective BASE results for prostate and stomach cancer with FIRST 100 features. It can be seen that the FIRST 100 features results reached approximately 90% of the BASE results (Table 2).
BEST X features can be employed in cases where the computational load and memory requirements can be afforded. Mortality prediction results using BEST X features can be seen in Table 3. For breast cancer, using approximately 20% of all features, the score was close to the BASE result with the Decision Tree model. With 18% of all features, the same result as BASE was achieved using Logistic Regression. For lung cancer, with 10% of all features, the BASE result was outperformed using Logistic Regression. With approximately 12% of all features, the score was close to the BASE result using Decision Tree. For prostate cancer, with 8% of all features, the BASE result was outperformed using Logistic Regression. With 14% and 1% of all features, scores were close to the BASE results using Decision Tree and Multi-Layer Perceptron, respectively. For stomach cancer, with 12% of all features, the BASE result was outperformed using Logistic Regression.
To aid comprehension, the final F1 Macro-Average results are also shown in Fig. 4 for full comparison.
AUROC scores of the experiments can be seen in Table 4. On AUROC scores, using BASE features, the Support Vector Machine method gave the best results for breast and stomach cancer, and the Random Forest method gave the best results for lung and prostate cancer. Among all types, prostate cancer gave the highest score, which is 0.94. The Logistic Regression result surpassed its respective BASE result for breast cancer. The results of the Logistic Regression and Multi-Layer Perceptron methods also surpassed their respective BASE results for lung and prostate cancer with FIRST 100 features. It can be seen that the FIRST 100 features results reached approximately 98% of the BASE results among all results (Table 4). In addition to AUROC scores, ROC curves are drawn to demonstrate how BASE and FIRST 100 features behaved in mortality prediction. As can be seen in Fig. 4, for all cancer types, FIRST 100 features results were generally close to the BASE results for all methods.
AUROC results for BEST X features and machine-learning models can be seen in Table 5. To aid comprehension, the final AUROC results are also shown in Fig. 5 for full comparison.
An AUROC score is a representation of how well a binary classification model performed on the positive class. The ideal fraction of accurate positive class predictions would be 1. This demonstrates that the top-left of the plot is the best feasible classifier and reaches perfect competence. A classifier will generate a diagonal line if it lacks the ability to distinguish between positive and negative classifications. Due to its lack of bias toward either the majority or minority class, it is a well-liked diagnostic tool for classifiers on both balanced and imbalanced binary prediction problems [52].
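One way to see why a perfect classifier scores 1 and an uninformative one scores 0.5 is the ranking interpretation of AUROC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below implements that pairwise definition on toy scores; it is an illustration and not the routine used to produce the reported results.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

y_true = [1, 1, 0, 0, 0]

perfect = auroc(y_true, [0.9, 0.8, 0.3, 0.2, 0.1])        # every positive outranks every negative
uninformative = auroc(y_true, [0.5, 0.5, 0.5, 0.5, 0.5])  # constant score, all ties
mixed = auroc(y_true, [0.9, 0.2, 0.3, 0.1, 0.05])         # one positive ranked below a negative

print(perfect, uninformative, round(mixed, 3))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a small illustration but would be replaced by a rank-based computation on larger cohorts.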
The results below can be compared in two ways. The first is to hold one cancer type and feature group constant and compare different machine-learning models. For example, for breast cancer and BASE features, it can be clearly seen that SVM classifies better than LR, since SVM has more area under the ROC curve. The second is to hold one cancer type and one machine-learning method constant and compare different feature groups. For example, for lung cancer and the Logistic Regression method, it can be seen that FIRST 100 features have a better score than BASE features, with 0.86 versus 0.77 (Fig. 6).

Conclusion and Discussion
Cancer is among the most threatening diseases in the world. Breast, lung, prostate, and stomach cancers are the most frequent ones globally. Early-stage detection and diagnosis of these cancers pose a great challenge in the literature. When dealing with cancer patients, physicians must select among various treatment methods that carry a risk factor. Since the risks of treatment may outweigh the benefits, treatment schedule and treatment selection are critical in clinical decision making. Manually deciding which medications and treatments are going to be successful takes a lot of expertise and can be hard. Furthermore, early-stage detection of cancer is crucial in order to decrease the mortality rate of patients. As a result, computerized mortality prediction with the fewest required features will help physicians overcome these limitations and issues.
Although machine-learning approaches are faster than manual decision making, computation time and the resources that the model requires grow as the number of features increases. The main problem addressed here is finding the fewest required features while keeping the prediction rate as high as possible for in-hospital mortality prediction on various cancer patients, in order to help physicians. In addition, no other study was found for this purpose on the MIMIC-IV dataset.
Using various machine-learning methods and patients' diagnosis, medication, and treatment features, a comparative analysis of these methods was conducted. The machine-learning methods are Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, and Multi-Layer Perceptron. For feature extraction, patient data are represented as one-bit flags in order to reduce the computational cost and the time spent on model training. With Logistic Regression, the most significant features are identified and selected. These features are fed to the models in multiples of 100 in search of similar or, preferably, better results than using all features.
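The pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the code names (`dx_A`, `rx_B`, `proc_C`), the toy patients, the plain gradient-descent fit, and the choice to rank features by absolute Logistic Regression coefficient are all hypothetical. The study selects features in multiples of 100; the toy example keeps a single feature.

```python
import math


def to_flags(patient_codes, vocabulary):
    """One-bit flags: 1 if the patient's record contains the code, else 0."""
    return [[1 if code in codes else 0 for code in vocabulary]
            for codes in patient_codes]


def fit_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss (no regularization)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target  # predicted minus actual
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def top_k_features(weights, vocabulary, k):
    """Assumed criterion: keep the k features with the largest |coefficient|."""
    ranked = sorted(zip(vocabulary, weights),
                    key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical codes and patients; 1 = died in hospital, 0 = survived.
vocabulary = ["dx_A", "rx_B", "proc_C"]
patients = [{"dx_A", "rx_B"}, {"dx_A"}, {"dx_A", "proc_C"},
            {"rx_B"}, {"proc_C"}, {"rx_B", "proc_C"}, set()]
labels = [1, 1, 1, 0, 0, 0, 0]

flags = to_flags(patients, vocabulary)
weights, bias = fit_logistic_regression(flags, labels)
selected = top_k_features(weights, vocabulary, k=1)
print(selected)
```

In this toy setting the flag that separates the two outcomes receives the largest coefficient and is therefore kept; the reduced feature set would then be passed to each classifier for comparison against the full set.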
With a smaller set of features, results were promising and mainly surpassed their baseline scores. F1 Macro-Average scores were: 0.74 for breast, 0.73 for lung, 0.82 for prostate, and 0.79 for stomach. AUROC scores were: 0.94 for breast, 0.91 for lung, 0.96 for prostate, and 0.88 for stomach. It can be seen that this set of features is much more suitable and has more generalization opportunities for prostate cancer than other cancer types.
The proposed approach has several contributions. First, our approach uses fewer features while maintaining the same prediction performance as using all available features. Second, by converting feature sets to binary flag values, the memory footprint of feature vectors becomes smaller. Third, it is the only study that explores cancer mortality in the MIMIC-IV dataset. Finally, in our study, we identified which machine-learning models work well with which cancer type in predicting mortality.
When one outcome class holds the majority of the samples, the class distribution [52] is skewed and the dataset is imbalanced. The data that we used in this study have a 90/10 life-to-death ratio. Since the cancer data here are imbalanced, the model results were highly sensitive to the number of instances in each class and possibly also to mislabeled, noisy data. But as can be seen from the results, our approach can cope with a limited amount of data using relevant feature sets and efficient machine-learning algorithms.
In future work, traditional features such as age, gender, and lab values can be added and the same procedure can be followed. In addition, other similar publicly available datasets can be considered to validate our approach. Finally, deep learning methods were not applicable to this problem because of the small amount of data available [63,64]. Deep learning architectures require datasets with large sample sizes to perform well. To address this issue, methods that increase sample sizes can be considered in future studies [65]. As a next step, deep learning frameworks [66] can be considered when more patient data are available.
Funding Open access funding provided by Tampere University including Tampere University Hospital, Tampere University of Applied Sciences (TUNI).