Introduction

Cancer cachexia is a multifactorial wasting syndrome characterized by involuntary weight loss [1]. It significantly impacts the quality of life of cancer patients and exacerbates chemotherapy toxicity, thereby substantially increasing cancer mortality [2]. Cachexia affects 50% of cancer patients and accounts for 20% of cancer-related deaths [3, 4].

It is now widely recognized that cachexia responds poorly to available nutritional and pharmacological treatments and inevitably leads to death once a critical point is reached [5]. Prevention and early recognition deserve more attention than treatment because, once cachexia is established, the disease process is almost irreversible [6]. Owing to the complex presentation of cancer cachexia and its variability between patients, there is no precise definition or diagnostic standard, and current diagnostic criteria rely on the international expert consensus reached in 2011. This consensus describes cancer cachexia as a continuum of three stages: precachexia, cachexia, and refractory cachexia [7]. Although the consensus provides a framework for diagnosis, in practice the presentation of precachexia is insidious and difficult for clinicians to detect. The lack of pathophysiological evidence has led clinicians to rely on their own experience or research strengths to describe cachexia [8]. Patients also sometimes overlook weight changes, which makes cachexia harder to identify when information on weight loss is missing. Therefore, other methods are needed to identify cachexia and its stages.

Typically, patients with cancer undergo extensive physical examinations and inquiries from medical staff during their hospital stay. The results of these examinations and the patient’s self-reported physical condition provide a great deal of information that may be beneficial in identifying cachexia at an early stage. However, their potential value is underutilized. Much of the current research on cachexia has focused on the relationship between biomarkers and prognosis; however, little attention has been paid to precachexia studies. Machine learning is becoming increasingly popular because of its suitability for handling biological data of an ever-increasing size and for analyzing the complexity inherent in diseases [9]. In medical research, machine learning can be applied to clinical datasets to develop powerful risk models to reclassify patients [10]. Therefore, we aimed to use machine learning methods, combined with hematological information and symptoms of cancer patients, to identify individuals in precachexia and cachexia.

Methods

Study design and population

The Investigation on Nutrition Status and Clinical Outcome of Common Cancers (INSCOC) was a multicenter cancer cohort study in China. Data for this study were collected between 2013 and 2019 [11]. The study included patients with cancer who were older than 20 years and hospitalized for ≥ 48 h. Figure 1 shows the inclusion and exclusion process. In total, 3896 participants were included in the study.

Fig. 1 Workflow

Cachexia was defined according to the 2011 International Consensus on Cancer Cachexia [7]. A participant was considered cachectic if they had involuntarily lost more than 5% of their body weight in the previous 6 months, or more than 2% in combination with a BMI < 20 kg/m². Weight loss of more than 2% but not more than 5%, without the low-BMI criterion, was considered pre-cachexia. Participants were classified as non-cachexia (n = 2247), pre-cachexia (n = 471), or cachexia (n = 1178).
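The staging rule described above can be written as a small function. This is a minimal sketch rather than the study's own code: the function name `classify_stage` and its arguments are assumptions, and weight loss is taken as a percentage of body weight lost involuntarily over the preceding 6 months.

```python
def classify_stage(weight_loss_pct: float, bmi: float) -> str:
    """Assign a stage from 6-month involuntary weight loss (%) and BMI (kg/m^2).

    Hypothetical helper illustrating the 2011 consensus rule as worded in the text.
    """
    # Cachexia: weight loss > 5%, or weight loss > 2% combined with BMI < 20.
    if weight_loss_pct > 5 or (weight_loss_pct > 2 and bmi < 20):
        return "cachexia"
    # Pre-cachexia: weight loss > 2% but not more than 5% (BMI is >= 20 here).
    if weight_loss_pct > 2:
        return "pre-cachexia"
    return "non-cachexia"


# Illustrative inputs only, not study data.
print(classify_stage(3.0, 22.0))
```

Under this reading, a patient with exactly 5% weight loss and a BMI of 20 kg/m² or more falls into the pre-cachexia group, because the cachexia criterion requires more than 5%.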

In accordance with the principles of the Declaration of Helsinki, this study was approved by the Medical Ethical Review Committee of the hospital where it was conducted (registration number: ChiCTR1800020329). All participant information was obtained after the participants had provided signed informed consent.

Variables

The basic patient characteristics collected in this study include demographics, laboratory test results, physical measurements, and physical symptoms. All personal patient information was deidentified. Demographic data included sex, age, and basic disease information such as tumor type and stage. The tumor stage was recorded as the stage of the patient at first diagnosis.

All laboratory tests were conducted within 48 h of hospitalization after the patients had fasted for at least 9 h. Blood tests included blood lipid, protein, creatinine, glucose, and transaminase levels, and neutrophil, white blood cell, platelet, and lymphocyte counts. The composite indicators were calculated on this basis. Platelet-lymphocyte ratio: platelet count (× 10⁹)/lymphocyte count (× 10⁹); neutrophil-lymphocyte ratio: neutrophil count (× 10⁹)/lymphocyte count (× 10⁹); leukocyte-lymphocyte ratio: white blood cell count (× 10⁹)/lymphocyte count (× 10⁹); prognostic nutritional index: albumin (g/L) + 5 × lymphocyte count (× 10⁹); C-reactive protein-albumin ratio (CAR): C-reactive protein (mg/L)/albumin (g/L); triglycerides-glucose index: ln (triglycerides (mg/dL) × glucose (mg/dL)/2).
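The composite indicators defined above are simple arithmetic on routine blood values, as the following sketch shows. The function name and the dictionary keys PLR, NLR, LLR, PNI, and TyG are shorthand assumed here for illustration (only CAR is abbreviated in the text), and the inputs are expected in the units given above.

```python
import math


def composite_indices(platelets, neutrophils, wbc, lymphocytes,
                      albumin, crp, triglycerides, glucose):
    """Compute the six composite indicators from routine blood values.

    Cell counts in units of 10^9, albumin in g/L, C-reactive protein in mg/L,
    triglycerides and glucose in mg/dL. Hypothetical helper, not the study code.
    """
    if lymphocytes <= 0 or albumin <= 0:
        raise ValueError("lymphocyte count and albumin must be positive")
    return {
        "PLR": platelets / lymphocytes,        # platelet-lymphocyte ratio
        "NLR": neutrophils / lymphocytes,      # neutrophil-lymphocyte ratio
        "LLR": wbc / lymphocytes,              # leukocyte-lymphocyte ratio
        "PNI": albumin + 5 * lymphocytes,      # prognostic nutritional index
        "CAR": crp / albumin,                  # C-reactive protein-albumin ratio
        "TyG": math.log(triglycerides * glucose / 2),  # triglycerides-glucose index
    }


# Illustrative values only, not study data.
indices = composite_indices(platelets=200, neutrophils=4, wbc=6, lymphocytes=2,
                            albumin=40, crp=8, triglycerides=150, glucose=100)
print(indices)
```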

Physical measurements included mid-arm circumference (MAC), calf circumference (CAC), triceps skinfold thickness (TSF), and hand grip strength (HGS). HGS was measured using an electronic handheld dynamometer (EH101; CAMRY, Guangdong, China). The patients were instructed to stand comfortably and perform three maximal isometric contractions with the nondominant hand, 30 s apart. For post-mastectomy patients, measurements were taken on the side not subjected to lymph node dissection.

Eating changes, gastrointestinal symptoms, and decreased physical activity were reported by the participants, and fever, ankle edema, and ascites were assessed by health care providers.

Machine learning models

Our study was divided into two parts: the first distinguishing between patients with noncachexia and cachexia and the second distinguishing between patients with precachexia and noncachexia. Before building the machine learning models, the two groups of participants were divided into training and validation sets at a ratio of 7:3. Precachexia and cachexia were defined as dichotomous outcome variables.
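A 7:3 division into training and validation sets can be sketched as follows. The text does not state whether the split was random or stratified by outcome, so the random shuffle, the fixed seed, and the function name `split_train_validation` are all assumptions made for illustration.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Shuffle records and split them into training and validation lists.

    Hypothetical helper: a plain random split, assumed for illustration.
    """
    shuffled = list(records)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder participant identifiers, not study data.
train, validation = split_train_validation(range(10))
print(len(train), len(validation))
```

A stratified split, which preserves the outcome proportions in both sets, would be a reasonable alternative when the outcome is imbalanced.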

We first developed five models to calculate variable importance rankings, namely, logistic regression (LR), least absolute shrinkage and selection operator regression (LASSO), eXtreme gradient boosting (XGB), random forest (RF), and decision tree (DT). To improve the performance of the models, we used tenfold cross-validation during training and tuned the hyperparameters of the RF and DT models.
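Tenfold cross-validation partitions the training set into ten folds and holds each fold out once while the model is fitted on the other nine. The sketch below only generates the index partitions. It assumes the rows have already been shuffled, and the function name `kfold_indices` is hypothetical.

```python
def kfold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper; assumes the samples were shuffled beforehand.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in kfold_indices(25, k=10):
    print(len(train_idx), len(test_idx))
```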

Using the variable importance rankings, we scored each variable by the number of models in which it appeared and retained the variables with more than four occurrences. To avoid model overfitting, we calculated the correlations between the variables, excluded those that were highly correlated, and then used the remaining variables for the second round of model training. We used receiver operating characteristic (ROC) curves to calculate the area under the curve (AUC) of each model in the validation set and compared the discriminative performance of the different models. After determining the optimal model, decision and calibration curves were used to evaluate it, and a nomogram was established.
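The two-step variable screening described above (counting occurrences across models, then dropping highly correlated variables) can be sketched as follows. The function names are hypothetical, and the correlation cut-off of 0.6 is an assumption made for illustration, since the text does not state the threshold that was applied.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_variables(top_lists, data, min_count=5, max_abs_r=0.6):
    """Keep variables listed by at least min_count models, then drop any
    variable that is highly correlated with one already kept.

    min_count=5 encodes "more than four occurrences"; max_abs_r is assumed.
    """
    # dict.fromkeys removes duplicates within a list while preserving order.
    counts = Counter(v for ranked in top_lists for v in dict.fromkeys(ranked))
    candidates = [v for v, c in counts.most_common() if c >= min_count]
    kept = []
    for v in candidates:
        # Earlier (more frequent or first-listed) variables take precedence.
        if all(abs(pearson(data[v], data[u])) <= max_abs_r for u in kept):
            kept.append(v)
    return kept


# Toy rankings and values for illustration only, not study data.
top_lists = [["eating", "MAC", "CAC", "HDL"]] * 5
data = {
    "eating": [1, 0, 0, 1],
    "MAC": [1, 2, 3, 4],
    "CAC": [2, 4, 6, 8.5],
    "HDL": [1.5, 0.5, 1.5, 0.5],
}
print(select_variables(top_lists, data))
```

In this sketch ties in occurrence count are resolved by listing order, whereas the study resolved them by comparing variable importance.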

Statistical analysis

For descriptive statistics, normally distributed continuous variables are presented as the mean ± standard deviation, and continuous variables that did not follow a normal distribution are presented as the median [interquartile range]. The normality of the variables was assessed using the Shapiro–Wilk test. Continuous variables were compared using Student’s t-test or the Kruskal–Wallis test, and categorical variables were compared using the chi-square test.

All tests were two-sided, and statistical significance was set at P < 0.05. All analyses were performed in R (version 4.2.0) using RStudio.

Results

Characteristics of the participants

The study included 3896 participants: 2247 with noncachexia, 471 with precachexia, and 1178 with cachexia. The mean age of all participants was 58.49 ± 11.10 years; 2228 (57.2%) were men and 1668 (42.8%) were women. Among the participants diagnosed with precachexia, 61.4% were men; among those diagnosed with cachexia, 62.7% were men (Table 1). The tumor types included lung, stomach, liver, colorectal, breast, esophageal, cervical, endometrial, nasopharyngeal, pancreatic, ovarian, prostate, and bladder cancers, cholangiocarcinoma, and other cancers. The number and percentage of each cancer type are presented in Table 2. Lung cancer was the most common, accounting for 34.8% of the cases. The highest incidences of cachexia were observed in pancreatic cancer, gastric cancer, and cholangiocarcinoma, with pancreatic cancer having the highest incidence at 48.15% (Table 2).

Table 1 Basic characteristics of participants
Table 2 The distribution of different cancer types and the incidence rate of cachexia in this study

Variable screening and model training

First, we performed model training to select appropriate variables. We used all baseline features, except weight change, as input variables in the training set for model training and optimization. The training and validation sets were divided at a 7:3 ratio (Supplementary Tables S1 and S2). Variable importance was calculated and ranked separately for each model. The ten most important variables are shown in Supplementary Figures S1 and S2. Based on the variable importance rankings, we counted the number of occurrences of each variable across all models and ranked the variables again by this count (Table 3; Table 4). We selected variables with more than four occurrences. The most important features for identifying cachexia were eating changes, MAC, high-density lipoprotein (HDL), and CAR. The most important features for identifying precachexia were eating changes, serum creatinine (Scr), HDL, HGS, and CAR. Subsequently, we performed correlation tests between the variables, which showed a high correlation between MAC and CAC (r = 0.64). Because MAC was more important than CAC in Table 3, we excluded CAC (Supplementary Figure S3).

Table 3 Variable importance for distinguishing between non-cachexia and cachexia, based on the results of the first model training
Table 4 Variable importance for distinguishing between pre-cachexia and non-cachexia, based on the results of the first model training

The filtered variables were then used to retrain the models separately to identify cachexia and precachexia. The LR models showed the best discrimination for both cachexia and precachexia, with AUC values of 0.830 and 0.701, respectively (Figs. 2 and 3).
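The AUC reported above equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, with ties counted as one half. The following minimal sketch computes it directly from that definition. The function name `roc_auc` is hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels (1 = case) and scores.

    Computed as the fraction of case/non-case pairs ranked correctly,
    counting ties as 0.5. Hypothetical helper for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores for illustration only, not study data.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```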

Fig. 2 ROC curves to differentiate between cachexia and non-cachexia (including pre-cachexia). a First model training to filter variables. b Second model training to screen the best model. LR, logistic regression model; LASSO, least absolute shrinkage and selection operator regression model; XGB, eXtreme gradient boosting; RF, random forest; DT, decision tree

Fig. 3 ROC curves to differentiate between pre-cachexia and non-cachexia. a First model training to filter variables. b Second model training to screen the best model. LR, logistic regression model; LASSO, least absolute shrinkage and selection operator regression model; XGB, eXtreme gradient boosting; RF, random forest; DT, decision tree

Model validation and development of nomograms

There were significant associations between the variables in the models and precachexia or cachexia (Supplementary Table S3). To assess the models, we plotted decision and calibration curves for the established LR models. The results indicated that the LR models performed well in distinguishing precachexia and cachexia from noncachexia (Fig. 4). Therefore, we used the LR models to create separate nomograms to identify precachexia and cachexia (Fig. 5).
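A decision curve plots the net benefit of acting on a model's predictions across a range of threshold probabilities and compares it with the strategy of treating every patient as a case. The sketch below uses the standard net benefit formula, true positives/n minus false positives/n weighted by threshold/(1 − threshold). The function names are hypothetical, and the text does not describe how the study's curves were computed.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of classifying as positive when probability >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(labels, threshold):
    """Net benefit of the reference strategy that labels everyone positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy labels and predicted probabilities for illustration only, not study data.
labels = [1, 1, 0, 0]
probabilities = [0.9, 0.6, 0.7, 0.2]
print(net_benefit(labels, probabilities, 0.5), treat_all_net_benefit(labels, 0.5))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero, the latter being the net benefit of treating no one.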

Fig. 4 Evaluation of the model after the second model training. a Calibration curve for cachexia model. b Calibration curve for precachexia model. c Decision curve for cachexia model. d Decision curve for precachexia model

Fig. 5 Nomograms based on logistic regression models. a Characteristic variables of cachexia. b Characteristic variables of pre-cachexia. Scr, serum creatinine; HGS, hand grip strength; MAC, mid-arm circumference; HDL, high-density lipoprotein; CAR, C-reactive protein to albumin ratio

Discussion

In this large retrospective study of 3896 patients with cancer, we used machine learning methods to identify patients with precachexia and cachexia. After screening for these characteristics, we constructed a screening model for cachexia risk and a diagnostic model for identifying patients with precachexia. These models have good identification efficacy, and nomograms were created based on these models. This may help clinicians detect precachexia and cachexia in a timely manner and improve the outcomes of cancer patients.

Despite the high prevalence of cachexia in cancer patients, evidence-based interventions remain scarce, and a clear definition of cachexia and effective screening tools are urgently needed [12]. A previous study used machine learning to identify cachexia without weight loss information [13]. However, that study did not identify precachexia. Identifying precachexia is crucial because none of the current treatments can effectively reverse cachexia, and refractory cachexia does not respond to treatment [14]. Regardless of its prevalence, a systematic search for precachexia is a powerful tool to prevent or delay the devastating picture of cancer cachexia [15]. Therefore, we sought to develop an effective method to identify precachexia and fill this gap.

In this study, we screened a large amount of clinical information to identify precachexia and cachexia in an easy and economical manner. We found that cachectic patients were characterized by eating changes, MAC, HDL, and CAR, whereas precachectic patients were characterized by eating changes, Scr, HDL, HGS, and CAR. Of these, eating changes, HDL, and CAR were common to both, suggesting that these features are already present in the stage that precedes cachexia.

Scr and HGS were unique features of precachexia, and both are associated with muscle status. Under normal renal function, creatinine is produced at a relatively constant rate that depends on absolute muscle mass [16]. Previous studies that estimated muscle mass from creatinine kinetics showed a good correlation between creatinine and other indicators of muscle mass [17], and Scr levels are highly correlated with lean body mass [18]. Increased creatinine levels may result from increased muscle catabolism. A prospective study suggested that creatinine might be a surrogate indicator for assessing sarcopenia in advanced lung cancer [19]. Our results showed that muscle loss manifests differently at different stages of cachexia: precachexia is manifested by elevated Scr and decreased HGS, whereas cachexia is manifested by decreased absolute muscle mass. A previous study of precachectic patients with lung cancer showed unchanged muscle mass but significantly reduced muscle function in the precachectic stage. Our findings are consistent with that study, showing a decrease in HGS but not in muscle mass in the precachexia phase.

In addition, low HGS is not only associated with survival in patients with cachexia but also has a synergistic effect with inflammation [20]. A previous study showed that muscular inflammatory signaling and ubiquitin-proteasome system (UPS) activity were not altered in patients with precachexia [21]. The transition from systemic to local inflammation, a characteristic of sarcopenia, initiates cachexia. Inflammation and sarcopenia act synergistically, each promoting the other, and mediate the progression from precachexia to cachexia. Cachexia is an active catabolic process whose high catabolism is mainly attributed to the systemic inflammatory response caused by the tumor itself, which promotes the catabolism of fats and proteins, and this inflammatory response persists [22, 23]. Our study shows that CAR is common to both precachexia and cachexia, suggesting that inflammation persists during the development of cachexia, which is consistent with the findings of previous studies. Previous studies have suggested that systemic inflammation is significantly associated with cachexia prognosis [24]. The etiology of cachexia is not fully understood, but the vast majority of patients appear to have chronic systemic inflammation [25]. Therefore, sufficient attention should be paid to a persistent inflammatory state, which may predict the development of sarcopenia and cachexia and is associated with a poorer prognosis.

In addition to sarcopenia and inflammation, changes in eating are another characteristic of cachexia. In our models, changes in eating were the most important feature of both cachexia and precachexia. Reduced food intake has been reported to be strongly associated with weight loss [26]. An international multicenter cohort analysis suggested that reduced eating predicted a high probability of weight loss and was associated with poor prognosis [27]. Reduced eating in patients with cancer is multifactorial, with contributing factors that include tumor growth, protein and fat hydrolysis, cytokine release, systemic inflammation, intestinal obstruction, and the response to chemotherapy [28]. Tumor-released substances, such as pro-inflammatory cytokines, lactate, and parathyroid hormone-related peptide (PTHrP), are key factors in cancer-related anorexia [29]. Anorexia and weight loss are two major features of cancer cachexia that are regulated by two independent mechanisms, and anorexia occurs earlier than wasting in the course of cachexia [30]. Increased tumor and resting energy expenditure are important causes of progressive cachexia in cancer patients with anorexia, which further alters their metabolic and inflammatory responses [28]. Although a decline in eating cannot be fully equated with anorexia, it should be taken seriously in a patient with cancer, as it may indicate that the patient is entering the elusive precachexia phase. Decreased eating is present throughout the course of cachexia and seems to be a warning sign for this disease.

Our study is novel in developing a diagnostic model for precachexia. Although the study is preliminary, it is informative for the clinical diagnosis of precachexia. In the present study, precachexia and cachexia were each compared separately with noncachexia to screen for characteristic features; in future studies, a diagnostic model with a three-category outcome could be developed to distinguish between noncachexia, precachexia, and cachexia. In addition, sarcopenia may differ between the precachectic and cachectic stages, and the molecular mechanisms involved deserve further investigation. Sarcopenia was originally defined by the European Working Group on Sarcopenia in Older People 2 (EWGSOP2) in the geriatric population as a progressive systemic skeletal muscle disease that involves reduction of muscle mass and loss of muscle function, mainly represented by muscle strength [31, 32]. Treating sarcopenia during precachexia may be a way to prevent disease progression.

This is the first study to use machine learning methods to identify precachexia. The participants came from multiple large hospitals across the country and had multiple cancer types, which supports the generalizability of the data and results. However, this study has several limitations. First, our definitions of cachexia and precachexia come from the 2011 International Expert Consensus, which relies on the patient’s recollection of weight changes over the past 6 months and may therefore introduce errors. The latest guidelines on cachexia suggest that changes in muscle mass should also be taken into account [33]. However, owing to data limitations, we were unable to explore the role of the models in predicting muscle loss. Second, our staging should have included refractory cachexia; however, the number of such patients was too small to support machine learning. Similarly, owing to sample size limitations, we did not apply the models to individual tumor types, and future studies should explore whether the models we developed are suitable for all tumor types. Furthermore, the model for identifying precachexia should be validated in larger cancer cohorts.

Conclusion

Our study established two LR models to identify precachexia and cachexia and demonstrated the shared and stage-specific characteristics of the different cachexia stages. The variables used in the models are part of routine inpatient examinations, are easily accessible, and do not place an additional burden on cancer patients. These models may assist clinicians in identifying and diagnosing early-stage and established cachexia and help guide management strategies to optimize outcomes in cancer patients.