Introduction

Cervical cancer is the fourth most common cancer in the female reproductive system and the seventh most common cancer worldwide. Tumors are more likely to grow in areas where endocervical cells become exocervical cells, or near the squamocolumnar junction (SCJ). Cervical cancer is one of the leading causes of death among women worldwide [1]. According to the World Health Organization (WHO) report on cervical cancer for 2020, there were about 604,127 diagnosed cases and 341,831 deaths worldwide, of which 1,056 diagnosed cases and 644 deaths occurred in Iran [2]. Sexually transmitted diseases, multiple sexual partners, smoking, poor nutrition, and a weakened immune system play a role in the growth and development of cervical cancer [3]. An important risk factor for cervical cancer is persistent infection with human papillomavirus (HPV), especially genotypes 16 and 18 [4]. Although about 90% of HPV infections clear on their own within two years, some lead to the growth of cancerous masses in the cervix [5, 6]. Diagnosing a cancerous mass in the early stages increases the patient’s chance of successful treatment and survival, whereas late diagnosis reduces the possibility of complete recovery [7]. Cervical cancer is entirely preventable and treatable if precancerous changes are identified at an early stage. The Pap smear is frequently used to screen for cervical cancer: a few cervical cell samples are taken, a cell smear is prepared, and the cells are examined under a microscope for abnormalities to diagnose the cervical condition [8]. Physicians consider the patient's chance of survival when planning treatment.

Survival prediction relies on survival analysis, a set of statistical methods in which the outcome variable is the time to an event. In other words, survival is calculated from the time between exposure and the occurrence of the event [9]. According to the American Society of Clinical Oncology (ASCO), the average 5-year overall survival rate for cervical cancer is 66%, i.e., about 66% of people diagnosed with cervical cancer today will survive for at least the next five years. Accurately predicting a patient’s survival from clinical and treatment data helps in choosing the best treatment method for that patient. Researchers have often used classical statistical methods, such as non-parametric, parametric, and semi-parametric (Cox) models, to predict survival [10]. In recent years, artificial intelligence algorithms have become strong competitors to these statistical methods, and their use in survival prediction has grown significantly.
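As an illustration of a time-to-event outcome and of the non-parametric methods mentioned above, the following minimal sketch computes a Kaplan-Meier survival curve in plain Python. The function name `kaplan_meier` and the toy cohort are hypothetical and are not taken from any of the reviewed studies.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time per patient (e.g. months)
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical toy cohort (months of follow-up), for illustration only.
times = [6, 12, 12, 30, 60, 60]
events = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Censored patients (event = 0) still count in the at-risk set until their follow-up ends, which is what distinguishes survival analysis from ordinary classification or regression.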

With the rapid growth of digital technologies in healthcare and the evolution of electronic health records (EHR), big data are being generated and stored [11]. Classical statistical methods often focus on the relationship between dependent variables to achieve the final result, but machine learning algorithms can learn hidden patterns in data. Machine learning algorithms do not require implicit assumptions and can manage non-linear relationships between variables [12]. Machine learning enables computers to make decisions and solve problems without being explicitly taught how to do so [13]. Machine learning algorithms have been studied and developed for the diagnosis, prognosis, and prediction of many diseases [14], and they have performed very well on big data [15].

This study aimed to evaluate published studies on machine learning algorithms in predicting the survival of patients with cervical cancer, considering overall, disease-free, and progression-free survival.

Materials and methods

This systematic review examined original articles that used machine learning algorithms for knowledge discovery and for predicting the survival of patients with cervical cancer.

Study selection

Article selection followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, and the retrieved articles were imported into Excel. The first search returned 229 articles, from which 45 review articles and 85 duplicates were removed. A total of 99 articles remained for screening against the eligibility criteria. During screening, 70 articles were excluded after title and abstract review, and 16 were excluded because of their method, results, or study design. The screening process was performed twice to reduce errors, and any discrepancies were resolved through discussion with the second and third authors. Finally, 13 articles were thoroughly examined and included in the study (Fig. 1).
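The selection counts reported above can be checked with simple arithmetic. The sketch below uses a hypothetical helper, `prisma_flow`, whose argument names are chosen here for illustration only.

```python
def prisma_flow(identified, reviews_removed, duplicates_removed,
                excluded_title_abstract, excluded_method_or_design):
    """Recompute the number of screened and included articles."""
    screened = identified - reviews_removed - duplicates_removed
    included = screened - excluded_title_abstract - excluded_method_or_design
    return {"identified": identified, "screened": screened, "included": included}

flow = prisma_flow(229, 45, 85, 70, 16)
print(flow)  # {'identified': 229, 'screened': 99, 'included': 13}
```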

Fig. 1 Flow diagram of the study identification and selection process, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines

Search strategy

Articles published until October 1, 2022, were collected from three electronic databases: PubMed, Scopus, and Web of Science. The search query consisted of three basic parts. The first part covered cervical cancer and included two keywords, "cervical cancer" and "Uterine Cervical Neoplasms". The second part covered survival prediction with one keyword, "Survival". The third part covered artificial intelligence with three keywords: "Machine learning", "Deep learning", and "Artificial Intelligence". Details are available in Table 1.

Table 1 Keywords and search strategy in three databases: PubMed, Scopus, and Web of Science
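The three-part structure of the search query can be sketched as follows. Joining the keywords within a part with OR and joining the parts with AND is an assumption made here for illustration; the exact syntax and field tags used in each database are those listed in Table 1 and are not reproduced in this sketch.

```python
def or_group(terms):
    """Quote each keyword and join the keywords of one part with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Keywords as reported in the search strategy.
disease = ["cervical cancer", "Uterine Cervical Neoplasms"]
outcome = ["Survival"]
method = ["Machine learning", "Deep learning", "Artificial Intelligence"]

# Assumed combination: all three parts must match.
query = " AND ".join(or_group(g) for g in (disease, outcome, method))
print(query)
```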

Inclusion and exclusion criteria

This study included original articles with full text in English that used machine learning algorithms as predictive models for cervical cancer survival.

Books, review articles, meta-analyses, case reports, posters, and case studies were excluded. In addition, articles that did not sufficiently focus on the implementation of machine learning algorithms, cervical cancer, and model outputs were excluded during screening. All inclusion and exclusion criteria are listed in Table 2.

Table 2 Inclusion and exclusion criteria for articles in the study

Results

The initial search found 229 articles, of which only 13 met the study criteria and were included for further investigation. All included articles were retrospective and used machine learning algorithms to build models for predicting cervical cancer survival.

Characteristics of studies

Most of the included articles were published from 2018 onwards, and the most recent was from 2022 (Table 3). Table 4 provides additional information and a general view of the included studies. Eight studies were conducted in Asia [16,17,18,19,20,21,22,23], four in Europe [24,25,26,27], and one in the United States [28]. Eight articles predicted overall survival (OS) [17, 19,20,21, 23, 26,27,28], six predicted disease-free survival (DFS) [16, 18, 21,22,23,24], and three predicted progression-free survival (PFS) [19, 25, 28]. Two articles were excluded from the study because they used machine learning algorithms only as a tool for feature selection [29, 30].

Table 3 Extracted characteristics of the included articles
Table 4 Classification of the features of the included articles

Database information

Ten articles used hospital and clinic datasets [16, 19, 21,22,23,24,25,26,27,28], and one article each used The Cancer Genome Atlas (TCGA) [20], SEER [17], and GEO [18]. The datasets used in these three articles were more detailed and publicly accessible [17, 18, 20], whereas the other ten articles used private datasets. The largest and smallest datasets used for modeling contained 14,946 and 85 records, respectively, and only three articles used datasets with more than 1000 records [17, 19, 21].

Data preprocessing

A total of 11 articles used data preprocessing techniques [16,17,18,19,20,21,22,23,24,25,26], and three mentioned missing data [18, 19, 25]. The approaches used to handle missing data were record deletion, multiple imputation, and the nearest neighbor algorithm. Feature selection was used in all the articles except article [27], but only eight articles specified the details [16, 18, 20, 21, 23,24,25,26]. Logistic regression [24], Naive Bayes [24], random forest [24], genetic algorithm [26], lasso [17, 18, 25, 27], k-means [19, 20], support vector machine [18, 19, 26, 28], AdaBoost [18], elastic net [23], recursive feature elimination (RFE) [16, 25], and deep learning [22, 23, 28] were among the algorithms used for feature selection and extraction. Two articles mentioned the handling of outliers [16, 20], but only one provided more details [16].

Class imbalance in a dataset reduces the generalizability of the model and is considered a serious challenge [31]. This challenge was discussed in two articles [25, 26], and one article used a cost-sensitive random forest (RF) method to overcome it [25].
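One common way to make a classifier cost-sensitive is to weight each class inversely to its frequency, so that errors on the rare class cost more during training. The sketch below shows this generic heuristic; it is not necessarily the weighting used in [25], and the 90/10 label split is a hypothetical example.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

# Hypothetical outcome labels: 90 survivors (0) and 10 deaths (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, a misclassified minority-class record contributes nine times as much to a weighted loss as a misclassified majority-class record.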

Data modeling

The model was calibrated in three articles [16, 18, 25], but the details were not provided. Hyperparameter tuning was used during model training in six articles, but only four reported the details [18, 24, 25, 28].

Six articles used only one machine learning algorithm to build the model [16, 17, 20, 22, 23, 26]. Further, two or more machine learning algorithms were used in seven articles, and their output was compared with each other [18, 19, 21, 24, 25, 27, 28]. The most frequent machine learning algorithms were random forest, logistic regression, support vector machine, deep learning, and ensemble and hybrid learning.

Model validation

Eleven articles relied on internal validation, and two used external validation [18, 24]. Most of the studies that used internal validation applied cross-validation.
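A minimal sketch of how k-fold cross-validation partitions a dataset is given below. The helper `k_fold_indices` is hypothetical; the sample size of 85 matches the smallest dataset among the included articles, and k = 5 is used only as an example value.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists so each record is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

for fold, (train, test) in enumerate(k_fold_indices(85, k=5), start=1):
    print(f"fold {fold}: {len(train)} training records, {len(test)} test records")
```

The model is trained k times, each time on the training part and scored on the held-out part, and the k scores are averaged.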

The most common performance metric was the AUC, reported in seven articles with values from 0.40 to 0.99, regardless of the type of survival. The C-index ranged from 0.39 to 0.94 in five articles, and accuracy ranged from 0.61 to 0.92 in four articles. In three articles, sensitivity and F1-score ranged from 0.20 to 0.97 and from 0.22 to 0.92, respectively. More details are shown in Table 5.

Table 5 Classification of the used evaluation criteria into types of survival from the lowest to the highest
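To make the C-index concrete, the following sketch computes Harrell's concordance index in plain Python. It is a simplified version that skips pairs with tied times and gives half credit to tied risk scores; the function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: patient i had an observed event before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk, earlier event
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied prediction
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: months of follow-up, event flag, predicted risk.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risk))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so reported values below 0.5 indicate models that ranked patients worse than chance on that dataset.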

Regarding the best-performing models, ensemble and hybrid models performed best in three articles [18, 19, 21], random forest in three articles [24,25,26], logistic regression in one article [17], and deep learning in one article [28].

Important variables

Clinical tabular data were used as model inputs in 11 articles [16, 17, 19,20,21,22,23,24,25, 27, 28] and were the only model inputs in five of them [17, 19, 21, 27, 28]. Image-based data were used in six articles [16, 22,23,24,25,26], one of which trained the machine learning model only on images [26]. In two articles, molecular data were used to predict survival [18, 20]. According to the outputs of the survival prediction models, cancer stage, histology, treatment method, and tumor-related information significantly affected cervical cancer survival prediction. The important variables extracted from the included articles are shown in Table 6.

Table 6 Influential variables in predicting types of survival extracted from articles

Discussion

A systematic review of 229 articles resulted in the inclusion of 13 articles. The selected articles contained qualitative and quantitative information about predicting and analyzing the survival of cervical cancer patients using machine learning algorithms. Few articles have used machine learning algorithms to predict cervical cancer survival. Because survival is defined in different ways and few studies address each type, studies on all three types (overall survival, disease-free survival, and progression-free survival) were included.

The three included studies that used open-access databases were more transparent and competitive in preprocessing and model building. Multiple researchers can analyze open-access databases to discover the most valuable features and the best machine learning model for a particular dataset. Another essential point, also mentioned in article [32], is that model output depends on data from a specific geographical environment and on changes in medical prescriptions over time. Generalizability and the time interval between data collection and modeling should therefore be considered when assessing the applicability of the model output. Open-access databases were more suitable and valuable for studying and predicting survival.

The included articles used datasets of different sizes and types for modeling. The largest dataset, in article [17], contained 14,946 clinical tabular records and yielded a C-index of 0.86. The smallest dataset, in article [26], contained 85 image records (PET/CT) and yielded a C-index of 0.77. Among the included articles, image datasets had fewer records than other datasets. According to Horenko [33], small training datasets often cause overfitting and reduce the model’s capacity for generalization. Image datasets sometimes make the model more accurate than tabular data, which may be due to the power of image processing algorithms [34]. Feature extraction, feature selection, transfer learning, fine-tuning, augmentation, object segmentation, and object detection are among the most important advantages of image processing algorithms [34,35,36]. In addition, convolutional neural networks have obtained valuable results on 3D images [37]. Recently, medical image datasets have been used to predict patient survival. However, larger image datasets and better-optimized convolutional neural network architectures are needed to reach a robust model.

Only two of the articles included in this study used external validation. Article [18], which used molecular data, and article [24], which combined clinical tabular data with PET/CT images, obtained precision values of 0.82 and 0.42, respectively. External validation gives a more reliable picture of a model’s generalizability because it uses different data. Most included articles used five-fold cross-validation for internal validation. Cross-validation is a resampling method for evaluating a model with limited data [38]. The advent of open-access datasets and standard databases of medical data has made it more feasible to evaluate models using external validation.

Data wrangling and preprocessing play an essential role in modeling and model output. Medical datasets often include noise, redundant data, outliers, missing data, and irrelevant variables [39]. Hoeren noted that the actual value of data lies in its usability [40], and data quality is the most critical concern in model training. Data cleaning is one of the essential steps in the preprocessing stage for reducing errors, preventing model bias caused by dirty data, and obtaining the best results [41]. Therefore, data preprocessing, such as cleaning, transformation, reduction, and integration, should be conducted properly; it accounts for 70–80% of the training and modeling workload [42]. All the included studies paid attention to this principle.

Among all the included articles, six used hyperparameter tuning and feature selection methods [18, 21, 24,25,26, 28]. Studies often used hyperparameter tuning and feature selection to avoid overfitting or to achieve high-accuracy models [24, 25]. According to articles [25, 32], selecting appropriate modeling variables directly affected the model’s output. Therefore, feature selection, extraction, reduction, and engineering are necessary to reach an ideal model. Hyperparameter tuning is one of the essential steps in the model-building pipeline and can produce a highly accurate model by finding the best hyperparameter values. Most of the included studies used grid search for this purpose. Although feature selection in convolutional neural networks is done automatically, background knowledge can still enhance the model’s reliability. Approaches such as Bayesian optimization and evolutionary algorithms, including genetic algorithms [26] and artificial fish swarm [18], may be more suitable for hyperparameter tuning and feature selection.
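The idea behind grid search can be sketched in a few lines: every combination of candidate hyperparameter values is scored, and the best one is kept. In the sketch below, `toy_score` is a hypothetical stand-in for a cross-validated performance score, and the parameter names `max_depth` and `n_trees` are illustrative only.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively score every combination in param_grid; return the best."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scoring function; in practice this would train a model
# with the given settings and return its cross-validated score.
def toy_score(params):
    return -(params["max_depth"] - 4) ** 2 - (params["n_trees"] - 200) ** 2 / 1e4

grid = {"max_depth": [2, 4, 8], "n_trees": [100, 200, 400]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes (nine evaluations here), which is why approaches that sample the search space more selectively can be attractive for larger grids.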

Recently, the use of hybrid and ensemble models has increased in the medical field, especially in predicting survival. Three of the included studies that used these methods obtained acceptable accuracy and precision [18, 19, 21]. Random forest (RF) and Extreme Gradient Boosting (XGBoost) are also ensemble learning (EL) algorithms [26]. Developing and optimizing machine learning models with hybrid and ensemble techniques continuously improves computational aspects, performance, generalizability, and accuracy [43]. Ensemble models, like deep learning algorithms, can select features implicitly. In both ensemble and hybrid learning, several weak learners are trained to solve a specific problem and then combined to achieve better results [44].
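The simplest way to combine several learners is majority voting over their predicted labels, sketched below. This is a generic illustration and not the specific ensemble or hybrid method of any included study; the three prediction lists are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote."""
    combined = []
    for votes in zip(*predictions_per_model):
        # Most frequent label among the models for this patient.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions of three weak learners for four patients
# (1 = predicted to survive five years, 0 = not).
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
print(majority_vote([model_a, model_b, model_c]))  # [1, 0, 1, 1]
```

Each model is wrong on a different patient in this toy example, so the combined prediction can be better than that of any single model when the errors are not shared.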

Most studies have used a combination of clinical, imaging, and molecular data to achieve greater accuracy in training machine learning models for survival prediction. Articles [22,23,24,25] combined clinical data with other data types and reported greater accuracy and reliability. Most articles that used composite data to predict cervical cancer survival were published from 2021 onwards. Random forest and deep learning were the most used algorithms for modeling mixed data. With the help of artificial intelligence, all types of patient data can play a significant role in precision medicine.

With recent advances in artificial intelligence, deep learning algorithms have undeniably gained power as well. Deep learning algorithms can recognize patterns in large, extensive, and heterogeneous data. They have also shown an impressive ability to process images, video, text, audio, and signals [45]. Comparative studies have found that artificial intelligence performs better than classical statistics [45]. With the daily advancement of technologies and the rapid expansion of artificial intelligence, transformers [46], meta-learning [47], and quantum machine learning [48] are likely to be used in medical data processing in the near future. Nevertheless, the questions of interpretability and explainability should be addressed alongside the immense potential of AI in health research [49].

Conclusions

Recording and storing patient information has become easy, and the volume of stored data is growing rapidly with the expansion and improvement of hospital information systems (HIS) and electronic health record (EHR) systems. Classical statistical models such as Cox are used in many survival studies but are no longer compatible with many types of medical data. Today, machine learning algorithms have become a focal point in research and development because of their abilities in pattern recognition, feature selection and extraction, and medical image processing.

Most of the survival articles of the last few years have used machine learning algorithms to predict the survival of cervical cancer patients. Combining heterogeneous multidimensional data with machine learning techniques could affect the prediction of cervical cancer survival. Low or absent explainability in machine learning algorithms has prevented the official use of artificial intelligence models in health. Machine learning is more accurate than other statistical methods in predicting the survival of cervical cancer patients, but more studies are needed before it can become a standard.