Background

Cervical cancer is the second most severe cancer in women worldwide: 570,000 women were diagnosed and 311,365 died in 2018, despite the worldwide use of early screening for the disease or for the presence of human papillomavirus (HPV) [1]. An estimated 44.4 million cervical cancer cases will be diagnosed globally over the period 2020–2069 [2]. Commonly used screening methods include the HPV test, the thin prep cytological test (TCT), and co-testing with HPV and TCT [3]. By comparison, TCT has a lower false-positive rate but a higher false-negative rate than the HPV test, whereas the HPV test may lead to more unnecessary referrals to colposcopy [4]. As more HPV and TCT co-testing results were reported and compared [5,6,7,8], the WHO revised its cervical cancer screening guideline and listed the HPV DNA test as the first recommended screening method.

Currently, HPV test results are generally reported qualitatively as HPV positive or negative, based on the cut-off value of the assay used. However, accumulated HPV screening data show that HPV viral load could add valuable information as a screening triage marker. For example, Thomas identified a significant correlation of HPV viral load and integration status with high-grade squamous intraepithelial lesion (HSIL) [9]. Zhao’s study found that the 10-year cumulative incidence rate of cervical intraepithelial neoplasia grade 2 or worse (CIN2+) was associated with cytological lesions and viral load, and recommended viral load as a triage marker for non-16/18 high-risk HPV (hrHPV) positive women [10]. A recent study also indicated that HPV viral load was positively correlated with cervical lesion grade, based on the cervical cancer screening results of 8556 women [11]. Beyond its potential as a triage marker, HPV viral load is also a potential indicator of disease progression: cervical cancer patients with a high HPV viral load had a significantly lower 15-year survival rate, a more advanced International Federation of Gynaecology and Obstetrics (FIGO) stage, and an increased recurrence rate [12]. However, inconsistent conclusions across studies about the triage and predictive value of viral load have restrained its use in clinical settings [13]. One likely reason for this inconsistency is the use of different methods in different diagnostic laboratories, as shown by a few small HPV viral load studies based on Hybrid Capture 2 (HC2) [14], Aptima E6E7 [15], and Cobas 4800 [16].

In this study, we retrospectively compared our cervical cancer screening results from 3 HPV testing platforms (HC2, Aptima E6E7, and Cobas 4800) together with the accompanying TCT results. We then established a model for predicting different levels of cervical lesions by integrating potential cervical cancer risk factors, such as HPV infection status, HPV viral load, age, bacterial vaginosis (BV), and fungal infection.

Materials and methods

Patients and data collection

In total, 48,565 individuals were tested by both TCT and one of the 3 HPV testing methods (31,954 by HC2, 3269 by Aptima E6E7, and 13,342 by Cobas 4800) from 2016 to 2019 in our laboratory, a CAP- and ISO15189-accredited reference laboratory in Guangzhou, China (Fig. 1). The cases were collected into three datasets, named Dataset HC2, Dataset E6E7, and Dataset Cobas, respectively. The institutional review board of KingMed Diagnostics approved the study (code 022).

Fig. 1

Flow chart diagram of study design and data analysis procedures

HPV testing

The HC2 assay (Hybrid Capture 2 High-Risk HPV DNA Test; Digene Corporation, Gaithersburg, MD, USA) detects 13 hrHPV subtypes (HPV 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, and 68) and reports an HPV positive or negative result by comparing the reading value with the cutoff value (positive at RLU/CO > 1.0). The Aptima HPV assay (Hologic, Marlborough, MA, USA) uses transcription-mediated amplification (TMA) to target E6/E7 mRNA expression of 14 hrHPV subtypes (HPV 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68). The Roche Cobas 4800 HPV DNA assay (Pleasanton, CA, USA) is a real-time PCR-based assay that detects HPV16, HPV18, and 12 other hrHPV subtypes (HPV 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68).

TCT testing-liquid-based cytology

Collected specimens were processed automatically into cytological specimens using the ThinPrep method from Hologic (Bedford, MA, USA) [17]. Prepared specimens were evaluated independently by at least 2 certified cytopathologists. Results were classified as negative for intraepithelial lesion or malignancy (NILM); atypical squamous cells of undetermined significance (ASCUS); atypical squamous cells, cannot exclude high-grade squamous intraepithelial lesion (ASC-H); low-grade squamous intraepithelial lesion (LSIL); or high-grade squamous intraepithelial lesion (HSIL) [18]. Patients with a diagnosis of atypical glandular cells of undetermined significance (AGUS) or cervical cancer were excluded from the study because of the limited number of such cases. BV and fungal infections were determined by pathologists from the TCT results.

Data processing

Each of the 3 HPV platform datasets was divided into two datasets: an all-cases dataset (ACD) and a dataset containing only HPV-positive cases (POS). HPV viral load values were taken from the value reported by each method: RLU/CO for HC2, S/CO for Aptima E6E7, and PCR cycle number (Ct value) for Cobas 4800.
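The split into ACD and POS can be illustrated with a minimal Python sketch. The record layout (`Case` and its fields) and the example values are hypothetical and are not taken from the study data; the sketch only assumes that each case carries the qualitative HPV call reported by its assay.

```python
from dataclasses import dataclass

@dataclass
class Case:
    platform: str       # "HC2", "E6E7", or "Cobas"
    viral_load: float   # hypothetical field: RLU/CO, S/CO, or Ct, as reported by the platform
    hpv_positive: bool  # hypothetical field: qualitative call reported by the assay
    cytology: str       # TCT result, e.g. "NILM" or "HSIL"

def split_acd_pos(cases):
    """Return (ACD, POS): all cases, and the subset with a positive HPV call."""
    acd = list(cases)
    pos = [c for c in acd if c.hpv_positive]
    return acd, pos

# Made-up example records for one platform.
example_cases = [
    Case("HC2", 0.4, False, "NILM"),
    Case("HC2", 12.7, True, "ASCUS"),
    Case("HC2", 350.0, True, "HSIL"),
]
acd, pos = split_acd_pos(example_cases)
print(len(acd), "cases in ACD;", len(pos), "cases in POS")
```

Because POS is a subset of ACD, any model trained on POS sees only cases above the assay cutoff, which narrows the range of viral load values available to it.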

Risk factors selection and model establishment

Each original dataset was split into a training dataset containing 80% of the cases and a validation dataset containing the remaining 20%. The synthetic minority over-sampling technique (SMOTE), as implemented in the DMwR package, was applied to balance the data before model establishment. Pearson's correlation coefficient was used to assess the association of viral load, age, HPV infection status, BV, and fungal infection with the cytology diagnostic endpoints (ASCUS higher, ASC-H higher, LSIL higher, and HSIL higher; that is, each stage or any more severe stage). Different combinations of the significantly correlated variables were then used for logistic regression modelling, and models were compared by the area under the curve (AUC) of each receiver operating characteristic (ROC) curve. Besides logistic regression, five more machine learning methods (decision tree, Xgboost, random forest, support vector machine (SVM), and neural network) were used to build models with the Rattle package and default parameters.
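To make the balancing step concrete, the following is a minimal sketch of the core SMOTE idea in plain Python: each synthetic minority sample is an interpolation between a minority case and one of its nearest minority neighbours. It is not the DMwR implementation used in the study; the function name `smote`, its parameters, and the example values are assumptions for illustration, and all features are treated as continuous, which is a simplification for binary variables such as BV.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest minority neighbours of the base point (the point itself is excluded).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class cases as (viral load, age) pairs.
minority = [(35.2, 41.0), (120.0, 38.0), (64.8, 52.0), (210.5, 47.0)]
synthetic = smote(minority, n_new=6, k=3)
print(len(synthetic), "synthetic minority samples generated")
```

In practice, over-sampling of this kind is applied to the training portion only, so that the validation data keep the class distribution seen in screening.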

Results

Data sets characteristics and comparisons

All diagnostic results and related information are summarized in Table 1. Overall, the HPV positive detection rate was 46.64% (22,654/48,565): 59.10% (18,878/31,954) by HC2, 25.52% (3406/13,342) by Cobas 4800, and 11.31% (370/3269) by Aptima E6E7. Of the TCT results, NILM represented about 80% of the cases assayed, followed by LSIL (14%), ASCUS (7%), HSIL (3%), and ASC-H (2.6%). The proportions of TCT stages were similarly distributed across all 3 platform datasets (Additional file 1: Supplemental Fig. 1).

Table 1 Demographic data of patients collected in the three datasets

Viral load showed an increasing trend with advancing cytology stage in each of the 3 HPV datasets (Fig. 2 and Additional file 1: Supplemental Fig. 2). In HC2 ACD, viral load values differed significantly between every pair of stages except ASCUS and ASC-H. Compared with the other two platform datasets, the HC2 dataset showed more significant differences between TCT stages, in both ACD and POS. For the Cobas assay, the Ct value was used as the viral load value, and the three types of HPV-positive cases were shown separately: other-type HPV (HPV OT), HPV16, and HPV18.

Fig. 2

Distribution of viral load value with cervical lesion stages of the three platform ACDs. a HC2. b E6E7. c Cobas

Correlations between variable factors

Correlation analysis was carried out between each pair of the factors (Additional file 1: Supplemental Table 2). In detail, we observed the following relations: (1) a significant correlation between viral load and cervical lesion stage in all 3 datasets; (2) a significant correlation between age and cervical lesion stage in the HC2 and Cobas datasets; (3) a significant correlation between viral load and BV infection in HC2 ACD and E6E7 ACD but not in the POS of E6E7 and Cobas; (4) a significant correlation of fungal infection with age, but not with viral load or BV, in all three platform datasets; (5) no significant correlation between BV and age in most datasets, except HC2 POS. The detailed results are shown in Additional file 1: Supplemental Tables 2 and 3.
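For reference, Pearson's correlation coefficient between two factors can be computed as in the sketch below. The function name `pearson_r` and the example values are illustrative assumptions, not study data. When one of the two variables is binary (for example BV status or a "stage or higher" endpoint), the same formula yields the point-biserial correlation.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical viral load values and a binary endpoint (1 = lesion stage reached).
viral_load = [0.5, 2.0, 8.0, 40.0, 150.0, 600.0]
endpoint = [0, 0, 0, 1, 1, 1]
r = pearson_r(viral_load, endpoint)
print(f"r = {r:.3f}")
```

Viral load values often span several orders of magnitude, so a log transform before computing the coefficient is a common choice; whether the study applied one is not stated in the text.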

Table 2 Performance summary of models established by six ML methods in terms of PPV, NPV, Sensitivity, Specificity, Accuracy, Precision
Table 3 AUC value of the best two models established by Xgboost with test dataset analysis

Logistic regression models built on different factor combinations

The logistic regression model of each test dataset was established with each precancerous stage or higher as a diagnostic endpoint. Different risk factor combinations of viral load, BV, and age were used to build the regression equation. The AUC value of each model and the pairwise comparisons between variable combinations are summarized in Additional file 1: Supplemental Table. To address data imbalance, SMOTE was applied to balance the data for each cervical lesion stage. The results showed that: (1) models of HC2 ACD and POS performed significantly better than the models established from the other two platform datasets (Additional file 1: Supplemental Table 5); (2) models of HC2 POS and ACD with HPV viral load (VL) and BV as variables performed significantly better than models established with VL only or with VL and age (Additional file 1: Supplemental Table 6). ROC curves of the ACD models of each platform are shown in Fig. 3. Model performance varied with the cervical lesion stage used as the diagnostic endpoint. HC2 models performed best (AUC = 0.9467) with LSIL higher as the endpoint. E6E7 (AUC = 0.9341) and Cobas OT models (AUC = 0.9038) performed best with ASC-H higher as the endpoint, whereas Cobas 16 models performed best (AUC = 0.9915) with HSIL higher as the endpoint. In summary, the models generated from the HC2 platform with BV and VL as variables had the best performance compared with the models of the other two platform datasets.
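The "stage or higher" endpoints and the AUC comparison can be sketched as follows. The stage ordering in `STAGE_ORDER` is an assumption inferred from the order in which the endpoints are named in this paper; the function names and example scores are hypothetical. The AUC is computed here from its rank interpretation, that is, the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case.

```python
# Assumed ordering of cytology stages, from least to most severe.
STAGE_ORDER = ["NILM", "ASCUS", "ASC-H", "LSIL", "HSIL"]

def endpoint_labels(stages, threshold):
    """Binary labels: 1 if the stage is at or above the threshold stage."""
    cut = STAGE_ORDER.index(threshold)
    return [1 if STAGE_ORDER.index(s) >= cut else 0 for s in stages]

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical cytology results and model scores for five cases.
stages = ["NILM", "ASCUS", "LSIL", "HSIL", "NILM"]
scores = [0.10, 0.40, 0.35, 0.90, 0.20]
labels = endpoint_labels(stages, "LSIL")
print("LSIL higher labels:", labels)
print("AUC:", roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of cases and is meant only to show the definition; for datasets of the size used here, a sort-based implementation would be preferable.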

Fig. 3

ROC curves of the logistic regression models established with VL and BV as variables, using the all-cases datasets (ACD) of the three platforms. a HC2. b E6E7. c Cobas_OT. d Cobas_16

Establishment and comparison of machine learning models

To establish the best model for diagnosing early cervical lesion stages, six machine learning methods were further applied to HC2 ACD and POS with VL and BV as variables. AUC, PPV, NPV, accuracy, sensitivity, and specificity were analysed to evaluate model performance (Table 2), and the methods were compared with one another (Additional file 1: Supplemental Table 7). The Xgboost models had the highest AUC values of the six methods in both ACD and POS: for ASCUS higher, ASC-H higher, LSIL higher, and HSIL higher, the AUC values were 0.915, 0.953, 0.956, and 0.961 in ACD and 0.860, 0.910, 0.924, and 0.929 in POS, respectively. The ROC curves of the Xgboost models for each diagnostic endpoint are shown in Fig. 4. A significant difference was observed between the ACD and POS AUC values. In HC2 ACD, the Xgboost models had sensitivities of 0.826 (ASCUS higher), 0.914 (ASC-H higher), 0.925 (LSIL higher), and 0.952 (HSIL higher) and specificities of 0.838 (ASCUS higher), 0.845 (ASC-H higher), 0.849 (LSIL higher), and 0.838 (HSIL higher). The sensitivity and specificity of the Xgboost models in HC2 POS were significantly lower than those in ACD (sensitivity, P = 0.007; specificity, P = 0.05).
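The performance measures reported here all derive from the four cells of a confusion matrix. The sketch below shows the definitions in Python; the function names and the example predictions are hypothetical. Note that PPV and precision are the same quantity.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = endpoint reached)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def _ratio(num, den):
    # Undefined ratios (empty denominator) are reported as NaN.
    return num / den if den else float("nan")

def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV (precision), NPV, and accuracy."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }

# Hypothetical true endpoint labels and model predictions for eight cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
result = screening_metrics(*confusion_counts(y_true, y_pred))
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity do not depend on how common the endpoint is, whereas PPV and NPV do, which matters when comparing a screening population (ACD) with an HPV-positive subset (POS).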

Fig. 4

ROC curves of the models built with six machine learning methods using the HC2 dataset. a ASCUS higher. b ASC-H higher. c LSIL higher. d HSIL higher

Validation of the best HC2 models

To further validate the Xgboost models, we collected a new batch of HC2 HPV testing data, consisting of 3932 NILM, 148 ASCUS, 28 ASC-H, 62 LSIL, and 15 HSIL patients, and evaluated the performance of the models on the all-cases and positive datasets. The results are summarized in Table 3. With this new set of HC2 results, the Xgboost diagnostic models predicted the cytologic stage of the patients with acceptable AUC values: 0.8200 for ASCUS higher, 0.9385 for ASC-H higher, 0.9413 for LSIL higher, and 0.9293 for HSIL higher in the ACD test set, and 0.7176 for ASCUS higher, 0.7285 for ASC-H higher, 0.7210 for LSIL higher, and 0.7336 for HSIL higher in the positive test set. The ACD model performed better than the positive-dataset model, with specificity ranging from 0.9547 to 0.9577 and sensitivity ranging from 0.5020 to 0.6484.
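The class imbalance in this validation batch follows directly from the reported counts. The short sketch below tallies the number of cases at or above each endpoint, assuming the stage ordering NILM, ASCUS, ASC-H, LSIL, HSIL implied by the endpoint names; the function name `endpoint_prevalence` is illustrative.

```python
# Assumed ordering of cytology stages, from least to most severe.
STAGE_ORDER = ["NILM", "ASCUS", "ASC-H", "LSIL", "HSIL"]

# Case counts of the new HC2 validation batch, as reported in the text.
validation_counts = {"NILM": 3932, "ASCUS": 148, "ASC-H": 28, "LSIL": 62, "HSIL": 15}

def endpoint_prevalence(counts, threshold):
    """Return (cases at or above the threshold stage, total cases)."""
    cut = STAGE_ORDER.index(threshold)
    positives = sum(counts[s] for s in STAGE_ORDER[cut:])
    total = sum(counts.values())
    return positives, total

for threshold in STAGE_ORDER[1:]:
    positives, total = endpoint_prevalence(validation_counts, threshold)
    print(f"{threshold} higher: {positives}/{total} ({positives / total:.2%})")
```

Under this ordering, the batch contains 4185 cases, of which 253, 105, 77, and 15 fall at or above ASCUS, ASC-H, LSIL, and HSIL, respectively, so every endpoint is a small minority class in the validation data.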

Discussion

The mean HPV VL in each cytology stage increased with the severity of the cervical lesion grade, consistent with previous findings [10, 19], which supports the reliability of our conclusion. However, the reported associations of subtype-specific HPV VL with cervical lesions are inconsistent across studies. Luo Hongxue reported that the viral load of HPV16/18 could be used as a triage marker for HPV-positive women, whereas Dong Li’s research found that it could not [10, 14]. The disagreement might be caused by methodological limitations of the studies or by genuinely different viral load distributions of each HPV subtype in different populations. In our comparison of platforms, which is seldom available within a single study, the VL trend was similar across platforms, but differences remained in the distribution of viral load at each specific disease stage and in the correlation coefficients among factors. This indicates that different methods provide different detection ranges and therefore reflect the true viral load of a sample differently. A method with a broader detection range and a lower limit of detection should therefore be recommended for viral load studies.

The cervical microbiome has been found to be affected by HPV infection [20], and the presence of BV has been reported to be associated with HPV infection and persistence [21, 22]. BV has been combined with another factor, multiple sexual partners, to predict CIN/CC status [23]. We observed a significant association between BV and the HSIL cytologic stage in our HC2 dataset, consistent with previous reports [24, 25]. These results support our model comparison, which indicated that BV and VL are the two factors that give the models the best accuracy. Although the BV status in our data was retrieved from cytologic diagnosis results, our findings also point to the potential of DNA test assays or tools that detect the two factors at the same time and collect information usable for cervical lesion prediction. A method for the simultaneous detection of HPV infection and the microbiome in cervical samples has been developed in another study [26], which underlines the value of detecting both factors for preventing cervical cancer development. Many factors can affect cervical cancer development, and the relationships among them are not fully understood; more exploration of these relationships is therefore necessary. The correlation analysis of risk factors in our study found more significant correlations in specific population groups, which suggests that different models with group-specific factors might be established in the future to obtain more accurate results for clinical application.

Of the 3 HPV test platforms, Cobas 4800 is the only one that can differentiate HPV16, HPV18, and HPV OT, which enabled us to analyse the correlations between the viral loads of HPV subtypes and the severity of the cervical lesions. Our results showed that viral load in cases with HPV16 infection increased more markedly with advancing cervical lesion stage than in cases with HPV18 or HPV OT, in line with a previous report [27]. If true correlations between the viral loads of HPV subtypes and the cervical lesions caused by these viruses can be demonstrated, it might become possible to diagnose people with similar conditions accurately using viral load and other variables, without necessarily referring them to pathologists [28].

This study indicated that: (1) HPV viral load values generated by the HC2 platform were better suited to diagnostic model establishment than those of the other two platforms, Aptima E6E7 and Cobas; (2) sample balancing (SMOTE) improved model performance on our unbalanced datasets, which came from cervical cancer screening and contained a significantly higher percentage of normal samples than abnormal samples. Similar results have been reported, showing that pre-processing datasets with SMOTE can improve model accuracy by avoiding the bias caused by dataset imbalance [29]. Other diagnostic models have reported AUC values of 0.895 and 0.64 for diagnosing CIN2+ in Tuerxun’s study and Xiao’s study, respectively [30, 31]. By comparison, the AUC value of our model for HSIL prediction was 0.9293.

In summary, our results provide valuable information for evaluating HPV viral load in clinical diagnostic applications. We also showed that it is feasible to predict the cytological stage with a diagnostic model based on viral load and other factors, which is especially relevant in areas lacking sufficient pathological resources. Cervical cancer mainly occurs in low-income countries, which often lack high-quality clinical resources, including clinicians and equipment. Our model, with its accurate diagnostic prediction, therefore provides evidence supporting its clinical application. However, because of the significant differences between HPV test methods, more studies are needed to standardize model-based diagnosis. Based on our study, a PCR-free method might be a better choice in this scenario. Furthermore, future studies combining patient information, cervical cancer screening results, colposcopy diagnostic results, and management information should be carried out to evaluate the application value of our model.

Conclusions

Using clinical laboratory cervical cancer screening datasets, and after evaluating the optimal datasets, machine learning methods, and variables, we defined early diagnostic models for four cervical lesion stages. To our knowledge, this is the first study to use BV and HPV VL for predicting the cytological diagnosis of cervical lesions, and the prediction accuracy was shown to be superior to that obtained with other clinical characteristics. Furthermore, machine learning models built on HPV VL and BV, especially the Xgboost model, demonstrated excellent performance in identifying precancerous cervical lesions at different stages. These promising findings support the early diagnosis of cervical lesions in clinical applications, especially in scenarios with limited pathological resources.