Introduction

In December 2019, novel coronavirus pneumonia (NCP) emerged in Wuhan, Hubei, a city well known as the largest transportation hub in China. The pathogen has been shown to be a novel betacoronavirus, currently named 2019 novel coronavirus (2019-nCoV) [1]. The disease has swept across China rapidly through human-to-human transmission [2,3,4]. As of February 27, 2020, more than 78,000 people had been confirmed to be infected and more than 2700 had died in China [5].

As the number of patients soared, scholars summarized the clinical characteristics of NCP [6,7,8]. Symptoms at disease onset included fever, cough, headache, vomiting, and diarrhea. Normal or decreased leukocyte count was common. Radiologic abnormalities such as ground-glass opacity and patchy shadowing on chest X-ray or computed tomography (CT) were marked characteristics. Acute respiratory distress syndrome, arrhythmia, and shock could also occur in severe cases. To date, detection of 2019-nCoV by real-time reverse transcription polymerase chain reaction (RT-PCR) has been regarded as the diagnostic gold standard [9].

Nevertheless, false-negative results on initial RT-PCR examination occurred in a number of cases [10]. In addition, the time-consuming process, the short supply of kits, and the difficulty of obtaining qualified samples hindered early-stage diagnosis and treatment, as well as prompt isolation of patients. Therefore, a rapid diagnostic model is needed to screen patients at high risk of 2019-nCoV infection.

In this study, we aimed to develop a novel screening scale to identify highly suspected subjects based on epidemiological data, clinical manifestations, and laboratory and radiological examinations. Given the evolution of the pandemic, we also combined the epidemic regions into one parameter, and then dropped the epidemiological factors from the model altogether, to establish two further simplified but still effective models. This is the first study for screening and predicting NCP in Zhejiang Province, China, and the approach could be extended nationwide and even worldwide.

Results

Clinical characteristics of the study participants

Of the 880 subjects enrolled in the study, 21 were excluded because of missing data, leaving 859 participants eligible for evaluation. Of these, 339 were diagnosed with NCP on the basis of positive detection of 2019-nCoV by real-time RT-PCR, while the other 520 were ruled out after at least two negative RT-PCR results. The 21 excluded subjects declined chest X-ray or CT because they were pregnant or preparing for pregnancy; all of their RT-PCR tests were negative.

The characteristics of the participants are shown in Table 1. Among the 339 NCP patients, 188 (55.46%) were male, and the mean age was 46.88 ± 14.65 years. NCP patients were significantly older than those without NCP (P < 0.001). Of the confirmed patients, 33.63% had a history of travel to or residence in Wuhan within 14 days, and 27.43% had been in contact within 14 days with patients from Wuhan who had fever or respiratory symptoms. In addition, 35.10% of the confirmed cases were related to cluster outbreaks in families or workplaces.

Table 1 Clinical characteristics of the study participants.

The common symptoms of NCP included fever (47.79%), dry cough (43.95%), sputum production (34.81%), fatigue (26.55%), and dyspnea (9.73%). However, fever was not a specific symptom, because it was equally common in non-NCP individuals (47.50%, P = 1.000). Normal or decreased WBC count occurred in 93.81% of NCP patients, and decreased lymphocyte count was seen in 54.28% of patients. The mean WBC count in the NCP group was 5.40 ± 2.56 × 10⁹/L, significantly lower than in those without NCP (7.36 ± 3.06 × 10⁹/L, P < 0.001). Likewise, the lymphocyte count was 1.22 ± 0.85 × 10⁹/L, significantly lower than in those without NCP (1.71 ± 0.86 × 10⁹/L, P < 0.001). No significant difference was found between the two groups in C-reactive protein (CRP) level. Most NCP patients (94.39%) had pulmonary radiologic changes such as unilateral or bilateral patchy shadowing, ground-glass opacity, or pulmonary consolidation on X-ray or CT.

Predictors associated with NCP

We performed both univariate and multivariate logistic regression analyses to assess predictors of NCP (Table 2). In the univariate analysis, the following were associated with higher odds of NCP: age; co-existing diseases; travel or residence history within 14 days in Wuhan, in neighboring areas of Wuhan in Hubei Province, or in other areas with persistent local transmission or communities with definite cases; contact within 14 days with patients with fever or respiratory symptoms from Wuhan, from neighboring areas of Wuhan in Hubei Province, or from other areas with persistent local transmission or communities with definite cases; relationship with a cluster outbreak; presence of sputum, fatigue, dyspnea, diarrhea or abdominal pain, and muscle soreness; absence of nasal congestion or sore throat; decreased WBC, lymphocyte, and neutrophil counts; and imaging changes on chest X-ray or CT.

Table 2 Predictors associated with NCP.

The above characteristics were entered into the subsequent multivariate analysis, which showed that the following nine characteristics were independent predictors of NCP: travel or residence history within 14 days in Wuhan (OR 8.440, 95% confidence interval (CI) 4.204–16.944, P < 0.001), contact within 14 days with patients with fever or respiratory symptoms who had a travel or residence history in Wuhan (OR 2.967, 95% CI 1.630–5.402, P < 0.001), contact with patients from other areas with persistent local transmission or communities with definite cases (OR 4.139, 95% CI 2.334–7.342, P < 0.001), relationship with a cluster outbreak (OR 25.164, 95% CI 11.833–53.516, P < 0.001), presence of fatigue (OR 2.710, 95% CI 1.490–4.930, P = 0.001), dyspnea (OR 5.276, 95% CI 2.076–13.410, P < 0.001), muscle soreness (OR 14.187, 95% CI 1.998–100.730, P = 0.008), WBC count (OR 0.750 per unit increase, so that a lower count indicated higher odds of NCP; 95% CI 0.659–0.852, P < 0.001), and imaging changes on chest X-ray or CT (OR 6.291, 95% CI 4.315–9.171, P < 0.001).

Derivation of the model

In this multivariate logistic regression model, the probability of having NCP was

$$P(\mathrm{NCP}) = \frac{1}{1 + e^{-z}},$$

where

$$
\begin{aligned}
z = {} & -2.043 + 2.133\,[\text{travelling to or residing in Wuhan}] + 1.088\,[\text{contacting patients from Wuhan}] \\
& + 1.421\,[\text{contacting patients from other areas with persistent local transmission or community with definite cases}] \\
& + 3.225\,[\text{relating to a cluster outbreak}] + 0.997\,[\text{fatigue}] + 1.663\,[\text{dyspnea}] + 2.652\,[\text{muscle soreness}] \\
& - 0.288 \times \text{WBC count} + 1.839 \times \text{chest imaging score},
\end{aligned}
$$

and each bracketed term equals 1 if the condition holds and 0 otherwise. Consequently, we used the terms of this exponent, omitting the intercept, to establish a Zhejiang rapid screening model for predicting NCP as follows:

Model score (model 1) = 2.133 (if travelling to or residing in Wuhan within 14 days) + 1.088 (if contacting patients with fever or respiratory symptoms from Wuhan within 14 days) + 1.421 (if contacting patients with fever or respiratory symptoms from other areas with persistent local transmission or community with definite cases within 14 days) + 3.225 (if relating to a cluster outbreak) + 0.997 (if having fatigue) + 1.663 (if having dyspnea) + 2.652 (if feeling muscle soreness) − 0.288 × WBC count + 1.839 × pulmonary imaging score (as defined in the Methods section).
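As a minimal sketch of how the score and the corresponding probability could be computed, the following Python snippet encodes the coefficients reported above. The dictionary keys and function names are hypothetical and chosen only for illustration; WBC count is assumed to be expressed in × 10⁹/L, and the imaging score is assumed to follow the scoring described in the Methods.

```python
import math

# Coefficients copied from the text; the key names are hypothetical labels.
MODEL1_BINARY_COEF = {
    "wuhan_travel_or_residence": 2.133,
    "contact_patient_from_wuhan": 1.088,
    "contact_patient_other_area": 1.421,
    "cluster_outbreak": 3.225,
    "fatigue": 0.997,
    "dyspnea": 1.663,
    "muscle_soreness": 2.652,
}
MODEL1_INTERCEPT = -2.043
WBC_COEF = -0.288      # per 1 x 10^9/L of WBC count (unit assumed)
IMAGING_COEF = 1.839   # per point of the chest imaging score


def model1_score(flags, wbc_count, imaging_score):
    """Model 1 score: the linear predictor without the intercept."""
    binary_part = sum(
        coef for name, coef in MODEL1_BINARY_COEF.items() if flags.get(name)
    )
    return binary_part + WBC_COEF * wbc_count + IMAGING_COEF * imaging_score


def model1_probability(flags, wbc_count, imaging_score):
    """Predicted probability of NCP from the logistic formula."""
    z = MODEL1_INTERCEPT + model1_score(flags, wbc_count, imaging_score)
    return 1.0 / (1.0 + math.exp(-z))
```

Under these assumptions, a subject related to a cluster outbreak who reports fatigue, with a WBC count of 5.0 × 10⁹/L and an imaging score of 1, would receive a model score of about 4.62.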

The AUROC of the model was 0.920 (95% CI 0.902–0.938), indicating a greater capability to discriminate NCP than WBC count (AUROC 0.727, 95% CI 0.692–0.762) or chest imaging score (AUROC 0.795, 95% CI 0.766–0.825) (Fig. 1). To examine internally whether the model was overfitted, we applied fivefold cross-validation to the trained model and repeated it 10 times. The mean AUROC was 0.916, with a standard deviation of 0.017. The Hosmer–Lemeshow test, which measures calibration, gave a χ² of 10.857 (P = 0.210), indicating no significant departure from a perfect fit. Patients with NCP had a model score of 3.60 ± 2.41, higher than those without NCP (model score = −0.42 ± 1.69, P < 0.001) (Fig. 2). At a cut-off value of 1.0, the rapid screening model could determine NCP with a sensitivity of 85% (95% CI 81.2–88.8%), a specificity of 82.3% (95% CI 80.6–84.0%), a diagnostic accuracy of 83.2% (95% CI 80.7–85.7%), and a Youden index of 0.673.
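The cut-off metrics reported here follow standard definitions, and the Youden index is sensitivity + specificity − 1 (0.85 + 0.823 − 1 = 0.673). The sketch below shows one way these quantities could be computed from scores and RT-PCR labels. The function names are hypothetical, and it is an assumption that a subject is called positive when the score is strictly greater than the cut-off.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def screening_metrics(scores, labels, cutoff):
    """Metrics at one cut-off; labels are 1 (NCP) or 0 (non-NCP).

    Assumption: a score strictly above the cut-off is a positive call.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        positive = score > cutoff
        if label == 1 and positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "youden": youden_index(sensitivity, specificity),
    }
```

Scanning candidate cut-offs and keeping the one with the largest Youden index is one common way to choose an "optimal" threshold.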

Figure 1

Receiver-operating characteristic (ROC) curves of predictive model 1 and its included features for detecting novel coronavirus pneumonia. The area under the ROC curve was 0.920 (95% CI 0.902–0.938), 0.727 (95% CI 0.692–0.762), and 0.795 (95% CI 0.766–0.825), with a standard error of 0.009, 0.018, and 0.015, for predictive model 1, WBC count, and chest imaging score, respectively. The optimal Youden-index-based cutoffs were 1.00, 6.20, and 0.15, respectively. The sensitivity and (1 − specificity) of the binary factors are also shown. WBC: white blood cell.

Figure 2

The capability of the models to discriminate novel coronavirus pneumonia. The three panels illustrate the performance of the three models trained in this study. In each column, the top plot shows the fitted distribution of the predicted scores for the cases (blue) and the controls (green). The small vertical ticks beneath the distribution curves mark the predicted scores of individual subjects, and the estimated mean scores of the model are shown as colored vertical lines. The bottom plot shows the receiver-operating characteristic (ROC) curve (red) with point-wise 95% confidence intervals (grey) for the corresponding prediction model. The areas under the ROC curve of model 1 (the primary predictive model), model 2 (the simplified model), and model 3 (the model without epidemiological history) were 0.920 (95% CI 0.902–0.938), 0.909 (95% CI 0.889–0.929), and 0.859 (95% CI 0.833–0.884), with standard errors of 0.009, 0.010, and 0.013, respectively.

Notably, at a cut-off value of > 4.0, the model could detect NCP with a specificity of 98.3% (95% CI 97.7–98.9%), while the sensitivity was 42.8% (95% CI 37.6–48.1%) and the accuracy was 76.4% (95% CI 73.5–79.2%); at a cut-off value of < −0.5, the model could rule out NCP with a sensitivity of 97.9% (95% CI 97.1–98.7%), while the specificity was 51.0% (95% CI 46.7–55.2%) and the accuracy was 69.5% (95% CI 66.4–72.6%) (Table 3). Among the 859 subjects, 154 (17.9%) had a model score of > 4, and 272 (31.7%) had a model score of < −0.5. Based on these cut-off values, 410 subjects (96.2% of those with a model score of > 4 or < −0.5) were correctly classified.
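The two cut-offs above can be read as a three-way triage rule. The following is a minimal sketch under that reading; the category labels and function names are hypothetical, and the final helper only reproduces the arithmetic behind the reported 96.2% (410 of the 154 + 272 subjects falling outside the intermediate zone).

```python
RULE_IN_CUTOFF = 4.0    # score > 4.0: high specificity (98.3% reported)
RULE_OUT_CUTOFF = -0.5  # score < -0.5: high sensitivity (97.9% reported)


def triage(model_score):
    """Map a model 1 score to a hypothetical triage category."""
    if model_score > RULE_IN_CUTOFF:
        return "rule-in"
    if model_score < RULE_OUT_CUTOFF:
        return "rule-out"
    return "indeterminate"


def fraction_correct(n_correct, n_rule_in, n_rule_out):
    """Share of subjects outside the intermediate zone classified correctly."""
    return n_correct / (n_rule_in + n_rule_out)
```

Subjects in the "indeterminate" band would still need the single cut-off or further testing; the sketch makes no recommendation for them.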

Table 3 Diagnostic accuracy of the rapid screening model.

Derivation of a simplified model

We further combined all the geographical regions into one parameter, “epidemic areas with persistent local transmission or communities with definite cases”. The results of the univariate and multivariate logistic analyses are shown in Table 4. To develop a simplified model, we rounded the coefficients and derived the model as follows.

Table 4 Predictors associated with NCP (combining the geographical regions).

Simplified model score (model 2) = 1 (if contacting patients with fever or respiratory symptoms from areas with persistent local transmission or community with definite cases within 14 days) + 3 (if relating to a cluster outbreak) + 1 (if having fatigue) + 2 (if having dyspnea) − 1 (if having nasal congestion) + 3 (if feeling muscle soreness) − 0.3 × WBC count + 2 × pulmonary imaging score. The AUROC was 0.909 (95% CI 0.889–0.929) (Fig. 2). In fivefold cross-validation, the average AUROC was 0.862, with a standard deviation of 0.028. The Hosmer–Lemeshow χ² was 11.962 (P = 0.153). The optimal cutoff value was 0.7, with a sensitivity of 82.3% (95% CI 76.3–87.0%), a specificity of 86.2% (95% CI 82.9–88.9%), a diagnostic accuracy of 84.6% (95% CI 82.2–87.0%), and a Youden index of 0.685.

Given the evolution of the pandemic, we dropped the epidemiological history variables, which are likely to become outdated, and repeated the analysis. The risk factors and their ORs are shown in Table 5. Consequently, a predictive model without epidemiological history was established.

Table 5 Predictors associated with NCP (dropping the epidemiological history).

Model score without epidemiological history (model 3) = 0.6 (if having coexisting diseases) + 0.8 (if having fatigue) + 1.2 (if having dyspnea) + 2.4 (if feeling muscle soreness) − 0.3 × WBC count − 0.3 × lymphocyte count + 1.6 × pulmonary imaging score. The AUROC was 0.859 (95% CI 0.833–0.884) (Fig. 2), with an optimal cutoff value of −1, a sensitivity of 83.5% (95% CI 79.1–87.1%), a specificity of 76.0% (95% CI 72.1–79.4%), a diagnostic accuracy of 78.9% (95% CI 76.2–81.7%), and a Youden index of 0.595. Repeated fivefold cross-validation gave an average AUROC of 0.854, with a standard deviation of 0.027. The Hosmer–Lemeshow χ² was 12.218 (P = 0.142), indicating no significant departure from a perfect fit.
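Because models 2 and 3 are plain weighted sums, both can be evaluated with one table-driven routine. The sketch below is illustrative only: the feature names are hypothetical, the weights and optimal cut-offs are those reported above, and it is an assumption that a score strictly above the cut-off counts as a positive screen.

```python
# Weights copied from the text; feature names are hypothetical labels.
MODEL2_WEIGHTS = {
    "contact_epidemic_area": 1.0, "cluster_outbreak": 3.0, "fatigue": 1.0,
    "dyspnea": 2.0, "nasal_congestion": -1.0, "muscle_soreness": 3.0,
    "wbc": -0.3, "imaging": 2.0,
}
MODEL3_WEIGHTS = {
    "coexisting_disease": 0.6, "fatigue": 0.8, "dyspnea": 1.2,
    "muscle_soreness": 2.4, "wbc": -0.3, "lymphocyte": -0.3, "imaging": 1.6,
}
OPTIMAL_CUTOFF = {"model2": 0.7, "model3": -1.0}
CONTINUOUS = {"wbc", "lymphocyte", "imaging"}  # must always be supplied


def linear_score(weights, features):
    """Weighted sum; binary features default to absent, continuous ones do not."""
    total = 0.0
    for name, weight in weights.items():
        if name in CONTINUOUS:
            value = float(features[name])  # KeyError if a measurement is missing
        else:
            value = 1.0 if features.get(name) else 0.0
        total += weight * value
    return total


def screens_positive(score, cutoff):
    """Assumption: strictly above the cut-off means a positive screen."""
    return score > cutoff
```

For a hypothetical subject with fatigue and dyspnea, a WBC count of 4.0, a lymphocyte count of 1.0 (both in × 10⁹/L), and an imaging score of 1, this gives about 3.8 for model 2 and 2.1 for model 3, both above their respective cut-offs.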

Discussion

In this study, we compared the characteristics of NCP patients with those of suspected individuals in whom NCP was finally ruled out. Having analyzed the clinical and epidemiological features, we developed a rapid screening model for predicting NCP in a Zhejiang population. The model included four epidemiological features (travel or residence history within 14 days in Wuhan, contact within 14 days with patients from Wuhan with fever or respiratory symptoms, contact with patients from other areas with persistent local transmission or communities with definite cases, and relationship with a cluster outbreak) and five clinical features (fatigue, dyspnea, muscle soreness, decreased WBC count, and imaging changes on chest X-ray or CT). The diagnostic performance of the established scale was excellent, with an AUROC of 0.920.

At a cut-off value of > 1.0, the model could detect NCP with a sensitivity of 85% and a specificity of 82.3%. Because the cost of a false negative is very high for a communicable disease, it is essential to avoid missed diagnoses, particularly during a surging outbreak. When the score was higher than 4.0, subjects were very likely to have NCP (specificity 98.3%); they should be isolated immediately, and further tests are highly recommended. Conversely, during the outbreak, a large number of frightened patients with flu-like symptoms crowded into hospitals, putting clinicians under great pressure. A model score of < −0.5 indicated a very low probability of 2019-nCoV infection (sensitivity 97.9%). Clinicians can set the cut-off value according to actual needs.

With 2019-nCoV continuing to spread, the Zhejiang model established in this study, as the first rapid screening model for NCP, is of great significance. Unlike virus isolation or RT-PCR testing, the screening scale is economical, simple, and fast, and it can be used to select potential patients for further RT-PCR examination.

Nevertheless, the model has several limitations. First, the enrolled participants were limited to Zhejiang Province, which imposes regional limitations on the application of the screening model; in particular, the epidemiological characteristics may not hold in other areas. Second, our research was confined to early and rapid screening, without adequate information on disease progression and prognosis. Last, as the epidemic develops, the weights of certain characteristics, especially the epidemiological ones, should be modified to increase the scope and accuracy of the diagnostic model.

To eliminate the effects of location-specific factors such as the Wuhan-related criteria, which might no longer be applicable as the pandemic evolves, a simplified model was developed by combining all the epidemic regions. Furthermore, we repeated the analysis after dropping the epidemiological data. Both subsequent models proved effective. In addition, fivefold cross-validation was repeated for each model during internal validation to quantify any optimism in the predictive performance, and the Hosmer–Lemeshow χ² test was used to measure calibration. Further nationwide and even worldwide studies are needed to assess the utility of this model, which remains subject to further adjustment and calibration if necessary.

According to the recent literature, most patients with NCP present with fever, cough, fatigue, and myalgia in the initial stage [11]. Atypical symptoms include diarrhea, nausea, headache, and sore throat. As the illness progresses, a proportion of patients gradually develop dyspnea, especially those with low immune function [12]. Complications such as acute respiratory distress syndrome (ARDS), arrhythmia, and shock are probably associated with a poor prognosis [7].

The most common laboratory abnormalities observed are leukopenia and lymphocytopenia. Moreover, hypoalbuminemia, elevated CRP and lactate dehydrogenase (LDH), and decreased CD8 count have been reported in some cases [6,13]. The most frequent imaging manifestation is patchy or punctate ground-glass opacities involving single or multiple pulmonary lobes [14]. Alterations on chest CT can reflect the severity and progress of NCP [15]. However, 2019-nCoV infection can also present with normal pulmonary imaging, particularly in the early stage, suggesting the necessity of combining epidemiological information, clinical manifestations, and imaging in screening and diagnosis [16].

At present, RT-PCR remains the confirmatory criterion for the diagnosis of 2019-nCoV infection. RT-PCR combines reverse transcription (RT) of RNA with polymerase chain reaction (PCR) amplification of cDNA. It has been widely used to detect different coronaviruses (such as SARS-CoV and MERS-CoV) in the laboratory because of its high specificity and sensitivity. However, the RT-PCR test can be time-consuming, and a shortage of test kits may fail to meet the needs of a growing infected population. Furthermore, RT-PCR for 2019-nCoV may give false-negative results because of unstable kits or unstandardized sampling [17]. Xiao et al. reported that some patients who met the diagnosis of NCP based on clinical and imaging findings had negative results for viral RNA [18]. In another study, initially negative RT-PCR results turned positive on repeated testing in a number of patients [10]. For timely isolation and early treatment, it is necessary to establish a rapid screening model to identify patients highly suspected of NCP. The authors are currently developing a procedure for fast scoring in clinical application based on this model.

In conclusion, this study established a rapid screening model for predicting NCP in a Zhejiang population. In addition, we developed a simplified model by combining the epidemic regions and rounding the coefficients, as well as a model without any epidemiological factors. The models can be used as a simple, fast, and cost-effective tool for screening NCP with significant clinical value.

Methods

Patients

From January 17 to February 19, 2020, a total of 880 patients suspected of 2019-nCoV infection were recruited from hospitals in Hangzhou, Wenzhou, Shaoxing, Taizhou, Ningbo and Jiaxing in Zhejiang Province. The study was approved by the Ethics Committee of Zhejiang Provincial People’s Hospital (2020KY006); in addition, all research was performed in accordance with relevant guidelines [19]. A waiver of informed consent was approved by the Ethics Committee of Zhejiang Provincial People’s Hospital because the subjects would not be exposed to any risk in this observational study, and the subjects' information was anonymized at collection and prior to analysis.

Epidemiological history and clinical manifestations were collected for each individual. Age, gender, region, coexisting diseases, body temperature, results of routine blood tests, and chest X-ray or CT findings were recorded for all participants. Throat swab, sputum, blood, or stool samples were collected to test for 2019-nCoV nucleic acid using real-time RT-PCR. If the first RT-PCR test was negative, samples were collected again after 24 h for a repeat test.

Eligibility

Patients admitted to the fever clinics who were initially suspected of NCP were included in the study. Suspected or confirmed cases were diagnosed according to the 5th edition of the Chinese recommendations for diagnosis and treatment of pneumonia caused by 2019-nCoV [19].

Suspected cases

NCP should be suspected if subjects conform to any one of the criteria in the epidemiological history and any two of the standards in clinical presentations. If there is no epidemiological history, suspected cases should meet three of the criteria in clinical presentations.

Epidemiological history: (1) Subjects with a travel or residence history in Wuhan or its neighboring areas, or other areas with persistent local transmission, or communities with definite cases within 14 days; (2) Subjects with a history of contacting confirmed cases with 2019-nCoV infections (positive nucleic acid detection) within 14 days; (3) Subjects with a history of contacting patients with fever or respiratory symptoms who have a travel or residence history in Wuhan or its neighboring areas, or in other areas with persistent local transmission, or communities with definite cases within 14 days; (4) Subjects who are associated with a cluster outbreak, which is defined as one definite case with NCP in family or place of work within 14 days, along with other patients with fever or respiratory symptoms.

Clinical presentations: (1) Fever and/or respiratory symptoms; (2) Typical chest imaging features of NCP, such as ground-glass opacity, infiltrating shadows, and pulmonary consolidation; (3) Normal or decreased white blood cell (WBC) count, or decreased lymphocyte count in the early stage of the disease.
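The suspected-case rule described above (any one epidemiological criterion plus any two clinical presentations, or all three clinical presentations when there is no epidemiological history) can be expressed as a short predicate. This is a minimal sketch with hypothetical function and parameter names, taking as input the number of criteria met in each group.

```python
N_EPIDEMIOLOGICAL_CRITERIA = 4
N_CLINICAL_CRITERIA = 3


def is_suspected_case(n_epidemiological_met, n_clinical_met):
    """Apply the suspected-case rule to counts of criteria met."""
    if not 0 <= n_epidemiological_met <= N_EPIDEMIOLOGICAL_CRITERIA:
        raise ValueError("epidemiological count out of range")
    if not 0 <= n_clinical_met <= N_CLINICAL_CRITERIA:
        raise ValueError("clinical count out of range")
    if n_epidemiological_met >= 1:
        # Any epidemiological criterion plus any two clinical presentations.
        return n_clinical_met >= 2
    # No epidemiological history: all three clinical presentations required.
    return n_clinical_met == N_CLINICAL_CRITERIA
```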

Confirmed cases

Suspected cases who meet any one of the following criteria: (1) Positive 2019-nCoV nucleic acid in throat swab, sputum, blood, or stool samples by real-time RT-PCR; (2) Genetic sequencing of samples showing high homology with the known 2019-nCoV.

Establishment of the rapid screening scale

Based on the epidemic situation in Zhejiang Province, we included age, gender, co-existing diseases, the epidemiological parameters, clinical symptoms, body temperature, WBC count, lymphocyte count, neutrophil count, and chest imaging to establish a novel diagnostic model of NCP. The epidemiological features and symptoms were treated as binary variables, scored as “1” if “yes” and “0” if “no”. Chest radiologic changes were classified as “normal”, “unilateral local patchy shadowing”, “bilateral multiple ground glass opacity”, “bilateral diffuse ground glass shadowing with pulmonary consolidation”, and “other imaging alterations such as pulmonary nodule or pleural effusion”, and were scored as “0”, “0.5”, “1”, “2” and “0.3”, respectively.
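The variable coding described above amounts to a small lookup table. The following sketch shows one possible encoding; the category strings are shortened from the text, and the function names are hypothetical.

```python
# Imaging categories and scores as listed in the text (labels shortened).
IMAGING_SCORE = {
    "normal": 0.0,
    "unilateral local patchy shadowing": 0.5,
    "bilateral multiple ground glass opacity": 1.0,
    "bilateral diffuse ground glass shadowing with pulmonary consolidation": 2.0,
    "other imaging alterations": 0.3,  # e.g. pulmonary nodule, pleural effusion
}


def encode_binary(answer):
    """Score an epidemiological feature or symptom: 1 if 'yes', 0 if 'no'."""
    normalized = answer.strip().lower()
    if normalized not in ("yes", "no"):
        raise ValueError(f"expected 'yes' or 'no', got {answer!r}")
    return 1 if normalized == "yes" else 0


def imaging_score(category):
    """Look up the chest imaging score for a radiologic category."""
    return IMAGING_SCORE[category.strip().lower()]
```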

The subjects were classified as NCP (339 individuals) or non-NCP (520 individuals) according to their real-time RT-PCR results, since detection of 2019-nCoV nucleic acid by real-time RT-PCR was considered the gold standard. After derivation of the screening model, the diagnostic performance of the established scale was verified. We further combined all the geographical regions into one parameter, “epidemic areas with persistent local transmission or communities with definite cases”, and developed a simplified model by rounding the coefficients. Moreover, we dropped the epidemiological parameters, which are likely to become outdated given the evolution of the pandemic, and repeated the analysis.

Statistical analysis

Statistical analyses were conducted using SPSS software (version 22.0) for Windows (SPSS, Chicago, IL). Continuous variables were presented as mean ± standard deviation and compared using Student’s t-test; categorical variables were compared using the chi-squared test. For multiple comparisons, one-way analysis of variance (ANOVA) was performed. Univariate logistic regression analyses were conducted to assess the factors associated with NCP. The parameters with statistical significance were entered into a multivariate logistic regression model to further identify independent predictors of NCP.

To identify candidate predictors, we performed a stepwise logistic regression analysis (P value to enter = 0.05 and P value to remove = 0.10). A model based on the results of the multiple logistic regression analysis was established to screen for NCP; furthermore, fivefold cross-validation was repeated 10 times to evaluate the performance of the model and to examine whether it was overfitted. Model calibration was evaluated using the Hosmer–Lemeshow χ² test. The area under the receiver operating characteristic curve (AUROC) with 95% CI was used to assess the predictive accuracy of the screening model for determining NCP [20]. Bootstrapping with 500 resamples was used to plot the point-wise 95% CIs of the ROC curves in R (version 4.0.3); it was not used to re-estimate the regression coefficients. Optimal cut-off values were set, and the corresponding sensitivities, specificities, diagnostic accuracies, positive likelihood ratios, and negative likelihood ratios of the model were calculated. A two-sided P value < 0.05 was considered statistically significant.
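For readers who wish to reproduce the discrimination measure, the sketch below computes the AUROC as the probability that a randomly chosen NCP case scores higher than a randomly chosen non-NCP subject (ties counted as one half), and a percentile bootstrap interval with 500 resamples. This is an illustrative substitute, not the procedure used in the study: the analyses here were run in SPSS and R, and the bootstrap in the paper was applied to the point-wise CIs of the ROC curves rather than to the AUROC itself. The function names and the fixed seed are assumptions.

```python
import random


def auroc(scores, labels):
    """AUROC as the fraction of case/control pairs ranked correctly."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auroc_ci(scores, labels, n_boot=500, seed=0):
    """Percentile 95% interval for the AUROC (illustrative assumption)."""
    rng = random.Random(seed)
    pairs = list(zip(scores, labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        if {y for _, y in sample} != {0, 1}:
            continue  # skip resamples that lost one of the two classes
        resampled_scores, resampled_labels = zip(*sample)
        stats.append(auroc(resampled_scores, resampled_labels))
    stats.sort()
    lower = stats[int(0.025 * n_boot)]
    upper = stats[int(0.975 * n_boot) - 1]
    return lower, upper
```

The pairwise computation is quadratic in the sample size, which is acceptable for a cohort of this scale but would be replaced by a rank-based formula for much larger datasets.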