Introduction

Coronavirus disease 2019 (COVID-19) is a viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was identified in Wuhan, China, in late December 2019 [1]. It rapidly spread worldwide and by early 2020 had affected most Western countries. It was officially recognised as a pandemic on the 11th of March 2020.

Real-time reverse transcription polymerase chain reaction (RT-PCR) has emerged as the gold standard for the qualitative detection of nucleic acid from SARS-CoV-2 in upper and lower respiratory specimens, with a nasopharyngeal swab being the preferred method for sample collection. Recent studies have shown possible inaccurate RT-PCR results (false negatives and false positives) with regard to detection of COVID-19, with reported sensitivity ranging from 60 to 71%, and specificity around 96% [2,3,4,5,6]. These inaccurate RT-PCR results may be due to inadequate sampling techniques, variations in viral load, and test kit sensitivity.

Chest CT has been suggested as a potential complementary approach alongside RT-PCR in the context of the current outbreak. Although COVID-19 CT findings may overlap with other diseases, such as other viral infections, they display a typical pattern in the context of this outbreak, presenting ground-glass opacities (GGO), with or without consolidations, in lung regions close to visceral pleural surfaces [7]. Chest CT may be implemented prior to RT-PCR testing, with a sensitivity of 60 to 98% [2, 4, 8, 9]. Furthermore, chest CT may provide crucial information regarding the extent and complications of COVID-19, or its differential diagnoses, within short time periods, without the need to wait for the results of RT-PCR.

Most radiology societies recommended that chest CT should not be used as first-line screening [3, 10] and should only be used in patients who are suspected or confirmed to have COVID-19, who have moderate to severe disease requiring hospitalisation, or who have underlying comorbidities if access to RT-PCR testing is restricted [11].

The accuracy of COVID-19 diagnosis based on CT features by radiologists has been investigated previously, but largely on a retrospective basis and in settings with a high prevalence of COVID-19. In a previous study [12], structured CT reports by radiologists from several hospitals were evaluated based on multicentric homogeneous data using logistic regression modelling. However, analysis of diagnostic accuracy may be hampered by the heterogeneity of the radiologists involved, who have varying degrees of experience.

Outsourced teleradiology is now widely implemented, but opinions differ regarding this practice. In the 2016 European Society of Radiology survey, 70.8% of national member countries practised outsourcing [13]. While teleradiological outsourcing makes it possible to provide radiology services in a variety of local settings, studies often raise concerns about the quality of reports and insufficient communication with clinicians. A clear need to improve confidence in the accuracy of outsourced reports and to ensure timely responses has been highlighted [14, 15].

Our aim was to prospectively assess the diagnostic accuracy of chest CT for COVID-19 and determine the inter-observer agreement between radiologists in patients suspected to have the virus from 15 French emergency departments operating an outsourced emergency teleradiological system during the COVID-19 outbreak in France.

Materials and methods

Patient selection and study design

The local institutional ethics review board approved this multicentric observational prospective study (N° CRM-2005-088).

At our emergency teleradiology center, all consecutive patients who were suspected, by a board-certified emergency physician, to have COVID-19, and who underwent both chest CT imaging and RT-PCR from March 13, 2020, to April 14, 2020, in 15 French emergency departments, were included (Fig. 1). The examinations took place during on-call duty periods between 6 pm and 8.30 am on weekdays and 24 h a day on weekends. A COVID-19 dedicated workflow was implemented, which consisted of a standardised COVID-19 CT request form for the requesting physician, structured and standardised radiological reports, and a systematic review by a senior radiologist.

Fig. 1 Flowchart

CT techniques

Chest CT examinations were performed using a 16, 64, or 80-detector row CT scanner with a standardised non-contrast chest CT COVID-19 protocol for all hospitals. If pulmonary embolism was suspected, a CT pulmonary angiographic protocol with bolus-tracking intravenous iodine contrast agent administration at a rate of 3–4 mL/s was used instead. The on-site requesting physician supervised contrast administration if needed. Details regarding the CT scanner and the number of examinations for each hospital are given in Supplementary material 1. The geographic distribution of patients included is presented in Fig. 2.

Fig. 2 Geographic distribution of patients included

Radiology interpretation protocol

The teleradiology interpretation protocol met the current French recommendations for teleradiology practice [16]. Reports and requests with clinical data for the interpretation of COVID-19 chest CT images were received from partner hospitals at our teleradiology center, using teleradiology software (ITIS; Deeplink Medical). The images were securely transferred over a virtual private network (VPN) to a local picture archiving and communication system for interpretation (PACS; Carestream Health 12). Images were interpreted by a teleradiologist (TR) in two dedicated emergency reading rooms during the study period. The panel of TRs consisted of 106 senior radiologists with at least 5 years of emergency imaging experience (mean length of practice: 7 years) and 45 junior radiologists (i.e., residents) with between 3 and 5 years of emergency imaging experience (mean length of practice: 4 years). TRs operated an on-call rota in groups of at least five TRs per night, and the report turn-around times were recorded.

CT examinations were systematically reviewed within a week after each on-call period by a senior radiologist (15 senior radiologists; mean length of practice: 12.1 years) who was not involved in the on-call duty period, blinded to RT-PCR results and the first reader report, and unblinded to the patient’s medical history.

Clinical data

Clinical information was prospectively provided by emergency physicians upon presentation and was collected using the teleradiology software as a dedicated COVID-19 CT request form (ITIS; Deeplink Medical). This clinical information included age, gender, active smoking, significant medical history, recent medication with anti-inflammatory drugs, time since onset of symptoms (categorised as: < 1 week, 1–2 weeks, ≥ 2 weeks), oxygen saturation (categorised as ≥ 95%, 90–95%, and < 90%), dyspnoea, fever (≥38 °C), cough, asthenia, headache, and ear, nose, and throat symptoms. The RT-PCR results were retrospectively collected from the patients’ electronic medical records by each partner hospital. The initial RT-PCR was considered as the standard of reference.

Discrepancies between the RT-PCR results and the score from the second reading were reviewed by contacting hospitals and investigating patients’ outcome, in order to determine whether a second chest CT and/or a second RT-PCR test had been performed.

CT image analysis

Six common radiological features were extracted from the specific, structured COVID-19 chest CT reports by the first and second independent readers. These features included the presence of GGO, consolidation, fibrosis (with traction bronchiectasis and architectural distortions), intralobular reticulations, and extent of abnormalities (categorised as low [< 25%], moderate [25–50%] or high [> 50%]). Additionally, the second reading included an assessment of image quality (categorised as good, moderate, or poor) and the following radiological features: (a) underlying pulmonary disease (categorised as emphysema, lung cancer, interstitial lung disease, pleural lesions, bronchiectasis); (b) GGO pattern (categorised as rounded and non-rounded GGO); (c) consolidation pattern (categorised as rounded and non-rounded consolidations and subpleural bands); (d) predominant pattern (categorised as GGO or consolidation); (e) distribution pattern of lesions (categorised as peripheral predominant, central predominant, or mixed); (f) bilateral lesions; (g) diffuse lesions (i.e., five lobes involved); (h) basal-predominant lesions; (i) pleural effusion (categorised as uni- or bilateral); (j) adenomegaly (defined as lymph node with short axis > 10 mm); (k) bronchial wall thickening (further categorised as lobar/segmental or diffuse); (l) airways secretions; (m) tree-in-bud centrilobular micronodules; and (n) pulmonary embolism.

Each reading was categorised using a five-point score, adapted from the recommendations of the Société Française de Radiologie (SFR) [17]: (1) normal; (2) non-infectious findings; (3) infectious findings but not consistent with COVID-19 infection; (4) consistent with COVID-19 infection; (5) typical appearance of COVID-19 infection (Fig. 3).

Fig. 3 Categorisation using the 5-point scale adapted from the SFR. (a) Score of 1: normal chest CT. (b) Score of 2: lung abnormalities not consistent with any pulmonary infection (showing acute cardiogenic pulmonary oedema with bilateral pleural effusion [black arrowheads]). (c) Score of 3: lung abnormalities consistent with an infection but not SARS-CoV-2 (showing a lobar consolidation [white arrowhead] with acute community-acquired pneumonia). (d) Score of 4: lung lesions compatible with COVID-19 (showing three small rounded central ground-glass opacities in the lower left lobe [white arrowheads]). (e) Score of 5: lung abnormalities strongly suspicious of COVID-19 (showing bilateral peripheral basal-predominant non-rounded ground-glass opacities [white arrowhead])

To support the conclusions of the TRs, all radiologists underwent a 2-h e-learning session based on reported chest CT findings associated with COVID-19 from the literature, which was made publicly available on the 7th of April [18]. A private medical discussion group (PandaLab) was used at the onset of the outbreak, such that on-call TRs could discuss and share images from their cases with all TRs who were not on call, prior to completing their reports.

Statistical analysis

Statistical analyses were performed using R (version 3.5.3, R Foundation for Statistical Computing). A p value of less than 0.05 was deemed significant.

Inter-observer agreement for the six radiological features common to both readings and for the scores was assessed using Cohen’s kappa (for dichotomised variables) and weighted kappa (for ordinal variables).
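As an illustration of the agreement statistics named above, the following is a minimal Python sketch of Cohen's kappa and weighted kappa. It is not the analysis code of the study, which was run in R. The function name `weighted_kappa`, the choice between linear and quadratic weights, and the example scores are assumptions made for illustration only; the weighting scheme actually used is not stated in the text.

```python
def weighted_kappa(reader1, reader2, categories, weights="linear"):
    """Cohen's kappa between two readers.

    weights=None        -> unweighted kappa (dichotomised variables)
    weights="linear"    -> disagreement weight |i - j| / (k - 1)
    weights="quadratic" -> disagreement weight (|i - j| / (k - 1)) ** 2
    `categories` lists the ordered levels (needs at least two).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(reader1)

    # Observed joint proportions of (reader1 level, reader2 level).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(reader1, reader2):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # kappa = 1 - (weighted observed disagreement / weighted chance disagreement)
    num = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    return 1.0 - num / den


# Hypothetical five-point scores from a first and a second reader.
first = [1, 2, 3, 4, 5, 5, 4, 3, 5, 1]
second = [1, 2, 3, 5, 5, 5, 4, 2, 5, 1]
print(weighted_kappa(first, second, categories=[1, 2, 3, 4, 5]))
```

With weights, a disagreement between adjacent scores (for example 4 versus 5) is penalised less than a disagreement between distant scores (for example 1 versus 5), which is why a weighted statistic suits an ordinal score.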

The frequencies of all clinical and radiological categorical variables from the second reading were compared between patients with positive RT-PCR (RT-PCR+) and those with negative RT-PCR (RT-PCR−) using Pearson χ2 or Fisher exact tests, except for age, which was compared between the two groups using the Student t test. The odds ratio (OR), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), as measurements of accuracy with 95% confidence intervals (95% CI), were calculated for each categorical variable independently, for a score of 4 or 5, and for a score of 5, for the 1st and 2nd readings. The probability of RT-PCR+ based on the scores was estimated using univariate binary logistic regression.
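The accuracy measurements listed above all derive from a 2 × 2 table of CT result against RT-PCR status. The sketch below shows one way to compute them in Python. The counts are back-calculated from figures reported in the Results for the second reading (244 RT-PCR+ and 269 RT-PCR− patients, 19 false negatives and 43 false positives) and are therefore an assumption rather than published data. The Wilson interval is likewise an assumed choice, since the text does not state which confidence interval method was used.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measurements from a 2 x 2 table (CT result vs RT-PCR)."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_ci(tn, tn + fn)),
        # Odds ratio as the cross-product of the table.
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Counts back-calculated from the reported totals (an assumption):
# TP = 244 - 19, TN = 269 - 43, for a CT score of 4 or 5 at the second reading.
metrics = diagnostic_metrics(tp=225, fp=43, fn=19, tn=226)
for name, value in metrics.items():
    print(name, value)
```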

Changes in accuracy measurements (including the percentage of correctly predicted observations), inter-observer agreement, and disease prevalence over time were estimated by dividing the study period into eight standard periods of 4 days (as a compromise between an acceptable number of patients for each period [> 20] and preserving the shape of the epidemic curve).
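The stratification into 4-day periods can be sketched as follows in Python. The start date is taken from the inclusion window given in the Methods; the record layout, the function names, and the decision to fold any day beyond the eighth period into the last period are assumptions made for illustration, and the example records are hypothetical.

```python
from datetime import date

STUDY_START = date(2020, 3, 13)  # first inclusion day given in the Methods
PERIOD_DAYS = 4
N_PERIODS = 8


def period_index(exam_date):
    """Zero-based index of the 4-day period containing exam_date."""
    offset = (exam_date - STUDY_START).days
    if offset < 0:
        raise ValueError("examination precedes the study period")
    # Assumption: days past the eighth period are folded into the last one.
    return min(offset // PERIOD_DAYS, N_PERIODS - 1)


def summarise(records):
    """records: iterable of (exam_date, ct_positive, pcr_positive)."""
    stats = {}
    for exam_date, ct_positive, pcr_positive in records:
        s = stats.setdefault(period_index(exam_date),
                             {"n": 0, "pcr_pos": 0, "correct": 0})
        s["n"] += 1
        s["pcr_pos"] += int(pcr_positive)
        s["correct"] += int(ct_positive == pcr_positive)
    return {
        p: {"n": s["n"],
            "prevalence": s["pcr_pos"] / s["n"],
            "accuracy": s["correct"] / s["n"]}
        for p, s in sorted(stats.items())
    }


# Hypothetical records, for illustration only.
example = [
    (date(2020, 3, 13), True, True),
    (date(2020, 3, 15), False, False),
    (date(2020, 3, 18), True, False),
    (date(2020, 4, 14), False, False),
]
print(summarise(example))
```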

Results

General description

Overall, 513 patients were included (Fig. 1). The median age of the population was 68.4 years old (range: 18–100) and 241/513 were female (47%). The prevalence of RT-PCR+ was 244/513 (47.6%). The distribution of RT-PCR+ and chest CT over the study period is shown in Fig. 4. Table 1 presents the descriptive features of the study population. The average dosimetry for chest CT was 251 ± 130 mGy cm.

Fig. 4 Diagnostic accuracy measurements of on-call radiologists and inter-observer agreement variation between on-call radiologists and senior radiologists over the eight periods during the inclusion period: (a) number of cases of chest CT (depicted in black [weighted kappas for each step of the study period]) and number of RT-PCR+ patients (depicted in red [prevalence]). Accuracy corresponds to the percentage (from 0 to 1) of correctly classified observations, the p values for weighted kappa indices were all < 0.001; (b, c) diagnostic accuracy measures of on-call radiologists for a score of 4–5 (b) and 5 (c)

Table 1 Descriptive characteristics of the study population

Accuracy of radiologists

The 513 chest CT scans were interpreted at a first reading by 101 radiologists from the panel of 151 teleradiologists available during on-call periods. Among them, 69/101 (68.3%) were senior and 32/101 (31.7%) were junior radiologists. The average report turn-around time was 15.7 ± 9.1 min for a chest CT scan alone and 22.7 ± 12.5 min when the chest CT scan was combined with imaging of another part of the body.

Table 2 provides a comparative summary of the diagnostic accuracy of the two readings. Scores from both readings were significantly associated with RT-PCR status (p < 0.001). The inter-observer agreement for scores was excellent (weighted kappa = 0.87, 95% CI = 0.84–0.90, p < 0.001). Other inter-observer agreements are shown in Table 3 and ranged from 0.41 (for fibrosis) to 0.79 (for GGO).

Table 2 Diagnostic accuracy measurements of on-call radiologists and senior radiologists
Table 3 Inter-observer agreement between first and second radiological readings

Regarding the second reading, the accuracy measurements were as follows: sensitivity of 0.92 (95% CI: 0.88–0.95), specificity of 0.84 (95% CI: 0.79–0.88), PPV of 0.84 (95% CI: 0.80–0.87), and NPV of 0.92 (95% CI: 0.86–0.95). The mean number of CT reviews per senior radiologist was 34.

By applying a univariate binary logistic regression model, we were able to estimate the probability of RT-PCR+. For a score of 1, 2, or 3, the probability of RT-PCR+ was 0.09 and 0.08 for the first and second readings, respectively. For a score of 4, the probability of RT-PCR+ was 0.47 and 0.49 for the first and second readings, respectively. For a score of 5, the probability of RT-PCR+ was 0.89 and 0.95 for the first and second readings, respectively (details for the modelling are provided in Supplementary material 2). The AUCs were 0.89 (95% CI = 0.86–0.92) and 0.93 (95% CI = 0.91–0.95), respectively.
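When the score enters a logistic model as a categorical predictor, the fitted probability for each level equals the observed proportion of RT-PCR+ patients at that level, and the AUC can be obtained from the same counts. The Python sketch below illustrates both. The per-level counts are back-calculated from figures reported elsewhere in this article for the second reading (19 RT-PCR+ patients with a score of 1 to 3, 9 RT-PCR− patients among 201 with a score of 5, 244 RT-PCR+ and 269 RT-PCR− overall) and are an assumption, as is the treatment of the score as three categorical levels; the details of the actual model are in Supplementary material 2.

```python
# (RT-PCR+, RT-PCR-) counts per score level; back-calculated, not published.
COUNTS = {"1-3": (19, 226), "4": (33, 34), "5": (192, 9)}
ORDERED_LEVELS = ["1-3", "4", "5"]


def level_probabilities(counts):
    """Observed proportion of RT-PCR+ at each score level."""
    return {level: pos / (pos + neg) for level, (pos, neg) in counts.items()}


def auc_from_counts(counts, ordered_levels):
    """AUC as the probability that a random RT-PCR+ patient has a higher
    score than a random RT-PCR- patient, counting ties as one half."""
    n_pos = sum(counts[lvl][0] for lvl in ordered_levels)
    n_neg = sum(counts[lvl][1] for lvl in ordered_levels)
    concordant = 0.0
    neg_below = 0  # RT-PCR- patients at strictly lower levels
    for lvl in ordered_levels:
        pos, neg = counts[lvl]
        concordant += pos * neg_below + 0.5 * pos * neg
        neg_below += neg
    return concordant / (n_pos * n_neg)


print(level_probabilities(COUNTS))
print(auc_from_counts(COUNTS, ORDERED_LEVELS))
```

Under these assumptions the values land close to those reported for the second reading, which suggests the back-calculated counts are at least consistent with the published summary statistics.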

For scores of 4 or 5 at the first reading, over the eight periods, accuracy ranged from 0.67 (after the first period; 95% CI: 0.45–0.84, RT-PCR+ prevalence: 0.25) to 0.90 (after the seventh period; 95% CI: 0.67–0.99, RT-PCR+ prevalence: 0.29). For a score of 5, accuracy at the first reading ranged from 0.80 (after the fourth period; 95% CI: 0.73–0.86, RT-PCR+ prevalence: 0.58) to 1 (after the last period; 95% CI: 0.86–1, RT-PCR+ prevalence: 0.04). Changes in other measurements of diagnostic accuracy of on-call radiologists and weighted kappas over the study periods are shown in Fig. 4.

Discrepancies between second readings and RT-PCR results

In total, discrepancies were reported in 62 patients. Forty-three of these 62 (69.4%) patients had negative RT-PCR but strongly suggestive chest CT. Of these 43 patients, 14 (32.6%) had a second RT-PCR test, which was positive in one patient 3 days later and remained negative in the other 13 patients. The final diagnoses according to the emergency medical records were confirmed COVID-19 (1/43, 2.3%), indeterminate (8/43, 18.6%), probable COVID-19 (13/43, 30.2%), or another disease (21/43, 48.8%).

Nineteen of the 62 patients (30.6%) had positive RT-PCR but negative chest CT. The median time from onset of symptoms to CT was 3 days (range: 0–16). Of these 19 patients, two (10.5%) underwent repeat chest CT: the first 7 days later, which was strongly suggestive of COVID-19, and the second 15 days later, which was still classified as normal.

Clinical and radiological findings associated with RT-PCR status

Table 4 shows the overall and subgroup frequencies for all variables as well as measurements of their accuracy. The following clinical variables were associated with RT-PCR+: time since onset of symptoms (≥ 1 week, p = 0.04), oxygen saturation (< 95% [p = 0.03] and < 90% [p = 0.005]), presence of fever (p < 0.001), cough (p = 0.02), asthenia (p = 0.001), and myalgia (p = 0.008).

Table 4 Correlation between clinical and radiological features and RT-PCR status or corresponding diagnostic accuracy measurements

The following radiological variables were positively associated with RT-PCR+: presence of GGO (p < 0.001), non-rounded GGO (p < 0.001), rounded GGO (p < 0.001), presence of consolidation (p < 0.001), non-rounded consolidation (p = 0.01), subpleural bands (p < 0.001), intralobular reticulations (p < 0.001), fibrosis (p < 0.001), GGO predominant pattern (p < 0.001), peripheral predominant location (p < 0.001), bilateral lesions (p < 0.001), diffuse lesions (p < 0.001), basal-predominant lesions (p < 0.001), and low, moderate, and high extent of abnormalities (all p < 0.001).

The following radiological variables negatively correlated with RT-PCR+: consolidation predominant pattern (p = 0.02), central predominant location (p = 0.001), mixed predominant location (p = 0.002), airway secretion (p < 0.001), bronchial wall thickening (p < 0.001), either lobar/segmental (p < 0.001) or diffuse (p < 0.001), and tree-in-bud micronodules (p < 0.001).

Ninety-seven of 513 (18.9%) patients underwent a CT pulmonary angiographic protocol for suspected pulmonary embolism. Six of them (6/97, 6.2%) had a pulmonary embolism, with a similar frequency in each RT-PCR group (3/52 [5.8%] in the RT-PCR− group and 3/45 [6.7%] in the RT-PCR+ group). For these 6 cases, chest CT categorisations at the first and second reading were in accordance with the RT-PCR status.

Discussion

Our study found that inter-observer agreement was excellent between on-call radiologists with varying degrees of experience and senior radiologists. The ability to categorise chest CT scans from various hospitals was strongly reproducible between the emergency setting and the later second reading. This indicator was higher than previously reported in recent investigations [7]. Prokop et al [7] considered multiple raters based on a smaller population using Fleiss kappa, a different statistical approach to that used in our study. Moreover, the COVID-19 Reporting and Data System (CO-RADS) classification is not entirely equivalent to our adapted SFR classification. For instance, CO-RADS scores of 4 and 5 are equivalent to a score of 5 based on our SFR system. Hence, highly suspected cases of COVID-19 are classified under a single category in the latter system, rather than the two categories for CO-RADS, which may limit inter-observer discrepancies for the SFR system. The PPV for a score of 5 was 0.89 according to the first reading and 0.96 for the second reading, highlighting the strong reliability with which radiologists diagnosed COVID-19. The diagnostic sensitivity for radiologists (0.92) was lower than that reported by Ai et al, but in that study the prevalence of the disease was close to 100% and no threshold for determining positive scans was specified [2].

Based on this multicentric population of suspected COVID-19 in an emergency setting, radiologist scoring was shown to strongly correlate with RT-PCR status. Therefore, it may have been useful to perform initial CT screening while waiting for RT-PCR results, before hospitalisation. The clinical and radiological variables that highly correlated with RT-PCR status are consistent with those found in the literature, namely: fever, asthenia, oxygen saturation, presence of GGO (with a GGO main pattern), consolidation (especially non-rounded), subpleural bands, and intralobular reticulations with bilateral, diffuse, basal-predominant, and peripheral distributions [19, 20].

In our study, the AUCs were 0.89–0.93, slightly higher than those previously reported under on-site radiology conditions and were, overall, good [7, 21]. Accuracy and inter-observer agreement over the eight periods remained almost stable over time, although the number of RT-PCR+ and chest CT cases followed a bell curve distribution. There may be several reasons for this: knowledge regarding COVID-19 radiological findings has considerably increased since the initial papers were published in January and February 2020; open-source publications issued by radiological societies worldwide were immediately relayed to our facility and implemented in our workflow; we provided e-learning courses to junior and senior TRs; during on-call duty, TRs could ask for collegial help through a private discussion forum and also from other TRs on-call working together on-site in two dedicated emergency reading rooms; in addition to simply reducing reporting variability, the use of structured standardised reports and the scoring may have provided confidence and guidance to TRs [3].

It should be noted that the PPV for the combined scores of 4 and 5 declined when prevalence was at its lowest (25% to 4%), whereas the PPV for a score of 5 remained stable. These findings confirm the impression that, at low prevalence rates, a patient should be classified as suspected of having COVID-19 by radiologists only when confronted with consistent evidence of COVID-19 CT features. It should also be noted that 19 patients who were RT-PCR+ had a chest CT which was not categorised as compatible with COVID-19 infection. In this subgroup, the median time period between the onset of symptoms and the chest CT was only 3 days. Previously, similar CT findings were seen in 56%, 9%, and 4% of patients at 0–2, 3–5, and 6–12 days after the onset of symptoms, respectively [22]. However, upon further investigation, only two of these 19 patients were shown to have had a second chest CT, one which was normal 2 weeks later and the other which was positive 1 week later. These observations highlight that normal chest CT (i.e., a score of 1) does not exclude COVID-19 within the first 3 days of symptoms, as mentioned in the SFR recommendations. Conversely, 9/201 (4.5%) patients with typical CT findings according to the second reading showed a negative RT-PCR result. We investigated these discrepancies and found that 14 of the 49 (28.6%) patients were treated as COVID-19 patients, although only one of them was RT-PCR+ when tested again. This illustrates that chest CT could be useful to reclassify false-negative RT-PCR cases in patients with a high clinical suspicion of COVID-19.
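The dependence of PPV on prevalence discussed above follows directly from Bayes' theorem. The Python sketch below uses the sensitivity (0.92) and specificity (0.84) reported for the second reading and assumes, for illustration only, that both remain constant as prevalence changes; the function names are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Sensitivity and specificity as reported for the second reading;
# the prevalences span the range observed over the study period.
SENS, SPEC = 0.92, 0.84
for prevalence in (0.58, 0.476, 0.25, 0.04):
    print(prevalence,
          round(ppv(SENS, SPEC, prevalence), 2),
          round(npv(SENS, SPEC, prevalence), 2))
```

At the overall study prevalence this reproduces the reported PPV and NPV to two decimals, while at a prevalence of a few percent the PPV of a score of 4 or 5 falls sharply, which is consistent with reserving a positive classification for typical findings when prevalence is low.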

There are, however, limitations to this study. First, we were not able to collect exhaustive data on clinical outcomes prospectively, because teleradiological investigation covers only a limited part of the patient care pathway. Second, in our multicentric study, local prevalence varied between the hospitals across the six French regions. Moreover, emergency physicians may have had different clinical screening practices, leading to heterogeneity among the study cohort. Third, we did not have access to the performance of on-site radiologists in the same departments. However, in our teleradiology workflow, on-site radiologists usually do not interpret or re-interpret the examinations that we handle. Finally, the RT-PCR results were collected retrospectively, leading to the exclusion of several patients and thereby presenting a risk of sampling bias.

To conclude, we present data from our French multicentric emergency cohort based on prospective radiological reports and secondary expertise. Our results illustrate the high accuracy with which TRs with various degrees of experience diagnosed COVID-19 in settings with different levels of prevalence, as well as excellent inter-observer agreement for chest CT. Thus, this kind of structured outsourced teleradiology model could provide high-quality, structured and standardised reports with report turn-around times that meet the requirements of emergency medicine during the pandemic.