Introduction

False-negative errors account for 0.8–5% of daily clinical diagnostic imaging errors, and false-negative error rates of 13–90% have been recorded under experimental conditions designed to measure radiologists’ diagnostic ability [1,2,3,4]. A false-negative finding, or missed diagnosis, was the most common type of error in these studies; for example, of the 12 subgroups of radiological error defined by Kim and Mansfield, false-negative findings accounted for 42% [5]. False-negative findings in imaging diagnosis can lead to serious patient outcomes, as they can result in a delayed diagnosis and/or delayed treatment.

Most false-negative findings are detected retrospectively at a follow-up examination. Findings that were present in a previous examination but are visible only in retrospect are usually described in the radiological report of the most recent examination. An interpretive oversight is also sometimes identified when a clinician or another radiologist performs a second reading of an imaging examination, or at a patient review meeting; in such cases, the revised finding may be added to the radiological report as an addendum, which can likewise be regarded as a finding “visible in retrospect.” In either situation, the false-negative finding is likely to be documented in a radiological report. We therefore speculated that false-negative findings could be extracted simply and easily by searching a radiological-report database for words and phrases related to “visible in retrospect.” We conducted the present study to evaluate the false-negative findings obtained by such a search.

Methods

Study design

This retrospective analysis was approved by the Ethics Review Committee of our hospital, which waived the requirement for written informed consent from the patients (No. 21C144).

Extraction of false-negative findings in radiological reports

We analyzed a total of 135,251 radiological reports made at our hospital (over 700 beds, over 40 departments) during the 34-month period from October 2018 to July 2021. These reports comprised 81,899 computed tomography (CT) reports, 36,174 magnetic resonance imaging (MR) reports, 9,585 digital radiography (DR) reports, 2,258 reports of positron emission tomography/computed tomography using 18F-fluorodeoxyglucose (PET), and 5,335 reports of other nuclear medicine (NM) examinations. We extracted all radiological reports containing wording related to “visible in retrospect”: ‘looking back,’ ‘reviewing back,’ ‘retrospective,’ ‘re-reading,’ ‘correction,’ ‘amendment,’ or ‘addendum.’ We then performed the same keyword search within the content of each extracted report to identify the details of the false-negative findings. Misinterpretations of findings were excluded, and only false-negative findings that were not mentioned in previous reports were finally registered.
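The extraction step above can be sketched in a few lines of code. The following is a minimal illustration, not the study's actual pipeline: it assumes reports are available as plain-text records keyed by an identifier, and the keyword list shown is the English rendering given in the text (the terms actually searched depend on the language of the report database).

```python
import re

# Illustrative keyword list for wording related to "visible in retrospect";
# the actual search terms follow the language of the report database.
RETROSPECT_KEYWORDS = [
    "looking back", "reviewing back", "retrospective",
    "re-reading", "correction", "amendment", "addendum",
]

def find_candidate_reports(reports):
    """Return (report_id, matched_keywords) pairs for every report whose
    text contains at least one "visible in retrospect"-related keyword."""
    pattern = re.compile(
        "|".join(re.escape(k) for k in RETROSPECT_KEYWORDS),
        re.IGNORECASE,
    )
    hits = []
    for report_id, text in reports.items():
        matched = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if matched:
            hits.append((report_id, matched))
    return hits
```

The same pattern can then be re-applied to each candidate report's body to locate the sentences describing the false-negative finding, which were reviewed manually in the study.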

Number and qualifications of radiologists

The radiological reports were created by different personnel in our hospital’s diagnostic radiology department over the study period: from October 2018 to March 2020 by ten full-time board-certified diagnostic radiologists and two residents; from April 2020 to May 2020 by five full-time board-certified diagnostic radiologists; and from June 2020 to July 2021 by six full-time radiologists. At least two board-certified nuclear medicine specialists were included throughout the study period, and at least one expert with more than 10 years of neuroradiology experience was included from April 2020 to July 2021. For 120,732 examinations (89%), the imaging report was completed within 2 days, regardless of weekdays or holidays.

Regional organ classification (ROC)

The publication Terminologia Anatomica (hereinafter abbreviated as the “TA”), the international standard for human anatomical terminology developed by the Federative International Programme for Anatomical Terminology (FIPAT), provides a systemic organ classification [6]. However, findings obtained with medical imaging modalities such as CT and MR usually concern a part of the body that is delimited simply by a plane perpendicular to the long axis of the body. We therefore created a Regional Organ Classification (hereinafter abbreviated as the “ROC”), adjusted for medical images and based on the TA. In its first chapter (General Anatomy), the TA describes the parts of the human body as the ‘Head,’ ‘Neck,’ ‘Trunk,’ ‘Upper Limb,’ and ‘Lower Limb.’ To reclassify these terms, we defined six major ROC categories: Head, Face–Neck, Chest, Abdomen, Extremity, and Other. Figure 1 illustrates the correspondence between the TA and the ROC, and the details are described in the Appendix. Within each of the six major ROC categories, we divided the minor categories of the ROC into five parts: organ, vessel, lymph node, membranous structure forming cavity, and bone & soft tissue. Table 1 provides the details of the ROC’s minor categories.

Fig. 1

Reassignment of the Terminologia Anatomica categories to the Regional Organ Classification (ROC). The Terminologia Anatomica (TA) developed by the Federative International Programme for Anatomical Terminology (FIPAT) describes parts of the human body as ‘Head,’ ‘Neck,’ ‘Trunk,’ ‘Upper Limb,’ and ‘Lower Limb’ in the General Anatomy chapter; we re-classified these parts into six major categories in the ROC: Head, Face–Neck, Chest, Abdomen, Extremity, and Other. The boundaries between the categories are the skull base; the manubrium, first rib, and C7/T1 disc; the diaphragm and T12/L1 disc; the shoulder joint; and the hip joint. A false-negative finding extending equally to multiple regions was assigned to the ROC category Other

Table 1 Major and minor categories of the Regional Organ Classification (ROC)
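The ROC assignment described above can be expressed as a small lookup-and-validate routine. This is a hypothetical encoding for illustration only: the category and part names follow the text, while the function name and input convention (a list of involved major regions plus one minor part) are our own assumptions.

```python
# Major and minor ROC labels as named in the study (names per the text;
# the encoding itself is illustrative).
ROC_MAJOR_CATEGORIES = ["Head", "Face-Neck", "Chest", "Abdomen", "Extremity", "Other"]
ROC_MINOR_PARTS = [
    "organ", "vessel", "lymph node",
    "membranous structure forming cavity", "bone & soft tissue",
]

def assign_roc(regions, part):
    """Assign a finding to a (major, minor) ROC label.

    `regions` lists the major regions the finding involves; per the
    study's rule, a finding extending equally to multiple regions is
    assigned to the major category 'Other'.
    """
    if part not in ROC_MINOR_PARTS:
        raise ValueError(f"unknown minor part: {part}")
    majors = set(regions)
    if not majors or not majors <= set(ROC_MAJOR_CATEGORIES):
        raise ValueError(f"unknown major category in: {regions}")
    major = regions[0] if len(majors) == 1 else "Other"
    return (major, part)
```

For example, a lung nodule would map to `("Chest", "organ")`, while an aortic lesion spanning the chest and abdomen equally would map to `("Other", "vessel")`.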

Lesion classification

We extracted the suspected diseases or imaging findings in these radiological reports and assigned them to seven categories: Localized Lesion, Vascular Lesion, Inflammatory Lesion, Traumatic Lesion, Organ Dysfunction, Degenerative Lesion, and Other. Each false-negative finding was assigned to a lesion classification primarily on the basis of the contents of the radiological report; when this was difficult to determine, the judgment was made with reference to the patient’s electronic medical records. Table 2 explains the lesion classifications.

Table 2 Lesion classification

Statistical analysis

Descriptive statistics were calculated, including the incidence of false-negative findings by examination, time period, modality, ROC category, and lesion classification.
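Because the analysis is purely descriptive, the tabulations underlying the Results tables amount to counting findings per category and dividing by the total. A minimal sketch, assuming each registered finding is a record with `modality`, `roc_major`, and `lesion_class` fields (field names are ours, chosen for illustration):

```python
from collections import Counter

def summarize(findings):
    """Tally false-negative findings by modality, major ROC category,
    and lesion classification, returning per-category counts together
    with their percentage share of the total."""
    total = len(findings)
    tables = {}
    for field in ("modality", "roc_major", "lesion_class"):
        counts = Counter(f[field] for f in findings)
        tables[field] = {
            key: (n, round(100 * n / total, 1))
            for key, n in counts.items()
        }
    return tables
```

Running this over the registered findings yields count-and-percentage tables of the kind reported as Tables 3–6.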

Results

Of a total of 135,251 examinations, we identified 940 reports (0.70%) containing 962 false-negative findings (0.71%; 21 reports contained two findings and a single report contained three findings), an average of 28 findings per month and 1.4 findings per work day (Table 3).

Table 3 The incidence of false-negative findings by examination and time period

Table 4 shows the number of false-negative findings by modality and major ROC category. Of the 962 false-negative findings, approximately three-quarters were on CT (74%) and about one-fifth were on MR (22%); the others were found on PET (4%), NM (1%), and DR (0.2%). Among the major ROC categories, roughly two-fifths each were Chest (40%) and Abdomen (37%), one-sixth were Head (17%), and the others were Face–Neck (4%), Extremity (1%), and Other (1%). The most frequent major ROC within each modality was the Chest for CT (37% of all findings) and the Head for MR (11%).

Table 4 Number of false-negative findings by modality and major ROCs

Table 5 shows the number of false-negative findings by minor ROC category: false-negative findings were most common in the Lung (27%), followed by the Liver (11%) and the Brain (9%).

Table 5 Number of false-negative findings by minor and major ROCs

Table 6 shows the number of false-negative findings by lesion classification and major ROC category. Approximately two-thirds were localized lesions (69%), about one-sixth were vascular lesions (18%), and less than 10% each were inflammatory lesions (9%), traumatic lesions (1%), organ dysfunction (1%), degenerative lesions (1%), and other (1%, n = 7). The most common combinations of lesion category and major ROC were chest localized lesions (32%), followed by head vascular lesions (11%) and chest inflammatory lesions (4%).

Table 6 Number of false-negative findings by lesion classification and major ROCs

The most frequent false-negative findings by combination of modality, major ROC category, and lesion classification were a localized lesion in the lung on CT (n = 210, 22%), followed by a localized lesion in the liver on CT (n = 62, 6%) and a vascular lesion in a head vessel on MR (n = 45, 5%).

Discussion

False-negative error is one of the most critical issues in diagnostic radiology. Earlier reports have referred to false-negative errors as perceptual errors, non-identification errors, missed diagnoses, omission errors, underreading errors, overlooking errors, or oversight errors [1,2,3,4,5, 7,8,9,10,11,12]. A false-negative error has also been called a “delayed diagnosis,” since it is identified later than the initial diagnosis. “Diagnostic discrepancy” or “diagnostic disagreement” is used as an indirect term, because the finding was deemed negative in the first reading and positive in a second reading. The description that a finding is “visible in retrospect” is used because many such findings are discovered by reviewing past imaging examinations and their radiological reports in light of the most recent imaging examination.

False-negative errors are unfortunately common in radiology practice, as in other clinical departments [2, 13,14,15]. Serious false-negative errors in medicine are unacceptable to the general public; legal penalties equivalent to those imposed for serious and unavoidable traffic accidents have even been proposed for radiologists’ false-negative errors [9]. False-negative errors should therefore be controlled, and potentially eliminated, by continuous monitoring and analysis, both to improve care and to avoid causing public distrust of medicine.

Lee proposed that methods for successful quality management in radiology must be reliable, robust, consistent, and easy to follow [10]. However, most patients undergoing imaging examinations do not have a pathological diagnosis or genetic confirmation, and it is thus difficult to extract definitive false-negative findings. Methods that have been suggested include auditing by the random sampling of a small number of cases [10], validation in autopsy cases [16], radiology discrepancy meetings [5, 17, 18], double reading [19, 20], and error registration systems [11, 21,22,23]. While these methods are effective in that they enable medical personnel to identify errors more reliably and also serve as error education, the errors they capture are likely not representative of those occurring in hospitals on a daily basis [4]. In addition, these methods require extra actions beyond normal reporting work, e.g., introducing a new safety system or performing case sampling for auditing.

Diaz et al. analyzed the frequency of diagnostic errors in radiological reports by extracting reports with a history of revision [24]. However, revisions of radiological reports may be made for a variety of reasons, including insignificant ones, and the revised words and phrases may be scattered throughout the report, making them difficult to analyze. Brigham et al. proposed another report-focused extraction method, in which radiologists’ self-reported errors are identified by searching report addenda [4]; however, this method cannot capture errors detected by colleagues. We adopted a keyword search related to “findings visible in retrospect,” which can extract false-negative findings pointed out by self-assessment as well as by other radiologists. Applying the same keyword search to the content of the reports also made it easy to identify the details of the false-negative findings. Performing this analysis for daily or monthly quality control can nevertheless be cumbersome and time-consuming, and we are currently refining the method so that it can be computationally automated.

Our present analyses revealed that the false-negative detection rate during the study period at our hospital was 0.71%, which is lower than the 0.8–5% rate of diagnostic imaging errors in previous studies. This could be due mainly to the possibility that not all findings that are visible in retrospect are necessarily described in the radiological reports. Moreover, physicians are often reluctant to document their colleagues’ mistakes on the record [10]. It would thus be far better to have a common consensus on how to describe false-negative findings. Berlin advised that a report of a misdiagnosis should be succinct, matter-of-fact, and nonjudgmental [8]: a simple statement such as “In retrospect, the lesion was present on the radiograph taken January 4, 1993” is sufficient, whereas words such as “missed,” “error,” and “mistake” and phrases such as “should have been diagnosed” and “was obviously present but not seen” should be avoided [8]. Our low false-negative detection rate might also be affected by the frequency of DR false-negative findings, which was lower than that in previous articles [1, 5]. We read DRs from selected departments only; this reflects a trend in Japanese medical practice to cut back on DR readings in order to read the numerous CT and MR examinations [25, 26].

Conventionally, the anatomical classification used in studies of diagnostic imaging errors has been performed empirically or ad hoc and has not been unified [24]. Anatomical classification is important because organs such as the lung [27], liver [28], and brain [29] have specific regional characteristics associated with oversights. The imaging anatomical classification should therefore be unified and made consistent with the terminology of human anatomy. Our investigation is the first attempt to reorganize and map the anatomical terminology defined by the FIPAT onto a new ROC that matches medical imaging.

Most of the organs listed by the FIPAT could be assigned to the six ROC categories: Head, Face–Neck, Chest, Abdomen, Extremity, and Other. However, boundary organs such as the internal, external, and common carotid arteries, the trachea, and the esophagus had to be divided. On the other hand, ‘Pelvis’ in the TA was included in the ROC category Abdomen. This integration allowed us to avoid vague divisions of the small and large intestines, abdominal vessels, and ureter, which are borderline organs between the upper abdomen and pelvis. A problem remains, however: abdominal MR findings cannot be classified precisely, since imaging examinations are usually performed separately for the upper abdomen and pelvis.

The lesion classification of false-negative findings in our study was designed to be completed primarily within the content of the radiological report, so that cases with few definitive diagnoses can be monitored without undue burden. The classification comprises seven categories: Localized, Vascular, Inflammatory, Traumatic, and Degenerative lesions, Organ dysfunction, and Other. These categories may not always be accurate or distinct, but they do not require an excessively deep search for the final diagnosis. Nevertheless, a further reorganization of this classification might be required if new categories that do not belong in the Other category are identified.

The most frequent false-negative findings were localized lesions in the lung found on CT, at 22% of all the false-negative findings detected in this study. This high frequency suggests that lung lesions requiring differentiation from malignant tumors deserve particular attention; if this situation can be ameliorated, 22% of the false-negative results could be prevented. Until very recently, the practice of diagnostic radiology was solely a human effort, and it was thought that the elimination of false-negative findings could not be achieved [30]. However, artificial intelligence-based computer-assisted diagnosis (AI-CAD) has brought sensational developments to radiology [31,32,33,34].

AI-CAD is computer software that learns from labeled image data and outputs the optimal diagnosis. Although it will take a long time for AI-CAD to catch up with actual human perception [35], some AI-CADs in limited domains are approaching human imaging-based diagnostic ability. In addition, since various types of AI-CAD will be introduced in the future, it is necessary to develop a methodology that enables continuous accuracy analysis by a unified method; our method for monitoring false-negative findings could provide an easy way to verify the performance of AI-CAD in the near future. Our analyses identified frequent false-negative findings in localized lung lesions on CT (22%), followed by localized lesions of the liver on CT (6%) and vascular cerebrovascular lesions on MR (5%). Careful management of these lesions could be the most effective way to reduce missed diagnoses, and these results suggest a direction for the development of AI-CAD.

Our study has some limitations. It is entirely possible that some false-negative findings were not included in the radiological reports. However, we observed that the more serious a previously missed finding was in the most recent examination, the more likely it was to be mentioned as a finding “visible in retrospect” in the current report, because serious findings are usually investigated by multiple physicians. We did not examine whether changes in the number of radiologists during the study period contributed to differences in the number of false-negative findings; the number of radiologists can influence two factors, the occurrence of false-negative findings and their retrospective detection, which are difficult to analyze simultaneously. Our analysis might not have detected false-negative findings in one-time-only examinations performed at our hospital, owing to the lack of observable temporal changes. Our study may also have failed to detect false-negative findings that would have been detected if multiple radiologists had made careful observations, and there could be findings that one radiologist believes to be a false negative while another believes them to be a true negative. Finally, the “visible in retrospect”-related keywords used in our search may not have identified all false-negative findings. Because it can be completed within the radiological report, however, our method is easy to use as a surrogate monitoring system for false-negative findings, even if it is not perfect.

Conclusions

Our analysis revealed regional and lesion characteristics of false-negative findings throughout the whole body, across a wide variety of imaging modalities. Localized lung lesions missed on CT, which accounted for about one-fifth of the false-negative findings, were the most common false-negative finding.