Introduction

With the growing significance of pathological diagnosis in recent years, semi-personalized medicine, based on histopathological diagnosis and molecular profiling, has become a basic strategy for treating cancer and other intractable diseases.

As pathological diagnosis is the gold standard for cancer diagnosis, pathologists are expected to possess a wide range of knowledge and constantly be up to date with clinical advancements. Accurate diagnosis in every specialty area has become an increasingly difficult task for individual pathologists. Therefore, many institutions have run trials to create a system of double-checking by experts, which has shown improved diagnostic accuracy [1].

However, not all hospitals have the capacity to build an in-house framework encompassing all areas of expertise; therefore, cases are sent out for external expert opinions. This relatively time-consuming process requires sectioning and staining by lab technicians, creating a clinical information summary, obtaining macroscopic and radiological imaging data, writing a consultation letter, and mailing glass slides. Since the whole process can take at least 1 week, such external consultations are commonly used only for a limited number of cases, a proportion estimated at 0.35–0.56% in a retrospective study by Cook et al. [2].

Digital pathology is a rapidly growing field internationally [3]. A whole-slide image (WSI), produced by scanning a glass slide, is a core element of digital pathology. The digital transition requires an advanced laboratory infrastructure and supports multiple tasks in pathology, from education and research to clinical uses, such as primary diagnosis, intraoperative consultation, telecytology, and multidisciplinary team discussions [4,5,6]. With recent advancements in digital pathology [5], it has become easier to obtain external consultations by avoiding cumbersome steps [3, 7]. In addition to the validation of primary diagnosis by WSI [8,9,10], multiple studies comparing the accuracy of consultations using WSI with those using glass slides have shown high agreement rates, suggesting potential widespread use of remote consultation in the future [8, 11,12,13]. This trend has been accelerated by the COVID-19 pandemic, during which digital pathology has proved to be key to maintaining clinical and academic activities in pathology departments [14, 15].

Currently, in some European and Asian (e.g., Japan) countries, consultations are not included in medical insurance coverage and are performed without compensation. Any related costs, such as additional staining, are covered by the consultant’s institute. This is because of the lack of data demonstrating how much external consultations improve diagnostic accuracy [16,17,18,19]. There is an urgent need for scientific evidence that consultation based on WSI improves diagnostic accuracy, which may promote further regulation amendments on a broad scale.

This study aimed to elucidate whether external WSI-based consultations of diagnostically difficult cases improve the accuracy of histopathological diagnosis.

Materials and methods

A two-step study was designed, wherein the first step investigated the number of inconclusive diagnostically difficult cases in participating institutions, and the second examined the degree of improvement in diagnostic accuracy after sending them out for expert consultations. The research protocol was approved by the ethical committee of Nagasaki University (#17,051,513).

First step: investigating the frequency of diagnostically difficult cases

Although external consultations are sought for cases that are difficult to diagnose, there is no clear definition of what constitutes a diagnostically difficult case in histopathological practice. In this study, diagnostically difficult cases were defined as cases with an inconclusive diagnosis, usually indicated by words such as “suspicious,” “uncertain,” “probable,” “suggestive,” and “inconclusive” in the diagnostic line [20]. The frequency of such cases was investigated at two independent institutions: Nagasaki University Hospital (academic facility with 800 beds) and Awaji Medical Center (community hospital with 450 beds). We set two study periods, 2010 (January to December) and 2013 (January to December), during which different pathologists were responsible for diagnosis. All histopathological cases from the two study periods were screened in the laboratory information system to identify diagnostically difficult cases as per the above definition.
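The keyword-based case definition above can be illustrated with a minimal sketch. The source does not state whether the screening was automated or manual, so the code below is only an assumption of how such a screen might be written; the function names and the example diagnostic lines are hypothetical.

```python
import re

# Hedge terms quoted in the case definition; the real screen may have used more.
HEDGE_TERMS = ("suspicious", "uncertain", "probable", "suggestive", "inconclusive")

# Whole-word, case-insensitive match (so "probably" would NOT be flagged).
HEDGE_PATTERN = re.compile(r"\b(" + "|".join(HEDGE_TERMS) + r")\b", re.IGNORECASE)


def is_inconclusive(diagnostic_line: str) -> bool:
    """Return True if the diagnostic line contains one of the hedge terms."""
    return HEDGE_PATTERN.search(diagnostic_line) is not None


def inconclusive_rate(diagnostic_lines) -> float:
    """Fraction of diagnostic lines flagged as inconclusive (0.0 if empty)."""
    lines = list(diagnostic_lines)
    if not lines:
        return 0.0
    flagged = sum(is_inconclusive(line) for line in lines)
    return flagged / len(lines)


# Hypothetical diagnostic lines, for illustration only.
example_lines = [
    "Suspicious for adenocarcinoma",
    "Invasive ductal carcinoma",
    "Tubular adenoma, low grade",
    "Chronic gastritis",
]
example_rate = inconclusive_rate(example_lines)
```

A plain keyword match of this kind would still need manual review, since hedge words can also appear in contexts that do not make the overall diagnosis inconclusive.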

Second step: expert consultation and evaluation of diagnostic agreement

A total of 30 pathologists, including 24 expert pathologists and six senior consultant multi-expertise pathologists from academic institutions, were involved in this study. A subspecialty expert was defined as a board-certified pathologist with more than 10 years of expertise in the field and a core member of the national subspecialty society or working group under the Japanese Society of Pathology. A senior consultant was defined as a board-certified pathologist with more than 25 years of expertise, practicing in the academic setting, and regularly performing sign-out of general pathology cases from various specialties.

Among the 1018 cases extracted during step 1, 30 were randomly selected from each of the following nine subspecialties: hematopathology, dermatopathology, gastrointestinal, hepato-pancreatico-biliary, genitourinary, gynecological, breast, head and neck, and bone and soft tissue pathology (270 total cases). An initial quality control check was performed to exclude cases suboptimal for evaluation (e.g., faded stain) and to replace them with another diagnostically difficult case from the same subspecialty. Data on the patient’s age, sex, sampled organ, clinical diagnostic data, and original pathological diagnosis were collected. After anonymization, all the archival slides belonging to the cases, including immunostains, were given new labels and converted to WSI by scanning at 40× with a Philips Ultrafast Scanner® (Philips, Amsterdam, Netherlands). Scanned images were stored on a secure, firewall-protected, demilitarized zone server (DMZ; Philips) and assigned to subspecialty experts. Each expert could access and view their WSIs via the Philips IMS Viewer using their user ID and password.
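The random selection of 30 cases per subspecialty can be sketched as a simple stratified draw. The source does not describe the randomization procedure, so the function name, the input layout, and the fixed seed below are all assumptions made for illustration.

```python
import random


def sample_per_subspecialty(cases, per_group=30, seed=0):
    """Draw up to `per_group` case IDs at random from each subspecialty.

    `cases` is an iterable of (case_id, subspecialty) pairs (hypothetical
    layout). A fixed seed is used only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    groups = {}
    for case_id, subspecialty in cases:
        groups.setdefault(subspecialty, []).append(case_id)
    # Sample without replacement; small groups are returned in full.
    return {
        name: rng.sample(ids, min(per_group, len(ids)))
        for name, ids in groups.items()
    }


# Hypothetical pool: 50 breast cases and 20 dermatopathology cases.
pool = [(f"B{i}", "breast") for i in range(50)]
pool += [(f"D{i}", "dermatopathology") for i in range(20)]
selected = sample_per_subspecialty(pool)
```

Cases rejected at quality control would, under the same assumption, be replaced by drawing again from the unselected remainder of the same subspecialty.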

The consultation process consisted of one to three rounds, depending on the decisions made at each level, as shown in the study flowchart (Fig. 1). Twelve experts reviewed 270 cases to classify them into four categories, as per our previous nationwide study [10]. These categories were as follows: (1) agree with the initial inconclusive diagnosis, (2) agree, but change the diagnosis to definite by removing uncertain vocabulary from the diagnostic line, (3) change the diagnosis to require different treatments (defined as “major discrepancy”), and (4) change the diagnosis, but the treatment approach is not affected (defined as “minor discrepancy”).

Fig. 1 Flowchart of the study

Cases in the major and minor discrepancy groups were sent to a second round of subspecialty experts, who reported whether they agreed with the original or the expert diagnosis. If the second expert gave a diagnosis discordant with the first expert, by either agreeing with the original inconclusive diagnosis or disagreeing with both the original and first expert diagnoses, the case was further assigned to six senior consultant pathologists. These third-round experts were asked to select whether they agreed with the first expert, the second expert, or neither expert. The consensus of the six senior consultants was obtained by a majority vote, and it was recorded as the final diagnosis (Fig. 1).
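The three-round routing described above can be summarized in a short sketch. The labels and function names are hypothetical, and the source does not define "majority vote" precisely; the code assumes a strict majority (more than half of the votes), with anything less recorded as no consensus.

```python
from collections import Counter

# Round 1 categories that trigger a second opinion.
DISCREPANT = {"major", "minor"}


def senior_consensus(votes):
    """Return the option backed by a strict majority, else 'no_consensus'.

    Assumption: 'majority' means more than half of the votes cast.
    """
    option, count = Counter(votes).most_common(1)[0]
    return option if count > len(votes) / 2 else "no_consensus"


def resolve_case(round1_category, round2_choice=None, senior_votes=None):
    """Follow one case through the consultation rounds.

    round1_category: 'agree', 'definite', 'minor', or 'major'
    round2_choice:   'first_expert', 'original', or 'new_diagnosis'
    senior_votes:    list of six votes, needed only if round 3 is reached
    """
    if round1_category not in DISCREPANT:
        return "closed_after_round1"
    if round2_choice == "first_expert":
        return "first_expert"  # second expert confirms the first
    # Second expert sided with the original or proposed a new diagnosis.
    if not senior_votes:
        raise ValueError("round 3 requires senior consultant votes")
    return senior_consensus(senior_votes)
```

For example, a case with a major discrepancy in round 1, a second expert siding with the original diagnosis, and a 4-to-2 senior vote for the first expert would resolve to the first expert's diagnosis.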

The percentages of diagnostic change were calculated, and their confidence intervals were computed using the Wilson score interval. Statistical analyses were performed using JMP, version 13.0.0 (SAS Institute, Cary, NC, USA).
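For readers unfamiliar with the Wilson score interval, a minimal sketch using only the Python standard library follows. The study itself used JMP; this function is an independent illustration, and the 95% confidence level is an assumption, since the source does not state the level used.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion.

    Returns (lower, upper) as proportions between 0 and 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Example: 140 changed diagnoses out of 266 cases (figures from this study).
lower, upper = wilson_interval(140, 266)
```

Unlike the simple normal approximation, the Wilson interval stays within 0 and 1 and behaves well for proportions near the extremes, which matters for small subgroups such as the 31 round 3 cases.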

Results

The rate of diagnostically difficult cases

Table 1 shows the distribution of inconclusive diagnoses at Nagasaki University Hospital and Awaji Medical Center in 2010 and 2013. The rate of inconclusive diagnosis at Nagasaki University Hospital was 5.7% in 2010 and 3.1% in 2013; the rate decreased considerably after a default double-check system was implemented in 2013. At Awaji Medical Center, the inconclusive diagnosis rate was 1.2% and 5.4% in 2010 and 2013, respectively.

Table 1 Prevalence of cases with inconclusive diagnoses

Consultation by subspecialty experts (rounds 1 and 2)

Among the initially selected 270 diagnostically difficult cases, four were excluded because of missing immunostained slides.

Out of the total 266 inconclusive diagnoses, the original diagnosis was not changed in 116 (44%) cases, while 150 (56%) cases received a change of diagnosis by the first expert (Fig. 2A). Among these 150 cases, 80 (53%) were changed from an inconclusive to a definite diagnosis. The remaining 70 (47%) cases showed diagnostic discrepancies, wherein 38 cases showed minor diagnostic discrepancies, accounting for 14% of the total and 25.3% of corrected cases. Major discrepancies (n = 32) accounted for 12% of the total and 21.3% of the corrected cases.
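The percentages in the paragraph above use two different denominators (all 266 cases and the 150 changed cases). The short sketch below recomputes them from the reported counts; the variable names are illustrative only.

```python
TOTAL_CASES = 266

# Round 1 outcome counts as reported in the text.
round1 = {"agree": 116, "definite": 80, "minor": 38, "major": 32}


def pct(part, whole, digits=0):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


changed = TOTAL_CASES - round1["agree"]          # cases with any change
discrepant = round1["minor"] + round1["major"]   # sent on to round 2

summary = {
    "unchanged_of_total": pct(round1["agree"], TOTAL_CASES),
    "changed_of_total": pct(changed, TOTAL_CASES),
    "definite_of_changed": pct(round1["definite"], changed),
    "discrepant_of_changed": pct(discrepant, changed),
    "minor_of_total": pct(round1["minor"], TOTAL_CASES),
    "minor_of_changed": pct(round1["minor"], changed, 1),
    "major_of_total": pct(round1["major"], TOTAL_CASES),
    "major_of_changed": pct(round1["major"], changed, 1),
}
```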

Fig. 2 Results of the first expert consultation. A Diagnosis after the first round of expert consultation. Cases with diagnostic discrepancy (enclosed in red) were further sent for a second consultation. B Change of diagnosis by subspecialty. a, agreement with the original diagnosis; b, changed to definite concordant diagnosis; c, minor discrepancy; d, major discrepancy. Y-axis indicates the number of cases. ENT: ear, nose, and throat (i.e., head and neck); Heme: hematopathology; GYN: gynecological; HPB: hepato-pancreatico-biliary; soft/bone: soft tissue and bone; GI: gastrointestinal; GU: genitourinary

Certain trends among the nine subspecialties were noted after the first consultation (Fig. 2B). The highest rate of diagnostic change from inconclusive to definite was found in breast pathology, with 14 cases (47%), followed by the head and neck and genitourinary subspecialties with 12 cases (40%) each. The occurrence of major discrepancy was highest in the head and neck area with 8 cases (27%), followed by 7 dermatopathology cases (23%) and 6 bone and soft tissue cases (21%). The highest rate of minor discrepancy was found in hematopathology with 10 cases (34%), followed by head and neck and gynecological pathology with 6 cases (20%) each. After combining major and minor discrepancies, the disagreement rate remained the highest in the head and neck group (47%, 14 cases), followed by dermatopathology (40%, 12 cases), hepato-pancreatico-biliary (28%, 8 cases), and gynecological pathology (27%, 8 cases). The disagreement rate was lowest in genitourinary pathology (7%, 2 cases), followed by the gastrointestinal subspecialty (13%, 4 cases).

Cases labeled as major and minor discrepancies (26% of total cases) were further sent to 12 experts for consultation. Second-round experts agreed with the first consultation diagnosis in 39 cases (56%) and with the original diagnosis in 21 cases (30%) (Fig. 3). In the remaining 10 cases (14%), the second-round experts disagreed with both the original and the first consultation diagnoses and provided new diagnoses.

Fig. 3 Results of the second expert consultation. Diagnostic agreement on cases with major and minor discrepancies detected in the first round of consultation. The pie chart corresponds to the diagnostically discrepant cases shown as an area enclosed in red in Fig. 2A

Consultation by senior multi-expertise pathologists (round 3)

Six senior consultant pathologists reviewed the 31 cases in which the first and second consultation results were discordant (Fig. 4, Table 2; WSIs accessible via https://t.ly/QlWu). The senior consultants agreed with the first consultation diagnosis in 19 cases (61%), the original diagnosis in eight cases (26%), and the second consultation diagnosis in two cases (6%); no consensus was reached in two cases. Only three cases (9.7%) showed complete agreement among all six senior pathologists, reaffirming that these cases were indeed difficult to diagnose.

Fig. 4 The result of round 3 consultation on the cases discordant between the previous two rounds. Diagnosis of all 31 cases by six senior pathologists. a, agreement with the original diagnosis; b, agreement with the first consultant; c, agreement with the second consultant; d, different diagnosis; e, no consensus

Table 2 Summary of 31 cases submitted to round 3 of consultation (WSIs accessible via https://t.ly/QlWu)

The analysis of these diagnoses revealed that the highest rate of disagreement (14/31, 45.2%) occurred due to diverse interpretations among pathologists. Seven of these cases were borderline cases, in which it was difficult to differentiate between two diagnoses, for example, atypical vs. malignant cells or grade 1 vs. grade 2 neuroendocrine tumors. Other reasons for disagreements included requirement of multidisciplinary team discussion (6/31, 19.4%), insufficient immunohistochemical workup (4/31, 12.9%), lack of diagnostically significant clinical information (3/31, 9.7%), sampling error (3/31, 9.7%), and a morphologically unusual case (1/31, 3.2%).

Combining the second- and third-round consultant agreements, of the 70 cases with diagnostic discrepancies, 58 cases (83%) showed agreement with the first consultant diagnosis and two cases (3%) with the second (Fig. 5). Eight cases (11%) showed agreement with the original diagnosis, and two cases had no consensus diagnosis. Overall, 60 of the 70 cases were confirmed as having major or minor diagnostic discrepancies.

Fig. 5 The final diagnosis in 70 discordant cases identified in the first consultation round. The pie chart corresponds to the diagnostically discrepant cases shown as an area enclosed in red in Fig. 2A

Summary of all consultation rounds

After consultation rounds 1–3, the original inconclusive diagnosis was changed in 140 of the 266 cases, accounting for 52% of the total diagnostically difficult cases. Among these, 80 cases (30%) were changed from an inconclusive to a definite diagnosis, and 60 cases (22%) were changed to a diagnosis with a minor (32 cases, 12%) or major (28 cases, 10%) discrepancy from the original diagnosis (Fig. 6). There were two cases (1%) with no consensus diagnosis.
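The counts reported across the three rounds can be reconciled with a few lines of arithmetic. The sketch below assumes that the 60 confirmed discrepancies consist of the cases resolved in favor of the first expert (in round 2 or round 3) plus those resolved in favor of the second expert, which is how the totals in the text appear to add up; the variable names are illustrative.

```python
# Round 1 (266 evaluable cases).
agree_round1 = 116
definite_round1 = 80
discrepant_round1 = 70

# Round 2 (70 discrepant cases).
second_agrees_first = 39
sent_to_round3 = discrepant_round1 - second_agrees_first

# Round 3 (senior consultant consensus).
round3 = {"first_expert": 19, "original": 8, "second_expert": 2, "no_consensus": 2}

# Final tallies after all rounds.
confirmed_discrepant = (
    second_agrees_first + round3["first_expert"] + round3["second_expert"]
)
changed = definite_round1 + confirmed_discrepant
unchanged = agree_round1 + round3["original"]
total = unchanged + changed + round3["no_consensus"]
```

Under this reading, the final tallies (unchanged, changed, and no consensus) sum back to the 266 evaluable cases.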

Fig. 6 Summary of all three rounds of expert consultation

A survey on the utility of digital pathology for external consultation

After completing the consultation rounds, all the experts were invited to participate in an online survey using Google Forms (Google Inc, Mountain View, CA), and 25 respondents provided feedback.

The quality of WSIs used in the study was rated as sufficient or nearly sufficient for making a diagnosis by the vast majority of experts (90%), irrespective of subspecialty. Furthermore, about two-thirds of the pathologists reported that they did not experience problems while reviewing WSIs. The remaining experts noted certain issues related to digital images, such as difficulty recognizing fine details at high magnification, inability to perform precise focusing, and technical difficulties related to slow image loading or out-of-focus areas.

Regarding the potential impact of telepathology on consultation practice, there was broad agreement that the digital approach may decrease the cost of the whole consultation process (96%), shorten turnaround time (100%), and, ultimately, improve the quality of the final diagnosis (96%). All respondents (100%) agreed that, provided a WSI scanner is available in the laboratory, these benefits would increase the number of consultations. Most of the respondents indicated the convenience of the digital mode for sending and receiving extradepartmental consults. Interestingly, while only 60% of experts regularly use digital pathology, there was no significant difference in scores between regular users and non-users. Detailed answers are provided in the supplement.

Discussion

This study evaluated the impact of WSI-based expert consultation on inconclusive histopathological diagnoses and showed substantial improvement in diagnosis after remote second opinion.

In our series, approximately 5% of the total histopathological cases were considered diagnostically difficult, requiring expert consultation. In 2002, a large-scale survey of consultation cases by the College of American Pathologists (CAP) reported an extradepartmental consultation rate of 0.5% of all cases [21]; this lower percentage can be explained by the differences in methodology and definitions compared to our study. In addition, the CAP survey recorded cases that were sent out for consultation as glass slides and/or paraffin blocks, which is the mainstream approach but requires far more complex logistics than digital consultations. Moreover, our study was specifically designed to evaluate the rate of cases with inconclusive diagnoses, and not all of them would be sent out for extradepartmental consultation in a real-world scenario. Most institutions (95%) that participated in the CAP survey were from the USA [21], which has the largest number of pathologists (over 20,000) in the world [22], providing opportunities for easier access to subspecialty experts within the same institution. In contrast, our study results would be more applicable to mid-range laboratories in countries with a relatively smaller pathologist workforce.

Interestingly, CAP predicted that consultations would grow by about 29% by 2030 compared to 2010 [23]. More recently, several studies have confirmed the feasibility of WSI-based diagnosis [5, 10, 11, 24,25,26]; therefore, with the convenience and low cost of WSI-based consultation, the potential for expert consultation may increase in tandem with growing digital infrastructure [21]. Our team has reported the successful establishment of a multi-institutional digital pathology network with a broad range of subspecialty experts readily available for a second opinion via an online telecommunication system [27]. Our experience showed that in-house digital pathology integrated with the laboratory information system and electronic medical records, and further automated with a cloud-based consultation portal, allows for efficient expert consultation [27, 28]. A well-established WSI-based workflow may overcome most of the difficulties associated with the traditional glass-based second opinion. This concept was further reinforced by the survey conducted among the 25 experts who participated in our study, who shared the view that the digital mode will decrease the cost of consultation, shorten turnaround time, and improve diagnostic quality.

The difference in the percentage of extradepartmental consultations between the academic facility and the community hospital in our study was small. While the number of inconclusive diagnoses at Awaji Medical Center increased over time, Nagasaki University Hospital showed a decrease in inconclusive diagnoses. This could be explained by the implementation of a double-check system, also known as double reporting, in which a case is reviewed by more than one pathologist from the same department before it is reported [29].

Complete agreement after a consultation was found in less than half (47%) of diagnostically difficult cases, while 22% of the diagnoses were changed by consultants, resulting in major or minor discrepancies (Fig. 6). These numbers are in agreement with those reported on glass slides in the CAP survey, which encompassed 2,746 consultation cases, wherein the rate of complete agreement and change of original diagnosis were 54.6% and 27.7%, respectively [21]. Another important finding in our series was the high rate (30%) of diagnosis change from inconclusive to definite after consultation.

This study reports the first set of objective data indicating the degree of improvement in diagnostic accuracy after remote expert consultations using WSI. Consultations are not recognized as medical practices in Japan; therefore, there are no medical fees for expert consultations, and these services are often provided pro bono. Currently, only a limited number of cases undergo such consultations because of the complex process and lack of additional compensation. As the framework for outside consultation is being created by utilizing digital media such as WSI, it is our hope that expert consultations will soon be established as the default, which may potentially impact up to 5% of routine cases that fall into the “difficult to diagnose” category. For example, projecting from our study, a mid-volume histopathology laboratory with an annual load of 20,000 cases and without a default double-check system is estimated to have up to 1000 cases with an inconclusive diagnosis. Of these, approximately 300 cases could receive a definite diagnosis by expert consultation (Fig. 6). Furthermore, an additional 220 cases with discrepant diagnoses could be corrected, which would change the course of treatment in approximately 100 cases (10.5% of the inconclusive cases), a substantial clinical impact. These estimates increase proportionally with a higher caseload.
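The projection above is simple proportional arithmetic, and it can be reproduced for other caseloads with a short sketch. The rates are those quoted in the text (5% inconclusive, 30% changed to definite, 22% discrepant, 10.5% major discrepancy); the function name and the assumption that these rates transfer unchanged to another laboratory are illustrative only.

```python
def project_consultation_impact(
    annual_cases,
    inconclusive_rate=0.05,   # share of all cases with inconclusive diagnosis
    definite_rate=0.30,       # share of inconclusive cases made definite
    discrepant_rate=0.22,     # share with minor or major discrepancy
    major_rate=0.105,         # share with major (treatment-changing) discrepancy
):
    """Project the yearly effect of default expert consultation.

    All rates other than `inconclusive_rate` are fractions of the
    inconclusive cases, as in the study summary.
    """
    inconclusive = annual_cases * inconclusive_rate
    return {
        "inconclusive": round(inconclusive),
        "made_definite": round(inconclusive * definite_rate),
        "discrepant_corrected": round(inconclusive * discrepant_rate),
        "treatment_changed": round(inconclusive * major_rate),
    }


# The mid-volume laboratory used as the example in the text.
projection = project_consultation_impact(20_000)
```

With the rates from this study, a 20,000-case laboratory yields 1000 inconclusive cases, of which 300 become definite, 220 are corrected, and 105 involve a change in treatment, in line with the approximate figures quoted above.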

In a study of diagnostic discrepancies between glass slides and WSI in challenging consultation cases, Bauer et al. (2013) found major and minor discrepancies in 0.9% and 3.7% of cases, respectively [11]. A similar validation study of 1,070 diagnoses with WSI by the current authors found a major discrepancy rate of 0.9% and a minor discrepancy rate of 3.5% [10]. The lower rate of diagnostic discrepancy in these two studies, compared to that of the present study, can be explained by the difference in scope and design. For instance, the main objective of the above two studies was to evaluate intra-observer variability between glass slides and WSI in unselected series. In contrast, the present study evaluated inter-observer variability specifically in challenging cases, using remote consultation by expert pathologists.

In previous studies, major diagnostic discrepancies between primary and second opinion diagnoses ranged from 2.2 to 2.3% [30, 31]. These studies were performed in consecutive cases referred from other hospitals, which could explain the low rate of major discrepancy. In contrast, our study deliberately collected diagnostically difficult cases to measure the effect of expert consultation and showed a significant improvement in diagnostic accuracy. Interestingly, in the breast pathology field, a high rate of major discrepancy (11.4%) was reported [32], which is similar to our result. The levels of disagreement may differ by organ and disease category, such as neoplastic vs. non-neoplastic. For instance, Strosberg et al. (2018) reported that among malignancies, neuropathology (10.9% major and 41.3% minor discrepancies) and genitourinary (2.0% major and 30.7% minor discrepancies) cases had the highest number of disagreements [31]. However, in our cohort, dermatopathology and head and neck pathology showed the highest rates of disagreement. The proportion of non-neoplastic and neoplastic cases in our study was 27% and 73%, respectively.

This study had some limitations. The number of participating laboratories was limited; therefore, a smaller number of cases were evaluated for consultation compared to nationwide surveys [21]. In addition, there were no pulmonary/thoracic cases among those sent for consultation. The reason is that both hospitals that participated in the study are tertiary respiratory centers well equipped with expert pulmonary pathologists. Therefore, all the challenging lung cases during the study period were resolved on site, either by consensus of in-house pulmonary pathology experts or with additional input from multidisciplinary discussion. Another minor reason that precluded us from enrolling lung cases is that the international guidelines allow the diagnostic terminology of “indeterminate” and “probable” for some entities among interstitial lung diseases, which could create a bias considering our definition of diagnostically difficult cases. From the technical point of view, we evaluated only one scanner model at a single default magnification (40×). It is also important to note that, given the comprehensive design of the study, we did not aim to draw direct comparisons between WSI and glass slide evaluation.

As for the strengths of the study, this is the first investigation to clarify diagnostic improvement by remote digital consultation in collaboration with several experts representing different subspecialties. Additionally, the use of two levels of expert pathologist consultation and a third-level consultation with multi-expertise senior pathologists was instrumental in making the final diagnosis highly accurate.

With wide acceptance of remote consultation and the establishment of medical fees, it would be possible to create an environment for pathologists to make diagnoses without barriers such as compensation, timing, and location. Fully digitized, remote consultation diagnostic centers can benefit from this approach in the near future. In conclusion, we show that remote expert consultation using WSI significantly improves the pathological diagnosis of difficult cases.