Introduction

Colorectal adenocarcinoma (CRC) is the second most common cancer in women and the third most common in men with an estimated 1,849,418 new cases worldwide in 2018 [1]. More colorectal cancers are now diagnosed at an early stage thanks to advances in screening, and digestive endoscopy techniques allow an increasing number of early-stage cancers to be removed. Endoscopic treatment is also less invasive and has lower morbidity compared with traditional surgery [2].

Lymph node metastases are found in 3.6 to 16.2% of patients with superficial pT1 colorectal cancers (sCCR), which determines whether endoscopic treatment alone is sufficient or additional surgery with lymph node dissection is needed [3]. According to current international guidelines, including those of the Japanese Society for Cancer of the Colon and Rectum (JSCCR), incomplete resection, significant budding (grade 2 or 3), venous and/or lymphatic invasion, poorly differentiated adenocarcinoma and submucosal invasion (SMI) deeper than 1000 μm are indications for surgery [4,5,6]. Whether forthcoming European guidelines will endorse the same indications or propose a different SMI threshold of 2000 μm remains uncertain. Indeed, some studies suggest that, in the absence of other indications, SMI deeper than 1000 μm alone is not associated with a higher risk of lymph node metastases or poorer survival [7,8,9,10,11,12]. These parameters are widely accepted, but they suffer from variable interobserver agreement [13,14,15,16,17,18,19,20]. Moreover, there is some debate in the literature about how best to measure SMI depth and which threshold, ranging from 1000 to 3000 μm, best predicts the risk of lymph node metastasis [4, 7, 9, 21, 22]. Three quantitative methods have been proposed since the turn of the century, and the corresponding measurement differences can affect patient management [4, 7, 9]. Interobserver agreement has only ever been assessed for the Ueno method, and interobserver and intermethod variability, with or without immunohistochemistry (IHC) and/or digitized slides, is an important concern [13,14,15]. The use of IHC, whether to measure infiltration or to assess budding, is not yet well established, and while digital pathology is increasingly used for diagnosis, its place in the evaluation of these criteria has not been studied.

The aims of this study were therefore to evaluate (i) the reproducibility of the histoprognostic factors used to guide patient management after endoscopic resection of superficial colorectal cancer and (ii) the contributions of IHC and digital pathology to the evaluation of these criteria, and the impact of these techniques on indications for additional surgery under current international guidelines and forthcoming European recommendations.

Methods

Patients

All patients who had a pT1 sCCR treated by endoscopic resection between 01/01/2010 and 31/12/2019 in the study centre (department of gastroenterology, Edouard Herriot Hospital, Lyon, France) were included. Patients were identified exhaustively by cross-referencing the database of the sample management software (Diamic, Dedalus C&G) from our pathology department with patient lists from the multidisciplinary gastrointestinal tumour board (MDT). The exclusion criteria were insufficient material for immunohistochemical study, no visible infiltrating cells left on immunohistochemical slides and reclassification of the tumour to a stage higher than pT1 on examination of the additional surgical specimen. Clinical data on follow-up, overall survival, metastasis-free survival and recurrence were collected from the patients' medical records.

Endoscopic data

The endoscopic data considered were the location of the tumour, its size and the type of resection (endoscopic submucosal dissection (ESD), endoscopic mucosal resection (EMR) or endoscopic piecemeal mucosal resection (EPMR)), as recorded in the patients' endoscopy reports.

Sample processing and immunohistochemistry

The analyzed slides were 4 μm haematoxylin-eosin-saffron (HES)-stained tissue sections. Dual colour IHC was performed using a Ventana BenchMark ULTRA® automated slide preparation system (Ventana-Roche Diagnostics), an UltraView DAB Detection Kit (Ventana-Roche Diagnostics) and an UltraView Alkaline Phosphatase Red Detection Kit (Ventana-Roche Diagnostics) with the following antibodies: AE1/AE3 keratin (1:400 dilution, Dako); D33 desmin (1:50 dilution, Dako).

Measurement of histological parameters

All HES slides with infiltrating cells were independently analyzed by three pathologists (GP, TF and VH): a junior pathologist, a senior pathologist and a senior pathologist specialized in gastrointestinal pathology, respectively. The HES and IHC slides were digitized with a Leica Biosystems Aperio AT2 brightfield scanner. The parameters evaluated only on physical slides were the type of polyp, lymphovascular invasion (lymphatic and venous invasion were differentiated based on the absence or presence of a muscular layer in the invaded vessel), histological grade according to the 2010 and 2019 World Health Organization (WHO) classifications, the presence of mucinous or signet ring cells in the deepest part of the tumour and the presence of a positive vertical margin as recommended by the JSCCR [4, 23]. Whenever possible, the SM level of invasion was classified according to Kikuchi et al. [24].

Tumour buds were counted according to the recommendations of the 2016 International Tumour Budding Consensus Conference (single cells or clusters of < 5 cancer cells without gland formation at the front of the tumour, per 0.785 mm²) [5]. Tumour budding was then scored with both a three-tiered system (grade 1 to grade 3) and a two-tiered system (not significant: grade 1; significant: grade 2 or 3).
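The two scoring systems described above can be sketched in a few lines of Python. This is only an illustration: the cut-offs used (0 to 4, 5 to 9 and 10 or more buds per 0.785 mm² field) are those of the 2016 consensus conference [5] and are not restated in this article, and the function names are hypothetical.

```python
def budding_grade(bud_count: int) -> int:
    """Three-tiered grade from the bud count in one 0.785 mm2 hotspot field.

    Cut-offs assumed from the 2016 consensus conference [5]:
    0-4 buds -> grade 1, 5-9 buds -> grade 2, >= 10 buds -> grade 3.
    """
    if bud_count < 0:
        raise ValueError("bud_count must be non-negative")
    if bud_count <= 4:
        return 1
    if bud_count <= 9:
        return 2
    return 3


def is_significant_budding(bud_count: int) -> bool:
    """Two-tiered score: grade 1 is not significant, grades 2 and 3 are."""
    return budding_grade(bud_count) >= 2


# Hypothetical counts, not taken from the study data.
example_counts = [0, 4, 5, 9, 10, 23]
example_grades = [budding_grade(c) for c in example_counts]
example_binary = [is_significant_budding(c) for c in example_counts]
```

The binary score collapses grades 2 and 3, so two observers who disagree only between those grades still agree on the two-tiered result.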

The depth of SMI was measured in micrometres according to the Ueno, Kitajima and JSCCR methods (Fig. 1) [9, 7, 4]. The measurements were made either with an optical micrometer under the microscope or using the Aperio ImageScope software (Leica Biosystems) for virtual slides. The measurements were made sequentially, one set after another (HES slides, digital HES slides, IHC slides and digital IHC slides, in that order), with the observers blinded to the data obtained at the previous steps to minimize learning bias.

Fig. 1

Measurement of submucosal invasion depth using the methods proposed by Ueno et al. [9] (top row), Kitajima et al. [7] (middle row) and the Japanese Society for Cancer of the Colon and Rectum [4] (JSCCR, bottom row), for different forms of invasion in pedunculated and sessile polyps: (a) a pedunculated polyp with head invasion and a visible, intact muscularis mucosae; (b) a pedunculated polyp with head invasion and a damaged but locatable muscularis mucosae; (c) a pedunculated polyp with head invasion and an invisible or altered muscularis mucosae; (d) a pedunculated polyp with head and stalk invasion and a tangled muscularis mucosae; (e) a sessile polyp with a visible, intact muscularis mucosae; (f) a sessile polyp with a damaged but locatable muscularis mucosae and (g) a sessile polyp with an invisible or altered muscularis mucosae

The potential impact of these factors on therapeutic decisions was measured. Each case was classified as low or high risk according to JSCCR criteria to estimate potential differences in therapeutic decisions arising from differences between observers and methods. For T1 cancers, the risk was defined as high if at least one of the following five criteria was met: (i) a positive vertical margin (R1, automatically assumed in cases of piecemeal resection), (ii) SMI depth > 1000 μm, (iii) adenocarcinoma with poor differentiation, including signet ring cell and mucinous carcinomas, (iv) grade 2–3 tumour budding, (v) presence of lymphovascular invasion (LVI). The results were also interpreted in terms of indications for surgery based on the expected, as yet unpublished, European guidelines, namely (i) a positive vertical margin (R1, automatically assumed in cases of piecemeal resection), (ii) adenocarcinoma with poor differentiation, (iii) presence of venous and/or lymphatic emboli, (iv) presence of high-grade budding and (v) SM invasion depth > 2000 μm.
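The classification rule above amounts to a logical "or" over five criteria, with only the SMI threshold differing between the two sets of recommendations. The Python sketch below illustrates this under stated assumptions: the class and field names are hypothetical, a single differentiation flag is used for both rule sets, and "high-grade budding" in the expected European criteria is assumed to mean grade 2 or 3.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One pT1 case as read by one observer (hypothetical field names)."""
    positive_vertical_margin: bool
    piecemeal_resection: bool
    smi_depth_um: float
    poorly_differentiated: bool
    budding_grade: int  # 1, 2 or 3
    lymphovascular_invasion: bool


def is_high_risk(case: Case, smi_threshold_um: float = 1000.0,
                 min_budding_grade: int = 2) -> bool:
    """Return True if at least one of the five criteria is met.

    smi_threshold_um: 1000 for the JSCCR criteria, 2000 for the expected
    European criteria. min_budding_grade=2 is an assumption for the latter.
    """
    # Piecemeal resection is automatically counted as a positive margin (R1).
    r1 = case.positive_vertical_margin or case.piecemeal_resection
    return (
        r1
        or case.smi_depth_um > smi_threshold_um
        or case.poorly_differentiated
        or case.budding_grade >= min_budding_grade
        or case.lymphovascular_invasion
    )


# Hypothetical case: deep invasion is the only adverse factor.
example = Case(False, False, 1500.0, False, 1, False)
high_risk_jsccr = is_high_risk(example, smi_threshold_um=1000.0)
high_risk_european = is_high_risk(example, smi_threshold_um=2000.0)
```

In this invented example the case is high risk under the 1000 μm threshold and low risk under the 2000 μm threshold, which is the kind of discordance the comparison of the two rule sets is meant to capture.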

Statistical analysis

The level of significance was set at p < 0.05. Inter- and intra-observer reproducibility (IOR and IAR, respectively) were quantified with Fleiss' kappa statistic for qualitative data and with the intra-class correlation coefficient (ICC) for quantitative data. Kappa values were interpreted according to McHugh et al. [25] and ICC values according to Koo et al. [26]. All statistical analyses were done with R (version 4.0.3).
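For readers unfamiliar with the ICC, the following Python sketch shows how one common form can be computed from a table of depth measurements. It is not the code used in the study, which was done in R. The text does not state which ICC form was chosen, so the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is assumed here, and the depth values are invented.

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the measurement of case i by rater j.
    Assumes a complete table with some variation between measurements.
    """
    n = len(ratings)        # number of cases
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-case mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Invented depths in micrometres: 4 cases, 3 raters, where each rater
# measures systematically 100 um deeper than the previous one.
depths = [
    [800, 900, 1000],
    [1500, 1600, 1700],
    [2500, 2600, 2700],
    [400, 500, 600],
]
icc_example = icc2_1(depths)
```

Because this form measures absolute agreement, a constant offset between raters lowers the coefficient even when the raters rank the cases identically.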

Results

Study population

A total of 98 patients were included (56.1% male; median age 71 years). Ninety-eight samples (one per patient) were studied: 65 (66.3%) were endoscopic submucosal dissection specimens, 22 (22.5%) endoscopic mucosal resection specimens and 11 (11.2%) endoscopic piecemeal mucosal resection specimens. Three samples were excluded because the infiltrating cells were no longer visible after IHC staining. The lesions ranged in size from 6 to 100 mm, with a mean of 37 mm and a median of 30 mm (Table 1). No significant difference was observed between the groups with and without piecemeal resection, except for the median follow-up time, which was longer in the piecemeal resection group (Table 1). This result is not surprising, as surveillance has to be more intensive for these patients, for whom no information about the quality of resection is available.

Table 1 Clinical and main pathological characteristics of patients

At the time of our study in December 2021, 1 (1%) patient had presented with a recurrence of a dysplastic lesion without an infiltrating lesion 4 years after piecemeal resection. A total of 49 (50%) patients underwent subsequent colorectal surgery with lymphadenectomy and 3 (3.1%) of them had regional nodal metastases. Distant metastases were observed in 1 (1%) patient, without any CRC-related death. The patients' clinical and pathological characteristics are presented in Table 1.

Distribution of adverse histoprognostic factors

In our study, most cases had an infiltration depth > 1000 μm according to the JSCCR method. Only one case had infiltration < 1000 μm and was associated with other adverse histoprognostic factors (Supplemental table 3). The patient in question did not have lymph node metastasis. Significant budding was found in 2 cases (2.0%), lymphatic invasion in 6 cases (6.1%) and venous invasion in 3 cases (3.1%) (Table 1). The other aggressive pathological features, linked to differentiation, were found more often: poor differentiation in 7 cases (7.1%) and poorly differentiated clusters in 11 cases (11.2%). Notably, no signet ring cell component was found.

Reproducibility of infiltration depth

The Ueno and JSCCR methods had excellent interobserver reproducibility (IOR), with intra-class correlation coefficients (ICCs) of 0.858 and 0.903, respectively, on HES slides under the microscope. IHC analysis improved it further (ICC = 0.923 and 0.925, respectively) (Table 2). The JSCCR method obtained the best IAR between modalities (ICCs ranging from 0.738 to 0.894), except for the junior pathologist's analysis of the digital slides (Table 3). The IAR between methods ranged from poor to good (Table 4).

Table 2 Summary of interobserver agreement with intra-class correlation coefficients (95% confidence intervals) between three raters for measuring the depth of infiltration according to the method and modality of observation
Table 3 Summary of intra-observer agreement with intra-class correlation coefficients (95% confidence intervals) between modalities for measuring the depth of infiltration according to methods of observation from three raters
Table 4 Summary of intra-observer agreement with intra-class correlation coefficients (95% confidence intervals) between methods for measuring the depth of invasion according to the observation modality from three raters

Reproducibility of other prognostic factors

Regarding poorly differentiated clusters, lymphatic invasion and venous invasion on microscope examination of HES slides, IOR was poor (κ = 0.141, 0.196 and 0.313, respectively) (Table 5). For budding, the IOR of microscope evaluation was poor, both with the recommended three-tiered classification [5] and when classified as non-significant/significant (i.e. no budding or grade 1 versus grade 2 or 3) (κ = 0.122 and 0.172, respectively). Microscope IHC analysis performed better, reaching moderate IOR (κ = 0.560 for the three-tiered classification and 0.714 for the binary classification), and more high-grade and significant budding cases were detected. Digital analysis performed as well as microscope examination (Table 5).

Table 5 Summary of interobserver agreement measurement by Fleiss’s kappa coefficient for qualitative and semi-qualitative data, from three raters

Additional surgery

Regarding theoretical indications for additional surgery according to JSCCR recommendations (Table 6), IOR based on microscopic HES analysis was moderate (κ = 0.607). IHC analysis improved it (κ = 0.763). Digital pathology analysis was even more reproducible when combined with IHC analysis (κ = 0.802).

Table 6 Summary of interobserver agreement by Fleiss’s kappa coefficient on the surgical indication according to the JSCCR criteria and to the different modalities

IOR for surgery indications based on forthcoming European recommendations was increased with IHC (κ = 0.659) (Table 7). Furthermore, the number of cases in which surgery would have been indicated was not significantly different between HES and IHC analysis for the two sets of recommendations (Tables 6 and 7). Digital pathology did not significantly change the IOR.

Table 7 Summary of interobserver agreement by Fleiss’s kappa coefficient on the surgical indication according to the forthcoming European recommendations and to the different modalities

Among the 90 theoretical indications for additional surgery, 53 were proposed during the dedicated MDT and 49 patients underwent this surgery: persistent local tumour was found in 3 patients and 3 others had lymph node involvement. Of the 3 patients with persistent local tumour, 2 had deep infiltration > 2000 μm regardless of the measurement method, whereas in the third the measured invasion depth was > 1000 μm or > 2000 μm depending on the method. The three patients with lymph node metastasis each presented one aggressive feature: venous invasion in the first, poor differentiation in the second and deep invasion > 2000 μm in the third. After a median follow-up of 27.7 months, the median recurrence-free survival was 30.8 months (Table 1).

Discussion

This study was carried out on endoscopic resection specimens only, in contrast to most other studies in the literature, which are biased by selection towards more severe endoscopic patterns for which surgery was indicated in the first place [7, 9, 27]. The limitations of our study include a learning effect from the sequentially analyzed cases. In addition, it may be relevant to consider pedunculated and sessile polyps separately, as the risk of metastasis is lower for the former and SMI is probably a more important factor for the latter [7, 28,29,30]. However, in our study, there were no differences between the two groups (Table 1).

The depth of submucosal invasion is one of the key factors in the decision for additional surgery. However, there is still no consensus on the measurement method or the staining to use to obtain a robust criterion (Figs. 2 and 3).

Fig. 2

Difficulties in measuring submucosal infiltration depth in HES-stained (A, C, E, F) and IHC-stained (B, D) sections (100 × magnification) of colorectal adenocarcinoma in sessile polyp specimens. In the first case, HES staining (A) suggests that muscularis mucosae fibres are present (arrows), while IHC staining (B) reveals that there are none. In the second case, note how a very small lesion with an estimable muscularis mucosae in HES (C) that appears destroyed in IHC (D) can exceed the depth threshold of 1000 μm (here, 1162 μm) if measured using the JSCCR (double-headed black round dot arrow), Ueno (double-headed black dashes arrow) and Kitajima (double-headed long dashes arrow) methods. In the third case, depending on the block selected, the muscularis mucosae appears completely ruptured (E) or assessable (F) (black single arrow). The invasion depth is 328 μm if measured using the Ueno and Kitajima methods (double-headed black dashes arrow and double-headed long black dashes arrow, respectively) and 2459 μm if measured using the JSCCR method (double-headed black round dot arrow)

Fig. 3

Difficulties in measuring submucosal invasion depth in A HES- and B IHC-stained sections (100 × magnification) of colorectal adenocarcinoma in a pedunculated polyp specimen. Example of a major discrepancy between measurement methods for a pedunculated polyp. The black line represents level 2 of the Haggitt classification. The invasion does not extend below this line (A). According to Kitajima et al., this is a case of “head invasion”. The double-headed black round dot and the double-headed black dashes arrows show the depth of invasion (6.7 mm) measured from the surface of the lesion (as in the JSCCR and Ueno methods, respectively) in cases where the muscularis mucosae is destroyed (B)

To our knowledge, this is the first study to have evaluated IOR among observers with different levels of experience using both three different methods (Ueno, Kitajima and JSCCR) and different histological techniques, including IHC and digital pathology. The fact that the Ueno and JSCCR methods had excellent IOR, particularly on IHC analysis, should be considered for future recommendations. Digital pathology performed equivalently and did not reduce IOR. The good IOR of the Ueno method is consistent with other reports, with ICCs ranging from 0.89 in the study by Barel et al. [14] to 0.64 in that of Wang et al. [13]. The IAR of the JSCCR method was excellent, except for the least experienced observer, whereas the IAR of the Ueno method was not affected by the experience of the pathologist. This may be linked to the complexity of the JSCCR method compared with the Ueno method. However, with the Ueno method, agreement between HES and IHC results was lower. This is explained by the fact that IHC staining makes it easier to identify the MM fibres and thus to adjust the upper level of the SM layer (Fig. 2). Digital pathology also seems to be affected by the experience of the pathologist, as the IAR (microscope versus digitized HES) was moderate to good except for the junior pathologist.

IAR is highly variable when comparing one measurement method with another. These results may explain why different measurement thresholds have been established in different studies, ranging from 1000 to 3000 μm. Indeed, in daily practice, these methods do not give concordant measurements. We therefore suggest that future recommendations require the measurement method used to be reported systematically [7, 9, 31].

Both the Ueno and Kitajima methods are based on a subjective evaluation of MM integrity. As in our study, Davenport et al. and Kitajima et al. found it hard to evaluate the MM status. While not perfect, IHC can resolve certain ambiguities. The JSCCR method is much stricter in how SMI depth is measured, although it is important to bear in mind that the appearance of the MM can differ considerably between sections (Fig. 3). The JSCCR method is therefore highly reproducible at the cost of SMI depth overestimation. Supporting this statement, Kouyama et al. and Yoshida et al. reported that the depth measurements they made from the surface of the lesion were in all cases > 1000 μm [32, 33], leading to many surgeries.

Regarding the IOR of other prognostic factors, each of which leads to complementary surgery on its own, the rarity of these events makes κ difficult to interpret, as in other studies of sCCR endoscopic treatment. However, the proportions of cases in which these features were observed were consistent between techniques and similar to those reported in the literature for poor differentiation and signet ring cells [7, 8, 14, 34]. LVI has been reported in 8 to 14.6% of cases, whereas we found lymphatic emboli in 2–9% of cases and venous emboli in 6–9% of cases [14, 9, 15]. The distinction between lymphatic and venous emboli may also be relevant, as they are not linked to the same pathological mechanism. However, the percentage of tumours with positive vertical margins was lower in the present study (7 to 15%) than in that of Barel et al. (38%) [14]. We believe this is because our study was set in a tertiary endoscopy centre with a high resection volume. Although the IOR of each histoprognostic criterion considered here was low, indications for surgery based on multiple factors agreed much better.

Regarding budding, as κ is highly dependent on the number of observed events, the low κ on HES mainly reflects the fact that there were few cases with grade 2/3 budding. Our results show that IHC might improve reproducibility with a three-tiered system (grade 1 and grade 3 being the most reproducible). However, the highest IOR (κ = 0.714) was achieved by using a two-tiered system (significant or not) with IHC. As two-tiered classifications are more reproducible, this should be considered for future recommendations. Since many more buds were detected with IHC than with HES staining, it should be kept in mind that IHC may have a direct impact on patient management. This is why IHC is not currently recommended in guidelines. Before it can be recommended, IHC needs further study to define specific thresholds adapted to the fact that more buds are counted with this technique. Even with IHC, the IOR achieved only moderate agreement. Indeed, there are many pitfalls in the evaluation of budding [12, 16]. Previous studies are consistent on the role of IHC for budding detection, with an even stronger impact in the study by Barel et al. (HES κ = 0.235 versus IHC κ = 0.842) [14, 17].
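The dependence of κ on the number of observed events can be illustrated with a small Python sketch of Fleiss' kappa for three raters. The data are entirely hypothetical and are not the study results: 98 cases rated as non-significant or significant budding, with a single dissenting "significant" rating in 6 of them. The function name and return value are illustrative choices.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a cases x categories table of rating counts.

    counts[i][j] is the number of raters who put case i in category j;
    every case must be rated by the same number of raters.
    Returns (kappa, mean observed agreement).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Overall proportion of ratings falling in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Proportion of agreeing rater pairs for each case.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected), p_observed


# Hypothetical table: columns are [non-significant, significant].
# 92 cases with full agreement, 6 cases with one dissenting rater.
table = [[3, 0]] * 92 + [[2, 1]] * 6
kappa, observed_agreement = fleiss_kappa(table)
```

In this invented table the raters agree on about 96% of rater pairs, yet κ is slightly negative, because with so few positive ratings almost all of that agreement is expected by chance. This is why a low κ for a rare feature does not by itself indicate that observers frequently disagree.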

In terms of theoretical indications for surgery according to JSCCR recommendations or forthcoming European ones, moderate reproducibility (κ = 0.607 to 0.763) is explained in part by the low prevalence of cases with no indication for surgery. The fact that nearly all cases had an indication for surgery is mostly explained by the measurement method, which increases the likelihood of measurements > 1000 μm [12, 32, 35].

UK recommendations note that strict application of the JSCCR recommendations leads to overuse of surgery [35]. In our practice, after considering the patients' comorbidities and wishes, the number of patients who underwent surgery was much smaller (n = 53 (54%)) (Table 1). A posteriori, in 41% of cases, therapeutic management did not follow the JSCCR recommendations. However, this does not seem to have affected patient prognosis: with close follow-up, albeit with a median of less than 5 years, only one patient developed distant metastasis and no death from colorectal cancer occurred.

Importantly, this is the first study to show that digital pathology achieves the same levels of reproducibility as microscopy for all factors studied. This is an important condition for its use, which will probably become increasingly widespread in the coming years. The main benefit of digital pathology is to improve and accelerate consultation between pathologists from several centres in order to respond more accurately and quickly to patient management problems.

Conclusions

In conclusion, although most histoprognostic factors associated with the occurrence of lymph node metastases have poor measurement reproducibility, here and in the literature, our results suggest that their combined use in therapeutic decision making compensates for the variability of each factor and yields clinically acceptable levels of reproducibility. This study also indicates that IHC facilitates the evaluation of certain criteria and may therefore improve the reproducibility of these assessments. Digital analyses could be used, as their reproducibility is similar to that of microscope examination. Finally, we call for new recommendations or a consensus on the pathological assessment of endoscopic specimens in daily practice, as many specific issues remain unclarified, and we question the relevance of the 1000 μm threshold.