Serrated polyps of the colon: how reproducible is their classification?
- Cite this article as:
- Ensari, A., Bilezikçi, B., Carneiro, F. et al. Virchows Arch (2012) 461: 495. doi:10.1007/s00428-012-1319-7
For several years, the lack of consensus on definition, nomenclature, natural history, and biology of serrated polyps (SPs) of the colon has created considerable confusion among pathologists. According to the latest WHO classification, the family of SPs comprises hyperplastic polyps (HPs), sessile serrated adenomas/polyps (SSA/Ps), and traditional serrated adenomas (TSAs). The term SSA/P with dysplasia has replaced the category of mixed hyperplastic/adenomatous polyps (MPs). The present study aimed to evaluate the reproducibility of the diagnosis of SPs based on currently available diagnostic criteria and interactive consensus development. In an initial round, H&E slides of 70 cases of SPs were circulated among participating pathologists across Europe. This round was followed by a consensus discussion on diagnostic criteria. A second round was performed on the same 70 cases using the revised criteria and definitions according to the recent WHO classification. Data were evaluated for inter-observer agreement using Kappa statistics. In the initial round, for the total of 70 cases, a fair overall kappa value of 0.318 was reached, while in the second round overall kappa value improved to moderate (kappa = 0.557; p < 0.001). Overall kappa values for each diagnostic category also significantly improved in the final round, reaching 0.977 for HP, 0.912 for SSA/P, and 0.845 for TSA (p < 0.001). The diagnostic reproducibility of SPs improves when strictly defined, standardized diagnostic criteria adopted by consensus are applied.
Keywords: Serrated polyp, Diagnosis, Criteria, Kappa statistics
The term “serrated polyp” is used as a generic name for polyps demonstrating saw tooth-like infolding of the surface and crypt epithelium. The family of serrated polyps (SPs) comprises hyperplastic polyps (HPs), sessile serrated adenomas/polyps (SSA/Ps), and traditional serrated adenomas (TSAs) according to the latest WHO classification, in which the term “SSA/P with dysplasia” is preferred instead of the category of mixed hyperplastic/adenomatous polyps (MPs). The most common members of the SP family, HPs, comprise 80–90 % of all serrated polyps and are found throughout the colon and rectum, yet with distal predominance. Histologically, HPs are characterized by simple elongated crypt architecture and narrow crypt bases resembling normal mucosa, with proliferative activity confined to the basal third of the crypts [1, 3–6]. SSA/Ps, on the other hand, account for 8–20 % of serrated polyps with a predilection for the right colon. Their diagnosis is based mainly on crypt architectural features including serration, dilatation, horizontal orientation, L-shape or inverted T-form at the base of the crypts [1, 3–9], which furthermore show an asymmetrical proliferation zone and goblet cell or gastric foveolar cell differentiation. The rarest type of SPs, TSA, has a protuberant growth pattern with a complex villiform configuration and premature crypt formation, defined as “ectopic crypt” [1, 5, 9, 11].
How reproducible are the diagnoses of serrated polyps?
What is the impact of a standardized classification (i.e., WHO classification) on diagnostic reproducibility?
Which criteria are used for diagnosis, how reproducible are they, and what is their discriminative value?
Materials and methods
Twenty European pathologists, all members of the Digestive Diseases Working Group of the European Society of Pathology and all with special interest and experience in gastrointestinal pathology, were invited to participate in the study. Cases diagnosed as SPs (28 HPs, 25 SSA/Ps, 11 TSAs, and six MPs) in the original sign-outs were retrieved from the pathology archives of Ankara, Graz, and Warsaw. Features such as size, location, or biopsy orientation were not considered in the selection of cases, in order to simulate a real-life experience for participating pathologists. Thus, the cases comprised a mixture of polyps with characteristic features and those with features hampering an unambiguous diagnosis. The aim was to assess the histological criteria and classification without any supportive information to avoid bias in morphological assessment. Consequently, the original sign-out diagnoses, size, and location of the polyps were not provided on the worksheet.
The study was designed to allow an assessment of diagnostic reproducibility before and after introduction of the latest WHO classification. The initial round of observations included 20 pathologists and a total of 70 cases. To initiate this process, the group first evaluated 15 cases of SPs retrieved from the files of the Department of Pathology, Ankara University Medical School, including round table discussions based on the initial observations and focusing on diagnostic criteria and terminology in order to establish a common language and a standardized diagnostic approach. A further 55 cases were then provided by three centers (Ankara, Graz, and Warsaw) to allow in-depth analysis of reproducibility and diagnostic criteria on a total of 70 cases, permitting reliable Kappa analysis. Subsequent to the publication of the 4th edition of the WHO Classification of Tumors of the Digestive System, the same group (four members could not participate, leaving a total of 16 assessors) re-assessed the 70-case series using the WHO diagnostic categories and proposed criteria, which constituted the second round. The worksheets for the second round contained the initial diagnostic categories of HP, SSA/P, TSA, and MP in order to allow comparisons between the two rounds, but the participants were informed that the category of MP in their previous assessment corresponded to the new category of SSA/P with dysplasia, in line with the WHO classification. The participants were requested to diagnose such cases as SSA/P with dysplasia, including a qualification of the degree of dysplasia (low or high grade). Dysplasia was not further classified as adenomatous or serrated, although nuclear features comprising hyperchromasia, elongation, and pseudostratification are considered diagnostic for “adenomatous” dysplasia, while vesicular nucleus, prominent nucleolus, and cytoplasmic eosinophilia are indicative of “serrated” dysplasia according to the WHO.
To define the most reproducible and discriminative criteria for each diagnostic category of SPs, all participants recorded on the Excel sheets which of the predefined diagnostic criteria they regarded as the most relevant.
Kappa statistics were calculated in the Department of Biostatistics, Ankara University Medical School, by the same biostatistician (NK) for each round using SPSS for Windows 15.0, covering paired (between two observers) and overall (group) inter-observer agreement as well as intra-observer agreement. Differences between the initial and the second round were assessed using the Wilcoxon signed-rank test. The criteria were analyzed using chi-square tests and, for agreement levels, the proportions of positive judgements method. Kappa values were grouped as poor (<0.2), fair (0.21–0.40), moderate (0.41–0.60), good (0.61–0.80), and perfect (>0.80). A p value less than 0.05 was considered significant.
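As an illustration of how an overall (group) kappa and its verbal grade can be obtained, the following is a minimal sketch only. It assumes a Fleiss-type multi-rater kappa; the text does not state which multi-rater formula SPSS applied here, so this is not a reproduction of the study's analysis. The function names and the toy ratings are hypothetical, and the handling of values falling between the published bands (for example 0.202) is an assumption chosen to match how such values are graded in the Results.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of cases, each a list of labels (one per observer).

    Assumes every case was rated by the same number of observers.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number of ratings")

    counts = [Counter(case) for case in ratings]

    # Share of all assignments that went to each category.
    p_cat = [
        sum(cnt[c] for cnt in counts) / (n_cases * n_raters) for c in categories
    ]
    # Per-case agreement: fraction of observer pairs giving the same label.
    p_case = [
        (sum(cnt[c] ** 2 for c in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]

    observed = sum(p_case) / n_cases
    expected = sum(p * p for p in p_cat)
    if expected == 1:  # only one category ever used; kappa is undefined
        return 1.0     # assumption: treat as complete agreement
    return (observed - expected) / (1 - expected)


def agreement_level(kappa):
    """Map kappa to the bands used in this study (band edges are an assumption)."""
    if kappa < 0.21:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "perfect"


# Hypothetical toy data: 4 cases, 3 observers, complete agreement.
toy = [["HP"] * 3, ["SSA/P"] * 3, ["TSA"] * 3, ["HP"] * 3]
print(fleiss_kappa(toy, ["HP", "SSA/P", "TSA", "MP"]))
print(agreement_level(0.318), agreement_level(0.557))
```

With real data, each inner list would hold the 20 (or 16) diagnoses given to one case, and the category list would be HP, SSA/P, TSA, and MP.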
Initial round: how reproducible are the diagnoses of SPs?
In the initial round, 44 % (n = 31) of the 70 cases were diagnosed as HP, 40 % (n = 28) as SSA/P, 10 % (n = 7) as TSA, and 6 % (n = 4) as MP by the majority of the observers.
Overall agreement for the first 15 cases was poor, with a kappa value of 0.202 (CI lower 0.147–CI upper 0.256; p < 0.001), while overall kappa values for each diagnostic category were 0.315 for HP, 0.223 for SSA/P, 0.181 for TSA (p < 0.001), and 0.107 for MP (p > 0.05).
Following consensus discussions, fair inter-observer agreement was achieved on the additional 55 cases, with an overall kappa value of 0.349 (CI lower 0.320–CI upper 0.377; p < 0.001). Kappa values for each diagnostic category were 0.443 for HP, 0.323 for SSA/P, 0.512 for TSA (p < 0.001), and 0.235 for MP (p = 0.01).
Finally, on the full 70-case set, a fair overall kappa of 0.318 (CI lower 0.293–CI upper 0.343; p < 0.001) was reached; kappa values for the individual diagnostic categories were fair to moderate, reaching 0.415 for HP, 0.301 for SSA/P, 0.433 for TSA (p < 0.001), and 0.221 for MP (p = 0.01).
Second round: the impact of WHO classification on reproducibility
In the second round, all 70 cases were re-evaluated by 16 of the initial 20 participants using WHO criteria. A diagnosis of HP was made in 43 % (n = 30), SSA/P in 46 % (n = 32), TSA in 10 % (n = 7), and MP in 1 % (n = 1) of the cases. One HP was diagnosed as SSA/P and three MPs as SSA/P with dysplasia, while the diagnoses of 30 HPs, 28 SSA/Ps, seven TSAs, and one MP remained unchanged (Fig. 1d). No case was classified as unclassified polyp (UCP) by the majority in either round, but 49 % (n = 33) of the cases received this diagnosis from one to three observers.
Table: Paired and overall agreement for all cases and for each diagnostic category. Columns: initial round (total, n = 70), paired kappa (min–max); second round (total, n = 70), paired kappa (min–max).
Intra-observer agreement between the initial and final rounds of assessments on 70 cases
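The paired kappa ranges (min–max) and the intra-observer agreement reported in these tables rest on two-rater comparisons. The sketch below shows one common way to compute them, assuming unweighted Cohen's kappa; the text does not specify the exact variant used in SPSS. The observer labels, the helper names, and the toy diagnoses are hypothetical.

```python
from itertools import combinations


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long sequences of diagnoses."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from each rater's own category frequencies.
    expected = sum(
        (first.count(c) / n) * (second.count(c) / n)
        for c in set(first) | set(second)
    )
    if expected == 1:  # both raters used a single identical category
        return 1.0     # assumption: treat as complete agreement
    return (observed - expected) / (1 - expected)


def paired_kappa_range(by_observer):
    """Lowest and highest kappa over all observer pairs."""
    values = [
        cohen_kappa(by_observer[p], by_observer[q])
        for p, q in combinations(sorted(by_observer), 2)
    ]
    return min(values), max(values)


# Hypothetical toy data: three observers, four cases.
toy = {
    "obs_A": ["HP", "HP", "SSA/P", "TSA"],
    "obs_B": ["HP", "HP", "SSA/P", "TSA"],
    "obs_C": ["HP", "SSA/P", "SSA/P", "TSA"],
}
low, high = paired_kappa_range(toy)
print(round(low, 3), round(high, 3))
```

Intra-observer agreement can be obtained with the same `cohen_kappa` function by passing one observer's first-round and second-round diagnoses for the same cases.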
How reproducible and discriminative are the histologic criteria?
Table: Agreement of observers for diagnostic criteria. Criteria listed: epithelial serration in upper crypts; surface epithelial serration; mitosis in lower crypts; goblet cells in upper crypts; dilatation in upper crypts; dilatation in lower crypts; goblet cells in lower crypts; epithelial serration in lower crypts; mitosis in upper crypts; gastric epithelium in lower crypts.
Table: Diagnostic criteria discriminative for each SP category. Surviving entry: mature goblet cell.
This study presents a European initiative aiming to improve the reproducibility of histological diagnoses of SPs. The poor overall inter-observer agreement for 15 cases studied initially led us to repeat the assessment after a consensus discussion on 70 cases, which improved the agreement from poor to fair, highlighting the significance of a standardized approach supported by a consensus discussion.
Introduction of the WHO classification through a second consensus meeting led to a further improvement in overall agreement for the whole group, and perfect agreement was achieved for the categories HP, SSA/P, and TSA.
Table: Summary of publications on reproducibility of the diagnoses of SPs
Publication (reference number) | Cases and observers | Result
Bariol et al. 2003 | 255 polyps (72 HPs, 9 SAs, 4 MPs, 170 conventional adenomas), 2 observers | Kappa for diagnostic criteria ranging from −0.029 to 0.852
Sandmeier et al. 2007 | 102 SPs (58 HPs, 7 SSAs, 5 TSAs, 3 MPs, 29 UCPs) | Criteria for HP vs. SSA, no kappa
Glatz et al. 2007 | 20 SPs (8 HPs, 4 TSAs, 4 SSAs, 4 tubulovillous adenomas), 168 participants (Internet-based quiz) | High variation in SSA, no kappa
Farris et al. 2008 | 185 SPs (92 HPs, 74 SSAs, 19 TSAs), 5 observers | Kappa = 0.55
Bustamante-Balén et al. 2009 | 195 SPs (187 HPs, 8 SAs), 2 observers | Kappa = 0.14
Wong et al. 2009 | 60 polyps (26 SAs, 11 HPs, 6 MPs, 12 conventional adenomas, 5 other polyps), 4 observers | Kappa = 0.38
Khalid et al. 2009 | 40 SPs (comprising HPs and SSAs, all originally diagnosed as HPs), 3 observers | Kappa = 0.16
Gunia et al. 2011 | 19 SPs (8 SSAs, 3 TSAs, 8 inflammatory polyps), 3 observers/trainees | Kappa = 0.29–0.65
Present study | 70 SPs (28 HPs, 25 SSA/Ps, 11 TSAs, 6 MPs) | Kappa = 0.557 overall; 0.977 for HP, 0.912 for SSA/P, and 0.845 for TSA
Our study was conducted in a blind fashion regarding the size and localization of the polyps in order to analyze the morphological criteria in a more stringent way, devoid of any bias. It is, on the other hand, generally accepted that diagnosis of polyps with intermediate features that lie on a continuum between HP and SSA/P may require clinical information, and for such polyps, pathologists may be more inclined to make a diagnosis of SSA/P when the polyp is large and localized in the right colon. However, additional clinical information failed to have any effect on the diagnostic accuracy of the observers in two previous studies [23, 27]. We believe that the histopathological diagnosis of a polyp should not depend upon the size or the localization of the lesion. However, for unclassified non-dysplastic serrated polyps with intermediate features or in case of sampling error, pathologists need to know the size and localization of the lesion before making a diagnosis. Although a diagnosis of serrated polyp “unclassified” is recommended by WHO, improvement in our diagnostic skills together with the use of standardized criteria will help to better classify such borderline cases and avoid unnecessary utilization of the unclassified category.
The third issue raised by the initiative was to define the most reproducible and discriminative criteria for each type of serrated polyp. Although many publications utilize similar criteria in the diagnosis of SPs, there are considerable variations in histologic features as well as definitions and terminology in many others, particularly for SSA/P [4, 10, 18]. The first attempt at standardization came from Bariol et al., who evaluated the diagnostic utility of histologic criteria attributed to serrated adenomas in a series of cases including hyperplastic polyps, serrated adenomas, admixed polyps, and conventional adenomas. In their study, surface epithelial dysplasia, surface epithelial tufting, increased surface mitosis, and epithelial serration in more than 20–50 % of crypts were the criteria with the highest kappa values for the diagnosis of serrated adenoma. They did not, however, assess how consistently different observers could distinguish serrated adenoma from other polyps, nor did they further classify their cases as sessile and traditional serrated adenomas.
In a more recent attempt to distinguish HP from SSA, Sandmeier and colleagues assessed 102 SPs and tested the reproducibility of Snover’s criteria. They, too, blinded the observers to the clinical information in order to avoid any bias in the histopathological interpretation. As in our study, they found architectural changes together with the various cell types including goblet cells, undifferentiated cells, and gastric foveolar cells in basal crypts to be the most useful criteria to distinguish SSA from HP. Also, Farris et al. concluded that architectural rather than cytological features are most helpful in distinguishing SSA from HP, whereas cytological features like nuclear elongation and stratification are more useful in distinguishing TSA from other serrated lesions. Of the WHO criteria, the most discriminatory for HP were serration, dilatation, and presence of mature goblet cells in the upper crypts, whereas the presence of these features in the lower crypt zone was significantly consistent with a diagnosis of SSA/P together with the architectural features, such as horizontal crypts and crypt branching, and nuclear features, including vesicular nucleus and prominent nucleolus. TSA, on the other hand, was associated with ectopic crypts and crypt branching as well as nuclear features comprising hyperchromasia, elongation, pseudostratification, and cytoplasmic eosinophilia. These findings confirm that architectural rather than cytologic features are diagnostically useful and also that architectural criteria are more reproducible than nuclear features.
On similar grounds, the Working Group on Gastrointestinal Pathology of the German Society of Pathology proposed that the SSA/P architectural features should be present in at least two different crypts, not necessarily adjacent, although the new WHO definition of SSA/P requires two or three contiguous crypts with these features. There is no evidence base for either view, but it is safe to postulate that several crypts close together should be a minimal criterion, which, however, was not assessed in our study.
Two types of dysplasia have been observed in SSA/Ps: “adenomatous” dysplasia and “serrated” dysplasia, the latter characterized by round cells with eosinophilic cytoplasm, vesicular nuclei, and prominent nucleoli [1, 20, 35]. SSA/Ps with dysplasia were probably classified as “mixed polyps” in the past [2, 5, 6, 32, 36], a term used for lesions with distinct foci of adenomatous epithelium and hyperplastic/serrated architecture. WHO recommends the term “SSA/P with dysplasia” in order to emphasize that the dysplastic part of the lesion never shows APC mutations as found in adenomatous epithelium but rather presents with MSI resulting from methylation of MLH1. In our study, a small number of cases, almost half of which showed dysplastic foci, were diagnosed as MP in the first round, whereas a diagnosis of SSA/P with high-grade dysplasia was made for these cases in the second round. Surprisingly, however, one particular case was persistently diagnosed as MP in both rounds by the majority of the participants, who apparently felt that the dysplastic SSA/P category does not always coincide with polyps showing truly and distinctly mixed features. For the latter, a category of MP would seem legitimate.
In the present study, although dysplasia was graded as low and high without further classification into adenomatous or serrated types, the histological criteria used in the study already comprised features of serrated dysplasia such as vesicular nuclei, prominent nucleoli, and eosinophilic cytoplasm, as well as features of adenomatous dysplasia characterized by pseudostratification, nuclear hyperchromasia, and elongation. An evaluation of these criteria demonstrated that the majority of SSA/Ps possessed features of “serrated” dysplasia. Although their original description involves dysplastic epithelium, TSAs lack mitotic activity in the tall columnar epithelial cells with penicillate nuclei and eosinophilic cytoplasm and are therefore not truly dysplastic in the sense of tubular or villous adenomas. The high rate of low-grade dysplasia in the TSA group suggests that the epithelial lining of TSAs was misinterpreted as low-grade dysplastic by the majority of the observers due to its resemblance to adenomatous epithelium. The low-grade dysplasia observed in a small percentage of HPs seems to correspond with a mucin-poor variant of HP presenting with regenerative epithelium that can be misinterpreted as dysplasia.
In conclusion, the results of the study show that consensus discussions on a sufficiently large SP collection improve inter-observer agreement, which improved further when the new WHO classification was introduced. Furthermore, architectural criteria appear to be the most reliable for an accurate diagnosis of an SP. However, even when a consensus classification such as that provided by the WHO is applied, the reproducibility of the histopathological diagnosis of an SP remains imperfect.
Conflict of interest
The authors declare that they have no conflict of interest.