Grading of invasive breast carcinoma: the way forward

Histologic grading has been a simple and inexpensive method to assess tumor behavior and prognosis of invasive breast cancer, thereby identifying patients at risk for adverse outcomes, who may be eligible for (neo)adjuvant therapies. Histologic grading needs to be performed accurately, on properly fixed specimens, and by adequately trained, dedicated pathologists who take the time to diligently follow the protocol methodology. In this paper, we review the history of histologic grading, describe the basics of grading, review prognostic value and reproducibility issues, compare performance of grading to gene expression profiles, and discuss how to move forward to improve reproducibility of grading by training, feedback and artificial intelligence algorithms, and special stains to better recognize mitoses. We conclude that histologic grading, when adequately carried out, remains of important prognostic value in breast cancer patients.


History of histologic grading
The importance of the histologic profile of invasive breast cancer in correlation with the disease course was first acknowledged by Von Hansemann in 1893. He presumed that tumors with a loss of differentiation, which he called anaplastic, had a greater tendency to metastasize, which he confirmed in 1902. In 1922, MacCarthy and Sistrunk described a correlation between post-mastectomy survival and the degree of differentiation, lymphocytic infiltration, and hyalinization of the tumor [1]. In 1925, Greenough was the first to describe a grading classification system that, similar to the current classification, separated tumors into three grades of malignancy, based on tubular differentiation, the size of cells/nuclei, and hyperchromatism and mitosis [2]. Several other studies, which also took clinical staging into account, followed. Importantly, it was concluded that histologic grading was of value with regard to prognosis, yet clinical staging was the most important factor [3].
Until the late 1950s, tumors were classified solely according to their clinical stage, which does not take into account the accepted range of biological behavior of breast carcinomas. Bloom and Richardson observed at that time that clinical staging provided a useful guide, yet "it fails completely to indicate the likelihood of occult lymphatic and blood-born metastases being present in what appears to be an early case, nor the speed with which such metastases may develop" [4]. This prompted them to develop a technique of histologic grading, which they correlated with survival in a series of 1544 breast cancer patients [4]. In their classification system, tumors were allocated a score of 1-3 for each of three components: differentiation of tubule formation, pleomorphism, and "hyperchromatosis" or mitotic nuclei. A total score, derived from the summation of the three component scores, of 3-5 indicated a low-grade (I) tumor, scores 6-7 an intermediate (II) tumor, and scores of 8-9 a high-grade (III) tumor [4]. Importantly, Bloom and Richardson stated that the different grades were not different pathologic entities, and that their 3 grades were based on arbitrary divisions of a continuous scale of malignancy. They did not claim to have discovered a mathematically accurate grading classification, yet they emphasized that their point system was merely a useful aid in guiding prognosis [4]. Despite these compelling observations, histologic grading of breast cancer was not accepted as a routine procedure until decades later, mainly due to perceived reproducibility issues [5].
The grading classification of Bloom and Richardson [4] was revised by Elston and Ellis in 1991, who used semiquantitative criteria to improve objectivity and reproducibility [5]. Tubular differentiation was based on evaluation of the percentage of tubule formation, hyperchromatic figures were excluded from assessment, and mitoses were counted using a defined field area. The degree of nuclear pleomorphism was scored according to more objective definitions based on comparison with normal cell types. Elston and Ellis demonstrated the relevance of histological grade in breast cancer and its strong correlation with prognosis in a series of 1813 patients with primary operable disease who had been followed up for many years [5]. This Elston-Ellis modification of the Bloom and Richardson grading classification (also known as the Nottingham grading system (NGS)) has become globally used to guide management of invasive breast carcinoma [6-8].

Histologic grading: the basics
Histologic grade represents the degree of differentiation, which reflects the resemblance of tumor cells to normal breast cells. The NGS is a semiquantitative assessment of three morphological characteristics: tubule/gland formation, nuclear pleomorphism, and mitotic frequency. The NGS is a simple and cheap method, which in principle can be performed in all breast cancer cases [9]. Furthermore, it merely requires appropriately prepared hematoxylin-eosin (HE)-stained tumor slides from optimally formalin-fixed paraffin-embedded tissue blocks, assessed by trained, experienced pathologists who are prepared to take the time to diligently follow the standard protocol.
Grading itself is evaluated by a numerical scoring system of 1-3 per category (tubule formation, nuclear pleomorphism, mitotic count). Tubule/gland formation is classified according to the percentage of tubular or glandular acinar spaces (> 75% score 1, 10-75% score 2, < 10% score 3), where only those structures with clear central lumina enclosed by polarized cells are counted (Fig. 1A-C). The inside-out polarization of micropapillary invasive carcinoma does not by itself count as tubule formation, although these cancers can show tubule formation on the inside of the micropapillary groups, which does count as tubules.
Nuclear pleomorphism, describing the size and degree of variation in tumor cell nuclear size and shape, is scored in the least differentiated area of the tumor. It is assessed by examining the regularity of nuclear size and shape, compared with normal epithelial cells in the surrounding tissue. Score 1 is allocated to tumor cells that are similar in size to normal epithelial cells, which show only minimal pleomorphism and whose nucleoli and chromatin pattern are inconspicuous at most (Fig. 2A). Tumor cells with nuclei that are 1.5-2 × larger than those of normal epithelial cells, with moderate pleomorphism and still inconspicuous nucleoli, are given score 2 (Fig. 2B). In contrast, score 3 nuclei are more than 2 × larger, vary considerably in size, and show vesicular chromatin and often prominent nucleoli (Fig. 2C).
Mitotic counting is performed in the most proliferative area of the tumor, usually the most solid area, typically at the periphery of the tumor. A score of 1-3 is based on the number of defined mitotic figures seen in a given tumor area or microscope field area, with cutoff points dependent on field area size assessed using the diameter of the high-power-field (HPF) (Table 1). Examples of well-defined mitotic figures can be found in Fig. 3.
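The dependence of the mitotic score on field size can be illustrated with a minimal sketch. It assumes a circular microscope field and takes the two cutoff values as arguments, because they must be read from Table 1 for the field diameter in use. The function and parameter names (`field_area_mm2`, `mitotic_score`, `max_for_score_1`, `max_for_score_2`) are hypothetical and chosen only for illustration.

```python
import math


def field_area_mm2(field_diameter_mm: float) -> float:
    """Area of one circular high-power field, from its diameter in mm."""
    if field_diameter_mm <= 0:
        raise ValueError("field diameter must be positive")
    return math.pi * (field_diameter_mm / 2) ** 2


def mitotic_score(mitoses: int, max_for_score_1: int, max_for_score_2: int) -> int:
    """Map a mitotic count to a score of 1-3.

    The two cutoffs are NOT hard-coded: they depend on the field area and
    must be supplied by the caller (for example from a table such as Table 1).
    """
    if mitoses < 0:
        raise ValueError("mitotic count cannot be negative")
    if not 0 <= max_for_score_1 < max_for_score_2:
        raise ValueError("cutoffs must satisfy 0 <= max_for_score_1 < max_for_score_2")
    if mitoses <= max_for_score_1:
        return 1
    if mitoses <= max_for_score_2:
        return 2
    return 3
```

Because the field area grows with the square of the diameter, a modest difference in HPF diameter between microscopes changes the counted area considerably, which is why the cutoffs are tied to the field size rather than fixed.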
Overall, the three grade component values are summed, resulting in a total score between 3 and 9, which is then categorized into a final grade. Scores 3-5 represent well-differentiated grade I tumors, scores 6-7 represent moderately differentiated grade II tumors, and scores 8-9 represent poorly differentiated grade III tumors.
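The scoring arithmetic described above can be summarized in a short sketch. It encodes only the tubule percentage cutoffs and the total score ranges given in the text. The function names (`tubule_score`, `nottingham_grade`) are hypothetical, and the pleomorphism and mitotic scores are assumed to have been assigned by the pathologist beforehand.

```python
def tubule_score(percent_tubules: float) -> int:
    """Score tubule/gland formation: > 75% -> 1, 10-75% -> 2, < 10% -> 3."""
    if not 0 <= percent_tubules <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_tubules > 75:
        return 1
    if percent_tubules >= 10:
        return 2
    return 3


def nottingham_grade(tubules: int, pleomorphism: int, mitoses: int) -> int:
    """Sum the three component scores (each 1-3) and map the total to grade 1-3."""
    for name, score in (("tubules", tubules),
                        ("pleomorphism", pleomorphism),
                        ("mitoses", mitoses)):
        if score not in (1, 2, 3):
            raise ValueError(f"{name} score must be 1, 2, or 3")
    total = tubules + pleomorphism + mitoses  # ranges from 3 to 9
    if total <= 5:
        return 1  # grade I, well differentiated
    if total <= 7:
        return 2  # grade II, moderately differentiated
    return 3      # grade III, poorly differentiated
```

For example, a tumor with 5% tubule formation (score 3), moderate pleomorphism (score 2), and a mitotic score of 2 has a total of 7 and is therefore grade II.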

Histologic grading: prognostic value
The NGS has been shown to be of independent significance with regard to breast cancer prognosis [5, 7, 10-12]. In addition, histologic grade has been incorporated in prognostic index scores, of which the Nottingham prognostic index (NPI) [13] is regarded as the only index score that has been extensively validated and which retains its predictive ability in most independent populations [14,15]. Within the NPI, histologic grade is combined with lymph node (LN) status and tumor size, where grade is considered as important as lymph node status. In contrast, studies have suggested that histologic grade predicts tumor behavior more accurately than tumor size, which may be considered a more "time-dependent" factor [7,8,11,13,16].
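How grade enters the NPI alongside lymph node status and tumor size can be shown in a minimal sketch. The formula below is the commonly cited form of the index (0.2 × tumor size in cm + lymph node stage + grade, with lymph node stage 1 for no positive nodes, 2 for 1-3 positive nodes, and 3 for 4 or more). It is not spelled out in this review and is included here as an assumption. The function names are hypothetical.

```python
def nodal_stage(positive_nodes: int) -> int:
    """Lymph node stage as commonly used in the NPI (assumed, not from this text):
    0 positive nodes -> 1, 1-3 positive nodes -> 2, 4 or more -> 3."""
    if positive_nodes < 0:
        raise ValueError("number of positive nodes cannot be negative")
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    return 3


def nottingham_prognostic_index(size_cm: float, positive_nodes: int, grade: int) -> float:
    """NPI = 0.2 * size (cm) + lymph node stage (1-3) + histologic grade (1-3)."""
    if size_cm <= 0:
        raise ValueError("tumor size must be positive")
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2, or 3")
    return 0.2 * size_cm + nodal_stage(positive_nodes) + grade
```

In this form, grade and lymph node stage each contribute one full point per step, whereas tumor size contributes only 0.2 points per centimeter. A 2 cm grade III node-negative tumor and a 2 cm grade II tumor with two positive nodes therefore receive the same index value.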
Breast cancer is now detected at earlier stages by mammographic screening programs, thereby resulting in a greater proportion of both smaller [17, 18] and lymph node-negative tumors at diagnosis [19]. This furthermore increases the clinical contribution of histologic grade [7,20].
One study [29] included a very large series of 22,616 breast cancer cases. It showed similar prognosis for breast cancer patients with stage II/grade I disease and breast cancer patients with stage I/grade II disease [29]. Furthermore, it showed an excellent prognosis for small (< 2 cm) grade I tumors, even when they showed lymph node metastases at presentation. Therefore, Henderson et al. concluded that using histologic grade in conjunction with disease stage (consisting of tumor size and lymph node status) could improve outcome predictions [29].
These results were further supported by a somewhat more recent, retrospective series of 2219 operable breast cancer cases from a single institution by Rakha et al. [7]. Histologic grade proved to be associated with both breast cancer-specific and disease-free survival in the whole series, as well as within the specific subgroups of small tumors (T1a, T1b, T1c) and lymph node-negative and lymph node-positive tumors [7]. The latter has also been shown in other studies [7,22,24,28,30]. More importantly, the prognostic value of histologic grade was independent of tumor size and lymph node status [7]. Furthermore, it was shown that grade is complementary and equivalent in impact magnitude to lymph node status, which is widely regarded as a major prognostic factor in breast cancer. For example, patients with grade II tumors and 1-3 positive lymph nodes had a better prognosis than patients with grade III tumors without any lymph node metastases [7]. Moreover, the Swedish two-county trial demonstrated that the independent prognostic effect of histologic grading on survival (as well as lymph
Of note, although grading was initially deemed applicable to Not Otherwise Specified ("ductal") cancers, grading has been proven prognostically important across all histologic breast cancer types.
In addition, the prognostic role of histologic grading in specific subgroups, for whom the benefit of adjuvant chemotherapy is uncertain, like patients with low-volume lymph node metastases, or patients with ER-positive/lymph node-negative breast cancer, has also been established. For example, histologic grade is an independent prognostic factor in breast cancer patients with ER-positive disease, with [31] or without neoadjuvant endocrine therapy [32]. Furthermore, histologic grade has been shown to be one of the two remaining prognostic factors associated with relapse-free survival in a multivariate analysis of ER-positive/HER2-negative breast cancer patients [33].
As to the relative prognostic contribution of the three constituents of grade, several studies have shown that the mitotic count is the most important variable followed by nuclear atypia and then tubule formation [34-36]. Thus, as grade is an important and long-term prognostic factor across breast cancer subtypes, being equally important as lymph node status, more important than tumor size, and being of specific prognostic influence in different subgroups, it would be an omission to exclude histologic grade from clinical decision-making.

Histologic grading: reproducibility issues
Although histologic grade has long been known to be of prognostic value, its reproducibility has also been the subject of debate for decades. Firstly, the distribution of grade varies widely (i.e., up to 27%) between studies (Table 2) [7, 16, 22-24, 26, 27, 30, 35, 45-56]. However, these differences may partly be explained by the wide variety of patient cohorts (Table 2). For example, these cohorts vary in age, type of detection method (screening versus symptomatic, early, or advanced breast cancer), and type of tissue fixation.
Secondly, inter- and intra-observer variation has been extensively reported, with a wide range of kappa values (0.43 to 0.85), which corresponds to a range from "fair" to "almost perfect" agreement [9, 12, 25, 30, 57-69] (Table 3). A recent nationwide study in the Netherlands showed substantial variation in grading in daily clinical practice, both between pathology laboratories and between pathologists within individual laboratories [55]. Importantly, these differences were not explained by differences in case mix [55]. Subsequently, initiatives were launched to reduce variation in grading. Feedback reports in which pathologists and laboratories were benchmarked against the nationwide average and their colleagues (all anonymized) were sent [56], and pathologists and residents were trained using e-learning [70]. Both initiatives resulted in a promising decrease in grading variation and may be implemented broadly in the field of pathology. Yet,

Immunohistochemistry for proliferation markers
Because the mitotic index, a proliferation marker, is the most important constituent of grade and is subject to observer variation in counting [34-36], one could argue that replacing the mitotic index with a potentially more objective method based on immunohistochemical staining of proteins highlighting proliferating cells could help to reduce variation. Ki67 and phosphohistone H3 (PHH3) have shown most promise here. Ki67 is a protein expressed in all phases of the cell cycle, except for resting cells in the G0 phase. However, there is controversy with regard to its clinical utility in routine clinical management, due to variation in analytical practice [71-73] and the absence of consensus on cutoff values [74]. PHH3 is highly expressed in cells in the mitotic phase, and this proliferation marker has shown great promise, also with regard to reproducibility [75]. It may help to better identify mitotic cells and highlight the areas of highest proliferation and thereby increase reproducibility, especially in cases with compromised morphology associated with sub-optimal fixation. More research on its clinical utility is necessary [75].
Another potentially important tool is the IHC-4 algorithm, which consists of a combination of ER, PR, HER2, and Ki67 status [76]. However, to date, this system has not been widely incorporated in breast cancer guidelines, mainly due to the above-mentioned lack of consensus on the clinical utility of Ki67 assessment.

Molecular profiling and gene expression profiling
In the past decade, molecular profiling and gene expression profiling (GEP) have emerged as new tools to predict tumor behavior. Molecular profiling studies showed that grade I, II, and III breast cancers are most likely different entities as they show specific molecular profiles at immunohistochemical, genomic, and transcriptomic levels, which further supports the relevance of histologic grading [77]. Furthermore, histologic grading has been shown to better correspond to the molecular profile of breast cancer than lymph node status and tumor size [78-80].
Although these "new" biomarker/molecular profile methods were launched with great excitement, it seems unlikely that molecular or gene expression profiling will substitute classic clinicopathologic variables. For ER-positive disease, for example, histologic grade remains an independent prognostic factor in multivariate models, even when molecular signatures are included [81,82]. In addition, several studies have shown that the added value of GEPs to clinicopathologic variables (age, ER-status, lymph node status) in prognostic models may be limited and sometimes only equal to prognostic indices like the NPI [31, 83,84].
Two well-known GEPs, Oncotype DX [81] and MammaPrint [85], are currently being used in daily clinical practice in some countries. However, it is also important to acknowledge that these tests are neither accessible (i.e., $4,000 per Oncotype DX/per MammaPrint) nor applicable to every breast cancer patient [86-94]. Moreover, studies have shown that "simple" biomarkers like histologic grade and progesterone status can predict Oncotype DX scores, thereby obviating the need for these expensive GEPs [95-100]. However, GEPs may be of added value in patients for whom the indication for adjuvant therapy remains doubtful based on classic biomarkers [85]. Furthermore, molecular signatures and GEPs are not without flaws themselves, although they are generally considered to be more objective biomarkers [82]. For example, in the MammaPrint study, a change in the RNA extraction solution that was used to calculate the MammaPrint score led to a shift of genomic risk scores in > 150 patients [85]. In addition, similar to statistical cut-offs used for histologic grading, molecular tests and GEPs also depend on biostatistical approaches. Furthermore, intratumor heterogeneity has been found to affect the prognostic risk stratification by GEPs in early breast cancer [101]. Lastly, the results of GEPs and molecular profiles also depend on well-prepared tissue samples to begin with. Overall, molecular profiling and GEPs should not be seen as the new "holy grail" and will not substitute but rather complement classic clinicopathologic biomarkers, which in turn need to be assessed adequately by well-trained pathologists.

Artificial intelligence
Artificial intelligence methodology is currently gaining considerable traction as a tool to aid pathologists. It is expected to be especially helpful with regard to reproducibility concerns that surround histologic grading in its current state. An example of this is the CAMELYON16 challenge, which showed that some deep-learning algorithms achieved better diagnostic performance in detecting lymph node metastases in breast cancer patients than routine pathologists (under time pressure) and comparable diagnostic performance to expert pathologists (without any time constraints) [102]. Since then, promising results have been published, for example, on predicting tumor proliferation in breast cancer by deep learning (TUPAC16 challenge) [103] and mitosis counting [104]. However, practical utility studies need to be performed [102,103]. In addition, it is important to acknowledge that well-annotated (consensus-based) datasets are required for the development of AI algorithms. A major pathology-led consortium with 46 partners from all fields of research and businesses, which aims to create a platform of Whole Slide Imaging (WSI) data to develop advanced AI algorithms (BIGPICTURE), may be very helpful here and will start in the near future. These advanced AI algorithms may be able to directly grade breast carcinoma themselves [101].

Conclusions
In conclusion, histologic grading is a simple and inexpensive method to assess tumor behavior and patient prognosis, thereby identifying patients at risk for adverse outcomes, who may be eligible for (neo)adjuvant therapies. However, histologic grading needs to be performed accurately, on properly fixed specimens, and by adequately trained, dedicated pathologists who take the time to diligently follow the protocol methodology. Levels of inter-observer variation can and should still be improved. Feedback and training may be helpful tools to support this. In addition, artificial intelligence is very likely to be able to support pathologists in the near future. When accessible to patients, GEPs may complement classic pathology biomarkers in doubtful cases, rather than substitute them. Furthermore, the GEPs are costly and have flaws of their own. Hence, histologic grading, when adequately carried out, remains of important prognostic value in breast cancer patients.
Author contributions This was an invited review. The authors agreed on the concept and jointly wrote the paper.

Availability of data and material Not applicable.
Code availability Not applicable.

Declarations
Ethics approval Not applicable, no use of data or tissue, neither human participants, nor animals.

Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.