The BIG 2.04 MRC/EORTC SUPREMO Trial: pathology quality assurance of a large phase 3 randomised international clinical trial of postmastectomy radiotherapy in intermediate-risk breast cancer

Introduction SUPREMO is a phase 3 randomised trial evaluating radiotherapy post-mastectomy for intermediate-risk breast cancer. 1688 patients were enrolled from 16 countries between 2006 and 2013. We report the results of central pathology review carried out for quality assurance. Patients and methods A single recut haematoxylin and eosin (H&E) tumour section was assessed by one of two reviewing pathologists, blinded to the originally reported pathology and patient data. Tumour type, grade and lymphovascular invasion were reviewed to assess if they met the inclusion criteria. Slides from potentially ineligible patients on central review were scanned and reviewed online together by the two pathologists and a consensus reached. A subset of 25 of these cases was double-reported independently by the pathologists prior to the online assessment. Results The major contributors to the trial were the UK (75%) and the Netherlands (10%). There is a striking difference in lymphovascular invasion (LVi) rates (41.6 vs. 15.1% (UK); p < 0.0001) and proportions of grade 3 carcinomas (54.0 vs. 42.0% (UK); p < 0.0001) on comparing local reporting with central review. There was no difference in the locally reported frequency of LVi rates in node-positive (N+) and node-negative (N−) subgroups (40.3 vs. 38.0%; p = 0.40) but a significant difference in the reviewed frequency (16.9 vs. 9.9%; p = 0.004). Of the N− cases, 104 (25.1%) would have been ineligible by initial central review by virtue of grade and/or lymphovascular invasion status. Following online consensus review, this fell to 70 cases (16.3% of N− cases, 4.1% of all cases). Conclusions These data have important implications for the design, powering and interpretation of outcomes from this and future clinical trials. If critical pathology criteria are determinants for trial entry, serious consideration should be given to up-front central pathology review.
Electronic supplementary material The online version of this article (doi:10.1007/s10549-017-4145-4) contains supplementary material, which is available to authorized users.


Introduction
Intermediate risk was defined as either node-positive (N+) (pN1) disease of any grade in tumours ≤5 cm in diameter (T1 or T2), or T2 node-negative (N−) tumours that were grade 3 and/or showed lymphovascular invasion (LVi), or T3N0 tumours, independent of pathological features. Trial entry was determined locally based on local pathological evaluation. Central pathology review was planned to be carried out later for quality assurance, not to confirm or reject trial entrants retrospectively. This policy was adopted to allow applicability of the results to the real-world situation of daily clinical practice. To the best of our knowledge, this is the first and largest report of pathology quality assurance within an international randomised breast radiotherapy trial recruiting across three continents (Europe, Asia and Australasia). We report the results of the pathology review.
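The intermediate-risk definition above can be written as a small rule check. The following Python sketch is illustrative only and is not part of the trial protocol: the `Pathology` class, its field names and the function `is_intermediate_risk` are hypothetical, and nodal status is simplified to a single flag (pN1 versus N0).

```python
from dataclasses import dataclass


@dataclass
class Pathology:
    """Hypothetical record of the variables used for eligibility."""
    t_stage: int         # pathological T stage: 1, 2 or 3
    node_positive: bool  # True for pN1 disease, False for N0 (simplification)
    grade: int           # histological grade, 1 to 3
    lvi: bool            # lymphovascular invasion present


def is_intermediate_risk(p: Pathology) -> bool:
    """Return True if the case meets the intermediate-risk definition."""
    if p.node_positive:
        # pN1 disease of any grade in tumours <= 5 cm (T1 or T2)
        return p.t_stage in (1, 2)
    if p.t_stage == 3:
        # T3N0 qualifies independent of pathological features
        return True
    if p.t_stage == 2:
        # T2N0 requires grade 3 and/or LVi
        return p.grade == 3 or p.lvi
    return False


# A T2N0 grade 2 tumour qualifies only if LVi is present.
print(is_intermediate_risk(Pathology(2, False, 2, True)))   # True
print(is_intermediate_risk(Pathology(2, False, 2, False)))  # False
```

Written this way, it is clear that only the T2 N− branch depends on grade and LVi, which is why reporting discrepancies in those two variables affect eligibility in the N− group alone.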

Patient data and pathology materials
All patient data, including locally reported pathology, were recorded and held centrally in the SUPREMO Trial Office at the Scottish Clinical Trials Research Unit (SCTRU), NHS Scotland in Edinburgh, UK. If multiple operations had been performed, all reports were obtained. A requirement for trial entry was the submission of a representative haematoxylin and eosin (H&E) stained section of the tumour or a paraffin block from which an H&E could be made centrally. For patients treated with neo-adjuvant systemic therapy, the initial pre-treatment biopsy tissue was used. Because of local tissue governance regulations, central pathology review was restricted to hospitals from France, Germany, Japan, the Netherlands, Poland, Switzerland, Spain, Turkey, the UK, and one centre each in Australia, China and New Zealand.

Central pathology review
The two reviewing pathologists (JT & AH) were sent, in batches of 25, a single anonymised H&E section for each patient, identified by the SUPREMO Trial Number only. The H&E section was usually a recut rather than an original because the majority of patients had also consented to future translational studies. Data were recorded as follows: tumour type; histological grade (Bloom and Richardson as modified by Elston and Ellis 1991) [1]; and presence or absence of lymphovascular invasion (LVi). Reviewing pathologists were blinded to all patient data including locally reported pathology and node status. The pathologists are specialist breast pathologists working in large UK centres (Edinburgh and Leeds). The reviewing pathologists reported LVi according to UK reporting guidelines [2].

Pathology quality assurance
Data were analysed as follows:
1. Completeness of data.
2. Differences between reporting profiles of reviewing pathologists and local reporting.
3. Discrepancies between local pathology reporting and central review. Analysis was limited to those discrepancies which would have changed a patient's eligibility to enter the trial, i.e. a difference of overall grade or LVi which was critical to the inclusion of patients in the N− group. The original H&E section from each discrepant case, which had been reviewed previously by one of the pathologists, was scanned at ×40 magnification using the Aperio ScanScope slide scanner (Aperio Technologies, Vista, CA) and was then viewed online by both pathologists simultaneously, and a consensus was reached on grade and LVi. The pathologists were blinded to their original diagnoses.
4. Comparison of Nottingham Prognostic Index (NPI) in N+ and N− subgroups. The NPI for the two subgroups was calculated from the tumour size and number of positive nodes as reported locally and the histological grade [3]. Two calculations were made using the reported grade and the grade from central review.
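As an illustration of the NPI calculation in item 4, the sketch below uses the standard published form of the index: 0.2 × tumour size in cm, plus a lymph node score (1 for no positive nodes, 2 for 1 to 3, 3 for 4 or more), plus histological grade (1 to 3). The function names and the example patient are hypothetical and are not taken from the trial data.

```python
def nodal_score(positive_nodes: int) -> int:
    """NPI lymph node score: 1 = none, 2 = 1 to 3 nodes, 3 = 4 or more."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    return 3


def npi(size_cm: float, positive_nodes: int, grade: int) -> float:
    """Nottingham Prognostic Index from size, node count and grade."""
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2 or 3")
    return 0.2 * size_cm + nodal_score(positive_nodes) + grade


# Hypothetical N- patient: 3.0 cm tumour, grade 3 locally, grade 2 on review.
npi_local = npi(3.0, 0, 3)
npi_reviewed = npi(3.0, 0, 2)
print(f"local grade: {npi_local:.1f}, reviewed grade: {npi_reviewed:.1f}")
# prints "local grade: 4.6, reviewed grade: 3.6"
```

Because grade enters the index additively, a one-step change in grade on central review shifts the NPI by exactly one point while size and nodal score stay as reported locally.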

Statistical analysis
Comparison of proportions was made using a chi-squared test. Groups were compared using the Mann-Whitney U test. A two-sided p value of <0.05 was deemed significant. Statistical calculations and charts were made with Analyse-it® v2.11 for Excel®.
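For readers who wish to reproduce a comparison of two proportions, the following is a minimal sketch of a Pearson chi-squared test on a 2×2 table using only the Python standard library. It is not the Analyse-it implementation: it assumes no continuity correction (the text does not say whether one was applied), and the function name and example counts are hypothetical.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Rows are the two groups; columns are feature present / absent:
        group 1: a present, b absent
        group 2: c present, d absent
    Returns (statistic, p value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Hypothetical counts, for illustration only.
stat, p_value = chi_squared_2x2(10, 20, 20, 10)
print(f"chi-squared = {stat:.2f}, p = {p_value:.4f}")
```

The closed-form p value applies only to a 2×2 table (one degree of freedom); larger tables would need a general chi-squared distribution function from a statistics library.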

Completeness of data
Patient enrolment and exclusions from this QA study are summarised in Fig. 1. 1688 patients were enrolled in the trial; 44 patients were of unknown nodal status at the time of this analysis and were excluded from this study.

Reporting profiles by nationality of treating site, of reviewing pathologists and differences between central and local reporting
The data relating to nationality of treating site are summarised in Table 1. This is limited to the top 7 (of 16) countries submitting patients, accounting for 97% of the trial population. The two major contributors were the UK (75%) and the Netherlands (10%). The presence of LVi was reported locally in 41.6% of UK cases and 28.2% of Dutch cases. The difference is significant (p < 0.001). On central review, the frequency of LVi in the two countries was 15.1 and 19.2%, respectively. The difference is not significant (p = 0.23). There were 161 different randomising centres individually submitting between 1 and 70 cases (median 7 cases).
The overall data relating to the N+ and N− subgroups are summarised in […]. Of the 1688 patients entered into the trial, 1382 had an H&E section available for review. The two reviewing pathologists evaluated 409 and 973 sections, respectively. The centrally reported grade and LVi profile for the two pathologists and as reported locally are summarised in Table 3. The two reviewing pathologists show similar reporting profiles, and there is no evidence of case selection bias between the two reviewed subsets.
A detailed breakdown of reviewed and reported LVi for all patients, and for those cases reviewed centrally against both reported and reviewed grade, is shown in Supplementary Tables 1a and 1b, respectively. There is a striking difference in LVi rates on comparing local reporting with central review across all grade groups.

Prognostic equivalence of N+ and N− subgroups
The NPIs as calculated from reported grades for N+ and N− subgroups are shown in the box plots in Fig. 2, and the data for reported and reviewed subgroups are summarised in Supplementary Table 2.

Numbers of discrepant cases
Because pathology criteria were used to determine eligibility for the N− group, potentially ineligible cases inevitably fell in this group following a pathology QA exercise, on the grounds of being neither grade 3 nor showing LVi (114 cases, 95 from the UK).
Numbers of cases per reviewing pathologist and reasons for discrepancy:
Pathologist 1: 29/409 (7%) cases: 14 cases LVi; 12 cases grade; 3 cases both
Pathologist 2: 85/873 (10%) cases: 33 cases LVi; 47 cases grade; 15 cases both
Of these 114 cases, 108 were scanned satisfactorily and were available for review online by the two pathologists. 23 cases were upgraded on review from grade 2 to 3, and a further 12 cases were agreed to show LVi. Therefore, 32% of cases originally deemed ineligible by initial central review were deemed eligible following joint discussion.
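The 32% figure follows directly from the counts given above. The short Python check below makes the arithmetic explicit; it assumes, as the wording "a further 12 cases" suggests, that the upgraded cases and the LVi cases do not overlap. The variable names are illustrative.

```python
scanned_cases = 108      # discrepant cases available for online review
upgraded_to_grade3 = 23  # upgraded from grade 2 to 3 on consensus review
agreed_lvi = 12          # further cases agreed to show LVi

# Cases regaining eligibility, assuming the two sets do not overlap
restored = upgraded_to_grade3 + agreed_lvi
restored_pct = 100 * restored / scanned_cases
print(f"{restored} of {scanned_cases} cases ({restored_pct:.0f}%)")
# prints "35 of 108 cases (32%)"
```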

Cross-over reporting
25 cases were re-reported from slides by the two pathologists independently. There was complete agreement on grade in 20 cases (80%). 5 cases showed grade 2/3 disagreements (20%). There was no evidence of grade bias by either pathologist. 2 cases showed disagreement about LVi (8%).

Implications for patient eligibility for SUPREMO and other clinical trials
Following a central review of pathology variables in the SUPREMO Trial population, we identified 19% of N− patients who would, if central pathology data were used, be ineligible for the trial. Whilst the total number of cases deemed ineligible by central review was low, it represents a significant subgroup of the N− patients.
The non-eligible rate for our N− subgroup raises concerns about the interpretation of outcomes from this trial, particularly in the N− subgroup. Our data raise the question of whether clinical trials need to be powered to accommodate significant minorities of patients actually being ineligible, or whether they should reflect practice in the real world. In the ARTemis trial, the principal pathological end point was confirmed by review of pathology reports by the clinical investigators [4]. This was because the trial was powered on the basis of full recruitment, whereas slide retrieval was anticipated to be 85% of entrants at best. If it is decided that pathological central review is the desired way to assess a particular outcome, then the powering of the trial will need to be adjusted to allow for this estimated retrieval rate of around 85%.
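One simple way to allow for incomplete slide retrieval is to inflate the recruitment target so that the expected number of centrally reviewed cases still meets the number required by the power calculation. The Python sketch below shows this adjustment; the function name and the figure of 1000 evaluable patients are hypothetical, and only the 85% retrieval rate comes from the text.

```python
import math


def inflate_for_retrieval(evaluable_needed: int, retrieval_rate: float) -> int:
    """Recruitment target so that the expected number of retrieved cases
    is at least evaluable_needed, given the anticipated retrieval rate."""
    if not 0 < retrieval_rate <= 1:
        raise ValueError("retrieval_rate must be in (0, 1]")
    return math.ceil(evaluable_needed / retrieval_rate)


# Hypothetical requirement of 1000 centrally reviewed patients at 85% retrieval
print(inflate_for_retrieval(1000, 0.85))  # 1177
```

This covers only the loss of material; a further allowance would be needed if a proportion of reviewed cases is also expected to be reclassified as ineligible.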
In the SUPREMO trial, N− patients were required to have either grade 3 carcinomas or LVi or both, whereas N+ patients were not. This was an attempt to ensure a degree of prognostic equivalence between the two groups. We compared the two groups using their respective NPIs to test this assumption and found a significant difference between them. We appreciate that the NPI does not include LVi as a factor, so this tool examined the issue only partially.

Critical evaluation of this central pathology review
The following issues need to be considered in the interpretation of our data:
1. We reviewed a single recut H&E section and not the original tumour sections available to the local pathologists. We accept fully that this will lead inevitably to a lower reviewed LVi frequency compared with the local frequency. The availability of a single H&E for central review is certainly an important issue in explaining the lower LVi frequency on central review but does not explain the lack of difference in local reporting between N+ and N− subgroups.
2. In the original trial protocol, specific instructions were not given as to how LVi should be reported. The reviewing pathologists did not meet to discuss how this aspect of the review should be carried out but simply followed the UK guidelines as per their normal practice. In view of the fact that 75% of SUPREMO cases were from the UK, we would expect these cases to have been reported according to standard UK practice. It is notable that SUPREMO Trial cases were not entered into the trial until the MDM at which the case was discussed, and therefore after it had been reported. It follows that on average in the UK patients with intermediate-risk breast cancer (whether N+ or N−) have an LVi frequency of >40%. This is not in line with the reviewing pathologists' experience.
3. When the reviewing pathologists carried out the cross-over review, they upgraded LVi status in 20% of cases. If this were extrapolated across the whole N− group (assuming that the status change was always in one direction), then the LVi frequency would rise from 15 to 19%. That is still a long way from 41%.
4. The reporting profiles of the two reviewing pathologists are remarkably close, and it is of concern that the reviewing pathologists consistently found a substantially lower rate of LVi than was locally reported, where the bias was in favour of the presence of LVi rather than its absence. There is a trend in our data of increased frequency of LVi with increasing grade, but there is no difference between the frequency of reported LVi in the N+ and N− groups, whereas such a difference was a consistent finding by the two reviewing pathologists. In the Nottingham case series, there were strong correlations between nodal status and tumour grade and LVi, where 12% of grade 1 carcinomas and 40% of grade 3 carcinomas showed LVi [5]. Two further large studies of LVi in N− breast cancer have shown overall rates of 19.5 and 19%, respectively [6,7]. In the Uppsala radiotherapy trial for Stage 1 breast cancer, where all tumour slides were reviewed, LVi was recorded in 22% of cases [8].
5. Our data also show significant differences between the frequency of grade 3 carcinomas as reported locally (53%) and following central review (42%). The central review figure is very close to that reported in the Nottingham series of 3255 patients, where grade 3 carcinomas accounted for 43% of cases overall [5].
6. From a logistical point of view, the QA process for this trial was labour-intensive. The two reviewing pathologists (AH & JT) are currently carrying out the pathology QA for the LORIS trial [9], where pathological eligibility criteria are confirmed at the time of diagnosis by near-real-time review of scanned images online. Using this approach, all potential patients' pathology is turned around within five working days with no delay to the patient's management pathway.

Consistency of reporting among pathologists
There is substantial variability in the grading consistency of pathologists [10]. Although a recent study showed moderate to good consistency for grades 1 & 3 (kappa = 0.7) [11], in a large review of the NHS Breast Screening Programme EQA Scheme the kappa for grade was lower at 0.48 [12]. The literature is, however, conflicting on consistency of reporting by generalist and specialist pathologists [13–15]. It is encouraging to note that there were no major differences in the broad metrics of reporting profiles between the major countries contributing to this trial.
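For orientation, the kappa statistic quoted in these studies measures agreement beyond that expected by chance. The sketch below computes unweighted Cohen's kappa for two observers from a square agreement table. It is illustrative only: the cited studies may have used a different variant (for example a multi-rater or weighted kappa), and the function name and example counts are hypothetical.

```python
def cohens_kappa(table: list[list[int]]) -> float:
    """Unweighted Cohen's kappa for two raters.

    table[i][j] is the number of cases placed in category i by rater A
    and category j by rater B; the table must be square.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical two-category table: 70% raw agreement, 50% expected by chance
print(f"{cohens_kappa([[20, 5], [10, 15]]):.2f}")  # 0.40
```

The example shows why raw percentage agreement overstates consistency: 70% observed agreement corresponds to a kappa of only 0.40 once chance agreement is removed.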

Comparability of the N+ and N− subgroups
NPI has been tested extensively as a prognostic tool and has been shown to correlate well with medium- and long-term outcomes [16,17]. This trial was designed and powered on the assumption that the presence of grade 3 histology and/or lymphatic invasion would render the N− patients prognostically equivalent to those with N+ disease. Whether this holds will only be known when outcome data become available when the trial reports.

Conclusion
This international study provides unique data comparing local reporting and central review of pathology for a large clinical trial in three continents. Pathology criteria were critical for the inclusion of N− patients, and central review, even after arbitration, suggests that up to 20% of this subgroup were ineligible for trial entry. The study raises questions about the design of clinical trials, particularly how they are powered, the methodology of central pathology review and the role of digital technology in supporting this process. Consistency in pathology reporting between Europe and China provides a sound platform for collaboration in clinical trials requiring multinational accrual.