Background

Regional nodal irradiation has been proven to benefit breast cancer patients with positive axillary nodes and those with negative axillary nodes and high-risk features [1,2,3,4]. The internal mammary node (IMN) chain is an important first station of lymphatic drainage of breast cancer, but the value of IMN irradiation (IMNI) has not been defined in previous prospective studies [4,5,6,7,8]. Most patients enrolled in these studies received less systemic therapy than is now standard or underwent two-dimensional radiotherapy (RT). The subgroups that may benefit most from IMNI with modern treatment should be identified in further studies. Therefore, we launched a multicenter, randomized, phase 3 trial to evaluate postmastectomy radiation therapy (PMRT) with or without IMNI for patients with high-risk, node-positive breast cancer (POTENTIAL trial, NCT04320979), which was approved by the ethics committee of the Cancer Hospital, Chinese Academy of Medical Sciences (19/317–2101). This trial intends to enroll 1800 patients during a 5-year period, with disease-free survival as the primary endpoint; the detailed trial protocol has been previously published [9].

Pretrial quality assurance (QA) is very important in multicenter RT trials to guarantee uniform planning quality and enhance the reliability of outcomes [10,11,12]. Some studies have shown that protocol violations adversely affect outcomes [13, 14]. IMNI increases the complexity of the RT plan, and balancing target coverage against normal tissue sparing remains a major challenge for physicians, especially in left-sided breast cancer [15]. Twenty-six institutions participated in this trial to reach the target sample size within an acceptable time period. Different radiation techniques were implemented in this trial, including electron beam, three-dimensional conformal RT (3DCRT), intensity-modulated RT (IMRT) and volumetric-modulated arc therapy (VMAT) [9]. IMRT and VMAT were rarely used in previous IMNI studies [6,7,8]. Therefore, it was essential to evaluate potential heterogeneities and improve planning quality before enrolling patients. In this trial, we performed a strict QA program including general credentialing, trial-specific credentialing, and individual case review. Target delineation and planning QA were performed in trial-specific credentialing. The results of target delineation QA have been previously reported [16]. The present study reports the results of the planning benchmark case to assess the plan design and protocol compliance of the participating institutions.

Methods and materials

Benchmark planning procedure

The benchmark case was a 42-year-old non-smoking woman with left-sided breast cancer (stage IIIC, T2N3M0) after mastectomy and axillary node dissection followed by eight cycles of dose-dense chemotherapy. Surgical pathology showed grade 2 invasive ductal carcinoma with a tumor measuring 2.7 × 2.0 cm and the presence of lymphovascular invasion. Immunohistochemistry showed positive estrogen receptor and progesterone receptor, and negative human epidermal growth factor receptor-2; the Ki-67 index was 30%. Of 23 dissected lymph nodes, 17 showed metastases, with the presence of extracapsular extension and massive lymphovascular invasion. There was no manifestation of residual tumor, recurrence, or metastasis in work-up images prior to chemotherapy and RT.

For the benchmark planning, the patient was scanned during free breathing in the supine position, immobilized with a cervicothoracic thermoplastic mask, and the computed tomography (CT) dataset in Digital Imaging and Communications in Medicine (DICOM) format was then provided to participating institutions. The RT structures, including clinical target volumes (CTVs), planning target volumes (PTVs), and organs at risk (OARs), had been delineated by the QA team per protocol [16] to eliminate dosimetric differences caused by delineation variability. We designed the contouring atlas by comprehensively referring to the Radiation Therapy Oncology Group (RTOG), European Society for Radiotherapy and Oncology (ESTRO), and Radiotherapy Comparative Effectiveness (RADCOMP) atlases and the results of failure pattern-mapping studies [17,18,19]. Considering the high risk of recurrence in this cohort and the rapid dose falloff of IMRT and VMAT, the target volumes defined by the contouring atlas were considerably large. The PTV of the chest wall (PTVcw); supraclavicular fossa plus axilla levels I, II, III (PTVsc + ax); and IMN region (PTVim) were generated from the corresponding CTV with a 5-mm expansion in all directions, but limited to 5 mm beneath the skin surface for PTVsc + ax, PTVim, and PTVcw2 (without bolus), and limited to the skin surface for PTVcw1 (with bolus). The OARs included the heart, left anterior descending coronary artery (LADCA), both lungs, contralateral breast, spinal cord planning organ at risk volume (PRV), esophagus, ipsilateral brachial plexus, ipsilateral shoulder joint, thyroid gland, liver, and stomach.
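The margin rule for the PTVs can be illustrated along a single ray from the skin inward. The following is a minimal one-dimensional sketch, not the contouring procedure used by the QA team: the function name, the depth convention (0 mm at the skin surface), and the example CTV interval are all hypothetical.

```python
def expand_with_skin_limit(ctv_start_mm, ctv_end_mm, margin_mm=5.0, skin_limit_mm=5.0):
    """Expand a CTV depth interval by a uniform margin, then crop it near the skin.

    Depths are in mm along one ray, with 0 at the skin surface (assumed convention).
    skin_limit_mm is the shallowest depth the PTV may reach:
    5 mm for structures planned without bolus, 0 mm when bolus is used.
    """
    ptv_start = max(ctv_start_mm - margin_mm, skin_limit_mm)  # crop toward the skin
    ptv_end = ctv_end_mm + margin_mm                          # no crop on the deep side
    return ptv_start, ptv_end


# Hypothetical chest wall CTV spanning 5-25 mm depth
ptv_without_bolus = expand_with_skin_limit(5.0, 25.0, skin_limit_mm=5.0)  # PTVcw2-like
ptv_with_bolus = expand_with_skin_limit(5.0, 25.0, skin_limit_mm=0.0)     # PTVcw1-like
print(ptv_without_bolus, ptv_with_bolus)
```

In a real planning system the same rule is applied in three dimensions on the structure set; the sketch only shows why the two chest wall PTVs differ near the skin.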

All participating institutions were requested to generate RT plans with IMRT or VMAT techniques using 6-MV X-ray beams, because these modern techniques are complicated and have not been routinely used for PMRT in some centers. The dose constraints per protocol are summarized in Table 1; they were modified from our in-house recommendations with reference to literature reporting low toxicity rates under certain OAR dose constraints after IMRT and VMAT came into use [20, 21]. The prescribed dose was either 43.5 Gy in 15 fractions over 3 weeks for hypofractionated RT (HFRT) or 50 Gy in 25 fractions over 5 weeks for conventional fractionated RT (CFRT) [22, 23]. The following planning guidelines were recommended. When the multi-beam IMRT technique was applied, 4–6 coplanar beams close to the tangential direction were set up at the affected side to minimize lung irradiation, and one or more anterior beams could be added to the supraclavicular and IMN regions to achieve an optimum balance between target coverage and OAR sparing. For VMAT, partial arcs covering angles that extended slightly beyond the multi-beam IMRT field setup could be used. Low-dose irradiation to OARs was to be strictly limited. To account for set-up uncertainties, breathing, and possible anatomical changes, skin flash was to be applied to IMRT and VMAT plans by extending the tangential beams or control points 1.5–2 cm beyond the chest wall skin to ensure target coverage. Methods to achieve adequate coverage of the “flash region” included an automatic skin-flash tool, virtual bolus, or robust optimization [24, 25]. The optimization and final dose calculation were to be performed with inhomogeneity corrections.
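For orientation, the two prescriptions can be converted into dose per fraction and into the absolute doses that correspond to relative levels such as 100% or 110% of the prescription. This is a small arithmetic sketch; the dictionary layout and function names are illustrative and not part of the protocol.

```python
# Prescriptions as stated in the protocol; the data structure itself is illustrative.
REGIMENS = {
    "HFRT": {"total_dose_gy": 43.5, "fractions": 15},
    "CFRT": {"total_dose_gy": 50.0, "fractions": 25},
}


def dose_per_fraction(total_dose_gy, fractions):
    """Dose delivered per fraction in Gy."""
    return total_dose_gy / fractions


def isodose_level_gy(total_dose_gy, percent):
    """Absolute dose in Gy corresponding to a percentage of the prescription."""
    return total_dose_gy * percent / 100.0


for name, regimen in REGIMENS.items():
    total = regimen["total_dose_gy"]
    per_fraction = dose_per_fraction(total, regimen["fractions"])
    level_110 = isodose_level_gy(total, 110)
    print(f"{name}: {per_fraction:.2f} Gy per fraction, 110% level = {level_110:.2f} Gy")
```

With these definitions, V100% and V110% for a PTV refer to the volume receiving at least 43.5 Gy and 47.85 Gy in HFRT, or 50 Gy and 55 Gy in CFRT.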

Table 1 Dose constraints for target volumes and organs at risk in the POTENTIAL trial

The completed RT plans were submitted to the QA team for review, together with details such as the treatment planning system (TPS), dose prescription, treatment technique, and beam information. Qualification documents for the QA process of the CT simulator, linear accelerator, image-guided RT, and TPS were also provided for general credentialing. If major deviations occurred in the submitted plans, detailed recommendations were sent back, and the participating institutions revised the plans until they were approved. The problems were also discussed during regular online workshops to improve the plan quality of all centers. For the finally approved plans, each institution performed dosimetric verification of the absolute dose distribution, and the passing rate was required to be ≥ 90% based on gamma criteria of 3%/3 mm with a 10% dose threshold.
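The gamma passing rate used for dosimetric verification can be illustrated with a simplified one-dimensional calculation. The sketch below assumes global normalization to the maximum reference dose and a brute-force search over all evaluated points; the protocol text does not state these details, and clinical verification software works on 2D or 3D measured distributions. The function name and the example profiles are hypothetical.

```python
import math


def gamma_passing_rate(positions_mm, reference, evaluated,
                       dose_diff_pct=3.0, dta_mm=3.0, threshold_pct=10.0):
    """Percentage of reference points with gamma <= 1 on a shared 1D grid.

    Assumptions: global dose normalization (tolerance is a percentage of the
    maximum reference dose) and exclusion of reference points below the
    low-dose threshold.
    """
    max_ref = max(reference)
    dose_tol = dose_diff_pct / 100.0 * max_ref
    cutoff = threshold_pct / 100.0 * max_ref
    passed = 0
    total = 0
    for x_ref, d_ref in zip(positions_mm, reference):
        if d_ref < cutoff:
            continue  # skip points below the dose threshold
        total += 1
        # Gamma is the minimum combined distance/dose deviation over all evaluated points
        gamma = min(
            math.hypot((x_eval - x_ref) / dta_mm, (d_eval - d_ref) / dose_tol)
            for x_eval, d_eval in zip(positions_mm, evaluated)
        )
        if gamma <= 1.0:
            passed += 1
    return 100.0 * passed / total if total else float("nan")


# Hypothetical flat profiles on a 1 mm grid (arbitrary dose units)
positions = [float(i) for i in range(11)]
planned = [100.0] * 11
measured_close = [102.0] * 11   # 2% high everywhere, within 3%
measured_far = [110.0] * 11     # 10% high everywhere, outside 3%
print(gamma_passing_rate(positions, planned, measured_close))
print(gamma_passing_rate(positions, planned, measured_far))
```

A plan would satisfy the requirement described above when the returned value is at least 90.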

Dosimetric analysis

The DICOM files of submitted plans were imported into MIM software (Cleveland, OH) for review. All plans were reviewed by at least one experienced radiation oncologist and one specialized dosimetrist in the QA team. The plans were evaluated regarding the homogeneity and conformity of PTV dose, dose to OARs, beam arrangement, skin flash, inhomogeneity corrections, room for improvement, and the results of dosimetric verification. Major deviations, such as inappropriate beam arrangement, were defined by the radiation oncologists and dosimetrists during review. The protocol compliance and the actual value of each parameter were assessed in both the first and final submissions. Statistical analyses were performed using SPSS 22.0 (IBM Corporation, Armonk, NY, USA). McNemar's test was used for paired differences between the first and final submissions. A two-sided P < 0.05 indicated statistical significance.
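McNemar's test compares paired binary outcomes, here whether the same institution's plan met a constraint in the first and in the final submission, and depends only on the discordant pairs. The following is a minimal sketch of the exact (binomial) form of the test; the source does not state which variant SPSS computed, and the function name and the counts in the usage lines are hypothetical rather than trial data.

```python
from math import comb


def mcnemar_exact(improved, worsened):
    """Two-sided exact McNemar p-value from the two discordant pair counts.

    improved: pairs that failed first and passed after revision
    worsened: pairs that passed first and failed after revision
    Under the null hypothesis each discordant pair is equally likely
    to fall in either direction (binomial with p = 0.5).
    """
    n = improved + worsened
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of change
    k = min(improved, worsened)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical example: 7 plans changed from non-compliant to compliant, none the other way
print(mcnemar_exact(7, 0))
```

Concordant pairs (compliant in both submissions, or non-compliant in both) do not enter the calculation, which is why only the two discordant counts are needed.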

Results

A total of 26 institutions (Additional file 1: Table A) participated in the planning benchmark case; among these, all submitted first plans and 22 institutions resubmitted revised versions. The details of TPS, fraction regimen, radiation technique, use of skin flash, and the number of beams or arcs are shown in Table 2. The dose calculation was performed with inhomogeneity corrections in all plans. As shown in Table 3, some major deviations were found in the first submission. They were corrected in the revised submission. Examples of dose distributions of the final plans are shown in Fig. 1.

Table 2 Summary of the treatment planning system, fraction regimen, radiation technique, use of skin flash, and radiation beams/arcs for the benchmark case
Table 3 Summary of major deviations that occurred in the 26 first submitted plans
Fig. 1 Examples of dose distributions in the final submitted IMRT or VMAT plan with HFRT or CFRT regimen. A IMRT with HFRT; B VMAT with HFRT; C IMRT with CFRT; D VMAT with CFRT. The blue line represents CTV of supraclavicular and axilla region (CTVsc + ax); the green line represents PTVsc + ax; the pink line represents CTV of chest wall (CTVcw); the sky-blue line represents PTVcw; the purple line represents CTV of internal mammary node region (CTVim); and the forest green line represents PTVim

The number and ratio of plans that met the optimal and acceptable criteria are summarized in Table 4. The dosimetric results compared with the dose constraints are shown in Fig. 2. Actual dosimetric data for the HFRT and CFRT regimens are shown in Additional file 1: Tables B and C, respectively. For target volumes, the optimal plus acceptable rates of dose coverage (V100%) for PTVcw, PTVsc + ax, and PTVim were all significantly improved in the final submission compared with the first submission: 96.2% vs. 69.2% (P = 0.016), 100% vs. 76.9% (P = 0.031), and 88.5% vs. 53.8% (P = 0.012), respectively (Table 4, Fig. 2A and D). In the final submission, the PTVcw V100% of the single plan that did not meet the acceptable criteria was 88.4%; the PTVim V100% values of the three plans that did not satisfy the acceptable criteria were 82.9%, 85.7%, and 86.6%, and the corresponding V90% values were 97.4%, 95.1%, and 96.8%, respectively. The mean PTVim V100% was 79.9% in the first submission and 92.7% in the final submission.

Table 4 Protocol compliance of target volumes and organs at risk in the first and final submission
Fig. 2 The boxplots for the dosimetric results of target volumes and organs at risk in first and final submission. HFRT: A-C; CFRT: D-F. Abbreviations: PTVcw, chest wall planning target volume; PTVsc + ax, supraclavicular fossa plus axilla levels I, II, III planning target volume; PTVim, internal mammary nodal planning target volume; LADCA, left anterior descending coronary artery; PRV, planning organs at risk volume; Vx, the relative volume irradiated to a minimum dose x Gy; Dmean, mean dose; Dmax, maximal dose

For OARs, the optimal plus acceptable rates of heart Dmean, ipsilateral lung V5Gy, and stomach V5Gy were significantly improved in the final submission compared to the first submission, which were 100% vs. 73.1% (P = 0.016), 92.3% vs. 65.4% (P = 0.016), and 92.3% vs. 53.8% (P = 0.002), respectively (Table 4, Fig. 2B, C, E, and F). In the first and final submission, the mean values of heart Dmean were 11.5 Gy vs. 9.7 Gy for HFRT and 11.5 Gy vs. 11.0 Gy for CFRT, respectively (Additional file 1: Table B and C). Although the protocol compliance of LADCA V40Gy was significantly enhanced, it was still low after revision at only 65.4%.
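The dose-volume parameters reported here (Vx, Dmean, Dmax) can be computed directly from the per-voxel doses of a structure. The sketch below assumes equal-volume voxels and uses a short hypothetical dose list; it is meant only to make the definitions concrete and is not the evaluation code used in MIM.

```python
def v_x(doses_gy, x_gy):
    """Vx: percentage of the structure volume receiving at least x Gy.

    Assumes every entry in doses_gy represents a voxel of equal volume.
    """
    return 100.0 * sum(1 for d in doses_gy if d >= x_gy) / len(doses_gy)


def d_mean(doses_gy):
    """Dmean: mean dose over the structure in Gy."""
    return sum(doses_gy) / len(doses_gy)


def d_max(doses_gy):
    """Dmax: maximum voxel dose in Gy."""
    return max(doses_gy)


# Hypothetical voxel doses for a small organ at risk (Gy)
oar_doses = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 40.0]
print(v_x(oar_doses, 5.0), d_mean(oar_doses), d_max(oar_doses))
```

The same `v_x` function gives PTV coverage when `x_gy` is set to the prescription dose, for example 43.5 Gy for V100% in HFRT.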

For dosimetric verification, all institutions reported > 90% gamma passing rate (median: 96.9% [range: 90.9–100%]).

Discussion

To the best of our knowledge, this is the first study to evaluate IMRT and VMAT plans for regional nodal irradiation including IMNI in a planning benchmark case, and also the first to compare the first and revised plans before enrolling patients. A number of major deviations were found in the first submission. After revision, the major deviations were corrected, protocol compliance was significantly improved and reached a high level, and inter-institutional consistency of planning quality was achieved in the benchmark case.

Previous studies showed that a variety of potential protocol deviations and heterogeneities were consistently detected in pretrial benchmark cases, and many of them could be improved during actual patient enrollment [26,27,28,29,30]. In the current study, some deviations were found in the first submitted plans and were corrected through timely review and feedback. Almost all of the dose parameters were improved and inter-institutional variations were decreased after revision, as shown in Fig. 2, guaranteeing the planning quality and its uniformity. Similarly, in the EORTC AMAROS trial 10981/22023, the protocol deviations found in the benchmark case were considerably improved at 18 months after the trial started by adopting the recommendations from the QA committee, and inter-institutional conformance was achieved [31]. Furthermore, the QA program in the EORTC 22922/10925 trial showed that the number of deviations found in the individual case review was substantially lower than that in the benchmark case [27, 30]. In previous QA programs on PMRT, either two-dimensional RT or the 3DCRT technique was always used [31,32,33,34]. In contrast, IMRT and VMAT were used in all plans in our benchmark case. For large-volume irradiation of the chest wall and regional lymph nodes including IMNI, the plan design was highly complicated; for example, many fields (sometimes ≥ 10) were necessary for multi-beam IMRT and had to be reasonably arranged to achieve dose homogeneity and conformity [35, 36]. Because the most common chest wall recurrence site is the skin and subcutaneous tissues anterior to the pectoralis muscles [37, 38], the use of skin flash was recommended for IMRT and VMAT, which can be implemented by different methods [24, 25]. However, skin flash was not applied in five of the first submitted plans; this was corrected after feedback and should be noted for patients enrolled in the future. In addition, inhomogeneity correction is an important step during plan design to obtain more accurate dose calculation [39], and it was applied in all plans in our study.

In our study, the case used for benchmark planning was a left-sided breast cancer with a considerably large irradiated volume, including the chest wall, supraclavicular fossa, axilla levels I-III, and IMN region, for which plan design was very difficult. Various optimization strategies were used by the dosimetrists. In the first submission, insufficient target coverage, hot-spot dose, and dose inhomogeneity in the PTV were common major deviations. The protocol compliance rates for PTVcw, PTVsc + ax, and PTVim V100% were all low in the first submission and were significantly improved to 96.2%, 100.0%, and 88.5% after revision, respectively. Although the acceptable rate of PTVim V100% was lower than that of the other targets because of heart and lung sparing, the PTVim V90% values of the three plans that did not satisfy the dose constraint were 97.4%, 95.1%, and 96.8%, which met the criteria for electron beam and were higher than the CTVim V90% of 86.9% in the DBCG-IMN study using a two-dimensional RT technique [40]. In addition, although the hot-spot dose and dose uniformity constraint of PTV V110% < 25% seemed permissive, the actual mean hot-spot doses were 49.9–51.1 Gy and 57.4–58.9 Gy, and the mean PTV V110% values were 4.5–5.8% and 8.0–12.1% for the HFRT and CFRT regimens, respectively, which were acceptable.

Given that increased radiation-induced heart and lung injury is the main concern for IMNI [41, 42], more attention should be paid to heart and lung dose, especially for left-sided breast cancer. In our study, the protocol compliance of heart and lung Dmean was improved after revision. The mean value of heart Dmean was 11.0 Gy with CFRT in our study, whereas it was 5.2 Gy in the benchmark case of the KROG 0806 trial. In contrast to the IMRT or VMAT used in our study, partially wide tangent field and reverse hockey stick techniques were used in the KROG 0806 trial [34]. The LADCA is a key substructure associated with radiation-induced cardiac damage [43]. Although the protocol compliance of LADCA V40Gy was significantly improved, it was only 65.4% in the final submission. The high dose to the heart and LADCA was mainly attributed to the inclusion of IMNI and the close proximity of the heart to the target in this case, which is not uncommon in our practice. A systematic review of heart doses showed that irradiating the IMN approximately doubled the mean heart dose (MHD) in left-sided breast cancer (8.4 Gy vs. 4.2 Gy). Meanwhile, women with unfavorable anatomy received a higher heart dose, since small differences in the anatomy of the heart’s location can substantially affect heart dose [44]. Another systematic review of heart dose in breast RT showed that Asian countries reported the highest MHD for left-sided RT among the four continents (6.2 Gy vs. 2.8–3.9 Gy), probably partially because of differences in anatomy [45]. Darby et al. reported that if the MHD was 10 Gy for a 50-year-old woman, her absolute risk of death from ischemic heart disease would increase from 1.9% to 3.4% [46], which might compromise the potential gains from IMNI [7, 47]. Therefore, individualized cardiac-sparing techniques, such as deep inspiration breath hold, are encouraged for actual enrolled cases with a high predicted heart dose, to reduce the exposure dose [48, 49]. The present study showed acceptable lung dose, with the mean ipsilateral lung V20Gy in the CFRT regimen being lower than that in the KROG 0806 trial (29.4% vs. 34.6%) [34].

It is worth noting that multi-beam IMRT and VMAT improve homogeneity and conformity at the expense of a wider low-dose spread [35, 50], an easily overlooked predictor of toxicities such as radiation pneumonitis, digestive symptoms, second cancer, or lymphopenia [51,52,53,54]. Insufficient constraint on low-dose spread was one of the most common major deviations in our study. The protocol compliance rates of heart, ipsilateral lung, and stomach V5Gy in the first plans were unsatisfactory, mainly because the dosimetrists lacked experience and set insufficiently strict limits on the relevant optimization parameters. In addition, there was much room for improvement in the low-dose radiation to the contralateral lung, contralateral breast, and liver in the first submission, although the majority of plans showed protocol compliance for these structures. These doses were subsequently reduced, and the variations were narrowed by stricter optimization strategies after revision. The ipsilateral shoulder joint V30Gy, an easily overlooked parameter related to shoulder joint dysfunction, was also improved. The dosimetric verification of all final plans met the gamma criteria, suggesting that they could be implemented safely in clinical practice.

This study has some limitations. First, there was only one benchmark case in this QA procedure, and the large irradiated volume and left-sided tumor made plan design difficult; the case might therefore be unrepresentative, but it was effective in improving the ability of dosimetrists in individual institutions. Second, owing to the close proximity between the IMN and chest wall, the unintentional IMN dose in the non-IMNI group is an important concern that might affect the trial results. However, no benchmark case was provided for non-IMNI planning, and the unintentional IMN dose was not evaluated in this study; it will be assessed in individual case review. Third, electron beams were not used in this benchmark case; therefore, careful QA is warranted in subsequent individual case review for the actual enrolled patients. Last, the protocol compliance in subsequent cases was not evaluated in this paper. We will report the results of subsequent individual case review in the near future and assess how much this benchmark planning procedure contributed to improving plan quality for actual enrolled patients.

Conclusions

In this planning benchmark case, a number of major deviations were found in the first submission, and they were corrected after revision. Protocol compliance was significantly improved and reached a high level in the final submission. The reduced variations will help ensure good RT plan quality and inter-institutional consistency. The benchmark case results provided valuable insight into the importance of pretrial QA, continuous education, communication through regular workshops, real-time central review, and feedback in multi-center clinical trials.