Introduction

Osteosarcoma is the third most common type of neoplasia in adolescents preceded only by leukemia and lymphoma [19]. Although it can occur at any age, it is predominantly a disease that afflicts the young with a peak incidence in the second and third decades of life [9]. Numerous variables have been associated with an adverse prognosis in osteosarcoma including metastatic disease at presentation, nonosteoblastic histological subtype, tumor size, male gender, young age, tumor location, genetic variations, poor response to neoadjuvant chemotherapy, and inadequate surgical margins [4, 7, 10, 12]. However, the last two are the only factors that have been shown to independently increase the risk of local recurrence (LR) [3]. The relationship between inadequate margin and local disease recurrence has been clearly reported, although there remains substantial debate about the appropriate “thickness” of a margin for primary high-grade osteosarcoma [2, 3, 12]. From a pragmatic perspective, the interpretation of exactly what entails a marginal and wide excision as defined by Enneking et al. [11] remains inherently subjective and may vary depending on who is assessing the margin. A surgeon may classify a margin one way and the involved pathologist might have a different designation based on histology. In both cases, the experience of the observer likely influences the margin designation.

Chemotherapy has a clear role in the management of high-grade osteosarcoma, leading to improved survival rates when combined with adequate local control by surgical resection or amputation. There is sufficient evidence to argue that the response to chemotherapy (determined by analyzing the percentage of necrotic tumor in the resected specimen after neoadjuvant chemotherapy) is an important risk factor for LR [3, 19], although surprisingly, this is not reflected in any existing staging criteria [15]. The Musculoskeletal Tumor Society (MSTS) was the first to consider the extent (margin) of the surgical excision of sarcomas and incorporate it into the staging matrix, on which ongoing treatment decisions are based and long-term recurrence-free prognosis is predicted [11]. However, this proposal was made in the era just before the routine use of chemotherapy for bone sarcomas and was considered to be a “surgical” staging system that could be modified by the use of adjuvants such as chemotherapy or radiation therapy. The existing surgical staging systems fail to adequately reflect the response to chemotherapy or define an appropriate safe metric distance from the tumor that will allow complete excision and closely predict the chance of disease recurrence.

The purpose of this study was to review a group of patients with primary high-grade osteosarcoma treated with neoadjuvant chemotherapy and surgical resection and to analyze the risk of LR based on the chemotherapy response and surgical margins achieved. We asked the following questions: (1) What predictor or combination of predictors available to the clinician more reliably predicts the likelihood of local recurrence? (2) Can we determine a better predictor of local recurrence-free survival than the currently applied system of surgical margins? (3) Can we determine a better predictor of overall survival than the currently applied system of surgical margins?

Patients and Methods

A retrospective study was performed of our prospectively collected oncology database, and all patients with a diagnosis of a primary high-grade conventional osteosarcoma treated between January 1, 1997, and December 31, 2012, at our institution were identified. Only patients who had histologically diagnosed high-grade osteosarcoma, were younger than age 50 years at the time of diagnosis, and had been managed with preoperative chemotherapy followed by surgery were included in the study. A minimum of 2 years of followup was required for patients who were alive. From a total of 558 eligible patients seen during the study period, we excluded those who had metastasis at diagnosis (n = 103 [18.5%]), progressed on preoperative chemotherapy and did not receive definitive surgery (n = 38 [7%]), were diagnosed at our institution but received final surgery at another unit (n = 7 [1%]) or followup elsewhere (n = 18 [3%]), or died from complications of chemotherapy (n = 3 [0.5%]).
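The cohort flow described above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the counts reported in the text; the dictionary labels and the function name `cohort_flow` are illustrative shorthand, not terms from the study database.

```python
# Cohort flow check using the counts reported in the text.
ELIGIBLE = 558

# Exclusion counts as reported; the labels are shortened for illustration.
exclusions = {
    "metastasis at diagnosis": 103,
    "progressed, no definitive surgery": 38,
    "final surgery at another unit": 7,
    "followup elsewhere": 18,
    "died from chemotherapy complications": 3,
}


def cohort_flow(eligible, excluded):
    """Return (number excluded, number included, included fraction)."""
    n_excluded = sum(excluded.values())
    n_included = eligible - n_excluded
    return n_excluded, n_included, n_included / eligible


n_excluded, n_included, fraction = cohort_flow(ELIGIBLE, exclusions)
print(n_excluded, n_included, round(100 * fraction))  # 169 389 70
```

The remaining 389 patients (70% of 558) correspond to the cohort analyzed in the sections that follow.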

Patient demographic characteristics and disease-specific variables were collected from the database and included: sex (male or female), age at diagnosis (< 16 or > 16 years), MSTS margin (intralesional, marginal, wide, radical), nearest margin to the tumor (mm), Fletcher’s classification of type of osteosarcoma [13], tumor size (< 8 cm or > 8 cm), tumor location (extremity or central), percentage chemotherapy necrosis (< 90% or ≥ 90%), vascular invasion (yes or no), complete pathological fracture (yes or no), type of operation (limb salvage or amputation), local recurrence-free survival (LRFS), and overall survival (OS). Unicortical breaks described in the radiology reports were initially miscoded as pathologic fractures. These are the cortical breaks seen in osteosarcomas with large soft tissue extension rather than pathologic fractures, and they were included in the nonpathologic group for analysis.

Patient Characteristics

A total of 389 (70% of 558) participants matched the inclusion criteria. Mean age at diagnosis was 16.4 years (SD 0.40; range, 3–49 years), 228 were males (59%), and mean followup was 79 months (SD 3.04; range, 4–219 months). The most common anatomical sites for tumor location were the femur (n = 196 [50%]) followed by the tibia (n = 99 [25%]) and the humerus (n = 51 [13%]). Mean tumor length was 104 mm (SD 2.38; range, 20–420 mm), and the osteoblastic subtype was the most prevalent (51%). Sixty patients (15%) presented with a pathological fracture (56 at diagnosis and four after diagnosis) requiring cast immobilization or traction while undergoing preoperative chemotherapy. Endoprosthetic reconstructions were the most commonly performed surgical procedures (71%) (Table 1).

Table 1 Patient characteristics

Over this time period, patients received neoadjuvant chemotherapy according to the respective European Osteosarcoma Intergroup (EOI) or European and American Osteosarcoma Studies (EURAMOS) trials at the time of their diagnosis, primarily receiving cisplatin, doxorubicin, and methotrexate, with surgery performed after 8 weeks and 12 weeks, respectively [17, 22]. In terms of histological response to chemotherapy, 42% had ≥ 90% necrosis and 58% had < 90% necrosis. Histological examination was performed by a histopathologist (VPS) highly experienced in bone sarcomas. The resection specimens were photographed and examined for involvement of margins. Perpendicular sections were taken from the margins before bisecting the bone along its long axis, exposing most of the tumor. All sections were examined for extent of chemotherapy-induced necrosis. Necrosis was expressed as a percentage of the total tumor area [18]. Margins were measured on histologic slides in millimeters from the resection surface to the nearest tumor (Fig. 1). All relevant information was extracted from the pathology reports, and where the metric information of margins was lacking, the pathology slides were reviewed for this study (n = 8).

Fig. 1A–B

(A) Cross-section of high-grade osteoblastic osteosarcoma of distal femur shows tumor extending through the cortex into the soft tissue. (B) On microscopy (hematoxylin and eosin; original magnification, ×1.25), the distance of viable tumor to the closest peripheral soft tissue resection margin measures 2 mm.

Statistical Analysis

Descriptive statistics were used to display demographic data. Kaplan-Meier analysis was used to determine LRFS and OS with time zero defined as the date of diagnosis and censored at the date of last followup or local recurrence and death, respectively. Univariate analysis was performed comparing groups with log-rank test and significant variables underwent subsequent multivariate Cox proportional hazard analysis to identify predictors of LRFS and OS with p value < 0.05 considered significant. A new classification of surgical margins (the Birmingham classification) was devised on two stems governed by the response to chemotherapy (good response = ≥ 90% necrosis; poor response = < 90% necrosis) with subdivisions within each stem by margin (≤ 2 mm or > 2 mm). To compare the existing MSTS staging criteria with the proposed Birmingham classification for predicting LR, a two-stage Cox regression model was undertaken where after entering one variable into the model, the second variable would only enter the model if the introduction of the latter significantly improved the prediction of LR. Corresponding Harrell’s C statistics were calculated for the Birmingham classification and MSTS models. This process was repeated comparing the Birmingham classification with different margin thresholds (clear versus contaminated, 1 mm, 3 mm) and response to chemotherapy. Statistical analysis was performed on Stata (Version 9.2; College Station, TX, USA).
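The two-stem rule described above can be written as a short function. This is a minimal sketch, assuming the group labels 1a, 1b, 2a, and 2b used in the figure captions and Table 3; the function name `birmingham_class` and its parameter names are hypothetical and are not taken from the study's analysis code.

```python
def birmingham_class(necrosis_pct, margin_mm):
    """Assign a Birmingham group from chemotherapy necrosis and closest margin.

    Stem 1 = good response (>= 90% necrosis); stem 2 = poor response (< 90%).
    Subdivision a = margin > 2 mm; subdivision b = margin <= 2 mm.
    """
    if not 0 <= necrosis_pct <= 100:
        raise ValueError("necrosis_pct must be between 0 and 100")
    if margin_mm < 0:
        raise ValueError("margin_mm must be non-negative")

    stem = "1" if necrosis_pct >= 90 else "2"
    subdivision = "a" if margin_mm > 2 else "b"
    return stem + subdivision


# Hypothetical patients, for illustration only.
print(birmingham_class(95, 5))   # 1a
print(birmingham_class(90, 2))   # 1b (both thresholds are inclusive on this side)
print(birmingham_class(60, 10))  # 2a
print(birmingham_class(60, 0))   # 2b
```

Note that the boundary values fall on the side stated in the text: exactly 90% necrosis counts as a good response, and a margin of exactly 2 mm counts as a close margin.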

Results

Predictors of Local Recurrence

A total of 47 patients developed LR (12%). Factors that were identified as significant predictors of LRFS were surgical margins (intralesional margin hazard ratio [HR], 9.9; 95% confidence interval [CI], 1.2–82; p = 0.03 versus radical margin HR, 1) and response to neoadjuvant chemotherapy (response < 90% chemotherapy necrosis HR, 3.8; 95% CI, 1.7–8.4; p = 0.001 versus response ≥ 90% chemotherapy necrosis HR, 1) (Table 2; Appendix 1 [Supplemental materials are available with the online version of CORR ®.]). With the numbers we had, we could not detect a higher risk of LR in patients younger than 16 years or in those without a pathologic fracture.
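For readers who want to relate a reported hazard ratio to its confidence interval, the sketch below back-calculates the implied point estimate and the standard error of the log hazard ratio from the interval limits. It assumes the intervals are symmetric on the log scale (Wald-type), which is common for Cox models but is not stated in the text, and the reported limits are rounded, so the results are approximate. The function name `log_scale_summary` is hypothetical.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def log_scale_summary(ci_lower, ci_upper):
    """Return (implied HR, implied SE of log HR) from 95% CI limits.

    Assumption: the interval is symmetric on the log scale (Wald-type).
    """
    log_lo, log_hi = math.log(ci_lower), math.log(ci_upper)
    implied_hr = math.exp((log_lo + log_hi) / 2)  # geometric mean of the limits
    implied_se = (log_hi - log_lo) / (2 * Z_95)
    return implied_hr, implied_se


# Poor chemotherapy response: reported HR 3.8 (95% CI, 1.7-8.4).
hr, se = log_scale_summary(1.7, 8.4)
print(f"implied HR = {hr:.1f}, SE(log HR) = {se:.2f}")
```

Under this assumption, the geometric mean of the limits for the chemotherapy response interval is close to the reported hazard ratio of 3.8, and the same holds for the intralesional margin interval (1.2–82) and its reported hazard ratio of 9.9.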

Table 2 Multivariate Cox proportional hazard analysis for local recurrence-free survival

Comparison of Birmingham With MSTS for Local Recurrence

The Birmingham system offered a better prediction of LRFS than did the MSTS system in a two-stage Cox regression model, wherein introducing the Birmingham classification into a model with the MSTS variable improved the prediction of LR (MSTS HR, 1.2; 95% CI, 0.8–1.9; p = 0.3 and Birmingham HR, 1.9; 95% CI, 1.3–2.7; p < 0.0001). Introducing the MSTS variable into a model already containing the Birmingham classification did not improve the prediction of LR (Birmingham HR, 2.1; 95% CI, 1.5–2.8; p < 0.0001; MSTS omitted p = 0.3). The probability of predicting LR as quantified by Harrell’s C-statistic was 0.68 for the Birmingham classification and 0.59 for MSTS. This process was repeated comparing the Birmingham classification of chemotherapy response and 2-mm cutoff margin with other models using chemotherapy response and different margin cutoffs (clear/contaminated, 1 mm, 3 mm); the Birmingham classification had the highest Harrell’s C-statistic (0.66 for chemotherapy response and clear/contaminated margins, 0.65 for 1 mm, and 0.61 for 3 mm). The Kaplan-Meier curve analysis for LRFS illustrates the spread achieved between groups for the Birmingham classification (log-rank test p < 0.0001) (Fig. 2). Patients classified as Birmingham 2b (margins ≤ 2 mm, < 90% chemotherapy necrosis) were approximately 20 times more likely to develop LR than those with margins > 2 mm and ≥ 90% chemotherapy necrosis (Birmingham 2b HR, 19.6; 95% CI, 2.6–144; p = 0.003 versus Birmingham 1a HR, 1) (Table 3). No significant interaction between surgical margins and chemotherapy response was found on multivariate analyses. Chemotherapy response was a very strong predictor and should be part of any assessment for predicting local recurrence, rather than margins alone. Adding necrosis to MSTS improves prediction, but the resulting two-by-four stratification is cumbersome, the spread between groups is not well defined, and the use of Enneking definitions potentially affects the reproducibility of margin status between centers. Using a clearly defined margin of 2 mm (or potentially clear/not clear in future studies with bigger numbers) in combination with chemotherapy response makes for an easy, reproducible classification that can be communicated accurately between surgeons.
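Harrell’s C-statistic, used above to compare the models, is the proportion of comparable patient pairs in which the patient with the higher predicted risk has the earlier event. The following is a minimal pure-Python sketch of that definition, not the Stata routine used in the study. It assumes that a pair is comparable only when the patient with the shorter time had an observed event, that pairs with tied times are skipped, and that tied risk scores count as half concordant. The function name `harrell_c` and the toy data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times       : followup time for each patient
    events      : 1 if the event (eg, LR) was observed, 0 if censored
    risk_scores : higher score = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # patient i had the event first
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, for illustration only (not study data).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(harrell_c(times, events, [4, 3, 2, 1]))  # 1.0: risk ordering matches event order
print(harrell_c(times, events, [1, 1, 1, 1]))  # 0.5: uninformative scores
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, which gives context for the reported values of 0.68 (Birmingham) and 0.59 (MSTS).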

Fig. 2

Kaplan-Meier analysis shows local recurrence-free survival (LRFS) of patients with osteosarcoma using the Birmingham classification. Birmingham 1a (≥ 90% chemotherapy necrosis, > 2-mm margins) 99% 5-year LRFS (95% CI, 91%–99.8%). Birmingham 1b (≥ 90% chemotherapy necrosis, ≤ 2-mm margins) 92% 5-year LRFS (95% CI, 83%–96%). Birmingham 2a (< 90% chemotherapy necrosis, > 2-mm margins) 84% 5-year LRFS (95% CI, 73%–91%). Birmingham 2b (< 90% chemotherapy necrosis, ≤ 2-mm margins) 76% 5-year LRFS (95% CI, 77%–83%).

Table 3 The Birmingham classification system for margins and chemotherapy-induced necrosis and corresponding local recurrence-free survival rates

Comparison of Birmingham With MSTS for Overall Survival

The 5-year OS for the patient cohort was 67% (95% CI, 61%–71%). After controlling for size and tumor location, amputation (HR, 1.53; 95% CI, 1.0–2.2; p = 0.02 versus limb salvage HR, 1), vascular invasion (HR, 2.2; 95% CI, 1.4–3.3; p < 0.0001 versus no vascular invasion HR, 1), and < 90% chemotherapy necrosis (HR, 3.1; 95% CI, 2.0–4.8; p < 0.0001 versus ≥ 90% chemotherapy necrosis HR, 1) were independent risk factors predicting OS (Table 4; Appendix 2 [Supplemental materials are available with the online version of CORR ®.]). Margins stratified by MSTS criteria were not predictive of overall survival (log-rank test p = 0.14), whereas the Birmingham classification showed differences in survival between Groups 1 and 2 divided by chemotherapy response (log-rank test p < 0.0001) but not between subdivisions by margin (Fig. 3). Adding necrosis to MSTS does improve prediction. The point of the classification is that margins combined with chemotherapy response predict outcome significantly better than margins alone and that chemotherapy response should always be part of the assessment.

Table 4 Multivariate Cox proportional hazard analysis for overall survival

Fig. 3

Kaplan-Meier analysis showing difference in overall survival (OS) between groups by the Birmingham classification compared with no differences in MSTS. Birmingham 1a (≥ 90% chemotherapy necrosis, > 2-mm margins) 86% 5-year OS (95% confidence interval [CI], 75%–92%). Birmingham 1b (≥ 90% chemotherapy necrosis, ≤ 2-mm margins) 85% 5-year OS (95% CI, 75%–91%). Birmingham 2a (< 90% chemotherapy necrosis, > 2-mm margins) 53% 5-year OS (95% CI, 42%–62%). Birmingham 2b (< 90% chemotherapy necrosis, ≤ 2-mm margins) 53% 5-year OS (95% CI, 44%–62%).

Discussion

Neoadjuvant chemotherapy response and surgical margins achieved after oncological resections are well-established prognostic factors for survival and LR in patients with osteosarcoma [4, 6]. There is a clear relationship between inadequate margins and local disease recurrence, although what is defined as “marginal” and “wide” excision remains inherently subjective and may vary depending on the reporter [21]. Existing surgical staging systems fail to reflect a response to chemotherapy or define an appropriate safe metric distance from the tumor that will allow complete excision and closely predict the risk of disease recurrence. With this large, observational study, we confirmed that surgical margins and chemotherapy response were associated with LR. By incorporating surgical margins and response to chemotherapy in a new system, we were able to predict LR and OS better than the commonly used MSTS margin criteria.

We acknowledge that there are several limitations to our study. Although this study had a relatively large number of patients, the chemotherapy regimens varied somewhat over the time of the study and both the length and intensity of preoperative chemotherapy will likely change the overall percentage of patients with good or standard responses as has been shown in other studies [17]. Also, the determination of percent necrosis is subjective and may vary even between pathologists with extensive experience with bone sarcomas. Our cutoff of a 2-mm margin was not different from the other metric margins (Harrell’s C statistics Birmingham 0.68, 1 mm 0.65, 3 mm 0.61), but we chose it because it was the best predictor of LR when taken along with chemotherapy response in our series and is an attainable target for sarcoma surgeons. With the numbers we had, there was no difference between 2-mm margins and clear/contaminated margins (Harrell’s C statistic 0.66) in combination with chemotherapy, but with a larger number of patients, this may become statistically significant. Bertrand et al. in a study of 51 patients with osteosarcomas used negative margins > 1 mm versus positive margins to predict LR [5].

The proportion of patients experiencing LR in our study (12%) is comparable to that in previous studies [4, 23], and our findings confirm that both response to chemotherapy and the surgical margin are predictive of LR (Tables 2, 4). Bacci et al. in a study of 789 nonmetastatic osteosarcomas found that inadequate surgical margins and poor response to chemotherapy were associated with local recurrence [4]. Andreou et al. attempted to define the width of surgical margin in the soft tissue periphery after resection of osteosarcomas but had significant missing data and could not correlate this with local recurrence [1].

The Birmingham classification incorporates only two predictors of LR. The authors acknowledge, and have shown through the multivariate analyses in this article, that only chemotherapy response and margins were associated with LR. An ideal classification should be simple, reproducible, and clinically relevant, and the authors feel this system could be taken forward and, if adopted by a major organization such as the International Society of Limb Salvage (ISOLS), could lead for the first time to a universal, simple classification of defined surgical margins in osteosarcoma. This system has only been modeled on osteosarcoma margins, and application to other tumor types (such as Ewing’s sarcoma) requires further investigation because it may not be applicable. Although the recording of the response to neoadjuvant chemotherapy is standard practice in the majority of centers treating patients with osteosarcomas of bone, the recording of surgical margins is variable. Currently, there is no consensus on how surgical margins for sarcomas are reported in different centers worldwide, although the MSTS staging remains popular with orthopaedic oncologists [16]. Our results, however, demonstrate that using both chemotherapy response and surgical margin attained in millimeters is more predictive for the development of LR when compared with the MSTS criteria. We propose that the adoption of such a simple system incorporating measurable and reproducible variables and replacing the subjectivity of “wide,” “marginal,” or “microscopic tumor at margin” will allow standardization of treatment and monitoring, improve communication about cases, and aid future research. For this reason, we did not look at combining chemotherapy response with the four types of margins included in the MSTS classification to avoid using these subjective definitions of margins. Adding necrosis to the MSTS classification might have also increased the predictive value of that system.

The ability of functional imaging to predict response to chemotherapy in osteosarcomas becomes more vital with our findings, and several groups are investigating the role of PET-CT or functional MRI to preoperatively predict response to neoadjuvant chemotherapy, which may aid the preoperative planning of margins in light of this classification system [8, 14]. In particular, it may affect the decision as to the type of surgery planned if the margins are likely to be less than 2 mm and the chemotherapy response is poor because of the probable higher rate of LR after a closer margin. A more radical resection or amputation might be considered preferable in that circumstance. Whether this affects survival is still unclear because recent work has suggested that even amputation in this group of patients may not improve survival [20].

Because the classification system has been shown to predict survival and locally recurrent disease, it proves to be a useful surgical staging system; however, the authors are cognizant that the prime predictor of survival is chemotherapy response, as shown by the fact that the groups with ≥ 90% chemotherapy necrosis had the best survival regardless of margin status (Fig. 3). Bielack et al. noted that incomplete surgical margins compared with macroscopically complete surgery were related to survival in an analysis of 1702 patients with osteosarcoma [6]. In their study, Bertrand et al. found that a positive margin compared with a negative margin > 1 mm was an independent predictor of local recurrence and overall survival [5]. However, as a result of low patient numbers, they were unable to show the effect of chemotherapy response on LR or indeed OS. Response to chemotherapy has otherwise been commonly associated with survival in several large osteosarcoma trials [4, 6, 23].

If our observations are confirmed by others in larger series, we propose that a combination of the recording of surgical margins in millimeters and the response to neoadjuvant chemotherapy should be the standard practice in oncology centers treating patients with osteosarcoma because we have shown that this combination can predict the risk of LR. Although intralesional versus tumor-free margins may indeed turn out to be discriminatory for LR in larger multicenter studies, the ideal of resecting the tumor with a clear margin of at least 2 mm of normal tissue in our series was a better predictor, and the authors strongly feel that a classification system that includes very close resection margins, ie, no tumor at the edge of the resection specimen, may lead to surgical errors and thus inadequate treatment, especially in those patients who are found to have a poor response to chemotherapy after surgery (25% risk of LR at 5 years). Conversely, the notion of not performing limb salvage if a margin of > 10 mm is not technically possible (a classic Enneking wide margin) has also been shown to be incorrect; a clear margin of 2 mm of normal tissue in a poor responding tumor has a 16% risk of LR at 5 years but without compromising survival. Further studies are required to determine whether further resection, secondary amputation, or adjuvant radiotherapy is the optimal treatment for the group of patients at higher risk of LR (Birmingham 2b).

We believe the Birmingham classification may represent an improvement on the ability to predict LR and survival in patients with osteosarcoma treated with neoadjuvant chemotherapy. If our observations are confirmed by others perhaps in a large prospective multicenter validation study undertaken by the ISOLS, it could offer useful prognostic information for treating oncologists and be helpful in advising patients.