Introduction

Treatments of oncological diseases of musculoskeletal system mostly aim to guarantee function and quality of life (QoL) at their best, especially when eradication of the neoplasia is not achievable. Although in the past most outcome studies focused on survival rate and local recurrences as primary outcomes, in the last decades more emphasis has been placed on patient’s perspective. It has been shown that taking into account patient perception promotes communication, improves decision-making process and increases patient satisfaction. Objective measures integrated with patient perception could provide better medical care [1,2,3].

The Musculoskeletal Tumor Society (MSTS) score was developed in 1993 as an objective tool to measure functional outcome in patients affected by neoplasms [4]. Even if the MSTS score has been never properly validated in its original version [4], it is widely used in clinical practice. As a matter of fact, the original version underwent cross-cultural adaptation and validation in several languages, such as Greek [5], Danish [6], Brazilian [7], Chinese [8], Japanese [9], French [10] and Turkish [11].

The score is available for upper and lower limb [12]. Main strengths of the score mainly rely on ease of use and briefness [13]. Main concern is that the MSTS score was formulated as an objective tool (hetero-administered), but it is currently worldwide used as a patient-reported outcome measure (PROM) (self-administered).

The main purpose of the present study is to evaluate if there is a difference between objective or subjective administration of the MSTS score in a cohort of patients affected by musculoskeletal oncological diseases. The hypothesis of the study is that there are no differences between patient- and clinician-reported outcomes using the MSTS score for both lower and upper limb.

Materials and methods

Study design

An observational study was conducted, after approval of the study protocol by the local ethic committee (NP 4912 Spedali Civili, Brescia).

Patients

All patients who underwent surgery for bone or soft tissue localization of neoplastic disease in lower or upper limb from June 2015 to June 2020 at Spedali Civili in Brescia, Italy, were considered eligible for the study. Patients were included regardless of previous treatment and disease stage. Onco-emathologic diseases were also included. Inclusion criteria also included: Italian as mother language, age of 18 years or above, minimum twelve-month follow-up from surgery, willingness to enter the study and ability to provide informed consent. Patients with a Karnofsky’s score lower than 30% [14], those who did not undergo surgical treatment and those who had diagnosis of dementia (any type) or were in a state of altered metal status were excluded.

Intervention

In order to administer the score as a PROM, the first part of the study consisted of translation and cross-cultural adaptation of an Italian version of the questionnaire according to well-established guidelines [15, 16]. Questionnaires were then administered during the postoperative follow-up visits in an outpatient setting. Thirty to sixty minutes after completion of self-administered questionnaires, patients underwent an interview by an orthopedic surgeon, based on the MSTS questionnaire. The examiner was blinded to the patients’ answers at the self-administration of the questionnaire. Retest was conducted after a period of two weeks after first administration in order to avoid any recall bias.

Outcome measures

Besides the Italian version of MSTS questionnaire, all patients filled out the national validated version of the Toronto Extremity Savage Score (TESS) [17]. Each patient completed the upper or the lower limb version of both MSTS and TESS score, depending on the localization of the disease. The national validated version of the SF-36 [18] was used as general health measurement.

The MSTS questionnaire [4] consists of six domains, each scored on a scale from 0 to 5, with a higher score indicating better function. The total score, ranging from 0 (maximum disability) to 30 (no impairment), can be transformed to a point scale of 0 to 100.

The TESS score assesses functional outcome in musculoskeletal tumor patients aged 12–85 years [19]. It consists of 29 items for upper extremity and 30 items for lower extremity. The degree of disability is rated from 0 (complete disability) to 5 (no functional impairment) in each item. Similar to MSTS, the final TESS score can be converted to a score ranging from 0 to 100 points.

The SF-36 is a non-pathology-related questionnaire aiming to test both physical and mental components of patient perception of QoL. It is composed of 36 questions divided into eight different domains. Each of these domains can be rated from 0 (worst) to 100 (best).

Data analysis

Sample size was estimated according to established guidelines for questionnaire validation [16, 17]. The Italian version of MSTS questionnaire was first test retested on at least 30 patients per group (upper and lower limb). Psychometric properties of the questionnaire were then assessed on a sample of 50 patients per group.

All the data were analyzed by SPSS 25 (IBM Statistics, Harmonk, NY, USA). Descriptive statistics were used to report scores and answers distribution for each question. Data normality was ascertained by Shapiro–Wilk test. Discrete data were expressed as mean ± standard deviation in case of normal data distribution, otherwise as median and interquartile range (IQR). Categories were expressed as frequencies and percentages.

Ceiling and floor effect were considered significant if more than 15% of patient reached the lowest or the highest possible score, respectively.

Content validity could not be tested through the multi-trait analysis because each question corresponds to a domain. The structure of the questionnaire was determined by the factor analysis. Factor’s number was calculated using Kaiser criteria (eigenvalue > 1) and a scree plot.

Construct validity was calculated through Pearson’s coefficient correlation. The Italian version of MSTS score was compared to TESS and SF-36 score. Correlation between objective and self-administered MSTS score was assessed through Pearson’s coefficient. Correlation was deemed as very weak when ranging 0 to 0.19, weak if between 0.20 and 0.39, mild if between 0.40 and 0.69, strong if between 0.70 and 0.89 and very strong if between 0.90 and 1 [15].

Reliability was assessed by internal consistency and test–retest reliability. Cronbach’s alpha coefficient measured internal consistency for every domain. Internal consistency higher than 0.70 indicates good reproducibility [20]. Intraclass correlation coefficient (ICC) measured test–retest reliability. ICC values ranged between 0 (absolute disagreement) and 1 (maximum disagreement). Values were interpreted as follows: poor reliability when less than 0.50, moderate when between 0.50 and 0.75, good when between 0.75 and 0.90 and excellent when greater than 0.90 [21].

Significance at probability tests was estimated for p value < 0.05.

Results

No major issues were encountered during translation from the original version. No major difficulties in comprehension were revealed during testing the pre-final version. Patients took about 5–10 min to complete the questionnaire (see Appendix 1).

The psychometric properties were tested on a finale sample of 110 patients: 59 affected by lower extremity involvement and 51 affected by upper extremity involvement. Patients’ characteristics are shown in Tables 1 and 2, for upper and lower limb, respectively.

Table 1 Upper limb involvement: patients characteristics
Table 2 Lower limb involvement: patients characteristics

Psycometric properties of MSTS score for upper extremity

The descriptive statistics data are shown in Tables 3 (self-administered) and 4 (hetero-administered). No missing data were reported, thus confirming that the translated questionnaire was well understood by the patients. A ceiling effect > 15% was observed for all items in both administration modalities.

Table 3 MSTS upper extremity (self-administered)
Table 4 MSTS upper extremity (hetero-administered)

Factor analysis, as indicated in the scree plots (Fig. 1), showed that the appropriate number of factors was 1. This was visible in both the self-administered and hetero-administered modalities.

Fig. 1
figure 1

A MSTS upper extremity (self-administered). Scree plot for factor analysis. B MSTS upper extremity (hetero-administered). Scree plot for factor analysis

The assessment of construct validity (Appendix 2) showed that both administration modalities had overall good correlation with TESS (r = 0.78 and r = 0.80 for the self-administered version and for the hetero-administered version, respectively). On the opposite, both administrations showed poor correlation with SF-36: (r = 0.19 and and r = 0.1 for the self-administered version and for the hetero-administered version, respectively).

Internal consistency was as high as Cronbach’s alpha = 0.84 for self-admnistered MSTS and 0.77 for hetero-administered MST versions (Appendix 3). Test–retest reliability was good in both versions, with an overall ICC of 0.84 for self-administered version and 0.78 for hetero-administered version (Appendix 4).

Correlation between self-administered and hetero-administered version of the questionnaire was high (r = 0.96) (Table 5).

Table 5 MSTS Upper extremity

Psycometric properties of MSTS score for lower extremity

Descriptive statistics are shown in Tables 6 (self-administered) and 7 (hetero-administered). No missing data were reported. A ceiling effect > 15% was observed for all items in both administration modalities.

Table 6 MSTS Lower extremity (self-administered)
Table 7 MSTS Lower extremity (hetero-administered)

Factor analysis, as indicated in the scree plots (Fig. 2), showed that the appropriate number of factors is 1. This was confirmed for both the self-administered and the hetero-administered modalities.

Fig. 2
figure 2

A MSTS lower extremity (self-administered). Scree plot for factor analysis. B MSTS lower extremity (hetero-administered). Scree plot for factor analysis

Assessment of construct validity showed that both administration modalities had overall strong correlation with TESS, as high as r = 0.80 for both versions, and mild correlation with SF-36, equal to r = 0.5 for both versions (Appendix 5).

Internal consistency was very high for both self- and hetero-administered MSTS versions. Self-administered version showed an overall Cronbach’s alpha = 0.96, while the hetero-administered version reached an overall Cronbach’s alpha = 0.98 (Appendix 6). Test–retest reliability was excellent in both versions: ICC = 0.96 for the self-administered version and ICC = 0.98 for the hetero-administered version (Appendix 7).

The correlation between self-administered and hetero-administered version of the questionnaire was as high as r = 0.97 (Table 8).

Table 8 MSTS Lower extremity. Correlation between self-administered (A) and hetero-administered (E) versions of the questionnaire

Discussion

Main finding of the present study was that MSTS score can be interchangeably used as a PROM or as an objective tool because both administration modalities showed to be valid and the correlation between the two was very high. At the same time, it must be highlighted that the Italian version of the MSTS score showed good psychometric properties for both lower and upper extremity.

MSTS score has been originally developed as a clinician-administered questionnaire and it is now widely used to evaluate residual function in patients with extremity tumors [4]. However, even if it has been developed to be completed by an examiner, MSTS score is often reported in the literature as a self-administered tool [22] or even as a mixed version with some questions completed by the patient and some others completed by the clinician.

Marchese et al. [23] and Ginsberg et al. [24] assessed functional outcomes in patients affected by lower extremity sarcomas. Both studies used the MSTS score by asking patients to complete pain, emotional acceptance and supports, while physical therapists rated gait and walking abilities.

Janssen et al. [25] first compared self- and hetero-administered modality of the original version, in a study about functional outcome after surgery in patients affected by lower and upper extremity bone metastasis. According to the authors, clinician reports overestimate function as compared to the patient perceived score. This assumption strongly differs from the outcome of the present study. The reason probably relies in the study design. As recognized by the authors [25], they collected the clinician reports by resuming information from previously noted medical records, but the score was developed to be completed at the time of consultation. As a matter of fact, the discrepancy was the largest for the common overall function and emotional acceptance domains. In the present study, patients were directly visited and interviewed by the clinician, who filled out the form during the clinical examination, thus reducing the risk of possible misinterpretations.

Looking at the results of the psychometric properties, some issues deserve further explanation.

The original version of MSTS score was never properly validated, therefore results of the present study can be only compared to other cross-cultural adaptations [5,6,7,8,9,10,11, 26]. We observed that ceiling effect was high for all questions both for upper and lower extremity forms, which it means that the questionnaire cannot distinguish higher functioning patients. This finding is in agreement with previous studies [6, 7, 26, 27]. At least two possible explanations can be attempted. First, MSTS score was developed in a time when limb savage surgery and reconstructive options were less common, and expectations on functional results were quite low. Secondly, a sensitivity analysis aiming to distinguish between hystotypes or at least between aggressive, intermediate and benign tumors could have probably lowered or partially better explained the effect. Saebye et al. [6], in their study on cross-cultural adaptation and validation of Danish version of MSTS score, found no ceiling effect among patients with lower extremity bone sarcomas or high aggressive tumors after stratification. Rebolledo et al. [7] provided cross-cultural adaptation and validation of the Brazilian Portuguese MSTS score. They only included patients who underwent limb salvage surgery or amputation for primary sarcoma of the lower limb, and no ceiling effect was reported. Unfortunately, sample size of the present study did not allow stratification for hystotype, albeit a strong and valid outcome measurement tool should be as universal as possible. Inclusion criteria of the present study were kept wide on purpose. In fact, MSTS score is widely used for any kind of tumor, and therefore, it was deemed important to be as inclusive as possible.

In agreement with previous translations of MSTS questionnaire [5, 7,8,9, 26], we observed that the instrument composed by all six items is able to evaluate one latent factor (e.g., lower or upper limb function).

TESS and SF-36 were chosen to test the construct validity to be consistent with the previously translated versions [5, 7,8,9, 11, 26]. MSTS and TESS reported moderate to strong correlation in all studies. Results differed and became controversial when it comes to SF-36. While some studies [9, 11, 26, 27] showed better correlation with the physical component of SF-36, some others did not [5]. However, the SF-36 is a tool designed and widely used as a general health and health-related QoL assessment measurement, thus this controversial correlation can be easily understood.

In terms of reliability, we found a Cronbach’s alpha coefficient > 0.95 for both upper and lower extremity in each administration modality. A general accepted rule is that Cronbach’s alpha of 0.6–0.7 indicates an acceptable level of reliability, and 0.8 or greater a very good level. However, values higher than 0.95 are not necessarily good, since they might be an indication of redundance [28]. Overall, previous cross-cultural adaptations of MSTS score showed a Cronbach’s alpha between 0.70 and 0.90. Once again, the different values of Cronbach’s alpha in the present study are possibly due to indirect influence from external factors such as heterogeneity of study population [20].

The present study has some limitations. First, as already mentioned, a sensitivity analysis could have been clarified some controversial psychometric properties. However, it must be highlighted that results are comparable to other cross-cultural adaptations of the questionnaire. Therefore, the major flaw is probably that the original version has never been properly tested. At the same time, as the MSTS score is the most popular and widely used, it was mandatory to provide an Italian version. The added value of the study relies on its main purpose: the comparison between self- and hetero-administration of the score. Second, only patients attending the outpatient clinic were asked to participate in the study. It must be considered that somehow patients with progressive disease or unsatisfied patients are less likely to be available for testing in the same setting.

In conclusion, the Italian version of the MSTS is a valid tool to evaluate outcomes of surgical treatment of patients affected by extremities tumors and it can be used as a subjective tool for both lower and upper extremity.