Reliability and Validity of a Japanese‐language and Culturally Adapted Version of the Musculoskeletal Tumor Society Scoring System for the Lower Extremity

Background The Musculoskeletal Tumor Society (MSTS) scoring system is a widely used functional evaluation tool for patients treated for musculoskeletal tumors. Although the MSTS scoring system has been validated in English and Brazilian Portuguese, a Japanese version of the MSTS scoring system has not yet been validated. Questions/purpose We sought to determine whether a Japanese‐language translation of the MSTS scoring system for the lower extremity had (1) sufficient reliability and internal consistency, (2) adequate construct validity, and (3) reasonable criterion validity compared with the Toronto Extremity Salvage Score (TESS) and SF‐36 using psychometric analysis. Methods The Japanese version of the MSTS scoring system was developed using accepted guidelines, which included translation of the English version of the MSTS into Japanese by five native Japanese bilingual musculoskeletal oncology surgeons and integrated into one document. One hundred patients with a diagnosis of intermediate or malignant bone or soft tissue tumors located in the lower extremity and who had undergone tumor resection with or without reconstruction or amputation participated in this study. Reliability was evaluated by test‐retest analysis, and internal consistency was established by Cronbach's alpha coefficient. Construct validity was evaluated using the principal factor analysis and Akaike information criterion network. Criterion validity was evaluated by comparing the MSTS scoring system with the TESS and SF‐36. Results Test‐retest analysis showed a high intraclass correlation coefficient (0.92; 95% CI, 0.88‐0.95), indicating high reliability of the Japanese version of the MSTS scoring system, although a considerable ceiling effect was observed, with 23 patients (23%) given the maximum score. Cronbach's alpha coefficient was 0.87 (95% CI, 0.82‐0.90), suggesting a high level of internal consistency. 
Factor analysis revealed that all items had high loading values and communalities; we identified a central role for the items “walking” and “gait” according to the Akaike information criterion network. The total MSTS score was correlated with that of the TESS (r = 0.81; 95% CI, 0.73‐0.87; p < 0.001) and the physical component summary and physical functioning of the SF‐36. Conclusions The Japanese‐language translation of the MSTS scoring system for the lower extremity has sufficient reliability and reasonable validity. Nevertheless, the observation of a ceiling effect suggests poor ability of this system to discriminate among patients who have a high level of function.


Introduction
Advances in the treatment of musculoskeletal tumors, including accurate radiographic diagnostic tools, more effective and precise chemotherapeutic regimens, and various alternative reconstruction techniques, have led to a change in paradigm among musculoskeletal oncology surgeons; now, the goal of treatment is not only to save lives but also to improve quality of life. Extirpation of tumor tissue is expected to cause some degree of functional impairment; therefore, efforts should be made to decrease such impairment. Several studies have reported functional outcomes and impairments after musculoskeletal tumor treatment [9,10,14,16,17,19,21].
The Musculoskeletal Tumor Society (MSTS) scoring system was developed in 1985 and revised in 1993 as a physician-derived tool to measure a patient's functional outcome and quality of life after musculoskeletal tumor treatment [5]. Since then, this system has been used in numerous studies to evaluate functional outcomes [9,10,14,16,17,19,22], making it one of the most widely used functional evaluation tools. The original scoring system was written in English and subsequently has been translated into Brazilian Portuguese and Japanese. Although the original and Brazilian Portuguese versions have been validated [8,13,22], the Japanese-language and cross-culturally adapted version of the MSTS scoring system has not yet been validated, which is a considerable shortcoming in light of the substantial volume of musculoskeletal oncology research now being done in Japan. Moreover, several tools are available for analyzing physical function or health-related quality of life, such as the Toronto Extremity Salvage Score (TESS) [4] or SF-36 [3], which are patient-derived assessments. Few studies have shown a correlation between the MSTS scores and those of the two other methods [7,8,12].
Therefore, we performed a validation analysis of the Japanese-language translation of the MSTS scoring system for the lower extremity, focusing on psychometric characteristics; specifically, we sought to determine whether the MSTS scoring system had (1) sufficient reliability and internal consistency, (2) adequate construct validity, and (3) adequate criterion validity compared with the TESS and SF-36.

Materials and Methods
This study was designed as a cross-sectional study, and study approval was obtained from the institutional review boards of the participating institutes. Patients meeting the following eligibility criteria were included in the study: (1) a diagnosis of intermediate or malignant bone or soft tissue tumors located in the lower extremity or pelvic girdle, according to the 2013 WHO classification [6]; (2) age between 12 and 85 years; (3) a minimum interval of 6 months after the most recent definitive surgery; and (4) confirmed absence of local recurrence or distant metastasis after definitive surgery. Patient recruitment was conducted from August 2014 to December 2014, and 100 patients agreed to participate (Table 1).
The MSTS scoring system is the most widely used physician-completed scoring system [5] and is based on an analysis of factors pertinent to the patient as a whole and of those specific to the affected limb. It contains six items: pain, function, emotional acceptance, use of any external support, walking ability, and gait. These items were included in 1983 based on recommendations of a committee of the International Symposium in Limb Salvage and after modifications by the MSTS in 1993 [5]. Each of these items is assigned a value of 0 to 5 points, and the sum of the six item values is divided by the maximum possible number of points (30 points). The final score is obtained by multiplying this ratio by 100, so that it ranges from 0 to 100.
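The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in the text (six items, 0 to 5 points each, total divided by 30 and multiplied by 100); the function name `msts_percent` and the dictionary keys are hypothetical and are not part of the MSTS instrument.

```python
# Illustrative sketch of the MSTS percentage score described in the text.
# The function name and item keys are assumptions made for this example.
MSTS_ITEMS = ("pain", "function", "emotional_acceptance",
              "support", "walking", "gait")
MAX_POINTS = 5 * len(MSTS_ITEMS)  # 30 points


def msts_percent(item_scores):
    """Return the MSTS score (0-100) from a dict of six item values (0-5)."""
    if set(item_scores) != set(MSTS_ITEMS):
        raise ValueError("exactly the six MSTS items are required")
    for name, value in item_scores.items():
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must be between 0 and 5")
    total = sum(item_scores.values())
    return total / MAX_POINTS * 100


# A patient rated 5 on every item reaches the maximum score.
best = {name: 5 for name in MSTS_ITEMS}
print(msts_percent(best))  # 100.0
```

A patient with a total of 25 points, for example, would receive a score of about 83.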

Translation and Cross-cultural Adaptation
The Japanese version of the MSTS scoring system approved by the Japanese Orthopaedic Association Musculoskeletal Tumor Committee was made available to members in 2010. To develop the version, a translation to Japanese was prepared along with a cross-cultural adaptation of the MSTS scoring system.
The English version of the MSTS scoring system was translated separately by five native Japanese bilingual musculoskeletal oncology surgeons with proficiency in Japanese and English. Because each sentence was relatively simple, no professional medical interpreter was used.
Subsequently, all independent translations were compared and combined into one document. Back-translation was not performed; however, the final version was approved by all translators. During the translation, all translators agreed that no modification was required for cross-cultural adaptation because each item description in the original version fit the Japanese lifestyle well and appeared appropriate. Since the release of the Japanese version, we have been using it in the clinical setting, and so far, we have had no difficulties with interpretation of the questionnaire.

Psychometric Characteristics
To validate the Japanese version of the MSTS scoring system, psychometric analysis of 100 patients was performed. Investigators who were not surgeons but were research nurses or therapists evaluated each patient's score during an interview in our clinic.
Reliability, referring to the consistency or reproducibility of measurement, was evaluated by test-retest analysis, which consists of two administrations of the same test to the same patient on two different occasions. A second survey was performed for the same patient 2 to 5 weeks after the first survey by either the same investigator or a different investigator. The test-retest reliability of the MSTS scoring system was assessed by calculating the intraclass correlation coefficient (ICC). The ICC was calculated on the basis of the responses of the first (test) and second (retest) surveys for each item and the total score. In general, an ICC greater than 0.9 is regarded as an index of high reproducibility. In addition to test-retest analysis, floor and ceiling effects were calculated for each item and the total score. These effects were considered present if greater than 15% of the respondents achieved the lowest (floor effect) or highest (ceiling effect) number of points [18].
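As an illustration of the test-retest calculation, the following Python sketch computes one common form of the ICC from paired test and retest scores. The text does not state which ICC model was used, so the two-way random-effects, absolute-agreement, single-measurement form (often written ICC(2,1)) is an assumption here, and the function name `icc_2_1` is hypothetical.

```python
# Minimal sketch of a test-retest ICC, assuming the ICC(2,1) form
# (two-way random effects, absolute agreement, single measurement).
# The source does not specify the ICC model; this choice is an assumption.
def icc_2_1(scores):
    """scores: list of rows, one per patient, e.g. [test, retest]."""
    n = len(scores)          # number of patients
    k = len(scores[0])       # number of administrations (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Identical test and retest scores give perfect agreement.
print(icc_2_1([[10, 10], [20, 20], [30, 30]]))  # 1.0
```

Because this form measures absolute agreement, a systematic shift between the two administrations lowers the coefficient even when the ranking of patients is unchanged.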
Validity refers to how well a test measures what it claims to measure. In this study, we used three types of analysis for validity. First, internal consistency, which reflects the strength of the relationship among the six items in the system, was identified by calculating Cronbach's alpha coefficient. A coefficient of 0.80 was used as the cutoff indicating sufficient internal consistency [11]; that is, a coefficient less than 0.80 suggests that the system includes inconsistent items. Second, construct validity, which refers to the degree to which the system assesses the underlying theoretical construct it is supposed to measure, generally is evaluated by principal component analysis, a statistical method that examines the latent structure of the six items in the system. The number of factors to take into account was determined by Kaiser's criterion (eigenvalue > 1 rule) and a scree plot. Next, principal component analysis followed by repeated varimax rotations was done to calculate each item's factor loading, which represents how much the item explains a variable. Factor loadings can range from −1 to 1, and values close to −1 or 1 indicate that the item strongly affects the variable. In addition to this conventional method, the degree of correlation among the items in the system was evaluated using the Akaike information criterion (AIC) network to examine the latent structure of the system's construct validity; this is a graphic modeling method used to assess the relationship among the items [1]. The Categorical Data Analysis Program (Institute of Statistical Mathematics, Tachikawa, Japan) was used to perform crosstable analyses involving all combinations of the two items in the MSTS system. The program simultaneously searched for the best subset and categorization of explanatory items and automatically indicated matching combinations using the AIC [15].
Table 1 (fragment): Resection + prosthesis 11 (11); Resection + biological reconstruction 14 (14); Curettage 5 (5)
Third, criterion validity, which measures how well one measure predicts an outcome for another measure, was evaluated by comparing the Japanese version of the MSTS scoring system with evaluation systems already adequately validated, including the TESS and SF-36. The TESS is a disease-specific, self-assessment questionnaire developed for patients with musculoskeletal tumors in the extremities [4]; similarly, the SF-36 is a self-assessment questionnaire for comprehensive evaluation of health-related quality of life [3]. All patients participating in this study completed the TESS and SF-36 questionnaires at the same time as the first analysis of the MSTS score. The correlation of these measures was assessed using Spearman's correlation coefficient.
Total scores generated by the Japanese version of the MSTS scoring system for all 100 patients ranged from 13 to 100, and the mean score was 82 (SD, 20.4). There were no missing data in any of the surveys.
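The internal consistency measure described in this section, Cronbach's alpha, can be sketched with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of items. The code below is an illustrative sketch only; the function name `cronbach_alpha` is hypothetical, the example data are synthetic, and the 0.80 cutoff is the one stated in the text.

```python
# Illustrative sketch of Cronbach's alpha using the standard formula.
# Function name and example data are assumptions for this example.
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per patient (six items for the MSTS)."""
    k = len(rows[0])
    # Sample variance of each item across patients
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the per-patient total score
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Synthetic data in which every item agrees perfectly within each patient.
perfectly_consistent = [[v] * 6 for v in (1, 2, 3, 4, 5)]
alpha = cronbach_alpha(perfectly_consistent)
print(alpha >= 0.80)  # True: meets the cutoff used in the text
```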

Statistical Analysis
All statistical analyses were performed using SPSS version 18.0 (SPSS Inc, Chicago, IL, USA). The scores were reported as mean values ± SD. The threshold for significance was set at a probability less than 0.05.

Reliability and Floor and Ceiling Effects
The ICC between the test and retest of the total score obtained by the Japanese version of the MSTS scoring system was 0.92 (95% CI, 0.88-0.95), indicating high reliability of the system. “Support” showed the highest ICC (ICC, 0.93; 95% CI, 0.90-0.95), whereas “emotional acceptance” had the lowest ICC (ICC, 0.69; 95% CI, 0.57-0.78) (Table 2). No patients obtained the lowest possible total score of 0, indicating the absence of floor effects. In contrast, the highest possible total score of 100 was observed in 23 patients (23%). This suggests a considerable ceiling effect for the total score of the Japanese version of the MSTS scoring system. The same trend was observed for each item; therefore, all items revealed ceiling effects with an absence of floor effects.
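The floor and ceiling criterion applied here (an effect is present when more than 15% of respondents reach the lowest or highest possible score) can be sketched as follows. The function name `floor_ceiling` is hypothetical. Only the counts reported in the text are used (23 of 100 patients at the maximum, none at the minimum); the value 50 assigned to the remaining 77 patients is a placeholder and not study data.

```python
# Illustrative check of floor and ceiling effects with the 15% criterion.
# Function name is an assumption; individual patient scores are not
# available in the text, so placeholder values are used below.
def floor_ceiling(scores, lowest=0, highest=100, threshold=0.15):
    n = len(scores)
    floor_fraction = sum(s == lowest for s in scores) / n
    ceiling_fraction = sum(s == highest for s in scores) / n
    return {
        "floor_fraction": floor_fraction,
        "ceiling_fraction": ceiling_fraction,
        "floor_effect": floor_fraction > threshold,
        "ceiling_effect": ceiling_fraction > threshold,
    }


# 23 patients at the maximum score; 77 placeholder values elsewhere.
total_scores = [100] * 23 + [50] * 77
result = floor_ceiling(total_scores)
print(result["ceiling_effect"], result["floor_effect"])  # True False
```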
We also performed a sensitivity analysis to clarify whether the patients with intermediate tumors (who underwent less invasive surgery) contributed to the observed ceiling effect (Fig. 1). The results showed that the average MSTS score of the patients with intermediate tumors was 95, whereas that of the patients without intermediate tumors was 80.
The overall Cronbach's alpha coefficient was 0.87 (95% CI, 0.82-0.90), suggesting a high level of internal consistency (Table 2).

Construct Validity
On the basis of the results of principal component analysis, the construct validity of the Japanese version of the MSTS scoring system is high. First, we used a scree plot to determine how many factors to extract in the factor analysis (Fig. 2). The scree plot displays the eigenvalues (the amount of variation in the total sample accounted for by each factor) in descending order of magnitude against the factor number. A sharp break in the plot suggests the optimal number of factors. According to this result, the appropriate number of factors was considered to be one. On the basis of the factor-loading pattern, all items had high loading values and communalities for the first factor (Table 3). The AIC network, which is another index of construct validity, identified “walking” and “gait” as having a central role among the six items of the system. Minimal distance assortments, that is, degrees of independence, for the two-item groupings were observed, and the AIC network of these six items was displayed graphically with the spatial association of the calculation of each item (Fig. 3). This AIC network showed that “walking” and “gait” were related to all the other items; however, the other four items were unrelated to each other. These results were consistent with the highest loading values for “walking” and “gait” in the factor analysis.
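The factor-count step (Kaiser's eigenvalue > 1 rule applied to the eigenvalues that a scree plot displays) can be sketched with NumPy. This is an illustration under stated assumptions and not the SPSS procedure used in the study: the function name `kaiser_factor_count` is hypothetical, and the data are synthetic, generated so that six items share one common factor.

```python
# Illustrative sketch of Kaiser's criterion on the item correlation matrix.
# The data are synthetic (not study data) and the function name is assumed.
import numpy as np


def kaiser_factor_count(item_scores):
    """item_scores: array of shape (n_patients, n_items)."""
    x = np.asarray(item_scores, dtype=float)
    corr = np.corrcoef(x, rowvar=False)           # item-by-item correlations
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # descending, as in a scree plot
    n_factors = int((eigenvalues > 1.0).sum())    # eigenvalue > 1 rule
    return eigenvalues, n_factors


# Synthetic example: six items driven by a single common factor plus noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
items = common + 0.5 * rng.normal(size=(200, 6))

eigenvalues, n_factors = kaiser_factor_count(items)
print(n_factors)  # 1
```

Plotting `eigenvalues` against the factor number would give the scree plot; a sharp drop after the first eigenvalue corresponds to a one-factor structure.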

Criterion Validity
Criterion validity analysis confirmed that the total score of the Japanese version of the MSTS scoring system was highly correlated with that of the TESS and the physical component summary and physical functioning of the SF-36. The criterion validity was evaluated by correlating the total score of the system with the total score of the TESS and each of the components of the SF-36 (Table 4). The total score of the MSTS system correlated well with that of the TESS (r = 0.81; 95% CI, 0.73-0.87; p < 0.001). In addition, the physical component summary and physical functioning of the SF-36 correlated with the total MSTS score and physical items of the MSTS system (function, support, walking, and gait). However, the mental component summary of the SF-36 did not correlate with the total MSTS score (r = 0.04; 95% CI, −0.15 to 0.24) or any of the items of the MSTS scoring system.
Fig. 1 The sensitivity analysis shows the high performance of the Japanese version of the MSTS scoring system especially in patients with lower function. The average MSTS scores for the patients who underwent less invasive surgery are shown to the right of the bars, and those of the patients who underwent invasive surgery are shown to the left.
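For readers who want to see how a confidence interval of this kind can arise, the sketch below applies the Fisher z-transformation to the reported correlation between the MSTS and TESS total scores (r = 0.81, n = 100). The text does not state how the intervals were computed, so the use of the Fisher transformation with a standard error of 1/√(n − 3) is an assumption, and the function name `fisher_ci` is hypothetical.

```python
# Illustrative 95% CI for a correlation coefficient via Fisher's z.
# The method is an assumption; the source does not say how its CIs
# were obtained.
import math


def fisher_ci(r, n, z_crit=1.96):
    z = math.atanh(r)              # Fisher z-transformation
    se = 1 / math.sqrt(n - 3)      # approximate standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


lower, upper = fisher_ci(0.81, 100)
print(round(lower, 2), round(upper, 2))  # 0.73 0.87
```

Under this assumption the interval for r = 0.81 agrees with the reported 0.73 to 0.87; other intervals in the study may have been obtained differently.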

Discussion
The MSTS scoring system is a widely used functional evaluation tool for patients undergoing musculoskeletal tumor resection. The Japanese-language version of the MSTS scoring system was made available to members in 2010. The MSTS scoring system has been validated twice [8,20]; however, the Japanese version has not yet been validated. We aimed to clarify the validity of the Japanese-language version of the MSTS scoring system using psychometric analysis. Our study showed that the Japanese version has sufficient reliability, with a high ICC by test-retest analysis but also a high ceiling effect; favorable internal consistency, with a high Cronbach's alpha coefficient; adequate construct validity, indicated by factor analysis and the AIC network; and reasonable criterion validity compared with the TESS and SF-36.
Our study has several limitations. First, the study was performed with Japanese patients using a Japanese version of the MSTS scoring system. The original MSTS scoring system was officially introduced in Japan in 2000 by the Japanese Orthopaedic Association [20], and the Japanese version was developed by the Japanese Orthopaedic Association Tumor Committee in 2010. Intercultural differences might affect the outcomes; however, the items included in the MSTS scoring system appear to be basic factors. Therefore, there might be little or no difference between patients from Japan and those from other parts of the world regarding these items. Moreover, several indices of psychometric analyses, such as Cronbach's alpha coefficient and ICC, were similar to those reported in studies of patients of other ethnicities [4,8,13]. Second, we included patients who achieved relatively high MSTS scores compared with patients in other studies [2,7,8,13]. This may have been the case because a number of patients with intermediate tumors (based on the WHO classification of bone and soft tissue tumors [6], in terms of biological potential) were included.
Seventeen (17%) patients in our cohort had a diagnosis of an intermediate tumor, such as a well-differentiated liposarcoma (atypical lipomatous tumor) or giant cell tumor of bone, and these patients undergo smaller surgical procedures, such as marginal resection or tumor curettage, respectively, than do patients with malignant tumors who undergo wide resection, leading to superior function. In addition, we performed a sensitivity analysis to analyze the performance of the MSTS scoring system with and without patients who were high-functioning. The result indicated the inferior discrimination capability of the MSTS scoring system for patients with higher function. From these perspectives, this system may not be appropriate for patients undergoing less invasive surgery.
Sufficient reliability of the Japanese version of the MSTS scoring system was confirmed by psychometric analysis; however, we observed a considerable ceiling effect. In this study, 23% of the patients surveyed (23 of 100) achieved the highest possible score, indicating that the Japanese version of the MSTS scoring system has a ceiling effect. This is important because it may reflect low sensitivity in terms of discriminating among patients who have superior function. The ceiling effect was noted even among patients with a malignant tumor; however, the proportion of patients in this group achieving the highest score was lower than the proportion in the intermediate tumor group (20% and 35%, respectively). Nevertheless, the MSTS scoring system contains only six items, which represent basic functions or symptoms, and the Japanese version also achieved a high ICC and a high Cronbach's alpha coefficient. The effort required of patients to complete the MSTS questionnaire would be much less than that for the TESS (32 items) or SF-36 (36 items). In general, increasing the number of items makes a scale more reliable; the tradeoffs are lower convenience, more time consumed, and more effort required of patients and investigators. The six items in the MSTS system appear adequately reliable despite being concise. These advantages of the MSTS scoring system have been described, adding support to the validation of the system [8,13]. Psychometric analysis also supports adequate validity of the items included in the Japanese version of the MSTS scoring system. In the current study, construct validity analyses were performed to reveal the latent structure of the MSTS scoring system using principal component analysis and the AIC network. In other words, we performed those analyses to assess whether the Japanese version of the MSTS scoring system measures the functional outcome of patients with musculoskeletal tumors.
According to the results of the factor analysis, the system exhibits high unidimensionality, that is, the Japanese version of the MSTS scoring system can be used as a reliable scale. There was no item showing low factor loading, specifically so regarding items associated with physical activities, such as function, support, walking, and gait. In addition, the central roles of walking and gait identified in the AIC network supported the highest factor loading values for these parameters. Criterion validity showed that a high correlation with total scores of the MSTS scoring system was observed with the TESS and SF-36 physical component summary. Furthermore, these two components significantly correlated with the items associated with the MSTS components regarding the physical activities mentioned above.
Interpretation of the two nonphysical items, pain and emotional acceptance, is less clear in the context of the Japanese version of the MSTS score that we studied. These two items showed adequate, although relatively lower, factor loadings according to the factor analysis (0.68 and 0.57, respectively). In addition, criterion validity analysis showed no correlation between either of these two items and any component of the SF-36 or the total score of the TESS (Table 4). This indicates that these two items may be less appropriate than the physical items. The MSTS scoring system is a physician-reported questionnaire, and this type of assessment would not be suitable for evaluating the adequacy of treatment without the bias of the physician, especially regarding psychological factors. Another possible reason for the lower validity of emotional acceptance is a weakness in the questionnaire itself. This item comprises the expressions “enthused”, “satisfied”, “accepted”, and “dislike”, and these expressions appear not to be definitive. The TESS system, one of the most frequently used patient-oriented evaluation systems, was translated into Japanese in 2015 [12]. Accumulation of data on outcome evaluation from various aspects will be required in the future.
With better outcomes for patients with cancer achieved through improved cancer therapy, there will be greater demand for a comprehensive outcome assessment of treatment, including psychometric assessment and health-related quality of life. Through this study, we confirmed that the Japanese version of the MSTS scoring system is a reliable and adequate evaluation system for assessing the functional outcome in patients with musculoskeletal tumors. In addition, the MSTS scoring system is widely used, and continued accumulation of results is valuable. Therefore, it should be used as a standard tool in the future. However, we also found that the system is not sufficient for evaluation of nonphysical factors such as emotional acceptance. Therefore, a new or adjunct evaluation system should be developed and introduced that more completely and comprehensively assesses patient-derived outcomes and that is applicable to patients with higher function.