Reliability and Validity of a Japanese-language and Culturally Adapted Version of the Musculoskeletal Tumor Society Scoring System for the Lower Extremity
The Musculoskeletal Tumor Society (MSTS) scoring system is a widely used functional evaluation tool for patients treated for musculoskeletal tumors. Although the MSTS scoring system has been validated in English and Brazilian Portuguese, a Japanese version of the MSTS scoring system has not yet been validated.
We sought to determine whether a Japanese-language translation of the MSTS scoring system for the lower extremity had (1) sufficient reliability and internal consistency, (2) adequate construct validity, and (3) reasonable criterion validity compared with the Toronto Extremity Salvage Score (TESS) and SF-36 using psychometric analysis.
The Japanese version of the MSTS scoring system was developed using accepted guidelines: five native Japanese bilingual musculoskeletal oncology surgeons each translated the English version of the MSTS into Japanese, and the translations were integrated into one document. One hundred patients with a diagnosis of intermediate or malignant bone or soft tissue tumors located in the lower extremity and who had undergone tumor resection with or without reconstruction or amputation participated in this study. Reliability was evaluated by test-retest analysis, and internal consistency was established by Cronbach’s alpha coefficient. Construct validity was evaluated using the principal factor analysis and Akaike information criterion network. Criterion validity was evaluated by comparing the MSTS scoring system with the TESS and SF-36.
Test-retest analysis showed a high intraclass correlation coefficient (0.92; 95% CI, 0.88–0.95), indicating high reliability of the Japanese version of the MSTS scoring system, although a considerable ceiling effect was observed, with 23 patients (23%) given the maximum score. Cronbach’s alpha coefficient was 0.87 (95% CI, 0.82–0.90), suggesting a high level of internal consistency. Factor analysis revealed that all items had high loading values and communalities; we identified a central role for the items “walking” and “gait” according to the Akaike information criterion network. The total MSTS score was correlated with that of the TESS (r = 0.81; 95% CI, 0.73–0.87; p < 0.001) and the physical component summary and physical functioning of the SF-36.
The Japanese-language translation of the MSTS scoring system for the lower extremity has sufficient reliability and reasonable validity. Nevertheless, the observed ceiling effect suggests that this system discriminates poorly among patients who have a high level of function.
Advances in the treatment of musculoskeletal tumors, including accurate radiographic diagnostic tools, more effective and precise chemotherapeutic regimens, and various alternative reconstruction techniques, have led to a change in paradigm among musculoskeletal oncology surgeons; now, the goal of treatment is not only to save lives but also to improve the quality of life. Extirpation of tumor tissue is expected to cause some degree of functional impairment; therefore, efforts should be made to decrease such impairment. Several studies have reported functional outcomes and impairments after musculoskeletal tumor treatment [9, 10, 14, 16, 17, 19, 21].
The Musculoskeletal Tumor Society (MSTS) scoring system was developed in 1985 and revised in 1993 as a physician-derived tool to measure a patient’s functional outcome and quality of life after musculoskeletal tumor treatment. Since then, this system has been used in numerous studies to evaluate functional outcomes [9, 10, 14, 16, 17, 19, 22], making it one of the most widely used functional evaluation tools. The original scoring system was written in English and subsequently has been translated into Brazilian Portuguese and Japanese. Although the original and Brazilian Portuguese versions have been validated [8, 13, 22], the Japanese-language and cross-culturally adapted version of the MSTS scoring system has not yet been validated, which is a considerable shortcoming in light of the substantial volume of musculoskeletal oncology research now being done in Japan. Moreover, several tools are available for analyzing physical function or health-related quality of life, such as the Toronto Extremity Salvage Score (TESS) or SF-36, which are patient-derived assessments. Few studies have shown a correlation between the MSTS scores and those of the two other methods [7, 8, 12].
Therefore, we performed a validation analysis of the Japanese-language translation of the MSTS scoring system for the lower extremity, focusing on psychometric characteristics; specifically, we sought to determine whether the MSTS scoring system had (1) sufficient reliability and internal consistency, (2) adequate construct validity, and (3) adequate criterion validity compared with the TESS and SF-36.
Materials and Methods
Descriptive characteristics of the study population
Age, years; mean (SD): 48.6 (17.9); range, 14–82
Time from surgery, months; mean (SD): 47.3 (50.1); range, 7–229
Diagnosis: undifferentiated pleomorphic sarcoma; giant cell tumor of bone; malignant fibrous histiocytoma of bone
Type of surgery: limb salvage surgery; resection + prosthesis; resection + biological reconstruction
The MSTS scoring system is the most widely used physician-completed scoring system and is based on an analysis of factors pertinent to the patient as a whole and of those specific to the affected limb. It contains six items: pain, function, emotional acceptance, use of any external support, walking ability, and gait. These items were included in 1983 based on recommendations of a committee of the International Symposium in Limb Salvage and after modifications by the MSTS in 1993. Each of these items is assigned a value of 0 to 5 points. The item points are summed and divided by the maximum possible number of points (30 points), and the result is multiplied by 100 to give the final score as a percentage.
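The scoring arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the published instrument; the item identifiers and the function name `msts_percent` are hypothetical names chosen for this example.

```python
# Item names follow the six MSTS lower-extremity items listed in the text;
# the identifiers themselves are illustrative assumptions.
MSTS_ITEMS = ("pain", "function", "emotional_acceptance",
              "support", "walking", "gait")
MAX_ITEM_POINTS = 5  # each item is rated from 0 to 5 points


def msts_percent(item_points):
    """Return the MSTS score as a percentage of the 30-point maximum."""
    missing = [name for name in MSTS_ITEMS if name not in item_points]
    if missing:
        raise ValueError(f"missing items: {missing}")
    for name in MSTS_ITEMS:
        if not 0 <= item_points[name] <= MAX_ITEM_POINTS:
            raise ValueError(f"{name} must be between 0 and {MAX_ITEM_POINTS}")
    total = sum(item_points[name] for name in MSTS_ITEMS)
    max_total = MAX_ITEM_POINTS * len(MSTS_ITEMS)  # 30 points
    return total * 100 / max_total


# Hypothetical patient with 24 of 30 points
example = {"pain": 5, "function": 4, "emotional_acceptance": 3,
           "support": 5, "walking": 4, "gait": 3}
print(msts_percent(example))
```

A patient rated 5 on every item reaches the maximum of 100, which is the value counted when a ceiling effect is assessed.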
Translation and Cross-cultural Adaptation
The Japanese version of the MSTS scoring system approved by the Japanese Orthopaedic Association Musculoskeletal Tumor Committee was made available to members in 2010. To develop the version, a translation to Japanese was prepared along with a cross-cultural adaptation of the MSTS scoring system.
The English version of the MSTS scoring system was translated independently by five native Japanese bilingual musculoskeletal oncology surgeons proficient in Japanese and English. Because each sentence was relatively simple, no professional medical interpreter was used. Subsequently, the independent translations were compared and combined into one document. Back-translation was not performed; however, the final version was approved by all translators. During the translation, all translators agreed that no modification was required for cross-cultural adaptation because each item description in the original version fits the Japanese lifestyle well and appeared appropriate. Since the release of the Japanese version, we have been using it in the clinical setting, and so far we have had no difficulties with interpretation of the questionnaire.
To validate the Japanese version of the MSTS scoring system, psychometric analysis of 100 patients was performed. Investigators who were not surgeons but were research nurses or therapists evaluated each patient’s score during an interview in our clinic.
Reliability, referring to the consistency or reproducibility of measurement, was evaluated by test-retest analysis, which consists of two administrations of the same test to the same patient on two different occasions. A second survey was performed for the same patient 2 to 5 weeks after the first survey by either the same investigator or a different investigator. The test-retest reliability of the MSTS scoring system was assessed by calculating the intraclass correlation coefficient (ICC). The ICC was calculated on the basis of the responses of the first (test) and second surveys (retest) for each item and the total score. In general, an ICC greater than 0.9 is regarded as an index of high reproducibility. In addition to test-retest analysis, floor and ceiling effects were calculated for each item and total score. These effects were considered present if greater than 15% of the respondents achieved the lowest (floor effect) or highest (ceiling effect) number of points.
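The two reliability checks described above can be illustrated with a short Python sketch. The source does not state which form of the ICC was used, so the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is an assumption here. The function names are hypothetical, and the 15% threshold is the one given in the text.

```python
def icc_2_1(ratings):
    """ICC(2,1) from per-patient rows such as [(test, retest), ...].

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of administrations (2 for test-retest)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def floor_ceiling(scores, lowest, highest, threshold=0.15):
    """Flag floor/ceiling effects when more than `threshold` of the
    respondents achieve the lowest/highest possible score."""
    floor_share = sum(1 for s in scores if s == lowest) / len(scores)
    ceiling_share = sum(1 for s in scores if s == highest) / len(scores)
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Illustrative data only: 23 of 100 patients at the maximum total score
scores = [100] * 23 + [80] * 77
print(floor_ceiling(scores, lowest=0, highest=100))
```

With 23 of 100 patients at the maximum, the ceiling share exceeds the 15% threshold, so the ceiling effect is flagged while the floor effect is not.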
Validity refers to how well a test measures what it claims to measure. In this study, we used three types of analysis for validity. First, internal consistency, which reflects the strength of the relationship among the six items in the system, was identified by calculating Cronbach’s alpha coefficient. A coefficient of 0.80 was used as the cutoff indicating sufficient internal consistency; that is, the system is considered to include inconsistent items if the coefficient is less than 0.80. Second, construct validity, which refers to the degree to which the system assesses the underlying theoretical construct it is supposed to measure, generally is evaluated by principal component analysis, which is a statistical method examining the latent structure of the six items in the system. The number of factors to take into account was determined by Kaiser’s criteria (eigenvalue > 1 rule) and a scree plot. Next, principal component analysis followed by repeated varimax rotations was done to calculate each item’s factor loading, which represents how much the item explains a variable. Factor loadings can range from −1 to 1, and values close to −1 or 1 indicate that the item strongly affects the variable. In addition to this conventional method, the degree of correlation among the items in the system was evaluated using the Akaike information criterion (AIC) network to examine the latent structure of the system’s construct validity; this is a graphic modeling method used to assess the relationship among the items. The Categorical Data Analysis Program (Institute of Statistical Mathematics, Tachikawa, Japan) was used to perform crosstable analyses involving all combinations of the two items in the MSTS system. The program simultaneously searched for the best subset and categorization of explanatory items and automatically indicated matching combinations using the AIC.
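As an illustration of the internal-consistency step, the following Python sketch computes Cronbach’s alpha from item-level scores using the standard formula based on item variances and the variance of the total score. It is a sketch under stated assumptions rather than the authors’ SPSS procedure; the function name `cronbach_alpha` and the sample data are hypothetical, and the 0.80 cutoff is the one given in the text.

```python
from statistics import variance

ALPHA_CUTOFF = 0.80  # cutoff for sufficient internal consistency (from the text)


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per patient).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])  # number of items (six for the MSTS system)
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up scores for five patients on six items, for illustration only
sample = [
    (5, 5, 4, 5, 5, 4),
    (3, 3, 3, 2, 3, 3),
    (4, 4, 5, 4, 4, 3),
    (2, 1, 3, 1, 2, 1),
    (5, 4, 4, 5, 4, 4),
]
alpha = cronbach_alpha(sample)
print(alpha, alpha >= ALPHA_CUTOFF)
```

Alpha approaches 1 when the items move together across patients and falls toward 0 when they vary independently.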
Third, criterion validity, which measures how well one measure predicts an outcome for another measure, was evaluated by comparing the Japanese version of the MSTS scoring system with evaluation systems already adequately validated, including the TESS and SF-36. The TESS is a disease-specific, self-assessment questionnaire developed for patients with musculoskeletal tumors in the extremities; similarly, the SF-36 is also a self-assessment questionnaire for comprehensive evaluation of health-related quality of life. All patients participating in this study completed the TESS and SF-36 questionnaires at the same time as the first analysis of the MSTS score. The correlation of these measures was assessed using Spearman’s correlation coefficient.
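The correlation step can be sketched in Python as follows. Spearman’s coefficient is computed as the Pearson correlation of average ranks. The source does not say how the 95% confidence intervals were obtained, so the Fisher z-transformation with standard error 1/sqrt(n − 3) is an assumption; all function names are hypothetical.

```python
import math
from statistics import NormalDist, mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate CI for a correlation via the Fisher z-transformation.

    Assumed method; undefined for |r| == 1 and for n <= 3.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = fisher_ci(0.81, 100)
print(round(low, 2), round(high, 2))
```

With r = 0.81 and n = 100 this approximation gives an interval of about 0.73 to 0.87, which agrees with the interval reported for the TESS, although that agreement does not establish which method the authors used.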
The investigators were not surgeons but were research nurses or therapists. The investigators evaluated each patient’s score and interviewed each patient in our clinic throughout the study. Total scores generated by the Japanese version of the MSTS scoring system for all 100 patients ranged from 13 to 100, and the mean score was 82 (SD, 20.4). There were no missing data in any of the surveys.
All statistical analyses were performed using SPSS version 18.0 (SPSS Inc, Chicago, IL, USA). The scores were reported as mean values ± SD. The threshold for significance was set at a probability less than 0.05.
Reliability and Floor and Ceiling Effects
Summary of test-retest data
The overall Cronbach’s alpha coefficient was 0.87 (95% CI, 0.82–0.90), suggesting a high level of internal consistency (Table 2).
Factor loading of MSTS scoring system
Spearman’s correlation coefficients (95% CI) for the SF-36 physical component summary, mental component summary, physical functioning, role physical, bodily pain, general health, social functioning, role emotional, and mental health
The MSTS scoring system is a widely used functional evaluation tool for patients undergoing musculoskeletal tumor resection. The Japanese-language version of the MSTS scoring system was made available to members in 2010. The MSTS scoring system has been validated twice [8, 20]; however, the Japanese version has not yet been validated. We aimed to clarify the validity of the Japanese-language version of the MSTS scoring system using psychometric analysis. Our study showed that the Japanese version has sufficient reliability with high ICC by test-retest analysis but also with a high ceiling effect, a favorable internal consistency with a high Cronbach’s alpha coefficient, adequate construct validity indicated by factor analysis and the AIC network, and reasonable criterion validity compared with the TESS and SF-36.
Our study has several limitations. First, the study was performed with Japanese patients using a Japanese version of the MSTS scoring system. The original MSTS scoring system was officially introduced in Japan in 2000 by the Japanese Orthopaedic Association, and the Japanese version was developed by the Japanese Orthopaedic Association Tumor Committee in 2010. Intercultural differences might affect the outcomes; however, the items included in the MSTS scoring system appear to be basic factors. Therefore, there might be little or no difference between patients from Japan and those from other parts of the world regarding these items. Moreover, several indices of psychometric analyses, such as Cronbach’s alpha coefficient and ICC, were similar to those reported in studies of patients with other ethnicities [4, 8, 13]. Second, we included patients who achieved relatively high MSTS scores compared with patients in other studies [2, 7, 8, 13]. This may have been the case because only a few patients with intermediate tumors (based on the WHO classification of bone and soft tissue tumors, in terms of biological potential) were included. Seventeen (17%) patients in our cohort had a diagnosis of an intermediate tumor, such as a well-differentiated liposarcoma (atypical lipomatous tumor) or giant cell tumor of bone, and these patients undergo smaller surgical procedures, such as marginal resection or tumor curettage, respectively, than do patients with malignant tumors who undergo wide resection, leading to superior function. In addition, we performed a sensitivity analysis to analyze the performance of the MSTS scoring system with and without patients who were high-functioning. The result indicated the inferior discrimination capability of the MSTS scoring system for patients with higher function. From these perspectives, this system may not be appropriate for patients undergoing less invasive surgery.
Sufficient reliability of the Japanese version of the MSTS scoring system was confirmed by psychometric analysis; however, we observed a considerable ceiling effect. In this study, 23% of the patients surveyed (23 of 100) achieved the highest possible score, indicating that the Japanese version of the MSTS scoring system has a ceiling effect. This is important because it may reflect low sensitivity in discriminating among patients who have superior function. The ceiling effect was noted even among patients with a malignant tumor; however, the proportion of patients in this group achieving the highest score was lower than the proportion in the intermediate tumor group (20% and 35%, respectively). Nevertheless, the MSTS scoring system contains only six items, which represent basic functions or symptoms, and the Japanese version also achieved a high ICC and a high Cronbach’s alpha coefficient. The effort required to complete the MSTS questionnaire would be much less than that required for the TESS (32 items) or SF-36 (36 items). In general, increasing the number of items makes a scale more reliable; the tradeoffs are lower convenience, more time consumed, and more effort required from patients and investigators. The six items in the MSTS system appear adequately reliable despite being concise. These advantages of the MSTS scoring system have been described, adding support to the validation of the system [8, 13].
Psychometric analysis also supports adequate validity of the items included in the Japanese version of the MSTS scoring system. In the current study, construct validity analyses were performed to reveal the latent structure of the MSTS scoring system using principal component analysis and the AIC network. In other words, we performed those analyses to assess whether the Japanese version of the MSTS scoring system measures the functional outcome of patients with musculoskeletal tumors. According to the results of the factor analysis, the system exhibits high unidimensionality; that is, the Japanese version of the MSTS scoring system can be used as a reliable scale. No item showed low factor loading, and loadings were particularly high for items associated with physical activities, such as function, support, walking, and gait. In addition, the central roles of walking and gait identified in the AIC network were consistent with the highest factor loading values for these items. Regarding criterion validity, the total score of the MSTS scoring system correlated highly with the TESS and the SF-36 physical component summary. Furthermore, these two measures correlated significantly with the MSTS items associated with the physical activities mentioned above.
Interpretation of the two nonphysical items, pain and emotional acceptance, is a bit less clear in the context of the Japanese version of the MSTS score that we studied. These two items showed adequate, although relatively lower, factor loading according to the factor analysis (0.68 and 0.57, respectively). In addition, criterion validity analysis showed no correlation between either of these two items and any component of the SF-36 or the total score of the TESS (Table 4). This indicates that these two items may be less appropriate than the physical items. The MSTS scoring system is a physician-reported questionnaire, and this type of assessment would not be suitable for evaluating the adequacy of treatment without the bias of the physician, especially regarding the psychological factors. Another reason for the lower validity of emotional acceptance is a defect in the questionnaire. This item comprises the expressions “enthused”, “satisfied”, “accepted”, and “dislike”, and these expressions appear not to be definitive. The TESS system, one of the most frequently used patient-oriented evaluation systems, was translated into Japanese in 2015. Data accumulation regarding outcome evaluation from various aspects would be required in the future.
With better outcomes for patients with cancer achieved through improved cancer therapy, there will be greater demand for a comprehensive outcome assessment of treatment, including psychometric assessment and health-related quality of life.
Through this study, we confirmed that the Japanese version of the MSTS scoring system is a reliable and adequate evaluation system for assessing the functional outcome in patients with musculoskeletal tumors. In addition, the MSTS scoring system is widely used, and continued accumulation of results is valuable. Therefore, it should be used as a standard tool in the future. However, we also found that the system is not sufficient for evaluation of nonphysical factors such as emotional acceptance. Therefore, a new or adjunct evaluation system should be developed and introduced that more completely and comprehensively assesses patient-derived outcomes and that is applicable to patients with higher function.
We thank Kazuo Saita MD (Department of Orthopaedic Surgery, Saitama Medical Center, Jichi Medical University, Saitama, Japan), Hirokazu Chuman MD (Department of Musculoskeletal Oncology, National Cancer Center Hospital, Tokyo, Japan), and Hirotaka Kawano MD (Department of Orthopaedic Surgery, The University of Tokyo Hospital, Tokyo, Japan) for clinical assistance, and Misuzu Mori, Yoko Kato (Department of Musculoskeletal Oncology, National Cancer Center Hospital, Tokyo, Japan), and Keiko Hishiki (Department of Orthopaedic Surgery, Saitama Medical Center, Jichi Medical University, Saitama, Japan) for administrative support with data collection. Furthermore, we thank Enago (Crimson Interactive KK, Tokyo, Japan) for reviewing the English language.
- 1. Akaike H. Information Theory and an Extension of the Maximum Likelihood Principle. In: Petrov BN, Csaki F, eds. Proceedings of the Second International Symposium on Information Theory. Budapest, Hungary: Akademiai Kiado; 1973:267–281.
- 6. Fletcher CD, Bridge JA, Hogendoorn P, Mertens F. WHO Classification of Tumours of Soft Tissue and Bone. Geneva, Switzerland: WHO Press; 2013.
- 11. Nunnally JC, Bernstein IH. Psychometric Theory. New York, NY: McGraw-Hill; 1994.
- 12. Ogura K, Uehara K, Akiyama T, Iwata S, Shinoda Y, Kobayashi E, Saita K, Yonemoto T, Kawano H, Chuman H, Davis AM, Kawai A. Cross-cultural adaptation and validation of the Japanese version of the Toronto Extremity Salvage Score (TESS) for patients with malignant musculoskeletal tumors in the lower extremities. J Orthop Sci. 2015;20:1098–1105.
- 20. The Japanese Orthopaedic Association Committee of Tumors. General Rules of Clinical and Pathological Studies on Malignant Bone Tumours (in Japanese). Tokyo, Japan: Kanehara Shuppan Co; 2000.