Functional Outcome Measurement in Patients with Lower-Extremity Soft Tissue Sarcoma: A Systematic Literature Review

Background The importance of functional outcome (FO) in the treatment of patients with extremity soft tissue sarcoma (STS) has been increasingly recognized in the last three decades. This systematic review aimed to investigate how FO is measured in surgically treated lower-extremity STS patients. Methods A systematic search of PubMed, Web of Science, and Scopus was performed based on the PRISMA guidelines. The methodologic quality of the publications was measured using the MINORS tool. The results from the included studies examining measurement types, measures, and time of FO measurement were compiled. The FO pooled mean and standard deviation were calculated as a weighted average for the groups. The validity of the applied measures is reported. Results The literature search found 3461 publications, 37 of which met the inclusion criteria. The measurement types used were clinician-reported outcomes (n = 27), patient-reported outcomes (n = 20), and observer-reported outcomes (n = 2). The most frequently used measures were the Toronto Extremity Salvage Score (TESS) (n = 16) and the Musculoskeletal Tumor Society (MSTS) score 1993 (n = 12). The postoperative FO was relatively good. The pooled mean TESS and MSTS 1993 scores were respectively 83.3 and 86.2 (out of 100). Of the 10 previously reported measures, 3 provide validated FO scores. The methodologic quality of publications was generally low. Conclusions Based on this systematic review, several different methods exist for assessing FO in patients with lower-extremity sarcoma. The most frequently used measure is a validated TESS. The postoperative FO of patients with lower-extremity STS seems to increase to the preoperative baseline level during long-term follow-up evaluation.


ABSTRACT
Background. The importance of functional outcome (FO) in the treatment of patients with extremity soft tissue sarcoma (STS) has been increasingly recognized in the last three decades. This systematic review aimed to investigate how FO is measured in surgically treated lower-extremity STS patients. Methods. A systematic search of PubMed, Web of Science, and Scopus was performed based on the PRISMA guidelines. The methodologic quality of the publications was measured using the MINORS tool. The results from the included studies examining measurement types, measures, and time of FO measurement were compiled. The FO pooled mean and standard deviation were calculated as a weighted average for the groups. The validity of the applied measures is reported. Results. The literature search found 3461 publications, 37 of which met the inclusion criteria. The measurement types used were clinician-reported outcomes (n = 27), patientreported outcomes (n = 20), and observer-reported outcomes (n = 2). The most frequently used measures were the Toronto Extremity Salvage Score (TESS) (n = 16) and the Musculoskeletal Tumor Society (MSTS) score 1993 (n = 12). The postoperative FO was relatively good. The pooled mean TESS and MSTS 1993 scores were respectively 83.3 and 86.2 (out of 100). Of the 10 previously reported measures, 3 provide validated FO scores. The methodologic quality of publications was generally low.
Conclusions. Based on this systematic review, several different methods exist for assessing FO in patients with lower-extremity sarcoma. The most frequently used measure is a validated TESS. The postoperative FO of patients with lower-extremity STS seems to increase to the preoperative baseline level during long-term follow-up evaluation.
In the last 30 years, limb salvage has become the standard of care in the treatment of extremity sarcoma, and amputations are rare. This has been achieved by improved diagnostics, pre-and postoperative radiotherapy, and more refined reconstructive surgical methods. Reports on functional outcome (FO) have been increasing. [1][2][3] Several methods for assessing FO have been described, including subjective and objective measures. [2][3][4] Functional outcome measures should be valid, reliable, accurate, and clinically meaningful for the population in which the measurement is made. 5,6 Consistent usage of the same measures allows benchmarking and comparison of study results over time and between research centers.
A few previous review studies have considered the topic from different perspectives. The review by Davis 4 focused on FO of all extremity tissue sarcoma patients, including upper extremities and bone sarcomas. Tang et al. 3 investigated quality-of-life studies in adult extremity sarcomas (also including upper extremities and bone sarcomas) and all quality-of-life studies. Furtado et al. 2 reviewed physical functioning after treatment for lower-and upper-extremity sarcoma patients, including both bone and soft tissue sarcoma (STS). Only objective measures investigating postural balance, gait, and physical activity were included. All patient-reported outcome (PRO) measurement studies were excluded. Groundland et al. 7 investigated pediatric patients. Wilson et al. 8 studied pelvic sarcoma patients, and Winnette et al. 9 investigated all patients with STS, including abdominal sarcomas. Thus, no systematic literature review has previously focused specifically on measurement of FO after surgical treatment of adult lowerextremity STS patients.
This study aimed to identify how FO has been measured in patients with surgically treated lower-extremity STS. More specifically, we sought to determine the type of methods and measures used to measure FO, whether the measures used had been tested for validity, FO for lowerextremity STS patients, and quality of the publications that report FO.

Overview and Eligibility Criteria for Review
A systematic literature review was performed based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 10 A review protocol was created by the authors and is available on request.
The study included all publications concerning patients with surgically treated lower-extremity STS whose FO was measured. The exclusion criteria ruled out duplication, studies that included fewer than 20 lower-extremity STS patients (considered pilot studies), 11 non-adult study populations, and publications in languages other than English.
In the current review, the measures for assessing FO were classified as ''previously developed and reported measures'' or ''new measures developed by the authors.'' The ''previously developed and reported measures'' were defined as measures developed to assess FO and published previously. Measures, scales, or questionnaires developed by the authors themselves with the purpose to assess FO only in the reviewed article were considered ''new measures developed by the authors.'' To report the type of measurement applied for assessment of FO, we used the terms ''patient-reported outcome,'' ''clinician-reported outcome,'' ''observer-reported outcome,'' and ''performance outcome'' measures. 12 Patient-reported outcome (PRO) measures are based on a patient's subjective assessment. Clinician-reported outcome measures are based on evaluation by a trained health care professional. Observer-reported outcome measures are based on observation by a person other than the patient or a health professional. Performance outcome is based on measurement of performance in specific tasks and are considered objective measures.

Search Methods
PubMed, Scopus, and Web of Science search engines were used for the search. All published articles were retrieved without a search time constraint on 5 May 2018. The keywords combined with Medical Subject Headings (MeSH) terms were ''lower AND (limb OR limbs OR leg OR legs OR extremity OR extremities OR foot) AND sarcoma AND (functional OR functionality OR function OR outcome).'' Two authors (G.K. and M.K.) independently reviewed all titles and appropriate abstracts. All unsuitable articles were excluded by the previously mentioned exclusion criteria. A manual search was performed for all references of suitable studies by review of titles and appropriate abstracts. The included studies were reviewed and added to the final list by the set inclusion criteria. Disagreements in data extraction were resolved by discussion and consensus of the authors (G.K., M.K., K.K., J.R., and I.B.R.).

Study Data
Two authors (G.K. and M.K.) independently collected the following information from the included publications: study period, origin of the study, article type, anatomic location of the tumor, number of patients, age, diagnosis, measures and measurement types used for assessing FO, results of FO, and follow-up time. In case of missing data, an e-mail requesting additional information was sent to the corresponding authors.

Validity Assessment
In this review, a validated FO measure is defined as a measure that has been scientifically validated to assess FO in extremity tumor patients. A literature search was performed to detect suitable literature concerning the validity of the FO measures.

Functional Outcome
In this review, FO is reported from publications that used validated FO measures, and FO results are presented as means ± standard deviations (SDs). The pooled mean and SD were calculated as a weighted average of SDs for the groups. Publications reporting FO for bone sarcoma or upper-extremity STS patients in addition to lower-extremity STS patients were included in the FO report.

Quality of Publications Assessment
Because the quality of a publication can be measured according to many different criteria, and because the validity of these criteria have not been determined, quality was not used to exclude studies in this review. 13 The Methodological Index for Nonrandomized Studies (MINORS) quality assessment tool was used to assess the quality of publications. 14 The MINORS tool is a valid instrument designed to assess the methodologic quality of nonrandomized surgical studies, either comparative or noncomparative. 14 We used the MINORS tool to assess the quality of randomized controlled trials (RCTs), as has been done in previous literature. 15 The MINORS tool consists of the following 12 methodologic items for studies: a clearly stated aim, inclusion of consecutive patients, prospective collection of data, end points appropriate for the aim of the study, unbiased assessment of the study end point, follow-up period appropriate for the aim of the study, less than a 5% loss to follow-up evaluation, and prospective calculation of the study size. Additional criteria for comparative studies include adequate statistical analyses, an adequate control group, contemporary groups, and baseline equivalence of groups. The items are scored as follows; 0 (not reported), 1 (reported but inadequate), or 2 (reported and adequate). The maximum score is 16 for non-comparative studies and 24 for comparative studies. 14 Results are reported as percentages from 0 to 100.

Study Selection
Details of the literature search are presented in Fig. 1. For the final review, 37 publications were selected.

Study Characteristics
The included studies were published between 1984 and 2018. Of the 31 retrospective and 6 prospective studies, 3 were RCTs. Of the 37 studies, 25 were cross-sectional, 7 were cohort, and 2 were case-control studies. The sample sizes of the studies ranged from 25 to 728 patients. The characteristics of the articles and the FO measurements are presented in Table 1.

Type of Measurement to Assess FO
A single PRO measurement was used in 8 of the 37 publications, and a clinician-reported outcome measurement was used in 17 of the 37 publications. The PRO and clinician-reported outcome measurement types were used together in 10 studies. The PRO and observer-reported outcome measures were used in two studies. No performance outcome measurement was used. All six prospective studies used clinician-reported outcome measures, and four of these also used a PRO measure ( Table 2).

FO Measures
The majority of the publications (27/37) used previously reported measures. The most common measures were the Toronto Extremity Salvage Score (TESS) (43%, n = 16) and the Musculoskeletal Tumor Society (MSTS) score 1993 (32%, n = 12). A total of 10 different previously reported measurement tools were used ( Table 2). New measures developed by the authors were used in 10 publications. Of the 27 publications, one previously reported measure alone was used in 14 publications, two and three measures in 6 publications each, and four measures in 1 publication.
The RAND-36 is multidimensional PRO questionnaire identical to the Short Form 36 (SF-36) but uses a different scoring method. 35 In this review, RAND-36 and SF-36 are reported together.

Timing of FO Measurement
In 4 (11%) of the 37 studies, FO was measured both preand postoperatively. 24,37,39,52 More than one postoperative time point of measurement was used in 3 (8%) of the 37 studies. 24,39,52 The time of FO measurement varied in the 31 retrospective studies. The inclusion criteria required a minimum follow-up period of 1 year in most of the studies. In 18 (58%) of the 31 retrospective studies and 12 (50%) of the 24 retrospective cross-sectional studies, the FO was measured at least 1 year after the surgery. The time point of measurement was not reported in seven studies, and additional information was not available. Additional details are presented in Table 1.

Validity of the Measures
Validated FO measures were used in 23 (62%) of the 37 publications. the validated measures were the TESS 1,53 and MSTS 1985 4,71 and 1993 1,55 ( Table 2). The SF-36 56 validation also has been studied, but its validity as a extremity FO measure in this population is questionable. 1 In addition, the validity of the Short Musculoskeletal Function Assessment (SMFA) questionnaire, 57,58 the Foot Function Index (FFI), 59,60 the Karnofsky score, 61 and the Reintegration to Normal Living Index (RNL) 62,63 measures also has been studied. However, these measures were studies in a different population of patients or as a general FO measure, or both. No studies on the validity or reliability of the modified International Society Of Limb Salvage (ISOLS)/ MSTS 1993 27 scoring system or the modified Convery scale 64 were found.

Functional Outcome
The TESS, MSTS 1993, or MSTS 1987 results were presented in 23 of the 37 studies. In some studies, not all the required results (mean or SD of FO results, number of lower-vs upper-extremity patients, number of STS vs other tumors) were presented. Additional information was requested from 15 corresponding authors. Five replied, with two supplying the requested additional information.
Of the 23 studies, 3 were excluded from our FO report. The reasons were as follows: only the difference in means was reported; 18 several TESS scores from the same patients were included (in lower-extremity patients, a mean of 2.6 TESS results for one patient was included); 21 and two studies used the same data set (a study by O'Sullivan was excluded). 39,52 Means and SDs for a study by Saebye et al. 19 were calculated using median and IQR results. 65,66 The SDs for studies by Friedmann et al. 28 and Pradhan et al. 33 were approximated based on range using the range rule calculation formula (SD = [max -min]/4). The preand postoperative function results are presented in Table 3.
After study selection and additional gathering of information, some studies still included upper-extremity or bone sarcoma patients.
The pooled mean and SD were calculated for TESS and MSTS 1993 results. Of 19 studies, 2 were excluded due to missing SD data. The pre-and postoperative TESS and MSTS 1993 pooled mean and SD results are presented in  and MSTS 1993 (n = 1) results included other extremity tumor patients in addition to STS patients. The number of included patients is described in Table 3.
The preoperative pooled mean TESS and MSTS 1993 scores were respectively 83.3 and 86.6, and the postoperative scores were respectively 83.3 and 86.2 (out of 100). In the pooled mean and SD analysis, the proportion of lower-extremity and STS patients were respectively 88% and 94%.
As a sensitivity analysis, the mean overall postoperative FO in publications including also upper-extremity or bone sarcoma patients was investigated. In publications including upper-extremity patients (7 of 17 publications), the mean overall TESS score was 86.7 (5 publications), and the MSTS score was 89.0 (5 publications). In publications that included also bone sarcomas (3 of 17 publications), the mean overall TESS score was 63.1 (2 publications), and the MSTS score was 90.3 (1 publication).

Quality of Publications
The mean MINORS score was 62.2 (median, 62.4; range, 31-92). For the three RCTs, the respective scores were 96, 39 96, 52 and 88. 36 For the observational studies, the mean MINOR score was 60.2 (median, 62.5; range, 31-92). The MINORS scores for each study are presented in Table 1.

DISCUSSION
Based on the current systematic literature review, the most frequently used measurement types used to measure FO for surgically treated adult lower-extremity STS patients are clinician-reported outcome and PRO measurements. The most frequently used measures are the TESS and the MSTS 1993 questionnaires. Of the 10 previously reported measures, 3 were proven to provide valid scores for lower-extremity sarcoma patients. Most of the studies on lower-extremity STS FO have poor methodologic quality.

Methods and Measures of FO
The majority of the included studies used clinician-reported outcome measurement. In the last 10 years, the use of PRO measurements has increased. Furtado et al. 2 considered performance outcome an important component of FO in sarcoma patients. This clearly is a minority position in their study because no performance outcome measurement type was found. Performance outcome measurement type has been used more frequently in publications with smaller study samples. 2 This may be due to the greater time requirements and equipment and personnel costs required for performance outcome measurements. 2 In the literature review of Furtado et al. 2 that investigated performance outcome measurement techniques, the study size ranged from 4 to 82 patients. In the current review, one of the inclusion criteria was a minimum of 20 lower-extremity STS patients. In our study, the sample size ranged from 25 to 728 patients.
When FO is assessed in a study, it is important to avoid loss to follow-up evaluation. Loss to follow-up evaluation less than 5% is indicated in the MINORS publications' quality measurement tool as an indicator of a good-quality study. 14 More important than the measurement type is that the outcome measure provide valid and reliable scores in  According to the current review, the TESS and MSTS 1993 questionnaires have been used most frequently. Tang et al. 3 observed that the most frequently used outcome measures were the TESS, the MSTS 1987, and the SF-36. However, they observed that the TESS was used only four times, the MSTS 1987 three times, and the MSTS 1993 two times. They also found that when the MSTS was used, the 1987 version was preferred. Likewise, Groundland et al. 7 found that the MSTS is the most widely used FO measure for pediatric patients after limb-preservation surgery.
Winnette et al. 9 investigated patient experience with STS in all anatomic locations, including abdominal sarcomas. They found that in extremity patients, the RNL and the TESS were used three times each. The MSTS questionnaire was not presented in their review because it examined only PRO measures.
Wilson et al. 8 found that the MSTS was the most frequently used measure and that the TESS was presented only once for pelvic sarcoma patients. Some studies have used both the older and newer versions of the MSTS questionnaires in the same study. 18,41 Other measures such as the SF-36, the FFI, the Karnofsky score, the RNL, the SMFA, the modified MSTS 93/ISOLS, and the modified Convery scale 64 have been used but much less frequently.
The MSTS 1987 54 is a clinician-reported outcome assessment that evaluates seven parameters of FO (mobility, pain, stability, deformity, strength, functional, and emotional acceptance). The MSTS 1993, 55 a revised version of the MSTS 1987, also is assessed by physicians. However, the MSTS 1993 is more limb-specific than the older version and includes six parameters. Pain, function, and emotional acceptance are measured for both extremities. For the lower-extremity, use of walking aids, gait, and walking are evaluated. Hand positioning, dexterity, and lifting ability are evaluated for the upper extremity.
The TESS was developed for limb sarcoma patients. As a PRO questionnaire, it measures physical disability and performance in activities of daily living. 53 Janssen et al. 67 found that the TESS has adequate coverage and is more reliable than the MSTS questionnaire. In addition, they found that the MSTS score is the least reliable, as indicated by a high standard error of measurement for the complete range of ability scores. Although both versions of MSTS are the most frequently used tool for assessing FO, the TESS tool is more frequently used than the two MSTS versions separately. The TESS also was developed later than the MSTS measures. [53][54][55] The study by Tang et al. 3 included only articles published in last 10 years. The TESS was used in four publications, the MSTS 87 in three publications, and the MSTS 93 in two publications. Because of the relatively wide use of MSTS scores, the MSTS questionnaires permit comparison of results more easily and widely than in other studies. On the other hand, the MSTS questionnaire must be completed by a clinician, thus limiting its use in studies with larger samples.
The TESS questionnaire was designed to be completed by patients and can therefore be administered by mail or electronically. This is important particularly in long-term follow-up studies. Using the TESS may avoid the need for a physician consultation, thus saving resources. Using PRO measures may make it easier for patients to participate in the study. Also, minimal clinically important differences (MCIDs) are calculated for the TESS. 68 Because the MSTS measure was developed by orthopedics for surgically treated bone and soft tissue musculoskeletal tumors, it may not capture the effects of radiotherapy, chemotherapy, and other factors that also affect FO. 16,36,48,49,69 Although all FO measurements have limitations, the use of standardized instruments is important. Using the TESS and MSTS measures allows for benchmarking and comparison of results with other studies.
According to the results of our literature review, most studies used only one FO measure. The use of more than one FO measure provides more precise information on FO. For example, the TESS measures activity limitations, whereas the MSTS measures impairment in extremity sarcoma patients. 53,55 On the other hand, using too many time-consuming questionnaires could lead to decreased participation and loss to follow-up evaluation. In addition, using more than one FO measuring tool might not be clinically relevant. 70 Study participants should be informed about the patient burden, including how many items must be completed, how long it takes to complete the questionnaires for participation in the assessment, and how many assessments are made. Questionnaires implemented in a clinical study should be carefully chosen to ensure that they are fit for the purpose. [71][72][73][74] Most retrospective studies required a minimum followup period of 1 year after surgery. In addition, FO showed a decrease up to 6 months after the surgery. Beyond 1 year, FO plateaued before the scores returned to approximate pre-treatment levels 1 year after surgery. 39,43,75 Rivard et al. 24 did not observe significant changes in the TESS and MSTS scores from the preoperative period to 6 months after surgery. By 12 months, the scores showed significant improvement. This is an important factor when prospective data measurement in retrospective samples is planned. In cross-sectional studies and other prospective studies, it is important to consider measuring FO at least 1 year after the surgery, in addition to other measurement time points.

Validity of Measures
The validity of a measurement tool is a multi-dimensional term. The most important measurement property is content validity. The measure should be relevant, comprehensive, and comprehensible with respect to the construct of interest and the study population. 76 Structural validity is the degree to which the scores adequately reflect the dimensionality of the construct to be measured. 39,77 This review considered the measure to be validated when it was tested in lower-extremity patients and reported to be a valid measurement tool.
Half of all the studies reviewed (23/37 studies) used validated tools. In 1999, Davis 4 found that studies often did not use standardized, validated measures. Several measurement tools currently in routine use had been available for only a few years in 1999. Based on our review and on the previous literature, it seems that although the psychometric properties of several PRO and clinician-reported outcome measures have been extensively studied in the last decade, performance outcome measures lack quality in this field. 2 In a systematic review of objective measurement methods, Furtado et al. 2 found that only a few studies investigated aspects of validity of outcome measures. For example, they found that only 1 in 18 studies investigated reliability. They concluded that this raises questions about the accuracy of the objective (in our terminology, performance outcome) measures and veracity of the results. 2

FO of Lower-Extremity STS Patients
The data sample in the current review included a heterogeneous group of lower-extremity STS patients. According to the current review and analysis, the postoperative FO for patients is relatively good. The mean postoperative FO measured by MSTS for patients with extremity osteosarcoma is reported to range between 40% and 76.6%. 78 In pediatric bone sarcoma patients, the reported postoperative MSTS mean scores ranged from 76% to 82.5%. 7 In a review of pelvic sarcoma patients, the mean MSTS score was 65%. 8 Amputation seems to decrease FO. 23,78 Our FO data included relatively few articles on amputation. The MSTS 1993 FO analysis did not include any articles and the TESS analysis included only two articles on amputations. 23,42 In the work of Davis et al. 42 the mean TESS score for the patients with amputations was 74.5 versus 85.1 for the patients with limb-sparing procedures. In the study by Furtado et al. 23 the mean TESS score was only 56.4. Han et al. 78 performed a meta-analysis of osteosarcoma patients and observed that the number of amputation patients was relatively higher. The mean MSTS for the amputation patients ranged from 41.1% to 71% versus 70% to 76.6% for the patients with limb-sparing procedures, which is lower than the result of the current review. As expected, some studies show that amputation decreases FO, 23,78 whereas others have failed to show a significant difference compared with limb-sparing treatment. [79][80][81] Based on this review, lower-extremity STS patients seem to achieve preoperative function levels postoperatively during longterm follow-up evaluation ([ 1 year).

The Quality of the Publications Reporting FO
This review included different types of studies examining varied methodologic quality. The use of the MINORS tool showed that most studies investigating lower-extremity STS FO are lacking in methodologic quality (median MINORS score, 62.4%; range, . This is the first systematic review of FO measurement to focus on lower-extremity STS patients. The strengths of this review were the use of PRISMA guidelines and the methodologic quality assessment for this topic. The main weakness of the current review was the nonstructural search strategy for validity of FO measurement tools used. In addition, one exclusion criterion was to have ''fewer than 20 lower-extremity STS patients in the study (considered a pilot study).'' This might have excluded some relevant studies with small samples. Because the authors attempted to overview the existing literature on FO measurement of lower-extremity STS patients as thoroughly as possible, they included some studies containing small amounts of upper-extremity and bone sarcomas. Because the number of non-lower extremity STS patients in the reviewed studies was small (88% lower-extremity and 94% STS patients), and because the sensitivity analysis presented similar FO results for publications reporting on upper-extremity or bone sarcoma patients, the effect on FO results was small.

CONCLUSION
The most frequently used FO measurements for surgically treated adult lower-extremity STS patients are clinician-reported outcome and PRO measurements. The most widely used measure is the patient-reported TESS instrument, which has been shown to produce reliable and valid scores in assessing FO for lower-extremity sarcoma patients. Using the TESS and MSTS measures allows for benchmarking and comparison of results with other studies. Functional outcome scores seem to return to pretreatment levels 1 year after surgery. Thus, measurement of FO also should be performed at least once 1 year after surgery or later in addition to other time points. This review indicates that quality is lacking in FO studies examining surgical treatment of lower-extremity STS.
ACKNOWLEDGMENTS Open access funding provided by University of Helsinki including Helsinki University Central Hospital.
DISCLOSURE There are no conflicts of interest.
OPEN ACCESS This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://crea tivecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.