Introduction

Evaluation of a patient’s functional outcome after TKA is challenging and requires an objective and a functional assessment. Greater importance has been attached to functional scoring systems in the form of patient-reported outcome measures, which assess knee function in activities specific to each patient [19, 20]. The past decade has seen a substantial increase in the number of TKAs performed in Korea, rivaling the use reported in some Western countries [14]. With an increase in large multicenter and international studies across borders, ethnicities, and cultures, we need to integrate such functional scores of Korean patients in these studies and to compare results of TKAs in Korea and other countries.

The Knee Society Clinical Rating System was developed in 1989 as an objective scoring system to assess patient function and rapidly became a popular method for reporting outcomes after TKA [13, 20]. In 2011, the new Knee Society Knee Scoring System© (2011 KS Score©) [19] was developed, which consists of objective items from the previous Knee Society scores and completely new subjective items in the form of patient-reported outcome measures. The 2011 KS Score has been validated, confirming its overall reliability and consistency and those of its different domains, and has been reported to be without the ambiguities of the previous scoring systems [19]. However, to our knowledge, the 2011 KS Score has not been adapted and validated for use in Korean patients undergoing knee surgery. This scoring system not only should be translated, but also culturally adapted to the native Korean-speaking population through the process of crosscultural adaptation.

The purpose of this study therefore was to establish a Korean version of the 2011 KS Score for Korean-speaking patients who undergo TKA, developed through crosscultural adaptation, and to investigate the psychometric properties of the Korean version of the 2011 KS Score in terms of: (1) test-retest reliability, which refers to the degree to which test results are consistent over a brief interval without any treatment change, and internal consistency, defined as coherence among the different scale components; (2) construct validity, defined as the degree to which a test measures what it claims or purports to be measuring when compared with scores with proven validity such as the Korean WOMAC and Korean SF-36; and (3) responsiveness, defined as the ability to reflect changes in patient status by comparing preoperative and postoperative results.

Materials and Methods

Translation Procedure

Korean translation was performed using guidelines provided by Guillemin et al. [8, 9] using the crosscultural adaptation process. This process not only ensures an appropriate linguistically translated version, but also adapts to maintain content validity across cultures. The crosscultural adaptation process is conducted in six stages: translation, synthesis, back-translation, expert committee review, pretesting, and submission for appraisal. Briefly, the English version of the Knee Society Score was translated separately by three native Korean bilingual translators. After uniform agreement was reached among the three translators, a pretest Korean translation version was established. This version was back-translated by two bilingual native English speakers who were blinded to the original English version. We continued this process until a final version was produced that had no disagreements between the English and Korean versions. When the consensus version was formed, the back-translated English version was sent to and approved by the inventor of the Knee Society Score. The final version was pretested in 20 Korean patients with knee osteoarthritis.

Study Design

A total of 350 patients were asked to participate in our study, of whom 123 met the inclusion and exclusion criteria and were available for final analysis (Fig. 1). All patients had knee osteoarthritis and were scheduled to undergo TKA at our institute between June 2013 and February 2014. Of the patients, 90% (111 of 123) were women; to address this gender disparity, we later performed a separate analysis in which 53 men were added to this mixed-gender population. The mean BMI was 27 kg/m2 (SD, 4 kg/m2) (Table 1). All patients with knee osteoarthritis undergoing TKA and available to undergo 1-year postoperative followup were included in the study. Patients with hip or spine disease, congenital deformity, knee infection, history of surgery on the ipsilateral or contralateral leg, and those who declined to participate in the study were excluded. Patients who had a second, contralateral TKA within this 12-month period also were excluded to ensure that the results reflected the outcome of the index operation and not a subsequent operation. The participants were informed properly about the study and signed the consent form.

Fig. 1 The flowchart shows patient enrollment and exclusion criteria in our study. * = patients not meeting inclusion criteria.

Table 1 Demographic data of the total patient population

To address the issue of gender disparity in our study, after completion of our study we identified 71 men with knee osteoarthritis who underwent TKA at our institute between February 2014 and December 2016 from our patient database (as per our routine followup protocol, all patients complete three questionnaires). Of these, 53 met the same inclusion and exclusion criteria as the study participants and had at least 1 year of followup. We generated age-matched control groups in a ratio of 2:1, with 106 women from the earlier mixed-gender population matched to these 53 men, to comparatively evaluate the validity and responsiveness of the Korean New KSS in Korean men. Although we could not provide comparative reliability data, the validity and responsiveness data appear sufficient to address the concern surrounding the gender composition of our study. Approval for this study was obtained from the ethical and review boards of our institute.

Questionnaires

All patients were asked to complete a questionnaire containing the Korean-translated New Knee Society Score (KSS), Korean WOMAC, and Korean SF-36. All the questionnaires initially were filled out by patients and were checked for missing responses and for the appropriateness of responses by a research assistant (JSJ) during subsequent visits, in the presence of the patient. A research assistant was employed to limit responder burden, because patients were being asked to simultaneously complete different questionnaires containing many similar questions. Clinical examination and radiologic assessment were performed by a trained fellow (SJK) in knee arthroplasty following a written protocol so that all the patients were examined using the same method. All personnel involved in data collection were trained by the lead author (TKK) and used a standardized method of data collection following a written protocol to minimize interobserver variability as much as possible. Written guidelines regarding how to assess pain also were developed and given to all the investigators. Data were collected 4 weeks before the operation (Korean New KSS, Korean WOMAC, and Korean SF-36), 1 day before the operation (Korean New KSS), and 1 year after the operation (Korean New KSS, Korean WOMAC, and Korean SF-36).

To evaluate the test-retest reliability of the Korean-translated New KSS, all patients completed the questionnaires 4 weeks apart without any intervening treatment (4 weeks, 1 day before the operation) with the assumption that during this period, the spontaneous change in the condition of their osteoarthritis was minimal. Reliability was assessed by using intraclass correlation coefficients (ICCs) with 95% CIs [4, 10]. Internal consistency, defined as coherence among the different scale components, was assessed by using Cronbach’s alpha coefficient.
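The two reliability statistics described above can be illustrated with a minimal sketch. It is not the analysis code used in this study (the analyses were run in SPSS). The ICC form is an assumption, because the text does not state which model was used; the sketch uses the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1). The function names and the input layout (one row per patient) are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for rows of per-patient item scores.

    items: list of lists, one inner list per patient, one value per item.
    """
    k = len(items[0])
    # Sample variance of each item across patients.
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    # Sample variance of each patient's summed score.
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(scores):
    """ICC(2,1) from a two-way ANOVA decomposition (assumed model).

    scores: list of lists, one inner list per patient, one value per
    administration (for test-retest, two values per patient).
    """
    n, k = len(scores), len(scores[0])
    grand = mean([v for row in scores for v in row])
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-patient mean square
    msc = ss_cols / (k - 1)               # between-administration mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

For a test-retest design, each inner list passed to icc_2_1 would hold a patient's score at 4 weeks and at 1 day before the operation. A 95% CI, as reported in this study, would require the F distribution and is omitted from the sketch.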

The ideal method of measuring validity is to compare it with a gold standard test; however, currently, no gold standard test has been established that perfectly reflects pre- and post-TKA status [5, 7]. Owing to the lack of such a test, the domains of the Korean-translated New KSS were tested by comparing them with the appropriate subscales of the Korean WOMAC and Korean SF-36 by using Spearman’s coefficient. These scoring systems were selected because their test validities have been proven in previous studies [2, 11].
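As an illustration of the correlation analysis, the following minimal sketch computes Spearman's coefficient as the Pearson correlation of average ranks and labels its strength with the cutoffs given under Statistical Analyses (greater than 0.5, strong; 0.35 to 0.5, moderate; less than 0.35, weak). Applying the cutoffs to the absolute value of r is an assumption, and the function names are hypothetical; this is not the code used in this study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def strength(rho):
    """Label a coefficient by its absolute value (assumed convention)."""
    size = abs(rho)
    if size > 0.5:
        return "strong"
    if size >= 0.35:
        return "moderate"
    return "weak"
```

Under this labeling, a negative coefficient such as the one between a Korean New KSS domain and the Korean WOMAC total score is classified by magnitude only; the sign reflects that the two instruments are scored in opposite directions.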

Responsiveness is a nonstatistical term that shows the ability to reflect changes in preoperative and postoperative results, in which the higher the responsiveness, the greater the ability to detect changes. Responsiveness was evaluated by comparing preoperative and postoperative scores among the Korean New KSS, Korean WOMAC, and Korean SF-36 by using the standardized response means (SRM). These scoring systems were selected based on prior rigorous psychometric evaluations [2, 11].

Statistical Analyses

An ANOVA was used to calculate the ICC with 95% CI between the first and second applications of the Korean New KSS to assess test-retest reliability. An ICC greater than 0.8 indicates excellent reproducibility. Internal consistency was assessed by using Cronbach’s alpha. An alpha of 0.7 was considered fair; 0.8, good; and 0.9, excellent. Construct validity was estimated by using the correlation between the domains of the 2011 KS Score and the other questionnaires (Korean version of the WOMAC and SF-36) with Spearman’s coefficient (r). These correlations can be converging (positive) or diverging (negative). Correlation was considered strong if the absolute value was greater than 0.5; moderate if it was between 0.35 and 0.5; and weak if it was less than 0.35. Responsiveness of the Korean New KSS to the treatment (TKA) was evaluated by using the SRM, calculated as the mean difference between the preoperative and 12-month postoperative scores divided by the SD of the score; the larger the SRM value, the greater the responsiveness. An SRM of 0.2 to 0.5 reflects a small change; 0.5 to 0.8, a moderate change; and greater than 0.8, a large change. The means and SDs of the SRMs of the two measurements were estimated with a jackknife procedure, and then tested with a paired t-test [15, 18]. Because TKA is effective for reducing patients’ pain and improving their quality of life, we assumed a large SRM (> 0.8) for all the scores. The preoperative and postoperative Korean New KSS scores were assessed against the Korean WOMAC and Korean SF-36 questionnaires by using the SRM. All statistical analyses were performed by a trained statistician (YGK) using IBM SPSS, Version 22.0 (IBM Corp, Armonk, NY, USA). All the scores were reported as mean and SD, and p values less than 0.05 were considered statistically significant.
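The SRM and jackknife steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure run in SPSS for this study: the denominator is taken to be the SD of the individual change scores (the conventional SRM definition), the jackknife is the leave-one-out form with pseudo-values, and the paired comparison returns only the t statistic and degrees of freedom. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def srm(pre, post):
    """Standardized response mean: mean change / SD of change (assumed)."""
    change = [b - a for a, b in zip(pre, post)]
    return mean(change) / stdev(change)


def jackknife_srm(pre, post):
    """Leave-one-out jackknife of the SRM.

    Returns the jackknife estimate, its standard error, and the
    per-patient pseudo-values.
    """
    n = len(pre)
    full = srm(pre, post)
    pseudo = []
    for i in range(n):
        # SRM recomputed with patient i left out.
        loo = srm(pre[:i] + pre[i + 1:], post[:i] + post[i + 1:])
        pseudo.append(n * full - (n - 1) * loo)
    return mean(pseudo), stdev(pseudo) / sqrt(n), pseudo


def paired_t(pseudo_a, pseudo_b):
    """Paired t statistic and degrees of freedom for two sets of
    pseudo-values obtained from the same patients."""
    diff = [a - b for a, b in zip(pseudo_a, pseudo_b)]
    n = len(diff)
    return mean(diff) / (stdev(diff) / sqrt(n)), n - 1
```

To compare two instruments, the pseudo-values from jackknife_srm for each instrument would be passed to paired_t. For instruments on which a lower score indicates a better outcome, the sign of the change would need to be reversed before comparison so that improvement is positive for both.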

Results

The test-retest reliability and internal consistency were good to excellent for all domains of the Korean New KSS. The domains of the Korean New KSS exhibited ICCs between 0.69 and 0.85, depending on the domain tested, indicating adequate reproducibility. Internal consistency, as indicated by Cronbach’s alpha, ranged from 0.83 to 0.92 for the individual subscales and was good to excellent for all the domains (Table 2). All the subscales of the Korean New KSS (symptoms, satisfaction, expectation, and activity) had an ICC greater than 0.8. However, within the activity subscale, functional activities and discretionary activities showed ICCs of 0.73 and 0.71 and Cronbach’s alpha of 0.84 and 0.83, respectively.

Table 2 Test-retest reliability and internal consistency of the total scores

The Korean New KSS overall scores correlated well with the Korean WOMAC and Korean SF-36 scores. When compared with the Korean WOMAC, all the domains of the Korean New KSS except for expectation correlated either strongly or moderately with the individual subscales of the Korean WOMAC score (r ≥ 0.35, p < 0.001). Furthermore, all the domains of the Korean New KSS correlated with the Korean WOMAC total score (symptom: r = −0.53, p < 0.001; satisfaction: r = −0.49, p < 0.001; expectation: r = 0.21, p = 0.019; activity: r = −0.53, p < 0.001) (Table 3). When compared with the Korean SF-36, the Korean New KSS showed good correlation of its satisfaction and activity domains to all or most of the subscales of the SF-36 (Table 4). The satisfaction domain showed a weak positive correlation with all the subscales of the Korean SF-36 except general health (r = 0.32, p < 0.001; r = 0.23, p = 0.012; r = 0.33, p < 0.001; r = 0.23, p = 0.012; r = 0.28, p = 0.002; r = 0.21, p = 0.022; r = 0.21, p = 0.022; r = 0.29, p = 0.001; and r = 0.21, p = 0.022, respectively). The activity domain showed a strong positive correlation with physical function (r = 0.62, p < 0.001) and physical component summary (r = 0.52, p < 0.001), moderate with physical role (r = 0.46, p < 0.001), and weak with bodily pain (r = 0.26, p = 0.003) and social function (r = 0.31, p = 0.001). The symptom domain exhibited a moderate positive correlation with physical function (r = 0.41, p < 0.001) and weak positive correlations with bodily pain (r = 0.22, p = 0.016), social function (r = 0.20, p = 0.025), and physical component summary (r = 0.26, p = 0.003). The expectation domain showed a weak negative correlation with physical function (r = −0.22, p = 0.017) and a weak positive correlation with general health (r = 0.21, p = 0.017).

Table 3 Construct validity between the Korean New KSS and the Korean WOMAC
Table 4 Construct validity between the Korean New KSS and the Korean SF-36

The Korean New KSS was found to be more responsive than the Korean WOMAC and Korean SF-36. All the domains of the Korean New KSS, except for expectation, showed a large change (SRM > 0.8). The Korean New KSS, with an SRM of 2.03 (p < 0.001), was more responsive than the Korean WOMAC, with an SRM of 1.88 (p < 0.001), and the Korean SF-36 physical and mental component summaries, with SRMs of 1.14 (p < 0.001) and 0.68 (p < 0.001), respectively. The SRM of the Korean New KSS symptom score was 2.23 (p < 0.001), which was higher than those of the Korean WOMAC pain (2.12, p < 0.001) and SF-36 bodily pain scores (1.14, p < 0.001). Furthermore, regarding the functional scale, the Korean New KSS had an SRM of 1.85 (p < 0.001), which indicates that it was more responsive than the Korean WOMAC, with an SRM of 1.75 (p < 0.001), and the SF-36 physical function score, with an SRM of 1.67 (p < 0.001) (Table 5).

Table 5 Responsiveness of the Korean New KSS compared with the Korean WOMAC and Korean SF-36

Construct validity and responsiveness for age-matched control groups between the Korean New KSS, Korean WOMAC, and Korean SF-36 also showed good correlations. Gender analysis between the Korean New KSS and Korean WOMAC showed strong to moderate correlations with the exception of the expectation subscale which correlated weakly (Table 6). Similarly, the Korean New KSS and Korean SF-36 showed moderate to weak positive correlation when men and women were compared individually (Table 7). Furthermore, gender analysis of responsiveness showed that the Korean New KSS had higher SRMs when compared with the Korean WOMAC and Korean SF-36 (Table 8).

Table 6 Construct validity between the Korean New KSS and the Korean WOMAC for men and women
Table 7 Construct validity between the Korean New KSS and the Korean SF-36 for men and women
Table 8 Responsiveness of the Korean New KSS compared with the Korean WOMAC and Korean SF-36

Discussion

The recently developed 2011 KS Score is widely used for patients undergoing TKA. For a measure to be effective across cultures, it not only has to be translated well linguistically, but also has to be adapted culturally to maintain the content validity of the instrument [8]. The current study was conducted to develop the Korean version of the 2011 KS Score for Korean-speaking patients who undergo TKA, which was developed through crosscultural adaptation, and to investigate its psychometric properties. Similar versions of the 2011 KS Score have been translated to Japanese [10], French [6], Dutch [22], and Chinese [16] through the process of crosscultural adaptation, and all have been shown to have good psychometric properties. The current study similarly shows that the Korean New KSS is a reliable, consistent, and valid instrument to evaluate the functional outcomes and expectations of Korean-speaking patients before and after TKA (Appendix 1. Supplemental material is available with the online version of CORR®.).

We acknowledge certain limitations of our study. First, because all three questionnaires were administered simultaneously 4 weeks preoperatively and 1 year postoperatively, the patients could have been affected by responder burden, which might lead to similar or missing responses. This was addressed by having a trained research assistant review the questionnaires for such responses in the presence of eligible patients without any attempt to influence the response. However, we acknowledge that by having research personnel assist in the administration of what is designed as a patient-reported outcomes tool, the performance with the Korean version of the KSS with respect to reliability and validity we observed in this study may be somewhat better than might be achieved with unassisted patients in general use. Second, the demographic features of TKA use in Korea, such as the predominance of older women, should be taken into account; further studies are needed to investigate these psychometric properties in Korean men, although we attempted to address that to some extent with our additional analysis. Third, although our translation methods were rigorous, certain inconsistencies might remain in the translation from one language to another. If better words or phrases are suggested, they should undergo validation by using the same standardized protocol.

The test-retest reliability was excellent for all the domains of the Korean New KSS, showing a high degree of concordance (ICC) and good to excellent internal consistency. Test-retest reliability refers to the degree to which test results are consistent over a brief interval without any treatment change, and internal consistency is defined as the coherence among scale components. Some authors quote a period between 2 days and 2 weeks for the second application of a test, which is an adequate compromise between recall bias and change in disease condition [17]. However, according to Terwee et al. [21], the appropriateness of the period chosen is not as important as the justification of the period described. In the current study, reliability and internal consistency were assessed by asking patients to complete the questionnaires twice during a 4-week interval. The Korean New KSS showed good reproducibility across all domains (ICC, 0.69–0.85) and good-to-excellent internal consistency in all the subscales (Cronbach’s alpha, 0.83–0.92). These ICCs are similar to the results of the Japanese [10], French [6], and Dutch [22] studies that translated the 2011 KS Score to their respective languages, with ICCs ranging from 0.65 to 0.88, from 0.84 to 0.97, and from 0.73 to 0.92, respectively. Similarly, our Cronbach’s alpha of 0.83 to 0.92 is in agreement with those of other translated versions of the 2011 KS Score and the original English KSS (Cronbach’s alpha, 0.68–0.95) [19], showing good-to-excellent reliability and internal consistency (Table 9). Although we could not provide comparative data regarding reliability for age-matched control groups, because reliability in our study was assessed at a 4-week interval, our data regarding validity and responsiveness appear adequate to address the gender disparity of our study.

Table 9 Comparison of test-retest reliability and internal consistency

The Korean New KSS showed adequate construct validity when compared with the Korean WOMAC and Korean SF-36. Because no gold standard measure has been established to evaluate validity post-TKA, correlations between the preoperative scores of the Korean New KSS and those of the Korean WOMAC and Korean SF-36 were determined. These scoring systems were selected because their test validities have been proven in previous studies [2, 11]. All the domains of the Korean New KSS correlated well with the Korean WOMAC, except for the expectation subscale. All the other domains tested showed strong or moderate correlation with the individual subscales of the Korean WOMAC. The satisfaction domain of the Korean New KSS showed a weak positive correlation with all the subscales of the SF-36 except general health, which might be expected owing to the high satisfaction rates post-TKA. Similarly, some studies have indicated that the current TKA population is physically more active than in the past and observed that some patients start participating in physical activities postoperatively, which they were not able to do preoperatively [1, 12]. Therefore, the activity domain becomes an important tool for post-TKA evaluation. In the current study, the activity domain of the Korean New KSS showed a strong positive correlation with physical function (r = 0.62, p < 0.001) and physical component summary (r = 0.52, p < 0.001), moderate with physical role (r = 0.46, p < 0.001), and weak with bodily pain (r = 0.26, p = 0.003) and the social function (r = 0.31, p = 0.001) component of the Korean SF-36. The symptom domain exhibited a moderate positive correlation with physical function (r = 0.41, p < 0.001) and weak positive correlation with bodily pain, social function, and the physical component summary (r = 0.22, p = 0.016; r = 0.20, p = 0.025; and r = 0.26, p = 0.003, respectively). The expectation domain showed weak correlations with the physical function and general health subscales of the Korean SF-36. Our study shows low levels of correlation when compared with the Dutch [22] and French studies [6] but similar correlations when compared with the Japanese version of the 2011 KS Score [10]. A possible explanation for this finding could be the difference in cultural background between the European and Asian populations. Nevertheless, such differences in correlation coefficients do not reduce the usefulness of our study but rather indicate that further studies are needed to identify the reason for these findings and their effect on the instrument. Our data regarding age-matched control groups showed similar values for Korean men and women when construct validity was analyzed individually, which reduces the likelihood of gender-based bias in our study.

The Korean New KSS was the most responsive scale when compared with the Korean WOMAC and Korean SF-36. Responsiveness shows the ability of a scale to reflect changes between preoperative and postoperative results; the higher the responsiveness, the greater the ability of a scale to detect changes [3]. Analysis also revealed that the Korean New KSS symptom score was more responsive than the WOMAC pain and the SF-36 bodily pain scores. Likewise, regarding the functional scale, the Korean New KSS was more responsive than the Korean WOMAC and the Korean SF-36. Similarly, for age-matched control groups, the Korean New KSS also was more responsive than the Korean WOMAC and Korean SF-36, supporting the view that our current study population is representative of the Korean population undergoing TKA.

The Korean version of the 2011 KS Score appears valid, reliable, and responsive in Korean-speaking patients who undergo TKA for knee osteoarthritis. Therefore, it now can be used as a valuable metric to assess functional outcomes and expectations of Korean patients who undergo TKA. Because the population of men undergoing knee arthroplasty in Korea is small compared with that of women, further studies will be required to investigate the properties of the Korean New KSS among men.