Background

The knee is the largest and most complex joint in the human body and is highly prone to injury [1]. As outdoor physical activities become more frequent and more demanding, the incidence of knee injuries is rising as well, particularly among young people and athletes [2].

With growing attention to the treatment and rehabilitation of knee injuries, a number of questionnaires have been introduced to help doctors evaluate the severity of knee injury and recovery after treatment, such as the Oxford Knee Score (OKS) [3], International Knee Documentation Committee Subjective Knee Form (IKDC) [4], Tegner Activity Scale [5], Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) [6], Knee Injury and Osteoarthritis Outcome Score (KOOS) [7], and the Lysholm knee score [8]. These questionnaires focus on clinical manifestations and the patient’s subjective feelings to evaluate the impact of injury on knee function and overall quality of life, which may support better diagnosis and treatment decisions.

The Lysholm knee score, published in 1982 [8], was initially used to evaluate the functional state of patients after anterior cruciate ligament (ACL) injury, and subsequent research has demonstrated its value in functional evaluation of other knee injuries, including patellofemoral pain syndrome [9], meniscal injuries [10], medial patellar plica syndrome [11], patellar dislocation [12], and various chondral disorders [13]. Compared with other knee scoring scales, the Lysholm knee score has several advantages. For example, OKS is only applicable to functional evaluation in knee osteoarthritis, and IKDC and the Tegner Activity Scale are only used to evaluate knee ligament injuries [36]. WOMAC and KOOS have 24 and 42 items, respectively, and the average time to complete these questionnaires ranges from 5 to 10 min [14], which is considered lengthy for a knee questionnaire. The Lysholm knee score, on the other hand, has broad applicability and only eight items, which patients can complete in a short period of time [8, 15].

Because of these benefits, the Lysholm knee score has been used by clinicians and researchers for over three decades. During the past 5 years, over 700 articles indexed in PubMed have reported outcomes using the Lysholm knee score. Additionally, the original English version of the Lysholm knee score has been translated into and validated in many languages [1, 15, 16]. Although China is the most populous country and Mandarin is the most prevalent language in the world, a Chinese version is still absent. A validated Chinese version is therefore needed for what is presumably the largest population of patients with knee injuries in the world.

When a reliable, valid questionnaire is used in populations with different cultures, it is necessary to test the psychometric properties of the questionnaire, rather than simply translate the content, in order to avoid evaluation errors caused by cultural differences [17, 18]. Hence, we aimed to translate and adapt the Lysholm knee score into a Chinese version (C-LKS) and to evaluate the psychometric properties of the C-LKS in a cohort of native Chinese-speaking patients with ACL injuries. The psychometric properties assessed were acceptability, reliability, validity and responsiveness.

Methods

Translation and cross-cultural adaptation

Translation of the original English Lysholm knee score followed previously published guidelines [19, 20]. The entire process consisted of five steps: 1) Forward translation from English to Chinese was performed independently by two bilingual translators who are native Chinese speakers fluent in English. One translator is an orthopedic surgeon in our department (the author, WW); the other is a full-time translator (RL) with no medical background who was not informed of our investigation; 2) The two forward translators and other research members discussed and revised the questionnaire with regard to language expression and cultural differences, yielding a primary version of the C-LKS; 3) Backward translation of the primary C-LKS from Chinese to English was performed by two independent native English-speaking translators (FA and GD) who are fluent in Chinese. Both had medical backgrounds but no knowledge of the original Lysholm knee score; 4) All researchers and translators convened to resolve any discrepancies, ambiguities and other language issues in the questionnaire, and the pre-final version of the C-LKS was obtained; and 5) Twenty patients with ACL injuries were invited to complete the pre-final C-LKS, and their feedback was collected. The researchers met once more to make final adjustments according to this feedback, and the final version of the C-LKS was obtained.

Patients and data collection

Because the Lysholm knee score was initially designed for patients with ACL injuries [8, 15, 21], we also recruited patients with ACL injuries to minimize deviations. Cases in the present study were mainly recruited among patients with ACL injuries scheduled for arthroscopic ACL reconstruction at our hospital [22]. Our inclusion criteria were: age over 16 years with independent signing authority; Chinese as the first language with adequate ability to read and complete a questionnaire; and a definitive diagnosis of ACL injury as determined by arthroscopy [21]. Patients with other complicating knee injuries, such as meniscal injuries or patellar dislocation; patients with a history of lower limb or spine surgery; patients who had surgery within a month of the study; and patients with a history of systemic disease and/or malignancy were excluded from this study. Our study met the quality criteria proposed by Terwee and associates [23] for measurement properties of health status questionnaires, which require results from at least 100 patients for internal consistency analysis and from at least 50 patients for floor or ceiling effects, reliability, and validity analyses. All patients involved in the study read and signed the informed consent form. This study was approved by the ethics committee of the local hospital (No. CHEC2013-199).

The patients provided demographic information, such as sex, age and weight, on the first day of enrollment and independently completed four questionnaires (C-LKS, WOMAC, IKDC, and the Medical Outcomes Study Short-Form 36 (SF-36)) in a quiet meeting room. They completed the C-LKS again 1 week later, before receiving reconstruction surgery, in order to evaluate the test-retest reliability of the questionnaire. Patients were also contacted via mail or telephone 6 months postoperatively to complete the C-LKS a third time to help evaluate the responsiveness of the questionnaire.

Questionnaires

The Lysholm knee score has eight items that evaluate walking gait, frequency of knee locking, frequency of pain, stair climbing, need for external support, joint stability, joint swelling, and squatting ability. A total score (range 0 to 100 points) is calculated from the answers that the patient selects as best reflecting his/her functional state. A lower score indicates poorer knee function [8].

IKDC, first published in 2001, consists of 10 dimensions and evaluates the clinical symptoms, activity states, and function of the patient. The answers are scored accordingly, and the final score (range 0 to 100 points) is then calculated by a set formula. WOMAC is a self-reported questionnaire specifically designed to evaluate the functional state of the knee or hip. It consists of 24 items divided into three subscales, namely, Pain (5 items), Stiffness (2 items), and Physical functioning (17 items). The final score (range 0 to 96 points) is the sum of all items. Unlike the majority of other questionnaires, a lower score represents a better functional state of the joint. SF-36 is a generic quality-of-life questionnaire comprising 36 items and eight dimensions that evaluate mental health and physiological and social functioning. Each dimension has its own scoring system, and the final score is converted to a percentage. A lower SF-36 score suggests a poorer quality of life or functional state. Chinese versions of all three scales exist and have been shown to have excellent reliability, validity and responsiveness [24–26].

Psychometric assessments and statistical analysis

To evaluate the acceptability of the questionnaire, we asked each patient in our study cohort whether they had any difficulty understanding the content. We also calculated the missing-response rate of every item; a rate of >5 % for any item suggested a problem regarding acceptability or comprehension [27]. In addition, we recorded the average time required to complete the questionnaire.
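As an illustration of the missing-response check described above, the following is a minimal Python sketch rather than the procedure actually used in the study. The function names, the use of `None` to mark an unanswered item, and the demonstration data are all assumptions made for the example; only the 5 % threshold comes from the text.

```python
from typing import List, Optional, Sequence


def missing_rates(responses: Sequence[Sequence[Optional[int]]]) -> List[float]:
    """Fraction of unanswered entries (None) for each item (column)."""
    n_patients = len(responses)
    n_items = len(responses[0])
    return [
        sum(1 for row in responses if row[j] is None) / n_patients
        for j in range(n_items)
    ]


def flag_items(rates: Sequence[float], threshold: float = 0.05) -> List[int]:
    """1-based indices of items whose missing rate exceeds the threshold."""
    return [j + 1 for j, rate in enumerate(rates) if rate > threshold]


# Hypothetical data: 4 patients x 3 items; None marks an unanswered item.
demo = [
    [5, 10, None],
    [5, None, 25],
    [3, 15, 25],
    [5, 15, 20],
]
rates = missing_rates(demo)
flagged = flag_items(rates)
print(rates, flagged)
```

With real data, each row would hold one patient's answers to the eight C-LKS items, and an empty `flagged` list would correspond to the situation in which no item exceeds the 5 % criterion.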

The distribution of scores was analyzed to determine whether a ceiling or floor effect existed, that is, whether a large share of respondents achieved the highest or lowest possible score. A proportion of <30 % at either extreme was considered acceptable [21].
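The floor and ceiling check can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function names and the sample scores are hypothetical, the 0 to 100 range is that of the total score, and the 30 % cutoff is the criterion given in the text.

```python
from typing import Sequence, Tuple


def floor_ceiling(
    scores: Sequence[float], min_score: float = 0, max_score: float = 100
) -> Tuple[float, float]:
    """Proportion of respondents at the lowest and highest possible score."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling


def is_acceptable(floor: float, ceiling: float, cutoff: float = 0.30) -> bool:
    """True when both proportions stay below the cutoff (30 % in the text)."""
    return floor < cutoff and ceiling < cutoff


# Hypothetical total scores for ten patients.
demo_scores = [0, 45, 60, 72, 100, 100, 88, 53, 67, 91]
floor, ceiling = floor_ceiling(demo_scores)
print(floor, ceiling, is_acceptable(floor, ceiling))
```

For a single item, `min_score` and `max_score` would be set to that item's own lowest and highest answer values instead of 0 and 100.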

Reliability tests included evaluations of test-retest reliability and internal consistency. Test-retest reliability was assessed by comparing the first two C-LKS results using the intraclass correlation coefficient (ICC), which was derived from a two-way analysis of variance (ANOVA) in a random effects model. Reliability was considered good when ICC was >0.6 and excellent when ICC was >0.8 [28]. Cronbach’s alpha was used to evaluate internal consistency; values of >0.7, >0.8 and >0.9 were regarded as indicating acceptable, good and excellent internal consistency, respectively [23]. We further constructed Bland-Altman plots to check for systematic error between the two administrations [29, 30].
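The three reliability statistics named above can be sketched in plain Python as follows. This is a hedged illustration, not the SPSS procedure used in the study. In particular, the text specifies only a two-way random effects ANOVA model, so the choice of the single-measure, absolute-agreement form, often written ICC(2,1), is an assumption, as are all function names and the demonstration data.

```python
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a patients x items score matrix."""
    k = len(items[0])  # number of items
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance([sum(row) for row in items])  # variance of totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    Rows are patients; columns are sessions (test, retest).
    Assumed ICC form; the source does not state which variant was used.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def bland_altman(
    first: Sequence[float], second: Sequence[float]
) -> Tuple[float, float, float]:
    """Mean difference (bias) and the 95 % limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the differences
    return bias, bias - spread, bias + spread


# Hypothetical test and retest totals for five patients.
test_scores = [62, 70, 55, 81, 48]
retest_scores = [64, 69, 57, 80, 50]
pairs = list(zip(test_scores, retest_scores))
print(icc_2_1(pairs))
print(bland_altman(test_scores, retest_scores))
```

In a Bland-Altman plot, each patient's difference between sessions is plotted against the mean of the two sessions; the three values returned by `bland_altman` give the horizontal reference lines.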

Validity tests evaluated content validity, construct validity and external validity. For content validity, a rehabilitation expert (QW) and three orthopedic experts (ZW, YK and the author WX) helped to assess the comprehensibility and relevance of all of the items in the C-LKS. Good construct validity means that a questionnaire correlates well with measures of the same construct (convergent validity) and poorly with measures of different constructs (divergent validity) [31]. Therefore, we initially assumed that the C-LKS score should correlate well with the physical subscales (i.e. Physical Functioning, Role-Physical, and Bodily Pain) of SF-36, but correlate poorly with the other subscales (Vitality, Role-Emotional, Mental Health, Social Function and General Health) of SF-36. Based on this assumption, the Pearson’s correlation coefficient (r) of C-LKS with the subscales of the SF-36, WOMAC and IKDC was calculated. The construct validity of C-LKS was evaluated by comparing the results with our initial assumption. The correlations were judged as poor (r = 0–0.2), fair (r = 0.2–0.4), moderate (r = 0.4–0.6), very good (r = 0.6–0.8), or excellent (r = 0.8–1.0) [31]. We also calculated the Cohen’s Kappa (k) coefficient between the results of C-LKS and IKDC to evaluate external validity, and k > 0.60 was considered necessary for acceptable external validity [32].
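The correlation step and the five interpretation bands can be illustrated with a short Python sketch. The function names and sample data are hypothetical. Two further assumptions are made because the text leaves them open: a value that falls exactly on a band boundary (for example r = 0.4) is assigned to the higher band, and the magnitude of r is classified regardless of sign.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify(r: float) -> str:
    """Map |r| to the bands used in the text (boundary rule is assumed)."""
    magnitude = abs(r)
    if magnitude < 0.2:
        return "poor"
    if magnitude < 0.4:
        return "fair"
    if magnitude < 0.6:
        return "moderate"
    if magnitude < 0.8:
        return "very good"
    return "excellent"


# Hypothetical paired scores for six patients on two instruments.
scale_a = [55, 62, 70, 48, 81, 66]
scale_b = [50, 60, 72, 45, 85, 61]
r = pearson_r(scale_a, scale_b)
print(round(r, 3), classify(r))
```

Classifying the magnitude matters for instruments scored in the opposite direction, where better function gives a lower score and the raw coefficient would be negative.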

Finally, we evaluated the responsiveness of C-LKS by comparing questionnaire results before treatment and 6 months after treatment. Effect size (ES) and standardized response mean (SRM) were the two indices used to evaluate responsiveness. SRM was defined as the mean change between these time points divided by the SD of this change. ES was defined as the mean change between the preoperative and 6-month postoperative results divided by the SD of the preoperative C-LKS score [33]. Greater values of ES and SRM suggested better responsiveness of C-LKS.
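The two responsiveness indices follow directly from the definitions above, and a minimal Python sketch is given here for illustration. The function names and the sample scores are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(pre: Sequence[float], post: Sequence[float]) -> float:
    """ES: mean change divided by the SD of the preoperative scores."""
    change = [after - before for before, after in zip(pre, post)]
    return mean(change) / stdev(pre)


def standardized_response_mean(
    pre: Sequence[float], post: Sequence[float]
) -> float:
    """SRM: mean change divided by the SD of the change scores."""
    change = [after - before for before, after in zip(pre, post)]
    return mean(change) / stdev(change)


# Hypothetical preoperative and 6-month postoperative scores.
pre_scores = [50, 60, 70]
post_scores = [70, 85, 90]
es = effect_size(pre_scores, post_scores)
srm = standardized_response_mean(pre_scores, post_scores)
print(es, srm)
```

The two indices share a numerator and differ only in the denominator, so SRM exceeds ES whenever patients improve by similar amounts and the change scores therefore vary less than the baseline scores.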

Statistical Package for the Social Sciences, version 20.0 (SPSS, Chicago, IL, USA) was used for statistical analysis. Data are presented as mean ± standard deviation (SD). ICC values are reported with 95 % confidence intervals (CIs). A P value of 0.05 or less was considered statistically significant.

Results

Participants

A total of 159 patients with ACL injuries (89 males and 70 females) admitted to our hospital from January 2013 to May 2014 were invited to participate in our study. Of these, 126 (79.2 %, 69 males and 57 females) agreed to participate. All of them completed the C-LKS three times over the following 6 months, with no withdrawals. Detailed demographic information is listed in Table 1.

Table 1 Characteristics of participants

Translation and cross-culture adaptation process

Forward and backward translations went smoothly. The most important modification in the pre-final C-LKS compared with the original English version was that the point values marked beside the items and answers were removed; other minor issues were also resolved. During the pre-evaluation period, more than half of the patients reported difficulty understanding terminology in the questionnaire, such as “locking” and “instability”; therefore, explanations in simple language were added beside the questions in the final version of C-LKS.

Acceptability and score distribution

In our formal investigation, no respondents claimed difficulties understanding the questionnaire after completing C-LKS for the first time, and the answer rates for all questions were 100 % with no missed questions. The average time to complete the questionnaire was 79 ± 21 s.

Overall, C-LKS had no ceiling effect (1.6 %) or floor effect (0.8 %), but the ceiling effect did exist for items three (“Locking”) and six (“Support”) (Table 2).

Table 2 Score distribution and floor-ceiling effects of the C-LKS

Reliability

The Cronbach’s alpha of C-LKS was 0.726, indicative of acceptable internal consistency. The overall test-retest reliability was excellent (ICC = 0.935), and the test-retest reliability for each item ranged from good to excellent (ICC = 0.770–0.994) (Table 3). The Bland-Altman plots revealed no systematic error between the first two administrations of the questionnaire (Fig. 1), which further confirmed the good test-retest agreement of C-LKS.

Table 3 Test-retest reliability and responsiveness of the C-LKS

Fig. 1 Bland-Altman plots of test-retest reliability of the C-LKS. Each data point shows, for an individual patient, the difference between the two test sessions plotted against the mean of the two sessions for the C-LKS score. The interval between the two sessions was 1 week. The dashed lines show the 95 % (±1.96 SD) limits of agreement

Validity

With the analysis and evaluation of content by rehabilitation and orthopedic experts, the questionnaire was regarded to have good content validity, and the information acquired from the questions was adequate to evaluate the functional state of patients with ACL injuries. Therefore, no addition or deletion of items was recommended.

Relevant data for construct validity evaluation are listed in Table 4, and the data were highly consistent with our presumed results. The correlation between C-LKS and IKDC was excellent (r = 0.836), while that with the three subscales of WOMAC was very good or excellent (r = 0.634–0.811), and that with the physical subscales of SF-36 was moderate or very good (r = 0.514–0.709). In contrast, the correlation with the other subscales of SF-36 was fair or moderate (r = 0.207–0.462) or not significant (P > 0.05). These results suggested that C-LKS had good construct validity.

Table 4 Construct validity of the C-LKS

Lastly, the Cohen’s Kappa (k) coefficient between the results of C-LKS and IKDC was 0.73, which suggested that C-LKS had acceptable external validity.

Responsiveness

Finally, we evaluated the responsiveness of C-LKS by comparing the questionnaires completed before and after ACL reconstruction. Relevant data are listed in Table 3. In general, the average score increased by 21 points after treatment, and the ES (1.36) and SRM (1.26) values both exceeded 1.00, suggesting good responsiveness of the questionnaire.

Discussion

Functional or quality of life questionnaires are critical tools in clinical investigations. They allow researchers to evaluate the functional state of patients and to compare data across studies. China has witnessed rapid development of clinical research, and a large number of relevant articles are published each year, which is due not only to the country's large patient population, but also to strong attention and support from the government [31]. Therefore, effective questionnaires are much needed to support this large body of clinical research. The Lysholm knee score is one of the most widely applied questionnaires for evaluating the functional state after knee injuries, with excellent reliability, validity and responsiveness [9, 10, 12, 13, 15, 21, 34–36]. We therefore believe that translating this scale for the country with the largest patient population is of great significance, and this was the major objective of our study.

During the development of the C-LKS, we removed the scores noted beside the questions and answers, because we believed that seeing the point values might influence patients' answers. Notably, removing these markings did not affect patients' understanding of the content. We also followed the suggestion of Derya et al. [15] to convert the units of walking distance in item 3 (“Pain”) from km to min in order to better estimate the patient’s walking capability. Additionally, because China is a developing country in which the average education level is still relatively low, some terms in the questionnaire, such as “locking” and “instability”, may confuse patients, as many of them reported in the pretest phase. Hence, detailed explanations were added beside these items. “Locking” was explained as “A loss of activity with a ‘locked’ sense of the knee when walking or squatting, usually with marked pain”. No further difficulties in understanding the words or content were reported in the subsequent investigation.

C-LKS had acceptable internal consistency (Cronbach’s alpha = 0.726), which was consistent with other studies (Cronbach’s alpha = 0.650–0.729) [10, 12, 13, 15, 21]. It also had excellent test-retest reliability (ICC = 0.935), again consistent with other studies (ICC = 0.820–0.950) [10, 12, 13, 15, 21, 34]. Notably, the ICC value associated with item 5 was very close to 1 (0.994), possibly because the use of a cane is unlikely to change within a week. Furthermore, we believe that the 1-week interval for test-retest reliability was suitable because the functional state would not change markedly within a week. Additionally, this interval fell within the range adopted in other studies (4 to 24 days) [13], and it equaled the waiting time for reconstruction, during which no additional treatment was given, thus reducing related errors.

The correlation of C-LKS with the WOMAC, IKDC and SF-36 was consistent with our presumptions, showing good construct validity. This result corroborated previous studies [12, 13, 15, 21, 37]. The correlation between C-LKS and IKDC was the strongest (r = 0.836), which may be because IKDC was also designed for patients with ACL injuries, and the cases included in this study were ACL injury patients. Although the physical subscales of SF-36 evaluate the functional state of activity, they did not correlate as strongly with C-LKS (r = 0.514–0.709). One possible reason is that, as a generic scale, SF-36 is markedly less accurate than condition-specific scales when evaluating the functional state of particular patient groups [38]. In particular, the mental subscales of the SF-36 correlated weakly or not at all with C-LKS (r = 0.207–0.303 or P > 0.05). Such a finding may be expected because the mental state of a patient is affected by many factors in life.

The responsiveness of a questionnaire is an important determining factor for prospective clinical investigations. Our results showed good responsiveness of C-LKS (ES = 1.36, SRM = 1.26), suggesting that it could sensitively detect the change in functional state after ACL reconstruction surgery. The ES and SRM values are slightly greater than those reported in previous studies (ES = 0.87–1.20, SRM = 0.77–1.10) [10, 13, 21, 34]. This may be because all of our study participants underwent arthroscopic ligament reconstruction, while other studies also included conservative treatment, which may have affected the degree of improvement in functional state. Furthermore, only the ES and SRM values of item 2 (“Locking”) were below 0.5. This is unlikely to result from our translation or modification, because other studies showed similar results (ES = 0.28–0.55, SRM = 0.23–0.50) [10, 13, 21]. Thus, we posit that neither surgery nor conservative treatment is likely to significantly improve the locking symptom.

However, some limitations of the present study should be noted. First, the relatively small sample may not perfectly represent the entire Chinese population of patients with knee injuries, but the information from 126 patients is adequate to evaluate psychometric properties [23] and is no less than that of similar studies [1, 15, 16]. Therefore, we do not expect the reliability of our study to be affected by the sample size. Second, the language we chose for the adaptation is Chinese, which does not cover the whole population, because China is a multi-ethnic nation in which minority groups speak their own languages; this should be noted when applying the questionnaire.

Conclusions

In summary, we have successfully translated and adapted the Lysholm knee score into a Chinese version and demonstrated its good reliability, validity and responsiveness. Therefore, we suggest that the translated C-LKS be used in Chinese-speaking patients to evaluate the functional state after ACL injury and to better collect the data required by doctors and researchers.