Abstract
Purpose
In patients undergoing total hip arthroplasty (THA), limping is a significant symptom, often assessed with the limping sub-score of the Harris Hip Score (HHS). However, the reliability of this sub-score has not been specifically investigated. The purpose of this study was to investigate the intra- and inter-rater reliability of this sub-score.
Methods
Thirty patients undergoing THA were recruited and underwent a gait analysis before surgery and three months after surgery. In addition, 30 asymptomatic participants were included, giving a total of 90 visits analysed in this study. The HHS limping sub-score was assessed for each visit using a video (front and back views side by side) of a ten-metre walk at a self-selected speed. Two orthopaedic surgeons rated the limping in each video in two grading sessions one week apart. To avoid recall bias, the patient identification numbers were randomized and differed for each grading session and each rater. The weighted Cohen’s kappa coefficient was used to quantify intra- and inter-rater reliability. The reliability of three components was studied: the presence of limping, its severity, and the compensation type.
Results
For all components, intra-rater agreement ranged from moderate to strong, and inter-rater agreement ranged from none to moderate.
Conclusion
These results do not support the use of the HHS-limping sub-score for data involving different raters, in either clinical or research contexts. They call for an improved consensus on limping definitions or the development of objective measures.
Introduction
Limping is a gait disorder, described as an abnormal gait pattern, that is frequently reported in patients undergoing total hip arthroplasty (THA) [1, 2]. Pain is one of the main causes of limping in these patients and triggers compensatory mechanisms such as the Duchenne limping pattern [3]. This pattern is characterized by an exaggerated lateral bending of the trunk towards the affected limb during gait to reduce pain. Pauwels’ balance explains this compensation: shifting the centre of mass of the trunk over the hip joint centre [3] reduces the force required by the abductor muscles to stabilize the pelvis, which in turn lowers the mechanical load on the hip joint and relieves pain [4]. Abductor weakness is another common cause of limping [5]. As a consequence of abductor weakness, THA patients can present a pelvic drop, which characterizes the Trendelenburg limping pattern. Notably, the pelvic drop can be compensated by the Duchenne limping pattern; in severe cases of Trendelenburg limping, the Duchenne pattern can therefore result from both pain and abductor weakness. Limping and its compensations increase energy expenditure [6] and accelerate wear and tear of the hip joint [7]. Because it is associated with greater functional limitations, limping has also been reported to reduce postoperative quality of life [8] and patient satisfaction [2].
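The lever-arm reasoning behind Pauwels’ balance can be written as a simplified static moment balance about the stance hip in the frontal plane. This is a textbook-style sketch, not a model used in the study: it assumes single-leg stance and both forces acting vertically, and the symbols are introduced here for illustration only (W, weight of the body above the stance hip; b, its horizontal lever arm from the hip joint centre; F_ab and a, the abductor force and its lever arm; R, the hip joint reaction force).

```latex
F_{ab}\,a = W\,b
\;\Longrightarrow\;
F_{ab} = W\,\frac{b}{a},
\qquad
R \approx F_{ab} + W = W\left(1 + \frac{b}{a}\right)
```

Under these assumptions, leaning the trunk towards the stance limb shortens b, so both F_ab and R decrease; when the centre of mass lies directly over the hip joint centre (b → 0), the abductor demand vanishes and R approaches W.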
Although limping is an important symptom for characterizing the gait pattern of patients with hip osteoarthritis or THA, it is rarely reported in studies, possibly because there is no specific questionnaire or scale for its assessment. Limping is usually evaluated as part of a general assessment of hip function and pain. The Harris Hip Score (HHS) is widely used in patients with hip disorders and has been validated and shown to be reliable for quantifying pain, function, deformity, and range of motion [9]. Limping is included in the function-gait domain of the HHS and is graded on a Likert scale from 0 to 3 (0 = none; 1 = slight; 2 = moderate; 3 = severe). Several studies have used this sub-score (or similar scales) on its own to investigate the influence of limping severity on outcomes such as patient satisfaction or the effectiveness of surgery [1, 2, 8]. However, although the global HHS is considered reliable, the reliability of the limping sub-score has not been specifically investigated. Assessing it is therefore necessary to fully understand the results of past and future studies focusing on limping and to gain confidence in this commonly used clinical tool.
This study aimed to evaluate the intra- and inter-rater reliability within-day of the Harris Hip limping sub-score. Three components were investigated: (1) the reliability of limping status independently of the severity (limping/no limping), (2) the reliability of the limping severity, and (3) the reliability of the type of limping compensation (Duchenne/Trendelenburg).
Materials and methods
This retrospective cohort study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Cantonal Research Ethics Committee (Geneva, Switzerland) on October 26th (CCER: 2017-00817).
Participants
Thirty patients undergoing THA were randomly and retrospectively selected from a research project of the Geneva University Hospitals (started in 2017) that included a clinical gait analysis. Patients were included if they were between 30 and 80 years old, scheduled for a primary elective THA (e.g. due to osteoarthritis) through an anterior, lateral, or posterior surgical approach, and received a ceramic-on-polyethylene implant.
Thirty asymptomatic participants (Controls) were also included to increase the number of participants without limping. They were included if they were between 30 and 80 years old and had neither osteoarthritis nor a previous THA. Exclusion criteria for both groups were: inability to walk ten metres without assistance, previous arthroplasty of the lower limbs, residence in a retirement home or special care institution, and neurological, muscular, or orthopaedic problems (other than THA-related) that could affect gait parameters.
Limping evaluation
Limping was graded on an ordinal scale from 0 to 3 (0 = none; 1 = slight; 2 = moderate; 3 = severe) according to the HHS limping sub-score. Additionally, a qualitative evaluation of limping was performed by reporting the presence of a Trendelenburg limping pattern, a Duchenne limping pattern, or “other” if the limping fell into neither category. The Trendelenburg limping pattern was defined as “a contralateral pelvic drop during a single leg stance” [10], and the Duchenne limping pattern was defined as a trunk lean towards the affected stance limb, with the pelvis stable or elevated on the swing-limb side, during single-leg stance [11]. Note that this qualitative evaluation is not part of the HHS.
Design of the study
Patients underwent a clinical gait analysis before and three months after surgery (two analysis sessions), and the control group underwent it only once. This analysis consisted of 2D video recordings and 3D full-body motion analysis during gait and various functional tasks (including sit-to-stand, timed up-and-go, and unipodal balance). This yielded a database of 90 videos (front and back views side by side) of participants walking at a self-selected speed. Participants’ faces were blurred, and the patient identification number and affected side were displayed on the videos (e.g. “Subject n°1 - LEFT”), as shown in Supplementary Data 1. A random side was assigned to control participants to blind raters to the presence of a control group (see Fig. 1).
Raters
Two senior orthopaedic surgeons (MG, AD) performed the limping evaluation on each video in two sessions one week apart. Both had more than seven years of experience and performed approximately 75 THAs per year. In their clinical routine, they perform pre- and postoperative examinations that include a limping evaluation with the HHS-limping sub-score.
Statistical analysis
Three components were assessed for reliability: (1) limping status, (2) limping severity, and (3) the type of limping compensation. The reliability of limping status (component 1) was evaluated on all sessions of THA patients and Controls (90 videos), classifying scores ≥ 1 as limp and scores of 0 as no limp (limping status 1). From a clinical point of view, we considered it more relevant to highlight moderate and severe limping. Thus, a second limping status (limping status 2) was defined, classifying scores ≥ 2 as limp and scores ≤ 1 as no limp. The reliability of limping severity (component 2) was evaluated only on THA patients (60 videos: one video per session). The type of limping compensation (component 3) was evaluated only on videos to which each rater gave a score ≥ 1 in each session (25 videos included and 35 excluded).
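The two dichotomizations and the component 3 inclusion rule can be sketched as follows. This is a minimal illustration, not the authors’ analysis code: the function names and toy ratings are hypothetical, and it uses the standard unweighted Cohen’s kappa, which for two categories gives the same value as a weighted kappa.

```python
from typing import Sequence


def limping_status(grades: Sequence[int], cutoff: int) -> list[int]:
    """Dichotomize HHS-limping grades (0-3): 1 = limp if grade >= cutoff, else 0.

    cutoff=1 gives limping status 1 (none vs. slight/moderate/severe);
    cutoff=2 gives limping status 2 (none/slight vs. moderate/severe).
    """
    if any(g not in (0, 1, 2, 3) for g in grades):
        raise ValueError("HHS-limping grades must be integers from 0 to 3")
    return [1 if g >= cutoff else 0 for g in grades]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa between two lists of ratings of the same videos."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


def eligible_for_type_analysis(*rating_sets: Sequence[int]) -> list[int]:
    """Indices of videos graded >= 1 in every rating set (each rater, each session)."""
    return [i for i, grades in enumerate(zip(*rating_sets)) if all(g >= 1 for g in grades)]


# Hypothetical toy ratings of six videos by two raters (not study data).
rater_a = [0, 1, 2, 3, 1, 0]
rater_b = [0, 0, 2, 2, 1, 1]
print(cohen_kappa(limping_status(rater_a, 1), limping_status(rater_b, 1)))  # limping status 1
print(cohen_kappa(limping_status(rater_a, 2), limping_status(rater_b, 2)))  # limping status 2
```

In this toy example the raters disagree only on the none/slight boundary, so limping status 2 shows perfect agreement while limping status 1 does not, which mirrors the kind of effect the dichotomization choice can have.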
The weighted Cohen’s kappa coefficient was used to quantify intra- and inter-rater reliability [12]. Kappa values were interpreted as follows: 0–0.20, none; 0.21–0.39, minimal; 0.40–0.59, weak; 0.60–0.79, moderate; 0.80–0.90, strong; and above 0.90, almost perfect agreement [12]. Confidence intervals were also reported. The percentage of agreement can also be used to quantify reliability, but unlike the kappa coefficient it does not account for agreement occurring by chance, for example when raters guess the score [12]. It is nevertheless reported in Supplementary Data 2.
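A weighted kappa for the 0–3 severity grades, the interpretation bands above, and the percentage of agreement can be sketched as follows. The article does not state which weighting scheme was used, so linear disagreement weights are an assumption here (quadratic weights are a common alternative); the function names and toy ratings are hypothetical. Values falling in the small gaps between the published bands (e.g. 0.395) are assigned to the lower band by assumption.

```python
from typing import Sequence


def weighted_kappa(a: Sequence[int], b: Sequence[int], n_categories: int = 4,
                   scheme: str = "linear") -> float:
    """Weighted Cohen's kappa for two raters on an ordinal scale 0..n_categories-1."""
    n = len(a)
    k = n_categories
    if n == 0 or len(b) != n:
        raise ValueError("need two non-empty rating lists of equal length")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    if any(not 0 <= g < k for g in (*a, *b)):
        raise ValueError("ratings must lie in 0..n_categories-1")
    # Observed contingency table (counts) and each rater's marginal counts.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    rows = [sum(observed[i]) for i in range(k)]
    cols = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for the largest possible gap.
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d

    weighted_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    # Expected counts under chance agreement: row total * column total / n.
    weighted_exp = sum(weight(i, j) * rows[i] * cols[j] / n for i in range(k) for j in range(k))
    if weighted_exp == 0:
        raise ValueError("kappa is undefined: no disagreement is expected by chance")
    return 1 - weighted_obs / weighted_exp


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Raw proportion of identical ratings (does not correct for chance)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def kappa_label(kappa: float) -> str:
    """Interpretation bands listed in the text; gap values go to the lower band."""
    if kappa < 0.21:
        return "none"
    if kappa < 0.40:
        return "minimal"
    if kappa < 0.60:
        return "weak"
    if kappa < 0.80:
        return "moderate"
    if kappa <= 0.90:
        return "strong"
    return "almost perfect"


# Hypothetical severity grades (0-3) for six videos, rated twice (not study data).
grades_1 = [0, 1, 2, 3, 1, 2]
grades_2 = [0, 2, 2, 3, 1, 1]
k_lin = weighted_kappa(grades_1, grades_2)
print(k_lin, kappa_label(k_lin), percent_agreement(grades_1, grades_2))
```

With only two categories, both weighting schemes reduce to the unweighted kappa, so the choice of weights matters only for the severity component.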
Results
Participant characteristics are presented in Table 1. By chance, all randomly selected patients had been operated on through the anterior approach, which is by far the most commonly used approach at the institution. Across the four evaluations (two raters × two sessions), limping was observed in 41% of the videos (7% for Controls; 71% and 47% for patients at the preoperative and three-month postoperative sessions, respectively). Note that limping in Controls was reported by only one rater, either slight (n = 7) or moderate (n = 1). The distribution of scores for each rater and each evaluation session is presented in Fig. 2.
Concerning the limping status 1 (none vs. slight, moderate & severe) (component 1), Cohen’s Kappa was 0.782 [0.689 to 0.875] and 0.539 [0.415 to 0.663] for intra- and inter-rater reliability, respectively (Fig. 3). For limping status 2 (none & slight vs. moderate & severe) Cohen’s Kappa was 0.662 [0.503 to 0.820] and 0.624 [0.459 to 0.789] for intra- and inter-rater reliability, respectively (Fig. 3).
Concerning the limping severity (component 2), only THA patients were included in the analysis (i.e. 60 sessions/videos). Cohen’s Kappa was 0.726 [0.587 to 0.866] and 0.534 [0.359 to 0.709] for intra- and inter-rater reliability, respectively (Fig. 3).
Concerning the limping compensation type (component 3), 25 sessions from 18 different THA patients were analyzed. Cohen’s Kappa was 0.846 [0.639 to 1.000] and 0.137 [0.000 to 0.463] for intra- and inter-rater reliability, respectively (Fig. 3).
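The article does not state how the confidence intervals were computed. One widely used large-sample approximation for an unweighted kappa is sketched below, with the interval clipped to [0, 1] by assumption; such clipping would be consistent with bounds of 0.000 and 1.000 like those reported above, but the article does not confirm it. The function name and inputs are hypothetical, and weighted kappas require a different standard-error formula.

```python
import math


def kappa_with_ci(p_observed: float, p_expected: float, n: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Unweighted kappa with an approximate confidence interval clipped to [0, 1].

    p_observed: proportion of videos on which the two ratings agree.
    p_expected: agreement expected by chance from the raters' marginal proportions.
    n: number of rated videos.
    z: normal quantile (1.96 for an approximate 95% interval).
    """
    if not 0 <= p_observed <= 1 or not 0 <= p_expected < 1 or n <= 0:
        raise ValueError("invalid proportions or sample size")
    kappa = (p_observed - p_expected) / (1 - p_expected)
    # Large-sample standard error of kappa.
    se = math.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
    return kappa, max(0.0, kappa - z * se), min(1.0, kappa + z * se)


# Hypothetical inputs: 90% raw agreement, 50% chance agreement, 25 videos.
kappa, lower, upper = kappa_with_ci(0.9, 0.5, 25)
print(kappa, lower, upper)
```

With a small sample such as 25 videos, the interval is wide and its upper bound is truncated at 1.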
The confusion matrix and detailed results of each analysis are presented in the Supplementary Data 2.
Discussion
This study investigated the intra- and inter-rater reliability of the HHS-limping sub-score. The reliability of three components was evaluated: limping status, limping severity, and compensation type. Results showed intra-rater reliability ranging from moderate to strong (0.66 ≤ κ ≤ 0.85) and inter-rater reliability ranging from none to moderate (0.14 ≤ κ ≤ 0.62).
For all components, intra-rater reliability was greater than inter-rater reliability. This result is commonly reported in the literature because between-rater variability is eliminated in intra-rater reliability [13]. The intra-rater reliability of the limping sub-score is close to that reported for the global HHS [9]. Indeed, the HHS showed excellent intra-rater reliability, especially for the function domain (r > 0.93), which includes the limping sub-score [14]. The remaining difference could be related to the method used to calculate reliability (correlation vs. Cohen’s kappa coefficient). However, the HHS showed strong to almost perfect inter-rater reliability (Cohen’s kappa of 0.82 to 0.91) [15], higher than that of the HHS-limping sub-score. The global score comprises ten items for a total of 100 points; limping is one of these items and accounts for only 11% of the total score [9]. The reliability of the other items, especially pain (44% of the score), may compensate for the weak reliability of the HHS-limping sub-score.
The quantitative evaluations of the limping showed moderate to strong intra-rater agreement while inter-rater agreement was weak to moderate. In other words, the assessment of limping status and its severity level using the HHS-limping sub-score is adequate when the evaluation is performed by only one clinician, but less suitable when the assessment is performed by multiple clinicians.
It is also interesting to note that distinguishing moderate & severe limping from none & slight limping yielded better inter-rater reliability (κ = 0.624 vs. 0.539) than distinguishing none from slight, moderate & severe limping. For limper vs. non-limper group analyses involving different raters, we therefore suggest classifying moderate & severe scores as limpers and none & slight scores as non-limpers. This reinforces the need to clearly describe the origin of the data and the limping categories used or combined in clinical follow-up and in research studies using registry or database data collected by different clinicians.
Concerning the qualitative evaluation of limping (not included in the HHS), results showed strong intra-rater reliability, but inter-rater agreement was rated as none to weak. This result is partially consistent with the literature: agreement on the Trendelenburg test has been reported to be none to weak in patients with hip pain [16]. This suggests that the interpretation of the Trendelenburg and Duchenne limping patterns varies between clinicians and indicates a need to improve the consensus on the definition of limping signs in THA patients. For example, during videotaped gait observation, Dürregger et al. (2020) reported intra-class correlation coefficients for inter-rater reliability ranging from 0.47 to 0.88 for a positive Trendelenburg sign defined by a threshold above 8° and from 0.76 to 0.92 for a positive Duchenne compensation defined by a threshold above 10° [17]. In the present study, no specific threshold was given to the raters, which may explain the low inter-rater reliability. Another solution could be the development of tools for the objective assessment of limping based on motion capture and biomechanical outcomes, which would reduce between-rater variability. However, given clinical constraints (time, cost, workload, etc.), such an assessment must be fast, inexpensive, and easy to use. The instrumented Timed Up and Go (iTUG) test using inertial measurement units could be a promising pathway for clinical use [18].
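As an illustration of how explicit thresholds could make the compensation type less rater-dependent, the sketch below flags a Trendelenburg or Duchenne pattern from two frontal-plane angles. Only the 8° and 10° thresholds come from the study cited above [17]; the input variables, their sign conventions, the priority given to pelvic drop, and the reading of “pelvis stable” as “pelvic drop not above the Trendelenburg threshold” are assumptions made here for illustration, not a validated clinical rule.

```python
from dataclasses import dataclass

# Thresholds taken from Dürregger et al. [17]; the decision rule itself is hypothetical.
TRENDELENBURG_THRESHOLD_DEG = 8.0
DUCHENNE_THRESHOLD_DEG = 10.0


@dataclass
class StanceAngles:
    """Hypothetical peak angles during single-leg stance on the affected limb."""
    pelvic_drop_deg: float  # contralateral pelvic drop (positive = drop)
    trunk_lean_deg: float   # trunk lean towards the stance limb (positive = towards)


def classify_limp(angles: StanceAngles) -> str:
    """Return 'Trendelenburg', 'Duchenne', or 'none' under the assumed rule."""
    if angles.pelvic_drop_deg > TRENDELENBURG_THRESHOLD_DEG:
        return "Trendelenburg"
    # Duchenne: trunk lean while the pelvis is stable (no drop beyond threshold) or elevated.
    if angles.trunk_lean_deg > DUCHENNE_THRESHOLD_DEG:
        return "Duchenne"
    return "none"


print(classify_limp(StanceAngles(pelvic_drop_deg=3.0, trunk_lean_deg=12.0)))  # prints "Duchenne"
```

Any rater (or algorithm) applying the same rule to the same angles obtains the same label; the remaining variability would then lie in how the angles are measured.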
Limitations
This study presents several limitations. The limping assessment was performed on videos, which may differ from observing the patient in person; however, video analysis makes it possible to replay the video several times or frame by frame, which can improve the evaluation. Moreover, as previously reported, the percentage of agreement and Cohen’s kappa coefficient can individually lead to different conclusions about inter-rater reliability. It has been suggested that the percentage of agreement can be used when raters are well trained and have little chance of guessing the score. Although the surgeons in the present study were experienced, the HHS does not provide a clear definition of the severity levels (slight vs. moderate vs. severe), which can lead to guessing and to different interpretations between clinicians. Nevertheless, similar results were found for limping severity using the percentage of agreement (Supplementary Data 2). Regarding the limping type, the frontal viewpoint of the videos only allowed the raters to estimate deviations in the frontal plane. Types of limping characterized by deviations in other planes, e.g. hip extension deficits in the sagittal plane, may have been missed and not reported in the “other” category [19]. This may therefore have influenced the reliability of the limping type assessment.
Conclusion
This study found moderate to strong intra-rater agreement and none to moderate inter-rater agreement for the HHS-limping sub-score. These results highlight the limitations of using the HHS-limping sub-score for data involving different raters in both clinical and research contexts. For limper vs. non-limper group analyses involving different raters, we suggest classifying moderate & severe scores as limpers and none & slight scores as non-limpers. These findings call for a clearer definition of limping (presence/absence and severity level) and for training raters according to the same definitions. Another solution could be the development of an objective outcome measure based on biomechanical parameters, which would limit the influence of between-rater variability.
References
Horstmann T, Listringhaus R, Brauner T et al (2013) Minimizing preoperative and postoperative limping in patients after total hip arthroplasty: relevance of hip muscle strength and endurance. Am J Phys Med Rehabil 92:1060–1069. https://doi.org/10.1097/PHM.0B013E3182970FC4
Bonnefoy-Mazure A, Poncet A, Gonzalez A et al (2022) Limping and patient satisfaction after primary total hip arthroplasty: a registry-based cohort study. Acta Orthop 93:602–608. https://doi.org/10.2340/17453674.2022.3489
Bronstein A, Brandt T (2004) Clinical disorders of balance, posture and gait, 2nd edn. CRC Press
Reininga IHF, Stevens M, Wagenmakers R et al (2012) Subjects with hip osteoarthritis show distinctive patterns of trunk movements during gait-a body-fixed-sensor based analysis. J Neuroeng Rehabil 9:1–8. https://doi.org/10.1186/1743-0003-9-3
Böhm H, Hagemeyer D, Thummerer Y et al (2016) Rehabilitation of gait in patients after total hip arthroplasty: comparison of the minimal invasive Yale 2-incision technique and the conventional lateral approach. Gait Posture 44:110–115. https://doi.org/10.1016/j.gaitpost.2015.10.019
Nankaku M, Tsuboyama T, Kakinoki R et al (2007) Gait analysis of patients in early stages after total hip arthroplasty: effect of lateral trunk displacement on walking efficiency. J Orthop Sci 12:550–554. https://doi.org/10.1007/S00776-007-1178-2
Gandbhir VN, Rayi A (2019) Trendelenburg Gait. StatPearls Publishing
Vučković M, Ružić L, Tudor A, Šutić I (2021) Difference in patient quality of life after hip arthroplasty with a minimally invasive approach or classic approach. Acta Clin Croat 60:89–95. https://doi.org/10.20471/acc.2021.60.01.13
Nilsdotter A, Bremander A (2011) Measures of hip function and symptoms: Harris Hip Score (HHS), Hip Disability and Osteoarthritis Outcome Score (HOOS), Oxford Hip Score (OHS), Lequesne Index of Severity for Osteoarthritis of the Hip (LISOH), and American Academy of Orthopedic Surgeons (A. Arthritis Care Res 63. https://doi.org/10.1002/ACR.20549
Gogu S, Gandbhir VN (2022) Trendelenburg Sign. Br Med J 1:58. https://doi.org/10.1136/bmj.1.5322.58
Salami F, Niklasch M, Krautwurst BK et al (2017) What is the price for the Duchenne gait pattern in patients with cerebral palsy? Gait Posture 58:453–456. https://doi.org/10.1016/J.GAITPOST.2017.09.006
McHugh ML (2012) Interrater reliability: the kappa statistic. Biochem Medica 22:276. https://doi.org/10.11613/bm.2012.031
Poulsen E, Christensen HW, Penny JØ et al (2012) Reproducibility of range of motion and muscle strength measurements in patients with hip osteoarthritis – an inter-rater study. BMC Musculoskelet Disord 13:242. https://doi.org/10.1186/1471-2474-13-242
Söderman P, Malchau H (2001) Is the Harris hip score system useful to study the outcome of total hip replacement? Clin Orthop Relat Res 384:189–197. https://doi.org/10.1097/00003086-200103000-00022
Kirmit L, Karatosun V, Unver B et al (2005) The reliability of hip scoring systems for total hip arthroplasty candidates: assessment by physical therapists. Clin Rehabil 19:659–661. https://doi.org/10.1191/0269215505CR869OA
Cibere J, Thorne A, Bellamy N et al (2008) Reliability of the hip examination in osteoarthritis: Effect of standardization. Arthritis Care Res (Hoboken) 59:373–381. https://doi.org/10.1002/ART.23310
Dürregger C, Adamer KA, Pirchl M, Fischer MJ (2020) Inter-rater reliability of a newly developed gait analysis and motion score. J Orthop Trauma Rehabil 2020. https://doi.org/10.1177/2210491720967366
Gastaldi L, Digo E, Ortega-Bastidas P et al (2023) Instrumented Timed Up and Go Test (iTUG)-more than assessing time to predict falls: a systematic review. Sensors. https://doi.org/10.3390/s23073426
Khamis S, Carmeli E (2017) Relationship and significance of gait deviations associated with limb length discrepancy: a systematic review. Gait Posture 57:115–123. https://doi.org/10.1016/J.GAITPOST.2017.05.028
Funding
Open access funding provided by University of Geneva. This work was partly supported by the Research Fund of the Department of Orthopaedic Surgery at Geneva University Hospitals.
Ethics declarations
Ethical approval
This retrospective cohort study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Cantonal Research Ethics Committee (Geneva, Switzerland) on October 26th (CCER: 2017-00817).
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file1 (MP4 13128 KB)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rose-Dulcina, K., Gasparutto, X., Djebara, AE. et al. Reliability of the Harris Hip limping sub-score in patients undergoing total hip arthroplasty. International Orthopaedics (SICOT) 48, 991–996 (2024). https://doi.org/10.1007/s00264-023-06082-4
DOI: https://doi.org/10.1007/s00264-023-06082-4