Introduction

Femoroacetabular impingement (FAI) is a painful hip disorder characterized by premature, abnormal contact between the proximal femur and the acetabulum that damages the labrum and cartilage and carries a risk of osteoarthritis1,2. The diagnosis of FAI is both clinical and radiological, and three types are recognized: cam (loss of sphericity of the femoral head), pincer (excessive acetabular coverage), and mixed3.

Treatment is usually stepwise and has two main objectives: pain control and prevention of osteoarthritis. In recent years, arthroscopy has gained importance, and its use has increased exponentially in the USA and UK4,5, especially in young patients6. Hip arthroscopy has become popular because it is minimally invasive, causes less tissue damage, and allows early rehabilitation7. The aim of surgery is to correct the cam and/or pincer morphology by osteoplasty and to restore, repair, or reconstruct the labrum or cartilage. However, whether arthroscopy protects against osteoarthritis has not been established, because comparative studies have had only short-term follow-up. Physical therapy is the primary treatment for FAI and is based on strengthening and stabilizing the hip and pelvic musculature, together with education, activity modification, and muscle coordination training8,9.

Several meta-analyses have evaluated the efficacy of arthroscopic surgery versus conservative treatment for femoroacetabular impingement syndrome10,11,12,13,14,15,16. However, they have reported conflicting results: some concluded in favor of arthroscopy, whereas others found no significant differences between interventions. These meta-analyses, and the RCTs they pooled, also have limitations that must be considered when evaluating the efficacy of arthroscopic surgery versus physiotherapy or arthroscopic lavage for femoroacetabular impingement syndrome10,11,12,13,14,15,16. Specifically, previous reviews did not analyze loss to follow-up in each group or the differing time from randomization to the start of treatment, nor how moderating or demographic variables affect outcomes, which makes it difficult to draw definitive conclusions about the effectiveness of each intervention10,14. Additionally, whether the superiority of either technique is clinically relevant has not been demonstrated, which is a significant gap in the literature10,11,12,13,14,15,16. Moreover, previous reviews did not include all the available evidence, and most pooled only a few studies11,12,13,14,15,16, which may limit the generalizability of their results and prevent a comprehensive understanding of the efficacy of each intervention. Therefore, a new comparative meta-analysis that addresses these limitations and includes all the available evidence is needed to provide a more robust and comprehensive analysis of the efficacy of arthroscopic surgery versus physiotherapy or arthroscopic lavage for femoroacetabular impingement syndrome.

The objectives of this study were (1) to compare arthroscopy with more conservative treatments (physiotherapy and joint lavage) in FAI patients in terms of efficacy and safety, and (2) to determine whether the superiority of either technique is clinically relevant and to analyze demographic or secondary variables that may influence the results.

Materials and methods

Eligibility criteria

This meta-analysis was registered in PROSPERO (CRD42022375273) and followed the PRISMA guidelines (Fig. 1)17. The research question was formulated using the PICOS strategy: (P) patients with a clinical and radiological diagnosis of femoroacetabular impingement syndrome; (I) arthroscopic surgery; (C) more conservative procedures, defined as physiotherapy or arthroscopic lavage alone; (O) efficacy, assessed by functional scores, and safety, assessed by adverse events; (S) randomized controlled trials, together with previous meta-analyses, which were included to assess the quality of previously published level I evidence. Femoroacetabular impingement syndrome was diagnosed clinically and by imaging (X-ray, MRI, and/or CT). To maintain sample homogeneity and minimize potential confounding, we excluded studies that enrolled patients younger than 16 years or with systemic disease. Studies that initially enrolled patients with osteoarthritis, one of the outcomes under investigation, were also excluded. To ensure the precision of the sample size and estimates, we removed duplicate studies and studies with incomplete data that could not be analyzed in the statistical program or that did not report the relevant variables, which would have hindered the meta-analysis.

Figure 1

Study selection flow diagram (Preferred Reporting Items for Systematic Reviews and Meta-Analyses).

Information sources

A systematic literature search was carried out in PubMed, EMBASE, Scopus, and the Cochrane Collaboration Library. No date or language restrictions were applied. Studies of interest cited in the reference lists of the studies retrieved in the first search were also evaluated by manual searching.

Search methods for identification of studies

We searched all trial registers and databases using the following terms: “femoroacetabular impingement AND arthroscop*” (Supplementary File 1). Two reviewers independently selected eligible studies and reached consensus on which studies to include. Two authors also independently extracted the data; if consensus was not reached, a third author completed the data extraction form. We analyzed the records of the RCTs as well as their supplementary material, and we consulted expert opinion to decide which variables would be of most interest.

Data extraction and data items

The following baseline characteristics of each study were extracted: number of participants, study type, journal, age, percentage of females, percentage of right hips, morphology (pincer, cam, or mixed), and follow-up. The loss to follow-up rate and the time from randomization to the start of treatment were also analyzed, as were funding and conflicts of interest. The primary efficacy outcomes were the iHOT-33, the HOS ADL (activities of daily living) subscale, and the HOS S (sports) subscale, measured at 6 and 12 months. The minimal clinically important difference (MCID) of each scale was taken from previous studies that analyzed these scales: 6, 14, and 11 points for the iHOT-33, HOS ADL, and HOS S, respectively18,19. We then assessed whether the MCID was achieved (yes/no) using the confidence interval of the mean difference between the experimental and control groups.
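
The yes/no MCID judgement can be expressed as a simple rule on the confidence interval. The text does not state the exact criterion, so this sketch assumes that the MCID counts as reached only when the whole 95% CI of the mean difference lies at or above the threshold; a second helper flags CIs that straddle the MCID. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# MCID thresholds (points) as stated in the Methods.
MCID = {"iHOT-33": 6.0, "HOS ADL": 14.0, "HOS S": 11.0}


@dataclass(frozen=True)
class MeanDifference:
    estimate: float  # pooled MD, arthroscopy minus control
    ci_low: float    # lower bound of the 95% CI
    ci_high: float   # upper bound of the 95% CI


def reaches_mcid(md: MeanDifference, mcid: float) -> bool:
    """Assumed rule: 'yes' only if the whole 95% CI lies at or above the MCID."""
    return md.ci_low >= mcid


def ci_contains_mcid(md: MeanDifference, mcid: float) -> bool:
    """True when the MCID falls inside the CI, i.e. clinical relevance is uncertain."""
    return md.ci_low <= mcid <= md.ci_high


if __name__ == "__main__":
    # HOS ADL at 12 months as reported in the Results (MD 8.09, 95% CI 3.11-13.07).
    hos_adl_12m = MeanDifference(8.09, 3.11, 13.07)
    print(reaches_mcid(hos_adl_12m, MCID["HOS ADL"]))      # False
    print(ci_contains_mcid(hos_adl_12m, MCID["HOS ADL"]))  # False
```

Here the entire interval lies below the 14-point threshold, so the answer is "no" under any reasonable CI-based rule; the assumed rule only matters when the CI straddles the MCID.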

Regarding safety outcomes, we assessed infection, numbness, additional surgery, osteoarthritis, and nerve injury, each up to the end of follow-up. Some complications (e.g., infection) occur almost exclusively with arthroscopic surgery, because they are related to the procedure and would not be expected with physical therapy. They were nevertheless compared between groups to quantify how much more frequent they are, because some of them may have serious consequences.

To assess the quality of previously published meta-analyses, we extracted the variables required by the AMSTAR-2 scale, a tool for the detailed appraisal of systematic reviews and meta-analyses of randomized controlled trials (RCTs) and/or nonrandomized studies. AMSTAR-2 is a questionnaire with 16 domains and simple answers: yes (positive result), partial yes (partial adherence to the standard), or no (insufficient information)20.
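
The AMSTAR-2 item answers can be combined into an overall confidence rating. This sketch follows the rating scheme proposed in the AMSTAR-2 publication (critical domains 2, 4, 7, 9, 11, 13, and 15; one critical flaw gives "low", more than one gives "critically low"). The text does not say which scheme was applied, and treating only "no" answers as weaknesses is an assumption; the function name is hypothetical.

```python
from typing import Dict

# Critical domains as proposed in the original AMSTAR-2 publication (Shea et al., 2017).
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})
VALID_ANSWERS = {"yes", "partial yes", "no"}


def amstar2_confidence(answers: Dict[int, str]) -> str:
    """Overall confidence in a review from its answers to items 1-16.

    Assumption: only 'no' counts as a weakness; 'partial yes' is treated as acceptable.
    """
    if set(answers) != set(range(1, 17)):
        raise ValueError("expected one answer for each of items 1-16")
    if not set(answers.values()) <= VALID_ANSWERS:
        raise ValueError("answers must be 'yes', 'partial yes' or 'no'")
    failed = [item for item, answer in answers.items() if answer == "no"]
    critical_flaws = sum(1 for item in failed if item in CRITICAL_ITEMS)
    noncritical_weaknesses = len(failed) - critical_flaws
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Hypothetical review: no duplicate data extraction (item 6) and risk of bias
    # not considered when interpreting the results (item 13, a critical domain).
    answers = {item: "yes" for item in range(1, 17)}
    answers.update({6: "no", 13: "no"})
    print(amstar2_confidence(answers))  # low
```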

Assessment of risk of bias in included studies

The risk of bias of each RCT was assessed by two reviewers using the Cochrane risk-of-bias tool in Review Manager, which covers (A) random sequence generation, (B) allocation concealment, (C) blinding of participants and personnel, (D) blinding of outcome assessment, (E) incomplete outcome data, and (F) selective outcome reporting. The justification for each rating is provided in Supplementary File 2, and the results are summarized in Fig. 2. In addition, the risk of bias for each item is shown within each forest plot to facilitate critical reading of the article.

Figure 2

Risk of bias (green = low risk; red = high risk; yellow = unclear risk).

Assessment of results

The meta-analysis was performed using the Review Manager 5.4 software package provided by the Cochrane Collaboration. For dichotomous variables, odds ratios (ORs) with 95% confidence intervals (CIs) were calculated. The OR was preferred over the risk ratio for its ease of interpretation and its usefulness with skewed data; this choice was considered appropriate because complications were infrequent, a setting in which the OR closely approximates the risk ratio. For continuous variables, the mean difference (MD) and its 95% CI were calculated. Heterogeneity was assessed with the χ² test and the I² statistic. I² ranges from 0 to 100%, with values of 25, 50, and 75% considered low, moderate, and high heterogeneity, respectively. A fixed-effects model was used when there was no statistical evidence of heterogeneity, and a random-effects model was used when significant heterogeneity was observed. WebPlotDigitizer version 13.1.4 was used to extract numerical data from figures in the articles.
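
The effect measures and heterogeneity statistics named above can be sketched in plain Python. This is an illustration, not the Review Manager implementation: it uses inverse-variance weights throughout (Review Manager also offers Mantel-Haenszel weighting for odds ratios, which is not implemented here) and the DerSimonian-Laird estimator for the random-effects model. Because a χ² p-value needs a distribution function not shown here, the model choice uses an assumed I² cut-off of 50% as a stand-in for "statistical evidence of heterogeneity". All function names are hypothetical.

```python
import math
from typing import List, Tuple

Z95 = 1.959964  # two-sided 95% standard normal quantile


def log_odds_ratio(a: float, b: float, c: float, d: float) -> Tuple[float, float]:
    """Log OR and its SE (Woolf method) from a 2x2 table.

    a, b = events / non-events in the arthroscopy arm; c, d = the same in the control arm.
    Adds 0.5 to every cell when any cell is zero (an assumed, commonly used correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def mean_difference(m1: float, sd1: float, n1: int,
                    m2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Mean difference (group 1 minus group 2) and its standard error."""
    return m1 - m2, math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def pool(effects: List[float], ses: List[float]) -> dict:
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling."""
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q (chi-square)
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    rand = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_f, se_r = math.sqrt(1 / sum(w)), math.sqrt(1 / sum(w_re))
    return {"fixed": (fixed, fixed - Z95 * se_f, fixed + Z95 * se_f),
            "random": (rand, rand - Z95 * se_r, rand + Z95 * se_r),
            "Q": q, "df": df, "I2": i2, "tau2": tau2}


def chosen_model(result: dict, i2_cutoff: float = 50.0) -> str:
    """Assumed proxy for the model choice: random effects if I2 exceeds the cut-off."""
    return "random" if result["I2"] > i2_cutoff else "fixed"
```

Odds ratios are pooled on the log scale and back-transformed, for example `math.exp(pool(log_ors, ses)["fixed"][0])`, where `log_ors` and `ses` come from `log_odds_ratio` for each trial.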

Risk of bias across the studies

We assessed the possibility of publication bias by evaluating a funnel plot (Review Manager 5.4) of the trial mean differences for asymmetry, which can result from the non-publication of small trials with negative results. We acknowledge that other factors, such as differences in trial quality or true study heterogeneity, could produce asymmetry in funnel plots.
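
Review Manager draws the funnel plot itself; the coordinates behind such a plot can be sketched as follows, assuming a fixed-effect pooled estimate for the centre line and pseudo-95% limits of pooled ± 1.96·SE. Visual inspection looks for points outside this triangle or for small (high-SE) studies clustering on one side. The function name is hypothetical.

```python
from typing import List, Tuple


def funnel_data(effects: List[float], ses: List[float],
                z: float = 1.96) -> Tuple[float, List[Tuple[float, float, bool]]]:
    """Fixed-effect centre line and, per study, (effect, SE, inside pseudo-95% limits).

    In a funnel plot the SE is drawn on an inverted y-axis, and the pseudo-95%
    limits form a triangle around the centre line: pooled +/- z * SE.
    """
    w = [1 / se ** 2 for se in ses]
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    points = [(y, se, pooled - z * se <= y <= pooled + z * se)
              for y, se in zip(effects, ses)]
    return pooled, points
```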

Additional analyses

A sensitivity analysis was also carried out in Review Manager 5.4 by excluding the study with the highest weight from the comparison of each outcome.
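
The exclusion of the highest-weight study can be sketched as follows. Assumptions: weights are inverse-variance, so the top-weight study is the one with the smallest standard error (this ordering also holds under DerSimonian-Laird random-effects weights, which add the same τ² to every variance), and a "change" is read as a change in whether the 95% CI excludes the null. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float, float]  # (estimate, lower 95% bound, upper 95% bound)


def iv_pool(effects: List[float], ses: List[float]) -> Interval:
    """Fixed-effect inverse-variance pooled estimate with its 95% CI."""
    w = [1 / se ** 2 for se in ses]
    est = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return est, est - 1.96 * se, est + 1.96 * se


def drop_top_weight(effects: List[float], ses: List[float]) -> Tuple[int, Interval, Interval]:
    """Index of the highest-weight study, the full pooled result, and the result without it."""
    top = min(range(len(ses)), key=lambda i: ses[i])  # smallest SE = largest weight
    kept = [i for i in range(len(effects)) if i != top]
    reduced = iv_pool([effects[i] for i in kept], [ses[i] for i in kept])
    return top, iv_pool(effects, ses), reduced


def excludes_null(result: Interval, null: float = 0.0) -> bool:
    """True if the 95% CI excludes the null value (0 for an MD or a log OR)."""
    return result[1] > null or result[2] < null


def significance_changed(full: Interval, reduced: Interval) -> bool:
    """True if excluding the top-weight study flips statistical significance."""
    return excludes_null(full) != excludes_null(reduced)
```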

Regression analysis was also performed to examine the impact of moderating and demographic variables on the study effect sizes, with categorical variables coded numerically. The dependent variables were the efficacy outcomes (iHOT-33 and HOS ADL), and the independent variables were morphology, risk of bias, loss to follow-up, time since randomization, and direct funding from arthroscopic foundations or societies. Ordinary least squares regression was conducted using the SPSS package v. 24.0 (IBM, USA), and p < 0.05 was considered statistically significant. To complement this analysis, a subgroup analysis of the same variables was performed using the Review Manager 5.4 software package.
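
The regression step can be sketched as plain ordinary least squares on study-level effect sizes. This illustrative version fits one moderator at a time, applies no inverse-variance weights, and assumes categorical moderators are already coded 0/1; it is not the SPSS procedure itself, and the function name is hypothetical. The adjusted R² line shows why very few studies give unstable values.

```python
from typing import List, Tuple


def ols_one_moderator(x: List[float], y: List[float]) -> Tuple[float, float, float, float]:
    """Unweighted OLS of study effect sizes y on one moderator x.

    Returns (intercept, slope, R^2, adjusted R^2). Categorical moderators are
    assumed to be coded beforehand, e.g. direct funding: 1 = yes, 0 = no.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data from at least 3 studies")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("moderator and effect sizes must both vary across studies")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / ss_tot
    k = 1  # number of predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return intercept, slope, r2, adj_r2
```

If only the three studies that reported time to treatment contribute, n = 3 with one predictor leaves a single residual degree of freedom, so adjusted R² can swing to values near 100% or even below 0.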

The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) system, implemented in GRADEpro, was used to assess the quality of the evidence and grade the strength of the recommendations. This system considers study design, risk of bias, inconsistency, indirectness, and imprecision, and presents the results in a summary-of-findings table21.
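
The core GRADE logic can be sketched as a counting rule: randomized evidence starts at high certainty and is rated down one level for a serious concern, or two for a very serious concern, in each domain. This simplified sketch uses the standard GRADE domains, omits the criteria for rating up observational evidence, and does not reproduce how GRADEpro works; the function name is hypothetical.

```python
from typing import Dict

LEVELS = ("very low", "low", "moderate", "high")
DOMAINS = ("risk of bias", "inconsistency", "indirectness", "imprecision", "publication bias")


def grade_certainty(downgrades: Dict[str, int], randomized: bool = True) -> str:
    """Certainty of evidence for one outcome.

    downgrades maps a GRADE domain to 0 (no concern), 1 (serious) or 2 (very serious).
    Randomized evidence starts at 'high', observational evidence at 'low'.
    """
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    total = 0
    for domain, steps in downgrades.items():
        if domain not in DOMAINS or steps not in (0, 1, 2):
            raise ValueError(f"invalid entry: {domain!r} = {steps!r}")
        total += steps
    return LEVELS[max(0, start - total)]


if __name__ == "__main__":
    # Hypothetical outcome rated down once each for risk of bias and imprecision.
    print(grade_certainty({"risk of bias": 1, "imprecision": 1}))  # low
```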

Results

Study selection

The searches in PubMed, EMBASE, Scopus, and the Cochrane Collaboration Library yielded a total of 2095 citations, including 42 randomized clinical trials. After duplicates were removed, 29 remained. Of these, 15 were discarded because their abstracts showed that they clearly did not meet the inclusion criteria. The full texts of the remaining 14 citations were examined in more detail, and 8 of them did not meet the inclusion criteria. Six studies met the inclusion criteria and were included in the systematic review and meta-analysis22,23,24,25,26,27 (Fig. 1).

Study characteristics

A total of six randomized clinical trials published between 2018 and 2021 were included (Table 1)22,23,24,25,26,27, with 839 patients: 418 in the arthroscopy group and 421 in the control group. The mean age ranged from 29.7 to 39.6 years in the arthroscopy group and from 30.6 to 49.1 years in the control group. The mean percentage of females was 48.5% in the arthroscopy group and 48% in the control group. Follow-up ranged from 8 months to 2 years, with a mean of 13.3 months. Loss to follow-up was higher in the physiotherapy group than in the arthroscopy group (48/421, 11.4% vs. 33/418, 7.9%). Three studies reported the mean time from randomization to the start of treatment: 38.0 days in the physiotherapy group (study means ranging from 33 to 44 days) and 98.5 days in the arthroscopy group (study means ranging from 86 to 122 days). Regarding funding, two studies received direct funding from foundations or arthroscopy associations, and conflicts of interest were present in five of the six included studies; in two of these, the conflicts involved arthroscopy associations or societies.

Table 1 Characteristics of the included studies.

Outcomes

The iHOT-33 showed significant differences in favor of arthroscopy at 6 and 12 months (MD 3.98, 95% CI 0.19–7.77; and MD 10.65, 95% CI 6.54–14.76, respectively). The HOS ADL at 6 and 12 months also favored the arthroscopy group (MD 5.19, 95% CI 0.77–9.61; and MD 8.09, 95% CI 3.11–13.07, respectively) (Fig. 3). For the HOS ADL, arthroscopy did not exceed the MCID at either 6 or 12 months, and the same was true for the HOS S at 6 and 12 months (Table 2). Forest plots for complications are shown in Figs. 4 and 5. There were no significant differences in infection rate (OR 4.14, 95% CI 0.87–19.59) or nerve injury (OR 2.32, 95% CI 0.34–15.83). Significant differences were found in additional surgery (OR 11.11, 95% CI 1.42–86.89), osteoarthritis (OR 6.18, 95% CI 1.06–36.00), and numbness (OR 73.73, 95% CI 10.00–543.92).
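
Reported estimates and their 95% CIs can be checked against the stated significance. For ratio measures the CI is symmetric on the log scale, so SE = (ln U - ln L)/(2 × 1.96), z = ln(OR)/SE, and the two-sided p-value follows from the normal distribution; mean differences work the same way without the logarithm. This is a generic normal approximation, not the computation performed in Review Manager, and the function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def p_from_ratio_ci(estimate: float, lower: float, upper: float) -> float:
    """Approximate two-sided p-value for a ratio (e.g. an OR) from its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)  # SE of the log ratio
    z = math.log(estimate) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def p_from_md_ci(estimate: float, lower: float, upper: float) -> float:
    """Approximate two-sided p-value for a mean difference from its 95% CI."""
    se = (upper - lower) / (2 * Z95)
    return math.erfc(abs(estimate / se) / math.sqrt(2))
```

For example, `p_from_ratio_ci(6.18, 1.06, 36.00)` for osteoarthritis gives roughly 0.04, whereas the infection estimate (OR 4.14, 95% CI 0.87–19.59) gives a p-value above 0.05.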

Figure 3

Forest plot showing functional and disability outcomes. The AT group showed a statistically significant difference in the iHOT-33 at 6 (a) and 12 (b) months. For the HOS ADL, there were significant differences in favor of the AT group at 6 (c) and 12 (d) months. No significant heterogeneity was observed in these comparisons.

Table 2 Assessment of whether the confidence intervals of the between-group differences reached the MCID.
Figure 4

Forest plot showing complications. There were no differences regarding infection rate (1.5%) (a) or additional surgery (6.7%) (b). The numbness rate was higher in the AT group (26.7%; p < 0.0001).

Figure 5

Forest plot showing nerve injury rate (a) and osteoarthritis rate (b). A significant difference was observed in the osteoarthritis rate (5.5% in the AT group versus 0.7% in the control group; p = 0.04).

Risk of bias within studies

Substantial heterogeneity (I² > 70%) was not observed for any outcome. Publication bias was examined for the main outcomes reported by the RCTs (iHOT-33 and HOS ADL). A sensitivity analysis was performed by excluding the study with the highest weight from the comparison of each outcome; only one outcome, the HOS ADL at 6 months, changed the direction of the results (Fig. 6). Regarding publication bias, the funnel plots were within the acceptable range (Fig. 7).

Figure 6

Sensitivity analysis showing statistically significant differences for the HOS ADL at 6 months.

Figure 7

Funnel plot evaluating the publication bias regarding the main functional outcomes: (a) iHOT-33 at 6 months; (b) iHOT-33 at 12 months; (c) HOS ADL at 6 months; (d) HOS ADL at 12 months.

Additional analyses and risk of bias across the studies

Regression analysis was used to estimate the proportion of variance in the efficacy outcomes explained by the moderating variables: morphology, risk of bias, loss to follow-up, time since randomization, and funding. There was a significant association between the iHOT-33 at 6 months and the time since randomization (adjusted R² = 99.9%, p = 0.01). An association was also found between the iHOT-33 at 6 months and pincer-type morphology (adjusted R² = 86.4%, p = 0.46).

In the subgroup analyses, there were significant differences between studies with and without direct funding from arthroscopic foundations or societies for the iHOT-33 and the HOS ADL at 6 and 12 months. The subgroup analyses are provided in Supplementary Fig. 1.

The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) summary of the results of these three comparisons is shown in Table 3.

Table 3 GRADE assessment of the quality of the evidence and the strength of the recommendations.

The results of the AMSTAR-2 assessment of the previous meta-analyses10,11,12,13,14,15,16 are shown in Supplementary Table 1. Deviations from the standards were observed mainly with respect to duplicate data extraction, the lack of discussion or interpretation of the results according to the risk of bias of the included articles, and the lack of discussion of the influence of individual article funding or conflicts of interest on the overall results.

Discussion

In this meta-analysis, we compared arthroscopic surgery with a control group for the treatment of femoroacetabular impingement syndrome. We also included arthroscopic lavage because of the need for placebo-controlled RCTs16. Arthroscopic surgery was significantly superior to the control treatment in the iHOT-33 and HOS ADL; however, this superiority did not reach the minimal clinically important difference. The complication rate was higher in the arthroscopy group, which also had a higher risk of progression to osteoarthritis than the control group. The risk of bias among the RCTs was high for performance and attrition bias.

Regarding the MCID, arthroscopy was not superior to physical therapy in any case. However, MCID values vary among studies. Nwachukwu et al.28 proposed more demanding thresholds for the iHOT-33 and HOS S (12.1 and 14.5, respectively) and a lower threshold for the HOS ADL (8.3); with these thresholds, only one of the studies exceeded the MCID28. Another study by Nwachukwu et al.29 likewise set higher thresholds for the iHOT-33 and HOS S (10.7 and 12.1, respectively) and a lower one for the HOS ADL (9.8), and again only one of the studies exceeded the MCID29. The studies that established these thresholds used statistical approaches such as distribution-based methods or receiver operating characteristic (ROC) analysis. In the latter, the area under the curve (AUC) indicates how well the identified threshold predicts the probability of a patient reaching the MCID, and an AUC of 0.7 or higher was considered an acceptable strength of association28,29.
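
The two families of MCID methods mentioned above can be illustrated generically. This is not the procedure of the cited studies: the half-standard-deviation rule, the use of Youden's J to pick the ROC cut-off, and the function names are assumptions for illustration. The AUC computed here is the quantity compared against the 0.7 acceptability threshold.

```python
import statistics
from typing import List, Optional, Tuple


def mcid_half_sd(baseline_scores: List[float]) -> float:
    """Distribution-based MCID using the half-standard-deviation convention."""
    return 0.5 * statistics.stdev(baseline_scores)


def mcid_roc_youden(changes: List[float], improved: List[bool]) -> Tuple[Optional[float], float]:
    """Anchor-based MCID: the change-score cut-off that maximizes Youden's J.

    changes: per-patient change in score; improved: anchor answer (True = the patient
    reports a meaningful improvement). A patient is classified as improved when
    change >= cut-off. Returns (cut-off, J).
    """
    pos = sum(improved)
    neg = len(improved) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both improved and non-improved patients")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(changes)):
        sens = sum(1 for c, a in zip(changes, improved) if a and c >= cut) / pos
        spec = sum(1 for c, a in zip(changes, improved) if not a and c < cut) / neg
        if sens + spec - 1 > best_j:
            best_cut, best_j = cut, sens + spec - 1
    return best_cut, best_j


def auc(changes: List[float], improved: List[bool]) -> float:
    """AUC as the probability that an improved patient has a larger change (ties = 0.5)."""
    pos = [c for c, a in zip(changes, improved) if a]
    neg = [c for c, a in zip(changes, improved) if not a]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```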

In addition, the GRADE assessment rated the quality of the evidence and the strength of the recommendations as low or very low for the efficacy variables. The meta-analysis by Zhu et al.10, which included the largest number of studies among previous reviews, also pooled six RCTs and concluded in favor of the statistical superiority of arthroscopy; however, one of its included studies was a duplicate of the UK FASHioN trial. Furthermore, it did not analyze clinical superiority in the main outcomes of these RCTs, did not grade the evidence with GRADE, and did not discuss the impact of moderating or demographic variables on its main results21. That meta-analysis concluded that “arthroscopy treatment is recommended for patients who need improvement in a shorter period of time,” but it did not consider the lag between groups in the time since randomization, which was almost 3 months. The meta-analyses of Gatz et al.11, Dwyer et al.12, Mok et al.14, Casartelli et al.15, and Ferreira et al.16 also favored arthroscopy, although they included only three RCTs. Mok et al.14 found arthroscopy to be superior but highlighted the importance of demographic variables or other factors that could affect outcomes25. In our meta-analysis, we could only perform regression analysis on the main demographic variables reported by the studies, and we found no associations except for time since randomization and pincer morphology with respect to the iHOT-33 at 6 months. In the subgroup comparisons, funding played an important role in some functional outcomes, although these analyses were of limited value because of the small number of included studies. In contrast, Bastos et al.13, including the same three RCTs, concluded that surgery is not more effective and criticized the lack of cost-related outcomes and of longer-term adverse events, such as osteoarthritis31.

Recent meta-analyses have not analyzed complications and have focused on functional scales. Long-term studies are important to assess progression to osteoarthritis, as this is one of the most relevant concerns in FAI. Among patients undergoing arthroscopy and osteochondroplasty, one in six patients over 40 years of age opted for total hip arthroplasty within 2 years30, whereas the prognosis is better in younger patients31. Risk factors for conversion to arthroplasty include age, degree of chondral damage, femoral osteoplasty, smoking, and inflammatory joint disease, among others32,33. The rate of progression to osteoarthritis matters because early development of osteoarthritis (within 5 years) has been shown to reduce the cost-effectiveness of arthroscopy34. In our study, 5.5% of patients in the arthroscopy group developed osteoarthritis.

The difference in time since randomization between the arthroscopy and control groups is a crucial factor when interpreting the results. Outcome scales were assessed at fixed times after randomization rather than after the start of treatment, so the difference in treatment initiation time could have affected the outcomes of interest. For instance, at the 12-month assessment, patients in the arthroscopy group had been treated for approximately 2 months less than those in the physiotherapy group. Griffin et al.26 reported a difference of 85 days, almost 3 months19. This difference in treatment initiation time could have led to differences in the progression of the condition, the severity of symptoms, and the efficacy of the interventions. Moreover, the regression analysis showed that the time since randomization explained 99.9% of the variance in the iHOT-33 at 6 months. Therefore, the importance of time since randomization should not be underestimated in the analysis and interpretation of study results.
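
The arithmetic behind this argument can be sketched as follows, using the mean delays reported in the Results (38.0 days for physiotherapy, 98.5 days for arthroscopy) and assuming an average month length of 365.25/12 days; the helper name is hypothetical.

```python
DAYS_PER_MONTH = 365.25 / 12  # assumed average month length

# Mean delay from randomization to the start of treatment (three studies, Results).
DELAY_DAYS = {"physiotherapy": 38.0, "arthroscopy": 98.5}


def days_on_treatment(assessment_month: float, delay_days: float) -> float:
    """Days between the start of treatment and an assessment scheduled after randomization."""
    return assessment_month * DAYS_PER_MONTH - delay_days


if __name__ == "__main__":
    for month in (6, 12):
        physio = days_on_treatment(month, DELAY_DAYS["physiotherapy"])
        arthro = days_on_treatment(month, DELAY_DAYS["arthroscopy"])
        print(f"{month} months: physiotherapy {physio:.1f} d, "
              f"arthroscopy {arthro:.1f} d, gap {physio - arthro:.1f} d")
```

Because assessments are fixed relative to randomization, the gap in time on treatment is the same at every visit and equals the difference in mean delays, 98.5 - 38.0 = 60.5 days, or about 2 months.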

Two studies received direct funding from arthroscopy foundations or societies. In the subgroup analysis, studies funded directly by these foundations or societies showed better results, although the analysis was underpowered.

The need for high-quality reviews is evident: when two authors objectively and independently applied the AMSTAR-2 scale, almost all previous meta-analyses were rated as low or critically low quality.

This study has several limitations. Our meta-analysis included a small number of articles, and the sensitivity analysis was affected by the study with the highest weight for the HOS ADL at 6 months. In addition, regression analysis was performed even though Cochrane recommends including at least ten studies for this type of analysis, so these results should be interpreted with caution. Finally, because of the limited number of articles, the statistical program was unable to perform the subgroup analyses of these variables in many cases.

This meta-analysis also has several strengths. First, it included the largest number of randomized controlled trials on this topic to date, which enhances the statistical power of the analysis and increases confidence in the results. Second, it highlighted the importance of the difference in time since randomization between the arthroscopy and control groups, which can affect the efficacy of interventions and the observed outcomes. Third, it provides a more comprehensive comparison by including arthroscopic lavage, which offers a better understanding of the efficacy of the different interventions for femoroacetabular impingement syndrome. The results are likely to be of interest to researchers, who may use them to inform future studies or meta-analyses; to clinicians, when considering treatment options for patients with femoroacetabular impingement syndrome; and to policymakers developing healthcare policies and guidelines on the use of arthroscopic surgery and physical therapy for this condition. By providing information on the safety and efficacy of these treatment options, this meta-analysis may ultimately benefit patients and contribute to the advancement of knowledge in orthopedics.

Conclusion

The analyses conducted justify a new review on this topic and strengthen the evidence. In terms of efficacy, arthroscopic surgery showed statistical superiority over the control group at both six and twelve months; however, the clinical relevance of this difference is controversial, as arthroscopy did not reach the MCID in all cases. In addition, the efficacy of arthroscopy was associated in many cases with secondary variables such as time since randomization and funding, although these analyses should be interpreted with caution given the small number of articles. In terms of safety, arthroscopic surgery showed a higher rate of conversion to osteoarthritis than the control group. Future research should focus on how these moderating and demographic variables affect the results.