Introduction

Osteoarthritis (OA) is the most prevalent musculoskeletal disease in Western populations. It is recognized as an important cause of pain, loss of function and disability and is a major public health problem [1, 2] associated with a substantial and ever-growing burden on society [3, 4]. As life expectancy increases, OA is expected to become the fourth leading cause of disability by the year 2020 [5]. Despite the nearly concurrent publication of international evidence-based OA management guidelines, there are important differences in the interpretation of the evidence, and agreement is lacking concerning the different treatment modalities [6–10]. This reality is particularly evident for the drug class of symptomatic slow-acting drugs in OA (SYSADOAs) [11, 12]. Discrepancies between the guidelines were related to the heterogeneity of the expert panels involved, geographical differences in the availability of pharmacotherapies and heterogeneity of the studies included [11]. For glucosamine sulfate (GS) and chondroitin sulfate (CS), the evidence for a beneficial effect on symptomatic knee OA is primarily supported by studies of uneven quality performed over > 25 years and, in most cases, supported by the pharmaceutical industry [13]. For GS, it was recently demonstrated that most of the heterogeneity in trials is explained by the brand and that whereas large inconsistency between trials was found, low risk of bias trials using one single pharmaceutical-grade product reveal a small but significant effect size (ES) [14]. The objective of this study was to conduct, for the first time, a similar assessment of the symptomatic effects of CS and to explore the potential determinants of the discrepancy between the results of studies investigating pain and function in knee OA by conducting a systematic review and stratified meta-analysis of available randomized placebo-controlled trials (RCTs).

Methods

The protocol of this meta-analysis has been published in the PROSPERO database under registration no. CRD42018087103. The systematic review and meta-analysis were performed following the recommendations provided in the Cochrane Handbook for Systematic Reviews of Interventions [15]. The findings are reported in accordance with the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [16]. The Cochrane online platform Covidence was used to manage the review process. This article does not contain any studies with human participants or animals performed by any of the authors.

Literature Search and Search Strategies

A comprehensive literature review was performed, searching the Medline (via Ovid), Cochrane Central Register of Controlled Trials (CENTRAL, via Ovid) and Scopus databases. Search strategies pre-registered on PROSPERO and adapted to the vocabulary of each database were used (see Appendix 1), combining terms for the disease, the intervention of interest and the type of study. We searched the databases from inception to 22 December 2017, limiting the searches to English- and French-language publications. The latest meta-analyses and systematic reviews on oral CS in OA were additionally screened for any relevant articles that our literature search could have missed. New publications on CS were also checked until 8 October 2018 through automatic database alerts.

Study Eligibility

Randomized, placebo-controlled trials assessing the effect of oral CS on pain and functional status (using the Lequesne index) in patients with OA were eligible for inclusion in this meta-analysis. The following studies were not considered for inclusion: studies with combination products (e.g., combined GS/CS), crossover studies without efficacy results for the first placebo-controlled period, open-label studies, reviews or meta-analyses, letters, comments, editorials or animal trials. There were no particular considerations concerning age, ethnicity, gender or country of origin of the trial participants.

Study Selection

The selection of the articles retrieved from the literature search was performed following two primary steps: after the duplicates were removed, a first selection was undertaken using information in titles and/or abstracts, and a second selection was performed by fully reading the manuscripts of the previously selected references.

Two members of the review team (GH and AG) independently screened each title and/or abstract to exclude only obviously irrelevant studies according to the predefined eligibility criteria described above. Full texts of the references that were selected as potentially relevant for the meta-analysis were retrieved, or purchased when not available through the institutional library. The two investigators (GH and AG) then independently reviewed each of these full manuscripts to determine whether the studies met all selection criteria. The articles that did not meet the selection criteria were excluded, and each reviewer recorded the reasons for exclusion. After each selection step (titles/abstracts and full texts), disagreements between the two reviewers were resolved through consensus, and a third party was included when necessary.

Data Extraction and Data Items

A pilot-tested standard data extraction form was used for data extraction by two independent reviewers (GH and AG). For each study, general information was extracted by one reviewer (GH) and cross-checked by the second reviewer (AG); we extracted the characteristics of the manuscript for identification and the following study-level characteristics for the purpose of subgroup analyses: trial, patient (mean age, gender, etc.), disease (e.g., the OA location) and treatment characteristics (dosage, manufacturer, etc.). Outcomes were independently extracted by the two reviewers (GH and AG), and extractions were compared after all data had been fully extracted.

For the pain outcomes data, we used the prioritized list of patient-reported outcomes proposed by Juhl et al. [17]. The total score of the Lequesne index (LI), a multidimensional assessment tool that includes parameters on pain or discomfort, maximum distance walked and activities of daily living, was also considered [18]. As recommended in the Cochrane handbook, we used the means and standard deviations (SD) at the end of follow-up, calculated (as much as possible) based on the intention-to-treat (ITT) population for the intervention and control groups. When the mean (± SD) values at end of follow-up were not available, we used the mean (± SD) changes from baseline [15]. In cases in which adequate data for the meta-analysis were not reported, the manuscript authors or study sponsor was contacted for details.

Risk of Bias in Individual Studies

We used the Cochrane risk of bias tool to assess the potential bias in each study selected for inclusion in the meta-analysis, categorizing the risk for specific bias as “low,” “unclear” (no or not enough evidence in the article for a definitive judgement) or “high” [15]. Six critical domains of RCTs were independently assessed by two reviewers (GH and AG). We assessed whether the random sequence was generated using an adequate method and whether the allocation was adequately concealed so that neither the investigators nor the patient could predict the allocation group. We also assessed whether the participants and investigators (including the outcomes assessors) were blinded to the intervention a participant received and whether the intended blinding was effective. Finally, we assessed whether the data were analyzed following the ITT principle (avoidance of attrition bias) and whether all pre-specified outcomes were reported in the results section as planned. Disagreements in judgments between the two reviewers were resolved through consensus, with the assistance of another member of the review team as necessary.

Data Analysis

The meta-analysis was undertaken using the Review Manager software (RevMan software program, version 5.3. Copenhagen: The Nordic Cochrane Centre, the Cochrane Collaboration, 2014). More complex analyses were performed using STATA software, version 14.2 (StataCorp LP).

The pain and LI scores were expressed in this meta-analysis as standardized mean differences (SMD) with 95% confidence intervals (CI), to take into account the use of various measurement tools for pain assessment and to compare the results of these two outcomes. Anticipating substantial heterogeneity in the treatment effects across trials, we estimated the overall effects and heterogeneity using the DerSimonian and Laird random-effects model [19]. Hedges’ correction for small-sample bias in the SMD was applied, as integrated by default in the Review Manager software [15]. For sensitivity analyses, the fixed-effect model was applied. We tested heterogeneity using Cochran’s Q test. As we performed a random-effects meta-analysis, we used the tau-squared (tau2) estimate as a measure of the between-study variance. A tau2 value of 0.16 represents the threshold for high between-study heterogeneity [20]. The I-squared (I2) statistic was used to quantify heterogeneity, measuring the percentage of total variation across studies due to heterogeneity [21]. In the case of substantial heterogeneity, we planned to perform subgroup analyses, stratifying the original analyses according to pre-specified study-level characteristics, such as the CS manufacturer (i.e., the brand), use of oral non-steroidal anti-inflammatory drugs (NSAIDs) as rescue or concomitant medication, CS daily dose, treatment duration and risk of bias [22]. Funnel plot asymmetry, which may reflect publication bias or heterogeneity, was assessed through visual inspection and with Egger’s test [23]. Additional exploratory analyses were performed to assess the sources of any residual heterogeneity [24]. Finally, we assessed the certainty of evidence for the primary results of this meta-analysis using the GRADE approach [25].
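For readers less familiar with these quantities, the pooling steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Review Manager or STATA implementation: the function names `hedges_g` and `dersimonian_laird` are hypothetical, the variance of the SMD uses a common large-sample approximation that may differ slightly from the formula used by Review Manager, and any inputs are assumed to be per-study means, SDs and group sizes.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with Hedges' small-sample correction.

    Returns (g, variance of g). The variance is a large-sample approximation
    (an assumption of this sketch).
    """
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # Hedges' correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian and Laird tau2 estimator."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }
```

In this sketch, a negative SMD favors the active treatment when lower scores indicate less pain, which matches the sign convention of the effect sizes reported in the Results.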

Results

Search Results and Study Characteristics

The search strategies that were applied to the various databases yielded 746 references, 478 of which were screened for selection based on the title and/or the abstracts, after the duplicates had been removed. This process resulted in 43 articles selected for full-text screening, which allowed the inclusion of 18 studies fulfilling the predefined selection criteria in the meta-analysis (Fig. 1). These studies represented a total of 3791 participants, 1886 of whom received oral CS, and 1905 were randomized to placebo.

Fig. 1 Flow chart of the review

Most of the studies selected for this meta-analysis included patients with knee OA; only one trial investigated patients with hand OA, and one study investigated patients with hip or knee OA. The mean age of the trial participants in the active groups ranged from 58.2 to 67.3 years. Most of the studies used pharmaceutical-grade CS of IBSA origin (Condrosulf®, Chondrosulf®). The trial durations varied from 13 to 104 weeks, and CS at a daily dose of 800 mg was used most frequently. Oral NSAIDs were permitted as rescue or concomitant medication in half of the selected studies. The detailed characteristics of studies included in this meta-analysis are described in Table 1.

Table 1 Characteristics of the included studies

In the assessment of the risk of bias in each individual study, five studies were found to be at high risk of attrition bias (incomplete outcome data), which means that the authors of these studies failed to analyze the data based on the ITT principle or did not adequately apply this principle [26, 27]. This item was the only one for which a high risk of bias was found in the studies selected for the meta-analysis (Appendix 2).

Effect of CS on Pain

All included trials contributed to the analysis of the effect of CS on pain. Overall, the use of CS resulted in a significant positive effect on pain compared with placebo (ES: − 0.63; 95% CI: − 0.91, − 0.35), although with substantial heterogeneity (I2 = 94%; tau2 = 0.33) (Fig. 2).

Fig. 2 Meta-analysis of the effect of chondroitin sulfate on pain in patients with osteoarthritis, including all eligible trials

To explore the potential sources of the observed heterogeneity, the analyses were stratified according to the predefined study-level variables. A substantial decrease of tau2 was observed among the trials that used CS of IBSA origin (tau2 = 0.06), in studies with a low risk of bias for ITT analysis (tau2 = 0.04) and in studies with treatment durations of 4–12 months (tau2 = 0.05) and > 12 months (tau2 = 0.00). These results suggest that the treatment effect is more consistent between the studies in these strata. Although the other subgroup analyses did not result in a decrease of the tau2 value in either stratum, the CS effect remained positive, regardless of the daily dose or the concomitant use of oral NSAIDs, with no statistically significant subgroup differences. However, the effect size (ES) for studies that did not allow concomitant oral NSAIDs was larger (ES: − 0.75; 95% CI: − 1.22, − 0.28) than the overall effect (ES: − 0.63; 95% CI: − 0.91, − 0.35) and the effect in studies with concomitant oral NSAIDs (ES: − 0.53; 95% CI: − 0.91, − 0.15) (see Appendix 3).

The CS effect on pain remained positive in studies with a low risk of bias for ITT analysis (ES: − 0.28; 95% CI: − 0.42, − 0.15; tau2 = 0.04); however, its effect in studies with a high risk of bias did not reach the level of statistical significance (ES: − 1.50; 95% CI: − 3.20, + 0.20; tau2 = 3.68). Sensitivity analyses were then performed, excluding the studies with a high risk of bias. Each pre-specified subgroup analysis was performed again on this basis, using the fixed-effect model. The results confirmed, overall and by subgroup, the findings previously described in all included studies. Table 2 summarizes the results of the sensitivity analyses. These results particularly confirmed that CS of IBSA origin had a significant effect on pain reduction in patients with knee OA (ES, fixed: − 0.25; 95% CI: − 0.34, − 0.16), contrary to the compounds from the other manufacturers (ES, fixed: − 0.08; 95% CI: − 0.19, + 0.02). The test for subgroup differences was significant (p = 0.02), suggesting that the effect of CS on pain relative to placebo differed between CS of IBSA origin and CS from the other manufacturers.
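As an illustration of how a test for subgroup differences works with two strata, the sketch below back-calculates standard errors from the reported 95% CIs and compares the two fixed-effect estimates. This is an approximation from rounded, published values and not the computation performed in Review Manager; the function names `se_from_ci` and `subgroup_difference` are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile assumed to underlie the reported 95% CIs


def se_from_ci(lower, upper):
    """Approximate standard error recovered from a symmetric 95% CI."""
    return (upper - lower) / (2 * Z_95)


def subgroup_difference(est_a, ci_a, est_b, ci_b):
    """Two-subgroup heterogeneity test.

    With two subgroups, the between-subgroup Q statistic (1 degree of
    freedom) equals the squared z statistic for the difference between
    the two pooled estimates.
    """
    se_a = se_from_ci(*ci_a)
    se_b = se_from_ci(*ci_b)
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    q_between = z ** 2
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return q_between, p_value


# Fixed-effect estimates for pain reported above (studies without high risk of bias)
q_between, p_value = subgroup_difference(
    -0.25, (-0.34, -0.16),  # CS of IBSA origin
    -0.08, (-0.19, 0.02),   # CS from other manufacturers
)
```

With these rounded inputs the sketch gives a p-value of about 0.016, which is in line with the reported p = 0.02; small discrepancies are expected because the CIs were rounded to two decimals.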

Table 2 Results of the sensitivity analyses for the effects of CS on pain and the Lequesne index, excluding the studies with a high risk of attrition bias

Effect of CS on the Lequesne Index

Ten studies reported LI data. Together, these studies showed a significant effect of CS on functional status (ES: − 0.82; 95% CI: − 1.31, − 0.33), although with high inconsistency (I2 = 95%) and between-study variability (tau2 = 0.58) (Fig. 3).

Fig. 3 Meta-analysis of the effect of chondroitin sulfate on the Lequesne index in patients with osteoarthritis, including all eligible trials

Subgroup analyses resulted in important tau2 decreases in the subset of studies using CS of IBSA origin (tau2 = 0.03) and in the studies with a low risk of bias for ITT analysis (tau2 = 0.02), leading to more consistent ES of − 0.39 (95% CI: − 0.60, − 0.18; I2 = 53%) and − 0.32 (95% CI: − 0.47, − 0.17; I2 = 44%), respectively. In the subset of studies using the compounds from other manufacturers, large imprecision in the treatment effect estimate was observed (ES: − 1.24; 95% CI: − 2.42, − 0.07), with an increased tau2 value (1.74); a similar result was obtained in studies with a high risk of bias for ITT analysis (ES: − 1.96; 95% CI: − 3.84, − 0.08; tau2 = 2.67) (Appendix 4).

In sensitivity analyses using only the studies without any high risk of bias, CS of IBSA origin was found to be better (ES, fixed: − 0.33; 95% CI: − 0.47, − 0.20) than the preparations from the other manufacturers (Table 2).

Other Sensitivity Analyses

Additional sensitivity analyses were conducted, excluding only the two studies with outlier values for the symptomatic efficacy of CS in patients with OA. These analyses showed results similar to those previously described (results not shown).

Assessment of Funnel Plot Asymmetry and Exploration of Sources of Residual Heterogeneity

Funnel plot asymmetry was assessed, stratifying the analyses on whether attrition bias was high in the studies. Egger’s test was statistically significant for both pain (p = 0.004) and the LI (p = 0.049), and visual inspection of the two funnel plots showed an asymmetry on the left (Fig. 4). However, for pain, the studies were well distributed around the vertical line (overall ES). For the LI, there were more studies on the right side of the vertical line than on the left. Rather than reflecting publication bias, these asymmetries seemed more likely to be due to true heterogeneity [24], at first sight generated by studies with outlier ES; these studies were also among those with a high risk of bias. When the analyses were limited to studies without any high risk of bias [24], the funnel plot asymmetry persisted (see Appendix 5). We then computed Spearman’s correlation coefficient between the study size and treatment effects, testing whether there was an association between these two parameters [24, 44]. We found strong positive correlations between study size and intervention effects in those studies without any high risk of bias (pain: rS = 0.93, p < 0.01; LI: rS = 0.86, p = 0.01), supporting the interpretation that the funnel plot asymmetry was due to true heterogeneity [24].
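The two diagnostics used in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the STATA routines used for the analyses: the helper names are hypothetical, the Egger function returns only the regression intercept and slope (the significance test on the intercept is omitted), and no study-level data from this review are reproduced.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def egger_regression(effects, std_errors):
    """Egger's regression of the standardized effect (effect / SE) on
    precision (1 / SE). An intercept far from zero indicates funnel plot
    asymmetry. Returns (intercept, slope)."""
    x = [1 / se for se in std_errors]
    z = [y / se for y, se in zip(effects, std_errors)]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sxx
    return mz - slope * mx, slope
```

Because beneficial effects are negative SMDs here, a positive Spearman coefficient between study size and effect means that larger studies reported effects closer to zero, which is the pattern described above.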

Fig. 4 Funnel plots for pain (a) and the Lequesne index (b), using data from the primary meta-analyses including all eligible trials, stratified according to the risk of attrition bias in studies

We performed an additional exploratory meta-analysis of the studies at low risk of bias for ITT analysis, excluding the smallest ones. No significant heterogeneity was found [overall effects: pain (I2 = 26.6%, p = 0.20); LI (I2 = 0%, p = 0.47)], indicating that the residual heterogeneity was due to the smallest studies (Appendix 5). Thus, we concluded that there was no publication bias in this meta-analysis.

GRADE Assessment of Findings

The certainty of evidence was “low” when no consideration was given to the brand and risk of bias. It was upgraded to “moderate” for pain and “high” for the LI with CS of IBSA origin in sensitivity analyses using only the studies without a high risk of bias. Heterogeneity was the only issue affecting the certainty of evidence in this meta-analysis, being “serious” or “very serious” in some analyses (see Appendix 6).

Discussion

The objective of this meta-analysis was to determine whether oral CS is effective in alleviating pain and improving functional status in patients with knee OA and to identify the factors that explain heterogeneity in clinical trial results. Overall, this new meta-analysis showed that CS provides moderate pain benefit (SMD: − 0.63; 95% CI: − 0.91, − 0.35) and has a large effect on function (SMD: − 0.82; 95% CI: − 1.31, − 0.33), although with substantial inconsistency (I2 = 94% and 95%, respectively). This large inconsistency decreased our confidence in such evidence, which was then graded as “low” [45]. Similar results were reported by the latest Cochrane review on CS in OA, which, however, concluded that the observed effects were clinically meaningful [46]. When we investigated the causes of the heterogeneity, as recommended by Glasziou and Sanders [22], our sensitivity analyses on studies without any high risk of bias upgraded our confidence to “moderate” for pain (SMD: − 0.18; 95% CI: − 0.25, − 0.12; I2 = 71%) and to “high” for the LI (SMD: − 0.28; 95% CI: − 0.39, − 0.18; I2 = 44%, p = 0.10). This improved quality of the evidence was associated with decreased but still statistically significant effect sizes, consistent with McAlindon et al. [47], who found that the aggregated effect size of CS was diminished when only high-quality trials were considered.

In contrast to our findings and those of some other meta-analyses on CS [46–49], two meta-analyses concluded that the benefit of CS was minimal or non-existent [50, 51]. However, rather than reflecting any real differences in the combined treatment effect estimates, the conflicting conclusions reached by the authors of these meta-analyses were primarily due to the various results obtained when attempting to explain the observed heterogeneity in the overall effects [50] and to the sample size restrictions in the study selection [51]. Additionally, in one of these meta-analyses, placebo-controlled and non-placebo-controlled studies were included and analyzed together, and some of the included studies investigated intramuscular CS [50].

Although subgroup and sensitivity analyses are of an exploratory nature, they help to understand the reasons for the treatment effect variations across studies [15]. In the current meta-analysis, these exploratory analyses provide evidence that the effects of CS on pain and functional status are not the same, based on brand. In fact, after stratifying the analyses according to the compound’s manufacturer in studies with a low risk of attrition bias, we found greater effects in the subset of studies conducted with the pharmaceutical-grade CS of IBSA origin (SMD for pain: − 0.25; 95% CI: − 0.34, − 0.16) but lower and non-statistically significant effects (SMD for pain: − 0.08; 95% CI: − 0.19, + 0.02) when the preparations from the other manufacturers were used (p value for interaction = 0.02). Previous meta-analyses investigating inconsistency in trials on GS found similar results to those obtained in this meta-analysis [14, 52].

Differences in the treatment effects, according to brand, may be explained by differences in the quality of the compounds. Indeed, as reported by Volpi [37], these quality differences result in different therapeutic activity in the CS preparations. Pharmaceutical-grade CS preparations have been found to be of high and standardized purity and properties, contrary to the non-pharmaceutical-grade preparations [53, 54], which explains the position of the European Society for Clinical and Economic Aspects of Osteoporosis and Osteoarthritis (ESCEO) regarding glucosamine and chondroitin (i.e., recommending only the patented pharmaceutical-grade preparations of crystalline GS and CS to ensure a clinical benefit in patients with OA) [9, 54]. Contrary to the ESCEO, some other clinical guidelines such as those of the Osteoarthritis Research Society International (OARSI) [7] and the American College of Rheumatology [8] did not make the same recommendation, primarily because they did not make any distinction in terms of quality of GS and CS.

In exploring heterogeneity, our analyses showed that the effect of CS was positive at 3 months and continued until 4–12 months of treatment, but not beyond. Such results may be misleading in terms of the long-term effect of CS. Of the three studies that contributed to the analysis in that subgroup, one used a non-pharmaceutical-grade CS (Table 1), and in the two others (which used CS of IBSA origin), the primary end point was the joint space width (JSW), and the Michel et al. study [55] included many patients without baseline pain. Therefore, these data are not sufficient to draw a conclusion regarding the long-term symptomatic effects of CS. However, a meta-analysis of RCTs investigating the effect of CS on joint space narrowing showed that CS reduced the rate of decline of JSW in patients with knee OA over a 2-year treatment period [56].

The comparison of the CS effect according to the concomitant use of oral NSAIDs did not show any statistically significant difference between the subsets of studies that allowed NSAIDs and the others for both pain and the LI. However, our analyses showed a better effect of CS on pain when no concomitant NSAIDs were permitted. This finding might suggest that the intrinsic effect of CS is mitigated by the concomitant use of NSAIDs during RCTs. In this meta-analysis, half of the included studies allowed concomitant NSAIDs (Table 1).

Our analyses showed that doses of CS ranging from 800 mg to 2000 mg were effective on both pain and the LI, with a tendency for a better effect with the doses ranging from 1000 to 2000 mg/day.

Finally, investigating the sources of the remaining heterogeneity in various subgroups in studies without any high risk of bias, we found strong correlations between the treatment effects and study size, suggesting true heterogeneity related to study size [24]. This result might suggest that the smallest studies included more patients with severe disease than the others. Indeed, it has been reported that the size of the treatment effect in clinical trials could be increased by selecting high-risk patients [57], given that patients with more severe disease seem to be more likely to demonstrate beneficial treatment effects than other patients [58]. However, we were unable to show a correlation between the mean baseline VAS scores and mean ES (SMD). For the studies included in this meta-analysis, in which a 0–100-mm visual analog scale (VAS) was used for pain measurement, various levels of pain at baseline were considered for patient inclusion; however, the mean baseline values were relatively close. Beyond the variations in the minimum level of pain required for inclusion, it is likely that patients with various levels of pain were included in each single study (SD varied from 13 to 17 mm VAS across most of the studies), which might have led to within-study treatment effect variation and then to between-study heterogeneity [22].

Compared with the ES reported for opioid analgesics, oral NSAIDs or intra-articular hyaluronic acid for which the same inconsistency issue was reported [59–61], CS appears to be a good treatment option for patients with OA, considering its superior safety profile [46]. Long-term trials assessing the symptomatic effects of oral CS are warranted. To obtain better results on the intrinsic benefit of CS over placebo, concomitant anti-OA medications (NSAIDs and others) should not be allowed during the trials, in accordance with the guidelines from the European Medicines Agency [62].

Some limitations are present in this meta-analysis, including the few studies on oral CS in OA, and particularly the number of studies that reported data on the LI, which limits any comparison of the effects of CS on pain and functional status. Another limitation to this meta-analysis is the fact that only the LI was considered as a measure of functional status. Indeed, at the stage of the protocol, pain, physical function and the LI were considered as co-primary outcomes. However, during the screening of individual studies, we realized that only a few studies assessed physical function using a specifically dedicated tool (5 studies used the WOMAC function questionnaire and 1 used another tool). Since the LI (including parameters on pain or discomfort, maximum distance walked and activities of daily living) and the WOMAC function index (including only questions assessing the degree of difficulty while performing various activities of daily living) do not really measure the same things, it would not have been adequate to combine data on function measured by these two different tools in a meta-analysis, as has been done for pain. Therefore, we decided to limit our outcomes of interest in this meta-analysis to pain and LI, using the LI as a measure of functional status.

Conclusion

Overall, oral CS is superior to placebo in the management of OA, with a moderate benefit for pain and a large effect on functional status measured by the LI, however with a high level of inconsistency. Our analyses showed that risk of bias, brand of CS and study size are the factors that explain the inconsistency in clinical trial results. Specifically, this meta-analysis demonstrated the following results.

  • The pharmaceutical-grade preparations with CS of IBSA origin generated greater benefit for pain and functional status than the other CS.

  • The effect of CS was positive at 3 and 12 months for both pain and functional status.

  • Administering oral NSAIDs as concomitant medication during RCTs did not significantly alter the intrinsic benefit of CS on pain. However, the effect size was greater when NSAIDs were not allowed, suggesting that the real benefit of CS on pain might be greater than that reported from RCTs.