Results of the search
A total of 513 records were identified through database searching and other sources. Of these, 497 were excluded on the basis of title and abstract, and the remaining 16 underwent more detailed evaluation. Of these 16, 7 studies were included [24, 26, 33, 34, 35, 36, 37] and 9 were excluded [15, 16, 17, 18, 38, 39, 40, 41, 42] according to the inclusion criteria (Fig. 1). The 9 excluded studies were Kimmatkar 2003 [15], Chopra 2004 [38], Shah 2010 [17], Chopra 2013 [39], Perera 2014 [16], Belcaro 2015 [40], Bolognesi 2016 [41], Gupta 2011 [18] and Badria 2002 [42].
Description of included trials
A total of 7 RCTs with 545 participants were included. The Sengupta RCTs each consisted of 2 trial groups and 1 control group [24, 26]. Following the method of the Cochrane Handbook 5.1.0, the control group was divided into two groups, one matched to each trial group (Sengupta 2008a and Sengupta 2008b; Sengupta 2010a and Sengupta 2010b) [31]. The characteristics of the RCTs are shown in Tables 2 and 3.
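The control-group split described above can be sketched as follows. This is a minimal illustration, assuming the shared control arm's sample size is divided as evenly as possible between the comparisons while its mean and standard deviation are left unchanged. The function name and the sample size of 25 are hypothetical and are not taken from the included trials.

```python
def split_shared_control(n_control: int, n_comparisons: int) -> list[int]:
    """Divide a shared control group's sample size as evenly as possible.

    For continuous outcomes only the sample size is divided; the mean and
    standard deviation of the control arm are reused unchanged.
    """
    base, extra = divmod(n_control, n_comparisons)
    # The first `extra` comparisons receive one additional participant.
    return [base + (1 if i < extra else 0) for i in range(n_comparisons)]


# Hypothetical control arm of 25 participants shared by two trial arms
# (for example, a study entered as "a" and "b" comparisons).
parts = split_shared_control(25, 2)
print(parts)  # [13, 12]
```

Splitting the control arm in this way avoids counting the same control participants twice when both comparisons enter one meta-analysis.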
Table 2 The characteristics of the included studies
Table 3 The characteristics of the included studies
Risk of bias in included studies
The risk of bias summary and graph are shown in Fig. 2.
Sequence generation
Among the 7 included RCTs, three [35, 36, 37] did not clearly describe their randomization procedures, so we rated them as having an unclear risk of bias. The other RCTs described their randomization procedures: Sengupta 2010 [24], Sengupta 2008 [26] and Vishal 2011 [33] used a computer-generated randomization scheme, while Haroyan 2018 [34] used a treatment randomization code. These RCTs were therefore rated as having a low risk of bias.
Allocation concealment
Sengupta 2010 [24], Sengupta 2008 [26], Vishal 2011 [33] and Haroyan 2018 [34] stated that the drug preparations were similar in appearance, smell and color and were organoleptically indistinguishable, which is an acceptable method of allocation concealment. Hence, they were rated as having a low risk of bias. Karimifar 2017 [35], Notarnicola 2016 [36] and Notarnicola 2011 [37] did not describe an acceptable method of allocation concealment; therefore, they were rated as having an unclear risk of bias.
Blinding
For blinding of participants and personnel, although all RCTs claimed to use blinding, only Haroyan 2018 [34], Notarnicola 2016 [36] and Notarnicola 2011 [37] described how blinding was implemented. We therefore rated them as having a low risk of bias. The other RCTs were rated as having an unclear risk of bias because they did not describe how blinding was implemented.
For blinding of outcome assessment, the statisticians in Sengupta 2010 [24], Sengupta 2008 [26] and Vishal 2011 [33] were not blinded; hence, these RCTs were rated as having a high risk of bias. Haroyan 2018 [34], Notarnicola 2016 [36] and Notarnicola 2011 [37] described how blinding was implemented; hence, we rated them as having a low risk of bias. Karimifar 2017 [35] was rated as having an unclear risk of bias because it did not describe how blinding was implemented.
Incomplete outcome data
In all RCTs, missing outcome data were balanced in number across intervention groups, with similar reasons for missing data across groups. We rated them as having a low risk of bias.
Selective reporting
One RCT (Vishal 2011 [33]) failed to report all outcomes mentioned in its protocol, so we rated its risk of bias as high. The other RCTs provided their protocols, and all pre-specified outcomes of interest to this review were reported in the pre-specified way; their risk of bias was rated low.
Other potential bias
In Sengupta 2010 [24], Sengupta 2008 [26] and Vishal 2011 [33], the protocols specified three primary outcome parameters, so a difference should be considered significant only when p < 0.05/3 ≈ 0.017. Strictly speaking, the statistical analyses of these studies were not carried out correctly; hence, their risk of bias was rated high. Similarly, in Haroyan 2018 [34], a difference is significant only when p < 0.05/2 = 0.025, and its risk of bias was also rated high. No other sources of bias were observed in the remaining RCTs; therefore, their risk of other bias was rated low.
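The thresholds quoted above divide the nominal significance level by the number of primary outcomes, which matches a Bonferroni-style correction (the source does not name the method, so this label is an assumption). A minimal sketch of the arithmetic, with hypothetical function names and a hypothetical p-value:

```python
def corrected_threshold(alpha: float, n_outcomes: int) -> float:
    """Per-outcome significance threshold when alpha is shared by n outcomes."""
    return alpha / n_outcomes


def is_significant(p_value: float, alpha: float, n_outcomes: int) -> bool:
    """True only if the p-value falls below the corrected threshold."""
    return p_value < corrected_threshold(alpha, n_outcomes)


print(round(corrected_threshold(0.05, 3), 3))  # 0.017 (three primary outcomes)
print(round(corrected_threshold(0.05, 2), 3))  # 0.025 (two primary outcomes)

# Hypothetical p-value: below 0.05 but not below the corrected threshold.
print(is_significant(0.03, 0.05, 3))  # False
```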
Primary outcomes
Visual analogue score
Six RCTs [24, 26, 33, 35, 36, 37] reported the change in visual analogue score (VAS) at the end of treatment. Because heterogeneity was high (Tau² = 10.85, I² = 94%, P < 0.00001), we used a random-effects model. Boswellia improved VAS more than the control (WMD -8.33; 95% CI -11.19, -5.46; P < 0.00001) (Fig. 3).
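For readers unfamiliar with how Tau², I² and a random-effects pooled mean difference relate to one another, the following is a minimal sketch. It assumes the DerSimonian-Laird estimator, which the source does not name, and the input effects and variances are made-up illustrative numbers, not data from the included trials.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study mean differences with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # heterogeneity, percent
    # Random-effects weights add tau2 to each study's own variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return {"pooled": pooled, "ci": ci, "tau2": tau2, "i2": i2}


# Hypothetical mean differences and their variances for three studies.
result = dersimonian_laird([-5.0, -10.0, -12.0], [1.0, 1.0, 1.0])
print(result)
```

When Tau² is zero the random-effects weights reduce to the fixed-effect weights, so the two models give the same pooled estimate.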
WOMAC
Four RCTs [24, 26-, 34–35] reported the change in WOMAC pain. Because heterogeneity was high (Tau² = 94.69, I² = 99%, P < 0.00001), we used a random-effects model. Compared with the control group, Boswellia improved WOMAC pain more (WMD -14.22; 95% CI -22.34, -6.09; P = 0.0006) (Fig. 4).
Four RCTs [24, 26, 33, 34] reported the change in WOMAC stiffness. Because heterogeneity was high (Tau² = 44.40, I² = 97%, P < 0.00001), we used a random-effects model. Compared with the control group, Boswellia improved WOMAC stiffness more (WMD -10.04; 95% CI -15.86, -4.22; P = 0.0007) (Fig. 5).
Four RCTs [24, 26, 33, 34] reported the change in WOMAC function. Because heterogeneity was high (Tau² = 23.03, I² = 93%, P < 0.00001), we used a random-effects model. Compared with the control group, Boswellia improved WOMAC function more (WMD -10.75; 95% CI -15.06, -6.43; P < 0.00001) (Fig. 6).
Lequesne index
Six RCTs [24, 26, 33, 35, 36, 37] reported the change in the Lequesne index. Because heterogeneity was moderate (Tau² = 0.55, I² = 47%, P = 0.07), we used a random-effects model. Compared with the control group, Boswellia improved the Lequesne index more (WMD -2.27; 95% CI -3.08, -1.45; P < 0.00001) (Fig. 7).
Secondary outcomes
Pain
Several RCTs reported the pain index (VAS and/or WOMAC pain) at weeks 4, 8 and 12. The details are shown in Table 4 and Figures S1 to S6 (see Supplementary Materials).
Table 4 The pain index at weeks 4, 8 and 12
Stiffness
Several RCTs reported the stiffness index (WOMAC stiffness) at weeks 4, 8 and 12. The details are shown in Table 5 and Figures S7 to S9 (see Supplementary Materials).
Table 5 The stiffness index at weeks 4, 8 and 12
Function
Several RCTs reported the function index (WOMAC function) at weeks 4, 8 and 12. The details are shown in Table 6 and Figures S10 to S12 (see Supplementary Materials).
Table 6 The function index at weeks 4, 8 and 12
Adverse events
Five studies [24, 33, 34, 35, 37] reported adverse events (AEs). Three of them were excluded from the pooled analysis because they reported no events in either arm. The pooled result showed no statistically significant difference between groups (RR 0.63; 95% CI 0.22, 1.83; P = 0.39), so there is no strong evidence that either intervention is safer (Fig. 8).
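The following minimal sketch shows why a study with no events in either arm contributes nothing to a pooled risk ratio, and how a single-study RR and its 95% CI can be computed on the log scale. The event counts are hypothetical, and the 0.5 continuity correction for a single zero cell is an assumption about the analysis, not something stated in the source.

```python
import math


def risk_ratio(events_t, n_t, events_c, n_c):
    """Return (RR, lower, upper) with a 95% CI, or None for a double-zero study."""
    if events_t == 0 and events_c == 0:
        # 0/0: the risk ratio is undefined, so the study cannot be pooled.
        return None
    a, n1, c, n2 = float(events_t), float(n_t), float(events_c), float(n_c)
    if events_t == 0 or events_c == 0:
        # Assumed continuity correction: add 0.5 to every cell of the 2x2 table.
        a, c = a + 0.5, c + 0.5
        n1, n2 = n1 + 1.0, n2 + 1.0
    rr = (a / n1) / (c / n2)
    # Standard error of log(RR).
    se = math.sqrt(1.0 / a - 1.0 / n1 + 1.0 / c - 1.0 / n2)
    lower = math.exp(math.log(rr) - 1.96 * se)
    upper = math.exp(math.log(rr) + 1.96 * se)
    return rr, lower, upper


# Hypothetical counts: 5/50 events with treatment, 10/50 with control.
print(risk_ratio(5, 50, 10, 50))
# Hypothetical double-zero study: excluded from pooling.
print(risk_ratio(0, 30, 0, 30))  # None
```

A confidence interval that spans 1, as in the pooled result above, indicates that the data are compatible with either arm having the lower risk.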