Background

Experimental studies, particularly randomized controlled trials (RCTs), provide the least biased information on the effectiveness of medical interventions and form the basis for systematic reviews of intervention effectiveness [1]. Blinding of patients and care givers ensures that knowledge of treatment allocation does not confound the effectiveness estimates [1]. However, effect estimates may differ between blinded and non-blinded RCTs, and this difference may depend on the outcome. Perceived outcomes such as pain may be reduced by the placebo effect in non-blinded, real-world circumstances, whereas objective outcomes such as mortality may differ less between blinded and non-blinded comparisons [2, 3]. The effectiveness of arthroscopic partial meniscectomy (APM) for a ruptured meniscus of the knee has been debated in scientific journals [4,5,6,7,8,9,10], and APM was therefore chosen as the source of empirical data for the current study.

The aims of this study were to identify and operationalize the study question to be considered first in RCTs; to assess the consequences of this operationalization both conceptually and with empirical data on the effectiveness of arthroscopic meniscectomy for meniscal rupture of the knee; and to explore the resulting clinical and research implications.

Methods

The present study used the methods developed for observational effectiveness studies, the benchmarking controlled trials (BCTs), which require a very detailed description of the study questions, patient selection, patient characteristics, interventions, and outcomes [11]. In BCTs the study question always concerns effectiveness in routine health care. The BCT framework was used to determine which study question should be asked first in experimental studies, i.e. RCTs.

A literature search was undertaken to find all RCTs published in peer-reviewed journals that assessed the effectiveness of arthroscopic partial meniscectomy of the knee in comparison with any other non-pharmacological treatment, including sham surgery, among patients with knee pain and with at least one year of follow-up. Trials focusing on knee osteoarthrosis were excluded. The following keywords were used: arthroscopic partial meniscectomy, randomized controlled trial, systematic review. The Cochrane CENTRAL, Ovid MEDLINE, and Web of Science databases were searched up to October 2017 by the author, who checked the search results to exclude misclassifications. The search strategy is described in Additional file 1.

The descriptive information in each trial concerning blinded vs non-blinded study design, patient selection, and the characteristics of patients, interventions and outcomes was extracted by the author, who checked the accuracy of the data twice. The study question characteristics were also depicted in a flow chart to show the similarities and differences between the studies.

Results

Characteristics of RCTs assessing the pure intervention effect or effectiveness in real-world health care are shown in Table 1. The validity issues in double blinded and non-blinded RCTs are shown in Fig. 1. The double blinded design aims to assess the pure (specific) effects of an intervention, often biological effects, whereas in routine health care the specific effect is complemented by a placebo effect and by non-specific effects arising from the interaction between the patient and those providing care (Fig. 2).

Table 1 Characteristics of randomized controlled trials (RCTs) aiming to study effectiveness (or cost-effectiveness) of an intervention per se or effectiveness (or cost-effectiveness) of an intervention in routine health care circumstances
Fig. 1

The validity issues in RCTs utilizing a double blinded design and in RCTs using a non-blinded design

Fig. 2

The components of the effect estimates of a double blinded design, providing evidence of the pure intervention effect, and of a non-blinded design, providing evidence of the intervention effect in routine health care circumstances. The quantitative effects are illustrative and not based on empirical data.

Study question dictates double blinded study design

When the study question is to assess the pure intervention effect, the answer can be obtained only with a double blinded design (Table 1). The comparison can be between an active intervention and placebo, or between two or more active interventions. When the comparison is with placebo, the study assesses the incremental effectiveness of the intervention beyond the effect caused by placebo. The study question in a double blinded RCT thus concerns the pure (usually biological) effect of the treatment. In routine health care, by contrast, the patient and the care provider are always aware of the treatment options, and the clinical question concerns the biological effect of the intervention plus its placebo effect. Furthermore, the overall effect is increased by non-specific treatment effects (information, advice, support) related to the intervention under study (Fig. 2). Therefore, double blind placebo controlled trials do not answer the question of effectiveness in routine health care. When two or more active interventions are compared with each other in a double blind RCT, the study assesses the difference in effectiveness between the interventions, with the placebo effect controlled by the double blinded design.

Study question dictates non-blinded study design

In routine health care, patients and their health care providers are always aware of the treatment options, and to be concordant with this a non-blinded RCT is necessary (Table 1). When the comparison is between an active intervention and no intervention (a choice made in clinical practice), the study assesses the pure intervention effect plus its placebo effect plus non-specific treatment effects, compared with none of these three components (Fig. 2). Similarly, when the comparison is between two active interventions, the study question again accords with the clinical question in routine health care, where in both groups the specific effect, the placebo effect and the non-specific effects contribute to the effectiveness estimates.
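The four comparisons described in the two preceding subsections can be summarized compactly. This is a sketch under the simplifying assumption, also used in the illustration of Fig. 2, that the components are additive; the symbols are introduced here for illustration only: S is the specific (pure) effect, P the placebo effect and N the non-specific effects, with subscripts A and B for two interventions. In the double blinded arms P and N are assumed equal because blinding equalizes expectations, and the natural course common to both arms cancels and is omitted.

```latex
\begin{aligned}
\Delta_{\text{blind}}^{A\ \text{vs placebo}}   &= (S_A + P + N) - (P + N) = S_A \\
\Delta_{\text{blind}}^{A\ \text{vs}\ B}         &= (S_A + P + N) - (S_B + P + N) = S_A - S_B \\
\Delta_{\text{routine}}^{A\ \text{vs none}}     &= (S_A + P_A + N_A) - 0 = S_A + P_A + N_A \\
\Delta_{\text{routine}}^{A\ \text{vs}\ B}       &= (S_A + P_A + N_A) - (S_B + P_B + N_B)
\end{aligned}
```

Under these assumptions the non-blinded comparison against no intervention exceeds the placebo controlled estimate by P_A + N_A, which is the gap illustrated in Fig. 2.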

The study questions in RCTs on arthroscopic partial meniscectomy

The initial search of the three databases yielded altogether 2375 abstracts. Based on the abstracts, six RCTs assessing the effectiveness of arthroscopic partial meniscectomy and fulfilling the inclusion criteria were found [12,13,14,15,16,17] (Table 2).

Table 2 The main characteristics of the six randomized controlled trials assessing effectiveness of arthroscopic partial meniscectomy

All six trials appropriately reported the items needed to assess the study question. Information on the process of selecting patients for the study was limited in five studies (Table 2). In one study, Gauffin et al., more than 95% of patients were reported to have been referred by general practitioners to the study hospital from its catchment area [16]. The number of patients recruited per year per hospital varied between studies from 6 to 82. The proportion of eligible patients declining participation varied from 3 to 55%.

Very little or nothing was reported on the patients' general health status, comorbid conditions, behavioural factors such as physical activity, environmental factors such as work conditions, or educational level and other socioeconomic factors (Table 2).

The trial by Gauffin et al. was the only study that required 12 weeks of exercise before patients' eligibility was considered. In three studies, prior exercise was not required before randomization (Table 2).

In four studies, crossover from conservative treatment to surgery ranged from 19 to 36% during the one-year follow-up. In the studies by Sihvonen et al. and Yim et al., crossover was negligible.

Four studies used KOOS (Knee injury and Osteoarthritis Outcome Score) as their primary outcome; one study used WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index), and one study used a visual analogue scale for pain and the Lysholm Knee Scoring Scale. Only one study used the meniscal lesion specific WOMET (Western Ontario Meniscal Evaluation Tool) as the primary outcome (Table 2). In one study no patient dropped out of follow-up; in four trials at least 90% of patients attended the primary follow-up; and in one study (Gauffin et al.) 20% and 7% of patients were lost to follow-up in the exercise and APM groups, respectively.

Figure 3 shows in a flow chart the characteristics of the study questions: whether the pure intervention effect or the effectiveness of the intervention in routine practice was assessed; the representativeness of the study populations; the treatments given before patients were considered eligible for the trial; the contents of the treatments in the actual experiment; and the primary outcome measures. Five of the RCTs had a non-blinded design and one had a double blinded design. Only one study, Gauffin et al., had recruited a comprehensive (and thus representative) patient population from its catchment area [16]. The studies differed in the degree of concomitant osteoarthrosis of the knee. Gauffin et al. was the only study in which 12 weeks of exercise had been tried before surgery. The content of the index and control interventions varied, as did the primary outcomes and their decisive time points. In the trial by Gauffin et al., arthroscopic partial meniscectomy was more effective than exercise therapy. In the four other non-blinded RCTs no treatment effectiveness was found. The only trial with a double blinded design, Sihvonen et al., found no effectiveness of arthroscopic partial meniscectomy compared with sham surgery.

Fig. 3

The study question analysis flowchart in randomized controlled trials on effectiveness of arthroscopic partial meniscectomy. The sequence is as follows: study population → intervention effectiveness per se or intervention effectiveness in routine health care (blinded or non-blinded design) → degree of selection (representative or non-representative study population) → subcategory of the study population → prior treatments before the experiment → content of the index and reference interventions → reference to the study; primary outcome measures. 1 KOOS (Knee injury and Osteoarthritis Outcome Score); 2 Lysholm (Lysholm Knee Scoring Scale); 3 Tegner Activity Scale; 4 VAS (visual analogue scale for pain); 5 WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index); 6 WOMET (Western Ontario Meniscal Evaluation Tool)

Discussion

The main finding of this paper is that the study question of assessing the pure intervention effect dictates the use of a double blinded design, and the study question of assessing the effectiveness of an intervention in routine health care settings dictates the use of a non-blinded RCT (Table 1, Fig. 1).

Table 2 and Fig. 3 show that each of the RCTs on effectiveness of arthroscopic meniscectomy addressed a different study question; therefore it was not possible to assess empirically the differences in effectiveness between the non-blinded and the blinded studies.

Double blinded placebo controlled RCTs are needed to ensure that the intervention has favorable biological effects and that these effects exceed the potential harms. If there is sufficient evidence that the intervention has no biological effect in a particular patient group, it may not be justified to study its effectiveness among similar patients in routine health care. Non-blinded RCTs are needed to obtain evidence of effectiveness in routine health care circumstances, and double blinded RCTs that have shown the effectiveness of an intervention per se should be followed by non-blinded RCTs evaluating the effect in routine health care circumstances.

The effectiveness shown by a double blinded RCT differs from the effectiveness in ordinary health care circumstances, because in the latter placebo effects and non-specific treatment effects add to the pure intervention effect. Consequently, evidence of effectiveness from blinded placebo controlled RCTs may not as such be valid for decisions in routine health care, nor for informing patients. Blinded RCTs may underestimate effectiveness in routine health care, where placebo and non-specific effects add to the pure intervention effect, so that number needed to treat (NNT) figures derived from them are biased as well. When informing patients, the potential of placebo and non-specific effects to further increase the pure intervention effect should therefore be taken into account.
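The NNT argument can be made concrete with a small calculation. The sketch below is illustrative only: the component values are hypothetical (not empirical data, in the spirit of Fig. 2), the components are assumed to be additive on the risk-difference scale, and the names EffectComponents, blinded_difference and routine_difference are invented here. NNT is computed as the reciprocal of the absolute risk difference, rounded up.

```python
from dataclasses import dataclass
from fractions import Fraction
import math


@dataclass(frozen=True)
class EffectComponents:
    """Hypothetical additive effect components on the risk-difference scale."""
    baseline: Fraction      # proportion improving without any intervention
    specific: Fraction      # pure (specific) intervention effect
    placebo: Fraction       # placebo effect
    non_specific: Fraction  # non-specific effects (information, advice, support)


def nnt(risk_difference: Fraction) -> int:
    """Number needed to treat: 1 / absolute risk difference, rounded up."""
    if risk_difference <= 0:
        raise ValueError("NNT is undefined for a non-positive risk difference")
    return math.ceil(1 / risk_difference)


def blinded_difference(c: EffectComponents) -> Fraction:
    # Double blinded, placebo controlled: both arms share the placebo and
    # non-specific effects, so only the specific effect separates them.
    active = c.baseline + c.specific + c.placebo + c.non_specific
    sham = c.baseline + c.placebo + c.non_specific
    return active - sham


def routine_difference(c: EffectComponents) -> Fraction:
    # Non-blinded, active vs no intervention: the control arm receives
    # none of the three components.
    active = c.baseline + c.specific + c.placebo + c.non_specific
    return active - c.baseline


# Hypothetical values (exact fractions avoid floating-point rounding).
components = EffectComponents(
    baseline=Fraction(20, 100),
    specific=Fraction(10, 100),
    placebo=Fraction(5, 100),
    non_specific=Fraction(5, 100),
)

print(nnt(blinded_difference(components)))  # 10
print(nnt(routine_difference(components)))  # 5
```

With these assumed values the blinded trial implies that ten patients must be treated for one to benefit, whereas in routine care the corresponding figure would be five.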

These findings also have implications for health economics. The most valid way to obtain evidence on the cost-effectiveness of a particular intervention is an economic analysis alongside an RCT [18]. However, if a double blinded design has been used, the effectiveness consists only of the pure intervention effect and does not reflect the routine health care context, in which there is also the placebo effect plus the effects of the interaction between the patient and the therapists. Thus, cost-effectiveness information from double blind RCTs with economic analysis should be followed by non-blinded trials to answer the routine health care study question. This also applies to modelling studies assessing incremental cost-effectiveness ratios (ICERs): double blinded RCTs may not provide valid effectiveness estimates for them, as the placebo effect and the other effects of routine health care are not considered. Thresholds for the acceptable cost per unit of health-related quality of life (HRQoL) gained are used in some countries, such as the UK [19]. Again, ICER estimates based on double blinded placebo-controlled trials may be biased and lead to higher costs per unit of HRQoL gained than actually occur in the routine health care circumstances for which the estimates are intended.
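The direction of this bias follows from the definition of the ICER as incremental cost divided by incremental effect. The sketch below uses hypothetical figures only: it assumes the same incremental cost in both scenarios, an effect measured in HRQoL-based units (e.g. quality-adjusted life years gained), a routine-care effect twice the blinded (specific-only) effect, and an invented willingness-to-pay threshold; none of these values comes from the trials discussed.

```python
from fractions import Fraction


def icer(delta_cost: Fraction, delta_effect: Fraction) -> Fraction:
    """Incremental cost-effectiveness ratio: incremental cost / incremental effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


def below_threshold(ratio: Fraction, threshold: Fraction) -> bool:
    """True if the cost per unit of health gain does not exceed the threshold."""
    return ratio <= threshold


# Hypothetical inputs, for illustration only (not empirical data).
delta_cost = Fraction(2000)        # incremental cost per patient, same in both scenarios
effect_blinded = Fraction(2, 100)  # specific effect only, as estimated by a blinded RCT
effect_routine = Fraction(4, 100)  # specific + placebo + non-specific effects
threshold = Fraction(60000)        # hypothetical willingness-to-pay threshold

icer_blinded = icer(delta_cost, effect_blinded)
icer_routine = icer(delta_cost, effect_routine)

print(icer_blinded, below_threshold(icer_blinded, threshold))  # 100000 False
print(icer_routine, below_threshold(icer_routine, threshold))  # 50000 True
```

Under these assumptions the blinded estimate doubles the cost per unit of health gained and would place the intervention above the threshold, even though its routine-care ratio falls below it.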

Blinded RCTs are not the best way to assess the efficacy (i.e. effectiveness in ideal circumstances) of an intervention, because ideal circumstances (a meticulously selected patient population, the most competent staff) can be arranged equally well in non-blinded, routine health care experiments. Double blinded randomized trials are often conducted in optimal circumstances, and in these cases may reveal the best attainable effectiveness estimates, but they may also be conducted, as pragmatic trials, in routine health care contexts. Consequently, categorically attributing efficacy to double blind RCTs and effectiveness to non-blinded RCTs may not be justifiable.

The extent of placebo effects and non-specific effects in routine health care may depend on the individual patient, on the health care provider, and on how well they communicate with each other. Additional potential modifying factors include, for example, the competence of staff, the structure of the health care system and cultural features [20]. These issues need to be studied as modifiers of effectiveness in different patient groups and health care systems.

The assessment of risk of bias (internal validity) differs between the two study designs. When the study question concerns intervention effectiveness per se, successful double blinding and concealment of treatment allocation are of utmost importance. But when the study aims to quantify the effectiveness of an intervention in routine health care circumstances, where both patients and health care staff are aware of the interventions chosen, blinding is not justified. Hitherto, instructions for assessing risk of bias in RCTs have considered successful blinding an important validity criterion in all RCT designs [21]. The present paper argues that this interpretation is not tenable, and that a distinction should be made between the two main study questions, which dictate whether blinded or non-blinded RCTs should be used.

The study question analysis of arthroscopic meniscectomy of the knee shows major differences in the characteristics of the six trials, which are all clinically heterogeneous. Therefore, a comparison of the double blinded vs non-blinded trials is not appropriate. The trial by Gauffin et al., which found effectiveness in routine health care, might be replicated with a double blinded design and a sham surgery comparison to quantify the pure intervention effect in this representative patient population. In the four other non-blinded RCTs no treatment effectiveness was found, and considering the invasive nature of the intervention and its potential harms, there may be no indication to repeat these experiments with a double blinded design. The only double blinded trial, by Sihvonen et al., found no benefits of surgery. As surgery involves potential risks, there is no justification for proceeding to a non-blinded design with the same patient, intervention and outcome characteristics.

The study question analysis can be applied when planning and assessing future RCTs, and when assessing clinical homogeneity in systematic reviews.
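For a systematic review, the study question analysis can be sketched as grouping trials by the characteristics in the Fig. 3 sequence and pooling only trials whose study questions coincide. The sketch below is hypothetical: the StudyQuestion fields paraphrase the Fig. 3 sequence, the trials "Trial A" to "Trial D" and their attribute values are invented and do not correspond to the six trials in Table 2, and requiring exact equality of all characteristics is a deliberately strict stand-in for clinical judgement of homogeneity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyQuestion:
    """Study question characteristics, following the sequence of Fig. 3."""
    design: str                  # "double blinded" or "non-blinded"
    representative: bool         # representative vs selected study population
    subcategory: str             # subcategory of the study population
    prior_treatment: str         # treatment required before eligibility
    index_intervention: str
    reference_intervention: str
    primary_outcome: str


def group_by_study_question(trials: dict[str, StudyQuestion]) -> dict[StudyQuestion, list[str]]:
    """Group trial identifiers whose study questions are identical."""
    groups = defaultdict(list)
    for trial_id, question in trials.items():
        groups[question].append(trial_id)
    return dict(groups)


def poolable_groups(trials: dict[str, StudyQuestion]) -> list[list[str]]:
    """Return only groups with at least two trials sharing one study question."""
    return [ids for ids in group_by_study_question(trials).values() if len(ids) > 1]


# Hypothetical trials, for illustration only.
shared = StudyQuestion("non-blinded", False, "degenerative tear", "none",
                       "APM", "exercise therapy", "KOOS")
trials = {
    "Trial A": StudyQuestion("non-blinded", True, "meniscal tear", "12 weeks of exercise",
                             "APM", "exercise therapy", "KOOS"),
    "Trial B": shared,
    "Trial C": StudyQuestion("double blinded", False, "degenerative tear", "none",
                             "APM", "sham surgery", "WOMET"),
    "Trial D": shared,
}

print(poolable_groups(trials))  # [['Trial B', 'Trial D']]
```

When every group contains a single trial, as the analysis in Table 2 and Fig. 3 found for the six APM trials, no pair of trials answers the same study question and neither pooling nor a blinded vs non-blinded comparison is supported.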

Conclusions

When the aim is to assess the pure intervention effect, a double blinded RCT is indicated, and when the intention is to assess the effectiveness of an intervention in routine health care circumstances, a non-blinded RCT is required. Appropriate blinding of patients and therapists is an essential validity criterion when assessing the pure intervention effect, but blinding is contraindicated when assessing the effectiveness of interventions in routine health care. Non-blinded trials are needed to assess effectiveness and cost-effectiveness in different patient groups and health care settings. When informing patients, the potential for effects additional to the pure intervention effect should be considered. The study question analysis of the RCTs on arthroscopic meniscectomy of the knee showed that the trials are clinically heterogeneous and allow neither a meta-analysis nor a comparison of the double blinded vs non-blinded trials.