The use of systematic reviews (SRs) for making evidence-based decisions on healthcare is common practice in the clinical setting. Although most experimental animal studies aim to test the safety and/or efficacy of treatments to be used for human healthcare, summarizing the available evidence in an SR is far less common in the field of laboratory animal experiments. Fortunately, since an influential commentary was published in The Lancet (2002) [1], first setting out the scientific rationale for SRs of animal studies, awareness of the merits of SRs of experimental animal studies has been steadily increasing [2]. The methodology for conducting SRs of animal intervention studies is currently evolving but not yet as advanced as for clinical studies. In the clinical field, the randomized controlled trial (RCT) is considered the paradigm for evaluating the effectiveness of interventions. Animal intervention studies, like RCTs, are experimental studies, but they differ from RCTs in many respects [3] (Table 1, supporting information in Additional file 1). This means that some aspects of the systematic review process need to be adapted to the characteristics of animal intervention studies. In this paper, we focus on the methodology for assessing the risk of bias in animal intervention studies.

Table 1 Main differences between randomized clinical trials (RCTs) and animal intervention studies

The extent to which an SR can draw reliable conclusions depends on the validity of the data and the results of the included studies [4–8]. Assessing the risk of bias of the individual studies, therefore, is a key feature of an SR. To assess the risk of bias of RCTs, the Cochrane Collaboration developed the Cochrane RoB Tool [9]. Such a general tool is not yet available for animal intervention studies. The checklists and scales currently used for assessing study validity of animal studies [10–14] vary greatly, are sometimes designed for a specific field (e.g., toxicology) and often assess reporting quality and internal and external validity simultaneously. We believe that, although it is important to assess all aspects of study quality in an SR, the assessment and interpretation of these aspects should be conducted separately. After all, the consequences of poor reporting, poor methodological quality and limited generalizability of the results are very different. Here, the SYstematic Review Centre for Laboratory animal Experimentation (SYRCLE) presents an RoB tool for animal intervention studies: SYRCLE’s RoB tool. This tool, based on the Cochrane Collaboration RoB Tool [9], aims to assess methodological quality and has been adapted to aspects of bias that play a role in animal experiments.


Development of SYRCLE’s RoB tool

The Cochrane RoB Tool was the starting-point for developing an RoB tool for experimental animal studies. The Cochrane RoB Tool assesses the risk of bias of RCTs and addresses the following types of biases: selection bias, performance bias, attrition bias, detection bias and reporting bias [9]. The items in the Cochrane RoB Tool that were directly applicable to animal experiments were adopted (Table 2: items 1, 3, 8, 9 and 10).

Table 2 SYRCLE’s tool for assessing risk of bias

To investigate which items in the tool might require adaptation, the differences between randomized clinical trials and animal intervention studies were set out (Table 1). Then we checked whether aspects of animal studies that differed from RCTs could cause bias in ways that had not yet been taken into account in the Cochrane RoB tool. Finally, the quality assessments of recent systematic reviews of experimental animal studies were examined to confirm that all aspects of internal validity had been taken into consideration in SYRCLE’s RoB tool.

To enhance transparency and applicability, we formulated signaling questions (as used in the QUADAS tool, a tool to assess the quality of diagnostic accuracy studies [15, 16]) to facilitate judgment. In order to obtain a preliminary idea of inter-observer agreement for each item in the RoB tool, Kappa statistics were determined on the basis of 1 systematic review including 32 papers.


SYRCLE’s RoB tool

The resulting RoB tool for animal studies contains 10 entries (Table 2). These entries are related to 6 types of bias: selection bias, performance bias, detection bias, attrition bias, reporting bias and other biases. Items 1, 3, 8, 9 and 10 are in agreement with the items in the Cochrane RoB tool. The other items have either been revised or are completely new and will be discussed in greater detail below. Most of the variations between the two tools are a consequence of the differences in design between RCTs and animal studies (see also Table 1). Shortcomings in, or unfamiliarity with, specific aspects of the experimental design of animal studies compared to clinical studies also play a role.

Bias due to inadequate randomization and lack of blinding

Random allocation of animals to the experimental and control groups, firstly, is not yet standard practice in animal experiments [17]. Furthermore, as the sample size of most animal experiments is relatively small, important baseline differences may be present. Therefore, we propose to include the assessment of similarity in baseline characteristics between the experimental and control groups as a standard item. The number and type of baseline characteristics depend on the review question. Before launching a risk of bias assessment, therefore, reviewers need to discuss which baseline characteristics need to be comparable between the groups.

Secondly, we slightly adjusted the sequence allocation item, specifying that the allocation sequence should not only be adequately generated but also be adequately applied. We decided to do so because, in animal studies, diseases are often induced rather than naturally present. The timing of randomization, therefore, is more important than in a patient setting: it needs to be assessed whether the disease was induced before actual randomization and whether the order of inducement was randomly allocated. The signaling questions for judging this entry are represented in Table 3.

Table 3 Signaling questions

Thirdly, a new item pertains to randomizing the housing conditions of animals during the experiment. In animal studies, the investigators are responsible for the way the animals are housed. They determine, for example, the location of the cage in the room. As housing conditions (such as lighting, humidity and temperature) are known to influence study outcomes (such as certain biochemical parameters and behavior), it is important that the housing of these animals is randomized or, in other words, comparable between the experimental groups in order to reduce bias [18]. Animals from different treatment groups, for example, should not be housed per group on different shelves or in different rooms, as the animals on the top shelf experience a higher room temperature than animals on the lowest shelf, and the temperature of the room may influence the toxicity of pharmacological agents (Table 4). Moreover, when cages are not placed randomly (e.g., when animals are housed per group on different shelves), it is possible for the investigator to foresee or predict the allocation of the animals to the various groups, which might result in performance bias. Randomizing the housing conditions is therefore also a requisite for adequately blinding the animal caregivers and investigators, and it has been included as a signaling question in Table 3 for that reason.
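
As an illustration only, random placement of cages can be sketched in a few lines of Python. This is not part of SYRCLE’s RoB tool; the function name, the cage labels and the rack layout are hypothetical, and the sketch assumes that every rack position is equally suitable for every cage.

```python
import random


def randomize_cage_positions(cage_ids, positions, seed=None):
    """Assign each cage to a distinct rack position drawn at random.

    cage_ids and positions are hypothetical labels; positions must be a list
    with at least as many entries as there are cages.
    """
    if len(positions) < len(cage_ids):
        raise ValueError("not enough rack positions for the cages")
    rng = random.Random(seed)  # a recorded seed makes the layout reproducible
    chosen = rng.sample(positions, k=len(cage_ids))  # sampling without replacement
    return dict(zip(cage_ids, chosen))


# Hypothetical example: 3 treated and 3 control cages, rack of 3 shelves x 2 slots.
cages = ["T1", "T2", "T3", "C1", "C2", "C3"]
rack = [(shelf, slot) for shelf in ("top", "middle", "bottom") for slot in (1, 2)]
layout = randomize_cage_positions(cages, rack, seed=2014)
for cage, (shelf, slot) in sorted(layout.items()):
    print(cage, shelf, slot)
```

Because the position of a cage no longer depends on its treatment group, shelf-related differences such as temperature are not systematically tied to one group, and the position itself does not reveal the allocation.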

Table 4 Some underlying evidence for the importance of random housing and random outcome assessment

Fourthly, in a recent update of the Cochrane RoB tool, bias related to blinding of participants and personnel (performance bias) is assessed separately from bias related to blinding of outcome assessment (detection bias). In our tool, we followed this approach, although animals do not need to be blinded for the intervention as they do not have any expectations about the intervention. In addition, it is important to emphasize that personnel involved in the experimental animal studies should be taken to include animal caregivers. In animal studies, this group is often not taken into account when blinding the allocation of animals to various groups. If animal caregivers know that a drug might cause epileptic seizures or increase urine production, for example, they might handle the animals or clean the cages in the group receiving this drug more often, which could cause behavioral changes influencing the study results.

With regard to adequately blinding outcome assessment (entry 7), possible differences between the experimental and control groups in methods used for outcome assessment should be described and judged. It should also be determined whether or not animals were selected at random for outcome assessment, regardless of the allocation to the experimental or control group. For instance, when animals are sacrificed per group at various time points during the day, the scientist concerned might interpret the results of the groups differently because she or he can foresee or predict the allocation.

Another reason to select animals at random for outcome assessment is the presence of circadian rhythms in many biological processes (Table 4). Not selecting the animals for outcome assessment at random might influence the direction and magnitude of the effect. For example, the results of a variety of blood tests depend on their timing during the day: cholesterol levels in mice may be much higher in the morning after a meal than in the afternoon. Because of these effects, assessing whether or not animals were selected at random for outcome assessment has also been presented as a separate entry.
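
One way to select animals at random for outcome assessment, while keeping the assessor unaware of the allocation, is sketched below in Python. The function, the animal identifiers and the code format are hypothetical and serve only as an example; the sketch assumes that a person not involved in the assessment keeps the key.

```python
import random


def blinded_assessment_order(animal_groups, seed=None):
    """Return (codes, key) for a randomized, blinded assessment sequence.

    animal_groups maps a hypothetical animal identifier to its group.
    codes is the order in which the assessor processes the animals;
    key maps each code back to (animal, group) and is kept by a third party.
    """
    rng = random.Random(seed)
    shuffled = list(animal_groups)
    rng.shuffle(shuffled)  # random order, independent of group membership
    codes = [f"A{idx:03d}" for idx in range(1, len(shuffled) + 1)]
    key = {code: (animal, animal_groups[animal]) for code, animal in zip(codes, shuffled)}
    return codes, key


# Hypothetical example with four animals.
animals = {"m1": "treated", "m2": "treated", "m3": "control", "m4": "control"}
codes, key = blinded_assessment_order(animals, seed=7)
print(codes)  # the assessor sees only these codes, in this order
```

With such a sequence, the time of day at which an animal is assessed is unrelated to its group, so circadian effects are spread over both groups.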

Reporting bias

As mentioned before, assessing reporting bias is in agreement with the Cochrane RoB tool. It is important to mention, however, that this item is quite difficult to assess in animal intervention studies at present because protocols for animal studies are not yet registered in a central, publicly accessible database. Nevertheless, many have called for registration of all animal experiments at inception [19, 20], so we expect that registration of animal studies will be more common within a few years. For this reason, we already decided to include it in SYRCLE’s RoB tool. Furthermore, protocols of animal studies, like those of clinical studies, can already be published in various (open access) journals, which will also help to improve the standard of research in animal sciences.

Other bias

Beyond the above-mentioned types of bias, there might be further issues that may raise concerns about the possibility of bias. These issues have been summarized in the other bias domain. The relevance of the signaling questions (Table 3) depends on the experiment. Review authors need to judge for themselves which of the items could cause bias in their results and should be assessed. In assessing entry 10 (“Was the study apparently free of other risks of bias?”), it is important to pay extra attention to the presence of unit-of-analysis errors. In animal studies, the experimental unit is often not clear, and as a consequence statistical measures are often inaccurately calculated. For example, if mice in a cage are given a treatment in their diet, it is the cage of animals rather than the individual animal that is the experimental unit. After all, the mice in the cage cannot have different treatments, and they may be more similar than mice in different cages.
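
The cage example can be made concrete with a small Python sketch. The data and names are hypothetical; the sketch only shows that measurements of animals sharing a cage are collapsed to one value per cage before groups are compared, so that the sample size equals the number of cages rather than the number of animals.

```python
from collections import defaultdict
from statistics import mean


def cage_level_means(records):
    """Collapse per-animal values to one mean per cage, grouped by treatment.

    records is a list of (cage, group, value) tuples (hypothetical data).
    """
    by_cage = defaultdict(list)
    group_of = {}
    for cage, group, value in records:
        by_cage[cage].append(value)
        group_of[cage] = group
    per_group = defaultdict(list)
    for cage, values in by_cage.items():
        per_group[group_of[cage]].append(mean(values))  # one data point per cage
    return dict(per_group)


records = [
    ("cage1", "treated", 10.0), ("cage1", "treated", 12.0),
    ("cage2", "treated", 14.0), ("cage2", "treated", 16.0),
    ("cage3", "control", 8.0), ("cage3", "control", 10.0),
    ("cage4", "control", 9.0), ("cage4", "control", 11.0),
]
units = cage_level_means(records)
for group, means in units.items():
    print(group, "n =", len(means), "cage means =", means)
```

In this example each group contributes n = 2 experimental units, although 4 animals per group were measured.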

Use of SYRCLE’s RoB tool

In order to assign a judgment of low, high or unclear risk of bias to each item mentioned in the tool, we have produced a detailed list with signaling questions to aid the judgment process (Table 3). It is important to emphasize that this list is not exhaustive. We recommend that people assessing the risk of bias of the included studies discuss and adapt this list to the specific needs of their review in advance. A “yes” judgement indicates a low risk of bias; a “no” judgment indicates high risk of bias; the judgment will be “unclear” if insufficient details have been reported to assess the risk of bias properly.

As a rule, assessments should be done by at least two independent reviewers, and disagreements should be resolved through consensus-oriented discussion or by consulting a third person.
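
The judgment rule and the two-reviewer procedure described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names and the example answers are hypothetical, and the sketch assumes that each reviewer records one answer ("yes", "no" or "unclear") per item.

```python
# Mapping from the answer to a signaling question to the risk-of-bias judgment.
JUDGMENT = {"yes": "low", "no": "high", "unclear": "unclear"}


def to_judgment(answer):
    """Translate an answer into a low, high or unclear risk-of-bias judgment."""
    key = answer.strip().lower()
    if key not in JUDGMENT:
        raise ValueError(f"unexpected answer: {answer!r}")
    return JUDGMENT[key]


def items_needing_consensus(reviewer_a, reviewer_b):
    """List the items on which two independent reviewers disagree."""
    if reviewer_a.keys() != reviewer_b.keys():
        raise ValueError("both reviewers must assess the same items")
    return sorted(
        item
        for item in reviewer_a
        if to_judgment(reviewer_a[item]) != to_judgment(reviewer_b[item])
    )


# Hypothetical answers of two reviewers for items 1 to 3 of one study.
reviewer_a = {1: "yes", 2: "unclear", 3: "no"}
reviewer_b = {1: "yes", 2: "no", 3: "no"}
print(items_needing_consensus(reviewer_a, reviewer_b))  # items to discuss
```

Items returned by the second function are the ones to be resolved through discussion or by consulting a third person.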

We recommend that risk of bias assessment is presented in a table or figure. The investigators can present either the summary results of the risk of bias assessment or the results of all individual studies. Finally, the results of the risk of bias assessment could be used when interpreting the results of the review or a meta-analysis. For instance, sensitivity analysis can be used to show how the conclusions of the review might be affected if studies with a high risk of bias were excluded from the analysis [8, 9].
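
A summary of the judgments and a simple sensitivity analysis could look like the Python sketch below. Several choices here are assumptions made for the example and are not prescribed by the tool: the study data are invented, the pooled estimate is a fixed-effect inverse-variance average, and a study counts as high risk as soon as one item is judged "high".

```python
from collections import Counter


def summarize(studies):
    """Count low, high and unclear judgments per item across all studies."""
    summary = {}
    for study in studies:
        for item, judgment in study["rob"].items():
            summary.setdefault(item, Counter())[judgment] += 1
    return summary


def pooled_effect(studies):
    """Fixed-effect inverse-variance pooled estimate (assumed model)."""
    weights = [1.0 / study["se"] ** 2 for study in studies]
    weighted = sum(w * study["effect"] for w, study in zip(weights, studies))
    return weighted / sum(weights)


def sensitivity_analysis(studies):
    """Pooled effect for all studies and after excluding high-risk studies."""
    kept = [s for s in studies if "high" not in s["rob"].values()]
    return pooled_effect(studies), (pooled_effect(kept) if kept else None)


# Hypothetical studies, each with an effect size, a standard error and judgments.
studies = [
    {"effect": 1.0, "se": 1.0, "rob": {1: "low", 7: "unclear"}},
    {"effect": 2.0, "se": 1.0, "rob": {1: "unclear", 7: "low"}},
    {"effect": 3.0, "se": 1.0, "rob": {1: "high", 7: "unclear"}},
]
all_studies, without_high_risk = sensitivity_analysis(studies)
print(summarize(studies))
print(all_studies, without_high_risk)
```

Comparing the two pooled estimates shows how strongly the conclusion depends on the studies judged to be at high risk of bias.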

We do not recommend calculating a summary score for each individual study when using this tool. A summary score inevitably involves assigning “weights” to specific domains in the tool, and it is difficult to justify the weights assigned. In addition, these weights might differ per outcome and per review.

Inter-observer variability

Inter-observer agreement was evaluated using Kappa statistics. At the time of writing, Kappa could only be determined for items 1, 6, 7, 8, 9 and 10, based on 2 raters in one systematic review including 32 papers. For these items, Kappa varied between 0.59 and 1.0 (item 1: 0.87; item 6: 0.74; item 7: 0.59; item 8: 1.0; item 9: 0.62; item 10: 1.0). Kappa could not be calculated for items 2, 3, 4 and 5 because Kappa is defined only for situations with at least two raters and two outcomes, and for these items only one outcome (unclear risk of bias) was recorded as a result of poor reporting.
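
To illustrate why Kappa is undefined when only one outcome occurs, the sketch below computes Cohen’s Kappa for two raters in Python. It assumes that the Kappa referred to here is Cohen’s Kappa, which is the usual choice for two raters; the ratings are hypothetical and are not the data of the review mentioned above.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters, or None when it is undefined.

    Kappa = (p_o - p_e) / (1 - p_e), with p_o the observed agreement and
    p_e the agreement expected by chance from each rater's marginal totals.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same, non-empty set of studies")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        return None  # both raters used a single category: division by zero
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments of two raters for four studies.
print(cohen_kappa(["low", "low", "high", "high"], ["low", "low", "high", "low"]))
# All studies judged "unclear" by both raters: Kappa cannot be calculated.
print(cohen_kappa(["unclear"] * 4, ["unclear"] * 4))
```

When both raters assign the same single judgment to every study, the chance-expected agreement equals 1 and the denominator becomes zero, which is the situation described for items 2, 3, 4 and 5.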

Discussion and conclusion

In animal studies, a large variety of tools to assess study quality is currently used, but none of the tools identified so far focussed on internal validity only [11]. Most instruments assess reporting quality and internal and external validity simultaneously although consequences of poor reporting, risk of bias and generalizability of the results are very different.

Therefore, we developed SYRCLE’s RoB tool to establish consistency and avoid discrepancies in assessing risk of bias in SRs of animal intervention studies. SYRCLE’s RoB tool is based on the Cochrane RoB tool [9] and has been adjusted for particular aspects of bias that play a role in animal intervention studies. All items in our RoB tool can be justified from a theoretical perspective, but not all items have been validated by empirical research. However, the same holds for the original QUADAS tool (to assess the quality of diagnostic accuracy studies) and the Cochrane RoB tool [8, 16]. For example, in the Cochrane RoB tool, the item on “inadequately addressing incomplete outcome data” is mainly driven by theoretical considerations [8]. In QUADAS, no empirical or theoretical evidence was available for 2 out of the 9 risk of bias items [16].

Although validation is important, providing empirical evidence for all items in this tool is not to be expected in the near future as this would require major comparative studies, which, to our knowledge, are not currently being undertaken or scheduled. Using the existing animal experimental literature is also challenging because the current reporting quality of animal studies is poor [17]; many details regarding housing conditions or timing outcome assessment are often unreported. However, we feel that publishing this tool is necessary to increase awareness of the importance of improving the internal validity of animal studies and to gather practical experience of authors using this tool.

We started to use this tool in our own SRs and hands-on training courses on conducting SRs in laboratory animal experimentation, funded by The Netherlands Organization for Health Research and Development (ZonMW). The first experiences with this tool were positive, and users found SYRCLE’s RoB tool very useful. Inter-rater agreement (Kappa) varied between 0.6 and 1.0. Users also indicated that they had to judge many entries as “unclear risk of bias”. Although most users did not expect this finding, it is not altogether surprising [21, 22], as a recent survey of 271 animal studies revealed that reporting of experimental details on animals, methods and materials is very poor [17]. We hope and expect, therefore, that use of this tool will improve the reporting quality of essential experimental details in animal studies [23, 24].

Widespread adoption and implementation of this tool will facilitate and improve critical appraisal of evidence from animal studies. This may subsequently enhance the efficiency of translating animal research results into clinical practice. Furthermore, authors of SRs of animal intervention studies should use this tool to test its applicability and validity in practice. We invite users of SYRCLE’s RoB tool, therefore, to provide comments and feedback via the SYRCLE LinkedIn group (risk of bias subgroup). As with the QUADAS, CONSORT and PRISMA statements [15, 16, 25, 26], we expect that user feedback and developments in this relatively new field of evidence-based animal experimentation will allow us to update this tool within a few years.