Introduction

Preclinical research is the foundation on which later biomedical studies are built. The new ideas it generates eventually turn into clinical studies and new drugs that benefit humankind. However, some preclinical research predicts clinical success poorly because of its low methodological and reporting quality (Hackam and Redelmeier, 2006; Perel et al. 2007). A structured approach and full details of randomization, baseline characteristic balance, blinding procedures, and sample size calculation, together with sufficient reporting of the experiment, allow findings to be reproduced. Yet a growing body of evidence shows that the methodological and reporting quality of published animal research is often highly inadequate (Kilkenny et al. 2012). There is therefore a growing need for valid, efficient, and easy-to-use scoring scales and for systematic assessment to promote rigorous methods and rate the quality of animal studies. At present, a few studies have assessed the quality of experimental methods and reports by evaluating compliance with various assessment tools (Fabian-Jessing et al. 2018). The risk-of-bias (RoB) tool of the Systematic Review Centre for Laboratory Animal Experimentation (SYRCLE), based on the Cochrane Collaboration RoB tool, is an instrument for evaluating the risk of bias and the methodological quality of animal studies (Hooijmans et al. 2014). The Animal Research: Reporting In Vivo Experiments (ARRIVE) guideline, a comprehensive set of guidelines for animal research, specifies the format and content of the animal-related details in a typical scientific report (Kilkenny et al. 2012). Adherence to these two comprehensive sets of rules, which require detail that is consistent across all articles, therefore has many advantages (McGrath et al. 2010).

Worldwide, stroke is an important cause of disability and mortality. The burden of stroke has increased substantially over the past few decades owing to population growth and aging as well as the increased prevalence of modifiable stroke risk factors, especially in developing countries (Katan and Luft, 2018). Strokes are classified into three types: ischemic strokes, hemorrhagic strokes, and transient ischemic attacks. Ischemic strokes account for about 87% of all strokes (Virani et al. 2020). Ischemic stroke is characterized by the sudden loss of blood circulation to an area of the brain, typically in a vascular territory, resulting in a corresponding loss of neurologic function (Virani et al. 2020). Currently, the FDA-approved drug for ischemic stroke is recombinant tissue plasminogen activator (r-tPA), a fibrinolytic agent that is effective if given within 3 h, and no later than 4.5 h, after symptom onset (Rabinstein, 2017). However, cerebral ischemia–reperfusion injury (CIRI) following r-tPA administration sometimes leads to secondary injury to the brain tissue. The short therapeutic time window and CIRI limit the benefit of r-tPA for large clot burden. Hence, research is ongoing to find more effective and safer reperfusion therapies and to refine patient selection for acute reperfusion treatment (Fukuta et al. 2017).

According to the theory of traditional Chinese medicine (TCM), the primary pathological process of ischemic stroke is Qi deficiency and blood stasis. Qi, an important concept in TCM theory, is a vital energy that invigorates the body and promotes blood and meridian circulation (Li et al. 2014; Tan et al. 2016; Zhao et al. 2012). Buyang Huanwu decoction (BHD) is a classic traditional Chinese prescription devised by the well-known Chinese herbalist Wang Qing-ren (AD 1768–1831) for the treatment of ischemic stroke with Qi deficiency and blood stasis; according to TCM theory, it replenishes Qi and activates blood circulation. It has been used clinically for more than 190 years in China and some other Asian countries and has shown significant preventive and therapeutic effects on ischemic stroke and stroke-induced disability (Wei et al. 2013). Recently, a growing body of clinical evidence has shown that BHD significantly improves neural function and symptoms in ischemic stroke (Han et al. 2018; Hao et al. 2012; Jiang et al. 2020; Li et al. 2014), although the underlying mechanisms remain unclear. BHD consists of seven Chinese medicinal materials: Radix Astragali, Radix Paeoniae Rubra, Radix Angelicae Sinensis, Rhizoma Ligustici Chuanxiong, Flos Carthami, Semen Persicae, and Pheretima Aspergillum (Table 1) (Cui et al. 2015). A growing number of experimental studies have reported that BHD is beneficial for cerebral ischemia and CIRI, suggesting that BHD may be a promising therapy that could decrease infarct volume and ameliorate neurological impairment (Cai et al. 2007; Chen et al. 2020a, b, 2019). However, basic methodology and reporting practices in laboratory studies of BHD treatment for CIRI are sub-optimal, which is likely to affect the validity and replicability of the research.

Table 1 The ingredients of Buyang Huanwu decoction

Thus, the aim of this study was to assess the methodological and reporting quality of experimental studies of BHD treatment for CIRI with the SYRCLE tool and the ARRIVE guideline, respectively, to provide valuable insights for future studies and to support the development of methodological and reporting guidance.

Methods

Search strategies

Studies of BHD in animal models of CIRI were identified from PubMed, Embase, China National Knowledge Infrastructure (CNKI), VIP Database for Chinese Technical Periodicals (VIP), China Biology Medicine Database (CBM), and Wan Fang Data, searched up to November 23, 2022. Our search strategy combined the following words and phrases: (“Buyang Huanwu” OR “Bu yang Huan wu” OR “Bu-yang Huan-wu”) AND “cerebral ischemia–reperfusion.”

Inclusion criteria and data extraction

An eligible study had to meet all of the following criteria: (1) it fell within the retrieval period of January 1, 2015 to November 23, 2022; (2) it was a journal article; (3) CIRI was induced experimentally in rats; (4) Buyang Huanwu decoction was used as the treatment; and (5) it was published in Chinese or English. Studies were excluded if they were in vitro studies, clinical articles, reviews or comments, publications without full text, duplicate publications, or written in a language other than English or Chinese. The following information was extracted from each eligible study using a predesigned data extraction form: first author, year of publication, strain, sex, weight, anesthetic, method of establishing the model, and effects of BHD. Two assessors (XY Chen, T Yang) independently checked all included studies for consistency, and disagreements were settled by discussion with a third assessor (ZG Mei).

Methodological and reporting quality evaluation

Before the assessment started, two assessors (XY Chen, T Yang) were trained by experienced assessors to evaluate methodological and reporting quality. Each reviewer independently assessed the quality of each study. Methodological quality was assessed using the SYRCLE tool, under which each item was assigned one of three responses: “low risk,” “unclear,” or “high risk.” Reporting quality was assessed using the ARRIVE guideline. According to how fully the reporting requirements of an item were met, each item was rated “yes,” “partial yes,” or “no.” In the case of a discrepancy, each reviewer gave the reasoning for the judgment, and disagreements were resolved by discussion. If necessary, the third assessor (ZG Mei) was involved in the judgment.

Data analysis

Microsoft Excel 2016 was used for the descriptive statistical analysis, and summary statistics were given as percentages. The kappa test was performed with SPSS 25.0 (IBM Corp., Armonk, NY, USA). The kappa index was used to measure the inter-rater reliability between the two assessors for the SYRCLE tool and the ARRIVE guideline. A kappa index of less than 0.4 indicated poor agreement, 0.4 to 0.75 indicated fair agreement, and over 0.75 indicated excellent agreement.
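As an illustration of the agreement statistic described above, the following minimal Python sketch computes Cohen's kappa for two raters and applies the interpretation bands stated in this section. The analysis itself was run in SPSS; the function names and the toy ratings below are hypothetical and serve only to make the calculation concrete.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Bands taken from the text: <0.4 poor, 0.4 to 0.75 fair, >0.75 excellent."""
    if kappa < 0.4:
        return "poor"
    if kappa <= 0.75:
        return "fair"
    return "excellent"


# Hypothetical ratings of six SYRCLE items by two assessors (not study data).
assessor_1 = ["low risk", "low risk", "unclear", "high risk", "unclear", "low risk"]
assessor_2 = ["low risk", "low risk", "unclear", "high risk", "low risk", "low risk"]

kappa = cohen_kappa(assessor_1, assessor_2)
print(f"kappa = {kappa:.2f} ({interpret_kappa(kappa)} agreement)")
```

In this toy example the assessors agree on five of six items, yet kappa is lower than the raw agreement because part of that agreement is expected by chance.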

Results

Study selection

The search strategies yielded 284 records in total. After duplicates were removed with NoteExpress, 114 records remained. Of these, 50 were excluded after screening of titles and abstracts because they failed to meet the inclusion criteria. The full texts of the remaining 64 records were examined further, and 19 studies were discarded because of duplicated publication or improper indices. Finally, we included 45 studies in this overview. A flow chart describing the systematic search and study selection process is shown in Fig. 1.
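The selection counts reported above can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers given in this paragraph; the variable names are illustrative, and the number of duplicates removed is derived here rather than stated in the text.

```python
# Counts reported in the study selection paragraph.
records_identified = 284
records_after_deduplication = 114
excluded_on_title_abstract = 50
excluded_on_full_text = 19

# Derived values (the duplicate count is not stated in the text).
duplicates_removed = records_identified - records_after_deduplication
full_texts_assessed = records_after_deduplication - excluded_on_title_abstract
studies_included = full_texts_assessed - excluded_on_full_text

print(f"duplicates removed:  {duplicates_removed}")
print(f"full texts assessed: {full_texts_assessed}")
print(f"studies included:    {studies_included}")
```

The derived figures of 64 full texts and 45 included studies match the numbers reported in the text.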

Fig. 1 Flow chart of literature search

Basic characteristics of included studies

Among the included studies, 80% were published in Chinese-language journals, and the remaining 9 studies were in English. In terms of strain and sex, 35 studies used male Sprague Dawley rats, 1 study used male Wistar rats, and 4 studies used both female and male Sprague Dawley rats. Five studies used Sprague Dawley rats without indicating their sex. The weight of the rats ranged from 150 to 320 g across studies. Rats were anesthetized with chloral hydrate in 27 studies, pentobarbital sodium in 8 studies, isoflurane in 4 studies, 3% amobarbital sodium in 1 study, and Zoletil 50 with xylazine in 1 study; the remaining 4 studies did not report the anesthetic. Several modeling methods were used, including middle cerebral artery occlusion, 4-vessel occlusion, carotid artery drainage, and bilateral carotid artery occlusion. A range of outcome measures was reported. The most frequently used was the neurological severity score (in 19 studies); others included cerebral infarct size, cerebral edema volume, Bcl-2, Bax, VEGF, AKT, p-AKT, IL-6, and SOD. The main characteristics of the included studies are displayed in Table 2.

Table 2 The characteristics of the included studies

Methodological quality of included studies

General compliance with the SYRCLE tool was incomplete. No study achieved an acceptable overall rating (“low risk” on at least 50% of items) with the SYRCLE tool, so every study was rated poorly overall. Of the 22 items, only the 2 concerning whether the published report included all expected outcomes were rated as low risk of bias in all articles. Eleven studies (24%) described a random component in the sequence generation process. One study (2%) kept the distribution of relevant baseline characteristics balanced between the intervention and control groups. Sixteen studies (36%) induced the disease before randomization of the intervention, and in twenty-eight studies (62%) the outcome was judged unlikely to be influenced by non-random housing of the animals. As for blinding of the outcome assessor, two studies (4%) blinded the outcome assessor, and the outcome was judged unlikely to be influenced by lack of blinding. Twenty-three studies included all animals in the analysis, ensuring adequate outcome data. Only one study reported that new animals were added to the control and experimental groups to replace dropouts from the original population. Eight further items were rated poorly, including allocation concealment, random housing of animals, blinding of caregivers and investigators, random selection of animals, planning of a protocol, and other problems that could result in a high risk of bias. Overall, only 7 of the 22 items on the SYRCLE tool (31.82%) were rated as “low risk” in more than 50% of the included studies. The inter-rater reliability between the two assessors was excellent (kappa = 0.94). The details can be found in Table 3 and Fig. 2.
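The item-level summary above (the share of items rated “low risk” in more than 50% of studies) can be sketched as follows. This is a minimal Python illustration, not the procedure used in the study, which relied on Excel; the function name, the item labels, and the toy judgements are hypothetical. Only the final line uses figures from the text (7 of 22 items).

```python
def items_above_threshold(ratings, target="low risk", threshold=0.5):
    """Return the items whose share of `target` judgements exceeds `threshold`.

    ratings: dict mapping an item label to the list of per-study judgements.
    """
    selected = []
    for item, judgements in ratings.items():
        share = sum(j == target for j in judgements) / len(judgements)
        if share > threshold:
            selected.append(item)
    return selected


# Hypothetical judgements for three items across four studies (not study data).
toy_ratings = {
    "sequence generation": ["low risk", "unclear", "unclear", "unclear"],
    "selective reporting": ["low risk", "low risk", "low risk", "low risk"],
    "blinded outcome assessment": ["unclear", "high risk", "unclear", "low risk"],
}

well_rated = items_above_threshold(toy_ratings)
print(well_rated)

# Share reported in the text: 7 of the 22 SYRCLE items.
reported_share = round(7 / 22 * 100, 2)
print(f"{reported_share:.2f}%")
```

The same function could be applied to ARRIVE ratings by passing `target="yes"`.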

Table 3 Methodological quality evaluation by the SYRCLE tool

Fig. 2 Methodological quality assessment by the SYRCLE tool

Reporting quality of included studies

A summary of the ARRIVE guideline results is presented in Table 4 and Fig. 3. No study fulfilled all 39 items of the ARRIVE guideline. Only three studies (7%) gave a complete ethical statement. Two studies (4%) drew a time chart or flow chart (Fig. 3(6d)), described where the procedures were carried out (Fig. 3(7c)), provided complete details of the animals used (Fig. 3(8a)), explained why some animals or data were not included (Fig. 3(15b)), and commented on the study limitations (Fig. 3(18b)). Six studies (13%) reported housing (Fig. 3(9a)), and eight studies (18%) provided adequate husbandry conditions (Fig. 3(9b)). Nine studies (20%) mentioned randomized grouping. Worse still, 10 items were rated “no” in 100% of studies: the explanation of how and why the animal species and model were used (Fig. 3(3b)); the timing of the procedures (Fig. 3(7b)); welfare-related assessments and interventions carried out throughout the experiment (Fig. 3(9c)); a detailed sample size calculation (Fig. 3(10b)); the order in which the animals in the different experimental groups were treated and assessed (Fig. 3(11b)); the unit of analysis for each dataset (Fig. 3(13b)); baseline data of the experimental animals (Fig. 3(14)); details of important adverse events in each group (Fig. 3(17a)); modifications to the experimental protocols (Fig. 3(17b)); and any implications of the experimental methods or findings for the 3Rs (Fig. 3(18c)). By contrast, 6 items were rated “yes” in 100% of studies: a title that accurately described the content of the article (Fig. 3(1)); a study design that mentioned the experimental unit (Fig. 3(6c)); an experimental procedure that gave precise details of drug formulation and dose, site and route of administration, anesthesia used, surgical procedure, and method of euthanasia (Fig. 3(7a)); a clear definition of the primary and secondary experimental outcomes assessed (Fig. 3(12)); details of the statistical methods used for each analysis (Fig. 3(13a)); and the results of each analysis and the accuracy of measures (Fig. 3(16)). Overall, of the 39 items of the ARRIVE guideline, 14 (35.90%) were rated as “yes” in more than 50% of the included studies. The inter-rater reliability between the two assessors was excellent (kappa = 0.95).
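The percentages quoted above follow from the study counts and the total of 45 included studies. The Python sketch below recomputes them as a consistency check; the dictionary keys are shorthand labels chosen here, and whole-number rounding is an assumption about how the percentages were presented.

```python
TOTAL_STUDIES = 45  # number of included studies reported in the text

# (number of studies, percentage as reported) for several ARRIVE findings.
reported = {
    "complete ethical statement": (3, 7),
    "time chart or flow chart": (2, 4),
    "housing": (6, 13),
    "husbandry conditions": (8, 18),
    "randomized grouping": (9, 20),
}


def percent(count, total=TOTAL_STUDIES):
    """Percentage of studies, rounded to a whole number (assumed convention)."""
    return round(100 * count / total)


# Collect any finding whose recomputed percentage differs from the reported one.
mismatches = {
    label: (count, stated, percent(count))
    for label, (count, stated) in reported.items()
    if percent(count) != stated
}
print(mismatches or "all reported percentages are consistent")

# Item-level share: 14 of the 39 ARRIVE items, to two decimals as in the text.
items_yes_share = round(100 * 14 / 39, 2)
print(f"{items_yes_share:.2f}%")
```

No mismatch is found for the figures listed, and the item-level share reproduces the 35.90% stated above.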

Table 4 Reporting quality evaluation by the ARRIVE guideline

Fig. 3 Reporting quality assessment by the ARRIVE guideline

Discussion

Experimental research involving animal models plays a crucial role in scientific innovation, provided that the experiments are designed, performed, interpreted, and reported well (Bezdjian et al. 2018). To date, an increasing number of experimental studies have reported that BHD is beneficial for CIRI. BHD has been reported to protect neurons from ischemic injury, reduce infarct volume, and stimulate neural proliferation (Zhang et al. 2018). BHD has thus shown the profile of a candidate medicine for ischemic stroke or CIRI that could help translate basic science into clinical application. However, the results and conclusions may be undermined by methodological flaws and poor reporting in experimental studies. Hence, this study aimed to assess the methodological and reporting quality of experimental research on Buyang Huanwu decoction for mitigating CIRI in rats and to provide useful suggestions for future reviewers and researchers. Unfortunately, the results revealed limitations in the quality of methodology and reporting, indicating a need for improvement in future work.

The methodological quality of the included studies was assessed using the SYRCLE tool. The SYRCLE RoB tool (Hooijmans et al. 2014), with 10 items and 22 sub-items, is considered highly reliable and practical for evaluating the methodological quality of animal experiments. It not only assesses the risk of bias and improves transparency in the animal research process, but also includes a comprehensive user guide. Because the sample size of most animal experiments is smaller than that of clinical trials, the necessary baseline characteristics and adequate timing of disease induction should be determined for animal disease modeling to reduce baseline imbalance. In our previous experimental research (Mei et al. 2022, 2020; Yang et al. 2021), we observed that 90 min of middle cerebral artery occlusion (MCAO) followed by 24 h of reperfusion may be appropriate durations in the rat stroke model to mimic clinical cerebral ischemia and reperfusion injury. Adequate randomization, allocation concealment, and blinding should be implemented carefully to reduce the risk of selection bias, performance bias, and detection bias. Incomplete outcome data should also be reported, including appropriate imputations and the reasons for missing outcome data, to reduce the risk of attrition bias. Adherence to a well-developed protocol can reduce the risk of reporting bias. However, these items are poorly addressed because a database of registered animal research protocols is not yet publicly accessible. In addition, other sources of bias need attention, including contamination, inappropriate influence of funders, unit of analysis errors, design-specific risks of bias, and the addition of new animals to the groups to replace dropouts from the original population.

The reporting quality of the included studies was assessed with the ARRIVE guideline. The ARRIVE guideline (Kilkenny et al. 2012), with 20 items and 39 sub-items, aims to fill the gap left by the lack of comprehensive reporting guidelines for animal research. It covers the important information about animal experiments and promotes substantial improvements in the methods used in in vivo animal research. By 2014, the ARRIVE guideline had been endorsed by over 300 research journals around the world. An updated version, the ARRIVE guideline 2.0, was published in 2020 (Percie du Sert et al. 2020). According to this guideline, authors need to explain why the animal species and model were used and how the study is relevant to human biology. A fully detailed description of the experimental procedures, including the timing of procedures, the reason for the route of administration, and the selection of the drug dose, is the key to accurate experimental reproduction. Baseline data of the experimental animals are essential for results to be comparable. Authors should precisely and explicitly record the details of the animals used, including source, species, strain, international strain nomenclature, sex, developmental stage, weight, genetic modification status, genotype, health/immune status, and previous procedures. Information on the relevant characteristics and health status of animals before treatment or testing can often be tabulated. An inadequate sample may lead to overestimation of the benefits of an intervention, whereas an adequate sample size with enough statistical power makes it possible to detect statistical differences between groups. The sample size should therefore be calculated before the experiment. It is also important to report adverse events in animal research in order to weigh the pros and cons of an intervention. In addition, the significance of a study is closely related to its utilization and translation. It is also necessary to discuss whether and how the study findings can translate to other species or systems.

The results suggest that methodological and reporting quality should be controlled strictly during the design, implementation, interpretation, and reporting of experimental research. The SYRCLE tool and ARRIVE guideline can be used to assess the whole process of an animal experiment both rigorously and comprehensively. At present, most studies have low methodological and reporting quality. A recent study (Zhang et al. 2019a) shows that Chinese basic medical researchers have low awareness and use of the SYRCLE tool and the ARRIVE guideline, contributing to the low quality of animal studies in Chinese journals (Wang et al. 2019). In this review, 80% of the studies were published in Chinese-language journals. It is therefore necessary to take specific measures to promote these standards and to introduce them into the guidelines of Chinese domestic journals as soon as possible, so as to raise awareness and increase their use among researchers and journal editors.

Although we followed strict procedures in this review, it still has some limitations. Firstly, the quality defects of the included studies affect our evaluation results to some extent. Secondly, the literature included in our study was limited to Chinese and English, which may introduce language bias. Thirdly, this was the first time that the assessors had used the SYRCLE tool and ARRIVE guideline; the evaluation of many items is inevitably subjective and may have led to bias. Finally, the findings may not apply to other traditional Chinese medicine studies, because the intervention was restricted to BHD and the species to rats.

Conclusions

BHD has been used increasingly in preclinical research to treat CIRI and appears to be a potential treatment option for alleviating CIRI in patients. However, judged by the SYRCLE tool and ARRIVE guideline, the methodological and reporting quality of studies of BHD against CIRI was poor. Our findings should urge journal editors, researchers, clinicians, reviewers, and funding agencies to pay more attention to these deficiencies and to strengthen training so that studies meet the relevant methodological and reporting requirements by strictly adopting and adhering to well-developed guidelines: the SYRCLE tool and ARRIVE guideline.