Background

Alzheimer’s disease (AD) is a major public health problem and a leading cause of disability [1, 2]. The number of people affected by AD is increasing rapidly worldwide, and more than 35 million people currently have AD. By 2050, the prevalence of AD is expected to quadruple to 1 in 85 people, of whom 43 % are expected to need a high level of care. The World Alzheimer Report 2015 estimated the total worldwide cost of dementia at $818 billion and projected that it will reach the trillion dollar mark by 2018 [3]. The clinical characteristics of AD are memory loss and impairment of at least one other cognitive domain [4]. Memory dysfunction is generally the first symptom of AD, and it is generally the most severe cognitive impairment. Mounting evidence indicates that the severity of memory dysfunction correlates strongly with the presence of beta-amyloid plaques, intracellular tau, and neocortical neurofibrillary tangles [5, 6]. Despite massive research effort to elucidate the causes and mechanisms underlying AD, including recent advances in our understanding of its molecular pathology, effective treatment remains elusive, and none of the existing drugs are able to halt its progression [7, 8]. Consequently, there is growing interest in new therapeutic strategies for the treatment of AD [9].

Polygonum multiflorum Thunb (PM) is a traditional Chinese herb that has been used widely as an anti-aging drug in the Orient since ancient times. TSG (2, 3, 5, 4′-tetrahydroxystilbene-2-O-β-D-glucoside), a monomer of stilbene, is one of the main components extracted from the root of PM [10]. TSG can cross the blood–brain barrier and has protective effects on hippocampal synaptic plasticity in vitro [11, 12] and in vivo [13]. Recent studies have also shown that TSG reduces the overexpression of amyloid precursor protein (APP) [14], inhibits reactive oxygen species generation [15], and attenuates cognitive impairment in several animal models of AD, including age-advanced rats [11], APP transgenic mice [16], amyloid-β1–42-injected rats [12], and aluminium-exposed rats [14].

No systematic studies have investigated the effect of TSG on cognition in humans with AD. Because there are no human findings to confirm, studies are needed first to identify whether TSG has any potential benefit. Systematic reviews of animal studies synthesize the available evidence in an unbiased manner to provide evidence for the potential translational value of effective therapeutic interventions in animal models to humans [17], contribute to models of clinically relevant problems, and facilitate decisions regarding the design and conduct of subsequent human clinical trials [18]. Therefore, the aim of the current study was to perform a robust systematic review and meta-analysis of all available experimental evidence concerning the effects of TSG on cognitive impairment in animal models of AD and to provide an evidence-based foundation for future clinical trials.

Methods

Literature search

On April 3, 2015, we searched seven electronic databases (PubMed, Web of Science, MEDLINE, Google Scholar, Embase, CNKI, and Wanfang data). All searches were restricted to literature published between January 1980 and April 2015. The following terms were included in the searches: “Alzheimer’s disease” (or “Alzheimer disease”, “dementia”, “Alzheimer”, “Alzheimers” or “Alzheimer’s”) and “tetrahydroxystilbene glucoside” (or “Polygonum multiflorum Thunb”, “Radix Polygonum Multiform”, “tetrahydroxy stilbene glucoside”, “2, 3, 5, 4′-tetrahydroxystilbene-2-O-β-D-glucoside”, or “TSG”). We limited our search results to animal studies. Additional relevant publications were identified from the reference lists of the resulting research articles and reviews. To limit bias, inclusion and exclusion criteria were defined a priori (Table 1). Two investigators (SC and PW) assessed the titles and abstracts of the studies and obtained copies of the articles describing controlled studies of TSG or its analogues in animal models of AD.

Table 1 Criteria for study inclusion/exclusion

Data extraction

The following information was extracted from each included study by two investigators: animal species; sex; type of AD model; sample size; dose, method, and timing of TSG administration; main experimental groups; intervention regime (i.e., administration route and number of injections); and cognitive outcome assessments.

Any studies that reported effects of TSG on learning and memory abilities using an animal model of AD were included. Cognitive outcomes were assessed by the Morris water maze, passageway water maze, passive avoidance task, and Y maze experiment, among others, which are commonly used to evaluate spatial learning/memory in both mice and rats [19, 20]. The details of the individual study characteristics were extracted from each publication. When a single publication included groups with different TSG doses or different AD models, the data for each group were extracted and treated as independent experiments. Because the learning trials to assess memory function were conducted over 5 days, the final test indicates the learning ability of rats/mice [21]. Therefore, when memory function was assessed at several time points, we extracted the data for the final time point only. If any information was missing, we attempted to obtain it from the study authors; if the data were still not available, we excluded the study from the analysis. If the data were presented in graphical form only, we contacted the authors to request the numerical values. If numerical values could not be obtained, they were estimated from the graphs using digital ruler software.

Methodological study quality

The methodological quality of the studies was assessed based on a checklist of the Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies (CAMARADES), as previously described, with minor modifications [22]. One point was assigned for written evidence of each of the criteria described in Table 2.

Table 2 The CAMARADES quality items

Although a large number of tools are currently used to assess the quality of animal studies, most instruments assess study quality and internal and external validity simultaneously [23]. No tools have been identified that assess internal validity alone. Therefore, in addition to the modified CAMARADES checklist, we used another previously described checklist [24, 25] to assess study quality based on the study characteristics, such as the age, species, and sex of the animals used, and the dose and duration of TSG supplementation (Table 3). The quality of all studies was assessed independently by two reviewers (PW and SC).

Table 3 The second quality items

Statistical analysis

The global estimated effect of TSG treatment on cognitive outcomes was calculated as the standardized mean difference (SMD) with its 95 % confidence interval (CI), following the guidelines in the Cochrane Handbook for Systematic Reviews of Interventions [26]. The SMD is used as a summary statistic in meta-analyses when studies assess the same outcome but measure it in different ways. It is equal to the difference in mean outcomes between groups divided by the standard deviation of outcomes among participants and is reported in units of standard deviation. Negative SMD effect sizes indicate a positive efficacy for acquisition memory, whereas positive SMD effect sizes indicate a positive efficacy for retention memory. Heterogeneity among studies was evaluated using Cochran’s Q-statistic, with P < 0.10 taken to indicate heterogeneity [25], and quantified using the I² statistic, with values of 75, 50, and 25 % representing high, moderate, and low heterogeneity, respectively. A value ≥50 % suggests unacceptable heterogeneity among the studies [27]. A random-effects model was used to pool the SMD when the heterogeneity was significant (I² ≥ 50 %); otherwise, a fixed-effects model was applied.
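The pooling procedure described above can be illustrated with a minimal Python sketch. It is not the code used in this study (the analyses were run in Stata and Review Manager). Several details are assumptions made for illustration only: the Hedges small-sample correction, the large-sample variance approximation, the DerSimonian-Laird estimate of between-study variance, and the function names `hedges_g` and `pool_smd`.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (SMD, variance) for one treated-versus-control comparison."""
    df = n_t + n_c - 2
    # Pooled standard deviation of the two groups.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor (assumption: Hedges' adjusted g).
    j = 1 - 3 / (4 * df - 1)
    # Large-sample approximation to the variance of d.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


def pool_smd(effects, variances, i2_threshold=50.0):
    """Pool SMDs with a fixed-effects model, or a random-effects model
    (assumption: DerSimonian-Laird) when I^2 reaches the threshold."""
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q and the I^2 statistic derived from it.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    model = "fixed"
    if i2 >= i2_threshold:
        # Between-study variance tau^2 is added to every study variance.
        c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c)
        weights = [1 / (v + tau2) for v in variances]
        model = "random"
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return {
        "model": model,
        "smd": estimate,
        "ci": (estimate - 1.96 * se, estimate + 1.96 * se),
        "q": q,
        "i2": i2,
    }


# Hypothetical escape latencies (seconds): treated 20 +/- 5, control 30 +/- 5.
g, var_g = hedges_g(20.0, 5.0, 10, 30.0, 5.0, 10)
summary = pool_smd([g, -0.5, -3.0], [var_g, 0.1, 0.1])
```

A shorter escape latency in the treated group yields a negative SMD, which matches the sign convention stated above for acquisition memory.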

Subgroup analyses were also used to examine relevant study characteristics, such as species, sex, TSG dose, and study quality, as possible sources of heterogeneity. Differences in mean effect sizes between groups were assessed by partitioning heterogeneity and using the χ² distribution with n − 1 degrees of freedom (df), where n equals the number of groups. To adjust for multiple comparisons, we used Bonferroni’s correction method (declared significance = 1 − (1 − denoted significance)^(1/number of comparisons)), which was appropriate for the number of analyses conducted [28]. The denoted significance level was set at P < 0.05. The declared P values for this study were 0.0017 for acquisition memory and 0.0037 for retention memory.
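As an illustration of the two calculations in this paragraph, the following Python sketch computes the corrected significance threshold from the stated formula (this form of the correction is often called the Šidák variant) and the upper-tail P value of a χ² statistic with integer degrees of freedom. The function names are hypothetical. The comparison counts of 30 and 14 are assumptions: they happen to reproduce thresholds of 0.0017 and 0.0037 under the formula, but the counts actually used are not stated in the text.

```python
import math


def corrected_alpha(alpha, n_comparisons):
    """Declared significance = 1 - (1 - denoted significance)^(1/n)."""
    return 1 - (1 - alpha) ** (1 / n_comparisons)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic x (x >= 0)
    with a positive integer number of degrees of freedom."""
    y = x / 2
    # Starting values of the regularized upper incomplete gamma function:
    # Q(1, y) = exp(-y) for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


# Assumed comparison counts (not stated in the text).
alpha_acquisition = corrected_alpha(0.05, 30)
alpha_retention = corrected_alpha(0.05, 14)

# Example: a between-subgroup statistic of 8.48 on 3 degrees of freedom.
p_value = chi2_sf(8.48, 3)
is_significant = p_value < alpha_acquisition
```

Under these assumptions, a subgroup difference with a nominal P value near 0.04 does not pass the corrected threshold, even though it would pass an uncorrected threshold of 0.05.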

Finally, meta-regression analyses were conducted to reveal potential sources of heterogeneity in the efficacy of TSG when high heterogeneity was present. The following variables were included in the meta-regression analyses: species, sex, TSG dose, and study quality. To allow for multiple comparisons, the significance was set at P < 0.01.
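The idea behind meta-regression can be sketched in Python as an inverse-variance weighted regression of effect size on a study characteristic. This is a deliberately simplified illustration and not the model fitted in this study: it uses a single covariate and fixed-effect weights, whereas the analysis reported here was a multivariate random-effects regression run in Stata. The function name and the example data are hypothetical.

```python
import math


def weighted_meta_regression(effects, variances, moderator):
    """Regress effect sizes on one moderator with inverse-variance weights.

    Returns (slope, intercept, se_slope, p_value). The P value is two-sided
    and based on a normal approximation.
    """
    weights = [1 / v for v in variances]
    total = sum(weights)
    # Weighted means of the moderator and of the effect sizes.
    x_bar = sum(w * x for w, x in zip(weights, moderator)) / total
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, moderator))
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, moderator, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With known within-study variances, Var(slope) = 1 / sxx.
    se_slope = math.sqrt(1 / sxx)
    z = slope / se_slope
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return slope, intercept, se_slope, p_value


# Hypothetical data: SMDs for three comparisons with quality scores 4, 5, 6.
slope, intercept, se_slope, p_value = weighted_meta_regression(
    [-1.0, -2.0, -3.0], [0.2, 0.2, 0.2], [4, 5, 6]
)
```

A slope whose P value falls below the chosen threshold would mark the covariate as a candidate source of heterogeneity. A random-effects version would add an estimate of between-study variance to each study variance before computing the weights.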

All statistical analyses were performed using the Stata software package (version 13.0) and Review Manager (version 5.3).

Results

Study inclusion

A total of 381 publications were identified, of which 18 met our inclusion criteria [11, 12, 14, 16, 29–42]. Our meta-analysis is based on these 18 studies, which include 39 comparisons of acquisition memory and 15 comparisons of retention memory (Fig. 1).

Fig. 1 Flow diagram of the study search process

Study characteristics

Of the 18 included studies, 13 were published in Chinese academic journals and the remainder were published in English. The characteristics of these studies are presented in Table 4. A total of 10 studies used mice (3 Balb/c mice [31, 32, 36], 2 Kunming mice [33, 38], 3 PDAPPV717I transgenic mice [16, 29, 30], and 2 senescence-accelerated mouse prone 8 mice [34, 35]), 7 studies used Sprague-Dawley rats [11, 12, 14, 37, 39, 41, 42], and 1 study used Wistar rats [40]. Female animals were used in 3 studies [31, 32, 36], male animals were used in 11 studies [11, 12, 14, 34, 35, 37–42], and both males and females were used in 4 studies [16, 29, 30, 33]. Five studies used a transgenic model [16, 29, 30, 34, 35], 3 studies used a D-galactose-induced model [31–33], 2 studies used a cholinergic damage model [37, 38], 2 studies used an age-advanced model [11, 42], 5 studies used an amyloid-β1-42-injected model [12, 36, 39, 41, 42], 1 study used an aluminium chloride-exposed model [14], and 1 study used a hypercholesterolemia model [40]. To assess learning and memory, 14 studies used the Morris water maze test, and all of these studies used a hidden platform during the probe phase [12, 16, 29–38, 40, 42]. One study used a passageway water maze [11], 1 study used a passive avoidance task [14], and 2 studies used a Y maze experiment [39, 41].

Table 4 Characteristics of included studies

Study quality

According to the modified CAMARADES checklist, the median quality score for the 18 included studies was poor (5.692; interquartile range: 5–6), with scores ranging from 4 to 7. No study received a score of 0 or 10. Five studies received scores indicating high quality [12, 14, 16, 39, 42]. One study reported monitoring of physiological parameters [12]. One study mentioned allocation concealment [16]. Two studies [31, 37] did not report randomization of animals into treatment groups. Ten studies [16, 29, 30, 32–35, 37, 38, 40] assessed dose-response relationships. Four studies [12, 14, 39, 42] stated no potential conflicts of interest. Unfortunately, no studies described the calculation of the sample size required to achieve sufficient power to detect differences.

According to our secondary criteria, the average quality score of the included studies was 16.74, with scores ranging from 15 to 19. Six studies [12, 14, 37, 39, 40, 42] received a score of 15, and two studies received a score of 19 [16, 36]. Six studies did not report the age of the animals [12, 14, 37, 39, 40, 42]. Only one study [16] reported blinded outcome assessments. No studies mentioned any dropouts. No studies mentioned whether the order of the outcome assessments was randomized across groups.

Overall efficacy

For acquisition memory, the global estimated effect of TSG was −1.46 (95 % CI: −1.81 to −1.10, P < 0.0001), with significant heterogeneity among studies (χ² = 216.17, df = 38, P < 0.00001; I² = 82 %; Fig. 2a). For retention memory, the global estimated effect of TSG was 1.93 (95 % CI: 1.40 to 2.46, P < 0.0001), with significant heterogeneity among studies (χ² = 56.97, df = 14, P < 0.0001; I² = 75 %; Fig. 2b).
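The reported heterogeneity values can be checked against each other with a short Python sketch. It assumes the standard definition I² = (Q − df)/Q and a normal approximation for the confidence interval; the function names are hypothetical.

```python
def i_squared(q, df):
    """I^2 as a percentage, truncated at zero: 100 * (Q - df) / Q."""
    return 100 * max(0.0, (q - df) / q)


def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric 95 % confidence interval."""
    return (upper - lower) / (2 * z)


# Heterogeneity statistics reported above.
i2_acquisition = i_squared(216.17, 38)
i2_retention = i_squared(56.97, 14)

# Standard error and z statistic implied by the acquisition memory estimate.
se_acquisition = se_from_ci(-1.81, -1.10)
z_acquisition = -1.46 / se_acquisition
```

Rounded to whole percentages, both computed I² values agree with the reported 82 % and 75 %.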

Fig. 2 Effects of TSG on acquisition memory (a) and retention memory (b). The horizontal lines represent the mean estimated effect sizes and 95 % CIs for each comparison. The vertical grey bars represent the 95 % CIs of the pooled estimated effect sizes

Stratified meta-analysis

Subgroup analyses were conducted to assess the degree to which the methodological differences between trials may have systematically influenced differences observed in the primary treatment outcomes. The results of the stratified analyses are described in Table 5.

Table 5 The results of stratified meta-analysis
Table 6 Meta-regression analysis to identify sources of bias associated with study characteristics

We examined the protective effects of TSG on different rodent species. For both acquisition and retention memory, the effect size was significantly higher in studies that used Sprague-Dawley rats than in studies that used other species (P = 0.0002 for acquisition memory and P < 0.00001 for retention memory; Fig. 3a and b). The effect size was −2.78 (95 % CI: −4.06 to −1.51) for acquisition memory and 3.60 (95 % CI: 2.63 to 4.57) for retention memory in studies that used Sprague-Dawley rats.

Fig. 3 Effect size stratified by animal species for (a) acquisition memory and (b) retention memory, by sex for (c) acquisition memory and (d) retention memory, and by model method for (e) acquisition memory and (f) retention memory. Grey bands represent the 95 % CIs for the global estimated effect sizes

We also examined the effect size of TSG on acquisition memory and retention memory in studies that used male, female, or mixed sex animals. The effect size on acquisition memory was significantly higher in studies that used mixed sex animals than in those that used male or female animals only (χ² = 18.45, df = 2, P < 0.00001; Fig. 3c). The effect size on retention memory was examined in studies that used mixed sex or male animals only because limited data were available from studies with female animals only. The effect size was higher in studies that used mixed sex animals (2.06, 95 % CI: 1.15 to 2.98) than in those that used male animals only, but this difference was not significant (χ² = 0.21, df = 1, P = 0.65; Fig. 3d). A significant effect size for acquisition memory was observed in both transgenic models (−1.46, 95 % CI: −1.94 to −0.98, P < 0.0001) and non-transgenic models (−1.46, 95 % CI: −2.00 to −0.91, P < 0.0001); no significant difference was observed between models (χ² = 0.00, df = 1, P = 0.99; Fig. 3e). A slightly higher effect size for retention memory was observed in non-transgenic models than in transgenic models, but no significant differences were observed between models (χ² = 0.001, df = 1, P = 0.93; Fig. 3f).

Next, we analysed the efficacy of different doses of TSG on cognitive performance. For both acquisition and retention memory, significant beneficial effects were found for all doses of TSG, with a maximum effect at the lowest dose for both acquisition memory (−1.92, 95 % CI: −2.52 to −1.32) and retention memory (2.23, 95 % CI: 1.34 to 3.11). However, at the declared significance levels, no significant differences among doses were detected for either acquisition memory (χ² = 8.48, df = 3, P = 0.04; Fig. 4a) or retention memory (χ² = 3.86, df = 2, P = 0.15; Fig. 4b).

Fig. 4 Effect size stratified by the dose of TSG for (a) acquisition memory and (b) retention memory, and by quality score for (c) acquisition memory and (d) retention memory. Grey bands represent the 95 % CIs for the global estimated effect sizes

The effect sizes for acquisition and retention memory were also examined relative to the study quality score. For acquisition memory, the effect size was significantly higher in studies with a quality score of 7 (−3.82, 95 % CI: −4.41 to −3.23) than in those with a quality score of 4, 5, or 6 (χ² = 101.37, df = 3, P < 0.00001; Fig. 4c). No significant differences in effect size were observed relative to study quality for retention memory (χ² = 12.51, df = 3, P = 0.006; Fig. 4d); however, the effect size was highest for studies with a quality score of 4 (3.86, 95 % CI: 1.66 to 6.05).

Meta-regression analyses

A multivariate random-effects regression with species, sex, model, TSG treatment dose, and study quality score was conducted to further explore the heterogeneity among studies regarding acquisition and retention memory. For acquisition memory, the study quality score was a significant source of heterogeneity (P < 0.05). For retention memory, heterogeneity was independent of all tested factors (Table 6). Finally, we analysed the combined data for acquisition and retention memory to determine whether the study quality score was a significant source of heterogeneity. However, the results showed that the study quality score was not a significant source of heterogeneity for the combined data (coefficient: −0.1396835; 95 % CI: −1.896979 to 1.61761; t = −0.16).

Discussion

Systematic reviews of animal studies synthesize existing evidence in an unbiased manner to facilitate decisions regarding the design and conduct of subsequent human clinical trials [18]. To the best of our knowledge, this is the first systematic review and meta-analysis to examine the efficacy of TSG in animal models of AD. This systematic review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses [43] flow diagram (Fig. 1). Although small study effects and statistical heterogeneity were present among the included studies, we found that TSG may improve cognitive outcomes relevant to AD [44].

Subgroup analyses were used to examine variation in the effect of the intervention; a difference between subgroups would suggest that the stratifying characteristic contributes to heterogeneity and may affect treatment efficacy. Based on current guidelines, which recommend at least 10 studies per characteristic to stratify subgroups [45], we were able to conduct subgroup analyses of potential sex and species differences. These analyses revealed a higher effect of TSG on acquisition and retention memory in Sprague-Dawley rats than in other species, and less acquisition and retention memory loss following TSG supplementation in studies that used mixed sex groups than in those that used only male or female animals. In addition, TSG treatment was similarly neuroprotective for acquisition and retention memory in both transgenic and non-transgenic models. However, we also found that the lowest dose of TSG provided the greatest benefit in terms of acquisition and retention memory, which is not consistent with a previously described dose-linear response curve [46]. This finding suggests that the effect size may have been overstated in studies that used lower doses of TSG.

The meta-regression analysis revealed that most of the heterogeneity was not explained by the variables included in the model. Sex, species, animal model, and TSG dose did not account for the heterogeneity of either acquisition or retention memory. The study quality score may explain the heterogeneity in acquisition memory, but it did not explain the heterogeneity in retention memory or in the combined data for acquisition and retention memory. These results may be a consequence of the small sample sizes in the included studies and the limited number of studies, which reduce the reliability of the analysis. Therefore, we could not conclude that the outcome depends on study quality.

There are some limitations to our present meta-analyses. First, our conclusions are limited by the availability of published trials. We did not include unpublished data in our study. Although we attempted to identify all relevant studies from both Western and Eastern countries, all included studies were conducted within China, which may limit generalizations based on our findings. In addition, it has been reported that some Asian countries, including China, publish unusually high proportions of positive results [27, 47]. Of the studies included in this meta-analysis, most did not report negative findings. We conducted an extensive search of unpublished material in an attempt to obtain negative results, but no unpublished negative studies were identified. We cannot exclude the possibility that studies with negative findings remain unpublished because significant positive findings are more likely to be published than non-significant findings. A meta-analysis based on the published literature may overestimate the efficacy of an intervention [48]. Therefore, publication bias may exist in our meta-analysis, although it seems unlikely that the direction or significance of our findings would be modified by unacknowledged trials.

Second, we observed significant heterogeneity among the study results. Although we used accepted techniques for the meta-regression analysis to identify factors associated with variability in the benefits of TSG treatment, the statistical power of these analyses was relatively low given the number of available trials. Unfortunately, for retention memory, the adjusted R² was 29.72 % due to the limited number of studies. In addition, the covariates included in the model could not explain the heterogeneity more than would be expected by chance. Therefore, it was impossible to accurately determine whether the observed heterogeneity was independent of these factors. The presence of heterogeneity highlights the need for caution in interpreting the present findings [49].

Third, no trial exceeded 6 months in duration, which is relatively short given that patients with AD may require treatment with TSG for decades. Long-term treatment may lead to adverse events or persistent or significant disability/incapacity. Furthermore, we focused on only the effect of TSG on cognitive deficits in animal models of AD, largely due to insufficient data regarding the effect of TSG on neuropathological changes (i.e., β-amyloid plaques and neurofibrillary tangles) in AD.

Fourth, we assessed the methodological quality of studies in accordance with previously described standards for the preclinical development of neuroprotective drugs, with minor modifications [18]. Overall, we found that the quality of the included studies was poor. Many of the studies failed to report blinded outcome assessments, which is recommended for open-label trials to reduce bias. Patient, clinician, and/or assessor awareness of the treatment assignment may influence outcome reporting or measurements and introduce bias [50]. Moreover, although it is important to judge the efficacy of a new drug or therapy, no study reported sample size calculations [51], which should be calculated during the planning phase of the study to evaluate the accuracy of a priori estimates and assist in the design of future experiments [52]. Furthermore, lower quality studies showed a trend towards better retention memory outcomes. Therefore, the global estimated effect of TSG on cognition may be overstated in low quality studies. In addition, studies that included female animals failed to describe their hormonal cycle, which may influence behaviour, body physiology, and cognitive and learning-related performance, and should be accounted for in the experimental design (e.g., by increasing power and/or balancing the randomization of animals across groups) [24, 53].

Fifth, a growing number of reports have described adverse effects and hepatotoxicity of PMT products in patients [54]. TSG, the main water-soluble active component of PMT, has been considered the major cause of this hepatotoxicity [55, 56]. Nonetheless, no included study reported data on the safety or toxicity of TSG, perhaps because of the perception that herbal agents are safe because they are natural products with a long history of use. As the medical use of and research on herbal medicines have increased, the toxicity and safety of these medicinal materials have become crucial concerns [57]. Additional well-designed and detailed experimental studies are essential to evaluate the safety of TSG before human clinical studies and application.

Conclusions

Despite its limitations, this systematic review and meta-analysis demonstrates that TSG may reduce cognitive deficits in animal models of AD and indicates a potential therapeutic role of TSG in AD. However, additional experimental studies are needed to evaluate the safety of TSG before human clinical studies and application.