Overview of the systematic review
The search strategy yielded 1469 records in the literature databases. After removing duplicates, 757 studies were screened against the eligibility criteria. Based on a review of titles and abstracts, 683 studies were excluded and 74 were assessed for eligibility. After applying the selection criteria, 62 studies were excluded based on the full-text version. Finally, 12 studies were included in the analysis. A study selection flow chart is provided in Fig. 1.
The 12 studies included in the systematic review were published between 1995 and 2006. Sample sizes varied between 8 and 64 participants. Four of the studies tested only female participants [22,23,24,25], seven studies tested only males [11, 12, 26,27,28,29,30], and one study included both genders [31]. The participants’ mean age ranged from 21.6 to 72.8 years, with ten studies testing young participants (< 40 years) [11, 12, 22, 25,26,27,28,29,30,31] and two studies testing for effects in older adults (> 65 years) [23, 24]. Ten studies used a randomized approach to assign participants to exercise or passive (i.e., sitting or resting) control groups [11, 12, 22, 23, 25,26,27,28,29,30,31].
Endurance exercise was used in eight studies [11, 12, 22, 25, 27,28,29, 31], three studies tested the effect of resistance training [23, 24, 30], and one study used both forms of exercise in a crossover design [26]. Following the terminology used by Norton et al. [32], exercise intensity was light in two [25, 27], moderate in six [11, 22, 26, 30], vigorous in seven [12, 25, 28, 29], and high in three trials [11, 23, 24]. The duration of the exercise trials varied between 5 and 120 min. As indicated in Table 1, blood samples were taken prior to and immediately after the exercise, whereas the timing of the final blood draw differed considerably due to varying recovery periods ranging from 45 to 240 min.
Table 1 Characteristics of the studies included in the systematic review and meta-analysis
In ten studies [11, 12, 22,23,24,25,26,27,28, 30], NKCA was expressed as cytotoxicity in percent (i.e., percentage of target cell lysis), and in two studies NKCA was reported in the form of lytic units (i.e., number of effector cells required to lyse a certain percentage of target cells) [29, 31]. In half of the 12 studies, whole blood samples were mixed with the target cells (i.e., K562 leukaemia cells) [22,23,24, 28, 29, 31], while the other half used peripheral blood mononuclear cells (PBMC) [11, 12, 25,26,27, 30]. The Chromium-51 (Cr51) release assay was applied to quantify cytotoxicity in 11 studies [11, 12, 22,23,24,25,26, 28,29,30,31], while one study tested for lactate dehydrogenase (LDH) release of the target cells [27]. Moreover, nine studies reported absolute NK-cell counts in response to exercise [11, 12, 22,23,24, 26, 27, 29, 31]. A summary of the basic characteristics of each study is given in Table 1.
Meta-analysis of exercise effects
Overall, 12 studies reporting 18 effect sizes were available for the quantitative synthesis, comprising a total of 223 participants. The overall Hedges' g showed a large effect size (k = 18, g = 1.02, 95% CI 0.59–1.46, p < 0.01) with large heterogeneity (τ² = 0.69, p < 0.01, I² = 91%). The prediction interval ranged from g = −0.79 to 2.84, meaning that the effect size can vary substantially across settings (Fig. 2).
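The reported prediction interval can be approximately recovered from the summary statistics given above. The following minimal Python sketch assumes the commonly used formula g ± t(k − 2) · sqrt(τ² + SE²), with the standard error back-calculated from the 95% CI under a normal approximation. The function name and the hard-coded t quantile are illustrative assumptions, and the software used for the analysis may implement a slightly different variant.

```python
import math

# 97.5% quantile of Student's t with k - 2 = 16 degrees of freedom,
# hard-coded so that the sketch needs only the standard library.
T_CRIT_DF16 = 2.120


def prediction_interval(g, ci_low, ci_high, tau2, t_crit):
    """Approximate 95% prediction interval for a random-effects estimate."""
    # Assumption: the reported CI is g +/- 1.96 * SE (normal approximation).
    se = (ci_high - ci_low) / (2 * 1.96)
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return g - half_width, g + half_width


# Summary values reported for the overall analysis (k = 18)
low, high = prediction_interval(
    g=1.02, ci_low=0.59, ci_high=1.46, tau2=0.69, t_crit=T_CRIT_DF16
)
print(f"{low:.2f} to {high:.2f}")
```

Under these assumptions the sketch gives an interval of about −0.80 to 2.84, which agrees with the reported −0.79 to 2.84 up to rounding of the inputs.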
Visual inspection of the funnel plot suggested asymmetry in the data (Fig. 3). However, this asymmetry was not statistically significant according to Egger's test (slope = 1.35, one-tailed p = 0.38). Nevertheless, the sensitivity analysis identified two outliers with a Cook’s distance larger than 0.45 [20]; hence, the Moyna study [31] and one of the three effect sizes from the McFarlin study [28] were excluded from further analyses.
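For readers unfamiliar with the asymmetry test, a minimal sketch of Egger's regression in its classical form is given here. It assumes that the standardized effect (effect / SE) is regressed on precision (1 / SE) by ordinary least squares, so that an intercept far from zero points to funnel-plot asymmetry. The function name and the input values are hypothetical and are not data from the included studies; the variant used in the analysis above may differ.

```python
import math


def egger_regression(effects, std_errors):
    """OLS regression of standardized effect (y / SE) on precision (1 / SE).

    Returns (intercept, slope, standard error of the intercept).
    """
    z = [y / se for y, se in zip(effects, std_errors)]
    x = [1.0 / se for se in std_errors]
    n = len(z)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    sigma2 = rss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept


# HYPOTHETICAL effect sizes and standard errors, for illustration only
effects = [1.4, 1.1, 0.9, 0.6, 0.5]
std_errors = [0.50, 0.40, 0.30, 0.20, 0.15]
intercept, slope, se_intercept = egger_regression(effects, std_errors)
print(intercept, slope, se_intercept)
```

The ratio of the intercept to its standard error would then be compared with a t distribution with n − 2 degrees of freedom to obtain a p-value.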
Rerunning the meta-analysis without outliers revealed a slightly increased effect size (k = 16, g = 1.08, 95% CI 0.69–1.47, p < 0.01) and a narrower prediction interval of g = −0.49 to 2.64 (see ESM Figure A) due to lower between-study heterogeneity (τ² = 0.50, p < 0.01, I² = 73%). To further corroborate the results, changes in the experimental and passive control groups were evaluated separately. The effect of physical exercise on NKCA was large and significant in the experimental groups (k = 16, g = 1.59, 95% CI 1.14–2.05, p < 0.01), whereas no significant change was detected in the control groups (k = 16, g = 0.10, 95% CI −0.01 to 0.21, p = 0.06), confirming that both conditions worked as expected. The forest plots are available in ESM Figure B and Figure C, respectively.
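How excluding an outlying effect size can reduce τ² and I² may be illustrated with a small random-effects computation. The sketch below assumes the DerSimonian-Laird estimator, which is a swapped-in choice for illustration because the estimator used in the analysis is not stated in this section. The effect sizes and variances are hypothetical and are not data from the included studies.

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooling; returns (pooled effect, tau^2, I^2 in percent)."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each sampling variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, tau2, i2


# HYPOTHETICAL effect sizes; the last one plays the role of an outlier
effects = [0.8, 1.0, 1.2, 3.5]
variances = [0.1, 0.1, 0.1, 0.1]

full = dersimonian_laird(effects, variances)
trimmed = dersimonian_laird(effects[:-1], variances[:-1])
print("all effect sizes:", full)
print("outlier removed: ", trimmed)
```

In this toy example the heterogeneity estimates drop sharply once the outlying value is removed. The direction of change in the pooled effect depends on the data and need not match the slight increase reported above.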
Moderator analyses
To better explain the large between-study heterogeneity, a moderator analysis was conducted. Individual-level moderators (i.e., gender, age), exercise-level moderators (i.e., type, intensity), and a method-level moderator (i.e., type of blood sample) were assessed by subgroup analyses, while meta-regression was applied to examine a confounding effect of NK-cell count.
The effect size was larger in males (k = 10, g = 1.29, 95% CI 0.72–1.87) than in females (k = 6, g = 0.72, 95% CI 0.18–1.25); however, this difference was not significant (χ² = 3.08, df = 1, p = 0.08). The comparison of young and old subgroups revealed a significant between-subgroup difference (χ² = 10.69, df = 1, p < 0.01) with a large effect size among young participants (k = 14, g = 1.18, 95% CI 0.74–1.61) and a smaller effect size among old participants (k = 2, g = 0.47, 95% CI −0.52 to 1.46). However, this result needs to be interpreted with caution given that the large majority of studies had tested young participants.
On the exercise level, a significant difference (χ² = 12.92, df = 1, p < 0.01) was detected between studies using resistance exercise (k = 4, g = 0.48, 95% CI 0.33–0.63) and those using endurance exercise (k = 12, g = 1.30, 95% CI 0.81–1.78). Moreover, the intensity of exercise played a significant role (χ² = 11.21, df = 3, p < 0.01) in the sense that high (k = 3, g = 1.27, 95% CI −2.39 to 4.93) and vigorous intensity (k = 5, g = 1.41, 95% CI 0.76–2.06) led to larger effect sizes than moderate (k = 6, g = 1.01, 95% CI 0.37–1.65) and light exercise intensity (k = 2, g = 0.48, 95% CI −1.68 to 2.64).
On the methodological level, no significant difference (χ² = 0.14, df = 1, p = 0.71) was found for the type of blood sample used in the NKCA assay, that is, between whole blood (k = 7, g = 1.01, 95% CI 0.45–1.56) and PBMC (k = 9, g = 1.14, 95% CI 0.46–1.83).
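The p-values reported for the subgroup comparisons with one degree of freedom (gender, age, exercise type, and type of blood sample) can be checked against their χ² statistics. The following minimal Python sketch assumes a standard chi-square reference distribution and uses the closed form available for df = 1; the helper name is illustrative.

```python
import math


def chi2_sf_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, P(X > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Between-subgroup chi-square statistics as reported in the text (all df = 1)
reported = {
    "gender": 3.08,
    "age": 10.69,
    "exercise type": 12.92,
    "blood sample": 0.14,
}
for name, chi2 in reported.items():
    print(f"{name}: chi2 = {chi2}, p = {chi2_sf_df1(chi2):.4f}")
```

The resulting values round to p = 0.08 for gender and p = 0.71 for blood sample type and fall below 0.01 for age and exercise type, in line with the reported results.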
Finally, it was tested whether the effect of physical exercise on NKCA covaried with the number of NK-cells in the peripheral blood. Based on nine studies reporting 11 effect sizes of NK-cell count along with NKCA values, a meta-regression was applied, which did not reveal a significant confounding effect (k = 11, beta = 0.123, R² = 0%, p = 0.55). Thus, NK-cell count could not explain the between-study heterogeneity. The relationship between the effect size and the NK-cell count is displayed in the bubble plot in ESM Figure D.
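The logic of a meta-regression with a single moderator can be sketched as a weighted least-squares fit in which each effect size is weighted by the inverse of its total variance. The Python sketch below is a simplified illustration under that assumption and not the model fitted in the analysis. The function name, the fixed `tau2` argument, and all input values are hypothetical.

```python
def weighted_meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares fit of effect size on one moderator.

    Returns (intercept, slope, standard error of the slope).
    """
    # Inverse-variance weights; tau2 adds between-study variance if supplied
    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(
        wi * (xi - x_bar) * (yi - y_bar)
        for wi, xi, yi in zip(w, moderator, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Model-based standard error, treating the weights as known
    se_slope = (1.0 / sxx) ** 0.5
    return intercept, slope, se_slope


# HYPOTHETICAL values: NKCA effect sizes, their variances, and an
# NK-cell count effect size per comparison used as the moderator
effects = [0.6, 0.9, 1.1, 1.6]
variances = [0.10, 0.08, 0.12, 0.09]
moderator = [0.5, 1.0, 1.5, 2.0]

intercept, slope, se_slope = weighted_meta_regression(effects, variances, moderator)
print(intercept, slope, slope / se_slope)
```

The ratio of the slope to its standard error serves as the test statistic for the moderator; a value close to zero corresponds to a non-significant moderator such as the one reported above.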
Meta-analysis of recovery effects
Next, the effect of physical exercise on NKCA after a recovery period was examined. By comparing the pre-exercise NKCA values with the recovery NKCA values in both the exercise and control groups, a moderate negative effect size was discovered (k = 16, g = −0.51, 95% CI −0.86 to −0.16, p < 0.01). The prediction interval ranged from a huge negative to a large positive effect size (g = −1.85 to 0.83), meaning that the effect size can still vary substantially across settings. As shown in Fig. 4, large between-study heterogeneity (τ² = 0.36, p < 0.01, I² = 76%) became evident.
Finally, the exercise and control groups were analysed separately. Within the exercise groups, no evidence was found for NKCA alterations subsequent to the recovery period (k = 16, g = 0.06, 95% CI −0.37 to 0.50, p = 0.76; see ESM Figure E). In the control groups, however, the effect size was significant and moderately positive (k = 16, g = 0.66, 95% CI 0.43–0.89, p < 0.01), while the between-study heterogeneity was small and non-significant (τ² = 0.02, I² = 37%, p = 0.07). In other words, physical exercise did not significantly decrease the NKCA after the recovery period in the experimental groups, but the relative comparison with the elevated NKCA levels in the control groups suggested a negative overall effect. Figure 5 presents the forest plot for the control groups.
It can be assumed that the effect on NKCA was altered by the recovery period, which varied considerably in length between the studies. Therefore, meta-regression models were run to analyse the impact of the recovery time in the overall sample (k = 16, beta = 0.00, R² = 0%, p = 0.898), exercise groups only (k = 16, beta = 0.001, R² = 0%, p = 0.810), and control groups only (k = 16, beta = 0.002, R² = 0%, p = 0.372). None of the three meta-regression models was significant; that is, the duration of the recovery period did not impact the NKCA values and thus could not explain the between-study heterogeneity.