1 Introduction

High-intensity interval training (HIIT) involves repeated bouts of exercise performed in the vigorous intensity domain interspersed with periods of complete rest or low-intensity exercise. High-intensity intervals can last anywhere from 5 s to 8 min and are generally performed above the second ventilatory threshold, which elicits a heart rate response between 80 and 100% of heart rate maximum [1]. Variables such as the interval duration, intensity, and the number of work and recovery bouts can be manipulated to develop a myriad of different HIIT protocols. When the work rate increases towards the upper bound of the typical intensity range for HIIT, a specific form of HIIT referred to as ‘sprint interval training’ (SIT) occurs. This involves very brief work bouts, usually around 8–30 s, that are repeated and performed at supra-maximal intensity (greater than that associated with 100% of \(\dot{V}\)O2max) or as ‘all-out’ efforts [2]. These relatively short intervals are also interspersed with either long or short recovery periods [3].

Previous studies have shown that HIIT can elicit equal or greater improvements in maximal oxygen uptake (\(\dot{V}\)O2max) compared with continuous training [4,5,6,7], particularly with the use of higher intensities and longer work intervals [8, 9]. Currently, HIIT is increasingly prescribed as a potential therapeutic intervention to address a variety of chronic illnesses including cardiovascular disease, cancer, and metabolic syndrome, due to the robust evidence showing significantly enhanced cardiorespiratory fitness [6, 10,11,12,13]. The approach appears to be a viable strategy for fostering mental, psychological, and cognitive health and may reduce the severity of anxiety and depression [14,15,16,17,18,19]. Lastly, HIIT can be undertaken without the need for expensive gym equipment or access to commercial exercise training facilities. Overall, HIIT appears to be a feasible alternative to traditional endurance training for improving cardiorespiratory fitness, and may facilitate these changes with a surprisingly low training volume [20, 21].

Anatomical and physiological differences between men and women are believed to underlie differences in \(\dot{V}\)O2max and endurance performance [22]; however, differences in the adaptation to chronic training are less well known. Studies in which biological sex was treated as an independent variable have been considered crucial towards improving the understanding of overall human health, and also for enabling more personalized, sex-specific training regimens [23]. Compared with men, the absolute aerobic capacity of trained women is 10–25% lower; however, when maximal oxygen uptake (\(\dot{V}\)O2max) is adjusted relative to body weight, the difference can be reduced to around 5–10% [24,25,26]. After normalization for body weight, the remaining difference could be due to lower blood hemoglobin concentration, smaller cardiac dimensions, and lower total blood volume [24, 27,28,29]. For example, women's hearts and major blood vessels are typically smaller than those of men of the same body weight, ethnicity, and chronological age [22, 30,31,32,33]. Similarly, various studies have identified respiratory system limitations in women. Compared with men, height- and weight-matched women appear to have smaller lung sizes [34,35,36]. Furthermore, the diameter of the conducting airways is smaller and the alveoli are fewer in number than in men, both of which negatively affect airflow and efficiency of gas exchange during heavy exercise [36,37,38,39]. Although some research indicates that performance could be impacted by sex differences in lung volume, but not airway anatomy and mechanics [22, 40, 41], such differences likely still contribute to physiological limitations to oxygen transport and thus would tend to exert a negative impact on exercise performance in women compared with age-, height- and/or weight-matched men.

Another physiological sex difference that has the potential to influence exercise response is that the less fatigable type I muscle fibers tend to be more abundant in women [42]. As such, there is evidence that for the same period of high-intensity exercise, women tend to experience less peripheral muscle fatigue-related contractile dysfunction than men, which translates to greater fatigue resistance and faster recovery [43]. Due to the differences in muscle fiber type percentages, women oxidize more fat and less protein and carbohydrate at matched relative intensity compared with men [44], whereas men possess higher glycolytic capacity [45, 46], which would therefore tend to alter intracellular homeostasis to a greater extent in men versus women at an equivalent relative intensity. Moreover, in response to HIIT or SIT, some studies have reported that females present with lower blood lactate levels [47]. Other findings have demonstrated that anaerobic capacity, estimated by energetic equivalents of the phosphagen and glycolytic pathways, may be lower in women when compared with men after a supramaximal effort [48]. Collectively, these studies suggest that women are less prone to peripheral muscle fatigue and have a greater tendency towards more aerobic metabolism than men.

To date, many reviews investigating the role of sex differences on acute exercise responses and chronic adaptation have been narrative in nature [2, 49]. One review concluded that attenuated blood lactate accumulation, lower protein synthesis, and mitochondrial biogenesis occur in women relative to men following SIT [2]. One systematic review, which also included a meta-regression [50], examined the effects of low-volume HIIT on cardiorespiratory fitness in adults and found moderate improvements in the \(\dot{V}\)O2max of active and sedentary participants, without presenting a conclusion regarding a sex-specific response to HIIT. A more recent meta-analysis concluded that HIIT is an efficient method of decreasing total abdominal and visceral fat mass without differences between men and women, but it did not investigate cardiorespiratory fitness as an outcome [51]. A meta-analysis by Diaz‑Canestro and Montero [52] found significantly larger increases in both absolute and relative \(\dot{V}\)O2max after moderate-intensity endurance training in men compared with women; however, this review did not investigate the effects of HIIT as an intervention. Overall, the findings of these reviews indicate the potential for sex to impact health outcomes and cardiorespiratory fitness adaptations to exercise training, yet to date, no definitive conclusions can be drawn regarding how sex differences influence the adaptation to HIIT. Therefore, the objective of this systematic review with meta-analyses was to examine the influence of biological sex on the relative magnitude of adaptations in cardiorespiratory fitness and performance, following either HIIT or SIT interventions.

2 Methods

2.1 Development of the Research Question

To address the objective of the review, the research question was formulated using the Population, Intervention, Comparison, Outcome (PICO) framework as follows:

Is the relative magnitude of adaptation of maximal cardiorespiratory fitness and measures of performance (outcome) in response to HIIT or SIT (intervention) in healthy adults (population) different between men and women (comparison)?

2.2 Literature Search and Screening

This systematic review has been registered on the PROSPERO International Prospective Register of Systematic Reviews (registration number: CRD42021272615). Additionally, this review has been conducted and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [53].

A search of three major electronic databases (MEDLINE, Sports Medicine & Education Index in ProQuest, and SPORTDiscus) was undertaken through to 8 September 2022. The keywords used during the search were ‘high intensity’, ‘high-intensity’, ‘HIIT’, ‘HIT’, ‘intervals’, ‘intermittent’, ‘sprint’, ‘HIIE’, ‘vigorous’, ‘maximal’, ‘exercise’, ‘workout’, ‘intervention’, ‘physical activity’, ‘activity’, ‘training’, ‘gender’, ‘sex’, ‘male’, ‘males’, ‘man’, or ‘men’ and ‘female’, ‘females’, ‘woman’, or ‘women’. Subject (MeSH) headings were used for ‘exercise’, ‘exercise training’, ‘exercise adaptation’, and ‘physical activity’ in MEDLINE and SPORTDiscus. The search strategy was recreated in the Sports Medicine & Education Index in ProQuest without the option of subject headings. The ProQuest search was set to search ‘everything except full text’, including title, abstract, and keywords. The full search strategy as it was undertaken in MEDLINE is outlined in Supplementary Online Resource 2 (see electronic supplementary material [ESM]). In addition, the reference lists of previous reviews relevant to HIIT were manually screened to identify any relevant references that were not included in the electronic search. All references captured in the search and identified from reference lists were exported into Zotero reference management software (version 5.0.96.2, USA), and subsequently imported into Covidence online review management program (Australia) for the study selection phase of the review.

Title, abstract screening, and full-text screening were conducted through the Covidence website by two independent reviewers (IY and ML). Any conflicts during the screening process were resolved via consultation between the two reviewers to confirm the reasons underlying inclusion or exclusion. A third reviewer was available for any conflicts that could not be resolved between the first two reviewers.

For inclusion in the review, studies were required to have implemented a HIIT or SIT protocol intervention in a cohort of adults including male and female participants; to have measured cardiorespiratory fitness or performance outcomes; to have presented separate outcomes for men and women, or individual data including sex, and/or presented results of a sex × HIIT analysis for the outcomes of interest. This approach to study inclusion was taken in order to control for confounding from different HIIT/SIT protocols across studies, and to ensure that exercise dose is normalized between male and female sub-groups. Peer-reviewed publications or unpublished theses that were written in English were included in the review. Participant groups were excluded from the review if clear pathology was present (i.e., if diagnosed diseases or disorders were a focus of the intervention) or if the intervention included major confounding factors (i.e., dietary supplementation or manipulation, pharmaceutical or herbal intervention, bed rest, or if HIIT/SIT was not the primary cardiorespiratory exercise intervention). Studies and participant groups with risk factors for disease were included if diagnosed disease states were not present. Research designs including reviews, previous meta-analyses, conference abstracts, case studies, and non-scientific articles were excluded from the review. Outcome measures included any measures relating to cardiorespiratory fitness, maximal or sub-maximal exercise performance including power, anaerobic threshold, or speed-related measures (i.e., time trials and sprints). Musculoskeletal performance outcomes such as field tests of muscular power, strength, or endurance were outside of the scope of the current review. No limitations were placed on the type of measure used for fitness or performance outcomes (i.e., measured vs estimated \(\dot{V}\)O2max, or lab-based vs field tests of performance) or whether or not the study achieved a positive effect overall.

2.3 Assessing the Risk of Bias Within Studies

The risk of bias for each of the individual included trials was evaluated independently by two authors (IY and ML) using the Newcastle–Ottawa Scale (NOS) for quality assessment of case–control studies [54]. To address the unique research question of the current review, the research team needed to focus on an observational element (biological sex differences in outcomes) within interventional studies; therefore, a tool for assessing the risk of bias in observational studies was deemed to be more appropriate than a tool to assess the risk of bias within experimental studies. This approach has been previously used for another meta-analysis with a similar research question [52]. Additionally, to address the risk of bias specific to the research question, the comparison of men and women was applied to the NOS in place of cases versus controls.

Due to the risk of low inter-rater reliability associated with the subjective interpretation of the NOS and the previously highlighted need for more detailed guidance around the application of the scale [54,55,56], additional directions were developed by the research team to apply the NOS to the specific objectives of the current review (see Supplementary Online Resource 3 in the ESM). Studies were scored on a scale of nine in accordance with the NOS scoring system. Any conflicts in the quality rating scores of individual items within each study were resolved through discussion between the researchers. Inter-rater reliability for individual items of the NOS was calculated as the number of trials with the same score from both reviewers before conflict resolution as a proportion of the total number of trials. Domain scores were used to categorize studies into good, fair, and poor quality using the thresholds outlined by the Agency for Healthcare Research and Quality (AHRQ) [57].
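The inter-rater reliability calculation described above is a simple percentage agreement for each NOS item. A minimal sketch follows, assuming each reviewer's scores for one item are held in a list with one entry per trial; the function name and the example scores are hypothetical and are not taken from the review data.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of trials given the same score by both reviewers."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both reviewers must score the same trials")
    if not scores_a:
        raise ValueError("At least one trial is required")
    same = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return 100.0 * same / len(scores_a)


# Hypothetical scores for one NOS item across 28 trials
# (1 = star awarded, 0 = no star), recorded before conflict resolution.
reviewer_1 = [1] * 28
reviewer_2 = [1] * 18 + [0] * 10

agreement = percent_agreement(reviewer_1, reviewer_2)
print(f"Agreement for this item: {agreement:.2f}%")
```

Percentage agreement does not correct for agreement expected by chance, so it is best read as a descriptive check on how consistently the scale was applied rather than as a formal reliability coefficient.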

2.4 Data Extraction and Meta-Analysis

Data including reference identification information, details of the participant characteristics such as age and target population, details of the intervention (intervention length, HIIT protocol, frequency, intensity, and exercise mode), methods of fitness or performance outcome testing, and results were extracted using a customized Microsoft Excel spreadsheet and Microsoft Word tables. Additionally, any concurrently measured, potentially influential physiological variables such as those relating to cardiac, muscular, or cellular metabolic adaptations, and measures of blood lactate accumulation or lactate clearance were also extracted. In cases where raw data necessary for meta-analysis were not directly reported but could be determined from the available information, they were calculated according to Cochrane recommendations [58]. Similarly, where individual trials included multiple intervention groups that met inclusion criteria, the groups were pooled according to Cochrane recommendations [58]. In cases where outcomes were only presented in figure format, the necessary data were extracted using WebPlotDigitizer software (version 4.5, Ankit Rohatgi, United States of America).

Where possible, pre- and post-intervention outcome data were meta-analyzed using the Meta-Essentials package (Erasmus University, the Netherlands) for Microsoft Excel [59] using differences for dependent groups and continuous data [60]. The dependent measures meta-analysis used a random-effects model and was based on Hedges’ g. The magnitude of the effect was inferred based on the exercise science-specific thresholds of small (0.20), moderate (0.60), and large (1.20) standardized mean differences, as outlined by Hopkins and colleagues [61]. Correlation coefficients for individual studies were calculated using mean outcome and change data by applying Follmann’s equation [62]. A mean of the calculated correlation coefficients from studies that provided the necessary data was used to impute a correlation coefficient for all other studies. Standardized mean differences were used to account for variation in the units of measure that were presented across different studies; however, wherever possible, the most frequently reported unit of measure for each given outcome was included in the meta-analysis to maximize consistency. In cases where the first and second anaerobic thresholds were reported, the second threshold was used for inclusion in the meta-analysis.
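The two calculations described above can be sketched in a few lines. The sketch assumes the commonly cited form of the correlation equation, r = (SDpre² + SDpost² − SDchange²) / (2 × SDpre × SDpost), and one textbook formulation of Hedges' g for dependent groups; it is not necessarily identical to the computation performed inside Meta-Essentials. All function names and input values are hypothetical, and the "trivial" label for effects below 0.20 is an assumption rather than a threshold stated in this review.

```python
import math


def change_correlation(sd_pre, sd_post, sd_change):
    """Pre-post correlation recovered from the SDs of pre, post, and change scores."""
    return (sd_pre**2 + sd_post**2 - sd_change**2) / (2 * sd_pre * sd_post)


def hedges_g_paired(mean_pre, mean_post, sd_pre, sd_post, n, r):
    """Bias-corrected standardized mean change for one dependent group (requires r < 1)."""
    sd_change = math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)
    sd_within = sd_change / math.sqrt(2 * (1 - r))
    d = (mean_post - mean_pre) / sd_within
    j = 1 - 3 / (4 * (n - 1) - 1)  # small-sample correction factor
    return j * d


def magnitude(g):
    """Label an effect using the 0.20 / 0.60 / 1.20 thresholds."""
    size = abs(g)
    if size < 0.20:
        return "trivial"  # assumed label for effects below the 'small' threshold
    if size < 0.60:
        return "small"
    if size < 1.20:
        return "moderate"
    return "large"


# Hypothetical group: relative VO2max of 40 -> 43 mL/kg/min, n = 12
r = change_correlation(sd_pre=6.0, sd_post=6.0, sd_change=3.6)
g = hedges_g_paired(mean_pre=40.0, mean_post=43.0, sd_pre=6.0, sd_post=6.0, n=12, r=r)
print(f"r = {r:.2f}, g = {g:.3f} ({magnitude(g)})")
```

For a study that reports no change-score SD, the imputed mean correlation would be passed as `r` in place of a study-specific value.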

2.5 Sensitivity Analyses and Methods for Exploring Heterogeneity

The primary research question was addressed by sub-grouping male and female data. Sensitivity analysis was then undertaken to check the effect of study quality and any observed outlying studies on total effect size and heterogeneity within the meta-analysis. Outlying studies were removed from the analysis if they were observed to have a poor fit with the remaining studies and an underlying methodological explanation for this was identified. Studies that were categorized as poor on the NOS were excluded for sensitivity analysis to assess the effect of study quality and subsequently excluded from all meta-analyses if the effect was deemed significant. An additional sensitivity analysis was undertaken on the primary analysis of \(\dot{V}\)O2max to check the effects of the meta-analytical approach and to estimate raw mean differences for the pooled data. This analysis was undertaken in order to provide more practical estimates of baseline values and effects, as well as checking whether the use of raw mean differences over standardized mean differences would result in any changes in the overall findings. Data were pooled in Review Manager (RevMan) version 5.4.1 (The Cochrane Collaboration) using a random effects model and a raw mean difference directly comparing baseline absolute and relative \(\dot{V}\)O2max for men and women, as well as pre-post measures for \(\dot{V}\)O2max (absolute and relative).

Potential sources of heterogeneity were explored through additional sub-grouping by the pre-determined population characteristics of baseline training status and mean group age, as well as by intervention type and length. Baseline training status was categorized as untrained, moderately trained, and well trained based on the population description by the authors of the primary studies. The grouping of studies into each training status category was confirmed by cross-checking the category against the mean baseline \(\dot{V}\)O2max, where available. In cases where the description of baseline training status was not clear within the primary study, grouping was informed by the baseline \(\dot{V}\)O2max and the homogeneity with other studies in the grouping. Untrained populations were identified as previously sedentary or those not currently participating in regular exercise at baseline. Moderately trained populations included recreationally active individuals. Well-trained populations included those described as ‘well trained’, and elite, or semi-elite athletes. Sub-grouping by age was undertaken using the mean sample group age and categorized as (a) adults under 30 years; (b) participants aged 30–45 years, and (c) participants over 45 years of age. Sub-grouping into intervention type involved categorizing study data into HIIT (interventions using sub-maximal intensities) or SIT (supra-maximal/all-out intensities). Sub-grouping by intervention length involved categorizing study data into interventions ≤ 4 weeks, 5–9 weeks, and ≥ 10 weeks in duration.
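The age, intervention type, and intervention length categories above amount to simple cut-point rules, sketched below. The function names are hypothetical, intervention length is assumed to be reported in whole weeks, and the handling of a mean age of exactly 30 or 45 years (assigned here to the 30–45 group) is an assumption because the text does not specify it.

```python
def age_group(mean_age):
    """Assign a sample to an age sub-group from its mean age in years."""
    if mean_age < 30:
        return "under 30 years"
    if mean_age <= 45:  # assumption: boundary values fall in the middle group
        return "30-45 years"
    return "over 45 years"


def length_group(weeks):
    """Assign an intervention to a length sub-group (whole weeks assumed)."""
    if weeks <= 4:
        return "<= 4 weeks"
    if weeks <= 9:
        return "5-9 weeks"
    return ">= 10 weeks"


def intervention_type(supramaximal_or_all_out):
    """SIT for supra-maximal or all-out work bouts, otherwise HIIT."""
    return "SIT" if supramaximal_or_all_out else "HIIT"


# Hypothetical study: mean age 24.5 years, 6-week all-out sprint protocol
print(age_group(24.5), "|", length_group(6), "|", intervention_type(True))
```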

2.6 Qualitative Synthesis

For outcomes where only a small number of trials reported data, and meta-analytical methods were deemed to be inappropriate, results were synthesized qualitatively by grouping various measures and reporting the relevant results of each study.

2.7 Assessing the Risk of Bias Across Studies

The risk of bias across studies was assessed using visual inspection of the funnel plots for the primary meta-analyses [63, 64]. The presence of publication bias was assumed if notable asymmetry was present within the funnel plot. Adjusted effect sizes were reported where relevant.

3 Results

3.1 Search Results and Study Characteristics

A total of 33 references from 28 individual trials including 965 participants (462 women and 503 men) were included in the review. One study was initially included in the review but was subsequently excluded due to the inclusion of only two female participants, resulting in the inability to compute an effect size in the meta-analysis [65]. A sex × HIIT analysis was not undertaken by the study authors for the same reason and therefore despite meeting all inclusion criteria the reference could not contribute to the results of the review. The flow of references through the search and screening process is shown in the PRISMA diagram in Fig. 1. The study and population characteristics are shown in Table 1. A summary of the training protocols used is shown in Table 2. Results of the primary meta-analyses for all outcomes and the sub-grouping by participant characteristics are shown in Table 3. Results of sub-groupings by intervention characteristics are shown in Table 4. Individual study results for all fitness and performance outcomes and measures of physiological adaptation are shown in Tables 5, 6, 7 and 8. The PRISMA checklist for the reporting of review methods and results can be found in Supplementary Online Resource 4 in the ESM.

Fig. 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow chart of the search and screening process. HIIT high-intensity interval training, \(\dot{V}\)O2max maximal oxygen uptake

Table 1 Study and population characteristics for included studies
Table 2 Summary of training protocols used within the included studies
Table 3 Summary of meta-analyses of \(\dot{V}\)O2max, peak power output from incremental testing, and work at anaerobic threshold, primary analysis, relative and absolute \(\dot{V}\)O2max, and sub-groupings by participant characteristics
Table 4 Summary of meta-analyses of \(\dot{V}\)O2max and peak power output from incremental testing, sub-groupings by intervention characteristics
Table 5 Summary of outcomes and results of included studies: maximal oxygen uptake
Table 6 Summary of outcomes and results of included studies, peak power output from incremental testing and threshold power
Table 7 Summary of outcomes and results of included studies, additional cardiorespiratory and performance outcomes and measures of fatigue
Table 8 Summary of outcomes and results of included studies, concurrent measures of physiological adaptation

3.2 Risk of Bias Within Studies

Quality appraisal scores for individual studies ranged from five to eight out of a maximum possible score of nine. The majority of studies (n = 17) were classified as good quality, indicating a low risk of bias for the current review. Eight trials were classified as fair and three trials were classified as poor. The mean inter-rater reliability for individual items of the NOS was 82.54% (± 12.17%; range 64.28–100%). The lowest inter-rater reliability scores were for the selection of men compared to women item and the withdrawals and non-adherers item. All included studies applied an equivalent intervention for males and females as indicated by question 7 on the NOS, demonstrating that prescribed exercise protocols were dose-matched between men and women within each meta-analysis. A detailed breakdown of scoring for each study can be seen in Supplementary Online Resource 5 in the ESM.

3.3 Correlation Coefficients

Seven studies [66,67,68,69,70,71,72] contributed data to the imputed correlation coefficients for \(\dot{V}\)O2max, which were calculated as 0.79 (± 0.16) for women and 0.81 (± 0.10) for men. Four studies [67, 71, 73, 74] contributed data for peak power output from incremental exercise testing (PPO), which were calculated as 0.84 (± 0.07) for women and 0.81 (± 0.08) for men. Two studies [73, 75] contributed data for threshold power (power output at lactate or ventilatory threshold; powerAT), which were calculated as 0.53 (± 0.22) for women and 0.47 (± 0.16) for men.

3.4 Cardiorespiratory Fitness Outcomes

3.4.1 Maximal Oxygen Uptake: Study Characteristics and Primary Analysis

Twenty-eight references from 24 individual trials [66,67,68,69,70,71,72, 74, 76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95] presented \(\dot{V}\)O2max outcomes, with 19 trials presenting sufficient data for meta-analysis. A summary of all outcomes and results relating to \(\dot{V}\)O2max is shown in Table 5. All except three of the included studies measured \(\dot{V}\)O2max using direct calorimetry [82, 90, 95]. Of the three studies that did not measure \(\dot{V}\)O2max using direct calorimetry, only two presented sufficient data for meta-analysis [82, 90]. Effect sizes from both studies appeared to be consistent with other studies in all analyses. Out of 32 sex × HIIT/SIT interaction analyses for \(\dot{V}\)O2max that were reported in the primary studies, 26 were not significant. Of the six that were significant, two favored females (one study, relative and absolute \(\dot{V}\)O2max) while four favored males (four studies, all absolute \(\dot{V}\)O2max). The primary meta-analysis was undertaken to address the research question by sub-grouping study data by sex only (see Fig. 2). One study [79] demonstrated a large outlying effect size for both men and women on the initial forest plot and was excluded from all subsequent analyses (see Supplementary Online Resource 6 in the ESM). It is possible that this outlying result was due to the use of an arm ergometer for measurement of \(\dot{V}\)O2max, whereas all other studies employed cycle ergometry or running protocols. This study also used an upper limb-specific HIIT training protocol using ‘battling ropes’, compared with lower limb-specific exercise modes used in all other studies. After excluding this outlying study, heterogeneity decreased from I2 = 77.26% to 62.14% for women and from I2 = 85.97% to 78.80% for men. The meta-analysis demonstrated near-moderate effect sizes for increasing \(\dot{V}\)O2max for both men (g = 0.57; p < 0.001) and women (g = 0.57; p < 0.001) with no between-group differences (p = 0.97). Significant levels of heterogeneity were still present for both men (I2 = 78.80%, Q = 84.91, p < 0.001) and women (I2 = 62.14%, Q = 47.54, p < 0.001).
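The I2 values reported above are consistent with the usual relationship between Cochran's Q and its degrees of freedom, I2 = max(0, (Q − df) / Q) × 100. The sketch below is illustrative only: the function name is hypothetical, and the df of 18 is an assumption inferred from the reported Q and I2 values rather than a figure stated in the text.

```python
def i_squared(q, df):
    """I2 (%) from Cochran's Q and its degrees of freedom (number of effect sizes minus 1)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


ASSUMED_DF = 18  # assumption: inferred from the reported Q and I2, not stated in the text

i2_men = i_squared(84.91, ASSUMED_DF)
i2_women = i_squared(47.54, ASSUMED_DF)
print(f"men: {i2_men:.2f}%, women: {i2_women:.2f}%")
```

Under that assumed df, the function reproduces the reported 78.80% for men and 62.14% for women to two decimal places, which illustrates that I2 expresses the share of Q in excess of what sampling error alone would be expected to produce.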

Fig. 2

Meta-analysis of \(\dot{V}\)O2max, pre- and post-HIIT or SIT intervention. Standardized mean differences and 95% confidence intervals. HIIT high-intensity interval training, SIT sprint interval training, \(\dot{V}\)O2max maximal oxygen uptake, LL confidence interval lower limit, UL confidence interval upper limit

Three studies [82, 89, 90] were categorized as poor using the NOS and AHRQ criteria, and only two had sufficient data for meta-analysis [82, 90]. Upon removing these studies, sensitivity analysis revealed small changes in effect sizes for \(\dot{V}\)O2max in both men (g = 0.57; p < 0.001) and women (g = 0.52; p < 0.001). Similarly, only small changes in heterogeneity occurred for both men (I2 = 79.91%, Q = 79.64, p < 0.001) and women (I2 = 54.59%, Q = 35.23, p = 0.004). The impact of removing the data from these studies was considered to be small, and hence the data were retained for subsequent meta-analyses and sub-group analyses.

Initial qualitative analysis of \(\dot{V}\)O2max outcomes revealed conflicting results regarding whether there were sex-specific differences when \(\dot{V}\)O2max was considered in relative or absolute terms. When absolute and relative \(\dot{V}\)O2max outcomes were retrospectively meta-analyzed separately, effect sizes were found to be similar for men and women for both outcomes, without the presence of between-group differences for either measure (Table 3; Supplementary Online Resource 7, see ESM).

The sensitivity analysis for the meta-analytical approach and for estimating raw pooled mean differences found that baseline \(\dot{V}\)O2max was significantly higher in men compared with women for both absolute \(\dot{V}\)O2max (between-group Δ 1.06 L·min−1; p < 0.001) and relative \(\dot{V}\)O2max (between-group Δ 5.88 mL·kg−1·min−1; p < 0.001). Heterogeneity was significant for both baseline absolute and baseline relative \(\dot{V}\)O2max (I2 = 70% and 60% respectively; p = 0.001 for both). Pre-post response to HIIT/SIT interventions measured using a raw mean difference was similar between men and women with no significant between-group differences for change in either absolute (men, Δ 0.32 L·min−1 vs women, Δ 0.20 L·min−1; p = 0.38) or relative \(\dot{V}\)O2max (men, Δ 3.50 mL·kg−1·min−1 versus women, Δ 3.34 mL·kg−1·min−1; p = 0.88). Heterogeneity was low for the pre-post analysis of both absolute and relative \(\dot{V}\)O2max (I2 = 0% for both; p = 0.97 and p = 1.0, respectively). Forest plots for this sensitivity analysis are shown in Supplementary Online Resource 8a–d (see ESM).

3.4.2 Maximal Oxygen Uptake: Sub-Groupings

When \(\dot{V}\)O2max data were stratified for training status (Fig. 3a), baseline training status accounted for significant levels of heterogeneity for moderately trained and well-trained men and women (I2 range 0.00–34.48; p range 1.00–0.19). Mean baseline \(\dot{V}\)O2max in the moderately trained groups tended to sit between 35 and 48 mL·kg−1·min−1 in men, and 28 and 40 mL·kg−1·min−1 in women, whereas untrained groups and well-trained groups tended to sit below and above those ranges, respectively. The results of this meta-analysis showed significant differences overall with smaller effect sizes present in the moderately trained groups, but effect sizes for men and women were similar (Table 3).

Fig. 3

Sub-grouping of meta-analysis of \(\dot{V}\)O2max, pre- and post-HIIT or SIT intervention, a by baseline training status, and b by mean group age. Standardized mean differences and 95% confidence intervals. HIIT high-intensity interval training, SIT sprint interval training, \(\dot{V}\)O2max maximal oxygen uptake, LL confidence interval lower limit, UL confidence interval upper limit

Sub-group analysis for men and women by the mean age of the participant group (Fig. 3b) did not account for significant levels of heterogeneity in any of the sub-groups, with the exception of women under 30 years (I2 = 35.93; p = 0.12). All significant heterogeneity in the men's 30–45-year-old group was attributable to one study [72]. The exclusion of this study resulted in a substantially lowered effect size for this group (g = 0.18) and a significant between-group difference overall (p < 0.001), indicating high sensitivity and a general lack of robustness within this particular analysis. Despite the application of the age categories, the sub-group of participants over 45 years consisted only of studies with a mean age of ≥ 59 years.

Sub-grouping by intervention type comparing HIIT and SIT protocols demonstrated no significant between-group differences (p = 0.72; Fig. 4a), whereas sub-grouping by intervention length demonstrated a significant between-group difference (p < 0.001) with significantly smaller effect sizes present in both men and women for interventions with a duration of 4 weeks or less compared to those with a longer duration (Fig. 4b).

Fig. 4

Sub-grouping of meta-analysis of \(\dot{V}\)O2max, pre- and post-HIIT or SIT intervention, a by intervention type, and b by intervention length. Standardized mean differences and 95% confidence intervals. HIIT high-intensity interval training, SIT sprint interval training, \(\dot{V}\)O2max maximal oxygen uptake, LL confidence interval lower limit, UL confidence interval upper limit

3.5 Performance Outcomes

A multitude of different performance outcomes were presented in the included studies. Fourteen references from 11 individual trials presented measures of PPO [67, 71, 73, 74, 80, 81, 83,84,85, 89, 92,93,94, 96], and seven references from six individual trials presented measures of powerAT (n = 6) [67, 73, 75, 84, 85, 95, 96]. A summary of the outcomes and results for PPO and powerAT is outlined in Table 6. Forest plots for the meta-analyses of PPO, and PPO sub-grouped by baseline training status and intervention type, are shown in Figs. 5, 6a, b, respectively. The forest plot for powerAT is shown in Fig. 7. Power and \(\dot{V}\)O2-based outcomes that were not meta-analyzed included peak power output during training sessions (n = 2) [80, 91], threshold \(\dot{V}\)O2 (n = 4) [67, 72, 85, 86], Wingate outcomes (n = 2) [66, 76, 97], and relative and absolute power output (peak and at lactate threshold) from an arm cycle protocol (n = 1) [67]. Additional performance measures included time trials for running (n = 1) [98] and cycling (n = 3) [73, 80, 91], maximal speed (n = 2) [70, 95], 40-m sprint ability (n = 1) [98], repeated sprint ability (n = 2) [86, 98], fatigue (n = 4) [73, 76, 80, 93], and speed decrement (n = 1) [95].

Fig. 5

Meta-analysis of peak power output from incremental exercise testing (PPO), pre- and post-HIIT or SIT intervention. Standardized mean differences and 95% confidence intervals. HIIT high-intensity interval training, SIT sprint interval training, LL confidence interval lower limit, UL confidence interval upper limit

Fig. 6

Sub-grouping of meta-analysis of peak power output from incremental exercise testing (PPO), pre- and post-HIIT or SIT intervention, a by baseline training status, and b by intervention type. Standardized mean differences and 95% confidence intervals. HIIT high-intensity interval training, SIT sprint interval training, LL confidence interval lower limit, UL confidence interval upper limit

Fig. 7

Meta-analysis of threshold power (PowerAT), pre- and post-HIIT or SIT intervention. Standardized mean differences and 95% confidence intervals. HIIT high-intensity interval training, SIT sprint interval training, LL confidence interval lower limit, UL confidence interval upper limit

3.5.1 Peak Power Output from Incremental Testing

Eight trials reporting peak power output from incremental testing were meta-analyzed [67, 74, 83,84,85, 92, 93, 96]. All trials tested PPO using a cycle ergometer protocol. Results demonstrated significant increases in PPO for all female and male subgroups. Women consistently demonstrated larger percent increases (6.71–13.99%) and effect sizes (g, range: 0.35–0.77) for PPO compared to men (2.56–12.23%; g, range: 0.17–0.60), and the between-group difference reached the threshold for statistical significance in the sub-grouping by baseline training status, due to the larger effect size for well-trained women (g = 0.37) compared with well-trained men (g = 0.17; p = 0.05). Baseline training status accounted for all significant heterogeneity in PPO in moderately trained and well-trained men and women (I2, range: 0.00–54.31%; p, range: 0.14–0.93); however, significant levels of heterogeneity were present for the total sample and untrained sub-groups (I2, range: 54.58–79.57%; p, range: 0.00–0.09). Due to the small number of studies, PPO could not be sub-grouped by mean group age or intervention length.
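As an illustration of how a between-group comparison such as the one for well-trained women and men can be tested, the sketch below applies a simple z-test to two independent effect sizes. The g values are taken from the text, but the standard errors are hypothetical placeholders (they are not reported in this section), and the function names are illustrative only. The software used for the review may have applied a different, Q-based procedure.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def subgroup_difference(g_a: float, se_a: float, g_b: float, se_b: float):
    """Return (z, p) for the difference between two independent effect sizes."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # SE of the difference
    z = (g_a - g_b) / se_diff
    return z, two_sided_p(z)


# g values are from the text; BOTH standard errors are hypothetical.
z, p = subgroup_difference(g_a=0.37, se_a=0.07, g_b=0.17, se_b=0.07)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With only two sub-groups, the squared z statistic equals the between-group Q statistic on one degree of freedom, so both routes lead to the same p-value.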

3.5.2 Meta-Analysis of Threshold Power

Five trials [67, 73, 75, 84, 85] presented sufficient data to meta-analyze outcomes relating to powerAT. No differences were demonstrated between men and women (p = 0.96). The percent increase in powerAT for men was 7.09 ± 7.17% (small effect size: g = 0.38; p < 0.01), and that for women was 8.07 ± 6.55% (small effect size: g = 0.38; p < 0.01). Some inconsistency existed in the units presented for these outcomes (e.g., work presented as Watts, W/kg, and speed in m/s) and the measures of anaerobic thresholds (lactate thresholds and ventilatory thresholds both included); however, despite this, heterogeneity was not significant, and the grouping of these outcomes appeared to be appropriate. Results demonstrated small increases in powerAT for men and women with low heterogeneity (men: I2 = 29.85%, Q = 5.70, p = 0.22; women: I2 = 31.26%, Q = 5.82, p = 0.21). Due to the small number of studies presenting relevant data, outcomes for powerAT could not be further sub-grouped.
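The relationship between the reported Q and I2 values can be checked with a short calculation. The sketch below uses the standard definition I2 = max(0, (Q − df)/Q) × 100 and assumes df = 4, that is, one effect size per sex from each of the five trials; this assumption is not stated explicitly in the text.

```python
def i_squared(q: float, k: int) -> float:
    """I^2 (%) from Cochran's Q and the number of effect sizes k."""
    df = k - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


# Q values are those reported for powerAT; k = 5 is an assumption.
men = i_squared(q=5.70, k=5)
women = i_squared(q=5.82, k=5)
print(f"men: {men:.2f}%  women: {women:.2f}%")
```

The results land within a few hundredths of a percentage point of the reported 29.85% and 31.26%; the small gap is what would be expected from Q having been rounded to two decimal places.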

3.5.3 Additional Performance Outcomes

A summary of results for additional performance outcomes and measures of fatigue is shown in Table 7. Most performance outcomes showed no significant differences in the magnitude of improvement between men and women. In cases where significant sex × HIIT interactions existed, these included a greater improvement in mean and maximal Wingate power output [97], repeated sprint speed decrement [95], and a 3000-m cycling time trial [98] for women compared to men. Additionally, one study demonstrated a significant correlation between the change in power output at the second lactate threshold (LT2) and the change in 40-km time trial performance for women (r2 = 0.77; p < 0.01) [73], while no relationships with any of the measured variables were present for men (r2 = 0.01–0.21; p all < 0.05). One study demonstrated a greater improvement in men for the mean power of the third of four repeated sprints within SIT sessions (pertaining to less power decrement over repeated sprints) [91].

3.6 Concurrent Measures of Physiological Adaptation

Various physiological adaptations that were measured alongside other fitness and performance outcomes were reported in 15 trials [67, 70, 74, 76, 78,79,80, 83, 85,86,87, 91, 93, 96, 97]. These included maximal accumulated oxygen deficit (MAOD; n = 1) [74], various blood lactate measures (n = 8) [67, 70, 73, 74, 79, 85, 86, 95, 96], cardiac adaptations (n = 1) [80], mitochondrial and metabolic adaptations (n = 3) [81, 91, 93, 97], muscle fiber types (n = 1) [97], and correlational analyses for fitness or performance outcomes (n = 4) [67, 73, 80, 96]. A summary of the results relating to these concurrent measures of physiological adaptation is shown in Table 8.

Sex × HIIT interactions for most blood lactate and cellular or muscular measures were either not reported or not significant. Cases where significant interactions indicated greater increases in women compared to men after HIIT included maximal [67] or post-test blood lactate [70], type II muscle fiber cross-sectional area [97], and muscle glycogen content [97]. Conversely, significant interactions where men demonstrated greater increases compared to women included total muscle creatine content [97], muscle fiber β-HAD activity and GLUT4 protein content [83], coupled and uncoupled mitochondrial respiratory capacity [81], muscle mitochondrial biogenesis [91], and muscle protein synthesis [91]. All significant sex × HIIT interactions relating to central cardiorespiratory measures other than \(\dot{V}\)O2max indicated greater changes for men compared to women following HIIT. These included significant interactions for increases in maximum carbon dioxide output [76], peak cardiac output [80], peak stroke volume [80], peak cardiac index [80], maximal minute ventilation [70], oxygen pulse [67], and accumulated oxygen uptake [74], and decreases in accumulated oxygen deficit [74].

3.7 Assessment of Publication Bias

The funnel plots for the meta-analyses of \(\dot{V}\)O2max, PPO, and powerAT are shown in Fig. 8a–c. Some asymmetry is visible in the funnel plots for \(\dot{V}\)O2max and PPO, with few data points in the negative effect size region of each plot. The combined male and female adjusted effect size for \(\dot{V}\)O2max was calculated as 0.32 (95% confidence interval [CI] 0.27–0.38) compared with the observed effect size of 0.49 (95% CI 0.43–0.55). The combined adjusted effect size for PPO was calculated as 0.25 (95% CI 0.19–0.32) compared with the observed effect size of 0.41 (95% CI 0.33–0.49). Although the observed asymmetry in these funnel plots may indicate some risk of publication bias, most included studies were laboratory studies where results were based on adherence to strict protocols (see also the fidelity check in Table 2). Given the dose–response relationship between exercise load and cardiorespiratory outcomes, it is unlikely that many studies with high adherence would have produced negative overall effect sizes. Similarly, the objectives of the review necessitated this level of adherence since the aim of the review was to explore and quantify the impact of biological sex on observed physiological adaptations. As such, the presence of some asymmetry within the funnel plots may not indicate excessive publication bias within the current objectives of this review.

Fig. 8

Funnel plots of the pre-post effect size (standardized mean differences; Hedges’ g) versus standard error for a maximal oxygen uptake (\(\dot{V}\)O2max), b peak power output from incremental exercise testing (PPO), and c threshold power (PowerAT) in response to HIIT or SIT interventions. Funnels represent the 95% prediction interval for pooled, observed effect size. HIIT high-intensity interval training, SIT sprint interval training

Visual inspection of the funnel plot for powerAT shows no asymmetry that would indicate notable publication bias in the pre-post effect meta-analysis. Observed and adjusted effect sizes were equivalent at 0.37 (95% CI 0.22–0.53); however, it should be noted that this meta-analysis only included data from five studies. Additionally, some reporting bias for all meta-analyzed outcomes was noted, with some studies reporting a lack of sex differences in outcomes and thereby pooling male and female data. Although efforts were made to obtain data from authors where possible, this type of reporting has resulted in some missing data from the meta-analyses. It is, however, unlikely that these missing data would have significantly affected the results for \(\dot{V}\)O2max and powerAT, since these strongly indicated no between-group differences in the magnitude of adaptation. The meta-analysis of PPO could have benefitted from additional data, which may have influenced the final results.

4 Discussion

The main finding of the current review with meta-analyses is that men and women improve fitness and performance outcomes to a similar extent following equivalent HIIT and SIT interventions. In particular, meta-analyzed outcomes for \(\dot{V}\)O2max and powerAT revealed strikingly similar small to moderate increases for men and women. These findings are consistent with those of Weston and colleagues [50], who found moderate improvements in \(\dot{V}\)O2max in active and sedentary adults in response to HIIT. Until recently, there has been insufficient research to conclude whether or not sex differences exist in fitness adaptations to HIIT interventions. Our findings expand on the work of Weston and colleagues [50], who could not come to a conclusion regarding a sex-specific response to HIIT, by demonstrating near-identical effect sizes in women and men and a lack of significant between-group differences for the primary meta-analysis of any outcome.

The sensitivity analysis that was undertaken to estimate raw mean differences found that baseline \(\dot{V}\)O2max in men was significantly higher, the equivalent of 1.06 L·min−1 or 5.88 mL·kg−1·min−1, compared with women, as expected. The pre-post analysis using a raw mean difference indicated significant overall improvements of approximately 0.23 L·min−1 and 3.40 mL·kg−1·min−1, without the presence of significant between-group differences for men and women. This analysis indicates that the general results were not altered by the use of a standardized mean difference designed for dependent data and provides more practical estimates of effect. The results of this analysis should be taken with some caution, however, since the meta-analytical approach assumes independent data. In particular, the low levels of heterogeneity in the pre-post analyses appear to be underestimated.

While larger effect sizes for PPO were demonstrated for women compared with men across all sub-groupings, this difference only reached the threshold of significance in participants who were well trained at baseline. Despite the presence of this apparent difference, it must be noted that the well-trained male and female sub-groups only consisted of two studies and had a small sample size of only 18 men and 19 women. Baseline training status accounted for all significant heterogeneity in \(\dot{V}\)O2max and PPO outcomes in moderately trained and well-trained men and women. While decreases in the variability of outcomes for participants who were trained at baseline could make sense from a physiological standpoint as fitness and performance outcomes move closer to a theoretical physiological ceiling, it is unclear from the current analysis what factors contributed to the variability of outcomes for untrained participants.

A significant between-group difference was present overall for \(\dot{V}\)O2max when sub-grouped by baseline training status, with smaller effect sizes present for the moderately trained groups. This likely reflected some confounding from shorter intervention lengths in the moderately trained groups. All except one study within the moderately trained groups had an intervention length ranging between 2 and 4 weeks, while studies in the well-trained category had a range of 6–8 weeks. Consistent with this, the sub-grouping by intervention length revealed significant between-group differences with smaller effect sizes seen for interventions of ≤ 4 weeks in duration for both sexes. While the female data demonstrated a gradual increase in effect size with longer interventions, the male data demonstrated similar effect sizes but greater variability with all sub-groups longer than 4 weeks in duration. Interestingly, these findings are consistent with the study by Hirsch et al. [72], who highlighted potential differences in the rate of adaptation between men and women, with significant changes in \(\dot{V}\)O2max occurring during the first 4 weeks for the men in their study, while the significant changes in women occurred during the second 4 weeks.

Conversely, the sub-group analyses for mean group age and intervention type (HIIT versus SIT) did not significantly influence the heterogeneity in \(\dot{V}\)O2max or PPO outcomes and revealed no significant between-group differences. Furthermore, the observed sensitivity of the mean age sub-group analysis indicated fundamental issues with the robustness of this grouping, and as such these results should be interpreted with caution. Overall, it appears that baseline training status and intervention length may be important factors influencing the variability of outcomes in response to HIIT and SIT interventions for both sexes, rather than age or intervention type.

Except for conflicting evidence regarding fatigue and speed/power decrement, the qualitative analysis of results of the additional performance outcomes (outlined in Table 7) demonstrated a similar lack of differences between men and women regarding the magnitude of change for most performance outcomes. Generally, the included studies supported current knowledge that women are less fatigable than men [66, 73, 76, 91, 95, 96] prior to HIIT. While some of the studies included in this review indicated that men may be able to adapt this to a greater extent than women through HIIT [91, 93], some studies found significant improvements in fatigability or power output through repeated efforts in women only [95, 97], while still other studies found no sex differences at all regarding the change in fatigability [66, 73, 76, 96].

Despite the notable lack of sex differences in the magnitude of adaptation of fitness and performance outcomes in response to HIIT, some of the findings outlined in Tables 7 and 8 indicated potential differences regarding the underlying mechanisms contributing to these improvements. Most notably, all significant sex × HIIT interactions reported in the primary studies that related to central cardiorespiratory adaptations favored men, with women generally demonstrating smaller, and often non-significant changes. Examples of this included the observed increases in peak cardiac output, peak stroke volume, and peak cardiac index in men only after SIT, as demonstrated by Bostad et al. [80]; the significant sex × HIIT interaction for oxygen pulse (the amount of oxygen ejected from the ventricles with each cardiac contraction) with a greater increase in men compared with women as demonstrated by Marterer and colleagues [67]; the increases in accumulated oxygen uptake and decreases in accumulated oxygen deficit after HIIT seen only in men, as demonstrated by Weber and Schneider [74]; and finally, the increases in minute ventilation and decreases in maximum heart rate and the ventilatory equivalents for oxygen and carbon dioxide in men only, as demonstrated by Mucci and colleagues [70]. These findings, together with the increase in coupled and uncoupled mitochondrial respiratory capacity that was demonstrated in men only, as outlined by Chrøis et al. [81], indicate that improvements in oxygen delivery and uptake in men may play a greater role in achieving fitness and performance adaptations after HIIT compared with women. While adaptations to high-intensity exercise via improvements in oxygen delivery and uptake may seem counter-intuitive on initial contemplation, previous studies have indicated that the greater oxygen availability during exercise in women provides an advantage with regard to the fatigability of muscular contractions, even at maximal intensities [43, 99]. 
Consistent with this, it appears that the sex differences in fatigability can be eliminated under ischemic conditions [100]. As such, it is conceivable that such adaptations may contribute to improvements in maximal fitness and performance outcomes. Similar findings have also been reported in a previous review of responses to endurance training, where left ventricular end-diastolic volume and stroke volume increased to a greater extent in men compared with women [101]. Although that meta-analysis reported outcomes using mean differences, thereby reporting absolute changes in these outcomes, the results reported in the primary studies included in the current review indicate that these differences may also exist when the changes are considered relative to baseline.

In contrast, there was a relative lack of evidence surrounding the mechanisms of adaptation that account for the equivalent improvements in fitness and performance outcomes in women. Interestingly, the study by Hoffmann and colleagues [73] reported a significant correlation where a change in threshold power (LT2) accounted for 77% of the performance improvement in the 40-km cycling time trial for women. This same result was not observed for men, nor were there any significant correlations between change in time trial performance and measures of heart rate or blood lactate at LT2, absolute or relative peak power output, or incremental time to fatigue. Another study in the current review [97] found a significant increase in mean power during repeated Wingate tests and greater increases in the cross-sectional area of type IIb muscle fibers in women only in response to 4 weeks of SIT. Although these are the findings of only two studies, which could have been influenced by exercise mode since both used cycle-based testing and training protocols, some additional information can be gathered from outcomes that have been presented in other contexts. Spina and colleagues [102] compared mechanisms of adaptation in older men and women who participated in 9–12 months of moderate to vigorous uphill walking and running-based endurance training and found that 66% of the \(\dot{V}\)O2max adaptation in older men was accounted for by a 15% increase in stroke volume in combination with a 7% increase in arteriovenous oxygen difference at maximal exercise. In contrast, this study found no change in stroke volume in older women, and the whole change in \(\dot{V}\)O2max could only be attributed to an increase in peripheral oxygen extraction. 
In another example, similar to the relationship noted by Hoffmann and colleagues [73], one cross-sectional study [103] found that 60% of the variability in 10-km performance for highly trained female runners aged 23–47 years was explained by running velocity at the lactate threshold. The authors noted, however, that this relationship was age-dependent, with \(\dot{V}\)O2max explaining 74% of the variability in performance in women aged 37–56 years. The potential differences highlighted here alongside the strikingly similar effect sizes seen for \(\dot{V}\)O2max in the current meta-analysis suggest some potential sex differences in the adaptive responses to HIIT/SIT that may warrant further investigation.

Although potential differences in the adaptive responses to HIIT and SIT in men and women were highlighted, these appear to be only different means to the same end. Improvements in maximal cardiorespiratory fitness along with many other performance measures outlined in the results of the current review were found to occur to a similar magnitude in men and women. In contrast to this, a previous meta-analysis by Diaz-Canestro and Montero [52] reported greater increases in absolute and relative \(\dot{V}\)O2max in men compared with women in response to moderate-intensity endurance training. This discrepancy between the results of these two meta-analyses may be influenced by the differing interventions (endurance training vs HIIT/SIT) or the methodological differences between the two meta-analyses. The earlier review by Diaz-Canestro and Montero [52] presented absolute change in \(\dot{V}\)O2max (mL·kg−1·min−1 and mL·min−1) using raw mean differences, whereas the current review primarily focused on changes relative to baseline reflected as standardized effect sizes, percentage change, and sex × HIIT interactions (see also the comment by Senefeld and colleagues [104]). Despite these differences in overall approach, the sensitivity analysis using a raw mean difference in the current review continued to indicate that there were no significant differences between men and women for either absolute or relative \(\dot{V}\)O2max in response to HIIT/SIT interventions. Overall, these findings seem to indicate that sex differences in \(\dot{V}\)O2max response may be protocol dependent, and could warrant further investigation.

5 Limitations

The current review has a number of limitations. Firstly, in order to minimize confounding from different exercise protocols (exercise dose) across studies, only studies that presented both male and female data were included in the current review. While this ensures that exercise dose is normalized between male and female sub-groups and provides relatively similar numbers of men and women within each analysis, the majority of the included studies were small, which limits the number of participants within the meta-analyses overall. Since studies with small sample sizes tend to be associated with larger effect sizes and greater error, some of the effect sizes seen here have the potential to be overestimated or unduly influenced by a small number of participants. Despite this, the focus of the current review was to identify differences between the relative change in outcomes for men and women rather than quantifying effect sizes. The general effects of HIIT, particularly with relation to \(\dot{V}\)O2max, have been demonstrated for mixed male and female groups in previous meta-analyses [4, 9, 50], and the current analysis strongly indicates that sex differences in the relative change in these outcomes are minimal.

Another potential limitation of the current meta-analysis is that, while prescribed training doses were matched between men and women in all studies, it was not possible in many studies to properly assess the actual dose received (dose delivered) and whether this was consistent with the prescribed dose. Many studies provided only basic details regarding compliance with prescribed exercise, with only a few reporting that this remained consistent between men and women. Despite this, many of the included studies appeared to have been tightly controlled interventions, in which case the differences between the prescribed and actual training doses are unlikely to have been substantial.

In addition to the small pooled sample size and challenges associated with assessing the fidelity of the interventions, pre-post meta-analysis techniques and the inclusion of multiple effect sizes from individual studies in the same analysis (such as the matched male and female sub-group data) have been widely used, but also widely debated, in the literature [105,106,107]. While the use of dependent pre-post data has been largely accounted for with the use of correlation coefficients in the meta-analyses, sufficient data were not available to calculate correlation coefficients for all studies; therefore, many of these were imputed. Furthermore, while the design of the current review ensures that prescribed exercise dose is matched between male and female groups, the wide range of intervention lengths and exercise protocols included in the literature makes it difficult to precisely examine the influence of different protocols on these outcomes. Overall, the approaches used within the current review with meta-analyses aimed to minimize statistical and methodological errors as much as possible in the face of the unique set of challenges associated with the research question; however, the results should be considered within the constraints of the limitations that are outlined here.
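To show why the pre-post correlation matters, the following minimal sketch computes a pre-post Hedges' g and its standard error using one common textbook formulation for matched (dependent) data. It is not necessarily the exact formula applied in this review, and all input values (means, standard deviations, sample size, and the candidate correlations) are hypothetical.

```python
import math


def prepost_hedges_g(m_pre, sd_pre, m_post, sd_post, n, r):
    """Pre-post Hedges' g and its SE for one group, given pre-post correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # SD of the change scores, then rescaled to the within-condition SD.
    sd_diff = math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * r * sd_pre * sd_post)
    s_within = sd_diff / math.sqrt(2 * (1 - r))
    d = (m_post - m_pre) / s_within
    var_d = (1 / n + d ** 2 / (2 * n)) * 2 * (1 - r)
    j = 1 - 3 / (4 * (n - 1) - 1)  # small-sample correction factor
    return j * d, j * math.sqrt(var_d)


# Hypothetical VO2max data (mL/kg/min) for one small sub-group.
for r in (0.5, 0.7, 0.9):  # candidate imputed correlations
    g, se = prepost_hedges_g(40.0, 5.0, 43.0, 5.0, n=10, r=r)
    print(f"r = {r:.1f}: g = {g:.3f}, SE = {se:.3f}")
```

In this formulation, when the pre and post standard deviations are equal the point estimate does not change with r, but the standard error shrinks as r increases. An imputed correlation therefore mainly affects the weight a study receives and the width of its confidence interval, which is one reason sensitivity checks across plausible correlation values are useful.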

Finally, while the current review provides some insight into generalized similarities and differences between men and women regarding physiological adaptations to HIIT interventions, evaluation of the influence of hormonal status on outcomes was outside of the scope of this review. While the influence of hormonal fluctuations in women may be somewhat offset by the current focus on adaptation over several sessions, generally spanning weeks or months, these concepts appeared to be largely overlooked in the included studies and should be a focus of future primary and secondary research. Despite the limitations outlined here, the findings from the current review will be critical in order to fill the research gaps and to promote better optimization of exercise prescription and health for both women and men.

6 Conclusions

The current review with meta-analyses aimed to clarify sex differences in the adaptations of fitness and performance outcomes in response to HIIT and SIT interventions. The main findings of this review indicated that the magnitude of change in \(\dot{V}\)O2max and powerAT in response to HIIT interventions is similar for men and women. While a borderline significant sex difference was found for PPO in well-trained men and women, the sub-groups consisted of small sample sizes and therefore should be interpreted with caution. Additionally, qualitative analysis of performance outcomes and concurrent measures of physiological adaptation indicated potential differences in the underlying mechanisms of adaptation for men and women. Lastly, it appears that baseline training status and intervention length may play a role in influencing the variability of \(\dot{V}\)O2max and PPO outcomes in both sexes, including significantly smaller effect sizes for interventions with a duration of 4 weeks or less.