Key Points

When compared with a natural menstrual cycle, oral contraceptive pill (OCP) use might result in slightly inferior exercise performance, although any group level effect is most likely to be trivial, and as such from a practical perspective, the current evidence does not warrant general guidance on OCP use compared with non-use.

Exercise performance appeared relatively consistent across the OCP cycle, suggesting that different guidance is not warranted for OCP-taking days versus non-OCP taking days.

In the case of sportswomen who are focussing on performance, it is recommended that an individualised approach is sought, based on each athlete’s response to OCP use.

1 Introduction

Sex hormones are one of the main determinants of biological sex [1]. During adulthood, levels of testosterone, the predominant male sex hormone, remain consistent in men [2], whilst concentrations of oestrogen and progesterone, the prevailing female sex hormones, undergo circamensal changes in women [3], marking one of the major differences between sexes. Moreover, the eumenorrheic menstrual cycle is susceptible to internal (e.g., amenorrhea, oligomenorrhea and menorrhagia) and external (e.g., hormonal contraceptives) perturbations, highlighting the diversity in ovarian hormone profiles between women. In a recent audit of 430 elite female athletes, Martin et al. [4] showed that 213 athletes were hormonal contraceptive users, meaning that almost half of the population surveyed did not have a eumenorrheic menstrual cycle. Of these, 145 (68%) athletes reported taking oral contraceptive pills (OCPs), making them the most common type of hormonal contraceptive used and the second most common hormonal profile, after non-hormonal contraceptive users. These differences in endocrine profiles, between men and women, and amongst women (i.e., hormonal contraceptive users and non-users), highlight the need for sex-specific consideration within sport and exercise science.

Combined OCPs significantly reduce endogenous concentrations of 17 beta oestradiol and progesterone [5], when compared to the mid-luteal phase of the menstrual cycle, a stage when endogenous oestradiol and progesterone are relatively high. The exogenous oestrogens and progestins act via negative feedback on the gonadotrophic hormones, resulting in the chronic downregulation of the hypothalamic-pituitary-ovarian axis. Most combined, monophasic OCPs are second generation OCPs, containing low to standard doses of ethinyl oestradiol and either levonorgestrel, norethisterone, desogestrel or gestodene, delivered in a fixed amount every day for 21 OCP taking days (i.e., consumption phase), followed by 7 OCP free days (i.e., withdrawal phase) [6]. In some countries, rather than a consumption and withdrawal approach, there are 21 active OCP days and 7 inactive OCP days. There are many types of OCPs with different compositions and potencies; for a comprehensive overview of hormonal contraceptives and OCPs please see Elliott-Sale and Hicks [6]. Overall, OCP use results in four distinct hormonal environments: (1) a downregulated endogenous oestradiol profile of ≈ 60 pmol·L−1 for 21 days that rises during the 7 OCP free days to ≈ 140 pmol·L−1; (2) a chronically downregulated endogenous progesterone profile of ≈ 5 nmol·L−1; (3) a daily surge of synthetic oestrogen and progestin that peaks within 1 h after ingestion [from ≈ 2 to ≈ 6 pg·mL−1], with baseline values accumulating slightly from ≈ 2 to ≈ 3 pg·mL−1 over the 21 OCP-taking days; (4) 7 exogenous hormone-free days [7]. These profiles, reflecting OCP consumption and withdrawal, are referred to as pseudo-phases, as they are “artificial” phases in comparison with the phases of the physiological menstrual cycle.

Aside from fertility control, OCPs are also used to alleviate the symptoms of dysmenorrhoea and menorrhagia; reduce the occurrence of premenstrual tension, symptomatic fibroids, functional ovarian cysts and benign breast disease; and decrease the risk of ovarian and endometrial cancer and pelvic inflammatory disease [8]. Furthermore, athletic populations have reported strategically using OCPs to manipulate the timing of, or omit entirely, the often-perceived inconvenient withdrawal bleed that occurs during the 7 OCP free days, using back-to-back OCP cycles [4, 9, 10]. Reliable and reversible contraception, along with the means to alleviate the side-effects associated with the eumenorrheic menstrual cycle, such as cramps/pain, bloating and headaches, and the ability to eliminate unpredictable menstruation, make OCPs a desirable option for many athletes.

Despite the prevalence of OCP use in athletic populations [4], the effects of OCPs on exercise performance are poorly understood. Although many experimental studies [11,12,13], numerous narrative and systematic reviews [14, 15] and books [16, 17] have addressed this topic, few in the area of sport and exercise science (e.g., athletes, coaches, practitioners or researchers) truly understand the implications of OCP use on exercise performance, as previous research has shown conflicting findings on the directional effects of OCPs on outcomes such as muscle function [18, 19], aerobic and anaerobic [20,21,22] capacity and performance-based tests [23, 24]. As such, it is not possible to provide useful guidance to either the sporting or research community on how to work with athletes or participants using OCPs. Accordingly, the aim of this review was to investigate the effects of OCP use on exercise performance in women by making a between group comparison of OCP users and non-users (i.e., naturally menstruating counterparts) and a within group comparison of OCP consumption and withdrawal. This is the first meta-analysis on the effects of OCPs on exercise performance. Additionally, this review is the first of its kind to appraise the quality of previous studies using robust assurance tools.

2 Methods

2.1 Design

The review was designed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Electronic Supplementary Material Appendix S1) guidelines [25], and consideration of the Population, Intervention, Comparator, Outcomes and Study design (PICOS, Table 1) was used to determine the parameters within which the review was conducted.

Table 1 Population, intervention, comparator, outcomes and study design (PICOS) criteria
Table 2 Overview of studies included in the systematic review and meta-analysis

2.2 Study Search and Selection

PubMed, The Cochrane Central Register of Controlled Trials (CENTRAL), ProQuest and SPORTDiscus were systematically searched using the search terms “oral contraceptives” AND “athletic performance”; “sports performance”; “muscle”; “skeletal muscle”; “strength”; “force”; “muscular strength”; “muscular force”; “power”; “anaerobic”; “anaerobic power”; “anaerobic performance”; “anaerobic capacity”; “aerobic”; “aerobic capacity”; “aerobic power”; “aerobic performance”; “endurance”; “endurance capacity”; “endurance power”; “endurance performance”; “fatigue”; “recovery”. Searches were limited to humans, English, and females and no date restriction was applied. Only original research articles were considered for inclusion and review articles or conference abstracts were excluded. An example electronic search strategy for PubMed, including limits, can be found in Electronic Supplementary Material Appendix S2. All searches were conducted in January 2019 by KES. Three independent reviewers (KES, KLM and KMH) undertook a three-phase screening strategy: title and abstract, full-text screen and full-text appraisal. The search was updated in April 2020 using the same search criteria and screening strategy. These papers were subsequently included within the review and the meta-analysis was updated.

2.3 Data Extraction and Quality Appraisal

Data were extracted by ED using a pre-piloted extraction sheet. When data were presented in graphical, and not in numerical format, DigitizeIt software (Version 2.3, DigitizeIt, Germany) was used to convert the data. The quality of each review outcome (defined as each of the statistical models undertaken) was assigned using a strategy based on the recommendations of the Grading of Recommendations Assessment Development and Evaluation (GRADE) working group [26]. This approach considers the quality of research outcomes in a systematic review according to five domains, namely risk of bias, directness, consistency, precision and evidence of publication bias. Risk of bias and directness were assessed at the individual study level with mode ratings used to categorise whole outcomes. The meta-analysis results were subsequently used to ascertain the consistency, precision and risk of publication bias for each outcome. Each individual study was initially appraised using a modified version of the Downs and Black Checklist [27], which was specifically tailored for use in this review (see Electronic Supplementary Material Appendix S3). The modified quality appraisal checklist comprised 15 outcomes, and had a maximum attainable score of 16, with all studies classified as being of high (H; 14–16), moderate (M; 10–13), low (L; 6–9) or very low (VL; 0–5) quality. The results of this assessment were used to assign an a priori quality rating to each outcome. This a priori rating was either maintained, or downgraded a level, based on the response to two questions that were considered key to the directness of the research design, i.e., Question 1: was the natural menstrual cycle phase confirmed using appropriate biochemical outcomes? Question 2: was the type of OCP described to the level of detail required for categorisation or replication? 
With regards to Question 1, for studies with OCP groups only, biochemical confirmation was not deemed necessary, as OCP users do not have cyclical fluctuations in endogenous sex hormones, in which case the a priori score was maintained rather than downgraded. This rating was then either maintained, or downgraded another level, based on whether the results obtained were consistent (determined by visual inspection of effect size estimates and the degree of credible interval [CrI] overlap), precise (with outcomes downgraded if they were based on < 5 data points) and free from evidence of publication bias (determined using Egger’s test along with visual inspection of funnel plots as described in Sect. 2.4). The proportion of studies in each category was reported, with the mode considered to represent the overall quality rating for each individual review outcome. Two independent reviewers (KES and KMH) verified the data extraction and quality appraisal.
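The study-level part of this rating scheme can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' tooling: the function names are hypothetical, and the sketch assumes that each unmet directness question lowers the rating by exactly one level and that Question 1 is skipped for OCP-only designs.

```python
from collections import Counter

# Quality bands ordered from lowest to highest.
LEVELS = ["VL", "L", "M", "H"]


def a_priori_rating(score: int) -> str:
    """Map a modified Downs and Black score (0-16) to a quality band."""
    if not 0 <= score <= 16:
        raise ValueError("score must be between 0 and 16")
    if score >= 14:
        return "H"
    if score >= 10:
        return "M"
    if score >= 6:
        return "L"
    return "VL"


def downgrade(rating: str, steps: int = 1) -> str:
    """Lower a rating by `steps` levels, never going below VL."""
    return LEVELS[max(LEVELS.index(rating) - steps, 0)]


def study_rating(score: int, phase_confirmed: bool, ocp_described: bool,
                 ocp_only: bool = False) -> str:
    """Apply the two directness questions to the a priori rating.

    Assumption: one level is lost per unmet question.
    """
    rating = a_priori_rating(score)
    # Question 1: biochemical confirmation of menstrual cycle phase
    # (not required when the study contains OCP groups only).
    if not ocp_only and not phase_confirmed:
        rating = downgrade(rating)
    # Question 2: OCP type described in sufficient detail.
    if not ocp_described:
        rating = downgrade(rating)
    return rating


def outcome_rating(study_ratings: list[str]) -> str:
    """Overall rating for a review outcome: the mode of study ratings."""
    return Counter(study_ratings).most_common(1)[0][0]
```

For example, a study scoring 15 that did not confirm cycle phase biochemically would move from H to M under these assumptions. The consistency, precision and publication bias steps are judged at the outcome level and are not modelled here.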

2.4 Data Analysis

Data were extracted from studies comprising both between group and within group designs. Pairwise effect sizes were calculated by dividing mean differences by pooled standard deviations. At the study level, variances of effect sizes were calculated according to standard distributional assumptions [28]. All meta-analyses were conducted within a Bayesian framework, enabling the results to be interpreted more intuitively than with a standard frequentist approach through use of subjective probabilities [29]. With a Bayesian framework, dichotomous interpretations of the results of a meta-analysis with regards to the presence or absence of an effect (e.g., with p values) can be avoided, and greater emphasis placed on describing the most likely values for the average effect and addressing practical questions such as the probability that the average effect is beyond a certain threshold [29]. The Bayesian framework is also particularly suited to hierarchical models and sharing information within and across studies to improve estimates [29]. In the present meta-analysis, three-level hierarchical models were conducted to account for covariance in multiple outcomes presented in the same study [30]. Initial models were conducted including both strength and endurance outcomes with a regression coefficient assessing the difference in the average effects. Where no evidence of a difference was identified, the model was re-run combining both categories of outcomes to increase the data available to estimate model parameters. Given the expectation of relatively small effect sizes, an a priori threshold of ± 2 was identified for outliers. Primary analyses were completed with outliers removed, but results from the full complement of studies are also presented as sensitivity analyses. Additionally, sensitivity analyses were conducted on data obtained from studies categorised as “high” or “moderate” in quality.
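As a minimal sketch of the pairwise effect size calculation and the ± 2 outlier rule described above, the Python below computes a standardised mean difference from summary statistics. The sampling variance shown is the common large-sample approximation for two independent groups; it is an assumption made for illustration, since the exact variance formulae taken from [28] (including those for within group designs) are not reproduced in this section. All function names and input values are hypothetical.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def effect_size(mean1: float, sd1: float, n1: int,
                mean2: float, sd2: float, n2: int) -> tuple[float, float]:
    """Return (d, variance of d) for a between group comparison.

    d is the mean difference divided by the pooled SD. The variance is
    the usual large-sample approximation (an assumption, see lead-in).
    """
    d = (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)
    variance = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, variance


def is_outlier(d: float, threshold: float = 2.0) -> bool:
    """Flag effect sizes beyond the a priori threshold of +/- 2."""
    return abs(d) > threshold


# Hypothetical summary data: naturally menstruating group vs OCP group.
d, var_d = effect_size(105.0, 10.0, 10, 100.0, 10.0, 10)
print(f"d = {d:.2f}, variance = {var_d:.5f}, outlier = {is_outlier(d)}")
```

With these made-up inputs the pooled SD is 10, so d is 0.5 and is retained; an effect of, say, 2.4 would be set aside for the sensitivity analysis rather than the primary model.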
Inferences from all analyses were performed on posterior samples generated by Hamiltonian Markov Chain Monte Carlo with Bayesian 95% CrIs constructed to enable probabilistic interpretations of parameter values [29]. Interpretations were based on visual inspection of the posterior sample, the median value (ES0.5: 0.5-quantile) and 95% CrIs. Cohen’s [31] standard threshold value of 0.2 was used to describe effect size as small, and values between 0 and 0.2 were described as trivial. Analyses were performed using the R wrapper package brms, which was interfaced with Stan to perform sampling [32]. Convergence of parameter estimates was obtained for all models with Gelman–Rubin R-hat values below 1.1 [33]. Additional sensitivity analyses were conducted by restricting the analysis to studies that included exercise performance as the primary study outcome. Assessment of publication bias using Egger’s multilevel test with effect sizes regressed on inverse standard errors [34] identified no evidence of publication bias with median absolute intercept values less than 0.1 across all analyses.
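The posterior summaries described above (median, 95% CrI and the probability of exceeding the small-effect threshold) reduce to simple operations on the sampled values. The sketch below assumes the posterior draws for the pooled effect are already available as a list of numbers; here they are simulated stand-ins rather than output from brms or Stan, and the function name is hypothetical.

```python
import random
import statistics


def summarise_posterior(samples: list[float], threshold: float = 0.2) -> dict:
    """Summarise posterior draws of a pooled effect size.

    Returns the median (ES0.5), the central 95% credible interval and
    the posterior probabilities of a small effect in each direction.
    """
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    n = len(samples)
    return {
        "median": statistics.median(samples),
        "cri_95": (cuts[0], cuts[-1]),
        "p_ge_threshold": sum(s >= threshold for s in samples) / n,
        "p_le_neg_threshold": sum(s <= -threshold for s in samples) / n,
    }


# Simulated draws standing in for real MCMC output (illustration only).
rng = random.Random(2020)
draws = [rng.gauss(0.18, 0.10) for _ in range(4000)]
print(summarise_posterior(draws))
```

Reporting the proportion of draws beyond ± 0.2 is what allows statements such as "a moderate probability of a small effect" without recourse to a dichotomous significance test.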

2.5 Rationale for Between Group Comparisons

For the between group analyses of habitual OCP users and naturally menstruating women, the OCP withdrawal phase [days 1–7] was compared with the early follicular phase [days 1–5] of the menstrual cycle, and the OCP consumption phase [days 8–28] was compared with all phases of the menstrual cycle [days 6–28] except the early follicular phase [days 1–5]. The OCP withdrawal phase was compared with the early follicular phase because OCP users experience a withdrawal bleed during the withdrawal phase and naturally menstruating women experience menstruation during the early follicular phase. In addition, during both phases endogenous concentrations of oestrogen and progesterone are comparably low. During the remainder of the menstrual cycle, endogenous concentrations of oestrogen and progesterone change over time (e.g., the mid-cycle peak in oestrogen and the mid-luteal rise in progesterone and oestrogen) and there is large variation in endogenous concentrations of oestrogen and progesterone as a result of different OCP formulations. As such, it is difficult to make meaningful comparisons during these phases and this could be considered a limiting factor of any meta-analysis making between group comparisons of naturally menstruating women and OCP users. To reduce the impact of this limitation, a sensitivity analysis was completed on the between group design data to better match the physiological menstrual cycle and OCP pseudo-phases. This was achieved by mapping days 1–5, 12–16 and 19–23 from both cycles, which correspond with the early follicular, ovulatory and mid-luteal phases in a natural menstrual cycle and represent the following hormonal profiles: low oestrogen and progesterone, high oestrogen and low progesterone, and high progesterone and medium oestrogen.
As such, this meta-analysis (1) compared the two most stable phases of the OCP and menstrual cycles in the first between group analyses; (2) compared the two least stable phases of the OCP and menstrual cycles in the second between group analysis; and (3) performed an additional sensitivity analysis to better match the OCP and menstrual phases.
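The day-to-phase mapping used for these comparisons can be written down explicitly. The sketch below is an illustrative encoding only: it assumes a 28-day numbering for both cycles, and the function and constant names are hypothetical.

```python
def _check_day(day: int) -> None:
    if not 1 <= day <= 28:
        raise ValueError("day must be between 1 and 28")


def ocp_pseudo_phase(day: int) -> str:
    """OCP cycle: withdrawal on days 1-7, consumption on days 8-28."""
    _check_day(day)
    return "withdrawal" if day <= 7 else "consumption"


def menstrual_phase_group(day: int) -> str:
    """Menstrual cycle: early follicular on days 1-5, all other phases after."""
    _check_day(day)
    return "early follicular" if day <= 5 else "all other phases"


# Primary between group comparisons: OCP pseudo-phase -> menstrual phase group.
COMPARISONS = {
    "withdrawal": "early follicular",
    "consumption": "all other phases",
}

# Day windows matched across both cycles in the sensitivity analysis.
SENSITIVITY_WINDOWS = {
    "early follicular": range(1, 6),   # days 1-5
    "ovulatory": range(12, 17),        # days 12-16
    "mid-luteal": range(19, 24),       # days 19-23
}


def sensitivity_window(day: int):
    """Return the matched window containing `day`, or None if unmatched."""
    _check_day(day)
    for name, days in SENSITIVITY_WINDOWS.items():
        if day in days:
            return name
    return None
```

Note that days 6 and 7 fall in the withdrawal pseudo-phase for OCP users but outside the early follicular phase for naturally menstruating women, and that most days are excluded from the sensitivity windows altogether.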

3 Results

3.1 Study Characteristics

Figure 1 shows the studies identified and selected by the search strategy. Details of the included studies are shown in Table 2. In total 42 studies [5, 13, 18,19,20, 22,23,24, 35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68] and 590 participants were included.

Fig. 1 Search flow diagram

Methodological quality at the level of the individual study is shown in Fig. 2; 83% of the studies were graded as M, L or VL, with 17% achieving H quality. Specifically, 4 studies were graded as VL, 10 as L, 21 as M and 7 as H quality.

Fig. 2 Quality rating of outcomes from all included studies (n = 42). Each bar represents the proportion of articles assigned a high, moderate, low, or very low-quality rating. The x-axis represents the different stages of this process, with the first bar based on the assessment of risk of bias and study quality as determined by the Downs and Black checklist, while question 1 (Q.1) and question 2 (Q.2) were used to determine if the natural menstrual cycle phase comparison was verified using appropriate biochemical outcomes and whether the oral contraceptive pill under investigation was described in a sufficient level of detail. The final bar represents the proportion of studies assigned to each quality rating category

3.2 Between Group Analyses of Habitual Oral Contraceptive Users Compared to Naturally Menstruating Women

Thirty of the included studies (combined quality rating = M; specifically 20% H; 37% M; 30% L; 13% VL) generated 151 effect sizes from research designs comparing habitual OCP users with naturally menstruating women. The data were collected from 597 participants (habitual OCP n = 303, naturally menstruating n = 294) with studies comprising a mean group size of 10 (range n = 5–25).

3.2.1 Oral Contraceptive Pill Withdrawal [Days 1–7] Versus the Early Follicular Phase [Days 1–5] of the Menstrual Cycle

Three outliers were identified with effect sizes greater than + 2, and were removed from the analysis, leaving a total of 49 effect sizes (26 endurance, 23 strength) from 18 studies (combined quality rating = M; specifically 17% H; 33% M; 28% L; 22% VL; habitual OCP n = 176, naturally menstruating n = 169). The three-level hierarchical model indicated a trivial effect with the median value associating greater performances with naturally menstruating women (ES0.5 = 0.18 [95% CrI − 0.02 to 0.37]; Fig. 3). Relatively large between-study standard deviation was identified (\(\tau_{0.5}\) = 0.16 [95% CrI 0.01–0.44]) with estimates indicating moderate intraclass correlation (ICC0.5 = 0.42 [95% CrI 0.00–0.80]) due to analysis of multiple outcomes reported within studies. Pooling of strength and endurance outcomes was conducted as no evidence was obtained that indicated a differential effect between the performance categories (ES0.5/Endurance-Strength = 0.04 [95% CrI − 0.41 to 0.43]). Posterior estimates of the pooled effect size identified a moderate probability of a small effect favouring naturally menstruating women in the early follicular phase of the menstrual cycle (d ≥ 0.2; p = 0.404) and effectually a zero probability favouring habitual OCP women (d ≤ − 0.2; p = 0.001). Inclusion of outliers within the model substantially increased the average effect size (ES0.5 = 0.34 [95% CrI − 0.04 to 0.72]) and between study variance (\(\tau_{0.5}\) = 0.70 [95% CrI 0.24–1.23]).
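As a rough plausibility check on the posterior probabilities reported above, the pooled effect can be approximated by a normal distribution whose median and 95% interval match the reported values. This is a back-of-envelope sketch under a symmetry assumption and not the authors' calculation, which was based on the actual posterior samples; the function name is hypothetical.

```python
from statistics import NormalDist


def tail_probabilities(median: float, lower: float, upper: float,
                       threshold: float = 0.2) -> tuple[float, float]:
    """Approximate P(d >= threshold) and P(d <= -threshold).

    Assumes the posterior is roughly normal, with its standard deviation
    recovered from the width of the 95% credible interval.
    """
    z_975 = NormalDist().inv_cdf(0.975)
    sd = (upper - lower) / (2 * z_975)
    posterior = NormalDist(mu=median, sigma=sd)
    return 1 - posterior.cdf(threshold), posterior.cdf(-threshold)


# Reported pooled estimate: ES0.5 = 0.18 [95% CrI -0.02 to 0.37].
p_favouring_nm, p_favouring_ocp = tail_probabilities(0.18, -0.02, 0.37)
print(f"P(d >= 0.2)  ~ {p_favouring_nm:.3f}")
print(f"P(d <= -0.2) ~ {p_favouring_ocp:.5f}")
```

Under this approximation the probability of a small effect favouring naturally menstruating women comes out at about 0.42, close to the reported 0.404, and the probability favouring OCP users is well below 0.001. The small discrepancy is expected, because the true posterior need not be symmetric.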

Fig. 3 Bayesian Forest plot of multilevel meta-analysis comparing performance measured during oral contraceptive pill withdrawal phase and early follicular phase of the menstrual cycle. The study-specific intervals represent individual effect size estimates and sampling error. The circle represents the pooled estimate generated with Bayesian inference along with the 95% credible interval (95% CrI)

3.2.2 Oral Contraceptive Pill Consumption [Days 8–28] Versus all Phases of the Menstrual Cycle [Days 6–28] Except the Early Follicular Phase [Days 1–5]

Eleven outliers were identified with effect sizes greater than + 2, and were removed from the analysis, leaving a total of 88 effect sizes (53 endurance, 35 strength) from 24 studies (combined quality rating = M; specifically 21% H; 42% M; 25% L; 13% VL; habitual OCP n = 244, naturally menstruating n = 230). The three-level hierarchical model indicated a trivial effect with the median value associating greater performances with naturally menstruating women (ES0.5 = 0.13 [95% CrI − 0.05 to 0.28]; Fig. 4). Relatively large between study variance was identified (\(\tau_{0.5}\) = 0.22 [95% CrI 0.06–0.45]) with central estimates indicating very low intraclass correlation (ICC0.5 = 0.08 [95% CrI 0.0–0.61]) due to analysis of multiple outcomes reported within studies. Pooling of strength and endurance outcomes was conducted as no evidence was obtained that indicated a differential effect between the performance categories (ES0.5/Endurance-Strength = 0.02 [95% CrI − 0.25 to 0.31]). Posterior estimates of the pooled effect size identified a small probability of a small effect favouring naturally menstruating women (d ≥ 0.2; p = 0.188) and effectually a zero probability favouring habitual OCP women (d ≤ − 0.2; p < 0.001). Inclusion of outliers within the model increased the average effect size (ES0.5 = 0.19 [95% CrI − 0.14 to 0.51]) and between study variance (\(\tau_{0.5}\) = 0.71 [95% CrI 0.49–1.07]).

Fig. 4 Bayesian Forest plot of multilevel meta-analysis comparing performance measured during oral contraceptive pill consumption phase with menstrual cycle phases (excluding early follicular phase). The study-specific intervals represent individual effect size estimates and sampling error. The circle represents the pooled estimate generated with Bayesian inference along with the 95% credible interval (95% CrI)

3.2.3 Sensitivity Analyses; Primary Outcome Studies/Moderate or High-Quality Studies only

Sensitivity analyses were completed for between and within group designs using data from studies that included exercise performance as the primary study outcome (Table 3) and from studies categorised as high or moderate in quality (Table 4). No substantive differences were obtained from any of the previous analyses with pooled effect sizes identifying trivial effects with greater performances obtained in naturally menstruating women.

Table 3 Results from sensitivity analyses with data from studies including performance as the primary outcome
Table 4 Results from sensitivity analyses with data from studies categorised as “high” or “moderate” in quality

3.2.4 Sensitivity Analysis of Physiological Menstrual Cycle Phases Versus Pseudo Oral Contraceptive Pill Phases; Days 1–5, Days 12–16 and Days 19–23

An additional set of sensitivity analyses was completed on the between group design data to better match the physiological menstrual cycle and OCP pseudo-phases. This was achieved by mapping days 1–5, 12–16 and 19–23 from both cycles (Table 5). Collectively, findings were aligned with the more coarsely matched phases presented above (i.e., Sects. 3.2.1 and 3.2.2). In days 1–5 and 19–23, pooled effect sizes again identified trivial effects with greater performances obtained in naturally menstruating women. In days 12–16, pooled effect sizes were effectually zero with a wide CrI reflecting the limited data available (11 effect sizes from 5 studies).

Table 5 Results from sensitivity analyses comparing performance outcomes between physiological menstrual cycle phases and pseudo oral contraceptive pill phases

3.3 Within Group Analyses of Oral Contraceptive Consumption with the Hormone-Free Withdrawal phase

Twenty-four of the included studies (combined quality rating = H/M; specifically 33% H; 33% M; 17% L; 17% VL) generated 148 effect sizes (positive values favouring OCP consumption) from research designs comparing OCP consumption with OCP withdrawal. The data were collected from 221 participants with studies comprising a mean group size of 10 (range n = 5–17). The three-level hierarchical model incorporating both strength (96 effect sizes) and endurance (52 effect sizes) outcomes provided some evidence of a trivial effect with the pooled effect size very close to zero (ES0.5 = 0.05 [95% CrI − 0.02 to 0.11]; Fig. 5). Between study variance was relatively small (\(\tau_{0.5}\) = 0.06 [95% CrI 0.0–0.16]), as were central estimates of intraclass correlation (ICC0.5 = 0.20 [95% CrI 0.0–0.62]) due to analysis of multiple outcomes reported within studies. Pooling of strength and endurance outcomes was conducted as no evidence was obtained that indicated a differential effect between the performance categories (ES0.5/Endurance-Strength = 0.02 [95% CrI − 0.22 to 0.33]). Posterior estimates of the pooled effect size identified almost zero probability of a small effect in either direction (|d| ≥ 0.2; p ≤ 0.001). Sensitivity analyses restricted to studies in which performance was identified as a primary outcome (Table 3), or to studies categorised as high or moderate in quality (Table 4), had no substantive influence on model outputs.

Fig. 5 Bayesian Forest plot of multilevel meta-analysis comparing performance measured during oral contraceptive pill consumption with the hormone-free withdrawal phase. The study-specific intervals represent individual effect size estimates and sampling error. The circle represents the pooled estimate generated with Bayesian inference along with the 95% credible interval (95% CrI)

3.4 Within Group Comparison of Oral Contraceptive Use and Non-Use

Only two studies [20, 42] met the inclusion criteria for this category and as such no meta-analysis was performed on these data. Casazza et al. [20] tested participants during two phases (4–8 days and 17–25 days after the start of menses) of the menstrual cycle, in a randomised order. Following this, participants began taking the same triphasic OCP for four complete cycles (28 days per cycle) and were tested during the week of the inactive OCPs and during the second week of active OCP ingestion. Menstrual cycle phase had no effect on peak exercise capacity. Conversely, 4 months of OCP use resulted in significant decreases in time to peak exercise (14%) and the peak power output attained (8%) during a continuously graded cycle test. In addition, all participants experienced an 11% decline in peak oxygen uptake (\(\dot{V}\)O2peak; L∙min−1). Ekenros et al. [42] employed a cross-over design, such that participants taking an OCP upon recruitment were tested on day 2, 3 or 4 during the OCP free days and on days 7 or 8 and 14 or 15 during the OCP-taking days, after which they stopped taking the OCP and were tested on day 2, 3 or 4, 48 h after ovulation and 7 or 8 days after ovulation. Those who were naturally menstruating at recruitment were tested on day 2, 3 or 4, 48 h after ovulation and 7 or 8 days after ovulation and were re-tested following one OCP cycle on day 2, 3 or 4 during the OCP free days and on days 7 or 8 and 14 or 15 during the OCP-taking days. There were no significant differences in muscle strength between groups, although maximum muscle strength of the knee extensors differed between the early follicular (days 2, 3 or 4) and luteal phase (7 or 8 days after ovulation) in the naturally menstruating group: 139 (28) N·m compared with 145 (26) N·m (p = 0.02).

3.5 Randomised Controlled Trials of Oral Contraceptive Use Versus Placebo Intake

Only one study [23] met the inclusion criteria for this category and as such no meta-analysis was performed on these data. Lebrun et al. [23] employed a randomised, double-blind, placebo-controlled trial in naturally menstruating women. Testing was performed during the early follicular (days 3–8) and mid-luteal (days 4–9 after ovulation) phases of an ovulatory menstrual cycle, after which participants were randomly assigned to either an OCP (n = 7) or placebo (n = 7) group and were tested between days 14 and 17 of the second cycle of OCP (i.e., the same triphasic OCP) or placebo administration. Participants were active women who regularly competed in aerobic activities such as running, cycling, triathlon, rowing and cross-country skiing. OCP use resulted in a mean decrease of 4.7% in \(\dot{V}\)O2max compared with a 1.5% improvement in the placebo group. The decrease in absolute \(\dot{V}\)O2max was accompanied by an increase in the sum of skinfolds, but not by significant changes in weight or measures of strength, anaerobic, or endurance performance.

4 Discussion

The aim of this review was to identify if OCP use influenced exercise performance. Results generally indicated a trivial performance effect on average with OCP use, with superior performance generally observed for naturally menstruating women compared to their OCP using counterparts. In addition to the estimated trivial to small average effect, results from the meta-analysis models indicated relatively large between study variance indicating that research design, participant characteristics and performance measured might influence any effect. Collectively, these findings indicate that OCPs might, on average, exert a slightly negative impact on performance, but from a practical point of view the effect magnitude and variability support consideration of an individual’s response to OCP use, so that decisions as to the appropriateness of OCP use can be tailored to the individual requirements (e.g., contraceptive or medical need) and response (i.e., to what degree they might be affected) of each athlete. Pooling of data comparing exercise performance between OCP consumption and withdrawal estimated an effect that was very close to zero, indicating that exogenous supplementation of oestrogen and progestin is unlikely to have any substantive effect on exercise performance across an OCP cycle.

As a result of OCP use, endogenous concentrations of oestradiol and progesterone are significantly downregulated when compared with the mid-luteal phase of the menstrual cycle [5]. This chronic downregulation might be responsible for the slightly impaired exercise performance demonstrated in OCP users when compared with their naturally menstruating counterparts. Indeed, the endogenous hormonal profile of an OCP user is comparable to the profile observed during the early follicular phase of the physiological menstrual cycle; i.e., correspondingly low levels of endogenous oestradiol and progesterone [5, 69, 70]. In our meta-analysis [71], on the effects of the menstrual cycle on exercise performance, the available evidence indicated potentially inferior performance during the early follicular phase, when compared with all other phases of the menstrual cycle that had considerably higher concentrations of endogenous oestrogen and/or progesterone. Similarly, the within group results of the current meta-analysis showed that exercise performance between the OCP consumption and withdrawal phases was, on average, very unlikely to exhibit even a small effect, during which time the concentrations of endogenous oestradiol and progesterone were consistently low and did not significantly increase [5]. Collectively, these results indicate that exercise performance might be mediated by the concentration of endogenous ovarian hormones in some individuals, as reflected by evidence of slightly impaired performance on average at a time when these hormones are lowest.

The between-group findings from the present review align with those of Casazza et al. [20] and Lebrun et al. [23], who also showed that experimental OCP use resulted in reduced peak exercise capacity and decreased maximal oxygen uptake, when compared with non-hormonal contraceptive use. Casazza et al. [20] employed a cross-over design for their study, with data from two phases of a physiological menstrual cycle compared with data after 4 months of triphasic OCP use, whilst Lebrun et al. [23] utilised a randomised, double-blind, placebo-controlled trial, with data from two phases of the physiological menstrual cycle compared with data after 2 months of triphasic OCP use. These longitudinal intervention studies represent a change from inactive to active OCP use in the same individuals, which is a stronger research design than the cross-sectional observational studies that were used in the between-group analysis in the present review. Their findings therefore further support the notion that OCP use might result in small adverse effects on performance in some individuals when compared with naturally menstruating women. It is worth noting that experimental OCP use may not always be carried out in consultation with a clinician who would monitor any potentially unfavourable side effects and possibly change the OCP type or dose; as such, greater detrimental effects may be observed in experimental OCP users than in habitual OCP users. In addition, some adverse side-effects experienced during initial OCP use can diminish over time, potentially compounding the issue of comparing habitual OCP users with experimental OCP users.

Ekenros et al. [42] showed no difference in performance between OCP and non-OCP use, which is contrary to the findings from the present study and those of Casazza et al. [20] and Lebrun et al. [23]. Although Ekenros et al. [42] employed a longitudinal intervention study design, the original ‘non-OCP’ users only received a monophasic OCP for 1 month (i.e., 21 OCP-taking days) before they were retested as ‘habitual’ OCP users. Casazza et al. [20] and Lebrun et al. [23] retested after 4 and 2 months of OCP use, respectively, which might have resulted in a greater downregulation of endogenous oestradiol and progesterone than that seen by Ekenros et al. [42]. In addition, the participants in the Ekenros et al. [42] study used a variety of OCPs, whereas Casazza et al. [20] and Lebrun et al. [23] used the same OCP, resulting in a more homogeneous group, with potentially less inter-individual variation in endogenous ovarian hormone concentration, and reducing the possibility of type II errors [72]. Ekenros et al. [42] used a strength-based performance measure, whilst Casazza et al. [20] and Lebrun et al. [23] employed more endurance-type performance measures, representing different physiological pathways for oestrogen and/or progesterone to exert their effects. For example, progesterone is likely to mediate changes in ventilatory drive [73], whilst oestrogen might be responsible for sex-differences in substrate metabolism [74], both considered to influence endurance performance. For strength-based performance, by contrast, both sex hormones act as neurosteroids, which are capable of traversing the blood–brain barrier, thereby potentially affecting maximal neuromuscular performance [75]. These methodological differences, alongside the differing modes of exercise, might account for the disparity in results between Ekenros et al. [42] and Casazza et al. [20], Lebrun et al. [23] and the present review.

Our within group analysis indicates that the exogenous supplementation of ethinyl oestradiol and progestin is very unlikely to exert any substantive effect, such that performance was relatively consistent across an OCP cycle. From a practical perspective, this means that exercise performance is not moderated by the exogenous hormonal profile of an OCP but is more likely mediated by the endogenous hormonal milieu caused by OCP use (i.e., the continuous downregulation of oestradiol and progesterone between OCP consumption and withdrawal). These data suggest that the ‘supplementary’ nature of OCPs should not be considered as performance-enhancing. As OCPs are also not ergolytic, the timing of the withdrawal bleed can be manipulated (e.g., to avoid bleeding during competition) without negatively impacting performance, although the long-term health implications of continuous OCP consumption without any withdrawal are unknown. Schaumberg et al. [10] have noted that menstrual manipulation for exercise and sports performance reasons is already a fairly common practice amongst physically active women.

Although all results from the current meta-analysis align, and have solid mechanistic underpinnings, it is important to acknowledge that the practical implications of these findings are small. All point estimates and outliers were in the same direction and indicated a potentially negative influence, on average, of ovarian hormonal suppression on performance. However, the real-life implications of these findings are likely to be so small as to be trivial and therefore not meaningful for most of the population. Additionally, a large range of moderating factors [76, 77] (independent of hormonal changes) are likely to influence an individual’s response to, and requirement for, OCPs and we suggest that individuals do not solely make their decision to use or not use OCPs based on the performance related findings reported herein. For example, some individuals are prone to substantial menstrual symptoms such as cramps, bloating or heavy menstrual bleeding, and for these individuals, the benefits of OCP use [78, 79] might outweigh the small detriments observed in the present review. Similarly, the consequences of unplanned pregnancy might be far greater than the trivial effects observed in the current meta-analysis. Conversely, large inter-individual variation exists in the response to most interventions [80, 81] whereby some individuals might experience no performance-related side-effects whatsoever, whereas others might experience substantial performance-related side-effects from OCP use [4]. As such, we recommend that individuals consider all relevant factors (which might include physical, emotional, practical, financial and health related aspects) before making decisions as to the appropriateness (or not) of OCP use.

The current review was primarily conducted on non-randomised observational trials, which might be considered a limitation of its value. Randomised controlled trials are the preferred design to investigate the potential influence of a treatment (in this case OCPs) on an outcome (in this case exercise performance); however, they can be difficult to implement in this population, as individuals tend to be habitual OCP users or non-users. Only one randomised controlled trial was identified from the relevant literature [23], alongside two further trials wherein an OCP was prescribed to or withheld from non-users and habitual users in a cross-over design [20, 39]. Withholding OCPs from a habitual OCP user might have ethical and practical (e.g., unplanned pregnancy) implications and as such, this type of research design is rarely employed. In addition, the resources needed to conduct appropriately standardised and controlled studies across the time-periods required to adequately address this question (i.e., an adequate wash-out and/or supplementation period) are, in many cases, prohibitive. Instead, most data on OCP use versus non-use are based on between-group investigations of independent parties, which might be affected by a large range of confounding variables and do not permit causal inferences to be made. The lack of randomised controlled trials will affect analyses within this area of study for the foreseeable future.

Following the Downs and Black quality assessment [27], most studies (64%) were classified as moderate (M) or low (L) quality, which was largely due to a lack of standardisation (e.g., prior activity and food intake) and inadequate familiarisation (i.e., often no familiarisation took place or long periods of time had elapsed between testing sessions, potentially warranting re-familiarisation). Additionally, most studies had small samples (range: n = 5–25), with a mean group size of 10, meaning that many were likely to be under-powered. Rigorous control of these research design factors in future studies, along with consideration of individual response [65, 66] and more randomised controlled trials, will provide further insight into the effects of OCP use on exercise performance and will allow exercising women to make evidence-based decisions on OCP use within the context of sport. Moreover, consideration of the topic-specific methodological issues recommended by Cable and Elliott [82] and Elliott-Sale et al. [72], namely biochemical confirmation of menstrual phase and adequate description of OCP type, resulted in a further reduction in high-quality studies, from 36 to 17%, and an increase in very low-quality studies, from 0 to 10%. Future studies should use appropriate biochemical outcomes (i.e., blood samples to determine the concentration of endogenous oestradiol and progesterone) to confirm the hormonal milieu in OCP users and naturally menstruating women, a tenet that is also supported by Janse de Jonge [83]. Such measures would permit the relationship between specific ovarian hormonal profiles and exercise performance to be established. In addition, future investigations should describe the type of OCP used to the level of detail required for categorisation or replication, as different types of OCPs cause varying concentrations of endogenous sex hormones, resulting in non-homogeneous participant groups [72].
The heterogeneity, caused by the non-homogeneous populations plus the considerable variation in outcomes measured, likely contributed to the relatively large between-study variance observed. In the future, it would be interesting to tease out which factors might cause some women to experience a negative effect while others do not, but this was not possible with the current evidence base. Future studies need to include homogeneous populations, improve methodological quality and limit confounders to facilitate a deeper understanding of individual effects.
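
The point made above about small samples being under-powered can be illustrated with a rough calculation. The following is a minimal sketch only and is not part of the analyses in this review: it assumes two independent groups of equal size, a two-sided test at α = 0.05 and a normal approximation (which slightly overestimates power for very small samples compared with an exact t-test calculation). The standardised effect sizes (0.2, 0.5 and 0.8) are conventional benchmarks for small, medium and large effects, chosen for illustration, and the function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    d            -- assumed standardised mean difference (illustrative value)
    n_per_group  -- participants per group (equal group sizes assumed)
    alpha        -- two-sided significance level
    """
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # critical value, about 1.96 for alpha = 0.05
    ncp = d * sqrt(n_per_group / 2)       # expected test statistic under the assumed effect
    # Probability that the test statistic falls in either rejection region
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Mean group size reported for the included studies was 10
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: approximate power = {approx_power(d, 10):.2f}")
```

Under these assumptions, a study with 10 participants per group has a low probability of detecting a small standardised effect, and the calculation can be repeated with other group sizes to gauge the sample needed for a given target power.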

5 Conclusion

Collectively, our results indicate that OCP use might result in slightly inferior exercise performance on average when compared with non-use, although any group-level effect is likely to be trivial. Although most of the data used in this meta-analysis were rated as moderate to low quality (83% of the total studies), a sensitivity analysis of moderate and high quality papers (67% of the total studies) did not change the general findings described herein, thus bolstering the confidence in the evidence. From a practical perspective, as the effects tended to be trivial and variable across studies, there appears to be no performance-related evidence to warrant general guidance on OCP use compared with non-use. As such, an individualised approach should be taken, based on each athlete’s response to OCP use, along with other factors such as their primary objective for using OCPs, and their experience of the naturally occurring menstrual cycle. Moreover, the difference in exercise performance between the OCP consumption and withdrawal phases was estimated on average to be close to zero, suggesting that the endogenous hormonal profile is the prevailing driver of performance rather than the supplementation of exogenous hormones. Accordingly, there also appears to be no performance-related evidence to warrant general guidance on OCP consumption versus OCP withdrawal.