1 Introduction

The International Olympic Committee (IOC) Framework on Fairness, Inclusion and Non-Discrimination on the Basis of Gender Identity and Sex Variations recommends, but does not mandate, that sports governing bodies follow ten principles [1, 2]: (1) inclusion, (2) prevention of harm, (3) non-discrimination, (4) fairness, (5) no presumption of advantage, (6) evidence-based approach, (7) primacy of health and bodily autonomy, (8) stakeholder-centred approach, (9) right to privacy and (10) periodic reviews. In line with principles 5, 6 and 8, the IOC proposes that sports governing bodies generate high-quality, sport- and population-specific research to inform the development of any eligibility criteria applying across their relevant sports, disciplines and/or events [1, 2]. Research on the effects of gender affirmation therapy on sports performance in well-trained transgender athletes is currently limited and constitutes a nascent but growing field. Scientific evidence ranges from gold-standard laboratory studies on athlete comparison groups to review articles that use a variety of methods to synthesise existing knowledge. One paper that asserts policy relevance is “Transgender Women in the Female Category of Sport: Perspectives on Testosterone Suppression and Performance Advantage” [3]. This publication by Hilton and Lundberg [3] has been prominently cited in numerous policies that exclude [4,5,6,7] and restrict [8] transgender women from the female category, with the implication that it offers relevant scientific evidence.
Since it is up to individual international federations to decide which policy is best for their sport [1, 2], the goal of this letter is to help inform physicians, scientists, athletes, international federations and other relevant stakeholders about where this particular contribution to the literature falls in the hierarchy of evidence [9, 10] and to highlight concerns with the theoretical paradigms used by Hilton and Lundberg [3].

2 Hierarchy of Evidence: Narrative Reviews

Systematic reviews with accompanying meta-analyses are recognised as research designs offering high-quality evidence when conducted meticulously (Fig. 1). The goal of this type of research is to answer a scientific question by synthesising data collected and analysed across a variety of research studies. Systematic reviews and meta-analyses follow established standards, notably the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria [11]. Narrative reviews differ crucially from them: although narrative reviews also summarise previously published research on a given topic, they generally lack systematic criteria for determining study eligibility, do not conduct exhaustive searches, are prone to bias and lack systematic methods for quantitatively assessing patterns across studies in a field or topic. As a result, narrative reviews may be biased in their selection, excluding relevant research or favouring the findings of particular studies.
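The quantitative synthesis that distinguishes a meta-analysis from a narrative review can be sketched in a few lines of Python. The example below implements the standard fixed-effect inverse-variance method, in which each study's estimate is weighted by the reciprocal of its squared standard error, so that more precise studies count for more. The three studies and their values are hypothetical, invented only to show the mechanics. A full systematic review reported according to PRISMA [11] would also assess risk of bias and heterogeneity, and might use a random-effects model instead.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """One study's effect estimate and its standard error (hypothetical values)."""
    name: str
    effect: float
    se: float


def fixed_effect_pool(studies):
    """Inverse-variance fixed-effect pooled estimate with a 95% confidence interval.

    Each study is weighted by 1 / se**2.
    Returns (pooled_effect, pooled_se, (ci_low, ci_high)).
    """
    if not studies:
        raise ValueError("need at least one study")
    weights = [1.0 / s.se ** 2 for s in studies]
    total_w = sum(weights)
    pooled = sum(w * s.effect for w, s in zip(weights, studies)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = 1.96  # normal quantile for a two-sided 95% interval
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


if __name__ == "__main__":
    # Hypothetical effects (e.g. percentage change in an outcome); not real data.
    demo = [Study("A", -3.0, 1.0), Study("B", -5.0, 2.0), Study("C", -1.0, 1.0)]
    est, se, ci = fixed_effect_pool(demo)
    print(f"pooled = {est:.2f}, SE = {se:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Study B has the largest effect but also the largest standard error, so it receives a quarter of the weight of A or C; this explicit, reproducible weighting is exactly what a narrative review lacks.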

Fig. 1 Strength of Evidence Pyramid in Healthcare Interventions. Adapted from a combination of Wallace et al. [9] and Evans [10]. SR, systematic review; MA, meta-analysis; RCT, randomised controlled trial

The evidence offered by Hilton and Lundberg [3] takes the form of a narrative review rather than a systematic review, although search criteria are provided in the supplementary material for Sect. 4, “Is the male performance advantage lost when testosterone is suppressed in transgender women?” The literature cited in Sects. 2 (“The biological basis for sporting performance advantages in males”) and 3 (“Sports performance differences between males and females”) does not correspond to the scope of the article, which is transgender women in the female category of sport; instead, it serves only to reinforce the underlying assumption that cisgender male study populations are an appropriate substitute for transgender women (see Sect. 3 of this letter). Furthermore, the literature used in Sects. 2 and 3 was not gathered in accordance with PRISMA [11] guidelines, nor does it follow principle 6, which recommends that evidence from the field or the laboratory “be based on data collected from a demographic group that is consistent in gender and athletic engagement” [1, 2]. This makes it difficult for readers to judge the transparency and scientific value of the search process and the review. Making the distinction between a narrative and a systematic review explicit would allow readers to assess the evidence presented in the article more accurately.

Given that very few transgender athletes currently compete in elite sport, and that even fewer related studies exist, a systematic review and meta-analysis is at present impractical. Randomised controlled trials, the next highest-quality study design (Fig. 1), are also currently impractical in this area. This leaves observational research, such as cohort studies [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30], cross-sectional studies [28, 29, 31,32,33,34,35,36,37,38], case–control studies, case reports and case series, as the highest-ranking evidence base that international federations can currently consider (Fig. 1). Studies on the effects of gender-affirming hormone treatment (GAHT) on performance-related metrics [39] will offer important insights; however, although results from sedentary transgender cohorts are useful, they cannot be directly extrapolated to transgender athletes. Data on transgender athletes are being generated [40, 41], but gathering the available observational information will likely take more than 2 years, and even then it may not fully address the question. Given the increasing prevalence of transgender persons in society [42, 43], systematic reviews and meta-analyses on transgender athletes may become available in the not-too-distant future. Because it does not follow an established systematic review approach, the evidence presented by Hilton and Lundberg [3] can only be characterised as low quality (Fig. 1; [9, 10]) and graded as an opinion piece with low strength of evidence. In the interim, a comprehensive systematic evaluation of the impact of GAHT on sports performance in sedentary transgender cohorts [44] should be prioritised to guide sports organisations’ policy choices until sufficient data on transgender athletes are acquired, as such evidence is at least consistent in gender and thus partially fulfils principle 6 recommended by the IOC [1, 2].

3 Underlying Assumptions

The article by Hilton and Lundberg [3] rests on two primary theoretical misconceptions that undermine its accuracy and therefore its usefulness: first, that cisgender male study populations are an appropriate substitute for transgender women; and second, that in the examination of sports performance, absolute data provide a stronger level of evidence than relative data.

A starting assumption of Hilton and Lundberg [3] is that, in the absence of research on transgender women, it is appropriate to rely on data from cisgender men and to compare the physiological and performance characteristics of cisgender men and women, as though transgender women and cisgender men were equivalent groups; this contrasts with IOC principle 6 [1, 2]. Existing research suggests that transgender women are not equivalent to cisgender men in either physiological [27] or performance [30] terms. In terms of physiology, Wiik et al. [27] demonstrated that the mean muscle volume of transgender women was already below that of cisgender men at baseline, with a similar pattern in isometric torque. Furthermore, Van Caenegem et al. [28] showed that lean mass was 6.8% lower in pre-GAHT transgender women than in control cisgender men (57.4 ± 8.7 kg versus 61.3 ± 6.8 kg). These results are noteworthy, although they were obtained from sedentary transgender women, and transgender women athletes may start closer to cisgender men. Whether this holds for physical performance measures is unclear: transgender women in the military had pre-GAHT upper body strength that was 12% lower than that of cisgender male controls (−6 push-ups, 95% CI −20 to −12) [30], compared with sedentary transgender women, whose pre-GAHT strength was 15% lower than that of cisgender male controls (42 ± 9 versus 49 ± 6 N/kg) [28].
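Percentage differences such as those above depend on which group's mean is used as the denominator. The brief Python sketch below recomputes both pre-GAHT comparisons from the group means quoted in the text, under the assumption that the percentages were derived from group means; figures computed from individual-level data could differ slightly. With the cisgender male mean as the reference, the lean-mass gap is about 6.4% (6.8% with the transgender women's mean as the reference), and the strength gap is about 14.3% (16.7%). The helper name `gap_percent` is hypothetical.

```python
def gap_percent(lower, higher, reference):
    """Gap between two group means, expressed as a percentage of `reference`."""
    if reference == 0:
        raise ValueError("reference mean must be non-zero")
    return 100.0 * (higher - lower) / reference


# Pre-GAHT group means as quoted in the text [28]: transgender women (tw) vs cisgender men (cm).
LEAN_MASS_TW, LEAN_MASS_CM = 57.4, 61.3   # kg
STRENGTH_TW, STRENGTH_CM = 42.0, 49.0     # N/kg

if __name__ == "__main__":
    for label, tw, cm in [("lean mass", LEAN_MASS_TW, LEAN_MASS_CM),
                          ("strength", STRENGTH_TW, STRENGTH_CM)]:
        vs_men = gap_percent(tw, cm, reference=cm)    # gap as a share of the male mean
        vs_women = gap_percent(tw, cm, reference=tw)  # same gap over the transgender women's mean
        print(f"{label}: {vs_men:.1f}% of the male mean, "
              f"{vs_women:.1f}% of the transgender women's mean")
```

Stating the reference group alongside any percentage difference lets readers compare figures across studies on a like-for-like basis.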

Research on sedentary transgender women has revealed cumulative longitudinal physiological and performance effects of GAHT, and Hilton and Lundberg [3] themselves summarise these findings, which include reductions in lean body mass ranging from −0.8% to −5.4% [14,15,16, 18, 22, 24, 27,28,29, 44], reductions in muscle cross-sectional area ranging from −1.5% to −11.7% [12, 24, 27, 28, 44] and decreases in the oxygen-carrying protein haemoglobin ranging from −3.4% to −14% [15, 17, 19, 44]. Overall, the assumption that transgender women and cisgender men can simply be treated as equivalent groups undermines the article’s claim to offer insights into the sporting capabilities of transgender women relative to cisgender women, since it undertakes no such comparison.

The use of relative measures is foundational to sports performance testing. From undergraduate education onwards, sports scientists learn to normalise for body-size characteristics (e.g. height or body mass) when judging how an absolute measure (e.g. muscular strength) will translate into actual athletic ability [45,46,47,48,49]. Despite this, Hilton and Lundberg [3] prioritise absolute rather than relative data and do not normalise the results they report to either height or mass. In Table 4, Hilton and Lundberg [3] compare taller groups with shorter groups, and differences in height may account for a substantial portion of the differences in lean body mass or thigh area. Moreover, supplying reference female values generated from parallel cohorts of pre-GAHT transgender men rather than cisgender women creates a misleading impression for the reader: as discussed above for transgender women and cisgender men, the sedentary transgender men in Table 4 are not equivalent to cisgender women athletes.
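The normalisation step can be illustrated with a short Python sketch using two hypothetical individuals; the numbers are invented and do not come from any cited study. Individual A produces 25% more absolute force than individual B, yet both produce the same force per kilogram of body mass. The allometric variant divides by body mass raised to an exponent; the exponent of 2/3 used here is an assumption chosen only for illustration, and in practice it would be estimated for the measure and population in question.

```python
def ratio_scaled(value, size):
    """Ratio normalisation: absolute measure divided by a body-size variable (e.g. kg)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return value / size


def allometric_scaled(value, size, exponent):
    """Allometric normalisation: value / size**exponent.

    The exponent is an assumption here; in practice it is estimated per measure
    and population rather than fixed in advance.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return value / size ** exponent


if __name__ == "__main__":
    # Two hypothetical individuals (invented numbers, not data from any cited study).
    force_a, mass_a = 600.0, 80.0   # newtons, kg
    force_b, mass_b = 480.0, 64.0

    print("absolute ratio A/B:", force_a / force_b)
    print("per kg body mass:", ratio_scaled(force_a, mass_a), ratio_scaled(force_b, mass_b))
    # Assumed exponent of 2/3, purely for illustration.
    print("allometric (b = 2/3):",
          round(allometric_scaled(force_a, mass_a, 2 / 3), 1),
          round(allometric_scaled(force_b, mass_b, 2 / 3), 1))
```

Under simple ratio scaling the absolute advantage disappears entirely, whereas allometric scaling leaves a smaller residual difference; because the chosen method shapes the conclusion, it should be reported alongside the uncorrected data.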

The same issue recurs elsewhere in Hilton and Lundberg’s article: half of the muscle strength data reported in Table 4 [3] come from tests of absolute handgrip strength, even though absolute grip strength is a function of hand size [50]. Hilton and Lundberg [3] neither provide an analysis of relative measures, which are the most relevant for understanding sports performance and informing sports policy, nor discuss the absence of relative performance measures as a limitation of these data. To properly determine whether athletic performance parameters are lost or gained in transgender women and men following the initiation of GAHT, any future research on the sporting ability of transgender athletes must normalise for height, body mass and fat-free mass as appropriate, in addition to reporting the uncorrected data. Interestingly, the authors do report that maximal oxygen uptake is 50% greater in cisgender men than in cisgender women when expressed in absolute terms but 25% greater when expressed in relative terms, indicating that roughly half of the difference is due to body mass alone.
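The arithmetic behind the maximal oxygen uptake comparison can be checked with a short Python sketch. It assumes that “relative” uptake means uptake per kilogram of body mass, so that absolute uptake equals relative uptake multiplied by body mass, and the between-group ratio of absolute values equals the ratio of relative values multiplied by the body-mass ratio. From the quoted 50% and 25% figures alone, the implied body-mass ratio is 1.5/1.25 = 1.2. In percentage points, 25 of the 50-point difference disappears after normalisation; on a multiplicative (log) scale, body mass accounts for about 45% of the difference, consistent with the statement that about half of the difference is due to body mass. The function name `decompose_ratio` is hypothetical.

```python
import math


def decompose_ratio(absolute_ratio, relative_ratio):
    """Split a between-group ratio of absolute values into body-size and per-kg parts.

    Assumes absolute = relative * body_mass, so that
    absolute_ratio = relative_ratio * mass_ratio.
    Returns (implied_mass_ratio, share_of_log_difference_due_to_mass).
    """
    if absolute_ratio <= 1 or relative_ratio <= 0:
        raise ValueError("expects absolute_ratio > 1 and relative_ratio > 0")
    mass_ratio = absolute_ratio / relative_ratio
    share = math.log(mass_ratio) / math.log(absolute_ratio)
    return mass_ratio, share


if __name__ == "__main__":
    # Figures as quoted in the text: +50% absolute, +25% relative (assumed per kg body mass).
    mass_ratio, share = decompose_ratio(1.50, 1.25)
    print(f"implied body-mass ratio: {mass_ratio:.2f}")
    print(f"share of log difference due to body mass: {share:.0%}")
```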

Had Hilton and Lundberg [3] made their data available, as recommended by open science guidelines [51, 52] and the European Commission Research and Innovation Strategy 2020–2024 [53], other researchers could have re-run the analyses to corroborate the authors’ findings and strengthen their conclusions. Given the prioritisation of absolute data and the lack of open science practices, the analyses can therefore only be characterised as limited.

4 Concluding Remarks

The article by Hilton and Lundberg [3] is being widely cited by international federations to inform policy, yet, as discussed here, its arguments and underlying assumptions are supported by the lowest level of evidence (Fig. 1) and do not align with the principles of the IOC Framework [1, 2], specifically principles 5, 6 and 8. It is unsurprising, therefore, that legal challenges have already been brought against policies for which this article provided the evidence base [54, 55]; such challenges will serve to undermine these efforts and have negative repercussions for international federations, such as harming the confidence of their athletes, sponsors and the general public. To ensure the integrity of both the policy process and competition, international federations should reflect on the strength of the evidence in the studies they consider and use the highest-quality evidence available to make their decisions. Owing to the growing number of transgender people in society, data on sub-elite transgender athletes will be available imminently; however, because of the low participation rates of transgender athletes in elite sport and the growing number of policies excluding or limiting their participation, performance data on elite transgender athletes are unlikely to be readily available in the near future. Performance metrics are the most informative data, and research is currently underway using both sedentary transgender [39, 56] and transgender athlete [40, 41] cohorts. While awaiting these data, international federations should not rush into decisions informed by poor scientific evidence and should instead prioritise following the ten principles of the IOC Framework [1, 2].