Background

Myostatin, previously known as growth and differentiation factor 8 (GDF-8), is a member of the transforming growth factor-beta (TGF-β) superfamily. It is highly expressed in skeletal muscle and negatively regulates skeletal muscle growth and development [1]. Previous investigations have demonstrated that myostatin acts via its receptors, such as activin receptor type IIA (ACVR2A or ActrIIA) and activin receptor type IIB (ACVR2B or ActrIIB), leading to the recruitment and activation of the suppressor of mothers against decapentaplegic (SMAD) transcription factor family, which regulates the expression of numerous genes. Both receptors are also members of the TGF-β superfamily [2, 3].

The function of myostatin appears to be conserved across species, since mutations of the myostatin gene (MSTN) causing a loss or a reduction in protein activity have been linked to muscular hypertrophy in various vertebrate species, including rodents, cattle, dogs, and sheep [1, 4, 5]. The “double-muscling” phenotypes of animals lacking myostatin and the high degree of sequence conservation during evolution have raised the possibility that myostatin may also control muscle differentiation and growth in humans. Strong evidence that myostatin plays a key role in the regulation of human muscle mass has been provided by Schuelke et al., who described a case of an extraordinarily muscular child with a loss-of-function mutation in the MSTN gene [6].

The MSTN gene is of interest in sports, where its association with performance, especially in sports that require muscle mass and strength, can be monitored. It could also be used in gene doping in athletes with the aim of increasing muscle mass, strength, and sports performance [7]. Although there is a growing body of evidence explaining the effects of the MSTN variation on human muscle mass, athlete studies have been inconclusive. To date, only a few MSTN single nucleotide polymorphisms (SNPs), such as K153R (rs1805086, 458 A > G), A55T (rs180565, 163 G > A), E164K (rs35781413, 490 G > A), P198A, I225T, rs11333758 (c.373 + 90delA), and c.373 + 5 G > A (rs397515373), are of particular interest in sports genetics [6, 8,9,10,11].

The complications in studying mutations in MSTN are the low frequency of some alleles, interethnic differences in allele frequencies, and sex-related differences [9]. Additionally, none of the studies on the strength or endurance abilities of athletes concern the polymorphic sites within genes encoding myostatin receptors (ACVR2A and ACVR2B genes), which have been described in the context of their effect on growth traits only in chickens [12].

Due to the role of the gene products in regulating skeletal muscle growth and development, the aim of our study was to analyze the sequences of the MSTN, ACVR2A, and ACVR2B genes and determine the interaction between selected allelic variants and athletic performance and competition level in the Caucasian population. Overall, we verify the usefulness of the MSTN, ACVR2A, and ACVR2B genes as genetic markers for sports skills, which may underpin differences in the potential to be an elite athlete. The combination of whole-genome sequencing (WGS), subsequent bioinformatic assessment and validation of selected variants in a larger cohort is a new, comprehensive approach to sport genetics allowing the identification of previously unknown sports-related variants in both coding and noncoding regions of the genome.

Results

Genetic variants detected in MSTN, ACVR2A, and ACVR2B with WGS

WGS was performed in a cohort of 102 Polish elite sportsmen and 41 healthy controls from the same population. High-quality SNPs and InDels with at least one non-reference call were identified and yielded 40 variants in MSTN, 200 in ACVR2A, and 152 in ACVR2B. Full results are presented in Supplementary 1 Table A. None of the variants passed the significance threshold after the correction for multiple comparisons when athletes and controls or athlete subgroups were compared. Additionally, we compared alternate allele frequencies observed in elite athletes with those in the non-Finnish European subset of gnomAD [13] and those in a database of 943 Polish genomes [14]. Here, too, none of the variants differed significantly in frequency. Nevertheless, based on the obtained results we selected two SNPs for further genotyping: ACVR2A rs3764955 and MSTN rs11333758 (Table 1). The intronic ACVR2A SNP rs3764955 was selected because it was relatively common (minor allele frequency, MAF = 0.27) and the alternate allele showed (nonsignificant) overrepresentation in the sprint/power subgroup of elite athletes (nominal p = 0.027, OR = 2.12). This SNP has a CADD score of 12.9, which is below the commonly used damaging-variant threshold of 16 but places it within approximately the 5% most impactful variants. The MSTN SNP rs11333758 was selected because, again, it was relatively common in the athlete cohort (MAF = 0.28) and the alternate allele showed (nonsignificant) overrepresentation in elite athletes when compared with the non-Finnish European cohort in gnomAD (nominal p = 0.005, OR = 1.57) and with a database of Polish genomes (nominal p = 0.065, OR = 1.38). Its CADD score is 4.6, meaning that about 35% of all known human SNP variants are more deleterious, which may suggest a modulatory rather than a damaging phenotypic role for this particular polymorphism.

Table 1 Characteristics of the selected polymorphic sites
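The percentages quoted for the two CADD scores follow from the PHRED-like scaling of CADD, in which a scaled score S corresponds to the top 10^(−S/10) fraction of ranked variants. The short sketch below illustrates that conversion; the function name is hypothetical and is not part of the study's code.

```python
def cadd_phred_to_top_fraction(phred_score: float) -> float:
    """Fraction of ranked variants scoring at least as high as this one.

    Assumes the standard PHRED-like scaling of CADD:
    scaled score = -10 * log10(rank / total).
    """
    return 10 ** (-phred_score / 10)


if __name__ == "__main__":
    # Scores taken from the text: ACVR2A rs3764955 and MSTN rs11333758.
    for label, score in [("ACVR2A rs3764955", 12.9), ("MSTN rs11333758", 4.6)]:
        fraction = cadd_phred_to_top_fraction(score)
        print(f"{label}: CADD {score} -> top {fraction:.1%} of variants")
```

Under this scaling, a score of 12.9 corresponds to roughly the top 5% of variants and a score of 4.6 to roughly the top 35%, consistent with the values given above.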

Genotyping of rs3764955 and rs11333758

Both of the studied SNPs were in Hardy-Weinberg equilibrium (HWE) in the genotyped group (Table 2).

Table 2 The overall frequency of genotypes in the study and HWE test results
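As an illustration of what an HWE check involves, the sketch below implements the textbook one-degree-of-freedom chi-square test on genotype counts. It is not the study's code: the function name and the counts in the usage example are hypothetical, and the SNPassoc package used in this study may apply a different (for example, exact) test.

```python
import math


def hwe_chi_square(n_ref_hom: int, n_het: int, n_alt_hom: int):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value). Expected counts are derived from the
    allele frequencies estimated from the observed genotypes.
    """
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternate allele frequency
    observed = (n_ref_hom, n_het, n_alt_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    chi2, p_value = hwe_chi_square(180, 150, 35)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A p-value above the chosen threshold indicates no detectable departure from HWE in the tested group.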

Comparisons of athletes (sprint/power, endurance, and mixed-sport athletes) with controls revealed no significant associations (Supplementary 2 Tables S1 and S2), regardless of assumptions about the genetic model of a trait. Similarly, when athletes were divided into sprint/power, endurance and mixed-sport groups (Supplementary 2 Tables S3 and S4), we found no significant differences in allele frequencies between each discipline and controls in any of the models.

The athletes were also stratified according to their performance: high-elite, elite and subelite (Tables 3 and 4). Here we found a significant underrepresentation of heterozygotes for the ACVR2A rs3764955 polymorphism in elite athletes when compared with the controls (overdominant model corrected p = 0.043, Table 3). Carriers of the CG genotype were 1.56 times more likely than CC and GG carriers to be in the control group rather than in the elite athletes group. High-elite athletes showed a similar tendency, with an even higher OR: heterozygotes for ACVR2A rs3764955 were 1.93 times more likely than homozygotes to be in the control group rather than in the high-elite group. However, likely due to the smaller n, this association was not significant (overdominant model p = 0.067). For MSTN rs11333758 (Table 4) there was a significant overrepresentation of alternate allele homozygotes (–/–) in the high-elite athletes group (recessive model: OR = 4.39, p = 0.004). This SNP also showed significant overrepresentation in high-elite athletes under the codominant model; however, the association was again driven by alternate allele homozygotes (codominant model for deletion genotype carriers: OR = 4.47, p = 0.017).

Table 3 ACVR2A rs3764955 association analysis for athletes stratified by performance and compared with controls
Table 4 MSTN rs11333758 association analysis for athletes stratified by performance and compared with controls

Finally, the athletes were divided both by their discipline type and athletic performance (Tables 5 and 6). For clarity, only results with at least one significant association are presented; full results for both SNPs are available in Supplementary 1 Table B. For ACVR2A rs3764955, heterozygotes were 2.12 times less likely to be sprint/power elite athletes than controls (overdominant model p = 0.048, Table 5), while carriers of the alternate homozygous genotype (CC) were 2.65 times more likely to be in the mixed-sport subelite group than in the control group. In the case of MSTN rs11333758 (Table 6), mixed-sport high-elite athletes showed significant overrepresentation of the alternate genotype (codominant model p = 0.002). In the codominant model, heterozygotes (A/–) had a genotypic OR of 3.1 and homozygotes (–/–) of 15.5. This association was also significant for both the dominant and recessive models (dominant model p = 0.01, recessive model p = 0.0027).

Table 5 The results of association analysis for the ACVR2A rs3764955 with athletes stratified by both discipline and performance. Full results are available in Supplementary 1 Table B
Table 6 The results of association analysis for MSTN rs11333758 with athletes stratified by both discipline and performance. Full results are available in Supplementary 1 Table B

ACVR2A rs3764955 and MSTN rs11333758 interaction analysis

Although both selected SNPs are located on chromosome 2, they are separated by over 40 million bp, well beyond the distance (~500 000 bp) at which linkage disequilibrium (LD) in the human genome decays completely [16]. Therefore, they cannot be considered a haplotype. We tested for interaction between ACVR2A rs3764955 and MSTN rs11333758 using the model-free W-test for epistasis; p-values for each comparison are reported in Supplementary 1 Table C. The comparison of mixed-sport high-elite athletes with controls showed a significant interaction between the tested SNPs (p = 0.0057). Table 7 shows the frequencies of each genotype combination in these two groups. The largest difference was observed for carriers of the ACVR2A rs3764955 GC and MSTN rs11333758 AA combination. In controls, this combination had a frequency of 0.28 (n = 104), while none of the mixed-sport high-elite athletes was a carrier of this genotype set (Fisher exact p = 0.005). On the other hand, in the mixed-sport high-elite group, we observed a significant overrepresentation of carriers of the ACVR2A rs3764955 GC with MSTN rs11333758 –/– combination (Fisher exact p = 0.0097) and the ACVR2A rs3764955 CC with MSTN rs11333758 –/– combination (Fisher exact p = 0.0445).

Table 7 Frequencies of different genotype combinations in mixed high-elite athletes and controls. In this comparison, a significant epistasis between the genotypes was found (Supplementary 1 Table C). Fisher exact test was run for each combination
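The per-combination comparison can be illustrated with a two-sided Fisher exact test on a 2 × 2 table (carriers vs. non-carriers of a genotype combination, in athletes vs. controls). The sketch below is a minimal standard-library implementation, not the study's code. The control counts (104 carriers out of 365) come from the text, whereas the athlete group size of 20 is a hypothetical placeholder, because the size of the mixed-sport high-elite group is not stated here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        # Small relative tolerance guards against floating-point ties.
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    hypothetical_athletes = 20  # assumption: group size not given in the text
    p_value = fisher_exact_two_sided(0, hypothetical_athletes, 104, 365 - 104)
    print(f"Fisher exact p (hypothetical athlete n) = {p_value:.4f}")
```

Because the athlete count is assumed, the printed value is not expected to reproduce the p-values reported above.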

Discussion

Skeletal muscle size, which is crucial to athletes’ strength, is one of the most heritable quantitative traits in humans, with genetic variation contributing 92–94% of the total variance in muscle circumference [17]. While myostatin is an important factor influencing muscle mass, which is well confirmed in nonhuman species, it has not been clearly confirmed whether MSTN expression influences interindividual differences in skeletal muscle mass, affects posttraining changes in body composition, or plays a role in the age-related loss of skeletal muscle mass and function in humans [11]. Although the inconclusive results are usually explained by ethnic differences and the low frequency of some alleles [18], it is possible that the role of receptors that affect the biological activity of myostatin is crucial. However, no study has analyzed the association of polymorphic sites both within the MSTN gene and within the genes encoding its receptors with interindividual differences in sports abilities. Therefore, we conducted a comprehensive study sequencing the MSTN, ACVR2A, and ACVR2B genes from 102 elite athletes and 41 controls. The sequence analysis revealed two intronic polymorphisms (rs11333758 in MSTN and rs3764955 in ACVR2A) that were relatively common in the athlete cohort and whose alternate alleles showed overrepresentation in elite athletes. The first polymorphism was the deletion of one of three adenines (AAA→AA) at position 88–90 bp in the first intron of MSTN (c.373 + 90delA, rs11333758). The second was rs3764955, located at the end of the fifth intron of ACVR2A, involving a G to C transversion. The selected SNPs are relatively common in the European population and are not expected to be pathogenic.

The comparisons of all athletes with the controls and the comparisons of athletes divided into sprint/power, endurance, and mixed-sport groups with the controls revealed no significant associations in any of the models. However, we found significant differences when the athletes were stratified according to their competition level: high-elite, elite, and subelite. Regarding the polymorphic site rs11333758 located in MSTN, there was a significant overrepresentation of rare homozygotes (–/–) in the high-elite athletes group when compared with the controls. In addition, when the athletes were divided by their discipline type and athletic performance, mixed-sport high-elite athletes showed significant overrepresentation of the –/– genotype. This finding suggests that this genotype may be favorable for achieving success in sports utilizing mixed anaerobic/aerobic energy production. Thus, for the first time, this experiment revealed that harboring this indel MSTN variant is significantly associated with athletes’ competition level in the Polish population, especially in the mixed-sport group. To our knowledge, only three studies have been published to date on the functional relevance of the rs11333758 polymorphism in the MSTN gene [19, 20, 21]. First, in a study including 110 elite athletes with a high amount of endurance training and 27 male controls, Karlowatz et al. (2011) analyzed the association of the left ventricular mass (LVM) of an athlete’s heart with polymorphisms in the insulin-like growth factor 1 (IGF1) signaling pathway in combination with MSTN. An analysis of the MSTN sequence revealed only one significant polymorphism, rs11333758, in the MSTN gene. An increased MSTN effect for the deletion allele was observed. Thus, the carriers of the A/– and –/– genotypes may show an attenuated training-induced growth response of the heart, resulting in a lesser LVM increase in comparison with carriers homozygous for the wild-type allele (A/A) [20]. Second, Gineviciene et al. (2021) analyzed the whole coding sequence of the MSTN gene in a group of 103 Lithuanian elite athletes and 127 controls. They confirmed an association between the rs11333758 polymorphism and elite athlete status, suggesting that this SNP affects the development of physical performance phenotypes. Specifically, the associations of the deletion allele and genotype with success in endurance sports in female athletes and in sprint/power-oriented male athletes were demonstrated [19]. Third, MSTN sequence analysis performed by Al Majidi et al. (2022) revealed that the frequency of the homozygous deletion genotype (–/–) was significantly higher in Iraqi endurance athletes and power athletes than in the controls [21]. It was suggested that, despite being intronic and not altering the amino acid composition of myostatin, this variant may affect the expression of the MSTN gene and myostatin function [19, 20]. Our study and previous data support a potential role of this polymorphism in determining the success of elite athletes; however, more experimental data are required.

It should be emphasized that the rs3764955 polymorphism in ACVR2A has not been previously described in athletes; thus, our results cannot be discussed in direct comparison with other studies. The performed analysis revealed a significant underrepresentation of rs3764955 GC heterozygotes in elite athletes when compared with that in the controls. Carriers of the CC and GG genotypes were more likely than heterozygotes to be elite athletes. Moreover, high-elite athletes showed a similar tendency. In addition, when the athletes were divided both by their discipline type and athletic performance, GC heterozygotes were less likely than the controls to be sprint/power elite athletes, while carriers for the CC genotype were more likely than the controls to be in the mixed-sport subelite group. These results suggest that the GC genotype may be unfavorable regarding achieving success in sports. In humans, the rs3764955 polymorphism was only described in the context of the risk of hypertensive disorders of pregnancy in the northern Chinese population [22] and preeclampsia in the Norwegian population [23].

Earlier investigations demonstrated that myostatin negatively regulates skeletal muscle development by activating its receptors [3]. In an animal study, Bhattacharya et al. demonstrated that a single knockdown of MSTN and its receptors could enhance growth traits more so than combinations of MSTN and its receptors [12]. Thus, only a simultaneous analysis of polymorphic sites located in MSTN and genes encoding myostatin receptors can provide additional unique information about the relationships between the gene variants and observed phenotypic traits and insight into the dependency among genetic markers [24]. To the best of our knowledge, this is the first study to analyze the association of the MSTN-ACVR2A interaction with athletic performance and competition level. The interaction analysis revealed that mixed-sport high-elite athletes showed significant underrepresentation of the rs3764955 GC - rs11333758 AA genotype combination (none of the mixed-sport high-elite athletes was a carrier of this genotype set). We observed a significant overrepresentation of the rs3764955 GC - rs11333758 –/– and rs3764955 CC - rs11333758 –/– genotype combinations in the same group. Thus, the rs3764955 GC - rs11333758 AA genotype combination might be unfavorable for achieving success in mixed sports. In addition, the rs3764955 GC - rs11333758 –/– and rs3764955 CC - rs11333758 –/– genotype combinations may be considered beneficial, and carriers of these combinations of genotypes might achieve success in sports utilizing mixed anaerobic/aerobic energy production. The results of the gene‒gene interaction analyses were consistent with the results of the individual SNP analysis.

Although myostatin is usually described as a key factor affecting the growth and differentiation of muscle cells, which is advantageous in the improvement of strength and power [10, 11], animal studies have revealed that the effects of myostatin on the mechanical properties of muscles depend on species, muscle type, developmental stage, and the extra- and intracellular factors determining the fiber type [5, 25]. These factors may partly explain the inconsistent findings regarding myostatin [8]. Interestingly, it has been shown that myostatin is expressed at higher levels in slow-twitch muscle fibers and may therefore have a more significant functional impact in this muscle group [26], which may explain the described association between the rs11333758 MSTN polymorphism and endurance predispositions in female athletes [19]. An association between this SNP and success has also been demonstrated in sprint/power-oriented male athletes, indicating a sex difference in the effect of the rs11333758 SNP on athletes’ physical performance [19]. We did not compare genotype frequencies with respect to sex; thus, we cannot confirm this relationship. A systematic large-scale rare-variant association analysis of 4,529 phenotypes revealed that missense variants in MSTN are associated with body composition and creatinine levels [27]. We found an association between the specific MSTN rs11333758 and ACVR2A rs3764955 genotypes and the mixed-sport group, designated strength-endurance athletes, comprising athletes whose sports utilize mixed anaerobic/aerobic energy production. Unfortunately, the athletes in the other studies were stratified into only two groups (endurance-oriented and sprint/power-oriented athletes), complicating the comparison of results [19, 21]. Although the important role of myostatin in skeletal muscle development has been confirmed, additional replication studies are needed to establish its role in sports performance. In addition, epigenetic mechanisms that determine whether a gene is silenced or activated, and when and in what tissue it will be expressed, should be considered [28].

Conclusions

The sequence analysis performed in the present study revealed two polymorphisms (rs11333758 in MSTN and rs3764955 in ACVR2A) likely associated with sports abilities. We then confirmed that the specific genotypes of the selected SNPs, either individually or in gene‒gene combination, are significantly associated with athletes’ competition level in the Polish population, especially in the mixed-sports athlete group (strength-endurance athletes). Thus, although further research is required, these polymorphisms, alone or in combination with other polymorphisms, are among the numerous candidates that could explain individual variations in muscle phenotypes.

Materials and methods

Participants

In the first part of the experiment involving WGS, the study group consisted of 101 male and 1 female elite sprint/power (n = 53) and endurance (n = 49) athletes (age: 23.5 ± 5.9 years) of the highest nationally-competitive standards (classification was based on scoring tables, e.g., International Amateur Athletic Federation (IAAF), International Federation of Swimming (FINA), or receiving a medal in the national championships, or participation in international competition at the European or World Championships). Athletes with personal best results ranking them in the top 100 in a particular sports discipline in the world or in Europe were included in the study group. As the aim of this part of the study was to determine genetic variants associated with overall physical performance, all athletes were considered as a single group.

The control group consisted of 41 healthy individuals (23 females and 18 males; age: 22.4 ± 6.3 years) with no pairs with kinship above 0.125. Kinship was assessed with Hail (pc_relate method [29]; kinship metric scale 0–0.5). The inclusion criteria for volunteers were no medical history of any cardiorespiratory diseases and no participation in professional sport training.

The second part of our study aimed to assess the association of selected gene variants with sport-related phenotypes, and 330 Polish athletes (age: 27.8 ± 7.1 years; 82 females and 248 males) who competed in national and international events were involved. The athletes were stratified into three groups according to values of relative anaerobic/aerobic energy system contribution, time of competitive exercise performance, and intensity of exertion in each sport:

  • endurance athletes (n = 101): rowers (n = 36), 1500–3000 m swimmers (n = 13), 15–50 km cross-country skiers (n = 4), canoeists (n = 9), road cyclists (n = 10), 1500 m runners (n = 9), 3–10 km runners (n = 8), triathletes (n = 6), marathon runners (n = 6);

  • mixed-sport athletes (n = 138): fencers (n = 10), boxers (n = 16), judokas (n = 11), wrestlers (n = 37), karate fighters (n = 1), volleyball players (n = 13), handball players (n = 21), ice hockey players (n = 25), gymnasts (n = 1), pentathletes (n = 3);

  • sprint/power athletes (n = 91): jumpers (n = 8), 100–400 m runners (n = 29), weightlifters (n = 17), powerlifters (n = 19), archers (n = 4), throwers (n = 5), 50–100 m swimmers (n = 9).

The athletes in these groups were divided into subgroups according to their competition level: high-elite (n = 51; gold medalists in the World and European Championships, World Cups, or Olympic Games), elite (n = 150; silver or bronze medalists in the World and European Championships, World Cups, or Olympic Games), and subelite (n = 129; participants in international competitions).

Non-training unrelated students (n = 365) from the Gdansk University of Physical Education and Sport (age 22 ± 3.4 years; 153 females and 212 males) were included in the control group. Athletes in the first and second part of the study, as well as controls, were of Polish origin.

We performed sample size calculations to assess the power of the genotyping study using a tool available at https://clincalc.com/stats/samplesize.aspx, assuming the following parameters: power 90%, effect size 25%, and alpha 0.05. The results indicated a minimum size of 329 participants per group.
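For readers who wish to explore how such a figure can arise, the sketch below applies the standard two-sided sample size formula for comparing two independent proportions with equal group sizes. Several inputs are assumptions rather than details reported here: the baseline proportion of 0.50, the reading of "effect size 25%" as a relative change (0.50 vs. 0.375), and the assumption that the online tool implements this particular formula. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Per-group sample size for a two-sided test of two proportions.

    Uses the pooled-variance term under the null hypothesis and the
    unpooled term under the alternative, with equal allocation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((null_term + alt_term) ** 2 / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Assumed inputs: baseline 0.50 and a 25% relative reduction.
    print(n_per_group(0.50, 0.375))
```

With these assumed inputs the formula gives 329 participants per group, which matches the reported minimum; this agreement does not establish which inputs were actually entered into the tool.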

WGS data processing and analysis

DNA was isolated from peripheral blood leukocytes using the standard salting-out procedure and from saliva using the Oragene DNA self-collection kit and Prep IT L2P Purification Kit (DNA Genotek Inc., Stittsville, ON, Canada) following the manufacturer’s instructions.

All WGS data analyses were performed with Hail (v. 0.2.85) [29]. WGS and quality control (QC) were performed as described previously [30]. For this study, after QC, whole-genome .vcf files were filtered to contain only variants in MSTN (40 variants), ACVR2A (200 variants), and ACVR2B (152 variants). Fisher exact tests were used to compare allele frequencies of sequenced variants between athletes and controls, between sportsmen subgroups (endurance vs. sprint/power), and between the sportsmen group and two external control groups: the non-Finnish European population from the gnomAD database with data from 33 988 samples [13] and a database of 1000 Polish genomes with data from 943 samples [31]. Bonferroni correction was applied to the obtained p-values, and corrected p-values are reported in this manuscript unless clearly stated otherwise. The code used to export, annotate and test variants in MSTN, ACVR2A, and ACVR2B is available in the project’s GitHub repository [32].
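The effect of the Bonferroni correction on the nominal p-values quoted in the Results can be illustrated as follows. This is a sketch, not the code from the repository: the function name is hypothetical, and treating all 392 variants (40 + 200 + 152) as a single family of tests is an assumption made for illustration.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    n_variants = 40 + 200 + 152  # MSTN + ACVR2A + ACVR2B variants after QC
    # Nominal p-values quoted in the Results for the two selected SNPs.
    for label, nominal_p in [("ACVR2A rs3764955", 0.027), ("MSTN rs11333758", 0.005)]:
        adjusted = bonferroni(nominal_p, n_variants)
        print(f"{label}: nominal p = {nominal_p}, adjusted p = {adjusted}")
```

Under this assumption both nominal p-values are adjusted to 1, which is in line with no variant passing the significance threshold at the WGS stage.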

SNPs genotyping

DNA was extracted from buccal cells with a GenElute Mammalian Genomic DNA Miniprep Kit (Sigma, Steinheim, Germany) in accordance with the manufacturer’s protocol. All samples were genotyped using an allelic discrimination assay on a C1000 Touch Thermal Cycler (Bio–Rad, Feldkirchen, Germany) instrument with TaqMan® probes. To differentiate the MSTN rs11333758 and ACVR2A rs3764955 alleles, TaqMan® Pre-Designed SNP Genotyping Assays (Applied Biosystems, Waltham, MA, U.S.A.; assay ID: C_175825166_10 and C___3144655_10, respectively) containing fluorescently labeled (VIC and FAM) probes and primers were used.

Statistical analyses of genotyping results

All analyses were performed in R studio 4.2.0 (R Core Team 2020, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, [33]). The SNPassoc R package was used for single-SNP association and Hardy–Weinberg equilibrium tests. Association tests were performed for the following genetic models: codominant, dominant, recessive, and overdominant. For each comparison, the odds ratio (OR), p-value, and 95% confidence interval (CI) are reported. SNP epistasis was tested with the W-test in the wtest R package [34]. All reported p-values were corrected for the number of variants genotyped using the Bonferroni correction. p-values < 0.05 were considered significant.
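To make the genetic models concrete, the sketch below shows how genotype counts can be collapsed into a 2 × 2 table under the dominant, recessive, and overdominant models, and how an odds ratio with a Wald 95% CI follows from that table. It is an illustration in Python rather than the SNPassoc code used in the study; the function names and the genotype counts are hypothetical, and SNPassoc may estimate ORs and CIs differently (for example, through logistic regression).

```python
import math


def collapse(counts, model):
    """Split (ref_hom, het, alt_hom) counts into (exposed, unexposed)."""
    ref_hom, het, alt_hom = counts
    if model == "dominant":      # carriers of the alternate allele vs. ref/ref
        return het + alt_hom, ref_hom
    if model == "recessive":     # alt/alt vs. all other genotypes
        return alt_hom, ref_hom + het
    if model == "overdominant":  # heterozygotes vs. both homozygotes
        return het, ref_hom + alt_hom
    raise ValueError(f"unknown model: {model}")


def odds_ratio(cases, controls, model, z=1.959964):
    """Odds ratio with a Wald 95% CI; all four cells must be non-zero."""
    a, b = collapse(cases, model)
    c, d = collapse(controls, model)
    estimate = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper


if __name__ == "__main__":
    # Hypothetical genotype counts (ref_hom, het, alt_hom), illustration only.
    athletes, controls = (10, 20, 20), (40, 40, 20)
    for model in ("dominant", "recessive", "overdominant"):
        estimate, lower, upper = odds_ratio(athletes, controls, model)
        print(f"{model}: OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The codominant model is not collapsed in this way; it compares heterozygotes and alternate homozygotes separately against reference homozygotes, yielding one OR per genotype.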