Introduction

Based on a squad of 25 players, a professional football team will sustain approximately 50 time-loss injuries per season [13]. On average, therefore, each player will sustain two injuries per season that affect their availability for match selection. A recent meta-analysis in professional football reported an injury incidence rate of 8.1 injuries per 1000 h of exposure [35]. This is problematic, as injuries that result in lost match-minutes are strongly associated with team success: player availability is strongly correlated with league position, total points, games won, and goals scored [11, 22]. Injuries also have a significant financial impact on clubs. For example, a professional player injured for one month costs a club competing in the UEFA Champions League approximately €500,000 [12]. Moreover, across five English Premier League seasons (2012/2013 to 2016/2017), injuries were estimated to cost clubs an average of £45,000,000 per season [14]. Injury risk is multifactorial, and research into the risk factors that predispose players to injury has seen growing interest in genetic susceptibility [50].

Genetic susceptibility studies utilising the ‘heritability’ statistic have shown that cognitive abilities, motor attributes, morphological dimensions, functional capacities, and personality traits are moderately to highly heritable [20]. Heritability estimates for injuries are unclear, as the exact aetiology of these multifactorial conditions remains to be elucidated. However, several studies have reported evidence of a genetic component to injury. For example, Harvie et al. [24] reported that the siblings of participants who had sustained a full-thickness rotator cuff tear were at greater risk of the same injury than the participants’ spouses. Similarly, Flynn et al. [17] showed that individuals who reported a family history of anterior cruciate ligament (ACL) rupture were twice as likely to suffer an ACL rupture as individuals with no such family history. Moreover, Kraemer et al. [30] found that Achilles tendinopathy occurs significantly more often in individuals with a family history of the injury. In addition, a recent study of lifetime ACL rupture risk involving 88,414 twins reported a heritability estimate of 69% [40].
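For readers less familiar with the heritability statistic, one classical way to estimate it from twin data is Falconer's formula, which compares the trait correlation of monozygotic (MZ) twins with that of dizygotic (DZ) twins. The Python sketch below is illustrative only: the correlations are hypothetical, the formula rests on simplifying assumptions (e.g., equal shared environments for MZ and DZ pairs and purely additive genetic effects), and large twin studies generally fit formal variance-component models instead, so this is not necessarily how the 69% estimate in [40] was obtained. For a binary outcome such as ACL rupture, the correlations would be computed on an underlying liability scale.

```python
def falconer_components(r_mz: float, r_dz: float) -> dict[str, float]:
    """Split trait variance into additive genetic (h2), shared environment (c2)
    and unique environment (e2) components using Falconer's approximations."""
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    h2 = 2 * (r_mz - r_dz)  # MZ pairs share ~100% of segregating alleles, DZ pairs ~50%
    c2 = 2 * r_dz - r_mz    # twin similarity not explained by shared genes
    e2 = 1 - r_mz           # what makes even MZ twins differ (incl. measurement error)
    return {"h2": h2, "c2": c2, "e2": e2}

# Hypothetical twin correlations, chosen only to illustrate the arithmetic.
components = falconer_components(r_mz=0.8, r_dz=0.45)
print({name: round(value, 2) for name, value in components.items()})
```

By construction the three components sum to 1, so a high heritability estimate necessarily leaves less variance to be attributed to environmental factors.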

Although heritability studies are important, as they reveal the possibility of a genetic predisposition, they do not identify which specific genetic variants (i.e., polymorphisms) are responsible [21]. Therefore, current football genomic research focuses on further understanding genotype–phenotype relationships through the exploration of genetic associations. Many empirical studies have investigated genetic associations with injury. However, to the authors’ knowledge, no study has reviewed genetic associations with injury specifically in football players; the only other team-sport review (non-systematic) was conducted in rugby [4]. Therefore, the aim of this review was to synthesise genetic association studies that have investigated injury in football players, in order to identify the genetic variants with the most empirical evidence to date.

Methodology

Search strategy and eligibility criteria

The search strategy followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 guidelines [59]. A comprehensive search of the PubMed, SPORTDiscus, and MEDLINE databases was initially conducted up until June 27th 2020 and subsequently updated on March 11th 2022, using the Boolean search: ((football OR soccer) AND (injury OR injuries)) AND (genetics OR gene OR genotype OR snp OR polymorphism). Google Scholar was also searched using word combinations from this Boolean search. In addition, the reference lists of identified articles and reviews were searched for further relevant studies, and forward citation tracking was performed on all eligible studies. The literature search and selection of eligible studies were performed independently and in duplicate by two researchers. Studies were included if they: (1) were primary cohort or case–control investigations; (2) reported that football players were included in their population sample; (3) examined the association of a genetic variant with injury; and (4) were published in the English language.

Data extraction and analysis

The following data were extracted from the included studies independently and in duplicate by two researchers: first author’s name and year of publication; number of participants, footballers, and controls; gender; age; nationality; ethnicity; study design; injury phenotype; injury measurement; names of the genes and polymorphisms investigated; and main findings. A statistically pooled quantitative synthesis of the extracted data was deemed infeasible because of variation in genes, polymorphisms, injuries, and ethnicities. Given this heterogeneity between studies, a narrative synthesis was chosen to summarise the results. The methodological quality of the studies was assessed using the Newcastle–Ottawa Scale (NOS; [72]), as utilised in previous research [39, 82]. Two reviewers independently scored each study from 0 to 9 stars, based on the items within the three domains of the appropriate version of the NOS: the case–control version (selection, comparability, and exposure) or the cohort version (selection, comparability, and outcome).
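The NOS scoring step can be expressed as a small validation routine. The Python sketch below is a hypothetical helper, not part of the NOS itself or of this review's workflow; it assumes the standard NOS maxima of four stars for selection, two for comparability, and three for exposure (case–control version) or outcome (cohort version), which together give the 0–9 range. The consensus flag at the end is also an assumption, included only to show how two independent scorings might be compared.

```python
# Standard maximum stars per NOS domain (4 + 2 + 3 = 9 in both versions).
NOS_MAX = {
    "case-control": {"selection": 4, "comparability": 2, "exposure": 3},
    "cohort": {"selection": 4, "comparability": 2, "outcome": 3},
}

def nos_total(version: str, stars: dict[str, int]) -> int:
    """Validate one reviewer's domain scores and return the 0-9 total (hypothetical helper)."""
    maxima = NOS_MAX[version]  # raises KeyError for an unknown NOS version
    if set(stars) != set(maxima):
        raise ValueError(f"{version} version expects domains {sorted(maxima)}")
    for domain, value in stars.items():
        if not 0 <= value <= maxima[domain]:
            raise ValueError(f"{domain} must be between 0 and {maxima[domain]} stars")
    return sum(stars.values())

# Hypothetical scores from two independent reviewers for the same cohort study.
reviewer_1 = nos_total("cohort", {"selection": 3, "comparability": 1, "outcome": 3})
reviewer_2 = nos_total("cohort", {"selection": 3, "comparability": 2, "outcome": 3})
if reviewer_1 != reviewer_2:
    print(f"Disagreement ({reviewer_1} vs {reviewer_2} stars): flag for consensus")
```

Validating each domain separately, rather than only the total, catches the error of giving, say, three comparability stars, which a 0–9 range check on the total would miss.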

Results

Search process

The systematic search identified 34 studies that met the predetermined inclusion criteria, all of which were included in the final analysis (see Fig. 1).

Fig. 1 Flow diagram of systematic search process

Study characteristics and quality

Of the 34 included studies, 33 were candidate gene association studies (CGAS) and one was a genome-wide association study (GWAS) (see Table 1). Of these, 13 were longitudinal studies (follow-up ranging from 3 to 10 years), 10 were cross-sectional studies, 10 were case–control studies, and one used both cross-sectional and longitudinal analyses. Sample sizes ranged from 43 to 1311 participants (median 227), with a total of 9642 participants across all studies. Twenty studies included only footballers, whilst the remaining 14 included athletes from other sports alongside footballers. Fifteen studies focused exclusively on male participants, whilst the remaining 19 included both male and female participants. Ethnicity was reported in 25 studies and not reported in the remaining nine. Injuries were diagnosed by a medical professional in 27 studies and self-reported in the remaining seven. The injury phenotypes described in the studies were: ACL rupture (n = 8); concussion (n = 6); musculoskeletal soft-tissue injuries (n = 4); muscle injuries (n = 5); stress fracture (n = 3); musculoskeletal injuries (n = 3); hamstring injuries (n = 1); knee and ankle injuries (n = 1); low-back pain (n = 1); medial collateral ligament (MCL) injuries (n = 1); and tendinopathy (n = 1). The NOS scores of the case–control studies ranged from 4 to 9 (mean 7.1), and those of the cohort studies ranged from 3 to 8 (mean 6.7). The overall mean NOS score across all studies was 6.8 (see Table 2).

Table 1 Study characteristics and main findings
Table 2 Quality assessment of studies

Genetic variants

Across the 33 CGASs, a total of 99 unique polymorphisms within 63 genes were assessed (see Table 3). Forty-one unique polymorphisms were associated with injury at least once, whereas only three had their specific allelic associations with injury replicated in at least two independent cohorts: Actinin alpha 3 (ACTN3; rs1815739), Aggrecan (ACAN; rs1516797), and Vascular endothelial growth factor A (VEGFA; rs2010963). More specifically, the XX genotype of ACTN3 (rs1815739) was associated with increased susceptibility to non-contact muscle injuries [8, 46]; the G allele and TT genotype of ACAN (rs1516797) were associated with increased and decreased susceptibility to ACL injuries, respectively [7, 41]; and the CC genotype of VEGFA (rs2010963) was associated with an increased risk of ACL rupture [36, 37] and of ligament or tendon injuries [23]. In the only GWAS [68], three additional polymorphisms showed a ‘suggestive’ association with tendinopathy (P < 10⁻⁵); however, they failed to reach either the study’s own threshold (P < 10⁻⁷) or the typical genome-wide significance threshold (P < 10⁻⁸).
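The tiers of evidence described for the GWAS can be made explicit with a short classification function. This is a minimal Python sketch: the threshold values are those stated in the text, while the function name and tier labels are hypothetical.

```python
# Thresholds as stated in the text; the tier labels are hypothetical.
SUGGESTIVE = 1e-5
STUDY_THRESHOLD = 1e-7  # the GWAS's own pre-set threshold
GENOME_WIDE = 1e-8      # the 'typical' genome-wide threshold quoted in the text

def classify_p(p: float) -> str:
    """Place a single-polymorphism P value into the evidence tiers described for the GWAS."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P value must lie in [0, 1]")
    if p < GENOME_WIDE:
        return "genome-wide significant"
    if p < STUDY_THRESHOLD:
        return "meets study threshold only"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"

for p in (3e-6, 5e-8, 2e-9, 0.01):
    print(f"P = {p:g}: {classify_p(p)}")
```

The thresholds are this strict because a GWAS tests on the order of a million polymorphisms: at a nominal P < 0.05, about 50,000 false positives would be expected by chance alone if none of the polymorphisms were truly associated.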

Table 3 Polymorphism investigations and associations

Discussion

The aim of this review was to synthesise genetic association studies that have investigated injury in football players, in order to identify which genetic variants have the most empirical evidence to date. To the authors’ knowledge, this is the first review to do so within a football context. The main findings show that, of the 99 unique polymorphisms assessed for associations with injury in football players, only ACTN3 (rs1815739), ACAN (rs1516797), and VEGFA (rs2010963) presented similar findings in independent cohorts. Replication is vitally important in genetic association research, as associations reported in preliminary studies are often overstated. Indeed, a meta-analysis showed that, when novel genetic variants are investigated, subsequent studies commonly report more modest associations than the initial studies [29]. As such, preliminary genetic associations should be interpreted with caution until a subsequent study using an independent population sample replicates the results. Because ACTN3 (rs1815739), ACAN (rs1516797), and VEGFA (rs2010963) were the only polymorphisms whose specific allelic associations appeared to replicate in more than one independent cohort, they are discussed in greater detail below. A critical evaluation of the intra- and inter-study methodological limitations is also provided, to examine the reliability of the individual findings and the validity of their replications.
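One reason initial associations tend to be overstated is the so-called winner's curse: when only results that pass a significance threshold are highlighted, the highlighted effect sizes are biased upwards. The Python simulation below is a minimal sketch under hypothetical assumptions (a true per-allele log odds ratio of 0.1 and a standard error of 0.1 in every study); it is not drawn from any study in this review.

```python
import random
import statistics

def winners_curse(true_effect: float, se: float, n_studies: int, seed: int = 1) -> tuple[float, float]:
    """Simulate n_studies estimates of the same effect and return
    (mean of all estimates, mean of estimates reaching |z| > 1.96)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    significant = [b for b in estimates if abs(b / se) > 1.96]  # two-sided P < 0.05
    return statistics.mean(estimates), statistics.mean(significant)

# Hypothetical scenario: true per-allele log odds ratio 0.1, standard error 0.1 in every study.
all_mean, significant_mean = winners_curse(true_effect=0.1, se=0.1, n_studies=20_000)
print(f"mean of all estimates:           {all_mean:.3f}")
print(f"mean of 'significant' estimates: {significant_mean:.3f}")
```

Under these assumptions the ‘significant’ estimates average more than double the true effect, because only estimates that happened to land well above it could cross the threshold. A replication study is not selected on its own result, so its estimate carries no such inflation, which is why independent replication is the more trustworthy guide.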

The R allele of ACTN3 (rs1815739) is regarded as beneficial to strength/power performance and, in a recent meta-analysis, has been associated with professional status in football [50]. Regarding injury susceptibility, Moreno et al. [57] recently reported that endurance runners with the ACTN3 XX genotype were at an increased risk of sustaining a muscle injury. ACTN3 encodes the actinin alpha 3 protein [58], a vital structural component of the Z-line that, along with actin-containing filaments, anchors and stabilises the muscle contractile apparatus [56]. In the rs1815739 polymorphism, a premature stop codon (X) replaces arginine (R) at residue 577, so that XX homozygotes are deficient in the protein [83]. It has therefore been speculated that the lack of actinin alpha 3 in XX individuals reduces the capacity of skeletal muscle to tolerate the repeated contractions of long-term, exhaustive exercise that can lead to muscle injury [2]. Specifically, an XX genotype may create a weaker link between the actin filaments and the Z-line, resulting in a structural deficiency that leaves the sarcomere more prone to damage under high mechanical stress [2]. Furthermore, actinin alpha 3-deficient individuals reportedly sustain greater muscle damage following physical activity [34, 60] but require less recovery time [10, 34]. It has therefore also been speculated that, although football players with an XX genotype may recover faster, they may be exposed to an intolerable amount of muscle damage that inhibits the recovery process [8]. Indeed, several studies found that XX homozygotes sustained more severe injuries or were absent for a greater number of days following injury [23, 46, 67]. A mechanistic explanation for the increased injury risk in those with the XX genotype has also been proposed.
The XX genotype may result in enhanced activation of calcineurin and a consequent shift of fast-twitch fibres toward oxidative metabolism [71]. In ACTN3 knockout mice, this altered metabolic handling resulted in significantly higher calcium release during muscle contractions [25, 65]. Moreover, complete ACTN3 deficiency also leads to the upregulation and accumulation of other Z-line proteins [70]. In X-allele carriers, this may decrease the stability and rigidity of type IIa fibres [5], which may in turn increase susceptibility to muscle injuries [3]. It is important to note that these suggestions remain speculative, as the exact mechanism behind the potentially higher incidence of sports-related muscle injury in XX athletes is yet to be established [2].

The ACAN (rs1516797) polymorphism was investigated in three independent population samples. Although only two of the three investigations found a significant association with injury, the study that found no significant differences investigated only muscular (hamstring) injury [33], whereas the two studies that found a significant association investigated ligament (specifically ACL) injuries. The structural role of ACAN within ligament [84] may explain this pattern. Both studies reported that individuals possessing a G allele of ACAN were at greater risk of sustaining an ACL injury, with the TT genotype being associated with increased protection. The genotype distribution of individuals who sustained an ACL injury was similar between studies (i.e., G/G = 11%, T/G = 47%–49%, T/T = 40%–42%). The ACAN gene encodes aggrecan, a large structural proteoglycan that is most abundant in cartilage [84]. Proteoglycans play a synergistic role in fibrillogenesis, potentially through direct and indirect interactions with several components of the extracellular matrix, including the collagen network and cell-signalling molecules [26]. Altered collagen fibril properties may change various biomechanical and functional properties of the ligament, possibly increasing injury risk. Indeed, lower levels of proteoglycan and glycosaminoglycan have been observed in ruptured versus non-ruptured human ACL tissue [84]. As such, proteoglycan-encoding genes are considered viable candidates for investigation regarding associations with ligament injuries. However, the specific biological functions of the rs1516797 T and G alleles are yet to be determined [26, 41].
Until a mechanistic explanation for ACAN (rs1516797) is established, it remains unclear how the reported associations could be used to identify genetic risk factors for injury in football players.
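To make the reported genotype distribution easier to interpret, it can be converted into an allele frequency by simple allele counting. The Python sketch below uses hypothetical midpoints of the reported ranges (G/G = 0.11, T/G = 0.48, T/T = 0.41) purely for illustration. It also computes the genotype proportions expected under Hardy–Weinberg equilibrium, a check that is conventionally applied to control groups rather than to injured cases.

```python
def allele_frequency(p_gg: float, p_tg: float, p_tt: float) -> float:
    """Frequency of the G allele from genotype proportions.
    Each GG individual carries two G alleles and each TG individual carries one."""
    total = p_gg + p_tg + p_tt
    # Dividing by 2 * total tolerates rounded percentages that do not sum to exactly 1.
    return (2 * p_gg + p_tg) / (2 * total)

def hardy_weinberg_expected(q: float) -> tuple[float, float, float]:
    """Expected (GG, TG, TT) proportions under Hardy-Weinberg equilibrium for G-allele frequency q."""
    p = 1 - q
    return q * q, 2 * p * q, p * p

# Hypothetical midpoints of the ranges reported for ACL-injured players.
q = allele_frequency(0.11, 0.48, 0.41)
expected = hardy_weinberg_expected(q)
print(f"G allele frequency: {q:.3f}")
print("HWE-expected GG/TG/TT:", [round(x, 3) for x in expected])
```

Expressing results as allele frequencies makes it easier to compare the cases in one study with the controls, or reference populations, used in another.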

The VEGFA (rs2010963) polymorphism was investigated in four independent cohorts. The two cohorts in which an association was reported showed associations with ACL rupture [36, 37] and with ligament or tendon injuries [23]. The two studies that reported no association investigated ACL rupture [66] and hamstring injuries [33]. The VEGFA gene encodes the vascular endothelial growth factor-A protein, which is considered the dominant inducer of angiogenesis [69]. Angiogenesis, the formation of new capillary blood vessels from existing microvessels, has an important function in numerous biological processes (e.g., embryological development, inflammation, and wound healing) and in the pathogenesis of several diseases (e.g., cancer, diabetic retinopathy, and rheumatoid arthritis) [85]. Functional polymorphisms of VEGFA can alter gene expression and protein production, and an imbalance in either can have negative physiological consequences. For instance, increased VEGFA expression has been reported to upregulate the expression of matrix metalloproteinases, which may adversely alter the biomechanical properties of ligaments by compromising extracellular matrix homeostasis [81]. The CC genotype of VEGFA (rs2010963) has been associated with enhanced protein expression and higher plasma VEGFA concentrations [69]. The increased susceptibility of CC homozygotes to ligament injuries in football may therefore be due to these biological mechanisms associated with increased VEGFA expression.

A critical analysis of the methodological approach and cohort characteristics of each ACTN3, ACAN, and VEGFA study reveals substantial within- and between-study limitations and variability, which undermine the reliability and validity of the reported associations. For instance, with regard to the ACTN3 studies, Massidda et al. [46] reported on 169 Caucasian male football players of varying ages and competitive playing levels, with injuries expressed as injury incidence (i.e., total injuries per 1000 h). In contrast, Clos et al. [8] reported on 43 senior professional male football players of different ethnicities (Caucasian = 23, Hispanic = 13, Black African = 7), with injuries expressed as injury rate (i.e., total injuries per season). With regard to the ACAN studies, Cięszczyk et al. [7] reported on a sample of 229 male (n = 158) and female (n = 71) Polish football players of a similar age but varying competitive playing levels who sustained an ACL injury via non-contact mechanisms. In contrast, Mannion et al. [41] reported on a sample of 227 male (n = 166) and female (n = 61) Caucasian athletes from multiple sports (football players = 14 males) of varying ages and competitive playing levels who sustained an ACL injury via non-contact (n = 126) or contact (n = 101) mechanisms. Finally, with regard to the VEGFA studies, the sample in [36, 37] consisted of 222 senior Polish Caucasian male (n = 156) and female (n = 66) football players of a similar age but varying competitive playing levels who sustained an ACL injury via non-contact mechanisms, whereas the sample of Hall et al. [23] consisted of 402 Caucasian male academy football players, analysed separately according to maturity status, who sustained injuries via contact and non-contact mechanisms.
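The difference between these two injury metrics matters when comparing studies. A minimal Python sketch, using two hypothetical players rather than data from [8] or [46], shows that players with the same number of injuries per season can have very different incidences once exposure time is taken into account.

```python
def incidence_per_1000h(injuries: int, exposure_hours: float) -> float:
    """Injury incidence: injuries per 1000 hours of exposure."""
    return injuries * 1000 / exposure_hours

def rate_per_season(injuries: int, seasons: float) -> float:
    """Injury rate as injuries per season, ignoring exposure time."""
    return injuries / seasons

# Two hypothetical players, each injured twice in one season, with different exposure.
regular_starter = {"injuries": 2, "seasons": 1, "hours": 400}
fringe_player = {"injuries": 2, "seasons": 1, "hours": 100}

for name, p in (("regular starter", regular_starter), ("fringe player", fringe_player)):
    print(name,
          rate_per_season(p["injuries"], p["seasons"]), "per season,",
          incidence_per_1000h(p["injuries"], p["hours"]), "per 1000 h")
# regular starter 2.0 per season, 5.0 per 1000 h
# fringe player 2.0 per season, 20.0 per 1000 h
```

A per-season rate treats both players as equally injury-prone, whereas incidence per 1000 h shows a fourfold difference, so genotype comparisons based on one metric are not directly comparable with those based on the other.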
Several issues evident in these studies (e.g., small sample sizes, cohort heterogeneity, and population stratification) are prevalent throughout all studies in this review and are discussed in more detail in the Limitations section below. A specific limitation, however, is the combination of male and female participants when studying ACL injury risk. It is well reported that females are at a higher risk of ACL injury than males, with female football players in particular at a two- to threefold higher risk than their male counterparts (see [80] for a review). When these limitations and the considerable between-study variability are taken into account, the evidence associating ACTN3 (rs1815739), ACAN (rs1516797), and VEGFA (rs2010963) with injury risk in football loses its credibility.
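Population stratification can produce an apparent association even when genotype has no effect on injury. The Python sketch below uses expected (not sampled) counts for two hypothetical ancestral subpopulations that differ in both risk-allele carrier frequency and baseline injury risk; all numbers are invented for illustration. Within each subpopulation the odds ratio is exactly 1, yet pooling the two yields a spurious association.

```python
from fractions import Fraction

def expected_table(n: int, carrier_freq: Fraction, injury_risk: Fraction) -> tuple:
    """Expected 2x2 counts when injury risk does NOT depend on genotype.
    Returns (injured carriers, uninjured carriers, injured non-carriers, uninjured non-carriers)."""
    carriers = n * carrier_freq
    non_carriers = n - carriers
    return (carriers * injury_risk, carriers * (1 - injury_risk),
            non_carriers * injury_risk, non_carriers * (1 - injury_risk))

def odds_ratio(table: tuple) -> Fraction:
    a, b, c, d = table
    return (a * d) / (b * c)

# Hypothetical subpopulations: carrier frequency and baseline injury risk both differ.
pop_a = expected_table(1000, Fraction("0.2"), Fraction("0.1"))
pop_b = expected_table(1000, Fraction("0.6"), Fraction("0.3"))
pooled = tuple(x + y for x, y in zip(pop_a, pop_b))

print("OR within A:", odds_ratio(pop_a))   # 1
print("OR within B:", odds_ratio(pop_b))   # 1
print("OR pooled:  ", odds_ratio(pooled))  # 5/3, a spurious association
```

Exact fractions are used so that the within-group odds ratios come out as exactly 1 rather than as floating-point approximations. Analysing each subpopulation separately, or adjusting for ancestry, removes the artefact.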

Perhaps the most surprising result of this review was the lack of association and/or replication for some of the most heavily researched genes regarding injury susceptibility in sport, namely the collagen type 5 alpha 1 chain (COL5A1) and collagen type 1 alpha 1 chain (COL1A1) genes. Both genes provide instructions for making components of collagen [39, 82]. Genetic variants of both genes have been extensively researched in a variety of sports, particularly the COL5A1 (rs12722) and COL1A1 (rs1800012) polymorphisms. Indeed, two recent meta-analyses have reported that individuals possessing the TT genotype of the COL5A1 (rs12722) polymorphism are predisposed to a higher risk of sustaining tendon and ligament injuries [39], whilst individuals possessing the TT genotype of the COL1A1 (rs1800012) polymorphism have a reduced risk of tendon and ligament injuries [82]. Within this review, the COL5A1 (rs12722) polymorphism was studied most frequently (seven studies), followed by the COL1A1 (rs1800012) polymorphism (six studies). However, although the COL5A1 (rs12722) polymorphism was associated with injury in two independent cohorts, the two studies reported contrasting allelic associations. More specifically, McCabe & Collins [52] showed that the T allele was associated with knee and ankle injuries in senior players, whereas Hall et al. [23] reported associations between the C allele and musculoskeletal soft-tissue injuries and ligament injuries in youth players. Thus, in football players, the COL5A1 (rs12722) and COL1A1 (rs1800012) polymorphisms do not appear to be independently associated with injury. However, as noted above for the ACAN gene, not all studies investigated the COL5A1 (rs12722) and COL1A1 (rs1800012) polymorphisms in relation to the most appropriate soft-tissue injuries (tendon and ligament).
Indeed, a number of studies instead focused on muscular injuries, injuries as a whole, and/or contact injuries. The mechanisms underpinning different injuries vary substantially, and different injuries are therefore likely to have dissimilar genetic predispositions. Moreover, contact injuries are likely to be less influenced by genetic predisposition and more a consequence of environmental factors. Consequently, drawing valid conclusions from genetic associations with contact injuries is extremely problematic. As such, the true association of the COL5A1 (rs12722) and COL1A1 (rs1800012) polymorphisms with injury in footballers remains unclear.

Limitations

Several methodological limitations exist across genetic association research investigating injury susceptibility in football, and many potential associations between polymorphisms and injury therefore remain inconclusive. Firstly, many of the studies used retrospective designs in which participants self-reported their injuries (see Table 1). This is problematic, as many athletes have insufficient knowledge of what constitutes a specific injury, and self-report also allows for recall bias [19]. Secondly, the sample size of most studies was very small (see Table 1). Consequently, many studies may have lacked the statistical power required to detect a significant association between a polymorphism and an injury, which may partly explain why most of the significant associations reported in this review came from studies with larger sample sizes. It is well established that a study investigating the association of a single polymorphism usually requires a sample size of hundreds and, in some cases, thousands [28]. This is because individual polymorphisms most likely contribute only a small amount to injury susceptibility, given that injury is such a multifactorial phenomenon [54]. It is also important to note that few studies investigated the combined influence of multiple polymorphisms on injury risk. Possible gene–gene and gene–environment interactions are crucial to understanding the complex mechanisms and polygenic nature underpinning each injury [54]. Furthermore, only one GWAS was identified in the literature. A GWAS is one of the most appropriate methods for assessing genetic associations, given its ability to assess over 1,000,000 polymorphisms at once and to identify novel genetic associations [21]. It is therefore important that future studies, appropriately designed for the purpose, utilise more advanced genomic technology.
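The dependence of statistical power on sample size can be made concrete with the standard normal-approximation formula for comparing allele frequencies between cases and controls. The Python sketch below is illustrative only: it assumes an allelic (per-chromosome) test, a hypothetical control risk-allele frequency of 0.3, and hypothetical per-allele odds ratios. It is not a re-analysis of any study in this review.

```python
import math
from statistics import NormalDist

def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def individuals_per_group(control_freq: float, odds_ratio: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate cases (and equally many controls) needed for a two-sided allelic test."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_b = NormalDist().inv_cdf(power)          # value giving the desired power
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles = math.ceil(numerator / (p1 - p0) ** 2)  # chromosomes needed per group
    return math.ceil(alleles / 2)                     # two alleles per participant

for odds_ratio in (1.5, 1.3, 1.2):
    n = individuals_per_group(0.3, odds_ratio)
    print(f"OR {odds_ratio}: ~{n} cases and {n} controls (alpha = 0.05, power = 0.8)")
print("OR 1.3 at P < 1e-8:", individuals_per_group(0.3, 1.3, alpha=1e-8))
```

Under these assumptions, even a per-allele odds ratio of 1.3 requires several hundred cases and as many controls at a nominal 5% significance level, and far more at genome-wide thresholds. This is consistent with the sample sizes cited above [28].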

The heterogeneity of population samples arising from population stratification is also of concern. Populations with different ancestral backgrounds have distinct allele frequencies, so combining individuals of different ancestries can result in spurious associations [18]. However, many studies in this review investigated footballers of diverse ethnicities together. Moreover, several studies analysed participants of different genders, playing levels, and sports together. This is problematic, as training and match loads differ across these population substructures, resulting in differing injury susceptibility and further exacerbating sample heterogeneity [39]. It is therefore essential that future research be conducted with improved methodological approaches and study designs [48, 51]. Admittedly, it is difficult to obtain a large, homogeneous cohort of athletes and to afford advanced genomic technology [27, 47–51]. Therefore, the participation of organisations in international consortia is imperative (McAuley et al. [61]). This will allow the sharing of data and resources, provide greater statistical power, and progress our understanding of the exact underlying pathobiology of injury susceptibility. Finally, this review itself is not without limitations. Specifically, only papers published in the English language were included at the searching, screening, and analysis phases, and only published papers were included in general. Language restriction and publication bias may therefore have affected the number of studies included in this review and, consequently, the number of positive and/or negative associations. In addition, although a protocol was developed before the research began, this review was not registered online during the planning stage.

Practical Applications

The use of genetic information to help tailor individualised training programmes for the prehabilitation and recovery of players susceptible to injury is an exciting prospect. However, at present, the evidence base supporting the use of genetic testing as a prognostic or diagnostic tool for injury risk in football is weak and confounded by methodological limitations and inconsistencies. Future collaborative research with improved methodological approaches, together with mechanistic studies, will help identify the biological pathways underpinning injury risk. To have truly meaningful clinical application, however, genetic information will need to be used alongside the various other intrinsic and extrinsic risk factors for injury (e.g., sex, age, injury history, anthropometric differences, technique, equipment, and player loads) [35, 53, 79].

Conclusion

Currently, 41 polymorphisms have been associated with injury in football at least once, whereas only three (i.e., ACTN3 rs1815739, ACAN rs1516797, and VEGFA rs2010963) have had their specific allelic associations with injury replicated in at least two independent cohorts. However, several methodological issues (e.g., small sample sizes, cohort heterogeneity, and population stratification) limit the reliability and external validity of these findings. As such, based on this review, there are currently no replicated and validated genetic variants that warrant the use of genetic information as a prognostic or diagnostic tool for injury risk in football. The future participation of organisations in international consortia is recommended to address the methodological issues discussed in this review and, in turn, to clarify the underlying genetic contribution to injury susceptibility.