Background

Physical inactivity and sedentariness represent a major challenge to public health and contribute substantially to ill health and premature mortality [1, 2]. The impact of physical inactivity on the development of non-communicable diseases has been compared to that of tobacco smoking, alcohol consumption, or an unhealthy diet [1, 3, 4]. In contrast, there is ample evidence that a physically active lifestyle is associated with a myriad of health benefits [5,6,7,8,9]. Despite this, a large proportion of the population does not reach the recommended levels of physical activity [10]. Although the variation in physical activity and sedentariness is likely to be determined by a multitude of factors, evidence from family and twin studies suggests a significant genetic influence [11, 12].

Recent developments in both objective measurements of physical activity and sedentary behaviour [13, 14], along with improved genotyping technology facilitating extensive genotyping in large populations [15], give promise for the identification of valid and robust genotype-phenotype associations of physical activity and sedentary behaviour. These associations may in turn serve as genetic instruments in Mendelian randomisation studies [16] to improve causal inference about the health effects of physical activity and sedentariness [17], and thus guide the development of effective preventive strategies and interventions.

Previous reviews have reported associations between different physical activity and sedentary behaviour phenotypes and various genes [12, 18,19,20]. However, most reviews did not describe a systematic literature search [18,19,20], and no previous review has critically assessed the methodological quality of the included studies, which is recommended for systematic reviews of genetic association studies [21]. The aim of the current systematic review was therefore to provide a comprehensive overview of genetic variants associated with physical activity or sedentary behaviour.

Methods

The review protocol was registered in PROSPERO (International prospective register of systematic reviews): CRD42019119456. The results are presented according to the PRISMA statement [22].

Eligibility criteria

We included all original studies on humans of any age, published in English in international peer-reviewed journals, that 1) identified new genetic variants associated with physical activity or sedentary behaviour (i.e., GWAS), or 2) reported the association between a genetic variant and these behaviours (i.e., candidate gene studies). Studies assessing physical activity or sedentary behaviour as a modifier/moderator of genetic variants associated with other outcomes were not included. We did not include case reports, editorials or reviews, or studies solely including animals.

The phenotype definitions of physical activity and/or sedentary behaviour in the included studies were based on data from self-reports (e.g. questionnaires, diaries) or objective measurements (e.g. accelerometry, pedometer). We excluded studies that only measured fitness or strength, or that aimed to study genes associated with performance in sports. Furthermore, we excluded studies that only reported on physical activity related to active transport or occupational activity. Studies using a polygenic risk score (i.e. not reporting associations for individual genetic variants), or studies examining interaction, were excluded if no estimate of the association between genetic variants and physical activity or sedentary behaviour was reported.

Information sources and search strategy

Studies were identified by searching electronic databases and inspecting reference lists of studies and relevant systematic reviews. The design and execution of the literature search were supervised by a trained research librarian with expertise in systematic reviews. The search was performed in PubMed and Embase (via Ovid) from 1990 until April 14th 2020. The search strategy was based on domains related to physical activity, sedentary activity and genetics. The full search strategy is presented in online supplementary 1.

Study selection

Eligibility assessment was performed in a two-stage screening process, described in Bramer et al. [23]. In the first stage, titles and abstracts were screened by three pairs of two researchers (ALN/ESS, IM/KAIE, TILN/LA) blinded to each other’s selection. These pairs remained the same throughout all the steps of the review process. Disagreements within pairs were discussed and resolved by a third researcher (PJM) when necessary. Studies considered not to be relevant were excluded and full-text articles were obtained for the remaining studies. In the second stage, two reviewers independently screened the full-text articles against the inclusion and exclusion criteria. If necessary, disagreements were resolved by discussion with a third reviewer. Reasons for excluding studies were recorded (Fig. 1).

Fig. 1 Flowchart for the selection of studies

Data extraction

We developed a data extraction form (online supplementary 2) inspired by Eskola et al. [24]. The form was adapted to the purpose of the current study and pilot-tested. Two researchers extracted data independently using the form. Disagreements were resolved by discussion between the two reviewers and, if necessary, discussed with a third researcher (PJM). The following data were extracted from the included studies, if available: 1) general information (authors and year of publication); 2) participant characteristics (country of origin, ethnicity, age and gender); 3) study characteristics (study design, genotyping method, and physical activity measuring instrument); 4) outcomes/results (physical activity phenotype, genetic variant, strength of association, confidence interval and/or p-value).

Risk of bias and methodological quality

We developed criteria for assessing risk of bias and methodological quality of the included studies (online supplementary 2). The criteria were inspired by Hayden et al. [25] and Eskola et al. [24] and assessed the following: selection bias (inclusion/exclusion criteria and population stratification), sample size calculations, genetic data (DNA sampling, genotyping method, quality control, blinding and Hardy-Weinberg equilibrium), physical activity and sedentary behaviour data (assessment procedure, validation and whether self-reported or objectively measured) and statistical analyses (measure of association and replication within the study), giving a maximum score of 12 points. The studies were then classified according to their score value: very low quality, < 3 points; low quality, 3–5.5 points; medium quality, 6–8.5 points; high quality, ≥9 points. Three pairs of researchers (ALN/ESS, IM/KAIE, TILN/LA) assessed the criteria independently.
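The score bands above can be written as a small lookup. The following is a minimal sketch and not part of the review's published materials: the function name `classify_quality` is hypothetical, and the code assumes that scores are awarded in half-point steps, so that no score falls between the stated bands (for example between 5.5 and 6).

```python
def classify_quality(score: float) -> str:
    """Map a 0-12 quality score to the four quality bands used in the review."""
    if not 0 <= score <= 12:
        raise ValueError("score must lie between 0 and 12")
    if score < 3:
        return "very low"
    if score < 6:       # covers the 3-5.5 band (half-point steps assumed)
        return "low"
    if score < 9:       # covers the 6-8.5 band (half-point steps assumed)
        return "medium"
    return "high"       # 9 points or more


# Illustrative scores only, not the scores of specific studies.
for s in (1, 5.5, 6.5, 7.75, 9.5):
    print(s, classify_quality(s))
```

Using strict upper bounds rather than the printed band edges keeps the function total over the whole 0 to 12 range, which is a design choice of this sketch rather than something the review specifies.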

Data synthesis and analysis

Due to the expected heterogeneity of phenotypes and genetic markers, we did not aim for a quantitative data synthesis involving a meta-analytic approach. The results of the individual studies are presented and discussed according to their scores on the risk of bias and quality assessment, putting more emphasis on studies with a higher quality score. For the genetic variants identified in candidate gene studies, we also report their association with accelerometry-defined phenotypes using summary statistics [26] from the high-quality GWAS by Doherty et al. [27].

Most candidate gene studies only presented results that had a p-value < 0.05 (i.e., nominally statistically significant). To avoid bias in the extracted results from the different studies, we only retrieved associations that had a p-value < 0.05 from studies that also reported results with higher p-values.

Results

Search results and selection of studies

Figure 1 shows a PRISMA flowchart of the study selection process. In total, 6697 records were identified through the database search and 10 records through inspection of reference lists or citation tracking. After removal of 1287 duplicates, 5420 records were screened at title and/or abstract level and 109 full-text articles were assessed for eligibility. Of these, 54 articles were found eligible for inclusion in the current review (online supplementary 3).
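The record counts in the paragraph above can be checked with simple arithmetic. The sketch below uses only the numbers reported in the text; the two exclusion counts it derives are not stated in the text and assume that every record not excluded at title/abstract screening proceeded to full-text assessment.

```python
# Counts as reported in the text.
database_records = 6697
other_records = 10          # reference lists or citation tracking
duplicates_removed = 1287
screened_reported = 5420
full_text_assessed = 109
included = 54

# Records screened = all identified records minus duplicates.
screened = database_records + other_records - duplicates_removed
assert screened == screened_reported

# Derived counts (assumption: simple differences between stages).
excluded_at_screening = screened - full_text_assessed
excluded_at_full_text = full_text_assessed - included

print(screened, excluded_at_screening, excluded_at_full_text)
```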

Characteristics of the included studies

Table 1 shows the main characteristics of the included studies. Among the included studies, 48 used a candidate gene approach and six were GWAS, of which three also examined candidate genes. The phenotypes of physical activity and sedentary behaviour were operationalized in a variety of ways and mainly measured by questionnaires. In total, 12 studies used objectively measured physical activity data (accelerometry), of which two were GWAS.

Table 1 Characteristics of the included studies in alphabetical order, grouped by study type (i.e., GWAS and candidate gene studies)

Critical appraisal

Risk of bias and methodological quality varied considerably between the included studies (Table 2). The scores for the GWAS ranged from 7 to 9 with a median of 7.75. One GWAS was considered high quality [27]. Among the 48 candidate gene studies, the scores varied from 1 to 9.5, with a median of 6.5. Three candidate gene studies were considered high quality [32, 33, 41], 35 medium quality, while 10 studies were considered low or very low quality.

Table 2 Quality/risk of bias assessment for the included GWAS and candidate gene studies. Studies sorted in descending order according to quality score (high to low)

Few studies described a priori sample size calculation and only two studies described blinded genotyping. Most GWAS scored high on description of the genotyping process and phenotype definition, but only three studies used a validated self-reported instrument or objective measurement of physical activity or sedentary behaviour. Most of the candidate gene studies had a limited description about the quality control for the genotyping process.

Associations between genes and habitual physical behaviour

The characteristics of the included studies (Table 1) and the risk of bias assessment (Table 2) are presented according to type of study (i.e., GWAS or candidate gene study). Table 3 shows the results for medium and high-quality studies ordered by chromosome. For a more detailed overview for all included studies see online supplementary 4 (GWAS) and 5 (candidate gene studies).

Table 3 Genotype-phenotype associations in medium (6–8.5 points) and high (≥9 points) quality candidate gene studies and GWAS. For GWAS, results with a genome-wide significance level of p < 5 × 10⁻⁸ or lower are presented. GWAS are indicated by grey cells. Studies are sorted according to chromosome position

GWAS

In the six included GWAS, several SNPs were identified that were associated with physical activity or sedentary behaviour (Table 3 and online supplementary 4). Three studies [27, 29, 30] used a genome-wide significance level of p < 5 × 10⁻⁸ or lower. The high-quality GWAS [27] was based on data from the UK Biobank and identified three loci associated with overall physical activity and four loci associated with sedentary behaviour. Also based on data from the UK Biobank, Klimentidis et al. [30] identified 10 loci that were associated with at least one of four physical activity phenotypes (i.e., moderate-to-vigorous physical activity, vigorous physical activity, strenuous sport or other exercises and overall physical activity level assessed by accelerometry). SNPs in CADM2 were associated with all three phenotypes, whereas SNPs in EXOC4 were associated with the first two. One SNP in DPY19L1 was associated with vigorous physical activity only. Hara et al. [29] found one SNP (rs10252228) associated with regular leisure time physical activity. This SNP was located in the intergenic region between NPSR1 and DPY19L1, and the SNP was also significant in replication samples. Heritability estimates varied from 1.3% in the study by Hara et al. [29], who used self-report to measure leisure time physical activity, to 21% for overall activity in the study by Doherty et al. [27], who used accelerometry (online supplementary 4).

Three of the GWAS also included candidate gene analyses of genes that have previously been reported to be associated with physical activity [28, 29, 31]. Lin et al. [31] reported a statistically significant association (p < 5 × 10⁻³) for SNPs in several loci, including SNPs close to GABRG3, CYP19A1, PAPSS2 and CASR. Hara et al. [29] found a weak association for a SNP in DNAPTP6 with leisure time physical activity, but the association was not statistically significant after Bonferroni correction (p < 0.05/6). De Moor et al. [28] reported statistically significant associations (p < 0.01) for SNPs in LEPR and CYP19A1.
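The Bonferroni-type thresholds quoted in this section follow from dividing a family-wise error rate by the number of tests. The following is a minimal sketch with a hypothetical helper name, `bonferroni_threshold`. The 0.05/6 threshold is the one reported for Hara et al. [29]; reading the genome-wide level of 5 × 10⁻⁸ as 0.05 divided by roughly one million independent tests is the conventional rationale and is stated here as an assumption, not as something the included studies report.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Six candidate SNPs tested at a family-wise alpha of 0.05.
candidate = bonferroni_threshold(0.05, 6)

# Assumption: about one million independent tests across the genome.
genome_wide = bonferroni_threshold(0.05, 1_000_000)

print(f"{candidate:.4f}")    # 0.0083
print(f"{genome_wide:.0e}")  # 5e-08
```

The contrast between the two values illustrates why a nominally significant candidate gene finding (p < 0.05) can be far from genome-wide significance.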

Candidate gene studies

The candidate gene studies showed associations (p < 0.05) between variants in 30 different genes and physical activity and/or sedentary behaviour (Table 3 and online supplementary 5). The high-quality study by Bruneau et al. [32] found an association between walking distance per week and an insertion/deletion polymorphism of a 287-bp Alu repeat sequence within intron 16 of ACE (rs4340). This polymorphism was also found to be associated with both physical activity and sedentary behaviour in two medium quality studies [43, 52] and in one low quality study [51]. However, the GWAS by Lin et al. [31] and De Moor et al. [28] did not successfully replicate SNPs in or close to the ACE gene. In another high-quality candidate gene study, Bruneau et al. [33] found an association between light intensity physical activity and a SNP in IL15RA (rs2228059). In total, variants in nine candidate genes (ACE, CASR, CYP19A1, FTO, DRD2, CNR1, LEPR, MC4R, NPC1) were found to be associated with physical activity or sedentary behaviour in more than one study. Variants in or close to MC4R were associated with physical activity in three medium quality studies [34, 40, 42]; however, the GWAS by De Moor et al. [28] and Lin et al. [31] did not report an association between physical activity and SNPs in the vicinity of the MC4R gene.

Online supplementary 6 shows effect size, standard error and p-value for genetic variants from candidate gene studies associated with accelerometry-defined phenotypes reported in GWAS summary statistics [26, 27]. All p-values were above the conventional threshold of 5 × 10⁻⁸; however, we observed that the FTO variant (rs9939609) reported to be associated with sitting time by Klimentidis et al. [39] and the NPC1 variant (rs1805081) reported to be associated with physical activity level by Reddon et al. [47] had p-values of 0.04 and 0.002 for sedentary behaviour and overall activity, respectively.

Discussion

This systematic review provides an overview of genetic variants associated with physical activity or sedentary behaviour. Fifty-four studies met the inclusion criteria, of which six studies were GWAS and 48 studies were candidate gene studies. While the quality scores for the GWAS were medium-to-high, most of the included candidate gene studies showed low-to-medium quality. The GWAS reported up to 10 loci that were significantly associated with physical activity or sedentary behaviour, and variants in nine candidate genes were found to be associated with physical activity or sedentary behaviour in more than one study. However, the available evidence was not consistent, and the included studies had several limitations that prevent us from drawing firm conclusions about valid and robust genotype-phenotype associations.

In line with previous reviews [12, 19, 20] we noted that phenotype definitions of physical activity varied considerably between studies, including constructs such as walking distance [32], low-intensity physical activity [45, 52], moderate intensity physical activity [34, 36, 49], vigorous physical activity [34, 37], energy expenditure [29, 31, 48], engagement in sports activities [44, 51], meeting recommended levels of physical activity [50], and physical activity level from childhood to adolescence [46]. Moreover, these phenotype definitions were in many studies based on instruments with poor validity. Likewise, phenotype definitions of sedentary behaviour were in several studies based on self-reports, which have been shown to have very poor validity [53,54,55].

Self-reported measures of physical activity and sedentary behaviour are prone to measurement error and misclassification [56], and findings on genotype-phenotype associations should therefore be interpreted with caution. Likewise, there are some limitations related to the use of objective measurements to define phenotypes that should be considered when interpreting the results [57,58,59]. For example, a single accelerometer may not capture all relevant activity [60, 61], and the use of different cut-off points and methods for processing the accelerometer data is known to create large and significant differences in the estimated physical activity level [62, 63]. Thus, it is possible that the studies included in this review capture different aspects of physical activity and sedentary behaviour. Accordingly, the inconsistent findings across studies on the associations between genetic variants and physical activity or sedentary behaviour may partly be related to discrepancies in the measurements of physical activity and sedentary behaviour and the resulting phenotype definitions.

Although recent advancements in methods and technology allow fast and accurate analyses of whole-genome samples [64, 65], only six GWAS have investigated genetic variants associated with physical activity. Moreover, only one GWAS has investigated genetic variants associated with sedentary behaviour and only two GWAS used objective measurements to define phenotypes. Several SNPs were associated with physical activity, but few SNPs or genes have been identified in more than one study. One exception is SNPs close to the DPY19L1 gene, which were identified by two medium quality GWAS [29, 30]. The molecular mechanism behind the association between DPY19L1 and physical behaviour remains elusive. DPY19L1 may be required for proper radial migration of glutamatergic neurons, a major excitatory component of the mammalian neocortex [66]. Hara et al. [29] found an association between self-reported physical activity and the rs10252228 SNP, which is located in the intergenic region between NPSR1 and DPY19L1. This study comprised individuals of Japanese ancestry and the findings were confirmed in replication samples. Likewise, Klimentidis et al. [30] found an association between the rs328902 SNP close to DPY19L1 and self-reported leisure time physical activity. However, the genetic effect sizes in the latter study were small, and the replication cohort was considered insufficiently powered to replicate the associations. It should also be noted that the two GWAS by Klimentidis et al. [30] and Doherty et al. [27] are based on the same data from UK Biobank. The few GWAS performed to date, along with the variable study size, different phenotypes of physical behaviours, and the wide range of ethnicities (e.g., Caucasians, Japanese, and African American), make it difficult to compare the GWAS. Moreover, a GWAS requires a large sample size to be adequately powered to adopt a significance level that accounts for multiple testing [15]. With a recommended genome-wide significance threshold of p < 5 × 10⁻⁸ [67, 68], most GWAS in this review were underpowered to detect all the possible heritability explained by the SNPs (three out of six GWAS used the recommended threshold level of p < 5 × 10⁻⁸). Moreover, GWAS have been criticized because they can locate markers across the genome that have no direct biological relevance to the phenotype of interest [15]. Nevertheless, this is a rapidly growing area of research and one can overcome several limitations by larger sample sizes and advancements in technology, methodology and computing. Future studies may therefore have the potential to identify missing signals, account for population stratification, identify rare mutations, identify gene-environment interactions, and correspondingly, explain more of the heritability [15, 65].

Despite the widespread use of candidate gene studies, our review shows that this approach has produced only a few replicated associations related to physical activity or sedentary behaviour. Nine out of 30 candidate genes were found to be associated with physical activity in more than one study. The explanation for these inconsistent findings may be linked to the small study samples and the heterogeneity of the definitions of physical activity phenotypes. Population-based candidate gene studies with large study samples with adequate statistical power were rare. Most candidate gene studies had rather small sample sizes, and the likelihood of identifying a true genetic variant may therefore be low.

There may exist a complex set of genetic, environmental, and phenotypic factors that connect physical activity and sedentariness to other behavioural traits [40, 69, 70], and we cannot exclude the possibility of pleiotropic effects (i.e., a single genetic variant affecting multiple traits), nor that these effects are influenced by the phenotype definitions. For instance, two candidate gene studies reported an association between FTO and self-reported physical activity and time spent sitting [35, 39]. However, these findings were not supported by studies using objective measurements [41] or a more well-defined physical activity index [38]. It has been argued that candidate gene studies are insufficient for identifying the genetic contribution to variation in physical activity [71] and that the genetic susceptibility to a physically active or inactive lifestyle should be studied in the context of social and environmental factors [11, 19, 72], i.e., gene-environment interactions are expected to explain some of the unexplained heritability [73, 74]. In most of the included studies, the conclusions were based on p-values and many studies did not present an estimate for the associations under study, making it impossible to make a judgement about the strength of the association. Together with predominantly small sample sizes this might introduce biased results. It is also possible that the strong focus on p-values leads to publication bias and selective reporting [21]. The findings from the candidate gene studies should therefore be interpreted in view of unclear or unknown effect sizes, small study samples, and the possible influence of sociodemographic and environmental factors.

Strengths of the current systematic review include the comprehensive literature search in two bibliographic databases supervised by a trained research librarian, the use of checklists to assess risk of bias/methodological quality and blinding of reviewers during data extraction. However, the quality assessment could be problematic since all potential sources of bias are weighted equally. Since bias in genetic association studies is not completely understood, evidence on which study characteristics are most important is lacking [21]. Another limitation is that we only retrieved associations from candidate gene studies that were nominally statistically significant (i.e., p-value < 0.05), since most authors only showed statistically significant results. Thus, potentially important associations from small studies may have been omitted from this review. Furthermore, although we excluded studies that only reported physical activity related to active transport or occupational activity, few studies reported whether occupational physical activity or work-related sedentariness were included in their measurements. This might bias the reported associations since occupational physical activity can be constrained by the type of occupation and work tasks. Obtaining accurate and detailed measurements of physical activity behaviour (type of activity, duration, intensity, frequency, and domains [leisure, work, transportation]) is critical to understanding the genetic contribution to physical activity behaviour. This is underlined by heritability being greater in studies using objective measurements of physical behaviour [27, 29, 30]. Future studies should therefore aim at using objective measurements to obtain more well-defined phenotypes, enabling identification of more robust genetic instruments for physical activity behaviour. This could in turn provide the basis for Mendelian randomisation studies to improve causal inference about the effect of physical activity and sedentary behaviour on morbidity and mortality, and thus evade some of the central challenges of conventional epidemiological studies, such as confounding, reverse causation and measurement error [16].

This systematic review shows that several genetic variants are associated with physical activity or sedentary behaviour. However, findings across studies are inconsistent and the results should be interpreted with caution due to methodological shortcomings, such as the large variation in phenotype definitions, study designs, and study populations. Moreover, replication issues are prominent in this field and there is a general lack of high-quality studies. Thus, our review highlights the need for more high-quality GWAS with consistent phenotype definitions using objective measurements to elucidate the genetic influence on physical activity and sedentary behaviour.