Associations of polygenic inheritance of physical activity with aerobic fitness, cardiometabolic risk factors and diseases: the HUNT study

Physical activity (PA), aerobic fitness, and cardiometabolic diseases (CMDs) are highly heritable multifactorial phenotypes. Shared genetic factors may underlie the associations of higher levels of PA with better aerobic fitness and a lower risk of CMDs. We aimed to study how PA genotype is associated with self-reported PA, aerobic fitness, cardiometabolic risk factors and diseases. The PA genotype, which combined variation in over one million gene variants, was constructed using the SBayesR polygenic scoring methodology. First, we constructed a polygenic risk score (PRS) for PA in the Trøndelag Health Study (N = 47,148) using UK Biobank single nucleotide polymorphism-specific weights (N = 400,124). Associations of the PA PRS with continuous variables were analysed using linear regression models, and associations with CMD incidence using Cox proportional hazards models. Genotypes predisposing to more PA were associated with greater self-reported PA (Beta [B] = 0.282 MET-h/wk per SD of PRS for PA, 95% confidence interval [CI] = 0.211, 0.354) but not with aerobic fitness. These genotypes were also associated with a healthier cardiometabolic profile (waist circumference [B = -0.003 cm, 95% CI = -0.004, -0.002], body mass index [B = -0.002 kg/m2, 95% CI = -0.004, -0.001], high-density lipoprotein cholesterol [B = 0.004 mmol/L, 95% CI = 0.002, 0.006]) and a lower incidence of hypertensive diseases (Hazard Ratio [HR] = 0.97, 95% CI = 0.951, 0.990), stroke (HR = 0.94, 95% CI = 0.903, 0.978) and type 2 diabetes (HR = 0.94, 95% CI = 0.902, 0.970). The observed associations were independent of self-reported PA. These results support earlier findings suggesting small pleiotropic effects between PA and CMDs and provide new evidence on the associations of polygenic inheritance of PA with intermediate cardiometabolic risk factors.
Supplementary Information: The online version contains supplementary material available at 10.1007/s10654-023-01029-w.


Introduction
Observational studies have found that a greater daily physical activity (PA) volume is associated with a decreased risk of cardiometabolic diseases (CMDs), and that the association is at least partly accounted for by effects on intermediate risk factors [1][2][3]. Clinical trials have shown that PA also reduces intermediate risk factors, whereas strong evidence based on disease outcomes is largely lacking. Based on existing studies, PA is recommended for the prevention and treatment of CMDs [4].
Shared genetic factors may underlie the association between lifestyle behaviour and disease risk observed in epidemiological studies. Twin studies have suggested that PA may be influenced by genetics [5,6], and more recently, molecular genetic studies have identified multiple genetic loci associated with PA. Yet the fraction of variance accounted for by measured genetic variants (i.e., the single nucleotide polymorphism (SNP) heritability) is modest (8 to 16%) and lower than the estimates from twin and family studies [7,8]. It has been proposed that individuals with favourable PA genotypes participate more frequently in PA. These individuals often have better cardiorespiratory fitness [5] and may adopt a physically active lifestyle more easily than those with less favourable genotypes [6]. This gene-environment interplay could hamper assessment of the independent environmental contribution to an active lifestyle, because the healthy lifestyle an individual adopts may be partly influenced by their genotype. Previous studies have also suggested that genetic factors affect disease risk [9,10]. If the genetic influences on PA behaviour and on disease risk overlap, genetic pleiotropy may also bias studies of PA-CMD associations.
Anja Bye and Elina Sillanpää contributed equally to this study. Elina Sillanpää: elina.sillanpaa@jyu.fi
Polygenic risk scores (PRSs) may provide new insights into the genetic basis of the associations among lifestyles, disease risks and mortality. PRSs are individual-level genetic risk estimates generated by summarising genome-wide SNPs and their associated effect sizes into a single variable. PRSs have been used to estimate an individual's genetic propensity for multiple diseases and traits [11,12]. Recently, PRSs for the lifetime risk of coronary heart disease have been found to weakly improve prediction models for coronary heart disease compared with models based on traditional risk factors [13]. Sillanpää et al. [14] found that PRSs for PA (PA PRSs) were weakly associated with several noncommunicable diseases, suggesting small pleiotropic effects. In a multi-ancestry meta-analysis of genome-wide association studies (GWASs) including over 700,000 individuals, Wang et al. [8] found that self-reported moderate-to-vigorous leisure-time PA (LTPA) correlated weakly or moderately at the genomic level with multiple anthropometric characteristics, lifestyle factors, noncommunicable diseases and biomarkers. One possible reason for these modest associations is the difficulty of measuring PA accurately over periods long enough to obtain informative and reliable measures of lifetime physical activity.
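The weighted-sum construction of a PRS can be sketched in a few lines. This is a minimal illustration and not the SBayesR/GCTB pipeline used in this study. It assumes a dosage matrix `dosages` (individuals × SNPs, counts of the effect allele) and a vector of per-SNP `weights` that have already been estimated elsewhere. The toy numbers are invented, and the helper names `polygenic_score` and `standardise` are hypothetical. The score is also standardised, because the associations in this study are reported per SD of the PRS.

```python
import numpy as np


def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical helper: one score per individual = sum over SNPs of dosage * weight.

    dosages: (n_individuals, n_snps) effect-allele counts (0, 1, 2) or imputed dosages.
    weights: (n_snps,) per-SNP effect sizes (illustrative values here).
    """
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("number of SNPs in dosages and weights must match")
    return dosages @ weights


def standardise(score: np.ndarray) -> np.ndarray:
    """Rescale to mean 0 and SD 1 so that effects can be read per SD of the PRS."""
    return (score - score.mean()) / score.std(ddof=1)


# Toy example: 3 individuals, 3 SNPs (made-up numbers).
dosages = np.array([[0, 1, 2],
                    [2, 0, 1],
                    [1, 1, 1]], dtype=float)
weights = np.array([0.10, -0.20, 0.05])

scores = polygenic_score(dosages, weights)  # raw weighted allele sums
z = standardise(scores)                     # per-SD scale used in the regression models
```

A genome-wide score like the one used here sums over roughly one million SNPs, and its weights come from the adjusted SNP effects that SBayesR estimates from GWAS summary statistics. The final scoring step is still this dot product.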
In the present study, we applied a previously developed PA PRS for self-reported moderate PA volume [15]. We first assessed the association of the PA PRS with LTPA in the Trøndelag Health Study (HUNT; Fig. 1). We then examined whether the PA PRS, as a measure of PA genotype, was associated with aerobic fitness, cardiometabolic risk factors and disease outcomes, the latter derived from Norwegian health register data. We hypothesised that the PA PRS would be associated with LTPA, aerobic fitness and disease-related phenotypes, possibly because of pleiotropic effects. In addition, we tested whether these associations were independent of self-reported LTPA measured at the same time point. Such independence would support the hypothesis that the PA genotype directly affects aerobic fitness, cardiometabolic risk factors and the incidence of CMDs.

Study cohorts
We used Pan-UK Biobank GWAS summary statistics from the data-sharing repository as base data for the PA PRS calculation [16] (Fig. 1). The Pan-UK Biobank GWAS included 458,541 participants (45.9% men), and our analysis sample was restricted to persons of European ancestry (N = 400,124). The Pan-UK Biobank participants are not a representative sample of the general UK population because they are somewhat healthier [17]. For more information, see the Supplementary material (Supplementary methods).
The PRS was computed, and the association analyses were conducted, in an independent Norwegian cohort: the HUNT Study, one of the largest population-based health studies worldwide. Notably, the HUNT data are not part of the Pan-UK Biobank data. The HUNT Study is a unique database of questionnaire data, clinical measurements and biological samples from over 120,000 participants, collected in four waves over 35 years [18]. We calculated the PA PRS for participants who had both genotype and self-reported LTPA data available in HUNT3 (the third measurement round of HUNT). The HUNT3 data were collected between 2006 and 2008, and the study sample included 47,148 participants with a mean age of 52.9 years (range, 19.1-100.8 years; 45.9% men; Table 1).
To analyse the association between the PA PRS and aerobic fitness, we used data from the HUNT3 Fitness Study [19], a subcohort of HUNT3. The HUNT3 Fitness Study consists of healthy adults (age ≥ 20 years) who were free from cardiovascular disease (CVD), respiratory symptoms and cancer, and who did not use blood pressure medication. This subcohort included 4,462 genotyped participants with self-reported LTPA and directly measured aerobic fitness (mean participation age, 48.5 years [range, 19.2-89.2 years]; 49.1% men; Table 1). The HUNT3 Fitness Study is one of the largest European reference materials on cardiorespiratory fitness in the adult population [19]. Sex-specific descriptive tables for all cohorts are provided in the Supplementary material (Sex-specific association analyses and Supplementary Tables S1 and S2).
Survival analyses between the PA PRS and CMD incidence were conducted in 24,960 genotyped HUNT3 participants who had given permission to use their health register data. The register data were derived from the Nord-Trøndelag Health Trust discharge register (1987-2017) and linked to HUNT3 phenotype data using personal identification numbers. The mean participation age of this subsample in HUNT3 was 59.1 years (range, 19.1-100.8 years), and 46.5% of the participants were men (Table 1).

Genotyping, quality control, and imputation
The UK Biobank Axiom Array was used for genome-wide genotyping in the Pan-UK Biobank. Genotyping, quality control and imputation for the Pan-UK Biobank are described in detail in the Pan-UK Biobank documentation [20]. HUNT participants were genotyped with one of three Illumina HumanCoreExome arrays (HumanCoreExome12 version 1.0, HumanCoreExome12 version 1.1 and UM HUNT Biobank version 1.0) according to standard protocols. Genetic principal components (PCs) were calculated from pruned SNP data to account for clustering related to ancestry. The genotyping, quality control and imputation of the HUNT Study have been described in detail by Brumpton et al. [21].
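Computing ancestry PCs from pruned genotypes can be sketched as a singular value decomposition of the standardised genotype matrix. The sketch below is a generic illustration, not the HUNT pipeline described by Brumpton et al. [21]. It assumes an already LD-pruned 0/1/2 genotype matrix held in memory. The function name `genetic_pcs` is hypothetical, and the simulated two-population data are invented only to show that the leading PC separates groups with different allele frequencies.

```python
import numpy as np


def genetic_pcs(genotypes: np.ndarray, n_pcs: int = 10) -> np.ndarray:
    """Hypothetical helper: (n_individuals, n_pcs) PC scores from an LD-pruned 0/1/2 matrix."""
    p = genotypes.mean(axis=0) / 2.0                   # allele frequency of each SNP
    keep = (p > 0.0) & (p < 1.0)                       # monomorphic SNPs carry no information
    g, p = genotypes[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))   # centre and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)    # z = U S V^T
    return u[:, :n_pcs] * s[:n_pcs]                    # projections onto the top PCs


# Simulated data: two populations of 100 individuals with diverging allele frequencies.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(100, n_snps)),
    rng.binomial(2, freq_b, size=(100, n_snps)),
]).astype(float)
population = np.repeat([0, 1], 100)

pcs = genetic_pcs(genotypes, n_pcs=10)  # the study adjusted for 10 PCs
pc1_vs_population = np.corrcoef(pcs[:, 0], population)[0, 1]
```

Using these PC scores as regression covariates absorbs ancestry-related structure. A dense in-memory SVD like this is only practical for small examples; biobank-scale data require dedicated software.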

Polygenic risk scoring
We utilised a previously derived PRS for self-reported moderate PA [15]. In the original Pan-UK Biobank GWAS, moderate PA was determined from the self-report question on the 'number of days/week of moderate PA 10+ min' [16]. To construct the PA PRS in the HUNT Study, we used the SBayesR summary-statistics methodology implemented in the GCTB software [22], with the Pan-UK Biobank GWAS providing the SNP-specific weights used in the computation. This methodology is based on a Bayesian multiple regression model and a reference linkage disequilibrium (LD) matrix estimated from genotype correlations. The GWAS summary statistics and the HUNT data were restricted to European HapMap3 [23] variants with minor allele frequency > 5%, excluding the major histocompatibility complex region on chromosome 6 (GRCh37: 6p22.1-21.3). Restricting subjects to European ancestry minimises the risk of false positives due to population stratification [24]. The PA PRS was computed in the HUNT Study data as the sum of risk alleles weighted by the risk-allele effect sizes from the Pan-UK Biobank. The utilised PA PRS is a genome-wide score, and the number of SNPs was restricted to 1,006,313 for computational purposes.
Fig. 1 Design and workflow of the study. A polygenic score for questionnaire-based moderate physical activity was derived from the Pan-UK Biobank genome-wide association study summary statistics and computed in the third measurement round of the HUNT Study (HUNT3; N = 47,148). Association analyses were conducted in HUNT3 and its subcohorts. The cardiometabolic disease (CMD) endpoints were derived from a Norwegian hospital discharge register.

LTPA variable
Average weekly LTPA in HUNT3 was assessed with three questions on frequency, intensity and duration. We calculated MET-h/wk by recoding the response values for LTPA intensity and duration and multiplying frequency, intensity and duration for each participant [25]. See the Supplementary material (Supplementary methods) for a detailed description of the International Physical Activity Questionnaire (short format) items used in the HUNT Study to assess the dimensions of LTPA.

Aerobic fitness
In the HUNT3 Fitness Study, aerobic fitness was measured as maximal oxygen uptake (VO2max) during an individualised treadmill test protocol continued until volitional exhaustion [19]. Standard respiratory parameters were measured using mixing-chamber gas analyser ergospirometry (Cortex Meta-Max II, Cortex, Leipzig, Germany). VO2max was expressed as millilitres of oxygen per kilogram of body weight per minute (mL/kg/min). Not all participants reached true VO2max, defined as a plateau in oxygen consumption despite increasing workload together with a respiratory exchange ratio above 1.05. Therefore, we use the term peak oxygen consumption (VO2peak) for aerobic fitness. A detailed description of the aerobic fitness test protocol is given in the Supplementary material (Supplementary methods).

Clinical measurements
Standardised clinical measurements were performed in HUNT3. These included diastolic and systolic blood pressure, waist circumference, body mass index (BMI), and blood lipids and lipoproteins measured from fresh, nonfasting venous blood samples [18]. Blood pressure (mmHg) was measured three times at 1-minute intervals using a Dinamap 845XT (Critikon, Tampa, USA), and the arithmetic average of the second and third measurements was used. A combined scale (Model DS-102, Arctic Heating AS, Nøtterøy, Norway) was used to measure weight (kg) and height (cm). BMI was calculated as weight divided by height squared (kg/m2). Waist circumference (cm) was measured horizontally at the height of the umbilicus.

Cardiometabolic disease endpoints
CMD endpoints were identified using International Statistical Classification of Diseases and Related Health Problems (ICD-9 and ICD-10) codes derived from the Nord-Trøndelag Health Trust discharge register (1987-2017) (Supplementary material; Supplementary Table S10). The quality of CMD diagnoses in Norwegian registers has previously been validated [27]. The ICD codes included in each endpoint category (disease group) were selected according to the FinnGen Data Freeze 9 categorisation, which is available on the FinnGen webpages [28]. This was done to allow comparison with the results of a previous Finnish study [14]. The FinnGen disease endpoints have been defined by expert groups that include medical doctors from different fields of specialisation.

Smoking status
Self-reported smoking status had four response options: 'Never smoked', 'Ex-smoker', 'Daily smoker' and 'Occasional smoker'. For the analyses, responses were reclassified into two dichotomous variables: never smokers vs. others and current smokers vs. others.

Alcohol consumption
In HUNT3, alcohol consumption was determined as the total quantity of pure ethanol in grams per week (g/wk). It was calculated from the variables "Alcohol Frequency Last 12 months", "Alcohol Beer Last 2 Week(s) Number", "Alcohol Wine Last 2 Week(s) Number" and "Alcohol Liquor Last 2 Week(s) Number". In Norway, one unit of alcohol is equal to (1) 33 cl of beer (4.5%, 11.9 g of pure ethanol), (2) 15 cl of wine (12%, 14.4 g of pure ethanol) or (3) 4 cl of liquor (40%, 12.8 g of pure ethanol). For each beverage type, the number of units consumed in the prior two weeks was multiplied by the average ethanol content. These products were then summed over all beverage types and divided by two to estimate weekly consumption as grams of ethanol per week.

Socioeconomic status
Socioeconomic status (SES) was classified according to the participant's occupational title in HUNT3. The Norwegian occupation codes are based on the European standard of the International Standard Classification of Occupations, ISCO-88(COM). Like ISCO-88, the Norwegian version has nine major categories. These were recoded into three categories according to the ISCO-88 occupational skill levels [29]. Skill category one (high) includes managers, professionals and technicians (ISCO-88(COM) major categories one to three). Skill category two (medium) includes clerical, service and sales workers, skilled agricultural and trades workers, and plant and machine operators and assemblers (ISCO-88(COM) major categories four to eight). Skill category three (low) includes elementary occupations (ISCO-88(COM) major category nine).

Statistical analyses
In the first part, we tested the associations of the PA PRS with LTPA using linear regression models. All models were adjusted for HUNT3 participation age, sex and 10 genetic PCs. The model was then further adjusted for weekly alcohol consumption, smoking status and SES.
Second, the associations of the PA PRS with aerobic fitness and cardiometabolic risk factors were analysed using linear regression models with the same covariates. When necessary, outcome variables were log- or square-root-transformed to approximate a normal distribution as closely as possible (absolute skewness ≤ 0.5 and kurtosis ≤ 0.5). Model assumptions (linearity, homoscedasticity and outliers) were examined using plots and relevant statistics and tests before the final modelling. Including genetic PCs as covariates reduced the risk of false positives due to population stratification [30]. Additionally, because some PRS-sex interactions were found, all analyses were also performed separately by sex (Supplementary material; Supplementary Tables S3-S8).
Third, Cox proportional hazards models were used to analyse the association between the PA PRS and CMD incidence. The proportional hazards assumption was assessed using scaled Schoenfeld residuals, and outlier removal was assessed based on df-beta statistics. The number of events in the different disease categories ranged from 934 (stroke) to 19,387 (all CVDs; Fig. 3). We assumed that the human genome stays nearly constant over the life course, which allowed us to set follow-up to start at each individual's birth year. Participants were followed up until the year of the first CMD event or until contact with the individual was lost (no subsequent healthcare visits). Separate follow-up times were created for each CMD analysis. For each CMD category, the incidence rate per 10,000 person-years was calculated by dividing the number of CMD events by the total number of person-years and multiplying the result by 10,000. Every participant in the register data either developed the disease or was censored during their last year of follow-up. That final year was therefore assumed to contribute, on average, half a year of follow-up time. We also conducted sensitivity analyses that excluded participants whose disease onset predated the HUNT3 data collection, with follow-up starting at the HUNT3 laboratory visit (Supplementary material; Supplementary Table S9). In these sensitivity analyses, we could also adjust the CMD analyses for LTPA and other lifestyle and socioeconomic covariates.
Changes in the outcome variables and in CMD incidence were estimated per standard deviation (SD) increase in the PRS, and the standardised PA PRS was used in all models. The significance threshold was set at P < 0.05, with no adjustment for multiple testing. For linear regression models, effect sizes were assessed with the squared semipartial correlation coefficient. For time-to-event models, hazard ratios were expressed as comparable Cohen's d effect size estimates based on the approaches presented by Chen et al. [31] and Rahlfs and Zimmermann [32].

Descriptive characteristics of the cohorts
In the present study, we used three subcohorts of the HUNT Study, which is one of the largest health-related cohorts in Europe. The HUNT3 dataset consists of 47,148 participants, including 21,658 (45.9%) men and 25,490 (54.1%) women. Baseline descriptive statistics of the participants are shown in Table 1. Participants were aged 19 to 101 years. On average, they were mildly overweight and had slightly elevated cardiometabolic risk factors. Sex-specific tables for all cohorts are provided in the Supplementary material (Supplementary Tables S1 and S2).

The HUNT3 Fitness Study, a further subcohort of HUNT3, consists of 4,462 participants, including 2,191 (49.1%) men and 2,271 (50.9%) women, with a mean participation age of 48.5 years (range, 19-89 years). The mean peak oxygen consumption (VO2peak) was 40.0 mL/kg/min. Compared with HUNT3, the mean PA PRS and self-reported LTPA were higher in this subcohort, and its participants were healthier in terms of cardiometabolic risk factors.
For the survival analyses between the PA PRS and CMD incidence, we used data from the Nord-Trøndelag Health Trust discharge register (1987-2017). These data covered 24,960 HUNT3 participants (mean birth year 1944 [range 1907-1988]; 46.5% men). The average age at CMD onset was 60 years (range, 4-99 years).

Associations between polygenic score for physical activity, self-reported leisure time physical activity and aerobic fitness
First, we derived a genome-wide PRS (over 1 million SNPs) for self-reported moderate PA using Pan-UK Biobank summary statistics (Phenotype manifest 2020, phenocode 884) [16]. We then determined the proportion of variation explained by the PA PRS in LTPA in HUNT3 and in aerobic fitness (VO2peak) in the HUNT3 Fitness Study. The PA PRS was statistically significantly associated with self-reported LTPA (B = 0.282 metabolic equivalent of task hours per week [MET-h/wk] per SD of PA PRS, 95% confidence interval [CI] = 0.211, 0.354; Table 2). The association remained significant when smoking status, weekly alcohol consumption and socioeconomic status (SES) were added to the model (P < 2 × 10^-16). The PA PRS accounted for 0.13% of the variation in LTPA. However, the PA PRS was not statistically significantly associated with VO2peak (B = 0.093 mL/kg/min, 95% CI = -0.112, 0.299) in the HUNT3 Fitness Study. The squared semipartial correlations indicated low explanatory strength (< 0.13%).
Table 2 legend: Model 1: adjusted for participation age, sex and 10 genetic principal components; Model 2: adjusted for participation age, sex, 10 genetic principal components and PA PRS; Model 3: adjusted for participation age, sex, 10 genetic principal components, PA PRS, smoking status, alcohol consumption and socioeconomic status. PA PRS = polygenic risk score for moderate physical activity. LTPA = leisure time physical activity. MET-h/wk = metabolic equivalent of task hours per week. VO2peak = peak oxygen consumption. B = standardized regression coefficient. CI = confidence interval. R2 = coefficient of determination. 100ΔR2 = R-square difference between current and previous model multiplied by 100. Bold type indicates statistical significance at the level of P ≤ 0.05.

Associations between polygenic scores for physical activity and cardiometabolic risk factors
Second, we tested the associations of the PA PRS with cardiometabolic risk factors in HUNT3. The risk factors were diastolic and systolic blood pressure, waist circumference, BMI, and the concentrations of total, HDL and LDL cholesterol and triglycerides. A one-SD increase in the PA PRS was statistically significantly associated with lower waist circumference (B = -0.003 cm per SD of PA PRS, 95% CI = -0.004, -0.002), lower BMI (B = -0.002 kg/m2, 95% CI = -0.004, -0.001) and higher HDL cholesterol (B = 0.004 mmol/L, 95% CI = 0.002, 0.006) (Fig. 2). The variance explained by the PA PRS was low (< 0.001-0.050%). The associations remained statistically significant when self-reported LTPA was added to the models (P < 0.001, P = 0.021 and P = 0.016, respectively). They also remained significant when smoking status, weekly alcohol consumption and SES were added (P < 0.001, P = 0.028 and P = 0.004). Additionally, when weekly alcohol consumption, smoking status and SES were added to the model for systolic blood pressure, the association between systolic blood pressure and the PA PRS was statistically significant (P = 0.037), suggesting a slightly stronger protective association among women. No statistically significant associations were observed with the other cardiometabolic risk factors. The sex-PA PRS interaction was statistically significant only for BMI (max P = 0.024). The sex-stratified association analyses are presented in the Supplementary material (Supplementary Tables S5 and S6).
Fig. 2 Associations between the polygenic risk score for physical activity (PA PRS) and cardiometabolic risk factors in HUNT3. Model 1: adjusted for participation age, sex and 10 genetic principal components; Model 2: adjusted for participation age, sex, 10 genetic principal components and PA PRS; Model 3: adjusted for participation age, sex, 10 genetic principal components, PA PRS and leisure-time physical activity; Model 4: adjusted for participation age, sex, 10 genetic principal components, PA PRS, leisure-time physical activity, smoking status, alcohol consumption and socioeconomic status. HDL = high-density lipoprotein. LDL = low-density lipoprotein. N = number of all participants. B = standardized regression coefficient. CI = confidence interval. R2 = coefficient of determination. 100ΔR2 = R-square difference between current and previous model multiplied by 100.

Associations between polygenic score for physical activity and cardiometabolic diseases
Finally, we tested the associations of the PA PRS with CMD incidence among participants who had consented to the use of their health register data (the Nord-Trøndelag Health Trust discharge register). A one-SD increase in the PA PRS was associated with a 5% lower hazard of cerebrovascular diseases (hazard ratio [HR] = 0.95, 95% CI = 0.917, 0.984) and a 3% lower hazard of hypertensive diseases (HR = 0.97, 95% CI = 0.951, 0.990). It was also associated with a 6% lower hazard of stroke (HR = 0.94, 95% CI = 0.903, 0.978) and a 6% lower hazard of type 2 diabetes (HR = 0.94, 95% CI = 0.902, 0.970; Fig. 3). These statistically significant effects were small when expressed as comparable Cohen's d effect size estimates.
No significant associations were observed between the PA PRS and other diseases, and their effect sizes were also small. The sex-PA PRS interaction was statistically significant only for pulmonary CVDs, which included pulmonary heart disease and pulmonary circulation diseases (P = 0.026). However, the sex-specific analyses for pulmonary CVDs did not show statistically significant associations.
In the sensitivity analyses (Supplementary Table S9), the HRs for cerebrovascular diseases, hypertensive diseases and stroke remained statistically significant but were slightly lower than the HRs from the main analyses (Fig. 3). The association with type 2 diabetes was no longer statistically significant in the sensitivity analyses. The effect sizes of the statistically significant HRs also remained small in the sensitivity analyses when expressed as Cohen's d effect size estimates.
Fig. 3 Associations between the polygenic risk score for physical activity (PA PRS) and cardiometabolic diseases using Cox proportional hazard models among 24,960 participants free of cardiometabolic diseases at baseline. Hazard ratios (HRs) with their respective confidence intervals (CIs) per standard deviation of PA PRS for overall cardiovascular diseases and for specific outcomes identified in the hospital register are illustrated graphically and numerically. CVD = cardiovascular disease.

Discussion
In the current study, we constructed a polygenic score for self-reported moderate PA [15] in a large Norwegian population-based study and used it as a measure of PA genotype. The PA PRS was statistically significantly associated with self-reported LTPA but accounted for only 0.13% of the variance in LTPA. The PA genotype was also statistically significantly associated with some cardiometabolic risk factors and with the incidence of several CMDs, but not with aerobic fitness. These observations are consistent with previous findings suggesting that participants whose genotype supports lower PA volumes tend to participate slightly less in LTPA [15]. Such participants may also be at slightly higher risk of developing some major CMDs than participants with a genetic predisposition for high PA [14]. This could suggest small pleiotropic effects, that is, the same genetic variation regulating both PA behaviour and the risk of CMDs. However, the associations, although statistically significant, were minor and may not be clinically relevant. Overall, the PA PRS has low predictive power, possibly because of acknowledged inconsistencies in assessing PA in cohort studies [33] and limitations of the PRS methodology [34]. For example, self-reports are prone to bias related to personal characteristics, and according to meta-analyses, self-reported and device-based measures can yield discrepant estimates of PA [35]. Device-based measures of PA have low repeatability [36] and do not account for the effects of ageing on the relative intensity of activity [37], which makes it difficult to estimate associations with health variables.
Because LTPA is a behavioural trait, its genetic variation is not expected to be explained by single gene variants but rather by a large set of different gene variants. According to the polygenic model, each variant has its own effect on the LTPA phenotype; these effects vary in magnitude but are mostly small. Many GWASs have discovered statistically significant gene variants related to LTPA phenotypes [7,8]. However, attempts to replicate these findings have not been successful. GWASs require very large sample sizes and accurate phenotype measurements to reach reasonable power to detect significant loci, and both have been a challenge for physical activity and sports-related phenotypes. There are several inherent problems in many LTPA measurements [38]. For example, LTPA levels within individuals vary over the lifespan, and even day-to-day variation in activity is large [39,40]. Harmonising PA data across cohorts often leads to oversimplification of PA behaviour.
PRSs combine the effects of many single nucleotide polymorphism gene variants, here based on trait-risk association, into a single individual-level score [12]. To the best of our knowledge, at least three different PRSs have been used to describe the PA genotype [8,15]. Kujala et al. [15] constructed a PRS for self-reported moderate PA volume ('number of days/week of moderate PA 10+ min'). They also constructed a second PRS for device-based overall PA volume, measured over 7 days with an Axivity AX3 wrist-worn triaxial accelerometer. These scores used the Pan-UK Biobank as base data, included variation in over 1.1 million SNPs and were obtained using a Bayesian approach. In two Finnish cohorts, the self-reported PA PRS explained 0.24% and 0.25% of the variation in daily self-reported MET scores. The predictive value of the device-based PA PRS was largest for device-based daily steps (1.44%) and smallest for the self-reported daily MET score (0.07%).
Our current results from the Norwegian cohort are consistent with those reported in the Finnish cohorts [15]. Our PA PRS explained a statistically significant but small proportion of the variation in LTPA (0.13%), slightly less than for the different PA phenotypes in the Finnish cohorts (0.24-0.25%). This was to be expected, because the HUNT3 LTPA variable was constructed differently from the Pan-UK Biobank moderate PA variable. Recently, Wang et al. [8] created a PRS for self-reported dichotomous moderate-to-vigorous PA; in their study, the effect sizes were small and largely nonsignificant. Taken together, previous findings and ours suggest that current PA PRSs explain only a very small amount of the variation in LTPA.
Earlier studies using a rat model reported that inherited high aerobic fitness was associated with higher levels of spontaneous PA [41,42]. Hanscombe et al. [43] found that genetic variants expressed mainly in the heart, arteries, lungs, skeletal muscle and adipose tissue were associated with both aerobic fitness and device-based overall PA. They reported a moderate genetic correlation (rg = 0.37) between device-based overall PA and cardiorespiratory fitness (VO2max). Based on this evidence, we hypothesised that higher PA PRSs would be associated with better aerobic fitness in humans. To the best of our knowledge, no previous study has used polygenic scores to assess the association between the genetic inheritance of PA and aerobic fitness.
The results of the current study did not support our hypothesis of shared genetic variation behind PA and aerobic fitness. There are several potential explanations for this. The low heritability (approx. 5%) reported for the Pan-UK Biobank GWAS suggests that the lack of association may reflect the weak associations of the PA PRS, which can lead to low statistical power in our relatively small and healthier subcohort. It is also possible that the PA PRS mainly includes genetic variants related to PA behaviour rather than to physiological determinants of aerobic fitness. In addition, mitochondrial genome variation may explain some of the missing associations with aerobic fitness [44]. Finally, the baseline characteristics suggest selection bias in these analyses, because both the PA PRS (2.0 vs. 1.9 per 10^7 units) and LTPA (10.1 vs. 7.8 MET-h/wk) were somewhat higher in the HUNT3 Fitness Study than in HUNT3. Selection bias is commonly observed in sport science research; in our study, it may have limited the variance in aerobic fitness.
Previous studies have suggested that regular participation in LTPA, and increases in LTPA, may have a positive impact on cardiometabolic health [45,46]. Managing cardiometabolic risk factors through adequate levels of LTPA may decrease the risks of many CMDs, but the potential role of shared genetic factors has been unclear. As far as we know, the current study is the first to report associations between a PA PRS and laboratory-measured cardiometabolic risk factors. Our results suggest that the predispositions to unhealthy behaviour (low PA) and to CMDs potentially overlap to a small degree. We found that a genotype supporting a lower PA volume was statistically significantly but weakly associated with unfavourable cardiometabolic health, measured as clinically validated intermediate risk factors. In the total group of participants, a lower PA PRS was statistically significantly but weakly associated with higher waist circumference, greater BMI and lower HDL cholesterol concentration. These results were in line with the associations observed between the PA PRS and diseases in our study and in earlier studies [14]. Future studies are needed to investigate how genetics affects individual training responses of cardiometabolic risk factors and adherence in interventions.
Few studies have examined the genetic predisposition for PA and its association with noncommunicable diseases. Recently, Sillanpää et al. [14] used a large Finnish biobank study, FinnGen, and found that the PA PRS was weakly associated with lower CMD incidence. In our study, the associations of the PA PRS with stroke, hypertension and type 2 diabetes were comparable to those observed in the FinnGen study. However, the PA genotype was not statistically significantly associated with the incidence of CVD (all), coronary atherosclerosis or ischaemic heart diseases in our study. In the Finnish population, by contrast, an increase in PA PRS was related, to a small degree, to a reduction in the incidence of these diseases. The PA PRS in the current study was developed from self-reported moderate PA, whereas Sillanpää et al. [14] used a PA PRS based on continuous device-based overall PA volume. Also, the larger cohort size (N = 218,792) and the use of logistic regression modelling, which did not consider the follow-up time in the …
… data, the results of the Cox regression were in line with those from the main analyses. It is also commonly known that PRSs may not be portable across cohorts representing different genetic ancestries [49]. The Pan-UK Biobank population (British) and Norwegians are both genetically of European ancestry but may have some minor genetic differences [50]. These differences may affect how well a PA PRS constructed from the genetic data of British participants transfers to a Norwegian population. To address possible genetic differences between the cohorts, all analyses were adjusted for the first 10 PCs to account for population stratification. In addition, the selected subcohort with fitness measurements was smaller, younger, more physically active and healthier than the whole HUNT3 cohort.

Conclusions
Our results have provided complementary evidence that polygenic inheritance of PA statistically significantly overlaps with cardiometabolic diseases and their associated intermediate risk factors and that these associations were not substantially changed when LTPA was included as a predictor. However, in general, the PA PRS explained only a minor proportion of variance in the studied phenotypes. A major limitation in this field is the use of varying methodologies to measure PA, which complicates harmonisation of different data sets and hinders the development of adequately powered datasets. For the first time, we tested the association between PA PRS and aerobic fitness, here as measured as VO 2 peak, which was found to not be statistically significant in the HUNT3 Fitness Study. However, different PRSs derived using aerobic fitness variable might reveal stronger associations between genetic predisposition for PA and aerobic fitness. In addition, in this field, large-scale collaborative efforts are needed to pool together genotyped datasets with measured aerobic fitness. To conclude, the current study suggests some similarities in the genetic inheritance of PA behaviour and development of cardiometabolic diseases. However, currently the PA PRS is not expected to have clinical utility in health promotion but improved PRSs constructed based on different types of device-based PA measurements should be tested. previous study, may have made smaller effects statistically significant.

Supplementary Information
The sex differences in our association analyses were generally minor. We found that an increase in PA PRS was statistically significantly but weakly associated with lower BMI in women but not in men. Some studies have found sex-specific genetic effects associated with BMI variance [47,48], suggesting modifications depending on lifestyle factors arising from calorie intake, PA and sedentary behaviour. We also found a statistically significant sex-PA PRS interaction in pulmonary heart disease and pulmonary circulation diseases. This could indicate commonly known differences in health behaviour between genders (e.g., more frequent smoking among men) and biological sex differences affecting cardiometabolic health, such as the protecting effect of estrogen before menopause in women. Overall, sex differences were very marginal and can also be related to sample-to-sample variations.

Strengths and limitations
There were several markable strengths in our study compared with previous research. We were able to assess the associations between the PA genotype, directly measured aerobic fitness and clinically assessed cardiometabolic risk factors and diseases in one of the largest population-based health studies worldwide. This has not been possible in earlier population-based studies [14]. We utilised a stateof-the-art method for quantifying the PA genotype and robust analyses to evaluate the associations between the PA genotype and outcomes. These novel approaches elucidated whether the associations between genetic inheritance of PA, cardiometabolic risk factors and diseases were confounded by LTPA, hence helping to test the hypothesis of shared genetic associations between PA and aerobic fitness.
There were several notable limitations. First, the moderate PA phenotype of the Pan-UK Biobank GWAS differed from the LTPA phenotype available for PRS computation in HUNT3. This may have lowered the predictive capability of the PA PRS. The diagnosis codes for the CMDs studied were comprehensive, but the data of the Nord-Trøndelag Health Trust discharge register (1987-2017) may have included identification bias because the Norwegian Patient Registry started to use personal identification codes only from the year 2008 onwards [27]. The major weaknesses in the Cox regression models were that we were not able to separately analyse fatal CMDs, the register data patient number was relatively low for this kind of genetic analysis and we did not have exact death dates available. Although our sensitivity analyses, including confounding variables and adjustment for initial measurement time, reduced the sample size by a loss of more than half of the participants due to missing holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.
HUNT Databank personnel. The Gerontology Research Center is a joint effort between the University of Jyväskylä and the University of Tampere.
Author contributions Elina Sillanpää, Jaakko Kaprio and Urho Kujala conceived the idea for the study. Anja Bye and Marie Klevjer accessed and verified the data and provided support for the use of HUNT cloud. Teemu Palviainen and Elina Sillanpää designed and supervised the construction of polygenic scores and Niko Paavo Tynkkynen conducted the analyses under supervision of Teemu Palviainen. Elina Sillanpää, Timo Törmäkangas and Jaakko Kaprio designed and supervised the statistical analysis. Niko Paavo Tynkkynen, Matti Hyvärinen and Timo Törmäkangas performed the statistical modelling. Niko Paavo Tynkkynen, Elina Sillanpää, Timo Törmäkangas, Matti Hyvärinen, Laura Joensuu, Urho Kujala and Jaakko Kaprio performed the interpretation of the data. Niko Paavo Tynkkynen and Elina Sillanpää drafted the first version of the manuscript, and Timo Törmäkangas and Laura Joensuu contributed significantly to writing. Elina Sillanpää and Anja Bye acquired the funding for the study. All the authors have been involved in drafting the manuscript or revising it critically for important intellectual content; they have approved the analyses done and they have given their final approval of the version to be published.
Funding Open Access funding provided by University of Jyväskylä (JYU). This study is a part of the GenActive project which is funded by the Academy of Finland (341750, 346509 to E.S.), Juho Vainio Foundation (E.S.) and Päivikki and Sakari Sohlberg foundation (E.S.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders.
Data Availability The summary statistics for the self-reported moderate PA GWAS from the Pan-UK Biobank are available at https://pan. ukbb.broadinstitute.org/. The used HUNT Study dataset are available under restricted access by application to HUNT Databank (https:// hunt-db.medisin.ntnu.no/hunt-db/#/).

Declarations
Competing interests Authors declare no conflict of interests.

Ethical statements The North West Multi-Centre Research Ethics
Committee approved the UK Biobank study (approval number: 11/ NW/0382). HUNT3 was approved by the Regional Committee for Medical and Health Research Ethics (no. 29771). HUNT Study was approved by the Norwegian Data Inspectorate, and by the National Directorate of Health. This study was performed according to the guidelines of the Finnish Advisory Board on Research Integrity (http:// www.tenk.fi/en/), good scientific practice and current legislation.

Consent to participate
The principles of informed consent in the Declaration of Helsinki were implemented, and written informed consents were received from all participants.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright