Introduction

Parkinson’s disease (PD) is a multifactorial complex disorder featuring dopaminergic neuron loss and the pathological hallmarks of α-synuclein and Lewy bodies1. Although highly efficacious symptomatic therapeutics are available, curative therapies remain scarce2. This can be explained by the unclear pathogenesis of neurodegeneration and its insidious onset, which predates typical symptoms by 5–6 years3. Previous research suggested that mitochondrial dysfunction, defective protein degradation, and oxidative stress are important prodromal molecular pathways involved in PD pathogenesis4. Nevertheless, the molecular pathways underlying the pathogenesis of PD remain incompletely understood. Therefore, disentangling additional pathways related to the future development of PD is needed for a more comprehensive understanding of the aetiology of PD and may help to reveal potential targets for intervention.

Metabolic profiling is an emerging area of research. Unlike conventional approaches of detecting biomarkers, metabolomics provides a comprehensive roadmap of biological processes in real-time and can reflect the integrated effects of genetics, lifestyle, and environmental factors5. It may therefore unveil novel pathways for multifactorial diseases such as PD6. The value of metabolite biomarkers for predicting health outcomes has also been shown by several previous studies7,8.

A series of studies have characterised the metabolomics profile in individuals with PD9,10,11,12,13,14,15, but differences in the analytical techniques used, the metabolite panels quantified, and the statistical analyses performed have produced inconclusive findings16. Moreover, only a limited number of studies have investigated the metabolomics of individuals before the onset of PD, and only a few metabolites were found to have a robust association with PD17. There is also a lack of evidence comparing the metabolic profiles of prevalent PD and incident PD, which could inform how the metabolome changes across different stages of PD. Finally, whether metabolite profiles predating the onset of PD can predict the future development of PD remains unclear.

The UK Biobank study is a large prospective study with comprehensive health-related information and has quantified the metabolomic profiles of more than 110,000 participants18. With these resources, the current study aims to investigate the metabolite profile alterations associated with incident PD in the large-scale sample derived from the UK Biobank Study and whether metabolites can provide added value for the prediction of future PD risk.

Results

Study population

A total of 109,991 participants were included in the present study with a mean (standard deviation [SD]) age of 56.5 (8.10) years, and 53.8% were females. Among the 109,790 participants without PD at baseline, 644 (0.59%) cases of incident PD were identified over a median follow-up of 12.2 years (range: 0.01–14.0; interquartile range [IQR]: 11.5–12.9). Baseline characteristics of the participants stratified by incident PD are shown in Table 1. Participants who developed PD were more likely to be older and male, to have higher blood pressure and a history of diabetes, hyperlipidemia and stroke, to take psychotropic medications, and to carry APOE and GBA variants. The baseline characteristics of participants stratified by prevalent PD at baseline are described in Supplementary Table 2.

Table 1 Baseline characteristics of participants stratified by incident PD

Circulating metabolites associated with incident PD

Given the chronic onset of PD, we excluded the participants who developed PD within one year from baseline when exploring the association between plasma metabolites and incident PD. The results of the metabolic associations for the remaining 639 incident PD cases are shown in Figs. 1, 2 and Supplementary Table 4. After adjusting for all covariates, 68 of the 249 measured metabolites were associated with incident PD at nominal significance, spanning lipid subgroups (cholesterol, cholesterol esters, fatty acids [FA], free cholesterol, phospholipids, triglycerides, and other lipids), lipoprotein particle concentrations, total lipids, lipid constituents of lipoprotein subclasses and ratios of lipid constituents. All metabolites were inversely associated with incident PD except for four metabolites in the subgroup of lipid ratios in lipoprotein subclasses: cholesteryl esters to total lipids ratio in chylomicrons and extremely large very-low-density lipoprotein (VLDL), phospholipids to total lipids ratio in intermediate-density lipoprotein (IDL) and in small low-density lipoprotein (LDL), and free cholesterol to total lipids ratio in very large VLDL (Fig. 2).

Fig. 1: Manhattan plot of all 249 metabolites investigated for incident PD.

All 249 metabolites were categorised into 19 subgroups. Metabolites above the grey line were associated with PD at nominal significance (P < 0.05), and those annotated above the red line remained significantly associated after multiple testing correction (\(P < 9 \times 10^{-4}\)). P values for the association between each metabolite and PD risk were derived from Wald tests.

Fig. 2: Forest plot of 68 metabolites associated with PD at nominal significance.

The estimated scales and directions of the associations (expressed as hazard ratios and 95% confidence intervals) between 68 metabolites and incident PD.

After correcting for multiple comparisons, two metabolites, omega-6 FA (HR = 0.86, 95% CI: 0.79–0.94, \(P = 8.71 \times 10^{-4}\)) and polyunsaturated FA (PUFA) (HR = 0.86, 95% CI: 0.79–0.94, \(P = 8.98 \times 10^{-4}\)), remained significantly associated with incident PD (Figs. 1, 2 and Supplementary Table 4).

After further adjusting for the frailty index in the model, the results were similar to the main analysis (Supplementary Table 3). A total of 92 metabolites were associated with incident PD, and omega-6 FA and PUFA remained significantly associated with incident PD.

Circulating metabolites comparison between prevalent and incident PD

Participants who developed PD within one year from the baseline were considered to be in the prevalent PD group. A total of 133 metabolites were associated with prevalent PD at nominal significance and 15 metabolites were associated with prevalent PD after multiple testing corrections (Supplementary Table 4). At nominal significance, 56 metabolites were shared by both prevalent and incident PD, falling in categories of lipoprotein subclasses, FA, free cholesterol, lipoprotein particle concentrations, triglycerides, cholesterol, cholesterol esters, choline, phospholipids, and total lipids. After multiple testing corrections, prevalent and incident PD shared one overlapping metabolite, PUFA. Omega-6 FA was exclusively associated with incident PD, and 14 metabolites were exclusively associated with prevalent PD, including amino acids (tyrosine and valine), FAs (omega-3 FA, DHA, and total FA), different lipoprotein subclasses and ratios of lipids.

Predictive value of added metabolites in 10-year PD risk

Supplementary Fig. 1 shows the receiver operating characteristic (ROC) curves of the conventional risk-factor-based model, the metabolite-based model and the combined model for the prediction of PD risk over 10 years. The area under the curve (AUC) for the conventional risk-factor-based model was 0.766 (95% CI: 0.746–0.787). The model comprising omega-6 FA and PUFA achieved an AUC of 0.580 (95% CI: 0.554–0.607). The AUC after adding omega-6 FA and PUFA to the risk-factor-based model was 0.768 (95% CI: 0.748–0.788). Despite the slight increase in AUC, the performance of the combined model was comparable to that of the risk-factor-based model (P = 0.145).

Discussion

Our study identified 68 metabolites associated with incident PD, including lipids, all sizes of lipoprotein subclasses, and their concentrations and ratios of lipid constituents. Two metabolites, PUFA and omega-6 FA, remained inversely associated with PD after correction for multiple testing, and PUFA was the metabolite shared by prevalent and incident PD. Many of these metabolites, including lipoproteins and lipid constituents, were supported by previous studies examining the molecular signatures of PD. The present study provided additional evidence on all sizes of lipoprotein subclasses (chylomicron and extremely large VLDL, VLDL, LDL, and HDL), their lipid constituents and ratios of components. These findings help to clarify the different pathways related to the development of PD.

Fatty acid metabolism was one of the most important pathways found in the analyses, with several of the most robust biomarkers of PD falling into this category. Total FA, saturated FA, monounsaturated fatty acids, PUFA, omega-3 FA and omega-6 FA (two classes of PUFA), and linoleic acid (a short-chain omega-6 FA) were inversely associated with PD in our study. Previous prospective longitudinal studies and clinical trials have provided supportive evidence of the inverse association between the intake of fatty acids and the risk of PD19,20,21,22. The neuroprotective properties of unsaturated FA could be attributed to their roles in inflammation resolution, immune modulation, and oxidative stress alleviation23,24,25,26. Specifically, PUFA derivatives can modulate dopaminergic activity in the basal ganglia27. In vitro studies have found that omega-6 FA (linoleic acid and arachidonic acid) could restore the viability of MPP-induced Parkinsonism cell models28, which could explain our findings on the protective benefits of omega-6 FA. However, our findings are the first to suggest an association of saturated FA with incident PD. Previous studies on dietary intake found no association between saturated FA and incident PD20,22, and animal experiments even found adverse effects of saturated FA29. Considering the differences between simple dietary intake and complex metabolic processes, and between humans and rats, these findings warrant further investigation in cohort studies.

Lipoproteins of various particle sizes and subclasses (total cholesterol, LDL-cholesterol, HDL-cholesterol, VLDL-cholesterol and chylomicrons) were inversely associated with PD in the present study. These findings were supported by previous studies30,31,32,33,34, although inconsistent results were found in some research31,35,36. The potential mechanism underlying the association could be the critical neuroprotective role of lipids in repairing or alleviating PD pathology in the central nervous system (CNS)20,35. Higher levels of cholesterol could be recruited to sustain synaptogenesis37 or sequester ferrous iron to prevent iron-induced oxidative stress38. Alternatively, given that the brain can synthesise cholesterol de novo39, the involvement of lipids in PD pathogenesis might start in the periphery, such as the gut31. For example, α-synuclein (α-syn) accumulation found in the enteric nervous system may predate CNS pathology and its presence may be an indicator of early PD pathogenesis40. Indeed, gastrointestinal symptoms are common in PD patients, and malnutrition is among the manifestations, which provides a rationale for the lower lipid levels found in these patients41. Research on VLDL-cholesterol and chylomicrons is relatively limited. Only one cross-sectional study found that lower serum VLDL-cholesterol was associated with PD42, partially supporting our finding of an inverse association for VLDL. Because chylomicrons are triglyceride-rich lipoproteins, their inverse association with PD might be attributed to their composition, which is largely triglyceride and cholesterol.

Triglycerides and cholines were also found to be negatively associated with incident PD. A growing number of studies and meta-analyses have suggested that the role of triglycerides in PD is protective35,36,43. Moreover, Laguna et al. found decreased triglycerides in a specific lipoprotein subclass (LDL) during the conversion from the prodromal phase to dementia with Lewy bodies (DLB), which shares similar hallmark pathology with PD44. Cholines are essential nutrients critical for brain development and basic biological processes in cells, such as the synthesis of plasma membrane lipids45,46,47. Decreased levels of specific cholines such as phosphatidylcholines and sphingomyelins have been found in various brain structures of PD patients48,49,50.

Interestingly, among the 68 metabolites associated with PD, only four were positively associated with incident PD, all of which were ratios of component lipids in different sizes of lipoprotein particles. It could be postulated that, although these lipids were considered protective when examined separately, their interplay and balance could be relevant to biological processes that are yet to be understood. The value of lipoprotein ratios is an emerging area of research in cardiovascular diseases, and some ratios may even be better indicators of risk than lipoproteins in isolation51,52,53,54. There are limited studies on these biomarkers in neurodegenerative diseases, so their implications require further scrutiny.

Although we identified metabolites that were significantly associated with PD, the improvement in predictive value from adding them to the conventional risk-factor-based model was subtle. This may be explained by the fact that PD risk factors already in the model, such as diabetes, hypertension and hyperlipidemia, are themselves accompanied by changes in the metabolic profile, which may have masked the added value of the selected metabolites. Nevertheless, our study provided additional insight into the underlying mechanism of PD from a metabolic perspective, including a detailed breakdown of the lipid constituents and ratios of lipids in different sizes of lipoproteins, which were rarely studied previously. We also examined the metabolic profiles of prevalent and incident PD and found both exclusive and overlapping metabolites between the disease states, which potentially provides clues for future studies to corroborate the dynamics of molecular pathways.

PD is increasingly considered a complex heterogeneous disorder that has been classified into different subtypes in terms of clinical, hereditary, imaging and pathological features55. Whether different molecular pathways are involved in different subtypes remains to be elucidated55. Further studies are warranted to investigate metabolomics in the context of different subtypes of PD to increase the specificity and explore the potential additional value of this tool. Moreover, the protocol of blood sampling in the UK Biobank prevented us from understanding the potential fluctuations of the metabolite levels throughout the day, which needs to be considered when interpreting the results. Further studies should streamline the protocol of sample collection in order to examine the influence of timing on the metabolic profile of PD. In addition, although we compared the metabolic profiles between prevalent and incident PD, future studies are needed to investigate the longitudinal changes in the metabolomic profiles of participants from the prodromal phase to the clinical onset of PD. This will help to reveal the trajectory of changes in the metabolic profile and therefore enable the selection of target biomarkers for early intervention or evaluation of treatment response.

There are several strengths of our study, including a large sample size and a long-term follow-up period. In addition, nuclear magnetic resonance (NMR) spectroscopy demonstrates higher reproducibility and quantitative capabilities than mass spectrometry56. However, there are several limitations to mention. First, NMR platforms have lower sensitivity and selectivity, making it difficult to detect metabolites at very low concentrations for targeted analysis56. Second, although the longitudinal design allowed us to identify potential risk factors for incident PD, we could not infer causality. Therefore, our results warrant further investigation by interventional studies. Third, the UK Biobank study consists of a generally healthy, Caucasian cohort, which may make it prone to health selection bias. Despite this, the representativeness of PD within the sample would not affect our associations57. Fourth, although we adopted an algorithm-based strategy for incident PD ascertainment, incident PD cases may have been underestimated because of the lag between disease onset and diagnosis. Fifth, we selected the risk factors for PD risk prediction from the previous literature instead of using the Movement Disorder Society (MDS) criteria for prodromal PD, given the difficulty of defining all the markers using the UK Biobank data. Sixth, the results were not validated in external datasets, and some metabolomics data were derived from absolute values, calling for future external validation of the metabolic associations and of the validity of the derived metabolite indices. Lastly, we could not exclude residual confounding.

In the present study, we identified lipids, apolipoproteins, lipoprotein subclasses, and their concentrations and ratios of lipid constituents as metabolites associated with incident PD. In addition, the metabolic profiles of prevalent and incident PD were different but shared certain common metabolites. These findings suggest that metabolic profiles can provide additional insights into the pathogenesis of PD.

Methods

Study sample

The study sample was derived from the UK Biobank study, a cohort consisting of more than 500,000 participants aged 40–69 years across the UK18. Baseline recruitment was performed from 2006 to 2010 with comprehensive health-related information collected, and additional data were regularly added. Repeat visits and online follow-ups were also performed, and health outcomes were tracked longitudinally through electronic health-related records. A detailed protocol of the UK Biobank has been described in a previous study18. In the present study, participants with complete data on quantified metabolites and genetic data at baseline (n = 109,991) were included. Of these, 201 individuals had a history of PD at baseline, and five participants developed PD within one year from the baseline. Considering the chronic onset of PD, the remaining 109,785 participants were included in the analysis of the metabolic associations with incident PD. To compare the metabolic profiles between prevalent and incident PD, all 109,991 participants were included in the association study between metabolites and prevalent PD, with the 201 participants with diagnosed PD at baseline and the five participants diagnosed with PD within one year from the baseline forming the prevalent PD group.
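The participant flow can be checked with simple arithmetic. The sketch below uses only counts reported in this article (the 644 incident cases come from the Results); the variable names are illustrative and are not UK Biobank field names.

```python
# Counts reported in the article; variable names are illustrative only.
total_with_metabolites = 109_991    # complete metabolite and genetic data
prevalent_at_baseline = 201         # history of PD at baseline
incident_all = 644                  # incident PD among those free of PD at baseline
incident_within_one_year = 5        # diagnosed within one year of baseline

# Participants free of PD at baseline.
free_of_pd_at_baseline = total_with_metabolites - prevalent_at_baseline

# Sample and case count for the incident PD analysis (first-year cases excluded).
incident_analysis_sample = free_of_pd_at_baseline - incident_within_one_year
incident_cases_analysed = incident_all - incident_within_one_year

# First-year cases are counted with the prevalent PD group instead.
prevalent_group = prevalent_at_baseline + incident_within_one_year

print(free_of_pd_at_baseline, incident_analysis_sample,
      incident_cases_analysed, prevalent_group)
```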

Metabolites quantification

Details of the NMR platform and experimentation have been described elsewhere54. Nightingale Health (Finland) quantified the metabolites of EDTA plasma samples from approximately 120,000 UK Biobank participants. About 118,000 samples were collected at baseline of the UK Biobank study, and about 5000 were obtained at the repeat assessment. Samples were measured between June 2019 and April 2020. The blood samples were collected from participants in a non-fasting state, although participants were advised to fast for at least four hours beforehand58. A total of 249 metabolites (Supplementary Table 1) were quantified, including lipoprotein lipids of 14 different particle sizes, FA, amino acids, ketone bodies, and glycolysis metabolites.

Definition of PD

PD was defined by the UK Biobank algorithm using combined sources of self-reported PD through questionnaire and nurse-led interviews, hospital admission records and death registry (see details at http://www.ukbiobank.ac.uk). Vascular parkinsonism and atypical parkinsonism such as multiple system atrophy (MSA) and progressive supranuclear palsy were excluded. In hospital admission records and death registries, PD was ascertained by International Classification of Diseases, Ninth Revision (ICD-9) code 332.0 and Tenth Revision (ICD-10) code G20. Participants with PD at baseline were ascertained by self-report of PD diagnosis and hospital admission records. Incident PD was ascertained by hospital admission records and death registers after the baseline assessment. Follow-up ran from baseline to the first occurrence of PD, the date of death, or the last follow-up date, whichever came first.
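The follow-up rule (baseline to the earliest of PD onset, death, or last follow-up) can be sketched as below. This is a minimal illustration, not the UK Biobank algorithm: the function name, argument names, and the 365.25-day year are assumptions made for this example.

```python
from datetime import date
from typing import Optional, Tuple


def follow_up(baseline: date,
              pd_date: Optional[date],
              death_date: Optional[date],
              last_follow_up: date) -> Tuple[float, bool]:
    """Return (years of follow-up, incident PD indicator).

    Hypothetical helper: follow-up ends at the earliest of PD onset,
    death, or the administrative end of follow-up.
    """
    candidates = [last_follow_up]
    if pd_date is not None:
        candidates.append(pd_date)
    if death_date is not None:
        candidates.append(death_date)
    end = min(candidates)

    # The event counts only if PD is what ended follow-up
    # (this includes PD first recorded on the death register).
    event = pd_date is not None and pd_date == end

    # Assumption: convert days to years with 365.25 days per year.
    years = (end - baseline).days / 365.25
    return years, event


# A participant diagnosed 7 years after baseline, and one who died without PD.
case_years, case_event = follow_up(date(2008, 1, 1), date(2015, 1, 1),
                                   None, date(2020, 1, 1))
ctrl_years, ctrl_event = follow_up(date(2008, 1, 1), None,
                                   date(2012, 6, 1), date(2020, 1, 1))
```

Participants who die or reach the end of follow-up without PD are treated as censored, which is the form of input a Cox model expects.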

Covariates

Based on previous studies, the covariates in the present study included baseline age, sex, smoking status, body mass index (BMI), systolic blood pressure (SBP), treated hypertension, history of diabetes, history of hyperlipidemia, history of stroke and use of psychotropic medications33,59,60,61,62,63,64. Additional covariates included GBA variants, which increase the risk of PD and affect the levels of plasma apolipoproteins65,66, and the presence of APOE allelic variants, which interfere with lipoprotein levels in PD patients67. Age, BMI, and SBP were treated as continuous variables, and other variables were categorised as yes or no.

Treated hypertension was defined as participant-reported treatment with anti-hypertensive medications. History of diabetes was defined as an HbA1c level \(\ge\)6.5%, a diagnosis of diabetes, taking anti-diabetic medications, or being on insulin treatment. History of hyperlipidemia was defined as self-reported hyperlipidemia, taking anti-dyslipidemia medications, or a cholesterol level of 6.21 mmol/L or higher. History of stroke was defined as a first occurrence of stroke preceding the baseline assessment. Psychotropic medications included anti-depressive, anti-migraine, and anxiolytic medications.
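The covariate definitions above are "any of" rules and can be written as simple boolean functions. The following is a sketch only: the function and argument names are hypothetical, and treating a missing laboratory value as not meeting the threshold is an assumption that the article does not state.

```python
from typing import Optional


def has_diabetes(hba1c_percent: Optional[float],
                 diagnosed: bool,
                 on_antidiabetic_drugs: bool,
                 on_insulin: bool) -> bool:
    """History of diabetes: HbA1c >= 6.5%, a diagnosis, or treatment."""
    # Assumption: a missing HbA1c value does not satisfy the threshold.
    high_hba1c = hba1c_percent is not None and hba1c_percent >= 6.5
    return high_hba1c or diagnosed or on_antidiabetic_drugs or on_insulin


def has_hyperlipidemia(self_reported: bool,
                       on_lipid_lowering_drugs: bool,
                       cholesterol_mmol_l: Optional[float]) -> bool:
    """History of hyperlipidemia: self-report, treatment, or cholesterol >= 6.21 mmol/L."""
    # Assumption: a missing cholesterol value does not satisfy the threshold.
    high_chol = cholesterol_mmol_l is not None and cholesterol_mmol_l >= 6.21
    return self_reported or on_lipid_lowering_drugs or high_chol


# Illustrative participants (made-up values).
diabetes_by_lab = has_diabetes(6.7, False, False, False)
diabetes_none = has_diabetes(5.4, False, False, False)
lipid_by_drug = has_hyperlipidemia(False, True, 4.9)
lipid_missing_lab = has_hyperlipidemia(False, False, None)
```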

Statistical analyses

For the plasma metabolites, a natural logarithmic transformation was applied to the raw data, followed by Z score normalisation. Descriptive statistics were used to summarise the data. Categorical variables were described as numbers and percentages, and continuous variables were described with means and SDs or medians and IQRs. Variables were compared across groups using unpaired t tests for continuous variables or Chi-square tests for categorical variables. A two-sided P value less than 0.05 was considered statistically significant for the above tests.
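The metabolite preprocessing (natural log transformation followed by Z score normalisation) can be sketched as follows. The function name is hypothetical, and two details are assumptions because the article does not specify them: all concentrations are taken to be strictly positive, and the sample standard deviation is used for scaling.

```python
import math
import statistics
from typing import List


def log_z_scores(values: List[float]) -> List[float]:
    """Natural-log transform a metabolite, then standardise to mean 0 and SD 1.

    Assumption: every value is strictly positive; how zero or missing
    concentrations were handled is not described in the article.
    """
    logged = [math.log(v) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)  # sample SD (n - 1 denominator), an assumption
    return [(x - mean) / sd for x in logged]


# Illustrative concentrations whose logs are 0, 1 and 2.
example = log_z_scores([1.0, math.e, math.e ** 2])
```

With this scaling, a hazard ratio for a metabolite is read as the change in hazard per one SD increase in the log-transformed concentration.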

Cox proportional hazards models were applied to model the association between metabolites and incident PD. Logistic regression models were used to evaluate the associations between metabolic profiles and prevalent PD at baseline. All models were multivariable-adjusted. The confounders adjusted for in the main model were age, sex, smoking status, BMI, SBP, treated hypertension, history of diabetes, history of hyperlipidemia, stroke, use of psychotropic medications, presence of APOE allelic variants and GBA variants. We fitted an additional model further adjusted for the frailty index to investigate the association between metabolites and incident PD68. The frailty index was defined by a previously published method in the UK Biobank population, combining five aspects: weight loss, feeling of exhaustion, physical inactivity, walking speed and grip strength69. Associations were expressed as hazard ratios (HRs) or odds ratios (ORs) with 95% confidence intervals, and a P value less than 0.05 was considered nominally significant. Considering the potentially strong correlation between the metabolites, we performed a principal component analysis (PCA) according to a previously described method70 and found that 55 components accounted for 99.5% of the total variance. Therefore, we considered a P value of less than \(9 \times 10^{-4}\) (0.05/55) as statistically significant. Wald tests were employed to assess the statistical significance of the estimated HRs and ORs for each metabolite.
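The multiple-testing threshold and the Wald test can be illustrated with a short sketch. It assumes that the effective number of tests is the number of principal components needed to reach a target share of variance, and that the Wald statistic follows a standard normal distribution. The function names, the eigenvalues, and the coefficient and standard error in the example are hypothetical and are not the study's estimates.

```python
import math
from typing import List, Tuple


def n_components_for_variance(eigenvalues: List[float],
                              target: float = 0.995) -> int:
    """Count the principal components needed to reach `target` of total variance."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= target:
            return k
    return len(eigenvalues)


def wald_summary(beta: float, se: float) -> Tuple[float, float, float, float]:
    """Return (HR, lower 95% CI, upper 95% CI, two-sided Wald P value)."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P under a standard normal
    hr = math.exp(beta)
    return hr, math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se), p


# Threshold from the article: 0.05 divided by 55 components.
threshold = 0.05 / 55

# Made-up eigenvalues, only to show the counting step.
k_demo = n_components_for_variance([5.0, 3.0, 1.0, 0.5, 0.5], target=0.85)

# Hypothetical log hazard ratio and standard error for one metabolite.
hr, ci_low, ci_high, p_value = wald_summary(math.log(0.86), 0.044)
significant = p_value < threshold
```

Dividing by the number of components rather than by all 249 metabolites reflects the strong correlation between metabolites, so the correction is less conservative than a plain Bonferroni adjustment.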

Three logistic regression models were established to test the predictive values of conventional risk factors and specific metabolites for the development of PD within 10 years. Model 1 included traditional risk factors determined by reviewing previous research: baseline age, sex, smoking status, BMI, SBP, treated hypertension, history of diabetes, history of hyperlipidemia, history of stroke, use of psychotropic medications, and the presence of GBA variants33,59,60,61,62,63,64,65. Notably, these factors were not chosen based on the MDS Research Criteria for Prodromal PD71, given the lack of data to define the whole set of risk markers and prodromal markers in the UK Biobank. Model 2 was based on the metabolites associated with incident PD after correction for multiple tests. Model 3 combined all variables in Model 1 and Model 2. The area under the receiver operating characteristic curve (AUC) was used to estimate the performance of each model. The DeLong test was employed to assess the statistical significance of the difference between AUCs.
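The AUC comparison can be sketched in plain Python. The code below is an illustrative implementation of the DeLong test for two correlated AUCs computed on the same participants; it is not the Stata routine used in the study, and the function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if x > y, 0.5 if tied, otherwise 0."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _placements(pos: List[float], neg: List[float]):
    """Return the AUC plus per-case and per-control placement values."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, mean_a, b, mean_b) -> float:
    """Sample covariance of two equally long lists around the given means."""
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels: Sequence[int],
                scores_a: Sequence[float],
                scores_b: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (AUC of model A, AUC of model B, z statistic, two-sided P value)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two cases and two controls")
    auc_a, v10_a, v01_a = _placements([scores_a[i] for i in pos],
                                      [scores_a[i] for i in neg])
    auc_b, v10_b, v01_b = _placements([scores_b[i] for i in pos],
                                      [scores_b[i] for i in neg])
    # Variance of the AUC difference, split into case and control parts.
    var_cases = (_cov(v10_a, auc_a, v10_a, auc_a)
                 + _cov(v10_b, auc_b, v10_b, auc_b)
                 - 2 * _cov(v10_a, auc_a, v10_b, auc_b))
    var_controls = (_cov(v01_a, auc_a, v01_a, auc_a)
                    + _cov(v01_b, auc_b, v01_b, auc_b)
                    - 2 * _cov(v01_a, auc_a, v01_b, auc_b))
    variance = var_cases / len(pos) + var_controls / len(neg)
    if variance <= 0:
        raise ValueError("AUC difference has zero estimated variance")
    z = (auc_a - auc_b) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return auc_a, auc_b, z, p


# Toy data: 1 = developed PD, 0 = did not; scores are predicted risks.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
scores_b = [0.6, 0.7, 0.2, 0.5, 0.4, 0.3]
auc_a, auc_b, z, p = delong_test(labels, scores_a, scores_b)
```

This direct version compares every case with every control, so its cost grows with the product of the two group sizes; for a cohort of the size analysed here, a rank-based implementation would be preferable.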

All statistical analyses were performed using Stata version 13 (StataCorp LLC, College Station, Texas USA).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.