Data-driven subtyping of Parkinson’s disease: comparison of current methodologies and application to the Bochum PNS cohort

Chen, Qiang; Scherbaum, Raphael; Gold, Ralf; Pitarokoili, Kalliopi; Mosig, Axel; Zella, Samis; Tönges, Lars

doi:10.1007/s00702-023-02627-4

Neurology and Preclinical Neurological Studies - Original Article
Open access
Published: 31 March 2023

Volume 130, pages 763–776, (2023)
Journal of Neural Transmission

Abstract

Considerable efforts have been made to better describe and identify Parkinson's disease (PD) subtypes. Cluster analyses have been proposed as an unbiased approach to developing PD subtypes that could facilitate their identification, tracking of progression, and evaluation of therapeutic responses. A data-driven clustering analysis was applied to a PD cohort of 114 subjects enrolled at St. Josef-Hospital of the Ruhr University in Bochum (Germany). A wide spectrum of motor and non-motor scores, including polyneuropathy-related measures, was included in the analysis. K-means and hierarchical agglomerative clustering were performed to identify PD subtypes. The silhouette score and the Calinski–Harabasz score elbow were then employed as supporting evaluation metrics for determining the optimal number of clusters. Principal Component Analysis (PCA), analysis of variance (ANOVA), and analysis of covariance (ANCOVA) were conducted to determine the relevance of each score for the clusters’ definition. Three PD cluster subtypes were identified: early onset mild type, intermediate type, and late-onset severe type. The between-cluster analysis consistently showed highly significant differences (P < 0.01), except for one of the scores measuring polyneuropathy (Neuropathy Disability Score; P = 0.609) and Levodopa dosage (P = 0.226). Parkinson’s Disease Questionnaire (PDQ-39), Non-motor Symptom Questionnaire (NMSQuest), and the MDS-UPDRS Part II were found to be crucial factors for PD subtype differentiation. The present analysis identifies a specific set of criteria for PD subtyping based on an extensive panel of clinical and paraclinical scores. This analysis provides a foundation for further development of PD subtyping, including k-means and hierarchical agglomerative clustering.

Trial registration: DRKS00020752, February 7, 2020, retrospectively registered.

Introduction

Parkinson's disease (PD) is a complex and heterogeneous disorder in both clinical and paraclinical terms. The clinical presentation and progression of PD are highly variable and present different scenarios not only in various stages of the disease but also within a single stage (Barone et al. 2009; Beiske et al. 2009; Defazio et al. 2008; Fereshtehnejad et al. 2017; Ford 1998; Gallagher et al. 2010; Goetz et al. 1986; Hendricks and Khasawneh 2021; Jankovic et al. 1990; Kalia and Lang 2015; Koller 1984; Lawton et al. 2018; Snider et al. 1976; Wasner and Deuschl 2012). The identification of PD subtypes may therefore lead to further insights into pathophysiological mechanisms of disease but could also identify novel therapeutic targets and ultimately lead to improvements in patient care.

Usually, the clinical presentation of PD is described with three main clinical phenotypes: tremor-dominant phenotype, akinetic-rigid phenotype, and mixed or equivalence phenotype, the latter being a combination of the other two phenotypes without any dominant symptoms (The German Neurological Society 2016). The course of disease has been shown to be associated with the clinical phenotype, with the tremor-dominant phenotype developing more slowly and with a less severe course than the akinetic-rigid or equivalence phenotype (Jankovic et al. 1990; Wojtala et al. 2019).

However, the depth of phenotypic information in the aforementioned studies was often variable and limited. Each of these subgroups shows different clinical progression, and disease symptomatology consists not only of motor symptoms but also of disabilities from non-motor deficits (e.g., depression, anxiety, fatigue, orthostatic hypotension, sleep disturbances, and polyneuropathic hypoesthesia with resulting movement difficulties). So far, relatively few studies have focused on a more thorough analysis of these complex manifestations of the disease (Mestre et al. 2021). These manifestations vary not only between patients and within the same patient, but also with the clinical disease stage (Barone et al. 2009; Beiske et al. 2009; Defazio et al. 2008; Ford 1998; Gallagher et al. 2010; Goetz et al. 1986; Jankowicz et al. 1986; Kalia and Lang 2015; Koller 1984; Nègre-Pagès et al. 2008; Snider et al. 1976; Wasner and Deuschl 2012). Interestingly, previous work showed that polyneuropathy (PNP) has a high prevalence in people with PD and can be associated with non-motor and motor symptoms of PD, as well as with disease severity (Kühn et al. 2020). PNP has not previously been applied as an associated disease criterion in disease pattern clustering analyses.

In this paper, the authors propose a data-driven subtyping method that integrates motor and non-motor characteristics, including polyneuropathy-related scores of PD patients.

The main objectives of the present study were: (1) to perform cluster analysis using machine learning algorithms; (2) to identify the PD subtypes and compare the clinical characteristics of each subtype; and (3) to identify the crucial factors that differentiate the PD subtypes and analyze the association between those factors.

Data and methods

This original paper adhered to the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines for reporting observational studies (Elm et al. 2007). By following these guidelines, the authors aimed to ensure the transparency, completeness, and rigor of the current study, and to facilitate the critical appraisal and interpretation of the present findings by readers and reviewers.

Study design

The study was performed as a data-driven subtyping approach based on a cross-sectional sample from a single-center prospective observational cohort study (Kühn et al. 2020). The study was approved by the Institutional Review Board of the Medical Faculty of the Ruhr University Bochum on September 12, 2018 (Register No. 18-6360), registered in the German Clinical Trials Register (DRKS-ID: DRKS00020752), and conducted in accordance with the ethical standards of the Declaration of Helsinki.

Setting and participants

Data for the present analysis were collected from October 2018 to January 2022 at the Department of Neurology of a university medical center (St. Josef-Hospital of Ruhr University Bochum, Germany). As published previously (Kühn et al. 2020), eligibility criteria comprised an age over 18 years, a diagnosis consistent with the PD diagnostic criteria according to both the United Kingdom Parkinson’s Society Brain Bank criteria (Gibb and Lees 1988) and the Movement Disorders Society’s Criteria for Parkinson’s disease (Postuma et al. 2015), and written informed consent. Exclusion criteria comprised causes of neuropathy, such as diagnoses of diabetes mellitus or alcohol dependence disorder, as well as severe dementia, insufficient language skills, illiteracy, and acute mental disorders. Inpatients and outpatients of the department were screened for eligibility during the period of recruitment by review of the hospital information system. The participants included in this analysis are being followed up over several years, but only data from the baseline visit were included in the present analysis.

Variables and data sources

A comprehensive assessment including clinical examination and patient-reported questionnaires was performed. For the cluster analysis, 14 features were included:

(a) Demographic information: age and disease duration since diagnosis of PD.
(b) Motor symptoms: Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) Part II (motor experiences of daily living), Part III (motor examination), and Part IV (motor complications); the Hoehn and Yahr (H&Y) Stage.
(c) Non-motor symptoms: MDS-UPDRS Part I (non-motor experiences of daily living), Non-motor Symptom Questionnaire (NMSQuest) (Chaudhuri et al. 2006), Scales for Outcomes in Parkinson’s Disease-Autonomic (SCOPA-AUT) (Visser et al. 2004), and Parkinson’s Disease Questionnaire (PDQ-39) (Jenkinson et al. 1997).
(d) Cognitive function: Montreal Cognitive Assessment (MoCA) (Nasreddine et al. 2005).
(e) Other feature: levodopa equivalent daily dose (LED).
(f) Polyneuropathy examination: modified Neuropathy Disability Score (NDS) (Dyck et al. 1980; Xiong et al. 2015) and modified Neuropathy Symptom Score (NSS) (Dyck et al. 1980).

Statistical methods

A total of 114 patients were included in the present study. Missing values (4.45%) in the 14 features were imputed using the mean of each feature across the entire data set. Data are presented as means with standard deviations. Normality was assessed with quantile–quantile plots and the Shapiro–Wilk test, and homogeneity of variance with the Levene test. Univariate statistical tests were performed with Pearson's chi-square tests for categorical data. For continuous variables, analysis of variance (ANOVA) with the Kruskal–Wallis H test and post hoc analysis with Dunn’s test with Holm adjustment were performed. Analysis of covariance (ANCOVA) was conducted to adjust between-cluster comparisons for age and disease duration as potential covariates. Correlations between the 14 parameters were calculated with Spearman's rank method. The global significance level was set at α = 0.05. Statistical analyses were performed with Python version 3.9.14. Due to the observational and exploratory nature of the study, no sample size calculation was performed.
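The imputation and group-comparison steps described above can be illustrated with a minimal sketch. It runs on synthetic numbers, not on the cohort data, and all variable names are hypothetical. Dunn's test is not part of SciPy, so pairwise Mann–Whitney U tests are swapped in here purely to show how the Holm adjustment operates on a set of post hoc P values; this is not the procedure used in the study.

```python
import numpy as np
from itertools import combinations
from scipy import stats
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)

# Synthetic feature matrix with roughly 5% missing entries (NOT the cohort data).
X = rng.normal(loc=20.0, scale=5.0, size=(114, 14))
X[rng.random(X.shape) < 0.05] = np.nan

# Mean imputation: each missing entry is replaced by the mean of its column.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# One synthetic score observed in three hypothetical groups.
groups = [rng.normal(mu, 2.0, size=30) for mu in (5.0, 9.0, 14.0)]

# Global rank-based comparison across the three groups.
h_stat, p_global = stats.kruskal(*groups)


def holm_adjust(p_values):
    """Holm step-down adjustment of a list of P values."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    adjusted = np.empty(m)
    running_max = 0.0
    # Walk through the P values from smallest to largest.
    for rank, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Pairwise post hoc tests (Mann-Whitney U as a stand-in for Dunn's test).
pairs = list(combinations(range(3), 2))
raw_p = [
    stats.mannwhitneyu(groups[i], groups[j], alternative="two-sided").pvalue
    for i, j in pairs
]
adj_p = holm_adjust(raw_p)
```

The Holm procedure multiplies the smallest P value by the number of comparisons, the next by one fewer, and so on, while enforcing that adjusted values never decrease along the sorted order.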

Choice of appropriate clustering methodology

Clustering is a technique used in data analysis to group similar data points together. Two of the most widely used methods for clustering are hierarchical and partitioning (Sharma 1996; Fraley and Raftery 1998). The hierarchical method is a non-partitioning approach. The clusters are represented hierarchically through a dendrogram. A dendrogram is a tree-like structure where the leaves represent individual data points, and the branches represent groups of data points that are similar to one another. Depending upon whether this hierarchical representation is created in a bottom–up or top–down fashion, these representations may be considered either agglomerative or divisive, respectively (Aggarwal and Reddy 2013). The hierarchical method is more structured, and it is easier to decide the number of clusters. However, hierarchical clustering has a high time complexity, and the resulting clustering may be sensitive to the order in which the data are presented. Furthermore, the hierarchical clustering technique is very sensitive to outliers.

In contrast, partitioning clustering involves dividing the data points into a fixed number of clusters, typically using algorithms like k-means. K-means clustering is an unsupervised machine learning algorithm; a cluster is represented by its centroid, which is usually the mean of the points within the cluster. It works by minimizing the within-cluster sum of squared distances between the data points and the centroid of their cluster. The k-means method is considered one of the simplest and most classical methods for data clustering (Jain 2010). It is also one of the most widely used methods in cluster analysis (Aggarwal and Reddy 2013). The k-means method produces tighter clusters than the hierarchical method and runs faster if the number of variables is large (Pandya and Saket 2020). A disadvantage is that the number of clusters must be specified in advance.
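To make the k-means objective concrete, the following minimal NumPy sketch alternates the assignment and centroid-update steps and records the within-cluster sum of squares after each iteration. The data are synthetic, the function names `wcss` and `lloyd` are hypothetical, and the sketch is an illustration of the general algorithm rather than the implementation used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three synthetic 2D blobs of 40 points each (illustrative data only).
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + rng.normal(size=(40, 2)) for c in centers])


def wcss(X, labels, centroids):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return float(((X - centroids[labels]) ** 2).sum())


def lloyd(X, centroids, n_iter=15):
    """Plain k-means iterations starting from the given centroids."""
    history = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its points
        # (an empty cluster keeps its previous centroid).
        centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        history.append(wcss(X, labels, centroids))
    return labels, centroids, history


# Random initial centroids drawn from the data points.
init = X[rng.choice(len(X), size=3, replace=False)]
labels, centroids, history = lloyd(X, init)
```

Because neither step can increase the objective, the values in `history` never go up from one iteration to the next, which is why the algorithm converges to a local minimum.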

Given the advantages and drawbacks discussed above, the present study employed a combination of both the k-means and the hierarchical method to cluster the data. The primary method used was k-means, while the hierarchical method was utilized as a validation tool to confirm the robustness of the cluster number obtained from the primary method. K-means++ (Arthur and Vassilvitskii 2007) is an advanced version of the standard k-means algorithm that improves the way of selecting initial centroids. Instead of choosing all of them uniformly at random, k-means++ selects each subsequent centroid from the data points with probability proportional to its squared distance to the closest existing centroid, so that points far from the existing centroids are favored, which leads to better performance and faster convergence. A previous review of PD cluster analysis studies conducted between 1999 and 2021 identified the k-means cluster method as the most frequently used approach, being utilized in 13 out of 24 studies. The hierarchical method, on the other hand, was employed in three studies (Hendricks and Khasawneh 2021). The 14 features included in the current clustering analysis are all numerical variables, and k-means is a popular distance-based clustering algorithm that is particularly well-suited to numerical data, as it uses the Euclidean distance to measure the similarity between data points. As the k-means algorithm is sensitive to the scale of variables, the data were first standardized and transformed to ensure accurate results. The silhouette method and Calinski–Harabasz scores were applied to determine the optimal number of clusters. To validate the optimum solution of K, hierarchical clustering was implemented as a final step before applying the k-means++ algorithm.
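The seeding rule of k-means++ can be sketched in a few lines of NumPy. The function name `kmeans_pp_init` and the synthetic data are hypothetical and serve only to illustrate the sampling rule of Arthur and Vassilvitskii (2007); in practice a library implementation would normally be used.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """Pick k initial centroids from the rows of X using k-means++ seeding."""
    n = X.shape[0]
    # The first centroid is drawn uniformly at random from the data.
    centroids = [X[rng.integers(n)]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        # Squared distance of every point to its closest chosen centroid.
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Sample the next centroid with probability proportional to d2,
        # so already-chosen points (d2 == 0) cannot be picked again.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids)


rng = np.random.default_rng(0)
# Synthetic 2D data with three groups (illustrative only).
X = np.vstack([
    c + rng.normal(size=(40, 2))
    for c in ([0.0, 0.0], [10.0, 0.0], [5.0, 9.0])
])
seeds = kmeans_pp_init(X, k=3, rng=rng)
```

The returned seeds are actual data points, and they tend to be spread across the groups because distant points carry more sampling weight.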

Cluster analysis methods

The k-means++ algorithm, hierarchical clustering, and Principal Component Analysis (PCA) were performed using scikit-learn (Pedregosa et al. 2011). Correlation circle and Biplot were generated using the FactoMineR and Factoextra packages (Lê et al. 2008) in R programming language (version 4.2.0) (R Core Team 2021).
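A minimal scikit-learn sketch of such a workflow is given below. It uses synthetic data from `make_blobs` in place of the 14 clinical features. The Ward linkage, `n_init=10`, and the fixed random seeds are assumptions made for the example, since the text does not report these settings.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 114 samples, 14 numerical features, 3 latent groups.
X_raw, _ = make_blobs(
    n_samples=114, n_features=14, centers=3, cluster_std=1.0, random_state=0
)

# Standardize each feature to zero mean and unit variance,
# because k-means is sensitive to the scale of the variables.
X = StandardScaler().fit_transform(X_raw)

# Primary method: k-means with k-means++ initialization.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
km_labels = kmeans.fit_predict(X)

# Validation method: agglomerative clustering (Ward linkage assumed here).
hc_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Agreement between the two partitions (1.0 means identical up to relabeling).
agreement = adjusted_rand_score(km_labels, hc_labels)

# Project onto the first three principal components for visualization.
pca = PCA(n_components=3)
X_3d = pca.fit_transform(X)
```

The adjusted Rand index is used here only as a convenient way to compare the two partitions; it is not a metric reported in the study.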

Results

The total analysis set included 114 participants and consisted of 66 (57.9%) males and 48 (42.1%) females. In reference to the selected 14 features, the following presents a summary of the descriptive statistical results (mean ± SD):

(a) Demographic information: age (70.49 ± 10.02) and disease duration (8.14 ± 5.10).
(b) Motor symptoms: MDS-UPDRS Part II (13.41 ± 9.41), Part III (30.15 ± 14.59), and Part IV (4.01 ± 3.95); H&Y (2.65 ± 0.77).
(c) Non-motor symptoms: MDS-UPDRS Part I (11.91 ± 6.26), NMSQuest (9.76 ± 5.20), SCOPA-AUT (14.00 ± 7.94), and PDQ-39 (42.70 ± 27.82).
(d) Cognitive function: MoCA (22.60 ± 4.00).
(e) Other feature: levodopa equivalent daily dose (657.97 ± 411.55).
(f) Polyneuropathy examination: NDS (3.69 ± 2.59) and NSS (4.88 ± 3.01).

Ensuring the validity of the results obtained from a PD analysis requires the identification and adjustment of potential confounders (Hubble et al. 2015). Aging can affect the movement system independently of PD, and advanced age has previously been proposed to be associated with a more severe PD phenotype with accelerated progression (Raket et al. 2022). In the context of Parkinson's disease research, age and disease duration are considered as confounders as they have the potential to influence the outcome of the study. To accurately compare results between clusters, ANCOVA (analysis of covariance) is employed as a statistical technique to control for the effect of these confounders. Furthermore, to bolster the robustness of the analysis, bootstrapping is utilized to estimate the 95% Confidence Interval (CI) of the adjusted means, thereby providing a more comprehensive examination of the precision of the between-cluster comparison.
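The covariate adjustment and bootstrap described above can be illustrated with a small sketch based on ordinary least squares. Everything in it is an assumption for the example: the data are simulated, the helper names are hypothetical, the adjusted means are evaluated at the sample means of the covariates, and a percentile bootstrap with 1000 resamples is used. The text does not specify how these steps were implemented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 120

# Simulated data: cluster membership, two covariates, and one outcome score.
cluster = rng.integers(0, 3, size=n)
age = rng.normal(70.0, 10.0, size=n)
duration = rng.normal(8.0, 5.0, size=n).clip(min=0.5)
score = (10.0 + 6.0 * cluster + 0.1 * (age - 70.0) + 0.3 * duration
         + rng.normal(0.0, 3.0, size=n))


def design(cluster, age, duration, with_cluster=True):
    """Design matrix: intercept, covariates, and optional cluster dummies."""
    cols = [np.ones_like(age), age, duration]
    if with_cluster:
        cols += [(cluster == k).astype(float) for k in (1, 2)]
    return np.column_stack(cols)


def fit(Xd, y):
    """Least-squares fit returning residual sum of squares and coefficients."""
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid), beta


# ANCOVA as a nested-model F test: does cluster matter beyond the covariates?
rss_full, _ = fit(design(cluster, age, duration), score)
rss_reduced, _ = fit(design(cluster, age, duration, with_cluster=False), score)
df_num, df_den = 2, n - 5
f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
p_value = stats.f.sf(f_stat, df_num, df_den)


def adjusted_means(cluster, age, duration, y, age_ref, duration_ref):
    """Predicted outcome per cluster at fixed reference covariate values."""
    _, beta = fit(design(cluster, age, duration), y)
    base = beta[0] + beta[1] * age_ref + beta[2] * duration_ref
    return np.array([base, base + beta[3], base + beta[4]])


age_ref, duration_ref = age.mean(), duration.mean()
adj = adjusted_means(cluster, age, duration, score, age_ref, duration_ref)

# Percentile bootstrap of the adjusted means (resampling patients).
boot = np.empty((1000, 3))
for b in range(1000):
    idx = rng.integers(0, n, size=n)
    boot[b] = adjusted_means(cluster[idx], age[idx], duration[idx],
                             score[idx], age_ref, duration_ref)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)
```

Comparing the full model against the covariates-only model isolates the contribution of cluster membership, which is the quantity of interest when age and disease duration are treated as confounders.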

Determination of optimal number of clusters

The number of clusters (K) is a crucial parameter in the k-means algorithm and must be set prior to running the algorithm. The k-means algorithm assumes that the data can be divided into a fixed number of clusters, and this number is defined by the K parameter. The algorithm operates by defining a fixed number of centroids, or cluster centers, and then iteratively assigning data points to the cluster with the closest centroid. The centroids are subsequently updated according to the mean of the data points assigned to each cluster. Without an accurate and appropriate value of K, the k-means algorithm may not be able to partition the data into meaningful clusters.

Two methods were used to specify the optimal number of clusters (K). The first one is the silhouette method. The silhouette coefficient, which ranges between − 1 and 1, was calculated; it indicates how similar a data point is to its own cluster compared with other clusters. The two-cluster solution and the three-cluster solution had the highest average silhouette coefficients (0.243 and 0.158, respectively). A silhouette analysis of k-means clustering with different numbers of clusters was performed (Fig. 1). The two-cluster option formed one cluster consisting of 74 patients and another of 40 patients. In comparison, the three-cluster solution formed clusters with 49, 40, and 25 patients, respectively.
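As an illustration of the silhouette method, the sketch below computes the average silhouette coefficient for several candidate values of K on synthetic data with three well-separated groups. The data, the range of K, and the k-means settings are assumptions made for the example and do not reproduce the values reported for the cohort.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Three synthetic, well-separated 2D groups of 40 points each.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + rng.normal(size=(40, 2)) for c in centers])

# Average silhouette coefficient for each candidate number of clusters.
silhouette_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    silhouette_by_k[k] = silhouette_score(X, labels)

# The K with the highest average silhouette is the silhouette-based choice.
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
```

On real clinical data the groups overlap far more than in this toy example, so the coefficients are typically much lower and the maximum need not coincide with the clinically preferred solution.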

The second method is the Calinski–Harabasz score elbow (Fig. 2a). It suggests that the optimal number of clusters is three, which has the highest Calinski–Harabasz score. Candidate numbers of clusters are those values of K at which the curve forms an angle (elbow); such an elbow can also be observed at K = 3.
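The Calinski–Harabasz score is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The sketch below computes it by hand and with scikit-learn on synthetic data; the helper name `ch_manual` and the data are hypothetical and intended only to show what the score measures.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)

# Three synthetic, well-separated 2D groups of 40 points each.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + rng.normal(size=(40, 2)) for c in centers])


def ch_manual(X, labels):
    """Calinski-Harabasz score: (B / (k - 1)) / (W / (n - k))."""
    n = X.shape[0]
    ks = np.unique(labels)
    overall_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for k in ks:
        members = X[labels == k]
        centroid = members.mean(axis=0)
        # Between-cluster dispersion, weighted by cluster size.
        between += len(members) * ((centroid - overall_mean) ** 2).sum()
        # Within-cluster dispersion around the cluster centroid.
        within += ((members - centroid) ** 2).sum()
    return (between / (len(ks) - 1)) / (within / (n - len(ks)))


ch_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    ch_by_k[k] = calinski_harabasz_score(X, labels)

best_k = max(ch_by_k, key=ch_by_k.get)

# Cross-check the library value against the manual formula for K = 3.
labels_3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
manual_3 = ch_manual(X, labels_3)
```

Higher values indicate clusters that are compact relative to their separation, so the preferred K is the one at which the score peaks.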

Additionally, hierarchical clustering was performed and the resulting dendrogram (Fig. 2b) revealed a well-balanced cluster of three groups.
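A dendrogram of this kind can be produced with SciPy, as in the following minimal sketch on synthetic data. The Ward linkage is an assumption made for the example; the linkage criterion used for Fig. 2b is not stated in the text.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)

# Three synthetic 2D groups of 40 points each (illustrative data only).
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + rng.normal(size=(40, 2)) for c in centers])

# Bottom-up (agglomerative) merging; each row of Z records one merge.
Z = linkage(X, method="ward")

# Cut the tree so that at most three clusters remain (labels start at 1).
labels = fcluster(Z, t=3, criterion="maxclust")
sizes = np.bincount(labels)[1:]

# Dendrogram layout without drawing; pass an Axes object to plot instead.
tree = dendrogram(Z, no_plot=True)
```

Inspecting `sizes` gives a quick check of how balanced the resulting groups are, which is the property read off the dendrogram in the text.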

Both k-means clustering and hierarchical clustering suggested three as an optimal number of clusters (Fig. 2a, b). The silhouette and Calinski–Harabasz Score supported the optimal number of clusters issued from the k-means clustering. The authors identified the three-cluster solution as optimal, because it provided a better-balanced data distribution and clinical relevance than the two-cluster solution.

Principal component analysis (PCA)

To reduce the high dimensionality of the features to three dimensions, Principal Component Analysis (PCA) was performed, allowing the visualization of the resulting k-means clusters (Fig. 3a). To analyze how important each feature was for the characteristics of the different clusters, the loading scores for the first and second principal components were calculated (Fig. S1). The loading score represents the importance of each feature in defining the subtypes. In terms of the first principal component (explained variance ratio 43.9%), PDQ-39 and MDS-UPDRS Part II had the largest loading scores (0.342 and 0.341, respectively) and therefore contributed most to the first principal component, followed by NMSQuest (0.317), MDS-UPDRS Part III (0.305), SCOPA-AUT (0.304) and MDS-UPDRS Part I (0.302). For the second principal component (explained variance ratio 10.9%), the disease duration and LED were the most important features with loading scores of 0.393 and 0.380, respectively. For both principal components, the contribution of variables was calculated and the top three most important features in characterizing the subtypes were PDQ-39, NMSQuest, and MDS-UPDRS Part II (Fig. 3b). Furthermore, a biplot of the three clusters with the PCA and the loadings of the 14 features is illustrated in Fig. 4.
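The relationship between loadings, explained variance, and variable contributions can be sketched as follows on synthetic data. The sketch treats a loading as an entry of the unit-length principal axis and combines the contributions to the first two components weighted by their eigenvalues. This weighting is an assumption about how such a contribution plot is typically computed, not a description confirmed by the text, and the simulated features are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 114

# Hypothetical latent "severity" factor driving the first three features;
# the last three features are pure noise.
severity = rng.normal(size=n)
X_raw = np.column_stack(
    [severity * w + rng.normal(scale=0.5, size=n) for w in (1.0, 0.9, 0.8)]
    + [rng.normal(size=n) for _ in range(3)]
)
X = StandardScaler().fit_transform(X_raw)

pca = PCA(n_components=2).fit(X)

# Each row of components_ is a unit-length principal axis (the loadings).
loadings = pca.components_
explained = pca.explained_variance_ratio_

# Contribution of each feature to each component, in percent.
contrib = 100.0 * loadings**2

# Combined contribution to PC1 and PC2, weighted by the eigenvalues.
eig = pca.explained_variance_
combined = (contrib[0] * eig[0] + contrib[1] * eig[1]) / (eig[0] + eig[1])

# Indices of the three features with the largest absolute PC1 loadings.
top_pc1 = set(np.argsort(np.abs(loadings[0]))[-3:].tolist())
```

Because each principal axis has unit length, the squared loadings of one component sum to one, which is what allows them to be read as percentage contributions.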

Subtype identification

The cluster analysis, as depicted in Figs. 3a, 4, and 5, revealed the existence of three distinct subtypes. The descriptive statistics of each subtype, as well as the results of the ANOVA and post hoc analysis between the subtypes, are presented in Tables 1 and 2.

Table 1 Descriptive statistics of the subtypes

Table 2 ANOVA and post hoc analysis between the three subtypes

Subtype I included 40 patients (21 males, 19 females, mean age 76.40 ± 7.68 years). Those patients had the highest PDQ-39 scores (69.36 ± 20.59) and the most severe motor and non-motor symptoms, as demonstrated by the highest MDS-UPDRS Part II scores (22.37 ± 8.44) and NMSQuest scores (14.26 ± 4.70). Subtype III was the youngest group with 25 patients (18 males, 7 females, mean age 61.04 ± 7.86 years). Patients in this group had the least motor and non-motor impairment in all domains. They exhibited the mildest cognitive impairment as demonstrated by the highest MoCA score (25.40 ± 2.84). Forty-nine patients belonged to subtype II (27 males, 22 females, mean age 70.49 ± 9.04 years), presenting intermediate scores in both motor and non-motor components.

The data presented in Table 2 indicate a substantial clinical differentiation among the subtypes with respect to the mean values of various variables, including age, disease duration, scores on the PNP (NSS and NDS), MDS-UPDRS Part I, II, and III, H&Y, NMSQuest, SCOPA-AUT, PDQ-39, and MoCA. After adjusting for the confounders of age and disease duration using ANCOVA, the subtypes continue to exhibit statistically significant differences (P < 0.01) in relation to all of the previously mentioned variables, with the exception of NDS.

The variations in clinical characteristics between the mean values of each cluster are illustrated using a radar chart (Fig. 5). Individuals belonging to subtype III were found to be younger and exhibited the least severe motor and non-motor symptoms, consistent with the shortest disease duration. Conversely, individuals belonging to subtype I displayed the most severe motor and non-motor manifestations, along with the most impaired cognitive function and older ages. Individuals belonging to subtype II were characterized by intermediate values between subtype I and III. As a result, subtype I can be classified as a late-onset severe type, subtype III as an early onset mild type, and subtype II as an intermediate type.

Discussion

The clinical variability between PD patients suggests the existence of subtypes of the disease. Identification of subtypes is important, since a focus on homogeneous groups may enhance the chance of success of research on mechanisms of disease and may also lead to tailored treatment strategies (van Rooden et al. 2010). Defining subtypes of PD is, therefore, needed to better understand underlying mechanisms and predict disease course (Fereshtehnejad et al. 2015). In this study, the authors developed a data-driven subtyping method for 114 patients with idiopathic PD. A wide spectrum of motor and non-motor variables was included in clustering analysis and three unique subtypes emerged:

Subtype I, comprising 40 patients, was characterized as the late-onset severe type.
Subtype II, comprising 49 patients, was identified as the intermediate type.
Subtype III, comprising 25 patients, was characterized as the early onset mild type.

The current clustering analysis provides evidence that the PDQ-39, NMSQuest, and the MDS-UPDRS Part II are the most crucial variables in differentiating patients with PD. A strong correlation was observed between the PDQ-39 and the MDS-UPDRS Part II (Spearman Rs = 0.796; P < 0.001) as demonstrated in Fig. S4, which serves as an indicator of the alignment between the patient-reported quality-of-life measurement and the motor experience of daily life. Additionally, a significant correlation was found between PDQ-39 and NMSQuest (Spearman Rs = 0.699; P < 0.001), as illustrated in Fig. S4. This is an indicator for the agreement between the patient-reported quality-of-life measurement and the non-motor symptoms. Previous research has already demonstrated that non-motor symptoms, as measured by the Non-Motor Symptoms Scale (NMSS), have the most significant impact on the health-related quality of life (Hr-QoL) of PD patients (Li et al. 2010). The current study confirms this notion by demonstrating a strong correlation between the PDQ-39 and the NMSQuest, which measure the patient-reported quality of life and the non-motor symptoms, respectively.
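Spearman's rank correlation, as used for the associations above, can be computed with SciPy. The sketch below uses two simulated, positively related scores with hypothetical names; it also shows that the Spearman coefficient equals the Pearson correlation of the ranks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two simulated scores that increase together (NOT the cohort data).
score_a = rng.normal(40.0, 25.0, size=114)
score_b = 0.3 * score_a + rng.normal(0.0, 5.0, size=114)

# Spearman's rank correlation coefficient and its two-sided P value.
rho, p_value = stats.spearmanr(score_a, score_b)

# Equivalent definition: Pearson correlation computed on the ranks.
rho_from_ranks = np.corrcoef(
    stats.rankdata(score_a), stats.rankdata(score_b)
)[0, 1]
```

Working on ranks makes the coefficient insensitive to the scale of the questionnaires and to monotone transformations of either score.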

In the current analysis, PDQ-39 had the highest quality of representation of all the 14 variables and the largest loading score in the PCA. PDQ-39 is the most widely used patient-reported rating scale in PD (Hagell and Nygren 2007) and a reliable evaluation of PD on both motor and non-motor aspects. PDQ-39 was also found to have the largest effect size in measuring quality of life (QoL) in PD in a meta-analysis (Zhao et al. 2021).

While PDQ-39 is widely used to assess QoL in PD patients, it has certain limitations. Several studies have demonstrated the importance of non-physical factors, such as education, disease acceptance, and financial background in determining the quality of life in PD patients. Jenkinson et al. (1997) found that higher levels of education were associated with better overall quality of life in PD patients. The study conducted by Cubo et al. (2002) emphasized the role of education and psychological factors in determining the QoL of PD patients, particularly in emotional and social domains. Schrag et al. (2000) reported that disease acceptance was a crucial factor in determining the QoL in PD patients, as is financial background. Notably, financial stressors can impede patients' access to medical care, medications, and resources needed to manage their condition. Overall, healthcare professionals should consider these factors to provide targeted care and improve patients' experience.

Like PDQ-39, NMSQuest made a crucial contribution to forming the three subtypes. Several studies have demonstrated that non-motor symptoms are important defining features of PD subtypes (Marras 2015; Zella et al. 2019). In a previous cluster analysis of PD, a separate non-motor dominant subtype was described (Erro et al. 2013). Another study (Fereshtehnejad et al. 2015) found that the best cluster solution was based on non-motor features. The NMSQuest has been developed as a patient-reported instrument to evaluate a broad spectrum of non-motor symptoms (Chaudhuri et al. 2006). In the current analysis, NMSQuest scores of the three subtypes (14.26 ± 4.70, 8.79 ± 3.23, and 4.48 ± 2.37 for subtype I, II, and III, respectively) are consistent with the cut-off points of the NMSQuest grading system proposed by Chaudhuri et al. (2015): very severe: > 14; severe: 10–13; moderate: 6–9; and mild: 1–5. However, NMSQuest was not developed for measuring the severity of symptoms (Chaudhuri et al. 2006). In contrast, the Movement Disorder Society Non-Motor Rating Scale (MDS-NMS), which was introduced in 2019, evaluates non-motor symptom severity by computing a total score through the multiplication of symptom severity and its frequency. This approach offers a more precise method of assessing non-motor symptoms, as it considers both the intensity and the frequency of the symptoms.

The MDS-UPDRS Part II (motor experiences of daily living) captures the impact of PD on daily function, and it was included as an important variable in most of the previous cluster analyses. The self-rated MDS-UPDRS Part II proved to be useful for assessing disability in PD and showed a better performance than other rater-based, generic or specific, scales to assess disability in PD (Rodriguez-Blazquez et al. 2013; Rodríguez-Blázquez et al. 2017). In the present cluster analysis, a strong correlation was found between MDS-UPDRS Part II and PDQ-39 (Spearman Rs = 0.796; P < 0.001), which is consistent with a previous study (Skorvanek et al. 2018): health-related quality of life (HRQoL), which was measured by PDQ-8 (a shortened version of PDQ-39), was found to be significantly related to MDS-UPDRS Part II (ADLs) and Part I (NMS). A previous study found a strong correlation between the scores on the MDS-UPDRS Part II and the duration of the disease in 888 patients with idiopathic PD. The results of this study suggest that a single measurement of UPDRS II scores may be a better marker of disease progression than other scores on the MDS-UPDRS scale (Harrison et al. 2009).

To minimize the influence of comorbidities that may confound the interpretation of neuropathy scores in the current cluster analysis, the authors implemented stringent exclusion criteria that excluded individuals with a diagnosis of diabetes mellitus or alcohol dependence disorder, as well as other conditions that may cause neuropathy. Furthermore, an earlier study by the same authors indicated that there was no significant correlation between LED and tibial nerve compound muscle action potential (cMAP) (Kühn et al. 2020). This finding supports the validity and reliability of the polyneuropathy scores applied here and allows the authors to utilize them with greater confidence in the current cluster analysis, thereby reducing the risk of potential bias. The polyneuropathy scores (NSS and NDS) had the lowest quality of representation values in the cluster analysis and therefore contributed least to distinguishing between subtypes. The patient-reported polyneuropathy symptoms correlated weakly with motor and non-motor symptoms: NSS and MDS-UPDRS Part II (Spearman Rs = 0.293; P = 0.001), and NSS and NMSQuest (Spearman Rs = 0.368; P < 0.001). No correlation could be observed between NDS and MDS-UPDRS Part II (Spearman Rs = 0.093; P = 0.328) or between NDS and NMSQuest (Spearman Rs = 0.116; P = 0.221). Despite the high prevalence of polyneuropathy among patients with PD reported in previous studies (Crespo-Burillo et al. 2016; Kühn et al. 2020), it was not a major determinant of patient subtypes in the current cluster analysis. Additionally, even after adjusting for factors such as age and disease duration, no significant differences in the Neuropathy Disability Score (NDS) were observed between the clusters.

The PDQ-39, NMSQuest, and MDS-UPDRS Part II are self-reporting questionnaires that are easy to apply and do not require specialized training or equipment, making them accessible to clinicians and researchers. Furthermore, the use of self-reported questionnaires allows patients to provide insight into their own experiences with PD symptoms, potentially leading to more accurate assessments of their symptoms. They are essential tools in differentiating subtypes of Parkinson's disease (PD) and evaluating the quality of life of PD patients. These questionnaires provide a comprehensive assessment of various aspects of the disease, including patient-reported quality of life, non-motor symptoms, and motor function, respectively. They are an easily accessible way to gather important information about PD patients, which can aid in diagnosis, treatment planning, and monitoring of disease progression.

Comparison with other cluster analysis methodologies

The three-cluster solution found in the current study aligns with the results of previous research that used either k-means or hierarchical clustering as the method of grouping. The clustering of domains in patients with PD shows a consistent pattern, indicating the validity and reliability of these clustering techniques. Post et al. (2008) identified three clusters, including a group with younger onset, an intermediate group with older onset, and an oldest onset group. Three subtypes were also defined by Fereshtehnejad et al. (2015) as mainly motor/slow, diffuse/malignant, and intermediate progression. Another study by Fereshtehnejad et al. (2017) conducted a cluster analysis of 421 PD patients from the PPMI Database and identified three PD subtypes: mild motor-predominant, diffuse malignant, and intermediate subtype. Based on clinical and biomarker data, Zhang et al. (2019) also described three PD subtypes: Subtype I (Mild Baseline, Moderate Motor Progression), Subtype II (Moderate Baseline, Mild Progression), and Subtype III (Severe Baseline, Rapid Progression). Krishnagopal et al. (2020) applied Trajectory Profile Clustering (TPC) and found three distinct clusters: mixed subtype, mild subtype, and severe subtype.

The present clustering analysis demonstrated that the PDQ-39, NMSQuest, and MDS-UPDRS Part II were the crucial variables for differentiating the patients. Other variables, such as MDS-UPDRS Part I, SCOPA-AUT, MDS-UPDRS Part III, H&Y, and disease duration, also contributed substantially to the formation of the clusters (Fig. 3b). A previous PD clustering study (Fereshtehnejad et al. 2015) came to a similar conclusion: the most informative variables for generating clusters included UPDRS Part II, UPDRS Part III, REM sleep behavior disorder (RBD), mild cognitive impairment (MCI), orthostatic hypotension, depression, and anxiety.
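To illustrate how a between-cluster comparison can rank variables by how strongly they separate the clusters, the following minimal sketch computes the one-way ANOVA F statistic from its textbook definition. It is not the code used in the study: the function name `one_way_anova_f`, the variable names, and all numbers are hypothetical and chosen only for illustration. A p-value would additionally require the F distribution, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of per-cluster score lists."""
    k = len(groups)                     # number of clusters
    n = sum(len(g) for g in groups)     # total number of patients
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the cluster means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own cluster mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores for three clusters; NOT data from the study.
scores_by_cluster = {
    "separating_score": [[10, 12, 11], [20, 22, 21], [30, 32, 31]],
    "flat_score": [[5, 7, 6], [6, 5, 7], [7, 6, 5]],
}

f_values = {
    name: one_way_anova_f(groups)
    for name, groups in scores_by_cluster.items()
}

for name, f_value in f_values.items():
    print(f"{name}: F = {f_value:.1f}")
```

A score whose cluster means differ strongly relative to the spread within each cluster yields a large F value, whereas a score with identical cluster means yields an F value of zero.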

Age is considered a main risk factor for developing PD (Elbaz et al. 2016). PD progresses more slowly in early onset PD (Ferguson et al. 2016), whereas older age at onset is associated with a more severe motor and non-motor phenotype (Pagano et al. 2016). A recent systematic review of 24 PD cluster analysis studies published between 1990 and 2021 found only limited age ranges across the reported cluster solutions: the smallest difference between the minimum and maximum cluster ages was 3.7 years, in a three-cluster solution, and the largest difference was 12.4 years (Reijnders et al. 2009; Hendricks and Khasawneh 2021). In the current study, the age difference between the early onset mild type and the late-onset severe type was 15.4 years.

According to a systematic review (Hendricks and Khasawneh 2021), previous studies on PD clusters did not report the use of silhouette scores. The average silhouette score is commonly used both to determine the optimal number of clusters and to evaluate the resulting clustering. The current study used both the silhouette score and the Calinski–Harabasz score (elbow method) to validate the cluster solution.
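For readers unfamiliar with the two validation indices, the following minimal sketch computes the average silhouette score and the Calinski–Harabasz score from their standard definitions, using Euclidean distance. It is an illustration under stated assumptions, not the study's code: the toy points, the two candidate labelings, and the function names are made up. In practice, library implementations such as `sklearn.metrics.silhouette_score` and `sklearn.metrics.calinski_harabasz_score` would normally be used.

```python
from math import dist
from statistics import mean


def _group_by_label(points, labels):
    """Collect the points of each cluster into a dict keyed by label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette_score(points, labels):
    """Average silhouette width over all points."""
    clusters = _group_by_label(points, labels)
    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


def calinski_harabasz_score(points, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    clusters = _group_by_label(points, labels)
    n, k, dim = len(points), len(clusters), len(points[0])
    overall = [mean(p[d] for p in points) for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [mean(p[d] for p in members) for d in range(dim)]
        between += len(members) * dist(centroid, overall) ** 2
        within += sum(dist(p, centroid) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Toy data with three well-separated groups; NOT data from the study.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
three_groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two_groups = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # merges two of the groups

for name, labels in [("k=3", three_groups), ("k=2", two_groups)]:
    print(name, silhouette_score(points, labels),
          calinski_harabasz_score(points, labels))
```

Both indices are higher for the labeling that matches the three visible groups than for the labeling that merges two of them, which is the kind of comparison used to support a choice of the number of clusters.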

Limitations

The study used data exclusively from clinic-recruited PD patients, which resulted in a small sample of 114 patients. The sample size may have limited the performance of the clustering algorithms and the ability of the analysis to capture the full variability of the data, and reliance on clinic-recruited patients may also limit diversity and generalisability. These limitations should be considered when interpreting the findings and applying them to a larger population. Future studies with larger and more diverse patient populations are needed to validate the findings. Furthermore, longitudinal data are critical for understanding the stability of proposed subtypes (Mestre et al. 2021). In this regard, a longitudinal follow-up study will be continued at the Department of Neurology, St. Josef-Hospital, Ruhr University Bochum (Germany), to compare the prognosis and progression rate between the identified subtypes. The analysis did not incorporate biomarkers or imaging techniques, which could have provided additional insight into the subtypes of PD. Despite these limitations, the clustering analysis provides a useful starting point for understanding PD subtypes. The subtyping presented here must nevertheless be validated through further research in clinical practice to establish its reliability and validity.

Conclusions

Three distinct PD subtypes were identified using k-means++ cluster analysis: early onset mild type, intermediate type, and late-onset severe type. PCA and between-cluster comparison showed that self-reported questionnaires, namely the PDQ-39, NMSQuest, and MDS-UPDRS Part II, were the crucial factors for differentiating PD subtypes and characterising PD heterogeneity. These questionnaires are easy to administer and accessible, and they provide patients' own perspective on their PD symptoms, which may improve the accuracy of symptom assessment. They are valuable for identifying and classifying PD subtypes, enhancing disease understanding, and informing clinical practice and patient care. By identifying statistical relationships between these scores and the clusters, this research provides a foundation for defining PD subtypes. Finally, future work should analyse the longitudinal progression of the different subtypes over several years of follow-up, to identify patients and patient groups with different rates of progression and to relate these rates to clinical characteristics in the early years of the disease.

Availability of data and materials

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

H&Y: Hoehn and Yahr Stage
LED: Levodopa dosage
MDS-UPDRS: Movement Disorder Society-Unified Parkinson’s Disease Rating Scale
MoCA: Montreal Cognitive Assessment
NDS: Modified Neuropathy Disability Score
NMS or NMSQuest: Non-motor Symptom Questionnaire
NSS: Modified Neuropathy Symptom Score
PCA: Principal Component Analysis
PDQ-39: Parkinson’s Disease Questionnaire
SCOPA-AUT: Scales for Outcomes in Parkinson’s Disease-Autonomic

References

Aggarwal CC, Reddy CK (2013) Data clustering: algorithms and applications, 1st edn. Chapman & Hall/CRC data mining and knowledge discovery series. Chapman and Hall/CRC, Boca Raton
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the 18th annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1027–1035
Barone P, Antonini A, Colosimo C, Marconi R, Morgante L, Avarello TP, Bottacchi E, Cannas A, Ceravolo G, Ceravolo R, Cicarelli G, Gaglio RM, Giglia RM, Iemolo F, Manfredi M, Meco G, Nicoletti A, Pederzoli M, Petrone A, Pisani A, Pontieri FE, Quatrale R, Ramat S, Scala R, Volpe G, Zappulla S, Bentivoglio AR, Stocchi F, Trianni G, Del Dotto P (2009) The PRIAMO study: a multicenter assessment of nonmotor symptoms and their impact on quality of life in Parkinson’s disease. Mov Disord 24(11):1641–1649. https://doi.org/10.1002/mds.22643
Beiske AG, Loge JH, Rønningen A, Svensson E (2009) Pain in Parkinson’s disease: prevalence and characteristics. Pain 141(1–2):173–177. https://doi.org/10.1016/j.pain.2008.12.004
Chaudhuri KR, Martinez-Martin P, Schapira AHV, Stocchi F, Sethi K, Odin P, Brown RG, Koller W, Barone P, MacPhee G, Kelly L, Rabey M, MacMahon D, Thomas S, Ondo W, Rye D, Forbes A, Tluk S, Dhawan V, Bowron A, Williams AJ, Olanow CW (2006) International multicenter pilot study of the first comprehensive self-completed nonmotor symptoms questionnaire for Parkinson’s disease: the NMSQuest study. Mov Disord 21(7):916–923. https://doi.org/10.1002/mds.20844
Chaudhuri KR, Sauerbier A, Rojo JM et al (2015) The burden of non-motor symptoms in Parkinson’s disease using a self-completed non-motor questionnaire: a simple grading system. Parkinsonism Relat Disord 21(3):287–291. https://doi.org/10.1016/j.parkreldis.2014.12.031
Crespo-Burillo JA, Almarcegui-Lafita C, Dolz-Zaera I, Alarcia R, Roche JC, Ara JR, Capablo JL (2016) Prevalence and factors associated with polyneuropathy in Parkinson’s disease. Basal Ganglia 6(2):89–94. https://doi.org/10.1016/j.baga.2016.01.005
Cubo E, Rojo A, Ramos S, Quintana S, Gonzalez M, Kompoliti K, Aguilar M (2002) The importance of educational and psychological factors in Parkinson’s disease quality of life. Eur J Neurol 9(6):589–593. https://doi.org/10.1046/j.1468-1331.2002.00484.x
Defazio G, Berardelli A, Fabbrini G, Martino D, Fincati E, Fiaschi A, Moretto G, Abbruzzese G, Marchese R, Bonuccelli U, Del Dotto P, Barone P, de Vivo E, Albanese A, Antonini A, Canesi M, Lopiano L, Zibetti M, Nappi G, Martignoni E, Lamberti P, Tinazzi M (2008) Pain as a nonmotor symptom of Parkinson disease: evidence from a case-control study. Arch Neurol 65(9):1191–1194. https://doi.org/10.1001/archneurol.2008.2
Dyck PJ, Sherman WR, Hallcher LM, Service FJ, O’Brien PC, Grina LA, Palumbo PJ, Swanson CJ (1980) Human diabetic endoneurial sorbitol, fructose, and myo-inositol related to sural nerve morphometry. Ann Neurol 8(6):590–596. https://doi.org/10.1002/ana.410080608
Elbaz A, Carcaillon L, Kab S, Moisan F (2016) Epidemiology of Parkinson’s disease. Revue Neurol 172(1):14–26. https://doi.org/10.1016/j.neurol.2015.09.012
Erro R, Vitale C, Amboni M, Picillo M, Moccia M, Longo K, Santangelo G, de Rosa A, Allocca R, Giordano F, Orefice G, de Michele G, Santoro L, Pellecchia MT, Barone P (2013) The heterogeneity of early Parkinson’s disease: a cluster analysis on newly diagnosed untreated patients. PLoS ONE 8(8):e70244. https://doi.org/10.1371/journal.pone.0070244
Fereshtehnejad S-M, Romenets SR, Anang JBM, Latreille V, Gagnon J-F, Postuma RB (2015) New clinical subtypes of parkinson disease and their longitudinal progression: a prospective cohort comparison with other phenotypes. JAMA Neurol 72(8):863–873. https://doi.org/10.1001/jamaneurol.2015.0703
Fereshtehnejad S-M, Zeighami Y, Dagher A, Postuma RB (2017) Clinical criteria for subtyping Parkinson’s disease: biomarkers and longitudinal progression. Brain J Neurol 140(7):1959–1976. https://doi.org/10.1093/brain/awx118
Ferguson LW, Rajput AH, Rajput A (2016) Early-onset vs. late-onset Parkinson’s disease: a clinical-pathological study. Can J Neurol Sci 43(1):113–119. https://doi.org/10.1017/cjn.2015.244
Ford B (1998) Pain in Parkinson’s disease. Clin Neurosci (New York) 5(2):63–72
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588
Gallagher DA, Lees AJ, Schrag A (2010) What are the most important nonmotor symptoms in patients with Parkinson’s disease and are we missing them? Mov Disord 25(15):2493–2500. https://doi.org/10.1002/mds.23394
Gibb G, Lees AJ (1988) The relevance of the Lewy body to the pathogenesis of idiopathic Parkinson’s disease. J Neurol Neurosurg Psychiatry 51:745–752. https://doi.org/10.1136/jnnp.51.6.745
Goetz CG, Tanner CM, Levy M, Wilson RS, Garron DC (1986) Pain in Parkinson’s disease. Mov Disord 1(1):45–49. https://doi.org/10.1002/mds.870010106
Hagell P, Nygren C (2007) The 39 item Parkinson’s disease questionnaire (PDQ-39) revisited: implications for evidence based medicine. J Neurol Neurosurg Psychiatry 78(11):1191–1198. https://doi.org/10.1136/jnnp.2006.111161
Harrison MB, Wylie SA, Frysinger RC, Patrie JT, Huss DS, Currie LJ, Wooten GF (2009) UPDRS activity of daily living score as a marker of Parkinson’s disease progression. Mov Disord 24(2):224–230. https://doi.org/10.1002/mds.22335
Hendricks RM, Khasawneh MT (2021) A systematic review of Parkinson’s disease cluster analysis research. Aging Dis 12(7):1567–1586. https://doi.org/10.14336/ad.2021.0519
Hubble RP, Naughton GA, Silburn PA, Cole MH (2015) Wearable sensor use for assessing standing balance and walking stability in people with Parkinson’s disease: a systematic review. PLoS ONE 10(4):e0123705
Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666. https://doi.org/10.1016/j.patrec.2009.09.011
Jankovic J, McDermott M, Carter J, Gauthier S, Goetz C, Golbe L, Huber S, Koller W, Olanow C, Shoulson I (1990) Variable expression of Parkinson’s disease: a base-line analysis of the DATATOP cohort. The Parkinson Study Group. Neurology 40(10):1529–1534. https://doi.org/10.1212/wnl.40.10.1529
Jankowicz E, Drozdowski W, Zawadzka-Tołłoczko W (1986) Pain as a symptom of parkinsonism (Bóle jako objawy parkinsonizmu). Neurol Neurochir Pol 20(4):308–312
Jenkinson C, Fitzpatrick R, Peto V, Greenhall R, Hyman N (1997) The Parkinson’s Disease Questionnaire (PDQ-39): development and validation of a Parkinson’s disease summary index score. Age Ageing 26:353–357. https://doi.org/10.1093/ageing/26.5.353
Kalia LV, Lang AE (2015) Parkinson’s disease. Lancet (london) 386(9996):896–912. https://doi.org/10.1016/s0140-6736(14)61393-3
Koller WC (1984) The diagnosis of Parkinson’s disease. Arch Intern Med 144(11):2146–2147
Krishnagopal S, von Coelln R, Shulman LM, Girvan M (2020) Identifying and predicting Parkinson’s disease subtypes through trajectory clustering via bipartite networks. PLoS ONE 15(6):e0233296. https://doi.org/10.1371/journal.pone.0233296
Kühn E, Averdunk P, Huckemann S, Müller K, Biesalski A-S, zum Hof Berge F, Motte J, Fisse AL, Schneider-Gold C, Gold R, Pitarokoili K, Tönges L (2020) Correlates of polyneuropathy in Parkinson’s disease. Ann Clin Transl Neurol 7(10):1898–1907. https://doi.org/10.1002/acn3.51182
Lawton M, Ben-Shlomo Y, May MT, Baig F, Barber TR, Klein JC, Swallow DMA, Malek N, Grosset KA, Bajaj N, Barker RA, Williams N, Burn DJ, Foltynie T, Morris HR, Wood NW, Grosset DG, Hu MTM (2018) Developing and validating Parkinson’s disease subtypes and their motor and cognitive progression. J Neurol Neurosurg Psychiatry 89(12):1279–1287. https://doi.org/10.1136/jnnp-2018-318337
Lê S, Josse J, Husson F (2008) FactoMineR: an R package for multivariate analysis. J Stat Soft 25(1):1–18. https://doi.org/10.18637/jss.v025.i01
Li H, Zhang M, Chen L, Zhang J, Pei Z, Hu A, Wang Q (2010) Nonmotor symptoms are independently associated with impaired health-related quality of life in Chinese patients with Parkinson’s disease. Move Disord off J Move Disord Soc 25(16):2740–2746. https://doi.org/10.1002/mds.23368
Marras C (2015) Subtypes of Parkinson’s disease: state of the field and future directions. Curr Opin Neurol 28(4):382–386. https://doi.org/10.1097/wco.0000000000000219
Mestre TA, Fereshtehnejad S-M, Berg D, Bohnen NI, Dujardin K, Erro R, Espay AJ, Halliday G, van Hilten JJ, Hu MT, Jeon B, Klein C, Leentjens AFG, Marinus J, Mollenhauer B, Postuma R, Rajalingam R, Rodríguez-Violante M, Simuni T, Surmeier DJ, Weintraub D, McDermott MP, Lawton M, Marras C (2021) Parkinson’s disease subtypes: critical appraisal and recommendations. J Parkinson’s Dis 11(2):395–404. https://doi.org/10.3233/JPD-202472
Nasreddine ZS, Phillips NA, Bédirian V, Charbonneau S, Whitehead V, Collin I, Cummings JL, Chertkow H (2005) The montreal cognitive assessment, MoCA: a brief screening tool for mild cognitive impairment. J Am Geriatr Soc 53(4):695–699
Nègre-Pagès L, Regragui W, Bouhassira D, Grandjean H, Rascol O (2008) Chronic pain in Parkinson’s disease: the cross-sectional French DoPaMiP survey. Mov Disord 23(10):1361–1369. https://doi.org/10.1002/mds.22142
Pagano G, Ferrara N, Brooks DJ, Pavese N (2016) Age at onset and Parkinson disease phenotype. Neurology 86(15):1400–1407. https://doi.org/10.1212/wnl.0000000000002461
Pandya S, Saket S (2020) An overview of partitioning algorithms in clustering techniques. Int J Electr Comput Eng 5:1
Pedregosa F et al (2011) Scikit-learn: machine learning in Python. JMLR 12:2825–2830. https://doi.org/10.48550/arXiv.1201.0490
Post B, Speelman JD, de Haan RJ (2008) Clinical heterogeneity in newly diagnosed Parkinson’s disease. J Neurol 255(5):716–722. https://doi.org/10.1007/s00415-008-0782-1
Postuma RB, Berg D, Stern M, Poewe W, Olanow CW, Oertel W, Obeso J, Marek K, Litvan I, Lang AE, Halliday G, Goetz CG, Gasser T, Dubois B, Chan P, Bloem BR, Adler CH, Deuschl G (2015) MDS clinical diagnostic criteria for Parkinson’s disease. Mov Disord 30(12):1591–1601. https://doi.org/10.1002/mds.26424
R Core Team (2021) R: a language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Raket LL, Oudin Åström D, Norlin JM, Kellerborg K, Martinez-Martin P, Odin P (2022) Impact of age at onset on symptom profiles, treatment characteristics and health-related quality of life in Parkinson’s disease. Sci Rep 12(1):526
Reijnders J, Ehrt U, Lousberg R, Aarsland D, Leentjens AFG (2009) The association between motor subtypes and psychopathology in Parkinson’s disease. Parkinsonism Relat Disord 15(5):379–382. https://doi.org/10.1016/j.parkreldis.2008.09.003
Rodriguez-Blazquez C, Rojo-Abuin JM, Alvarez-Sanchez M, Arakaki T, Bergareche-Yarza A, Chade A, Garretto N, Gershanik O, Kurtis MM, Martinez-Castrillo JC, Mendoza-Rodriguez A, Moore HP, Rodriguez-Violante M, Singer C, Tilley BC, Huang J, Stebbins GT, Goetz CG, Martinez-Martin P (2013) The MDS-UPDRS Part II (motor experiences of daily living) resulted useful for assessment of disability in Parkinson’s disease. Parkinsonism Relat Disord 19(10):889–893. https://doi.org/10.1016/j.parkreldis.2013.05.017
Rodríguez-Blázquez C, Alvarez M, Arakaki T, Campos Arillo V, Chaná P, Fernández W, Garretto N, Martínez-Castrillo JC, Rodríguez-Violante M, Serrano-Dueñas M, Ballesteros D, Rojo-Abuin JM, Ray Chaudhuri K, Merello M, Martínez-Martín P (2017) Self-assessment of disability in Parkinson’s disease: the MDS-UPDRS Part II versus clinician-based ratings. Move Disord Clin Pract 4(4):529–535. https://doi.org/10.1002/mdc3.12462
Schrag A, Jahanshahi M, Quinn N (2000) What contributes to quality of life in patients with Parkinson’s disease? J Neurol Neurosurg Psychiatry 69(3):308–312. https://doi.org/10.1136/jnnp.69.3.308
Sharma S (1996) Applied multivariate techniques, vol 15. Wiley, New York
Skorvanek M, Martinez-Martin P, Kovacs N, Zezula I, Rodriguez-Violante M, Corvol J-C, Taba P, Seppi K, Levin O, Schrag A, Aviles-Olmos I, Alvarez-Sanchez M, Arakaki T, Aschermann Z, Benchetrit E, Benoit C, Bergareche-Yarza A, Cervantes-Arriaga A, Chade A, Cormier F, Datieva V, Gallagher DA, Garretto N, Gdovinova Z, Gershanik O, Grofik M, Han V, Kadastik-Eerme L, Kurtis MM, Mangone G, Martinez-Castrillo JC, Mendoza-Rodriguez A, Minar M, Moore HP, Muldmaa M, Mueller C, Pinter B, Poewe W, Rallmann K, Reiter E, Rodriguez-Blazquez C, Singer C, Valkovic P, Goetz CG, Stebbins GT (2018) Relationship between the MDS-UPDRS and quality of life: a large multicenter study of 3206 patients. Parkinsonism Relat Disord 52:83–89. https://doi.org/10.1016/j.parkreldis.2018.03.027
Snider RS, Maiti A, Snider SR (1976) Cerebellar pathways to ventral midbrain and nigra. Exp Neurol 53(3):714–728. https://doi.org/10.1016/0014-4886(76)90150-3
The German Neurological Society (2016) The national guidelines for diagnosis and treatment of idiopathic Parkinson’s syndrome. https://2022.dgn.org/wp-content/uploads/2013/01/030010_LL_kurzfassung_ips_2016.pdf. Accessed 24 Oct 2022
van Rooden SM, Heiser WJ, Kok JN, Verbaan D, van Hilten JJ, Marinus J (2010) The identification of Parkinson’s disease subtypes using cluster analysis: a systematic review. Mov Disord 25(8):969–978. https://doi.org/10.1002/mds.23116
Visser M, Marinus J, Stiggelbout AM, van Hilten JJ (2004) Assessment of autonomic dysfunction in Parkinson’s disease: the SCOPA-AUT. Mov Disord 19(11):1306–1312. https://doi.org/10.1002/mds.20153
von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP (2007) The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet (london) 370(9596):1453–1457. https://doi.org/10.1016/s0140-6736(07)61602-x
Wasner G, Deuschl G (2012) Pains in Parkinson disease—many syndromes under one umbrella. Nat Rev Neurol 8(5):284–294. https://doi.org/10.1038/nrneurol.2012.54
Wojtala J, Heber IA, Neuser P, Heller J, Kalbe E, Rehberg SP, Storch A, Linse K, Schneider C, Gräber S, Berg D, Dams J, Balzer-Geldsetzer M, Hilker-Roggendorf R, Oberschmidt C, Baudrexel S, Witt K, Schmidt N, Deuschl G, Mollenhauer B, Trenkwalder C, Liepelt-Scarfone I, Spottke A, Roeske S, Wüllner U, Wittchen H-U, Riedel O, Dodel R, Schulz JB, Reetz K (2019) Cognitive decline in Parkinson’s disease: the impact of the motor phenotype on cognition. J Neurol Neurosurg Psychiatry 90(2):171–179. https://doi.org/10.1136/jnnp-2018-319008
Xiong Q, Lu B, Ye H, Wu X, Zhang T, Li Y (2015) The diagnostic value of neuropathy symptom and change score, neuropathy impairment score and Michigan neuropathy screening instrument for diabetic peripheral neuropathy. Eur Neurol 74(5–6):323–327
Zella MAS, May C, Müller T, Ahrens M, Tönges L, Gold R, Marcus K, Woitalla D (2019) Landscape of pain in Parkinson’s disease: impact of gender differences. Neurol Res 41(1):87–97. https://doi.org/10.1080/01616412.2018.1531208
Zhang X, Chou J, Liang J, Xiao C, Zhao Y, Sarva H, Henchcliffe C, Wang F (2019) Data-driven subtyping of Parkinson’s disease using longitudinal clinical records: a cohort study. Sci Rep 9(1):797. https://doi.org/10.1038/s41598-018-37545-z
Zhao N, Yang Y, Zhang L et al (2021) Quality of life in Parkinson’s disease: a systematic review and meta-analysis of comparative studies. CNS Neurosci Ther 27:270–279. https://doi.org/10.1111/cns.13549

Acknowledgements

The authors would like to thank all participants. They would also like to thank the people involved in data collection.

Funding

Open Access funding enabled and organized by Projekt DEAL. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Samis Zella and Lars Tönges have contributed equally to this paper.

Authors and Affiliations

Department of Neurology, St. Josef-Hospital, Ruhr University Bochum, 44791, Bochum, Germany
Qiang Chen, Raphael Scherbaum, Ralf Gold, Kalliopi Pitarokoili, Samis Zella & Lars Tönges
Department of Psychiatry, Landschaftsverband Rheinland-Klinik (LVR-Klinik), 40764, Langenfeld, Germany
Samis Zella
Medizinisches Zentrum für Erwachsene mit Behinderung (MZEB), Landschaftsverband Rheinland-Klinik, 40764, Langenfeld, Germany
Samis Zella
Center for Protein Diagnostics (ProDi), Ruhr University Bochum, 44801, Bochum, Germany
Ralf Gold, Axel Mosig & Lars Tönges
Immune-Mediated Neuropathies Biobank (INHIBIT), Ruhr-University Bochum, Bochum, Germany
Ralf Gold & Kalliopi Pitarokoili
Bioinformatics Group, Ruhr University Bochum, 44801, Bochum, Germany
Axel Mosig

Contributions

Conceptualization: QC, LT, and SZ. Methodology: QC, AM, LT, and SZ. Software: QC and AM. Validation: LT, SZ, and RS. Formal analysis: QC. Investigation: QC. Resources: LT, SZ, and AM. Data curation: QC. Writing—original draft: QC and SZ. Writing: LT, SZ, and RS. Visualization: QC. Supervision: SZ, LT, RG, KP, and AM. Project administration: LT and SZ. Funding acquisition: none.

Corresponding author

Correspondence to Lars Tönges.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Consent to participate

All persons gave their informed consent prior to their inclusion in the study.

Consent for publication

All authors have read and approved the submitted manuscript.

Ethics approval

This study was approved by the Institutional Review Board of the Medical Faculty of the Ruhr University Bochum (Reg. No. 18-6360) and has therefore been performed in accordance with the ethical standards of the 1964 Declaration of Helsinki and its later amendments.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOCX 914 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chen, Q., Scherbaum, R., Gold, R. et al. Data-driven subtyping of Parkinson’s disease: comparison of current methodologies and application to the Bochum PNS cohort. J Neural Transm 130, 763–776 (2023). https://doi.org/10.1007/s00702-023-02627-4

Received: 10 February 2023
Accepted: 23 March 2023
Published: 31 March 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s00702-023-02627-4
