Parkinson’s disease (PD) is a chronic, progressive neurodegenerative disorder straining health systems worldwide due to increasing prevalence rates . A large global burden of disease study identified PD as one of the top five leading causes of death from neurological disorders in the US . It is estimated there were approximately 6.1 million People with Parkinson’s (PwP) globally in 2016, a sharp increase compared to 2.5 million PwP in 1990 .
Cardinal PD symptoms include tremor, rigidity, bradykinesia, and postural instability, within the broader remit of motor, cognitive, and neuropsychiatric symptoms . The breadth of motor and non-motor symptoms underscores the need for frequent, detailed, longitudinal monitoring and personalized medicine approaches [3, 4]. Comorbidities are common , which further complicates symptom management of symptoms [3, 4].
PD is a heterogeneous disease with considerable variation in PwP’s symptom severity trajectories, where differences in clinical phenotypes likely reflect different PD subtypes . In fact, delineating subtype identification had been highlighted as one of top three PD research priorities by the National Institutes of Health . Correctly determining PD subtypes is clinically important since homogeneous groups exhibit stronger clinical symptom manifestation and potentially stronger genetic coherence: this may lead to new insights towards involved biological pathways, which in turn may lead to better-informed, targeted treatment strategies [6, 8]. Ultimately, the purpose of understanding heterogeneity within disease populations in general and PD in particular, is the delivery of more personalized care tailored for individuals and the development of targeted pharmaco-therapeutic treatments. Due to the chronic, progressive, and multi-dimensional symptom constellation nature of PD, there are different approaches to define subtypes, e.g. depending on the underlying characteristics we choose to focus on (which may include motor and non-motor symptoms, cognitive ability, imaging, and symptom severity progression amongst others . In practice, defining PD subtypes has been pursued using either (i) empirical clinical experience and expertise or (ii) data-driven approaches. Arguably, the former approach is subjective and as yet there is no set of empirical accepted guidelines. Different clinical experts focus on different PD symptoms during routine clinical evaluation and this choice could be the main driving force on how they would define how PwP might form groups . For example, some experts might prefer to focus primarily on some of the cornerstone PD-related motor symptoms whereas others might also consider additional non-motor, cognitive, and other symptoms in their routine assessment and overall consideration of how we could define PD subtypes. There is clear potential bias inherent to this expert-based approach in defining PD subtypes .
For this reason, data-driven approaches to define PD subtypes have received considerable attention in the PD research community over the last few years using cluster analysis [10,11,12,13,14,15]. In the following, we aim to provide a broad overview of studies published in this area placing emphasis on recent work, however, we remark we did not attempt to pursue a systematic literature review. Indicatively, Selikhova et al.  used clinico-pathological characteristics; Lawton et al.  used standardized clinical instruments to assess motor, non-motor, and cognitive domains; Brendel et al.  similarly used standardized clinical instruments assessing motor, non-motor, cognitive, daily activity and sleep; and Nguyen et al.  used sensor-based gait pattern analysis. Extracting variables from different modalities or clinical instruments to present to clustering algorithms may potentially lead to new insights, however, it makes comparisons across studies particularly challenging and may explain discrepancies in the reported PD subtypes. For example, there is no clear consensus amongst experts on the number of subtypes or the clinical PwP characteristics within those subtypes: a recent systematic review in PD cluster analysis (summarizing the research literature published from 1999 to 2021) found that most studies report the presence of two to five clusters . Almost all reviewed studies in  had used PD-specific clinical scales and diverse variables which were diverse across studies, e.g. some have used variables extracted from imaging techniques, additional self-reports and tests assessing cognitive status, amongst others . Many of these data modalities can be expensive and/or time consuming to administer. If there was some modality that was easy to collect (ideally remotely without the need of expensive specialized equipment) and reliably inform PD clusters, that could facilitate subtype membership assignment and provide a much needed impetus to this field. The underlying motivation of this study is, therefore, the development of an approach that relies on an easy-to-collect data modality that enables meaningful PD subtype definition.
Crucially for the purposes of this study, speech as an item within comprehensive PD clinical scales has been previously shown to be very strongly associated with overall PD symptom severity as assessed using standardized clinical metrics [17, 18]. Furthermore, in the context of PD assessment we have previously used sustained vowel /a/ phonation to demonstrate: (1) differentiating a control group from PwP with almost 99% accuracy , (2) accurately replicating the standard clinical metric used to assess overall PD symptom severity (Unified Parkinson’s Disease Rating Scale (UPDRS)) [20,21,22], (3) automatically assessing voice rehabilitation ; (4) distinguishing PwP from controls and groups which are known to exhibit genetic mutations or disorders which may lead to later PD diagnosis [24, 25]; and (5) speech articulation kinematic models to characterize PD dysarthria and provide insights into the underlying vocal production mechanism . Collectively, these studies and many others in the research literature convincingly demonstrate the enormous potential of using speech signals in the context of diverse PD applications.
Therefore, the use of speech towards defining PD subtypes suggests itself as a potentially useful approach. Surprisingly, this area is little explored, despite the immense interest of the research community into investigating speech-PD applications. Rueda and Krishnan  used sustained vowel /a/ phonations from 57 PwP and 57 matched controls aiming to cluster participants. Unfortunately, the very small sample size of that study limits the application of appropriate cluster analysis methods. Moreover, although we appreciate this was driven by the limited sample size, mixing PwP and controls is not leading to the definition of PD subtypes. Thus, to the best of our knowledge, we were the first to propose defining PD groups using clustering methods with variables extracted from sustained vowels in our two BIOSTEC-BioSignals papers [28, 29]. Both studies reported findings using data from the Parkinson’s Voice Initiative (PVI) study, the world’s largest known speech-PD dataset . In brief, more than 19,000 phonations were collected across seven major geographical locations (Argentina, Brazil, Canada, Mexico, Spain, USA, and the UK) from about 10,000 participants, where approximately 18% were PwP [30, 31]. In the first study , we used data from the largest PVI cohort (Boston), acoustically analyzed the phonations, and used hierarchical clustering to explore PD groups. The second study  followed exactly the same methodology to explore how well findings generalized in two additional cohorts (Oxford and Toronto).
In this study, we aimed to extend our previous results [28, 29] by: (1) more rigorously exploring the methodology of using acoustic analysis to mine sustained vowels to extract features and select a robust feature subset that was presented into cluster analysis algorithms and (2) more formally assessing the validity of the resulting clusters to draw meaningful insights into PD subtypes. The practical significance of this study is offering a novel data-driven approach to define PD subtypes building on the sophisticated methodology we have previously developed to successfully process sustained vowels across related PD applications: given sustained vowels are easy to self-collect remotely, regularly, and inexpensively, the proposed approach may facilitate large-scale investigations in exploring PwP groupings objectively.
The study draws on data from the PVI study [30, 31] for secondary analysis. The PVI invited people to contribute their voices to facilitate PD research. Self-enrolled participants called a dedicated phone number that was closest to their geographical location and were requested to provide some basic demographic information (age, gender), self-report whether they had received a clinical PD diagnosis, and record two sustained vowel /a/ phonations. The instruction for the phonations was to sustain vowel /a/ (say ‘ahh…’) for as long and as steadily as possible, following the standard speech protocol in this field  which we had used in similar PD studies [19, 20, 23, 25, 33]. The speech recordings were sampled at 8 kHz with 16 bits resolution, collected over standard telephone lines and stored in Aculab servers. Participants provided oral informed consent by self-enrolling and completing the requested steps during a short phone call, participating in this research project without additional incentives other than altruistically contributing to research.
In this study, we processed data from the three English-speaking sites of PVI (Boston, Oxford, Toronto), to explore how well findings compare and generalize in PwP coming from an English linguistic background. The basic demographic information for the PwP study participants is summarized in Table 1. Additional detailed information of PD-symptom specific aspects, including whether participants were “on” or “off” medication when they provided the speech recordings, or additional PD scales were not recorded in PVI. For further details on PVI please see [30, 31].