Introduction

Parkinson’s disease (PD) is the second most prevalent neurodegenerative disorder1,2. In China, there are currently over 3 million PD patients, and by 2030 this number is anticipated to reach 5 million, constituting half of all PD patients worldwide3,4. Hypokinetic dysarthria, the speech disorder associated with PD, has been observed in up to 90% of PD patients5,6. Because speech is among the most complex quantifiable motor functions, it is remarkably sensitive to neural damage, making it a compelling and promising biomarker7,8. Existing literature on non-tonal languages has characterized hypokinetic dysarthria in PD by reduced loudness, monopitch, monoloudness, reduced articulatory precision, and altered speech rate9,10. Recent advances in acoustic analysis techniques can provide an objective assessment of these deficits following rigorous guidelines11.

Various acoustic parameters measured with tools such as Praat12, WaveSurfer13, Dysarthria Analyzer14, and the commercial Visi-Pitch15 have proved sensitive to the prodromal, early, and advanced stages of PD in non-tonal languages16. However, one of the main prerequisites for acoustic analysis is that the speech paradigms suit the pronunciation characteristics of the language. Existing research on speech disorders in PD has been conducted predominantly in English, which accounts for 65% of studies17. Mandarin Chinese is the most widely spoken native language in the world and the predominant language of the Chinese population, yet it is a tonal language and differs in numerous ways from the non-tonal languages commonly studied in speech disorder research8,18. Mandarin Chinese conveys both semantic and phonological information through logograms19, and each syllable combines one of four lexical tones with vowels and consonants, producing specific profiles of prosody, pitch variation, and regularity20,21. Therefore, existing findings and the speech protocols defined in the guidelines22 cannot be directly generalized without further research. Adapting these acoustic technologies to tonal languages requires a thorough investigation of linguistic relevance, technical implementation, clinical validity, and relation to the underlying pathophysiology.

Furthermore, despite the dramatic impairment in quality of life and the social isolation caused by speech disorders in PD, especially in advanced stages, current therapeutic approaches such as dopamine-based pharmacotherapy and deep brain stimulation (DBS)22,23,24 fall short of providing substantial benefits for speech. Personalized and objectively monitored interventions are required for the effective management of speech disorders in PD. Identifying the specific brain regions associated with speech disorders lays the groundwork for devising more precise treatment strategies for PD.

Recent calls to increase the inclusion of diverse populations in clinical research7,25,26,27 underline the need for comprehensive studies of underrepresented language groups to further our understanding of PD. We aim to address these issues for the first time by building on the current clinical guidelines22 to bridge the gap between non-tonal and tonal languages, and figuratively between the Eastern and Western worlds, in the acoustic analysis of speech in PD. We investigate the comprehensive acoustic profiles of Mandarin-speaking PD patients on four standard speech paradigms using semi-supervised measurements and relate them to motor status and to magnetic resonance imaging (MRI) measures of brain structure and function.

Results

Clinical characteristics of the participants

A total of 160 native Mandarin Chinese speakers were recruited for the study. The cohort consisted of 40 healthy controls (18 males, 22 females; mean age 60.68 years), 40 MRI controls (21 males, 19 females; mean age 59.83 years), and 80 PD patients (45 males, 35 females; mean age 62.34 years) with a mean PD motor symptom duration of 8.61 (SD = 3.83) years and a mean MDS-UPDRS Part III score of 45.56 (SD = 17.96). Clinical characteristics are summarized in Table 1.

Table 1 Clinical characteristics of PD patients

Speech features

The PD and HC groups differed significantly in a total of 19 speech features, summarized in Table 2. The representative speech characteristics selected for the classification experiment were slower average diadochokinetic rate (DDKavr, p < 0.001), slower net syllable rate (NSR, p < 0.001), longer duration of pause intervals (DPI, p = 0.002), increased standard deviation of F0 in the sustained vowel /a/ (F0std, p < 0.001), increased regularity of the second formant variations (F2reg, p < 0.001), and reduced vowel space area (VowelArea, p < 0.001). Maximum phonation time (MPT) did not differ between PD and HC (p = 0.417). Differences in the selected speech characteristics among PD patients at different H-Y stages are illustrated in Fig. 1. The features are described in more detail in Table 3.

Table 2 Disparities in acoustic metrics
Fig. 1: Clinical features of the selected metrics.

a–g Significant differences exist between PD and HC in diadochokinesis, second formant transition, fundamental frequency, vowel space area, and pause and speech rate. A two-sample t-test was used to compare the HC and PD groups; asterisks mark statistically significant differences between H-Y PD groups and the HC group after Bonferroni adjustment. h–j The ROC curves were generated by plotting the true positive rate against the false positive rate for the Naïve Bayes classifier, with the AUC given for each comparison. i Blue (HY1.5–2.0 vs HC) with AUC 0.8125, Purple (HY2.5 vs HC) with AUC 0.9725, Orange (HY3.0 vs HC) with AUC 0.9201, Red (HY4–5 vs HC) with AUC 0.9201; j Blue (HY2.5 vs 1.5–2.0) with AUC 0.7400, Red (HY2.5 vs 3.0) with AUC 0.6477, Green (HY2.5 vs 4–5) with AUC 0.6477, Purple (HY3.0 vs 1.5–2.0) with AUC 0.5982, Blue (HY3.0 vs 4–5) with AUC 0.5982, Orange (HY4–5 vs 1.5–2.0) with AUC 0.7000. k, l Prediction of UPDRS scores and sub-UPDRS scores from acoustic metrics using a linear regression model; one-way analysis of variance (ANOVA) was used to compare acoustic metrics across distinct speech scores, and the Least Significant Difference (LSD) test was employed for pairwise comparisons between patient groups. DDKavr average diadochokinetic rate, F0std standard deviation of F0, DPI duration of pause intervals, NSR net syllable rate, F2reg regularity of the second formant variations, MPT maximum phonation time, UPDRS Unified Parkinson’s Disease Rating Scale, PD Parkinson’s disease, HC healthy speech controls. *p < 0.05, **p < 0.01, ***p < 0.001. The error bars represent the standard deviation of the mean.

Table 3 Description of Speech Features

The Naïve Bayes model combining DDKavr, NSR, DPI, F0std, F2reg, and VowelArea showed an overall AUC value of 0.931 for distinguishing between PD and HC. Additionally, the AUC for discriminating between HC and different HY stages of PD is depicted in Fig. 1h.

Relationship of speech features and clinical variables

After adjusting for age, gender, MoCA, height, and weight as covariates, the passage-reading metric NSR was significantly associated with bradykinesia (Med-Off; p = 0.002, R = −0.349) and with MDS-UPDRS III (Med-Off; p = 0.022, R = −0.265). Additionally, F0std was positively correlated with tremor (Med-Off; p = 0.003, R = 0.336). Furthermore, when comparing groups defined by the speech subscore of MDS-UPDRS III in the Med-Off state, we observed significant differences in DPI and NSR across speech ratings. Correlations are plotted in Fig. 1i, j.

Relationships between acoustic measurements and anatomical structure

At the atlas level, after regressing out age, gender, MoCA score, height, weight, and dominant hand as covariates, reading rate measured by NSR was positively correlated with the cortical thickness of the right lateral fusiform gyrus (FDR p = 0.010, R = 0.391) and right insula (FDR p = 0.007, R = 0.405). DPI in the reading passage showed a trend-level negative correlation with the left hippocampus (FDR p = 0.071, R = −0.312) (Fig. 2b).

Fig. 2: The overall partial correlation results between Destrieux atlas-based cortical thickness and acoustic metrics after regressing out covariates in PD patients.

a Overall correlation results between Destrieux atlas-based volumes of subcortical nuclei and acoustic metrics. b Distribution of FDR p values of the partial correlations between normalized cortical thickness and acoustic metrics; most significant correlations were observed for measurements from the passage reading paradigm. c FDR p values projected onto the corresponding anatomical regions, highlighting brain regions that survived FDR p < 0.05. The detailed areas and statistics were, on the right: lateral fusiform gyrus (FDR p: 0.010, R: 0.391) and long insular gyrus and central sulcus of the insula (FDR p: 0.007, R: 0.405). d Linear regression plots of the cortical thickness of brain regions surviving FDR correction against NSR. FDR false discovery rate, A anterior, P posterior, neg. negative, pos. positive, corr. correlation, DDKavr average diadochokinetic rate, F2reg regularity of the second formant variations, DPI duration of pause intervals, NSR net syllable rate.

DDKavr in the diadochokinesis task was positively correlated with the volumes of the left thalamus (FDR p = 0.031, R = 0.344), right thalamus (FDR p = 0.031, R = 0.331), brainstem (FDR p = 0.031, R = 0.337), and corpus callosum (FDR p = 0.033, R = 0.323).

F2reg measured in the /i/-/u/ repetition paradigm was negatively correlated with the right thalamus (FDR p = 0.042, R = −0.339). The results are visualized in Fig. 2. A further vertex-wise correlation analysis replicated similar results (Supplementary Table 2).

Relationships between acoustic measurements and functional connectivity and network properties

At the atlas level, after regressing out age, gender, MoCA score, height, weight, and dominant hand as covariates, whole-brain aCp was significantly negatively correlated with F0std (p = 0.031, R = −0.680), whereas whole-brain aEloc was significantly positively correlated with DDKavr (p = 0.037, R = 0.663).

Regarding nodal efficiency, DPI was negatively correlated with the left hippocampus (uncorrected p = 0.002, R = −0.841); NSR was negatively correlated with the left pallidum (uncorrected p = 0.003, R = −0.838) and right pallidum (uncorrected p = 0.002, R = −0.840); and DDKavr was negatively correlated with the left dorsolateral superior frontal gyrus (uncorrected p < 0.001, R = −0.935) and left medial superior frontal gyrus (uncorrected p = 0.003, R = −0.836) (Fig. 3b). For nodal local efficiency, DPI was negatively correlated with the left hippocampus (uncorrected p = 0.003, R = −0.831), and NSR was negatively correlated with the left pallidum (uncorrected p = 0.001, R = −0.872) and right pallidum (uncorrected p = 0.003, R = −0.832) (Fig. 3c).

Fig. 3: Relationships between acoustic measurements and functional connectivity and network properties after regressing out covariates in PD patients.

a Linear regression plots depicting the associations between speech metrics and global properties after covariate regression. b, c Correlations between nodal local efficiency, nodal efficiency, and speech metrics after covariate regression (p < 0.005, uncorrected). d–f Correlations between speech metrics and functional connectivity at the AAL90 atlas level (p < 0.005, uncorrected) after covariate regression. DDKavr average diadochokinetic rate, DPI duration of pause intervals, NSR net syllable rate, aCp average clustering coefficient, aEloc average network efficiency, NLE nodal local efficiency, NE nodal efficiency, HIP hippocampus, PAL pallidum, SFGdor dorsolateral superior frontal gyrus, SFGmed medial superior frontal gyrus.

DPI exhibited positive correlations with the FC between the right middle frontal gyrus and right caudate, between the right inferior frontal gyrus and left superior frontal gyrus, and between the right superior frontal gyrus and left angular gyrus, and a negative correlation with the FC between the left putamen and right Heschl’s gyrus. NSR showed positive correlations with the FC between the right precentral gyrus and right superior frontal gyrus and with the FC of left, right superior frontal gyrus, right fusiform gyrus and right postcentral gyrus, and negative correlations with the FC of left posterior cingulate gyrus, right posterior cingulate gyrus, and right middle temporal gyrus. The FC results for DDKavr, along with correlation coefficients, are detailed in Fig. 3 and Supplementary Table 4.

Discussion

This study is the first to extend the standard clinical guidelines22 to a tonal language, demonstrating the effectiveness of a comprehensive acoustic analysis of speech in Chinese-speaking patients with PD and using MRI to investigate the associated brain structure and function. Our results indicate that a combination of acoustic features in Mandarin may support the diagnosis and monitoring of PD. Additionally, localized atrophy in the hippocampus and fusiform gyrus, reduced nodal efficiency in the hippocampus and pallidum, and decreased functional connectivity between the cingulate gyrus, motor cortex, and inferior frontal gyrus were correlated with speech production and reading.

The speech disorder characteristics observed in the Chinese-speaking PD cohort partially differed from those reported for non-tonal languages5,6,8,9,10,28. Owing to differences between Chinese and Indo-European languages, the presence of four tones may render the running standard deviation of F0 (rSTD) and running variation of amplitude (rvAm) during reading less sensitive to abnormal speech in PD, whereas many studies in non-tonal languages have identified differences in rSTD and rvAm between PD and HC7,8. We attribute this to the dominance of tones in expressing meaning during Chinese reading, as opposed to the emphasis on stress and F0 fluctuations in non-tonal languages. This suggests that the four tones may not be significantly impaired in Chinese-speaking PD patients, but further research centered on tonal changes is required for confirmation.

Although we did not include non-tonal languages in this study, we followed comparable examination protocols and used identical analytical software to calculate our key metrics of voice onset time (VOT), NSR, and DPI as in the previous multicentric study covering Czech, German, English, French, and Italian8. VOT in Chinese did not differ significantly between groups, which may be attributed to the distinct pronunciation of English consonants and Mandarin Pinyin initials: the Chinese /pa/ syllable has a longer VOT than its counterparts in non-tonal languages29. Nevertheless, vowel pronunciation is similar in Chinese and Indo-European languages, with impaired articulation reflected in a reduced vowel area among PD patients11. Furthermore, despite significant differences in stress, grammatical structure, and vocabulary between Chinese and Indo-European languages, NSR and DPI show comparable trends and effectiveness in differentiating PD from HC8,11,30. Previous studies have also highlighted differences between Chinese and Polish, emphasizing that vocal features sensitive for diagnosis in non-tonal languages may not be effective in Chinese31. This underscores the importance of screening language features specific to Chinese.

The vast majority of the selected acoustic features showed significant group differences between HC and PD, and these differences persisted after stratification by H-Y stage. The combination of six distinct features achieved high performance (AUC > 0.9) in PD diagnosis. Although the model showed only moderate performance in classifying H-Y stages, the relationships of NSR with bradykinesia and of F0std with tremor indicate that vocal biomarkers are associated with disease progression. Compared with ref. 32, which aimed to build a high-performance model from a small dataset (N = 34) from a technical perspective, we provide further insight into clinical interpretability while maintaining a comparable AUC above 0.9 in a larger dataset. Such an approach could be of great benefit for remote assessment and disease monitoring in the Chinese PD population33. Furthermore, preoperative speech disorders are one of the crucial factors influencing long-term speech prognosis after DBS34, and it is equally essential to evaluate speech disturbance using appropriate paradigms and speech metrics during DBS surgery35. Hence, quantitative acoustic evaluation in Mandarin also holds significant importance for DBS candidates. Notably, this study builds on the methodologies developed by Jan Rusz’s lab14,36,37 and the MSP Program. Although the proposed speech analyses were newly adapted to tonal Mandarin and were therefore performed under supervision to maintain full control over processing quality, we hypothesize that the final solution can be fully automated, since the Dysarthria Analyzer14 and the MSP Program are already fully automated.

Beyond these potential clinical benefits, the most compelling finding is the relationship between vocal biomarkers and brain structure and networks. Impaired passage reading in PD patients may involve not only damage to articulation or prosody but also higher-order language deficits38. The correlation of DPI with hippocampal atrophy and impaired local efficiency highlights the role of the hippocampus in language processing39. The hippocampus is implicated in cognitive language processing, including semantic encoding40,41. It combines incoming words with stored semantics39, engages in sensory prediction based on associative contextual representations encoded in the hippocampus, and ultimately provides feedback by coupling with the auditory cortex, thus “predicting” the expression of sentences42. The observed reduction in functional connectivity between the temporal lobe and the striatum further supports this notion. Notably, Mandarin-speaking patients encode sentences by syllables and express semantics through word phrases43, so the relationship between DPI and the hippocampus further emphasizes the connection between pauses and semantic expression in Mandarin44,45. This suggests that patients in the mid-to-late stages of PD experience a spectrum of mixed speech disorders involving both cognitive and motor control levels.

Additionally, reduced activation has been noted in the premotor cortex of PD patients with speech difficulties, in line with our finding of decreased functional connectivity between the right precentral gyrus and the right medial superior frontal gyrus in patients with slowed speech46. Consistent with our results, a recent study also reported an association between reading impairment and damage to the cingulate gyrus in PD patients47. The dual-pathway theory of reading attributes the SFG and fusiform gyrus to “access to meaning” and the precentral gyrus to “access to pronunciation and articulation”48. This is consistent with the atrophy of the fusiform gyrus and the reduced functional connectivity of the fusiform gyrus, superior frontal gyrus, and precentral gyrus observed in PD patients with slowed NSR, suggesting potential impairments in speech encoding and motor generation37,49. Beyond cortical regions, PD patients with speech disorders exhibit aberrant functional connectivity between the globus pallidus and premotor cortex50; similarly, we observed compensatory increases in nodal efficiency and local efficiency of the pallidum in PD patients with slowed speech. This suggests that speech impairment in PD may entail a broad-scale network disruption encompassing both cortical and subcortical regions50,51. The reduced overall efficiency further supports this notion.

The thalamus is substantially involved in speech perception and production52,53, as underlined by the correlation between thalamic atrophy and articulation. In previous fMRI studies of DBS candidates, we identified changes consistent with our functional connectivity results. Furthermore, the present study revealed a correlation between these abnormal functional connections and speech metrics54. Articulation disorders may be more closely related to brain regions associated with motor control, such as the thalamus, while prosody disorders may be more connected to the entire process of speech production. Specific vocal biomarkers are key to the functional localization of lesions, following the landmark work of ref. 55, which established the hypothesis linking structural damage to motor speech disorders. The presented findings could help not only with the study of the basal ganglia and diagnosis but also with the personalization of neuromodulation therapies and the experimental design of invasive intracranial recordings53,56.

Only Mandarin-speaking candidates with a disease duration of 5 years or more were included in the PD cohort, which limits the generalizability of our results to other tonal languages and to earlier disease stages. Additionally, the brain functional correlations were exploratory and were not subjected to rigorous multiple comparison correction. Studies on brain structure and function are predominantly observational, and fMRI data were available for only 16 patients; future work should employ interventional approaches such as transcranial stimulation to validate the association of specific brain areas with speech manifestations.

To summarize, we identified a high diagnostic value of the proposed comprehensive acoustic evaluation of Mandarin speech, which may support the rapid determination of PD progression stages, and we linked different speech subsystems to structural and functional impairments in the hippocampus, cingulate gyrus, and basal ganglia. This study extends conventional speech examination protocols and paradigms based on non-tonal languages22, creating an opportunity for this vast language group to share harmonized results and contributing to the expansion of our knowledge about PD.

Methods

Study design and participants

A total of 80 PD patients, 40 healthy controls (HC) for speech examination, and 40 normal MRI controls were prospectively recruited from October 2022 to April 2023 at Beijing Tiantan Hospital, Capital Medical University, China. The inclusion criteria for the PD cohort were: (1) a diagnosis of idiopathic PD meeting the UK Brain Bank criteria57, with a disease duration of 5 years or more; (2) normal cognitive function (Montreal Cognitive Assessment (MoCA) score ≥ 24) or mild cognitive impairment (meeting the criteria for the diagnosis of PD-MCI level I58 and 16 ≤ MoCA < 24)59; (3) completion of the entire experiment with full cooperation with the investigators; and (4) meeting the quality control standards for neuroimaging and speech examination. The severity of motor symptoms was assessed using Part III of the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS III) and the Hoehn & Yahr (H-Y) scale, and speech examination was conducted in the medication-off state at least 8 h after dopaminergic medication withdrawal. Healthy controls without any history of communication disorders or respiratory diseases were selected to match the age range of the PD patients (40–80 years). The research flowchart and baseline information are presented in Fig. 4 and Table 1. Ethical approval for this study was obtained from Beijing Tiantan Hospital, and all patients provided written informed consent in accordance with the principles of the Declaration of Helsinki.

Fig. 4: Participant flowchart.

After applying inclusion and exclusion criteria, we included PD patients and normal subjects. Speech recordings were obtained from PD and speech control groups using recording equipment for subsequent speech analysis. FreeSurfer and GRETNA were used to process MRI scans, obtaining both structural and functional MRI data for PD patients and MRI controls. Data analysis was conducted after normalization based on the control groups.

Clinical examinations

The clinical evaluation of each subject consisted of: (1) a review of their personal and medical history, including gender, age, height, weight, and past medical conditions; (2) quantitative testing of the motor symptoms of PD using the MDS-UPDRS III; and (3) cognitive testing using the MoCA. All clinical scales and diagnoses were administered by a neurologist with expertise in movement disorders.

Speech examination

Speech was recorded in a closed room with low ambient noise and reverberation using a head-mounted omnidirectional condenser microphone (Sennheiser HSP-2, Germany) placed ~2 cm from the participant’s mouth (Fig. 5a). Recordings were sampled at 48 kHz with 16-bit resolution. Each participant was examined by a trained speech specialist within a single session. Participants performed each of the following vocal tasks twice: (1) sustained phonation of the vowels /a/, /i/, and /u/, each on a single breath for as long and as steadily as possible; (2) fast diadochokinetic (DDK) syllable repetition for at least 7 s on a single breath; (3) fast /i/-/u/ syllable repetition for at least 5 s on a single breath; and (4) reading a short weather-report passage containing 105 Chinese characters (syllables).

Fig. 5: Overview of applied acoustic measurements.

a The voice was recorded using a head-mounted condenser microphone positioned 2 cm from the lips. b Example waveform of the DDK paradigm, used to measure rate-related indicators of DDK alternating motion. c Example waveform of the fast /i/-/u/ transition, used to measure indicators related to the F2 transition. d Example waveforms of monophthong pronunciation, used to obtain first- and second-formant indicators for each vowel. e Example waveform of the sustained vowel /a/, used to obtain indicators such as fundamental frequency and its variability during monophthong production. f The vowel triangle area was obtained from the first and second formants of /a/, /i/, and /u/. g Example spectrogram of the vowel /a/ showing the maximum phonation time. h Example of the paragraph reading paradigm, used to obtain indicators such as reading speed and pauses.
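Panel f refers to the vowel triangle area. Its exact definition is given in Table 3 and is not reproduced here; a common choice, assumed in the minimal sketch below, is the area of the triangle spanned by the mean (F1, F2) values of the corner vowels /a/, /i/, and /u/, computed with the shoelace formula. The function name and the example formant values are hypothetical.

```python
def triangular_vowel_space_area(f1_a, f2_a, f1_i, f2_i, f1_u, f2_u):
    """Area (Hz^2) of the /a/-/i/-/u/ triangle in the F1-F2 plane.

    Uses the shoelace (triangle area) formula on the three corner vowels.
    Inputs are mean formant frequencies in Hz for each sustained vowel.
    """
    return 0.5 * abs(
        f1_i * (f2_a - f2_u)
        + f1_a * (f2_u - f2_i)
        + f1_u * (f2_i - f2_a)
    )


# Illustrative formant values (Hz), not data from this study.
print(triangular_vowel_space_area(800, 1300, 300, 2300, 350, 800))  # 350000.0
```

A smaller area indicates a more centralized, less distinct vowel system, which is the direction expected with reduced articulatory precision.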

We tested both fast single-syllable /pa/ and trisyllabic /pa/-/ta/-/ka/ repetition for the DDK paradigm in all participants to evaluate the discriminatory effect of syllable type. The effect size (HC vs PD) did not differ significantly between the tasks (Supplementary Table 1); therefore, only the fast /pa/ paradigm was included in further analysis for simplicity. The paradigms were customized for Mandarin Chinese.

Parameter extraction

We performed the acoustic analysis using vocal biomarkers carefully selected from established features documented in previous studies, to ensure applicability to tonal languages and comparability with non-tonal languages.

Audio quality control and trimming were performed in WaveSurfer (https://sourceforge.net/projects/wavesurfer/). We implemented a battery of acoustic features using WaveSurfer combined with in-house MATLAB scripts and Dysarthria Analyzer (https://www.dysan.cz/)14 in a semi-supervised manner to achieve higher and better-controlled measurement quality.

The DDK task was analyzed by calculating the waveform envelope and applying the findpeaks function to identify syllable onsets, nuclei, and offsets, with manually set constraints such as the peak prominence threshold to achieve optimal detection accuracy. The detected events were described using standard parameters. Furthermore, VOT was measured with Dysarthria Analyzer and supervised by inspecting the results on the spectrogram (more details in Table 3).
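The envelope-and-peaks procedure above was run in MATLAB with `findpeaks`. As a rough Python analogue, the sketch below rectifies and smooths the waveform and calls SciPy's `scipy.signal.find_peaks` with a prominence threshold and a minimum peak distance. The smoothing window, both thresholds, and the rate definition (syllable intervals per second between the first and last detected nucleus) are illustrative assumptions rather than the study's settings or the Table 3 definition of DDKavr; onset and offset detection are omitted.

```python
import numpy as np
from scipy.signal import find_peaks


def amplitude_envelope(x, fs, win_ms=20.0):
    """Rectify the waveform and smooth it with a moving average of win_ms milliseconds."""
    win = max(1, int(fs * win_ms / 1000))
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")


def ddk_rate(x, fs, min_prominence=0.1, min_gap_s=0.08):
    """Detect syllable nuclei as envelope peaks; return (syllables per second, nucleus times).

    min_prominence is relative to the envelope maximum and min_gap_s is the shortest
    allowed interval between nuclei. Both are hypothetical defaults that would be
    tuned per recording in a supervised workflow.
    """
    env = amplitude_envelope(np.asarray(x, float), fs)
    if env.max() == 0:
        return 0.0, np.array([])
    env = env / env.max()
    peaks, _ = find_peaks(env, prominence=min_prominence, distance=max(1, int(min_gap_s * fs)))
    times = peaks / fs
    if len(times) < 2:
        return 0.0, times
    # Rate over the span between the first and last detected nucleus.
    return (len(times) - 1) / (times[-1] - times[0]), times


# Synthetic check: a 200 Hz carrier gated into 6 bursts per second for 3 s.
fs = 8000
t = np.arange(3 * fs) / fs
x = np.sin(2 * np.pi * 200 * t) * np.clip(np.sin(2 * np.pi * 6 * t), 0, None) ** 2
rate, nuclei = ddk_rate(x, fs)
print(len(nuclei), round(rate, 2))
```

In practice the detected nucleus times would be overlaid on the waveform or spectrogram and the thresholds adjusted, mirroring the manual supervision described in the text.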

Formant estimation in the /i/-/u/ repetition paradigm was carried out via linear predictive coding in WaveSurfer with default settings. Transitions of the second formant (F2) were identified with a supervised method similar to the DDK analysis and parameterized with common descriptors (Table 3).
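The study used WaveSurfer's formant tracker as-is. To illustrate the idea behind LPC formant estimation, the sketch below fits an all-pole model to one frame with the autocorrelation method (Levinson-Durbin recursion) and converts the roots of the prediction polynomial into frequencies and bandwidths. This is not WaveSurfer's algorithm, which additionally tracks formants across frames; the pre-emphasis coefficient, the model-order rule of thumb, and the bandwidth and frequency limits are assumptions.

```python
import numpy as np


def lpc_coefficients(x, order):
    """Autocorrelation-method LPC solved with the Levinson-Durbin recursion.

    Returns a = [1, a1, ..., a_order] for the all-pole model 1 / A(z).
    """
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = np.dot(a[:i], r[i:0:-1])         # sum_j a[j] * r[i - j], j = 0..i-1
        k = -acc / err                         # reflection coefficient
        a[: i + 1] = a[: i + 1] + k * a[i::-1]
        err *= 1.0 - k * k
    return a


def estimate_formants(frame, fs, order=None, preemph=0.97,
                      max_bandwidth=400.0, min_freq=90.0):
    """Formant candidates (Hz) of one analysis frame from the roots of the LPC polynomial."""
    x = np.asarray(frame, float)
    if preemph:
        x = np.append(x[0], x[1:] - preemph * x[:-1])  # boost high frequencies
    x = x * np.hamming(len(x))
    if order is None:
        order = 2 + int(fs / 1000)                     # common rule of thumb
    roots = np.roots(lpc_coefficients(x, order))
    roots = roots[np.imag(roots) > 0]                  # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > min_freq) & (bandwidths < max_bandwidth)
    return np.sort(freqs[keep])
```

Applying `estimate_formants` to short overlapping frames of the /i/-/u/ recording and taking the second returned value per frame would give a rough F2 contour from which transition descriptors could then be derived.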

The pitch and amplitude of the sustained vowel /a/ were analyzed in WaveSurfer as follows. We manually optimized the pitch detection constraints for each recording and calculated statistics such as the mean and standard deviation of the detected F0 sequence in MATLAB. Additionally, we parameterized amplitude within the F0 intervals by its standard deviation and other metrics outlined in Table 3. MPT and harmonics-to-noise ratio were calculated using Dysarthria Analyzer14 and supervised by plotting the results against the spectrogram, as illustrated in Fig. 5.
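The F0 statistics above were computed in MATLAB from WaveSurfer's pitch track. The sketch below substitutes a simple short-time autocorrelation pitch estimator (searching lags between `fmin` and `fmax`, with parabolic peak interpolation) and then takes the mean and sample standard deviation over voiced frames. The estimator, its default search range and voicing threshold, and reporting F0std in Hz rather than semitones are assumptions; Table 3 may define F0std differently.

```python
import numpy as np


def f0_track(x, fs, fmin=75.0, fmax=500.0, frame_s=0.04, hop_s=0.01, voicing_threshold=0.3):
    """Frame-wise F0 (Hz) by short-time autocorrelation; NaN marks unvoiced frames."""
    x = np.asarray(x, float)
    n, hop = int(frame_s * fs), int(hop_s * fs)
    lag_min, lag_max = int(fs / fmax), min(int(fs / fmin), n - 2)
    f0 = []
    for start in range(0, len(x) - n + 1, hop):
        frame = x[start:start + n] - np.mean(x[start:start + n])
        r = np.correlate(frame, frame, mode="full")[n - 1:]  # lags 0..n-1
        if r[0] <= 0:
            f0.append(np.nan)
            continue
        r = r / r[0]
        k = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
        if r[k] < voicing_threshold:
            f0.append(np.nan)
            continue
        # Parabolic interpolation around the peak for sub-sample lag precision.
        y0, y1, y2 = r[k - 1], r[k], r[k + 1]
        denom = y0 - 2 * y1 + y2
        shift = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
        f0.append(fs / (k + shift))
    return np.array(f0)


def f0_mean_std(x, fs, **kwargs):
    """Mean and sample standard deviation (Hz) of F0 over voiced frames."""
    f0 = f0_track(x, fs, **kwargs)
    voiced = f0[~np.isnan(f0)]
    return float(np.mean(voiced)), float(np.std(voiced, ddof=1))
```

Per-recording tuning of `fmin` and `fmax` would play the role of the manually optimized pitch constraints mentioned in the text.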

Pauses in the reading passage were defined as unvoiced, non-consonant segments, including respiration intervals, following established criteria60. Pauses were identified using an automated method incorporated in Dysarthria Analyzer14, with a minimum pause duration of 30 ms60, and all results were supervised by inspecting the spectrogram. NSR was calculated as the ratio of the 105 syllables read to the total net reading time excluding pause intervals, according to the definitions in Table 3.
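Once pause intervals have been detected, NSR and DPI reduce to simple arithmetic. The sketch below applies the 30 ms minimum pause duration and the 105-syllable passage length from the text; the input format, the function name, expressing NSR in syllables per second, and defining DPI as the mean retained pause duration are assumptions (Table 3 may use a different summary statistic).

```python
def reading_metrics(pauses, total_duration_s, n_syllables=105, min_pause_s=0.030):
    """Return (NSR in syllables/s, DPI in s) for one reading of the passage.

    pauses: list of (start_s, end_s) candidate silent intervals, e.g. exported
    from a pause detector. Intervals shorter than min_pause_s are discarded.
    DPI is taken here as the mean duration of the retained pauses (assumption).
    """
    kept = [end - start for start, end in pauses if end - start >= min_pause_s]
    net_time = total_duration_s - sum(kept)
    if net_time <= 0:
        raise ValueError("pauses cover the whole recording")
    nsr = n_syllables / net_time
    dpi = sum(kept) / len(kept) if kept else 0.0
    return nsr, dpi


nsr, dpi = reading_metrics([(1.0, 1.5), (5.0, 5.02), (10.0, 11.0)], total_duration_s=30.0)
print(round(nsr, 3), dpi)  # 3.684 0.75
```

Here the 20 ms gap is discarded as shorter than the 30 ms minimum, so 1.5 s of pauses are removed and 105 syllables are divided by the remaining 28.5 s.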

A total of 31 speech parameters were calculated in MATLAB according to definitions summarized in Table 3 based on well-established methods11,22,61 and/or inspired by the Motor Speech Profile (MSP) Program (PENTAX Medical, New Jersey, USA) and the Dysarthria Analyzer14.

Fig. 5 illustrates speech feature extraction on example sound waves for the various speech paradigms, and Supplementary Fig. 1 illustrates the semi-supervised measurement process. See Supplementary Material 1 and Table 3 for feature details.

Features selection and normalization

We conducted feature selection to reduce the number of analyzed features and prevent overfitting in the machine learning experiment, while providing clinically relevant metrics for practical interpretation of the results. We selected representative measurements from four speech dimensions, namely prosody, articulation, phonation, and respiration, based on the following criteria: high effect size, high clinical interpretability, and low intraclass correlation. MPT was included as the key respiratory characteristic to make the final analysis more comprehensive. To enhance comparability with non-tonal languages, we selected metrics recommended by the clinical guidelines that had previously been employed in studies of PD in non-tonal languages8,11. All measurements of all participants were normalized to z-scores separately for each gender using the data of healthy controls for further statistical analysis.
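The gender-specific normalization can be sketched as follows: each feature is centred and scaled by the mean and standard deviation of the healthy controls of the same gender. The function name, the array layout, and the use of the sample standard deviation (`ddof=1`) are assumptions.

```python
import numpy as np


def zscore_by_gender(values, gender, is_control):
    """Z-score each feature against same-gender healthy controls.

    values: (n_participants,) or (n_participants, n_features) array.
    gender: label per participant; is_control: True for healthy controls.
    """
    values = np.asarray(values, float)
    gender = np.asarray(gender)
    is_control = np.asarray(is_control, bool)
    z = np.empty_like(values)
    for g in np.unique(gender):
        in_group = gender == g
        reference = values[in_group & is_control]
        mu = reference.mean(axis=0)
        sd = reference.std(axis=0, ddof=1)
        z[in_group] = (values[in_group] - mu) / sd
    return z
```

For example, with male controls at 1, 2, and 3 and a male patient at 5, the patient's z-score is 3, independent of the female reference values.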

MRI acquisition

MR images of 80 PD patients and 40 MRI controls were acquired on a 3.0 T MRI scanner (Siemens Medical Systems) with a 32-channel head coil. A whole-head sagittal three-dimensional T1-weighted magnetization-prepared rapid acquisition gradient echo (MPRAGE) sequence was used (repetition time, 1.56 s; echo time, 1.69 ms; flip angle, 8°; matrix size, 256 × 256; isotropic voxel, 1 × 1 × 1 mm3; number of slices, 196).

We included fMRI scans of 16 PD patients with an off-medication period of 12 h or longer. Patients whose head motion exceeded 3 mm of translation were excluded. During the scan, patients were instructed to keep their eyes closed and remain awake. Blood-oxygen-level-dependent images were acquired employing an echo-planar imaging sequence with the following parameters: a repetition time (TR) of 750 ms, an echo time (TE) of 35 ms, a flip angle of 52°, an acquisition matrix of 92 × 92, a field of view (FOV) of 100 mm × 100 mm, a Multiband Acceleration Factor of 8, and a scanning time approximately equal to 600 s.

Imaging processing

We converted all DICOM files to NIfTI format using SPM12 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/) and carefully inspected data quality. For data preprocessing and image analysis, we employed FreeSurfer (development version, http://www.freesurfer.net) and the GRETNA 2.0.0 toolbox (https://www.nitrc.org/projects/gretna/), as previously described62,63.

Cortical thickness and subcortical nuclei volume features were extracted using both vertex-level and atlas-level approaches. We used the Destrieux atlas62 for cortical segmentation and took the average cortical thickness within each region as the regional thickness. To normalize regional thickness, we first divided each patient's regional cortical thickness by their global mean cortical thickness and then z-score normalized the results to the healthy controls64. The z-scores of cortical thickness were used in the correlation analysis with acoustic measurements. Similarly, we treated the volumes of subcortical structures (extracted from the aseg.stats output file) as subcortical nuclei features and normalized them in the same way as thickness.
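A minimal sketch of the two-step normalization, assuming regional values are stored as a subjects-by-regions array. For simplicity it takes each subject's global mean as the unweighted mean across regions, whereas FreeSurfer reports its own global mean thickness; that substitution and the function name are assumptions.

```python
import numpy as np


def normalize_regional(measures, is_control):
    """Relative-to-global, control-referenced z-scores of regional measures.

    measures: (n_subjects, n_regions), e.g. mean thickness per Destrieux region.
    Step 1 divides each subject's regions by that subject's global mean;
    step 2 z-scores every region against the control subjects.
    """
    m = np.asarray(measures, float)
    relative = m / m.mean(axis=1, keepdims=True)
    reference = relative[np.asarray(is_control, bool)]
    return (relative - reference.mean(axis=0)) / reference.std(axis=0, ddof=1)
```

Because of step 1, multiplying all of one subject's regional values by a constant leaves that subject's z-scores unchanged, so the resulting features reflect regional rather than global differences.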

The brain's intrinsic connectivity network is composed of nodes and edges. Using the AAL90 atlas, the brain was segmented into 90 nodes, and a 90 × 90 functional connectivity network (FCN) was constructed by computing correlations between the time series of these nodes. Fisher's z transformation was applied to the correlation coefficients to correct for their non-normality. Global properties (average clustering coefficient (aCp) and average network efficiency (aEloc)) and nodal properties (nodal local efficiency (NLE) and nodal efficiency (NE)) were calculated from the FCN using graph theory.
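The network construction and graph measures can be sketched in NumPy. This is not GRETNA's implementation: the sketch binarizes the Fisher-z FC matrix at a single sparsity level (the fraction of strongest edges kept, an assumed parameter) and computes unweighted nodal efficiency, nodal local efficiency, and clustering coefficients. GRETNA-style summaries are typically computed over a range of thresholds, which is omitted here, and all function names are hypothetical.

```python
import numpy as np


def functional_connectivity(ts):
    """Fisher-z transformed Pearson FC matrix from (n_timepoints, n_regions) time series."""
    r = np.corrcoef(ts, rowvar=False)
    np.fill_diagonal(r, 0.0)
    return np.arctanh(np.clip(r, -0.999999, 0.999999))


def binarize_by_sparsity(fc, sparsity):
    """Keep the strongest `sparsity` fraction of edges as an undirected binary graph."""
    n = fc.shape[0]
    iu = np.triu_indices(n, k=1)
    n_keep = int(round(sparsity * len(iu[0])))
    strongest = np.argsort(fc[iu])[::-1][:n_keep]
    adj = np.zeros((n, n), dtype=bool)
    adj[iu[0][strongest], iu[1][strongest]] = True
    return adj | adj.T


def shortest_path_lengths(adj):
    """Unweighted shortest-path lengths by breadth-first search (inf if unreachable)."""
    n = adj.shape[0]
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        visited = np.zeros(n, dtype=bool)
        visited[s] = True
        frontier, d = visited.copy(), 0
        while frontier.any():
            d += 1
            frontier = adj[frontier].any(axis=0) & ~visited
            dist[s, frontier] = d
            visited |= frontier
    return dist


def nodal_efficiency(adj):
    """Mean inverse shortest-path length from each node to all other nodes."""
    n = adj.shape[0]
    if n < 2:
        return np.zeros(n)
    with np.errstate(divide="ignore"):
        inv = 1.0 / shortest_path_lengths(adj)
    np.fill_diagonal(inv, 0.0)
    return inv.sum(axis=1) / (n - 1)


def nodal_local_efficiency(adj):
    """Global efficiency of the subgraph formed by each node's neighbours."""
    out = np.zeros(adj.shape[0])
    for i in range(adj.shape[0]):
        nbrs = np.flatnonzero(adj[i])
        if len(nbrs) >= 2:
            out[i] = nodal_efficiency(adj[np.ix_(nbrs, nbrs)]).mean()
    return out


def clustering_coefficient(adj):
    """Fraction of each node's neighbour pairs that are themselves connected."""
    a = adj.astype(float)
    deg = a.sum(axis=1)
    triangles = np.diag(a @ a @ a) / 2.0
    pairs = deg * (deg - 1) / 2.0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(pairs > 0, triangles / pairs, 0.0)
```

A network-level value such as an average clustering coefficient would then be the mean of `clustering_coefficient(adj)` across the 90 nodes.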

Statistical analysis

Group-level feature comparisons were conducted using two-sample t-tests. Comparisons between different H-Y stages of PD and HC were conducted using one-way ANOVA with Bonferroni correction for multiple comparisons. Age, gender, MoCA, height, and weight were used as covariates in the partial correlation analysis between acoustic features and MDS-UPDRS III.
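Partial correlation with covariates is equivalent to correlating the residuals of both variables after linear regression on the covariates. The sketch below assumes ordinary least squares with an intercept and numerically coded covariates (for example gender as 0/1); the p value uses a t distribution with n - 2 - k degrees of freedom for k covariates. The function name is hypothetical.

```python
import numpy as np
from scipy import stats


def partial_correlation(x, y, covariates):
    """Pearson correlation of x and y after regressing out covariates (with intercept).

    covariates: (n,) or (n, k) numeric array; categorical covariates such as
    gender must be numerically coded. Returns (r, two-sided p).
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    design = np.column_stack([np.ones(len(x)), np.asarray(covariates, float)])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = np.corrcoef(res_x, res_y)[0, 1]
    dof = len(x) - 2 - (design.shape[1] - 1)   # n - 2 - number of covariates
    t = r * np.sqrt(dof / (1.0 - r ** 2))
    p = 2.0 * stats.t.sf(abs(t), dof)
    return float(r), float(p)
```

With a single covariate this reproduces the classical first-order formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), while also handling the five covariates listed above.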

Additionally, we performed a classification experiment using a Gaussian Naïve Bayes classifier in a 5-fold cross-validation scheme to differentiate HC from PD using the six selected speech features. Performance was described by the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC).
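The classification experiment can be sketched without external machine-learning libraries: a Gaussian Naive Bayes model (per-class means, variances, and priors), stratified 5-fold splitting, and AUC computed as the Mann-Whitney probability that a PD sample outranks an HC sample. The variance floor, the random fold assignment, and pooling out-of-fold scores into a single AUC (rather than averaging per-fold AUCs) are assumptions; the study's exact MATLAB configuration is not reproduced here.

```python
import numpy as np


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: per-class feature means/variances and class priors."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small variance floor for numerical stability (assumed value).
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9 * X.var(axis=0).max()
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def log_joint(self, X):
        """log p(x | class) + log p(class), shape (n_samples, n_classes)."""
        diff2 = (X[:, None, :] - self.mean_[None]) ** 2
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff2 / self.var_[None]).sum(axis=2)
        return ll + self.log_prior_


def roc_auc(y_true, score):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos, neg = score[y_true == 1], score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def stratified_folds(y, k=5, seed=0):
    """Assign each sample to one of k folds, balancing classes across folds."""
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        fold[idx] = np.arange(len(idx)) % k
    return fold


def cross_validated_auc(X, y, k=5, seed=0):
    """Out-of-fold Gaussian NB scores (log-odds of class 1) pooled into one AUC."""
    X, y = np.asarray(X, float), np.asarray(y, int)
    fold = stratified_folds(y, k, seed)
    score = np.empty(len(y))
    for f in range(k):
        test = fold == f
        model = GaussianNaiveBayes().fit(X[~test], y[~test])
        lj = model.log_joint(X[test])
        score[test] = lj[:, 1] - lj[:, 0]   # labels assumed to be 0 (HC) and 1 (PD)
    return roc_auc(y, score)
```

Using the log-odds rather than the posterior probability as the score gives the same ranking, and therefore the same AUC, while avoiding ties from probabilities saturating at 1.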

We included age, gender, MoCA score, height, weight, and dominant hand as covariates. Correlations between standardized speech metrics and standardized cortical thickness or subcortical volume, functional connectivity, and network properties were analyzed using Pearson correlation. For structural MRI, false discovery rate (FDR) correction was applied to control for multiple comparisons across regions. For fMRI, given the relatively small number of patients and the exploratory design, results with p < 0.005 are presented without correction.
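The text does not name the FDR procedure; the Benjamini-Hochberg step-up method is the most common choice and is assumed in the sketch below, which returns adjusted p values (q values) that can be compared directly with the 0.05 threshold. The function name is hypothetical.

```python
import numpy as np


def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p value downwards, then cap at 1.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


print(fdr_bh([0.01, 0.04, 0.03, 0.005]))
```

Applied per acoustic metric across all atlas regions, a region would be reported when its adjusted value is below 0.05.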

All hypotheses were two-tailed and tested at a significance threshold of p < 0.05. Statistical analyses and visualizations were performed using MATLAB 2021b (MathWorks Inc., Natick, Massachusetts, USA).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.