Introduction

Parkinsonian disorders comprise a group of neurodegenerative conditions that share motor symptoms such as slowness of movement (bradykinesia), stiffness (rigidity), and shaking (tremor). Parkinson's disease (PD) is the most common of these conditions [28]. Other, less frequent but clinically important, parkinsonian disorders include multiple system atrophy (MSA), dementia with Lewy bodies (DLB), progressive supranuclear palsy (PSP), and corticobasal syndrome (CBS) [2]. Although these disorders differ in the protein, cell type, and brain region affected, they are often misdiagnosed by neurologists because their symptoms overlap, especially in the early stages [3, 31, 34]. Moreover, there is currently no reliable method to determine precisely the timing, progression, and specific outcomes of prodromal conditions such as REM sleep behavior disorder (RBD) and pure autonomic failure (PAF) [8, 9].

Misdiagnosis not only worsens patient prognosis, potentially leading to inappropriate treatment and poorer health outcomes, but also exacerbates emotional distress by intensifying patients' uncertainty and anxiety about their health, and it hampers appropriate patient stratification in clinical trials. The lack of reliable diagnostic tools also obstructs efforts to assess disease-modifying treatments during the prodromal stages, a critical period in which the majority of neuronal death occurs [6].

Extracellular vesicles (EVs) are tiny, lipid bilayer-enclosed structures released by cells that play vital roles in intercellular communication and in regulating various physiological processes. Unlike cells, EVs do not replicate; they serve as carriers of biological cargo, enabling the exchange of molecular information between cells. They contain a diverse array of biomolecules, including proteins, lipids, and nucleic acids, that mirror the state of the cell of origin [10]. Because EVs can cross the blood–brain barrier into the peripheral circulation [38], speculative central nervous system (CNS)-enriched EVs may provide unique insight into the brain's biochemical processes, enabling the investigation of CNS function and the identification of potential biomarkers in neurodegenerative conditions such as parkinsonian disorders [13].

As potential carriers of cell-state-specific information from the CNS to the peripheral circulation, speculative CNS-enriched EVs have emerged as a possible tool for minimally invasive diagnostic and therapeutic strategies in parkinsonian disorders. Many groups have quantified biomarkers in speculative CNS-enriched EVs to differentiate these disorders from one another and/or from healthy controls (HCs) [13]. However, these findings have repeatedly failed independent validation and replication, and outcomes have differed even when the same methodology was employed.

A recent meta-analysis suggested that the combined concentration of α-synuclein (α-syn) in speculative neuronal and oligodendroglial EVs (nEVs and oEVs, respectively) may be higher in patients with PD than in HCs or patients with CBS or PSP [41]. Such elevated concentrations could potentially be used to develop a diagnostic test. However, that meta-analysis did not compare the diagnostic accuracy of tests based on biomarkers in speculative CNS-enriched EVs, including tests that combine α-syn with other biomarkers.

Our goal is to extend previous findings through a meta-analysis of diagnostic accuracy, using studies that attempted to differentiate prodromal or established parkinsonian disorders from each other or from HCs with biomarkers in speculative CNS-enriched EVs. We use the term “speculative CNS-enriched EVs” for two key reasons. First, current research has yet to demonstrate conclusively that these enriched EVs originate specifically from the brain. This uncertainty is compounded by the fact that the markers used for CNS enrichment are also found on other cell types, or even in soluble forms [26], which have been shown to cross-react with the antibodies used for biomarker quantification. Second, it is questionable whether these EVs are purely CNS-derived: even EVs that initially come from the CNS are known to be taken up and recycled by various cells through different mechanisms [10], and this uptake and re-release further obscures their origin.

Methodology

We performed a systematic review and meta-analysis according to the guidelines outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA). Our research exclusively utilized anonymized data, with no collection of personal information or involvement of human subjects, thus obviating the need for ethical approval. The study protocol was not registered.

Standard protocol approvals, registrations, and patient consents

Standard protocol approvals, registration, and patient consents are not applicable to this meta-analysis.

Data sources and search strategy

We searched for relevant articles using search terms related to PD and other parkinsonian disorders. The search covered two databases (PubMed and Embase) from their inception until September 29, 2023. The search terms included combinations of “Parkinson's disease OR multiple system atrophy OR Lewy body dementia OR corticobasal syndrome OR progressive supranuclear palsy” AND “Extracellular Vesicle OR exosome” AND “Diagnosis”. We also manually examined the reference lists of eligible studies and conducted further literature reviews to identify suitable studies for inclusion. Any discrepancies in article selection were resolved through discussion. The full search strategy is provided in Table S1.

Eligibility criteria

Eligible studies assessed biomarkers in speculative CNS-enriched EVs isolated from cerebrospinal fluid, plasma, serum, urine, or saliva in patients with PD and at least one of the following groups: MSA, DLB, PSP, CBS, RBD, PAF, or HCs. Studies had to include a receiver operating characteristic (ROC) analysis and report sensitivity, specificity, area under the curve (AUC), and sample size. We excluded studies in animals or cell lines, studies that did not include the specified diseases, and studies that did not report sample size. We also excluded studies that used general (non-enriched) EVs, as these have been reviewed elsewhere [42]. If sensitivity, specificity, or sample size was not reported, we contacted the authors to obtain the missing information. For studies with longitudinal measurements or treatment interventions, we considered only the baseline assessments. For studies with discovery/training and validation ROC models, we considered only the validation model. Where more than two ROC models existed, we reported the model with the best AUC. In three studies [1, 16, 24], two models performed similarly, and we included both. In one study [35], there were only two models, and we excluded the one with an AUC close to 0.50, which indicates no discriminatory ability.
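Because sensitivity, specificity, and sample size were required from every study, per-study 2×2 counts (true positives, false positives, false negatives, true negatives) can be back-calculated from them. The sketch below shows one common way to do this; it is our illustration rather than the documented procedure of this analysis, the function name `reconstruct_2x2` is hypothetical, and it assumes the numbers of cases and controls are known and that counts are rounded to whole participants.

```python
def reconstruct_2x2(sens, spec, n_cases, n_controls):
    """Back-calculate (TP, FP, FN, TN) from reported sensitivity, specificity and group sizes.

    Hypothetical helper: counts are rounded to whole participants, so small
    discrepancies with published percentages are expected.
    """
    tp = round(sens * n_cases)      # cases correctly classified as positive
    fn = n_cases - tp               # cases missed by the test
    tn = round(spec * n_controls)   # controls correctly classified as negative
    fp = n_controls - tn            # controls misclassified as positive
    return tp, fp, fn, tn


# Made-up example: 100 patients with PD and 50 HCs
print(reconstruct_2x2(0.80, 0.90, 100, 50))  # prints (80, 5, 20, 45)
```

Rounding is the main source of error here: a reported sensitivity of, say, 0.83 in a group of 29 cannot be matched exactly, so the reconstructed counts should be checked against any counts the study does report.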

Risk of bias assessment

The quality and the risk of bias of all eligible studies were evaluated using the Quality Assessment for Diagnostic Accuracy Studies (QUADAS-2) criteria [47]. The assessment was carried out by independent researchers (HBT and AB), and any disagreements were resolved through discussion until a consensus was reached. Additional details regarding the quality assessment can be found in Table S2.

Data synthesis and statistics

We used hierarchical summary ROC (HSROC) and bivariate random-effects models [20, 30], fitted by restricted maximum likelihood, for analyses that included more than three studies. This approach provides a comprehensive assessment of diagnostic accuracy, accounting for within-study and between-study variability as well as the inherent negative correlation between sensitivity and specificity across studies. When three or fewer studies were available, we used a univariate model, because the bivariate model is not recommended when only a few studies are available [45]. The HSROC and bivariate models are described in detail elsewhere [42].
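For reference, the standard bivariate random-effects model for study $i$ can be written as follows. The notation is ours and this is a sketch of the model as commonly specified, not a transcription of the exact fitted specification: observed logit sensitivities and specificities are normally distributed around study-specific true values, which in turn vary around the summary means $\mu_A$ and $\mu_B$ with between-study covariance $\sigma_{AB}$ (typically negative, reflecting the trade-off between sensitivity and specificity).

```latex
\begin{aligned}
\begin{pmatrix}\hat{\mu}_{Ai}\\ \hat{\mu}_{Bi}\end{pmatrix}
  &\sim \mathcal{N}\!\left(\begin{pmatrix}\mu_{Ai}\\ \mu_{Bi}\end{pmatrix},
  \begin{pmatrix}s_{Ai}^{2} & 0\\ 0 & s_{Bi}^{2}\end{pmatrix}\right),
  &\hat{\mu}_{Ai} &= \operatorname{logit}\bigl(\widehat{\mathrm{Se}}_i\bigr),
  &\hat{\mu}_{Bi} &= \operatorname{logit}\bigl(\widehat{\mathrm{Sp}}_i\bigr),\\[4pt]
\begin{pmatrix}\mu_{Ai}\\ \mu_{Bi}\end{pmatrix}
  &\sim \mathcal{N}\!\left(\begin{pmatrix}\mu_{A}\\ \mu_{B}\end{pmatrix},
  \begin{pmatrix}\sigma_{A}^{2} & \sigma_{AB}\\ \sigma_{AB} & \sigma_{B}^{2}\end{pmatrix}\right),
  &s_{Ai}^{2} &= \frac{1}{TP_i}+\frac{1}{FN_i},
  &s_{Bi}^{2} &= \frac{1}{TN_i}+\frac{1}{FP_i},\\[4pt]
\mathrm{Se} &= \operatorname{logit}^{-1}(\mu_A), \qquad
\mathrm{Sp} = \operatorname{logit}^{-1}(\mu_B). & & & &
\end{aligned}
```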

We also used graphical summaries, including crosshair plots, which combine ROC space with forest-plot-style confidence intervals. These visualizations allow simultaneous examination of the bivariate relationship between sensitivity and false-positive rate (FPR, or 1 − specificity) and of the degree of heterogeneity across studies. Wider crosshairs indicate wider confidence intervals, that is, less precise estimates, typically from smaller samples. The ROC ellipse plot represents the estimated uncertainty of each (sensitivity, FPR) pair in logit ROC space using confidence regions; the size of each ellipse reflects the statistical uncertainty of the corresponding estimates.
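To show what a single crosshair encodes, the sketch below computes one study's point in ROC space (FPR, sensitivity) and the two confidence-interval arms drawn through it. Using Wilson score intervals is our assumption (plotting software may use another interval method), and the function names are hypothetical. Two made-up studies with identical sensitivity and FPR but a tenfold difference in size show that the smaller study gets the wider arms.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score 95% interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def crosshair(tp, fp, fn, tn):
    """One study's crosshair: the (FPR, sensitivity) point and its two CI arms."""
    return {
        "point": (fp / (fp + tn), tp / (tp + fn)),
        "fpr_arm": wilson_ci(fp, fp + tn),     # horizontal arm
        "sens_arm": wilson_ci(tp, tp + fn),    # vertical arm
    }


# Same sensitivity (0.8) and FPR (0.2), but ten times the participants
small = crosshair(tp=16, fp=4, fn=4, tn=16)
large = crosshair(tp=160, fp=40, fn=40, tn=160)
for name, study in (("small", small), ("large", large)):
    lo, hi = study["sens_arm"]
    print(name, study["point"], f"sensitivity arm width = {hi - lo:.3f}")
```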

The summary ROC (SROC) plots show both the summary point (the mean sensitivity and specificity from the bivariate model, shown as a dotted circle with its confidence region) and the summary line from the HSROC model [32], which describes the relationship between mean sensitivity and specificity. When substantial heterogeneity is present, as in this meta-analysis, the summary line is more informative than the summary point because it accounts for the heterogeneity across the included studies [45]. Test accuracy increases as the summary point and the summary line approach the upper left corner.
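The HSROC summary line can be illustrated with the Rutter–Gatsonis parameterization, in which an accuracy parameter Λ and a shape (asymmetry) parameter β define logit(Se) = Λ·exp(−β/2) + exp(−β)·logit(FPR). The sketch below assumes that parameterization; the function name and the parameter values in the usage lines are hypothetical. It shows that increasing Λ moves the whole curve toward the upper left corner.

```python
import numpy as np


def hsroc_summary_line(fpr, lam, beta):
    """Sensitivity on the HSROC summary curve at the given false-positive rates.

    Rutter-Gatsonis form: logit(Se) = lam * exp(-beta / 2) + exp(-beta) * logit(FPR),
    where lam is the accuracy parameter and beta the shape (asymmetry) parameter.
    """
    fpr = np.asarray(fpr, dtype=float)
    logit_fpr = np.log(fpr / (1 - fpr))
    logit_se = lam * np.exp(-beta / 2) + np.exp(-beta) * logit_fpr
    return 1 / (1 + np.exp(-logit_se))


# Hypothetical parameters: a larger lam pushes the curve toward the upper left
grid = np.linspace(0.05, 0.95, 5)
print(np.round(hsroc_summary_line(grid, lam=1.0, beta=0.0), 3))
print(np.round(hsroc_summary_line(grid, lam=3.0, beta=0.0), 3))
```

With β = 0 the curve is symmetric about the anti-diagonal; a non-zero β lets accuracy change with the positivity threshold, which is one way the summary line captures the threshold effect described in the Results.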

Funnel plots, Begg’s rank correlation test, Egger’s and Deeks’ regression tests, and the trim-and-fill method [36] were used to evaluate publication bias [21].
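Deeks’ test is designed specifically for diagnostic accuracy studies: it regresses the log diagnostic odds ratio (lnDOR) on 1/√ESS, where the effective sample size is ESS = 4·n₁·n₂/(n₁ + n₂) for n₁ diseased and n₂ non-diseased participants, weighting each study by its ESS. A slope that differs from zero (Deeks and colleagues suggested p < 0.10) indicates funnel plot asymmetry. The sketch below is our own minimal implementation, not the software used in this analysis; the function name is hypothetical, and adding 0.5 to every cell is an assumed convention (implementations differ).

```python
import numpy as np
from scipy import stats


def deeks_test(tp, fp, fn, tn):
    """Deeks' funnel plot asymmetry test: WLS of lnDOR on 1/sqrt(ESS), weights = ESS."""
    tp, fp, fn, tn = (np.asarray(a, dtype=float) for a in (tp, fp, fn, tn))
    n_dis, n_non = tp + fn, fp + tn                  # diseased / non-diseased per study
    ess = 4 * n_dis * n_non / (n_dis + n_non)         # effective sample size
    # 0.5 added to every cell (assumed convention) to avoid zero cells
    ln_dor = np.log((tp + 0.5) * (tn + 0.5) / ((fp + 0.5) * (fn + 0.5)))
    x = 1 / np.sqrt(ess)
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(ess)
    xtw = X.T @ W
    beta = np.linalg.solve(xtw @ X, xtw @ ln_dor)     # weighted least squares fit
    resid = ln_dor - X @ beta
    df = len(ln_dor) - 2
    sigma2 = resid @ W @ resid / df                   # residual variance
    slope_se = np.sqrt(sigma2 * np.linalg.inv(xtw @ X)[1, 1])
    t = beta[1] / slope_se
    return beta[1], slope_se, t, 2 * stats.t.sf(abs(t), df)


# Four hypothetical studies in which the smallest study reports the largest DOR
slope, se, t, p = deeks_test(tp=[9, 40, 140, 30], fp=[1, 10, 60, 10],
                             fn=[1, 10, 60, 10], tn=[9, 40, 140, 30])
print(f"slope = {slope:.2f}, SE = {se:.2f}, t = {t:.2f}, p = {p:.3f}")
```

A positive slope means that studies with smaller effective sample sizes (larger 1/√ESS) report larger lnDORs, which is the pattern expected when small studies with weak results go unpublished.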

Results

The systematic search identified 403 records, of which 73 duplicates were removed. After title and abstract screening of the remaining 330 records, 67 studies were considered potentially eligible (Fig. 1). After full-text screening, 49 studies were excluded: 43 did not enrich for speculative CNS-enriched EVs; five enriched for speculative CNS-enriched EVs [4, 19, 22, 23, 27] but did not report the sensitivity and specificity of the diagnostic test; and one was excluded because it reported only preliminary data [14]. The authors of all studies with missing information were contacted to obtain it.

Fig. 1

PRISMA flow diagram for inclusion of selected studies

In total, the meta-analysis included 18 studies [1, 5, 11, 16,17,18, 24, 25, 35, 37,38,39, 43, 46, 48,49,50, 52] with 1695 patients with PD, 253 with MSA, 21 with DLB, 172 with PSP, 152 with CBS, 189 with RBD, and 1288 HCs (Table 1). Most studies used biomarkers in speculative CNS-enriched EVs to differentiate patients with PD from HCs (n = 16, 88.9%). Six studies attempted to differentiate patients with PD from those with MSA [11, 16, 17, 43, 46, 49], two aimed to differentiate patients with PD from those with PSP and CBS [16, 24], and one attempted to differentiate patients with PD from those with frontotemporal dementia (FTD), PSP, and CBS [17]. Three studies each aimed to differentiate patients with MSA from HCs [11, 43, 49] and patients with RBD from HCs [17, 35, 48]. Most studies used biomarkers in speculative nEVs (n = 16, 88.9%); only three used speculative oEVs [11, 43, 49], and one used speculative astrocyte-derived EVs (aEVs) [46]. To quantify biomarkers in speculative CNS-enriched EVs, the included studies used bead-based arrays such as Luminex [38, 49] and single-molecule array (Simoa) [37, 52], enzyme-linked immunosorbent assay (ELISA) [1, 5, 24, 39, 50], electrochemiluminescence ELISA (ECLIA) [11, 16, 17, 18, 25, 43], western blots (WB) [48], flow cytometry [46], or an in-house electrochemical assay [35].

Table 1 Demographic characteristics of patients with parkinsonian disorders or healthy controls (HC) included in the meta-analysis

The included studies were generally of high quality (Table S3). However, the sampling method was often not clearly reported, which made the risk of bias in patient selection unclear. One study excluded participants from its analysis and was judged to have a high risk of bias [48]. Measurement of biomarkers in speculative nEVs and oEVs was considered to carry a low risk of bias in the index test domain, as it is an objective measure unaffected by prior knowledge of clinical status. While most articles (66.7%) had a low risk of bias in the reference standard domain, four studies using an in-house test [12, 35, 38, 43, 49] and one using WBs [48] were judged to have a high risk of bias. In the flow and timing domain, all studies were judged to have a low risk of bias, as the interval between clinical diagnosis and biomarker measurement could be reliably estimated.

As previously noted [40,41,42], several pre-analytical factors can substantially affect the purity, content, size, and quantity of EVs. These include the choice of anticoagulant used for plasma collection, the EV isolation method, the centrifugation procedure, transport conditions, the number of freeze–thaw cycles, storage conditions and temperature, and the type of collection tube. Unfortunately, these factors are not universally standardized across biobanks or clinical blood collection protocols. Moreover, use of the anti-L1 cell adhesion molecule (L1CAM) antibody clone UJ127 has raised doubts regarding its possible cross-reactivity with α-syn antibodies [26].

To address these concerns, we performed subgroup analyses of the PD vs HC comparison by medium (plasma or serum) and by anti-L1CAM antibody clone (UJ127 or 5G3). We did not perform such analyses for other diseases because of the small number of included studies.

Descriptive statistics from the diagnostic meta-analysis, including sensitivity, specificity, FPR, diagnostic odds ratio (DOR), positive likelihood ratio (posLR), and negative likelihood ratio (negLR) for each included analysis, are summarized in Table 2.
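Each of these per-analysis metrics follows directly from a study's 2×2 table. The sketch below shows the standard definitions; the function name is hypothetical, and the rule of adding 0.5 to every cell only when a cell is zero is an assumed convention, not necessarily the one used to build Table 2.

```python
def diagnostic_metrics(tp, fp, fn, tn, correction=0.5):
    """Per-study accuracy measures from a 2x2 table.

    A continuity correction is added to every cell when any cell is zero,
    so that the DOR and likelihood ratios stay finite (assumed convention).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + correction for c in (tp, fp, fn, tn))
    sens = tp / (tp + fn)           # true-positive rate
    spec = tn / (tn + fp)           # true-negative rate
    fpr = fp / (fp + tn)            # false-positive rate = 1 - specificity
    pos_lr = sens / fpr             # factor by which a positive result raises the odds of disease
    neg_lr = (1 - sens) / spec      # factor by which a negative result lowers them
    dor = (tp * tn) / (fp * fn)     # diagnostic odds ratio = posLR / negLR
    return {"sens": sens, "spec": spec, "fpr": fpr,
            "dor": dor, "posLR": pos_lr, "negLR": neg_lr}


print(diagnostic_metrics(tp=80, fp=10, fn=20, tn=90))
```

In this made-up example, sensitivity 0.8 and specificity 0.9 give posLR = 8, negLR ≈ 0.22, and DOR = 36, illustrating that the DOR summarizes both error rates in a single number.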

Table 2 Descriptive statistics of the diagnostic metrics of studies included in the meta-analysis

PD vs control

Sixteen studies attempted to differentiate patients with PD from HCs using biomarkers in speculative nEVs [1, 5, 11, 16,17,18, 25, 35, 37,38,39, 43, 48, 50, 52], oEVs [11, 43], and/or aEVs [46]. The AUC ranged from 0.610 to 0.915, with the highest AUC reported in 2023 by Wang et al. [46], while sensitivity (Fig. 2A) and specificity (Fig. 2B) ranged from 0.10 to 0.97 and from 0.50 to 0.95, respectively. The chi-square (χ2) equality test revealed high heterogeneity for both sensitivity (χ2 = 312.45, df = 23, p < 0.0001) and specificity (χ2 = 345.19, df = 23, p < 0.0001). Both crosshair and ROC ellipse plots confirmed this heterogeneity (Fig. S1A, B). Univariate forest plots of the DOR, posLR, and negLR for each individual analysis are shown in Fig. 2C–E.
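The chi-square equality test asks whether a proportion such as sensitivity can be regarded as the same across all k analyses; it is a test of homogeneity on a k × 2 table of correct and incorrect classifications, with k − 1 degrees of freedom. The sketch below uses SciPy's `chi2_contingency` without continuity correction; the wrapper name and the example counts are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency


def equality_test(events, totals):
    """Chi-square test that a proportion (e.g., sensitivity) is equal across k analyses.

    events: true positives per analysis (or true negatives, for specificity)
    totals: diseased (or non-diseased) participants per analysis
    Returns (chi-square statistic, degrees of freedom = k - 1, p value).
    """
    events = np.asarray(events)
    totals = np.asarray(totals)
    table = np.column_stack([events, totals - events])   # k x 2 contingency table
    stat, p, dof, _ = chi2_contingency(table, correction=False)
    return stat, dof, p


# Hypothetical sensitivities of 0.50, 0.80 and 0.95 in three analyses
print(equality_test([10, 40, 95], [20, 50, 100]))
```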

Fig. 2

Diagnostic accuracy of biomarkers in speculative CNS-enriched EVs for the differential diagnosis of Parkinson’s disease (PD) from healthy controls (HCs). A–E Univariate forest plots for sensitivity, specificity, diagnostic odds ratio (DOR), positive (posLR) and negative (negLR) likelihood ratios, respectively. F Summary receiver operating characteristics (SROC). The dotted circle shows the mean summary estimate of sensitivities and specificities using a bivariate model. The summary line is obtained from a hierarchical SROC model. CNS central nervous system; EVs extracellular vesicles

The bivariate and HSROC models (Table 3) each indicated fair discriminatory ability, independently suggesting that measuring biomarkers in speculative CNS-enriched EVs achieves only fair accuracy in distinguishing patients with PD from HCs.

Table 3 Meta-analysis of diagnostic accuracy for patients with Parkinson’s disease vs healthy controls: summary statistics for the bivariate and hierarchical summary receiver operating characteristic (HSROC) models

Heterogeneity (I2) estimates varied considerably depending on the approach used. The Zhou and Dendukuri approach [51] gave 35.1%, whereas Holling's sample size-unadjusted approaches [33] gave values from 87.7% to 93.5% and the adjusted approaches gave values from 7.3% to 9.2%. Nevertheless, the approaches generally indicated substantial heterogeneity across studies, suggesting that the variability in results cannot be attributed solely to chance but reflects differences between the studies themselves, in line with the crosshair and ROC ellipse plots. This is also consistent with the repeated failure of independent validation among studies measuring biomarkers in speculative CNS-enriched EVs, which could be due to differences in methodology and expertise; however, as noted in the Introduction, replication has failed even when the same methodology was employed.

The HSROC curve for this comparison is shown in Fig. 2F. The model suggested that measuring biomarkers in speculative CNS-enriched EVs to distinguish patients with PD from HCs may not be promising. The summary line shows an inverse relationship between sensitivity and specificity, indicative of a threshold effect, and lies far from the upper left corner. Moreover, although some studies achieved high sensitivity and specificity, the summary mean indicated only fair discriminatory ability.

Importantly, only a few studies subdivided patients with PD into early and advanced stages [5, 18, 25, 48, 50, 52] or included only patients with early-stage PD [39], using the following Hoehn and Yahr definitions: early PD 1–2 and late PD 3–5 [5, 50, 52]; early PD ≤ 2 [25, 48]; ≤ 2 with disease duration < 5 years [18]; or 1–2.5 and drug-naïve [39]. Two of these studies [18, 25], from related research groups, attempted to differentiate patients with early-stage PD from HCs. Unfortunately, one of them [25] had the lowest sensitivity and specificity. This suggests that biomarkers in speculative CNS-enriched EVs may not reliably discriminate patients with early-stage PD from HCs, even though this is the most clinically desirable application of the test.

All statistical tests used to assess publication bias consistently indicated its presence. Begg’s rank correlation test revealed a significant positive correlation between the log diagnostic odds ratio (lnDOR) and its variance (τ = 0.48, p = 0.0016; Fig. 3A), implying that larger effect sizes were associated with greater variances. Similarly, Egger’s regression test showed a significant positive relationship between the lnDOR and its standard error (slope = 5.092, SE = 1.45, t = 3.51, p = 0.0019; Fig. 3B), suggesting that smaller studies, which tend to have larger standard errors, reported larger effect sizes than would be expected in the absence of bias. Finally, Deeks’ regression test also indicated potential publication bias, with a significant positive slope (slope = 34.39, SE = 9.19, t = 3.74, p = 0.0011; Fig. 3C), showing that studies with smaller effective sample sizes were associated with larger effect sizes. Deeks’ funnel plot (Fig. 3D) and a bivariate bagplot (Fig. 3E) also suggested the presence of publication bias.
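Begg’s and Egger’s tests can be sketched as follows for a set of effect sizes y (here, lnDORs) with variances v. Begg and Mazumdar’s test computes Kendall’s τ between each study’s standardized deviation from the fixed-effect estimate and its variance. The Egger form shown regresses the effect on its standard error with inverse-variance weights, matching the “lnDOR vs standard error” description above; its slope equals the intercept of the classical Egger regression of y/SE on 1/SE. Both functions are our hypothetical sketches, not the software used in the analysis, and the example data are made up.

```python
import numpy as np
from scipy import stats


def begg_test(y, v):
    """Begg-Mazumdar test: Kendall's tau between standardized effects and variances."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1 / v
    pooled = np.sum(w * y) / np.sum(w)          # fixed-effect estimate
    v_star = v - 1 / np.sum(w)                  # variance of y_i - pooled
    z = (y - pooled) / np.sqrt(v_star)          # standardized deviates
    tau, p = stats.kendalltau(z, v)
    return tau, p


def egger_test(y, v):
    """Weighted regression of the effect on its standard error (weights 1/v).

    The slope equals the intercept of the classical Egger regression of y/SE on 1/SE;
    a non-zero slope suggests small-study effects.
    """
    y, v = np.asarray(y, float), np.asarray(v, float)
    se = np.sqrt(v)
    X = np.column_stack([np.ones_like(se), se])
    W = np.diag(1 / v)
    xtw = X.T @ W
    beta = np.linalg.solve(xtw @ X, xtw @ y)
    resid = y - X @ beta
    df = len(y) - 2
    sigma2 = resid @ W @ resid / df             # residual variance
    slope_se = np.sqrt(sigma2 * np.linalg.inv(xtw @ X)[1, 1])
    t = beta[1] / slope_se
    return beta[1], slope_se, t, 2 * stats.t.sf(abs(t), df)


# Made-up lnDORs that grow as studies become less precise
ln_dor = [1.0, 1.3, 1.35, 1.9, 2.0, 2.6]
var = [0.04, 0.09, 0.16, 0.25, 0.36, 0.49]
print(begg_test(ln_dor, var))
print(egger_test(ln_dor, var))
```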

Fig. 3

Publication bias was assessed using A Begg’s correlation, B Egger’s regression, C Deeks’ regression, D Deeks’ funnel plot, E a bagplot, and F a funnel plot after application of the trim-and-fill method, for biomarkers in speculative CNS-enriched EVs for the differential diagnosis of Parkinson’s disease from healthy controls. Collectively, these suggested substantial publication bias. The trim-and-fill method estimated five missing studies (white circles) on the left side of the plot with small or null diagnostic accuracy. CNS central nervous system; EVs extracellular vesicles

Importantly, Duval and Tweedie's trim-and-fill method [36], a non-parametric method for adjusting for publication bias, estimated that approximately five studies were missing from our meta-analysis because of publication bias. These missing studies are hypothesized to lie on the left side of the funnel plot (white circles in Fig. 3F), corresponding to smaller studies with lower DORs, which may explain why they were not published. When these missing studies were imputed and included in a random-effects model, the adjusted diagnostic odds ratio became 1.77 (SE = 0.27, 95% CI: 1.23–2.30, z = 6.46, p < 0.0001). This suggests that, after adjustment for potential publication bias, the diagnostic effect of biomarkers in speculative CNS-enriched EVs for distinguishing patients with PD from HCs is much smaller than reported in the literature.
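The trim-and-fill procedure can be sketched with Duval and Tweedie’s L0 estimator. The number of missing studies k0 is estimated iteratively from the rank sum T of positive deviations from a trimmed centre, L0 = (4T − n(n + 1))/(2n − 1); the k0 most extreme right-hand studies are then mirrored around that centre and the augmented set is pooled. This is a minimal sketch under stated assumptions, not the implementation used here: it assumes missing studies lie on the left (low-effect) side, uses fixed-effect pooling for trimming and the DerSimonian–Laird random-effects estimate for the final result, and all function names are hypothetical. Software implementations differ in estimator and model choices.

```python
import numpy as np
from scipy.stats import rankdata


def fixed_effect(y, v):
    """Inverse-variance (fixed-effect) pooled estimate."""
    w = 1 / v
    return np.sum(w * y) / np.sum(w)


def dersimonian_laird(y, v):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1 / v
    q = np.sum(w * (y - fixed_effect(y, v)) ** 2)      # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    return fixed_effect(y, v + tau2)


def trim_and_fill(y, v, max_iter=50):
    """Return (k0, adjusted estimate); assumes studies are missing on the LEFT."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    n = len(y)
    order = np.argsort(y)                  # indices from smallest to largest effect
    k0 = 0
    for _ in range(max_iter):
        keep = order[: n - k0]             # trim the k0 largest effects
        centre = fixed_effect(y[keep], v[keep])
        dev = y - centre                   # deviations of ALL studies from trimmed centre
        ranks = rankdata(np.abs(dev))
        t_n = np.sum(ranks[dev > 0])       # rank sum of positive deviations
        l0 = (4 * t_n - n * (n + 1)) / (2 * n - 1)
        new_k0 = min(n - 1, max(0, int(round(l0))))
        if new_k0 == k0:
            break
        k0 = new_k0
    keep = order[: n - k0]
    centre = fixed_effect(y[keep], v[keep])
    mirrored = order[n - k0:]              # the k0 most extreme right-hand studies
    y_all = np.concatenate([y, 2 * centre - y[mirrored]])
    v_all = np.concatenate([v, v[mirrored]])
    return k0, dersimonian_laird(y_all, v_all)


# Made-up data: precise studies near 0, imprecise studies with large effects
y = [0, 0, 0, 0, 1, 2, 3]
v = [0.01] * 4 + [1] * 3
print(trim_and_fill(y, v), dersimonian_laird(y, v))
```

In this made-up example one study is imputed on the left, and the adjusted random-effects estimate is lower than the unadjusted one, mirroring the direction of the adjustment reported above.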

Collectively, the hierarchical bivariate model revealed moderate accuracy in distinguishing patients with PD from HCs using biomarkers in speculative CNS-enriched EVs, but with high heterogeneity and unreliability. Publication bias analyses showed that smaller studies with non-significant or small effect sizes have been less likely to be published. This is unsurprising: as noted previously [40, 41], studies using speculative CNS-enriched EVs have consistently failed independent validation, likely because EVs are highly sensitive to various pre-analytical factors [40], the methodologies used to isolate speculative CNS-enriched EVs are highly complex, and handling differs between users, among other reasons. Even though measuring biomarkers in speculative CNS-enriched EVs to distinguish patients with PD from HCs has been popular since 2014, only a few studies currently exist, further indicating that studies with null results might not have been published. When the trim-and-fill method accounted for the estimated five missing studies, the diagnostic effect for PD vs HCs decreased substantially.

PD vs control: sub-analysis by media, antibody clone, and quantification methodology

As described above, several pre-analytical factors may affect the EV signature obtained from plasma or serum. Recent studies suggested that plasma provides superior accuracy and reliability in comparison to serum for EV biomarker analysis [40], while the anti-L1CAM antibody clone UJ127 has been reported to cross-react with α-syn proteoforms [26].

In the present meta-analysis, we observed distinct differences between studies using plasma and those using serum. The plasma model (Table S4) yielded lower overall diagnostic accuracy than the serum model (Table S5). Comparison of the HSROC curves (Fig. 4A) from studies using plasma [5, 18, 25, 37, 38, 46, 48, 50, 52] or serum [1, 11, 16, 17, 35, 39, 43] also suggested that studies using serum had, on average, slightly better accuracy, although the confidence intervals of the two models overlapped considerably.

Fig. 4

Summary receiver operating characteristics (SROC) comparing studies of speculative CNS-enriched EVs by A medium (plasma vs serum), B anti-L1CAM antibody clone (UJ127 vs 5G3), and C quantification methodology. CNS central nervous system; ECLIA electrochemiluminescence ELISA; ELISA enzyme-linked immunosorbent assay; EVs extracellular vesicles; L1CAM L1 cell adhesion molecule

It should also be noted that, among the studies using serum, three [16, 17, 35] originated from one research group and two [11, 43] from another, whereas most studies using plasma originated from distinct research groups, suggesting potential methodological overlap among the serum studies. Another potential explanation for the discrepancy in accuracy between plasma and serum studies is the handling of coagulation factors. Many of the plasma studies did not treat plasma with thrombin followed by high-speed centrifugation to remove these factors, despite using ExoQuick, a polymer-based precipitation technique, for EV isolation. ExoQuick's guidelines recommend removing coagulation factors to prevent the formation of an insoluble fibrin pellet after ExoQuick addition and subsequent centrifugation, which could skew the measurements. Moreover, the scientific literature currently lacks detail on how these coagulation factors or the presence of a fibrin pellet might affect the quantification of biomarkers within EVs. Given these differences in the number of studies and the potential methodological biases, the results do not definitively establish one medium as superior to the other. Further independent studies addressing these issues are needed to draw more conclusive comparisons.

Comparison of studies using the anti-L1CAM antibody clone UJ127 (Table S6) with those using clone 5G3 (Table S7) showed that studies using 5G3 achieved slightly higher accuracy, although the two overlapped substantially (Fig. 4B). Because the included studies quantified α-syn using bead-based techniques (e.g., Simoa and Luminex), electrochemiluminescence ELISA (ECLIA), or ELISA, we also compared the diagnostic accuracy of these methodologies. ECLIA and ELISA achieved similar accuracies, whereas bead-based methods achieved the lowest accuracy (Fig. 4C, Table S8). We did not perform additional sub-analyses because of the small number of studies.

PD vs HCs: speculative CNS-enriched EVs vs general EVs

Because EVs are speculated to carry cell-state-specific messages from the CNS to the peripheral circulation, measuring biomarkers in speculative CNS-enriched EVs to distinguish patients with PD from HCs has been popular [13]. Speculative CNS-enriched EVs are often captured through direct immunoprecipitation or through a two-step procedure in which EVs are first isolated by a polymer-based precipitation technique (e.g., ExoQuick) or ultracentrifugation, after which nEVs, oEVs, or aEVs are immunoprecipitated using beads coupled to the chosen antibodies. Here, we compared the diagnostic accuracy of biomarkers in general EVs [42] with that of biomarkers in speculative CNS-enriched EVs.

Comparison of the bivariate and HSROC model statistics revealed that biomarkers in general EVs [42] have higher diagnostic accuracy than those in speculative CNS-enriched EVs (Table 4). Both approaches showed evidence of publication bias, but the trim-and-fill method identified fewer missing studies for general EV biomarkers (2 out of 21) than for speculative CNS-enriched EV biomarkers (5 out of 16), suggesting less publication bias in the former. Only a single study [48] used biomarkers in both general EVs and speculative CNS-enriched EVs to distinguish patients with PD from HCs. It remains unclear why biomarkers in general EVs were not evaluated for diagnosis before the field transitioned to speculative CNS-enriched EVs. It is important to highlight that isolating speculative CNS-enriched EVs is considerably more complex, time-consuming, and labor-intensive than isolating general EVs.

Table 4 Comparison of general EVs [42] and speculative CNS-enriched EVs for distinguishing patients with Parkinson’s disease (PD) from healthy controls using bivariate and hierarchical summary receiver operating characteristic (HSROC) models. Reproduced from Taha et al. [42]

PD vs MSA

Only six studies attempted to differentiate patients with PD from those with MSA [11, 16, 17, 43, 46, 49]. The AUC ranged from 0.709 to 0.980, while sensitivity (Fig. 5A) and specificity (Fig. 5B) ranged from 0.53 to 0.96 and from 0.64 to 0.92, respectively. As above, the chi-square (χ2) equality test revealed high heterogeneity for both sensitivity (χ2 = 131.63, df = 7, p < 0.0001) and specificity (χ2 = 57.84, df = 7, p < 0.0001). Both the crosshair and ROC ellipse plots confirmed this heterogeneity (Fig. S2A-B). Univariate forest plots of the DOR, posLR, and negLR for each individual analysis are shown in Fig. 5C–E. Summary statistics for the bivariate and HSROC models are provided in Table 5.

Fig. 5

Diagnostic accuracy of biomarkers in speculative CNS-enriched EVs for the differential diagnosis of Parkinson’s disease (PD) from multiple system atrophy (MSA). A–E Univariate forest plots for sensitivity, specificity, diagnostic odds ratio (DOR), positive (posLR) and negative (negLR) likelihood ratios, respectively. F Summary receiver operating characteristics (SROC). The dotted circle shows the mean summary estimate of sensitivities and specificities using a bivariate model. The summary line is obtained from a hierarchical SROC model. CNS central nervous system; EVs extracellular vesicles

Table 5 Meta-analysis of diagnostic accuracy for patients with Parkinson’s disease vs multiple system atrophy: summary statistics for the bivariate and hierarchical summary receiver operating characteristic (HSROC) models

As above, heterogeneity (I2) estimates varied with the approach used. The Zhou and Dendukuri approach estimated heterogeneity at 49.7%. Holling's sample size-unadjusted approaches reported higher heterogeneity, from 90.9% to 92.2%, whereas the adjusted approaches indicated lower heterogeneity, from 8.3% to 12%. These findings suggest substantial heterogeneity across studies, indicating that the variability in results may not be due solely to chance but rather to differences among the studies.

The HSROC curve for this model is provided in Fig. 5F. The summary line was found to be distant from the upper left corner, suggesting that measurement of biomarkers in speculative CNS-enriched EVs for distinguishing patients with PD from MSA may not be promising. Moreover, while some studies achieved good sensitivity and specificity, the combined mean for sensitivity and specificity (shown as the circle) indicated that this test only achieved a fair distinguishing ability.

Publication bias assessment using Begg’s correlation (Fig. S3A) and Egger’s regression test (Fig. S3B) revealed no publication bias. However, Deeks’ regression indicated possible publication bias (slope = −55.86, SE = 11.35, t = −4.92, p = 0.0017; Fig. S3C). Further examination using Deeks’ funnel plot (Fig. S3D), bagplots (Fig. S3E), and the trim-and-fill method (Fig. S3F) suggested no publication bias.

PD vs PSP and CBS

As only two studies attempted to differentiate patients with PD from PSP and CBS [16, 24] and one from FTD, PSP, and CBS [17], we used a univariate approach for this analysis.

Crosshair (Fig. S4A) and ROC ellipse plots (Fig. S4B) suggested low heterogeneity. Forest plots of sensitivity, specificity, DOR, posLR, and negLR are shown in Fig. 6A–E. The model yielded an AUC of 0.961 (95% CI: 0.920–1.0), indicating high discriminatory ability. The correlation estimate between sensitivity and FPR was −0.185 (95% CI: −0.973 to 0.944); the wide confidence interval spanning both negative and positive values indicates low precision and high uncertainty in this estimate. The coefficient θ of 0.041 (95% CI: −0.0058 to 0.087; plotted as the SROC curve in Fig. 6F) supports the utility of this model: the smaller θ, the larger the area under the ROC curve and the higher the accuracy of the model.
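The reported θ and AUC are consistent with the Lehmann-family (proportional hazards) SROC model described by Holling and colleagues, in which the SROC curve is Se = FPR^θ and its area is 1/(1 + θ); θ = 1 gives the chance diagonal, and smaller θ pulls the curve toward the upper left corner. We assume this is the univariate model used, because 1/(1 + 0.041) ≈ 0.961 matches the reported AUC. The function names below are hypothetical.

```python
import numpy as np


def phm_sroc(fpr, theta):
    """Sensitivity on the Lehmann-family SROC curve Se = FPR**theta."""
    return np.asarray(fpr, dtype=float) ** theta


def phm_auc(theta):
    """Area under Se = FPR**theta on [0, 1]: the integral equals 1 / (1 + theta)."""
    return 1 / (1 + theta)


print(round(phm_auc(0.041), 3))   # PD vs PSP/CBS estimate -> 0.961
print(round(phm_auc(0.17), 3))    # MSA vs HC point estimate
print(round(phm_auc(0.89), 3))    # MSA vs HC upper confidence limit
print(np.round(phm_sroc([0.05, 0.1, 0.2], 0.041), 3))
```

Under the same assumption, the MSA vs HC estimate of θ = 0.17 corresponds to an AUC of about 0.85, but its upper confidence limit of 0.89 corresponds to about 0.53, close to chance, which illustrates how imprecise that estimate is.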

Fig. 6

Diagnostic accuracy of biomarkers in speculative CNS-enriched EVs for the differential diagnosis of Parkinson’s disease (PD) from progressive supranuclear palsy (PSP) and corticobasal syndrome (CBS). A–E Univariate forest plots for sensitivity, specificity, diagnostic odds ratio (DOR), positive (posLR) and negative (negLR) likelihood ratios, respectively. F Summary receiver operating characteristics (SROC) using a univariate model. CNS central nervous system; EVs extracellular vesicles

Given the low heterogeneity (chi-square quality test under heterogeneity: χ2 = 4.11, df = 2, p = 0.13), the high accuracy, and the larger overall standardized mean difference of biomarkers in patients with PD vs PSP and CBS [41], measuring biomarkers in speculative CNS-enriched EVs to differentiate patients with PD from those with PSP and CBS may be promising. However, because these results come from only three studies, two of which are from the same research group [16, 17], their interpretation and generalizability are limited. A major challenge in the field is the lack of independent validation across studies; addressing it requires obtaining similar results across different laboratories and cohorts.
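The p value of the heterogeneity test can be checked by hand: for an even number of degrees of freedom, the upper-tail probability of a chi-square distribution has a closed form. The sketch below uses only the standard library and a hypothetical helper name; for odd degrees of freedom, a library routine such as SciPy's `scipy.stats.chi2.sf` would be needed instead.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with even df.

    For df = 2m, the survival function is exp(-x/2) * sum_{k<m} (x/2)**k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


p = chi2_sf_even_df(4.11, 2)  # statistic and df reported for PD vs PSP and CBS
print(round(p, 2))
```

With df = 2 the sum reduces to exp(−χ2/2) = exp(−2.055) ≈ 0.128, which rounds to the reported p = 0.13.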

Assessment of publication bias using Begg’s rank correlation (Fig. S5A), Egger’s regression (Fig. S5B), and Deeks’ regression (Fig. S5C) tests, Deeks’ funnel plot (Fig. S5D), bagplots (Fig. S5E), and funnel plots using the trim-and-fill method (Fig. S5F) suggested no publication bias.

MSA vs control

Three studies attempted to differentiate patients with MSA from HCs [11, 43, 49] and were analyzed using a univariate approach. Forest plots for sensitivity, specificity, DOR, posLR, and negLR are shown in Fig. 7A–E. The coefficient θ of 0.17 (95% CI: −0.55 to 0.89; plotted as SROC in Fig. 7F) indicated that this model is not promising for distinguishing patients with MSA from HCs, despite reports in the literature [11, 43]: the larger θ implies a smaller AUC and lower accuracy. Close inspection of the SROC (Fig. 7F) also revealed large variability and heterogeneity, consistent with the crosshair (Fig. S6A) and ROC ellipse (Fig. S6B) plots.

Fig. 7

Diagnostic accuracy of biomarkers in speculative CNS-enriched EVs for the differential diagnosis of multiple system atrophy (MSA) from healthy controls (HCs). A–E Univariate forest plots for sensitivity, specificity, diagnostic odds ratio (DOR), positive (posLR) and negative (negLR) likelihood ratios, respectively. F Summary receiver operating characteristic (SROC) curve from a univariate model. CNS central nervous system; EVs extracellular vesicles

Assessment of publication bias using Begg’s rank correlation (Fig. S7A), Egger’s regression (Fig. S7B), and Deeks’ regression (Fig. S7C) tests, Deeks’ funnel plot (Fig. S7D), bagplots (Fig. S7E), and funnel plots using the trim-and-fill method (Fig. S7F) showed that only Egger’s regression test indicated possible publication bias.
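Egger’s test regresses each study's standard normal deviate (effect estimate divided by its standard error) on its precision (1/SE) by ordinary least squares; an intercept that differs significantly from zero indicates funnel-plot asymmetry. A minimal NumPy sketch, assuming per-study lnDOR values and their standard errors are already available (for a 2×2 table, SE(lnDOR) = √(1/TP + 1/FP + 1/FN + 1/TN)); the function name is hypothetical, and the p value, from a t distribution with k − 2 degrees of freedom, is left to a statistics library.

```python
import numpy as np


def egger_test(effects, ses):
    """Egger's regression: SND = effect / SE regressed on precision = 1 / SE.

    Returns (intercept, se_intercept, df); t = intercept / se_intercept
    with df = k - 2 degrees of freedom.
    """
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    snd = effects / ses           # standard normal deviates
    precision = 1.0 / ses
    X = np.column_stack([np.ones_like(precision), precision])
    beta, *_ = np.linalg.lstsq(X, snd, rcond=None)  # ordinary least squares
    resid = snd - X @ beta
    df = len(snd) - 2
    sigma2 = resid @ resid / df   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0], np.sqrt(cov[0, 0]), df
```

The intercept captures small-study asymmetry, while the slope estimates the underlying effect size.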

Synucleinopathy vs prodromal synucleinopathy

RBD and PAF are recognized as prodromal disorders that are likely to progress to one of the three synucleinopathies [8, 29]. Of the studies in the present meta-analysis that included an RBD cohort [17, 35, 48], none provided an ROC discriminatory model against PD or DLB; only one did so against MSA [17]. Moreover, no study included a PAF cohort, precluding a meta-analysis for this disorder.

RBD vs control

Three studies evaluated biomarkers in speculative CNS-enriched EVs in an attempt to differentiate patients with RBD from HCs [17, 35, 48] and were analyzed using a univariate approach. Forest plots for sensitivity, specificity, DOR, posLR, and negLR are shown in Fig. 8A–E. The large coefficient θ of 0.14 (95% CI: −0.17 to 0.45; plotted as SROC in Fig. 8F) implies a smaller AUC and lower accuracy, indicating that this model may not be promising for distinguishing patients with RBD from HCs. Close inspection of the SROC (Fig. 8F) also revealed large variability and heterogeneity, consistent with the crosshair (Fig. S8A) and ROC ellipse (Fig. S8B) plots. Because the number of studies was small and one study reported no false positives [35], we did not assess publication bias.
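Each forest plot summarizes per-study measures derived from a 2×2 table, and a study with no false positives [35] makes the DOR and posLR infinite unless a continuity correction is applied. A minimal sketch, assuming the common convention of adding 0.5 to every cell of a table that contains a zero; the correction actually applied in this analysis is not stated here, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    dor: float      # diagnostic odds ratio
    pos_lr: float   # positive likelihood ratio
    neg_lr: float   # negative likelihood ratio


def accuracy_from_counts(tp, fp, fn, tn, cc=0.5):
    """Per-study accuracy measures from a 2x2 table.

    Assumption: when any cell is zero (e.g. no false positives), `cc` is
    added to all four cells so that DOR and posLR stay finite.
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + cc for c in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    pos_lr = sens / (1 - spec)
    neg_lr = (1 - sens) / spec
    # posLR / negLR equals (tp * tn) / (fp * fn), the usual DOR definition.
    return Accuracy(sens, spec, pos_lr / neg_lr, pos_lr, neg_lr)


# Hypothetical study with no false positives.
print(accuracy_from_counts(tp=8, fp=0, fn=2, tn=10))
```

Because the correction shifts every cell, zero-cell studies can carry large but finite DOR and posLR values, which is one reason such studies complicate funnel-plot-based bias assessment.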

Fig. 8

Diagnostic accuracy of biomarkers in speculative CNS-enriched EVs for the differential diagnosis of REM behavior disorder (RBD) from healthy controls (HCs). A–E Univariate forest plots for sensitivity, specificity, diagnostic odds ratio (DOR), positive (posLR) and negative (negLR) likelihood ratios, respectively. F Summary receiver operating characteristic (SROC) curve from a univariate model. CNS central nervous system; EVs extracellular vesicles

Discussion

The lack of precise and accurate biomarkers for parkinsonian disorders, including PD, MSA, DLB, PSP, and CBS, often leads to misdiagnosis, hampering patients' ability to receive appropriate and timely care. The inability to predict conversion from the prodromal disorders RBD and/or PAF to a synucleinopathy further compounds this problem. These challenges are distressing not only for patients, who are left uncertain about their health status and future, but also for physicians striving to provide optimal care. Measuring biomarkers in speculative CNS-enriched EVs isolated from blood has become popular because these EVs are hypothesized to carry cell-state-specific biomarkers and to cross the blood–brain barrier into the peripheral circulation. The current meta-analysis, which aimed to evaluate the diagnostic accuracy of biomarkers in speculative CNS-enriched EVs for parkinsonian disorders (Fig. 9), encompassed 18 studies [1, 5, 11, 16,17,18, 24, 25, 35, 37,38,39, 43, 46, 48,49,50, 52] comprising 1695 patients with PD, 253 with MSA, 21 with DLB, 172 with PSP, 152 with CBS, and 189 with RBD, as well as 1288 HCs (Table 1).

Fig. 9

Summary receiver operating characteristic (SROC) comparisons for patients with Parkinson’s disease (PD), multiple system atrophy (MSA), progressive supranuclear palsy (PSP), corticobasal syndrome (CBS) and REM behavior disorder (RBD)

Studies (n = 16) attempting to differentiate patients with PD from HCs exhibited considerable variability in sensitivity (Fig. 2A) and specificity (Fig. 2B), indicating potential methodological inconsistencies among them. The analysis showed that, although biomarkers in speculative CNS-enriched EVs achieved a fair ability to distinguish patients with PD from HCs (Fig. 2F, Table 3), the results were plagued by high heterogeneity and potential publication bias (Fig. 3A–F), casting doubt on their reliability. Furthermore, the trim-and-fill method imputed five missing smaller studies with lower or non-significant diagnostic odds ratios (white circles in Fig. 3F), implying that such studies have been less likely to be published and that the diagnostic utility of biomarkers in speculative CNS-enriched EVs for patients with PD has been substantially overestimated.

Comparing the diagnostic accuracy of biomarkers in speculative CNS-enriched EVs isolated from plasma (Table S4) vs serum (Table S5) suggested that serum may be superior (Fig. 4A). However, of the seven studies using serum, three came from one research group and two from another, whereas the studies using plasma came mostly from distinct research groups, suggesting possible bias in the serum studies. Further comparison by anti-L1CAM antibody clone, UJ127 (Table S6) vs 5G3 (Table S7), did not reveal substantial differences, with large overlap in the confidence intervals, although studies using the 5G3 clone obtained slightly higher accuracy (Fig. 4B). Comparison of studies by quantification method (Fig. 4C, Table S8) revealed that ELISA achieved the highest diagnostic accuracy, followed by ECLIA and bead-based arrays (i.e., Simoa, Luminex). We also noted that general EVs [42] obtained better diagnostic accuracy and showed less publication bias than speculative CNS-enriched EVs (Table 4): the trim-and-fill method estimated 2 missing studies out of 21 for general EVs vs 5 out of 16 for speculative CNS-enriched EVs.

In contrast, six studies [11, 16, 17, 43, 46, 49] attempted to differentiate patients with PD from those with MSA and provided mixed results. The analysis (Table 5) revealed wide-ranging values for sensitivity (Fig. 5A), specificity (Fig. 5B), and DOR (Fig. 5C), underlining the considerable variability among these studies. Although the pooled AUC of 0.903 (Fig. 5F) suggests reasonable discriminatory capacity, the substantial heterogeneity in the results raises concerns about the reliability of the findings.

Only three studies [16, 17, 24] attempted to distinguish patients with PD from those with PSP and CBS. The results, while promising (AUC = 0.961; Fig. 6F), are undermined by the wide confidence interval of the correlation estimate between sensitivity and FPR (−0.185, 95% CI: −0.973 to 0.944), which spans both negative and positive values and indicates uncertainty in the reliability of these findings. The studies exhibited low heterogeneity, which usually strengthens findings; however, because two of the three studies originated from the same research group [16, 17], this limited pool restricts the generalizability of the conclusions. More diverse research is required to confirm these results and establish the potential of biomarkers in speculative CNS-enriched EVs for differentiating patients with PD from those with PSP and CBS.

Three studies attempted to differentiate patients with MSA from HCs; despite prior reports of successful differentiation [11, 43], our analysis suggested that this approach may not be as promising. A high coefficient θ (0.17, 95% CI: −0.55 to 0.89; Fig. 7F), indicating a smaller AUC and lower accuracy, together with large variability and heterogeneity, raises concerns about the reliability of this diagnostic approach.

The prodromal disorders RBD and PAF are thought to eventually convert into one of the three synucleinopathies: PD, MSA, and/or DLB. However, none of the studies that included an RBD cohort [17, 35, 48] provided an ROC discriminatory model against PD or DLB, only against MSA [36], and no study to date has examined biomarkers in speculative CNS-enriched EVs in the prodromal disorder PAF. The attempt to differentiate patients with RBD from HCs in three studies [17, 35, 48] also appears unpromising, as suggested by the large coefficient θ (0.14, 95% CI: −0.17 to 0.45; Fig. 8F), indicating a smaller AUC and lower accuracy, along with considerable variability and heterogeneity.

Notably, a critical challenge is that studies measuring biomarkers in speculative CNS-enriched EVs lack independent validation and replication, even when the same methodology is employed. Pre-analytical factors in obtaining speculative CNS-enriched EVs are also not standardized, even though these EVs are highly sensitive to such factors [40], which further limits the generalizability of such a test in the clinic.

Importantly, most studies did not adequately report pharmacological treatments (e.g., type, duration, and dosage), which are likely to alter the EV signature. Data on race/ethnicity and comorbidities, both of which can influence the outcomes, were also notably absent. It is imperative that studies using speculative CNS-enriched EVs or general EVs report a thorough and detailed blood-handling methodology through the EV-TRACK platform [7], as previously reported [40,41,42], along with comprehensive information on pre-analytical factors. These include, but are not limited to, fasting status before blood collection, the time of day of collection, the duration of the collection process, the needle size, the method and duration of blood layer separation, and the type of tube used. Additionally, the mode of transport, whether the tube was kept vertical or horizontal, the anticoagulant used to obtain plasma, centrifugation techniques, the number of freeze–thaw cycles, platelet-depletion procedures, storage conditions (including time and temperature), defibrination treatments, and the methods for freezing EVs or EV lysates after isolation and lysis should also be meticulously documented [40].

In the broader landscape of clinical practice, this meta-analysis uncovers crucial concerns. Although individual studies may seem promising, the current meta-analysis suggests otherwise. Diverse methodologies and variations among studies using speculative CNS-enriched EVs challenge the reliability of these findings for everyday clinical application. Most critically, such inconsistencies hamper the development of a dependable biomarker for parkinsonian disorders. Such biomarkers could serve multiple roles: diagnosing the diseases, providing prognostic insight, distinguishing them from one another or from HCs, tracking disease progression, monitoring and anticipating treatment response, initial screening, evaluating patient risk, stratifying patients in clinical trials, interpreting drug behavior and responses in the body, discovering the origins and mechanisms of the disorders, identifying environmental triggers or exposures, and serving as primary or alternative outcome measures in clinical trials. Moreover, a reliable biomarker would alleviate the undue stress and concern faced by patients and their families due to uncertainties in diagnosis or prognosis.

As the search for reliable biomarkers in parkinsonian disorders persists, it becomes evident that a more standardized and rigorous approach is imperative in the field. As we move forward, greater emphasis should be placed on improving study design and minimizing bias, enhancing the comparability and reproducibility of findings, and addressing the heterogeneity in the results. Current efforts by the International Society for Extracellular Vesicles (ISEV) [44] and others [7, 15] aim toward more rigorous reporting and standardization to enhance accuracy and reproducibility of research utilizing EVs.

Conclusion

Our comprehensive meta-analysis underscores the current limitations and challenges of using speculative CNS-enriched EVs as diagnostic biomarkers for parkinsonian disorders. The substantial methodological inconsistencies across studies, combined with high heterogeneity and potential publication bias, considerably undermine the reliability of these findings. Furthermore, the occasional signs of diagnostic promise are frequently offset by considerable variability and by the lack of independent validation across research groups. The absence of standardized protocols for pre-analytical factors, which are critical to the accuracy of EV-based biomarkers, further compounds these issues. Together, these aspects paint a rather sobering picture, suggesting that this approach may not provide the anticipated breakthrough in the diagnosis of parkinsonian disorders. As we navigate the complexities of these debilitating diseases, it is becoming increasingly clear that we may need to re-evaluate our strategies, either by adopting more rigorous standardization and reporting [15], as suggested by current efforts of ISEV [44] and others [7], or by exploring alternative avenues for biomarker discovery. While the journey ahead may be challenging, continued pursuit of this endeavor remains crucial to transforming biomarker discovery for the diagnosis and management of parkinsonian disorders.