Background

Parkinson’s disease (PD) is a progressive neurodegenerative disease that affects approximately 630,000 people in the USA and for which no disease-modifying therapy is currently available. As the population ages, this number is projected to almost double to 1.1 million by 2030 [1].

The Food and Drug Administration (FDA) defines “real world data” as “all data collected from sources outside of traditional clinical trials” and “real world evidence” as “all evidence derived from aggregation and analysis of real world data” [2]. Such real world evidence, which reflects disease progression, treatments and outcomes under conditions of routine clinical practice, is a very important resource. It can play a pivotal role in improving the understanding of the underlying disease process [3], optimizing currently available therapies and developing new treatment strategies [2, 4].

Although the burden of PD and the interest in real world data are well known [5, 6], no literature review has yet provided an overview of the longitudinal, real world studies conducted in the USA on PD patients.

There is a need for a comprehensive review that creates an integrated view and helps investigators and clinicians choose the measurements that best match their objectives and the already existing data sources [4, 7]. Such an assessment can support a future effort to harmonize real world data collection and to use the available resources optimally.

The objective of this comprehensive literature review is to systematically identify and describe the longitudinal, real world data sources in PD, and to summarize the key characteristics and the measurements assessed in real world studies, as part of an effort to mobilize a harmonization process similar to the one already under way in Europe.

Methods

Search strategy and literature sources

The search was performed on ProQuest between May and August 2016. It covered MEDLINE (via PubMed) and EMBASE and was complemented by an internet keyword search. Related MeSH, EMTREE and key terms were combined. Articles from peer-reviewed journals, conference abstracts and reviews were screened (AT). The search equation terms are detailed in Appendix 1.

Study screening and selection

We included all studies based on real world data that enrolled patients with a diagnosis of PD. Inclusion was restricted to longitudinal, observational cohort studies and registries. The setting was restricted to the USA and the publication date to the last 10 years (2006-2016). Cohorts or registries without any publication in the last 10 years were considered outdated. Exclusion criteria were based on population characteristics: other diagnoses (e.g. Wolff-Parkinson-White syndrome or Parkinsonian syndromes only), autopsy data, and studies not focused on patients (e.g. focused on physicians). Studies without American patients and non-longitudinal studies, such as case-control studies, were also excluded. Only one main exclusion criterion per excluded study was reported in the flow chart (Fig. 1). No language limits were applied.
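The selection rules above can be sketched as a small screening function. This is only an illustration under stated assumptions: the `Record` fields, the reason labels and the order in which the criteria are checked are hypothetical, since the review states only that a single main exclusion criterion was reported per excluded study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # All field names are hypothetical; the review does not publish a record schema.
    pd_diagnosis: bool          # patients with a diagnosis of PD
    patient_focused: bool       # False for e.g. physician-focused studies
    autopsy_only: bool          # True when the study relies on autopsy data
    includes_us_patients: bool  # True when American patients are included
    longitudinal: bool          # False for e.g. case-control designs
    publication_year: int


def main_exclusion_reason(record: Record) -> Optional[str]:
    """Return the first matching exclusion reason, or None if the record is included.

    The order of the checks is an assumption made for this sketch.
    """
    if not record.pd_diagnosis:
        return "other diagnosis"
    if record.autopsy_only:
        return "autopsy data"
    if not record.patient_focused:
        return "not focused on patients"
    if not record.includes_us_patients:
        return "no American patients"
    if not record.longitudinal:
        return "not longitudinal"
    if not 2006 <= record.publication_year <= 2016:
        return "published outside 2006-2016"
    return None
```

Returning a single reason per record mirrors the one-criterion-per-study reporting in the flow chart; a record that fails several criteria is counted only under the first one checked.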

Fig. 1 Flowchart

Data extraction

In a first step, when a publication allowed the identification of a data source of interest, the detailed information available in the publication was extracted. Information on design and setting, funding, population selection, follow-up and measurements was recorded. This was supplemented and updated with information found through an internet search of the study website, registration sites such as clinicaltrials.gov, and investigators’ and funders’ websites. The list of all information captured is available in Appendix 2.

In a second step, measurements were classified into the following dimensions: motor and neurological function, cognition, psychiatric symptoms, activities of daily living, sleep quality, quality of life, autonomic symptoms and other. The “other” dimension gathers known PD symptoms not covered by the main dimensions, such as olfaction [8], and more general information such as caregiver burden measurements. Some dimensions were divided into subdimensions because of their complexity and variety (e.g. motor and neurological symptoms is divided into four subdimensions: global, gait and balance, fine movement and other). This classification was based on the literature [4] with one adaptation: as very few sensory markers were identified, they were gathered in the “other” category.
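As an illustration of this classification step, the sketch below maps a few measurements to their dimension and derives the set of dimensions covered by a data source. The mapping lists only the measurements named in the Results of this review, not the full set of 108; the function name and the "unclassified" label are hypothetical.

```python
# Partial mapping, limited to measurements named in this review.
DIMENSION_BY_MEASUREMENT = {
    "UPDRS-III": "motor and neurological function",
    "H&Y": "motor and neurological function",
    "MMSE": "cognition",
    "GDS": "psychiatric symptoms",
    "BDI": "psychiatric symptoms",
    "S&E": "activities of daily living",
    "ESS": "sleep quality",
    "PDQ-39": "quality of life",
    "SCOPA-AUT": "autonomic symptoms",
}


def dimensions_covered(measurements):
    """Return the set of dimensions assessed by a list of measurement names.

    Names missing from the mapping are labelled "unclassified" (a placeholder
    for this sketch) rather than silently assigned to the "other" dimension.
    """
    return {DIMENSION_BY_MEASUREMENT.get(name, "unclassified") for name in measurements}


# Example: a hypothetical source using three of the scales above.
example = dimensions_covered(["UPDRS-III", "H&Y", "MMSE"])
```

Two motor scales count as a single dimension here, which is why the number of measurements per source and the number of dimensions per source are reported separately.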

Data analysis

Data source characteristics were described globally. To address the variability of sources, the description was also performed according to four main characteristics: the completion status (ongoing vs completed); the study population (Parkinson-specific data sources vs “generic” data sources including both Parkinsonian patients and patients with other diagnoses); the category of study (investigating motor symptoms, non-motor symptoms, biomarkers, genetics or a mix of these); and the country (US only vs international sources). Descriptive statistics were reported as absolute frequencies and percentages.
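A minimal sketch of this kind of stratified description is given below. It counts how many sources in each group report a given characteristic and expresses the count as a percentage of the group. The function name, the dictionary keys and the toy data are assumptions for illustration; they are not the extracted data set.

```python
from collections import defaultdict


def frequency_table(sources, flag, by):
    """Return {group: (count, group_size, percent)} for a boolean characteristic."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for source in sources:
        group = source[by]
        totals[group] += 1
        hits[group] += bool(source[flag])
    return {
        group: (hits[group], totals[group], round(100 * hits[group] / totals[group]))
        for group in totals
    }


# Toy data, invented for this example only.
toy_sources = [
    {"status": "ongoing", "genetics": True},
    {"status": "ongoing", "genetics": False},
    {"status": "completed", "genetics": False},
    {"status": "completed", "genetics": False},
    {"status": "completed", "genetics": True},
    {"status": "completed", "genetics": False},
]

table = frequency_table(toy_sources, flag="genetics", by="status")
```

The same function can be reused for the other three stratifications by changing the `by` key, for example to a hypothetical `"country"` or `"population"` field.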

Results

Of 1463 records screened, 84% were excluded based on title and abstract, and 7% after review of the full text (Fig. 1). The most frequent exclusion criterion was that studies were not longitudinal. Only 133 (9%) were included in the qualitative analysis. From these 133 studies, data on 53 different data sources were extracted [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61]. These 53 sources comprised one registry and 52 cohorts.
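The screening percentages can be checked with simple arithmetic. The sketch below uses only the two counts given in the text (1463 records screened, 133 studies included); the counts excluded at each stage are not reported here, so only their combined share is derived.

```python
screened = 1463  # records screened (from the text)
included = 133   # studies included in the qualitative analysis (from the text)
excluded = screened - included

# Shares of all screened records, rounded to whole percentages.
pct_included = round(100 * included / screened)
pct_excluded = round(100 * excluded / screened)

print(f"included: {included} ({pct_included}%)")
print(f"excluded: {excluded} ({pct_excluded}%)")
```

The included share rounds to 9%, and the excluded share rounds to 91%, which matches the sum of the two reported exclusion percentages (84% and 7%).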

Longitudinal real world sources (Table 1)

Forty-two sources (79%) were only in the USA. Three of the 11 international sources were only in North America, while the other eight included patients in the USA and Europe, and two of these also included Asia. Most of the sources included fewer than 500 PD patients (79%) and lasted more than 5 years (51%). Although most of the sources included information about current medications (81%) and comorbidities (79%), only a few collected information on medical imaging (36%), genetics (30%), caregiver burden (11%) and healthcare costs (2%).

Table 1 Overview of data sources characteristics (n = 53)

Among the 53 sources, 16 (30%) are still ongoing. Genetic information (38% vs 27%) and caregiver burden data (27% vs 5%) were more often available in ongoing than in completed sources. Moreover, there has been a trend toward larger populations and longer durations: comparing ongoing with completed sources, 31% vs 16% included more than 500 patients and 75% vs 41% had a duration of more than 5 years.

Likewise, US sources were smaller and shorter than international sources (88% vs 45% included fewer than 500 PD patients, and 52% vs 45% had a duration of more than 5 years). US sources reported caregiver burden data more often than international sources (12% vs 9%) but reported the other assessments less often, such as medical imaging (26% vs 73%) or genetic information (24% vs 55%).

Sources including only Parkinsonian patients were smaller (12% vs 28% included more than 500 patients) and shorter (32% vs 68% had a duration of more than 5 years) than the “generic” cohorts. Medical imaging (24% vs 46%) and genetics (12% vs 46%) were assessed less often in Parkinson-specific than in “generic” cohorts.

The 53 data sources have different objectives. The most common primary objective was the investigation of non-motor symptoms (32%), followed by biomarkers (21%), motor symptoms (15%) and genetics (4%). Fifteen sources (28%) investigated several of these points as their primary objective. The sources investigating biomarkers as their primary objective were large and recent, with four sources still ongoing and four sources started in the last 5 years. In contrast, the sources investigating motor symptoms as their primary objective were small, all with fewer than 500 patients, and had very frequent assessments, on average twice a year.

Measurements in real world studies in PD

The name of each included data source is presented individually with its main characteristics (Table 2) and its measurements (Table 3). A large number of measurements (n = 108) were identified through this literature review, and each of the 53 sources had its own unique range of measurements (Table 4). Most of the measurements were cited only once or twice. The measurements were not evenly distributed across dimensions: only 3 different measurements were used to assess autonomic symptoms, compared with 43 to assess cognition.

Table 2 Overview of data sources characteristics listed in alphabetic order (n = 53)
Table 3 Overview of data source measurements and of the number of evaluations or assessments applied (n = 53)
Table 4 Measurements classification and use in data sources (n = 108)

Most sources assessed motor and neurological functions (87%), cognition (77%) and psychiatric symptoms (72%). Activities of daily living (42%), sleep quality (21%), quality of life (17%) and autonomic symptoms (13%) were reported to a lesser extent. The measurements most commonly used to assess motor and neurological symptoms were the Unified Parkinson’s Disease Rating Scale part III (UPDRS-III, 77% of included data sources) and the Hoehn and Yahr scale (H&Y, 57% of included data sources) (Table 4). To evaluate cognitive impairment, the Mini Mental State Examination (MMSE, 57%) was the most frequent. The measurements most frequently used to assess psychiatric symptoms were the Geriatric Depression Scale (GDS, 32%) and the Beck Depression Inventory (BDI, 15%). For the other dimensions, the most commonly used measurements were the Epworth Sleepiness Scale (ESS, 8%, for sleep), the Schwab and England scale (S&E, 19%, for activities of daily living), the 39-item Parkinson’s Disease Questionnaire (PDQ-39, 9%, for quality of life) and the autonomic part of the Scales for Outcomes in Parkinson’s Disease (SCOPA-AUT, 6%, for autonomic symptoms). In absolute terms, the use of the ESS, PDQ-39 and SCOPA-AUT is very low, even though they were the most frequently used measurements in their dimension.

The analysis reveals some notable differences between sources in the number of dimensions assessed and in how often each dimension was assessed. Some sources evaluate only one dimension (source n°13) whereas others evaluate seven dimensions (source n°43). Completed sources more frequently measure motor and neurological symptoms (92% vs 75%), psychiatric symptoms (76% vs 63%) and activities of daily living (43% vs 38%) than ongoing sources. US sources evaluate cognitive impairment more frequently than international sources (86% vs 45%) but all the other dimensions less frequently. “Generic” sources evaluate three dimensions more frequently than specific sources including only Parkinsonian patients: cognition (86% vs 68%), sleep (32% vs 8%) and autonomic symptoms (25% vs 0%).

Lastly, the frequencies of these assessments depend on the primary objective of the sources, but with substantial overlap: 100% of the sources investigating motor symptoms used measurements of motor symptoms, mainly the UPDRS-III, but they also frequently assessed cognition (88%), sleep (25%) and quality of life (25%). The sources investigating non-motor symptoms frequently assessed cognition (82%) and psychiatric symptoms (88%), most of the time with the MMSE (65%) and the GDS (41%), respectively. The two genetic sources have several patient-reported outcomes, and they both measured motor and psychiatric symptoms.

Some measurements were used more often for some of the above-mentioned objectives. While the GDS and the UPDRS-III were used specifically in sources investigating non-motor symptoms and motor symptoms, respectively, as a primary objective, the BDI and the H&Y were used in sources investigating the other objectives.

Discussion

A large number of longitudinal real world data sources for PD have been identified. There is no consistency of the dimensions assessed, nor of the measurements used across sources, reflecting the absence of harmonization on the optimal choice of measurements.

There are a number of issues with collecting real world data, such as the limited size of the databases [1], the inability to accurately determine specific outcomes [62], and a greater risk of bias and confounding factors [5]. Nevertheless, these data have an important role to play in the evaluation of epidemiology, burden of disease and treatment patterns [6], and in assisting health-care decision-makers, especially for coverage and payment decisions [63]. In this context, harmonization seems necessary. These results are quite consistent with those observed in Europe where a “consensus on domains incorporated in different studies [was observed] with a substantial variability in the choice of the evaluation method” [4]. There are a number of possible explanations for this absence of harmonization, and some of them are discussed here.

First of all, some dimensions are broad. Consequently, many measurements are available to suit each source’s objective, design and population. This heterogeneity probably reflects both the absence of harmonization and the complexity of evaluating a dimension like cognition [64]. A single measurement cannot capture all the necessary information. For example, patient-reported and clinician-reported outcomes can be very informative in combination and complement one another. Similarly, the combination of Parkinson-specific and generic measurements can be a necessity, especially for “generic” data sources that do not include only Parkinsonian patients. In another example, while the objectives of the UPDRS-III and the H&Y (or of the GDS and the BDI) are close, the difference in their use according to the primary objective of the source seems to be linked more to the investigator’s choice than to the suitability of the measurement.

Secondly, PD is characterized by several initial system disorders and treatment complications [65]. To date, motor subtyping has dominated the landscape of PD research, but evaluations of non-motor dimensions are increasing [9, 66], and with them the number of dimensions to evaluate. Some non-motor dimensions have validated measurements, such as psychiatric symptoms [67], disability in activities [7], sleep [68] or quality of life [69], but others have no clear review of validated and used scales [4]. Among the psychiatric scales, the two most frequently used were the GDS and the BDI. This finding highlights the well-known relationship between PD and depression, and the fact that when validated scales [70] are available, practice tends to harmonize. The lack of evaluation and validation of the measurements in PD is probably one source of this heterogeneity.

Thirdly, clinical research purposes and outcomes evolve continuously over time [71, 72], as highlighted by the many differences between completed and ongoing sources. New trends are not yet well covered, either because measurements are lacking or because they are not captured (i.e. available measurements are not used in databases). Among the most important of these are genetic testing, caregiver burden and costs. Genetic testing has developed substantially in the last few years, with a growing number of discoveries of mutations and related treatments, such as LRRK2 and its kinase inhibitors. Further research is nevertheless necessary to understand the role of genetic mutations in PD [73]. Sources focused on caregiver burden, and relevant validated measurements, are very limited [7], but interest in these data is growing with the recognition of the physical, emotional and economic burden borne by caregivers [74]. The only data source identified as measuring healthcare costs associated with PD was ongoing. This probably reflects both the recent growth of interest in health economic evaluation and the fact that this type of study is more often conducted in automated healthcare databases [75].

Fourthly, access to data source details could be improved. Information is fragmented across different sources, and study protocols or outcome lists are not always available. Consequently, identifying and gathering this information to produce an integrated view can be very difficult.

Finally, the variability of our results is greater than in the European study. This may be because the classification is based on dimensions that mostly assess symptoms (5 out of 8 dimensions). This classification is probably more appropriate for data sources with a primary objective of treatment evaluation (e.g. open-label extension), which are a minority of the included sources. The classification may not be as applicable to other data sources focused on the evaluation of burden. Real world evidence is collected for various purposes, and such a restricted classification can lead to ambiguous conclusions. It can give a perception of consensus while actually missing important aspects such as burden, function or complications of treatments.

Our study has several limitations. First of all, unlike in systematic reviews, only one reader conducted the record selection and the data extraction. Nevertheless, the search methods identified a large number of PD data sources for extraction and comparison. No contact was established with investigators of the included studies to confirm data extraction results. To address this issue, a second step was performed after the data extraction from the publications, to update and complete the published information with all other available sources. At-risk/prodromal cohorts were not separated from clinical PD cohorts, but the distinction between these two subgroups has recently been described as artificial [4].

Our study has several strengths. To our knowledge, it is the first review of existing real world longitudinal data sources on PD in the USA. Moreover, it was performed with broad search criteria and without any limitation on language, type of publication or type of measurements. This review creates an integrated view and should help investigators and clinicians identify and choose the measurements that best match their objectives and the already existing data sources.

Conclusion

In conclusion, many longitudinal real world data sources on PD exist. Different types of measurements have been used over time. To allow comparison and pooling of these multiple data sources, it will be essential to harmonize practices in terms of types of measurements.