Introduction

Amyotrophic lateral sclerosis (ALS) is a disease that affects the motor neurons, leading to progressive paralysis and death, mostly from respiratory failure, typically within 3–5 years [1, 2], although survival varies greatly between patients. The lifetime risk of ALS is 1 in 400 [3], and the only approved treatment, riluzole, increases survival by an average of 2–3 months [4].

In recent years, several potentially promising ALS treatments were tested in large clinical trials but, unfortunately, failed to significantly improve survival or slow disease progression [5, 6]. These failures underscore the urgent need to better understand disease mechanisms and to identify novel treatment targets. Importantly, they also point to the need to better understand how disease manifestation differs between patients. While, on average, patients with ALS survive 3 to 5 years after symptom onset, >10% survive for >10 years [7], and no clinical tools are currently available to identify these patients in advance. Genetic understanding of sporadic ALS, which accounts for >90% of patients, is still in its early stages [8, 9].

The heterogeneity in the course of disease progression and ultimately survival, together with the rarity of the disease, makes predicting disease progression at the level of the individual patient very challenging. This presents a substantial barrier to the planning and interpretation of clinical trials for treatments, leading to large, expensive, and potentially unbalanced trials.

Large sample sets are critical for identifying statistically significant and biologically relevant observations, particularly for diseases resulting from the complex interplay of genetic and environmental factors [10]. Indeed, pooled clinical trial datasets have proven invaluable to researchers studying other complex neurodegenerative diseases: an effort similar to PRO-ACT was instrumental in developing and characterizing a novel clinical outcome measure for multiple sclerosis [11]. In 2012, the Coalition Against Major Diseases used its Alzheimer’s disease clinical trial database to develop a US Food and Drug Administration-approved tool for simulating clinical trials (http://c-path.org/programs/camd/simulation-tool/) [12]. Therefore, given that ALS trials are expensive and infrequent, pooling existing trial data into a dataset of several thousand ALS patient records can not only provide useful information on many aspects of ALS clinical trials, but can also be “data-mined” for unique observations, patterns of disease progression, epidemiologic data, and a host of as yet unconsidered analyses.

To allow similar, and more thorough, analyses of ALS patient-level data, we set out to bring together as many completed ALS clinical trials as possible. This led to the development of the Pooled Resources Open-Access Clinical Trial (PRO-ACT) platform, a collaboration between Prize4Life, the Neurological Clinical Research Institute at Massachusetts General Hospital, and the Northeast ALS Consortium (NEALS), with funding from the ALS Therapy Alliance. Datasets from industry and academic clinical trials were collected, cleaned, and harmonized into a single unified dataset: the largest ALS clinical trials dataset ever established, with records from >8600 patients. The data are available for download by the research community, free of charge, at www.ALSDatabase.org.

Here we review the research potential of the PRO-ACT data and the preliminary results already obtained from it, which can improve the ability to plan, recruit for, and model ALS clinical trials, help elucidate disease mechanisms, and foster better standards of clinical care. We believe that, beyond its benefit to ALS research, this database can also serve as a unique example of how open-access clinical trial data can accelerate research.

The PRO-ACT Platform: The Basic Facts

Between 1990 and 2010, 27 ALS trials with at least 80 patients each were conducted. Our goals were to identify and obtain subject records from as many of these completed phase II/III ALS clinical trials as possible; to de-identify, harmonize, and aggregate the data; and to make the data available to the global research and development community.

Of these 27 trials, data from 17 (63%) were donated, representing a mix of industry, academic, and government-sponsored trials, including trials of riluzole [13, 14], lithium carbonate [15], creatine [16], celecoxib [17], topiramate [18], TCH-346 [19], brain-derived neurotrophic factor [20], ciliary neurotrophic factor [21], xaliproden [22], talampanel [23], arimoclomol [24], and gabapentin [25]. A full list of trials is given in Atassi et al. [26]. Several more trials are expected to be added to the database in 2015.

Study protocols were approved by the participating medical centers, and all participating patients gave informed consent. The data were fully de-identified with the Health Insurance Portability and Accountability Act (HIPAA) de-identification conventions for personal health information in mind: any potentially identifying patient information and all dates were removed, new random subject identifiers were created, and, wherever possible, trial-specific information was removed from the resulting dataset. The data were then harmonized and organized according to a comprehensive common data structure, developed on the basis of the common data elements recommended by the National Institute of Neurological Disorders and Stroke. The datasets were then imported according to these mappings, preserving the natural grouping and properties of the data (for more information see [16]). Finally, the PRO-ACT platform was launched for open-access use by the ALS research community in December 2012.
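The de-identification steps described above (dropping identifying fields and calendar dates, and assigning new random subject identifiers) can be illustrated with a minimal Python sketch. The field names (`trial_subject_id`, `name`, `site_id`, `trial_start`) are hypothetical. Converting dates into day offsets from each subject's trial start is an assumption made here to keep relative timing. This is not a description of the actual PRO-ACT pipeline.

```python
import secrets
from datetime import date

# Hypothetical field names; real trial exports use their own layouts.
IDENTIFIER_FIELDS = {"trial_subject_id", "name", "site_id"}
ANCHOR_FIELD = "trial_start"  # assumed per-subject reference date


def deidentify(records):
    """Drop identifier fields, replace subject IDs with random ones, and turn
    every date into a day offset from the subject's trial start (assumption)."""
    new_ids = {}
    used = set()
    cleaned = []
    for rec in records:
        old_id = rec["trial_subject_id"]
        if old_id not in new_ids:
            # Draw random IDs until an unused one is found.
            candidate = secrets.token_hex(8)
            while candidate in used:
                candidate = secrets.token_hex(8)
            used.add(candidate)
            new_ids[old_id] = candidate
        anchor = rec[ANCHOR_FIELD]
        out = {"subject_id": new_ids[old_id]}
        for key, value in rec.items():
            if key in IDENTIFIER_FIELDS or key == ANCHOR_FIELD:
                continue
            if isinstance(value, date):
                # Keep relative timing only, never the calendar date.
                out[key + "_day"] = (value - anchor).days
            else:
                out[key] = value
        cleaned.append(out)
    return cleaned


raw = [
    {"trial_subject_id": "A-017", "name": "J. Doe", "site_id": 4,
     "trial_start": date(2005, 3, 1), "visit_date": date(2005, 4, 12),
     "alsfrs_r": 38},
]
print(deidentify(raw))
```

The old-to-new identifier mapping exists only inside the function call. Discarding it afterwards means the new identifiers cannot be traced back through this code. All visits of one subject still share a single new identifier.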

The PRO-ACT database currently includes the following data elements: demographics; family history; subject ALS history; survival; ALS Functional Rating Scale, original (ALSFRS) and revised (ALSFRS-R) [27, 28]; forced vital capacity; slow vital capacity; treatment arm; riluzole use; vital signs; concomitant medication (added in winter 2014); and adverse events (added in winter 2014).
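At its simplest, harmonizing each donated trial into a common set of data elements like those listed above means renaming trial-specific fields and converting units. The sketch below uses hypothetical trial names and field names, and assumes that weight is stored in kilograms. The actual PRO-ACT common data structure is not reproduced here.

```python
# Hypothetical mappings from each trial's local field names to common elements.
FIELD_MAPS = {
    "trial_A": {"AGE_YRS": "age", "WT_LB": "weight_kg", "FRS_TOT": "alsfrs_total"},
    "trial_B": {"age": "age", "weight": "weight_kg", "alsfrsr": "alsfrs_r_total"},
}

# Unit conversions applied while renaming (assumed: common structure stores kg).
LB_TO_KG = 0.45359237
CONVERSIONS = {("trial_A", "WT_LB"): lambda lb: round(lb * LB_TO_KG, 1)}


def harmonize(trial, record):
    """Map one trial-specific record onto the common element names."""
    mapping = FIELD_MAPS[trial]
    out = {}
    for local_name, value in record.items():
        common_name = mapping.get(local_name)
        if common_name is None:
            continue  # unmapped, trial-specific field: dropped in this sketch
        convert = CONVERSIONS.get((trial, local_name))
        out[common_name] = convert(value) if convert else value
    return out


print(harmonize("trial_A", {"AGE_YRS": 61, "WT_LB": 150, "SITE": 3}))
# {'age': 61, 'weight_kg': 68.0}
```

The original ALSFRS total and the ALSFRS-R total map to separate elements because the two scales differ: 10 items (0–40) versus 12 items (0–48).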

The PRO-ACT database has been requested by many researchers in the ALS research community. In the 24 months since its launch, access has been requested by >400 researchers from >40 countries (see Fig. 1), including representatives of 41 pharmaceutical companies, 24 independent informatics companies (working either for a pharmaceutical client or independently), and >60 academic institutions.

Fig. 1

An overview of the rationale behind the Pooled Resources Open-Access Clinical Trial (PRO-ACT) database. (a) The types of data in PRO-ACT (see text for a full list of data types), which are then used by researchers all over the world. (b) A map depicting the spread of researchers who have requested access to the PRO-ACT database (>400 researchers in 41 countries). Each star depicts a country. Pharmaceutical indicates pharmaceutical companies, informatics indicates informatics companies, and academic indicates academic institutions. The researchers then produce new puzzle pieces for understanding amyotrophic lateral sclerosis (ALS). (c) Some of the potential research and development benefits of the database. ALSFRS = ALS Functional Rating Scale [31]; ConMed = concomitant medication

Describing the ALS Clinical Trials Population

One use of the PRO-ACT database is to provide, for the first time, an accurate description of the patient population found in ALS clinical trials: its characteristics, rates of disease progression, natural history, and more. To plan and execute an ALS clinical trial, one needs a model of the ALS patient population that can be used throughout the various stages of trial planning and execution. Planning recruitment requires good knowledge of the disease’s basic demographics and other relevant features. Planning the overall length of the trial, and the number of participants each treatment arm needs to show a treatment effect, requires good knowledge of disease progression as measured by different outcome measures. Choosing the most appropriate outcome measure requires knowing what level of noise can be expected from each candidate.

When using the PRO-ACT database for these purposes, it is important to be mindful of changes in the ALS patient population over time. The database includes trials concluded over the last 20 years, during which overall survival has improved [29], potentially reflecting improvements in interventions (gastrostomy, tracheostomy) and noninvasive assistive technology. However, such interventions have not yet had a statistically significant effect on the disease course itself, as rates of decline in the ALSFRS have remained steady over the years [30, 31].

Another point to be mindful of is that the clinical trial patient population is not necessarily representative of the ALS patient population as a whole [31, 32]. There is likely to be some bias toward recruiting patients with slower-progressing disease, as they are more likely to survive until trial recruitment. In addition, eligibility criteria make clinical trial populations less diverse. Typical criteria set thresholds for maximum age at recruitment and for disease duration. They also set thresholds for disease severity as measured by the ALSFRS or ALSFRS-R, forced vital capacity, or other relevant measures.

Indeed, compared with patients observed in the clinic, patients with ALS in PRO-ACT were, on average, younger, had somewhat slower disease progression, and a shorter time interval between symptom onset and diagnosis [26, 29]. Investigators should be aware of these differences when generalizing results from clinical trial patients to the general ALS population.

Open Research Question: Clinical Trial Simulation

The size of the PRO-ACT database allows assessment of the resources and tools needed to manage a future ALS clinical trial. These include decisions in clinical trial management such as trial size (how many patients to recruit); estimation of dropout rate and factors likely to influence dropout rate; planning the length of the trial; determining which measures are appropriate as primary and secondary end points; and determining how measures of interest behave longitudinally and what level of noise to expect from them. The database may enable the development of a model to address these decisions more clearly.
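As an illustration of trial-size planning, consider a trial that compares mean rates of ALSFRS-R decline between two arms. The standard two-arm normal-approximation formula gives the number of patients per arm as n = 2 × σ² × (z(1 − α/2) + z(power))² / Δ², which is then inflated for expected dropout. Here Δ is the difference in mean slope to be detected and σ is the SD of individual slopes. The effect size, SD, and dropout values below are hypothetical placeholders; in practice they would be estimated from data such as PRO-ACT. The sketch also shows why the noise of an outcome measure matters: n scales with σ².

```python
import math
from statistics import NormalDist


def patients_per_arm(delta, sigma, alpha=0.05, power=0.8, dropout=0.0):
    """Two-sided, two-sample normal-approximation sample size per arm.

    delta:   assumed difference in mean ALSFRS-R slope between arms (points/month)
    sigma:   assumed SD of individual slopes (points/month)
    dropout: expected fraction of patients lost before the end point
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    n = 2 * (sigma * (z_alpha + z_beta) / delta) ** 2
    # Inflate so that the expected number of completers still reaches n.
    return math.ceil(n / (1 - dropout))


# Hypothetical inputs: detect a 0.25 point/month slowing with slope SD 0.6.
print(patients_per_arm(0.25, 0.6))               # 91
print(patients_per_arm(0.25, 0.6, dropout=0.2))  # 114
print(patients_per_arm(0.25, 0.3))               # 23
```

Halving the assumed SD cuts the required enrollment roughly fourfold. Reducing uncertainty about individual patients, for example through better progression models, can therefore translate directly into smaller trials.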

Understanding ALS Clinical Manifestation

Another use of the PRO-ACT database is to unravel the factors predicting disease onset, progression, and survival. Researchers have previously identified various features related to ALS survival and developed equations to predict it. Other factors have been found to relate to progression as measured by the ALSFRS or ALSFRS-R. The PRO-ACT database allows such findings to be validated. For example, site of onset has been shown to affect the ALSFRS slope and overall survival time [33–41], with bulbar onset carrying a poorer prognosis than limb onset. Similarly, age of onset was predictive of prognosis, with poorer prognosis for patients with later disease onset [34–44]. Other factors linked to prognosis in previous reports include body mass index or absolute weight [42, 45]; cognitive functioning [46]; uric acid levels [42, 47]; and albumin and creatinine levels [48, 49].
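Several of these findings concern the ALSFRS slope, so it helps to be explicit about how a per-patient slope can be computed. One common choice, assumed here, is an ordinary least-squares fit of the total score against time. The 30.44-day average month and the (day, score) input format are illustrative assumptions.

```python
DAYS_PER_MONTH = 30.44  # average month length (assumption)


def alsfrs_slope(visits):
    """Least-squares ALSFRS(-R) slope in points per month.

    visits: list of (day_offset, total_score) pairs for one patient.
    Negative values mean the score is declining.
    """
    if len(visits) < 2:
        raise ValueError("at least two visits are needed")
    months = [day / DAYS_PER_MONTH for day, _ in visits]
    scores = [score for _, score in visits]
    mean_t = sum(months) / len(months)
    mean_s = sum(scores) / len(scores)
    sxx = sum((t - mean_t) ** 2 for t in months)
    if sxx == 0:
        raise ValueError("visits must be at different times")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(months, scores))
    return sxy / sxx


# Hypothetical patient seen at baseline, ~3 months, and ~6 months.
print(round(alsfrs_slope([(0, 40), (91, 37), (183, 34)]), 2))  # -1.0
```

A least-squares fit uses every visit rather than only the first and last, which makes the slope less sensitive to a single noisy assessment.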

Using the PRO-ACT database, we were able to test these and other predictors in a larger population [26]. Multivariate analysis was conducted to identify the effects of these factors, as well as of 15 baseline serum blood tests of interest, such as uric acid, creatinine, glucose, creatine kinase, cholesterol, triglycerides, white blood cell count, and bicarbonate. Indeed, age at symptom onset, site of onset, and body mass index at onset were all confirmed as independent predictors of survival. In addition, baseline creatinine and uric acid levels at clinical trial onset were identified as independent predictors of the rate of ALSFRS-R decline and of tracheostomy-free survival after controlling for other factors, with lower levels associated with poorer prognosis. Because uric acid is an antioxidant, this suggests a relation between ALS and oxidative stress and points to disease pathways worth investigating further.
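An “independent predictor” is a factor that remains associated with the outcome once the other covariates are held fixed. A minimal way to see this is a multivariable linear regression of ALSFRS-R slope on several baseline covariates. The sketch below solves the normal equations by Gaussian elimination. It is a simplification: it reports coefficients only (no standard errors or p-values), and survival outcomes such as tracheostomy-free survival would instead call for a survival model (for example, Cox regression), which is not shown. The data are synthetic, generated from known coefficients to check the solver.

```python
def ols(X, y):
    """Multivariable least squares with an intercept.

    X: list of covariate rows (e.g. [creatinine, uric_acid] per patient)
    y: list of outcomes (e.g. ALSFRS-R slope per patient)
    Returns [intercept, coef_1, coef_2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("covariates are collinear or too few patients")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, p):
            factor = A[k][col] / A[col][col]
            for c in range(col, p):
                A[k][c] -= factor * A[col][c]
            b[k] -= factor * b[col]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta


# Synthetic check: outcomes built from known coefficients (-1.5, 0.4, 0.2).
X = [(0.8, 5.0), (1.0, 4.2), (0.6, 6.1), (1.2, 5.5), (0.9, 3.8)]
y = [-1.5 + 0.4 * c + 0.2 * u for c, u in X]
print([round(v, 3) for v in ols(X, y)])  # [-1.5, 0.4, 0.2]
```

Each coefficient estimates the change in slope per unit of one covariate with the others fixed, which is the sense in which a factor is "independent" of the rest.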

Breaking Down the ALS Patient Population

Other analyses are aimed at identifying specific subpopulations of patients with ALS. For example, Küffner et al. [50] developed a mathematical model to characterize ALS disease progression, and a probabilistic model to estimate the presence of clusters in the rate of progression. The researchers identified 2 distinct patient populations—slow-progressing patients and fast-progressing patients—and determined the optimal cutoff for classifying patients as belonging to 1 of the 2 groups based on up to 4 weeks of observations. Such models can improve clinical trial recruitment and balancing. The study also demonstrated the heterogeneity of the patient progression, again highlighting the need for such models.
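The sketch below illustrates the general idea of estimating whether progression rates fall into clusters. It fits a two-component Gaussian mixture to per-patient ALSFRS slopes by expectation-maximization (EM) and derives a cutoff at the slope where both groups are equally likely. This is a generic textbook technique, swapped in for illustration; it is not the model of Küffner et al. [50]. The example slopes are synthetic.

```python
import math
import statistics


def _pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_groups(slopes, iters=200, min_sd=1e-3):
    """EM fit of a two-component 1-D Gaussian mixture.
    Component 0 starts on the faster-declining (more negative) half."""
    xs = sorted(slopes)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two slopes")
    half = n // 2
    mu = [statistics.fmean(xs[:half]), statistics.fmean(xs[half:])]
    sd = [max(min_sd, statistics.pstdev(xs))] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: probability that each slope belongs to component 0
        resp = []
        for x in xs:
            p0 = w[0] * _pdf(x, mu[0], sd[0])
            p1 = w[1] * _pdf(x, mu[1], sd[1])
            resp.append(p0 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weights, means, and SDs
        n0 = sum(resp)
        n1 = n - n0
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; stop in this sketch
        mu = [sum(r * x for r, x in zip(resp, xs)) / n0,
              sum((1 - r) * x for r, x in zip(resp, xs)) / n1]
        sd = [max(min_sd, math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0)),
              max(min_sd, math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1))]
        w = [n0 / n, n1 / n]
    return w, mu, sd


def split_point(w, mu, sd, tol=1e-6):
    """Slope between the two means where both groups are equally probable."""
    f = lambda x: w[0] * _pdf(x, mu[0], sd[0]) - w[1] * _pdf(x, mu[1], sd[1])
    lo, hi = min(mu), max(mu)
    if f(lo) * f(hi) > 0:
        return None  # no sign change between the means
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


slopes = [-2.2, -2.1, -2.0, -1.9, -1.8] * 4 + [-0.7, -0.6, -0.5, -0.4, -0.3] * 4
w, mu, sd = fit_two_groups(slopes)
print([round(m, 2) for m in mu], round(split_point(w, mu, sd), 2))  # [-2.0, -0.5] -1.25
```

With real data, whether two components fit better than one should be checked (for example, by comparing likelihood-based criteria) rather than assumed.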

Indeed, the benefit of a large database lies in its capacity to allow novel analyses. These include identifying patient groups based on disease progression, and also on symptom progression profiles, comorbidities, laboratory result profiles, and more. Such analyses are needed to overcome the limitations of studying a rare disease such as ALS, where parts of the clinical literature are constrained by small sample sizes. Stratifying the ALS patient population and identifying smaller subsets can thus improve our understanding of disease mechanisms, responses to medication, and individual-level prognosis.

Open Research Question: Stratifying Slow and Fast Patients

On average, patients with ALS die within 3–5 years of disease onset. However, there is substantial variance between patients, with some succumbing to ALS within a year and others surviving for over a decade. Importantly, these patients might respond to ALS medication differently. It has also been suggested that certain ALS therapies that failed in clinical trials might actually benefit specific subpopulations of patients. Understanding what distinguishes very slow progressors from very fast progressors is therefore important both for clinicians and for clinical trial planning. In addition, the results of this analysis could shed new light on ALS biological mechanisms and point to novel biomarkers of ALS progression.

ALS Enters the World of Big Data

One general benefit of a larger database is that it allows utilization of techniques from quantitative disciplines that were not previously commonly used in ALS research. Such techniques include network models, machine learning algorithms for clustering, predicting and stratifying data, and more.

One example is the DREAM-Phil Bowen ALS Prediction Prize4Life Challenge, initiated by Prize4Life and run in 2012 in collaboration with the DREAM Project (Dialogue for Reverse Engineering Assessments and Methods). The challenge was hosted on the InnoCentive (www.InnoCentive.com) prize platform. It invited solvers to develop algorithms that use 3 months of data to predict disease progression 9 months later, for a potential prize of $25,000, and drew >1000 registrants from 64 countries [51]. The challenge ended in a statistical tie, with 2 groups sharing first place. The best teams predicted ALS progression with an accuracy of ~0.511 when tested by the challenge host on a blinded, previously unused test set. The algorithms were further assessed for how much they could reduce the number of patients needed for a clinical trial, by reducing uncertainty about individual patients' progression. They were also compared against the performance of expert clinicians. The challenge also helped identify new potential predictors of ALS progression [51].
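A prediction task of this kind needs a progression target and a score. For illustration, the sketch below assumes that progression is summarized as the ALSFRS slope between months 3 and 12. It also assumes that predictions are scored by root-mean-square deviation (RMSD) and Pearson correlation. The challenge's own outcome definition and scoring are described in [51] and are not reproduced here. A naive baseline that simply carries the first 3 months' slope forward is included as a reference point. The scores are hypothetical.

```python
import math


def future_slope(score_month3, score_month12):
    """Assumed target: ALSFRS points per month between months 3 and 12."""
    return (score_month12 - score_month3) / 9.0


def carry_forward(score_month0, score_month3):
    """Naive baseline: assume the first-3-month slope simply continues."""
    return (score_month3 - score_month0) / 3.0


def rmsd(pred, obs):
    """Root-mean-square deviation between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def pearson(pred, obs):
    """Pearson correlation between predicted and observed values."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical scores at months 0, 3 and 12 for three patients.
patients = [(36, 33, 24), (30, 29, 20), (38, 35, 32)]
obs = [future_slope(m3, m12) for _, m3, m12 in patients]
pred = [carry_forward(m0, m3) for m0, m3, _ in patients]
print(round(rmsd(pred, obs), 3), round(pearson(pred, obs), 3))  # 0.544 -0.5
```

Reporting both an error measure and a correlation is a deliberate choice. RMSD penalizes predictions that are off in absolute terms, while correlation rewards getting the ranking of patients right even if the scale is biased.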

The winning algorithms were still at a proof-of-concept stage and require substantial further work before they can be incorporated into clinical trials or used by clinicians in standard practice. To pursue this, one of the winners, a team of researchers from Sentrana, Inc., created a spin-off company, Origent Data Sciences, Inc., to focus on developing predictive analytics for healthcare and life science applications. Since the contest, using the full PRO-ACT database, Origent has developed a model that predicts the ALSFRS-R score itself and has improved the performance of its model by 14%. Statistics on the model's performance are shown in Fig. 2. Origent continues to improve the ALS algorithm and to extend it to other diseases, as well as to data types used in clinical settings.

Fig. 2

Performance of the Origent model. (a) The predicted Amyotrophic Lateral Sclerosis (ALS) Functional Rating Scale-revised (ALSFRS-R) scores, derived from a random forest algorithm, are plotted against the actual scores for 658 observations from 222 ALS patient records, and linear regression analysis was performed. If predictions matched actual scores perfectly, the slope would equal 1. The slope of 0.82 approaches 1 and the R² statistic is high, indicating a good fit of the data to the line. (b) The predicted scores were subtracted from the actual scores, the differences were plotted as a histogram, and descriptive statistics were calculated. For a perfect model, every difference would be 0. The model comes very close: the mean difference is –0.04, with an SD of ± 3.29
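The statistics reported in Fig. 2 can be computed for any set of predictions with a few lines of code. The sketch assumes actual scores on the x-axis and predicted scores on the y-axis, because the caption does not state the orientation. It computes R² as the squared Pearson correlation of a simple linear fit, and reports the sample SD of the differences. The toy numbers are invented for checking and are not taken from the figure.

```python
import statistics


def agreement_summary(actual, predicted):
    """Regression of predicted on actual scores, plus actual-minus-predicted stats."""
    mean_a = statistics.fmean(actual)
    mean_p = statistics.fmean(predicted)
    sxx = sum((a - mean_a) ** 2 for a in actual)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    slope = sxy / sxx
    diffs = [a - p for a, p in zip(actual, predicted)]
    return {
        "slope": slope,                        # 1 for perfect agreement
        "intercept": mean_p - slope * mean_a,  # 0 for perfect agreement
        "r_squared": sxy ** 2 / (sxx * syy),   # squared Pearson correlation
        "mean_diff": statistics.fmean(diffs),
        "sd_diff": statistics.stdev(diffs),    # sample SD
    }


# Toy check: predictions that shrink toward the middle of the scale.
actual = [10, 20, 30, 40]
predicted = [12, 20, 28, 36]
print(agreement_summary(actual, predicted))
```

The toy predictions correlate perfectly with the actual scores (R² = 1), yet the slope is 0.8. This shows why the slope and the differences are reported alongside R²: correlation alone does not reveal predictions that are compressed toward the middle of the scale.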

In summary, the ALS prediction challenge made it possible to draw many brilliant minds from outside the field of ALS to bring their scientific approaches and insights to bear on the disease.

More Open Research Questions

Beyond the questions raised above, we list below several additional research directions that the PRO-ACT database can uniquely help address by providing sufficiently large sets of patient records.

Understanding ALS Symptom Profile

Beyond the speed of disease progression and survival outcome, it is important to understand the symptom profile [52, 53] (why some symptoms occur together more often than others) and the patterns in which new symptoms manifest over time. For a patient, knowing which function is likely to deteriorate next can substantially affect daily life. Better understanding of symptom profiles and their relation to prognosis is needed.

For example, the ALSFRS was, and the ALSFRS-R is, the most common end point or outcome measure in ALS clinical trials [52, 54], and the most widely accepted measure of disease progression used in the clinic. Understanding how the ALSFRS changes over time is therefore relevant to understanding disease progression. Reports vary regarding whether ALSFRS-R decline is linear over time [34, 43, 54, 55]. These discrepancies likely depend on the number of patients studied, the distribution of progression rates, and how frequently the ALSFRS-R was measured.

In addition, the longitudinal changes in specific ALSFRS-R answers are of interest as they relate to specific disabilities that can be addressed by assistive technology.
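Working at the level of individual answers is straightforward. The ALSFRS-R consists of 12 items, each scored from 4 (normal) to 0, for a total of 0–48, and the items are often grouped into bulbar, fine motor, gross motor, and respiratory domains. The sketch below computes domain subscores and lists the items that worsened between two visits. The item keys are hypothetical names. The sketch does not distinguish the alternative form of the cutting-food item used for patients with gastrostomy.

```python
# ALSFRS-R items grouped into four commonly used domains (keys are illustrative).
DOMAINS = {
    "bulbar": ["speech", "salivation", "swallowing"],
    "fine_motor": ["handwriting", "cutting_food", "dressing_hygiene"],
    "gross_motor": ["turning_in_bed", "walking", "climbing_stairs"],
    "respiratory": ["dyspnea", "orthopnea", "respiratory_insufficiency"],
}
ITEMS = [item for items in DOMAINS.values() for item in items]


def subscores(visit):
    """Domain subscores (0-12 each) and total (0-48) for one visit."""
    for item in ITEMS:
        if item not in visit:
            raise ValueError(f"missing item: {item}")
        if not 0 <= visit[item] <= 4:
            raise ValueError(f"item out of range: {item}")
    scores = {name: sum(visit[i] for i in items) for name, items in DOMAINS.items()}
    scores["total"] = sum(scores[name] for name in DOMAINS)
    return scores


def worsened_items(earlier, later):
    """Items whose score dropped between two visits, in questionnaire order."""
    return [item for item in ITEMS if later[item] < earlier[item]]


baseline = {item: 4 for item in ITEMS}
follow_up = dict(baseline, speech=3, walking=2)
print(subscores(follow_up))
print(worsened_items(baseline, follow_up))  # ['speech', 'walking']
```

Applied across all visits of all patients, the order in which items first worsen gives a simple starting point for the symptom-sequence questions raised above.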

Understanding Potential Biological Mechanisms of ALS Through Comorbidities

ALS has been linked to many mechanisms that are also involved in other diseases. Relationships have been suggested in the literature between ALS and cardiovascular diseases [56], diabetes [57], and inflammatory diseases [58]. However, sufficiently large longitudinal datasets have rarely been available to explore these relationships in full. The PRO-ACT database allows exploration of evidence for the biological mechanisms of ALS by examining patients’ adverse events and concomitant medication use. Another avenue of research is to examine rare occurrences, such as the number of patients with ALS who also have a certain rare disease or rare abnormal blood test results. These can help shed new light on the mechanisms shared between ALS and other diseases.
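Exploring adverse events and concomitant medications often starts with three simple summaries: patient counts per term, co-occurrence between terms, and a list of rare terms. The sketch assumes a hypothetical long-format table of (subject, term) pairs. The 1% rarity threshold is chosen only for illustration. Real adverse-event terms would first need consistent coding, which is beyond this sketch; here terms are only trimmed and lower-cased.

```python
from collections import defaultdict


def patients_by_term(rows):
    """Map each (normalized) term to the set of subjects reporting it."""
    by_term = defaultdict(set)
    for subject_id, term in rows:
        by_term[term.strip().lower()].add(subject_id)
    return by_term


def co_occurrence(by_term, term_a, term_b):
    """Number of subjects with both terms recorded."""
    return len(by_term.get(term_a, set()) & by_term.get(term_b, set()))


def rare_terms(by_term, n_patients, max_fraction=0.01):
    """Terms reported by at most max_fraction of all patients in the cohort."""
    return sorted(term for term, subjects in by_term.items()
                  if len(subjects) / n_patients <= max_fraction)


rows = [("s1", "Diabetes"), ("s1", "diabetes "), ("s2", "Hypertension"),
        ("s3", "hypertension"), ("s3", "Diabetes")]
by_term = patients_by_term(rows)
print(co_occurrence(by_term, "diabetes", "hypertension"))  # 1
print(rare_terms(by_term, n_patients=300))  # ['diabetes', 'hypertension']
```

Counting distinct subjects rather than raw rows keeps repeated reports from the same patient from inflating a term's frequency. This matters for rare occurrences, where a single patient could otherwise dominate the count.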

Conclusions

In this paper, we have reviewed several approaches to ALS research that are aided by the availability of the PRO-ACT platform. These include benefits to ALS clinical development; assessment of the ALS clinical profile and its heterogeneity; and the development of models for stratifying the ALS patient population, for predicting disease progression, and for other purposes not previously possible in ALS research.

To maintain its relevance for the ALS research community, the PRO-ACT database must constantly be updated to incorporate more trials. We urge the ALS research community to donate trial data for inclusion in the PRO-ACT dataset. The support and generosity of the ALS research community with regard to the PRO-ACT database have been impressive and we hope it will remain so. Concomitant medication use and adverse events will be added to the published dataset in late 2014. In 2015 new data from several clinical trials, amounting to thousands of additional subject records, will be incorporated into the database.

We hope that, together, the current and future data will provide new research opportunities for addressing the questions suggested throughout this review, as well as new ones. The PRO-ACT platform and its datasets serve as a valuable resource that can provide the in-depth understanding and novel insights essential for breakthroughs in ALS research, better future treatments, and, hopefully, a cure.