Volume 17, Issue 3, pp 407–421

Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering

  • Ming Tang
  • Chao Gao
  • Stephen A. Goutman
  • Alexandr Kalinin
  • Bhramar Mukherjee
  • Yuanfang Guan
  • Ivo D. Dinov
Original Article


Amyotrophic lateral sclerosis (ALS) is a complex progressive neurodegenerative disorder with an estimated prevalence of about 5 per 100,000 people in the United States. In this study, ALS disease progression is measured by the change in the Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS) score over time. The study aims to provide clinical decision support for timely forecasting of the ALS trajectory as well as accurate and reproducible computable phenotypic clustering of participants. Patient data are extracted from the DREAM-Phil Bowen ALS Prediction Prize4Life Challenge data, most of which come from the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT) archive. We employed model-based and model-free machine-learning methods to predict the change of the ALSFRS score over time. Using training and testing data, we quantified and compared the performance of different techniques. We also used unsupervised machine-learning methods to cluster the patients into separate computable phenotypes and interpret the derived subcohorts. Direct prediction of univariate clinical outcomes using model-based (linear models) or model-free (machine-learning techniques: random forest and Bayesian additive regression trees, BART) methods was only moderately successful. The correlation coefficients between clinically observed changes in ALSFRS scores and their predicted counterparts were 0.427 (random forest) and 0.545 (BART). The reliability of these results was assessed using internal statistical cross-validation as well as external data validation. Unsupervised clustering generated very reliable and consistent partitions of the patient cohort into four computable phenotypic subgroups. These clusters were explicated by identifying specific salient clinical features included in the PRO-ACT archive that discriminate between the derived subcohorts.
There are differences between alternative analytical methods in forecasting specific clinical phenotypes. Although predicting univariate clinical outcomes may be challenging, our results suggest that modern data science strategies are useful in clustering patients and generating evidence-based ALS hypotheses about complex interactions of multivariate factors. Predicting univariate clinical outcomes using the PRO-ACT data yields only marginal accuracy (about 70%). However, unsupervised clustering of participants into sub-groups generates stable, reliable and consistent (exceeding 95%) computable phenotypes whose explication requires interpretation of multivariate sets of features.
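The evaluation metric reported above, the correlation between observed and predicted ALSFRS changes on held-out data, can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the single predictor `feature` is hypothetical, and a one-variable least-squares line stands in for the model-based approach.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


# Synthetic stand-in data (NOT PRO-ACT): one hypothetical baseline feature
# and a noisy ALSFRS change that depends on it only partly.
rng = random.Random(0)
feature = [rng.gauss(0.0, 1.0) for _ in range(400)]
alsfrs_change = [-0.7 + 0.3 * f + rng.gauss(0.0, 0.5) for f in feature]

# Train on the first 300 records, hold out the remaining 100 for testing.
cut = 300
a, b = fit_line(feature[:cut], alsfrs_change[:cut])
predicted = [a + b * f for f in feature[cut:]]

# Correlation between observed and predicted changes on the held-out set.
r_test = pearson(alsfrs_change[cut:], predicted)
print(f"held-out correlation: {r_test:.3f}")
```

The same `pearson` call could score any other predictor, such as a random forest or BART model fitted with a dedicated library, so that different techniques are compared on one held-out set.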


• Used a large ALS data archive of 8,000 patients consisting of 3 million records, including 200 clinical features tracked over 12 months.

• Employed model-based and model-free methods to predict ALSFRS changes over time, cluster patients into cohorts, and derive computable phenotypes.

• Research findings include stable, reliable, and consistent (95%) patient stratification into computable phenotypes. However, clinical explication of the results requires interpretation of multivariate information.
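The text does not state how clustering consistency was quantified. One common way to check the stability of a partition, shown here only as a sketch under that assumption, is to repeat the clustering with different random initializations and measure pairwise agreement with the Rand index. The data are synthetic two-dimensional blobs, not PRO-ACT records, and the helper names `kmeans` and `rand_index` are illustrative.

```python
import random
from itertools import combinations


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means on tuples of floats; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of point pairs that both partitions treat the same way."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Synthetic data: four hypothetical subgroups of 30 points each.
rng = random.Random(1)
blob_centers = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0), (8.0, 8.0)]
points = [
    (cx + rng.gauss(0.0, 1.0), cy + rng.gauss(0.0, 1.0))
    for cx, cy in blob_centers
    for _ in range(30)
]

# Two independent runs with different initializations, then their agreement.
run_a = kmeans(points, k=4, seed=10)
run_b = kmeans(points, k=4, seed=20)
agreement = rand_index(run_a, run_b)
print(f"Rand index between runs: {agreement:.3f}")
```

The Rand index ignores how clusters are numbered, so two runs that find the same groups under different labels still score 1.0. Random initialization can leave k-means in a poor local optimum, so agreement below 1.0 on a given pair of runs is possible even for well-separated groups.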



Keywords: ALS; Amyotrophic lateral sclerosis; Decision support; Machine learning; Predictive analytics; Data science; Big data



Colleagues from the Statistics Online Computational Resource (SOCR), Center for Complexity and Self-management of Chronic Disease (CSCD), Big Data Discovery Science (BDDS), and the Michigan Institute for Data Science (MIDAS) provided constructive feedback about this study.

Data used in the preparation of this article were obtained from the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) Database. As such, the following organizations and individuals within the PRO-ACT Consortium contributed to the design and implementation of the PRO-ACT Database and/or provided data, but did not participate in the analysis of the data or the writing of this report: Neurological Clinical Research Institute, MGH; Northeast ALS Consortium; Novartis; Prize4Life Israel; Regeneron Pharmaceuticals, Inc.; Sanofi; Teva Pharmaceutical Industries, Ltd.

Finally, the authors are deeply indebted to the journal editors and the anonymous reviewers who provided valuable recommendations and constructive critiques that improved the manuscript.

Authors’ Contributions

MT: developed techniques, conducted analyses, and wrote manuscript.

CG: developed techniques, conducted analyses, and wrote manuscript.

SAG: conceptualized the study and wrote manuscript.

AK: informatics, data analytics, and wrote manuscript.

BM: biostatistical methodology and wrote manuscript.

YG: conducted analyses, and wrote manuscript.

IDD: conceptualized the study, developed methods, conducted analyses, and wrote manuscript.


This research was partially supported by NSF grants 1734853, 1636840, 1416953, 0716055 and 1023115, NIH grants P20 NR015331, P50 NS091856, UL1TR002240, P30 DK089503, U54 EB020406, P30 AG053760, and K23 ES027221, and the Elsie Andresen Fiske Research Fund. These funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Compliance with Ethical Standards

Ethics Approval and Consent to Participate

University of Michigan Institutional Review Board (IRB) approval (HUM00115107) was obtained prior to managing, processing and analyzing the PRO-ACT data.

Competing Interests

S.A.G. has received research support from the NIH/NIEHS (K23ES027221), Agency for Toxic Substances and Disease Registry/Centers for Disease Control, the ALS Association, Target ALS, Cytokinetics, and Neuralstem, Inc., and has consulted for Cytokinetics.

Supplementary material

ESM 1: 12021_2018_9406_MOESM1_ESM.docx (DOCX, 434 kb)
ESM 2: 12021_2018_9406_MOESM2_ESM.pdf (PDF, 78 kb)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, USA
  2. Department of Biostatistics, University of Michigan, Ann Arbor, USA
  3. Department of Neurology, University of Michigan, Ann Arbor, USA
  4. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
  5. Michigan Institute for Data Science, University of Michigan, Ann Arbor, USA
