Machine Learning Algorithm Helps Identify Non-Diagnosed Prodromal Alzheimer’s Disease Patients in the General Population

  • O. Uspenskaya-Cadoz
  • C. Alamuri
  • L. Wang
  • M. Yang
  • Sam Khinda (email author)
  • Y. Nigmatullina
  • T. Cao
  • N. Kayal
  • M. O’Keefe
  • C. Rubel
Original Research



Recruiting patients for clinical trials of potential therapies for Alzheimer’s disease (AD) remains a major challenge, with demand for trial participants at an all-time high. The AD treatment R&D pipeline includes around 112 agents. In the United States alone, 150 clinical trials are seeking 70,000 participants. Most people with early cognitive impairment consult primary care providers, who may lack time, diagnostic skills and awareness of local clinical trials. Machine learning and predictive analytics offer promise to boost enrollment by predicting which patients have prodromal AD, and which will go on to develop AD.


The authors set out to develop a machine learning predictive model that identifies prodromal AD patients in the general population, to aid early AD detection by primary care physicians and timely referral to expert sites for biomarker confirmation of diagnosis and clinical trial enrollment.


The authors use a classification machine learning algorithm to extract patterns from healthcare claims and prescription data three years prior to AD diagnosis or AD drug initiation.
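The offset described above can be sketched as a date filter over a subject's coded records. This is a minimal illustration under one reading of the abstract (keep only records dated at least three years before the first AD diagnosis or AD drug record); the record layout, the helper names and the example codes are hypothetical and are not taken from the study.

```python
from collections import Counter
from datetime import date

LOOKBACK_YEARS = 3  # offset stated in the abstract


def years_before(d: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def features_before_cutoff(records, index_date: date) -> Counter:
    """Count coded events dated strictly before (index date minus the offset).

    `records` is an iterable of (event_date, code_system, code) tuples; this
    layout is an assumption made for the sketch.
    """
    cutoff = years_before(index_date, LOOKBACK_YEARS)
    return Counter(
        f"{system}:{code}" for when, system, code in records if when < cutoff
    )


# Illustrative records only; the codes are placeholders, not study predictors.
example_records = [
    (date(2014, 5, 1), "ICD", "R41.3"),
    (date(2015, 1, 10), "CPT", "99213"),
    (date(2015, 1, 20), "CPT", "99213"),
    (date(2016, 3, 1), "NDC", "example-drug"),  # inside the offset, so dropped
]
example_features = features_before_cutoff(example_records, date(2018, 6, 1))
print(example_features)
```

Counting codes per subject is only one possible encoding; the abstract does not say how the predictors were aggregated.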


The study focused on subjects included within proprietary IQVIA US data assets (claims and prescription databases). Patient information was extracted from January 2010 to July 2018, for cohorts aged between 50 and 85 years.


A total of 88,298,289 subjects aged between 50 and 85 years were identified. The positive cohort comprised 667,288 subjects who had 24 months of medical history and at least one record of AD or AD treatment. The negative cohort comprised 3,670,254 patients who had a similar length of medical history and who were matched to positive cohort subjects based on the prevalence rate. The scoring cohort was selected based on the availability of 2–5 years of recent medical data and included 72,670,283 subjects between the ages of 50 and 85 years.
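The cohort sizes above imply a class balance that is worth making explicit before reading any precision-recall figures. The short calculation below uses only the counts quoted in the abstract; the function and key names are illustrative.

```python
# Cohort counts quoted in the abstract.
TOTAL_AGED_50_85 = 88_298_289
POSITIVE = 667_288
NEGATIVE = 3_670_254
SCORING = 72_670_283


def cohort_summary(positive: int, negative: int, scoring: int, total: int) -> dict:
    """Derive simple ratios from the cohort counts."""
    labelled = positive + negative
    return {
        "labelled_subjects": labelled,
        "positive_share": positive / labelled,
        "negatives_per_positive": negative / positive,
        "scoring_share_of_total": scoring / total,
    }


summary = cohort_summary(POSITIVE, NEGATIVE, SCORING, TOTAL_AGED_50_85)
for name, value in summary.items():
    if isinstance(value, float):
        print(f"{name}: {value:.4f}")
    else:
        print(f"{name}: {value:,}")
```

On these counts about 15% of the labelled subjects are positive, or roughly 5.5 negatives per positive. If the test split preserves that ratio (an assumption, since the abstract only says the split was random), a ranker with no skill would score an area under the precision-recall curve of about 0.15.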

Intervention (if any)



A list of clinically relevant and interpretable predictors was generated and extracted from the data sets for each subject, including pharmacological treatments (NDC/product), office/specialist visits (specialty), tests and procedures (HCPCS and CPT), and diagnoses (ICD). The positive cohort was defined as patients who had an AD diagnosis or AD treatment, with a 3-year offset used as an estimate of the time of prodromal AD diagnosis. Supervised ML techniques were used to develop algorithms to predict the occurrence of prodromal AD cases. The sample dataset was divided randomly into a training dataset and a test dataset. The classification models were trained and executed in the PySpark framework. Training and evaluation of LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and GBTClassifier were executed using PySpark’s MLlib module. The area under the precision-recall curve (AUCPR) was used to compare the results of the various models.
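The comparison metric named above can be sketched without Spark. The function below computes average precision, a step-wise estimate of the area under the precision-recall curve; it is a stand-in for illustration and may differ slightly from the value a Spark evaluator reports, since the estimator the authors used is not specified here. The model names and the toy labels and scores are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    `labels` are 0/1 ground-truth values and `scores` are model outputs where
    higher means more likely positive. Tied scores are taken in input order,
    which is a simplification.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            # Precision at the rank where recall increases.
            area += true_positives / rank
    return area / total_positives


# Toy held-out labels and two hypothetical models' scores.
test_labels = [1, 0, 1, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.1],
    "model_b": [0.9, 0.2, 0.8, 0.1],
}
aucpr_by_model = {
    name: average_precision(test_labels, scores)
    for name, scores in model_scores.items()
}
best_model = max(aucpr_by_model, key=aucpr_by_model.get)
print(aucpr_by_model, best_model)
```

A precision-recall metric suits this setting because positives are a minority of the labelled subjects, so a ROC-based score could look strong even when most flagged subjects are false positives.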


The AUCPRs were 0.426, 0.157, 0.436, and 0.440 for LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and GBTClassifier, respectively, meaning that GBTClassifier (gradient-boosted trees) outperformed the other three classifiers. The GBT model identified 222,721 subjects in the prodromal AD stage with 80% precision. Some 76% of identified prodromal AD patients were in the primary care setting.
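One way to arrive at a figure such as "identified with 80% precision" is to pick a score cutoff on labelled held-out data and then apply it to the unlabelled scoring cohort. The abstract does not say how the cutoff was chosen, so the sketch below is an assumption; the function names and toy data are hypothetical.

```python
def threshold_for_precision(labels, scores, target=0.80):
    """Return the lowest score cutoff whose flagged set meets `target` precision.

    Subjects with score >= cutoff are flagged. Returns None if no cutoff
    reaches the target on the given labelled data.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_cutoff = None
    true_positives = 0
    for rank, (score, label) in enumerate(ranked, start=1):
        true_positives += label
        # Only evaluate once every subject sharing this score is included.
        if rank < len(ranked) and ranked[rank][0] == score:
            continue
        if true_positives / rank >= target:
            best_cutoff = score
    return best_cutoff


def flag_subjects(scores, cutoff):
    """Apply a chosen cutoff to scores from an unlabelled cohort."""
    return [score >= cutoff for score in scores]


# Toy labelled hold-out data.
holdout_labels = [1, 1, 1, 0, 1, 0, 0]
holdout_scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.30]
cutoff = threshold_for_precision(holdout_labels, holdout_scores, target=0.80)
flags = flag_subjects([0.75, 0.65, 0.99], cutoff)
print(cutoff, flags)
```

Lowering the cutoff flags more subjects at the cost of precision, so the number identified and the precision figure are best read together.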


Applying the developed predictive model to 72,670,283 U.S. residents identified 222,721 prodromal AD patients, the majority of whom were in the primary care setting. This could drive major advances in AD research by enabling more accurate and earlier prodromal AD diagnosis at the primary care physician level, which would facilitate timely referral to expert sites for in-depth assessment and potential enrollment in clinical trials.

Key words

Alzheimer’s disease, prodromal AD, machine learning algorithm, AD clinical trial recruitment

Supplementary material

42414_2019_78_MOESM1_ESM.docx (16 kb)
Supplementary material, approximately 16.3 KB.



Copyright information

© Serdi and Springer Nature Switzerland AG 2019

Authors and Affiliations

  • O. Uspenskaya-Cadoz (1)
  • C. Alamuri (2)
  • L. Wang (2)
  • M. Yang (2)
  • Sam Khinda (3, email author)
  • Y. Nigmatullina (2)
  • T. Cao (2)
  • N. Kayal (2)
  • M. O’Keefe (2)
  • C. Rubel (3)

  1. IQVIA Central Nervous System Center of Excellence, Medical Strategy & Science, Therapeutic Science & Strategy Unit, Saint Ouen, France
  2. IQVIA Analytics Center of Excellence, La Défense Cedex, France
  3. IQVIA Project Leadership, Reading, Berks, UK
