Abstract
Introduction
To continue closing the gap between predictive modeling and its real-world application, we report a new data-to-prediction pipeline that advances the state-of-the-art predictive performance for body mass index (BMI) classification by integrating siloed claims databases via a common data model.
Methods
This study adapted the ensemble-based methodology of the baseline prediction model and focused on removing the silos between claims databases. We applied the Super Learner machine learning algorithm (SLA) to learn from a combined dataset consisting of 50% data from the Optum Date of Death database and 50% data from the IBM MarketScan Commercial Claims and Encounters (CCAE) database. We also omitted the commonly used one-hot encoding step and used multi-categorical variables directly in the feature engineering process. The resulting models were then optimized via a standard cross-validation scheme, and their performance was evaluated on a holdout test set.
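The abstract does not specify the Super Learner's candidate learners or meta-learner. As a rough illustration only, the pure-Python sketch below follows the general Super Learner recipe: V-fold cross-validated predictions from several candidate learners, a meta-learner that picks the convex weighting with the lowest cross-validated log loss, and a final refit of every learner on all data. The three toy learners (`fit_prevalence`, `fit_region_rate`, `fit_age_band_rate`), the `age` and `region` features, and the coarse grid-search meta-learner (a stand-in for the constrained regression more commonly used) are hypothetical choices, not the study's. The categorical `region` code is used directly as a group key rather than one-hot expanded, loosely echoing the point about using multi-categorical variables directly.

```python
import math
import random
from itertools import product

# Hypothetical record layout: (age, region, label). `region` is a
# multi-category code consumed directly as a group key (no one-hot step).


def _smoothed_rates(rows, key):
    """Event rate per group, shrunk toward the overall rate with one pseudo-count."""
    overall = sum(r[2] for r in rows) / len(rows)
    totals, events = {}, {}
    for r in rows:
        k = key(r)
        totals[k] = totals.get(k, 0) + 1
        events[k] = events.get(k, 0) + r[2]
    rates = {k: (events[k] + overall) / (totals[k] + 1) for k in totals}
    return rates, overall


# Hypothetical candidate learners: each returns a predict(age, region) function.
def fit_prevalence(rows):
    """Baseline: predict the overall event rate for everyone."""
    rate = sum(r[2] for r in rows) / len(rows)
    return lambda age, region: rate


def fit_region_rate(rows):
    """Event rate per region category, used as-is."""
    rates, overall = _smoothed_rates(rows, key=lambda r: r[1])
    return lambda age, region: rates.get(region, overall)


def fit_age_band_rate(rows):
    """Event rate per 10-year age band."""
    rates, overall = _smoothed_rates(rows, key=lambda r: int(r[0] // 10))
    return lambda age, region: rates.get(int(age // 10), overall)


LEARNERS = [fit_prevalence, fit_region_rate, fit_age_band_rate]


def log_loss(y, p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def simplex_grid(n, steps=10):
    """Weight vectors of n non-negative entries summing to 1, in steps of 1/steps."""
    for combo in product(range(steps + 1), repeat=n):
        if sum(combo) == steps:
            yield [c / steps for c in combo]


def super_learner(rows, v=5, seed=0):
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::v] for i in range(v)]

    # 1) V-fold cross-validation: out-of-fold predictions from every learner.
    oof = []  # (label, [prediction from each learner])
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        models = [fit(train) for fit in LEARNERS]
        for age, region, y in held_out:
            oof.append((y, [m(age, region) for m in models]))

    # 2) Meta-learner: convex weights minimizing cross-validated log loss
    #    (a coarse grid search standing in for constrained regression).
    def cv_risk(w):
        return sum(log_loss(y, sum(wi * pi for wi, pi in zip(w, preds)))
                   for y, preds in oof)

    best_w = min(simplex_grid(len(LEARNERS)), key=cv_risk)

    # 3) Refit every learner on all data and combine with the chosen weights.
    full = [fit(rows) for fit in LEARNERS]

    def predict(age, region):
        return sum(w * m(age, region) for w, m in zip(best_w, full))

    return best_w, predict


# Toy usage on synthetic data (not study data): region "S" has a higher event rate.
rng = random.Random(1)
data = []
for _ in range(400):
    age = rng.uniform(20, 80)
    region = rng.choice(["NE", "MW", "S", "W"])
    data.append((age, region, int(rng.random() < (0.5 if region == "S" else 0.2))))

weights, predict = super_learner(data)
print("weights:", weights)
print("P(event | age 50, S):", round(predict(50, "S"), 3))
```

Because the weights are chosen on out-of-fold predictions, a learner that only fits noise tends to receive little weight; in this toy data the region learner carries the real signal and should dominate.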
Results
Sociodemographic and clinical characteristics were used with (denoted SLA1) and without (denoted SLA2) baseline BMI values to predict BMI classifications (≥ 30, ≥ 35, and ≥ 40 kg/m²). The newly implemented SLA1 performed similarly to the previous model, with an area under the receiver operating characteristic curve (ROC AUC) of approximately 88% for all BMI classifications, specificity ranging from 90% to 96%, and accuracy ranging from 88% to 93%. In contrast, the new SLA2 achieved consistently better performance than the previous model on all metrics across all BMI classes. In particular, its ROC AUC increased to 77–79% from the previously reported 73%, its specificity improved to 76–90% from 71–86%, its accuracy improved to 77–86% from 73–80%, and its recall (i.e., sensitivity) improved to 64–78% from 60–76%.
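The Results report standard binary-classification metrics, computed separately for each BMI cutoff. As a hedged illustration of their usual definitions (not the study's evaluation code), the sketch below labels patients at a cutoff, derives accuracy, recall (sensitivity), and specificity from the confusion matrix, and computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative. The BMI values, model scores, helper names, and the 0.5 decision threshold are all hypothetical; the abstract does not state how predicted probabilities were thresholded.

```python
def bmi_labels(bmi_values, cutoff):
    """1 if BMI >= cutoff (kg/m²), else 0. The study used cutoffs 30, 35, and 40."""
    return [int(b >= cutoff) for b in bmi_values]


def _ratio(num, den):
    # Return NaN rather than raising when a class is absent.
    return num / den if den else float("nan")


def confusion_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity), and specificity from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "recall": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy example (made-up values, not study data) for the BMI >= 30 class.
bmi = [22.0, 27.5, 31.0, 36.2, 41.8, 29.9, 33.4, 38.0]
scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.60, 0.70, 0.55]  # hypothetical model outputs
y_true = bmi_labels(bmi, 30)
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

print(confusion_metrics(y_true, y_pred))
print("ROC AUC:", roc_auc(y_true, scores))
```

With these toy numbers, accuracy is 0.75, recall is 0.8, specificity is about 0.667, and ROC AUC is 0.8 (12 of the 15 positive-negative pairs are ranked correctly). ROC AUC does not depend on the decision threshold, whereas the other three metrics do.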
Conclusions
This study demonstrates dramatic improvements in the prediction of BMI across classifications using integrated databases in a common data model for the generation of real-world evidence.
References
1. Adair T, Lopez AD. The role of overweight and obesity in adverse cardiovascular disease mortality trends: an analysis of multiple cause of death data from Australia and the USA. BMC Med. 2020;18:199.
2. Kamble PS, Hayden J, Collins J, et al. Association of obesity with healthcare resource utilization and costs in a commercial population. Curr Med Res Opin. 2018;34:1335–43.
3. Elrashidi MY, Jacobson DJ, St Sauver J, et al. Body mass index trajectories and healthcare utilization in young and middle-aged adults. Medicine (Baltimore). 2016;95:e2467.
4. Kent S, Fusco F, Gray A, Jebb SA, Cairns BJ, Mihaylova B. Body mass index and healthcare costs: a systematic literature review of individual participant data studies. Obes Rev. 2017;18:869–79.
5. Organisation for Economic Co-operation and Development (OECD). Obesity update 2017. https://www.oecd.org/els/health-systems/Obesity-Update-2017.pdf. Accessed Oct 25, 2021.
6. Ammann EM, Kalsekar I, Yoo A, Johnston SS. Validation of body mass index (BMI)-related ICD-9-CM and ICD-10-CM administrative diagnosis codes recorded in US claims data. Pharmacoepidemiol Drug Saf. 2018;27:1092–100.
7. Centers for Disease Control and Prevention. Overweight & obesity. https://www.cdc.gov/obesity/adult/defining.html. Accessed Sept 27, 2021.
8. Samadoulougou S, Idzerda L, Dault R, Lebel A, Cloutier AM, Vanasse A. Validated methods for identifying individuals with obesity in health care administrative databases: a systematic review. Obes Sci Pract. 2020;6:677–93.
9. Hales CM, Carroll MD, Fryar CD, Ogden CL. Prevalence of obesity and severe obesity among adults: United States, 2017–2018. NCHS Data Brief. 2020;360:1–8.
10. Wu B, Chow W, Sakthivel M, et al. Body mass index variable interpolation to expand the utility of real-world administrative healthcare claims database analyses. Adv Ther. 2021;38:1314–27.
Acknowledgements
The authors would like to thank Helen Hardy of Janssen Scientific Affairs and Aakash Bhargava, Simran Modi, and Thapa Anshul Chandrasingh of Mu Sigma, LLC, for their contributions to the research concept development and data exploration phases of this study.
Funding
This research, preparation of the manuscript, and the journal’s Rapid Service fee were funded by Janssen Scientific Affairs, LLC (Titusville, NJ, USA).
Authorship
All named authors meet the International Committee of Medical Journal Editors (ICMJE) criteria for authorship for this article, take responsibility for the integrity of the work as a whole, and have given their approval for this version to be published.
Author Contributions
All authors contributed to study conception and design; to data acquisition, analysis, and interpretation; and to drafting and revising the manuscript. All authors have given their approval for this version of the manuscript to be published.
Medical Writing Assistance
The authors would like to thank Michelle McDermott, PharmD, of Cello Health Communications/MedErgy (Yardley, PA, USA) for medical writing assistance, which was supported by Janssen Scientific Affairs, LLC (Titusville, NJ, USA).
Disclosures
Ganhui Lan, Bingcao Wu, and Veronica Ashton are full-time employees of Janssen Scientific Affairs, LLC. Kaustubh Sharma and Kaushal Gadhia are employees of Mu Sigma Business Solutions, LLC.
Compliance with Ethics Guidelines
All datasets were from databases of de-identified patient data and ethics committee approval was not required.
Data Availability
The data supporting the findings of this study are available within the article, and in the published study methodology paper and its online supplementary material [10].
Cite this article
Lan, G., Wu, B., Sharma, K. et al. Improved Prediction of Body Mass Index in Real-World Administrative Healthcare Claims Databases. Adv Ther 39, 3835–3844 (2022). https://doi.org/10.1007/s12325-022-02192-4