Improved Prediction of Body Mass Index in Real-World Administrative Healthcare Claims Databases

  • Brief Report
Advances in Therapy

Abstract

Introduction

To continue closing the gap between predictive modeling and its real-world application, we report a new data-to-prediction pipeline that advances the state-of-the-art predictive performance for body mass index (BMI) classifications by integrating siloed claims databases via a common data model.

Methods

This study adapted the ensemble-based methodology of the baseline prediction model and focused on removing the silos between claims databases. We applied the Super Learner machine learning algorithm (SLA) to a combined dataset consisting of 50% data from the Optum Date of Death database and 50% data from the IBM MarketScan Commercial Claims and Encounters (CCAE) database. In the feature engineering process, we omitted the commonly used one-hot-encoding step and used multi-categorical variables directly. The resulting models were optimized via a standard cross-validation scheme, and their performance was evaluated on a holdout test set.
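The data preparation described above can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the record layout, the function names (`combine_equal_shares`, `encode_as_codes`, `holdout_split`, `k_fold_indices`), the 20% holdout fraction, and the five folds are all assumptions made for illustration, and integer coding is only one possible way to keep a multi-categorical variable in a single column instead of one-hot encoding it.

```python
import random


def combine_equal_shares(source_a, source_b, n_total, seed=0):
    """Draw n_total // 2 records from each source and shuffle them together."""
    rng = random.Random(seed)
    half = n_total // 2
    combined = rng.sample(source_a, half) + rng.sample(source_b, half)
    rng.shuffle(combined)
    return combined


def encode_as_codes(values):
    """Map each category to one integer code (one column, no one-hot expansion)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def holdout_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and set aside a holdout test set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k=5):
    """Yield (train_idx, valid_idx) index lists for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, valid
        start += size


# Hypothetical toy records standing in for the two claims databases.
optum = [{"source": "optum", "id": i, "region": "south"} for i in range(100)]
ccae = [{"source": "ccae", "id": i, "region": "west"} for i in range(100)]

combined = combine_equal_shares(optum, ccae, n_total=60)
region_codes, region_map = encode_as_codes([r["region"] for r in combined])
train, test = holdout_split(combined, test_fraction=0.2)
folds = list(k_fold_indices(len(train), k=5))
```

In this sketch the holdout set is split off before any cross-validation folds are formed, so the folds used for tuning never include test records.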

Results

Sociodemographic and clinical characteristics were used with (denoted as SLA1) and without (denoted as SLA2) baseline BMI values to predict BMI classifications (≥ 30, ≥ 35, and ≥ 40 kg/m²). The newly implemented SLA1 performed similarly to the previous model, with an area under the receiver operating characteristic curve (ROC AUC) of approximately 88% for all BMI classifications, specificity ranging from 90% to 96%, and accuracy ranging from 88% to 93%. The new SLA2, in contrast, achieved consistently better performance than the previous model on all metrics across all BMI classes. In particular, its ROC AUC rose to 77–79% from the previously reported level (73%), its specificity improved to 76–90% from 71–86%, its accuracy improved to 77–86% from 73–80%, and its recall (i.e., sensitivity) improved to 64–78% from 60–76%.
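For readers less familiar with the metrics reported above, the following sketch shows how binary labels for a BMI threshold and the four metrics can be computed from scratch. The toy BMI values, the predicted scores, the 0.5 decision cutoff, and all function names are hypothetical and are not taken from the study. The sketch also assumes both classes are present in the data, so no division by zero occurs.

```python
def bmi_labels(bmi_values, threshold):
    """Return 1 where BMI is at or above the threshold (kg/m^2), else 0."""
    return [1 if b >= threshold else 0 for b in bmi_values]


def classification_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity), and specificity from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn),        # share of true positives detected
        "specificity": tn / (tn + fp),   # share of true negatives detected
    }


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: observed BMI and a model's predicted scores.
bmi = [24.0, 31.5, 36.2, 41.0, 28.9, 33.3]
scores = [0.20, 0.70, 0.80, 0.90, 0.40, 0.35]

y_true = bmi_labels(bmi, threshold=30)           # BMI >= 30 classification
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The same functions can be reused for the ≥ 35 and ≥ 40 kg/m² classifications by changing the `threshold` argument.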

Conclusions

This study demonstrates improvements in the prediction of BMI across classifications, most consistently when baseline BMI values are unavailable, using integrated databases in a common data model for the generation of real-world evidence.

References

  1. Adair T, Lopez AD. The role of overweight and obesity in adverse cardiovascular disease mortality trends: an analysis of multiple cause of death data from Australia and the USA. BMC Med. 2020;18:199.

  2. Kamble PS, Hayden J, Collins J, et al. Association of obesity with healthcare resource utilization and costs in a commercial population. Curr Med Res Opin. 2018;34:1335–43.

  3. Elrashidi MY, Jacobson DJ, St Sauver J, et al. Body mass index trajectories and healthcare utilization in young and middle-aged adults. Medicine (Baltimore). 2016;95:e2467.

  4. Kent S, Fusco F, Gray A, Jebb SA, Cairns BJ, Mihaylova B. Body mass index and healthcare costs: a systematic literature review of individual participant data studies. Obes Rev. 2017;18:869–79.

  5. Organisation for Economic Co-operation and Development (OECD). Obesity update 2017. https://www.oecd.org/els/health-systems/Obesity-Update-2017.pdf. Accessed Oct 25, 2021.

  6. Ammann EM, Kalsekar I, Yoo A, Johnston SS. Validation of body mass index (BMI)-related ICD-9-CM and ICD-10-CM administrative diagnosis codes recorded in US claims data. Pharmacoepidemiol Drug Saf. 2018;27:1092–100.

  7. Centers for Disease Control and Prevention. Overweight & obesity. https://www.cdc.gov/obesity/adult/defining.html. Accessed Sept 27, 2021.

  8. Samadoulougou S, Idzerda L, Dault R, Lebel A, Cloutier AM, Vanasse A. Validated methods for identifying individuals with obesity in health care administrative databases: a systematic review. Obes Sci Pract. 2020;6:677–93.

  9. Hales CM, Carroll MD, Fryar CD, Ogden CL. Prevalence of obesity and severe obesity among adults: United States, 2017–2018. NCHS Data Brief. 2020;360:1–8.

  10. Wu B, Chow W, Sakthivel M, et al. Body mass index variable interpolation to expand the utility of real-world administrative healthcare claims database analyses. Adv Ther. 2021;38:1314–27.

Acknowledgements

The authors would like to thank Helen Hardy of Janssen Scientific Affairs and Aakash Bhargava, Simran Modi, and Thapa Anshul Chandrasingh of Mu Sigma, LLC, for their contributions to the research concept development and data exploration phases of this study.

Funding

This research, preparation of the manuscript, and the journal’s Rapid Service fee were funded by Janssen Scientific Affairs, LLC (Titusville, NJ, USA).

Authorship

All named authors meet the International Committee of Medical Journal Editors (ICMJE) criteria for authorship for this article, take responsibility for the integrity of the work as a whole, and have given their approval for this version to be published.

Author Contributions

All authors contributed to study conception and design; to data acquisition, analysis, and interpretation; and to drafting and revising the manuscript. All authors have given their approval for this manuscript version to be published.

Medical Writing Assistance

The authors would like to thank Michelle McDermott, PharmD, of Cello Health Communications/MedErgy (Yardley, PA, USA), for medical writing assistance, which was supported by Janssen Scientific Affairs, LLC (Titusville, NJ, USA).

Disclosures

Ganhui Lan, Bingcao Wu, and Veronica Ashton are full-time employees of Janssen Scientific Affairs, LLC. Kaustubh Sharma and Kaushal Gadhia are employees of Mu Sigma Business Solutions, LLC.

Compliance with Ethics Guidelines

All datasets came from databases of de-identified patient data; therefore, ethics committee approval was not required.

Data Availability

The data that support the study are available within the article and the published study methodology paper and its online supplementary material [10].

Author information

Corresponding author

Correspondence to Bingcao Wu.

Cite this article

Lan, G., Wu, B., Sharma, K. et al. Improved Prediction of Body Mass Index in Real-World Administrative Healthcare Claims Databases. Adv Ther 39, 3835–3844 (2022). https://doi.org/10.1007/s12325-022-02192-4

  • DOI: https://doi.org/10.1007/s12325-022-02192-4
