Identification of Patient Prescribing Predicting Cancer Diagnosis Using Boosted Decision Trees

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11526)

Abstract

Machine learning has the potential to identify patterns in pre-diagnostic prescribing that act as an early signal of cancer. Danish studies using classical regression models have shown that prescribing of particular drugs increases in the months before a lung or colorectal cancer diagnosis. The aim of this case-control study is to assess whether machine learning can extend these findings by identifying combinations of prescriptions that might act as pre-cancer signals. We use a boosted trees approach to analyse prescriptions data from the NHS Business Services Authority, linked to English cancer registry data, and classify individuals into two classes: cancer patients and controls. We then identify the drugs that contributed most to the models' classification decisions. To the best of our knowledge, this is the first study to use machine learning to find pre-diagnostic indicators of cancer diagnosis in primary care prescriptions in England. We assess two feature selection approaches: text categorisation methods alone, and text categorisation methods combined with clinical domain knowledge. Throughout, each patient is matched with ten controls to control for age. We train models for matched cohorts of 6,770 lung cancer patients and 5,869 colorectal cancer patients who started the cancer pathway for the first time between January and March 2016. The models outperform classical methods on AUC, AUC-PR, and F\(_{0.5}\) score, showing strong potential for using machine learning to extract signals from this dataset to aid earlier diagnosis. Our findings confirm those of the Danish studies.
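The workflow summarised in the abstract (boosted trees on prescription indicators, ten controls per case, ranking of the drugs that drive the classification, evaluation by AUC and F\(_{0.5}\)) can be illustrated with a minimal sketch. This is not the authors' model or data: the cohort is synthetic, the prescribing rates are invented, and the learner is a simplified Newton-boosted ensemble of depth-1 trees written in plain Python rather than a production boosting library. All function names are hypothetical, and AUC-PR is omitted for brevity.

```python
import math
import random

def make_cohort(n_cases, controls_per_case=10, n_drugs=6, seed=0):
    """Synthetic 0/1 'drug was prescribed' flags; every rate here is invented."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((1, n_cases), (0, n_cases * controls_per_case)):
        for _ in range(n):
            row = [int(rng.random() < 0.10) for _ in range(n_drugs)]
            if label == 1:  # drugs 0 and 1 carry the planted pre-diagnosis signal
                row[0] = int(rng.random() < 0.45)
                row[1] = int(rng.random() < 0.30)
            X.append(row)
            y.append(label)
    return X, y

def fit_boosted_stumps(X, y, n_rounds=20, lr=0.3):
    """Newton boosting on logistic loss with depth-1 trees over binary features."""
    n, d = len(X), len(X[0])
    base = math.log(sum(y) / (n - sum(y)))  # prior log-odds of being a case
    F = [base] * n
    stumps, importance, eps = [], [0.0] * d, 1e-9
    for _ in range(n_rounds):
        p = [1.0 / (1.0 + math.exp(-f)) for f in F]
        g = [yi - pi for yi, pi in zip(y, p)]   # negative gradient of the loss
        h = [pi * (1.0 - pi) for pi in p]       # second derivative of the loss
        parent = sum(g) ** 2 / (sum(h) + eps)
        best = None
        for j in range(d):
            G, H = [0.0, 0.0], [0.0, 0.0]
            for xi, gi, hi in zip(X, g, h):
                G[xi[j]] += gi
                H[xi[j]] += hi
            gain = sum(G[k] ** 2 / (H[k] + eps) for k in (0, 1)) - parent
            if best is None or gain > best[0]:
                best = (gain, j, [lr * G[k] / (H[k] + eps) for k in (0, 1)])
        gain, j, leaf = best
        stumps.append((j, leaf))
        importance[j] += gain                   # gain-based drug importance
        F = [f + leaf[xi[j]] for f, xi in zip(F, X)]
    return base, stumps, importance

def predict_score(model, row):
    """Log-odds score: prior plus the leaf value of each stump."""
    base, stumps, _ = model
    return base + sum(leaf[row[j]] for j, leaf in stumps)

def roc_auc(y_true, scores):
    """Probability that a random case outscores a random control (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def f_beta(y_true, y_pred, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

X_train, y_train = make_cohort(200, seed=0)
X_test, y_test = make_cohort(200, seed=1)      # held-out synthetic cohort
model = fit_boosted_stumps(X_train, y_train)
scores = [predict_score(model, row) for row in X_test]
auc = roc_auc(y_test, scores)
y_pred = [int(s > 0.0) for s in scores]        # log-odds > 0 means probability > 0.5
ranked = sorted(range(len(model[2])), key=lambda j: -model[2][j])
print("AUC:", round(auc, 3), "F0.5:", round(f_beta(y_test, y_pred), 3))
print("drugs ranked by total gain:", ranked)
```

Ranking features by accumulated split gain is one common way to ask which drugs contributed most to a boosted model's decisions; the abstract does not state which importance measure the authors used, so treat that choice as an assumption of this sketch.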

Supported by a Cancer Research UK Pioneer Award. Data for this study is based on patient-level information collected by the NHS, as part of the care and support of cancer patients. The data is collated, maintained and quality assured by the National Cancer Registration and Analysis Service, which is part of Public Health England (PHE). Dr. Meena Rafiq is funded by a National Institute for Health Research (NIHR) in-practice clinical fellowship (IPF-2017-11-011). This article presents independent research funded by the NIHR. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Author information

Corresponding author

Correspondence to Josephine French.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

French, J. et al. (2019). Identification of Patient Prescribing Predicting Cancer Diagnosis Using Boosted Decision Trees. In: Riaño, D., Wilk, S., ten Teije, A. (eds) Artificial Intelligence in Medicine. AIME 2019. Lecture Notes in Computer Science, vol 11526. Springer, Cham. https://doi.org/10.1007/978-3-030-21642-9_42

  • DOI: https://doi.org/10.1007/978-3-030-21642-9_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21641-2

  • Online ISBN: 978-3-030-21642-9

  • eBook Packages: Computer Science, Computer Science (R0)
