Utilizing Data Mining for Predictive Modeling of Colorectal Cancer Using Electronic Medical Records

  • Mark Hoogendoorn
  • Leon M. G. Moons
  • Mattijs E. Numans
  • Robert-Jan Sips
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8609)


Colorectal cancer (CRC) is a relatively common cause of death around the globe. Predictive models for the development of CRC could be highly valuable, since they could facilitate early diagnosis and increase survival rates. Currently available predictive models are improving, but they neither fully utilize the wealth of data available about patients in routine care nor take advantage of developments in data mining. This paper reports a first attempt to generate a predictive model using the CHAID decision tree learner on anonymously extracted Electronic Medical Records, showing an area under the curve (AUC) of 0.839 for the adult population and 0.702 for the age group between 55 and 75.
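
To make the reported metric concrete, the following is a minimal sketch of how an AUC can be computed from risk scores, using the rank-based interpretation: the probability that a randomly chosen patient who develops CRC receives a higher score than a randomly chosen patient who does not. It is not the authors' evaluation code; the function name `auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 1 = later diagnosed with CRC, 0 = not.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.6, 0.1, 0.8]
print(auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of patients, so a sort-based computation would be preferable on a dataset of realistic size.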


Receiver Operating Characteristic Curve; European Prospective Investigation Into Cancer; Otitis Externa; ICPC Code; Temporal Data Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Mark Hoogendoorn (1)
  • Leon M. G. Moons (2)
  • Mattijs E. Numans (3, 4)
  • Robert-Jan Sips (5)
  1. Department of Computer Science, VU University Amsterdam, Amsterdam, The Netherlands
  2. Department of Gastroenterology and Hepatology, Utrecht University Medical Center, Utrecht, The Netherlands
  3. Department of Public Health and Primary Care, Leiden University Medical Center, Leiden, The Netherlands
  4. Julius Center for Health Sciences and Primary Care, Utrecht University Medical Center, Utrecht, The Netherlands
  5. IBM Netherlands, Center for Advanced Studies, Amsterdam, The Netherlands
