Utilizing Data Mining for Predictive Modeling of Colorectal Cancer Using Electronic Medical Records
Colorectal cancer (CRC) is a relatively common cause of death around the globe. Predictive models for the development of CRC could be highly valuable, facilitating early diagnosis and increased survival rates. Currently available predictive models are improving, but they neither fully utilize the wealth of data available about patients in routine care nor take advantage of developments in data mining. This paper reports a first attempt to generate a predictive model using the CHAID decision tree learner on anonymously extracted Electronic Medical Records, showing an area under the curve (AUC) of 0.839 for the adult population and 0.702 for the age group between 55 and 75.
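To make the reported metric concrete, the following is a minimal sketch of how an area under the ROC curve can be computed from predicted risk scores. It is not the authors' evaluation code: the function name `roc_auc`, the toy labels, and the toy scores are all hypothetical and chosen for illustration only. The sketch relies on the standard interpretation of AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive case has the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper): 1 = developed CRC, 0 = did not.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.40, 0.65, 0.80, 0.70, 0.90]

auc = roc_auc(labels, scores)
print(auc)  # 0.84375
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so for large record sets a rank-based implementation would be preferable.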
Keywords: Receiver Operating Characteristic Curve; European Prospective Investigation Into Cancer; Otitis Externa; ICPC Code; Temporal Data Mining