Identifying Risk Groups Associated with Colorectal Cancer
In this paper, we explore data mining techniques for identifying and describing risk groups for colorectal cancer (CRC) from population-based administrative health data. Association rule discovery, association classification and scalable clustering analysis are applied to the profiles of colorectal cancer patients in contrast to the profiles of background patients. These data mining methods enable us to identify the most common characteristics of colorectal cancer patients. The knowledge discovered by these methods is quite different from that obtained by traditional survey approaches. Although heuristic, the data mining methods may identify risk groups for further epidemiological study, such as older patients who live near health facilities yet seldom utilise them, and who have respiratory and circulatory diseases.
Keywords: Colorectal Cancer, Data Mining, Association Rule, Colorectal Cancer Patient, Data Mining Technique
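The idea of contrasting colorectal cancer patients' profiles with background patients' profiles can be illustrated with a small sketch. This is not the method used in the paper: it is a brute-force enumeration of short attribute patterns, and the function names (`support`, `contrast_patterns`), the support threshold, and the toy profiles are all assumptions made up for illustration. For each pattern that is frequent among the cases, it reports how much more common the pattern is among cases than in the background group.

```python
from itertools import combinations


def support(records, pattern):
    """Fraction of records that contain every item in `pattern`."""
    if not records:
        return 0.0
    return sum(pattern <= record for record in records) / len(records)


def contrast_patterns(cases, background, min_support=0.5, max_len=2):
    """Enumerate patterns frequent among cases and rank them by the
    ratio of case support to background support (hypothetical helper)."""
    items = sorted(set().union(*cases))
    results = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            s_case = support(cases, pattern)
            if s_case < min_support:
                continue  # keep only patterns common among cases
            s_bg = support(background, pattern)
            ratio = s_case / s_bg if s_bg > 0 else float("inf")
            results.append((pattern, s_case, s_bg, ratio))
    # Highest ratio first: patterns most over-represented among cases
    results.sort(key=lambda row: row[3], reverse=True)
    return results


# Made-up toy profiles, not data from the study
cases = [
    frozenset({"age>=65", "circulatory", "respiratory"}),
    frozenset({"age>=65", "circulatory"}),
    frozenset({"age>=65", "respiratory"}),
    frozenset({"age<65", "circulatory"}),
]
background = [
    frozenset({"age<65"}),
    frozenset({"age<65", "respiratory"}),
    frozenset({"age>=65", "circulatory", "respiratory"}),
    frozenset({"age<65", "circulatory"}),
]

results = contrast_patterns(cases, background)
for pattern, s_case, s_bg, ratio in results:
    print(sorted(pattern), s_case, s_bg, ratio)
```

On real administrative data the number of candidate patterns grows quickly, so a pruned search would be needed in place of this exhaustive loop; the sketch only shows what "characteristics common among cases relative to background" means in terms of support.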