Data Mining pp 260-272

Identifying Risk Groups Associated with Colorectal Cancer

  • Jie Chen
  • Hongxing He
  • Huidong Jin
  • Damien McAullay
  • Graham Williams
  • Chris Kelman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3755)


In this paper, we explore data mining techniques for identifying and describing risk groups for colorectal cancer (CRC) from population-based administrative health data. Association rule discovery, association classification and scalable clustering analysis are applied to the colorectal cancer patients’ profiles in contrast to background patients’ profiles. These data mining methods enable us to identify the most common characteristics of the colorectal cancer patients. The knowledge discovered by these methods is quite different from that obtained through traditional survey approaches. Although they are heuristic, the data mining methods may identify risk groups for further epidemiological study, such as older patients who live near health facilities yet seldom use them and who have respiratory and circulatory diseases.
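The idea of contrasting patient profiles with background profiles can be illustrated with a minimal sketch. This is not the authors' algorithm: it simply enumerates small itemsets, computes their support in a target group and in a background group, and ranks them by the ratio of the two. The function names, the thresholds and the toy profiles are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def support(profiles, itemset):
    """Fraction of profiles (sets of items) that contain every item in itemset."""
    if not profiles:
        return 0.0
    return sum(1 for p in profiles if itemset <= p) / len(profiles)


def contrast_patterns(target, background, min_support=0.5, max_size=2):
    """Rank itemsets that are frequent in `target` by how much more common
    they are there than in `background` (support ratio, highest first)."""
    items = sorted(set().union(*target))
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s_target = support(target, itemset)
            if s_target < min_support:
                continue  # not frequent enough in the target group
            s_background = support(background, itemset)
            ratio = s_target / s_background if s_background > 0 else float("inf")
            results.append((combo, s_target, s_background, ratio))
    results.sort(key=lambda r: r[3], reverse=True)
    return results


# Toy, made-up profiles (assumption: each patient is a set of categorical items).
target = [
    {"age>=65", "circulatory"},
    {"age>=65", "respiratory"},
    {"age>=65", "circulatory"},
    {"age<65"},
]
background = [
    {"age<65"},
    {"age<65", "respiratory"},
    {"age>=65"},
    {"age<65", "circulatory"},
]

results = contrast_patterns(target, background)
for combo, s_t, s_b, ratio in results:
    print(combo, s_t, s_b, ratio)
```

A high ratio only flags a pattern as a candidate for further study. It says nothing about causation, which is consistent with the heuristic role the abstract assigns to these methods.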


Colorectal Cancer; Data Mining; Association Rule; Colorectal Cancer Patient; Data Mining Technique
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jie Chen (1)
  • Hongxing He (1)
  • Huidong Jin (1)
  • Damien McAullay (1)
  • Graham Williams (1, 2)
  • Chris Kelman (3)
  1. CSIRO Mathematical and Information Sciences, Canberra, Australia
  2. Australian Taxation Office, Canberra, Australia
  3. National Centre for Epidemiology and Population Health, The Australian National University, Canberra, Australia
