Familial Cancer, Volume 15, Issue 2, pp 241–251

Determining the familial risk distribution of colorectal cancer: a data mining approach

  • Rowena Chau
  • Mark A. Jenkins
  • Daniel D. Buchanan
  • Driss Ait Ouakrim
  • Graham G. Giles
  • Graham Casey
  • Steven Gallinger
  • Robert W. Haile
  • Loic Le Marchand
  • Polly A. Newcomb
  • Noralane M. Lindor
  • John L. Hopper
  • Aung Ko Win
Original Article


This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95 % confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and 66 minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (1) 7 % of families (SIR = 7.11; 95 % CI 6.65–7.59) had a strong family history of colorectal cancer; (2) 13 % of families (SIR = 2.94; 95 % CI 2.78–3.10) had a moderate family history of colorectal cancer; (3) 11 % of families (SIR = 1.23; 95 % CI 1.12–1.36) had a strong family history of breast cancer and a weak family history of colorectal cancer; (4) 9 % of families (SIR = 1.06; 95 % CI 0.96–1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (5) 60 % of families (SIR = 0.61; 95 % CI 0.57–0.65) had a weak family history of all cancers. Colorectal cancer risk varies widely across categories of family history of cancer, with a strong gradient between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7 % of the population) was 12 times that for people in the lowest risk category (60 % of the population). Data mining proved to be an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer.
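The arithmetic behind the reported figures can be illustrated with a minimal sketch. It assumes the textbook definition of an SIR (observed cases divided by expected cases) and a simple log-scale normal approximation for the confidence interval; this is not necessarily how the authors computed their intervals. The function name `sir_with_ci` and the category labels are hypothetical, and the only data used are the SIRs and percentages quoted in the abstract.

```python
import math


def sir_with_ci(observed, expected, z=1.96):
    """Return (SIR, lower, upper) for observed vs expected case counts.

    Assumption: the interval uses a normal approximation on the log scale,
    with standard error 1/sqrt(observed). Other interval methods exist.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    sir = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    lower = sir * math.exp(-z * se_log)
    upper = sir * math.exp(z * se_log)
    return sir, lower, upper


# SIRs and family shares (percent) as quoted in the abstract.
# The dictionary keys are illustrative labels, not terms from the paper.
reported_sir = {
    "strong_colorectal": 7.11,
    "moderate_colorectal": 2.94,
    "strong_breast_weak_colorectal": 1.23,
    "strong_prostate_weak_colorectal": 1.06,
    "weak_all_cancers": 0.61,
}
family_share_percent = {
    "strong_colorectal": 7,
    "moderate_colorectal": 13,
    "strong_breast_weak_colorectal": 11,
    "strong_prostate_weak_colorectal": 9,
    "weak_all_cancers": 60,
}

# Ratio of the highest to the lowest category SIR (the "12 times" statement).
risk_ratio = reported_sir["strong_colorectal"] / reported_sir["weak_all_cancers"]

# The five major categories should account for all families.
total_share = sum(family_share_percent.values())

if __name__ == "__main__":
    print(f"highest/lowest SIR ratio: {risk_ratio:.2f}")
    print(f"total share of families: {total_share} %")
    # Made-up counts, purely to show how the helper is called.
    print(sir_with_ci(observed=100, expected=50.0))
```

The ratio of the two reported SIRs rounds to 12, consistent with the abstract, and the five category shares sum to 100 %. The counts passed to `sir_with_ci` in the usage line are invented for illustration only.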


Data mining, Colorectal cancer, Familial risk, Familial aggregation



The authors thank all study participants of the Colon Cancer Family Registry and staff for their contributions to this project.


This work was supported by Grant UM1 CA167551 from the National Cancer Institute, National Institutes of Health (NIH) and through cooperative agreements with the following Colon Cancer Family Registry (CCFR) centers: Australasian Colorectal Cancer Family Registry (U01/U24 CA097735), Mayo Clinic Cooperative Family Registry for Colon Cancer Studies (U01/U24 CA074800), Ontario Familial Colorectal Cancer Registry (U01/U24 CA074783), Seattle Colorectal Cancer Family Registry (U01/U24 CA074794), and USC Consortium Colorectal Cancer Family Registry (U01/U24 CA074799). Seattle CCFR research was also supported by the Cancer Surveillance System of the Fred Hutchinson Cancer Research Center, which was funded by Control Nos. N01-CN-67009 (1996–2003) and N01-PC-35142 (2003–2010) and Contract No. HHSN2612013000121 (2010–2017) from the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute with additional support from the Fred Hutchinson Cancer Research Center. The collection of cancer incidence data used in this study was supported by the California Department of Public Health as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885; the National Cancer Institute’s Surveillance, Epidemiology and End Results Program under contract HHSN261201000035C awarded to the University of Southern California, and contract HHSN261201000034C awarded to the Public Health Institute; and the Centers for Disease Control and Prevention’s National Program of Cancer Registries, under agreement U58DP003862-01 awarded to the California Department of Public Health. The ideas and opinions expressed herein are those of the author(s), and endorsement by the State of California, Department of Public Health, the National Cancer Institute, and the Centers for Disease Control and Prevention or their Contractors and Subcontractors is not intended nor should be inferred.
This work is also supported by Centre for Research Excellence grant APP1042021 and Program grant APP1074383 from the National Health and Medical Research Council (NHMRC), Australia. AKW is an NHMRC Early Career Fellow. MAJ is an NHMRC Senior Research Fellow. JLH is an NHMRC Senior Principal Research Fellow. DDB is a University of Melbourne Research at Melbourne Accelerator Program (R@MAP) Senior Research Fellow.

Compliance with ethical standards

Conflict of interest

The authors have no conflict of interest to declare with respect to this manuscript.

Supplementary material

10689_2015_9860_MOESM1_ESM.docx (34 kb)
Supplementary material 1 (DOCX 35 kb)
10689_2015_9860_MOESM2_ESM.pptx (722 kb)
Supplementary material 2 (PPTX 722 kb)



Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  • Rowena Chau (1)
  • Mark A. Jenkins (1)
  • Daniel D. Buchanan (1, 2)
  • Driss Ait Ouakrim (1)
  • Graham G. Giles (1, 3)
  • Graham Casey (4)
  • Steven Gallinger (5, 6)
  • Robert W. Haile (7)
  • Loic Le Marchand (8)
  • Polly A. Newcomb (9)
  • Noralane M. Lindor (10)
  • John L. Hopper (1)
  • Aung Ko Win (1)
  1. Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia
  2. Colorectal Oncogenomics Group, Genetic Epidemiology Laboratory, Department of Pathology, The University of Melbourne, Parkville, Australia
  3. Cancer Epidemiology Centre, The Cancer Council Victoria, Melbourne, Australia
  4. Department of Preventive Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, USA
  5. Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
  6. Cancer Care Ontario, Toronto, Canada
  7. Division of Oncology, Department of Medicine, Stanford University, Stanford, USA
  8. University of Hawaii Cancer Center, Honolulu, USA
  9. Cancer Prevention Program, Fred Hutchinson Cancer Research Center, Seattle, USA
  10. Department of Health Science Research, Mayo Clinic Arizona, Scottsdale, USA
