Abstract
Current breast cancer research involves the study of many different prognostic factors: primary tumor size, lymph node status, tumor grade, tumor receptor status, and p53 and Ki67 levels, among others. High-throughput microarray technologies are helping researchers better understand and identify prognostic factors in breast cancer. However, the massive amounts of data derived from these technologies require efficient computational techniques to unveil new and relevant biomedical knowledge. Furthermore, integrative tools are needed that effectively combine heterogeneous types of biomedical data, such as prognostic factors and expression data. The objective of this study was to integrate information from the main prognostic factors in breast cancer with whole-genome microarray data to identify potential associations among them. We propose applying a data mining approach, fuzzy association rule mining, to unveil these associations automatically. This paper describes the proposed methodology and illustrates how it can be applied to different breast cancer datasets. The results support known associations involving the number of copies of chromosome 17, HER2 amplification, and the expression levels of estrogen and progesterone receptors in breast cancer patients. They also confirm the correspondence between the HER2 status predicted by different testing methodologies (immunohistochemistry and fluorescence in situ hybridization). In addition, other interesting rules involving the CDC6, SOX11, and EFEMP1 genes are identified, although further detailed studies are needed to confirm these findings statistically. As part of this study, a web platform implementing the fuzzy association rule mining approach has been made freely available at: http://www.genome2.ugr.es/biofar.
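To make the idea of a fuzzy association rule concrete, here is a minimal Python sketch. Each attribute value is mapped to a membership degree in [0, 1] by a fuzzy label such as "high". A rule "A -> C" is then scored with fuzzy support and confidence. The sketch assumes trapezoidal membership functions, the minimum t-norm, and the common sigma-count definitions of support and confidence. These are illustrative assumptions, not necessarily the partitions or quality measures used in the study. The attribute names (`fish_ratio`, `erbb2_expr`), the thresholds, the helper names, and the patient records are all hypothetical.

```python
import math
from typing import Callable, Dict, List

Record = Dict[str, float]              # attribute name -> crisp value for one patient
FuzzyItem = Callable[[Record], float]  # "attribute is <label>", returns a degree in [0, 1]


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Trapezoidal membership: 1 on [b, c], 0 at or beyond a and d, linear in between."""
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


def item(attr: str, mu: Callable[[float], float]) -> FuzzyItem:
    """Hypothetical helper: build a fuzzy item such as 'fish_ratio is high'."""
    return lambda record: mu(record[attr])


def degree(itemset: List[FuzzyItem], record: Record) -> float:
    """Degree to which one record satisfies every item (minimum t-norm)."""
    return min(it(record) for it in itemset)


def fuzzy_support(itemset: List[FuzzyItem], records: List[Record]) -> float:
    """Sigma-count of the itemset divided by the number of records."""
    return sum(degree(itemset, r) for r in records) / len(records)


def fuzzy_confidence(antecedent: List[FuzzyItem], consequent: List[FuzzyItem],
                     records: List[Record]) -> float:
    """supp(antecedent and consequent) / supp(antecedent); 0 if the antecedent never holds."""
    supp_a = fuzzy_support(antecedent, records)
    if supp_a == 0:
        return 0.0
    return fuzzy_support(antecedent + consequent, records) / supp_a


# Hypothetical toy data: a FISH ratio and an expression value per patient.
inf = math.inf
ratio_high = item("fish_ratio", trapezoid(1.5, 2.5, inf, inf))
expr_high = item("erbb2_expr", trapezoid(8.0, 10.0, inf, inf))
patients = [
    {"fish_ratio": 1.0, "erbb2_expr": 6.0},
    {"fish_ratio": 2.0, "erbb2_expr": 9.0},
    {"fish_ratio": 3.0, "erbb2_expr": 11.0},
    {"fish_ratio": 4.0, "erbb2_expr": 9.0},
]

# Rule: fish_ratio is high -> erbb2_expr is high
print(fuzzy_support([ratio_high, expr_high], patients))        # 0.5
print(fuzzy_confidence([ratio_high], [expr_high], patients))   # 0.8
```

Unlike a crisp threshold, the fuzzy labels let borderline patients contribute partially (here, a degree of 0.5) instead of being forced into one class. This is the usual motivation for fuzzy rather than crisp association rules on continuous clinical and expression data.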
Acknowledgements
This work has been carried out as part of projects P08-TIC-4299 of J. A., Sevilla; TIN2009-13489 of DGICT, Madrid; GREIB-PYR-2010-05 of the University of Granada (MC); and GREIB-PYR-2010-02 of the University of Granada (CC). The authors thank the Hospital Universitario Virgen de las Nieves Tumor Bank for providing the samples.
Additional information
F. J. Lopez and M. Cuadros contributed equally to this work.
Electronic supplementary material
Online resource 1 Descriptive study of the datasets.
Online resource 2 Complete rule set obtained from the analysis of the 2,751 patients.
Online resource 3 Complete rule set relating prognostic factors and gene expression data.
About this article
Cite this article
Lopez, F.J., Cuadros, M., Cano, C. et al. Biomedical application of fuzzy association rules for identifying breast cancer biomarkers. Med Biol Eng Comput 50, 981–990 (2012). https://doi.org/10.1007/s11517-012-0914-8
DOI: https://doi.org/10.1007/s11517-012-0914-8