Abstract
Cancer produces complex cellular changes. Microarrays have become crucial for identifying the genes involved in causing these changes; however, microarray data analysis is challenged by the high dimensionality of the data relative to the number of samples. This has contributed to inconsistent cancer biomarkers across gene expression studies. In addition, the identification of crucial genes in cancer can be expedited through expression profiling of peripheral blood cells. We introduce a novel feature selection method for microarrays that uses a two-step filtering process to select a minimum set of genes with greater consistency and relevance, and we demonstrate that the selected gene set considerably enhances the diagnostic accuracy for cancer. The preliminary filtering step (the Bi-biological filter) builds gene coexpression networks for the cancer and healthy conditions using a topological overlap matrix (TOM) and finds cancer-specific gene clusters using spectral clustering (SC). This is followed by a second filtering step that extracts a much-reduced set of crucial genes using best-first search with a support vector machine (BFS-SVM). Finally, artificial neural network (ANN), SVM, and K-nearest neighbor (KNN) classifiers are used to assess the predictive power of the selected genes and to select the most effective diagnostic system. The approach was applied to peripheral blood profiling for breast cancer: the Bi-biological filter selected 415 biologically consistent genes, from which BFS-SVM extracted 13 highly cancer-specific genes for breast cancer identification. The ANN was the best-performing classifier, with 93.2% classification accuracy, a 14% improvement over the original study from which the data were obtained (Aaroe et al., Breast Cancer Res 12:R7, 2010).
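A minimal sketch of the two numerical building blocks named for the Bi-biological filter, assuming a genes x samples expression matrix: a soft-thresholded weighted adjacency a_ij = |cor(x_i, x_j)|^beta (beta = 6 is an assumed value often used with weighted coexpression networks, not one taken from the chapter), the topological overlap measure of Zhang and Horvath, and the Ng, Jordan and Weiss spectral clustering algorithm followed by a small k-means step. The function names tom and spectral_clusters are hypothetical. The step that compares the cancer and healthy networks to decide which clusters are cancer specific is not shown, because the abstract does not specify how that comparison is made.

```python
import numpy as np


def tom(expr: np.ndarray, beta: int = 6) -> np.ndarray:
    """Topological overlap matrix for the genes in the rows of `expr` (genes x samples).

    a_ij   = |corr(x_i, x_j)| ** beta          (soft-thresholded adjacency, a_ii = 0)
    k_i    = sum_u a_iu                        (connectivity)
    l_ij   = sum_{u != i, j} a_iu * a_uj       (weight shared through common neighbours)
    TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij),  TOM_ii = 1
    beta = 6 is an assumed default, not a value from the chapter.
    """
    adj = np.abs(np.corrcoef(expr)) ** beta
    np.fill_diagonal(adj, 0.0)
    k = adj.sum(axis=1)
    shared = adj @ adj  # zero diagonal means the u = i and u = j terms contribute nothing
    t = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(t, 1.0)
    return t


def spectral_clusters(affinity: np.ndarray, n_clusters: int, n_iter: int = 100) -> np.ndarray:
    """Ng-Jordan-Weiss spectral clustering of a symmetric, non-negative affinity matrix."""
    a = affinity.copy()
    np.fill_diagonal(a, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    lap = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(lap)                         # eigenvalues in ascending order
    x = vecs[:, -n_clusters:]                             # eigenvectors of the k largest eigenvalues
    y = x / np.linalg.norm(x, axis=1, keepdims=True)      # renormalise each row to unit length

    # k-means on the rows of y, seeded deterministically with farthest-point initialisation
    centers = [y[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(y[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(y), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([
            y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

In the setting the abstract describes, tom would be computed separately on the cancer samples and on the healthy samples, and spectral_clusters applied to each resulting matrix to obtain the gene clusters that are then compared.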
Data: available from the NCBI Gene Expression Omnibus, accession number GEO:GSE16443.
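The second filter, BFS-SVM, is a wrapper method: candidate gene subsets are scored by training a classifier on them. A minimal sketch of forward best-first search, scored by the cross-validated accuracy of a scikit-learn linear SVM, is given below. The linear kernel, C = 1, 5-fold cross-validation, starting from single-gene subsets, and stopping after five consecutive non-improving expansions are all assumptions rather than the chapter's stated settings, and bfs_svm is a hypothetical name.

```python
import heapq

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def bfs_svm(X: np.ndarray, y: np.ndarray, max_stale: int = 5, cv: int = 5):
    """Forward best-first search over feature subsets (hypothetical helper).

    Each subset is scored by the mean cross-validated accuracy of a linear SVM.
    The search stops after `max_stale` consecutive expansions that do not improve
    on the best subset found so far (an assumed stopping rule).
    """
    n_features = X.shape[1]

    def score(subset):
        clf = SVC(kernel="linear", C=1.0)  # assumed kernel and regularisation
        return cross_val_score(clf, X[:, sorted(subset)], y, cv=cv).mean()

    open_list = []  # heap of (-score, insertion order, subset): best score is popped first
    visited = set()
    order = 0

    def push(subset):
        nonlocal order
        visited.add(subset)
        heapq.heappush(open_list, (-score(subset), order, subset))
        order += 1

    for f in range(n_features):  # start from every single-feature subset
        push(frozenset([f]))

    best_subset, best_score, stale = frozenset(), -np.inf, 0
    while open_list and stale < max_stale:
        neg_score, _, subset = heapq.heappop(open_list)
        if -neg_score > best_score:
            best_subset, best_score, stale = subset, -neg_score, 0
        else:
            stale += 1
        for f in range(n_features):  # expand: add one unused feature at a time
            child = subset | {f}
            if f not in subset and child not in visited:
                push(child)
    return sorted(best_subset), best_score
```

Because every subset is scored by a trained classifier, the cost grows quickly with the number of candidate genes, which is one reason to run such a wrapper only after a cheaper preliminary filter has shrunk the gene list. A classifier comparison like the one in the abstract could then reuse cross_val_score on X[:, subset] with, for example, scikit-learn's MLPClassifier, SVC, and KNeighborsClassifier.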
References
Aaroe J, Lindahl T, Dumeaux V et al (2010) Gene expression profiling of peripheral blood cells for early detection of breast cancer. Breast Cancer Res 12(1):R7
Marteau J-B, Mohr S, Pfister M et al (2005) Collection and storage of human blood cells for mRNA expression profiling: a 15-month stability study. Clin Chem 51(7):1250–1252
Fang X, Evans K, Willis RC et al (2006) High-throughput sample preparation from whole blood for gene expression analysis. J Assoc Lab Automat 11(6):381–386. https://doi.org/10.1016/j.jala.2006.10.001
Fan X, Shi L, Fang H et al (2010) DNA microarrays are predictive of cancer prognosis: a re-evaluation. Clin Cancer Res 16(2):629–636
Kretschmer C, Sterner-Kock A, Siedentopf F et al (2011) Identification of early molecular markers for breast cancer. Mol Cancer 10(1):15
Ma S, Kosorok MR, Huang J et al (2011) Incorporating higher-order representative features improves prediction in network-based cancer prognosis analysis. BMC Med Genet 4:5
Schrauder MG, Strick R, Schulz-Wendtland R et al (2012) Circulating micro-RNAs as potential blood-based markers for early stage breast cancer detection. PLoS One 7(1):e29770
Sharma P, Sahni NS, Tibshirani R et al (2005) Early detection of breast cancer based on gene-expression patterns in peripheral blood cells. Breast Cancer Res 7(5):R634–R644
Wu CFJ (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat 14(4):1261–1295
Obayashi T, Hayashi S, Shibaoka M et al (2008) COXPRESdb: a database of coexpressed gene networks in mammals. Nucleic Acids Res 36(suppl 1):D77–D82
Yip A, Horvath S (2007) Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics 8(1):22
Zhang B, Horvath S (2005) A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol
Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1–2):245–271
Gilad-Bachrach R, Navot A, Tishby N (2004) Margin based feature selection—theory and algorithms. In Proceedings of the twenty-first international conference on machine learning. ACM, Banff
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Proces Syst 14
Samarasinghe S (2010) Neural networks for water system analysis: from fundamentals to complex pattern recognition. In Hydrocomplexity: new tools for solving wicked water problems. International Association of Hydrological Sciences, Paris, pp 209–213
Al-Yousef A, Samarasinghe S (2011) Ultrasound based computer aided diagnosis of breast cancer: evaluation of a new feature of mass central regularity degree
Dennis G, Sherman BT, Hosack DA et al (2003) DAVID: database for annotation, visualization, and integrated discovery. Genome Biol 4(5):P3
Haldar S, Negrini M, Monne M et al (1994) Down-regulation of bcl-2 by p53 in breast cancer cells. Cancer Res 54(8):2095–2097
Parton M, Dowsett M, Smith I (2001) Studies of apoptosis in breast cancer. BMJ 322(7301):1528–1532
Feng Z, Marti A, Jehn B et al (1995) Glucocorticoid and progesterone inhibit involution and programmed cell death in the mouse mammary gland. J Cell Biol 131(4):1095–1103
Graham JD, Clarke CL (1997) Physiological action of progesterone in target tissues. Endocr Rev 18(4):502–519
European Molecular Biology Laboratory, EMBL-EBI (2011) European Bioinformatics Institute
Simonnet H, Alazard N, Pfeiffer K et al (2002) Low mitochondrial respiratory chain content correlates with tumor aggressiveness in renal cell carcinoma. Carcinogenesis 23(5):759–768
Warburg O (1956) On the origin of cancer cells. Science 123(3191):309–314
Beitsch PD, Clifford E (2000) Detection of carcinoma cells in the blood of breast cancer patients. Am J Surg 180(6):446–449
Annibaldi A, Widmann C (2010) Glucose metabolism in cancer cells. Curr Opin Clin Nutr Metab Care 13(4):466–470. https://doi.org/10.1097/MCO.0b013e32833a5577
Acknowledgements
This work was supported by a scholarship from Jerash University, Jordan, and by Lincoln University, New Zealand.
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Al-Yousef, A., Samarasinghe, S. (2021). A Novel Computational Approach for Biomarker Detection for Gene Expression-Based Computer-Aided Diagnostic Systems for Breast Cancer. In: Cartwright, H. (eds) Artificial Neural Networks. Methods in Molecular Biology, vol 2190. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0826-5_9
DOI: https://doi.org/10.1007/978-1-0716-0826-5_9
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0825-8
Online ISBN: 978-1-0716-0826-5
eBook Packages: Springer Protocols