Bioinformatics pp 299–325

Computer-Aided Breast Cancer Diagnosis with Optimal Feature Sets: Reduction Rules and Optimization Techniques

  • Luke Mathieson
  • Alexandre Mendes
  • John Marsden
  • Jeffrey Pond
  • Pablo Moscato
Part of the Methods in Molecular Biology book series (MIMB, volume 1526)

Abstract


This chapter introduces a new method for extracting knowledge from databases, with the aim of finding a set of features that is discriminative between classes and also robust for within-class classification. The method is generic; we introduce it here in the context of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called the (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686–690, 2003). The method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified; then, a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases: first, a series of extremely powerful reduction techniques, which do not lose the optimal solution, is applied; second, a metaheuristic search identifies which of the remaining features should be kept or discarded. Two algorithms were tested on a public-domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.
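
To make the (α, β)-k-feature set idea concrete, the following is a minimal Python sketch, not the chapter's implementation. It assumes the commonly stated definition: every pair of samples from different classes must differ on at least α of the selected features, and every pair from the same class must agree on at least β of them. The function names, the toy data, and the "forced feature" rule are illustrative assumptions; the rule is one simple example of a reduction that cannot lose an optimal solution, and it is not claimed to be one of the reduction rules used by the authors.

```python
from itertools import combinations


def pair_counts(x, y, features):
    """Return (differences, agreements) between samples x and y over `features`."""
    diff = sum(1 for f in features if x[f] != y[f])
    return diff, len(features) - diff


def is_alpha_beta_feature_set(samples, labels, features, alpha, beta):
    """Check the assumed (alpha, beta) property for the chosen feature indices."""
    features = list(features)
    for i, j in combinations(range(len(samples)), 2):
        diff, same = pair_counts(samples[i], samples[j], features)
        if labels[i] != labels[j] and diff < alpha:
            return False  # between-class pair is not separated often enough
        if labels[i] == labels[j] and same < beta:
            return False  # within-class pair does not agree often enough
    return True


def forced_features(samples, labels, alpha, beta):
    """Illustrative safe reduction (an assumption, not the chapter's rule set).

    If a between-class pair differs on exactly `alpha` features in the whole
    dataset, every one of those features must be in any feasible solution.
    The same holds for a within-class pair agreeing on exactly `beta` features.
    Pairs with fewer than alpha (or beta) such features make the instance
    infeasible; this sketch does not report that case.
    """
    all_features = set(range(len(samples[0])))
    forced = set()
    for i, j in combinations(range(len(samples)), 2):
        differing = {f for f in all_features if samples[i][f] != samples[j][f]}
        agreeing = all_features - differing
        if labels[i] != labels[j] and len(differing) == alpha:
            forced |= differing
        if labels[i] == labels[j] and len(agreeing) == beta:
            forced |= agreeing
    return forced


# Hypothetical toy data: four samples, four binary features.
samples = [
    (1, 0, 1, 0),
    (1, 1, 1, 0),
    (0, 0, 0, 1),
    (0, 1, 0, 1),
]
labels = ["malignant", "malignant", "benign", "benign"]

print(is_alpha_beta_feature_set(samples, labels, {0, 2}, alpha=2, beta=2))
print(is_alpha_beta_feature_set(samples, labels, {0, 1}, alpha=2, beta=2))
print(sorted(forced_features(samples, labels, alpha=3, beta=3)))
```

In this toy instance, features 0, 2 and 3 follow the class label while feature 1 does not, so the set {0, 1} fails the check for α = β = 2. With α = β = 3, the forced-feature rule already fixes features 0, 2 and 3, leaving nothing for a subsequent search to decide.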

Key words

  • Safe data reduction
  • Combinatorial optimization
  • Minimum feature set
  • Breast cancer diagnostics
  • Memetic algorithms


References

  1. Bird R, Wallace T, Yankaskas B (1992) Analysis of cancer missed at screening mammography. Radiology 184:613–617
  2. Hall F, Storella J, Silverstone D, Wyshak G (1988) Nonpalpable breast lesions: recommendations for biopsy based on suspicion of carcinoma at mammography. Radiology 167:353–358
  3. Cotta C, Sloper C, Moscato P (2004) Evolutionary search of thresholds for robust feature set selection: application to the analysis of microarray data. In: Proceedings of EvoBio2004—2nd European workshop on evolutionary computation and bioinformatics, Coimbra, Portugal, 5–7 April 2004, pp 21–30
  4. Kovalerchuk B, Triantaphyllou E, Ruiz J, Torvik V, Vityaev E (2000) The reliability issue of computer-aided breast cancer diagnosis. Comput Biomed Res 33:296–313
  5. Davies S, Russell S (1994) NP-completeness of searches for smallest possible feature sets. In: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) fall symposium on relevance, pp 41–43
  6. Goldberg D, Sastry K (2010) Genetic algorithms: the design of innovation, 2nd edn. Springer, New York
  7. Moscato P, Cotta C, Mendes A (2004) Memetic algorithms. In: Onwubolu G, Babu B (eds) New optimization techniques in engineering. Springer, New York, pp 53–86
  8. Cotta C, Moscato P (2003) The k-Feature Set problem is W[2]-complete. J Comput Syst Sci 67(4):686–690
  9. Kovalerchuk B, Vityaev E, Ruiz J (2000) Consistent knowledge discovery in medical diagnosis. IEEE Eng Med Biol 19:26–37
  10. Weihe K (1998) Covering trains by stations or the power of data reduction. In: Proceedings of ALEX'98—1st workshop on algorithms and experiments, Trento, Italy, 9–11 February 1998, pp 1–8
  11. Berretta R, Mendes A, Moscato P (2007) Selection of discriminative genes in microarray experiments using mathematical programming. J Res Pract Inform Technol 39(4):287–299
  12. Moscato P, Cotta C (2003) A gentle introduction to memetic algorithms. In: Glover F, Kochenberger G (eds) Handbook of metaheuristics. Springer, New York, pp 105–144
  13. Neri F, Cotta C, Moscato P (2011) Handbook of memetic algorithms. Springer, New York
  14. Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, USA
  15. Yunus M, Ahmed N, Masroor I, Yaqoob J (2004) Mammographic criteria for determining the diagnostic value of microcalcifications in the detection of early breast cancer. J Pak Med Assoc 54:24–29
  16. Cotta C, Mendes A, Garcia V, Franca P, Moscato P (2003) Applying memetic algorithms to the analysis of microarray data. In: Cagnoni S et al. (eds) Proceedings of EvoBIO2003—1st European workshop on evolutionary bioinformatics, Essex, UK, 14–16 April 2003. Lecture Notes in Computer Science, vol 2611. Springer, Heidelberg, pp 22–32
  17. Moscato P, Mendes A, Berretta R (2007) Benchmarking a memetic algorithm for ordering microarray data. Biosystems 88(1–2):56–75
  18. Johnstone D, Milward EA, Berretta R, Moscato P (2012) Multivariate protein signatures of pre-clinical Alzheimer’s disease in the Alzheimer’s disease neuroimaging initiative (ADNI) plasma proteome dataset. PLoS One 7(4):e34341
  19. de Paula MR, Ravetti MG, Berretta R, Moscato P (2011) Differences in abundances of cell-signalling proteins in blood reveal novel biomarkers for early detection of clinical Alzheimer’s disease. PLoS One 6(3):e17481
  20. Ravetti MG, Moscato P (2008) Identification of a 5-protein biomarker molecular signature for predicting Alzheimer’s disease. PLoS One 3(9):e3111
  21. Johnstone D, Graham RM, Trinder D, Delima RD, Riveros C, Olynyk JK et al (2012) Brain transcriptome perturbations in the Hfe(−/−) mouse model of genetic iron loading. Brain Res 1448:144–152
  22. Johnstone DM, Graham RM, Trinder D, Riveros C, Olynyk JK, Scott RJ et al (2012) Changes in brain transcripts related to Alzheimer’s disease in a model of HFE hemochromatosis are not consistent with increased Alzheimer’s disease risk. J Alzheimers Dis 30(4):791–803
  23. Ravetti MG, Rosso OA, Berretta R, Moscato P (2010) Uncovering molecular biomarkers that correlate cognitive decline with the changes of hippocampus’ gene expression profiles in Alzheimer’s disease. PLoS One 5(4):e10153
  24. Riveros C, Mellor D, Gandhi KS, McKay FC, Cox MB, Berretta R et al (2010) A transcription factor map as revealed by a genome-wide gene expression analysis of whole-blood mRNA transcriptome in multiple sclerosis. PLoS One 5(12):e14176
  25. Rosso OA, Mendes A, Berretta R, Rostas JA, Hunter M, Moscato P (2009) Distinguishing childhood absence epilepsy patients from controls by the analysis of their background brain electrical activity (II): a combinatorial optimization approach for electrode selection. J Neurosci Methods 181(2):257–267
  26. Mendes A, Scott RJ, Moscato P (2008) Microarrays—identifying molecular portraits for prostate tumors with different Gleason patterns. Methods Mol Med 141:131–151
  27. Berretta R, Costa W, Moscato P (2008) Combinatorial optimization models for finding genetic signatures from gene expression datasets. Methods Mol Biol 453:363–377
  28. Milward EA, Moscato P, Riveros C, Johnstone DM (2014) Beyond statistics: a new combinatorial approach to identifying biomarker panels for the early detection and diagnosis of Alzheimer’s disease. J Alzheimers Dis 39(1):211–217
  29. Pastore G, Costantini M, Valentini V, Romani M, Terribile D, Belli P (2002) Clinically nonpalpable breast tumors: global critical review and second look on microcalcifications. Rays 27(4):233–239
  30. Bocchi L, Nori J (2007) Shape analysis of microcalcifications using Radon transform. Med Eng Phys 29(6):691–698
  31. Resende LM, Matias MA, Oliveira GM, Salles MA, Melo FH, Gobbi H (2008) Evaluation of breast microcalcifications according to Breast Imaging Reporting and Data System (BI-RADS) and Le Gal’s classifications. Rev Bras Ginecol Obstet 30(2):75–79
  32. Wilson GH 3rd, Gore JC, Yankeelov TE, Barnes S, Peterson TE, True JM et al (2014) An approach to breast cancer diagnosis via PET imaging of microcalcifications using 18F-NaF. J Nucl Med 55(7):1138–1143
  33. Boisserie-Lacroix M, Bullier B, Hurtevent-Labrot G, Ferron S, Lippa N, Mac Grogan G (2014) Correlation between imaging and prognostic factors: molecular classification of breast cancers. Diagn Intervent Imaging 95(2):227–233
  34. Scimeca M, Giannini E, Antonacci C, Pistolese CA, Spagnoli LG, Bonanno E (2014) Microcalcifications in breast cancer: an active phenomenon mediated by epithelial cells with mesenchymal characteristics. BMC Cancer 14:286
  35. Cox RF, Morgan MP (2013) Microcalcifications in breast cancer: lessons from physiological mineralization. Bone 53(2):437–450
  36. Jing H, Yang Y, Nishikawa RM (2012) Retrieval boosted computer-aided diagnosis of clustered microcalcifications for breast cancer. Med Phys 39(2):676–685
  37. Baker R, Rogers KD, Shepherd N, Stone N (2010) New relationships between breast microcalcifications and cancer. Br J Cancer 103(7):1034–1039
  38. Uematsu T, Kasami M, Yuen S (2009) A cluster of microcalcifications: women with high risk for breast cancer versus other women. Breast Cancer 16(4):307–314
  39. Karahaliou A, Skiadopoulos S, Boniatis I, Sakellaropoulos P, Likaki E, Panayiotakis G et al (2007) Texture analysis of tissue surrounding microcalcifications on mammograms for breast cancer diagnosis. Br J Radiol 80(956):648–656
  40. Kamitani T, Yabuuchi H, Soeda H, Matsuo Y, Okafuji T, Sakai S et al (2007) Detection of masses and microcalcifications of breast cancer on digital mammograms: comparison among hard-copy film, 3-megapixel liquid crystal display (LCD) monitors and 5-megapixel LCD monitors: an observer performance study. Eur Radiol 17(5):1365–1371
  41. Burnside ES, Rubin DL, Fine JP, Shachter RD, Sisney GA, Leung WK (2006) Bayesian network to predict breast cancer risk of mammographic microcalcifications and reduce number of benign biopsy results: initial experience. Radiology 240(3):666–673
  42. Jing H, Yang Y, Nishikawa RM (2012) Regularization in retrieval-driven classification of clustered microcalcifications for breast cancer. Int J Biomed Imaging 2012:463408
  43. Farshid G, Sullivan T, Downey P, Gill PG, Pieterse S (2011) Independent predictors of breast malignancy in screen-detected microcalcifications: biopsy results in 2545 cases. Br J Cancer 105(11):1669–1675
  44. Hsieh SL, Hsieh SH, Cheng PH, Chen CH, Hsu KP, Lee IS et al (2012) Design ensemble machine learning model for breast cancer diagnosis. J Med Syst 36(5):2841–2847
  45. Djebbari A, Liu Z, Phan S, Famili F (2008) An ensemble machine learning approach to predict survival in breast cancer. Int J Comput Biol Drug Des 1(3):275–294
  46. Choi JY, Kim DH, Plataniotis KN, Ro YM (2014) Computer-aided detection (CAD) of breast masses in mammography: combined detection and ensemble classification. Phys Med Biol 59(14):3697–3719
  47. Ali S, Majid A, Khan A (2014) IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids. Amino Acids 46(4):977–993
  48. Krawczyk B, Schaefer G (2013) A pruned ensemble classifier for effective breast thermogram analysis. In: Annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp 7120–7123
  49. Luo ST, Cheng BW (2012) Diagnosing breast masses in digital mammography using feature selection and ensemble methods. J Med Syst 36(2):569–577
  50. Takemura A, Shimizu A, Hamamoto K (2010) Discrimination of breast tumors in ultrasonic images using an ensemble classifier based on the AdaBoost algorithm with feature selection. IEEE Trans Med Imaging 29(3):598–609
  51. Vimieiro R, Moscato P (2014) Disclosed: an efficient depth-first, top-down algorithm for mining disjunctive closed itemsets in high-dimensional data. Inform Sci 280:171–187
  52. Vimieiro R, Moscato P (2014) A new method for mining disjunctive emerging patterns in high-dimensional datasets using hypergraphs. Inform Syst 40:1–10

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Luke Mathieson (1)
  • Alexandre Mendes (1)
  • John Marsden (1)
  • Jeffrey Pond (1)
  • Pablo Moscato (1)
  1. Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine (CIBM), Faculty of Engineering and Built Environment, The University of Newcastle, Callaghan, Australia
