Classification of breast masses in mammograms using genetic programming and feature selection

  • R. J. Nandi
  • A. K. Nandi
  • R. M. Rangayyan
  • D. Scutt
Original Article


Mammography is a widely used screening tool and the gold standard for the early detection of breast cancer. Classifying breast masses as benign or malignant is an important problem in computer-aided diagnosis of breast cancer. This investigation used a small dataset of 57 breast mass images, each with 22 computed features; the same dataset has been used in previous studies. The extracted features relate to edge sharpness, shape, and texture. The novelty of this paper is the adaptation and application of a classification technique called genetic programming (GP), which performs feature selection implicitly. To refine the pool of features available to the GP classifier, we used feature-selection methods, including the introduction of three statistical measures: Student’s t-test, the Kolmogorov–Smirnov test, and the Kullback–Leibler divergence. Both training and test accuracies were high: above 99.5% for training and typically above 98% for test experiments. A leave-one-out experiment showed 97.3% success in classifying benign masses and 95.0% success in classifying malignant tumors. A shape feature known as fractional concavity was found to be the most important among those tested, since it was automatically selected by the GP classifier in almost every experiment.
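The three statistical measures named in the abstract can each be used to score how well a single feature separates benign from malignant masses. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the toy data, the use of Welch's form of the t statistic, and the Gaussian assumption behind the Kullback–Leibler estimate are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Absolute Welch t statistic between two samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    gap = 0.0
    for x in a + b:
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def gaussian_symmetric_kl(a, b):
    """Symmetric KL divergence, ASSUMING each class is Gaussian (illustrative choice)."""
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)
    kl_12 = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_21 = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl_12 + kl_21


def rank_features(benign, malignant, score):
    """Return feature indices sorted from most to least discriminative.

    benign, malignant: lists of feature vectors, one vector per mass.
    score: any of the three scoring functions above.
    """
    n_features = len(benign[0])
    scores = []
    for j in range(n_features):
        col_b = [row[j] for row in benign]
        col_m = [row[j] for row in malignant]
        scores.append(score(col_b, col_m))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
benign = [[0.10, 5.0], [0.15, 4.0], [0.20, 6.0], [0.12, 5.5]]
malignant = [[0.60, 5.2], [0.70, 4.1], [0.65, 5.9], [0.75, 5.4]]

for name, fn in [("t", t_statistic), ("KS", ks_statistic), ("KL", gaussian_symmetric_kl)]:
    print(name, rank_features(benign, malignant, fn))
```

In a pipeline like the one described, the top-ranked features under each measure would form the reduced pool offered to the classifier; how many features to keep is a separate choice that this sketch does not address.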


Pattern classification · Breast masses · Breast tumors · Breast cancer · Computer-aided diagnosis · Genetic programming · Feature selection



This research was partly funded by the Medical Research Council, UK, through the InterDisciplinary Bridging Awards (IDBA) scheme, and by a grant from the University of Calgary Research Grants Committee. The authors would like to thank Mr. L. Zhang, a research student at the University of Liverpool, for his initial assistance with the genetic programming code.



Copyright information

© International Federation for Medical and Biological Engineering 2006

Authors and Affiliations

  • R. J. Nandi (1)
  • A. K. Nandi (1)
  • R. M. Rangayyan (2)
  • D. Scutt (3)
  1. Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, UK
  2. Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, Canada
  3. School of Health Sciences, The University of Liverpool, Liverpool, UK
