Journal of Medical Systems, Volume 36, Issue 2, pp 569–577

Diagnosing Breast Masses in Digital Mammography Using Feature Selection and Ensemble Methods

Original Paper


Methods that can accurately predict breast cancer are greatly needed. In this study, we used two feature selection methods, forward selection (FS) and backward selection (BS), to remove irrelevant features and thereby improve breast cancer prediction. The dataset was identified on full-field digital mammograms collected at the Institute of Radiology of the University of Erlangen-Nuremberg between 2003 and 2006. The results show that feature reduction improves predictive accuracy and that density is an irrelevant feature in this dataset. In addition, decision tree (DT), support vector machine with sequential minimal optimization (SVM-SMO), and their ensembles were applied to the breast cancer diagnostic problem in an attempt to obtain better predictive performance. The results demonstrate that ensemble classifiers are more accurate than a single classifier.
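As an illustration of the forward selection idea mentioned above, the following is a minimal sketch, not the procedure used in the study. It assumes a toy two-feature dataset, a nearest-centroid classifier as a stand-in for the DT and SVM-SMO learners, and leave-one-out accuracy as the selection criterion. All function names, feature values, and the stopping rule are hypothetical choices made for the example.

```python
# Hypothetical sketch of wrapper-style forward selection (standard library only).
# The classifier, data, and stopping rule are assumptions, not the study's setup.

def nearest_centroid_predict(train_X, train_y, x, features):
    """Return the label whose class centroid, restricted to `features`, is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        dist = 0.0
        for f in features:
            centroid_f = sum(r[f] for r in rows) / len(rows)
            dist += (centroid_f - x[f]) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the stand-in classifier on a feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]:
            hits += 1
    return hits / len(X)

def forward_selection(X, y, n_features):
    """Greedily add the feature that most improves accuracy; stop when none helps."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scores = {f: loo_accuracy(X, y, selected + [f]) for f in remaining}
        feat = max(scores, key=scores.get)  # first best candidate in index order
        if scores[feat] <= best_score:
            break  # no remaining feature improves the score
        selected.append(feat)
        remaining.remove(feat)
        best_score = scores[feat]
    return selected, best_score

# Toy data: column 0 separates the classes, column 1 is uninformative noise.
X = [[0.0, 10.0], [0.2, -10.0], [0.4, 10.0], [0.6, -10.0],
     [5.0, 10.0], [5.2, -10.0], [5.4, 10.0], [5.6, -10.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, score = forward_selection(X, y, n_features=2)
print(selected, score)  # prints: [0] 1.0
```

On this toy data the noise column is never added, which mirrors in miniature how a wrapper method can discard a feature that does not help the classifier. Backward selection would run the same loop in reverse, starting from all features and removing the one whose removal does not reduce the score.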


Digital mammography; Feature selection; Breast cancer; Ensemble classifiers



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Graduate School of Industry Engineering and Management, National Yunlin University of Science and Technology, Douliou, Taiwan
