Oversampling Methods for Classification of Imbalanced Breast Cancer Malignancy Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7594)

Abstract

In breast cancer malignancy grading, the main problem that directly influences classification is the imbalanced number of cases among the malignancy classes. This imbalance poses a challenge for pattern recognition algorithms and leads to a significant decrease in classification accuracy for the minority class. In this paper we present an approach that ameliorates this problem. We describe and compare several state-of-the-art methods based on oversampling, i.e. the introduction of artificial objects into the dataset to eliminate the disproportion among classes. We also describe the automatic thresholding and fuzzy c-means algorithms used to segment nuclei from fine needle aspirates. From the segmented images, a set of 15 features was extracted for use in the classification process.
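To make the idea of oversampling concrete, the following is a minimal sketch of one widely used variant, SMOTE-style interpolation between minority-class neighbours. The abstract does not say which oversampling methods the paper compares or how they are configured, so the function name `smote_like_oversample`, the parameters `k` and `seed`, and the toy two-dimensional data are all assumptions made for illustration only (the paper itself uses 15 features).

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up feature vectors: six majority cases and three minority cases.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (0.3, 0.3), (0.2, 0.4), (0.4, 0.2)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]

# Add just enough artificial minority objects to equalise the class sizes.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because each artificial object lies on a segment between two real minority samples, the synthetic points stay inside the region the minority class already occupies, which is one reason interpolation is often preferred over simply duplicating cases.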

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Krawczyk, B., Jeleń, Ł., Krzyżak, A., Fevens, T. (2012). Oversampling Methods for Classification of Imbalanced Breast Cancer Malignancy Data. In: Bolc, L., Tadeusiewicz, R., Chmielewski, L.J., Wojciechowski, K. (eds) Computer Vision and Graphics. ICCVG 2012. Lecture Notes in Computer Science, vol 7594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33564-8_58

  • DOI: https://doi.org/10.1007/978-3-642-33564-8_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33563-1

  • Online ISBN: 978-3-642-33564-8

  • eBook Packages: Computer Science, Computer Science (R0)
