Abstract
During breast cancer malignancy grading, the main problem that directly influences classification is the imbalanced number of cases across the malignancy classes. This poses a challenge for pattern recognition algorithms and leads to a significant decrease in classification accuracy for the minority class. In this paper we present an approach that ameliorates this problem. We describe and compare several state-of-the-art methods based on oversampling, i.e. the introduction of artificial objects into the dataset to eliminate the disproportion among classes. We also describe the automatic thresholding and fuzzy c-means algorithms used for nuclei segmentation from fine needle aspirates. Based on the segmented images, a set of 15 features used in the classification process was extracted.
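To make the idea of oversampling concrete, the following is a minimal sketch of interpolation-based generation of artificial minority objects, in the spirit of SMOTE-style methods. It is an illustration only and not necessarily any of the methods compared in the paper; the function name `smote_like_oversample`, the parameter `k`, and the two-dimensional toy data are assumptions introduced for this example.

```python
import random
from math import dist


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each placed on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours. Hypothetical helper, for illustration only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 10 majority objects versus 4 minority objects.
majority = [(0.1 * i, 0.0) for i in range(10)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]

# Add enough artificial minority objects to remove the disproportion.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

After the call, both classes contain the same number of objects, and every artificial object lies between two existing minority objects in feature space. In the setting of the abstract, each point would be a 15-dimensional feature vector rather than a pair.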
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Krawczyk, B., Jeleń, Ł., Krzyżak, A., Fevens, T. (2012). Oversampling Methods for Classification of Imbalanced Breast Cancer Malignancy Data. In: Bolc, L., Tadeusiewicz, R., Chmielewski, L.J., Wojciechowski, K. (eds) Computer Vision and Graphics. ICCVG 2012. Lecture Notes in Computer Science, vol 7594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33564-8_58
DOI: https://doi.org/10.1007/978-3-642-33564-8_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33563-1
Online ISBN: 978-3-642-33564-8
eBook Packages: Computer Science, Computer Science (R0)