Oversampling Methods for Classification of Imbalanced Breast Cancer Malignancy Data
In breast cancer malignancy grading, the main problem that directly influences classification is the imbalanced number of cases across the malignancy classes. This imbalance poses a challenge for pattern recognition algorithms and leads to a significant decrease in classification accuracy for the minority class. In this paper we present an approach that ameliorates this problem. We describe and compare several state-of-the-art methods based on oversampling, i.e. the introduction of artificial objects into the dataset to eliminate the disproportion among classes. We also describe the automatic thresholding and fuzzy c-means algorithms used to segment nuclei from fine needle aspirates. From the segmented images, a set of 15 features was extracted and used in the classification process.
Keywords: pattern recognition, image processing, imbalanced classification, oversampling, classifier ensemble, breast cancer, nuclei segmentation
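To make the oversampling idea in the abstract concrete, the following is a minimal sketch of one common way to introduce artificial objects: interpolating between a minority sample and one of its nearest minority neighbours (a SMOTE-style scheme). The abstract does not say which oversampling methods are compared, so this choice is an assumption made for illustration. The function names `oversample_minority` and `balance`, the parameter `k`, and the two-dimensional toy data are likewise hypothetical; the paper itself works with 15 features.

```python
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Illustrative only; not taken from the paper."""
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance(majority, minority, k=3, seed=0):
    """Return the minority class padded with synthetic points so that it
    matches the majority class in size."""
    n_new = max(0, len(majority) - len(minority))
    return list(minority) + oversample_minority(minority, n_new, k=k, seed=seed)


# Hypothetical toy data: 6 majority cases, 3 minority cases, 2 features each.
majority = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.9], [0.8, 0.4], [0.3, 0.3], [0.6, 0.7]]
minority = [[3.0, 3.0], [3.5, 2.5], [2.5, 3.5]]

balanced = balance(majority, minority)
print(len(majority), len(balanced))
```

Because every synthetic point lies on a segment between two existing minority samples, the new objects stay inside the region the minority class already occupies. Other oversampling schemes place the artificial objects differently, for example by concentrating them near hard-to-learn samples.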