SMOTE-D a Deterministic Version of SMOTE
Class imbalance is a problem of current research interest. It arises when the number of objects in one class is much lower than in the other classes. To address it, several methods for oversampling the minority class have been proposed. Oversampling methods generate synthetic objects for the minority class in order to balance the number of objects between classes; among them, SMOTE is one of the most successful and well known. In this paper, we introduce a modification of SMOTE that generates synthetic objects for the minority class deterministically. Our proposed method eliminates the random component of SMOTE and generates a different number of synthetic objects for each object of the minority class. We provide an experimental comparison of the proposed method against SMOTE on standard imbalanced datasets. The experimental results show that our proposed method improves on SMOTE in terms of F-measure.
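To make the contrast concrete, the following is a minimal sketch, not the method from the paper. `smote_random` follows the widely known SMOTE step (pick a minority object, pick one of its k nearest minority neighbors, interpolate at a random fraction). `oversample_deterministic` is a hypothetical stand-in that only illustrates what removing the randomness can look like: it places evenly spaced points on the segments to all k neighbors, with a caller-supplied per-object count. The function names, the even spacing, and the `counts` parameter are assumptions; the abstract does not state how SMOTE-D chooses positions or per-object amounts.

```python
import math
import random


def nearest_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean)."""
    return sorted(others, key=lambda q: math.dist(point, q))[:k]


def interpolate(p, q, t):
    """Point at fraction t along the segment from p to q."""
    return tuple(a + t * (b - a) for a, b in zip(p, q))


def smote_random(minority, k, n_new, rng):
    """SMOTE-style step: random base object, random neighbor, random gap."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        p = minority[i]
        others = minority[:i] + minority[i + 1:]
        q = rng.choice(nearest_neighbors(p, others, k))
        synthetic.append(interpolate(p, q, rng.random()))
    return synthetic


def oversample_deterministic(minority, k, counts):
    """Hypothetical deterministic variant (an assumption, not the paper's rule).

    For object i, place counts[i] evenly spaced points on each segment
    joining it to each of its k nearest minority neighbors.
    """
    synthetic = []
    for i, p in enumerate(minority):
        others = minority[:i] + minority[i + 1:]
        for q in nearest_neighbors(p, others, k):
            for j in range(1, counts[i] + 1):
                synthetic.append(interpolate(p, q, j / (counts[i] + 1)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
counts = [1, 2, 1, 3]  # illustrative per-object amounts, chosen by hand

random_points = smote_random(minority, k=2, n_new=5, rng=random.Random(0))
first_run = oversample_deterministic(minority, k=2, counts=counts)
second_run = oversample_deterministic(minority, k=2, counts=counts)
```

In this sketch, two calls to `oversample_deterministic` on the same data return identical lists, whereas `smote_random` depends on the seed of the generator passed in.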
Keywords: Imbalanced datasets, Oversampling, Supervised classification
This work was partly supported by the National Council of Science and Technology of Mexico under scholarship grant 627301.