A Novel Synthetic Over-Sampling Technique for Imbalanced Classification of Gene Expressions Using Autoencoders and Swarm Optimization
A new synthetic minority class over-sampling approach for binary (normal/cancer) classification of microarray gene expression data is proposed. The idea is to combine a previously trained autoencoder with the Particle Swarm Optimisation (PSO) algorithm to generate new synthetic examples of the minority class, thereby addressing the class imbalance problem. Experiments using two autoencoder representation sizes (500 and 30) and two base classifiers (Support Vector Machine and naïve Bayes) show that the proposed method generates discriminating representations and outperforms state-of-the-art methods such as the Synthetic Minority Class Over-sampling Technique (SMOTE) and the Density-Based Synthetic Minority Class Over-sampling Technique (DBSMOTE) in many test cases.
Keywords: Class imbalance, Cancer prediction, Autoencoders, Classification
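For orientation, the following is a minimal sketch of the interpolation idea behind the Synthetic Minority Class Over-sampling Technique (commonly abbreviated SMOTE), the baseline named in the abstract. It is not the proposed autoencoder and swarm-optimisation method, whose details the abstract does not give. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority samples.

    Hypothetical helper for illustration; not the paper's method.
    X_min: array of shape (n_samples, n_features), minority class only.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared Euclidean distances (fine for small n; memory grows
    # as n * n * n_features, so this is only a sketch for toy sizes).
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of each sample.
    neighbours = np.argsort(d2, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new row.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point at a random position on the segment
    # between the base sample and the chosen neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


if __name__ == "__main__":
    # Toy stand-in for minority-class expression profiles (assumed data).
    toy = np.random.default_rng(42).normal(size=(10, 4))
    synthetic = smote_sketch(toy, n_new=25, k=3)
    print(synthetic.shape)
```

Because every synthetic row is a convex combination of two existing minority rows, the generated points stay inside the bounding box of the observed minority samples. Methods that search for new examples in a learned representation space, as the abstract proposes, differ from this baseline in how and where the new points are generated.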