Abstract
A challenging issue in bioinformatics is that microarray datasets are imbalanced by nature: the majority class dominates the minority class, making it difficult for conventional classifiers to achieve accurate and useful predictions. Studies that have addressed this issue, however, focus mainly on binary-class problems. In this article, an ensemble framework is proposed for the multiclass imbalance classification problem in microarray datasets; it combines the meta-learning algorithm DECORATE with a sampling technique. The meta-learning algorithm builds diverse ensembles of classifiers by constructing artificial samples, and the sampling technique introduces bias toward a uniform class distribution to reduce misclassification error. Experimental results on two highly imbalanced multiclass microarray cancer datasets indicate that the proposed technique provides a significant improvement over other conventional ensembles.
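The sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which sampling technique is used, so the code below assumes plain random oversampling with replacement, which raises every class to the size of the largest one. The function name `resample_uniform` and the toy data are hypothetical and are not taken from the chapter.

```python
import random
from collections import Counter, defaultdict


def resample_uniform(samples, labels, seed=0):
    """Oversample each class with replacement up to the largest class size.

    Assumption: simple random oversampling; the chapter's actual
    sampling technique is not specified in the abstract.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is brought up to the size of the majority class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep all original samples, then draw random duplicates
        # until the class reaches the target size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy three-class dataset with a 6:2:1 imbalance (hypothetical values).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [1.5], [1.6], [2.9]]
y = ["A"] * 6 + ["B"] * 2 + ["C"]

Xb, yb = resample_uniform(X, y)
print(Counter(yb))
```

In a pipeline of the kind the abstract outlines, the balanced set would then be passed to the ensemble learner. Resampling should be applied to the training folds only, so that duplicated minority samples do not leak into the evaluation data.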
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Dash, S. (2016). A Diverse Meta Learning Ensemble Technique to Handle Imbalanced Microarray Dataset. In: Pillay, N., Engelbrecht, A., Abraham, A., du Plessis, M., Snášel, V., Muda, A. (eds) Advances in Nature and Biologically Inspired Computing. Advances in Intelligent Systems and Computing, vol 419. Springer, Cham. https://doi.org/10.1007/978-3-319-27400-3_1
DOI: https://doi.org/10.1007/978-3-319-27400-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27399-0
Online ISBN: 978-3-319-27400-3
eBook Packages: Computer Science (R0)