A Diverse Meta Learning Ensemble Technique to Handle Imbalanced Microarray Dataset

  • Conference paper
Advances in Nature and Biologically Inspired Computing

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 419))

Abstract

A challenging issue in bioinformatics is that microarray datasets are imbalanced by nature: the majority class dominates the minority class, making it difficult for conventional classifiers to achieve accurate and useful predictions. Some studies have addressed this issue, but only for binary-class problems. In this article, an ensemble framework is proposed for the multiclass imbalance classification problem; it combines the meta-learning algorithm ‘decorate’ with a sampling technique to handle class imbalance in microarray datasets. The meta-learning algorithm builds diverse ensembles of classifiers by constructing artificial samples, and the sampling technique introduces bias toward a uniform class distribution to reduce misclassification error. Experimental results on two highly imbalanced multiclass microarray cancer datasets indicate that the proposed technique provides significant improvement over conventional ensembles.
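
As an illustration of the sampling step only, the following is a minimal Python sketch of random oversampling toward a uniform class distribution. The abstract does not say which sampling technique the paper uses, so random oversampling, the function name `resample_uniform`, and the toy data are all assumptions made for illustration, not the author's method.

```python
import random
from collections import Counter, defaultdict


def resample_uniform(samples, labels, seed=0):
    """Randomly oversample each class up to the size of the largest class.

    Hypothetical helper: illustrates one common way to reach a uniform
    class distribution; the paper's actual sampling method may differ.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_x, out_y = [], []
    for y, members in by_class.items():
        # Keep all originals, then draw duplicates with replacement.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for x in members + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy three-class data (assumed): 6 samples of A, 2 of B, 1 of C.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [1.5], [1.6], [2.9]]
y = ["A"] * 6 + ["B"] * 2 + ["C"]

X_balanced, y_balanced = resample_uniform(X, y)
print(Counter(y_balanced))
```

After resampling, every class has as many samples as the largest class. In an ensemble setting, a balanced set like this would be what each base classifier is trained on; duplicating minority samples adds no new information, which is one reason to pair resampling with a diversity-oriented ensemble method.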

Author information

Corresponding author

Correspondence to Sujata Dash.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Dash, S. (2016). A Diverse Meta Learning Ensemble Technique to Handle Imbalanced Microarray Dataset. In: Pillay, N., Engelbrecht, A., Abraham, A., du Plessis, M., Snášel, V., Muda, A. (eds) Advances in Nature and Biologically Inspired Computing. Advances in Intelligent Systems and Computing, vol 419. Springer, Cham. https://doi.org/10.1007/978-3-319-27400-3_1

  • DOI: https://doi.org/10.1007/978-3-319-27400-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27399-0

  • Online ISBN: 978-3-319-27400-3

  • eBook Packages: Computer Science (R0)
