A weighted ensemble-based active learning model to label microarray data

  • Original Article
Medical & Biological Engineering & Computing

Abstract

Classification of cancerous genes from microarray data is an important research area in bioinformatics. Large amounts of microarray data are available, but labeling them is very costly. This paper proposes an active learning model, a semi-supervised classification approach, for labeling microarray data, so that predictions can be made even with a small amount of labeled data. Initially, a pool of unlabeled instances is given, from which some instances are randomly chosen for labeling. Subsequent instances to be labeled are then chosen from the unlabeled pool by selection algorithms. The proposed method follows an ensemble approach: it combines the decisions of three classifiers to arrive at a consensus that predicts the class label more accurately, while ensuring that each individual classifier learns in an uncorrelated manner. The method thus combines the heuristic techniques that an active learning algorithm uses to choose training samples with the multiple-learner paradigm of an ensemble, optimizing the search space by choosing efficiently from an already sparse learning pool. Evaluated on 10 microarray datasets, the proposed method achieves performance comparable with state-of-the-art methods. The code and datasets are available at https://github.com/anuran-Chakraborty/Active-learning.

Flowchart of the proposed ensemble-based active learning framework
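
The loop described in the abstract (random seed labels, then repeated selection from the unlabeled pool guided by a weighted three-member ensemble) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the classifiers, the weighting scheme, or the selection algorithms, so everything concrete here is an assumption made for illustration: the members are k-nearest-neighbour voters with k = 1, 3, 5, the weights are leave-one-out accuracies on the labeled set, the selection rule queries the instance whose ensemble probability is closest to 0.5, and the data are synthetic 2-D points rather than microarray profiles. All function names are hypothetical.

```python
import random

def knn_proba(labeled, x, k):
    """Fraction of the k nearest labeled points whose label is 1."""
    nearest = sorted(
        labeled,
        key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
    )[:k]
    if not nearest:          # no labeled data yet: stay undecided
        return 0.5
    return sum(y for _, y in nearest) / len(nearest)

def loo_accuracy(labeled, k):
    """Leave-one-out accuracy of one k-NN member (assumed weighting scheme)."""
    hits = 0
    for i, (x, y) in enumerate(labeled):
        rest = labeled[:i] + labeled[i + 1:]
        hits += int((knn_proba(rest, x, k) >= 0.5) == (y == 1))
    return hits / len(labeled)

def ensemble_proba(labeled, x, ks, weights):
    """Weighted average of the members' estimates of P(class 1)."""
    total = sum(weights)
    return sum(w * knn_proba(labeled, x, k) for k, w in zip(ks, weights)) / total

def active_learn(pool, oracle, ks=(1, 3, 5), n_seed=6, n_queries=10, seed=0):
    """Pool-based active learning with a weighted three-member ensemble."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Seed set: randomly chosen instances are sent to the oracle for labels.
    labeled = [(x, oracle(x)) for x in unlabeled[:n_seed]]
    unlabeled = unlabeled[n_seed:]
    for _ in range(n_queries):
        if not unlabeled:
            break
        # Re-weight members on the current labeled set (tiny offset avoids 0 total).
        weights = [loo_accuracy(labeled, k) + 1e-6 for k in ks]
        # Query the instance the ensemble is least sure about.
        x = min(
            unlabeled,
            key=lambda u: abs(ensemble_proba(labeled, u, ks, weights) - 0.5),
        )
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))
    return labeled, unlabeled

def make_pool(n=60, seed=1):
    """Synthetic two-class pool: class y is centred at (3y, 3y)."""
    rng = random.Random(seed)
    pool, truth = [], {}
    for i in range(n):
        y = i % 2
        x = (rng.gauss(3.0 * y, 1.0), rng.gauss(3.0 * y, 1.0))
        pool.append(x)
        truth[x] = y
    return pool, truth

pool, truth = make_pool()
labeled, unlabeled = active_learn(pool, oracle=truth.__getitem__)
print(len(labeled), len(unlabeled))  # 16 44
```

Here the oracle is a dictionary lookup standing in for an expert annotator. In a real setting each oracle call is the expensive labeling step that active learning tries to minimise, which is why the loop spends computation on choosing the query rather than labeling at random.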



Author information

Corresponding author

Correspondence to Anuran Chakraborty.

About this article

Cite this article

De, R., Chakraborty, A., Chatterjee, A. et al. A weighted ensemble-based active learning model to label microarray data. Med Biol Eng Comput 58, 2427–2441 (2020). https://doi.org/10.1007/s11517-020-02238-1

  • DOI: https://doi.org/10.1007/s11517-020-02238-1
