An instance and variable selection approach in pixel-based classification for automatic white blood cells segmentation

  • Nesma Settouti
  • Meryem Saidi
  • Mohammed El Amine Bechar
  • Mostafa El Habib Daho
  • Mohamed Amine Chikh
Short paper


Instance and variable selection involves identifying a subset of instances and variables so that the learning process uses only this subset, with better performance and lower cost. With the huge amount of data available in many fields, data reduction is considered an NP-hard problem. In this paper, we present a simultaneous instance and variable selection approach based on the Random Forest-RI ensemble method, with the aim of discarding noisy and useless information from the original data set. We propose a selection principle based on two concepts: the ensemble margin and the variable importance measure of Random Forest-RI. Experiments were conducted on cytological images for the automatic segmentation and recognition of white blood cells (WBC), covering both nucleus and cytoplasm. Moreover, to explore the performance of the proposed approach, experiments were carried out on standardized datasets from the UCI and ASU repositories, and the results obtained for instance and variable selection with the Random Forest classifier are very encouraging.
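The two concepts named above can be illustrated with a minimal sketch. It uses the classical supervised ensemble margin (votes for the true class minus the largest vote count for any other class, divided by the number of trees) and a plain top-k ranking of importance scores. The function names, the threshold rule, and the choice to keep instances at or above the threshold are assumptions made for illustration; the abstract does not specify the authors' exact selection rule, and the importance scores here are simply passed in rather than computed by a forest.

```python
from collections import Counter


def ensemble_margin(votes, true_label):
    """Supervised ensemble margin of one instance, in [-1, 1].

    votes: class labels predicted by each tree for this instance.
    Positive values mean the ensemble favours the true class.
    """
    counts = Counter(votes)
    v_true = counts.get(true_label, 0)
    # Largest vote count among the classes other than the true one.
    v_other = max((n for c, n in counts.items() if c != true_label), default=0)
    return (v_true - v_other) / len(votes)


def select_instances(all_votes, labels, threshold):
    """Indices of instances whose margin is >= threshold (assumed rule)."""
    return [
        i
        for i, (votes, y) in enumerate(zip(all_votes, labels))
        if ensemble_margin(votes, y) >= threshold
    ]


def select_variables(importances, k):
    """Indices of the k variables with the highest importance scores."""
    ranked = sorted(
        range(len(importances)), key=lambda j: importances[j], reverse=True
    )
    return sorted(ranked[:k])


# Hypothetical votes from a 4-tree forest on three pixels, all labelled "a".
all_votes = [["a", "a", "b", "a"], ["b", "b", "b", "a"], ["a", "b", "a", "b"]]
labels = ["a", "a", "a"]
kept_instances = select_instances(all_votes, labels, threshold=0.0)

# Hypothetical importance scores for four variables (e.g. colour channels).
kept_variables = select_variables([0.10, 0.50, 0.05, 0.35], k=2)
```

In this toy run the second pixel has a negative margin, because most trees vote against its label, so it is the one discarded. A real pipeline would obtain the votes and importance scores from a trained Random Forest.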


Instance and variable selection; Random Forest; Data reduction; Small target detection; Automatic segmentation; Pixel-based classification; White blood cells




Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2020

Authors and Affiliations

  1. Biomedical Engineering Laboratory GBM, University of Tlemcen, Tlemcen, Algeria
