
Joint feature and instance selection using manifold data criteria: application to image classification

Published in: Artificial Intelligence Review

Abstract

In many pattern recognition applications, feature selection and instance selection are two data preprocessing methods that aim to reduce the computational cost of the learning process; in some cases, feature subset selection can also improve classification performance. Both are of interest because the choice of features and instances strongly influences the performance of the learnt models as well as their training cost. In the past, the two problems were unified by solving a global optimization problem with meta-heuristics, a paradigm that neither exploits the manifold structure of the data nor scales well computationally. To the best of our knowledge, the joint use of sparse modeling representative selection and feature subset relevance has not been exploited by existing joint feature and instance selection methods. In this paper, we target joint feature and instance selection by adopting feature subset relevance and sparse modeling representative selection. More precisely, we propose three schemes for joint feature and instance selection: the first is a wrapper technique, while the two remaining ones are filter approaches. In the filter approaches, the search process adopts a genetic algorithm in which the evaluation is mainly given by a score that quantifies the goodness of the features and instances. An efficient instance selection technique is integrated into the search process in order to adapt the instances to the candidate feature subset. We evaluate the performance of the proposed schemes on image classification, using the nearest neighbor and support vector machine classifiers on five public image datasets. These experiments show the superiority of the proposed schemes over various baselines. The results confirm that the filter approaches lead to promising improvements in classification accuracy when both feature selection and instance selection are adopted.
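As a rough illustration of the filter scheme summarized above, the sketch below runs a genetic search over binary feature masks and, for each candidate mask, re-selects the instances that best fit that subspace before scoring the pair. The margin-based goodness score (nearest-miss distance minus nearest-hit distance) and the greedy instance-selection step are illustrative stand-ins of our own; they are not the paper's manifold criteria or its sparse modeling representative selection.

```python
import random

def margin(X, y, feats, i, pool):
    """Nearest-miss minus nearest-hit distance for instance i within pool,
    using only the features in feats (an illustrative goodness measure)."""
    def d(a, b):
        return sum((X[a][j] - X[b][j]) ** 2 for j in feats) ** 0.5
    hits = [d(i, k) for k in pool if k != i and y[k] == y[i]]
    misses = [d(i, k) for k in pool if y[k] != y[i]]
    if not hits or not misses:
        return float("-inf")
    return min(misses) - min(hits)

def ga_joint_selection(X, y, n_inst, pop_size=16, gens=25, seed=0):
    """Genetic search over feature masks; each candidate mask triggers an
    instance-selection step so the instances adapt to the feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    everyone = list(range(n))

    def fitness(mask):
        feats = [j for j in range(d) if mask[j]]
        if not feats:
            return float("-inf"), []
        # instance selection: keep the n_inst widest-margin instances
        ranked = sorted(everyone, reverse=True,
                        key=lambda i: margin(X, y, feats, i, everyone))
        inst = ranked[:n_inst]
        # filter score of the (features, instances) pair
        score = sum(margin(X, y, feats, i, inst) for i in inst) / len(inst)
        return score, inst

    popu = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    best = max(popu, key=lambda m: fitness(m)[0])
    for _ in range(gens):
        scored = sorted(popu, key=lambda m: fitness(m)[0], reverse=True)
        parents = scored[: pop_size // 2]          # elitist truncation
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, d)              # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                 # bit-flip mutation
                child[rng.randrange(d)] ^= 1
            children.append(child)
        popu = parents + children
        cand = max(popu, key=lambda m: fitness(m)[0])
        if fitness(cand)[0] > fitness(best)[0]:
            best = cand
    score, inst = fitness(best)
    return best, inst, score
```

The wrapper variant would replace the margin-based score with the accuracy of the target classifier trained on the selected instances and features.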



Notes

  1. https://www.sheffield.ac.uk/eee/research/iel/research/face.

  2. www.cs.columbia.edu/CAVE/software/softlib/coil-20.php.

  3. http://www.cs.nyu.edu/~roweis/data.html.

  4. https://archive.ics.uci.edu/ml/datasets/Image+Segmentation.

  5. http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html.

  6. https://cvml.ist.ac.at/AwA2/.

  7. http://www.briancbecker.com/blog/research/pubfig83-lfw-dataset/.

  8. http://yann.lecun.com/exdb/mnist/.

  9. https://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  10. http://www.ccs.neu.edu/home/eelhami/codes.htm.

  11. https://www.mathworks.com/matlabcentral/fileexchange/68210-feature-selection-library.


Author information

Corresponding author

Correspondence to Fadi Dornaika.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Dornaika, F. Joint feature and instance selection using manifold data criteria: application to image classification. Artif Intell Rev 54, 1735–1765 (2021). https://doi.org/10.1007/s10462-020-09889-4

