On the use of evolutionary feature selection for improving fuzzy rough set based prototype selection

Derrac, J.; Verbiest, N.; García, S.; Cornelis, C.; Herrera, F.

doi:10.1007/s00500-012-0888-3

Focus
Published: 20 July 2012

Soft Computing, volume 17, pages 223–238 (2013)

J. Derrac¹, N. Verbiest², S. García³, C. Cornelis¹,² & F. Herrera¹

Abstract

The k-nearest neighbors classifier is a widely used classification method that has proven to be very effective in supervised learning tasks. In this paper, a fuzzy rough set method for prototype selection, focused on optimizing the behavior of this classifier, is presented. The hybridization with an evolutionary feature selection method is considered to further improve its performance, obtaining a competent data reduction algorithm for the 1-nearest neighbors classifier. This hybridization is performed in the training phase, by using the solution of each preprocessing technique as the starting condition of the other one, within a cycle. The results of the experimental study, which have been contrasted through nonparametric statistical tests, show that the new hybrid approach obtains very promising results with respect to classification accuracy and reduction of the size of the training set.
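
The cycle described in the abstract, where each preprocessing step starts from the other's latest solution, can be sketched as follows. This is only an illustration under stated assumptions and not the authors' algorithm: the abstract does not give the fuzzy rough set measure or the evolutionary operators, so the sketch substitutes an edited-nearest-neighbour style filter for the prototype selection step and random bit-flip hill climbing for the evolutionary feature selection step. The names `hybrid_cycle`, `select_prototypes`, `select_features` and the parameters `cycles` and `trials` are hypothetical.

```python
import random

def dist(a, b, mask):
    # Squared Euclidean distance restricted to the features switched on in mask.
    return sum((p - q) ** 2 for p, q, m in zip(a, b, mask) if m)

def nn_label(x, protos, X, y, mask, skip=None):
    # Label of the nearest prototype (1-NN), optionally ignoring one index.
    best = min((i for i in protos if i != skip),
               key=lambda i: dist(x, X[i], mask))
    return y[best]

def loo_accuracy(protos, X, y, mask):
    # Leave-one-out 1-NN accuracy on the full training set, using only protos.
    hits = sum(nn_label(X[i], protos, X, y, mask, skip=i) == y[i]
               for i in range(len(X)))
    return hits / len(X)

def select_prototypes(X, y, mask):
    # STAND-IN for fuzzy rough prototype selection (assumption): keep an
    # instance only if its nearest other instance has the same class.
    everyone = list(range(len(X)))
    kept = [i for i in everyone
            if nn_label(X[i], everyone, X, y, mask, skip=i) == y[i]]
    # Fall back to the full set if the filter removes almost everything.
    return kept if len(kept) >= 2 else everyone

def select_features(X, y, protos, mask, rng, trials=20):
    # STAND-IN for evolutionary feature selection (assumption): hill climbing
    # by single bit flips, started from the mask found in the previous cycle.
    best, best_acc = list(mask), loo_accuracy(protos, X, y, mask)
    for _ in range(trials):
        cand = list(best)
        j = rng.randrange(len(cand))
        cand[j] = 1 - cand[j]
        if not any(cand):
            continue  # never evaluate an empty feature subset
        acc = loo_accuracy(protos, X, y, cand)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best

def hybrid_cycle(X, y, cycles=3, seed=0):
    # Alternate the two steps; each one starts from the other's latest result.
    rng = random.Random(seed)
    mask = [1] * len(X[0])
    protos = list(range(len(X)))
    for _ in range(cycles):
        protos = select_prototypes(X, y, mask)
        mask = select_features(X, y, protos, mask, rng)
    return protos, mask
```

Calling `hybrid_cycle(X, y)` on a list of numeric feature vectors `X` and a list of labels `y` returns the indices of the retained prototypes and a 0/1 feature mask, which together define the reduced training set handed to the 1-nearest neighbor classifier.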

Notes

http://sci2s.ugr.es/pr/.
http://www.keel.es/datasets.php.
http://sci2s.ugr.es/sicidm/.
The experiments have been carried out on a machine with a Dual Core 3.20 GHz processor and 2 GB of RAM, running the Fedora 4 operating system.

Author information

Authors and Affiliations

Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, 18071, Granada, Spain
J. Derrac, C. Cornelis & F. Herrera
Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium
N. Verbiest & C. Cornelis
Department of Computer Science, University of Jaén, 23071, Jaén, Spain
S. García

Corresponding author

Correspondence to J. Derrac.

About this article

Cite this article

Derrac, J., Verbiest, N., García, S. et al. On the use of evolutionary feature selection for improving fuzzy rough set based prototype selection. Soft Comput 17, 223–238 (2013). https://doi.org/10.1007/s00500-012-0888-3

Issue Date: February 2013
