
Instance reduction for one-class classification

  • Bartosz Krawczyk
  • Isaac Triguero
  • Salvador García
  • Michał Woźniak
  • Francisco Herrera
Regular Paper

Abstract

Instance reduction techniques are data preprocessing methods originally developed to enhance the nearest neighbor rule for standard classification. They reduce the training data by selecting or generating representative examples of a given problem. These algorithms have been designed and widely analyzed for multi-class problems, where they provide very competitive results. However, this issue has rarely been addressed in the context of one-class classification. In this specific domain, reducing the training set may not only decrease the classification time and the classifier's complexity, but also allow us to handle internal noisy data and simplify the data description boundary. We propose two methods for achieving this goal. The first is a flexible framework that adapts any instance reduction method to the one-class scenario by introducing meaningful artificial outliers. The second is a novel modification of an evolutionary instance reduction technique, based on differential evolution, that uses a consistency measure for model evaluation in filter or wrapper modes. It is a powerful native one-class solution that does not require access to counterexamples. Both of the proposed algorithms can be applied to any type of one-class classifier. On the basis of extensive computational experiments, we show that the proposed methods are highly efficient techniques for reducing complexity and improving classification performance in one-class scenarios.
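
For illustration, a minimal sketch of the first idea in Python, under stated assumptions: artificial outliers are sampled uniformly from a slightly enlarged bounding box of the target class, and a hypothetical reduce_instances(X, y) stands in for any off-the-shelf multi-class instance reduction method; only the retained target instances are kept.

    import numpy as np

    def generate_artificial_outliers(X_target, n_outliers, margin=0.1, rng=None):
        """Sample outliers uniformly from an enlarged bounding box of the target class."""
        rng = np.random.default_rng(rng)
        lo, hi = X_target.min(axis=0), X_target.max(axis=0)
        span = hi - lo
        return rng.uniform(lo - margin * span, hi + margin * span,
                           size=(n_outliers, X_target.shape[1]))

    def one_class_reduction(X_target, reduce_instances, n_outliers=None, rng=None):
        """Adapt a binary instance reduction method to one-class data.

        reduce_instances(X, y) is a placeholder for any reduction method
        that returns the indices of the retained instances.
        """
        if n_outliers is None:
            n_outliers = len(X_target)
        X_out = generate_artificial_outliers(X_target, n_outliers, rng=rng)
        X = np.vstack([X_target, X_out])
        y = np.concatenate([np.ones(len(X_target)), np.zeros(n_outliers)])
        kept = np.asarray(reduce_instances(X, y))
        # Discard the artificial outliers: only reduced target data is returned.
        return X[kept[kept < len(X_target)]]

The second idea can be sketched in a similarly hedged fashion (this is not the authors' exact algorithm): differential evolution searches over a real-valued selection mask, and the fitness combines a consistency term (the fraction of original target objects still accepted by a one-class classifier trained on the reduced set, a wrapper-style evaluation) with the reduction rate. The 0.5 decoding threshold, the weight alpha, and the one-class SVM are illustrative choices.

    import numpy as np
    from scipy.optimize import differential_evolution
    from sklearn.svm import OneClassSVM

    def select_instances_de(X_target, alpha=0.9, threshold=0.5, seed=0):
        n = len(X_target)

        def fitness(w):
            mask = w > threshold
            if mask.sum() < 2:            # degenerate selection: worst score
                return 1.0
            clf = OneClassSVM(nu=0.1).fit(X_target[mask])
            consistency = np.mean(clf.predict(X_target) == 1)
            reduction = 1.0 - mask.mean()
            # Minimize the negative of the weighted objective.
            return -(alpha * consistency + (1 - alpha) * reduction)

        result = differential_evolution(fitness, bounds=[(0.0, 1.0)] * n,
                                        maxiter=30, popsize=10, seed=seed)
        return X_target[result.x > threshold]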

Keywords

Machine learning · One-class classification · Instance reduction · Training set selection · Evolutionary computing

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Virginia Commonwealth University, Richmond, USA
  2. School of Computer Science, Automated Scheduling, Optimisation and Planning (ASAP) Group, University of Nottingham, Nottingham, UK
  3. Department of Computer Science and Artificial Intelligence, CITIC-UGR, University of Granada, Granada, Spain
  4. Department of Systems and Computer Networks, Wrocław University of Technology, Wrocław, Poland
  5. Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia