Abstract
In the dissimilarity representation approach, dimension reduction of the dissimilarity space is addressed by using instance selection methods. Several studies have shown that these methods work well on small data sets. Moreover, a uniform distribution of instances can be obtained when the classes are evenly spread and balanced. However, many real-world problems are characterized by an imbalanced class distribution. In this paper, we address the problem of instance selection for constructing the dissimilarity space in the imbalanced data context. Class imbalance is handled by resampling the data set, whereas instance selection is applied to find a small representation set. Experimental results demonstrate the significance of the joint use of resampling techniques and instance selection methods for improving the performance of classifiers trained on dissimilarity representations.
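The pipeline described in the abstract (resample to balance the classes, select a small representation set, then encode every object by its distances to that set) can be sketched as follows. This is a minimal illustrative toy, not the paper's actual method: the resampling step here is plain random undersampling, instance selection is random prototype picking, and the classifier is 1-NN in the dissimilarity space; all data, names, and parameters are invented for the example.

```python
import random
from math import dist

random.seed(0)

# Toy imbalanced two-class data: 40 majority points near (0, 0),
# 8 minority points near (6, 6). Purely illustrative.
majority = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(40)]
minority = [((random.gauss(6, 1), random.gauss(6, 1)), 1) for _ in range(8)]

# Step 1: handle class imbalance by resampling -- here, random
# undersampling of the majority class down to the minority size.
balanced = random.sample(majority, len(minority)) + minority

# Step 2: instance selection -- pick a small representation set R
# (random choice here; the paper studies dedicated selection methods).
prototypes = [x for x, _ in random.sample(balanced, 6)]

# Step 3: dissimilarity representation -- map each object to the
# vector of its Euclidean distances to the prototypes in R.
def to_dissim(x):
    return [dist(x, p) for p in prototypes]

train = [(to_dissim(x), y) for x, y in balanced]

# A 1-NN classifier operating in the dissimilarity space.
def predict(x):
    vec = to_dissim(x)
    return min(train, key=lambda t: dist(vec, t[0]))[1]

print(predict((6.0, 6.0)), predict((0.0, 0.0)))
```

The key point the sketch shows is that the classifier never sees the original feature space: objects live in a space whose dimensionality equals the size of the representation set, so instance selection directly controls that dimensionality, while the resampling step determines which class regions the prototypes can come from.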
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Millán-Giraldo, M., García, V., Sánchez, J.S. (2013). Instance Selection Methods and Resampling Techniques for Dissimilarity Representation with Imbalanced Data Sets. In: Latorre Carmona, P., Sánchez, J., Fred, A. (eds) Pattern Recognition - Applications and Methods. Advances in Intelligent Systems and Computing, vol 204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36530-0_12
Print ISBN: 978-3-642-36529-4
Online ISBN: 978-3-642-36530-0
eBook Packages: Engineering (R0)