Abstract
In binary classification problems with unbalanced datasets, detecting rare patterns is difficult because several interacting factors degrade the performance of standard classifiers. This paper presents a novel approach to the problem. The method aims to overcome the shortcomings of standard methods, and of some systems developed expressly for this problem, through a dynamic resampling technique, which uses a feed-forward neural network to resample the training dataset and counterbalance its natural distribution. The method has been tested on datasets from the literature and from industry; the encouraging results are presented and discussed in the paper.
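To make the idea of counterbalancing a dataset's natural distribution concrete, the following is a minimal sketch of plain random oversampling. It is not the dynamic, neural-network-driven method of the paper, whose details the abstract does not give. The function name `resample_to_fraction` and its parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def resample_to_fraction(samples, labels, minority_label, target_fraction, seed=0):
    """Oversample the minority class (with replacement) until it makes up
    at least `target_fraction` of the returned training set.

    Illustrative assumption only: a static resampling step, not the
    dynamic technique described in the paper.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")

    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no samples carry the minority label")

    # Solve n_min / (n_min + n_maj) >= target_fraction for n_min.
    needed = math.ceil(target_fraction * len(majority) / (1.0 - target_fraction))
    extra = max(0, needed - len(minority))

    # Keep every original sample; add randomly repeated minority samples.
    chosen = minority + majority + [rng.choice(minority) for _ in range(extra)]
    rng.shuffle(chosen)
    return [samples[i] for i in chosen], [labels[i] for i in chosen]
```

For example, a set with 5 rare and 95 frequent patterns resampled with `target_fraction=0.5` comes back with both classes equally represented. A dynamic scheme would presumably revise the resampling during training rather than fix it once, but that step is not specified in the abstract and is left out here.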
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vannucci, M., Colla, V., Vannocci, M., Reyneri, L.M. (2012). Dynamic Resampling Method for Classification of Sensitive Problems and Uneven Datasets. In: Greco, S., Bouchon-Meunier, B., Coletti, G., Fedrizzi, M., Matarazzo, B., Yager, R.R. (eds) Advances in Computational Intelligence. IPMU 2012. Communications in Computer and Information Science, vol 298. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31715-6_10
DOI: https://doi.org/10.1007/978-3-642-31715-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31714-9
Online ISBN: 978-3-642-31715-6
eBook Packages: Computer Science (R0)