Dynamic Resampling Method for Classification of Sensitive Problems and Uneven Datasets

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 298)

Abstract

In binary classification problems with unbalanced datasets, detecting rare patterns is difficult because several interacting factors degrade the performance of standard classifiers. This paper presents a novel approach to the problem. The proposed method aims to overcome the critical issues encountered both by standard methods and by some systems developed expressly for this problem. It does so through a dynamic resampling technique which suitably resamples the training dataset, by means of a feed-forward neural network, to counterbalance the natural distribution of the dataset. The method has been tested on literature and industrial datasets; the encouraging results achieved are presented and discussed in the paper.
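To make the idea of rebalancing a training set concrete, the following is a minimal sketch of generic random resampling that is redrawn at every epoch. It is not the authors' algorithm: the abstract does not say how samples are selected or how the neural network is involved, so the function name `resample_training_set`, the `target_fraction` parameter, and the toy labels are all assumptions introduced here for illustration.

```python
import random


def resample_training_set(labels, minority_label=1, target_fraction=0.5, rng=None):
    """Return indices of a rebalanced training subset (hypothetical helper).

    Keeps every minority sample and draws a fresh random subset of the
    majority samples so that minority samples make up roughly
    `target_fraction` of the result.
    """
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    rng = rng or random.Random()
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority samples to rebalance around")
    # Number of majority samples needed to reach the target fraction,
    # capped at the number of majority samples actually available.
    n_major = round(len(minority) * (1 - target_fraction) / target_fraction)
    n_major = min(n_major, len(majority))
    chosen = minority + rng.sample(majority, n_major)
    rng.shuffle(chosen)
    return chosen


# Toy dataset (assumed): 5 rare patterns (label 1) among 95 frequent ones (label 0).
labels = [1] * 5 + [0] * 95
rng = random.Random(0)

# "Dynamic" in this sketch only means that a new subset is drawn for each
# training epoch, so the classifier sees different majority samples over time.
epoch_subsets = [resample_training_set(labels, rng=rng) for _ in range(3)]
```

Each entry of `epoch_subsets` is a list of indices that could be used to select the training patterns for one epoch. Undersampling the majority class is only one option; oversampling the minority class or generating synthetic samples are alternatives with different trade-offs.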



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vannucci, M., Colla, V., Vannocci, M., Reyneri, L.M. (2012). Dynamic Resampling Method for Classification of Sensitive Problems and Uneven Datasets. In: Greco, S., Bouchon-Meunier, B., Coletti, G., Fedrizzi, M., Matarazzo, B., Yager, R.R. (eds) Advances in Computational Intelligence. IPMU 2012. Communications in Computer and Information Science, vol 298. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31715-6_10

  • DOI: https://doi.org/10.1007/978-3-642-31715-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31714-9

  • Online ISBN: 978-3-642-31715-6

  • eBook Packages: Computer Science, Computer Science (R0)
