Parallel Perceptrons, Activation Margins and Imbalanced Training Set Pruning

  • Iván Cantador
  • José R. Dorronsoro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3523)


A natural way to deal with training samples in imbalanced class problems is to prune them, removing both redundant patterns, which are easy to classify and probably overrepresented, and label-noisy patterns, which belong to one class but are labelled as members of another. This allows classifier construction to focus on borderline patterns, which are likely to be the most informative ones. To define these subsets appropriately, in this work we use as base classifiers the so-called parallel perceptrons, a novel approach to committee machine training that, among other things, makes it possible to define margins for hidden unit activations in a natural way. We use these margins to define the above pattern types and to iteratively select subsamples of an initial training set; these selections enhance classification accuracy and allow for balanced classifier performance even when class sizes differ greatly.
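The pruning idea can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration: the committee is a list of linear perceptrons combined by majority vote, the margin of a unit on a pattern is taken to be the label times that unit's activation, and the threshold `gamma` together with the "all units agree" rule for the redundant and noisy categories are hypothetical choices, not the paper's method. The function names are likewise invented for this example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, int]  # (input pattern, label in {-1, +1})


def dot(w: Vector, x: Vector) -> float:
    """Activation of one perceptron: the inner product w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def committee_output(weights: List[Vector], x: Vector) -> int:
    """Majority vote over the signs of the unit activations."""
    votes = sum(1 if dot(w, x) >= 0 else -1 for w in weights)
    return 1 if votes >= 0 else -1


def pattern_type(weights: List[Vector], x: Vector, y: int, gamma: float) -> str:
    """Classify a pattern by its activation margins (assumed rule).

    redundant:  every unit is correct with margin at least gamma
    noisy:      every unit is wrong with margin at least gamma
    borderline: anything else
    """
    margins = [y * dot(w, x) for w in weights]
    if all(m >= gamma for m in margins):
        return "redundant"
    if all(m <= -gamma for m in margins):
        return "noisy"
    return "borderline"


def prune(weights: List[Vector], samples: List[Sample], gamma: float) -> List[Sample]:
    """One pruning pass: keep only the borderline patterns."""
    return [(x, y) for x, y in samples
            if pattern_type(weights, x, y, gamma) == "borderline"]


# Hypothetical committee of three units and a toy training set.
weights = [[1.0, 0.0], [0.8, 0.2], [1.0, -0.1]]
samples = [
    ([2.0, 0.0], 1),    # far on the correct side for every unit
    ([-2.0, 0.0], 1),   # far on the wrong side for every unit
    ([0.1, 0.5], 1),    # close to the decision boundaries
    ([-1.5, 0.0], -1),  # far on the correct side for every unit
]
kept = prune(weights, samples, gamma=0.5)
```

Here only the third pattern survives the pass. The sketch shows a single selection step; the iterative procedure mentioned in the abstract would presumably retrain the committee on the reduced set and repeat, which is not shown.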


Keywords (added by machine, not by the authors): Near Neighbor; Minority Class; Activation Margin; Negative Pattern; Positive Pattern





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Iván Cantador (1)
  • José R. Dorronsoro (1)
  1. Dpto. de Ingeniería Informática and Instituto de Ingeniería del Conocimiento, Universidad Autónoma de Madrid, Madrid, Spain
