Sampling Strategies for Targeting Rare Groups from a Bank Customer Database

  • J-H. Chauchat
  • R. Rakotomalala
  • D. Robert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1910)


This paper presents several balanced sampling strategies for building decision trees that target rare groups. A new coefficient for comparing the targeting performance of different learning strategies is introduced. A real-life application, targeting a specific group of bank customers for marketing actions, is described. Results show that local sampling on the nodes while constructing the tree requires only small samples to match the performance obtained by processing the complete database, with dramatically reduced computing times.
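As a rough illustration of what a balanced sample means for a rare target group, the sketch below draws the same number of records from each class so that the rare group is not swamped by the majority. It is only a minimal sketch under stated assumptions, not the authors' procedure: the function `balanced_sample`, the `per_class` parameter, and the toy customer records are all hypothetical, and the local per-node sampling mentioned in the abstract is not reproduced here.

```python
import random
from collections import defaultdict


def balanced_sample(records, label_of, per_class, seed=0):
    """Draw up to `per_class` records from each class, without replacement.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group the records by class label.
    by_class = defaultdict(list)
    for rec in records:
        by_class[label_of(rec)].append(rec)

    # Take the same number of records from every class (or all of a class
    # if it holds fewer than `per_class` records).
    sample = []
    for label in sorted(by_class):
        group = by_class[label]
        k = min(per_class, len(group))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical customer base: 1,000 records, 2% of them in the rare target group.
customers = [{"id": i, "target": i % 50 == 0} for i in range(1000)]

sample = balanced_sample(customers, lambda r: r["target"], per_class=20)
n_target = sum(r["target"] for r in sample)
```

With these toy figures the sample holds as many target as non-target customers, whereas a simple random sample of the same size would be expected to contain less than one target customer on average.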


sampling, customer targeting, targeting quality coefficient, imbalanced database, decision tree, application



Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • J-H. Chauchat (1)
  • R. Rakotomalala (1)
  • D. Robert (2)
  1. ERIC Laboratory, University of Lyon 2, Bron, France
  2. Crédit Agricole Centre-Est, Champagne aux Monts d’Or, France
