Data Mining and Knowledge Discovery, Volume 17, Issue 2, pp 283–316

Pessimistic cost-sensitive active learning of decision trees for profit maximizing targeting campaigns

  • Lior Rokach
  • Lihi Naamani
  • Armin Shmilovici


Abstract

In business applications such as direct marketing, decision-makers must choose the action that maximizes a utility function. Cost-sensitive learning methods can help them achieve this goal. In this paper we introduce Pessimistic Active Learning (PAL). PAL employs a novel pessimistic measure, based on confidence intervals, to balance the exploration/exploitation trade-off. To acquire an initial sample of labeled data, PAL applies orthogonal arrays of fractional factorial design. PAL was tested on ten datasets using a decision tree inducer. A comparison of these results to those of other methods indicates PAL’s superiority.
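To illustrate the kind of pessimistic measure the abstract describes, the sketch below scores a customer segment by the *lower* bound of a normal-approximation confidence interval on its response rate, so thinly sampled segments are penalized until they are explored. This is only a minimal illustration of the confidence-interval idea; the function name, parameters, and the choice of interval are assumptions, not the paper's actual PAL measure.

```python
import math

def pessimistic_profit(successes, trials, profit_per_success,
                       cost_per_contact, z=1.96):
    """Lower confidence bound on expected profit per contact.

    Hypothetical 'pessimistic' score: take the lower end of a
    normal-approximation interval on the response rate, so segments
    with few observed trials (wide intervals) score lower than
    well-explored segments with the same point estimate.
    """
    if trials == 0:
        return -cost_per_contact  # no evidence yet: assume the worst
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    p_lower = max(0.0, p - half_width)
    return p_lower * profit_per_success - cost_per_contact
```

With a $50 profit per responder and a $2 contact cost, a segment with 30 responses in 100 contacts scores higher than one with 3 responses in 10 contacts, even though both have the same 30% point estimate; the wider interval of the small sample pulls its pessimistic score down.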


Keywords: Cost-sensitive learning · Reinforcement learning · Active learning · Direct marketing · Decision trees · Design of experiments
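As background on the design-of-experiments step mentioned in the abstract, the following is a minimal sketch (not the paper's construction) of a two-level half-fraction factorial design, one of the simplest orthogonal arrays: the first k−1 factors are enumerated fully and the last is set to their product, yielding 2^(k−1) runs whose columns are pairwise balanced. The function name and its use for spreading an initial labeled sample over the attribute space are illustrative assumptions.

```python
from itertools import product

def fractional_factorial_2level(k):
    """Half-fraction two-level design for k factors.

    Enumerates a full factorial on the first k-1 factors (levels
    -1/+1) and sets the k-th factor to their product (defining
    relation I = AB...K), producing 2**(k-1) runs that form an
    orthogonal array: every pair of columns is balanced.
    """
    runs = []
    for levels in product((-1, 1), repeat=k - 1):
        last = 1
        for v in levels:
            last *= v
        runs.append(levels + (last,))
    return runs
```

For k = 3 this gives the 4-run design {(−1,−1,+1), (−1,+1,−1), (+1,−1,−1), (+1,+1,+1)}: half of the full 2³ factorial, with every column summing to zero and every pair of columns orthogonal.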





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
  2. Deutsche Telekom Laboratories at Ben-Gurion University, Ben-Gurion University of the Negev, Beer-Sheva, Israel
