Knowledge and Information Systems, Volume 32, Issue 2, pp 303–327

Decision trees for uplift modeling with single and multiple treatments

  • Piotr Rzepakowski
  • Szymon Jaroszewicz
Open Access
Regular Paper


Most classification approaches aim at achieving high prediction accuracy on a given dataset. However, in most practical cases, some action, such as mailing an offer or treating a patient, is to be taken on the classified objects, and we should model not the class probabilities themselves but the change in class probabilities caused by the action. The action should then be performed on those objects for which it will be most profitable. This problem is known as uplift modeling, differential response analysis, or true lift modeling, but it has received very little attention in the machine learning literature. An important extension of the problem involves several possible actions, where for each object the model must also decide which action should be used in order to maximize profit. In this paper, we present tree-based classifiers designed for uplift modeling in both single and multiple treatment cases. To this end, we design new splitting criteria and pruning methods. The experiments confirm the usefulness of the proposed approaches and show significant improvement over previous uplift modeling techniques.
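To make the quantity being modeled concrete, the following is a minimal sketch of the basic idea, not the tree-based method proposed in the paper. It assumes data from a randomized trial in which each object belongs to a segment, is assigned to a control group or to one of several treatments, and has a binary outcome. Uplift is estimated per segment as the difference between a treatment's response rate and the control response rate, and the action with the largest positive uplift is chosen. The function names, segment labels, treatment names, and toy counts are all hypothetical.

```python
from collections import defaultdict


def response_rates(records):
    """Return {(segment, group): response rate} from (segment, group, outcome) rows."""
    counts = defaultdict(lambda: [0, 0])  # [responders, total]
    for segment, group, outcome in records:
        counts[(segment, group)][0] += outcome
        counts[(segment, group)][1] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}


def best_action(records, control="control"):
    """For each segment, pick the treatment with the largest uplift over control.

    Returns {segment: (action, uplift)}. The action is None when no treatment
    beats control, i.e. the segment is best left untreated.
    """
    rates = response_rates(records)
    result = {}
    for segment in {seg for seg, _ in rates}:
        base = rates.get((segment, control))
        if base is None:
            continue  # no control observations: uplift cannot be estimated
        best = (None, 0.0)
        for (seg, group), rate in rates.items():
            uplift = rate - base
            if seg == segment and group != control and uplift > best[1]:
                best = (group, uplift)
        result[segment] = best
    return result


def rows(segment, group, responders, total):
    """Build toy rows: `responders` positive outcomes out of `total` (hypothetical data)."""
    return [(segment, group, 1 if i < responders else 0) for i in range(total)]


# Hypothetical trial with two treatments ("mail", "call") and a control group.
data = (
    rows("young", "control", 1, 4) + rows("young", "mail", 3, 4) + rows("young", "call", 2, 4)
    + rows("old", "control", 2, 4) + rows("old", "mail", 1, 4) + rows("old", "call", 2, 4)
)
choice = best_action(data)
```

In this toy data, mailing raises the response rate in the "young" segment, while no action improves on control in the "old" segment, so a plain response model that ranks by class probability alone could target the wrong group.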


Keywords: Uplift modeling, Decision trees, Randomized controlled trial, Information theory



This work was supported by Research Grant no. N N516 414938 of the Polish Ministry of Science and Higher Education (Ministerstwo Nauki i Szkolnictwa Wyższego) from research funds for the period 2010–2012.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.



Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. National Institute of Telecommunications, Warsaw, Poland
  2. Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland
  3. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
