Introduction to Proactive Data Mining

Proactive Data Mining with Decision Trees

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSELECTRIC)

Abstract

This chapter introduces the aspects of data mining that are relevant to this book. In particular, we focus on classification tasks and on decision trees as an algorithmic approach to solving them.

Author information

Correspondence to Lior Rokach.

Copyright information

© 2014 The Author(s)

About this chapter

Cite this chapter

Dahan, H., Cohen, S., Rokach, L., Maimon, O. (2014). Introduction to Proactive Data Mining. In: Proactive Data Mining with Decision Trees. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0539-3_1

  • DOI: https://doi.org/10.1007/978-1-4939-0539-3_1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-0538-6

  • Online ISBN: 978-1-4939-0539-3

  • eBook Packages: Computer Science, Computer Science (R0)
