
The Bump Hunting by the Decision Tree with the Genetic Algorithm

Advances in Computational Algorithms and Data Analysis

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 14)

In difficult classification problems, where z-dimensional points must be assigned to two groups with 0–1 responses and the data structure is messy, it is more favorable to search for the denser regions of response-1 points than to look for boundaries that separate the two groups. For such problems, which are often seen in customer databases, we have developed a bump hunting method using probabilistic and statistical methods, as shown in a previous study. When a pureness rate is specified in advance, a maximum capture rate is obtained. To find the maximum capture rate, we have used the decision tree method combined with the genetic algorithm, so that a trade-off curve between the pureness rate and the capture rate can be constructed. However, such a trade-off curve can be optimistic if the training data set alone is used, so its accuracy must be assessed with care. Using accuracy evaluation procedures such as cross-validation or the bootstrapped hold-out method, with separate training and test data sets, we have shown that a trade-off curve applicable in practice can be obtained. We have also shown that an attainable upper-bound trade-off curve can be estimated using extreme-value statistics, because the genetic algorithm yields many local maxima of the capture rate from different initial values. We have constructed three kinds of trade-off curves: the first is obtained from the training data; the second is the return capture rate curve obtained using extreme-value statistics; the third is obtained from the test data. These three, like the Trinity, are indispensable for comprehending the whole picture of the trade-off between the pureness rate and the capture rate. This paper deals with the behavior of the trade-off curve from a statistical viewpoint.
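To make the two rates concrete, here is a minimal Python sketch. The abstract does not define them, so the sketch assumes the usual reading: the pureness rate is the fraction of points inside a region that have response 1, and the capture rate is the fraction of all response-1 points that fall inside the region. The axis-aligned boxes, the toy data, and the function names are hypothetical, and a fixed list of candidate boxes stands in for the decision tree and genetic algorithm search, which is not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]  # one (low, high) interval per dimension


def in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the axis-aligned box."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))


def pureness_and_capture(points: Sequence[Point],
                         responses: Sequence[int],
                         box: Box) -> Tuple[float, float]:
    """Assumed definitions: pureness = ones inside / points inside,
    capture = ones inside / all ones in the data set."""
    inside = [r for p, r in zip(points, responses) if in_box(p, box)]
    ones_inside = sum(inside)
    total_ones = sum(responses)
    pureness = ones_inside / len(inside) if inside else 0.0
    capture = ones_inside / total_ones if total_ones else 0.0
    return pureness, capture


def best_capture(points: Sequence[Point],
                 responses: Sequence[int],
                 candidate_boxes: Sequence[Box],
                 min_pureness: float) -> float:
    """Largest capture rate among candidates meeting the pureness floor.
    A plain scan over candidates, not the tree-plus-GA search."""
    best = 0.0
    for box in candidate_boxes:
        pureness, capture = pureness_and_capture(points, responses, box)
        if pureness >= min_pureness and capture > best:
            best = capture
    return best


# Hypothetical toy data: eight 2-D points with 0-1 responses.
points: List[Point] = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.8, 0.8),
                       (0.9, 0.7), (0.7, 0.9), (0.5, 0.5), (0.6, 0.4)]
responses = [1, 1, 0, 0, 0, 1, 1, 0]

small_box = [(0.0, 0.25), (0.0, 0.25)]
large_box = [(0.0, 0.6), (0.0, 0.6)]

# One point of the trade-off curve per pureness floor.
curve = [(floor, best_capture(points, responses, [small_box, large_box], floor))
         for floor in (0.5, 0.9)]
print(curve)
```

On this toy data, raising the pureness floor from 0.5 to 0.9 lowers the best capture rate, which is the trade-off the abstract describes. The rates here come from the same data used to pick the region, so they correspond to the optimistic training-data curve; a test-data curve would score the chosen region on held-out points.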



Copyright information

© 2009 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Hirose, H. (2009). The Bump Hunting by the Decision Tree with the Genetic Algorithm. In: Ao, SI., Rieger, B., Chen, SS. (eds) Advances in Computational Algorithms and Data Analysis. Lecture Notes in Electrical Engineering, vol 14. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-8919-0_21


  • DOI: https://doi.org/10.1007/978-1-4020-8919-0_21

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-8918-3

  • Online ISBN: 978-1-4020-8919-0

  • eBook Packages: Computer Science, Computer Science (R0)
