In difficult classification problems, where z-dimensional points with 0–1 responses must be divided into two groups but the data structure is messy, it is more favorable to search for the denser regions of response 1 points than to find boundaries that separate the two groups. For such problems, which are often seen in customer databases, we have developed a bump hunting method using probabilistic and statistical methods, as shown in a previous study. By specifying a pureness rate in advance, a maximum capture rate can be obtained. To find the maximum capture rate, we have used the decision tree method combined with the genetic algorithm. A trade-off curve between the pureness rate and the capture rate can then be constructed. However, such a trade-off curve could be optimistic if the training data set alone is used, so we should be careful in assessing its accuracy. Using accuracy evaluation procedures such as cross-validation or the bootstrapped hold-out method, combined with the training and test data sets, we have shown that an actually applicable trade-off curve can be obtained. We have also shown that an attainable upper bound trade-off curve can be estimated by using extreme-value statistics, because the genetic algorithm provides many local maxima of the capture rate with different initial values. We have constructed three kinds of trade-off curves: the first is the curve obtained by using the training data; the second is the return capture rate curve obtained by using the extreme-value statistics; the last is the curve obtained by using the test data. These three are indispensable, like the Trinity, for comprehending the whole picture of the trade-off curve between the pureness rate and the capture rate. This paper deals with the behavior of the trade-off curve from a statistical viewpoint.
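To make the two rates concrete, the following is a minimal sketch and not the method of the chapter. It assumes that the pureness rate of a region is the fraction of points inside it with response 1, and that the capture rate is the fraction of all response 1 points that fall inside it; these definitions are not spelled out in the abstract. The axis-aligned box regions, the function names, and the exhaustive scan over a few candidate boxes are illustrative assumptions standing in for the decision tree and genetic algorithm search.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]  # one (low, high) interval per dimension


def in_box(point: Point, box: Box) -> bool:
    """True if the point lies inside the axis-aligned box (bounds inclusive)."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))


def pureness_and_capture(
    points: List[Point], labels: List[int], box: Box
) -> Tuple[float, float]:
    """Return (pureness rate, capture rate) of one candidate region.

    Assumed definitions (not stated in the abstract):
      pureness = response-1 points inside / all points inside
      capture  = response-1 points inside / all response-1 points
    """
    inside = [y for p, y in zip(points, labels) if in_box(p, box)]
    ones_inside = sum(inside)
    total_ones = sum(labels)
    pureness = ones_inside / len(inside) if inside else 0.0
    capture = ones_inside / total_ones if total_ones else 0.0
    return pureness, capture


def max_capture_at(
    points: List[Point], labels: List[int], boxes: List[Box], min_pureness: float
) -> float:
    """Largest capture rate among candidate boxes meeting the pureness rate.

    A plain scan over given candidates; the chapter instead searches with a
    decision tree combined with a genetic algorithm.
    """
    best = 0.0
    for box in boxes:
        pureness, capture = pureness_and_capture(points, labels, box)
        if pureness >= min_pureness and capture > best:
            best = capture
    return best


# Toy data: six 2-dimensional points with 0-1 responses (made up for illustration).
points = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2), (0.8, 0.9), (0.9, 0.1), (0.5, 0.5)]
labels = [1, 1, 0, 1, 0, 0]
boxes = [
    [(0.0, 0.4), (0.0, 0.4)],    # medium region
    [(0.0, 0.15), (0.0, 0.15)],  # small, very pure region
    [(0.0, 1.0), (0.0, 1.0)],    # whole space
]

# One point of the trade-off curve per pre-specified pureness rate.
curve = [(p, max_capture_at(points, labels, boxes, p)) for p in (0.5, 0.6, 0.9)]
```

Evaluating `curve` on the same data used to choose the boxes corresponds to the training-data curve, which the abstract warns may be optimistic; evaluating the chosen boxes on held-out points would give the test-data counterpart.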
Copyright information
© 2009 Springer Science+Business Media B.V
About this chapter
Cite this chapter
Hirose, H. (2009). The Bump Hunting by the Decision Tree with the Genetic Algorithm. In: Ao, SI., Rieger, B., Chen, SS. (eds) Advances in Computational Algorithms and Data Analysis. Lecture Notes in Electrical Engineering, vol 14. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-8919-0_21
DOI: https://doi.org/10.1007/978-1-4020-8919-0_21
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-8918-3
Online ISBN: 978-1-4020-8919-0
eBook Packages: Computer Science (R0)