A Comparative Study in the Bump Hunting between the Tree-GA and the PRIM
Bump hunting, proposed by Friedman and Fisher, has become important in many fields. Suppose that we are interested in finding regions where target points are denser than in other regions. Such dense regions of target points are called bumps, and finding them is called bump hunting. By pre-specifying a pureness rate, a maximum capture rate can be obtained for it, and a trade-off curve between the two rates can then be constructed. Thus, finding the bump regions is equivalent to constructing the trade-off curve. When we adopt simple boundary shapes for the bumps, such as a union of boxes parallel to the explanatory variable axes, it is convenient to use a binary decision tree. Since a conventional binary decision tree, e.g., CART (Classification and Regression Trees), does not provide the maximum capture rates, we use a genetic algorithm (GA) specialized to the tree structure, the tree-GA. In earlier work, we assessed the accuracy of the trade-off curve in typical fundamental cases that may be observed in real customer data, and found that the proposed tree-GA can construct an effective trade-off curve that is close to the optimal one. In this paper, we further investigate the prediction accuracy of the tree-GA by comparing the trade-off curve obtained with the tree-GA against that obtained with PRIM (Patient Rule Induction Method), proposed by Friedman and Fisher. We find that the tree-GA is superior to PRIM in some cases.
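To make the two rates concrete, the following is a minimal Python sketch, not the tree-GA or PRIM itself. It assumes the usual definitions, which the abstract does not spell out: the pureness rate is the fraction of points inside the region that are targets, and the capture rate is the fraction of all targets that fall inside the region. The function names (`rates`, `trade_off_curve`), the box representation, and the toy data are hypothetical and chosen only for illustration.

```python
# Assumed definitions (not stated in the abstract):
#   pureness = targets inside region / all points inside region
#   capture  = targets inside region / all targets
# A region is a union of axis-parallel boxes; each box is a list of
# (low, high) intervals, one per explanatory variable.

def in_box(x, box):
    """True if point x lies inside the axis-parallel box."""
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))

def in_region(x, boxes):
    """True if point x lies inside at least one box of the union."""
    return any(in_box(x, b) for b in boxes)

def rates(points, is_target, boxes):
    """Return (pureness, capture) of the region for labelled points."""
    inside = [in_region(p, boxes) for p in points]
    n_inside = sum(inside)
    n_target = sum(is_target)
    n_hit = sum(1 for i, t in zip(inside, is_target) if i and t)
    pureness = n_hit / n_inside if n_inside else 0.0
    capture = n_hit / n_target if n_target else 0.0
    return pureness, capture

def trade_off_curve(points, is_target, candidates, thresholds):
    """For each pre-specified pureness threshold, report the largest
    capture rate among candidate regions that meet the threshold."""
    scored = [rates(points, is_target, c) for c in candidates]
    curve = []
    for p0 in thresholds:
        best = max((c for p, c in scored if p >= p0), default=0.0)
        curve.append((p0, best))
    return curve

# Toy data: six 2-D points, three of which are targets.
points = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3),
          (0.8, 0.8), (0.9, 0.9), (0.5, 0.5)]
is_target = [True, True, False, False, True, False]

small = [[(0.0, 0.25), (0.0, 0.25)]]  # tight box around two targets
large = [[(0.0, 1.0), (0.0, 1.0)]]    # box covering every point

curve = trade_off_curve(points, is_target, [small, large], [0.5, 0.9])
```

In this sketch the candidate regions are supplied by hand; in the setting described above they would come from a search procedure such as the tree-GA or PRIM, and the curve would be traced over many pureness thresholds.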