A Comparative Study in the Bump Hunting between the Tree-GA and the PRIM

  • Hideo Hirose
  • Genki Koga
Part of the Studies in Computational Intelligence book series (SCI, volume 443)

Abstract

Bump hunting, proposed by Friedman and Fisher, has become important in many fields. Suppose that we are interested in finding regions where target points are denser than in other regions. Such dense regions of target points are called bumps, and finding them is called bump hunting. For a pre-specified pureness rate, a maximum capture rate can be obtained, so a trade-off curve between the two rates can be constructed. Finding the bump regions is thus equivalent to constructing the trade-off curve. When we adopt simple boundary shapes for the bumps, such as a union of boxes whose sides are parallel to the explanatory variable axes, it is convenient to use a binary decision tree. Since a conventional binary decision tree, e.g., CART (Classification and Regression Trees), does not provide the maximum capture rates, we use a genetic algorithm (GA) specialized to the tree structure, the tree-GA. So far, we have assessed the accuracy of the trade-off curve in typical fundamental cases that may be observed in real customer data, and found that the proposed tree-GA can construct an effective trade-off curve that is close to the optimal one. In this paper, we further investigate the prediction accuracy of the tree-GA by comparing the trade-off curve obtained with the tree-GA against that obtained with the PRIM (Patient Rule Induction Method) proposed by Friedman and Fisher. We have found that the tree-GA outperforms the PRIM in some cases.
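To make the two rates and the trade-off curve concrete, here is a minimal Python sketch. It is not the tree-GA or the PRIM. It assumes the usual reading of the terms, which the abstract does not spell out: the pureness rate is the fraction of points inside a region that are targets, and the capture rate is the fraction of all targets that fall inside the region. The function names, the toy data, and the fixed list of candidate boxes are hypothetical and chosen only for illustration. In practice a search method such as the tree-GA or the PRIM would propose the candidate regions.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Box = Sequence[Tuple[float, float]]  # one (low, high) interval per variable


def in_box(x: Point, box: Box) -> bool:
    """True if x lies inside the axis-parallel box (bounds inclusive)."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, box))


def rates(points: Sequence[Point], is_target: Sequence[int],
          box: Box) -> Tuple[float, float]:
    """Return (pureness, capture) of one box under the assumed definitions."""
    inside = [t for x, t in zip(points, is_target) if in_box(x, box)]
    hits = sum(inside)                 # targets inside the box
    total_targets = sum(is_target)     # targets in the whole data set
    pureness = hits / len(inside) if inside else 0.0
    capture = hits / total_targets if total_targets else 0.0
    return pureness, capture


def trade_off_curve(points: Sequence[Point], is_target: Sequence[int],
                    boxes: Sequence[Box],
                    pureness_levels: Sequence[float]
                    ) -> List[Tuple[float, float]]:
    """For each required pureness, keep the best capture among the boxes."""
    curve = []
    for p0 in pureness_levels:
        best = 0.0
        for box in boxes:
            pureness, capture = rates(points, is_target, box)
            if pureness >= p0:
                best = max(best, capture)
        curve.append((p0, best))
    return curve


# Hypothetical one-dimensional toy data: 8 points, 4 of them targets.
points = [(0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (0.6,), (0.7,), (0.8,)]
is_target = [0, 1, 1, 1, 0, 0, 1, 0]
# Hypothetical candidate regions, each a single box.
boxes = [[(0.0, 1.0)], [(0.15, 0.45)], [(0.15, 0.75)]]

print(trade_off_curve(points, is_target, boxes, [0.5, 0.6, 0.9]))
```

On this toy data, demanding a higher pureness forces a smaller box and therefore a lower capture rate, which is the trade-off the abstract describes.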


References

  1. Abu-Hanna, A., Nannings, B., Dongelmans, D., Hasman, A.: PRIM versus CART in subgroup discovery: When patience is harmful. Journal of Biomedical Informatics 43, 701–708 (2010)
  2. Agarwal, D., Phillips, J.M., Venkatasubramanian, S.: The hunting of the bump: On maximizing statistical discrepancy. In: SODA 2006, pp. 1137–1146 (2006)
  3. Becker, U., Fahrmeir, L.: Bump hunting for risk: a new data mining tool and its applications. Computational Statistics 16, 373–386 (2001)
  4. Castillo, E.: Extreme Value Theory in Engineering. Academic Press (1988)
  5. Dazard, J.-E., Rao, J.S.: Local Sparse Bump Hunting. Journal of Computational and Graphical Statistics 19, 900–929 (2010)
  6. Dazard, J.-E., Rao, J.S., Markowitz, S.: Local sparse bump hunting reveals molecular heterogeneity of colon tumors. Statistics in Medicine (online 2011), doi: 10.1002/sim.4389
  7. Friedman, J.H., Fisher, N.I.: Bump hunting in high-dimensional data. Statistics and Computing 9, 123–143 (1999)
  8. Gray, J.B., Fan, G.: Target: Tree analysis with randomly generated and evolved trees. Technical report, The University of Alabama (2003)
  9. Hastie, T., Tibshirani, R., Friedman, J.H.: Elements of Statistical Learning. Springer (2001)
  10. Hirose, H.: The bump hunting by the decision tree with the genetic algorithm. In: Advances in Computational Algorithms and Data Analysis, pp. 305–318. Springer (2008)
  11. Hirose, H.: Evaluation of the trade-off curve in the bump hunting using the tree genetic algorithm. In: 1st IMS Asia Pacific Rim Meetings (2009)
  12. Hirose, H.: Assessment of the trade-off curve accuracy in the bump hunting using the tree-GA. In: Third International Conference on Knowledge Discovery and Data Mining, pp. 597–600 (2010)
  13. Hirose, H.: Bump Hunting using the Tree-GA. Information 14, 3409–3424 (2011)
  14. Hirose, H., Yukizane, T., Deguchi, T.: The bump hunting method and its accuracy using the genetic algorithm with application to real customer data. In: IEEE 7th International Conference on Computer and Information Technology, pp. 128–132 (2007)
  15. Hirose, H., Yukizane, T.: The accuracy of the trade-off curve in the bump hunting. In: 7th Hawaii International Conference on Statistics, Mathematics and Related Fields (2008)
  16. Hirose, H., Yukizane, T., Zaman, F.: Accuracy assessment for the trade-off curve and its upper bound curve in the bump hunting using the new tree genetic algorithm. In: 7th World Congress in Probability and Statistics (2008)
  17. Matsumoto, M., Nishimura, T.: Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Transactions on Modeling and Computer Simulation 8, 3–30 (1998)
  18. Sniadecki, J., Therapeutics, A.: Bump Hunting With SAS: A Macro Approach To Employing PRIM. SAS Global Forum 156 (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Kyushu Institute of Technology, Fukuoka, Japan
  2. Nomura Research Institute, Ltd., Tokyo, Japan
