Statistics and Computing, Volume 9, Issue 2, pp 123–143

Bump hunting in high-dimensional data

  • Jerome H. Friedman
  • Nicholas I. Fisher

Abstract

Many data analytic questions can be formulated as (noisy) optimization problems. They explicitly or implicitly involve finding simultaneous combinations of values for a set of (“input”) variables that imply unusually large (or small) values of another designated (“output”) variable. Specifically, one seeks a set of subregions of the input variable space within which the value of the output variable is considerably larger (or smaller) than its average value over the entire input domain. In addition it is usually desired that these regions be describable in an interpretable form involving simple statements (“rules”) concerning the input values. This paper presents a procedure directed towards this goal based on the notion of “patient” rule induction. This patient strategy is contrasted with the greedy ones used by most rule induction methods, and semi-greedy ones used by some partitioning tree techniques such as CART. Applications involving scientific and commercial data bases are presented.
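The abstract does not spell out the procedure, so the following is only a minimal sketch of one way a "patient" search for a high-mean box could look. It assumes top-down peeling: start with a box covering all the data and repeatedly trim a small fraction of points from one face, choosing the trim that most raises the mean of the output inside the box. The function name `peel_box` and the parameters `alpha` and `min_support` are hypothetical, chosen for illustration, and the synthetic data are invented for the demo.

```python
import random


def peel_box(X, y, alpha=0.1, min_support=0.1):
    """Shrink an axis-aligned box to raise the mean of y inside it.

    X: list of rows (lists of floats); y: list of floats.
    alpha: fraction of in-box points removed per peel (assumed parameter).
    min_support: stop once the box holds at most this fraction of the data.
    Returns (bounds, mean_inside), where bounds[j] = (low, high).
    Ties in X are not handled specially; continuous inputs are assumed.
    """
    n, d = len(X), len(X[0])
    inside = list(range(n))
    bounds = [(min(r[j] for r in X), max(r[j] for r in X)) for j in range(d)]

    def mean(idx):
        return sum(y[i] for i in idx) / len(idx)

    while len(inside) > min_support * n:
        k = max(1, int(alpha * len(inside)))  # points to peel this step
        best = None
        for j in range(d):
            order = sorted(inside, key=lambda i: X[i][j])
            # Candidate peels: drop the k lowest or the k highest values on axis j.
            for side, keep in (("low", order[k:]), ("high", order[:-k])):
                if not keep:
                    continue
                m = mean(keep)
                if best is None or m > best[0]:
                    best = (m, j, side, keep)
        if best is None or best[0] <= mean(inside):
            break  # no candidate peel improves the in-box mean
        _, j, side, keep = best
        lo, hi = bounds[j]
        if side == "low":
            lo = min(X[i][j] for i in keep)
        else:
            hi = max(X[i][j] for i in keep)
        bounds[j] = (lo, hi)
        inside = keep
    return bounds, mean(inside)


# Synthetic example: the output is elevated inside a small square, plus noise.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(500)]
y = [(1.0 if 0.4 < a < 0.6 and 0.4 < b < 0.6 else 0.0) + random.gauss(0, 0.1)
     for a, b in X]
bounds, box_mean = peel_box(X, y, alpha=0.05, min_support=0.05)
overall = sum(y) / len(y)
print(bounds, box_mean, overall)
```

A small `alpha` is what makes the search patient in this sketch: each step discards only a few points, so an early poor choice costs little data. A greedy split, by contrast, can discard a large share of the data in a single step. The resulting `bounds` read directly as a rule of the form "low ≤ x_j ≤ high" for each input.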

Keywords: data mining; noisy function optimization; classification; association; rule induction

Copyright information

© Kluwer Academic Publishers 1999
