Iterated Boosting for Outlier Detection

  • Nathalie Cheze
  • Jean-Michel Poggi
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

A procedure for detecting outliers in regression problems, based on information provided by boosting trees, is proposed. Boosting deals with observations that are hard to predict by giving them extra weight. In the present paper, such observations are considered possible outliers, and a procedure is proposed that uses the boosting results to diagnose which observations could be outliers. The key idea is to select the observation most frequently resampled along the boosting iterations and to reiterate boosting after removing it. Many well-known benchmark data sets are considered, and a comparative study against two classical competitors shows the value of the method.
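The key idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration. A one-split regression stump stands in for a full regression tree. The weight update follows the AdaBoost.R2 style of Drucker (1997), with a linear loss. Where the average loss leaves the interval (0, 0.5), the sketch resets the weights to uniform rather than stopping, so that resampling counts keep accumulating. The function names `fit_stump`, `resampling_counts`, and `iterated_candidates` are hypothetical. The sketch does not implement the decision rule for declaring a candidate an outlier; it only ranks candidates.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump (assumed stand-in for a regression tree)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold fits between two equal x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, ml, mr)
    if best is None:  # every sampled x is identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, ml, mr = best
    return lambda x: ml if x <= threshold else mr


def resampling_counts(xs, ys, n_iter=30, rng=None):
    """Count how often each observation is drawn across the boosting iterations."""
    rng = rng or random.Random(0)
    n = len(xs)
    weights = [1.0 / n] * n
    counts = [0] * n
    for _ in range(n_iter):
        # Draw a bootstrap sample according to the current weights.
        sample = rng.choices(range(n), weights=weights, k=n)
        for i in sample:
            counts[i] += 1
        predict = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        errors = [abs(ys[i] - predict(xs[i])) for i in range(n)]
        max_error = max(errors)
        avg_loss = 0.0
        if max_error > 0:
            losses = [e / max_error for e in errors]  # linear loss in [0, 1]
            avg_loss = sum(w * l for w, l in zip(weights, losses))
        if not 0 < avg_loss < 0.5:
            # Assumption of this sketch: reset to uniform weights instead of stopping.
            weights = [1.0 / n] * n
            continue
        beta = avg_loss / (1 - avg_loss)
        # Well-predicted observations (small loss) are down-weighted the most.
        weights = [w * beta ** (1 - l) for w, l in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return counts


def iterated_candidates(xs, ys, n_candidates=1, n_iter=30, seed=0):
    """Repeatedly flag the most frequently resampled observation and remove it."""
    rng = random.Random(seed)
    remaining = list(range(len(xs)))
    flagged = []
    for _ in range(n_candidates):
        counts = resampling_counts(
            [xs[i] for i in remaining], [ys[i] for i in remaining], n_iter, rng
        )
        top = max(range(len(remaining)), key=counts.__getitem__)
        flagged.append(remaining.pop(top))  # record the original index, then drop it
    return flagged
```

Calling `iterated_candidates(xs, ys, n_candidates=k)` returns `k` original indices in the order they were flagged. Deciding which of them are actual outliers requires a separate criterion that this sketch leaves out.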

Keywords

Outlier Detection; Boost Regression Tree; Rejection Region; Least Trim Square; Minimum Covariance Determinant
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin · Heidelberg 2006

Authors and Affiliations

  • Nathalie Cheze (1, 2)
  • Jean-Michel Poggi (1, 3)
  1. Laboratoire de Mathématique, U.M.R. C 8628, “Probabilités, Statistique et Modélisation”, Université Paris-Sud, Bât. 425, Orsay cedex, France
  2. Université Paris 10-Nanterre, Modal’X, France
  3. Université Paris 5, France
