Iterated Boosting for Outlier Detection
A procedure is proposed for detecting outliers in regression problems, based on information provided by boosting trees. Boosting handles observations that are hard to predict by giving them extra weight. In the present paper, such observations are treated as possible outliers, and a procedure is proposed that uses the boosting results to diagnose which observations could be outliers. The key idea is to select the observation most frequently resampled across the boosting iterations and to rerun boosting after removing it. Several well-known benchmark data sets are considered, and a comparative study against two classical competitors demonstrates the value of the method.
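The key idea described above can be sketched in code. This is an illustrative simplification, not the authors' implementation: the base learner here is a weighted least-squares line rather than the regression trees used in the paper, and the weight update is a simplified variant of the AdaBoost.R2-style rule (hard-to-fit points gain weight). All function names and parameter values are hypothetical.

```python
import numpy as np

def boost_resample_counts(X, y, n_iter=50, rng=None):
    """One boosting run: track how often each observation is resampled.
    Base learner: a weighted least-squares line (a stand-in for the
    regression trees used in the paper)."""
    rng = rng if isinstance(rng, np.random.Generator) else np.random.default_rng(rng)
    n = len(y)
    w = np.full(n, 1.0 / n)           # boosting weights
    counts = np.zeros(n, dtype=int)   # resampling frequency per observation
    A = np.column_stack([np.ones(n), X])
    for _ in range(n_iter):
        # weighted bootstrap: hard observations are drawn more often
        idx = rng.choice(n, size=n, replace=True, p=w)
        np.add.at(counts, idx, 1)
        # weighted least-squares fit on the full sample
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        resid = np.abs(y - A @ beta)
        # simplified AdaBoost.R2-style update: larger loss -> larger weight
        loss = resid / resid.max() if resid.max() > 0 else resid
        w = w * np.exp(loss)
        w /= w.sum()
    return counts

def iterated_outlier_detection(X, y, n_outliers=2, n_iter=50, seed=0):
    """Iterated boosting: flag the most frequently resampled observation,
    remove it, and rerun boosting on the remaining data."""
    rng = np.random.default_rng(seed)
    active = np.arange(len(y))
    flagged = []
    for _ in range(n_outliers):
        counts = boost_resample_counts(X[active], y[active], n_iter, rng)
        worst = np.argmax(counts)
        flagged.append(int(active[worst]))
        active = np.delete(active, worst)
    return flagged

# Toy data: a noisy line with two gross outliers injected at indices 10 and 25.
rng = np.random.default_rng(1)
X = np.linspace(0, 10, 40)
y = 2 * X + rng.normal(0, 0.3, 40)
y[10] += 15
y[25] -= 15
print(iterated_outlier_detection(X, y, n_outliers=2))
```

Because the boosting weights concentrate on observations with persistently large residuals, the injected outliers dominate the resampling counts and are flagged first; removing one and rerunning lets the next most anomalous observation emerge.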
Keywords: Outlier Detection, Boosting, Regression Trees, Rejection Region, Least Trimmed Squares, Minimum Covariance Determinant