Genetic algorithms for outlier detection and variable selection in linear regression models

Soft Computing

Abstract

This article addresses some problems in outlier detection and variable selection in linear regression models. First, outlier detection suffers from two problems known as smearing and masking. Smearing means that one outlier makes another, non-outlying observation appear to be an outlier; masking means that one outlier prevents another from being detected. Detecting outliers one by one may therefore give misleading results. This article presents a genetic algorithm that considers different possible groupings of the data into outlier and non-outlier observations, so that all outliers are detected at the same time. Second, outlier detection and variable selection are known to influence each other, and different results may be obtained depending on the order in which the two tasks are performed. It may therefore be useful to perform these tasks simultaneously, and a genetic algorithm for simultaneous outlier detection and variable selection is suggested. Two real data sets are used to illustrate the algorithms, which are shown to work well. In addition, the scalability of the algorithms is examined in an experiment using generated data.
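
The idea of searching over groupings of the data, rather than testing observations one at a time, can be sketched as follows. The abstract does not specify the chromosome encoding, the fitness function, or the genetic operators, so everything below is an illustrative assumption and not the article's method: a binary flag per observation, a BIC-style score with an inflated penalty (`penalty`), a cap on the share of flagged observations (`max_frac`), binary tournament selection, uniform crossover, bit-flip mutation, and elitism. The function names `bic_fitness` and `ga_outliers` are hypothetical.

```python
import numpy as np


def bic_fitness(flags, X, y, penalty=3.0, max_frac=0.25):
    """BIC-style score for one grouping; lower is better (assumed criterion).

    flags[i] == 1 marks observation i as an outlier. Giving every flagged
    observation its own dummy variable is equivalent to fitting OLS on the
    unflagged rows only, at the cost of one extra parameter per flag.
    """
    n, p = X.shape
    m = int(flags.sum())
    keep = flags == 0
    if m > max_frac * n or keep.sum() <= p:   # reject degenerate groupings
        return np.inf
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    rss = np.sum((y[keep] - X[keep] @ beta) ** 2)
    return n * np.log(rss / n) + penalty * (p + m) * np.log(n)


def ga_outliers(X, y, pop_size=40, generations=200, p_mut=0.02, seed=0):
    """Search over outlier/non-outlier groupings with a simple GA."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Sparse random start, plus the "no outliers" grouping as a baseline.
    pop = (rng.random((pop_size, n)) < 0.1).astype(int)
    pop[0] = 0
    for _ in range(generations):
        scores = np.array([bic_fitness(c, X, y) for c in pop])
        pop = pop[np.argsort(scores)]          # best grouping first
        children = [pop[0].copy()]             # elitism: keep the best as is
        while len(children) < pop_size:
            # Binary tournament: the lower index is the better individual.
            pa = pop[min(rng.integers(0, pop_size, size=2))]
            pb = pop[min(rng.integers(0, pop_size, size=2))]
            child = np.where(rng.random(n) < 0.5, pa, pb)   # uniform crossover
            flip = rng.random(n) < p_mut                    # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.array(children)
    scores = np.array([bic_fitness(c, X, y) for c in pop])
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])
```

Because each chromosome describes a complete grouping, a pair of outliers that mask each other can be flagged together in a single candidate, which a one-at-a-time procedure cannot do. Extending the chromosome with one extra bit per candidate regressor would give a comparable sketch of the simultaneous outlier detection and variable selection problem, again as an assumption about the encoding.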

Author information

Corresponding author

Correspondence to J. Tolvi.

Additional information

I would like to thank Dr Tero Aittokallio and an anonymous referee for useful comments.

About this article

Cite this article

Tolvi, J. Genetic algorithms for outlier detection and variable selection in linear regression models. Soft Computing 8, 527–533 (2004). https://doi.org/10.1007/s00500-003-0310-2

  • DOI: https://doi.org/10.1007/s00500-003-0310-2
