
Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization

  • Original Paper
  • Computational Statistics

Abstract

This paper provides a graphical visualization of multiple outliers based on a clustering algorithm that uses the minimal spanning tree, and proposes a modified version of this clustering algorithm for identifying multiple outliers. The graphical visualization helps in classifying multiple outliers. The proposed modified procedure is shown to preserve the performance of the clustering algorithm in identifying multiple outliers while also reducing the problem of swamping, in which clean observations are wrongly flagged as outliers.
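To make the idea of clustering with a minimal spanning tree concrete, the following is a minimal sketch and not the procedure proposed in the paper. It builds the tree with Prim's algorithm, removes edges longer than a cutoff (which yields single-linkage clusters), and treats every point outside the largest cluster as a candidate outlier. The function names, the fixed `cutoff`, the toy data, and the "largest cluster is clean" rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) edges; names are illustrative only.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # shortest known distance from the tree
    best_from = [-1] * n        # tree vertex that achieves best_dist
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Update distances of the remaining points to the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cut_tree(n, edges, cutoff):
    """Keep only MST edges no longer than cutoff; return one label per point."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j, length in edges:
        if length <= cutoff:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def candidate_outliers(points, cutoff):
    """Assumed rule: points outside the largest cluster are candidates."""
    labels = cut_tree(len(points), minimum_spanning_tree(points), cutoff)
    main_label, _ = Counter(labels).most_common(1)[0]
    return [i for i, label in enumerate(labels) if label != main_label]


# Toy data (made up): five points close together and two far away.
points = [(0.0, 0.1), (0.2, -0.1), (0.4, 0.0), (0.6, 0.2), (0.8, -0.2),
          (3.0, 4.0), (3.1, 4.2)]
outliers = candidate_outliers(points, cutoff=1.0)
```

In this sketch the cutoff is fixed by hand. Choosing it from the data, and deciding which coordinates to cluster on in a regression setting, are exactly the kinds of design choices a real procedure has to specify, and they are not taken from the paper here.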



Author information

Corresponding author

Correspondence to Sung-Soo Kim.

About this article

Cite this article

Kim, SS., Krzanowski, W.J. Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization. Computational Statistics 22, 109–119 (2007). https://doi.org/10.1007/s00180-007-0026-3
