Separating a mixture of two normals with proportional covariances

Metrika

Abstract

We propose a simple affine equivariant clustering method, based on the idea of best linear classification, for samples from a mixture of two multivariate normal distributions with different mean vectors but proportional covariance matrices. To ameliorate the curse of dimensionality, a non-parametric approach to find candidates for a best linear discriminant function is presented. By using simulation studies and a real example, we show that for large samples in high dimensions, the proposed method can be a useful supplement to general-purpose multivariate outlier detection methods.
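The setting in the abstract can be illustrated with a small simulation. The sketch below is not the authors' clustering procedure: it assumes the component means, the base covariance `sigma`, and the proportionality constant `c` are known, and every function name and parameter value is hypothetical. It relies on one property of the model: when the second covariance is `c * sigma`, any weighted combination of the two covariances is a multiple of `sigma`, so the candidate linear discriminant directions all point along `sigma^{-1} (mu2 - mu1)`. The cut point used here equalizes the standardized distance to the two projected means, which is one simple choice among several.

```python
import numpy as np


def simulate_mixture(n, mu1, mu2, sigma, c, p_mix, rng):
    """Draw n points from p_mix * N(mu1, sigma) + (1 - p_mix) * N(mu2, c * sigma)."""
    labels = rng.random(n) >= p_mix            # True -> second component
    n2 = int(labels.sum())
    x = np.empty((n, len(mu1)))
    x[~labels] = rng.multivariate_normal(mu1, sigma, size=n - n2)
    x[labels] = rng.multivariate_normal(mu2, c * sigma, size=n2)
    return x, labels.astype(int)


def linear_direction(mu1, mu2, sigma):
    """Unit vector along sigma^{-1} (mu2 - mu1)."""
    w = np.linalg.solve(sigma, mu2 - mu1)
    return w / np.linalg.norm(w)


def equal_error_threshold(w, mu1, mu2, sigma, c):
    """Cut point t with (t - m1) / s1 == (m2 - t) / s2 on the projected scale."""
    m1, m2 = w @ mu1, w @ mu2
    s1 = np.sqrt(w @ sigma @ w)                # sd of component 1 along w
    s2 = np.sqrt(c) * s1                       # sd of component 2 along w
    return (m1 * s2 + m2 * s1) / (s1 + s2)


# Hypothetical parameter values, chosen only for illustration.
rng = np.random.default_rng(0)
mu1 = np.array([0.0, 0.0])
mu2 = np.array([8.0, 8.0])
sigma = np.array([[2.0, 1.0], [1.0, 2.0]])
c = 4.0

x, labels = simulate_mixture(2000, mu1, mu2, sigma, c, p_mix=0.5, rng=rng)
w = linear_direction(mu1, mu2, sigma)
t = equal_error_threshold(w, mu1, mu2, sigma, c)

predicted = (x @ w > t).astype(int)            # 1 -> assigned to component 2
error_rate = float(np.mean(predicted != labels))
print(f"direction = {w}, threshold = {t:.3f}, error rate = {error_rate:.3f}")
```

In an actual clustering problem none of these parameters are known, which is why the abstract describes a non-parametric search for candidate discriminant directions; the sketch only shows why a single linear projection can be enough once a good direction is found.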

Author information

Corresponding author

Correspondence to John J. Miller.

Additional information

Salem S. Reyen was supported by the Defense Advanced Research Projects Agency through cooperative agreement 8105-48267 with the Johns Hopkins University.

About this article

Cite this article

Reyen, S.S., Miller, J.J. & Wegman, E.J. Separating a mixture of two normals with proportional covariances. Metrika 70, 297–314 (2009). https://doi.org/10.1007/s00184-008-0193-4

  • DOI: https://doi.org/10.1007/s00184-008-0193-4
