A robust approach to model-based classification based on trimming and constraints

Semi-supervised learning in presence of outliers and label noise
  • Andrea Cappozzo
  • Francesca Greselin
  • Thomas Brendan Murphy
Regular Article


In a standard classification framework, a set of trustworthy learning data is employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Unreliable labelled observations, namely outliers and data with incorrect labels, can therefore strongly undermine classifier performance, especially when the training set is small. The present work introduces a robust modification of the model-based classification framework, employing impartial trimming and constraints on the ratio between the maximum and the minimum eigenvalue of the group scatter matrices. The proposed method handles noise in both the response and the explanatory variables, providing reliable classification even on contaminated datasets. A robust information criterion is proposed for model selection. Experiments on real and simulated data, artificially adulterated, illustrate the benefits of the proposed method.


Model-based classification, Label noise, Outliers detection, Impartial trimming, Eigenvalues restrictions, Robust estimation

Mathematics Subject Classification

62H30 62F35 



The authors are very grateful to Agustin Mayo-Iscar and Luis Angel García Escudero for both stimulating discussion and advice on how to enforce the eigenvalue-ratio constraints under the different patterned models. Andrea Cappozzo deeply thanks Michael Fop for his endless patience and guidance in helping him with the methodological and computational issues encountered while drafting the present manuscript. Brendan Murphy’s work is supported by the Science Foundation Ireland Insight Research Centre (SFI/12/RC/2289_P2).



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
  2. School of Mathematics and Statistics and Insight Research Centre, University College Dublin, Dublin, Ireland
