
Computing Robust Statistics via an EM Algorithm

  • Conference paper
Statistics for Data Science and Policy Analysis

Abstract

Maximum likelihood is perhaps the most common method for estimating model parameters in applied statistics. However, it is well known that maximum likelihood estimators often perform poorly when outliers are present. Robust estimation methods are often used to estimate model parameters in the presence of outliers, but these methods lack a unified approach. We propose a unified method that uses the EM algorithm to make statistical modelling more robust. In this paper, we describe the proposed method of robust estimation and demonstrate it on the problem of estimating a location parameter. Well-known real data sets with outliers are used to illustrate the proposed estimator, which is then compared with the standard M-estimator. Although only the location case is considered here for simplicity, the method extends directly to robust estimation of parameters in a broad range of statistical models. Hence, like classical likelihood-based modelling, the proposed method offers a unified approach.
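The abstract describes the EM approach only in outline. The following is a minimal sketch, assuming (as in the complete-data log-likelihood of Appendix A) a two-component mixture λ f(y − θ) + (1 − λ) g, where f is taken here to be a normal density with unknown scale σ and g is a constant outlier density chosen by the user. The E-step computes ẑ_i = λ f(y_i − θ) / (λ f(y_i − θ) + (1 − λ) g); the M-step updates θ as the ẑ-weighted mean, σ as the ẑ-weighted standard deviation, and λ as the mean of the ẑ_i. The function name `em_robust_location`, the starting values (median and normalized MAD), the value of g and the stopping rule are illustrative assumptions, not the author's code.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    r = (x - mu) / sigma
    return math.exp(-0.5 * r * r) / (sigma * math.sqrt(2.0 * math.pi))


def em_robust_location(y, g, lam=0.9, n_iter=500, tol=1e-10):
    """Hypothetical sketch: EM for lam * N(theta, sigma^2) + (1 - lam) * g.

    g is a constant (improper-uniform) outlier density chosen by the user (g > 0).
    Returns (theta, sigma, lam, z), where z[i] estimates the probability
    that y[i] comes from the normal (non-outlier) component.
    """
    n = len(y)
    theta = sorted(y)[n // 2]  # assumed start: (upper) median
    mad = sorted(abs(v - theta) for v in y)[n // 2]
    sigma = max(1.4826 * mad, 1e-8)  # assumed start: normalized MAD
    z = [1.0] * n
    for _ in range(n_iter):
        # E-step: posterior weight of the normal component for each point
        a = [lam * normal_pdf(v, theta, sigma) for v in y]
        z = [ai / (ai + (1.0 - lam) * g) for ai in a]
        # M-step: weighted mean (cf. Appendix A), weighted scale, mixing weight
        sz = sum(z)
        new_theta = sum(zi * v for zi, v in zip(z, y)) / sz
        var = sum(zi * (v - new_theta) ** 2 for zi, v in zip(z, y)) / sz
        sigma = max(math.sqrt(var), 1e-8)
        lam = min(max(sz / n, 1e-6), 1.0 - 1e-6)  # keep lam inside (0, 1)
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, sigma, lam, z


if __name__ == "__main__":
    # Toy data (made up for illustration): eight values near 10, one gross outlier
    y = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.95, 10.05, 25.0]
    theta, sigma, lam, z = em_robust_location(y, g=0.02)
    print("EM location:", theta, "sample mean:", sum(y) / len(y))
    print("weight of the outlier:", z[-1])
```

On this toy sample the outlier receives a weight ẑ of essentially zero, so the location estimate stays near 10 while the sample mean is pulled above 11. Because g is constant, the outlier component acts only as a floor on the denominator of ẑ_i; a larger g downweights moderately distant points more aggressively.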



Acknowledgements

I would like to express sincere thanks to Dr Murray Jorgensen (AUT University) for valuable discussions throughout my research and for advice on this manuscript. I also thank Dr Iain Hume (NSW DPI) for comments on the manuscript.

Author information


Correspondence to Maheswaran Rohan.


Appendix A: Proof of Equation (2.9)


$$\displaystyle \begin{aligned} \begin{array}{rcl} l_c(\theta) &\displaystyle =&\displaystyle \log L_c(\theta)\\ &\displaystyle =&\displaystyle \sum_{i = 1}^n \left[\hat z_i \log (\lambda) + \hat z_i \log f(y_i - \theta) + (1-\hat z_i) \log (1-\lambda) + (1-\hat z_i) \log (g) \right]\\ &\displaystyle =&\displaystyle \sum_{i = 1}^n \hat z_i \log f(y_i - \theta) + \mbox{constant}\\ \end{array} \end{aligned} $$

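For the MLE case, f is the normal density with variance \( \sigma^2 \), so \( \log f(y_i - \theta) = -\frac{(y_i - \theta)^2}{2\sigma^2} + \mbox{constant} \) and \( l_c(\theta ) = -\frac{1}{2\sigma^2}\sum _{i = 1}^n \hat z_i (y_i - \theta )^2 + \mbox{constant} \), which is to be maximized with respect to θ: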

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{d l_c(\theta)}{d \theta} &\displaystyle =&\displaystyle 0\\ \sum_{i = 1}^n \hat z_i (y_i - \theta) &\displaystyle =&\displaystyle 0\\ \hat\theta &\displaystyle =&\displaystyle \frac{\sum_{i = 1}^n \hat z_i y_i}{\sum_{i = 1}^n \hat z_i}. \end{array} \end{aligned} $$
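The estimating equation \( \sum_{i=1}^n \hat z_i (y_i - \theta) = 0 \) is a weighted-mean condition. For comparison with the standard M-estimator mentioned in the abstract, Huber's location M-estimator solves \( \sum_{i=1}^n \psi((y_i - \theta)/s) = 0 \), which can likewise be rewritten as a weighted mean with weights \( w_i = \psi(r_i)/r_i \) and solved by iteratively reweighted least squares. The sketch below is a minimal illustration of that idea, assuming a scale s fixed at the normalized MAD and the conventional tuning constant k = 1.345; the function name `huber_location` and these settings are illustrative choices, not necessarily those used in the paper.

```python
import statistics


def huber_location(y, k=1.345, n_iter=200, tol=1e-10):
    """Hypothetical sketch: Huber M-estimate of location by IRLS.

    Assumptions: the scale s is fixed at the normalized MAD, and
    k = 1.345 is the conventional Huber tuning constant.
    """
    theta = statistics.median(y)
    s = 1.4826 * statistics.median([abs(v - theta) for v in y])
    if s == 0:
        return theta  # degenerate sample: the MAD is zero
    for _ in range(n_iter):
        # Huber weights w(r) = psi(r) / r = min(1, k / |r|), with r = (y - theta) / s
        w = [1.0 if v == theta else min(1.0, k * s / abs(v - theta)) for v in y]
        # Weighted-mean update: solve sum w_i (y_i - theta) = 0 for theta
        new_theta = sum(wi * v for wi, v in zip(w, y)) / sum(w)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


if __name__ == "__main__":
    # Toy data (made up for illustration): eight values near 10, one gross outlier
    y = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.95, 10.05, 25.0]
    print("Huber location:", huber_location(y), "sample mean:", sum(y) / len(y))
```

On this toy sample the Huber estimate stays close to 10, while the sample mean is about 11.67. The two approaches differ in where the weights come from: in the EM formulation the \( \hat z_i \) are posterior probabilities of belonging to the model component, whereas the Huber weights are a fixed function of the standardized residual, so a gross outlier still contributes a bounded but nonzero pull.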


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Rohan, M. (2020). Computing Robust Statistics via an EM Algorithm. In: Rahman, A. (eds) Statistics for Data Science and Policy Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-15-1735-8_2

