
Robust-Diagnostic Regression: A Prelude for Inducing Reliable Knowledge from Regression

  • Conference paper

Abstract

Regression lies at the heart of statistics; it is one of the most important branches of multivariate techniques available for extracting knowledge in almost every field of study and research. It has recently drawn great interest for tasks in fields such as machine learning, pattern recognition and data mining. Investigating outliers (exceptional observations) is a century-old problem for data analysts and researchers. Blind use of data can have dangerous consequences, leading to the discovery of meaningless patterns and to imperfect knowledge. As a result of the digital revolution and the growth of the Internet and intranets, data continue to accumulate at an exponential rate, so detecting outliers and studying their costs and benefits as a tool for reliable knowledge discovery deserve close attention. Over the last few decades, the investigation of outliers in regression has received great attention within two schools of thought: robust regression and regression diagnostics. Robust regression first fits a regression to the majority of the data and then identifies outliers as the points with large residuals from the robust fit, whereas regression diagnostics first finds the outliers, deletes or corrects them, and then fits the remaining regular data by classical (usual) methods. At first there seemed to be much confusion, but researchers have now reached a consensus: robustness and diagnostics are two complementary approaches to the analysis of data, and neither alone is good enough. In this chapter, we discuss both of them under the unified spectrum of regression diagnostics. The chapter sets out the necessity of, and views on, regression diagnostics, and presents several contemporary methods through numerical examples in linear regression within each of the aforesaid categories, together with current challenges and …
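The two orders of operation described in the abstract can be illustrated with a minimal sketch. It is not the chapter's own procedure: the robust route below assumes an approximate least-median-of-squares fit found from random elemental subsets, the diagnostic route assumes externally studentized residuals from an ordinary least-squares fit, and the function names, the cutoff of 2.5 and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def ols(X, y):
    """Ordinary least-squares coefficients for a design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def robust_then_flag(X, y, n_trials=500, cutoff=2.5, seed=0):
    """Robust route: fit the majority first, then flag large residuals.

    Assumption: an approximate least-median-of-squares fit obtained by
    trying exact fits through random subsets of p observations.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue  # degenerate subset, try another one
        med = np.median((y - X @ beta) ** 2)
        if med < best_med:
            best_beta, best_med = beta, med
    resid = y - X @ best_beta
    # Robust residual scale based on the median absolute residual.
    scale = max(1.4826 * np.sqrt(best_med), 1e-12)
    return best_beta, np.abs(resid) / scale > cutoff


def diagnose_then_fit(X, y, cutoff=2.5):
    """Diagnostic route: flag outliers first, delete them, then refit by OLS.

    Assumption: outliers are flagged by externally studentized residuals.
    """
    n, p = X.shape
    resid = y - X @ ols(X, y)
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)  # leverages (diagonal of the hat matrix)
    sse = resid @ resid
    s2_deleted = (sse - resid ** 2 / (1.0 - h)) / (n - p - 1)
    t = resid / np.sqrt(s2_deleted * (1.0 - h))
    flagged = np.abs(t) > cutoff
    return ols(X[~flagged], y[~flagged]), flagged


def make_demo_data(n=50, n_out=5, seed=1):
    """Synthetic line y = 2 + 3x with the last n_out responses shifted by -40."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 10.0, n)
    y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)
    y[-n_out:] -= 40.0
    return np.column_stack([np.ones(n), x]), y


X, y = make_demo_data()
beta_robust, flags_robust = robust_then_flag(X, y)
beta_diag, flags_diag = diagnose_then_fit(X, y)
print("robust fit:", beta_robust, "flagged:", np.flatnonzero(flags_robust))
print("diagnostic refit:", beta_diag, "flagged:", np.flatnonzero(flags_diag))
```

Comparing the two sets of flagged indices on contaminated data is one way to see why the abstract treats the approaches as complementary: the diagnostic route starts from a classical fit that the outliers themselves may already have distorted.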



Author information

Corresponding author

Correspondence to Abdul Awal Md. Nurunnabi.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this paper

Cite this paper

Nurunnabi, A.A.M., Dai, H. (2012). Robust-Diagnostic Regression: A Prelude for Inducing Reliable Knowledge from Regression. In: Dai, H., Liu, J., Smirnov, E. (eds) Reliable Knowledge Discovery. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1903-7_4

  • DOI: https://doi.org/10.1007/978-1-4614-1903-7_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4614-1902-0

  • Online ISBN: 978-1-4614-1903-7

  • eBook Packages: Computer Science, Computer Science (R0)
