Robust-Diagnostic Regression: A Prelude for Inducing Reliable Knowledge from Regression

Nurunnabi, Abdul Awal Md.; Dai, Honghua

doi:10.1007/978-1-4614-1903-7_4

Conference paper
First Online: 01 January 2012

Abstract

Regression lies at the heart of statistics; it is one of the most important branches of multivariate techniques available for extracting knowledge in almost every field of study and research. It has recently drawn considerable interest in fields such as machine learning, pattern recognition and data mining. Investigating outliers (exceptional observations) is a century-old problem for data analysts and researchers. Applying regression blindly to data can have dangerous consequences, leading to the discovery of meaningless patterns and to imperfect knowledge. As a result of the digital revolution and the growth of the Internet and intranets, data continue to accumulate at an exponential rate, so detecting outliers and studying their costs and benefits as a tool for reliable knowledge discovery deserves close attention. Over the last few decades, the investigation of outliers in regression has been pursued within two schools of thought: robust regression and regression diagnostics. Robust regression first fits a regression to the majority of the data and then identifies outliers as the points with large residuals from the robust fit, whereas regression diagnostics first finds the outliers, deletes or corrects them, and then fits the remaining regular data by classical (usual) methods. At first there was much confusion, but researchers have now reached a consensus: robustness and diagnostics are two complementary approaches to the analysis of data, and neither is sufficient on its own. In this chapter, we discuss both of them under the single umbrella of regression diagnostics. The chapter sets out the necessity and views of regression diagnostics and presents several contemporary methods through numerical examples in linear regression within each aforesaid category, together with current challenges and
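The two orders of work contrasted in the abstract can be illustrated with a minimal sketch. It is not the chapter's own procedure, which this preview does not show. As assumptions for illustration only: a Huber M-estimator fitted by iteratively reweighted least squares stands in for "robust regression", externally studentized residuals stand in for the diagnostic step, the cutoffs 2.5 and 3.0 are conventional choices, and the toy data and all function names are hypothetical.

```python
import numpy as np

def make_data():
    """Hypothetical toy data: a straight line plus one gross outlier in y."""
    x = np.arange(20, dtype=float)
    y = 2.0 + 0.5 * x + 0.3 * np.sin(1.7 * x)  # small deterministic "noise"
    y[10] += 15.0                               # planted outlier
    return x, y

def ols(X, y):
    """Classical least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mad_scale(r):
    """Robust residual scale: normalized median absolute deviation."""
    return max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)

def robust_then_flag(X, y, c=1.345, cutoff=2.5, n_iter=50):
    """Robust route: fit to the majority first, then flag large residuals."""
    beta = ols(X, y)
    for _ in range(n_iter):
        r = y - X @ beta
        u = np.abs(r) / mad_scale(r)
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))  # Huber weights
        sw = np.sqrt(w)
        beta = ols(X * sw[:, None], y * sw)            # weighted least squares
    r = y - X @ beta
    flagged = np.flatnonzero(np.abs(r) / mad_scale(r) > cutoff)
    return beta, flagged

def diagnose_then_fit(X, y, cutoff=3.0):
    """Diagnostic route: find outliers first, delete them, then refit by OLS."""
    n, p = X.shape
    e = y - X @ ols(X, y)
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))    # hat-matrix diagonal
    sse = e @ e
    # Externally studentized (deletion) residuals.
    t = e * np.sqrt((n - p - 1) / (sse * (1.0 - h) - e**2))
    flagged = np.flatnonzero(np.abs(t) > cutoff)
    keep = np.setdiff1d(np.arange(n), flagged)
    return ols(X[keep], y[keep]), flagged

x, y = make_data()
X = np.column_stack([np.ones_like(x), x])
for name, fit in [("robust", robust_then_flag), ("diagnostic", diagnose_then_fit)]:
    beta, flagged = fit(X, y)
    print(name, "coefficients:", beta, "flagged rows:", flagged)
```

On data with a single gross outlier both routes should single out the same observation. With several outliers, a one-pass deletion step like the one sketched here can miss points that mask one another, which is one practical reason to treat the two approaches as complementary.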

Author information

Authors and Affiliations

SLG, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
Abdul Awal Md. Nurunnabi
Deakin University, 221 Burwood Highway, Burwood, Melbourne, VIC, 3125, Australia
Honghua Dai

Corresponding author

Correspondence to Abdul Awal Md. Nurunnabi.

Editor information

Editors and Affiliations

School of Information Technology, Deakin University, 221 Burwood Highway, Burwood, 3125, Victoria, Australia
Honghua Dai
Computing, Hong Kong Polytechnic University, Man Wai Building, Hunghom, PQ806, Hong Kong SAR
James N. K. Liu
Department of Knowledge Engineering, Maastricht University, Maastricht, 6200MD, Netherlands
Evgueni Smirnov

Cite this paper

Nurunnabi, A.A.M., Dai, H. (2012). Robust-Diagnostic Regression: A Prelude for Inducing Reliable Knowledge from Regression. In: Dai, H., Liu, J., Smirnov, E. (eds) Reliable Knowledge Discovery. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1903-7_4

DOI: https://doi.org/10.1007/978-1-4614-1903-7_4
Published: 08 February 2012
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-1902-0
Online ISBN: 978-1-4614-1903-7
eBook Packages: Computer Science, Computer Science (R0)