
Computations and analysis in robust regression model selection using stochastic complexity


Summary

A stochastic complexity approach to model selection in robust linear regression is studied in this paper, with a focus on computational aspects and applications. In particular, we provide both procedures and a package of S language programs for computing the stochastic complexity and carrying out the associated model selection. We also discuss how stochastic complexity induces a probability distribution on the set of candidate models, and how this distribution may be used diagnostically to measure the likelihood that a candidate model is selected. In addition, we discuss some strategies for model selection when a large number of potential explanatory variables is available. Finally, examples and a simulation study are presented to assess the finite sample performance of our methods.
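One natural way to turn stochastic complexity values into the distribution over candidate models that the summary alludes to is to weight each model by exp(−SC(M)) and normalize; the paper's exact construction may differ. A minimal sketch (the SC values and model labels below are made up for illustration):

```python
import numpy as np

# Hypothetical stochastic-complexity values (in nats) for four candidate
# variable subsets; in practice these come from the robust-regression fit.
sc = {"X1": 112.4, "X1+X2": 98.7, "X1+X2+X3": 97.9, "X1+X3": 105.2}

names = list(sc)
vals = np.array([sc[m] for m in names])

# Induced distribution: P(M) proportional to exp(-SC(M)).
# Subtract the minimum first so the exponentials do not underflow.
w = np.exp(-(vals - vals.min()))
p = w / w.sum()

for m, pm in zip(names, p):
    print(f"{m}: {pm:.3f}")
```

The model with the smallest stochastic complexity receives the largest probability, and the spread of the distribution indicates how decisively it is preferred over its competitors.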


Notes

  1. Strictly speaking, specifying which Xi's are included determines only a regression model class, with the associated βi's still to be given. A regression model class whose included variables contain X1, X2 and X3 is a correct class because it contains the true model (19). Since β is usually unknown and is uniquely estimated by the underlying regression procedure, such a correct model class may be called a correct model for conciseness.



Acknowledgment

I am grateful to an anonymous referee for helpful comments on the first version of this paper.

Additional information

Research supported by the La Trobe University Central Starter Grant No. 09526.

Appendix. Proof of Inequality (9)

Define

$$F(h)=\rho_{c}(t+h)-\rho_{c}(t)-\psi_{c}(t) h-\min \left\{1 / 2,\ c(c+|t|)^{-1}\right\} h^{2}$$

for any given t, where ψc denotes the derivative of ρc. Straightforward calculations give the following expressions. When t ≥ c,

$$F(h)=\left\{\begin{array}{ll}{-2 c(t+h)-c(c+t)^{-1} h^{2},} & {h \leq-c-t} \\ {\frac{1}{2}(t+h-c)^{2}-c(c+t)^{-1} h^{2},} & {-c-t<h<c-t} \\ {-c(c+t)^{-1} h^{2},} & {h \geq c-t}\end{array}\right.$$

When t ≤ −c,

$$F(h)=\left\{\begin{array}{ll}{-c(c-t)^{-1} h^{2},} & {h \leq-c-t} \\ {\frac{1}{2}(t+h+c)^{2}-c(c-t)^{-1} h^{2},} & {-c-t<h<c-t} \\ {2 c(t+h)-c(c-t)^{-1} h^{2},} & {h \geq c-t}\end{array}\right.$$

When |t| < c,

$$F(h)=\left\{\begin{array}{ll}{-\frac{1}{2}(t+h+c)^{2},} & {h \leq-c-t} \\ {0,} & {-c-t<h<c-t} \\ {-\frac{1}{2}(t+h-c)^{2},} & {h \geq c-t}\end{array}\right.$$

It is easy to show that each of the above expressions is not greater than 0. For example, given that t ≥ c and h ≤ −c − t, we have F(−c − t) = c² − ct ≤ 0 and F′(h) = −2c(c + t + h)(c + t)⁻¹ ≥ 0, thus F(h) ≤ 0. Therefore, the assertion of (9) follows.
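The inequality F(h) ≤ 0 can also be spot-checked numerically. The sketch below (in Python rather than the paper's S code) assumes ρc is Huber's loss with ψc = ρc′, which is consistent with the piecewise expressions above; the tuning constant c = 1.345 is an illustrative choice:

```python
import numpy as np

def rho(t, c):
    # Huber's loss: quadratic for |t| <= c, linear beyond c
    a = np.abs(t)
    return np.where(a <= c, 0.5 * a**2, c * a - 0.5 * c**2)

def psi(t, c):
    # Derivative of Huber's loss, i.e. t clipped to [-c, c]
    return np.clip(t, -c, c)

def F(h, t, c):
    # The auxiliary function of the appendix:
    # F(h) = rho_c(t+h) - rho_c(t) - psi_c(t) h - min{1/2, c/(c+|t|)} h^2
    coef = np.minimum(0.5, c / (c + np.abs(t)))
    return rho(t + h, c) - rho(t, c) - psi(t, c) * h - coef * h**2

c = 1.345  # illustrative Huber tuning constant
T, H = np.meshgrid(np.linspace(-5, 5, 201), np.linspace(-10, 10, 401))
vals = F(H, T, c)
print(vals.max() <= 1e-12)  # expected: True, F(h) <= 0 on the whole grid
```

The grid maximum sits at (numerical) zero, attained in the middle region where |t| < c and |t + h| < c, matching the zero branch of the third piecewise expression.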


Qian, G. Computations and analysis in robust regression model selection using stochastic complexity. Computational Statistics 14, 293–314 (1999). https://doi.org/10.1007/BF03500911
