Abstract
Data users often apply standard regression model selection criteria to select variables in nested error regression models, which are widely used in small area estimation. We demonstrate through a Monte Carlo simulation study that this practice may lead to selection of a non-optimal or incorrect model. To assist data users who wish to use standard regression software, we propose a transformation of the data so that the transformed data follow a standard regression model. Variable selection software available for the standard regression model can then be applied directly to the transformed data. We illustrate our methodology using survey and satellite data for corn and soybeans in 12 Iowa counties.
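The idea of transforming clustered data so that ordinary regression tools apply can be illustrated with the classical Fuller-Battese transformation listed in the keywords. The sketch below is not the transformation proposed in the article, which the abstract does not spell out. It assumes a nested error model y_ij = x_ij'β + v_i + e_ij with known variance components, and the function name and arguments (`fuller_battese_transform`, `sigma_v2`, `sigma_e2`) are hypothetical. In practice the variance components would have to be estimated first.

```python
import numpy as np

def fuller_battese_transform(y, X, groups, sigma_v2, sigma_e2):
    """Sketch of the Fuller-Battese (1973) transformation.

    Assumes the variance components sigma_v2 (area effect) and
    sigma_e2 (unit-level error) are known. Each observation has
    alpha_i times its cluster mean subtracted, where
    alpha_i = 1 - sqrt(sigma_e2 / (sigma_e2 + n_i * sigma_v2)).
    Under the assumed model the transformed errors are uncorrelated
    with common variance sigma_e2.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    y_star = np.empty_like(y)
    X_star = np.empty_like(X)
    for g in np.unique(groups):
        idx = groups == g                  # rows belonging to cluster g
        n_i = idx.sum()                    # cluster sample size
        alpha = 1.0 - np.sqrt(sigma_e2 / (sigma_e2 + n_i * sigma_v2))
        y_star[idx] = y[idx] - alpha * y[idx].mean()
        X_star[idx] = X[idx] - alpha * X[idx].mean(axis=0)
    return y_star, X_star

if __name__ == "__main__":
    # Simulated illustration only: 12 clusters of 5 units each.
    rng = np.random.default_rng(0)
    groups = np.repeat(np.arange(12), 5)
    X = np.column_stack([np.ones(groups.size), rng.normal(size=groups.size)])
    v = rng.normal(scale=np.sqrt(2.0), size=12)[groups]   # area effects
    y = X @ np.array([1.0, 0.5]) + v + rng.normal(size=groups.size)
    y_star, X_star = fuller_battese_transform(y, X, groups, 2.0, 1.0)
    # Ordinary least squares on the transformed data.
    beta_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    print(beta_hat)
```

Note that the intercept column is transformed along with the other covariates, so the transformed design matrix should be passed to the regression routine without adding a separate intercept.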
References
Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An error components model for prediction of county crop areas using survey and satellite data. J. Am. Stat. Assoc. 83, 28–36.
Claeskens, G. and Hjort, N.L. (2008). Model Selection and Model Averaging. Cambridge University Press, Cambridge.
Fuller, W.A. and Battese, G.E. (1973). Transformations for estimation of linear models with nested error structures. J. Am. Stat. Assoc. 68, 626–632.
Gunst, R.F. and Mason, R.L. (1980). Regression Analysis and Its Application. Marcel Dekker, New York.
Henderson, C.R. (1953). Estimation of variance and variance components. Biometrics 9, 226–252.
Jiang, J. and Lahiri, P. (2006). Mixed model prediction and small area estimation. Test 15, 111–999.
Jiang, J., Rao, J.S., Gu, Z. and Nguyen, T. (2008). Fence methods for mixed model selection. Ann. Stat. 36, 1669–1692.
Kutner, M.H., Nachtsheim, C.J. and Neter, J. (2004). Applied Linear Regression Models. McGraw-Hill/Irwin Series Operations and Decision Sciences, New York City.
Lahiri, P. (ed.) (2001). Model Selection. IMS Lecture Notes-Monograph Series, vol. 38. Institute of Mathematical Statistics, Beachwood, OH.
Lahiri, P. and Li, Y. (2009). A new alternative to the standard F test for clustered data. J. Stat. Plan. Inference 139, 3430–3441.
Lahiri, P. and Suntornchost, J. (2015). Variable selection for linear mixed models with applications in small area estimation. Sankhya B. https://doi.org/10.1007/s13571-015-0096-0.
Meza, J.L. and Lahiri, P. (2005). A note on the Cp statistic under the nested error regression model. Survey Method. 31, 105–109.
Muller, S., Scealy, J.L. and Welsh, A.H. (2013). Model selection in linear mixed models. Stat. Sci. 28, 135–167.
Prasad, N.G.N. and Rao, J.N.K. (1990). The estimation of mean squared errors of small area estimators. J. Am. Stat. Assoc. 85, 163–171.
Rao, J.N.K. (2003). Small Area Estimation. Wiley, New York.
Rao, C.R. and Wu, Y. (2001). On model selection. In Model Selection (P. Lahiri, ed.). IMS Lecture Notes-Monograph Series, vol. 38. Institute of Mathematical Statistics, Beachwood, OH.
Rao, J.N.K., Sutradhar, B.C. and Yue, K. (1993). Generalized least squares F test in regression analysis with two-stage cluster samples. J. Am. Stat. Assoc. 88, 1388–1391.
Shao, J. (1993). Linear model selection by cross-validation. J. Am. Stat. Assoc. 88, 486–494.
Vaida, F. and Blanchard, S. (2005). Conditional Akaike information for mixed-effects models. Biometrika 92, 351–370.
Acknowledgements
The authors thank the editors and the anonymous referee for constructive suggestions that led to an improvement of an earlier version of the article. The research of the second author was supported in part by the National Science Foundation Grant Number SES-1534413.
Cite this article
Li, Y., Lahiri, P. A Simple Adaptation of Variable Selection Software for Regression Models to Select Variables in Nested Error Regression Models. Sankhya B 81, 302–317 (2019). https://doi.org/10.1007/s13571-018-0161-6
DOI: https://doi.org/10.1007/s13571-018-0161-6
Keywords and phrases
- Fuller-Battese transformation
- Intracluster correlation
- Lahiri-Li transformation
- Variable selection criteria