New Directions in Information Matrix Testing: Eigenspectrum Tests

  • Richard M. Golden
  • Steven S. Henley
  • Halbert White
  • T. Michael Kashner
Chapter

Abstract

Model specification tests are essential tools for evaluating the appropriateness of probability models for estimation and inference. White (Econometrica, 50: 1–25, 1982) proposed that model misspecification could be detected by testing the null hypothesis that the Fisher information matrix (IM) Equality holds by comparing linear functions of the Hessian to outer product gradient (OPG) inverse covariance matrix estimators. Unfortunately, a number of researchers have reported difficulties in obtaining reliable inferences using White’s (Econometrica, 50: 1–25, 1982) original information matrix test (IMT). In this chapter, we extend White (Econometrica, 50: 1–25, 1982) to present a new generalized information matrix test (GIMT) theory and develop a new Adjusted Classical GIMT and five new Eigenspectrum GIMTs that compare nonlinear functions of the Hessian and OPG covariance matrix estimators. We then evaluate the level and power of these new GIMTs using simulation studies on realistic epidemiological data and find that they exhibit appealing performance on sample sizes typically encountered in practice. Our results suggest that these new GIMTs are important tools for detecting and assessing model misspecification, and thus will have broad applications for model-based decision making in the social, behavioral, engineering, financial, medical, and public health sciences.
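The information matrix equality behind these tests can be illustrated numerically. The following is a minimal sketch, not the chapter's GIMT statistics: it assumes a correctly specified logistic regression on simulated data, and the helper names (`fit_logistic`, `im_matrices`) and all parameter values are hypothetical. It fits the model by maximum likelihood, forms the Hessian-based and OPG-based matrix estimators, and inspects two nonlinear functions of them: the eigenvalues of A⁻¹B and the difference in log determinants. Under correct specification the eigenvalues should be close to 1 and the log-determinant gap close to 0. No test statistic or p-value is computed here.

```python
import numpy as np


def fit_logistic(X, y, iters=50):
    """Maximum likelihood fit of a logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                        # score of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def im_matrices(X, y, beta):
    """Return (A, B): the Hessian-based and OPG-based matrix estimators."""
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    # A: average negative Hessian of the per-observation log-likelihood.
    A = (X * (p * (1.0 - p))[:, None]).T @ X / n
    # B: average outer product of the per-observation score vectors.
    scores = X * (y - p)[:, None]
    B = scores.T @ scores / n
    return A, B


# Simulated data from a correctly specified model (illustrative values only).
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, -1.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logistic(X, y)
A, B = im_matrices(X, y, beta_hat)

# Eigenspectrum of A^{-1} B: all eigenvalues equal 1 when A equals B.
eigs = np.sort(np.linalg.eigvals(np.linalg.solve(A, B)).real)
# Difference of log determinants: zero when A equals B.
log_det_gap = np.linalg.slogdet(B)[1] - np.linalg.slogdet(A)[1]

print("eigenvalues of A^-1 B:", eigs)
print("log det(B) - log det(A):", log_det_gap)
```

Turning these quantities into a formal test would require the asymptotic covariance of the chosen function of A and B, which this sketch does not attempt.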

Keywords

Eigenspectrum; Goodness-of-fit; Information matrix test; Logistic regression; Specification analysis

Notes

Acknowledgments

This research was made possible by grants from the National Cancer Institute (NCI) (R44CA139607, PI: S.S. Henley) and the National Institute on Alcohol Abuse and Alcoholism (NIAAA) (R43AA014302, PI: S.S. Henley; R43/44AA013351, PI: S.S. Henley; R44AA011607, PI: S.S. Henley) under the Small Business Innovation Research (SBIR) program. The authors wish to gratefully acknowledge this support. This chapter reflects the authors’ views and not necessarily the opinions or views of the NCI or the NIAAA. The authors would also like to thank the anonymous referee for helpful comments and suggestions.

References

  1. Agresti, A.: Categorical data analysis. New York: Wiley-Interscience, 2002.
  2. Akaike, H.: “Information theory and an extension of the maximum likelihood principle”, 1973.
  3. Alonso, A., S. Litière, and G. Molenberghs: “A family of tests to detect misspecifications in the random-effects structure of generalized linear mixed models”, Computational Statistics and Data Analysis, 52(2008), 4474–4486.
  4. Aparicio, T., and I. Villanua: “The asymptotically efficient version of the information matrix test in binary choice models. A study of size and power”, Journal of Applied Statistics, 28(2001), 167–182.
  5. Archer, K. J., and S. Lemeshow: “Goodness-of-fit test for a logistic regression model fitted using survey sample data”, The Stata Journal, 6(2006), 97–105.
  6. Arminger, G., and M. E. Sobel: “Pseudo-maximum likelihood estimation of mean and covariance structures with missing data”, Journal of the American Statistical Association, 85(1990), 195–203.
  7. Begg, M. D., and S. Lagakos: “On the consequences of model misspecification in logistic regression”, Environmental Health Perspectives, 87(1990), 69–75.
  8. Bera, A. K., and S. Lee: “Information Matrix Test, Parameter Heterogeneity and ARCH: A Synthesis”, The Review of Economic Studies, 60(1993), 229–240.
  9. Bertolini, G., R. D’Amico, D. Nardi, A. Tinazzi, and G. Apolone: “One model, several results: the paradox of the Hosmer-Lemeshow goodness-of-fit test for the logistic regression model”, Journal of Epidemiology and Biostatistics, 5(2000), 251–3.
  10. Box, G. E. P., G. M. Jenkins, and G. C. Reinsel: Time Series Analysis: Forecasting and Control. New York: John Wiley & Sons, 2008.
  11. Bozdogan, H.: “Akaike’s Information Criterion and Recent Developments in Information Complexity”, Journal of Mathematical Psychology, 44(2000), 62–91.
  12. Bradley, A. P.: “The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms”, Pattern Recognition, 30(1997), 1145–1159.
  13. Burnham, K. P., and D. R. Anderson: Model selection and multimodel inference: a practical information-theoretic approach. New York: Springer, 2002.
  14. Chesher, A.: “The information matrix test: Simplified calculation via a score test interpretation”, Economics Letters, 13(1983), 45–48.
  15. Chesher, A., and R. Spady: “Asymptotic Expansions of the Information Matrix Test Statistic”, Econometrica, 59(1991), 787–815.
  16. Christensen, R.: Log-Linear Models and Logistic Regression. Springer Texts in Statistics, 1997.
  17. Collett, D.: Modelling Binary Data. Chapman & Hall/CRC, 2003.
  18. Copas, J.B.: “Unweighted sum of squares test for proportions”, Applied Statistics, 38(1989), 71–80.
  19. Cox, D.R.: “Role of models in statistical analysis”, Statistical Science, 5(1990), 169–174.
  20. Cramér, H.: Mathematical Methods of Statistics. Princeton: Princeton University Press, 1946.
  21. Davidson, R., and J. G. MacKinnon: “A New Form of the Information Matrix Test”, Econometrica, 60(1992), 145–157.
  22. Davidson, R., and J. G. MacKinnon: “Graphical Methods for Investigating the Size and Power of Hypothesis Tests”, The Manchester School, 66(1998), 1–26.
  23. Davison, A. C., D. V. Hinkley, and G. A. Young: “Recent Developments in Bootstrap Methodology”, Statistical Science, 18(2003), 141–157.
  24. Davison, A. C., and C. L. Tsai: “Regression model diagnostics”, International Statistical Review, 60(1992), 337–353.
  25. Deng, X., S. Wan, and B. Zhang: “An improved goodness-of-test for logistic regression models based on case-control data by random partition”, Communications in statistics: Simulations and computation, 38(2009), 233–243.
  26. Dhaene, G., and D. Hoorelbeke: “The information matrix test with bootstrap-based covariance matrix estimation”, Economics Letters, 82(2004), 341–347.
  27. DHHS: “The International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). DHHS Publication No. (PHS) 80–1280”, Washington D.C.: Department of Health and Human Services, 1980.
  28. Farrington, C.P.: “On assessing goodness of fit of generalized linear models to sparse data”, Journal of the Royal Statistical Society, Series B, 58(1996), 349–360.
  29. Fawcett, T.: “An introduction to ROC analysis”, Pattern Recognition Letters, 27(2006), 861–874.
  30. Fisher, R.A.: “On the mathematical foundations of theoretical statistics”, Philosophical Transactions of the Royal Society of London, Series A, 222(1922), 309–368.
  31. Gallini, J.: “Misspecifications that can result in path analysis structures”, Applied Psychological Measurement, 7(1983), 125–137.
  32. Golden, R.M.: Mathematical methods for neural network analysis and design. Cambridge, Mass.: MIT Press, 1996.
  33. Golden, R. M.: “Statistical tests for comparing possibly misspecified and nonnested models”, Journal of Mathematical Psychology, 44(2000), 153–170.
  34. Golden, R.M.: “Discrepancy risk model selection test theory for comparing possibly misspecified or nonnested models”, Psychometrika, 68(2003), 229–249.
  35. Greene, W.: Econometric Analysis. New Jersey: Prentice-Hall, 2003.
  36. Hall, A.: “The Information Matrix Test for the Linear Model”, The Review of Economic Studies, 54(1987), 257–263.
  37. Hamilton, J. D.: Time Series Analysis. Princeton, New Jersey: Princeton University Press, 1994.
  38. Hanley, J. A., and B. J. McNeil: “The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve”, Radiology, 143(1982), 29–36.
  39. Harrell, F. E.: Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. New York: Springer, 2001.
  40. Hastie, T., R. Tibshirani, and J. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics, 2009.
  41. Hastie, T. J., and R. J. Tibshirani: “Generalized additive models”, Statistical Science, 3(1986), 297–318.
  42. Hastie, T. J., and R. J. Tibshirani: Generalized Additive Models. Chapman & Hall/CRC, 1990.
  43. Henley, S. S., R. M. Golden, T. M. Kashner, and H. White: “Exploiting Hidden Structures in Epidemiological Data: Phase II Project”, (R44AA011607) National Institute on Alcohol Abuse and Alcoholism, 2000. http://www.sbir.gov/sbirsearch/detail/223679
  44. Henley, S. S., R. M. Golden, T. M. Kashner, H. White, and R. D. Katz: “Improving Validity Measures for Alcohol-Related Models: Phase I Project”, (R43AA013351) National Institute on Alcohol Abuse and Alcoholism, 2001. http://www.sbir.gov/sbirsearch/detail/223681
  45. Henley, S. S., R. M. Golden, T. M. Kashner, H. White, and R. D. Katz: “Robust Classification Methods for Categorical Regression: Phase I Project”, (R43AA014302) National Institute on Alcohol Abuse and Alcoholism, 2003. http://www.sbir.gov/sbirsearch/detail/223689
  46. Henley, S. S., R. M. Golden, T. M. Kashner, H. White, and D. Paik: “Robust Classification Methods for Categorical Regression: Phase II Project”, (R44CA139607) National Cancer Institute, 2008. http://www.sbir.gov/sbirsearch/detail/223709
  47. Henley, S. S., R. M. Golden, T. M. Kashner, H. White, L. Xuan, D. Paik, and R. D. Katz: “Improving Validity Measures in Alcohol-Related Models: Phase II Project”, (R44AA013351) National Institute on Alcohol Abuse and Alcoholism, 2004. http://www.sbir.gov/sbirsearch/detail/223693
  48. Hilbe, J. M.: Logistic Regression Models. New York: Chapman and Hall, 2009.
  49. Horowitz, J.L.: “Bootstrap-based critical values for the information matrix test”, Journal of Econometrics, 61(1994), 395–411.
  50. Horowitz, J.L.: “The bootstrap in econometrics”, Statistical Science, 18(2003), 211–218.
  51. Hosmer, D. W., T. Hosmer, S. LeCessie, and S. Lemeshow: “A comparison of goodness-of-fit tests for the logistic regression model”, Statistics in Medicine, 16(1997), 965–980.
  52. Hosmer, D. W., and S. Lemeshow: “A goodness-of-fit test for the multiple logistic regression model”, Communication in Statistics, A10(1980), 1043–1069.
  53. Hosmer, D. W., and S. Lemeshow: Applied Logistic Regression. New York: John Wiley & Sons, 2000.
  54. Hosmer, D. W., S. Lemeshow, and J. Klar: “Goodness-of-Fit Testing for Multiple Logistic Regression Analysis when the Estimated Probabilities are Small”, Biometrical Journal, 30(1988), 1–14.
  55. Hosmer, D. W., S. Taber, and S. Lemeshow: “The importance of assessing the fit of logistic regression models: a case study”, American Journal of Public Health, 81(1991), 1630–1635.
  56. Huber, P.: “The behavior of maximum likelihood estimates under non-standard conditions”, University of California Press, 1967.
  57. Kashner, T. M., T. J. Carmody, T. Suppes, A. J. Rush, M. L. Crismon, A. L. Miller, M. Toprac, and M. Trivedi: “Catching up on health outcomes: The Texas Medication Algorithm Project”, Health Services Research, 38(2003), 311–331.
  58. Kashner, T. M., S. S. Henley, R. M. Golden, J. M. Byrne, S. A. Keitz, G. W. Cannon, B. K. Chang, G. J. Holland, D. C. Aron, E. A. Muchmore, A. Wicker, and H. White: “Studying the Effects of ACGME Duty Hours Limits on Resident Satisfaction: Results From VA Learners’ Perceptions Survey”, Academic Medicine, 85(2010), 1130–1139.
  59. Kashner, T. M., S. S. Henley, R. M. Golden, A. J. Rush, and R. B. Jarrett: “Assessing the preventive effects of cognitive therapy following relief of depression: A methodological innovation”, Journal of Affective Disorders, 104(2007), 251–261.
  60. Kashner, T. M., R. Rosenheck, A. B. Campinell, A. Suris, and C. W. T. S. Team: “Impact of work therapy on health status among homeless, substance-dependent veterans - A randomized controlled trial”, Archives of General Psychiatry, 59(2002), 938–944.
  61. Konishi, S., and G. Kitagawa: “Generalized information criteria in model selection”, Biometrika, 83(1996), 875–890.
  62. Kuss, O.: “Global goodness-of-fit tests in logistic regression with sparse data”, Statistics in Medicine, 21(2002), 3789–3801.
  63. Lancaster, T.: “The Covariance Matrix of the Information Matrix Test”, Econometrica, 52(1984), 1051–1054.
  64. Lehmann, E. L.: “Model specification: The views of Fisher and Neyman, and later developments”, Statistical Science, 5(1990), 160–168.
  65. Maddala, G. S.: Limited-dependent and Qualitative Variables in Econometrics. New York: Cambridge, 1999.
  66. Magnus, J. R.: “On differentiating eigenvalues and eigenvectors”, Econometric Theory, 1(1985), 179–191.
  67. Magnus, J. R., and H. Neudecker: Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: John Wiley & Sons, 1999.
  68. McCullagh, P.: “On the asymptotic distribution of Pearson’s statistic in linear exponential family models”, International Statistical Review, 53(1985), 61–67.
  69. McCullagh, P., and J. A. Nelder: Generalized linear models. New York: Chapman and Hall, 1989.
  70. Orme, C.: “The Calculation of the Information Matrix Test for Binary Data Models”, The Manchester School, 56(1988), 370–376.
  71. Orme, C.: “The small-sample performance of the information-matrix test”, Journal of Econometrics, 46(1990), 309–331.
  72. Osius, G., and D. Rojek: “Normal goodness-of-fit tests for multinomial models with large degrees-of-freedom”, Journal of the American Statistical Association, 87(1992), 1145–1152.
  73. Pepe, M. S.: The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Oxford University Press, 2004.
  74. Politis, D. N., J. P. Romano, and M. Wolf: Subsampling. New York: Springer, 1999.
  75. Qin, J., and B. Zhang: “A goodness-of-fit test for logistic regression models based on case-control data”, Biometrika, 84(1997), 609–618.
  76. Raudenbush, S. W., and A. S. Bryk: Hierarchical Linear Models: Applications and Data Analysis Methods. Thousand Oaks, CA: Sage Publications, Inc., 2002.
  77. Sarkar, S. K., and H. Midi: “Importance of assessing the model adequacy of binary logistic regression”, Journal of Applied Sciences, 10(2010), 479–486.
  78. Serfling, R. J.: Approximation theorems of mathematical statistics. New York: John Wiley & Sons, 1980.
  79. Stomberg, C., and H. White: “Bootstrapping the Information Matrix Test”, University of California, San Diego Department of Economics Discussion Paper, 2000.
  80. Stukel, T.A.: “Generalized logistic models”, Journal of the American Statistical Association, 83(1988), 426–431.
  81. Takeuchi, K.: “Distribution of information statistics and a criterion of model fitting for adequacy of models”, Mathematical Sciences, 153(1976), 12–18.
  82. Taylor, L.W.: “The Size Bias of White’s Information Matrix Test”, Economics Letters, 24(1987), 63–67.
  83. Tsay, R.S.: Analysis of Financial Time Series. New York: John Wiley & Sons, 2010.
  84. Tsiatis, A.A.: “A Note on a goodness-of-fit test for the logistic regression model”, Biometrika, 67(1980), 250–251.
  85. Verbeke, G., and E. Lesaffre: “The effect of misspecifying the random-effects distribution in linear mixed models for longitudinal data”, Computational Statistics and Data Analysis, 23(1997), 541–556.
  86. Vuong, Q.H.: “Likelihood ratio tests for model selection and non-nested hypotheses”, Econometrica, 57(1989).
  87. Wald, A.: “Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large”, Transactions of the American Mathematical Society, 54(1943), 426–482.
  88. Wei, B.: Exponential Family Nonlinear Models. New York: Springer, 1998.
  89. White, H.: “Using least squares to approximate unknown regression functions”, International Economic Review, 21(1980), 149–170.
  90. White, H.: “Consequences and detection of misspecified nonlinear regression models”, Journal of the American Statistical Association, 76(1981), 419–433.
  91. White, H.: “Maximum Likelihood Estimation of Misspecified Models”, Econometrica, 50(1982), 1–25.
  92. White, H.: “Specification Testing in Dynamic Models”, Cambridge University Press, 1987.
  93. White, H.: Estimation, inference, and specification analysis. Cambridge: Cambridge University Press, 1994.
  94. Wickens, T.D.: Elementary Signal Detection Theory. New York: Oxford University Press, 2002.
  95. Winkler, G.: Image Analysis, Random Fields, and Dynamic Monte Carlo Methods. New York: Springer-Verlag, 1991.
  96. Zhang, B.: “A chi-squared goodness-of-fit test for logistic regression models based on case-control data”, Biometrika, 86(1999), 531–539.
  97. Zhang, B.: “An information matrix test for logistic regression models based on case-control data”, Biometrika, 88(2001), 921–932.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Richard M. Golden (1)
  • Steven S. Henley (2, 3)
  • Halbert White (4)
  • T. Michael Kashner (3, 5, 6)

  1. Cognitive Science and Engineering School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, USA
  2. Martingale Research Corporation, Plano, USA
  3. Department of Medicine, Loma Linda University School of Medicine, Loma Linda, USA
  4. Department of Economics, University of California San Diego, La Jolla, USA
  5. Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, USA
  6. Office of Academic Affiliations Department of Veterans Affairs, Washington, USA
