Quality & Quantity, Volume 48, Issue 4, pp 2155–2173

An unbiased model comparison test using cross-validation

  • Bruce A. Desmarais
  • Jeffrey J. Harden


Abstract

Social scientists often consider multiple empirical models of the same process. When these models are parametric and non-nested, the null hypothesis that two models fit the data equally well is commonly tested using methods introduced by Vuong (Econometrica 57(2):307–333, 1989) and Clarke (Am J Political Sci 45(3):724–744, 2001; J Confl Resolut 47(1):72–93, 2003; Political Anal 15(3):347–363, 2007). The objective of each is to compare the Kullback–Leibler Divergence (KLD) of the two models from the true model that generated the data. Here we show that both of these tests are based upon a biased estimator of the KLD, the individual log-likelihood contributions, and that the Clarke test is not proven to be consistent for the difference in KLDs. As a solution, we derive a test based upon cross-validated log-likelihood contributions, which represent an unbiased KLD estimate: the cross-validated difference in means (CVDM) test. We demonstrate the CVDM test’s superior performance via simulation, then apply it to two empirical examples from political science. We find that the test’s selection can diverge from those of the Vuong and Clarke tests and that this can ultimately lead to differences in substantive conclusions.
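
The idea of comparing two non-nested models through cross-validated log-likelihood contributions can be illustrated with a minimal sketch. It is not the test statistic derived in the article: it assumes leave-one-out cross-validation, uses two simple i.i.d. models (normal and Laplace) chosen purely for illustration, and forms a plain paired z-style ratio from the per-observation differences. All function names are hypothetical.

```python
import math
import random
import statistics


def normal_logpdf(y, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def laplace_logpdf(y, loc, scale):
    """Log density of a Laplace (double exponential) distribution."""
    return -math.log(2 * scale) - abs(y - loc) / scale


def fit_normal(data):
    """Maximum likelihood estimates: sample mean and (1/n) standard deviation."""
    mu = statistics.fmean(data)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in data) / len(data))
    return mu, sigma


def fit_laplace(data):
    """Maximum likelihood estimates: median and mean absolute deviation from it."""
    loc = statistics.median(data)
    scale = sum(abs(v - loc) for v in data) / len(data)
    return loc, scale


def loo_loglik(data, fit, logpdf):
    """Leave-one-out log-likelihood contributions.

    Each observation is scored under parameters estimated without it,
    so it never informs its own fit.
    """
    contributions = []
    for i, y in enumerate(data):
        train = data[:i] + data[i + 1:]
        contributions.append(logpdf(y, *fit(train)))
    return contributions


def cv_difference_test(data):
    """Mean cross-validated difference (normal minus Laplace) and a z-style ratio.

    Illustrative stand-in only; the article's own statistic may differ.
    """
    ll_normal = loo_loglik(data, fit_normal, normal_logpdf)
    ll_laplace = loo_loglik(data, fit_laplace, laplace_logpdf)
    d = [a - b for a, b in zip(ll_normal, ll_laplace)]
    mean_d = statistics.fmean(d)
    std_err = statistics.stdev(d) / math.sqrt(len(d))
    return mean_d, mean_d / std_err


def simulate_laplace(n, seed):
    """Draw n standard Laplace values as the difference of two exponentials."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
```

Calling `cv_difference_test(simulate_laplace(1000, seed=1))` returns the mean difference and its ratio to the standard error. Under the sign convention used here, a negative mean difference favors the Laplace model and a positive one favors the normal model.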


Keywords

Model selection, Cross-validation, Kullback–Leibler Divergence, Vuong test, Clarke test


References

  1. Abbe, O.G., Goodliffe, J., Herrnson, P.S., Patterson, K.D.: Agenda setting in Congressional elections: the impact of issues and campaigns on voting behavior. Political Res. Q. 56(4), 419–430 (2003)
  2. Achen, C.H.: Let’s put garbage-can regressions and garbage-can probits where they belong. Confl. Manag. Peace Sci. 22(4), 327–339 (2005)
  3. Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)
  4. Amaral, M.A., Dunsmore, I.R.: Optimal estimates of predictive distributions. Biometrika 67(3), 685–689 (1980)
  5. Bailey, M.A.: Comparable preference estimates across time and institutions for the court, Congress, and presidency. Am. J. Political Sci. 51(3), 433–448 (2007)
  6. Boockmann, B.: Partisan politics and treaty ratification: the acceptance of international labour organisation conventions by industrialised democracies, 1960–1996. Eur. J. Political Res. 45(1), 153–180 (2006)
  7. Chaffin, W.W., Rhiel, S.G.: The effect of skewness and kurtosis on the one-sample \(t\) test and the impact of knowledge of the population standard deviation. J. Stat. Comput. Simul. 46(1), 79–90 (1993)
  8. Clarke, K.A.: Testing nonnested models of international relations: reevaluating realism. Am. J. Political Sci. 45(3), 724–744 (2001)
  9. Clarke, K.A.: Nonparametric model discrimination in international relations. J. Confl. Resolut. 47(1), 72–93 (2003)
  10. Clarke, K.A.: A simple distribution-free test for nonnested hypotheses. Political Anal. 15(3), 347–363 (2007)
  11. Diebold, F.X., Mariano, R.S.: Comparing predictive accuracy. J. Bus. Econ. Stat. 20(1), 134–144 (2002)
  12. Gilula, Z., Haberman, S.J.: Density approximation by summary statistics: an information-theoretic approach. Scand. J. Stat. 27(3), 521–534 (2000)
  13. Greene, W.H.: Econometric Analysis, 6th edn. Prentice Hall, Upper Saddle River (2008)
  14. Hall, P.: On Kullback–Leibler loss and density estimation. Ann. Stat. 15(4), 1491–1519 (1987)
  15. Johnson, N.J.: Modified \(t\) tests and confidence intervals for asymmetrical populations. J. Am. Stat. Assoc. 73(363), 536–544 (1978)
  16. Joshi, M., Mason, T.D.: Between democracy and revolution: peasant support for insurgency versus democracy in Nepal. J. Peace Res. 45(6), 765–782 (2008)
  17. Koenker, R.: Quantile Regression. Cambridge University Press, New York (2005)
  18. Konishi, S., Kitagawa, G.: Generalised information criteria in model selection. Biometrika 83(4), 875–890 (1996)
  19. Konisky, D.M., Woods, N.D.: Exporting air pollution? Regulatory enforcement and environmental free riding in the United States. Political Res. Q. 63(4), 771–782 (2010)
  20. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
  21. Lange, K.L., Little, R.J.A., Taylor, J.M.G.: Robust statistical modeling using the \(t\) distribution. J. Am. Stat. Assoc. 84(408), 881–896 (1989)
  22. Mebane, W.R., Sekhon, J.S.: Coordination and policy moderation at midterm. Am. Political Sci. Rev. 96(1), 141–157 (2002)
  23. Mondak, J.J., Sanders, M.S.: The complexity of tolerance and intolerance judgments: a response to Gibson. Political Behav. 27(4), 325–337 (2005)
  24. Palazzolo, D.J., Moscardelli, V.G.: Policy crisis and political leadership: election law reform in the states after the 2000 presidential election. State Politics Policy Q. 6(3), 300–321 (2006)
  25. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423; 27(4), 623–656 (1948)
  26. Shellman, S.M., Stewart, B.M.: Political persecution or economic deprivation? A time-series analysis of Haitian exodus, 1990–2004. Confl. Manag. Peace Sci. 24(2), 121–137 (2007)
  27. Smyth, P.: Model selection for probabilistic clustering using cross-validated likelihood. Stat. Comput. 10(1), 63–72 (2000)
  28. Souva, M.: Foreign policy determinants: comparing realist and domestic-political models of foreign policy. Confl. Manag. Peace Sci. 22(2), 149–163 (2005)
  29. Travis, R.: Problems, politics, and policy streams: a reconsideration US foreign aid behavior toward Africa. Int. Stud. Q. 54(3), 797–821 (2010)
  30. Vuong, Q.H.: Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57(2), 307–333 (1989)
  31. Ward, M.D., Greenhill, B.D., Bakke, K.M.: The perils of policy by \(p\)-value: predicting civil conflicts. J. Peace Res. 47(4), 363–375 (2010)
  32. Western, B.: Concepts and suggestions for robust regression analysis. Am. J. Political Sci. 39(3), 786–817 (1995)

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. Department of Political Science, University of Massachusetts Amherst, Amherst, USA
  2. Department of Political Science, University of Colorado Boulder, Boulder, USA
