Abstract
Social scientists often consider multiple empirical models of the same process. When these models are parametric and non-nested, the null hypothesis that two models fit the data equally well is commonly tested using methods introduced by Vuong (Econometrica 57(2):307–333, 1989) and Clarke (Am J Political Sci 45(3):724–744, 2001; J Confl Resolut 47(1):72–93, 2003; Political Anal 15(3):347–363, 2007). The objective of each is to compare the Kullback–Leibler Divergence (KLD) of the two models from the true model that generated the data. Here we show that both of these tests are based upon a biased estimator of the KLD, the individual log-likelihood contributions, and that the Clarke test is not proven to be consistent for the difference in KLDs. As a solution, we derive a test based upon cross-validated log-likelihood contributions, which represent an unbiased KLD estimate. We demonstrate the CVDM test’s superior performance via simulation, then apply it to two empirical examples from political science. We find that the test’s selections can diverge from those of the Vuong and Clarke tests and that this can ultimately lead to differences in substantive conclusions.
Notes
Moreover, unlike other information-theoretic model comparison criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), the Vuong and Clarke tests can be used to test hypotheses about the equivalence of model fit in the same way one would use the \(F\) or likelihood ratio tests with nested models.
It should be noted that Vuong (1989) is very clear that all of his results are in the limit, focusing on consistency rather than bias.
We discovered this example by starting with two misspecified models and varying parameters in the data generating process until we arrived at a simulation-based proof that it is possible for the signs of \(\tilde{\mu }(l^{(d)}_i)\) and \(E[l^{(d)}_i]\) to be different.
It may seem odd to see partial degrees of freedom because the \(t\)-distribution is often used with reference to the number of observations less the number of parameters estimated (i.e., an integer). However, the \(t\)-distribution is a valid probability distribution for any \(df > 0\). This interval for \(df\) is chosen to produce the divergence in the sign of \(\tilde{\mu }(l^{(d)}_i)\) and \(E[l^{(d)}_i]\).
The Laplace distribution is a symmetric, unbounded continuous distribution that has significantly heavier tails than the normal distribution (Clarke 2007). The MLE of the regression parameters with a Laplace distributed error term is equivalent to the estimate of the coefficients in median regression (Koenker 2005).
We attempted to depict 95% confidence intervals around the mean estimates of \(\tilde{\mu }(l^{(d)}_i)\) and \(E[l^{(d)}_i]\) over the 10,000 iterations, but it was impossible to distinguish them on the graph.
Though this may seem somewhat restrictive, note that the general method of cross-validation can be used to conduct model comparison outside of ML estimators (see Diebold and Mariano 2002).
Another possibility is MR, which appears in one of our simulation examples above. For simplicity we only focus on the choice between OLS and RR here, though results do not change if we compare OLS and MR.
The distributions are fit to the data by ML. The estimated parameters are the mean and variance of the normal distribution and the median and dispersion parameter for the \(t\).
We replicated each model exactly. All coefficients are standardized for ease of presentation.
More formally, the skewness of these values is a statistically significant 1.05. The individual cross-validated log-likelihoods also exhibit this skewness.
The Clarke test, however, makes the same selection as the CVDM test in this case.
It may sound odd to state the “expectation of the expected likelihood”, but this conveys the fact that the expected log-likelihood varies with the sample mean, resulting in the need for an outer expectation taken over the sampling distribution of the mean.
References
Abbe, O.G., Goodliffe, J., Herrnson, P.S., Patterson, K.D.: Agenda setting in Congressional elections: the impact of issues and campaigns on voting behavior. Political Res. Q. 56(4), 419–430 (2003)
Achen, C.H.: Let’s put garbage-can regressions and garbage-can probits where they belong. Confl. Manag. Peace Sci. 22(4), 327–339 (2005)
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)
Amaral, M.A., Dunsmore, I.R.: Optimal estimates of predictive distributions. Biometrika 67(3), 685–689 (1980)
Bailey, M.A.: Comparable preference estimates across time and institutions for the court, Congress, and presidency. Am. J. Political Sci. 51(3), 433–448 (2007)
Boockmann, B.: Partisan politics and treaty ratification: the acceptance of international labour organisation conventions by industrialised democracies, 1960–1996. Eur. J. Political Res. 45(1), 153–180 (2006)
Chaffin, W.W., Rhiel, S.G.: The effect of skewness and kurtosis on the one-sample \(t\) test and the impact of knowledge of the population standard deviation. J. Stat. Comput. Simul. 46(1), 79–90 (1993)
Clarke, K.A.: Testing nonnested models of international relations: reevaluating realism. Am. J. Political Sci. 45(3), 724–744 (2001)
Clarke, K.A.: Nonparametric model discrimination in international relations. J. Confl. Resolut. 47(1), 72–93 (2003)
Clarke, K.A.: A simple distribution-free test for nonnested hypotheses. Political Anal. 15(3), 347–363 (2007)
Diebold, F.X., Mariano, R.S.: Comparing predictive accuracy. J. Bus. Econ. Stat. 20(1), 134–144 (2002)
Gilula, Z., Haberman, S.J.: Density approximation by summary statistics: an information-theoretic approach. Scand. J. Stat. 27(3), 521–534 (2000)
Greene, W.H.: Econometric Analysis, 6th edn. Prentice Hall, Upper Saddle River (2008)
Hall, P.: On Kullback–Leibler loss and density estimation. Ann. Stat. 15(4), 1491–1519 (1987)
Johnson, N.J.: Modified \(t\) tests and confidence intervals for asymmetrical populations. J. Am. Stat. Assoc. 73(363), 536–544 (1978)
Joshi, M., Mason, T.D.: Between democracy and revolution: peasant support for insurgency versus democracy in Nepal. J. Peace Res. 45(6), 765–782 (2008)
Koenker, R.: Quantile Regression. Cambridge University Press, New York (2005)
Konishi, S., Kitagawa, G.: Generalised information criteria in model selection. Biometrika 83(4), 875–890 (1996)
Konisky, D.M., Woods, N.D.: Exporting air pollution? Regulatory enforcement and environmental free riding in the United States. Political Res. Q. 63(4), 771–782 (2010)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Lange, K.L., Little, R.J.A., Taylor, J.M.G.: Robust statistical modeling using the \(t\) distribution. J. Am. Stat. Assoc. 84(408), 881–896 (1989)
Mebane, W.R., Sekhon, J.S.: Coordination and policy moderation at midterm. Am. Political Sci. Rev. 96(1), 141–157 (2002)
Mondak, J.J., Sanders, M.S.: The complexity of tolerance and intolerance judgments: a response to Gibson. Political Behav. 27(4), 325–337 (2005)
Palazzolo, D.J., Moscardelli, V.G.: Policy crisis and political leadership: election law reform in the states after the 2000 presidential election. State Politics Policy Q. 6(3), 300–321 (2006)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423; 27(4), 623–656 (1948)
Shellman, S.M., Stewart, B.M.: Political persecution or economic deprivation? A time-series analysis of Haitian exodus, 1990–2004. Confl. Manag. Peace Sci. 24(2), 121–137 (2007)
Smyth, P.: Model selection for probabilistic clustering using cross-validated likelihood. Stat. Comput. 10(1), 63–72 (2000)
Souva, M.: Foreign policy determinants: comparing realist and domestic-political models of foreign policy. Confl. Manag. Peace Sci. 22(2), 149–163 (2005)
Travis, R.: Problems, politics, and policy streams: a reconsideration US foreign aid behavior toward Africa. Int. Stud. Q. 54(3), 797–821 (2010)
Vuong, Q.H.: Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57(2), 307–333 (1989)
Ward, M.D., Greenhill, B.D., Bakke, K.M.: The perils of policy by \(p\)-value: predicting civil conflicts. J. Peace Res. 47(4), 363–375 (2010)
Western, B.: Concepts and suggestions for robust regression analysis. Am. J. Political Sci. 39(3), 786–817 (1995)
Appendix: Proof of Vuong test finite sample bias
Here we derive the inequality given in Eq. 3 of the main text. Suppose \(\varvec{y}\) is a sample of \(n\) independent observations from a normal distribution with zero mean and variance \(\tau ^2\). Also, let \(g\) be a normal probability density function with the mean estimated as the sample mean of \(\varvec{y}\), and the variance fixed at \(\sigma ^2\). Then the expected value of the observed log-likelihood is
The expected value of the expected log-likelihood (see footnote 16) is
Now, considering two different values of \(\sigma ^2\), \(\sigma _1^2\) and \(\sigma _2^2\) with \(\sigma _1^2 < \sigma _2^2\), \(E_1[ll_o] > E_2[ll_o]\) iff
and \(E_1[ll_e] < E_2[ll_e]\) iff
Combining these two conditions gives the interval from Eq. 3.
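The argument above can be checked numerically. The sketch below is a reconstruction from the stated setup, not a copy of the article's displayed equations. It assumes that both log-likelihoods are totals over the \(n\) observations (a per-observation scaling would leave the inequalities unchanged), and the function names are hypothetical. Under those assumptions, using \(E[\sum _i (y_i - \bar{y})^2] = (n-1)\tau ^2\) and \(E[\bar{y}^2] = \tau ^2/n\), the two expectations work out to

\[
E[ll_o] = -\frac{n}{2}\log (2\pi \sigma ^2) - \frac{(n-1)\tau ^2}{2\sigma ^2}, \qquad
E[ll_e] = -\frac{n}{2}\log (2\pi \sigma ^2) - \frac{(n+1)\tau ^2}{2\sigma ^2},
\]

so that, for \(\sigma _1^2 < \sigma _2^2\), the observed and expected orderings disagree when

\[
\frac{n}{n+1}\,c < \tau ^2 < \frac{n}{n-1}\,c, \qquad
c = \frac{\log (\sigma _2^2/\sigma _1^2)}{1/\sigma _1^2 - 1/\sigma _2^2}.
\]

```python
import numpy as np


def closed_form(n, tau2, sigma2):
    """Closed-form E[ll_o] and E[ll_e] (totals over n observations; assumed scaling)."""
    const = -0.5 * n * np.log(2 * np.pi * sigma2)
    e_obs = const - (n - 1) * tau2 / (2 * sigma2)  # in-sample (observed) log-likelihood
    e_exp = const - (n + 1) * tau2 / (2 * sigma2)  # expected log-likelihood on new data
    return e_obs, e_exp


def tau2_interval(n, sigma2_1, sigma2_2):
    """Range of tau^2 where the observed and expected orderings disagree (sigma2_1 < sigma2_2)."""
    c = np.log(sigma2_2 / sigma2_1) / (1 / sigma2_1 - 1 / sigma2_2)
    return n / (n + 1) * c, n / (n - 1) * c


def monte_carlo(n, tau2, sigma2, reps=200_000, seed=0):
    """Simulate both expectations by drawing many samples of size n from N(0, tau2)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, np.sqrt(tau2), size=(reps, n))
    ybar = y.mean(axis=1)
    const = -0.5 * n * np.log(2 * np.pi * sigma2)
    # Observed log-likelihood: evaluated on the same sample used to estimate the mean.
    ll_obs = const - ((y - ybar[:, None]) ** 2).sum(axis=1) / (2 * sigma2)
    # Expected log-likelihood: n * E[log g(y*)] for a new y* ~ N(0, tau2), given ybar.
    ll_exp = const - n * (tau2 + ybar ** 2) / (2 * sigma2)
    # Averaging over replications is the outer expectation over the sampling
    # distribution of the sample mean.
    return ll_obs.mean(), ll_exp.mean()


if __name__ == "__main__":
    n, s1, s2 = 5, 1.0, 2.0
    lo, hi = tau2_interval(n, s1, s2)
    tau2 = 0.5 * (lo + hi)  # a value inside the interval
    for s in (s1, s2):
        print("sigma2 =", s, "closed form:", closed_form(n, tau2, s),
              "simulated:", monte_carlo(n, tau2, s))
```

With \(\tau ^2\) inside the interval, the model with the smaller fixed variance has the higher expected observed log-likelihood but the lower expected log-likelihood on new data, which is the sign reversal the appendix describes. The gap between the two expectations, \(\tau ^2/\sigma ^2\), does not shrink with \(n\) in total terms, but the interval itself narrows as \(n\) grows.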
Cite this article
Desmarais, B.A., Harden, J.J. An unbiased model comparison test using cross-validation. Qual Quant 48, 2155–2173 (2014). https://doi.org/10.1007/s11135-013-9884-7
DOI: https://doi.org/10.1007/s11135-013-9884-7