An unbiased model comparison test using cross-validation

Desmarais, Bruce A.; Harden, Jeffrey J.

doi:10.1007/s11135-013-9884-7

Published: 27 June 2013

Quality & Quantity, Volume 48, pages 2155–2173 (2014)

Bruce A. Desmarais¹ &
Jeffrey J. Harden²

Abstract

Social scientists often consider multiple empirical models of the same process. When these models are parametric and non-nested, the null hypothesis that two models fit the data equally well is commonly tested using methods introduced by Vuong (Econometrica 57(2):307–333, 1989) and Clarke (Am J Political Sci 45(3):724–744, 2001; J Confl Resolut 47(1):72–93, 2003; Political Anal 15(3):347–363, 2007). The objective of each is to compare the Kullback–Leibler Divergence (KLD) of the two models from the true model that generated the data. Here we show that both of these tests are based upon a biased estimator of the KLD, the individual log-likelihood contributions, and that the Clarke test is not proven to be consistent for the difference in KLDs. As a solution, we derive a test based upon cross-validated log-likelihood contributions, which represent an unbiased KLD estimate. We demonstrate the CVDM test’s superior performance via simulation, then apply it to two empirical examples from political science. We find that the test’s selection can diverge from those of the Vuong and Clarke tests and that this can ultimately lead to differences in substantive conclusions.
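The idea of comparing models on cross-validated log-likelihood contributions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the two candidate models (a normal and a Laplace location-scale model), the helper names, and the plain paired statistic on the leave-one-out differences are all chosen here for illustration, and the paper's own test statistic may differ.

```python
import math
import random
import statistics


def normal_loglik(y, mu, var):
    """Log-density of one observation under a normal model."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mu) ** 2 / (2 * var)


def laplace_loglik(y, loc, scale):
    """Log-density of one observation under a Laplace model."""
    return -math.log(2 * scale) - abs(y - loc) / scale


def fit_normal(sample):
    """ML estimates: sample mean and (biased) sample variance."""
    mu = statistics.fmean(sample)
    var = sum((v - mu) ** 2 for v in sample) / len(sample)
    return mu, var


def fit_laplace(sample):
    """ML estimates: sample median and mean absolute deviation from it."""
    loc = statistics.median(sample)
    scale = sum(abs(v - loc) for v in sample) / len(sample)
    return loc, scale


def loo_loglik_differences(y):
    """Leave-one-out log-likelihood of each point, normal minus Laplace."""
    diffs = []
    for i, y_i in enumerate(y):
        train = y[:i] + y[i + 1:]  # refit both models without observation i
        ll_normal = normal_loglik(y_i, *fit_normal(train))
        ll_laplace = laplace_loglik(y_i, *fit_laplace(train))
        diffs.append(ll_normal - ll_laplace)
    return diffs


def paired_statistic(diffs):
    """Mean difference divided by its estimated standard error."""
    n = len(diffs)
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(200)]  # simulated data, assumption
diffs = loo_loglik_differences(y)
stat = paired_statistic(diffs)
print(f"mean difference = {statistics.fmean(diffs):.4f}, statistic = {stat:.2f}")
```

Each held-out observation is scored by models that never saw it, which is what separates these contributions from the in-sample log-likelihood contributions used by the Vuong and Clarke tests.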

Notes

1. According to Google Scholar, Vuong (1989) has been cited approximately 2,400 times and the relatively more recent work by Clarke (2001, 2003, 2007) has garnered a combined 229 citations.
2. Moreover, unlike other information-theoretic model comparison criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), the Vuong and Clarke tests can be used to test hypotheses about the equivalence of model fit in the same way one would use the $F$ or likelihood ratio tests with nested models.
3. It should be noted that Vuong (1989) is very clear that all of his results are in the limit, focusing on consistency rather than bias.
4. We discovered this example by starting with two misspecified models and varying parameters in the data generating process until we arrived at a simulation-based proof that it is possible for the signs of $\tilde{\mu }(l^{(d)}_i)$ and $E[l^{(d)}_i]$ to be different.
5. It may seem odd to see partial degrees of freedom because the $t$-distribution is often used with reference to the number of observations less the number of parameters estimated (i.e., an integer). However, the $t$-distribution is a valid probability distribution for any $df > 0$. This interval for $df$ is chosen to produce the divergence in the sign of $\tilde{\mu }(l^{(d)}_i)$ and $E[l^{(d)}_i]$.
6. The Laplace distribution is a symmetric, unbounded continuous distribution that has significantly heavier tails than the normal distribution (Clarke 2007). The MLE of the regression parameters with a Laplace distributed error term is equivalent to the estimate of the coefficients in median regression (Koenker 2005).
7. We attempted to depict 95% confidence intervals around the mean estimates of $\tilde{\mu }(l^{(d)}_i)$ and $E[l^{(d)}_i]$ over the 10,000 iterations, but it was impossible to distinguish them on the graph.
8. Though this may seem somewhat restrictive, note that the general method of cross-validation can be used to conduct model comparison outside of ML estimators (see Diebold and Mariano 2002).
9. For examples using the Vuong test, see Mebane and Sekhon (2002), Abbe et al. (2003), Mondak and Sanders (2005), Bailey (2007), Shellman and Stewart (2007), and Konisky and Woods (2009). For those employing the Clarke test, see Souva (2005), Boockmann (2006), and Travis (2010).
10. Another possibility is MR, which appears in one of our simulation examples above. For simplicity we only focus on the choice between OLS and RR here, though results do not change if we compare OLS and MR.
11. Following Lange et al. (1989) and Western (1995), we set $\nu = 4$. However, this parameter could also be estimated from the data (Western 1995). In fact, analysts could use our CVDM test to compare a model in which $\nu $ is estimated to a model setting $\nu $ a priori.
12. The distributions are fit to the data by ML. The estimated parameters are the mean and variance of the normal distribution and the median and dispersion parameter for the $t$.
13. We replicated each model exactly. All coefficients are standardized to allow ease of presentation.
14. More formally, the skewness of these values is a statistically significant 1.05. The individual cross-validated log-likelihoods also exhibit this skewness.
15. Note, however, that the Clarke test makes the same selection as the CVDM test in this case.
16. It may sound odd to state the “expectation of the expected likelihood”, but this conveys the fact that the expected log-likelihood varies with the sample mean, resulting in the need for an outer expectation taken over the sampling distribution of the mean.
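The note on the Laplace distribution states that the ML estimate under Laplace errors coincides with median regression. A small sketch of the intercept-only case checks this numerically: with the scale held fixed, the Laplace log-likelihood should peak at the sample median. The simulated data, the fixed scale, and the grid search are assumptions made for illustration only.

```python
import math
import random
import statistics


def laplace_loglik(sample, loc, scale=1.0):
    """Total Laplace log-likelihood of the sample at a given location."""
    return sum(-math.log(2 * scale) - abs(v - loc) / scale for v in sample)


rng = random.Random(1)
y = [rng.gauss(2.0, 1.5) for _ in range(101)]  # odd n gives a unique median

med = statistics.median(y)

# Evaluate the log-likelihood on an evenly spaced grid of candidate locations.
spacing = (max(y) - min(y)) / 2000
grid = [min(y) + k * spacing for k in range(2001)]
best_on_grid = max(grid, key=lambda loc: laplace_loglik(y, loc))

print(f"sample median = {med:.4f}, best grid location = {best_on_grid:.4f}")
```

Because the log-likelihood is concave in the location, the best grid point can differ from the median by at most one grid step.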

References

Abbe, O.G., Goodliffe, J., Herrnson, P.S., Patterson, K.D.: Agenda setting in Congressional elections: the impact of issues and campaigns on voting behavior. Political Res. Q. 56(4), 419–430 (2003)
Achen, C.H.: Let’s put garbage-can regressions and garbage-can probits where they belong. Confl. Manag. Peace Sci. 22(4), 327–339 (2005)
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)
Amaral, M.A., Dunsmore, I.R.: Optimal estimates of predictive distributions. Biometrika 67(3), 685–689 (1980)
Bailey, M.A.: Comparable preference estimates across time and institutions for the court, Congress, and presidency. Am. J. Political Sci. 51(3), 433–448 (2007)
Boockmann, B.: Partisan politics and treaty ratification: the acceptance of international labour organisation conventions by industrialised democracies, 1960–1996. Eur. J. Political Res. 45(1), 153–180 (2006)
Chaffin, W.W., Rhiel, S.G.: The effect of skewness and kurtosis on the one-sample $t$ test and the impact of knowledge of the population standard deviation. J. Stat. Comput. Simul. 46(1), 79–90 (1993)
Clarke, K.A.: Testing nonnested models of international relations: reevaluating realism. Am. J. Political Sci. 45(3), 724–744 (2001)
Clarke, K.A.: Nonparametric model discrimination in international relations. J. Confl. Resolut. 47(1), 72–93 (2003)
Clarke, K.A.: A simple distribution-free test for nonnested hypotheses. Political Anal. 15(3), 347–363 (2007)
Diebold, F.X., Mariano, R.S.: Comparing predictive accuracy. J. Bus. Econ. Stat. 20(1), 134–144 (2002)
Gilula, Z., Haberman, S.J.: Density approximation by summary statistics: an information-theoretic approach. Scand. J. Stat. 27(3), 521–534 (2000)
Greene, W.H.: Econometric Analysis, 6th edn. Prentice Hall, Upper Saddle River (2008)
Hall, P.: On Kullback–Leibler loss and density estimation. Ann. Stat. 15(4), 1491–1519 (1987)
Johnson, N.J.: Modified $t$ tests and confidence intervals for asymmetrical populations. J. Am. Stat. Assoc. 73(363), 536–544 (1978)
Joshi, M., Mason, T.D.: Between democracy and revolution: peasant support for insurgency versus democracy in Nepal. J. Peace Res. 45(6), 765–782 (2008)
Koenker, R.: Quantile Regression. Cambridge University Press, New York (2005)
Konishi, S., Kitagawa, G.: Generalised information criteria in model selection. Biometrika 83(4), 875–890 (1996)
Konisky, D.M., Woods, N.D.: Exporting air pollution? Regulatory enforcement and environmental free riding in the United States. Political Res. Q. 63(4), 771–782 (2010)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Lange, K.L., Little, R.J.A., Taylor, J.M.G.: Robust statistical modeling using the $t$ distribution. J. Am. Stat. Assoc. 84(408), 881–896 (1989)
Mebane, W.R., Sekhon, J.S.: Coordination and policy moderation at midterm. Am. Political Sci. Rev. 96(1), 141–157 (2002)
Mondak, J.J., Sanders, M.S.: The complexity of tolerance and intolerance judgments: a response to Gibson. Political Behav. 27(4), 325–337 (2005)
Palazzolo, D.J., Moscardelli, V.G.: Policy crisis and political leadership: election law reform in the states after the 2000 presidential election. State Politics Policy Q. 6(3), 300–321 (2006)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423; 27(4), 623–656 (1948)
Shellman, S.M., Stewart, B.M.: Political persecution or economic deprivation? A time-series analysis of Haitian exodus, 1990–2004. Confl. Manag. Peace Sci. 24(2), 121–137 (2007)
Smyth, P.: Model selection for probabilistic clustering using cross-validated likelihood. Stat. Comput. 10(1), 63–72 (2000)
Souva, M.: Foreign policy determinants: comparing realist and domestic-political models of foreign policy. Confl. Manag. Peace Sci. 22(2), 149–163 (2005)
Travis, R.: Problems, politics, and policy streams: a reconsideration US foreign aid behavior toward Africa. Int. Stud. Q. 54(3), 797–821 (2010)
Vuong, Q.H.: Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57(2), 307–333 (1989)
Ward, M.D., Greenhill, B.D., Bakke, K.M.: The perils of policy by $p$-value: predicting civil conflicts. J. Peace Res. 47(4), 363–375 (2010)
Western, B.: Concepts and suggestions for robust regression analysis. Am. J. Political Sci. 39(3), 786–817 (1995)

Author information

Authors and Affiliations

Department of Political Science, University of Massachusetts—Amherst, 420 Thompson Hall, 200 Hicks Way, Amherst, MA, 01003, USA
Bruce A. Desmarais
Department of Political Science, University of Colorado Boulder, 136 Ketchum, UCB 333, Boulder, CO, 80309, USA
Jeffrey J. Harden

Corresponding author

Correspondence to Jeffrey J. Harden.

Appendix: Proof of Vuong test finite sample bias

Here we derive the inequality given in Eq. 3 of the main text. Suppose $\varvec{y}$ is a sample of $n$ independent observations from a normal distribution with zero mean and variance $\tau ^2$. Also, let $g$ be a normal probability density function with the mean estimated as the sample mean of $\varvec{y}$, and the variance fixed at $\sigma ^2$. Then the expected value of the observed log-likelihood is

$$\begin{aligned} E[ll_o]&= E_{\varvec{y}}\left[ \frac{1}{n} \ln \left( \prod _{i=1}^n \frac{1}{\sqrt{2\pi \sigma ^2}} \exp \left[ \frac{-1}{2\sigma ^2} \left( y_i - \frac{1}{n}\sum _{j=1}^n y_j \right) ^2\right] \right) \right] \nonumber \\&= -\ln \left( \sqrt{2 \pi \sigma ^2} \right) - \frac{1}{2\sigma ^2n} E_{\varvec{y}} \left[ \sum _{i=1}^n \left( y_i - \frac{1}{n}\sum _{j=1}^n y_j \right) ^2 \right] \nonumber \\&= -\ln \left( \sqrt{2 \pi \sigma ^2} \right) - \frac{\tau ^2(n-1)}{2\sigma ^2n}. \end{aligned}$$

(8)
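Eq. 8 can be checked by simulation. The sketch below draws repeated samples from a zero-mean normal distribution with variance $\tau ^2$, evaluates the average in-sample log-likelihood with the mean set to the sample mean and the variance fixed at $\sigma ^2$, and compares the Monte Carlo average with the closed form. The parameter values, the seed, and the function names are illustrative assumptions.

```python
import math
import random


def observed_loglik(y, sigma2):
    """Average in-sample normal log-likelihood, mean set to the sample mean."""
    n = len(y)
    ybar = sum(y) / n
    return sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (v - ybar) ** 2 / (2 * sigma2)
        for v in y
    ) / n


def eq8_closed_form(n, tau2, sigma2):
    """Right-hand side of Eq. 8."""
    return (-math.log(math.sqrt(2 * math.pi * sigma2))
            - tau2 * (n - 1) / (2 * sigma2 * n))


n, tau2, sigma2, reps = 5, 2.0, 1.0, 20000  # illustrative values
rng = random.Random(42)

# Monte Carlo average of the observed log-likelihood over repeated samples.
simulated = sum(
    observed_loglik([rng.gauss(0.0, math.sqrt(tau2)) for _ in range(n)], sigma2)
    for _ in range(reps)
) / reps
closed_form = eq8_closed_form(n, tau2, sigma2)

print(f"simulated = {simulated:.4f}, Eq. 8 = {closed_form:.4f}")
```

A small $n$ is used on purpose, since the $(n-1)/n$ factor that drives the finite-sample bias is most visible there.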

The expected value of the expected log-likelihood (see note 16) is

$$\begin{aligned} E[ll_e]&= E_{\bar{y}}\left[ E_{\varvec{y}}\left( \frac{1}{n} \sum _{i=1}^n \left[ \ln \frac{1}{\sqrt{2 \pi \sigma ^2}} - \frac{(y_i - \bar{y})^2}{2\sigma ^2} \right] \right) \right] \\&= -\ln \left( \sqrt{2 \pi \sigma ^2} \right) - \frac{1}{2\sigma ^2} E_{\bar{y}}\left[ E_{\varvec{y}}\left( \frac{1}{n} \sum _{i=1}^n \left[ y_i^2 -2y_i\bar{y}+\bar{y}^2 \right] \right) \right] \\&= -\ln \left( \sqrt{2 \pi \sigma ^2} \right) - \frac{1}{2\sigma ^2} E_{\bar{y}}\left[ \tau ^2 + \bar{y}^2 \right] \\&= -\ln \left( \sqrt{2 \pi \sigma ^2} \right) - \frac{\tau ^2+\frac{\tau ^2}{n}}{2\sigma ^2}. \end{aligned}$$

(9)
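Subtracting Eq. 9 from Eq. 8 makes the finite-sample gap explicit. The short derivation below uses only the two closed forms above. It is a worked step added for illustration and is not an equation from the original appendix.

$$\begin{aligned} E[ll_o] - E[ll_e]&= -\frac{\tau ^2(n-1)}{2\sigma ^2n} + \frac{\tau ^2(n+1)}{2\sigma ^2n} \\&= \frac{\tau ^2}{\sigma ^2 n}. \end{aligned}$$

The gap is positive for every finite $n$, so the in-sample log-likelihood overstates the expected log-likelihood, and it vanishes at rate $1/n$. Because the gap depends on $\sigma ^2$, it differs between two candidate models, which is what allows their ranking to flip in finite samples.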

Now, considering two different values of $\sigma ^2$, $\sigma _1^2$ and $\sigma _2^2$ with $\sigma _1^2 < \sigma _2^2$, $E_1[ll_o] > E_2[ll_o]$ iff

$$\begin{aligned} -\ln \left( \sqrt{2 \pi \sigma _1^2} \right) - \frac{\tau ^2(n-1)}{2\sigma _1^2n}&> -\ln \left( \sqrt{2 \pi \sigma _2^2} \right) - \frac{\tau ^2(n-1)}{2\sigma _2^2n}\nonumber \\ \tau ^2&< \frac{\ln \left( \sigma ^2_2 \right) -\ln \left( \sigma _1^2 \right) }{\frac{n-1}{n}\left( \frac{1}{\sigma _1^2}-\frac{1}{\sigma _2^2}\right) }, \end{aligned}$$

(10)

and $E_1[ll_e] < E_2[ll_e]$ iff

$$\begin{aligned} -\ln \left( \sqrt{2 \pi \sigma _1^2} \right) - \frac{\tau ^2+\frac{\tau ^2}{n}}{2\sigma _1^2}&< -\ln \left( \sqrt{2 \pi \sigma _2^2} \right) - \frac{\tau ^2+\frac{\tau ^2}{n}}{2\sigma _2^2}\nonumber \\ \frac{\ln \left( \sigma ^2_2 \right) -\ln \left( \sigma _1^2 \right) }{\left[ 1+\frac{1}{n}\right] \left( \frac{1}{\sigma _1^2} -\frac{1}{\sigma _2^2}\right) }&< \tau ^2. \end{aligned}$$

(11)

Combining these two conditions gives the interval from Eq. 3.
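Read together, Eqs. 10 and 11 bound $\tau ^2$ from above and below. A minimal numerical sketch, assuming illustrative values of $n$, $\sigma _1^2$, and $\sigma _2^2$ that do not come from the paper, computes the two bounds and confirms with the closed forms of Eqs. 8 and 9 that the ranking of the two models reverses for a $\tau ^2$ inside the interval. The function names are hypothetical.

```python
import math


def expected_observed_ll(n, tau2, sigma2):
    """Closed form of Eq. 8."""
    return (-math.log(math.sqrt(2 * math.pi * sigma2))
            - tau2 * (n - 1) / (2 * sigma2 * n))


def expected_expected_ll(n, tau2, sigma2):
    """Closed form of Eq. 9."""
    return (-math.log(math.sqrt(2 * math.pi * sigma2))
            - (tau2 + tau2 / n) / (2 * sigma2))


def reversal_interval(n, sigma2_1, sigma2_2):
    """Bounds on tau^2 from Eq. 11 (lower) and Eq. 10 (upper)."""
    gap = math.log(sigma2_2) - math.log(sigma2_1)
    spread = 1 / sigma2_1 - 1 / sigma2_2
    lower = gap / ((1 + 1 / n) * spread)
    upper = gap / (((n - 1) / n) * spread)
    return lower, upper


n, s1, s2 = 10, 1.0, 2.0  # illustrative values with s1 < s2
lower, upper = reversal_interval(n, s1, s2)
tau2 = 0.5 * (lower + upper)  # a point inside the interval

# In-sample comparison favours model 1; expected comparison favours model 2.
observed_prefers_1 = expected_observed_ll(n, tau2, s1) > expected_observed_ll(n, tau2, s2)
expected_prefers_2 = expected_expected_ll(n, tau2, s1) < expected_expected_ll(n, tau2, s2)

print(f"interval for tau^2: ({lower:.4f}, {upper:.4f})")
print(observed_prefers_1, expected_prefers_2)
```

With these values the interval is roughly (1.26, 1.54). Its two endpoints differ only through the factors $1+1/n$ and $(n-1)/n$, so the interval narrows as $n$ grows.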

About this article

Cite this article

Desmarais, B.A., Harden, J.J. An unbiased model comparison test using cross-validation. Qual Quant 48, 2155–2173 (2014). https://doi.org/10.1007/s11135-013-9884-7

Published: 27 June 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s11135-013-9884-7
