
Model-free model-fitting and predictive distributions

  • Invited Paper

Abstract

The problem of prediction is revisited with a view towards going beyond the typical nonparametric setting and reaching a fully model-free environment for predictive inference, i.e., point predictors and predictive intervals. A basic principle of model-free prediction is laid out based on the notion of transforming a given setup into one that is easier to work with, namely i.i.d. or Gaussian. As an application, the problem of nonparametric regression is addressed in detail; the model-free predictors are worked out, and shown to be applicable under minimal assumptions. Interestingly, model-free prediction in regression is a totally automatic technique that does not necessitate the search for an optimal data transformation before model fitting. The resulting model-free predictive distributions and intervals are compared to their corresponding model-based analogs, and the use of cross-validation is extensively discussed. As an aside, improved prediction intervals in linear regression are also obtained.
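The "transform toward i.i.d." idea summarized above can be illustrated with a toy sketch (this is not the paper's actual algorithm; the Gaussian kernel, the bandwidth, and the function name are illustrative choices): estimate the conditional CDF of Y given x by a kernel-weighted empirical CDF, and map each response through it, which should yield approximately i.i.d. Uniform(0,1) variables.

```python
import numpy as np

def uniformize(y, x, h=0.1):
    """Map each Y_i to u_i = Fhat(Y_i | x_i), where Fhat is a
    kernel-weighted empirical conditional CDF.  The Gaussian
    kernel and bandwidth h are illustrative choices."""
    u = np.empty(len(y))
    for i in range(len(y)):
        w = np.exp(-0.5 * ((x - x[i]) / h) ** 2)   # weights localized at x_i
        w /= w.sum()
        u[i] = np.sum(w * (y <= y[i]))             # weighted ECDF at (x_i, Y_i)
    return u

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)
u = uniformize(y, x)
# the u_i should look roughly Uniform(0,1)
```

Prediction then proceeds by inverting the same estimated CDF at new uniform draws; the paper's actual construction is more careful about boundary effects and the choice of smoother.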


Notes

  1. The qualitative difference is that the MF practitioner's interest lies in observable quantities, i.e., current and future data, as opposed to unobservable model parameters and estimates thereof. In this sense, despite being frequentist in nature, the MF principle is in concordance with Bruno de Finetti's statistical philosophy; see e.g. Dawid (2004) and the references therein.

  2. Rather than performing a two-dimensional search over h and q to minimize PRESS, the simple constraint q=h will be imposed in what follows; this has the additional advantage of ensuring \(M_{x}\geq m^{2}_{x}\), as needed for a well-defined estimator \(s^{2}_{x}\) in Eq. (11). Note that the choice q=h is not necessarily optimal; see e.g. Wang et al. (2008). Furthermore, note that these are global bandwidths; techniques for picking local bandwidths, i.e., a different optimal bandwidth for each x, are widely available but will not be discussed further here so as not to obscure the paper's main focus. Similarly, several recent variations on the cross-validation theme, such as the one-sided cross-validation of Hart and Yi (1998) and the far casting cross-validation for dependent data of Carmack et al. (2009), present attractive alternatives. However, our discussion will focus on the well-known standard form of cross-validation for concreteness, especially since our aim is to show how the Model-Free prediction principle applies in nonparametric regression with any type of kernel smoother and any type of bandwidth selector.
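With the q=h constraint, the PRESS search collapses to one dimension. A minimal sketch of leave-one-out PRESS for a Nadaraya-Watson smoother follows (Gaussian kernel, bandwidth grid, and simulated data are illustrative; this is standard cross-validation, not the paper's code):

```python
import numpy as np

def nw_loo_press(y, x, h):
    """Leave-one-out PRESS for a Nadaraya-Watson smoother with a
    Gaussian kernel and a single global bandwidth h."""
    press = 0.0
    for t in range(len(y)):
        w = np.exp(-0.5 * ((x - x[t]) / h) ** 2)
        w[t] = 0.0                       # delete the t-th point
        m_t = np.sum(w * y) / np.sum(w)  # leave-one-out fit at x_t
        press += (y[t] - m_t) ** 2
    return press / len(y)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(100)
hs = np.linspace(0.02, 0.3, 15)
h_opt = min(hs, key=lambda h: nw_loo_press(y, x, h))
```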

  3. In general, the \(L_2\)-optimal predictor of \(Y_f\) would be given by the conditional expectation of \(Y_f\) given \(Y_1,\ldots,Y_n\) as well as \(x_f\); see e.g. Goldberger (1962). However, under model (8), the Y data are assumed independent; therefore, \(E(Y_f|x_f,Y_1,\ldots,Y_n)\) simplifies to just \(E(Y_f|x_f)\).
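The \(L_2\)-optimality of the conditional expectation follows from the standard decomposition: for any predictor \(g(x_f)\),

```latex
E\big[(Y_f - g(x_f))^2 \mid x_f\big]
 = E\big[(Y_f - E(Y_f|x_f))^2 \mid x_f\big]
 + \big(E(Y_f|x_f) - g(x_f)\big)^2,
```

since the cross term has conditional expectation zero; the right-hand side is therefore minimized by taking \(g(x_f)=E(Y_f|x_f)\).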

  4. Here, and for the remainder of Sect. 3, we will assume that the estimator \(m_x\) is linear in the Y data; our running example of a kernel smoother obviously satisfies this requirement, and so do other popular methods such as local polynomial fitting.
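The linearity assumption is easy to check numerically for the kernel-smoother example: the Nadaraya-Watson fitted values are \(Hy\) for a hat matrix \(H\) that depends only on the design points and the bandwidth (a sketch; kernel and bandwidth are illustrative):

```python
import numpy as np

def nw_hat_matrix(x, h):
    """Hat matrix H of the Nadaraya-Watson smoother, so that the
    vector of fitted values is H @ y (Gaussian kernel)."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return K / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 50)
y1, y2 = rng.standard_normal(50), rng.standard_normal(50)
H = nw_hat_matrix(x, 0.1)
lhs = H @ (2 * y1 + 3 * y2)
rhs = 2 * (H @ y1) + 3 * (H @ y2)
# linearity: the smoother acts on the Y data only through the fixed matrix H
assert np.allclose(lhs, rhs)
```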

  5. Strictly speaking, the \(W_t\)'s are not exactly independent because of the dependence of \(m_{x_{t}}\) and \(s_{x_{t}}\) on \(m_{x_{k}}\) and \(s_{x_{k}}\). However, under typical conditions, \(m_{x}\stackrel{P}{\longrightarrow}E(Y|x)\) and \(s^{2}_{x}\stackrel{P}{\longrightarrow}\mathit{Var}(Y|x)\) as n→∞. Therefore, the \(W_t\)'s are, at least, asymptotically independent.
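A minimal sketch of forming the studentized quantities \(W_t=(Y_t-m_{x_t})/s_{x_t}\), with \(s^2_x=M_x-m^2_x\) where \(M_x\) estimates \(E(Y^2|x)\); using the same kernel weights for both estimates keeps \(s^2_x\geq 0\) (function names, bandwidth, and data are illustrative assumptions):

```python
import numpy as np

def kernel_avg(z, x, x0, h):
    """Kernel-weighted average of z localized at x0 (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * z) / np.sum(w)

def studentize(y, x, h=0.1):
    """W_t = (Y_t - m_{x_t}) / s_{x_t}, with s_x^2 = M_x - m_x^2;
    the same weights for m_x and M_x make s_x^2 nonnegative by the
    weighted-variance identity."""
    W = np.empty(len(y))
    for t in range(len(y)):
        m = kernel_avg(y, x, x[t], h)
        M = kernel_avg(y ** 2, x, x[t], h)
        W[t] = (y[t] - m) / np.sqrt(M - m ** 2)
    return W

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 150)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(150)
W = studentize(y, x)
# the W_t should be roughly centered and scaled, and (approximately) i.i.d.
```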

  6. If \(\sigma^2(x)\) is not assumed constant, then \(\tilde{e}_{t}= e_{t} C_{t}/(1-\delta_{x_{t}})\) where \(C_{t}=s_{x_{t}}/s_{x_{t}}^{(t)}\).
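In the constant-\(\sigma^2\) case, the predictive residuals reduce to \(\tilde e_t=e_t/(1-\delta_{x_t})\), with \(\delta_{x_t}\) the t-th diagonal element of the smoother's hat matrix; for a Nadaraya-Watson smoother this reproduces the leave-one-out residual exactly. A sketch (Gaussian kernel and bandwidth are illustrative):

```python
import numpy as np

def nw_hat_matrix(x, h):
    """Hat matrix of the Nadaraya-Watson smoother: fitted values are H @ y."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return K / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, 80))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(80)
H = nw_hat_matrix(x, h=0.1)
e = y - H @ y                 # ordinary fitted residuals
delta = np.diag(H)            # leverages delta_{x_t}, each in (0, 1)
e_pred = e / (1 - delta)      # predictive (leave-one-out) residuals
```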

  7. Efron (1983) proposed an iterated bootstrap method in order to correct the downward bias of the bootstrap estimate of prediction error; his method notably involved the use of predictive residuals, albeit at the second bootstrap tier; see Efron and Tibshirani (1993, Chap. 17.7) for details.

  8. For \(\bar{D}_{x_{\mathrm{f}}}^{-1}\) to be an accurate estimator of \(D_{x_{\mathrm{f}}}^{-1}\), the value \(x_f\) must have an appreciable number of h-close neighbors among the original predictors \(x_1,\ldots,x_n\), as discussed in Remark 4.1. As an extreme example, note that prediction of \(Y_f\) when \(x_f\) is outside the range of the original predictors \(x_1,\ldots,x_n\), i.e., extrapolation, is not feasible in the model-free paradigm.
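The "appreciable number of h-close neighbors" requirement suggests a simple diagnostic before attempting model-free prediction at a new point; the helper below is hypothetical (not from the paper):

```python
import numpy as np

def n_h_close(x, x_f, h):
    """Count design points within distance h of x_f.  A small count,
    and in particular zero (as happens under extrapolation), warns
    that the inverse transformation cannot be estimated reliably at x_f."""
    return int(np.sum(np.abs(np.asarray(x) - x_f) <= h))

x = np.linspace(0, 1, 101)   # design points spaced 0.01 apart
n_h_close(x, 0.50, 0.051)    # interior point: 11 neighbors
n_h_close(x, 1.20, 0.051)    # extrapolation: 0 neighbors
```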

References

  • Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3):175–185

  • Atkinson AC (1985) Plots, transformations and regression. Clarendon, Oxford

  • Beran R (1990) Calibrating prediction regions. J Am Stat Assoc 85:715–723

  • Bickel P, Li B (2006) Regularization in statistics. Test 15(2):271–344

  • Box GEP, Cox DR (1964) An analysis of transformations. J R Stat Soc, Ser B, Stat Methodol 26:211–252

  • Breiman L, Friedman J (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80:580–597

  • Carmack PS, Schucany WR, Spence JS, Gunst RF, Lin Q, Haley RW (2009) Far casting cross-validation. J Comput Graph Stat 18(4):879–893

  • Carroll RJ, Ruppert D (1988) Transformations and weighting in regression. Chapman & Hall, New York

  • Carroll RJ, Ruppert D (1991) Prediction and tolerance intervals with transformation and/or weighting. Technometrics 33:197–210

  • Cox DR (1975) Prediction intervals and empirical Bayes confidence intervals. In: Gani J (ed) Perspectives in probability and statistics. Academic Press, London, pp 47–55

  • Dai J, Sperlich S (2010) Simple and effective boundary correction for kernel densities and regression with an application to the world income and Engel curve estimation. Comput Stat Data Anal 54(11):2487–2497

  • DasGupta A (2008) Asymptotic theory of statistics and probability. Springer, New York

  • Davison AC, Hinkley DV (1997) Bootstrap methods and their applications. Cambridge University Press, Cambridge

  • Dawid AP (2004) Probability, causality, and the empirical world: a Bayes–de Finetti–Popper–Borel synthesis. Stat Sci 19(1):44–57

  • Draper NR, Smith H (1998) Applied regression analysis, 3rd edn. Wiley, New York

  • Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26

  • Efron B (1983) Estimating the error rate of a prediction rule: improvement on cross-validation. J Am Stat Assoc 78:316–331

  • Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman & Hall, New York

  • Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman & Hall, London

  • Freedman DA (1981) Bootstrapping regression models. Ann Stat 9:1218–1228

  • Gangopadhyay AK, Sen PK (1990) Bootstrap confidence intervals for conditional quantile functions. Sankhya, Ser A 52(3):346–363

  • Geisser S (1993) Predictive inference: an introduction. Chapman & Hall, New York

  • Goldberger AS (1962) Best linear unbiased prediction in the generalized linear regression model. J Am Stat Assoc 57:369–375

  • Hahn J (1995) Bootstrapping quantile regression estimators. Econom Theory 11(1):105–121

  • Hall P (1992) The bootstrap and Edgeworth expansion. Springer, New York

  • Hall P (1993) On Edgeworth expansion and bootstrap confidence bands in nonparametric curve estimation. J R Stat Soc, Ser B, Stat Methodol 55:291–304

  • Hall P, Wehrly TE (1991) A geometrical method for removing edge effects from kernel type nonparametric regression estimators. J Am Stat Assoc 86:665–672

  • Härdle W (1990) Applied nonparametric regression. Cambridge University Press, Cambridge

  • Härdle W, Bowman AW (1988) Bootstrapping in nonparametric regression: local adaptive smoothing and confidence bands. J Am Stat Assoc 83:102–110

  • Härdle W, Marron JS (1991) Bootstrap simultaneous error bars for nonparametric regression. Ann Stat 19:778–796

  • Hart JD (1997) Nonparametric smoothing and lack-of-fit tests. Springer, New York

  • Hart JD, Yi S (1998) One-sided cross-validation. J Am Stat Assoc 93(442):620–631

  • Hong Y (1999) Hypothesis testing in time series via the empirical characteristic function: a generalized spectral density approach. J Am Stat Assoc 94:1201–1220

  • Hong Y, White H (2005) Asymptotic distribution theory for nonparametric entropy measures of serial dependence. Econometrica 73(3):837–901

  • Horowitz J (1998) Bootstrap methods for median regression models. Econometrica 66(6):1327–1351

  • Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge

  • Li Q, Racine JS (2007) Nonparametric econometrics. Princeton University Press, Princeton

  • Linton OB, Sperlich S, van Keilegom I (2008) Estimation of a semiparametric transformation model. Ann Stat 36(2):686–718

  • Loader C (1999) Local regression and likelihood. Springer, New York

  • McCullagh P, Nelder J (1983) Generalized linear models. Chapman & Hall, London

  • McMurry T, Politis DN (2008) Bootstrap confidence intervals in nonparametric regression with built-in bias correction. Stat Probab Lett 78:2463–2469

  • McMurry T, Politis DN (2010) Banded and tapered estimates of autocovariance matrices and the linear process bootstrap. J Time Ser Anal 31:471–482

  • Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9:141–142

  • Neumann M, Polzehl J (1998) Simultaneous bootstrap confidence bands in nonparametric regression. J Nonparametr Stat 9:307–333

  • Olive DJ (2007) Prediction intervals for regression models. Comput Stat Data Anal 51:3115–3122

  • Pagan A, Ullah A (1999) Nonparametric econometrics. Cambridge University Press, Cambridge

  • Patel JK (1989) Prediction intervals: a review. Commun Stat, Theory Methods 18:2393–2465

  • Politis DN (2003) A normalizing and variance-stabilizing transformation for financial time series. In: Akritas MG, Politis DN (eds) Recent advances and trends in nonparametric statistics. Elsevier, Amsterdam, pp 335–347

  • Politis DN (2007a) Model-free vs. model-based volatility prediction. J Financ Econom 5(3):358–389

  • Politis DN (2007b) Model-free prediction. In: Bulletin of the International Statistical Institute, volume LXII, Lisbon, 22–29 Aug 2007, pp 1391–1397

  • Politis DN (2010) Model-free model-fitting and predictive distributions. Discussion paper, Department of Economics, Univ of California, San Diego. Retrieved from: http://escholarship.org/uc/item/67j6s174

  • Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer, New York

  • Rosenblatt M (1952) Remarks on a multivariate transformation. Ann Math Stat 23:470–472

  • Ruppert D, Cline DH (1994) Bias reduction in kernel density estimation by smoothed empirical transformations. Ann Stat 22:185–210

  • Schmoyer RL (1992) Asymptotically valid prediction intervals for linear models. Technometrics 34:399–408

  • Schucany WR (2004) Kernel smoothers: an overview of curve estimators for the first graduate course in nonparametric statistics. Stat Sci 19:663–675

  • Seber GAF, Lee AJ (2003) Linear regression analysis. Wiley, New York

  • Shao J, Tu D (1995) The jackknife and bootstrap. Springer, New York

  • Shi SG (1991) Local bootstrap. Ann Inst Stat Math 43:667–676

  • Stine RA (1985) Bootstrap prediction intervals for regression. J Am Stat Assoc 80:1026–1031

  • Tibshirani R (1988) Estimating transformations for regression via additivity and variance stabilization. J Am Stat Assoc 83:394–405

  • Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc, Ser B, Stat Methodol 58(1):267–288

  • Wang L, Brown LD, Cai TT, Levine M (2008) Effect of mean on variance function estimation in nonparametric regression. Ann Stat 36:646–664

  • Watson GS (1964) Smooth regression analysis. Sankhya, Ser A 26:359–372

  • Wolfowitz J (1957) The minimum distance method. Ann Math Stat 28:75–88

Acknowledgements

A preliminary version of this paper was presented as a Plenary Talk at the 10th International Vilnius Conference on Probability and Mathematical Statistics, June 28–July 3, 2010, and as a Special Invited Talk at the 28th European Meeting of Statisticians, August 17–22, 2010; the author is grateful to the audiences on these, and several other, occasions for their helpful feedback. Many thanks are due to Arthur Berg, Wilson Cheung and Tim McMurry for invaluable help with R functions and computing, and to Richard Davis, Jeff Racine, Bill Schucany, Dimitrios Thomakos and Slava Vasiliev for helpful discussions. The author is also grateful to the Editors, Ricardo Cao and Domingo Morales, for their support and encouragement, and to six (!) anonymous referees for their very detailed and constructive comments; one of the referees deserves special thanks for an astute observation that helped shed light on the workings of the ‘uniformize’ algorithm of Sect. 4. This work has been partially supported by NSF grants DMS-07-06732 and DMS-10-07513, and by a fellowship from the Guggenheim Foundation.

Author information


Corresponding author

Correspondence to Dimitris N. Politis.


Cite this article

Politis, D.N. Model-free model-fitting and predictive distributions. TEST 22, 183–221 (2013). https://doi.org/10.1007/s11749-013-0317-7
