
Conditional density estimation using the local Gaussian correlation

Abstract

Let \(\mathbf {X} = (X_1,\ldots ,X_p)\) be a stochastic vector having joint density function \(f_{\mathbf {X}}(\mathbf {x})\) with partitions \(\mathbf {X}_1 = (X_1,\ldots ,X_k)\) and \(\mathbf {X}_2 = (X_{k+1},\ldots ,X_p)\). A new method for estimating the conditional density function of \(\mathbf {X}_1\) given \(\mathbf {X}_2\) is presented. It is based on locally Gaussian approximations, but simplified in order to tackle the curse of dimensionality in multivariate applications, where both response and explanatory variables can be vectors. We compare our method to some available competitors, and the error of approximation is shown to be small in a series of examples using real and simulated data, and the estimator is shown to be particularly robust against noise caused by independent variables. We also present examples of practical applications of our conditional density estimator in the analysis of time series. Typical values for k in our examples are 1 and 2, and we include simulation experiments with values of p up to 6. Large sample theory is established under a strong mixing condition.


References

  1. Bashtannyk, D.M., Hyndman, R.J.: Bandwidth selection for kernel conditional density estimation. Comput. Stat. Data Anal. 36(3), 279–298 (2001)

  2. Berentsen, G.D., Cao, R., Francisco-Fernández, M., Tjøstheim, D.: Some properties of local Gaussian correlation and other nonlinear dependence measures. J. Time Ser. Anal. 38, 352–380 (2017). doi:10.1111/jtsa.12183

  3. Brockwell, P.J., Davis, R.A.: Time Series: Theory and Methods. Springer, Berlin (2013)

  4. Bücher, A., Volgushev, S.: Empirical and sequential empirical copula processes under serial dependence. J. Multivar. Anal. 119, 61–70 (2013)

  5. Chacón, J.E., Duong, T.: Multivariate plug-in bandwidth selection with unconstrained pilot bandwidth matrices. Test 19(2), 375–398 (2010)

  6. Chen, X., Linton, O.B.: The estimation of conditional densities. LSE STICERD Research Paper No. EM415 (2001)

  7. Dette, H., Van Hecke, R., Volgushev, S.: Some comments on copula-based regression. J. Am. Stat. Assoc. 109(507), 1319–1324 (2014)

  8. Fan, J., Yao, Q.: Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, Berlin (2003)

  9. Fan, J., Yim, T.H.: A crossvalidation method for estimating conditional densities. Biometrika 91(4), 819–834 (2004)

  10. Fan, J., Yao, Q., Tong, H.: Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika 83(1), 189–206 (1996)

  11. Faugeras, O.P.: A quantile-copula approach to conditional density estimation. J. Multivar. Anal. 100(9), 2083–2099 (2009)

  12. Geenens, G., Charpentier, A., Paindaveine, D.: Probit transformation for nonparametric kernel estimation of the copula density. arXiv preprint arXiv:1404.4414 (2014)

  13. Hall, P.: On Kullback–Leibler loss and density estimation. Ann. Stat. 15(4), 1491–1519 (1987)

  14. Hall, P., Racine, J.S., Li, Q.: Cross-validation and the estimation of conditional probability densities. J. Am. Stat. Assoc. 99(468), 1015–1026 (2004)

  15. Hayfield, T., Racine, J.S.: Nonparametric econometrics: the np package. J. Stat. Softw. 27(5), 1–32 (2008)

  16. Hjort, N.L., Jones, M.C.: Locally parametric nonparametric density estimation. Ann. Stat. 24(4), 1619–1647 (1996)

  17. Holmes, M.P., Gray, A.G., Isbell, C.L.: Fast nonparametric conditional density estimation. arXiv preprint arXiv:1206.5278 (2012)

  18. Hothorn, T., Kneib, T., Bühlmann, P.: Conditional transformation models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 76(1), 3–27 (2014)

  19. Hyndman, R.J., Yao, Q.: Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat. 14(3), 259–278 (2002)

  20. Hyndman, R.J., Bashtannyk, D.M., Grunwald, G.K.: Estimating and visualizing conditional densities. J. Comput. Graph. Stat. 5(4), 315–336 (1996)

  21. Irle, A.: On consistency in nonparametric estimation under mixing conditions. J. Multivar. Anal. 60(1), 123–147 (1997)

  22. Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis, 6th edn. Pearson Education International, New York (2007)

  23. Lacal, V., Tjøstheim, D.: Local Gaussian autocorrelation and tests for serial independence. J. Time Ser. Anal. 38(1), 51–71 (2017). doi:10.1111/jtsa.12195

  24. Matt, P.: Transformations in density estimation. J. Am. Stat. Assoc. 86(414), 343–353 (1991)

  25. Nelsen, R.B.: An Introduction to Copulas, vol. 139. Springer, Berlin (2013)

  26. Newey, W.K.: Uniform convergence in probability and stochastic equicontinuity. Econometrica 59(4), 1161–1167 (1991)

  27. Noh, H., El Ghouch, A., Bouezmarni, T.: Copula-based regression estimation and inference. J. Am. Stat. Assoc. 108(502), 676–688 (2013)

  28. Otneim, H., Tjøstheim, D.: The locally Gaussian density estimator for multivariate data. Stat. Comput. 1–22 (2016). doi:10.1007/s11222-016-9706-6. ISSN: 1573-1375

  29. Palaro, H.P., Hotta, L.K.: Using conditional copula to estimate value at risk. J. Data Sci. 4, 93–115 (2006)

  30. Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)

  31. Peligrad, M.: On the central limit theorem for weakly dependent sequences with a decomposed strong mixing coefficient. Stoch. Process. Appl. 42(2), 181–193 (1992)

  32. R Core Team: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2015). https://www.R-project.org/

  33. Rosenblatt, M.: Conditional probability density and regression estimators. Multivar. Anal. II 25, 31 (1969)

  34. Rosenblatt, M., et al.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27(3), 832–837 (1956)

  35. Ruppert, D., Cline, D.B.H.: Bias reduction in kernel density estimation by smoothed empirical transformations. Ann. Stat. 22(1), 185–210 (1994)

  36. Schervish, M.J.: Theory of Statistics. Springer, Berlin (1995)

  37. Severini, T.A.: Likelihood Methods in Statistics. Oxford Science Publications, Oxford University Press, Oxford (2000). ISBN 9780198506508

  38. Sheather, S.J., Jones, M.C.: A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B (Methodol) 53(3), 683–690 (1991)

  39. Silverman, B.W.: Density estimation for statistics and data analysis. Monogr. Stat. Appl. Probab. 26 (1986)

  40. Stone, C.J.: Large-sample inference for log-spline models. Ann. Stat. 18(2), 717–741 (1990)

  41. Stone, C.J., Hansen, M.H., Kooperberg, C., Truong, Y.K., et al.: Polynomial splines and their tensor products in extended linear modeling: 1994 Wald Memorial Lecture. Ann. Stat. 25(4), 1371–1470 (1997)

  42. Tjøstheim, D., Hufthammer, K.O.: Local Gaussian correlation: a new measure of dependence. J. Econom. 172(1), 33–48 (2013)

  43. Wand, M.P., Jones, M.C.: Multivariate plug-in bandwidth selection. Comput. Stat. 9(2), 97–116 (1994)


Acknowledgements

The authors would like to thank two anonymous referees, who provided several useful comments and suggestions in the preparation of this article. This work has been partly supported by The Finance Market Fund, project number 261570.

Author information

Correspondence to Håkon Otneim.


Appendices

Appendix 1: Proofs

Proof of Theorem 1

Except for a slight modification that accounts for the replacement of independence with \(\alpha \)-mixing, the proof of Theorem 1 is identical to the corresponding proof in Otneim and Tjøstheim (2016), which in turn is based on the global maximum likelihood case covered by Severini (2000). For each location \(\mathbf {z}\), which we suppress from the notation for simplicity, denote by \(Q_{\mathbf {h}_n,K}(\rho )\) the expectation of the local likelihood function \(L_n(\rho , \mathbf {Z})\). Consistency follows from uniform convergence in probability of \(L_n(\rho , \mathbf {Z})\) toward \(Q_{\mathbf {h}_n,K}(\rho )\), conditions for which are provided in Corollary 2.2 by Newey (1991).

The result requires compact support of the parameter space, equicontinuity and Lipschitz continuity of the family of functions \(\{Q_{\mathbf {h}_n, K}(\rho )\}\), as well as pointwise convergence of the local likelihood functions. Compactness is covered by Assumption D, and the demonstration of equi- and Lipschitz continuity in Otneim and Tjøstheim (2016) does not rely on the independent data assumption. In the independent case, pointwise convergence follows from a standard nonparametric law of large numbers. Our Assumption B about \(\alpha \)-mixing data ensures that pointwise convergence still holds; see, for example, Theorem 1 by Irle (1997), conditions for which are straightforward to verify in our local likelihood setting.

The rest of the proof is identical to the corresponding argument by Severini (2000, pp. 105–107).

Proof of Theorem 2

Consider first the bivariate case, in which there is only one local correlation to estimate. The first part of the proof goes through exactly as in the iid case of Otneim and Tjøstheim (2016). We follow the argument for global maximum likelihood estimators as presented in Theorem 7.63 by Schervish (1995). The statement of Theorem 2 follows provided that

$$\begin{aligned} Y_n(\mathbf {z}) = \sum _{i=1}^nK\left( |\mathbf {h}_n|^{-1}(\mathbf {Z}_i - \mathbf {z})\right) u(\mathbf {Z}_i,\rho _0) = \sum _{i=1}^nV_{ni} \end{aligned} \tag{17}$$

is asymptotically normal; the implication itself follows from a standard Taylor expansion. In the iid case, the limiting distribution of (17) is derived using the same technique as when demonstrating asymptotic normality for the standard kernel estimator, for example, as in the proof of Theorem 1A by Parzen (1962). In the case of \(\alpha \)-mixing data, we instead establish asymptotic normality of (17) by going through the steps used in proving Theorem 2.22 in Fan and Yao (2003). Let \(W_i = h^{-1}V_{ni}\), then

$$\begin{aligned} \frac{1}{nh^2}\text {Var}(Y_n(\mathbf {z}))&= \frac{1}{nh^2} \Bigg \{ \sum _{i=1}^n \text {Var}(V_{ni}) \\&\qquad + 2{\sum \sum }_{1\le i < j \le n}\text {Cov}(V_{ni},V_{nj})\Bigg \} \\&= \text {Var}(W_1) + 2\sum _{j=1}^{n-1} (1-j/n)\text {Cov}(W_1, W_{j+1}), \end{aligned}$$

where

$$\begin{aligned} \text {Var}(W_1)&= \textit{E}(W_1^2) - (\textit{E}(W_1))^2 \\&= \int h^{-2}u^2(\mathbf {y}, \rho _0)K^2(h^{-1}(\mathbf {y} - \mathbf {z})) f(\mathbf {y}) \,\text {d}\mathbf {y} + O(h^2) \\&= \int u^2(\mathbf {z} + h\mathbf {v}, \rho _0)K^2(\mathbf {v})f(\mathbf {z} + h\mathbf {v})\, \text {d}\mathbf {v} + O(h^2) \\&\quad \rightarrow u^2(\mathbf {z}, \rho _0)f(\mathbf {z})\int K^2(\mathbf {v})\, \text {d}\mathbf {v} \mathop {=}\limits ^{\text {def}} M(\mathbf {z}) \,\,\text {as}\,\, h\rightarrow 0, \end{aligned}$$

and

$$\begin{aligned} |\text {Cov}(W_1, W_{j+1})|&= |\textit{E}(W_1W_{j+1}) - \textit{E}(W_1)\textit{E}(W_{j+1})| \\&= O(h^2), \end{aligned}$$

using the same argument once again. Therefore,

$$\begin{aligned} \left| \sum _{j=1}^{m_n}\text {Cov}(W_1,W_{j+1})\right| = O(m_nh^2). \end{aligned}$$
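The variance decomposition used above is the standard identity for the scaled variance of a sum of stationary variables. The following sketch checks it numerically. The AR(1) autocovariance function is a hypothetical stand-in for \(\text {Cov}(W_1, W_{j+1})\), chosen only because it has a closed form; it is not the covariance structure of the paper's \(W_i\).

```python
def ar1_autocov(lag, phi=0.5, sigma2=1.0):
    """Autocovariance of a stationary AR(1) process (illustrative stand-in)."""
    return sigma2 * phi ** abs(lag) / (1.0 - phi ** 2)


def scaled_variance_direct(n, autocov):
    """(1/n) Var(W_1 + ... + W_n), summing every entry of the covariance matrix."""
    total = sum(autocov(i - j) for i in range(n) for j in range(n))
    return total / n


def scaled_variance_formula(n, autocov):
    """Var(W_1) + 2 * sum_{j=1}^{n-1} (1 - j/n) * Cov(W_1, W_{j+1})."""
    tail = sum((1.0 - j / n) * autocov(j) for j in range(1, n))
    return autocov(0) + 2.0 * tail


if __name__ == "__main__":
    for n in (5, 50, 200):
        direct = scaled_variance_direct(n, ar1_autocov)
        formula = scaled_variance_formula(n, ar1_autocov)
        print(n, direct, formula)
```

The two columns printed for each n should agree up to floating-point rounding, since both compute the same double sum in different orders.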

Fan and Yao (2003) require that

$$\begin{aligned} \textit{E}(u(\mathbf {Z}_n, \rho _0(\mathbf {z}))^{\delta })<\infty \end{aligned} \tag{18}$$

for some \(\delta >2\), but this is of course true for our transformed data, because they are marginally normal. In Proposition 2.5(i) by Fan and Yao (2003) we can therefore use \(p=q=\delta >2\) in order to obtain, for some constant C,

$$\begin{aligned} |\text {Cov}(W_1, W_{j+1})|\le C\alpha (j)^{1-2/\delta }h^{4/\delta -2}. \end{aligned}$$

Let \(m_n=(h_n^2|\log h_n^2|)^{-1}\). Then \(m_n\rightarrow \infty \), \(m_nh^2\rightarrow 0\), and

$$\begin{aligned} \sum _{j=m_n+1}^{n-1}|\text {Cov}(W_1,W_{j+1})|&\le C\frac{h^{4/\delta -2}}{m_n^{\lambda }}\sum _{j=m_n+1}^nj^{\lambda } \alpha (j)^{1-2/\delta }\rightarrow 0, \end{aligned} \tag{19--20}$$

which follows from Assumption B. Thus,

$$\begin{aligned} \sum _{j=1}^{n-1}\text {Cov}(W_1, W_{j+1})\rightarrow 0, \end{aligned}$$

and it follows that

$$\begin{aligned} \frac{1}{nh^2}\text {Var}(Y_n(\mathbf {z})) = M(\mathbf {z})(1+o(1)). \end{aligned}$$
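The limit \(M(\mathbf {z})\) can be checked numerically in the bivariate case. The sketch below is an illustration under stated assumptions, not part of the proof: it takes \(K\) to be a product Gaussian kernel, \(f\) to be the density of two independent standard normals, and replaces \(u^2(\cdot , \rho _0)\) with a hypothetical smooth function `g`. It then compares \(\int g(\mathbf {z} + h\mathbf {v})K^2(\mathbf {v})f(\mathbf {z} + h\mathbf {v})\,\text {d}\mathbf {v}\) with \(g(\mathbf {z})f(\mathbf {z})\int K^2(\mathbf {v})\,\text {d}\mathbf {v}\) for decreasing \(h\).

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def kernel(v1, v2):
    """Product Gaussian kernel (an assumed choice of K)."""
    return phi(v1) * phi(v2)


def density(y1, y2):
    """Stand-in for f: two independent standard normal margins."""
    return phi(y1) * phi(y2)


def g(y1, y2):
    """Hypothetical smooth stand-in for u^2(y, rho_0)."""
    return 1.0 + y1 * y1 + 0.5 * y1 * y2


def scaled_second_moment(z, h, half_width=5.0, step=0.1):
    """Riemann sum for the integral of g(z + h v) K^2(v) f(z + h v) over v."""
    m = int(round(2.0 * half_width / step))
    grid = [-half_width + i * step for i in range(m + 1)]
    total = 0.0
    for v1 in grid:
        for v2 in grid:
            y1, y2 = z[0] + h * v1, z[1] + h * v2
            k = kernel(v1, v2)
            total += g(y1, y2) * k * k * density(y1, y2)
    return total * step * step


def limit_value(z):
    """g(z) f(z) * int K^2, where int K^2 = 1 / (4 pi) for this kernel."""
    return g(z[0], z[1]) * density(z[0], z[1]) / (4.0 * math.pi)


if __name__ == "__main__":
    z = (0.3, -0.2)
    for h in (0.5, 0.1, 0.02):
        print(h, scaled_second_moment(z, h), limit_value(z))
```

The gap between the two printed values should shrink roughly like \(h^2\), which matches the \(O(h^2)\) remainder in the expansion of \(\text {Var}(W_1)\).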

The proof now continues exactly as in Fan and Yao (2003) using the “big block small block” technique, but with the obvious replacement of h with \(h^2\) to accommodate the bivariate case.

We expand the argument to the multivariate case using the Cramér–Wold device. Let \(\mathbf {\rho } = (\rho _1, \ldots , \rho _d)^T\) be the vector of local correlations, where \(d = p(p-1)/2\), write \(\mathbf {u}(\mathbf {z}, \mathbf {\rho }_0) = (u_1(\mathbf {z}, \mathbf {\rho }_0), \ldots , u_d(\mathbf {z}, \mathbf {\rho }_0))\) and let \(\mathbf {S}_{n}(\mathbf {z}) = \{S_{nk}(\mathbf {z})\}_{k=1}^d\), where

$$\begin{aligned} S_{nk} = \sum _{t=1}^nu_k(\mathbf {Z}_t, \mathbf {\rho }_0)K(|\mathbf {h}|^{-1}(\mathbf {Z}_t - \mathbf {z})). \end{aligned}$$

We must show that

$$\begin{aligned} \sum _ka_kS_{nk} \mathop {\rightarrow }\limits ^{\mathcal {L}} \sum _ka_kZ_k^*, \end{aligned} \tag{21}$$

where \(\mathbf {a} = (a_1, \ldots , a_d)^T\) is an arbitrary vector of constants, and \(\mathbf {Z}^* = (Z_1^*, \ldots , Z_d^*)\) is a jointly normally distributed random vector. Because of Slutsky’s Theorem, it suffices to show that the left-hand side of (21) is asymptotically normal. This follows from observing that it is of the same form as the original sequence comprising \(S_n\), with

$$\begin{aligned} \sum _ka_kS_{nk} = \sum _{t=1}^nu^*(\mathbf {Z}_t, \mathbf {\rho }_0)K(|\mathbf {h}|^{-1}(\mathbf {Z}_t-\mathbf {z})), \end{aligned}$$

where \(u^*(\mathbf {Z}_t, \mathbf {\rho }_0) = \sum _ka_ku_k(\mathbf {Z}_t,\mathbf {\rho }_0)\). It is well known that any measurable mapping of a mixing sequence of random variables inherits the mixing properties of the original series, so Assumption B is satisfied by the linear combination. The new sequence of observations satisfies (18) because it follows from Jensen’s inequality that for \(\delta >2\),

$$\begin{aligned} \left[ \frac{u^*(\mathbf {Z}_t, \mathbf {\rho }_0)}{\sum _ka_k}\right] ^{\delta }&= \left[ \frac{\sum _ka_ku_k(\mathbf {Z}_t, \mathbf {\rho }_0)}{\sum _ka_k}\right] ^{\delta } \\&\le \frac{\sum _ka_k[u_k(\mathbf {Z}_t,\mathbf {\rho }_0)]^{\delta }}{\sum _ka_k}, \end{aligned}$$

so that

$$\begin{aligned} \textit{E}[u^*(\mathbf {Z}_t,\mathbf {\rho }_0)]^{\delta } \le \sum _ka_k\textit{E}[u_k(\mathbf {Z}_t,\mathbf {\rho }_0)]^{\delta }\left[ \sum _ka_k\right] ^{\delta - 1}<\infty . \end{aligned}$$
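The Jensen step and the resulting moment bound can be illustrated with a small numerical check. This is a sketch under explicit assumptions that the text leaves implicit: the weights \(a_k\) are taken to be strictly positive and the values standing in for \(u_k\) are nonnegative (for example \(|u_k|\)), so that \(t \mapsto t^{\delta }\) is convex on the relevant range. The function names are hypothetical.

```python
def jensen_gap(a, x, delta):
    """Right-hand side minus left-hand side of Jensen's inequality.

    Uses the convex map t -> t**delta with weights a_k / sum(a).
    Assumes a_k > 0, x_k >= 0 and delta >= 1.
    """
    if min(a) <= 0 or min(x) < 0 or delta < 1:
        raise ValueError("need a_k > 0, x_k >= 0 and delta >= 1")
    total = sum(a)
    lhs = (sum(ak * xk for ak, xk in zip(a, x)) / total) ** delta
    rhs = sum(ak * xk ** delta for ak, xk in zip(a, x)) / total
    return rhs - lhs


def moment_bound_holds(a, x, delta):
    """Check (sum a_k x_k)**delta <= (sum a_k)**(delta - 1) * sum a_k x_k**delta."""
    lhs = sum(ak * xk for ak, xk in zip(a, x)) ** delta
    rhs = sum(a) ** (delta - 1) * sum(ak * xk ** delta for ak, xk in zip(a, x))
    return lhs <= rhs * (1.0 + 1e-12)  # small slack for rounding


if __name__ == "__main__":
    weights = [0.5, 2.0, 1.5]
    values = [0.2, 1.7, 3.1]
    print(jensen_gap(weights, values, 2.5))
    print(moment_bound_holds(weights, values, 2.5))
```

The second function is the first inequality multiplied through by \(\left[ \sum _ka_k\right] ^{\delta }\), which is how the bound on \(\textit{E}[u^*]^{\delta }\) is obtained after taking expectations.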

The off-diagonal elements in the asymptotic covariance matrix are zero using the same arguments as in Otneim and Tjøstheim (2016).

Proof of Theorem 3

The key to proving Theorem 3 is to show that the asymptotic distribution of (17) remains unchanged when the marginally standard normal stochastic vectors \(\mathbf {Z}_n\) are replaced with the pseudo-observations

$$\begin{aligned} \widehat{\mathbf {Z}}_n = \left( \varPhi ^{-1}(\widehat{F}_1(X_{n1})), \ldots , \varPhi ^{-1}(\widehat{F}_p(X_{np}))\right) ^\mathrm{T}, \end{aligned}$$

where \(\widehat{F}_i(\cdot )\), \(i=1,\ldots ,p\), are the marginal empirical distribution functions. This is shown in the independent case under assumptions FG in Otneim and Tjøstheim (2016), by providing a slight modification to Proposition 3.1 by Geenens et al. (2014). The essence of that proof is the convergence of the empirical copula process, which remains unchanged if we replace the assumption of independent observations with \(\alpha \)-mixing, according to Bücher and Volgushev (2013).
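As an illustration of the pseudo-observations, the following sketch transforms each margin of a sample to approximate standard normality through its ranks. Two choices here are assumptions rather than details taken from the text: the empirical distribution function is rescaled to rank/(n + 1) so that \(\varPhi ^{-1}\) is never evaluated at 1, and the data are assumed to have no ties. The function names are hypothetical.

```python
from statistics import NormalDist


def pseudo_observations(column):
    """Map one margin to approximate standard normality via its ranks.

    Uses rank / (n + 1) as the empirical distribution function value
    (an assumed rescaling that keeps the argument strictly inside (0, 1)).
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


def transform_sample(rows):
    """Apply the marginal transformation to every column of an n x p sample."""
    columns = [list(col) for col in zip(*rows)]
    transformed = [pseudo_observations(col) for col in columns]
    return [list(row) for row in zip(*transformed)]


if __name__ == "__main__":
    sample = [[3.0, 10.0], [1.0, 30.0], [2.0, 20.0]]
    for row in transform_sample(sample):
        print(row)
```

Because only ranks enter the transformation, the output is invariant to strictly increasing transformations of each margin.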

The multivariate delta method states that if \(\sqrt{nh^2}(\theta _n - \theta ) \mathop {\rightarrow }\limits ^{\mathcal {L}} N(0, A)\) and \(q:R^n\rightarrow R\) has continuous first partial derivatives, then \(\sqrt{nh^2}(q(\theta _n) - q(\theta )) \mathop {\rightarrow }\limits ^{\mathcal {L}} N(0, \nabla q(\theta )^TA\nabla q(\theta ))\) (Schervish 1995, p. 403). In our case, \(q(\mathbf {\rho }) = \varPsi (\mathbf {z}, \mathbf {R})g(\mathbf {x})\), and

$$\begin{aligned} \nabla q(\mathbf {\rho }) = \varPsi (\mathbf {z}, \mathbf {R})g(\mathbf {x})\mathbf {u}(\mathbf {z}, \mathbf {R}), \end{aligned}$$

from which the result follows immediately.
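The gradient formula can be sanity-checked in the bivariate case with a finite difference. The sketch below assumes that \(\varPsi \) is the standard bivariate normal density with correlation \(\rho \), that \(u\) is the derivative of \(\log \varPsi \) with respect to \(\rho \), and that \(g(\mathbf {x})\) does not depend on \(\rho \); these are illustrative assumptions, since neither \(u\) nor \(g\) is defined in this appendix. Under them, \(\partial q/\partial \rho = \varPsi \, g\, u\).

```python
import math


def bvn_density(z1, z2, rho):
    """Standard bivariate normal density with correlation rho."""
    det = 1.0 - rho * rho
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def bvn_score(z1, z2, rho):
    """Derivative of log bvn_density with respect to rho."""
    det = 1.0 - rho * rho
    numerator = z1 * z2 * (1.0 + rho * rho) - rho * (z1 * z1 + z2 * z2)
    return rho / det + numerator / (det * det)


def q(rho, z, g_x):
    """q(rho) = Psi(z, rho) * g(x), with g(x) held fixed."""
    return bvn_density(z[0], z[1], rho) * g_x


def grad_q_closed_form(rho, z, g_x):
    """Psi * g * u, the form of the gradient stated in the text."""
    return q(rho, z, g_x) * bvn_score(z[0], z[1], rho)


def grad_q_finite_difference(rho, z, g_x, eps=1e-6):
    """Central finite-difference approximation to dq / d rho."""
    return (q(rho + eps, z, g_x) - q(rho - eps, z, g_x)) / (2.0 * eps)


if __name__ == "__main__":
    z, rho, g_x = (0.4, -1.1), 0.3, 0.7
    print(grad_q_closed_form(rho, z, g_x))
    print(grad_q_finite_difference(rho, z, g_x))
```

Agreement of the two printed numbers reflects the identity \(\nabla \varPsi = \varPsi \nabla \log \varPsi \), which is all the delta-method step uses.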

Appendix 2: Large sample properties of the logspline estimator

The current implementation of our method in the R programming language (R Core Team 2015) uses the logspline method by Stone et al. (1997) for marginal density estimation. The asymptotic theory for the logspline estimator is derived by Stone (1990), but restricted to density functions with compact support. Otneim and Tjøstheim (2016) relax this requirement using a truncation argument, so that the requirement of compact support can be replaced by an assumption on the tails of the unknown density not being too heavy.

In particular, Stone (1990) denotes by \(\epsilon \in (0,1/2)\) a tuning parameter that determines the asymptotic rate at which new nodes are added to the logspline procedure. If \(\epsilon \) is close to zero, new nodes are added quickly to the procedure, and as \(\epsilon \rightarrow 1/2\), new nodes are added very slowly. Stone (1990) then provides the following asymptotic results (again, under the assumption that the true density \(f(\mathbf {x})\) has compact support):

$$\begin{aligned} \sqrt{n^{0.5 + \epsilon }}\left( \widehat{f}_i(x) - f(x)\right) \mathop {\rightarrow }\limits ^{\mathcal {L}} N(0, \sigma _1^2), \end{aligned}$$

and

$$\begin{aligned} \sqrt{n^{0.5}}\left( \widehat{F}_i(x) - F(x)\right) \mathop {\rightarrow }\limits ^{\mathcal {L}} N(0, \sigma _2^2). \end{aligned}$$

Otneim and Tjøstheim (2016) show that these results hold if there exist constants \(M>0\), \(\gamma > 2\epsilon /(1-2\epsilon )\), and \(x_0>0\) such that \(f(x)\le M|x|^{-(5/2+\gamma )}\) for all \(|x|>x_0\). The “worst case scenario” with respect to Assumption I, when using the logspline estimator for the final back-transformation, is therefore \(\epsilon \) being close to zero. In that case, we must require the bandwidths to tend to zero fast enough so that \(n^{1/2}h^2\rightarrow 0\), but on the other hand, that will allow \(\gamma \) to approach zero, and thus the tail-thickness of the density to approach that of \(|x|^{-5/2}\).

What remains here is to show that these results hold also in the case where the observations are \(\alpha \)-mixing. This is easily done by replacing the use of the iid central limit theorem (CLT) in the proof of Theorem 3 in Stone (1990) with a corresponding CLT that holds under our mixing condition. For example, Theorem A by Peligrad (1992) proves the CLT under \(\alpha \)-mixing provided that the mixing coefficients satisfy \(\sum _{n=1}^{\infty }\alpha (n)^{1-2/\delta } < \infty \). This condition follows from our Assumption B.
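To give a feel for the summability condition \(\sum _{n=1}^{\infty }\alpha (n)^{1-2/\delta } < \infty \), the sketch below considers a hypothetical polynomial mixing rate \(\alpha (n) = n^{-a}\). This rate is an assumption made purely for illustration and is not the form of Assumption B, which is not reproduced here. For this rate the series behaves like \(\sum n^{-a(1-2/\delta )}\), which converges exactly when \(a(1-2/\delta ) > 1\).

```python
def effective_exponent(a, delta):
    """Exponent e such that alpha(n)**(1 - 2/delta) = n**(-e) when alpha(n) = n**(-a)."""
    return a * (1.0 - 2.0 / delta)


def is_summable(a, delta):
    """A p-series with exponent e converges if and only if e > 1."""
    return effective_exponent(a, delta) > 1.0


def partial_sum(a, delta, n_terms):
    """Partial sum of alpha(n)**(1 - 2/delta) for n = 1, ..., n_terms."""
    e = effective_exponent(a, delta)
    return sum(n ** (-e) for n in range(1, n_terms + 1))


if __name__ == "__main__":
    # Fast decay: exponent 2, partial sums level off.
    print(is_summable(6.0, 3.0), partial_sum(6.0, 3.0, 100), partial_sum(6.0, 3.0, 10000))
    # Slow decay: exponent 1, partial sums keep growing like log(n).
    print(is_summable(2.0, 4.0), partial_sum(2.0, 4.0, 100), partial_sum(2.0, 4.0, 10000))
```

The trade-off visible here is that a smaller \(\delta \) (fewer finite moments) demands faster decay of the mixing coefficients.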


Cite this article

Otneim, H., Tjøstheim, D. Conditional density estimation using the local Gaussian correlation. Stat Comput 28, 303–321 (2018). https://doi.org/10.1007/s11222-017-9732-z


Keywords

  • Conditional density estimation
  • Local likelihood
  • Multivariate data
  • Cross-validation