Abstract
This article introduces a Bayesian neural network estimation method for quantile regression that assumes an asymmetric Laplace distribution (ALD) for the response variable. It is shown that the posterior distribution for feedforward neural network quantile regression is asymptotically consistent under a misspecified ALD model. The consistency proof embeds the problem in a density estimation framework and uses bounds on the bracketing entropy to derive posterior consistency over Hellinger neighborhoods. The result holds in the setting where the number of hidden nodes grows with the sample size. The Bayesian implementation utilizes the normal-exponential mixture representation of the ALD density, and the algorithm combines Gibbs sampling with the Metropolis–Hastings algorithm in a Markov chain Monte Carlo (MCMC) scheme. We address the practical complexity of this MCMC implementation in terms of chain convergence, the choice of starting values, and step sizes. We illustrate the proposed method with simulation studies and real data examples.
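As an aside for practitioners, the normal-exponential mixture representation of the ALD mentioned above can be sketched in a few lines of Python. The function name, default settings, and the quantile check are our own illustration of the standard representation (cf. Kozumi and Kobayashi 2011), not code from the paper:

```python
import numpy as np

def ald_mixture_draws(mu=0.0, sigma=1.0, tau=0.5, n=200_000, seed=0):
    """Draw from ALD(mu, sigma, tau) via its normal-exponential mixture:
    Y = mu + sigma * (theta * z + psi * sqrt(z) * u),
    with z ~ Exp(1) and u ~ N(0, 1) independent."""
    rng = np.random.default_rng(seed)
    theta = (1 - 2 * tau) / (tau * (1 - tau))
    psi = np.sqrt(2 / (tau * (1 - tau)))
    z = rng.exponential(1.0, n)
    u = rng.standard_normal(n)
    return mu + sigma * (theta * z + psi * np.sqrt(z) * u)

# The tau-th quantile of ALD(mu, sigma, tau) is mu itself, which is what
# makes the ALD likelihood a device for Bayesian quantile regression:
y = ald_mixture_draws(mu=1.5, tau=0.25)
print(round(float(np.mean(y <= 1.5)), 2))  # close to 0.25
```

This mixture is what allows the location parameters to be updated by Gibbs steps conditionally on the latent exponential variables.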
References
Alhamzawi R (2018) Brq: an R package for Bayesian quantile regression. https://cran.r-project.org/web/packages/Brq/Brq.pdf
Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B (Methodol) 36:99–102
Barndorff-Nielsen OE, Shephard N (2001) Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. J R Stat Soc Ser B 63:167–241
Barron A, Schervish MJ, Wasserman L (1999) The consistency of posterior distributions in nonparametric problems. Ann Stat 27:536–561
Benoit DF, Alhamzawi R, Yu K, den Poel DV (2017) R package ‘bayesQR’. https://cran.r-project.org/web/packages/bayesQR/bayesQR.pdf
Buntine WL, Weigend AS (1991) Bayesian back-propagation. Complex Syst 5:603–643
Cannon AJ (2011) R package ‘qrnn’. https://cran.r-project.org/web/packages/qrnn/qrnn.pdf
Cannon AJ (2018) Non-crossing nonlinear regression quantiles by monotone composite quantile regression neural network, with application to rainfall extremes. Stoch Environ Res Risk Assess 32:3207–3225
Chen C (2007) A finite smoothing algorithm for quantile regression. J Comput Graph Stat 16:136–164
Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2:303–314
Dantzig GB (1963) Linear programming and extensions. Princeton University Press, Princeton
de Freitas N, Andrieu C, Højen-Sørensen P, Niranjan M, Gee A (2001) Sequential Monte Carlo methods for neural networks. In: Doucet A, de Freitas N, Gordon N (eds) Sequential Monte Carlo methods in practice. Springer, New York, pp 359–379
Funahashi K (1989) On the approximate realization of continuous mappings by neural networks. Neural Netw 2:183–192
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7:457–472
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn. CRC Press, Boca Raton
Ghosh M, Ghosh A, Chen MH, Agresti A (2000) Noninformative priors for one-parameter item models. J Stat Plan Infer 88:99–115
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2:359–366
Karmarkar N (1984) A new polynomial time algorithm for linear programming. Combinatorica 4:373–395
Koenker R (2005) Quantile regression, 1st edn. Cambridge University Press, Cambridge
Koenker R (2017) R package ‘quantreg’. https://cran.r-project.org/web/packages/quantreg/quantreg.pdf
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Koenker R, Machado J (1999) Goodness of fit and related inference processes for quantile regression. J Am Stat Assoc 94:1296–1309
Kottas A, Gelfand AE (2001) Bayesian semiparametric median regression modeling. J Am Stat Assoc 96:1458–1468
Kozumi H, Kobayashi G (2011) Gibbs sampling methods for Bayesian quantile regression. J Stat Comput Simul 81:1565–1578
Lee HKH (2000) Consistency of posterior distributions for neural networks. Neural Netw 13:629–642
MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural Comput 4:448–472
Madsen K, Nielsen HB (1993) A finite smoothing algorithm for linear \(l_1\) estimation. SIAM J Optim 3:223–235
Neal RM (1996) Bayesian learning for neural networks. Springer, New York
Papamarkou T, Hinkle J, Young M, Womble D (2019) Challenges in Bayesian inference via Markov chain Monte Carlo for neural networks. arXiv:1910.06539
Pollard D (1991) Bracketing methods in statistics and econometrics. In: Barnett WA, Powell J, Tauchen GE (eds) Nonparametric and semiparametric methods in econometrics and statistics: proceedings of the fifth international symposium in econometric theory and econometrics. Cambridge University Press, Cambridge, pp 337–355
Sriram K, Ramamoorthi RV, Ghosh P (2013) Posterior consistency of Bayesian quantile regression based on the misspecified asymmetric Laplace density. Bayesian Anal 8:479–504
Taylor JW (2000) A quantile regression neural network approach to estimating the conditional density of multiperiod returns. J Forecast 19:299–311
Titterington DM (2004) Bayesian methods for neural networks and related models. Stat Sci 19:128–139
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York
Walker SG, Mallick BK (1999) A Bayesian semiparametric accelerated failure time model. Biometrics 55:477–483
Wasserman L (1998) Asymptotic properties of nonparametric Bayesian procedures. In: Dey D, Müller P, Sinha D (eds) Practical nonparametric and semiparametric Bayesian statistics. Springer, New York, pp 293–304
Wong WH, Shen X (1995) Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. Ann Stat 23:339–362
Xu Q, Deng K, Jiang C, Sun F, Huang X (2017) Composite quantile regression neural network with applications. Exp Syst Appl 76:129–139
Yeh I-C (1998) Modeling of strength of high performance concrete using artificial neural networks. Cem Concr Res 28:1797–1808
Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Prob Lett 54:437–447
Yu K, Zhang J (2005) A three-parameter asymmetric Laplace distribution and its extensions. Commun Stat Theory Methods 34:1867–1879
Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel P, Zellner A (eds) Bayesian inference and decision techniques: essays in honor of Bruno de Finetti. Elsevier Science Publishers Inc, New York, pp 233–243
This article is part of the topical collection “Celebrating the Centenary of Professor C. R. Rao” guest edited by Ravi Khattree, Sreenivasa Rao Jammalamadaka, and M. B. Rao.
Appendices
Appendix A: Lemmas for Posterior Consistency Proof
For all the proofs in Appendix A and Appendix B, we assume \({\varvec{X}}_{p \times 1}\) to be uniformly distributed on \([0,1]^p\) and treated as fixed; thus, \(f_0({\varvec{x}})=f({\varvec{x}})=1\). Conditional on \({\varvec{X}}\), the univariate response variable Y has an asymmetric Laplace distribution with location parameter determined by the neural network. We fix its scale parameter, \(\sigma \), to be 1 for the posterior consistency derivations. Thus,

$$\begin{aligned} f(y |{\varvec{x}}) = \tau (1-\tau ) \exp \left( -(y-\mu ({\varvec{x}}))\left[ \tau - I\left( y \le \mu ({\varvec{x}})\right) \right] \right) . \end{aligned}$$(A.1)
The number of input variables, p, is taken to be fixed, while the number of hidden nodes, k, will be allowed to grow with the sample size, n.
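As a quick numerical sanity check (our own sketch, not part of the paper), the \(\sigma = 1\) ALD density \(f(y |{\varvec{x}}) = \tau (1-\tau )\exp (-\rho _\tau (y-\mu ({\varvec{x}})))\), with check loss \(\rho _\tau (u)=u(\tau - I(u<0))\), integrates to one and places mass \(\tau \) below its location parameter:

```python
import numpy as np

def trapezoid(vals, grid):
    """Plain trapezoidal rule (avoids NumPy version differences)."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def ald_pdf(y, mu, tau):
    """ALD density with scale sigma = 1, as used in the consistency proofs."""
    return tau * (1 - tau) * np.exp(-rho(y - mu, tau))

tau, mu = 0.3, 0.7  # arbitrary illustrative values
y = np.linspace(mu - 60, mu + 60, 2_000_001)
total = trapezoid(ald_pdf(y, mu, tau), y)
mass_below_mu = trapezoid(ald_pdf(y[y <= mu], mu, tau), y[y <= mu])
print(round(total, 4), round(mass_below_mu, 4))  # ~1.0 and ~0.3
```

The second quantity is exactly why the location of the ALD plays the role of the \(\tau \)-th conditional quantile.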
All the lemmas described below are taken from Lee [25].
Lemma 1
Suppose \(H_{[]}(u) \le \log [(C_n^2d_n/u)^{d_n}]\), where \(d_n=(p+2)k_n+1\), \(k_n \le n^a\), and \(C_n\le \exp (n^{b-a})\) for \(0<a<b<1\). Then, for any fixed constants \(c,\epsilon >0\) and for all sufficiently large n, \(\int _0^\epsilon \sqrt{H_{[]}(u)} \, \mathrm {d}u \le c\sqrt{n}\epsilon ^2\).
Proof
The proof follows the proof of Lemma 1 in Lee [25, pp. 634–635], adapted to the BQRNN case. \(\square \)
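The entropy-integral condition in Lemma 1 can be checked numerically. The sketch below is our own illustration, with arbitrary choices \(p=3\), \(a=0.5\), \(b=0.75\), \(c=1\), \(\epsilon =0.5\), and a large n; the lemma only claims the bound for sufficiently large n:

```python
import numpy as np

def trapezoid(vals, grid):
    """Plain trapezoidal rule (avoids NumPy version differences)."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def entropy_integral(n, p=3, a=0.5, b=0.75, eps=0.5):
    """Evaluate int_0^eps sqrt(H_[](u)) du with
    H_[](u) = d_n * log(C_n^2 * d_n / u), d_n = (p+2)*k_n + 1,
    k_n = n^a, C_n = exp(n^(b-a)); log(C_n^2) is kept on the log scale."""
    k_n = n ** a
    d_n = (p + 2) * k_n + 1
    log_Cn2 = 2 * n ** (b - a)
    u = np.geomspace(1e-12, eps, 200_001)  # singularity at 0 is integrable
    integrand = np.sqrt(d_n * (log_Cn2 + np.log(d_n) - np.log(u)))
    return trapezoid(integrand, u)

n, eps = 10 ** 8, 0.5
lhs = entropy_integral(n, eps=eps)
rhs = np.sqrt(n) * eps ** 2  # c = 1
print(lhs < rhs)  # True for this (large) n
```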
For Lemmas 2, 3 and 4, we use the following notation. From Eq. 3.10, recall that

$$\begin{aligned} R_n(f) = \prod _{i=1}^n \frac{f(Y_i |{\varvec{X}}_i)}{f_0(Y_i |{\varvec{X}}_i)} \end{aligned}$$

is the ratio of the likelihoods under the neural network density f and the true density \(f_0\). \(\mathcal {F}_n\) is the sieve defined in Eq. 3.9 and \(A_\epsilon \) is the Hellinger neighborhood of the true density \(f_0\) as in Eq. 3.5.
Lemma 2
\(\underset{f\in A_\epsilon ^c \cap \mathcal {F}_n}{\sup } R_n(f) \le 4 \exp (-c_2n\epsilon ^2)\) a.s. for sufficiently large n.
Proof
Following the outline of the proof of Lemma 2 in Lee [25, p. 635], we first bound the Hellinger bracketing entropy using van der Vaart and Wellner [34, Theorem 2.7.11 on p. 164]. Next, we use Lemma 1 to show that the conditions of Wong and Shen [38, Theorem 1 on pp. 348–349] hold, and finally we apply that theorem to obtain the result stated in Lemma 2.

In the BQRNN case, only the first step needs to be re-derived, using the ALD density in Eq. A.1; the remaining steps follow from the proof given in Lee [25]. Since we seek the Hellinger bracketing entropy for neural networks, we use the \(L_2\) norm on the square roots of the density functions f. The \(L_\infty \) covering number was computed above in Eq. 3.11, so here \(d^*=L_\infty \). The version of van der Vaart and Wellner [34, Theorem 2.7.11] that we need is
We begin by defining some notation:
For notational convenience, we drop x and y from \(f_s(x,y)\), \(f_t(x,y)\), \(\mu _s(x)\), \(\mu _t(x)\), \(B_j(x)\), and \(A_j(x)\) and denote them as \(f_s\), \(f_t\), \(\mu _s\), \(\mu _t\), \(B_j\), and \(A_j\).
Now let us separate the above term into two cases: (a) \(\mu _s \le \mu _t\) and (b) \(\mu _s > \mu _t\). Consider case (a) first and break it into three subcases: (i) \(y \le \mu _s \le \mu _t\), (ii) \(\mu _s < y \le \mu _t\), and (iii) \(\mu _s \le \mu _t < y\).
Case (a)(i): \(y \le \mu _s \le \mu _t\). Eq. A.4 simplifies to

$$\begin{aligned}&\frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) - \exp \left( -\frac{1}{2}(y-\mu _s) (\tau -1) \right) \right| \nonumber \\&\quad = \frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _s) (\tau -1) \right) \right| \left|\exp \left( -\frac{1}{2}(\mu _s-\mu _t) (\tau -1) \right) -1\right| \nonumber \\&\qquad \text {As the first term in the modulus is } \le 1 \nonumber \\&\quad \le \frac{1}{2} \left|1 - \exp \left( -\frac{1}{2}(\mu _t-\mu _s) (1-\tau ) \right) \right| \nonumber \\&\qquad \text {Note: } 1-\exp (-z) \le z \,\, \forall z \in \mathbb {R}\implies \left|1-\exp (-z)\right| \le \left|z\right| \,\, \forall z \ge 0 \nonumber \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right|(1-\tau ) \nonumber \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right| \nonumber \\&\quad \le \frac{1}{2}\left|\mu _t-\mu _s\right| \end{aligned}$$(A.5)

Case (a)(ii): \(\mu _s < y \le \mu _t\). Eq. A.4 simplifies to

$$\begin{aligned}&\frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) - \exp \left( -\frac{1}{2}(y-\mu _s) \tau \right) \right|\\&\quad = \frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) -1 +1 - \exp \left( -\frac{1}{2}(y-\mu _s) \tau \right) \right| \\&\quad \le \frac{1}{2} \left|1-\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) \right| + \frac{1}{2} \left|1 - \exp \left( -\frac{1}{2}(y-\mu _s) \tau \right) \right| \\&\qquad \text {Using the inequality noted in Eq. A.5} \\&\quad \le \frac{1}{4}\left|(y-\mu _t)(\tau -1)\right| + \frac{1}{4}\left|(y-\mu _s)\tau \right| \\&\qquad \text {Both terms are nonnegative, so we combine them in one modulus} \\&\quad = \frac{1}{4} \left|(y-\mu _t)(\tau -1) + (y-\mu _t+\mu _t-\mu _s)\tau \right| \\&\quad = \frac{1}{4} \left|(y-\mu _t)(2\tau -1) + (\mu _t-\mu _s)\tau \right| \\&\quad \le \frac{1}{4} \left[ \left|(y-\mu _t)\right|\left|2\tau -1\right| + \left|\mu _t-\mu _s\right|\tau \right] \\&\qquad \text {Here, } \left|y-\mu _t\right| \le \left|\mu _t-\mu _s\right| \,\, \text {and} \,\, \left|2\tau -1\right| \le 1 \\&\quad \le \frac{1}{2}\left|\mu _t-\mu _s\right| \end{aligned}$$

Case (a)(iii): \(\mu _s \le \mu _t < y\). Eq. A.4 simplifies to

$$\begin{aligned}&\frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) \tau \right) - \exp \left( -\frac{1}{2}(y-\mu _s)\tau \right) \right| \\&\quad = \frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) \tau \right) \right| \left|1-\exp \left( -\frac{1}{2}(\mu _t-\mu _s) \tau \right) \right|\\&\qquad \text {As the first term in the modulus is } \le 1 \\&\quad \le \frac{1}{2} \left|1 - \exp \left( -\frac{1}{2} (\mu _t-\mu _s) \tau \right) \right| \\&\qquad \text {Using the inequality noted in Eq. A.5} \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right|\tau \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right| \\&\quad \le \frac{1}{2}\left|\mu _t-\mu _s\right| \end{aligned}$$
We can similarly bound Eq. A.4 in case-(b) where \(\mu _s > \mu _t\) by \(\left|\mu _t-\mu _s\right|/2\). Now,
Hence, we can bound Eq. A.6 as follows
The rest of the steps follow from the proof of Lemma 2 given in Lee [25, pp. 635–636]. \(\square \)
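The net effect of the case analysis is the Lipschitz-type bound \(\frac{1}{2}\left|e^{-\rho _\tau (y-\mu _t)/2} - e^{-\rho _\tau (y-\mu _s)/2}\right| \le \frac{1}{2}\left|\mu _t-\mu _s\right|\), uniformly in y and \(\tau \). A Monte Carlo spot-check over random configurations (our own sketch; all names and sampling distributions are illustrative):

```python
import numpy as np

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def sqrt_kernel(y, mu, tau):
    """exp(-rho_tau(y - mu)/2): the ALD density's square root, up to its constant."""
    return np.exp(-0.5 * rho(y - mu, tau))

rng = np.random.default_rng(1)
m = 100_000
y = rng.normal(0.0, 5.0, m)
mu_s = rng.normal(0.0, 3.0, m)
mu_t = rng.normal(0.0, 3.0, m)
tau = rng.uniform(0.01, 0.99, m)

lhs = 0.5 * np.abs(sqrt_kernel(y, mu_t, tau) - sqrt_kernel(y, mu_s, tau))
rhs = 0.5 * np.abs(mu_t - mu_s)
print(bool(np.all(lhs <= rhs + 1e-12)))  # True: the bound holds in every sampled case
```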
Lemma 3
If there exists a constant \(r>0\) and N, such that \(\mathcal {F}_n\) satisfies \(\pi _n(\mathcal {F}_n^c)<\exp (-nr), \forall n\ge N\), then there exists a constant \(c_2\) such that \(\int _{A_\epsilon ^c} R_n(f) \mathrm {d}\pi _n(f) < \exp (-nr/2) + \exp (-nc_2\epsilon ^2)\) except on a set of probability tending to zero.
Proof
The proof is the same as that of Lemma 3 in Lee [25, p. 636], adapted to the BQRNN scenario. \(\square \)
Lemma 4
Let \(K_\delta \) be the KL-neighborhood as in Eq. 3.8. Suppose that for all \(\delta ,\nu > 0\) there exists N such that \(\pi _n(K_\delta ) \ge \exp (-n\nu )\) for all \(n \ge N\). Then, for all \(\varsigma > 0\) and sufficiently large n, \(\int R_n(f) \mathrm {d}\pi _n(f) > e^{-n\varsigma }\) except on a set of probability going to zero.
Proof
The proof is the same as that of Lemma 5 in Lee [25, p. 637], adapted to the BQRNN scenario. \(\square \)
Lemma 5
Suppose that \(\mu \) is a neural network regression with parameters \((\theta _1,\ldots ,\theta _d)\), and let \(\tilde{\mu }\) be another neural network with parameters \((\tilde{\theta }_1,\ldots ,\tilde{\theta }_{\tilde{d}_n})\). Define \(\theta _i=0\) for \(i >d\) and \(\tilde{\theta }_j=0\) for \(j>\tilde{d}_n\). Suppose that the number of nodes of \(\mu \) is k, and that the number of nodes of \(\tilde{\mu }\) is \(\tilde{k}_n=O(n^a)\) for some a with \(0<a<1\). Let
Then, for any \(\tilde{\mu } \in M_\varsigma \) and for sufficiently large n,
Proof
The proof is the same as that of Lemma 6 in Lee [25, pp. 638–639]. \(\square \)
Appendix B: Posterior Consistency Theorem Proofs
Appendix B.1: Theorem 2 Proof
For the proofs of Theorem 2 and Corollary 3, we use the following notation. From Eq. 3.10, recall that

$$\begin{aligned} R_n(f) = \prod _{i=1}^n \frac{f(Y_i |{\varvec{X}}_i)}{f_0(Y_i |{\varvec{X}}_i)} \end{aligned}$$

is the ratio of the likelihoods under the neural network density f and the true density \(f_0\). \(\mathcal {F}_n\) is the sieve defined in Eq. 3.9, and \(A_\epsilon \) is the Hellinger neighborhood of the true density \(f_0\) as in Eq. 3.5.
By Lemma 3, there exists a constant \(c_2\) such that \(\int _{A_\epsilon ^c} R_n(f) \mathrm {d}\pi _n(f) < \exp (-nr/2) + \exp (-nc_2\epsilon ^2)\) for sufficiently large n. Further, from Lemma 4, \(\int R_n(f) \mathrm {d}\pi _n(f) \ge \exp (-n\varsigma )\) for sufficiently large n.
Now pick \(\varsigma \) such that, for some \(\varphi >0\), both \(\frac{r}{2}-\varsigma > \varphi \) and \(c_2\epsilon ^2 - \varsigma > \varphi \). Thus,
Hence, \(P(A_\epsilon ^c |({\varvec{X_1}},Y_1), \ldots , ({\varvec{X_n}},Y_n)) \overset{p}{\rightarrow } 0\). \(\square \)
Appendix B.2: Corollary 3 Proof
Theorem 2 implies that \(D_H(f_0,f) \overset{p}{\rightarrow } 0\), where \(D_H(f_0,f)\) is the Hellinger distance between \(f_0\) and f as in Eq. 3.4 and f is a random draw from the posterior. Recall from Eq. 3.6 that the predictive density function
gives rise to the predictive conditional quantile function, \(\hat{\mu }_n({\varvec{x}}) = Q_{\tau ,\hat{f}_n}(y |{\varvec{X}}={\varvec{x}})\). We next show that \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\), which in turn implies \(\hat{\mu }_n({\varvec{x}})\) converges in \(L_1\)-norm to the true conditional quantile function,
First we show that \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\). Let \(X^n = (({\varvec{X_1}},Y_1),\ldots ,({\varvec{X_n}},Y_n))\). For any \(\epsilon >0\):
The second term goes to zero in probability by Theorem 2, and since \(\epsilon \) is arbitrary, \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\).
In the remaining part of the proof, for notational simplicity, we suppress the argument and write \(\hat{\mu }_n\) and \(\mu _0\) for \(\hat{\mu }_n({\varvec{x}})\) and \(\mu _0({\varvec{x}})\), respectively. The Hellinger distance between \(f_0\) and \(\hat{f}_n\) is
Now let us break T into two cases: (a) \(\hat{\mu }_n \le \mu _0\) and (b) \(\hat{\mu }_n > \mu _0\).

Case (a): \(\hat{\mu }_n \le \mu _0\).

$$\begin{aligned} T = {\left\{ \begin{array}{ll} - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau , & \hat{\mu }_n \le \mu _0< y \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau + \frac{(y-\mu _0)}{2}, & \hat{\mu }_n \le \frac{\hat{\mu }_n+\mu _0}{2}< y \le \mu _0 \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) - \frac{(y-\hat{\mu }_n)}{2}, & \hat{\mu }_n < y \le \frac{\hat{\mu }_n+\mu _0}{2} \le \mu _0 \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1), & y \le \hat{\mu }_n \le \mu _0 \end{array}\right. } \end{aligned}$$

Case (b): \(\hat{\mu }_n > \mu _0\).

$$\begin{aligned} T = {\left\{ \begin{array}{ll} - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau , & \mu _0 \le \hat{\mu }_n< y \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau + \frac{(y-\hat{\mu }_n)}{2}, & \mu _0 \le \frac{\hat{\mu }_n+\mu _0}{2}< y \le \hat{\mu }_n \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) - \frac{(y-\mu _0)}{2}, & \mu _0 < y \le \frac{\hat{\mu }_n+\mu _0}{2} \le \hat{\mu }_n \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1), & y \le \mu _0 \le \hat{\mu }_n \end{array}\right. } \end{aligned}$$
Hence now,
Substituting the above expression in Eq. B.1 we get,
Since \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\),
Our next step is to show that the above expression implies \(\left|\hat{\mu }_n-\mu _0\right| \rightarrow 0\) a.s. on a set \(\Omega \) with probability tending to 1, and hence \(\int \left|\hat{\mu }_n-\mu _0\right| \mathrm {d}{\varvec{x}} \overset{p}{\rightarrow } 0\).
We prove this by contradiction. Suppose that \(\left|\hat{\mu }_n-\mu _0\right| \nrightarrow 0\) a.s. on \(\Omega \). Then there exist an \(\epsilon > 0\) and a subsequence \(\hat{\mu }_{n_i}\) such that \(\left|\hat{\mu }_{n_i}-\mu _0\right| > \epsilon \) on a set A with \(P(A)>0\). Now decompose the integral as
This is a contradiction, since the integral converges in probability to 1. Thus \(\left|\hat{\mu }_n-\mu _0\right| \rightarrow 0\) a.s. on \(\Omega \). Applying Scheffé's theorem, we get \(\int \left|\hat{\mu }_n-\mu _0\right| \mathrm {d}{\varvec{x}} \rightarrow 0\) a.s. on \(\Omega \), and hence \(\int \left|\hat{\mu }_n-\mu _0\right| \mathrm {d}{\varvec{x}} \overset{p}{\rightarrow } 0\). \(\square \)
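The corollary ties Hellinger convergence of the fitted density to \(L_1\) convergence of the conditional quantile function. As an informal numerical illustration (ours, not from the paper), the Hellinger distance between two \(\sigma = 1\) ALD densities shrinks as their location gap does:

```python
import numpy as np

def trapezoid(vals, grid):
    """Plain trapezoidal rule (avoids NumPy version differences)."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def rho(u, tau):
    return u * (tau - (u < 0))

def ald_pdf(y, mu, tau):
    return tau * (1 - tau) * np.exp(-rho(y - mu, tau))

def hellinger(mu0, mu1, tau):
    """Hellinger distance between two sigma = 1 ALD densities."""
    grid = np.linspace(-80.0, 80.0, 2_000_001)
    diff_sq = (np.sqrt(ald_pdf(grid, mu0, tau))
               - np.sqrt(ald_pdf(grid, mu1, tau))) ** 2
    return np.sqrt(0.5 * trapezoid(diff_sq, grid))

tau = 0.3  # arbitrary illustrative quantile level
for gap in (1.0, 0.1, 0.01):
    print(gap, round(hellinger(0.0, gap, tau), 5))  # the distance shrinks with the gap
```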
Below we prove Theorem 1, making use of Theorem 2 and Corollary 3.
Appendix B.3: Theorem 1 Proof
We proceed by showing that, with \(\mathcal {F}_n\) as in Eq. 3.9, the prior \(\pi _n\) of Theorem 1 satisfies conditions (i) and (ii) of Theorem 2.
The proof of condition (i) of Theorem 2 presented in Lee [25, proof of Theorem 1 on p. 639] holds in the BQRNN case without any change. Next, we show that condition (ii) holds for the BQRNN model. Let \(K_\delta \) be the KL-neighborhood of the true density \(f_0\) as in Eq. 3.8 and let \(\mu _0\) be the corresponding conditional quantile function. We first fix a neural network \(\mu ^*\) that closely approximates \(\mu _0\). We then find a neighborhood \(M_\varsigma \) of \(\mu ^*\) as in Eq. A.7 and show that this neighborhood has sufficiently large prior probability. Suppose that \(\mu _0\) is continuous. For any \(\delta >0\), choose \(\epsilon = \delta /2\) in the theorem of Funahashi [13, Theorem 1 on p. 184] and let \(\mu ^*\) be a neural network such that \(\underset{x\in \mathcal {X}}{\sup }\left|\mu ^*-\mu _0\right| < \epsilon \). Let \(\varsigma =(\sqrt{\epsilon }/5n^a)=\sqrt{(\delta /50)}n^{-a}\) in Lemma 5. Then the following derivation shows that for any \(\tilde{\mu }\in M_\varsigma \), \(D_K(f_0,\tilde{f})\le \delta \), i.e., \(M_\varsigma \subset K_\delta \).
Now let us break T into two cases: (a) \(\tilde{\mu } \ge \mu _0\) and (b) \(\tilde{\mu } < \mu _0\).

Case (a): \(\tilde{\mu } \ge \mu _0\).

$$\begin{aligned} T = {\left\{ \begin{array}{ll} (\mu _0-\tilde{\mu })\tau , & \mu _0 \le \tilde{\mu }< y \\ (\mu _0-\tilde{\mu })\tau -(y-\tilde{\mu }), & \mu _0 < y \le \tilde{\mu } \\ (\mu _0-\tilde{\mu })(\tau -1), & y \le \mu _0 \le \tilde{\mu } \end{array}\right. } \end{aligned}$$

Case (b): \(\tilde{\mu } < \mu _0\).

$$\begin{aligned} T = {\left\{ \begin{array}{ll} (\mu _0-\tilde{\mu })\tau , & \tilde{\mu } \le \mu _0< y \\ (\mu _0-\tilde{\mu })(\tau -1)+(y-\tilde{\mu }), & \tilde{\mu } < y \le \mu _0 \\ (\mu _0-\tilde{\mu })(\tau -1), & y \le \tilde{\mu } \le \mu _0 \end{array}\right. } \end{aligned}$$
So now,
Hence,
Use Lemma 5 and Funahashi [13, Theorem 1 on p. 184] to bound the first and second terms, respectively.
Finally, we prove that for all \(\delta ,\nu >0\) there exists \(N_\nu \) such that \(\pi _n(K_\delta ) \ge \exp (-n\nu )\) for all \(n\ge N_\nu \):
Hence, both conditions of Theorem 2 hold. The result of Theorem 1 then follows from Corollary 3, which was derived from Theorem 2. \(\square \)
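The KL bound driving the \(M_\varsigma \subset K_\delta \) argument rests on the fact, visible in the case table for T, that \(\rho _\tau \) is Lipschitz in its location argument with constant \(\max (\tau ,1-\tau ) \le 1\), so \(D_K(f_0,\tilde{f})\) is at most the location gap. A numerical spot-check (our own sketch, with arbitrary \(\tau \) and gaps):

```python
import numpy as np

def trapezoid(vals, grid):
    """Plain trapezoidal rule (avoids NumPy version differences)."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def rho(u, tau):
    return u * (tau - (u < 0))

def ald_pdf(y, mu, tau):
    return tau * (1 - tau) * np.exp(-rho(y - mu, tau))

def kl_ald(mu0, mu1, tau):
    """D_K(f_0, f-tilde) for two sigma = 1 ALD densities; the log ratio
    reduces to rho_tau(y - mu1) - rho_tau(y - mu0), i.e. the quantity T."""
    grid = np.linspace(-80.0, 80.0, 2_000_001)
    f0 = ald_pdf(grid, mu0, tau)
    return trapezoid(f0 * (rho(grid - mu1, tau) - rho(grid - mu0, tau)), grid)

tau = 0.3
for gap in (0.5, 0.05, 0.005):
    print(gap, kl_ald(0.0, gap, tau) <= gap)  # True: D_K is at most the location gap
```

This is consistent with condition (ii): making the sup-norm gap between \(\tilde{\mu }\) and \(\mu _0\) small forces \(\tilde{f}\) into any prescribed KL-neighborhood of \(f_0\).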
Cite this article
Jantre, S.R., Bhattacharya, S. & Maiti, T. Quantile Regression Neural Networks: A Bayesian Approach. J Stat Theory Pract 15, 68 (2021). https://doi.org/10.1007/s42519-021-00189-w