Quantile Regression Neural Networks: A Bayesian Approach

Abstract

This article introduces a Bayesian neural network estimation method for quantile regression, assuming an asymmetric Laplace distribution (ALD) for the response variable. It is shown that the posterior distribution for feedforward neural network quantile regression is asymptotically consistent under a misspecified ALD model. The consistency proof embeds the problem in the density estimation framework and uses bounds on the bracketing entropy to derive posterior consistency over Hellinger neighborhoods. The result is established in the setting where the number of hidden nodes grows with the sample size. The Bayesian implementation utilizes the normal-exponential mixture representation of the ALD density, and the algorithm uses Markov chain Monte Carlo (MCMC) simulation, namely Gibbs sampling coupled with Metropolis–Hastings steps. We address the complexity of this MCMC implementation with respect to chain convergence, choice of starting values, and step sizes, and we illustrate the proposed method with simulation studies and real data examples.
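The normal-exponential mixture mentioned above is what makes a Gibbs sampler tractable: conditional on a latent exponential variable, the ALD likelihood is Gaussian. The following minimal sketch (illustrative only, not the authors' implementation; the function name rald and the parameter values are hypothetical, and the scale is fixed at 1) draws from an ALD via this mixture and checks that the τ-quantile of the draws sits at the location parameter.

```python
import numpy as np

# Sketch of the normal-exponential mixture of ALD(mu, sigma=1, tau):
#   Y = mu + theta*v + sqrt(kappa2*v)*Z,  v ~ Exp(1),  Z ~ N(0, 1),
# with theta = (1 - 2*tau)/(tau*(1 - tau)) and kappa2 = 2/(tau*(1 - tau)).
def rald(n, mu, tau, rng):
    theta = (1 - 2 * tau) / (tau * (1 - tau))
    kappa2 = 2 / (tau * (1 - tau))
    v = rng.exponential(scale=1.0, size=n)   # latent exponential mixing variables
    z = rng.standard_normal(n)               # standard normal draws
    return mu + theta * v + np.sqrt(kappa2 * v) * z

rng = np.random.default_rng(0)
y = rald(200_000, mu=1.5, tau=0.25, rng=rng)
print(np.quantile(y, 0.25))  # close to 1.5: the tau-quantile equals mu
```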

References

  1. Alhamzawi R (2018) Brq: R package for Bayesian quantile regression. https://cran.r-project.org/web/packages/Brq/Brq.pdf

  2. Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B (Methodol) 36:99–102

  3. Barndorff-Nielsen OE, Shephard N (2001) Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. J R Stat Soc Ser B 63:167–241

  4. Barron A, Schervish MJ, Wasserman L (1999) The consistency of posterior distributions in nonparametric problems. Ann Stat 27:536–561

  5. Benoit DF, Alhamzawi R, Yu K, Van den Poel D (2017) R package ‘bayesQR’. https://cran.r-project.org/web/packages/bayesQR/bayesQR.pdf

  6. Buntine WL, Weigend AS (1991) Bayesian back-propagation. Complex Syst 5:603–643

  7. Cannon AJ (2011) R package ‘qrnn’. https://cran.r-project.org/web/packages/qrnn/qrnn.pdf

  8. Cannon AJ (2018) Non-crossing nonlinear regression quantiles by monotone composite quantile regression neural network, with application to rainfall extremes. Stoch Environ Res Risk Assess 32:3207–3225

  9. Chen C (2007) A finite smoothing algorithm for quantile regression. J Comput Graph Stat 16:136–164

  10. Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2:303–314

  11. Dantzig GB (1963) Linear programming and extensions. Princeton University Press, Princeton

  12. de Freitas N, Andrieu C, Højen-Sørensen P, Niranjan M, Gee A (2001) Sequential Monte Carlo methods for neural networks. In: Doucet A, de Freitas N, Gordon N (eds) Sequential Monte Carlo methods in practice. Springer, New York, pp 359–379

  13. Funahashi K (1989) On the approximate realization of continuous mappings by neural networks. Neural Netw 2:183–192

  14. Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7:457–472

  15. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn. CRC Press, Boca Raton

  16. Ghosh M, Ghosh A, Chen MH, Agresti A (2000) Noninformative priors for one-parameter item models. J Stat Plan Infer 88:99–115

  17. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2:359–366

  18. Karmarkar N (1984) A new polynomial time algorithm for linear programming. Combinatorica 4:373–395

  19. Koenker R (2005) Quantile regression, 1st edn. Cambridge University Press, Cambridge

  20. Koenker R (2017) R package ‘quantreg’. https://cran.r-project.org/web/packages/quantreg/quantreg.pdf

  21. Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50

  22. Koenker R, Machado J (1999) Goodness of fit and related inference processes for quantile regression. J Am Stat Assoc 94:1296–1309

  23. Kottas A, Gelfand AE (2001) Bayesian semiparametric median regression modeling. J Am Stat Assoc 96:1458–1468

  24. Kozumi H, Kobayashi G (2011) Gibbs sampling methods for Bayesian quantile regression. J Stat Comput Simul 81:1565–1578

  25. Lee HKH (2000) Consistency of posterior distributions for neural networks. Neural Netw 13:629–642

  26. MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural Comput 4:448–472

  27. Madsen K, Nielsen HB (1993) A finite smoothing algorithm for linear \(l_1\) estimation. SIAM J Optim 3:223–235

  28. Neal RM (1996) Bayesian learning for neural networks. Springer, New York

  29. Papamarkou T, Hinkle J, Young M, Womble D (2019) Challenges in Bayesian inference via Markov chain Monte Carlo for neural networks. arXiv:1910.06539

  30. Pollard D (1991) Bracketing methods in statistics and econometrics. In: Barnett WA, Powell J, Tauchen GE (eds) Nonparametric and semiparametric methods in econometrics and statistics: proceedings of the Fifth international symposium in econometric theory and econometrics. Cambridge University Press, Cambridge, pp 337–355

  31. Sriram K, Ramamoorthi RV, Ghosh P (2013) Posterior consistency of Bayesian quantile regression based on the misspecified asymmetric Laplace density. Bayesian Anal 8:479–504

  32. Taylor JW (2000) A quantile regression neural network approach to estimating the conditional density of multiperiod returns. J Forecast 19:299–311

  33. Titterington DM (2004) Bayesian methods for neural networks and related models. Stat Sci 19:128–139

  34. van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York

  35. Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York

  36. Walker SG, Mallick BK (1999) A Bayesian semiparametric accelerated failure time model. Biometrics 55:477–483

  37. Wasserman L (1998) Asymptotic properties of nonparametric Bayesian procedures. In: Dey D, Müller P, Sinha D (eds) Practical nonparametric and semiparametric Bayesian statistics. Springer, New York, pp 293–304

  38. Wong WH, Shen X (1995) Probability inequalities for likelihood ratios and convergence rates of sieve mles. Ann Stat 23:339–362

  39. Xu Q, Deng K, Jiang C, Sun F, Huang X (2017) Composite quantile regression neural network with applications. Expert Syst Appl 76:129–139

  40. Yeh I-C (1998) Modeling of strength of high performance concrete using artificial neural networks. Cem Concr Res 28:1797–1808

  41. Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Prob Lett 54:437–447

  42. Yu K, Zhang J (2005) A three-parameter asymmetric Laplace distribution and its extensions. Commun Stat Theory Methods 34:1867–1879

  43. Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel P, Zellner A (eds) Bayesian inference and decision techniques: essays in honor of Bruno de Finetti. Elsevier Science Publishers Inc, New York, pp 233–243

Author information

Corresponding author

Correspondence to T. Maiti.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Celebrating the Centenary of Professor C. R. Rao” guest edited by Ravi Khattree, Sreenivasa Rao Jammalamadaka, and M. B. Rao.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 159 kb)

Appendices

Appendix A: Lemmas for Posterior Consistency Proof

For all the proofs in Appendix A and Appendix B, we assume the covariates \({\varvec{X}}_{p \times 1}\) to be uniformly distributed on \([0,1]^p\) and treat them as fixed. Thus, \(f_0({\varvec{x}})=f({\varvec{x}})=1\). Conditional on \({\varvec{X}}\), the univariate response variable Y has an asymmetric Laplace distribution with location parameter determined by the neural network. We fix its scale parameter, \(\sigma \), at 1 for the posterior consistency derivations. Thus,

$$\begin{aligned} Y |{\varvec{X}} = {\varvec{x}} \sim ALD \left( \beta _0 + \sum _{j=1}^k \beta _j \frac{1}{1 + \exp { \left( -\gamma _{j0}- \sum _{h=1}^p \gamma _{jh} x_{h} \right) }}, 1, \tau \right) \end{aligned}$$
(A.1)

The number of input variables, p, is taken to be fixed, while the number of hidden nodes, k, will be allowed to grow with the sample size, n.
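To make the setup concrete, here is a small hypothetical sketch (names and values are illustrative, not taken from the paper's code) of the conditional \(\tau \)-quantile function in Eq. A.1: a single hidden layer of k sigmoid nodes feeding a linear output.

```python
import numpy as np

# mu(x) = beta0 + sum_j beta_j * sigmoid(gamma_j0 + sum_h gamma_jh * x_h)  (Eq. A.1)
def nn_quantile(x, beta0, beta, gamma0, gamma):
    """x: (p,) input; beta: (k,); gamma0: (k,); gamma: (k, p) weight matrix."""
    hidden = 1.0 / (1.0 + np.exp(-(gamma0 + gamma @ x)))  # k sigmoid activations
    return beta0 + beta @ hidden

rng = np.random.default_rng(1)
p, k = 3, 5
x = rng.uniform(size=p)  # inputs lie in [0, 1]^p, as assumed in the appendix
mu_x = nn_quantile(x, 0.2, rng.normal(size=k), rng.normal(size=k), rng.normal(size=(k, p)))
print(mu_x)
```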

All the lemmas described below are taken from Lee [25].

Lemma 1

Suppose \(H_{[]}(u) \le \log [(C_n^2d_n/u)^{d_n}]\), where \(d_n=(p+2)k_n+1\), \(k_n \le n^a\), and \(C_n\le \exp (n^{b-a})\) for \(0<a<b<1\). Then, for any fixed constants \(c,\epsilon >0\) and for all sufficiently large \(n\), \(\int _0^\epsilon \sqrt{H_{[]}(u)}\,\mathrm {d}u \le c\sqrt{n}\epsilon ^2\).

Proof

The proof follows that of Lemma 1 in [25, pp. 634–635], applied to the BQRNN case. \(\square \)

For Lemmas 2, 3, and 4, we make use of the following notation. From Eq. 3.10, recall that

$$\begin{aligned} R_n(f) = \prod _{i=1}^n \frac{f(x_i,y_i)}{f_0(x_i,y_i)} \end{aligned}$$

is the ratio of likelihoods under neural network density f and the true density \(f_0\). \(\mathcal {F}_n\) is the sieve as defined in Eq. 3.9 and \(A_\epsilon \) is the Hellinger neighborhood of the true density \(f_0\) as in Eq. 3.5.
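For convenience, these take the standard form (matching the expansion used later in Eq. B.1):

$$\begin{aligned} D_H(f_0,f) = \left( \iint \left[ \sqrt{f({\varvec{x}},y)} - \sqrt{f_0({\varvec{x}},y)} \right] ^2 \, \mathrm {d}y \, \mathrm {d}{\varvec{x}} \right) ^{1/2}, \qquad A_\epsilon = \left\{ f : D_H(f_0,f) \le \epsilon \right\} . \end{aligned}$$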

Lemma 2

\(\underset{f\in A_\epsilon ^c \cap \mathcal {F}_n}{\sup } R_n(f) \le 4 \exp (-c_2n\epsilon ^2)\) a.s. for sufficiently large n.

Proof

Following the outline of the proof of Lemma 2 in Lee [25, p. 635], we first bound the Hellinger bracketing entropy using van der Vaart and Wellner [34, Theorem 2.7.11, p. 164]. Next, we use Lemma 1 to show that the conditions of Wong and Shen [38, Theorem 1, pp. 348–349] hold, and finally we apply that theorem to obtain the result stated in Lemma 2.

In the BQRNN case, only the first step needs to be rederived, using the ALD density given in Eq. A.1; the rest of the steps follow from the proof in Lee [25]. Since we need the Hellinger bracketing entropy for neural networks, we use the \(L_2\) norm on the square roots of the density functions f. The \(L_\infty \) covering number was computed above in Eq. 3.11, so here \(d^*=L_\infty \). The version of van der Vaart and Wellner [34, Theorem 2.7.11] that we use is

$$\begin{aligned}&\text {If} \,\, \left|\sqrt{f_t(x,y)}-\sqrt{f_s(x,y)}\right| \le d^*(s,t) F(x,y) \quad \text {for some } F, \\&\text {then,} \,\, N_{[]}(2\epsilon \left\Vert F\right\Vert _2,\mathcal {F}^*,\left\Vert .\right\Vert _2) \le N(\epsilon ,\mathcal {F}_n,d^*) \end{aligned}$$

Now let’s start by defining some notations,

$$\begin{aligned} f_t(x,y)&= \tau (1-\tau ) \exp \left( -(y-\mu _t(x)) (\tau -I_{(y\le \mu _t(x))}) \right) , \nonumber \\&\quad \text { where,} \,\, \mu _t(x) = \beta _0^t + \sum _{j=1}^k \frac{\beta _j^t}{1 + \exp {(-A_j(x))}} \,\, \text {and} \,\, A_j(x) = \gamma _{j0}^t + \sum _{h=1}^p \gamma _{jh}^t x_{h} \end{aligned}$$
(A.2)
$$\begin{aligned} f_s(x,y)&= \tau (1-\tau ) \exp \left( -(y-\mu _s(x)) (\tau -I_{(y\le \mu _s(x))}) \right) , \nonumber \\&\quad \text {where,} \,\, \mu _s(x) = \beta _0^s + \sum _{j=1}^k \frac{\beta _j^s}{1 + \exp {(-B_j(x))}} \,\, \text {and} \,\, B_j(x) = \gamma _{j0}^s + \sum _{h=1}^p \gamma _{jh}^s x_{h} \end{aligned}$$
(A.3)

For notational convenience, we drop x and y from \(f_s(x,y)\), \(f_t(x,y)\), \(\mu _s(x)\), \(\mu _t(x)\), \(B_j(x)\), and \(A_j(x)\) and denote them as \(f_s\), \(f_t\), \(\mu _s\), \(\mu _t\), \(B_j\), and \(A_j\).

$$\begin{aligned} \left|\sqrt{f_t}-\sqrt{f_s}\right|&= \sqrt{\tau (1-\tau )} \left| \exp \left( -\frac{1}{2}(y-\mu _t) (\tau -I_{(y\le \mu _t)}) \right) \right. \nonumber \\&\quad - \left. \exp \left( -\frac{1}{2}(y-\mu _s) (\tau -I_{(y\le \mu _s)}) \right) \right| \nonumber \\&\quad \text {As,} \,\, \tau \in (0,1) \text { is fixed.}\nonumber \\&\le \frac{1}{2} \left| \exp \left( -\frac{1}{2}(y-\mu _t) (\tau -I_{(y\le \mu _t)}) \right) \right. \nonumber \\&\quad - \left. \exp \left( -\frac{1}{2}(y-\mu _s) (\tau -I_{(y\le \mu _s)}) \right) \right| \end{aligned}$$
(A.4)

Now let's separate the above term into two cases: (a) \(\mu _s \le \mu _t\) and (b) \(\mu _s > \mu _t\). Consider case (a) first and break it into three subcases: (i) \(y \le \mu _s \le \mu _t\), (ii) \(\mu _s < y \le \mu _t\), and (iii) \(\mu _s \le \mu _t < y\).

Case-a (i):

\(y \le \mu _s \le \mu _t\)

Eq. A.4 simplifies to

$$\begin{aligned}&\frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) - \exp \left( -\frac{1}{2}(y-\mu _s) (\tau -1) \right) \right| \nonumber \\&\quad = \frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _s) (\tau -1) \right) \right| \left|\exp \left( -\frac{1}{2}(\mu _s-\mu _t) (\tau -1) \right) -1\right| \nonumber \\&\qquad \text {As first term in modulus is } \le 1 \nonumber \\&\quad \le \frac{1}{2} \left|1 - \exp \left( -\frac{1}{2}(\mu _t-\mu _s) (1-\tau ) \right) \right| \nonumber \\&\qquad \text {Note: } 1-\exp (-z) \le z \,\, \forall z \in \mathbb {R}\implies \left|1-\exp (-z)\right| \le \left|z\right| \,\, \forall z \ge 0 \nonumber \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right|(1-\tau ) \nonumber \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right| \nonumber \\&\quad \le \frac{1}{2}\left|\mu _t-\mu _s\right| \end{aligned}$$
(A.5)
Case-a (ii):

\(\mu _s < y \le \mu _t\)

Eq. A.4 simplifies to

$$\begin{aligned}&\frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) - \exp \left( -\frac{1}{2}(y-\mu _s) \tau \right) \right|\\&\quad = \frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) -1 +1 - \exp \left( -\frac{1}{2}(y-\mu _s) \tau \right) \right| \\&\quad \le \frac{1}{2} \left|1-\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) \right| + \frac{1}{2} \left|1 - \exp \left( -\frac{1}{2}(y-\mu _s) \tau \right) \right| \\&\qquad \text {Let's use the calculus inequality mentioned in A.5} \\&\quad \le \frac{1}{4}\left|(y-\mu _t)(\tau -1)\right| + \frac{1}{4}\left|(y-\mu _s)\tau \right| \\&\qquad \text {Both terms are positive so we will combine them in one modulus} \\&\quad = \frac{1}{4} \left|(y-\mu _t)(\tau -1) + (y-\mu _t+\mu _t-\mu _s)\tau \right| \\&\quad = \frac{1}{4} \left|(y-\mu _t)(2\tau -1) + (\mu _t-\mu _s)\tau \right| \\&\quad \le \frac{1}{4} \left[ \left|(y-\mu _t)\right|\left|2\tau -1\right| + \left|\mu _t-\mu _s\right|\tau \right] \\&\quad \text {Here, } \left|y-\mu _t\right| \le \left|\mu _t-\mu _s\right| \,\, \text {and} \,\, \left|2\tau -1\right| \le 1 \\&\quad \le \frac{1}{2}\left|\mu _t-\mu _s\right| \end{aligned}$$
Case-a (iii):

\(\mu _s \le \mu _t < y\)

Eq. A.4 simplifies to

$$\begin{aligned}&\frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) \tau \right) - \exp \left( -\frac{1}{2}(y-\mu _s)\tau \right) \right| \\&\quad = \frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) \tau \right) \right| \left|1-\exp \left( -\frac{1}{2}(\mu _t-\mu _s) \tau \right) \right|\\&\qquad \text {As first term in modulus is } \le 1 \\&\quad \le \frac{1}{2} \left|1 - \exp \left( -\frac{1}{2} (\mu _t-\mu _s) \tau \right) \right| \\&\qquad \text {Using the calculus inequality mentioned in A.5} \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right|\tau \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right| \\&\quad \le \frac{1}{2}\left|\mu _t-\mu _s\right| \end{aligned}$$

Case (b), where \(\mu _s > \mu _t\), follows by symmetry: interchanging the roles of \(\mu _s\) and \(\mu _t\) in the three subcases above bounds Eq. A.4 by \(\left|\mu _t-\mu _s\right|/2\) as well. Now,

$$\begin{aligned}&\left|\sqrt{f_t}-\sqrt{f_s}\right| \le \frac{1}{2}\left|\mu _t-\mu _s\right| \nonumber \\&\qquad \text {Now, let's substitute }\mu _t\text { and }\mu _s\text { from A.2 and A.3} \nonumber \\&\quad = \frac{1}{2} \left|\beta _0^t + \sum _{j=1}^k \frac{\beta _j^t}{1 + \exp {(-A_j)}} - \beta _0^s - \sum _{j=1}^k \frac{\beta _j^s}{1 + \exp {(-B_j)}}\right| \nonumber \\&\quad \le \frac{1}{2} \left[ \left|\beta _0^t - \beta _0^s\right| + \sum _{j=1}^k \left|\frac{\beta _j^t}{1 + \exp {(-A_j)}} -\frac{\beta _j^s}{1 + \exp {(-B_j)}}\right| \right] \nonumber \\&\quad = \frac{1}{2} \left[ \left|\beta _0^t - \beta _0^s\right| + \sum _{j=1}^k \left|\frac{\beta _j^t-\beta _j^s+\beta _j^s}{1 + \exp {(-A_j)}} -\frac{\beta _j^s}{1 + \exp {(-B_j)}}\right| \right] \nonumber \\&\quad = \frac{1}{2} \left[ \left|\beta _0^t - \beta _0^s\right| + \sum _{j=1}^k \frac{\left|\beta _j^t-\beta _j^s\right|}{1 + \exp {(-A_j)}} \right. \nonumber \\&\qquad + \left. \sum _{j=1}^k \left|\beta _j^s\right| \left|\frac{1}{1 + \exp {(-A_j)}} -\frac{1}{1 + \exp {(-B_j)}}\right| \right] \nonumber \\&\qquad \text {Recall that} \,\, \left|\beta _j^s\right| \le C_n \nonumber \\&\quad \le \frac{1}{2} \left[ \left|\beta _0^t - \beta _0^s\right| + \sum _{j=1}^k \left|\beta _j^t-\beta _j^s\right| + \sum _{j=1}^k C_n \left|\frac{\exp (-B_j) - \exp (-A_j)}{(1 + \exp (-A_j))(1 + \exp (-B_j))}\right| \right] \nonumber \\&\text {Note:} \,\, \left|\exp (-B_j) - \exp (-A_j)\right| \nonumber \\&\quad = \left\{ \begin{array}{ll} \exp (-A_j)(1-\exp (-(B_j-A_j))), & \,\, \text {when} \,\, B_j-A_j \ge 0 \\ \exp (-B_j)(1-\exp (-(A_j-B_j))), & \,\, \text {when} \,\, A_j-B_j \ge 0 \end{array} \right. \nonumber \\&\quad \text {Using the calculus inequality mentioned in A.5} \nonumber \\&\quad \le \left\{ \begin{array}{ll} \exp (-A_j)(B_j-A_j), & \,\, \text {when} \,\, B_j-A_j \ge 0 \\ \exp (-B_j)(A_j-B_j), & \,\, \text {when} \,\, A_j-B_j \ge 0 \end{array} \right. \nonumber \\&\text {So,} \,\, \left|\frac{\exp (-B_j) - \exp (-A_j)}{(1 + \exp (-A_j))(1 + \exp (-B_j))}\right| \nonumber \\&\quad \le \left\{ \begin{array}{ll} \frac{\exp (-A_j)(B_j-A_j)}{(1 + \exp (-A_j))(1 + \exp (-B_j))}, & \,\, \text {when} \,\, B_j-A_j \ge 0 \\ \frac{\exp (-B_j)(A_j-B_j)}{(1 + \exp (-A_j))(1 + \exp (-B_j))}, & \,\, \text {when} \,\, A_j-B_j \ge 0 \end{array} \right. \nonumber \\&\quad \le \left|A_j-B_j\right| \end{aligned}$$
(A.6)

Hence, we can continue bounding the expression in Eq. A.6 as follows:

$$\begin{aligned}&\left|\sqrt{f_t}-\sqrt{f_s}\right| \le \frac{1}{2} \left[ \left|\beta _0^t - \beta _0^s\right| + \sum _{j=1}^k \left|\beta _j^t-\beta _j^s\right| + \sum _{j=1}^k C_n \left|A_j-B_j\right| \right] \\&\qquad \text {Now, let's substitute }A_j\text { and }B_j\text { from A.2 and A.3} \\&\quad \le \frac{1}{2} \left[ \left|\beta _0^t - \beta _0^s\right| + \sum _{j=1}^k \left|\beta _j^t-\beta _j^s\right| + \sum _{j=1}^k C_n \left|\gamma _{j0}^t + \sum _{h=1}^p \gamma _{jh}^t x_{h} - \gamma _{j0}^s - \sum _{h=1}^p \gamma _{jh}^s x_{h}\right| \right] \\&\quad \le \frac{1}{2} \left[ \left|\beta _0^t - \beta _0^s\right| + \sum _{j=1}^k \left|\beta _j^t-\beta _j^s\right| + \sum _{j=1}^k C_n \left( \left|\gamma _{j0}^t - \gamma _{j0}^s\right| + \sum _{h=1}^p \left|x_h\right| \left|\gamma _{jh}^t - \gamma _{jh}^s \right| \right) \right] \\&\qquad \text {Recall that} \,\, \left|x_h\right| \le 1 \,\, \text {and w.l.o.g assume} \,\, C_n > 1 \\&\quad \le \frac{C_n}{2} \left[ \left|\beta _0^t - \beta _0^s\right| + \sum _{j=1}^k \left|\beta _j^t-\beta _j^s\right| + \sum _{j=1}^k \left( \left|\gamma _{j0}^t - \gamma _{j0}^s\right| + \sum _{h=1}^p \left|\gamma _{jh}^t - \gamma _{jh}^s \right| \right) \right] \\&\quad \le \frac{C_n d}{2} \left\Vert t-s\right\Vert _\infty \end{aligned}$$

The last inequality holds because the bracketed sum contains \(d=(p+2)k+1\) coordinate differences, each bounded by \(\left\Vert t-s\right\Vert _\infty \). The rest of the steps now follow from the proof of Lemma 2 given in Lee [25, pp. 635–636]. \(\square \)

Lemma 3

If there exists a constant \(r>0\) and N, such that \(\mathcal {F}_n\) satisfies \(\pi _n(\mathcal {F}_n^c)<\exp (-nr), \forall n\ge N\), then there exists a constant \(c_2\) such that \(\int _{A_\epsilon ^c} R_n(f) \mathrm {d}\pi _n(f) < \exp (-nr/2) + \exp (-nc_2\epsilon ^2)\) except on a set of probability tending to zero.

Proof

The proof is the same as the proof of Lemma 3 from [25, p. 636], applied to the BQRNN scenario. \(\square \)

Lemma 4

Let \(K_\delta \) be the KL-neighborhood as in Eq. 3.8. Suppose that for all \(\delta ,\nu \, > 0, \exists \, N\) s.t. \(\pi _n(K_\delta ) \ge \exp (-n\nu ),\, \forall n \ge N\). Then, for all \(\varsigma > 0\) and sufficiently large n, \(\int R_n(f) d\pi _n(f) > e^{-n\varsigma }\) except on a set of probability going to zero.

Proof

The proof is the same as the proof of Lemma 5 from [25, p. 637], applied to the BQRNN scenario. \(\square \)

Lemma 5

Suppose that \(\mu \) is a neural network regression with parameters \((\theta _1,\ldots \theta _d)\), and let \(\tilde{\mu }\) be another neural network with parameters \((\tilde{\theta }_1,\ldots \tilde{\theta }_{\tilde{d}_n})\). Define \(\theta _i=0\) for \(i >d\) and \(\tilde{\theta }_j=0\) for \(j>\tilde{d}_n\). Suppose that the number of nodes of \(\mu \) is k, and that the number of nodes of \(\tilde{\mu }\) is \(\tilde{k}_n=O(n^a)\) for some a, \(0<a<1\). Let

$$\begin{aligned} M_\varsigma =\{\tilde{\mu }\Big | \left|\theta _i-\tilde{\theta }_i\right| \le \varsigma , i=1,2,\ldots \} \end{aligned}$$
(A.7)

Then, for any \(\tilde{\mu } \in M_\varsigma \) and for sufficiently large n,

$$\begin{aligned} \underset{x\in \mathcal {X}}{\sup }(\tilde{\mu }(x)-\mu (x))^2 \le (5n^a)^2\varsigma ^2 \end{aligned}$$

Proof

The proof is the same as the proof of Lemma 6 from [25, pp. 638–639]. \(\square \)

Appendix B: Posterior Consistency Theorem Proofs

1.1 Appendix B.1: Theorem 2 Proof

For the proof of Theorem 2 and Corollary 3, we use the following notations. From Eq. 3.10, recall that

$$\begin{aligned} R_n(f) = \prod _{i=1}^n \frac{f({\varvec{x}}_i,y_i)}{f_0({\varvec{x}}_i,y_i)} \end{aligned}$$

is the ratio of likelihoods under neural network density f and the true density \(f_0\). Also, \(\mathcal {F}_n\) is the sieve as defined in Eq. 3.9. Finally, \(A_\epsilon \) is the Hellinger neighborhood of the true density \(f_0\) as in Eq. 3.5.

By Lemma 3, there exists a constant \(c_2\) such that \(\int _{A_\epsilon ^c} R_n(f) \mathrm {d}\pi _n(f) < \exp (-nr/2) + \exp (-nc_2\epsilon ^2)\) for sufficiently large n. Further, from Lemma 4, \(\int R_n(f) \mathrm {d}\pi _n(f) \ge \exp (-n\varsigma )\) for sufficiently large n.

$$\begin{aligned} P(A_\epsilon ^c |({\varvec{X_1}},Y_1),\ldots ,({\varvec{X_n}},Y_n))&= \frac{{\displaystyle \int _{A_\epsilon ^c} R_n(f) \mathrm {d}\pi _n(f)}}{{\displaystyle \int R_n(f) \mathrm {d}\pi _n(f)}}\\&< \frac{\exp \left( - \frac{nr}{2} \right) + \exp (-nc_2\epsilon ^2)}{\exp (-n\varsigma )} \\&= \exp \left( -n \left[ \frac{r}{2} -\varsigma \right] \right) + \exp \left( -n\epsilon ^2 [c_2 - \varsigma ] \right) \end{aligned}$$

Now pick \(\varsigma \) small enough that, for some \(\varphi >0\), both \(\frac{r}{2}-\varsigma > \varphi \) and \(c_2 - \varsigma > \varphi \). Thus,

$$\begin{aligned} P(A_\epsilon ^c |({\varvec{X_1}},Y_1),\ldots ,({\varvec{X_n}},Y_n)) \le \exp (-n\varphi ) + \exp (-n\epsilon ^2\varphi ) \end{aligned}$$

Hence, \(P(A_\epsilon ^c |({\varvec{X_1}},Y_1), \ldots , ({\varvec{X_n}},Y_n)) \overset{p}{\rightarrow } 0\). \(\square \)

1.2 Appendix B.2: Corollary 3 Proof

Theorem 2 implies that \(D_H(f_0,f) \overset{p}{\rightarrow } 0\), where \(D_H(f_0,f)\) is the Hellinger distance between \(f_0\) and f as in Eq. 3.4 and f is a random draw from the posterior. Recall from Eq. 3.6 that the predictive density function

$$\begin{aligned} \hat{f}_n(.) = \int f(.)\,\mathrm {d}P(f|({\varvec{X_1}},Y_1),\ldots ,({\varvec{X_n}},Y_n)) \end{aligned}$$

gives rise to the predictive conditional quantile function, \(\hat{\mu }_n({\varvec{x}}) = Q_{\tau ,\hat{f}_n}(y |{\varvec{X}}={\varvec{x}})\). We next show that \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\), which in turn implies \(\hat{\mu }_n({\varvec{x}})\) converges in \(L_1\)-norm to the true conditional quantile function,

$$\begin{aligned} \mu _0({\varvec{x}}) = Q_{\tau ,f_0}(y |{\varvec{X}}={\varvec{x}}) = \beta _0 + \sum _{j=1}^k \beta _j \frac{1}{1 + \exp \left( -\gamma _{j0}- \sum _{h=1}^p \gamma _{jh} x_{h} \right) } \end{aligned}$$

First we show that \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\). Let \(X^n = (({\varvec{X_1}},Y_1),\ldots ,({\varvec{X_n}},Y_n))\). For any \(\epsilon >0\):

$$\begin{aligned} D_H(f_0,\hat{f}_n)&\le \int D_H(f_0,f) \, \mathrm {d}\pi _n(f|X^n) \\&\quad \text {By Jensen's Inequality} \\&\le \int _{A_\epsilon } D_H(f_0,f) \, \mathrm {d}\pi _n(f|X^n) + \int _{A_\epsilon ^c} D_H(f_0,f) \, \mathrm {d}\pi _n(f|X^n) \\&\le \int _{A_\epsilon } \epsilon \, \mathrm {d}\pi _n(f|X^n) + \int _{A_\epsilon ^c} D_H(f_0,f) \, \mathrm {d}\pi _n(f|X^n) \\&\le \,\, \epsilon + \int _{A_\epsilon ^c} D_H(f_0,f) \, \mathrm {d}\pi _n(f|X^n) \end{aligned}$$

The second term goes to zero in probability by Theorem 2 and \(\epsilon \) is arbitrary, so \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\).

In the remainder of the proof, for notational simplicity, we write \(\hat{\mu }_n\) and \(\mu _0\) for \(\hat{\mu }_n({\varvec{x}})\) and \(\mu _0({\varvec{x}})\), respectively. The Hellinger distance between \(f_0\) and \(\hat{f}_n\) is

$$\begin{aligned} D_H(f_0,\hat{f}_n)&= \left( \iint \left[ \sqrt{\hat{f}_n({\varvec{x}},y)} - \sqrt{f_0({\varvec{x}},y)} \right] ^2 \, \mathrm {d}y \, \mathrm {d}x \right) ^{1/2} \nonumber \\&= \left( \iint \tau (1-\tau ) \left[ \exp \left( -\frac{1}{2}(y-\hat{\mu }_n) (\tau -I_{(y\le \hat{\mu }_n)}) \right) \right. \right. \nonumber \\&\quad - \left. \left. \exp \left( -\frac{1}{2}(y-\mu _0) (\tau -I_{(y\le \mu _0)}) \right) \right] ^2 \, \mathrm {d}y \, \mathrm {d}{\varvec{x}} \right) ^{1/2} \nonumber \\&= \left( 2 - 2 \iint \tau (1-\tau ) \exp \left( -\frac{1}{2}(y-\hat{\mu }_n) (\tau -I_{(y\le \hat{\mu }_n)})\right. \right. \nonumber \\&\quad -\left. \left. \frac{1}{2}(y-\mu _0) (\tau -I_{(y\le \mu _0)}) \right) \, \mathrm {d}y \, \mathrm {d}{\varvec{x}} \right) ^{1/2} \nonumber \\&\quad \text {let,} \,\, T = -\frac{1}{2}(y-\hat{\mu }_n) (\tau -I_{(y\le \hat{\mu }_n)}) -\frac{1}{2}(y-\mu _0) (\tau -I_{(y\le \mu _0)}) \nonumber \\&= \left( 2 - 2 \iint \tau (1-\tau ) \exp \left( T \right) \, \mathrm {d}y \, \mathrm {d}{\varvec{x}} \right) ^{1/2} \end{aligned}$$
(B.1)

Now let’s break T into two cases: (a) \(\hat{\mu }_n \le \mu _0\), and (b) \(\hat{\mu }_n > \mu _0\).

Case-(a):

\(\hat{\mu }_n \le \mu _0\)

$$\begin{aligned} T = {\left\{ \begin{array}{ll} - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau , & \hat{\mu }_n \le \mu _0< y \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau + \frac{(y-\mu _0)}{2}, & \hat{\mu }_n \le \frac{\hat{\mu }_n+\mu _0}{2}< y \le \mu _0 \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) - \frac{(y-\hat{\mu }_n)}{2}, & \hat{\mu }_n < y \le \frac{\hat{\mu }_n+\mu _0}{2} \le \mu _0 \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1), & y \le \hat{\mu }_n \le \mu _0 \end{array}\right. } \end{aligned}$$
Case-(b):

\(\hat{\mu }_n > \mu _0\)

$$\begin{aligned} T = {\left\{ \begin{array}{ll} - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau , & \mu _0 \le \hat{\mu }_n< y \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau + \frac{(y-\hat{\mu }_n)}{2}, & \mu _0 \le \frac{\hat{\mu }_n+\mu _0}{2}< y \le \hat{\mu }_n \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) - \frac{(y-\mu _0)}{2}, & \mu _0 < y \le \frac{\hat{\mu }_n+\mu _0}{2} \le \hat{\mu }_n \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1), & y \le \mu _0 \le \hat{\mu }_n \end{array}\right. } \end{aligned}$$

Hence now,

$$\begin{aligned}&\int \tau (1-\tau ) \exp \left( T \right) \, \mathrm {d}y \\&\quad = \int \left[ I_{(\hat{\mu }_n \le \mu _0)} + I_{( \hat{\mu }_n> \mu _0)} \right] \tau (1-\tau ) \exp \left( T \right) \mathrm {d}y \\&\quad = I_{(\hat{\mu }_n \le \mu _0)} \tau (1-\tau ) \times \left[ \int _{\mu _0}^\infty \exp \left\{ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau \right\} \, \mathrm {d}y \right. \\&\qquad +\int _{\frac{\hat{\mu }_n+\mu _0}{2}}^{\mu _0} \exp \left\{ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau + \frac{(y-\mu _0)}{2} \right\} \, \mathrm {d}y \\&\qquad + \left. \int _{\hat{\mu }_n}^{\frac{\hat{\mu }_n+\mu _0}{2}}\exp \left\{ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) - \frac{(y-\hat{\mu }_n)}{2} \right\} \, \mathrm {d}y \right. \\&\qquad +\left. \int _{-\infty }^{\hat{\mu }_n} \exp \left\{ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) \right\} \, \mathrm {d}y \right] \\&\qquad + I_{(\hat{\mu }_n > \mu _0)} \tau (1-\tau ) \times \left[ \int _{\hat{\mu }_n}^\infty \exp \left\{ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau \right\} \, \mathrm {d}y \right. \\&\qquad + \int _{\frac{\hat{\mu }_n+\mu _0}{2}}^{\hat{\mu }_n} \exp \left\{ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau + \frac{(y-\hat{\mu }_n)}{2} \right\} \, \mathrm {d}y \\&\qquad + \left. \int _{\mu _0}^{\frac{\hat{\mu }_n+\mu _0}{2}}\exp \left\{ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) - \frac{(y-\mu _0)}{2} \right\} \, \mathrm {d}y \right. \\&\qquad + \left. \int _{-\infty }^{\mu _0} \exp \left\{ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) \right\} \, \mathrm {d}y \right] \\&\quad = \frac{1-\tau }{1-2\tau }\exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} \tau \right) - \frac{\tau }{1-2\tau } \exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} (1-\tau ) \right) \end{aligned}$$
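The last equality is a routine computation; as a check, in case (a) write \(m=(\hat{\mu }_n+\mu _0)/2\) and \(\Delta =\mu _0-\hat{\mu }_n=\left|\hat{\mu }_n-\mu _0\right|\). The two outer integrals evaluate to

$$\begin{aligned} \int _{\mu _0}^\infty \exp \left\{ - \left( y - m \right) \tau \right\} \, \mathrm {d}y = \frac{1}{\tau } e^{-\Delta \tau /2}, \qquad \int _{-\infty }^{\hat{\mu }_n} \exp \left\{ - \left( y - m \right) (\tau -1) \right\} \, \mathrm {d}y = \frac{1}{1-\tau } e^{-\Delta (1-\tau )/2}, \end{aligned}$$

while the two middle integrals combine to \(2\left( e^{-\Delta \tau /2}-e^{-\Delta (1-\tau )/2}\right) /(1-2\tau )\); multiplying by \(\tau (1-\tau )\) and collecting terms gives the stated expression. Case (b) is identical with the roles of \(\hat{\mu }_n\) and \(\mu _0\) interchanged.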

Substituting this expression into Eq. B.1, we get

$$\begin{aligned}&D_H(f_0,\hat{f}_n) = \left( 2 - 2 \int \left[ \frac{1-\tau }{1-2\tau }\exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} \tau \right) \right. \right. \\&\quad - \left. \left. \frac{\tau }{1-2\tau } \exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} (1-\tau ) \right) \right] \mathrm {d}{\varvec{x}} \right) ^{1/2} \end{aligned}$$

Since \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\),

$$\begin{aligned} \int \left[ \frac{1-\tau }{1-2\tau }\exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} \tau \right) - \frac{\tau }{1-2\tau } \exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} (1-\tau ) \right) \right] \mathrm {d}{\varvec{x}} \overset{p}{\rightarrow } 1 \end{aligned}$$

Our next step is to show that the above expression implies \(\left|\hat{\mu }_n-\mu _0\right| \rightarrow 0\) a.s. on a set \(\Omega \) with probability tending to 1, and hence \(\int \left|\hat{\mu }_n-\mu _0\right| \mathrm {d}{\varvec{x}} \overset{p}{\rightarrow } 0\).

We prove this by contradiction. Suppose \(\left|\hat{\mu }_n-\mu _0\right| \nrightarrow 0\) a.s. on \(\Omega \). Then there exist an \(\epsilon > 0\) and a subsequence \(\hat{\mu }_{n_i}\) such that \(\left|\hat{\mu }_{n_i}-\mu _0\right| > \epsilon \) on a set A with \(P(A)>0\). Now decompose the integral as

$$\begin{aligned}&\int \left[ \frac{1-\tau }{1-2\tau }\exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} \tau \right) - \frac{\tau }{1-2\tau } \exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} (1-\tau ) \right) \right] \mathrm {d}{\varvec{x}} \\&\quad = \int _A \left[ \frac{1-\tau }{1-2\tau }\exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} \tau \right) - \frac{\tau }{1-2\tau } \exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} (1-\tau ) \right) \right] \mathrm {d}{\varvec{x}} \\&\qquad + \int _{A^c} \left[ \frac{1-\tau }{1-2\tau }\exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} \tau \right) - \frac{\tau }{1-2\tau } \exp \left( -\frac{\left|\hat{\mu }_n-\mu _0\right|}{2} (1-\tau ) \right) \right] \mathrm {d}{\varvec{x}} \\&\quad \le \underbrace{P(A)}_{>0} \underbrace{\left[ \frac{(1-\tau )\exp (-\epsilon \tau /2) - \tau \exp (-\epsilon (1-\tau )/2)}{1-2\tau } \right] }_{<1 \,\, \text {(max }=1\text { for }\epsilon =0\text {) and strictly }\downarrow \text { for } \epsilon \in (0,\infty )} + \underbrace{P(A^c)}_{<1} \, < 1 \end{aligned}$$

So we have a contradiction, since the integral converges in probability to 1. Thus \(\left|\hat{\mu }_n-\mu _0\right| \rightarrow 0\) a.s. on \(\Omega \). Applying Scheffé's theorem, we get \(\int \left|\hat{\mu }_n-\mu _0\right| \mathrm {d}{\varvec{x}} \rightarrow 0\) a.s. on \(\Omega \), and hence \(\int \left|\hat{\mu }_n-\mu _0\right| \mathrm {d}{\varvec{x}} \overset{p}{\rightarrow } 0\). \(\square \)

Below we prove Theorem 1, making use of Theorem 2 and Corollary 3.

1.3 Appendix B.3: Theorem 1 Proof

We proceed by showing that, with \(\mathcal {F}_n\) as in Eq. 3.9, the prior \(\pi _n\) of Theorem 1 satisfies conditions (i) and (ii) of Theorem 2.

The proof that condition (i) of Theorem 2 holds, presented in Lee [25, proof of Theorem 1, p. 639], carries over to the BQRNN case without change. Next, we show that condition (ii) holds for the BQRNN model. Let \(K_\delta \) be the KL-neighborhood of the true density \(f_0\) as in Eq. 3.8 and \(\mu _0\) the corresponding conditional quantile function. We first fix a neural network \(\mu ^*\) closely approximating \(\mu _0\). We then find a neighborhood \(M_\varsigma \) of \(\mu ^*\) as in Eq. A.7 and show that this neighborhood has sufficiently large prior probability. Suppose that \(\mu _0\) is continuous. For any \(\delta >0\), choose \(\epsilon = \delta /2\) in the theorem of Funahashi [13, Theorem 1, p. 184] and let \(\mu ^*\) be a neural network such that \(\underset{x\in \mathcal {X}}{\sup }\left|\mu ^*-\mu _0\right| < \epsilon \). Let \(\varsigma =\sqrt{\epsilon }/(5n^a)=\sqrt{\delta /50}\,n^{-a}\) in Lemma 5. Then the following derivation shows that for any \(\tilde{\mu }\in M_\varsigma \), \(D_K(f_0,\tilde{f})\le \delta \), i.e., \(M_\varsigma \subset K_\delta \).

$$\begin{aligned} D_K(f_0,\tilde{f})&= \iint f_0(x,y) \log \frac{f_0(x,y)}{\tilde{f}(x,y)} \, \mathrm {d}y \, \mathrm {d}x \\&= \iint \left[ (y-\tilde{\mu }) (\tau -I_{(y\le \tilde{\mu })}) - (y-\mu _0)(\tau -I_{(y\le \mu _0)}) \right] f_0(y|x) \, f_0(x)\, \mathrm {d}y \, \mathrm {d}x \\&\quad \text {let,} \,\, T = (y-\tilde{\mu }) (\tau -I_{(y\le \tilde{\mu })}) - (y-\mu _0)(\tau -I_{(y\le \mu _0)}) \\&= \int \left[ \int T f_0(y|x) \, \mathrm {d}y \right] f_0(x) \, \mathrm {d}x \end{aligned}$$

Now let’s break T into two cases: (a) \(\tilde{\mu } \ge \mu _0\), and (b)  \(\tilde{\mu } < \mu _0\).

Case-(a):

\(\tilde{\mu } \ge \mu _0\)

$$\begin{aligned} T = {\left\{ \begin{array}{ll} (\mu _0-\tilde{\mu })\tau , & \mu _0 \le \tilde{\mu }< y \\ (\mu _0-\tilde{\mu })\tau -(y-\tilde{\mu }), & \mu _0 < y \le \tilde{\mu } \\ (\mu _0-\tilde{\mu })(\tau -1), & y \le \mu _0 \le \tilde{\mu } \end{array}\right. } \end{aligned}$$
Case-(b):

\(\tilde{\mu } \le \mu _0\)

$$\begin{aligned} T = {\left\{ \begin{array}{ll} (\mu _0-\tilde{\mu })\tau , & \tilde{\mu } \le \mu _0< y \\ (\mu _0-\tilde{\mu })(\tau -1)+(y-\tilde{\mu }), & \tilde{\mu } < y \le \mu _0 \\ (\mu _0-\tilde{\mu })(\tau -1), & y \le \tilde{\mu } \le \mu _0 \end{array}\right. } \end{aligned}$$

So now,

$$\begin{aligned}&\int T f_0(y|x) \, \mathrm {d}y = \int \left[ I_{(\tilde{\mu } - \mu _0 \ge 0)} \times \left\{ (\tilde{\mu }-\mu _0)(1-\tau )I_{(y\le \mu _0)} - (y-\tilde{\mu })I_{(\mu _0< y \le \tilde{\mu })} \right. \right. \\&\qquad - \left. (\tilde{\mu }-\mu _0)\tau I_{(y> \mu _0)} \right\} \\&\qquad + I_{(\tilde{\mu } - \mu _0< 0)} \times \left\{ (\tilde{\mu }-\mu _0)(1-\tau )I_{(y\le \mu _0)} + (y-\tilde{\mu })I_{(\tilde{\mu }< y \le \mu _0)} \right. \\&\qquad -\left. \left. (\tilde{\mu }-\mu _0)\tau I_{(y> \mu _0)} \right\} \right] f_0(y|x) \, \mathrm {d}y \\&\quad = \int \left[ (\tilde{\mu }-\mu _0)(1-\tau )I_{(y\le \mu _0)} - (\tilde{\mu }-\mu _0)\tau I_{(y> \mu _0)} \right. \\&\qquad -\left. (y-\mu _0+\mu _0-\tilde{\mu })I_{(\mu _0< y \le \tilde{\mu })} + (y-\mu _0+\mu _0-\tilde{\mu })I_{(\tilde{\mu }< y \le \mu _0)} \right] f_0(y|x) \, \mathrm {d}y \\&\qquad \text {let,} \,\, z = y - \mu _0, b=\tilde{\mu }-\mu _0 \,\, \text {and note that} \\&\qquad P(y\le \mu _0|x)=\tau , \text { and } P(y>\mu _0|x)=1-\tau .\\&\quad = E \left[ -(z-b)I_{(0<z<b)} + (z-b)I_{(b<z<0)}|x \right] \\&\quad \le E \left[ bI_{(0<z<b)} - bI_{(b<z<0)}|x \right] \\&\quad = \left|b\right| \times \left[ P(0<z<b|x) + P(b<z<0|x) \right] \\&\quad = \left|b\right| \times P(0<\left|z\right|<\left|b\right||x) \\&\quad \le \left|b\right| \end{aligned}$$

Hence,

$$\begin{aligned} \iint T f_0(y|x) \, \mathrm {d}y \, \mathrm {d}x&\le \int \left|b\right| f_0(x) \, \mathrm {d}x \\&= \int \left|\tilde{\mu }-\mu _0\right| f_0(x) \, \mathrm {d}x \\&= \int \left|\tilde{\mu }-\mu ^*+\mu ^*-\mu _0\right| f_0(x) \, \mathrm {d}x \\&\le \int \left[ \underset{x\in \mathcal {X}}{\sup }\left|\tilde{\mu }-\mu ^*\right| + \underset{x\in \mathcal {X}}{\sup }\left|\mu ^*-\mu _0\right| \right] f_0(x) \, \mathrm {d}x \end{aligned}$$

Use Lemma 5 and Funahashi [13, Theorem 1 on p.184] to bound the first and second term, respectively.

$$\begin{aligned}&\le \int \left[ \epsilon +\epsilon \right] f_0(x) \, \mathrm {d}x \\&= 2\epsilon = \delta \end{aligned}$$

Finally, we prove that \(\forall \delta ,\nu >0, \exists N_\nu \) s.t. \(\pi _n(K_\delta ) \ge \exp (-n\nu ) \, \forall n\ge N_\nu \),

$$\begin{aligned} \pi _n(K_\delta )&\ge \pi _n(M_\varsigma ) \\&= \prod _{i=1}^{\tilde{d}_n} \int _{\theta _i-\varsigma }^{\theta _i+\varsigma } \frac{1}{\sqrt{2\pi \sigma _0^2}} \exp \left( -\frac{1}{2\sigma _0^2} u^2 \right) \mathrm {d}u \\&\ge \prod _{i=1}^{\tilde{d}_n} 2\varsigma \underset{u\in [\theta _i-1,\theta _i+1]}{\inf } \frac{1}{\sqrt{2\pi \sigma _0^2}} \exp \left( -\frac{1}{2\sigma _0^2} u^2 \right) \\&= \prod _{i=1}^{\tilde{d}_n} \varsigma \sqrt{\frac{2}{\pi \sigma _0^2}}\exp \left( -\frac{1}{2\sigma _0^2} \vartheta _i \right) \\&\quad \vartheta _i = \max ((\theta _i-1)^2,(\theta _i+1)^2) \\&\ge \left( \varsigma \sqrt{\frac{2}{\pi \sigma _0^2}} \right) ^{\tilde{d}_n} \exp \left( -\frac{1}{2\sigma _0^2} \vartheta \tilde{d}_n \right) \qquad \text {where, } \vartheta = \underset{i}{\max }(\vartheta _1,\ldots ,\vartheta _{\tilde{d}_n}) \\&= \exp \left( -\tilde{d}_n \left[ a\log n - \log \sqrt{\frac{\delta }{25\pi \sigma _0^2}} \right] -\frac{1}{2\sigma _0^2} \vartheta \tilde{d}_n \right) \\&\quad \varsigma = \sqrt{\frac{\delta }{50}} n^{-a} \\&\ge \exp \left( -\left[ 2a \log n + \frac{\vartheta }{2\sigma _0^2} \right] \tilde{d}_n \right) \qquad \text {for large } n \\&\ge \exp \left( -\left[ 2a \log n + \frac{\vartheta }{2\sigma _0^2} \right] (p+3)n^a \right) \\&\quad \tilde{d}_n = (p+2)\tilde{k}_n + 1 \le (p+3)n^a \\&\ge \exp (-n \nu ) \qquad \text {for any } \nu \text { and } \forall n \ge N_\nu \text { for some } N_\nu \end{aligned}$$

Hence, we have shown that both conditions of Theorem 2 hold. The result of Theorem 1 then follows from Corollary 3, which is derived from Theorem 2.

About this article

Cite this article

Jantre, S.R., Bhattacharya, S. & Maiti, T. Quantile Regression Neural Networks: A Bayesian Approach. J Stat Theory Pract 15, 68 (2021). https://doi.org/10.1007/s42519-021-00189-w
