Abstract
This article introduces a Bayesian neural network estimation method for quantile regression that assumes an asymmetric Laplace distribution (ALD) for the response variable. It is shown that the posterior distribution for feedforward neural network quantile regression is asymptotically consistent under a misspecified ALD model. The consistency proof embeds the problem in a density estimation framework and uses bounds on the bracketing entropy to derive posterior consistency over Hellinger neighborhoods. The result holds in the setting where the number of hidden nodes grows with the sample size. The Bayesian implementation utilizes the normal-exponential mixture representation of the ALD density, and the algorithm combines Gibbs sampling with the Metropolis–Hastings algorithm in a Markov chain Monte Carlo (MCMC) scheme. We address the practical complexity of this MCMC implementation in terms of chain convergence, the choice of starting values, and step sizes. We illustrate the proposed method with simulation studies and real data examples.
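As an aside for practitioners, the normal-exponential mixture representation of the ALD mentioned above can be sketched in a few lines of Python. The function name, default settings, and the quantile check are our own illustration of the standard representation (cf. Kozumi and Kobayashi 2011), not code from the paper:

```python
import numpy as np

def ald_mixture_draws(mu=0.0, sigma=1.0, tau=0.5, n=200_000, seed=0):
    """Draw from ALD(mu, sigma, tau) via its normal-exponential mixture:
    Y = mu + sigma * (theta * z + psi * sqrt(z) * u),
    with z ~ Exp(1) and u ~ N(0, 1) independent."""
    rng = np.random.default_rng(seed)
    theta = (1 - 2 * tau) / (tau * (1 - tau))
    psi = np.sqrt(2 / (tau * (1 - tau)))
    z = rng.exponential(1.0, n)
    u = rng.standard_normal(n)
    return mu + sigma * (theta * z + psi * np.sqrt(z) * u)

# The tau-th quantile of ALD(mu, sigma, tau) is mu itself, which is what
# makes the ALD likelihood a device for Bayesian quantile regression:
y = ald_mixture_draws(mu=1.5, tau=0.25)
print(round(float(np.mean(y <= 1.5)), 2))  # close to 0.25
```

This mixture is what allows the location parameters to be updated by Gibbs steps conditionally on the latent exponential variables.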
References
Alhamzawi R (2018) Brq: an R package for Bayesian quantile regression. https://cran.r-project.org/web/packages/Brq/Brq.pdf
Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B (Methodol) 36:99–102
Barndorff-Nielsen OE, Shephard N (2001) Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. J R Stat Soc Ser B 63:167–241
Barron A, Schervish MJ, Wasserman L (1999) The consistency of posterior distributions in nonparametric problems. Ann Stat 27:536–561
Benoit DF, Alhamzawi R, Yu K, den Poel DV (2017) R package ‘bayesQR’. https://cran.r-project.org/web/packages/bayesQR/bayesQR.pdf
Buntine WL, Weigend AS (1991) Bayesian back-propagation. Complex Syst 5:603–643
Cannon AJ (2011) R package ‘qrnn’. https://cran.r-project.org/web/packages/qrnn/qrnn.pdf
Cannon AJ (2018) Non-crossing nonlinear regression quantiles by monotone composite quantile regression neural network, with application to rainfall extremes. Stoch Environ Res Risk Assess 32:3207–3225
Chen C (2007) A finite smoothing algorithm for quantile regression. J Comput Graph Stat 16:136–164
Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2:303–314
Dantzig GB (1963) Linear programming and extensions. Princeton University Press, Princeton
de Freitas N, Andrieu C, Højen-Sørensen P, Niranjan M, Gee A (2001) Sequential Monte Carlo methods for neural networks. In: Doucet A, de Freitas N, Gordon N (eds) Sequential Monte Carlo methods in practice. Springer, New York, pp 359–379
Funahashi K (1989) On the approximate realization of continuous mappings by neural networks. Neural Netw 2:183–192
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7:457–472
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn. CRC Press, Boca Raton
Ghosh M, Ghosh A, Chen MH, Agresti A (2000) Noninformative priors for one-parameter item models. J Stat Plan Infer 88:99–115
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2:359–366
Karmarkar N (1984) A new polynomial time algorithm for linear programming. Combinatorica 4:373–395
Koenker R (2005) Quantile regression, 1st edn. Cambridge University Press, Cambridge
Koenker R (2017) R package ‘quantreg’. https://cran.r-project.org/web/packages/quantreg/quantreg.pdf
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Koenker R, Machado J (1999) Goodness of fit and related inference processes for quantile regression. J Am Stat Assoc 94:1296–1309
Kottas A, Gelfand AE (2001) Bayesian semiparametric median regression modeling. J Am Stat Assoc 96:1458–1468
Kozumi H, Kobayashi G (2011) Gibbs sampling methods for Bayesian quantile regression. J Stat Comput Simul 81:1565–1578
Lee HKH (2000) Consistency of posterior distributions for neural networks. Neural Netw 13:629–642
MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural Comput 4:448–472
Madsen K, Nielsen HB (1993) A finite smoothing algorithm for linear \(l_1\) estimation. SIAM J Optim 3:223–235
Neal RM (1996) Bayesian learning for neural networks. Springer, New York
Papamarkou T, Hinkle J, Young M, Womble D (2019) Challenges in Bayesian inference via Markov chain Monte Carlo for neural networks. arXiv:1910.06539
Pollard D (1991) Bracketing methods in statistics and econometrics. In: Barnett WA, Powell J, Tauchen GE (eds) Nonparametric and semiparametric methods in econometrics and statistics: proceedings of the fifth international symposium in econometric theory and econometrics. Cambridge University Press, Cambridge, pp 337–355
Sriram K, Ramamoorthi RV, Ghosh P (2013) Posterior consistency of Bayesian quantile regression based on the misspecified asymmetric Laplace density. Bayesian Anal 8:479–504
Taylor JW (2000) A quantile regression neural network approach to estimating the conditional density of multiperiod returns. J Forecast 19:299–311
Titterington DM (2004) Bayesian methods for neural networks and related models. Stat Sci 19:128–139
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York
Walker SG, Mallick BK (1999) A Bayesian semiparametric accelerated failure time model. Biometrics 55:477–483
Wasserman L (1998) Asymptotic properties of nonparametric Bayesian procedures. In: Dey D, Müller P, Sinha D (eds) Practical nonparametric and semiparametric Bayesian statistics. Springer, New York, pp 293–304
Wong WH, Shen X (1995) Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. Ann Stat 23:339–362
Xu Q, Deng K, Jiang C, Sun F, Huang X (2017) Composite quantile regression neural network with applications. Exp Syst Appl 76:129–139
Yeh I-C (1998) Modeling of strength of high performance concrete using artificial neural networks. Cem Concr Res 28:1797–1808
Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Prob Lett 54:437–447
Yu K, Zhang J (2005) A three-parameter asymmetric Laplace distribution and its extensions. Commun Stat Theory Methods 34:1867–1879
Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel P, Zellner A (eds) Bayesian inference and decision techniques: essays in honor of Bruno de Finetti. Elsevier Science Publishers Inc, New York, pp 233–243
This article is part of the topical collection “Celebrating the Centenary of Professor C. R. Rao” guest edited by Ravi Khattree, Sreenivasa Rao Jammalamadaka, and M. B. Rao.
Appendices
Appendix A: Lemmas for Posterior Consistency Proof
For all the proofs in Appendix A and Appendix B, we assume \({\varvec{X}}_{p \times 1}\) to be uniformly distributed on \([0,1]^p\) and treated as fixed; thus, \(f_0({\varvec{x}})=f({\varvec{x}})=1\). Conditional on \({\varvec{X}}\), the univariate response variable Y has an asymmetric Laplace distribution with location parameter determined by the neural network. We fix its scale parameter, \(\sigma \), to be 1 for the posterior consistency derivations. Thus,

$$\begin{aligned} f(y |{\varvec{x}}) = \tau (1-\tau ) \exp \left( -(y-\mu ({\varvec{x}}))\left[ \tau - I\left( y \le \mu ({\varvec{x}})\right) \right] \right) . \end{aligned}$$(A.1)
The number of input variables, p, is taken to be fixed, while the number of hidden nodes, k, will be allowed to grow with the sample size, n.
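As a quick numerical sanity check (our own sketch, not part of the paper), the \(\sigma = 1\) ALD density \(f(y |{\varvec{x}}) = \tau (1-\tau )\exp (-\rho _\tau (y-\mu ({\varvec{x}})))\), with check loss \(\rho _\tau (u)=u(\tau - I(u<0))\), integrates to one and places mass \(\tau \) below its location parameter:

```python
import numpy as np

def trapezoid(vals, grid):
    """Plain trapezoidal rule (avoids NumPy version differences)."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def ald_pdf(y, mu, tau):
    """ALD density with scale sigma = 1, as used in the consistency proofs."""
    return tau * (1 - tau) * np.exp(-rho(y - mu, tau))

tau, mu = 0.3, 0.7  # arbitrary illustrative values
y = np.linspace(mu - 60, mu + 60, 2_000_001)
total = trapezoid(ald_pdf(y, mu, tau), y)
mass_below_mu = trapezoid(ald_pdf(y[y <= mu], mu, tau), y[y <= mu])
print(round(total, 4), round(mass_below_mu, 4))  # ~1.0 and ~0.3
```

The second quantity is exactly why the location of the ALD plays the role of the \(\tau \)-th conditional quantile.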
All the lemmas described below are taken from Lee [25].
Lemma 1
Suppose \(H_{[]}(u) \le \log [(C_n^2d_n/u)^{d_n}]\), where \(d_n=(p+2)k_n+1\), \(k_n \le n^a\), and \(C_n\le \exp (n^{b-a})\) for \(0<a<b<1\). Then, for any fixed constants \(c,\epsilon >0\) and for all sufficiently large n, \(\int _0^\epsilon \sqrt{H_{[]}(u)} \, \mathrm {d}u \le c\sqrt{n}\epsilon ^2\).
Proof
The proof follows the proof of Lemma 1 in Lee [25, pp. 634–635], adapted to the BQRNN case. \(\square \)
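The entropy-integral condition in Lemma 1 can be checked numerically. The sketch below is our own illustration, with arbitrary choices \(p=3\), \(a=0.5\), \(b=0.75\), \(c=1\), \(\epsilon =0.5\), and a large n; the lemma only claims the bound for sufficiently large n:

```python
import numpy as np

def trapezoid(vals, grid):
    """Plain trapezoidal rule (avoids NumPy version differences)."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def entropy_integral(n, p=3, a=0.5, b=0.75, eps=0.5):
    """Evaluate int_0^eps sqrt(H_[](u)) du with
    H_[](u) = d_n * log(C_n^2 * d_n / u), d_n = (p+2)*k_n + 1,
    k_n = n^a, C_n = exp(n^(b-a)); log(C_n^2) is kept on the log scale."""
    k_n = n ** a
    d_n = (p + 2) * k_n + 1
    log_Cn2 = 2 * n ** (b - a)
    u = np.geomspace(1e-12, eps, 200_001)  # singularity at 0 is integrable
    integrand = np.sqrt(d_n * (log_Cn2 + np.log(d_n) - np.log(u)))
    return trapezoid(integrand, u)

n, eps = 10 ** 8, 0.5
lhs = entropy_integral(n, eps=eps)
rhs = np.sqrt(n) * eps ** 2  # c = 1
print(lhs < rhs)  # True for this (large) n
```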
For Lemmas 2, 3 and 4, we use the following notation. From Eq. 3.10, recall that

$$\begin{aligned} R_n(f) = \prod _{i=1}^n \frac{f(Y_i |{\varvec{X}}_i)}{f_0(Y_i |{\varvec{X}}_i)} \end{aligned}$$

is the ratio of the likelihoods under the neural network density f and the true density \(f_0\). \(\mathcal {F}_n\) is the sieve defined in Eq. 3.9 and \(A_\epsilon \) is the Hellinger neighborhood of the true density \(f_0\) as in Eq. 3.5.
Lemma 2
\(\underset{f\in A_\epsilon ^c \cap \mathcal {F}_n}{\sup } R_n(f) \le 4 \exp (-c_2n\epsilon ^2)\) a.s. for sufficiently large n.
Proof
Following the outline of the proof of Lemma 2 in Lee [25, p. 635], we first bound the Hellinger bracketing entropy using van der Vaart and Wellner [34, Theorem 2.7.11 on p. 164]. Next, we use Lemma 1 to show that the conditions of Wong and Shen [38, Theorem 1 on pp. 348–349] hold, and finally we apply that theorem to obtain the result stated in Lemma 2.

In the BQRNN case, only the first step needs to be re-derived, using the ALD density in Eq. A.1; the remaining steps follow from the proof given in Lee [25]. Since we seek the Hellinger bracketing entropy for neural networks, we use the \(L_2\) norm on the square roots of the density functions f. The \(L_\infty \) covering number was computed above in Eq. 3.11, so here \(d^*=L_\infty \). The version of van der Vaart and Wellner [34, Theorem 2.7.11] that we need is
We begin by defining some notation:
For notational convenience, we drop x and y from \(f_s(x,y)\), \(f_t(x,y)\), \(\mu _s(x)\), \(\mu _t(x)\), \(B_j(x)\), and \(A_j(x)\) and denote them as \(f_s\), \(f_t\), \(\mu _s\), \(\mu _t\), \(B_j\), and \(A_j\).
Now let us separate the above term into two cases: (a) \(\mu _s \le \mu _t\) and (b) \(\mu _s > \mu _t\). Consider case (a) first and break it into three subcases: (i) \(y \le \mu _s \le \mu _t\), (ii) \(\mu _s < y \le \mu _t\), and (iii) \(\mu _s \le \mu _t < y\).
Case (a)(i): \(y \le \mu _s \le \mu _t\). Eq. A.4 simplifies to

$$\begin{aligned}&\frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) - \exp \left( -\frac{1}{2}(y-\mu _s) (\tau -1) \right) \right| \nonumber \\&\quad = \frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _s) (\tau -1) \right) \right| \left|\exp \left( -\frac{1}{2}(\mu _s-\mu _t) (\tau -1) \right) -1\right| \nonumber \\&\qquad \text {As the first term in the modulus is } \le 1 \nonumber \\&\quad \le \frac{1}{2} \left|1 - \exp \left( -\frac{1}{2}(\mu _t-\mu _s) (1-\tau ) \right) \right| \nonumber \\&\qquad \text {Note: } 1-\exp (-z) \le z \,\, \forall z \in \mathbb {R}\implies \left|1-\exp (-z)\right| \le \left|z\right| \,\, \forall z \ge 0 \nonumber \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right|(1-\tau ) \nonumber \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right| \nonumber \\&\quad \le \frac{1}{2}\left|\mu _t-\mu _s\right| \end{aligned}$$(A.5)

Case (a)(ii): \(\mu _s < y \le \mu _t\). Eq. A.4 simplifies to

$$\begin{aligned}&\frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) - \exp \left( -\frac{1}{2}(y-\mu _s) \tau \right) \right|\\&\quad = \frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) -1 +1 - \exp \left( -\frac{1}{2}(y-\mu _s) \tau \right) \right| \\&\quad \le \frac{1}{2} \left|1-\exp \left( -\frac{1}{2}(y-\mu _t) (\tau -1) \right) \right| + \frac{1}{2} \left|1 - \exp \left( -\frac{1}{2}(y-\mu _s) \tau \right) \right| \\&\qquad \text {Using the inequality noted in Eq. A.5} \\&\quad \le \frac{1}{4}\left|(y-\mu _t)(\tau -1)\right| + \frac{1}{4}\left|(y-\mu _s)\tau \right| \\&\qquad \text {Both terms are nonnegative, so we combine them in one modulus} \\&\quad = \frac{1}{4} \left|(y-\mu _t)(\tau -1) + (y-\mu _t+\mu _t-\mu _s)\tau \right| \\&\quad = \frac{1}{4} \left|(y-\mu _t)(2\tau -1) + (\mu _t-\mu _s)\tau \right| \\&\quad \le \frac{1}{4} \left[ \left|(y-\mu _t)\right|\left|2\tau -1\right| + \left|\mu _t-\mu _s\right|\tau \right] \\&\qquad \text {Here, } \left|y-\mu _t\right| \le \left|\mu _t-\mu _s\right| \,\, \text {and} \,\, \left|2\tau -1\right| \le 1 \\&\quad \le \frac{1}{2}\left|\mu _t-\mu _s\right| \end{aligned}$$

Case (a)(iii): \(\mu _s \le \mu _t < y\). Eq. A.4 simplifies to

$$\begin{aligned}&\frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) \tau \right) - \exp \left( -\frac{1}{2}(y-\mu _s)\tau \right) \right| \\&\quad = \frac{1}{2} \left|\exp \left( -\frac{1}{2}(y-\mu _t) \tau \right) \right| \left|1-\exp \left( -\frac{1}{2}(\mu _t-\mu _s) \tau \right) \right|\\&\qquad \text {As the first term in the modulus is } \le 1 \\&\quad \le \frac{1}{2} \left|1 - \exp \left( -\frac{1}{2} (\mu _t-\mu _s) \tau \right) \right| \\&\qquad \text {Using the inequality noted in Eq. A.5} \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right|\tau \\&\quad \le \frac{1}{4}\left|\mu _t-\mu _s\right| \\&\quad \le \frac{1}{2}\left|\mu _t-\mu _s\right| \end{aligned}$$
We can similarly bound Eq. A.4 in case-(b) where \(\mu _s > \mu _t\) by \(\left|\mu _t-\mu _s\right|/2\). Now,
Hence, we can bound Eq. A.6 as follows
The rest of the steps follow from the proof of Lemma 2 given in Lee [25, pp. 635–636]. \(\square \)
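The net effect of the case analysis is the Lipschitz-type bound \(\frac{1}{2}\left|e^{-\rho _\tau (y-\mu _t)/2} - e^{-\rho _\tau (y-\mu _s)/2}\right| \le \frac{1}{2}\left|\mu _t-\mu _s\right|\), uniformly in y and \(\tau \). A Monte Carlo spot-check over random configurations (our own sketch; all names and sampling distributions are illustrative):

```python
import numpy as np

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def sqrt_kernel(y, mu, tau):
    """exp(-rho_tau(y - mu)/2): the ALD density's square root, up to its constant."""
    return np.exp(-0.5 * rho(y - mu, tau))

rng = np.random.default_rng(1)
m = 100_000
y = rng.normal(0.0, 5.0, m)
mu_s = rng.normal(0.0, 3.0, m)
mu_t = rng.normal(0.0, 3.0, m)
tau = rng.uniform(0.01, 0.99, m)

lhs = 0.5 * np.abs(sqrt_kernel(y, mu_t, tau) - sqrt_kernel(y, mu_s, tau))
rhs = 0.5 * np.abs(mu_t - mu_s)
print(bool(np.all(lhs <= rhs + 1e-12)))  # True: the bound holds in every sampled case
```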
Lemma 3
If there exists a constant \(r>0\) and N, such that \(\mathcal {F}_n\) satisfies \(\pi _n(\mathcal {F}_n^c)<\exp (-nr), \forall n\ge N\), then there exists a constant \(c_2\) such that \(\int _{A_\epsilon ^c} R_n(f) \mathrm {d}\pi _n(f) < \exp (-nr/2) + \exp (-nc_2\epsilon ^2)\) except on a set of probability tending to zero.
Proof
The proof is the same as that of Lemma 3 in Lee [25, p. 636], adapted to the BQRNN scenario. \(\square \)
Lemma 4
Let \(K_\delta \) be the KL-neighborhood as in Eq. 3.8. Suppose that for all \(\delta ,\nu > 0\) there exists N such that \(\pi _n(K_\delta ) \ge \exp (-n\nu )\) for all \(n \ge N\). Then, for all \(\varsigma > 0\) and sufficiently large n, \(\int R_n(f) \mathrm {d}\pi _n(f) > e^{-n\varsigma }\) except on a set of probability going to zero.
Proof
The proof is the same as that of Lemma 5 in Lee [25, p. 637], adapted to the BQRNN scenario. \(\square \)
Lemma 5
Suppose that \(\mu \) is a neural network regression with parameters \((\theta _1,\ldots ,\theta _d)\), and let \(\tilde{\mu }\) be another neural network with parameters \((\tilde{\theta }_1,\ldots ,\tilde{\theta }_{\tilde{d}_n})\). Define \(\theta _i=0\) for \(i >d\) and \(\tilde{\theta }_j=0\) for \(j>\tilde{d}_n\). Suppose that the number of nodes of \(\mu \) is k, and that the number of nodes of \(\tilde{\mu }\) is \(\tilde{k}_n=O(n^a)\) for some a with \(0<a<1\). Let
Then, for any \(\tilde{\mu } \in M_\varsigma \) and for sufficiently large n,
Proof
The proof is the same as that of Lemma 6 in Lee [25, pp. 638–639]. \(\square \)
Appendix B: Posterior Consistency Theorem Proofs
Appendix B.1: Theorem 2 Proof
For the proofs of Theorem 2 and Corollary 3, we use the following notation. From Eq. 3.10, recall that

$$\begin{aligned} R_n(f) = \prod _{i=1}^n \frac{f(Y_i |{\varvec{X}}_i)}{f_0(Y_i |{\varvec{X}}_i)} \end{aligned}$$

is the ratio of the likelihoods under the neural network density f and the true density \(f_0\). \(\mathcal {F}_n\) is the sieve defined in Eq. 3.9, and \(A_\epsilon \) is the Hellinger neighborhood of the true density \(f_0\) as in Eq. 3.5.
By Lemma 3, there exists a constant \(c_2\) such that \(\int _{A_\epsilon ^c} R_n(f) \mathrm {d}\pi _n(f) < \exp (-nr/2) + \exp (-nc_2\epsilon ^2)\) for sufficiently large n. Further, from Lemma 4, \(\int R_n(f) \mathrm {d}\pi _n(f) \ge \exp (-n\varsigma )\) for sufficiently large n.
Now pick \(\varsigma \) such that, for some \(\varphi >0\), both \(\frac{r}{2}-\varsigma > \varphi \) and \(c_2\epsilon ^2 - \varsigma > \varphi \). Thus,
Hence, \(P(A_\epsilon ^c |({\varvec{X_1}},Y_1), \ldots , ({\varvec{X_n}},Y_n)) \overset{p}{\rightarrow } 0\). \(\square \)
Appendix B.2: Corollary 3 Proof
Theorem 2 implies that \(D_H(f_0,f) \overset{p}{\rightarrow } 0\), where \(D_H(f_0,f)\) is the Hellinger distance between \(f_0\) and f as in Eq. 3.4 and f is a random draw from the posterior. Recall from Eq. 3.6 that the predictive density function
gives rise to the predictive conditional quantile function, \(\hat{\mu }_n({\varvec{x}}) = Q_{\tau ,\hat{f}_n}(y |{\varvec{X}}={\varvec{x}})\). We next show that \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\), which in turn implies \(\hat{\mu }_n({\varvec{x}})\) converges in \(L_1\)-norm to the true conditional quantile function,
First we show that \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\). Let \(X^n = (({\varvec{X_1}},Y_1),\ldots ,({\varvec{X_n}},Y_n))\). For any \(\epsilon >0\):
The second term goes to zero in probability by Theorem 2, and since \(\epsilon \) is arbitrary, \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\).
In the remaining part of the proof, for notational simplicity, we suppress the argument and write \(\hat{\mu }_n\) and \(\mu _0\) for \(\hat{\mu }_n({\varvec{x}})\) and \(\mu _0({\varvec{x}})\), respectively. The Hellinger distance between \(f_0\) and \(\hat{f}_n\) is
Now let us break T into two cases: (a) \(\hat{\mu }_n \le \mu _0\) and (b) \(\hat{\mu }_n > \mu _0\).

Case (a): \(\hat{\mu }_n \le \mu _0\).

$$\begin{aligned} T = {\left\{ \begin{array}{ll} - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau , & \hat{\mu }_n \le \mu _0< y \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau + \frac{(y-\mu _0)}{2}, & \hat{\mu }_n \le \frac{\hat{\mu }_n+\mu _0}{2}< y \le \mu _0 \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) - \frac{(y-\hat{\mu }_n)}{2}, & \hat{\mu }_n < y \le \frac{\hat{\mu }_n+\mu _0}{2} \le \mu _0 \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1), & y \le \hat{\mu }_n \le \mu _0 \end{array}\right. } \end{aligned}$$

Case (b): \(\hat{\mu }_n > \mu _0\).

$$\begin{aligned} T = {\left\{ \begin{array}{ll} - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau , & \mu _0 \le \hat{\mu }_n< y \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) \tau + \frac{(y-\hat{\mu }_n)}{2}, & \mu _0 \le \frac{\hat{\mu }_n+\mu _0}{2}< y \le \hat{\mu }_n \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1) - \frac{(y-\mu _0)}{2}, & \mu _0 < y \le \frac{\hat{\mu }_n+\mu _0}{2} \le \hat{\mu }_n \\ - \left( y - \frac{\hat{\mu }_n+\mu _0}{2} \right) (\tau -1), & y \le \mu _0 \le \hat{\mu }_n \end{array}\right. } \end{aligned}$$
Hence now,
Substituting the above expression in Eq. B.1 we get,
Since \(D_H(f_0,\hat{f}_n) \overset{p}{\rightarrow } 0\),
Our next step is to show that the above expression implies \(\left|\hat{\mu }_n-\mu _0\right| \rightarrow 0\) a.s. on a set \(\Omega \) with probability tending to 1, and hence \(\int \left|\hat{\mu }_n-\mu _0\right| \mathrm {d}{\varvec{x}} \overset{p}{\rightarrow } 0\).
We prove this by contradiction. Suppose that \(\left|\hat{\mu }_n-\mu _0\right| \nrightarrow 0\) a.s. on \(\Omega \). Then there exist an \(\epsilon > 0\) and a subsequence \(\hat{\mu }_{n_i}\) such that \(\left|\hat{\mu }_{n_i}-\mu _0\right| > \epsilon \) on a set A with \(P(A)>0\). Now decompose the integral as
This is a contradiction, since the integral converges in probability to 1. Thus \(\left|\hat{\mu }_n-\mu _0\right| \rightarrow 0\) a.s. on \(\Omega \). Applying Scheffé's theorem, we get \(\int \left|\hat{\mu }_n-\mu _0\right| \mathrm {d}{\varvec{x}} \rightarrow 0\) a.s. on \(\Omega \), and hence \(\int \left|\hat{\mu }_n-\mu _0\right| \mathrm {d}{\varvec{x}} \overset{p}{\rightarrow } 0\). \(\square \)
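The corollary ties Hellinger convergence of the fitted density to \(L_1\) convergence of the conditional quantile function. As an informal numerical illustration (ours, not from the paper), the Hellinger distance between two \(\sigma = 1\) ALD densities shrinks as their location gap does:

```python
import numpy as np

def trapezoid(vals, grid):
    """Plain trapezoidal rule (avoids NumPy version differences)."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def rho(u, tau):
    return u * (tau - (u < 0))

def ald_pdf(y, mu, tau):
    return tau * (1 - tau) * np.exp(-rho(y - mu, tau))

def hellinger(mu0, mu1, tau):
    """Hellinger distance between two sigma = 1 ALD densities."""
    grid = np.linspace(-80.0, 80.0, 2_000_001)
    diff_sq = (np.sqrt(ald_pdf(grid, mu0, tau))
               - np.sqrt(ald_pdf(grid, mu1, tau))) ** 2
    return np.sqrt(0.5 * trapezoid(diff_sq, grid))

tau = 0.3  # arbitrary illustrative quantile level
for gap in (1.0, 0.1, 0.01):
    print(gap, round(hellinger(0.0, gap, tau), 5))  # the distance shrinks with the gap
```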
Below we prove Theorem 1, making use of Theorem 2 and Corollary 3.
Appendix B.3: Theorem 1 Proof
We proceed by showing that, with \(\mathcal {F}_n\) as in Eq. 3.9, the prior \(\pi _n\) of Theorem 1 satisfies conditions (i) and (ii) of Theorem 2.
The proof of condition (i) of Theorem 2 presented in Lee [25, proof of Theorem 1 on p. 639] holds in the BQRNN case without any change. Next, we show that condition (ii) holds for the BQRNN model. Let \(K_\delta \) be the KL-neighborhood of the true density \(f_0\) as in Eq. 3.8 and let \(\mu _0\) be the corresponding conditional quantile function. We first fix a neural network \(\mu ^*\) that closely approximates \(\mu _0\). We then find a neighborhood \(M_\varsigma \) of \(\mu ^*\) as in Eq. A.7 and show that this neighborhood has sufficiently large prior probability. Suppose that \(\mu _0\) is continuous. For any \(\delta >0\), choose \(\epsilon = \delta /2\) in the theorem of Funahashi [13, Theorem 1 on p. 184] and let \(\mu ^*\) be a neural network such that \(\underset{x\in \mathcal {X}}{\sup }\left|\mu ^*-\mu _0\right| < \epsilon \). Let \(\varsigma =(\sqrt{\epsilon }/5n^a)=\sqrt{(\delta /50)}n^{-a}\) in Lemma 5. Then the following derivation shows that for any \(\tilde{\mu }\in M_\varsigma \), \(D_K(f_0,\tilde{f})\le \delta \), i.e., \(M_\varsigma \subset K_\delta \).
Now let us break T into two cases: (a) \(\tilde{\mu } \ge \mu _0\) and (b) \(\tilde{\mu } < \mu _0\).

Case (a): \(\tilde{\mu } \ge \mu _0\).

$$\begin{aligned} T = {\left\{ \begin{array}{ll} (\mu _0-\tilde{\mu })\tau , & \mu _0 \le \tilde{\mu }< y \\ (\mu _0-\tilde{\mu })\tau -(y-\tilde{\mu }), & \mu _0 < y \le \tilde{\mu } \\ (\mu _0-\tilde{\mu })(\tau -1), & y \le \mu _0 \le \tilde{\mu } \end{array}\right. } \end{aligned}$$

Case (b): \(\tilde{\mu } < \mu _0\).

$$\begin{aligned} T = {\left\{ \begin{array}{ll} (\mu _0-\tilde{\mu })\tau , & \tilde{\mu } \le \mu _0< y \\ (\mu _0-\tilde{\mu })(\tau -1)+(y-\tilde{\mu }), & \tilde{\mu } < y \le \mu _0 \\ (\mu _0-\tilde{\mu })(\tau -1), & y \le \tilde{\mu } \le \mu _0 \end{array}\right. } \end{aligned}$$
So now,
Hence,
Use Lemma 5 and Funahashi [13, Theorem 1 on p. 184] to bound the first and second terms, respectively.
Finally, we prove that for all \(\delta ,\nu >0\) there exists \(N_\nu \) such that \(\pi _n(K_\delta ) \ge \exp (-n\nu )\) for all \(n\ge N_\nu \):
Hence, both conditions of Theorem 2 hold. The result of Theorem 1 then follows from Corollary 3, which was derived from Theorem 2. \(\square \)
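The KL bound driving the \(M_\varsigma \subset K_\delta \) argument rests on the fact, visible in the case table for T, that \(\rho _\tau \) is Lipschitz in its location argument with constant \(\max (\tau ,1-\tau ) \le 1\), so \(D_K(f_0,\tilde{f})\) is at most the location gap. A numerical spot-check (our own sketch, with arbitrary \(\tau \) and gaps):

```python
import numpy as np

def trapezoid(vals, grid):
    """Plain trapezoidal rule (avoids NumPy version differences)."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def rho(u, tau):
    return u * (tau - (u < 0))

def ald_pdf(y, mu, tau):
    return tau * (1 - tau) * np.exp(-rho(y - mu, tau))

def kl_ald(mu0, mu1, tau):
    """D_K(f_0, f-tilde) for two sigma = 1 ALD densities; the log ratio
    reduces to rho_tau(y - mu1) - rho_tau(y - mu0), i.e. the quantity T."""
    grid = np.linspace(-80.0, 80.0, 2_000_001)
    f0 = ald_pdf(grid, mu0, tau)
    return trapezoid(f0 * (rho(grid - mu1, tau) - rho(grid - mu0, tau)), grid)

tau = 0.3
for gap in (0.5, 0.05, 0.005):
    print(gap, kl_ald(0.0, gap, tau) <= gap)  # True: D_K is at most the location gap
```

This is consistent with condition (ii): making the sup-norm gap between \(\tilde{\mu }\) and \(\mu _0\) small forces \(\tilde{f}\) into any prescribed KL-neighborhood of \(f_0\).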
Cite this article
Jantre, S.R., Bhattacharya, S. & Maiti, T. Quantile Regression Neural Networks: A Bayesian Approach. J Stat Theory Pract 15, 68 (2021). https://doi.org/10.1007/s42519-021-00189-w