1 Introduction

Regression models for responses that follow a non-normal distribution have been drawing significant attention in the literature. For example, quantile regression and expectile regression have been widely developed and are increasingly applied to a greater variety of scientific questions. See [1,2,3,4,5], among others.

Typically, quantile regression estimates various conditional quantiles of a response or dependent random variable, including the median (0.5th quantile). Putting different quantile regressions together provides a more complete description of the underlying conditional distribution of the response than a simple mean regression. This is particularly useful when the conditional distribution is asymmetric, heterogeneous, fat-tailed or truncated. Quantile regression has been widely used in statistics and numerous application areas. Bayesian quantile regression for continuous responses has received increasing attention from both theoretical and empirical viewpoints. See a recent review by [6] for the first type of Bayesian quantile regression methods ([7,8,9], among others) based on the asymmetric Laplace distribution (ALD) likelihood function. However, among the numerous application areas of regression models, discrete observations such as integer values (e.g., -2, -1, 0, 1, 2, 3, etc.) on a response are easily collected. In particular, many big data sets nowadays contain discrete observations such as the number of online transactions, the number of days in hospital, the number of votes and so on. Classic regression models for discrete responses include logistic, Poisson and negative binomial regression. Because discrete responses are generally skewed, a mean-based regression analysis would not be sufficient for a complete analysis ([10]).

However, quantile regression for discrete responses has received far less attention than for continuous responses in the literature. Binary quantile regression was first introduced by [11]. Then, several authors ([12,13,14], among others) developed different smoothed estimation techniques (nonparametric or semiparametric methods) for the binary quantile regression model under frequentist approaches. Based on the idea of linking the ALD to latent variables in Bayesian Tobit quantile regression ([15]), the papers by [16, 17], among others, proposed Bayesian inference for binary quantile regression. Based on a normal-exponential mixture representation of the ALD, [18] and [19] extended Bayesian binary quantile regression to Bayesian ordinal quantile regression. Then, [20] further extended it to Bayesian inference of single-index quantile regression for ordinal data. [21, 22] and [23] applied these discrete quantile regression methods in economics, energy and education, respectively.

However, none of these methods provides a direct Bayesian quantile regression approach for discrete responses, nor do they deal with quantile regression for general discrete responses. Similarly, there is little research on expectile regression for discrete responses, let alone from a Bayesian perspective ([24]). A semi-parametric jittering approach for general count quantile regression has been introduced ([25]), but some degree of smoothness has to be artificially imposed on the approach. Quantile regression for count data may be achieved via density regression as shown in [26], but this approach may result in a global estimation of regression coefficients. In this paper, we propose Bayesian inference for quantile regression with discrete responses by introducing a discrete version of the ALD-based likelihood function. This approach not only keeps the ‘local property’ of quantile regression, but also enjoys the coherency and finite posterior moments of the posterior distribution. Along this line, we then introduce Bayesian expectile regression for discrete responses, which proceeds by forming the likelihood function based on a discrete asymmetric normal distribution (DAND). Section 2 introduces a discrete asymmetric Laplace distribution (DALD) and discusses its natural link with quantile regression for discrete responses. Sections 3 and 4 detail this Bayesian approach for quantile regression and expectile regression with discrete responses, respectively. Section 5 illustrates the numerical performance of the proposed method. Section 6 concludes with a brief discussion.

2 Discrete Asymmetric Laplace Distribution

Let Y be a real-valued random variable with \(\tau \)-th (\(0<\tau <1\)) quantile \(\mu \) (\(-\infty<\mu <\infty \)). It is well known that \(\mu \) can be found by minimizing the expected loss of Y with respect to the loss function (or check function) \(\rho _\tau (y)= y(\tau -I(y<0))\), that is, \(\min _\mu E_{F_0(Y)}\rho _\tau (Y-\mu )\), where \(F_0(Y)\) denotes the distribution function of Y, which is usually unknown in practice.

When Y is a continuous random variable, inference based on the loss function \(\rho _\tau (y-\mu )\) can be linked to maximum likelihood inference based on an \(\mathrm {ALD}(\mu , \tau )\) with location parameter \(\mu \) and shape parameter \(\tau \):

$$\begin{aligned} f(y;\mu ,\tau ) =\tau (1-\tau ) \, \exp \left\{ -\rho _\tau \left( y - \mu \right) \right\} . \end{aligned}$$
(1)

Now, if Y is a discrete random variable, let Y take integer values in \({\mathbb {Z}}\). We first derive a discrete version of ALD or a DALD and then show that the \(\tau \)th quantile \(\mu \) can also be estimated via this DALD.

To this end, note that the corresponding cumulative distribution function (c.d.f.) of an ALD in Eq.(1) can be written as:

$$\begin{aligned} F(y; \mu , \tau ) = {\left\{ \begin{array}{ll} 1 - (1-\tau )\exp \left\{ -\tau (y-\mu )\right\} , &{} \quad y \ge \mu ,\\ \tau \exp \left\{ (1-\tau )(y-\mu )\right\} , &{} \quad y < \mu .\\ \end{array}\right. } \end{aligned}$$
(2)

Let \(S(y;\mu ,\tau )\) be the survival function of this ALD, which is given by:

$$\begin{aligned} \begin{aligned} S(y;\mu ,\tau )&= 1-F(y;\mu ,\tau ) \\&= {\left\{ \begin{array}{ll} (1-\tau )\exp \left\{ -\tau (y-\mu )\right\} , &{} \quad y \ge \mu ,\\ 1- \tau \exp \left\{ (1-\tau )(y-\mu )\right\} , &{} \quad y < \mu ,\\ \end{array}\right. } \end{aligned} \end{aligned}$$
(3)

then, according to [27], the probability mass function (p.m.f.) of a DALD can be defined as:

$$\begin{aligned} \phi (y;\mu , \tau ) = \left\{ \begin{aligned}&S(y;\mu , \tau ) - S(y+1;\mu , \tau ), ~~ y \in {\mathbb {Z}}, \\&0, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ otherwise, \end{aligned} \right. \end{aligned}$$
(4)

with \(S(y;\mu ,\tau )\) in Eq.(3). Note that when \(y \ge \mu \),

$$\begin{aligned}&S(y;\mu , \tau ) - S(y+1;\mu , \tau )\\&\quad =(1-\tau ) \biggl (\exp \left\{ -\tau (y-\mu )\right\} -\exp \left\{ -\tau (y+1-\mu )\right\} \biggr )\\&\quad = (1-\tau ) \biggl (\exp \left\{ -\tau (y-\mu )\right\} -\exp \left\{ -\tau (y-\mu )\right\} \exp \left\{ -\tau \right\} \biggr ) \\&\quad =(1-\tau )\exp \left\{ -\tau (y-\mu )\right\} \biggl (1-\exp \left\{ -\tau \right\} \biggr ) \\&\quad =-(1-\tau ) \biggl (\exp \left\{ -\tau \right\} -1 \biggr )\exp \left\{ -\tau (y-\mu )\right\} \\&\quad =\rho _{\tau }(-\text {sgn}(y-\mu )) \left[ \exp \{-\rho _{\tau }(\text {sgn}(y-\mu ))\} - 1\right] \exp \left\{ -\rho _\tau (y-\mu )\right\} , \end{aligned}$$

and a similar derivation applies in the case of \(y <\mu \). It follows that:

$$\begin{aligned} \begin{aligned}&\phi (y; \mu , \tau ) = \rho _{\tau }(-\text {sgn}(y-\mu )) \left[ \exp \{-\rho _{\tau }(\text {sgn}(y-\mu ))\} - 1\right] \\&\quad \exp \left\{ -\rho _\tau (y-\mu )\right\} \\&y = \cdots ,-1, 0, 1, \cdots , \end{aligned} \end{aligned}$$
(5)

and the loss function (or check function) is

$$\begin{aligned} \rho _\tau \left( u\right) = \frac{|u| + (2 \tau -1 )u}{2}. \end{aligned}$$

Remark 1

One could also incorporate scale parameter \(\sigma \) in Eq.(4) to obtain

$$\begin{aligned} \begin{aligned}&\phi (y; \mu , \tau ) = \rho _{\tau }(-\text {sgn}(y-\mu )) \left[ \exp \left\{ -\rho _{\tau }\left( \text {sgn}\left( \frac{y - \mu }{\sigma }\right) \right) \right\} - 1\right] \\&\quad \exp \left\{ -\rho _\tau \left( \frac{y - \mu }{\sigma }\right) \right\} , \\&y = \cdots ,-1, 0, 1, \cdots . \end{aligned} \end{aligned}$$

According to [6], any fixed \(\sigma \) can be utilised to obtain asymptotically valid posterior inference and make the results asymptotically invariant. Here, we simply fix \(\sigma \) at 1.

Given a sample \(\varvec{Y}= (Y_1, Y_2, \cdots , Y_n)\) of the discrete response Y whose distribution \(F_0(y)\) may be unknown, consider the DALD-based likelihood function for \(\mu \):

$$\begin{aligned}&L(\varvec{Y}|\mu ) = \prod _{i=1}^{n}\nonumber \\&\quad \left[ \rho _{\tau }(-\text {sgn}(Y_i-\mu )) \left[ \exp \left\{ -\rho _{\tau }(\text {sgn}(Y_i-\mu ))\right\} - 1\right] \exp \left\{ -\rho _\tau (Y_i-\mu )\right\} \right] . \end{aligned}$$
(6)

Then, we have

$$\begin{aligned} \begin{aligned}&\mathop {\hbox {argmax}}\limits _{\mu } L(\varvec{Y}|\mu ) \\&\quad = \mathop {\hbox {argmax}}\limits _{\mu } \log L(\varvec{Y}|\mu ) \\&\quad = \mathop {\hbox {argmax}}\limits _{\mu } \left\{ - \sum _{i=1}^n \rho _\tau (Y_i-\mu ) \right\} \\&\quad = \mathop {\hbox {argmin}}\limits _\mu \sum \limits _{i=1}^n \rho _\tau \left( Y_i - \mu \right) . \end{aligned} \end{aligned}$$

This means that the estimation of the \(\tau \)-th quantile \(\mu \) of a discrete random variable Y with respect to the loss function \(\rho _\tau (\cdot )\) is equivalent to the maximization of the likelihood function Eq.(6) based on the DALD. According to [28], a Bayesian inference of \(\mu \) can be developed. That is, if \(\pi (\mu )\) represents prior beliefs about the \(\tau \)-th quantile \(\mu \), and \(\varvec{Y}\) is observed data from the unknown distribution \(F_0(Y)\) of the discrete random variable Y, then a posterior \(\pi (\mu |\varvec{Y})\), which is a valid and coherent update of \(\pi (\mu )\), can be obtained via the DALD-based likelihood function Eq.(6) and is given by:
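This equivalence can be checked numerically: over a grid of candidate values, the maximizer of the \(\exp \{-\rho _\tau (\cdot )\}\) kernel of the DALD log-likelihood coincides with the minimizer of the empirical check loss, which in turn is a sample \(\tau \)-th quantile. A minimal sketch, where the sample and the candidate grid are illustrative assumptions:

```python
import random

random.seed(42)
tau = 0.25
data = [random.randint(0, 10) for _ in range(500)]   # illustrative discrete sample

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - I(u < 0))
    return u * (tau - (1.0 if u < 0 else 0.0))

def neg_logkernel(m, tau, ys):
    # minus the log of the exp{-rho_tau(y - mu)} kernel of the likelihood, Eq. (6)
    return sum(check_loss(y - m, tau) for y in ys)

cands = range(min(data), max(data) + 1)
mu_loss = min(cands, key=lambda m: sum(check_loss(y - m, tau) for y in data))
mu_mle = min(cands, key=lambda m: neg_logkernel(m, tau, data))

n = len(data)
frac_lt = sum(y < mu_loss for y in data) / n   # empirical c.d.f. just below mu_loss
frac_le = sum(y <= mu_loss for y in data) / n  # empirical c.d.f. at mu_loss
```

The minimizer satisfies the usual sample-quantile characterization \(\hat{F}_n(\mu ^-) \le \tau \le \hat{F}_n(\mu )\).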

$$\begin{aligned} \pi (\mu |\varvec{Y}) \propto \pi (\mu )\, L(\varvec{Y}|\mu ). \end{aligned}$$
(7)

Coherency here means the following: if \(\nu \) denotes a probability measure on the space of \(\mu \), then \(\nu \) is said to be coherent if

$$\begin{aligned} \iint \rho _{\tau }( Y-\mu )dF_0(Y)\nu (d\mu ) \le \iint \rho _{\tau }( Y-\mu )dF_0(Y)\nu _1(d\mu ), \end{aligned}$$

for any other probability measure \(\nu _1\) on the space of \(\mu \), in terms of the expected loss of Y given by \(E_{F_0(Y)}\rho _\tau (Y-\mu )\). This coherency property aims to ensure the consistency of the posterior from the proposed inference even if the ‘working likelihood’ in Eqs.(3)–(A1) is misspecified.

3 Bayesian Quantile Regression with Discrete Responses

Generalized linear models (GLMs) extend linear modelling to scenarios that involve non-normal distributions \( f(y; \mu ) \) or heteroscedasticity, with \( f(y; \mu ) \) specified by the value of \(\mu =E[Y|\varvec{X}=\varvec{x}]\) conditional on \(\varvec{x}\) through a known link function g, \(g(\mu )=\varvec{x}^T \varvec{\beta }\). GLMs apply in particular to the so-called ‘exponential’ family of models, which includes Poisson regression with a log-link function.

When we are interested in the conditional quantile \(Q_Y(\tau | \varvec{x})\) of a discrete response, according to [7], we could still cast the problem in the framework of the generalized linear model, no matter what the original distribution of the data is, by assuming that (i) \( f(y; \mu ) \) follows a DALD in the form of Eqs.(5) or (A1) and (ii) \(g(\mu )=\varvec{x}^T \varvec{\beta }(\tau ) = Q_Y(\tau | \varvec{x})\) for any \(0< \tau <1\).

When covariate information such as a covariate vector \(\varvec{X}\) is available, quantile regression denoted by \(Q_Y(\tau | \varvec{X})\) for \(\mu \) is introduced. Without loss of generality, consider a linear regression model for \(Q_Y(\tau | \varvec{X})\): \(Q_Y(\tau | \varvec{X}) = \varvec{X}^T \varvec{\beta }\), where \(\varvec{\beta }\) is the regression parameter vector.

Given observations \(\varvec{Y}= (Y_1, Y_2, \cdots , Y_n)\) of the discrete response Y, one of the aims in regression analysis is the inference of \(\varvec{\beta }\). Let \(\pi (\varvec{\beta })\) be the prior distribution of \(\varvec{\beta }\), then the posterior distribution of \(\varvec{\beta }\), \(\pi (\varvec{\beta }|\varvec{Y})\) is given by

$$\begin{aligned} \pi (\varvec{\beta }|\varvec{Y}) \propto \pi (\varvec{\beta }) \,L(\varvec{Y}|\varvec{\beta }), \end{aligned}$$
(8)

where the likelihood function \(L(\varvec{Y}|\varvec{\beta })\) is given by:

$$\begin{aligned} L(\varvec{Y}|\varvec{\beta })= & {} \prod _{i=1}^{n}\left[ \rho _{\tau }(-\text {sgn}(Y_i-\varvec{X}_i^T \varvec{\beta })) \left[ \exp \left\{ -\rho _{\tau }(\text {sgn}(Y_i-\varvec{X}_i^T \varvec{\beta }))\right\} - 1\right] \right. \\&\left. \exp \left\{ -\rho _\tau (Y_i-\varvec{X}_i^T \varvec{\beta })\right\} \right] . \end{aligned}$$

The numerical computation of the posterior distribution can be carried out by the Metropolis-Hastings algorithm. That is, we first generate a candidate \(\varvec{\beta }^*\) according to a random walk, which results in a symmetric proposal distribution in the Metropolis algorithm. Then, we accept or reject \(\varvec{\beta }^*\) for \(\varvec{\beta }\) according to the acceptance probability \(p(\varvec{\beta }^*| \varvec{\beta }) = \min \left( 1, \frac{L\left( \varvec{y}|\varvec{\beta }^*\right) }{L\left( \varvec{y}|\varvec{\beta }\right) }\right) \), with the proposal scale tuned so that the acceptance rate is around 30%. We suggest discarding at least the first 30% of the iterations of an MCMC run as a ‘burn-in’ period to make sure the chain has reached stationarity.
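As a concrete illustration, the sketch below runs this random-walk Metropolis scheme for a median (\(\tau =0.5\)) regression with a single covariate, under a flat prior as in Eq.(8). The simulated data, proposal scale, chain length and burn-in are illustrative assumptions, and the DALD log-likelihood is evaluated through the survival-function differences of Eqs.(3)-(4):

```python
import math
import random

random.seed(7)
tau = 0.5

def dald_logpmf(y, mu, tau):
    # log P(Y = y) via survival-function differences, Eqs. (3)-(4)
    def S(t):
        if t >= mu:
            return (1 - tau) * math.exp(-tau * (t - mu))
        return 1 - tau * math.exp((1 - tau) * (t - mu))
    return math.log(S(y) - S(y + 1))

# illustrative data: counts whose conditional median is roughly 2 + 1.5 * x
n = 300
x = [random.uniform(0.0, 4.0) for _ in range(n)]
y = [int(2.0 + 1.5 * xi + random.expovariate(1.0)) for xi in x]

def loglik(beta):
    return sum(dald_logpmf(yi, beta[0] + beta[1] * xi, tau)
               for xi, yi in zip(x, y))

beta = [0.0, 0.0]          # flat prior: posterior proportional to the likelihood
cur = loglik(beta)
step = 0.1                 # tuned so the acceptance rate is roughly 25-30%
chain = []
for it in range(4000):
    prop = [b + random.gauss(0.0, step) for b in beta]
    new = loglik(prop)
    if math.log(random.random()) < new - cur:   # Metropolis accept/reject
        beta, cur = prop, new
    if it >= 2000:         # discard the burn-in period
        chain.append(list(beta))

b1 = sum(b[1] for b in chain) / len(chain)      # posterior mean of the slope
```

The retained draws approximate the posterior; the posterior mean of the slope lands near the value 1.5 used to generate the data.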

Besides the coherency property discussed in Sect. 2 for the posterior distribution \(\pi (\varvec{\beta }|\varvec{Y})\), it is important to verify the existence of the posterior distribution when the prior of \(\varvec{\beta }\) is improper, i.e.,

$$\begin{aligned} 0< E \left\{ \pi (\varvec{\beta }|\varvec{Y}) \right\} < \infty , \end{aligned}$$

or, equivalently,

$$\begin{aligned} 0< E \left\{ \pi (\varvec{\beta }) \,L(\varvec{Y}|\varvec{\beta }) \right\} < \infty . \end{aligned}$$

Moreover, it is preferable to check that the existence of posterior moments of the regression parameters is entirely unaffected by improper priors and the quantile index \(\tau \) ([29], among others), i.e.,

$$\begin{aligned} E\left[ \left( \prod _{j=0}^m |\beta _j|^{r_j}\right) \Big | \varvec{Y}\right] < \infty , \end{aligned}$$
(9)

where \(r_j\) denotes the order of the moments of \(\beta _j\).

To this end, we have the following conclusion:

Theorem 3.1

Assume the posterior is given by Eq.(8) with \(\pi (\varvec{\beta }) \propto 1\); then all posterior moments of \(\varvec{\beta }\) in Eq.(9) exist.

The proof of Theorem 3.1 is available in the Supplementary Materials.

4 Bayesian Expectile Regression for Discrete Responses

Instead of defining the \(\tau \)-th quantile of a response Y by \(\mathop {\hbox {argmin}}\limits _\mu E\left( \rho _\tau (Y-\mu )\right) \), [30] defined the \(\theta \)-th expectile of Y by

$$\begin{aligned} Expectile_\theta (Y) = \mathop {\hbox {argmin}}\limits _\mu E\left( \rho _\theta ^{(E)}(Y-\mu )\right) , \end{aligned}$$
(10)

in terms of an asymmetric quadratic loss function

$$\begin{aligned} \rho _\theta ^{(E)}(u)= u^2\left| \theta -I(u<0)\right| , \end{aligned}$$

where \(\theta \in (0,1)\) determines the degree of asymmetry of the loss function. Note that \(\theta \) is typically not equal to \(\tau \), although there is a one-to-one relationship between \(\tau \)-th quantile and \(\theta \)-th expectile ([31]).
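Numerically, the \(\theta \)-th expectile in Eq.(10) is the root of the first-order condition \(\sum _i w_i(m)(Y_i - m) = 0\) with weights \(w_i = \theta \) if \(Y_i \ge m\) and \(1-\theta \) otherwise; since the left-hand side is decreasing in m, it can be solved by bisection. A minimal sketch with illustrative Gaussian data (at \(\theta = 0.5\) the expectile reduces to the sample mean):

```python
import random

random.seed(3)
data = [random.gauss(0.0, 1.0) for _ in range(5000)]   # illustrative sample

def expectile(theta, ys, iters=80):
    # Bisection on the first-order condition of Eq. (10):
    # g(m) = sum_i w_i * (y_i - m), w_i = theta if y_i >= m else 1 - theta,
    # where g is strictly decreasing in m.
    lo, hi = min(ys), max(ys)
    for _ in range(iters):
        m = (lo + hi) / 2.0
        g = sum((theta if y >= m else 1.0 - theta) * (y - m) for y in ys)
        if g > 0:
            lo = m
        else:
            hi = m
    return (lo + hi) / 2.0

mean = sum(data) / len(data)
e50 = expectile(0.5, data)   # the 0.5-expectile equals the sample mean
e80 = expectile(0.8, data)   # upper expectiles sit above the mean
```

This also illustrates the asymmetry: as \(\theta \) increases above 0.5, the expectile moves above the mean.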

Corresponding to \(\rho _\theta ^{(E)}(u)\), we can define an asymmetric normal distribution (AND) whose density function is given by

$$\begin{aligned} f^{(E)}\left( y; \mu , \theta \right) = k {\left\{ \begin{array}{ll} \exp \left\{ -\theta \left( y-\mu \right) ^2\right\} , &{} \quad y \ge \mu ,\\ \exp \left\{ (\theta -1)\left( y-\mu \right) ^2\right\} , &{} \quad y < \mu , \\ \end{array}\right. } \end{aligned}$$
(11)

where \(k = \frac{2}{\sqrt{\pi }} \frac{\sqrt{\theta (1-\theta )}}{\sqrt{\theta }+\sqrt{1-\theta }}\), \(\mu \) and \(\theta \) are the location parameter and shape parameter, respectively.

The corresponding c.d.f. of the AND can be written as:

$$\begin{aligned} \begin{aligned}&F^{(E)}\left( y; \mu , \theta \right) \\&= {\left\{ \begin{array}{ll} k \sqrt{\frac{\pi }{\theta }} \Phi \left( \sqrt{2\theta }\left( y-\mu \right) \right) + \frac{k}{2}\left( \sqrt{\frac{\pi }{1-\theta }} - \sqrt{\frac{\pi }{\theta }} \right) , &{} \quad y > \mu ,\\ k \sqrt{ \frac{\pi }{1-\theta }} \Phi \left( \sqrt{2(1-\theta )}\left( y-\mu \right) \right) , &{} \quad y \le \mu , \\ \end{array}\right. }\\ \end{aligned} \end{aligned}$$
(12)

where \(\Phi (\cdot )\) denotes the c.d.f. of the standard normal distribution.

Therefore, based on the survival function \(S^{(E)}\left( y; \mu , \theta \right) \) \(= 1-F^{(E)}\left( y; \mu , \theta \right) \), we can derive the p.m.f. of the DAND by following the same procedure as in Eq.(4). In fact, note that

$$\begin{aligned} \begin{aligned}&\rho _{\theta }(\text {sgn}(y-\mu ))= {\left\{ \begin{array}{ll} \theta , &{} \quad y > \mu ,\\ 1-\theta , &{} \quad y \le \mu , \\ \end{array}\right. }\\ \end{aligned} \end{aligned}$$

then, for \(y \in {\mathbb {Z}}\),

$$\begin{aligned} \phi ^{(E)}\left( y; \mu , \theta \right)=\, & {} S^{(E)}\left( y; \mu , \theta \right) -S^{(E)}\left( y+1; \mu , \theta \right) \\=\, & {} F^{(E)}\left( y+1; \mu , \theta \right) -F^{(E)}\left( y; \mu , \theta \right) , \end{aligned}$$

we have

$$\begin{aligned} \begin{aligned} \phi ^{(E)}\left( y; \mu , \theta \right)&= k \sqrt{\frac{\pi }{\rho _{\theta }(\text {sgn}(y-\mu ))}} \bigg [\Phi \left( \sqrt{2 \rho _{\theta }(\text {sgn}(y-\mu ))}\left( y+1-\mu \right) \right) \\&\quad - \Phi \left( \sqrt{2 \rho _{\theta }(\text {sgn}(y-\mu ))}\left( y-\mu \right) \right) \bigg ], ~~ y=\cdots ,-1,0,1,\cdots . \end{aligned} \end{aligned}$$
(13)
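As with the DALD, the DAND p.m.f. can be evaluated numerically as successive differences of the AND c.d.f. in Eq.(12). The following sketch (with illustrative values \(\mu = 1\) and \(\theta = 0.7\); a check of the construction, not part of the method) verifies that the probabilities are positive and sum to one:

```python
import math

def norm_cdf(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def and_cdf(y, mu, theta):
    # c.d.f. of the asymmetric normal distribution, Eq. (12)
    k = (2.0 / math.sqrt(math.pi)) * math.sqrt(theta * (1.0 - theta)) \
        / (math.sqrt(theta) + math.sqrt(1.0 - theta))
    if y > mu:
        return (k * math.sqrt(math.pi / theta)
                * norm_cdf(math.sqrt(2.0 * theta) * (y - mu))
                + 0.5 * k * (math.sqrt(math.pi / (1.0 - theta))
                             - math.sqrt(math.pi / theta)))
    return (k * math.sqrt(math.pi / (1.0 - theta))
            * norm_cdf(math.sqrt(2.0 * (1.0 - theta)) * (y - mu)))

def dand_pmf(y, mu, theta):
    # P(Y = y) = S(y) - S(y + 1) = F(y + 1) - F(y), as in Eq. (13)
    return and_cdf(y + 1, mu, theta) - and_cdf(y, mu, theta)

mu, theta = 1.0, 0.7
probs = [dand_pmf(y, mu, theta) for y in range(-8, 8)]
total = sum(probs)   # F(8) - F(-8): ~1, since both Gaussian tails are negligible
```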

Now, if Y is a discrete random variable with unknown distribution function \(F_0(y)\), then given a sample \(\varvec{Y}= (Y_1, Y_2, \cdots , Y_n)\) of Y, the \(\theta \)-th expectile of Y is estimated by the minimization of the loss function \(\rho _\theta ^{(E)}\) or \(\mathop {\hbox {argmin}}\limits _{\mu } \sum _{i=1}^n \rho _\theta ^{(E)}\left( Y_i-\mu \right) \). Consider the DAND-based likelihood function:

$$\begin{aligned} \begin{aligned}&L^{(E)}\left( \varvec{Y}|\mu \right) = \prod _{i=1}^{n} \bigg [k \sqrt{\frac{\pi }{\rho _{\theta }(\text {sgn}(Y_i-\mu ))}} \Big [\Phi \left( \sqrt{2 \rho _{\theta }(\text {sgn}(Y_i-\mu ))}\left( Y_i+1-\mu \right) \right) \\&\quad - \Phi \left( \sqrt{2 \rho _{\theta }(\text {sgn}(Y_i-\mu ))}\left( Y_i-\mu \right) \right) \Big ]\bigg ]. \end{aligned} \end{aligned}$$
(14)

We can see that the expectile \(\mu \) can also be estimated equivalently by the maximization of the likelihood function \(L^{(E)}(\varvec{Y}|\mu )\) in Eq.(14). In fact,

$$\begin{aligned} \begin{aligned}&\mathop {\hbox {argmax}}\limits _{\mu } L^{(E)}(\varvec{Y}|\mu ) \\&\quad = \mathop {\hbox {argmax}}\limits _{\mu } \prod _{i=1}^n \left[ \Phi \left( \sqrt{2 \rho _{\theta }(\text {sgn}(Y_i-\mu ))}\left( Y_i+1-\mu \right) \right) \right. \\&\left. \quad - \Phi \left( \sqrt{2 \rho _{\theta }(\text {sgn}(Y_i-\mu ))}\left( Y_i-\mu \right) \right) \right] \\&\quad = \mathop {\hbox {argmax}}\limits _{\mu } \prod _{i=1}^n \int _{\sqrt{2\rho _{\theta }(\text {sgn}(Y_i-\mu ))}\left( Y_i-\mu \right) } ^{\sqrt{2\rho _{\theta }(\text {sgn}(Y_i-\mu ))}\left( Y_i+1-\mu \right) } \phi (u) du \end{aligned} \end{aligned}$$

(according to the mean value theorem for integrals, \(\int ^b_a \phi (u)\, du=\phi (\xi ) (b-a)\) for some \(\xi \) between a and b,)

$$\begin{aligned} \begin{aligned}&= \mathop {\hbox {argmax}}\limits _{\mu } \left[ \exp \left\{ - \rho _{\theta }(\text {sgn}(Y_i-\mu )) \sum _{i=1}^n \left( Y_i-\mu \right) ^2 \right\} \right] \\&\quad = \mathop {\hbox {argmax}}\limits _{\mu } \left[ - \rho _{\theta }(\text {sgn}(Y_i-\mu )) \sum _{i=1}^n \left( Y_i-\mu \right) ^2\right] \\&\quad = \mathop {\hbox {argmin}}\limits _{\mu } \sum _{i=1}^n \rho _\theta ^{(E)}\left( Y_i-\mu \right) , \end{aligned} \end{aligned}$$

where \(\phi (\cdot )\) denotes the p.d.f. of the standard normal distribution.

Again, according to [28], a Bayesian inference of the expectile \(\mu \) can be developed. That is, a coherent posterior \(\pi (\mu |\varvec{Y})\) for the update of \(\pi (\mu )\) exists and is given by \(\pi (\mu |\varvec{Y}) \propto \pi (\mu )\, L^{(E)}(\varvec{Y}|\mu )\) with the likelihood function \(L^{(E)}\left( \varvec{Y}|\mu \right) \) in Eq.(14). By an argument similar to that in Sect. 3, we can prove that the posterior distribution under this Bayesian inference is proper with regard to improper priors for the regression parameter \(\varvec{\beta }\) in the expectile regression model \(\mu = \varvec{X}^T \varvec{\beta }\), when covariate information \(\varvec{X}\) is available. The corresponding proofs are available in the Supplementary Materials.

5 Numerical Analysis

In this section, we implement the proposed method to illustrate Bayesian quantile regression for discrete responses via Monte Carlo simulation studies and one real data analysis. In all numerical analyses, we follow standard practice by tuning the variance of the random-walk M-H proposal distribution to achieve an acceptance rate of around 0.25-0.3. We discard the first 10000 of 20000 runs in every case of MCMC output and then collect a sample of 10000 values to calculate the posterior distribution of each of the regression coefficients in \(\varvec{\beta }\). All numerical experiments are carried out on one Intel Core i5-3470 CPU (3.20 GHz) processor with 8 GB RAM.

5.1 Simulated Example 1

Consider a simple regression model in which the sample \(Y_i (i=1, 2, \cdots , n)\) consists of counts and follows a Poisson distribution with parameter 3 or a Binomial distribution with parameters 20 and 1/5, respectively. 500 simulations for each case of \(\tau \in \{\)0.05, 0.25, 0.50, 0.75, 0.95\(\}\) and \(n \in \{200, 1000\}\) are performed. The quantile regression \(Q_\tau (Y)=\beta (\tau )\) is a constant depending on \(\tau \) only. Table 1 compares the posterior means with the true values of \(\beta (\tau )\) for each case over the 500 simulations. Moreover, the expectile regression \(Expectile_\theta (Y) = \beta (\theta )\) is also a constant depending on the \(\theta \)-th expectile. Table 2 compares the posterior means with the true values of \(\beta (\theta )\) obtained via an empirical estimation in Eq.(10) for different cases. Both Tables 1 and 2 show that the results obtained by the proposed Bayesian inference are reasonably accurate.

Table 1 Posterior mean and posterior standard deviations (S.D.) of \(\beta (\tau )\) for quantiles from simulated example 5.1
Table 2 Posterior mean and posterior standard deviations (S.D.) of \(\beta (\theta )\) for expectiles from simulated example 5.1

5.2 Simulated Example 2

We consider a discrete quantile linear regression:

$$\begin{aligned} Y_i = \beta _0 + \sum _{k=1}^p \beta _k X_{ik} + \varepsilon _i, ~i=1,\cdots ,n;~ k = 1,\cdots , p, \end{aligned}$$
(15)

where n and p denote the number of observations and independent variables, respectively, and \(\beta _k, k = 1,\ldots ,p\) are the regression parameters. Let the random error \(\varepsilon _i\) follow a Poisson distribution with parameter 3. 500 simulations for each case of \(\tau \in \{0.25, 0.50, 0.75\}\) and \(n \in \{300, 1500\}\) are performed.

Without loss of generality, let \(p=2\) in Eq.(15):

$$\begin{aligned} Y_i = \beta _0 + \beta _1 X_{i1} + \beta _2 X_{i2} + \varepsilon _i, ~~i=1,\cdots ,n, \end{aligned}$$

where covariate \(X_{i1}\) is generated from a Geometric distribution with probability 1/4, and covariate \(X_{i2}\) is generated from a Poisson distribution with parameter 2. We generate the training data with \((\beta _0, \beta _1, \beta _2) = (6, 2, -4)\) and \(\varepsilon _i \sim \mathrm {Pois}(3)\). 500 simulations for each case of \(\tau \in \{0.25, 0.50, 0.75\}\) and \(n_1 \in \{200, 1000\}\) are performed.

Therefore, the corresponding discrete quantile function is of the form

$$\begin{aligned} Q_\tau (Y|X) = \beta _0(\tau ) + \beta _1(\tau )X_1 + \beta _2(\tau )X_2. \end{aligned}$$

Although we have chosen improper flat priors in Simulated Example 1 above, one may use other priors in a relatively straightforward fashion. For example, following [32], a conditionally conjugate prior distribution in the Normal-Inverse Gamma form for the unknown parameters \(\varvec{\beta }\) can be obtained. Given \(\tau \in (0,1)\), for any \(a > 0\), the prior mean and covariance matrix for \(\varvec{\beta }\) are given, respectively, by

$$\begin{aligned} \begin{aligned} E(\varvec{\beta })&= \varvec{\beta }_{a\tau } \\ Cov(\varvec{\beta })&= 2g(\varvec{X}\varvec{V}\varvec{X}^T)^{-1}, \end{aligned} \end{aligned}$$

where \(\varvec{\beta }_{a\tau }\) is a vector of anticipated values and \(g>0\) is a known scaling factor. Various values of g have been used in the context of variable selection and estimation. [33] performed variable selection using splines and suggested that the value of g lies in the range \(10 \le g \le 1000\). Following the discussions in [32, 34], among others, we set \(g = 100\) in this paper. Thus, given \(\tau \) and \(\varvec{\beta }_{a\tau }\), the conditional prior distribution for \(\varvec{\beta }\) is readily available. Here, we suggest a particular form of a conjugate Normal-Inverse Gamma family for \(\varvec{\beta }\) given by

$$\begin{aligned} \varvec{\beta }| V, X \sim N(\varvec{\beta }_{a}, 2g(\varvec{X}\varvec{V}\varvec{X}^T)^{-1} ). \end{aligned}$$

For simplicity, let \(E(\varvec{\beta })\) and \(Cov(\varvec{\beta })\) be the fitted values obtained by the semi-parametric jittering approach ([25]), as presented in Table 3.

Table 3 The prior mean and covariance matrix for \(\varvec{\beta }\)

Under the proposed Bayesian inference in Sect. 2, Table 4 reports the posterior mean, standard deviation and 95% credible interval for the regression parameters \(\beta _0(\tau ), \beta _1(\tau )\) and \(\beta _2(\tau )\), over 500 simulations with \(\varepsilon _i \sim \mathrm {Pois}(3)\), based on a conjugate Normal-Inverse Gamma prior for \(\varvec{\beta }\). Table 4 shows that, under different prior settings, the posterior estimates of the regression coefficients obtained from the working-likelihood analysis are consistent.

Table 4 Posterior mean, standard deviation and 95% credible interval of \(\beta _k(\tau ), k = 0,1,2\) from simulated example 2 based on a conjugate Normal-Inverse Gamma prior for \(\varvec{\beta }\)

5.3 Analysis of Length of Stay (LoS) in Days

The data are extracted from the Worcester Heart Attack Study with 500 observations ([35]), which describes factors associated with trends over time in the incidence and survival rates following hospital admission for acute myocardial infarction. We aim to explore the relationship between the LoS and its associated factors such as age (years), gender, hr (initial heart rate in beats per minute), BMI (body mass index in \(kg/m^2\)), av3 (complete heart block), cvd (history of cardiovascular disease), sysbp (initial systolic blood pressure in mmHg) and diasbp (initial diastolic blood pressure in mmHg). Among the covariates, gender (0=Male, 1=Female), av3 (0=No, 1=Yes) and cvd (0=No, 1=Yes) are binary variables. Age, hr, BMI, sysbp and diasbp are continuous covariates and are detailed in Table 5. The distribution of LoS is skewed, and one is usually more interested in long or short stays than in an average stay ([36, 37], among others). We aim to investigate how these factors affect the LoS, from short LoS to middle LoS and long LoS, so we carry out the analysis over a complete range of quantiles \(\tau \in \{0.05, 0.1, 0.25, 0.50, 0.75, 0.95\}\).

Table 5 Statistical description of the continuous covariates in data

Therefore, we fit a quantile regression model for LoS of the form:

$$\begin{aligned} \begin{aligned} Q_\tau (Y|X)&= \beta _0(\tau ) + \beta _1(\tau )\mathrm {Age} + \beta _2(\tau )\mathrm {Gender}+\beta _3(\tau )\mathrm {hr} + \beta _4(\tau )\mathrm {BMI} \\&\quad + \beta _5(\tau )\mathrm {av3}+\beta _6(\tau )\mathrm {cvd}+\beta _7(\tau )\mathrm {sysbp}+\beta _8(\tau )\mathrm {diasbp}. \end{aligned} \end{aligned}$$

Table 6 shows the posterior mean of all regression parameters under selected quantiles. The boxplots in Fig. 1 also display the posterior mean of these regression parameters across \(\tau \)s.

Table 6 Posterior mean of regression parameters with different quantiles under the proposed method
Fig. 1
figure 1

Boxplots of posterior mean of regression parameters

The values of the posterior mean of \(\beta _1\), \(\beta _3\), \(\beta _7\) and \(\beta _8\) in Table 6 clearly indicate that age and the initial states of heart rate, systolic blood pressure and diastolic blood pressure have little effect on LoS (days), particularly at the low quantiles of its distribution, whereas the values of the posterior mean of \(\beta _2\), \(\beta _5\) and \(\beta _6\) show that gender, complete heart block and history of cardiovascular disease are the factors that affect the LoS most. Specifically, female patients generally tend to stay longer than male patients once they have these health problems and are admitted to hospital. Similarly, patients suffering complete heart block or with a history of cardiovascular disease stay much longer than patients without these problems, particularly when a very long stay is needed.

Based on the posterior mean of \(\beta _4\) in Table 6, the effect of BMI on LoS is small but generally negative, except at the median and the 0.95 quantile of the distribution of LoS.

We compare the fitted median quantile regression of LoS with a Poisson mean regression:

$$\begin{aligned} \mathrm {Mean(LoS)}= & {} 1.388 + 0.001 \,\mathrm {Age} + 0.140 \,\mathrm {Gender}+0.003\,\mathrm {hr} \\&-0.005\,\mathrm {BMI} + 0.113\,\mathrm {av3}+0.071\,\mathrm {cvd}\\&-0.001\,\mathrm {sysbp}+0.003\,\mathrm {diasbp}. \end{aligned}$$

We can see that the proposed median model and the Poisson mean model provide consistent conclusions, but the Poisson regression cannot explore the short and long LoS.

Unlike the jittering method for counts (Machado and Silva, 2005), the proposed method in this paper is a density-function-based Bayesian inference. Nevertheless, if we compare our findings to the results from the count model of Machado and Silva (2005) in Table 7, we can draw similar conclusions to those from Table 6. The difference is that the proposed method shows that complete heart block or a history of cardiovascular disease increases the long LoS significantly, which is true in real situations and more significant than what is found by the count model of Machado and Silva (2005).

Table 7 Posterior mean of regression parameters with different quantiles under Machado and Silva (2005)

6 Discussion

Discrete responses or count data are common in many disciplines. Regression analysis of discrete responses has been an active and promising area of research. Data with discrete responses may present skewness, fat tails and leptokurtosis. Quantile regression is a more suitable tool than mean regression for analysing this type of data. We propose Bayesian quantile regression and Bayesian expectile regression for discrete responses. This is achieved by using a discrete asymmetric Laplace distribution and a discrete asymmetric normal distribution to form the likelihood function, respectively. The method is shown to be numerically robust and theoretically coherent. The Bayesian approach is fairly easy to implement and provides complete univariate and joint posterior distributions of the parameters of interest. The posterior distributions of the unknown model parameters are obtained by using the M-H algorithm implemented in R. We have shown promising results through Monte Carlo simulation studies and one real data analysis.