A Discrete Density Approach to Bayesian Quantile and Expectile Regression with Discrete Responses

For decades, regression models beyond the mean for continuous responses have attracted great attention in the literature. These models typically include quantile regression and expectile regression. But there is little research on these regression models for discrete responses, particularly from a Bayesian perspective. By forming the likelihood function based on suitable discrete probability mass functions, this paper introduces a discrete density approach for Bayesian inference of these regression models with discrete responses. Bayesian quantile regression for discrete responses is first developed, and then this method is extended to Bayesian expectile regression for discrete responses. The posterior distribution under this approach is shown not only coherent irrespective of the true distribution of the response, but also proper with regarding to improper priors for the unknown model parameters. The performance of the method is evaluated via extensive Monte Carlo simulation studies and one real data analysis.


Introduction
Regression models for dealing with responses following a non-normal distribution have been drawing significant attention in the literature. For example, quantile regression and expectile regression have been widely developed in the literature and increasingly applied to a greater variety of scientific questions. See, [1][2][3][4][5] and among others.
Typically, quantile regression estimates various conditional quantiles of a response or dependent random variable, including the median (0.5th quantile). Putting different quantile regressions together provide a more complete description of the underlying conditional distribution of the response than a simple mean regression. This is particularly useful when the conditional distribution is asymmetric or heterogeneous or fat-tailed or truncated. Quantile regression has been widely used in statistics and numerous application areas. Bayesian quantile regression for continuous responses has received increasing attention from both theoretical and empirical viewpoints. See a recent review by [6] for the first type of Bayesian quantile regression methods ( [7][8][9] and among others) based on asymmetric Laplace distribution (ALD) likelihood function. But among numerous application areas of regression models, discrete observations such as integer values (e.g., -2, -1, 0, 1, 2, 3, etc.) on a response are easily collected. In particular, many big data nowadays contain discrete observations such as number of online transaction, number of days in hospital, number of votes and so on. Classic regression models for discrete responses include logistic, Poisson and negative Binomial regression. Discrete responses are generally skewed, the mean-based regression analysis would not be sufficient for a complete analysis ( [10]).
However, quantile regression for discrete responses receives far less attention than for continuous responses in the literature. Binary quantile regression was first introduced by [11]. Then, several authors ( [12][13][14] and among others) developed different smoothed estimation techniques (nonparametric or semiparametric methods) for the binary quantile regression model under frequentist approaches. Based on the idea of linking ALD to latent variables in Bayesian Tobit quantile regression ( [15]), the papers by [16,17] and among others did proposed Bayesian inference binary quantile regression. Based on a normal-exponential mixture representation of ALD, [18] and [19] extended Bayesian binary quantile regression to Bayesain ordinal quantile regression. Then, [20] further extended it to Bayesain inference of single-index quantile regression for ordinal data. [21,22] and [23] applied these discrete quantile regression methods in economics, energy and education, respectively. But all these research methods are not a direct Bayesian quantile regression approach for discrete responses, or deal with quantile regression for general discrete responses. Similarly, there is little research on expectile regression for discrete responses, let alone from a Bayesian perspective ( [24]). A semi-parametric jittering approach for general count quantile regression has been introduced ([25]), but some degree of smoothness has to be artificially imposed on the approach. Quantile regression for count data may be achieved via density regression as shown in [26], but this approach may result in a global estimation of regression coefficients. In this paper, we propose Bayesian inference quantile regression for discrete responses via introducing a discrete version of ALD-based likelihood function. This approach not only keeps the 'local property' of quantile regression, but also enjoys the coherency and finite posterior moments of the posterior distribution. Along this line, we then introduce Bayesian expectile regression for discrete responses, which proceeds by forming the likelihood function based on a discrete asymmetric normal distribution (DAND). Section 2 introduces a discrete asymmetric Laplace distribution (DALD) and discusses its natural link with quantile regression for discrete responses. Sections 3 and 4 detail this Bayesian approach for quantile regression and expectile regression with discrete responses, respectively. Section 5 illustrates the numerical performance of the proposed method. Section 6 concludes with a brief discussion.

Discrete Asymmetric Laplace Distribution
Let Y be a real-valued random variable with its -th ( 0 < < 1 ) quantile ( −∞ < < ∞ ), and then it is well-known that could be found by minimizing the expected loss of Y with respect to the loss function (or check function) When Y is a continuous random variable, the inference based on the loss function (y − ) was linked to a maximum likelihood inference based on an ALD( , ) with local parameter and shape parameter : Now, if Y is a discrete random variable, let Y take integer values in ℤ . We first derive a discrete version of ALD or a DALD and then show that the th quantile can also be estimated via this DALD.
To this end, note that the corresponding cumulative distribution function (c.d.f.) of an ALD in Eq.(1) can be written as: Let S(y; , ) be the survival function of this ALD, which is given by: then, according to [27], the probability mass function (p.m.f.) of a DALD can be defined as:    According to [6], any fixed can be utilised to obtain asymptotically valid posterior inference and make the results asymptotically invariant. Here, we simply fix as 1.
Given a sample Y = (Y 1 , Y 2 , ⋯ , Y n ) of the discrete response Y whose distribution F 0 (y) may be unknown, consider the DALD-based likelihood function for : Then, we have S(y; , ) − S(y + 1; , ) This means that the estimation of the -th quantile of a discrete random variable Y with respect to the loss function (⋅) is equivalent to maximization of the likelihood function Eq.(6) based on the DALD. According to [28], a Bayesian inference of can be developed. That is, if ( ) represents prior beliefs about the -th quantile , and Y is observed data from the unknown distribution F 0 (Y) of the discrete random variable Y, then a posterior ( |Y) which is valid and coherent update of ( ) can be obtained via the DALD-based likelihood function Eq.(6) and is given by: Coherency here means if denotes a probability measure on the space of , then is named coherent if for all other probability measure 1 on the space of in terms of expected loss of Y given by E F 0 (Y) (Y − ) . This coherency property aims to ensure the consistency of posterior from the proposed inference even if the 'working likelihood' in Eqs.

Bayesian Quantile Regression with Discrete Responses
Generalized linear models (GLMs) extend the linear modelling capability to scenarios that involve non-normal distributions f (y; ) or heteroscedasticity, with f (y; ) specified by the values of = E[Y|X = x] conditional on x , including to involve a known link function g, g( ) = x T . Specifically, GLMs also apply to the so-called 'exponential' family of models, which typically include Poisson regression with loglink function. When we are interested in the conditional quantile Q Y ( |x) of a discrete response, according to [7], we could still cast the problem in the framework of the generalized linear model, no matter what the original distribution of the data is, by assuming that (i) f (y; ) follows a DALD in the form of Eqs. (5) or (A1) and (ii) When covariate information such as a covariate vector X is available, quantile regression denoted by Q Y ( |X) for is introduced. Without loss of generality, consider a linear regression model for Q Y ( |X) : Q Y ( |X) = X T , where is the regression parameter vector. Given observations Y = (Y 1 , Y 2 , ⋯ , Y n ) of the discrete response Y, one of the aims in regression analysis is the inference of . Let ( ) be the prior distribution of , then the posterior distribution of , ( |Y) is given by where the likelihood function L(Y| ) is given by: The numerical computation of the posterior distribution can be carried out by the Metropolis-Hastings algorithm. That is, we first generate a candidate * according to a random walk, which results in a symmetric proposal distribution in the Metropolis algorithm. Then, accept or reject * for according to the acceptance probability around 30%. We always suggest to throw away at last 30% of the iterations at the beginning of an MCMC run or 'burn-in' period to make sure the chain to reach stationarity. Besides the coherent property discussed in Sect. 2 for posterior distribution ( |Y) , it is important to verify the existence of the posterior distribution when the prior of is improper, i.e, or, equivalently, Moreover, it is preferable to check that the existence of posterior moments of the regression parameters is entirely unaffected by improper priors and quantile index ( [29] and among others), i.e, where r j denotes the order of the moments of j .
To this end, we have the following conclusion: The proof of Theorem 3.1 is available in the Supplementary Materials.

Bayesian Expectile Regression for Discrete Responses
Instead of defining the -th quantile of a response Y by argmin E (Y − ) , [30] defined the -th expectile of Y by in terms of an asymmetric quadratic loss function where ∈ (0, 1) determines the degree of asymmetry of the loss function. Note that is typically not equal to , although there is a one-to-one relationship between -th quantile and -th expectile ( [31]).
Corresponding to (E) (u) , we can define an asymmetric normal distribution (AND) whose density function is given by , and are the location parameter and shape parameter, respectively. The corresponding c.d.f. of the AND can be written as: where Φ(⋅) denotes the c.d.f. of the standard normal distribution. Therefore, based on the survival function S (E) (y; , ) = 1 − F (E) (y; , ) , we can derive the p.m.f. of the DAND by following the same procedure as in Eq.(4). In fact, note that then, for y ∈ Z, we have Consider the DAND-based likelihood function: We can see that the expectile can also be estimated equivalently by the maximization of the likelihood function L (E) (Y| ) in Eq. (14). In fact, where (⋅) denotes the p.d.f. of the standard normal distribution. Again, according to [28], a Bayesian inference of the expectile can be developed. That is, a coherent posterior ( |Y) for the update of ( ) exists and is given by ( |Y) ∝ ( ) L (E) (Y| ) with the likelihood function L (E) (Y| ) in Eq. (14). Along with the similar discussion in Sect. 3, we can prove that the posterior distribution under this Bayesian inference is proper with regarding to improper priors for regression parameter in the expectile regression model = X T , if covariate information X is available. The corresponding proofs are available in the Supplementary Materials.

Numerical Analysis
In this section, we implement the proposed method to illustrate the Bayesian quantile regression for discrete responses via Monte Carlo simulation studies and one real data analysis. In all numerical analyses, we follow standard practice by setting the variance of random-walk MH sampling to tune the proposal distribution to get around a 0.25-0.3 acceptance rate. We discard the first 10000 of 20000 runs in every case of MCMC outputs and then collect a sample of 10000 values to calculate the posterior distribution of each of regression coefficients in . All numerical experiments are carried out on one Intel Core i5-3470 CPU (3.20GMHz) processor and 8 GB RAM.

Simulated Example 1
Consider a simple regression model for which the sample Y i (i = 1, 2, ⋯ , n) is counts and follows a Poisson distribution with parameter 3 and a Binomial distribution with parameters 20 and 1/5, respectively. 500 simulations for each case of ∈ {0.05, 0.25, 0.50, 0.75, 0.95} and n ∈ {200, 1000} are performed. The quantile regression Q (Y) = ( ) is a constant depending on only. Table 1 compares the posterior means with the true values of ( ) for each case under 500 simulations. Moreover, the expectile regression Expectile (Y) = ( ) is also a constant depending on the -th expectile. Table 2 compares the posterior means with the true values of ( )  2 show that the results obtained by the proposed Bayesian inference are reasonably accurate.

Simulated Example 2
We consider a discrete quantile linear regression: where n and p denote the number of observations and independent variables, respectively. k , k = 1, … , p are the regression parameters. Let the random item i follows a Poisson distribution with parameter 3. 500 simulations for each case of ∈ {0.25, 0.50, 0.75} and n ∈ {300, 1500} are performed. Without loss of generality, let p = 2 in Eq.(15): where covariate X 1,i is generated from a Geometric distribution with probability 1/4, and covariate X 2,i is generated from a Poisson distribution with parameter 2. We gen-  along with [32], conditional conjugate prior distribution in the Normal-Gamma Inverse form for the unknown parameters can be obtained. Given ∈ (0, 1) , for any a > 0 , the prior mean and covariance matrix for are given, respectively, by where a is anticipated values, and g > 0 is a known scaling factor. Various values of g have been used in the context of variable selection and estimation. [33] performed variable selection using splines and suggested that the value of g is in the range 10 ≤ g ≤ 1000 . Following the discussions in [32,34] and among others, we set g = 100 in this paper. Thus, given and a , the conditional prior distribution for is readily available. Here, we suggest a particular form of a conjugate Normal-Inverse Gamma family for given by For simplicity, let E( ) and Cov( ) be the fitted values obtained by the semi-parametric jittering approach ( [25]), as presented in Table 3.
Under the proposed Bayesian inference in Sect. 2, Table 4 reports the posterior mean, standard deviation and 95% credible interval for the regression parameters 0 ( ), 1 ( ) and 2 ( ) , under 500 simulations with i ∼ Pois(3) , based on a conjugate Normal-Inverse Gamma prior for . It can be shown from Table 4 that under different prior settings, the posterior estimates of regression coefficients obtained from the working likelihood analysis are consistent .

Analysis of Length of Stay (LoS) in Days
The data are extracted from the Worcester Heart Attack Study with 500 observations ( [35]), which describes factors associated with trends over time in the incidence and survival rates following hospital admission for acute myocardial infarction. We aim to explore the relationship between the LoS and its associated factors such as age, gender (years), hr (initial heart rate by beats per minute), BMI (body mass index by kg∕m 2 ), av3 (complete heart block), cvd (history of cardiovscular disease), sysbp (initial systolic blood pressure by mmHg) and diasbp (initial diastolic blood pressure by mmHg). Among covariates, gender (0=Male, 1=Female), av3 (0=No, 1=Yes) and cvd (0=No, 1=Yes) are binary variables. Age, hr, BMI, sysbp and diasbp as the continuous covariates are detailed in Table 5. The distribution of LoS is skewed and one is usually more interested in long stay or short stay than an average stay ( [36,37] and among others). We aim to investigate how these factors affect the LoS, from short LoS, to middle LoS and long LoS, so that we carry out the analysis in a complete range of the quantiles ∈ {0.05, 0.1, 0.25, 0.50, 0.75, 0.95}.
Therefore, we fit a quantile regression model for LoS of the form: Table 6 shows the posterior mean of all regression parameters under selected quantiles. The boxplots in Fig. 1 also display the posterior mean of these regression parameters across s. The values of the posterior mean of 1 , 3 , 7 and 8 in Table 6 clearly indicate that age and all initial states of heart rate, systolic blood pressure and initial diastolic blood pressure, have little effect on LoS (days), particularly on the low quantiles of Q (Y|X) = 0 ( ) + 1 ( )Age + 2 ( )Gender + 3 ( )hr + 4 ( )BMI + 5 ( )av3 + 6 ( )cvd + 7 ( )sysbp + 8 ( )diasbp. its distribution, whereas the values of the posterior mean of 2 , 5 and 6 show that gender, complete heart block and history of cardiovscular disease are those factors to affect the LoS most. Specifically, female patients tend to stay longer than male patients generally once they got those health problems and admitted into hospitals. Similarly, patients suffering complete heart block or history of cardiovscular disease stay much longer than patients without these problems, particularly for very long stay needed. Based on the posterior mean of 4 in Table 6, the effect on LoS from BMI is not big but generally negative except on the median and 95% quantile of the distribution of LoS.
If we compare the fitted median quantiel regression of LoS with a Poisson mean regression below, We can see that both the proposed median model and Poisson mean model provide consistent conclusion, but Poisson regression can not explored the short and long LoS.
Unlike the jittering method for count (Machado and Silva, 2005), the proposed method in this paper is density function based Bayesian inference. However, if we compare our findings to the results from the count model of Machado and Silva (2005) in Table 7, we can draw similar conclusions to those from Table 6. The difference is that the proposed method can show that complete heart block or history of cardiovscular disease increase the long LoS significantly, which is true in real situations and much significant than what explored in the count model of Machado and Silva (2005).

Discussion
Discrete responses or count data are common in many disciplines. Regression analysis of discrete responses has been an active and promising area of research. Data with discrete responses may present features of skewness, fat-tailed and leptokurtic. Quantile regression is a more suitable tool to analyse this type of data than mean regression. We propose Bayesian quantile regression and Bayesian expectile regression for discrete responses. This is achieved by using a discrete asymmetric Laplace distribution and discrete asymmetric normal distribution to form the likelihood function, respectively. The method is shown robust numerically and coherent theoretically. The Bayesian approach which is fairly easy to implement and provides complete univariate and joint posterior distributions of parameters of interest. Proof of Lemma A.1 Expand (t) as a mixture of g, consider 0 < q ≤ p < 1, (A1) (y; , p, q) = p y− (1 − p) log q, y ≥ , y ∈ ℤ q y− (1 − q) log p, y < , y ∈ ℤ Similarly, consider m = 1 for simplicity, then X T i = 0 + 1 X 1i , As is known to all that the double-integration is finite for any constants. Therefore, assume the likelihood is given by Eq. (8) and ( ) ∝ 1 , then it can be proved that all posterior moments of in Eq.(9) exist.

Appendix C. Worcester Heart Attack Study Data Set
Data set used in the illustration of the proposed novel method in Sect. 5.3 is available in R-package smoothHR.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/ licenses/by/4.0/.