Analysis of Demographic Incidence of Gambling Expenditures
According to economic theory, the utility from gambling consumption could, in the extreme, be negative because of some kind of fixed cost related to the individual’s choice to participate in gambling (Cogan 1981; Moffitt 1983; Scott and Garen 1994). Of course, negative expenditures are not possible in real life (see Footnote 3), so these values are censored to zero (non-participation). On the other hand, survey respondents might not report truthfully the amount they have gambled, saying they have not gambled when they actually have, or they might simply not remember correctly whether or how much they have gambled. Both cases again imply zero observations and possible selection issues. This distinct feature of gambling expenditure data (a large number of zero observations) must be accounted for with specific statistical and econometric methods.
Three widely used statistical models take into account the censoring mechanism of the data, which appears as a large probability mass at zero in the distribution function of the dependent variable (see Footnote 4): the Tobit model, the Two-Part model and the Sample selection model. These so-called limited dependent variable (LDV) regression models are widely used in the consumption analysis of durable goods and medical expenses, as well as in labour supply analysis (Cragg 1971; Duan et al. 1983; Cogan 1981). These methods have also been used in studies of the determinants of gambling expenditures (Scott and Garen 1994; Humphreys et al. 2010; Rude et al. 2014).
Censoring and Corner Solution Models
Censoring is defined as always observing the regressors, x, but observing the latent dependent variable, \(y^*\), completely only for a subset of its possible values and incompletely for the rest [see e.g. Cameron and Trivedi (2005)]. In the case of censoring from below (or left censoring) at zero, the observed variable y can be written as
$$\begin{aligned} y = {\left\{ \begin{array}{ll} y^* &{} \text {if }\quad y^* > 0, \\ 0 &{} \text {if }\quad y^* \le 0. \end{array}\right. } \end{aligned}$$
(1)
Censoring changes both the conditional density and the conditional mean. For y > 0, the density of y is equal to that of \(y^*\), in other words \(f(y|x) = f^*(y|x)\) when y > 0. However, when y is at the lower bound (\(y=0\)), the density is a discrete spike of probability mass equal to the probability of observing \(y^{*} \le 0\), i.e. \(F^*(0|x)\). Therefore, the conditional density for censoring from below can be written as
$$\begin{aligned} f(y|x) = {\left\{ \begin{array}{ll} f^*(y|x) &{} \text {if }\quad y > 0, \\ F^*(0|x) &{} \text {if }\quad y = 0. \end{array}\right. } \end{aligned}$$
(2)
Thus, the density can be written as a combination of conditional probability density function (PDF) and cumulative distribution function (CDF) by using an indicator variable defined as
$$\begin{aligned} d = {\left\{ \begin{array}{ll} 1 &{} \text {if }\quad y > 0, \\ 0 &{} \text {if }\quad y = 0. \end{array}\right. } \end{aligned}$$
(3)
Therefore, the conditional density in the case of censoring from below can be formalized as
$$\begin{aligned} f(y|x) = f^{*}(y|x)^{d} F^{*}(0|x)^{1-d}. \end{aligned}$$
(4)
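As a minimal sketch of Eq. 4, the snippet below evaluates the censored density for a latent variable that is assumed, purely for illustration, to be normally distributed. The function name `censored_density` and the parameter values are hypothetical and do not come from the study.

```python
from statistics import NormalDist


def censored_density(y, latent):
    """Eq. 4 for censoring from below at zero: f*(y)^d * F*(0)^(1-d)."""
    if y < 0:
        raise ValueError("y cannot be negative after censoring at zero")
    d = 1 if y > 0 else 0  # indicator variable from Eq. 3
    return latent.pdf(y) ** d * latent.cdf(0.0) ** (1 - d)


# Hypothetical latent distribution of y* (an assumption, not an estimate).
latent = NormalDist(mu=1.0, sigma=2.0)

mass_at_zero = censored_density(0.0, latent)    # discrete mass F*(0)
density_at_one = censored_density(1.0, latent)  # continuous density f*(1)
```

At y = 0 the function returns a probability, while for y > 0 it returns a density value, which is the mixed discrete and continuous structure described above.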
Regarding gambling expenditures, however, the problem is not the observability of the dependent variable (gambling expenditures), but rather the fact that many individuals make an optimal decision not to consume, i.e. they choose the corner solution of not gambling. Theoretically, the two cases call for the same empirical handling. In the case of corner solution outcomes, however, the latent dependent variable, \(y^{*}\), is just an artificial construct, which should not receive too much emphasis in our analysis. This is the case when the interest lies in E(y|x) rather than \(E(y^{*}|x)\). In our application \(y^{*}\) can be seen as the “desired” amount of gambling, whereas our interest, as usually in other applied empirical work, lies in the realized gambling expenditures y.
Tobit Model
The classical estimation approach for corner solution outcomes and censored data is the Tobit model. The most traditional case is censoring at zero, in other words from below, as in our case. The Tobit model assumes that the latent dependent variable is linear in the regressors with additive, homoskedastic and normally distributed errors:
$$\begin{aligned} y^{*} = x^{'} \beta + \epsilon , \end{aligned}$$
(5)
where
$$\begin{aligned} \epsilon \sim N(0,\sigma ^2). \end{aligned}$$
(6)
The probability that y is censored at zero is
$$\begin{aligned} F^*(0)&= Pr(y^{*} \le 0)\nonumber \\&= Pr(x^{'}\beta + \epsilon \le 0)\nonumber \\&= \varPhi \left( -\frac{x^{'}\beta }{\sigma }\right) \nonumber \\&= 1 - \varPhi \left( \frac{x^{'}\beta }{\sigma }\right) \end{aligned}$$
(7)
and \(\varPhi\) is the CDF of standard normal distribution N(0, 1). The censored density function in the case of Tobit model is then
$$\begin{aligned} f(y) = \left[ \frac{1}{ \sqrt{2\pi \sigma ^{2}}} \exp \left\{ -\frac{1}{2 \sigma ^{2}}(y-x^{'}\beta )^{2}\right\} \right] ^{d}\left[ 1 - \varPhi \left( \frac{x^{'}\beta }{\sigma }\right) \right] ^{1-d}, \end{aligned}$$
(8)
where d is the indicator variable defined above. The log-likelihood function can thus be written as
$$\begin{aligned} \ln L_{N}(\beta , \sigma ^2) =&\sum _{i=1}^{N}\left\{ d_i\left( -\frac{1}{2}\ln 2\pi -\frac{1}{2}\ln \sigma ^2-\frac{1}{2\sigma ^2}(y_i-x_{i}^{'}\beta )^{2}\right) \right. \nonumber \\&\left. +(1-d_i)\ln \left( 1 - \varPhi \left( \frac{x_{i}^{'}\beta }{\sigma }\right) \right) \right\} , \end{aligned}$$
(9)
which is a combination of discrete and continuous densities. The Tobit model is estimated by maximum likelihood. In the Tobit model both decision margins, participation and expenditure, are determined simultaneously by the same parameters, so each explanatory variable affects both margins in the same direction.
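The log-likelihood in Eq. 9 can be sketched in a few lines of Python. This is only an illustration under the Tobit assumptions stated above, using the standard library. The function name `tobit_loglik`, the toy data and the parameter values are hypothetical, and a real application would pass this function to a numerical optimiser.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1), used for Phi


def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of Eq. 9 for data censored from below at zero."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))  # linear index x_i' beta
        if y_i > 0:
            # d_i = 1: normal log-density of the observed positive outcome
            total += (-0.5 * math.log(2 * math.pi)
                      - 0.5 * math.log(sigma ** 2)
                      - (y_i - xb) ** 2 / (2 * sigma ** 2))
        else:
            # d_i = 0: log-probability that the latent y* is at or below zero
            total += math.log(1.0 - STD_NORMAL.cdf(xb / sigma))
    return total


# Hypothetical toy data: a constant and one regressor per observation.
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0.0, 0.0, 1.5, 3.2]
ll = tobit_loglik(beta=(-1.0, 1.2), sigma=1.0, X=X, y=y)
```

Because a single coefficient vector enters both branches, the sketch makes visible why the Tobit model ties the participation and expenditure margins together.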
Two-Part Model
In most empirical applications, however, the Tobit model is too restrictive because it imposes the same underlying mechanism and parameters on the selection (extensive margin) and the outcome (intensive margin) processes. In contrast to the Tobit model, the Two-Part model (TPM) allows different processes for the censoring (participation, extensive margin) and the outcome (the actual level of gambling expenditures, intensive margin) mechanisms. In addition, if some kind of stigma or fixed cost affects the gambling participation decision, Tobit estimation leads to biased estimates and a more general model is needed.
The TPM is a generalization of the Tobit model (see Cragg 1971). Formally, the TPM for the dependent variable y can be written as
$$\begin{aligned} f(y|x) = {\left\{ \begin{array}{ll} Pr(d=0|x)\quad \text {if }\, y = 0, \\ Pr(d=1|x)f(y|d=1,x) \quad \text {if }\, y > 0. \end{array}\right. } \end{aligned}$$
(10)
The participation decision \(Pr(d=1|x)\) is usually modelled with a Probit or Logit model. For the continuous part of the distribution (positive expenditures), a log-normal distribution is convenient and is usually estimated with ordinary least squares (OLS) regression. The TPM can therefore be formalised as
$$\begin{aligned}&P(y=0|x) = 1-\varPhi (x^{'}\beta ) \end{aligned}$$
(11)
$$\begin{aligned}&log(y|x, y>0) \sim N(x^{'}\beta , \sigma ^2), \end{aligned}$$
(12)
where the binary participation equation (Eq. 11) is first estimated with Probit by defining a dummy variable indicating zero or positive expenditures. Second, the expenditure equation (Eq. 12) is assumed to follow a classical linear regression model, which is estimated with OLS by regressing log(y) on a set of explanatory variables x. Although both equations are written with \(\beta\), the two coefficient vectors are estimated separately and are allowed to differ.
Widely known earlier applications of the Two-Part model include, for example, the modelling of health expenditures (see Duan et al. 1983). The estimates of the TPM can be compared to those of the Tobit model. If the estimates of these models, and therefore the effects of certain variables on the two margins, differ, it suggests in our application that there might be some kind of fixed cost associated with gambling participation.
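To illustrate how the two parts combine, the following sketch computes the expected outcome implied by Eqs. 11 and 12. It relies on the standard mean of a log-normal variable, \(E(y|x, y>0) = exp(x^{'}\beta + \sigma ^2/2)\), which is not spelled out in the text. The function name `tpm_expected_outcome`, the name `gamma` for the participation coefficients and all numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def tpm_expected_outcome(x, gamma, beta, sigma):
    """E(y|x) = Pr(y>0|x) * E(y|x, y>0) under a Probit and log-normal TPM."""
    index_part = sum(g * v for g, v in zip(gamma, x))   # Probit index (Eq. 11)
    index_level = sum(b * v for b, v in zip(beta, x))   # mean of log(y) given y > 0 (Eq. 12)
    p_participate = NormalDist().cdf(index_part)         # equals 1 - P(y=0|x)
    mean_positive = math.exp(index_level + sigma ** 2 / 2)  # log-normal mean
    return p_participate * mean_positive


# Hypothetical covariates (constant, one regressor) and coefficients.
x = (1.0, 0.5)
expected = tpm_expected_outcome(x, gamma=(0.2, 0.4), beta=(3.0, 0.1), sigma=1.1)
```

The participation and level coefficients are passed separately, which is the flexibility that distinguishes the TPM from the Tobit model.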
Sample Selection Model, TPM and Endogenous Selection
The Sample selection model (SSM) (see Heckman 1979), on the other hand, defines a joint distribution for the censoring and the outcome, and then specifies the implied distribution conditional on the outcome being observed. Sample selection models are used when the sample is not entirely random. For example, when participation in a survey is voluntary or the quantities asked are reported by the respondents themselves, some of the surveyed individuals might be ashamed of their gambling behaviour and refuse to participate in the survey at all or, even when participating, might misreport or misremember details about their gambling.
The SSM is estimated in two separate parts like the TPM, but assuming that the error terms of the two equations (selection and outcome) are jointly normally distributed. Estimation of the SSM is usually motivated by the need to account for endogenous selection, if there are reasons to believe it might be an issue. However, identification of the SSM requires exclusion restrictions. Estimation of the sample selection model is therefore justified only as long as there are convincing instruments that determine the selection process and can be excluded from the outcome equation.
As mentioned above, the choice between the TPM and the SSM is usually argued to imply a trade-off between assuming exogenous or endogenous selection. However, the TPM has also been shown to be a robust estimator in the case of endogenous selection (Drukker 2017). To see this, the observed outcome can be written as the product of a participation dummy (d) and the value of the variable (w), so it takes either the value w or zero
$$\begin{aligned} y = d \cdot w \end{aligned}$$
(13)
where
$$\begin{aligned} d = {\left\{ \begin{array}{ll} 1, &{} \hbox { if}\ \mathbf {x}\beta + \upsilon > 0 \\ 0, &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(14)
Now the conditional expectation \(E(d \cdot w|x)\) can be written by the law of iterated expectations as
$$\begin{aligned} E(d \cdot w|x) =&E_d[E(d \cdot w|x,d)] \nonumber \\ =&E_d[d \cdot E( w|x,d)] \nonumber \\ =&1 \cdot Pr(d=1|x) \cdot E(w|x, d=1) \nonumber \\&+ 0 \cdot Pr(d=0|x) \cdot E(w|x, d=0) \nonumber \\ =&Pr(d=1|x) \cdot E(w|x, d=1). \end{aligned}$$
(15)
Both terms on the right-hand side of Eq. 15, \(Pr(d=1|x)\) and \(E(w|x, d=1)\) \((= E(y|x, d=1))\), can be identified from the observed data. Therefore, \(E(d \cdot w|x)\) is also identifiable from the data. Consequently, to identify the effect of the covariates, \(\mathbf {x}\), on gambling expenditures, we do not necessarily need to account explicitly for endogenous selection by estimating the SSM. As shown, the TPM estimator is consistent even in the case of endogenous selection. Our main interest in this study is the marginal effects of the demographic variables, and thus we can safely ignore the possible endogenous selection issue.
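The decomposition in Eq. 15 can be checked on simulated data. The sketch below is an illustration only: it draws selection and outcome errors that are correlated by construction (endogenous selection) and verifies the sample analogue of the identity, averaged over x rather than conditional on it. All parameter values, including the correlation `rho`, are hypothetical.

```python
import math
import random
from statistics import mean

random.seed(0)
n = 10_000
rho = 0.7  # hypothetical correlation between selection and outcome errors

d, w = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    v = random.gauss(0, 1)  # selection error (upsilon in Eq. 14)
    u = rho * v + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)  # correlated outcome error
    d.append(1 if 0.5 * x + v > 0 else 0)  # participation rule as in Eq. 14
    w.append(math.exp(0.2 * x + u))        # value of the variable, w

y = [d_i * w_i for d_i, w_i in zip(d, w)]  # observed outcome, Eq. 13

share_participants = sum(d) / n
mean_w_participants = mean(w_i for d_i, w_i in zip(d, w) if d_i == 1)

# Sample analogue of Eq. 15: E(y) = Pr(d=1) * E(w | d=1)
lhs = mean(y)
rhs = share_participants * mean_w_participants
```

Both `share_participants` and `mean_w_participants` use only quantities that are observed for every respondent or for participants, which is the point of the argument: the values of w for non-participants are never needed.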
Estimation Results for the Expected Gambling Expenditures
We start our analysis by estimating a standard Tobit model as a benchmark, which is the classical approach to censored data. After that, a Two-Part Model (TPM) is estimated to analyse whether the Tobit model is appropriate for the data and whether there exists some form of stigma or fixed cost associated with some of the demographic factors. Furthermore, the TPM marginal effects are decomposed to analyse more thoroughly the association between the demographic variables and expected gambling expenditures.
The estimation results of the Tobit model and the TPM in Table 3 reveal that the effects of demographic variables on gambling expenditures vary between these two models. Most of the coefficients have the same signs and significance levels. However, a few exceptions exist. According to the TPM estimates, unemployment seems to decrease the probability of participation by 6%, but the effect on the level of expenditures is non-significant (positive coefficient), whereas the Tobit estimate is negative and non-significant. Furthermore, living in a rural area is non-significant in the Tobit model but has a significant positive effect on expenditures conditional on participation in the TPM, increasing expenditures on average by 11.8%.
Table 3 Tobit and Two-part estimates for the gambling expenditures
The coefficient of (logarithmic) disposable income is positive and less than one in the participation equation. In other words, as income increases by one percent, the probability of participation increases by less than one percent, 0.177%. In addition, squared income has a negative sign in all equations. That is, the effect of income is positive on participation and on the level of expenditures, but the effect is smaller the higher the income. Being male has a consistently significant positive coefficient in every equation; being male increases the expected gambling participation and the expenditures by 14.7% and 61.6% respectively. According to both models, age contributes positively to gambling participation and expenditures. The TPM estimates suggest that the probability of participation increases by 1.4% with an additional year of age, but again at a decreasing rate. The TPM marginal effect of age on expenditures is somewhat larger, 2.7%, and the effect does not fade out, as the coefficient of squared age is non-significant.
Marital status has a consistent negative effect on gambling in every equation; married individuals participate less and spend less on gambling than non-married individuals. In contrast, the results suggest that belonging to the Lutheran church contributes positively to both gambling participation and expenditures, although it is significant only in the participation equation of the TPM. Retirement status contributes to neither gambling participation nor expenditures according to the estimated models. Individuals who have completed a university degree have significantly lower levels of gambling participation and of expenditures conditional on participation. Finally, receiving sickness allowances (being on sick leave during the last year) contributes positively to the probability of participation and to the level of expenditures, but the association is significant only for the participation decision.
In addition, the R-squared of the expenditure regression is quite low. This means that a lot of the variation in the dependent variable, gambling expenditures, is left unexplained. This is usual when modelling economic behaviour and decision making; there is a lot of “noise” in human behaviour. It does not, however, diminish the relevance of our results, nor does it imply that the estimated marginal effects are biased. As we are interested in estimating the marginal effects of the demographic variables on gambling expenditures, and not in forecasting or predicting gambling expenditures as precisely as possible, we do not need every possible variable that is associated with gambling expenditures. It can actually be beneficial to leave out additional variables in the analysis of marginal effects to avoid problems such as multicollinearity.
Decomposition of TPM Marginal Effects
The estimation results of the Tobit model and the Two-part model contradict each other to some extent, which suggests that some of the socio-demographic variables do not contribute similarly to the extensive and intensive decision margins of gambling. This implies that the Tobit model might not be the appropriate model for the data generating process of gambling expenditures. The TPM marginal effects can be further analysed by calculating the decomposed effects (see Footnote 5) on both margins and the sum of these two, that is, the total effect of a particular variable on the expected gambling expenditures.
The decomposition of the TPM marginal effects also makes it possible to analyse the relative magnitudes of the two components of the total expected gambling expenditures. In addition, as our main interest in this study lies in how different socio-demographic factors contribute to the (total) expected gambling expenditures, it is crucial to calculate the two decomposed effects. Furthermore, the decomposition is important because if the mechanism runs, for instance, solely through the expenditure margin, it implies that individuals of a certain demographic group have a higher probability of spending more on gambling conditional on participation. Consequently, this can also be seen as an indicator of an increased probability of gambling related problems among particular socio-demographic groups, as high gambling expenditures are the most significant predictor of gambling related problems (Markham et al. 2014).
The total effect has two components because the explanatory variables are expected to affect both decisions separately. The (unconditional) expectation of the level of gambling expenditures can be written as
$$\begin{aligned} E(G)=Pr(G>0)E(G|G>0), \end{aligned}$$
(16)
where \(Pr(G>0)\) is the sample proportion of gamblers and \(E(G|G>0)\) is the mean expenditure of those who have gambled. The marginal effect of an explanatory variable, X, on the total expected gambling expenditures is thus
$$\begin{aligned} \frac{dE(G)}{dX}=\frac{dPr(G>0)}{dX}E(G|G>0) \nonumber \\ +\frac{dE(G|G>0)}{dX}Pr(G>0). \end{aligned}$$
(17)
From Eq. 17 it can be seen that the marginal effect of the explanatory variable on expected gambling has two components: the first term is the participation effect (extensive margin) and the second term is the expenditure effect (intensive margin). The proportion of gamblers \(Pr(G>0)\) and the mean expenditure of those who have gambled \(E(G|G>0)\) are observable from the data. The marginal effects \(dE(G|G>0)/dX\) and \(dPr(G>0)/dX\) are the estimated coefficients presented in Table 3. Thus, it is quite straightforward to calculate the decomposed effects from these values.
Table 4 presents the decomposed TPM effects and their sum as in Eq. 17. The calculations include only the covariates that had at least one significant coefficient in Table 3. The sample means of the covariates are used in the calculations. The decomposed TPM marginal effects reveal that many of the variables have vastly different effects on the total expected gambling expenditures and that they clearly differ from the Tobit assumption of the same process for both margins. However, even if the effect of a covariate differs between these margins, it does not necessarily imply that the Tobit model fails to estimate correctly the (weighted) sum of the two effects, that is, the total effect of X on the expected amount of gambling, dE(G) / dX.
Table 4 Decomposed TPM marginal effects
Table 4 shows that the decomposed effects of income, male gender, belonging to the church, being unemployed, having a university degree and living in a rural area are not proportional between the two margins. In contrast, age, marital status and receiving sickness benefits all have quite proportional effects. Income increases the expected gambling expenditures proportionally more through the consumption margin. However, the effect on expenditure declines at a faster rate than the effect on participation. The overall income elasticity (total effect) appears to be less than one. Therefore, the results suggest that lower income individuals have proportionally higher gambling expenditures.
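The calculation in Eq. 17 can be sketched as a small helper. The numbers below are hypothetical and are not taken from Table 3 or the survey data. The helper also assumes that both marginal effects are expressed as derivatives in levels, so semi-elasticities from a log-expenditure regression would first need to be converted.

```python
def decompose_marginal_effect(p_gamble, mean_spend, d_prob, d_spend):
    """Split dE(G)/dX into the two terms of Eq. 17.

    p_gamble   -- Pr(G>0), the sample proportion of gamblers
    mean_spend -- E(G|G>0), mean expenditure among gamblers
    d_prob     -- dPr(G>0)/dX, marginal effect on participation
    d_spend    -- dE(G|G>0)/dX, marginal effect on conditional expenditure
    """
    participation_effect = d_prob * mean_spend  # extensive margin
    expenditure_effect = d_spend * p_gamble     # intensive margin
    total_effect = participation_effect + expenditure_effect
    return participation_effect, expenditure_effect, total_effect


# Hypothetical inputs, chosen only to illustrate the arithmetic.
part, spend, total = decompose_marginal_effect(
    p_gamble=0.8, mean_spend=100.0, d_prob=0.05, d_spend=10.0
)
share_via_expenditure = spend / total  # relative weight of the intensive margin
```

Reporting the share of each term in the total, as in the last line, is one way to express statements such as an effect working mainly through the expenditure margin.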
Moreover, men participate and spend more on gambling and the effect on the expected expenditures is mainly via the consumption margin. Keeping other factors constant, the expected gambling expenditures of men are approximately 68 % higher than women’s and two thirds of this effect originates from expected expenditures conditional on participation. One additional year of age increases on average the expected gambling expenditures by 4.2 %. Being married, on the other hand, decreases the expenditures by approximately 20 %. Belonging to the Lutheran church increases proportionally more the expected expenditures via participation margin. Having a university degree cuts the expected expenditures almost in half, the effect emerging a little more through the expenditure margin. Receiving of sickness benefits increases the expected gambling expenditures by 19 % when other demographic factors are kept constant.
As before, for being unemployed and living in a rural area, the decomposed effects have opposite signs: negative on the participation probability and positive on expenditures conditional on participation. However, the total effects of these covariates do not have the same sign, as the total effect of being unemployed is negative and that of living in a rural area is positive. Rural residents are expected to have 7.6 % higher gambling expenditures on average when other factors are kept constant, and the effect clearly operates more through the expenditure channel.
Quantile Regression Analysis of Gambling Expenditures
Regarding the consumption of certain vice goods, such as gambling, the interest usually lies in the behaviour of individuals in the right tail of the gambling expenditure distribution, that is, those with higher gambling expenditures. Thus, it is important to study how the demographic characteristics contribute to gambling expenditures in different parts of the (positive part of the) gambling expenditure distribution, as this may lead to considerably richer conclusions about the association of a given background variable with gambling expenditures. By estimating a quantile regression model for the positive gambling expenditures, it is possible to study whether there is heterogeneity in how these demographic factors contribute to conditional gambling expenditures. Another advantage of quantile regression over least squares regression is that it is more robust to outliers and requires weaker stochastic assumptions for consistency. In contrast to OLS, which estimates the conditional mean, quantile regression estimates the quantiles of the conditional distribution of gambling expenditures, y, given the demographic variables, \(\mathbf {x}\). Quantile regression therefore gives a fuller picture of the data, not just around the mean as in OLS regression. The conditional mean function estimated by least squares can thus be seen as an incomplete picture of the relationship between the response and the explanatory variables when the estimates vary across different parts of the dependent variable’s conditional distribution.
The quantile regression estimator \({\hat{\beta }}_q\) minimizes the objective function (Eq. 18) over \(\beta _q\)
$$\begin{aligned} Q_N(\beta _q) = \sum _{i:y_{i} \ge \mathbf {x'_{i}} \beta _q}^{N} q|y_i-\mathbf {x'_{i}} \beta _q| + \sum _{i:y_{i} < \mathbf {x'_{i}} \beta _q}^{N} (1-q)|y_i - \mathbf {x'_{i}} \beta _q|, \end{aligned}$$
(18)
where the fitted value \({\hat{y}}\) is linear in \(\mathbf {x}\), so the residual is \(e_i = y_i - \mathbf {x'_{i}} \beta _q\). From Eq. 18 it can be seen that the estimates of \(\beta _q\) differ with the choice of quantile, q. The special case \(q=\frac{1}{2}\) is the median regression estimator, also known as the least absolute deviations (LAD) estimator.
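The objective in Eq. 18 is straightforward to evaluate directly. The sketch below is a minimal illustration with hypothetical data and an intercept-only model; it is not the estimation routine used in the study. The function name `check_loss` is an assumption.

```python
def check_loss(beta, q, X, y):
    """Quantile regression objective of Eq. 18 for coefficient vector beta."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        resid = y_i - sum(b * v for b, v in zip(beta, x_i))
        if resid >= 0:
            total += q * resid         # weight q on non-negative residuals
        else:
            total += (q - 1) * resid   # equals (1 - q) * |resid| for negative residuals
    return total


# Hypothetical positive expenditures with one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
X = [(1.0,)] * len(y)  # intercept only

loss_at_median = check_loss((3.0,), 0.5, X, y)   # intercept set to the sample median
loss_at_mean = check_loss((22.0,), 0.5, X, y)    # intercept set to the sample mean
```

With q = 0.5 the objective is half the sum of absolute residuals, so the sample median gives a lower value than the sample mean here, which reflects the robustness to outliers noted above.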
Quantile Regression Results
The quantile regression results are presented in Fig. 1, where the values on the X-axis show the quantiles of gambling expenditure distribution and on the Y-axis are the coefficient values. The black dots and lines are the estimated quantile coefficients and the gray shadowed area is the 95% confidence interval of quantile estimates. The constant line is the OLS estimate and the dashed lines are the 95% confidence interval for the OLS estimate.
The results show that most of the variables have a fairly constant effect over the conditional distribution of gambling expenditures. However, the quantile regression coefficients of income differ statistically significantly from the OLS estimate in both tails of the conditional expenditure distribution. In the lower tail of the conditional gambling expenditure distribution (1st decile), income does not contribute at all to gambling expenditures. In contrast, in the right tail, at the 9th decile, a 1% increase in income is associated with a more than 1% increase in gambling expenditures, although the effect also seems to dissipate more rapidly in the highest decile. The quantile estimate of male gender differs statistically significantly from the OLS estimate at the 8th decile. However, the estimates below the median are smaller than the OLS estimate, while those above the median are larger.
Moreover, the quantile regression estimates of age do not differ statistically significantly from the OLS estimate. However, Fig. 1 also shows that the quantile estimates of age are higher in the tails of the conditional distribution, so age contributes less to gambling expenditures at the median than in the lowest or highest deciles of the conditional distribution. In addition, as was the case with income, the effect of age also dissipates at a faster rate in both tails of the conditional expenditure distribution. The quantile estimates of being married are smaller than the OLS estimate in all deciles except the highest, the 9th. For belonging to the Lutheran church, the quantile estimates do not differ from the OLS estimate, suggesting a fairly uniform effect at different levels of gambling expenditure.
Furthermore, the quantile estimates of being unemployed increase along the conditional distribution of gambling expenditures, but none of the quantile estimates differs statistically significantly from the average effect at any point. Having a university degree and being retired have an almost constant effect over the whole distribution, as the average effect indicates. Receiving sickness benefits decreases the expected gambling expenditures and differs statistically significantly from the OLS estimate at the 3rd decile. The estimates increase from there on, but do not differ from the OLS estimate. Living in a rural area contributes less to gambling expenditures in the lower part of the conditional distribution than in the right tail, but none of the quantile estimates differs statistically significantly from the OLS estimate.
Analysis of the Distribution of Gambling-Tax Based Contributions
Having studied how the socio-demographic background factors are associated with gambling expenditures, the next task is to analyse how these compare to the allocation of the gambling-tax based contributions. Thus, we analyse who are the most probable beneficiaries of the public spending of these gambling-based tax revenues. By comparing the two sets of estimation results, the demographic incidence of gambling and the distribution of gambling-based contributions, inferences about the tax incidence of gambling can be made. In other words, we examine who are the “winners” and who are the “losers” of the gambling taxation system in Finland.
To analyse the distribution of gambling-tax based benefits, the following OLS regression is estimated
$$\begin{aligned} log(Benefit_{i}) = \alpha + \mathbf {{x'}}_{i}\beta + \epsilon _{i}, \end{aligned}$$
(19)
where \(Benefit_{i}\) is the level of benefits per capita in individual i’s home region, \(\alpha\) is a constant, \(\mathbf {x'_{i}}\) is the vector containing the individual’s background variables as before and \(\epsilon _{i}\) is the error term. The estimates of \(\beta\) tell how much gambling benefits individuals with certain background characteristics are expected to receive at the county level, keeping other individual characteristics constant.
Table 5 presents the OLS estimates of Eq. 19 for the distribution of the contributions sourced from gambling expenditures. The results reveal that on average the expected benefits decrease with income; a 1% increase in income is associated with a 0.1% decrease in expected benefits. In addition, the effect dissipates as the individual’s income rises, turning positive already at a relatively low level of income. Thus, individuals with lower income are expected to have proportionally less gambling-tax based contributions in their home region than individuals with higher income.
Furthermore, the results show that individuals who are married, belong to the Lutheran church and live in a rural area are expected to receive 2.2%, 5.5% and 9.3% less benefits respectively. In contrast, individuals with a university degree are expected to receive on average 3.4% more benefits in their home region than individuals with no degree.
Table 5 Estimation results for the distribution of gambling-tax based contributions