1 Introduction

Unemployment is a global issue that every nation strives to keep at a minimum level. According to the International Labour Organization report on Global Employment Trends, about 202 million people around the world were unemployed in 2013, in the aftermath of the 2008 global financial crisis. However, the impact of unemployment in developing countries is getting worse, mainly because of the imbalance between the rate of economic development and rapid population growth [1].

The gender gap in unemployment prior to the crisis, from 2002 to 2007, was constant on average at 0.5 percentage points, with female unemployment in 2007 at 5.8%, higher than male unemployment, which stood at 5.3%. By 2011 the crisis had raised this gap to 0.7 percentage points, with women’s unemployment reaching 6.4% and men’s unemployment plateauing at about 5.7%. No significant reduction in the unemployment gap was expected even by 2017 [2].

Women have always faced a number of disadvantages in the labour market; throughout the world, gender gaps in the workplace have not narrowed substantially [3]. According to the assessment in [4], many employers expressed a preference for male workers on the grounds that women were seen to have a weaker attachment to the labour market, with higher rates of absenteeism and turnover.

In developing countries, six out of ten women work in the informal economy, often as self-employed workers. Many of these women are domestic workers or informal factory workers, while others are unpaid workers in family enterprises and on family farms. Agriculture is the primary sector of women’s employment: women constitute 41% of total employment in the agricultural sector. The regions with the highest proportion of women in the agricultural sector are East and South East Asia, the Middle East and Sub-Saharan Africa [3].

As in other developing countries, the labour market in Ethiopia is typically characterized by huge inefficiency and underdevelopment [5]. In urban Ethiopia the largest share of the employed population is self-employed (37.6%), followed by those employed by government (22.0%) and by private organizations (19.3%). Paid employees altogether constitute about 50.0% of the total working population [6]. The Ethiopian Time Use Survey [7], which tracked the percentage of males and females above the age of 10 participating in various economic activities, also found that urban women (31%) were more likely to be unemployed than urban men (21%).

Most studies on urban women’s unemployment in Ethiopia were limited to reporting descriptive results, explaining the prevalence of unemployment with corresponding socio-economic and demographic variables [6]. However, the major causes of women’s unemployment need to be identified in order to quantify the importance of the different categories more precisely than descriptive analysis allows. Therefore, this research applies the Bayesian approach to statistical inference, using a logistic regression model on women’s employment status data as in [8, 9], and examines whether the parameter estimates obtained under the Bayesian and maximum likelihood approaches are similar.

2 Materials and Methods

This research was conducted in Harari regional state, which is located in the eastern part of Ethiopia. Harari people’s regional state is divided into six urban and three rural administrative districts. The settlement pattern of the region differs from that of other regions of the country, in that 62% of the population reside in urban areas. Harar, the capital city of Harari people’s regional state, is located in the east at a distance of 510 km from the country’s capital, Addis Ababa [10].

The study includes women who reside in conventional households, that is, all women who lived in the urban districts at the time of the survey by virtue of usual residence, excluding visitors, homeless women and women residing in collective quarters. Stratified sampling was employed as the sampling design, with households as the sampling units. The sampling frame was divided into the six urban administrative districts, namely Hakim, Amir nur, Abadir, Shenkor, Awbeker and Jenela. Data collection took place between June and July 2015, with households selected within strata (urban districts) using a systematic sampling technique.

2.1 Variables Considered in the Study

The response or dependent variable used in the study was the employment status of women (Y). This variable is dichotomous and the classification was based on the ILO’s definition [11], under which women who are simultaneously “without work”, “currently available for work” and “seeking work” are considered unemployed. In this paper, however, a woman is classified as unemployed if she was not working for pay during the period of the survey, and as employed otherwise. Thus, the outcome for the ith woman is represented by a random variable Yi with two possible values coded as 1 (unemployed) and 0 (employed). The independent variables included in this paper were obtained from literature reviews and are those assumed to be factors associated with the employment status of women.

2.2 Inferential Statistics

2.2.1 Bayesian Logistic Regression

The basis for Bayesian inference is Bayes’ theorem, and Bayesian analysis has important theoretical advantages over classical inference [12]. The Bayesian approach treats unknown model parameters very differently: uncertainty about the unknown parameters is quantified using probability distributions [13], so that the parameters are considered random variables. Bayesian inference is based on the posterior distribution of these unknown parameters, which combines the likelihood function of the sample data with the prior distribution of the parameters in the model. Thus, the Bayesian model is generally defined as:

$$ \pi (\theta |y) = \frac{f(y|\theta )\pi (\theta )}{f(y)} $$
(1)

where \( \pi (\theta |y) \) is the posterior distribution of the parameter \( \theta \), \( f(y|\theta ) \) is the likelihood function of the sample data given \( \theta \), \( \pi (\theta ) \) is the prior distribution of \( \theta \), and the denominator \( f(y) = \int f(y|\theta )\pi (\theta )\,d\theta \) is the marginal distribution of the sample data y. The marginal distribution of y normalizes the posterior, guaranteeing that \( \pi (\theta |y) \) is a proper distribution that integrates to one. Because f(y) does not depend on \( \theta \), it can be treated as a constant, and Bayes’ theorem leads to a posterior density function written as:

$$ \pi (\theta |y) \propto f(y|\theta )\,\pi (\theta ) $$
(2)
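
As a minimal illustration of (1) and (2), the sketch below approximates a posterior on a grid for a single Bernoulli success probability. The flat prior, the grid size and the function name `grid_posterior` are illustrative assumptions and are not part of the model used in this study.

```python
def grid_posterior(successes, trials, grid_size=101):
    """Approximate pi(theta | y) on an evenly spaced grid of theta values in [0, 1]."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Flat prior pi(theta): an illustrative assumption, not the study's prior.
    prior = [1.0] * grid_size
    # Bernoulli likelihood f(y | theta) for the observed successes and failures.
    likelihood = [t ** successes * (1.0 - t) ** (trials - successes) for t in thetas]
    # Numerator of Bayes' theorem: likelihood times prior, as in (2).
    unnormalized = [lik * pri for lik, pri in zip(likelihood, prior)]
    # Discrete stand-in for the marginal f(y), the denominator in (1).
    evidence = sum(unnormalized)
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior


# Hypothetical data: 7 successes in 10 trials.
thetas, posterior = grid_posterior(7, 10)
```

Dividing by `evidence` plays the role of f(y): it rescales the product of likelihood and prior so that the grid probabilities sum to one, without changing where the posterior peaks.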

A Bayesian logistic regression procedure was used to make inference about the parameters of a logistic regression model. The purpose of this method is to generate the posterior distribution of the unknown parameters given both the data and a prior density for the unknown parameters. Bayesian statistics provides a much more complete picture of the uncertainty in the estimation of the unknown parameters, especially after the confounding effects of nuisance parameters are removed [14, 15].

2.2.2 Likelihood Function

The likelihood function in the Bayesian approach is the same as in the frequentist scheme. The joint distribution of n independent Bernoulli trials is the product of the individual Bernoulli densities; when the trials are also identically distributed, their sum has a Binomial distribution.

In logistic regression the probability of success varies from case to case, depending on the corresponding covariates. Considering y1, y2, …, yn as independent Bernoulli trials with corresponding probabilities of success πl for l = 1, 2, …, n, the likelihood function can be written as the product of n Bernoulli densities:

$$ L(\beta |y) = \prod\limits_{l = 1}^{n} {\left[ {\pi_{l}^{{y_{l} }} (1 - \pi_{l} )^{{(1 - y_{l} )}} } \right]} $$
(3)

Since the subjects are assumed to be independent of each other, substituting the logistic form of πl gives the likelihood function over the data set:

$$ L(\beta |y) = \prod\limits_{l = 1}^{n} \left( \frac{e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}}{1 + e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}} \right)^{y_{l}} \left( 1 - \frac{e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}}{1 + e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}} \right)^{(1 - y_{l})} $$

where \( \pi_{l} = \frac{e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}}{1 + e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}} \) represents the probability of success in the logistic regression model for the lth subject with covariate vector Xl.
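
The likelihood in (3) is usually evaluated on the log scale to avoid numerical underflow. The following is a minimal sketch under that assumption; the function names and the tiny data set are hypothetical and serve only to show the calculation.

```python
import math


def success_probability(beta, x):
    """pi_l = exp(eta) / (1 + exp(eta)), with eta = beta_0 + beta_1*x_1 + ... + beta_K*x_K."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    # Algebraically identical to exp(eta) / (1 + exp(eta)).
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(beta, X, y):
    """Logarithm of the product of n Bernoulli densities in (3)."""
    total = 0.0
    for x_l, y_l in zip(X, y):
        p = success_probability(beta, x_l)
        total += y_l * math.log(p) + (1 - y_l) * math.log(1.0 - p)
    return total


# Hypothetical data: one covariate, three subjects (1 = unemployed, 0 = employed).
X = [[0.0], [1.0], [2.0]]
y = [0, 1, 1]
value = log_likelihood([0.0, 0.0], X, y)  # all coefficients zero, so pi_l = 0.5
```

With all coefficients equal to zero every subject has success probability 0.5, so the log-likelihood reduces to n times log(0.5), which is a convenient sanity check.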

2.2.3 Prior Distribution

The main difference between the classical and the Bayesian framework is the introduction of prior information in the form of probability distributions [16]. To obtain the posterior distribution of the parameters in the model, and to make inference about them, a prior distribution is included in the model.

When prior information about β is available, it should be incorporated in the model. However, since conjugate priors were introduced, most applied Bayesian modeling has used vague (non-informative) priors [17]. Thus, by considering a vague normal prior with very low precision (high variance) for each regression coefficient, the model parameters can be estimated using WinBUGS software [18]. In this study a normally distributed non-informative prior was used, which is a common prior for logistic regression coefficients (βi). The prior distribution of βi is given as:

$$ \pi (\beta_{i} ) = \frac{1}{\sqrt{2\pi \sigma_{i}^{2}}}\exp \left\{ \frac{-1}{2}\left( \frac{\beta_{i} - \mu_{i}}{\sigma_{i}} \right)^{2} \right\},\;{\text{where}}\;i = 0,1, \ldots ,k $$
(4)
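
To show what "very low precision" means in practice, the sketch below evaluates the logarithm of the normal prior in (4) when it is parameterised by a precision, that is, the reciprocal of the variance. The hyperparameter values (mean 0 and precision 10⁻⁶) are assumptions chosen for illustration; the values used in the study are not stated here.

```python
import math


def log_normal_prior(beta_i, mean=0.0, precision=1.0e-6):
    """Log density of a normal prior written in terms of precision = 1 / sigma_i^2.

    The default mean and precision are illustrative assumptions only.
    """
    variance = 1.0 / precision
    return -0.5 * math.log(2.0 * math.pi * variance) - 0.5 * precision * (beta_i - mean) ** 2


# The prior is almost flat over plausible coefficient values:
drop = log_normal_prior(0.0) - log_normal_prior(5.0)
```

Under these assumed values the log density at a coefficient of 5 differs from the log density at 0 by only 0.5 × 10⁻⁶ × 25 = 1.25 × 10⁻⁵, so the prior adds essentially no information relative to the likelihood.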

2.2.4 Posterior Distribution

Once the prior distribution of the parameters and the likelihood function are specified, we need to model the posterior distribution of each parameter. The posterior distribution is proportional to the product of the prior distribution of the parameters and the likelihood function [19]. Thus, the posterior distribution given in (2) can be represented as follows:

$$ \begin{aligned} & \pi (\beta |y) \propto L(\beta |y)\,\pi (\beta ) \\ & \quad = \;\prod\limits_{l = 1}^{n} \left( \frac{e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}}{1 + e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}} \right)^{y_{l}} \left( 1 - \frac{e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}}{1 + e^{\beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K}}} \right)^{(1 - y_{l})} \\ & \qquad \times \;\prod\limits_{i = 0}^{k} \frac{1}{\sqrt{2\pi \sigma_{i}^{2}}} \exp \left\{ \frac{-1}{2}\left( \frac{\beta_{i} - \mu_{i}}{\sigma_{i}} \right)^{2} \right\} \\ \end{aligned} $$
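
For computation it can help to take logarithms of the expression above. Writing \( \eta_{l} \) as shorthand (introduced here, not in the original notation) for the linear predictor \( \beta_{o} + \beta_{1} x_{1} + \cdots + \beta_{K} x_{K} \) of the lth subject, and using \( \log \pi_{l} = \eta_{l} - \log (1 + e^{\eta_{l}}) \) and \( \log (1 - \pi_{l}) = - \log (1 + e^{\eta_{l}}) \), the log posterior becomes, up to an additive constant C that does not depend on β:

$$ \log \pi (\beta |y) = \sum\limits_{l = 1}^{n} \left[ y_{l} \eta_{l} - \log \left( 1 + e^{\eta_{l}} \right) \right] - \sum\limits_{i = 0}^{k} \frac{(\beta_{i} - \mu_{i})^{2}}{2\sigma_{i}^{2}} + C $$

The first sum is the log-likelihood and the second is the contribution of the normal priors; when the prior variances are very large, the second sum is close to zero and the posterior is dominated by the likelihood.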

2.2.5 Gibbs Sampling

Computing the posterior distribution of the coefficients analytically can be intractable; to avoid this complexity, it is advisable to use simulation techniques rather than direct integration [20].

Gibbs sampling is used to compute the posterior distribution of the model parameters Θ = (θ0, …, θk). The sequence of draws generated by applying Gibbs sampling to the model converges to the joint posterior distribution of the parameters [21]. Gibbs sampling allows us to sample from a multivariate distribution using full conditional distributions. A full conditional distribution is the conditional distribution of a parameter given all of the other parameters in the model and the data, that is

$$ {\text{P}}(\uptheta_{0} |\uptheta_{1} , \ldots ,\uptheta_{\text{k}} ,data),\;{\text{P}}(\uptheta_{1} |\uptheta_{0} ,\uptheta_{2} , \ldots ,\uptheta_{\text{k}} ,data), \ldots ,{\text{P}}(\uptheta_{\text{k}} |\uptheta_{0} , \ldots ,\uptheta_{{\text{k}} - 1} ,data) . $$

The Gibbs sampling algorithm for parameters \( \uptheta_{j} \;(j = 0,1, \ldots ,{\text{k}}) \) is implemented by sampling from the full conditional distributions according to the following steps [22].

  1. Initialize the iteration counter of the chain i = 1 and set initial values

    $$ \uptheta^{(0)} = \{\uptheta_{0}^{(0)} , \ldots ,\uptheta_{\text{k}}^{(0)} \} $$
  2. Obtain a new value \( \uptheta^{(i)} = \{\uptheta_{0}^{(i)} , \ldots ,\uptheta_{\text{k}}^{(i)} \} \) from θ(i−1) through successive generation of values

    $$ \begin{aligned} &\uptheta_{0}^{(i)} \;from\;{\text{P}}(\uptheta_{0} |\uptheta_{1}^{(i - 1)} , \ldots ,\uptheta_{\text{k}}^{(i - 1)} ,data) \\ &\uptheta_{1}^{(i)} \;from\;{\text{P}}(\uptheta_{1} |\uptheta_{0}^{(i)} ,\uptheta_{2}^{(i - 1)} , \ldots ,\uptheta_{\text{k}}^{(i - 1)} ,data) \\ &\uptheta_{2}^{(i)} \;from\;{\text{P}}(\uptheta_{2} |\uptheta_{0}^{(i)} ,\uptheta_{1}^{(i)} ,\uptheta_{3}^{(i - 1)} , \ldots ,\uptheta_{\text{k}}^{(i - 1)} ,data) \\ & \ldots \\ &\uptheta_{\text{k}}^{(i)} \;from\;{\text{P}}(\uptheta_{\text{k}} |\uptheta_{0}^{(i)} ,\uptheta_{1}^{(i)} , \ldots ,\uptheta_{{\text{k}} - 1}^{(i)} ,data) \\ \end{aligned} $$
  3. Change i to i + 1 and return to step 2 until convergence is reached.
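
The three steps can be sketched in a few lines of Python. Because the full conditionals of a logistic regression model do not have a simple closed form, the sketch uses a toy target instead: a standard bivariate normal with correlation `rho`, whose full conditionals are themselves normal. The target, the function name and the chain length are illustrative assumptions and do not reproduce the WinBUGS implementation used in this study.

```python
import random


def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho (toy example)."""
    rng = random.Random(seed)
    # Step 1: set the initial values of the chain.
    theta0, theta1 = 0.0, 0.0
    # Each full conditional is normal with mean rho * (other value) and this std. deviation.
    conditional_sd = (1.0 - rho ** 2) ** 0.5
    draws = []
    for i in range(1, n_iter + 1):
        # Step 2: update each parameter from its full conditional,
        # always conditioning on the most recent value of the other one.
        theta0 = rng.gauss(rho * theta1, conditional_sd)
        theta1 = rng.gauss(rho * theta0, conditional_sd)
        # Keep only the draws generated after the burn-in period.
        if i > burn_in:
            draws.append((theta0, theta1))
        # Step 3: the loop moves on to iteration i + 1.
    return draws


# Hypothetical run: 25,000 iterations with the first 10,000 discarded as burn-in.
draws = gibbs_bivariate_normal(rho=0.6, n_iter=25000, burn_in=10000, seed=1)
```

The retained draws can then be summarised (mean, standard deviation, quantiles) exactly as posterior samples of regression coefficients would be.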

3 Results and Discussion

3.1 Descriptive Statistics

The data gathered from 274 sampled households indicate that 144 (52.6%) of the women were unemployed, that is, they were not involved in any income-earning activity during the data collection. The household information summarized in Table 1 shows that the proportions of women in the age categories 15–24 years, 25–34 years, 35–44 years and ≥ 45 years were 30.7%, 33.6%, 22.6% and 13.1%, respectively.

Table 1 Demographic and socio-economic characteristics of women at Harari region urban districts

The distribution of women by educational level shows that 42 (15.3%) were illiterate, 87 (31.8%) had primary, 89 (32.5%) secondary and 56 (20.4%) college or higher education. Similarly, the percentages of women with family size ≤ 4, 5–7 and 8 and above were 29.6%, 54.4% and 16.1%, respectively. Regarding women’s role, about 30.3% of women were household heads, and 51.5% of the women were addicted to tobacco and the stimulant plant khat, known locally as chat. To examine the association between women’s employment status and the explanatory variables, a Chi-square test was conducted. As the test results in Table 1 reveal, with the exception of household headship and exposure to media, most of the variables had a statistically significant association with the dependent variable (p < 0.05).

3.2 Bayesian Estimation for Logistic Regression Model

Bayesian analysis was adopted to make inference about the parameters of a logistic regression model applied to women’s employment status in the urban districts of Harari region, Ethiopia. The Bayesian method gives estimates of parameters by sampling them from their posterior distributions using Markov Chain Monte Carlo (MCMC) techniques. The model parameters in this study were computed by MCMC, specifically the Gibbs sampler algorithm, using WinBUGS software [23]. The Gibbs sampler was run for 25,000 iterations in two different chains; the first 10,000 burn-in iterations were discarded, so that the remaining 15,000 iterations were sampled from the posterior distribution. To check that the sample was truly representative of the stationary (posterior) distribution, various diagnostics were applied to assess the convergence of the Markov chains to the target distribution.

3.3 Model Assessment

Before making inferences and predictions from the posterior distribution of the parameters in the model, it is essential to conduct some diagnostics to assess whether the Markov chain has converged to its stationary (posterior) distribution. Running the Gibbs sampler with two simultaneous chains provides numerical and graphical summaries of the estimated univariate marginal posterior distributions of the requested model quantities, which are used to check convergence [24,25,26]. A time series plot is one method of assessing the convergence of the Markov chain to its posterior distribution. The values on the Y-axis are the sampled parameter values and the values on the X-axis are the iteration numbers at which they were drawn (Fig. 1). The time series plots indicate that convergence is achieved, since the two separately generated chains mix together [24].

Fig. 1 Time series plots for convergence of coefficients for child less than 5 years old (beta[18]) and illiterate (beta[8])

The autocorrelation plots show that the chain of each parameter mixes well, with the autocorrelation vanishing before lag 40 (Fig. 2). This indicates that draws sufficiently far apart are approximately independent, which supports convergence of the model parameters to their target distributions [27].

Fig. 2 Autocorrelation plots for convergence of coefficients for child less than 5 years old (beta[18]) and illiterate (beta[8])
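
The quantity shown in an autocorrelation plot can be computed directly from a stored chain. The function below is a minimal sketch of the usual sample autocorrelation at a given lag; the function name and the short example chain are hypothetical.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a sequence of draws at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    # Total sum of squared deviations (the lag-0 term).
    denominator = sum((value - mean) ** 2 for value in chain)
    # Sum of products of deviations that are `lag` iterations apart.
    numerator = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return numerator / denominator


# Hypothetical, strongly trending chain used only to show the calculation.
example_chain = [1.0, 2.0, 3.0, 4.0, 5.0]
lag_one = autocorrelation(example_chain, 1)
```

For a well-mixing chain this value decays towards zero as the lag grows, which is the pattern described for Fig. 2.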

The Gelman–Rubin statistic is a graphical method used to check whether the Markov chain has converged to its stationary distribution. To apply the test, it is necessary to run two or more chains in parallel with different initial values. The test compares the variation within and between the chains. In the plots of the Gelman–Rubin statistic, the lower two lines represent the within-chain and between-chain variation, respectively, and the upper line is the ratio of the between-chain to the within-chain variation (Fig. 3). The lower two lines are stable and the upper line converges to 1, which implies that the chain has converged to its target distribution [25].

Fig. 3 Gelman–Rubin statistics for convergence of coefficients for child less than 5 years old (beta[18]) and illiterate (beta[8])
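
The comparison of within-chain and between-chain variation can also be expressed as a single number. The sketch below computes the classical variance-based potential scale reduction factor for chains of equal length; it is offered as an illustration of the idea and is not assumed to match exactly the quantity plotted by the software used in this study. The function name and example chains are hypothetical.

```python
def gelman_rubin(chains):
    """Variance-based potential scale reduction factor for m chains of equal length n."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [sum(chain) / n for chain in chains]
    grand_mean = sum(chain_means) / m
    # Between-chain variance B (scaled by the chain length n).
    between = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    # Within-chain variance W: the average of the per-chain sample variances.
    within = sum(
        sum((value - cm) ** 2 for value in chain) / (n - 1)
        for chain, cm in zip(chains, chain_means)
    ) / m
    # Pooled estimate of the target variance, then the ratio to W.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Two hypothetical chains stuck in different regions give a value far above 1.
separated = gelman_rubin([[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]])
```

Values close to 1 indicate that the chains agree with each other; values well above 1, as in the hypothetical example, indicate that the chains have not yet converged to a common distribution.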

A kernel density plot is another way of checking the convergence of the model parameters to the posterior distribution. The coefficients of all the predictors have unimodal densities (Fig. 4), which suggests that the simulated parameter values have converged to the target distribution [27].

Fig. 4 Kernel density plots for convergence of coefficients for child less than 5 years old (beta[18]) and illiterate (beta[8])

The convergence of the Markov chain was initially assessed visually using the plots above. Besides the graphical methods, the convergence of the chain to its posterior distribution can be checked using the numerical summaries of the estimated univariate marginal posterior distributions of the specified model parameters [26]. The simulation in the study was run until the Monte Carlo error for each parameter of interest was less than 5% of its corresponding posterior standard error, which justifies the convergence and accuracy of the posterior estimates [28].
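
One common way to estimate a Monte Carlo error from a stored chain is the method of batch means. The sketch below applies it and then checks the 5% rule described above; the number of batches, the function names and the use of batch means itself are assumptions made for illustration rather than a description of how the reported values were computed.

```python
import statistics


def batch_means_mc_error(chain, n_batches=50):
    """Monte Carlo error of the chain mean, estimated by the method of batch means."""
    batch_size = len(chain) // n_batches
    batch_means = [
        statistics.mean(chain[b * batch_size:(b + 1) * batch_size])
        for b in range(n_batches)
    ]
    # Standard deviation of the batch means divided by the square root of their number.
    return statistics.stdev(batch_means) / n_batches ** 0.5


def passes_five_percent_rule(chain, n_batches=50):
    """True when the Monte Carlo error is below 5% of the spread of the draws."""
    return batch_means_mc_error(chain, n_batches) < 0.05 * statistics.stdev(chain)


# Hypothetical chain that has clearly not settled: the two halves disagree.
drifting_chain = [0.0] * 5 + [2.0] * 5
drifting_ok = passes_five_percent_rule(drifting_chain, n_batches=2)
```

A chain whose batches disagree, like the hypothetical drifting chain, fails the rule, which signals that more iterations are needed before the posterior summaries can be trusted.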

The numerical summaries of the MCMC algorithm include the Monte Carlo (MC) error, the posterior mean, the standard error and a 95% credible interval for each parameter. For convenience in explaining the results, the parameter estimates are presented in Table 2 in terms of odds ratios, that is, the exponential of the estimates (Exp (Mean)).

Table 2 Posterior distribution parameter estimates for Bayesian logistic regression model

The Bayesian analysis results in Table 2 indicate that, among the explanatory variables considered in the model, Age, Education Level, Husband/partner occupation, Pregnancy, Marital Status, Family Size, Training and a Child less than 5 years old had a statistically significant (p < 0.05) effect on women’s employment status. Women in the age categories 15–24 and 25–34 were 14.348 and 6.297 times more likely to be unemployed than women in the age group 45 and above, respectively. Regarding education level, the odds of unemployment among illiterate women and women with primary education were 31.249 and 7.022 times higher than among women attending college or higher, respectively.

According to the study, pregnancy was one of the predictors found to be significantly associated with women’s employment status: women who were pregnant during the study were 26.024 times more likely to be unemployed than the reference category.

As the results reported in Table 2 indicate, women’s participation in the short-term training programs provided by government had a significant effect on women’s employment status. The odds of being unemployed among women who participated in training programs were 0.09778 times the odds among their counterparts who did not. Family size and a child less than 5 years old were also among the significant variables identified by the model. As shown in Table 2, women who live in households with four or fewer family members had 0.20784 times the odds of being unemployed compared with women in households of 8 and above.
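
Odds ratios below one are often easier to read as percentage reductions in the odds. For the two odds ratios quoted above, 0.09778 for training and 0.20784 for family size ≤ 4, the conversion is:

$$ (1 - 0.09778) \times 100\% \approx 90.2\% ,\qquad (1 - 0.20784) \times 100\% \approx 79.2\% $$

That is, the odds of unemployment were about 90.2% lower for women who participated in training and about 79.2% lower for women in households of four or fewer members, each relative to its reference category.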

3.4 Maximum Likelihood Estimation for Logistic Regression Model

As shown in Table 3, the Hosmer and Lemeshow test for the maximum likelihood approach has a significance value of 0.711, which is greater than 0.05 and indicates that the model fits the data adequately. In Table 3, the explanatory variables whose 95% confidence intervals for the maximum likelihood estimates do not contain one had a statistically significant effect on women’s employment status; that is, Pregnancy, Age, Education Level, Husband/partner occupation, Marital Status, Family Size, Training and a Child less than 5 years old were statistically significant (p < 0.05).

Table 3 Bayesian and maximum likelihood estimates for logistic regression model

4 Conclusions

This study was conducted to determine the factors associated with women’s employment status and to compare the results obtained under the Bayesian and maximum likelihood approaches. The descriptive analysis indicated that 144 (52.6%) of the women were unemployed, that is, they were not involved in any income-earning activity during the data collection. From the inferential statistics, Pregnancy, Age, Education Level, Husband/partner occupation, Marital Status, Family Size, Training and a Child less than 5 years old had a significant effect on the employment status of women. The maximum likelihood estimates of the logistic regression model parameters do not differ considerably from the Bayesian estimates, and the independent variables that were statistically significant under the maximum likelihood approach were also significant under the Bayesian method of estimation.