1 Introduction

Women’s labour market access is an important societal, political and economic issue. In fact, despite the consistent steady growth that took place in European countries since the beginning of the ’80 s, contemporary labour markets are still characterised by gender inequalities in access to employment. However, women’s disadvantage does not distribute evenly between countries, and, furthermore, even within countries there are strong territorial disparities that shape women’s capacity to access employment. Southern Europe is the region that shows the persistence of the strongest inequalities in accessing employment between genders. If we look at the EIGE inequality index in the domain of access to work, we see that three out of four countries that traditionally are included in the Southern European model are well below the European average (81.3) and countries like France (83.7) or Germany (84.2): Spain (80.2), Greece (72.7) and Italy (69.1). The only exception to this rule is Portugal, with 88.2. The EIGE indexFootnote 1 is a composite indicator proposed by the European Institute for Gender Equality, which allows a comparison between European countries in terms of gender equality. It includes several dimensions, although we have considered only the domain of access to work in this article. High values of this index denote more equal access to employment and good working conditions between men and women.

Women’s access to the labour market is complicated by cultural reasons [1], but also by the specific configuration that the welfare state assumed in the Southern European countries [2]. In particular, the Southern European model of social protection [3] has always promoted a male breadwinner model, in which women’s labour has been considered a secondary (and expendable) contribution to the main income that men procured to their household [4]. In this model, women are traditionally assigned to care and domestic role, while the state does not intervene to support the unpaid work provided by women [5]. Despite the unfavourable societal and political context that explains the lag in gender inequalities in Southern Europe, women’s individual characteristics also take a role. In fact, the level of education, presence of children and family configuration affect the capacity of women to access the labour market [6]; the recent crises [7, 8] increased the inequality within women, as women most likely lose their jobs when they were the low educated, with a migrant background or in the presence of young children. In this paper, we aim to study what are the individual characteristics that might affect women’s labour market participation and to what extent being a resident in one region or another can shape the relationship between gender and access to employment.

We consider data from the European Social Survey and focus on dataset round 9 (2018), the latest round available at the time of this research. This wave includes data among others on social conditions and indicators, inequality and social exclusion, elderly, youth, children, family life and marriage, with a special module on the timing of life, justice and fairness. The data come from interviews with respondents from Italy, Spain, Portugal and Cyprus, which are the only countries from South Europe in dataset round 9. We are interested in the probability that a respondent has been doing paid work in the last seven days (from the interview). The dataset collects further information, that we treat as covariates in a logistic regression model: gender, age, number of children, being an immigrant, education level, if the respondent has a partner, and if the partner has a job. We have also included a dummy variable for each country the respondents are from, splitting Spain and Italy into two parts.

We consider the Bayesian approach to the model since it automatically provided uncertainty quantification of the regression parameters. The posterior inference is computed through simulation methods easily obtained using probabilistic programming languages. The findings of the Bayesian logit model confirm that individual characteristics of women (e.g. education, migrant background) are mediated by household characteristics (e.g. having a partner, presence of children) in their influence on women’s access to the labour market. Moreover, women’s access to the labour market is unevenly distributed and depends on the different economic performances of regions.

The remainder of this paper is structured as follows. Section 2 describes all the difficulties for women to access the labour market in Southern Europe, while Sect. 3 introduces the dataset we have analyzed and a preliminary exploratory analysis. In Sect. 4 we present the Bayesian logit model we apply to the data. Section 5 presents the posterior inference and findings from this specific dataset. Section 6 concludes the paper with a discussion.

2 Women’s labour market access in Southern Europe

2.1 Women’s dilemma between paid and unpaid work

The role of family commitments in preventing women from accessing labour has always been debated from the feminist perspective on the gendered labour market. From one side, authors (as Williams [9]) argued that it is the role of the gendered unbalance in unpaid work (that is, care and domestic work) that prevented women from being active in the labour market as men do. Conversely, authors like Walby [10] sustained that the disadvantaged conditions of women’s in the labour market exacerbated women’s predominance in reproduction. The expression Work-Family role system indicates the interconnections of family, care policies and labour market regulation [11]; it indicates the strong degree of interdependence between the possibility that women have of taking up employment and the organisation of the family. If it is difficult to decide what comes first (gender roles or the labour market), exploring the relationships between the two is important.

This debate started already in the ’70 s, with the neoclassical theory of Becker [12] who sustained that the most rational choice for a household is to have one member specialised in paid activities and one member that specialised in unpaid activities necessary for reproduction. In his economic perspective, the most efficient equilibrium is given when the first were men and the second were women: men are usually paid more on labour markets, while women are naturally inclined to care, so they are more efficient in doing unpaid work. The fact that this household organization exacerbates gender inequality between men and women is not at all considered an issue in this theoretical perspective. A second important intervention was proposed by Hakim [13] with her theory of preferences. She focuses on analysing differences among women and puts forward two main ideal types of women confronting the Work-Family role system: the first group called the grateful slaves are women who prefer family over work commitment and the second one the self-made women have on the contrary a preference for work. She measures in a later work [14] that the two groups weigh 20% each, while about 60% of women have mixed preferences - that explains the variability in women’s labour market access in real labour markets.

Hakim’s intervention gave ground to fierce reactions, especially among feminist scholars. She was accused of blaming the victim [15], instead of pointing at the real constraints that women faced when trying to access the labour market: a gendered distribution of unpaid work, culturally grounded in the European societies [1] and a system of institutions that locked women into their role of mothers [16].

From a cultural perspective, gender norms prescribe the “correct”number of working hours for women: institutions are shaped over the gendered norms and reproduce inequalities that have their roots in how societies assign roles to genders [1]. From the institutional perspective, the reconciliation of work commitments outside of the home with family duties is strongly influenced by services available that effectively allow women to work. Women are thus structurally compelled to work less than men because labour markets are incompatible with unpaid work and the welfare state does not always offer adequate support for externalizing care outside families [15, 17].

Since family workloads are particularly significant when there are small children or elderly family members to look after, comparative studies reveal the importance of the availability of public childcare services and parental leave, if female employment is to be facilitated [5, 18, 19]. Many studies of the development of the welfare state from a gender perspective have shown, however, that the way in which childcare and elderly care becomes the responsibility of the State, varies enormously from one country to another [20]. In this regard, Naldini and Saraceno [21] utilise the concepts of defamilisation to describe the various degrees to which different welfare regimes pursue the redefinition of gender relations. To be more precise, the term defamilisation refers to the capillary presence of those services that release women in particular from the task of looking after dependant family members (e.g. young children), and thus increase their chances of pursuing a career which would otherwise be seriously threatened by the problem of reconciling work and family commitments. On the other hand, the term supported familialism means the existence of periods of leave, parental allowances and monetary transfers, which permit a person to interrupt work to look after a family member, thanks to the financial aid provided. In the absence of either type of policy, then familialism by default is the term used [5], which necessarily implies the family shouldering all responsibility for providing such care.

Geographically speaking, countries that are mora capable of reducing women’s commitment to unpaid work are Scandinavian and Northern countries: in Scandinavian, the externalisation occurs thanks to the involvement of the State as a provider of services; in countries such as the UK or Ireland, it occurs through private service paid on the market. The supported familialism characterised Continental Europe, with countries like France, Germany or Austria extensively supporting families with monetary transfers and services in their reproductive role. Lastly, Southern European countries are characterised by the lack of support for families, with women entrapped in the double shift of being workers and mothers [22].

2.2 The specificity of Southern Europe

Maurizio Ferrera proposed the Southern European Welfare model in 1996. The original formulation of the welfare regime theory by Esping Andersen [23] included Southern Europe in the regime of Conservative countries. However, scholars like Ferrera were unsatisfied with this classification, as these countries were distinguished for several characteristics from the Conservative group. In fact, even if it is true that labour market policies are organised on a Bismarckian model of assurance like France or Germany [24], Southern Europe has a limited intervention in social care and family policies that completely distinguish these countries from Continental ones [22, 25]. Although following the social investment turn in European policies increased attention to the childcare service has also been taking place in the South, the provisions of childcare lag behind Continental and Northern Europe [26]. It thus configures a situation in which there is a scarcity of publicly provided alternatives to family care, while the cash transfers to families with needy members are low.

The limited intervention of the state, defined by Chiara Saraceno the unsupported familialism [27], implies that the unpaid work is considered by the Work-Family role system as mainly a responsibility of families while the State intervene only when households are not able to comply their role. Consequently, countries in Southern Europe have limited provisions for care services, which jeopardises women’s involvement in labour markets as they are usually the main provider of care work within families. In addition, the model of family that is assumed by the family policy is a model in which men are the primary earner, while women’s income is a complement to men’s one [4]. The male breadwinner model is persistent and quite far from the dual earner-dual carer model that lies behind the process of defamilisation [21].

As such, empirical evidence has demonstrated the importance of household composition in determining the likelihood of women’s involvement in the labour market [7]. Together with education and the presence of children, having a partner becomes fundamental to predicting women’s access to the labour market [28]. As childcare services are provided mostly at the local level, services are uneven between and within countries in Southern Europe. The strong territorial disparities in the provisions of services depend on the economic capacity of the regions and, even if the direction of causality is unclear, is strongly associated with women’s access to the labour market [29]. Especially in bigger countries like Spain or Italy, geography shapes women’s labour market access as much as individual characteristics, such as education, while regional differences are reduced in smaller countries such as Cyprus or Portugal.

In conclusion, based on the theory that we have reviewed regarding the Work-Family role system in Southern Europe we might elaborate the following hypothesis in relation to the determinants of women’s access to the labour market:

  1. I

    We expect that individual characteristics of women (e.g. education, migrant background) are mediated by household characteristics (e.g. having a partner, presence of children) in their influence on women’s access to the labour market. Having a working partner and the presence of children reduce the likelihood that women work.

  2. II

    Women’s access to the labour market is unevenly distributed and depends on the different economic performances of regions. We assume that richer regions not only offer more employment opportunities but are also more capable of offering care services for working women.

3 Data description and exploratory analysis

In this section, we give a description of the data we focus on for later inference (in Sect. 5). In particular, Sect. 3.1 describes the dataset and variables used for analysis and Sect. 3.2 presents the exploratory data analysis which summarizes the main data characteristics by using statistical visualization tools.

3.1 Description of the dataset

The European Social Survey (ESS) is an academically driven cross-national survey that collects information in more than thirty European nations since its establishment in 2001. In this paper we focus on dataset round 9 (2018) provided by the ESS; data can be downloaded at https://www.europeansocialsurvey.org/data/round-index.html.

As for similar rounds, the dataset results from an hour-long face-to-face interview for respondents aged 15 and over, who are residents within private households from twenty-seven European countries. The interview includes questions on various core topics: social trust, political interest and participation, socio-political orientation, social exclusion, etc. This results in a large number (557) of covariates for each respondent. Note that we will not use the whole dataset for analysis, since there are irrelevant covariates with respect to the main research question. We have organized the dataset as follows.

  1. (a)

    We consider only those respondents aged between 25 and 55 years old since most people under 25 years old might be students and those over 55 might have retired yet or will be in a few years. This restriction is called prime age, and is adopted to focus on the most active stratum of population in the labour market. For instance, see [30].

  2. (b)

    We restrict the analysis to data from countries of Southern Europe, where the gender gap in economic participation and opportunity is larger than in North Europe [31]. However not all South European countries participated in the survey. We have data from Cyprus, Italy, Portugal and Spain. This will be the definition of South Europe in the rest of the paper. However, we have excluded respondents from two Spanish regions, the enclaves of Ciudad Autonoma de Ceuta and Ciudad Autonoma de Melilla, as their economic situation is very different from other Spanish regions.

With the aim of improving readability, some variable labels we use in our model are different from those in the original European Social Survey dataset. We have also transformed some of the variables in the original dataset with the same aim. See “Appendix C”. To sum up, we include the following variables in the dataset we focus on in this paper:

  • age: respondent’s age in years.

  • child: number of children (son, daughter, step or adopted or foster child) under 14 years old of the respondent and living in the same household. We consider only children below 14 years old in this variable since it is well-known that they need more attention from parents, as they develop competencies, interests, a sense of confidence and autonomy during these years [32].

  • region: region of the respondent’s country.

  • country: country of residence of the respondent. Instead of considering only the four countries in the dataset (Cyprus, Italy, Portugal, and Spain), we have considered two groups for Italy and Spain. Specifically, we have grouped Italian and Spanish regions into two levels, that is affluent and vulnerable, according to regional GDP per capita in 2018, since there exists a huge difference in terms of economic development between regions of these two countries [33]. More specifically, we calculated the median regional GDP per capita for Italy and Spain separately in 2018: the value equals 23,200 € for Spain and 31,700 € for Italy. The regions whose GDP per capita is lower than the median are considered vulnerable regions, otherwise affluent ones. See “Appendix A” for the associated classification. We use the covariate country in the rest of the paper, but the levels it assumes are affluent Italy, vulnerable Italy, affluent Spain, vulnerable Spain, Cyprus, and Portugal.

  • citizen: respondent’s citizenship status which can be autochthonous or immigrant. A respondent is autochthonous if he/she is born in the country and has citizenship of the country, otherwise, the person is classified as an immigrant. The variable citizen equals 1 if the respondent is autochthonous and 0 otherwise.

  • education: respondent’s highest education level with three ordered levels. Education = 1 means up to lower secondary education or less; Education = 2 corresponds to upper secondary education or post-secondary or short-cycle tertiary; Education = 3 indicates other higher education levels.

  • gender: respondent’s gender.

  • partner: status of respondent’s partner, a categorical covariate with 4 levels. A respondent may live without a partner (no partner) or with an unemployed partner (inactive partner) or with a partner who looks actively for a job (partner looking for job) or with a partner who has been doing paid work for the last seven days (working partner).

  • pdwrk: this variable is equal to 1 if the respondent performed at least one hour of paid work during the last seven days, 0 otherwise (definition by EurostatFootnote 2). This is the response variable in the model. In the rest of the paper, we denominate as probability of employment the probability that pdwrk is equal to 1.

The dataset downloaded from the ESS website is richer of information. First of all, according to the aim of the analysis, we selected only records with non-missing values of age and satisfying points (a) and (b), which resulted in a dataset of 2908 subjects. Based on experts’ knowledge, we have initially included information on religion, sexual orientation and personal opinions, in addition to the covariates listed above. Then subjects with missing values among the larger set of selected covariates were further excluded; this resulted in a dataset from 2488 subjects. However, since too many non-informative covariates may influence the quality of the statistical inference we produce under our model, we have fitted preliminary frequentist lasso regression [34, 35] in a rather black-box fashion. This method automatically selects a subset of covariates that are most significant for the response. In our case, information on religion, sexual orientation and personal opinions were discarded as being non-significant.

The final dataset we consider for the Bayesian statistical analysis in this paper has the following characteristics: the sample size is \(n=2488\) (this is the number of respondents in the dataset), 8 independent variables (covariates, itemized above), and one dependent variable (pdwrk). The empirical sizes of the respondents in the four countries are \(n_C=293\) for Cyprus, \(n_P=437\) for Portugal, \(n_{Iv}= 338\) for vulnerable Italian regions, \(n_{Ia}= 478\) for affluent Italian regions, \(n_{Sv}= 248\) for vulnerable Spanish regions and \(n_{Sa}= 271\) for affluent Spanish regions, with \(n_C+n_P+n_{Iv}+n_{Ia}+n_{Sv}+n_{Sa}=n\). We refer to this data as SE (South Europe) labour market dataset.

3.2 Exploratory data analysis

We start by reporting summary statistics for the response pdwrk and of the only numerical covariates, age and child. See Table 1.

Table 1 Empirical mean and standard deviation of the response, age and child

Figure 1 shows the empirical frequencies of the categorical variables introduced in the previous section. Note that the numeric variable child has been discretized with levels 0, 1, 2, 3+. Figure 1 shows that only 2.1% of respondents live with more than three children. As far as citizen is concerned, the dataset is clearly strongly unbalanced, since only 16% of respondents are immigrants. Other rounds of this data type, referring to years close to 2018, show the same unbalancedness. Differently from citizen, gender is balanced.

Fig. 1
figure 1

Empirical frequencies of the categorical covariates in the dataset

To build the model for the posterior analysis, we analyze potential relationships among the covariates. Figure 2 shows boxplots of age and child (the only numeric covariates in the sample) with respect to categorical covariates such as citizen, education, partenr, factorized by gender. From Fig. 2, we see that respondents with a high education level are younger, which might reflect the improvement of the education level in European countries during these last decades [36]. Moreover, the empirical distribution of age is shifted towards higher values for those respondents living with an inactive partner, especially for women; this could account for women with older partners who may have already retired. Respondents living without a partner and female respondents with an inactive partner have no child. Here we can observe a remarkable gender difference: the distribution of child is shifted toward higher values for men with an inactive partner than for women in the same condition.

Fig. 2
figure 2

Boxplots of age and child for each level of the categories of citizen, education and partner, factorized by gender

To understand the dependency between categorical variables (except for country and region), we have performed chi-squared tests of independence; the associated p values are reported in Fig. 3. Of course, as a rule of thumb, the smaller the p value of the test in Fig. 3, the stronger the evidence against independence and the darker the colour we have used in the figure.

Fig. 3
figure 3

p values of chi-squared tests between each pair of categorical variables. The smaller the p value, the stronger the evidence against independence and the darker the colour

It is clear that dependence is detected between partner on one side and each of child or citizen or education or gender on the other; see also the plots at the bottom row of Fig. 2. Similarly, there is empirical evidence of dependence between child and citizen. This is a well-known phenomenon, that immigrants tend to have a larger number of children [37].

Moreover, interactions between age and education, age and partner can be seen from Fig. 2.

4 Bayesian modelling of the impact of covariates

This section specifies the model we use in Sect. 5 for posterior inference on the SE labour market dataset, which is a Bayesian lasso regression model. We also include a brief description of the probabilistic programming language, Stan, that can build an appropriate Markov Chain Monte Carlo algorithm, to approximate the full posterior distribution.

Generalized linear models (GLM) were introduced by Nelder and Wedderburn [38] by extending ordinary regression models to non-normal response distributions. The logistic regression model, an example of GLMs, is widely used for modelling the relationship between the binary response variable and a set of explanatory variables. Since our response variables \(Y_1,Y_2,\ldots ,Y_n\) (an indicator of paid work for the survey respondents) are binary, we fit the following model with the logit link function:

$$\begin{aligned} Y_i \vert \pi _i {\mathop {\sim }\limits ^{\textrm{ind}}}Be(\pi _i), \ \ i=1,\ldots ,n \end{aligned}$$
(1)

where the logit link is given by

$$\begin{aligned} {{\,\textrm{logit}\,}}(\pi _{i}) =\varvec{x}_{i}^t\varvec{\beta }. \end{aligned}$$
(2)

In (2), \(\varvec{x}_i=(1, x_{i1},\ldots , x_{ip})\) is a \((p+1)\)-dimensional vector of covariates for \(i=1,\ldots ,n\), while \(\varvec{\beta }\) is a \((p+1)\)-dimensional coefficient vector. The model is completed with the following prior distribution:

$$\begin{aligned} \beta _j\vert \lambda&{\mathop {\sim }\limits ^{iid}} Laplace(0,1/\lambda ), \quad j=0,\dots ,p \end{aligned}$$
(3)
$$\begin{aligned} \lambda ^2&\sim Gamma(a_\lambda ,b_\lambda ) ,\quad a_\lambda>0, b_\lambda >0. \end{aligned}$$
(4)

This model represents a Bayesian counterpart of the lasso (least absolute shrinkage and selection operator) method introduced by Tibshirani [34]. The (frequentist) lasso method is a well-known regularization approach for generalized linear models. It shrinks some coefficients and sets others to 0, so automatically performs a subset selection by imposing a penalty term on the objective function. This penalty term is given by \(L_1\)-norm of the coefficient vector \(\varvec{\beta }\) multiplied by a parameter \(\lambda \ge 0\), which expresses the amount of shrinkage. The larger \(\lambda \), the larger the penalization on the magnitude of coefficients. The choice of \(\varvec{\beta }\) which minimizes the lasso penalty as described above corresponds to the Bayes posterior mode of \(\varvec{\beta }\) when the regression coefficients \(\beta _j\)’s are a priori distributed as in (3). As suggested by Park and Casella [35], one can specify a diffuse hyperprior distribution for \(\lambda \) like in (4). This prior allows performing the variable selection automatically. This means that some coefficients get shrunk to exactly zero as they are not significant for the model. Another popular alternative for sparse Bayesian regression coefficient estimation is the horseshoe prior [39], characterized by its robustness at handling unknown sparsity and large outlying signals. Piironen and Vehtari [40] point out its two shortcomings—the absence of consensus on the specification of the global shrinkage hyperparameter and the problem of regularization for large parameters—and propose a new way to solve these problems. Moreover, the horseshoe prior has shown comparable performance in different circumstances with the popular two-component mixture priors known as the spike-and-slab [39].

For our application, we select the variables in Table 2 as covariate \(\varvec{x}_i\) in (2). They have been selected according to two criteria: (i) based on the preliminary analysis described in Sect. 3.2, and (ii) as those variables that are the most significant according to the full Bayesian models, i.e., including all variables and their two-way and three-way combinations. Specifically, this means that we have run our model (1)–(4) including all the covariates described in Sect. 3.1 and all two-way and three-way combinations for few iterations, and have discarded all the two-way and three-way combinations that were clearly not significative. This results in all the combinations of variables listed in Table 2.

Table 2 Independent variables and their combinations included in our statistical model

Note that all numerical variables have been standardized and categorical variables were transformed into dummies by using the Python function dmatrix from the patsy package [41]. For example, the covariate gender is equal to 1 for a female respondent and 0 otherwise. Hence the corresponding regression parameter expresses the algebraic increment on the logit of the success probability in case the respondent is female. The reference value, represented by the intercept term, corresponds to the logit of the success probability of a native man coming from Cyprus, with education level 1 and without a partner.

All the inference in the Bayesian approach is based on the posterior distribution, i.e., the conditional distribution of \((\varvec{\beta },\lambda )\), given data \(y_1,\ldots ,y_n\) and covariates \(\varvec{x}_1,\ldots ,\varvec{x}_n\). In general, the posterior distribution is not available in closed analytic form; typical approximations are based on a class of simulation methods known as Markov Chain Monte Carlo (MCMC) [42]. However, MCMC methods can be extremely demanding from the computational point of view and the design of efficient MCMC algorithms is not a simple task. Thanks to the growth of computer processing power and the development of different accurate software products, and probabilistic programming languages, which can generate MCMC samples in a black-box fashion given the specification of a Bayesian model, Bayesian statistics has become more popular in recent years. Suitable probabilistic programming languages for MCMC methods are JAGS [43] and Stan [44, 45]. The default go-to software is Stan [46], which automatically generates samples from the target distribution, the posterior, using Hamiltonian Monte Carlo (HMC) and No-U-Turn Sampler (NUTS) that provide high-quality chain [47]. There is also an R interface, i.e. the package rstan [48], that we use to call the Stan sampler.

5 Posterior inference on the SE labour market dataset

In this section, we report posterior inference, such as posterior means and credibility intervals of parameters, for the model described in Sect. 4 and applied to our dataset presented in Sect. 3.1. In this work, we fix \(a_\lambda =1\) and \(b_\lambda =0.1\) in (4); see Sect. 5.1 below. We have run a single chain by Stan for the model described in Sect. 4. The first 5,000 iterations have been discarded as burn-in and the last 5,000 have been saved for analysis. First, Sect. 5.2 reports the inference, while in Sect. 5.3 we report associated findings and comments.

5.1 Hyperparameters and simulation settings

We have fixed hyperparameter \(a_\lambda = 1\) so that this implies that the prior for \(\lambda ^2\) is the exponential distribution, a rather flat prior, as asked in Park and Casella [35]. We computed standard goodness-of-fit indexes to fix hyperparameter \(b_\lambda \). Specifically, we computed log-pseudo marginal likelihood (LPML) and widely applicable information criterion (WAIC) for our model under \(b_\lambda =0.01, 0.1, 1\). We found that the optimal value is \(b_\lambda =0.1\).

We have also fitted likelihood (1)-(2) with the horseshoe prior [39] by the R-package rstanarm; see Piironen and Vehtari (2017) for more detail. Posterior inference in terms of significant variables and associated credibility intervals are very similar to the posterior inference of our model as reported in the next section for the Bayesian lasso prior (3)–(4).

As mentioned, we report here the output from a single chain. The debate on one single longer chain versus multiple shorter chains is old. For instance, Geyer [49] support the former approach, because of computationally efficiency. Here we have examined trace and autocorrelation plots of all parameters, indicating that convergence was achieved.

5.2 Posterior means and 95% credible intervals

Figures 4 and 5 report posterior means and posterior 95% credibility intervals (CI) of each regression parameter. See also Tables 5 and 6 (in “Appendix B”) which provide explicit numerical values of the posterior means and posterior 95% intervals of each parameter. As a selection criterion for covariates, we consider hard selection, that is if 0 is contained in the posterior 95% credibility intervals, then the corresponding covariate is not significative and can be removed from the model. We have signed in black all intervals which do not contain 0, while the rest are in light grey. However, to help the plots’ readability, we have also signed in black those intervals that do not overlap the zero significantly, i.e. we consider the corresponding covariate to be significant enough. Figure 4 refers to the estimates of regression parameters corresponding to the linear terms and the two-way interactions, while Fig. 5 concerns estimates of parameters of three-way interactions.

From the left panel of Fig. 4, we see that covariates such as child, region, country, education, gender and partner are significant to explain the success probability, that is the probability of employment. Note that, since the posterior distributions of the regression parameters associated with gender[female] and country[vulnerableIT] are concentrated on negative values, being a woman coming from vulnerable Italian regions decreases the probability of employment. This confirms that there is a difference between regions within Spain and Italy: the probability of being employed in vulnerable Italian regions is smaller, but it is higher in affluent Italian and affluent Spanish regions, given the same value of all other covariates. Living with an inactive partner or having children less than 14 years old increases the probability of employment. Living with an employed partner increases the success probability as the posterior distribution of the regression coefficient of partner[paid work] is concentrated on the positive values. However, this is not true for women: the negative values assumed by the regression parameter assumed when we include the interaction between being a female respondent and having a working partner, i.e., partner[paid work]\(\cdot \)gender[female], demonstrate the negative impact of having an employed partner on the employment probability for women; see the right panel of Fig. 4. The plot on the left of Fig. 4 shows that a higher education level increases the probability of being employed. Being an immigrant with a medium education level decreases the success probability as shown by the plot on the right, probably due to the problem of recognition of education obtained in other countries for immigrants.

From the right panel of Fig. 5, we observe other interesting interactions that involve the education level, such as education\(\cdot \)partner\(\cdot \)age and education\(\cdot \)partner\(\cdot \)child. A higher age can increase the probability of employment for respondents with education level 2 or 3 and without a partner. Having more children under 14 years old decreases the probability for respondents with education level 1 or 2 and living with an employed partner.

Fig. 4
figure 4

Marginal posterior credible intervals for regression parameters; the points represent posterior means, the thicker segments are 50% CIs and the thinner ones are 95% CIs. Black intervals denote that the corresponding covariate is significative, i.e., the intervals do not contain 0

Fig. 5
figure 5

Marginal posterior credible intervals for regression parameters; the points represent posterior means, the thicker segments are 50% CIs and the thinner ones are 95% CIs. Black intervals denote that the corresponding covariate is significative, i.e., the intervals do not contain 0

As a final remark in this section, we have fitted the likelihood (1)–(2) with the horseshoe prior using the R package rstanarm. Significative covariates do not change, and the credibility intervals of the regression parameters are very robust. For this reason, we do not report any further comments here.

Fig. 6
figure 6

Probability of been doing paid work in the last 7 days, as a function of age between 25 and 55 years old, with posterior 95% credible bands (dashed lines) for male and female reference respondents in different countries of Southern Europe

Fig. 7
figure 7

Probability of been doing paid work for the last 7 days, as a function of child between 0 and 6, with posterior 95% credible bands (dashed lines) for male and female reference respondents in different countries of Southern Europe

5.3 Profiling respondents

In this section we focus on respondents with particular characteristics and calculate the associated probability of employment, to better understand how those significant covariates (described in Sect. 5.2) influence labour participation, possibly across different countries. The resulting plots do not show anything new regarding the inference displayed in Figs. 4 and 5, but they may help understand the gender gap in Southern European labour market.

First, let us consider as reference respondents two interviewed people with the following characteristics: not immigrant, having 1 child, with education level 2 and living with an employed partner. We denote these subjects as female and male reference respondents. Figure 6 shows the posterior medians (continuous lines) of the success probability for each value of age for the male and female reference respondents across different countries, while the dashed lines are the posterior 95% credibility bands. It is evident that there is a significant difference between genders: the success probability for female reference respondents is always lower than that for male respondents. A larger discrepancy can be observed in the vulnerable regions rather than the affluent ones.

Another significant covariate is child and we show the success probabilities, as a function of this covariate, for two reference respondents with the following characteristics: not immigrant, 42 years old (empirical median of age), with education level 2 and living with an employed partner. Figure 7 shows the success probability with 95% posterior credible bands in dashed lines for the female and male reference respondents as a function of the number of children. As a result, as the number of children increases, women are more excluded by the labour market, especially in vulnerable Italian regions, while men are more likely to be employed. The gender differences become more evident with a higher number of children.

From Fig. 4 (right) we have seen that one of the significative two-way combinations is the interaction between citizen and education. Table 3 shows the posterior mean of the employment probability for a female/male reference respondent with different education levels, citizenship and countries, given that both are 42 years old, have 1 child and live with an employed partner. If we condition respondents coming from the same country, with the same citizenship status and the same education level, the success probability for any male reference respondent is always higher than that for the associated female respondent. While for the male reference respondents it is true that the higher the education level, the higher the probability of being employed, regardless of being immigrant or not, this is not true for female reference respondents. In fact, in vulnerable Italian or Spanish regions, female immigrants with education level 2 are more likely to be excluded from the labour market than those with education level 1. Moreover, the gender difference is particularly evident in vulnerable Italian regions since the employment probability for female respondents is much lower than that for male respondents.

Table 3 Posterior estimates of success probability for female and male reference respondents varying citizenship and education level

We now focus on significative three-way interactions pointed out by Fig. 5 such as the association between education, partner and age and between education, partner and child. To visualize how the first combination affects the probability of being employed, we have computed such probability as a function of age for reference respondents who are not immigrants, have 1 child and live in vulnerable Italian regions. The plots are reported in Fig. 8; in this case, the 95% posterior credible bands were not included in the figure to make the plots clearer. From Fig. 8, it is evident that a higher education level increases the employment probability. In particular, education can bring radical changes to women living with an employed partner: female reference respondents with an education level equal to 1 have always a lower probability of being employed than others in the same condition but with different partner’s status (the plot in the top right corner). However, if they are highly educated, i.e., with education level 3, they are more active in the labour market than those living without a partner or with an inactive partner (the plot in the bottom right corner). A larger gender discrepancy can be observed by comparing the success probability for male and female reference respondents with education level 1 (the plots on top): while male respondents with an employed partner are more active in the labour market with respect to other male respondents with different partner’s status, female respondents with an employed partner have lower success probability than others. We also observe that ageing is a helpful factor for single respondents, in particular for women with high education level: while the curve of the employment probability for single female reference respondents with education level 1 is almost flat as age varies, it is clearly increasing for women with education levels 2 or 3.

Fig. 8
figure 8

Probability of been doing paid work in the last 7 days for male and female reference respondents varying age, partner’s status and education level in vulnerable Italian regions

To display the interaction between education, partner and child, Fig. 9 reports the employment probability curves as a function of child for male and female respondents with any education level and any partner’s status from vulneerable Italian regions (not immigrant, 42 years old). While the trend of the curves remains similar for male reference respondents with different education levels or partner statuses, it changes significantly for female reference respondents. The probability of being employed decreases for women living with an employed partner as the number of children increases, especially if women have a low education level. There exists a great difference between female reference respondents with education level 2, but characterized by different partner’s status: in case their partners are inactive or looking for a job, they tend to become more active in the labour market with a higher number of children and their probability curves share similar tendencies with those of male reference respondents with education level 2, while they are more excluded by the labour market if living with an employed partner. Note also that the employment probability curves for women living with an employed partner and with a low education level (level 1 or 2) are quite similar, but they differ significantly from the curve for women highly educated, i.e., with education level 3.

Fig. 9
figure 9

Curve of the employment probability for male and female reference respondents varying number of children, partner’s status and education level in vulnerable Italian regions

We have repeated the same analysis for reference respondents living in other countries. However, the gender difference is more evident for respondents living in vulnerable Italian regions and this is the reason why we have focused on this case here.

6 Discussion and future work

In this article, we have analyzed the determinants of women’s access to the labour market in South Europe, focusing on Cyprus, Italy, Spain and Portugal. Our findings confirm that individual characteristics of women are mediated by household characteristics in their influence on women’s involvement in the labour market. On the one hand, education significantly impacts access to the labour market for both genders, especially for women; the gender gap narrows among highly educated people. Whilst ageing can be seen as a helpful factor for men being employed, this does not hold for women. On the other hand, the presence of children and having a partner, particularly having an employed partner, reduce the probability that women work.

The situation becomes more delicate and complicated when taking into account combinations of individual and household characteristics. The gender gap in terms of the access to labour market becomes negligible among highly educated people without partners. For these subjects, the probability of employment increases with ageing, regardless of gender. The presence of children has a significant negative impact on access to the labour market for women with low or medium education levels and having an employed partner. In these families, women usually provide family-related work, while men take the role of the breadwinner due to the limited provision of care services in Southern Europe.

Moreover, our findings confirm the uneven distribution of labour market access in South Europe, in accordance with the EIGE inequality index in the domain of access to work: among the Southern European countries included in our dataset, Portugal outperforms other countries in terms of access to the labour market. This uneven distribution is strongly dependent on the economic capacity of regions or countries; there are sharp differences between vulnerable and affluent regions, especially in Italy. Moreover, our analysis points out the persistence of gender inequality in South Europe, more evident in vulnerable Italian regions rather than in other regions. This can be traced back to the radical regional difference in Italy. An attempt to reduce the highlighted differences in the supply of early childcare services was made by the nationwide program, “Piano Straordinario per lo Sviluppo dei Servizi per la Prima Infanzia”. However, the program effectiveness was scarse in the Southern regions, and stronger in other regions [50].

We also note that subjects with an immigrant background, and a low education level are more likely to be employed than native-born subjects with low education. But the situation is reversed in case subjects received middle or high education. Indeed, the competencies acquired by immigrants in their country of origin may be not suitable for the labour market of the current country due to cultural or language barriers. Thus, access to highly qualified occupations for immigrants might be discouraged by such differences, while access to occupations that do not require specific skills, i.e. manual workers, is more straightforward. Moreover, the effect of education is not linear among immigrants: subjects with medium education levels are less active in the labour market than those with a low education level. Here one should consider the problem of recognition of education obtained in other countries and the mismatch between jobs and skills among immigrants [37].

In this work, we have assumed that the effect of being from a country (including the classification of Spain and Italy into two groups each) is fixed. In future work, we could include this as random effects in a generalised linear mixed-effects model, thus assuming a hierarchical model where groups might exchange a priori information as it is customary in these types of Bayesian models. See [51] for an approach of this type on a similar dataset.