1 Introduction

Policymakers tend to encourage entrepreneurial activity because it is viewed as a key driver of economic growth, job creation, and innovation. Consequently, they implement portfolios of policies to promote entrepreneurship/self-employment and to support small and medium firms as a solution to weak economic performance and deficient job creation. However, as the seminal work of Blanchflower (2004) pointed out, the level of self-employment itself does not guarantee economic growth. In fact, as Poschke (2013) noted, both developed and developing countries sometimes show the same self-employment rates despite having different growth patterns.

Among others, Shane (2009) and Congregado et al. (2010) warn that encouraging more people to become entrepreneurs does not necessarily lead to economic development. The strong negative cross-country association between self-employment and the level of income per capita in both less-developed and developing countries, together with the mixed evidence regarding the impact of entrepreneurship on growth at the macro level, suggests that the usual linkage between the size of the aggregate self-employment sector and economic growth is flawed, as the works of Pietrobelli et al. (2004), Wennekers et al. (2010), Arin et al. (2015), and Rodríguez-Santiago (2022) have found.

Without attempting to be exhaustive, Maloney (2004), Acs (2006), Poschke (2018, 2019), and Allub and Erosa (2019) pointed out that self-employment exhibits substantial heterogeneity and that cross-national differences could be behind this apparent puzzle, and they suggested examining the relationship between qualified self-employment (rather than the aggregate rate) and economic development. In this context, Stam (2015) and Stam and Van de Ven (2021) investigated the determinants of optimally productive entrepreneurship and the pillars of the entrepreneurial ecosystem, which are particularly important for devising an effective national competitiveness strategy.

This body of literature encompasses four main categories of research focusing on the determinants of self-employment rates at the macroeconomic level. Firstly, studies such as those by Acs et al. (1994) delve into the influence of macroeconomic factors like capital per worker and industry composition. Secondly, research by Pietrobelli et al. (2004), Arin et al. (2015), and Rodríguez-Santiago (2022) examines the impact of income per worker on self-employment rates, highlighting the adverse effects of macroeconomic instability on entrepreneurial activities. Thirdly, investigations by Blanchflower (2000), Centeno (2000), Robson (2003), and Torrini (2005) scrutinize labor market dynamics and regulations, including employment protection legislation, while others such as Fölster (2002), Anokhin and Schulze (2009), Djankov et al. (2010), Estrin et al. (2012), Belitski et al. (2016), and Dutta and Sobel (2016) focus on the role of corruption and taxation. Lastly, various articles, including those by Sobel (2008), Acs et al. (2008), Estrin et al. (2012), Bjørnskov and Foss (2016), and Urbano et al. (2020), explore the influence of institutions and institutional quality on entrepreneurship.

For these reasons, our paper follows this literature by focusing on the analysis of self-employment productivity (output per self-employed worker) with a twofold purpose. Our first purpose is to cluster countries worldwide to identify groups with some degree of similarity regarding their level and trend in self-employment productivity. With this classification, policymakers could examine cluster membership to determine whether their countries have performed on par with other countries in similar economic circumstances and to provide warning of unfulfilled expectations.

Our second purpose is to determine what drives country memberships and, as a result, what characteristics are shared by countries that make them similar to their cluster and different from other clusters in terms of self-employment productivity. This analysis may inform country policymakers about which policy variables have the greatest influence on country-level self-employment performance and which strategies would be appropriate to promote movements towards more productive clusters.

In our study, we use finite mixture models to analyze the varied landscape of self-employment productivity across different countries. These models offer numerous advantages over alternative approaches, particularly in their capacity to provide inference on individual classifications and overall clustering. The following paragraphs lay out a robust foundation for comprehending the chosen methodology.

To capture the worldwide heterogeneity in self-employment productivity, we rely on finite mixture models. These model-based classification methods exhibit several advantages over their alternatives, which classify according to similarities. First, the clustering in finite mixture models works on a statistical basis, which facilitates inference on the estimates for individual classifications and for the clustering as a whole. Second, in the context of finite mixture models, a number of statistical criteria have been developed to objectively assess the optimal number of clusters. Third, exogenous explanatory variables explaining cluster formation can be easily and explicitly incorporated in the clustering procedure. Thus, model-based classification facilitates designing guidelines for policy recommendations based on an analysis of outcomes.

In finite mixture models, each cluster is assumed to have its own density. In our approach, we assume that this density is determined by both the group-dependent level of self-employment productivity and its group-dependent trend, or long-term direction, over the sample period. Thus, the model characterizes the homogeneity within each cluster not only by the level of self-employment productivity, which could be viewed as either high, moderate, or low, but also by the intensity in the overall evolution of the data path.

In addition, the model defines the probability that a country belongs to a given group. In this case, we consider a logit-type structure that depends on the influence of unit-specific exogenous variables on cluster membership. According to the literature, we postulate that these exogenous variables characterizing the ex-ante likelihood of membership in a given cluster are of four types. First, following Acs (2006), who considers that improvements in information technologies such as telecommunications may increase the returns to entrepreneurship, we use the World Bank's Digital Adoption Index (DAI).

Second, following Fairlie and Fossen (2020) and Cowling and Wooden (2021), we also consider the country-specific labor market situation as measured by the unemployment rate to be an influencing factor on whether a country is affiliated with a certain cluster of self-employment productivity. Third, in line with Blanchflower and Shadforth (2007), who examine whether self-employment was stimulated in the United Kingdom through changes in the industrial sector, we include the relative weight of the industrial sector. Finally, we follow Centeno (2000) and Robson (2003), who examined the interaction of self-employment and labor market rigidity, to propose the Labor Market Rigidity Index (LAMRIG) as an additional exogenous determinant of cluster membership.

In the empirical analysis, we compile a new large internationally comparable database of 121 countries covering the period from 1991 to 2019 and use the finite mixture model to obtain the following results. First, our data-driven approach points to three distinct groups of countries. The first group characterizes the countries with the highest productivity level and the steepest productivity trend. The second group comprises countries with a medium level of productivity and a flatter trend than that in the first group. The third group characterizes the countries with the lowest productivity level and the least pronounced trend in self-employment productivity.

The main results and contributions of the paper concern three aspects: first, the geographical distribution of the groups; second, the disparities in self-employment productivity across these groups; and third, the pivotal elements influencing transitions between groups.

In accordance with the resulting geographical distribution of groups, we followed the categories of countries suggested by Porter et al. (2002) to categorize the groups according to their levels of competitiveness across the stages of economic development. The first cluster tends to include most innovation-driven economies with higher wages, levels of innovation, and associated standards of living. The second cluster includes countries in the efficiency-driven stage, which require the development of more efficient production processes and the ability to harness the benefits of existing technologies. Countries in the factor-driven stage predominate in the third cluster, which is composed of the least developed countries, where subsistence agriculture, extraction businesses, and unskilled labor are prevalent.

Second, despite the significant differences in the levels of self-employment productivity across these three groups, our results do not suggest that they will eventually converge. The reason is that the productivity of countries in the lower productive groups tends to grow over the sample period at a slower rate than that in the higher productive groups. Therefore, the trajectory of the least productive countries will not tend to catch up unless policymakers take measures to close the productivity gap. Thus, doing nothing is the best guarantee of failure in promoting the convergence process.

Third, our research identifies two key elements in the national entrepreneurial ecosystem that can enable less productive countries to reverse this tendency by determining the key factors that influence group membership. In line with the intersections of digital technologies and entrepreneurship that have been documented by Jafari-Sadeghi et al. (2021), our results show that designing a nuanced digital strategy with policies tailored to promote adoption and diffusion of digital technologies is especially important in facilitating the transition to the innovation-driven group.

In addition, we find that unemployment is a barrier to moving countries from the efficiency-driven to the innovation-driven group. In line with Thurik et al. (2008), who detected a dynamic interrelationship between self-employment and unemployment rates, we find that labor market dynamics are also related to self-employment productivity. We postulate that structural unemployment tends to favor the entry into self-employment of marginal entrepreneurs, who erode average productivity. Thus, active labor market policies that steer the less productive unemployed toward the search for salaried work, rather than toward self-employment, appear to be advisable as a catching-up strategy.

Interestingly, we find no evidence that industrialization intensity or the rigidity of employment protection legislation is a key element in transitioning to more productive groups. The former is in line with the findings of Acs and Naudé (2013), who do not see industrial policies as merely functional policies without consideration of firm or entrepreneurial specifics. The latter agrees with Robson (2003), Torrini (2005), and Kanniainen and Vesala (2005), who found that employment protection legislation restrictiveness had little impact on aggregate self-employment.

The rest of the paper is organized as follows. Section 2 describes the data, presents the model, and provides an overview of the key statistical elements needed to understand the empirical results. Section 3 presents and discusses the results. Section 4 concludes and provides policy implications and avenues for future research.

2 Data and methodology

2.1 Data description

In this paper, we focus on the productivity of self-employment as a proxy for the quality of entrepreneurship. In particular, we consider GDP per self-employed person, i.e., the output per self-employed worker, as our measure of productivity. To compare productivity levels across countries, GDP is converted to international dollars using purchasing power parity rates, which account for the differences in relative prices among countries.Footnote 1

Self-employed workers are those workers who, working on their own account, with one or a few partners or in a cooperative, hold the type of jobs defined as self-employment jobs. These data are taken from the International Labor Office (ILOSTAT) database. Self-employed workers include the following four subcategories: employers, own-account workers, members of producers' cooperatives, and contributing family workers.

The dataset of covariates used in the logistic prior to classify each country into a specific group comprises four variables meant to capture group-specific differences. The first structural variable captures the labor market situation by using the average unemployment rate provided by the ILOSTAT database. This is measured as the percentage of the total labor force that is without work but has been seeking work in a recent period and is currently available to work.

The second structural variable, which reflects the level of industrialization, is the average of industry value added as a percentage of GDP, covering ISIC divisions 05-43. These data are taken from World Bank national accounts and OECD National Accounts Statistics. The third structural variable is meant to capture the rigidity of employment protection legislation. For this purpose, we use the average of the Labor Market Rigidity Index (LAMRIG), as detailed by Campos and Nugent (2018).

Finally, we use a fourth structural variable to measure the level of digitalization. In particular, we use the Digital Adoption Index provided by the World Bank, which is a composite index measuring the spread of digital technologies in a country across three dimensions of the economy, namely, those of people, government, and business. To facilitate interpretation, the data have been normalized so that countries with values over 0 will be above the sample average, and vice versa.
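The normalization described above can be illustrated with a minimal sketch. The text only states that values above 0 lie above the sample average; dividing by the sample standard deviation is an additional assumption made here for illustration, and the function name `standardize` and the index values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Center values on the sample mean and scale by the sample standard deviation.

    Scaling by the standard deviation is an assumption; centering alone
    already places above-average countries above 0.
    """
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - m) / s for v in values]


# Hypothetical index values for five countries (not taken from the paper's data).
dai = [0.30, 0.45, 0.60, 0.75, 0.90]
z = standardize(dai)
for raw, score in zip(dai, z):
    print(f"raw={raw:.2f}  normalized={score:+.3f}")
```

After this transformation, a positive value identifies a country above the sample average and a negative value one below it.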

Estimation in finite mixture models requires handling balanced panels. Therefore, our effective dataset is composed of annual self-employment productivity and the four country-level covariates for a large set of 121 countries, spanning from 1991 to 2019. The list of countries, their codes, average GDP per self-employed worker for the period, and the values of the covariates can be found in Appendix Table 4.

2.2 Model specification

In this paper, we investigate pooling within a time series panel using a finite mixture of an unspecified number of separate distributions.Footnote 2 For this purpose, let \(y=\{{y}_{it}\}\), \(i=1,...,N; t=1,...,T\) be a panel of self-employment productivity, where \(i\) and \(t\) index countries and years, respectively. In addition, we assume that the time series arise from \(K\) hidden groups in such a way that all the time series within a certain group are characterized by the same econometric model and depend on the same set of parameters, which are heterogeneous across groups.

The approach used is based on formulating a time series model for each univariate time series \({y}_{i} =\left\{{y}_{i1},\dots ,{y}_{iT}\right\}\) in terms of the group-specific sampling density. For group \(k\), the density is \(p\left({y}_{i}|{\vartheta }_{k}\right)\), and the unknown group-specific parameters \(\vartheta =\left\{{\vartheta }_{1},\dots ,{\vartheta }_{K}\right\}\) take values in a parameter space \(\theta\). In this case, the same model is valid for all the time series within a given group, although with different parameters across groups. Furthermore, we also assume that the time series are independent within each cluster.

In this context, it is convenient to introduce a latent group indicator \({S}_{i}\), which takes a value out of the discrete set \(\{1,...,K\}\), indicating to which group the time series belongs; that is, \({S}_{i}=k\) indicates that \({y}_{i}\) belongs to group \(k\). We assume that the elements of \(S=({S}_{1},\dots ,{S}_{N})\) are a priori independent. Thus, knowing \({S}_{i}\) is equivalent to knowing the group-specific parameters and the density \(p\left({y}_{i}|{\vartheta }_{{S}_{i}}\right)\).

The joint sampling distribution reads as

$$p\left(y|S,\vartheta \right)=\prod\nolimits_{k=1}^{K}\prod\nolimits_{i:{S}_{i}=k}p\left({y}_{i}|{\vartheta }_{k}\right).$$
(1)

However, an important issue in this specification is that neither the number of clusters nor the group membership is known a priori. Instead, we use model-based clustering techniques based on Bayesian classification rules to determine \(K\) and to estimate the group indicator \({S}_{i}\) along with the group-specific parameters \({\vartheta }_{1},...,{\vartheta }_{K}\) from the data.

To overcome the issue that group membership is unknown in practice, we assume that each time series of self-employment productivity is taken to be a realization of the mixture probability density function of \(K\) separate distributions

$${y}_{i}\sim \sum\nolimits_{k=1}^{K}p\left({y}_{i}|{\vartheta }_{k}\right){\text{Pr}}\left({S}_{i}=k|{Z}_{i},\gamma \right),$$
(2)

where the mixing proportion \(Pr\left({S}_{i}=k|{Z}_{i},\gamma \right)\) is the probability that \({y}_{i}\) belongs to group \(k\). Thus, the probabilities of group membership are posited to rely on vectors of country-specific variables, \({Z}_{i}\), each comprising \(g\) variables, and on the parameter set \(\gamma =({\gamma }_{1},\dots ,{\gamma }_{K})\), where each \({\gamma }_{k}\) represents a vector of \(g\) elements, with \(k=1,\dots ,K\). For clustering purposes, each component in mixture Model (2) corresponds to a cluster.

To complete the model specification, given the time series of self-employment productivity of country \(i\), \({y}_{i}\), that belongs to a certain group \(k\), \({S}_{i}=k\), we consider that the expected value of each time series of this group is fully characterized by a group-dependent mean and a group-dependent trend. Thus, we model the time-series dynamics of self-employment productivity as

$${y}_{it} = {\mu }^{{S}_{i}}+{\alpha }^{{S}_{i}}t+{\varepsilon }_{it}$$
(3)

where the error term is conditionally heteroscedastic, \({\varepsilon }_{it} \sim N\left(0,\frac{{\sigma }^{2}}{{\lambda }_{i}}\right),\) and \({S}_{i}=1,...,K\).Footnote 3 For each group, we consider \({\vartheta }_{k}=({\mu }^{k}, {\alpha }^{k},{\sigma }^{2})\), and for each country, we consider a series-specific variance weight, \({\lambda }_{i}\), where \(\lambda =({\lambda }_{1},\dots ,{\lambda }_{N})\) collects all of the weights. For a given country \(i\) belonging to cluster \({S}_{i}=k\), parameter \({\mu }^{{S}_{i}}\) represents the base value of self-employment productivity that characterizes the cluster, while parameter \({\alpha }^{{S}_{i}}\) gives the intensity of the increasing or decreasing behavior over time of the series belonging to the cluster.
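To make the data-generating process in Eq. (3) concrete, the following minimal sketch simulates a small panel from it. The function name `simulate_panel` and all parameter values are hypothetical and chosen only for illustration; groups are indexed from 0 in the code.

```python
import random


def simulate_panel(groups, mu, alpha, sigma2, lam, T, seed=0):
    """Simulate y[i][t-1] = mu[k] + alpha[k] * t + eps, eps ~ N(0, sigma2 / lam[i]).

    groups[i] is the (0-based) group of series i; lam[i] is its variance weight.
    """
    rng = random.Random(seed)
    panel = []
    for i, k in enumerate(groups):
        sd = (sigma2 / lam[i]) ** 0.5  # series-specific standard deviation
        series = [mu[k] + alpha[k] * t + rng.gauss(0.0, sd) for t in range(1, T + 1)]
        panel.append(series)
    return panel


# Hypothetical setup: three countries, two groups, 29 annual observations.
groups = [0, 0, 1]
panel = simulate_panel(
    groups, mu=[10.0, 5.0], alpha=[0.5, 0.1], sigma2=1.0, lam=[1.0, 2.0, 0.5], T=29
)
print(len(panel), len(panel[0]))
```

A larger weight \({\lambda }_{i}\) shrinks the error variance of series \(i\), so countries with small weights are allowed to deviate more from their group's level and trend.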

In addition, we follow Frühwirth-Schnatter and Kaufmann (2008), Kaufmann (2010), and Hamilton and Owyang (2012), and consider a multinomial logit model to include prior information on a particular series in the estimation of the group probability:

$$Pr\left({S}_{i}=k|{Z}_{i},\gamma \right)=\frac{{\text{exp}}\left({Z}_{i}^{'}{\gamma }_{k}\right)}{1+{\sum\nolimits }_{l=2}^{K}{\text{exp}}\left({Z}_{i}^{'}{\gamma }_{l}\right)},$$
(4)

where the first group is the baseline group, and we set \({\gamma }_{1}=0\). We assume that \(\gamma =(0,{\gamma }_{2},\dots ,{\gamma }_{K})\) is independent of the other parameters of the model.

The vector \({Z}_{i}\), for \(i=1,\dots ,N\), includes the \(g\) country-specific average features of the labor market structure that determine the classification of the self-employment productivity of country \(i\) into a specific group, with \({Z}^{'}=({Z}_{1}^{'},\dots ,{Z}_{N}^{'})\). The parameters \(({\gamma }_{2},\dots ,{\gamma }_{K})\) are unknown, group-specific values that allow us to estimate the prior classification probabilities of country \(i\) belonging to a group depending on the structural variables \({Z}_{i}\).

These parameters have a convenient interpretation because they determine how strongly each structural variable affects the classification of a country into a certain group. If the \(j\)-th component of \({\gamma }_{k}\) is positive, then a higher value of the \(j\)-th structural variable of country \(i\) makes this country more likely to belong to group \(k\) rather than to the baseline group. In contrast, if the component is negative, increasing the \(j\)-th structural variable of country \(i\) raises the probability of the baseline group relative to group \(k\).
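The prior membership probabilities in Eq. (4) and the sign interpretation above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `membership_probs`, the single covariate, and the coefficient values are hypothetical.

```python
import math


def membership_probs(z, gammas):
    """Prior probabilities Pr(S_i = k | Z_i, gamma) for k = 1, ..., K.

    z      : covariates of one country (length g)
    gammas : coefficient vectors gamma_2, ..., gamma_K (group 1 is the
             baseline, with gamma_1 fixed at 0)
    """
    scores = [0.0] + [sum(zj * gj for zj, gj in zip(z, g)) for g in gammas]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical coefficients on one normalized covariate for groups 2 and 3.
gammas = [[-1.0], [-2.0]]
at_average = membership_probs([0.0], gammas)     # covariate at the sample mean
above_average = membership_probs([1.0], gammas)  # covariate one unit higher
print(at_average)
print(above_average)
```

With negative coefficients for groups 2 and 3, raising the covariate shifts probability mass toward the baseline group, which is the pattern the interpretation above describes.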

2.3 Model estimation

The model estimation is carried out within a Bayesian framework with the aid of Markov Chain Monte Carlo (MCMC) simulation and data augmentation methods for finite mixture models. Thus, using the information given in the data, the key issue is obtaining a posterior inference on the group indicator, \(S\), the model parameters, \(\vartheta\), the series-specific variance weights, \(\lambda\), and the intensity of the structural variables, \(\gamma\).

Let us start by assuming that the number of clusters \(K\) is known, although we will set a procedure for determining the number of clusters below.

Priors

The parameter vector is further broken down into parameter blocks, for all of which we assume standard prior distributions as follows: the group-specific parameters follow a normal prior, \(\left({\mu }^{k},{\alpha }^{k}\right)\sim N({m}_{0},{M}_{0})\), for \(k=1,\dots ,K\); the variance of the error terms and the series-specific variance weights follow inverse gamma and gamma distributions, respectively: \({\sigma }^{2}\sim IG\left({g}_{0},{G}_{0}\right)\) and \({\lambda }_{i}\sim G\left(\frac{v}{2},\frac{v}{2}\right)\) for \(i=1,\dots ,N\); and the parameters governing the prior group probabilities under the logit structure follow a normal distribution, \({\gamma }_{k}\sim N(0,\tau {I}_{g})\), for \(k=2,\dots ,K\), where \(g\) is the dimension of vectors \({Z}_{i}\).

Estimation

The sampling scheme to draw from the posterior follows Frühwirth-Schnatter and Kaufmann (2008) and involves the iteration between the following three steps:

(i) Classification for fixed parameters. Each time series \({y}_{i}\), with \(i =1,\dots , N\), is classified into one of the \(K\) groups by sampling the group indicator \({S}_{i}\) from the posterior distribution \(Pr\left({S}_{i} = k|{y}_{i},{Z}_{i},\vartheta ,\lambda ,\gamma \right),\) using the sampling density as well as the prior classification probabilities,

$$Pr\left({S}_{i} =k|{y}_{i},{Z}_{i},\vartheta ,\lambda ,\gamma \right)\propto p\left({y}_{i}|{\vartheta }_{k},{\lambda }_{i}\right)Pr\left({S}_{i} =k|{Z}_{i},\gamma \right),$$
(5)

for \(k=1,...,K\).

(ii) Estimation for a fixed classification and \(\lambda\). Conditional on knowing the values of \(S\) and \(\lambda\), the group-specific parameters \({\vartheta }_{1},...,{\vartheta }_{K}\) are sampled from the posterior \(p\left({\vartheta }_{1},...,{\vartheta }_{K}|S,y,\lambda \right)\), where each group parameter \({\vartheta }_{k}\) is estimated by pooling all time series that currently belong to group \(k\). To sample \(\gamma\), we follow Scott (2011) and use a Metropolis-Hastings algorithm.

(iii) Estimation of \(\lambda\) for a fixed \(S\), \(\vartheta\) and \(\gamma\). For each \(i =1,..., N\), the scale factors \(\lambda =({\lambda }_{1},\dots ,{\lambda }_{N})\) are sampled independently from the gamma distributions.
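As an illustration of step (i), the sketch below evaluates the classification probabilities in Eq. (5) for one series and draws a group indicator from them. It assumes the Gaussian density implied by Eq. (3); the function names and all numerical values are hypothetical, and steps (ii) and (iii) are not covered.

```python
import math
import random


def log_density(y, mu_k, alpha_k, sigma2, lam_i):
    """Log of p(y_i | theta_k, lambda_i) under y_it ~ N(mu_k + alpha_k * t, sigma2 / lam_i)."""
    var = sigma2 / lam_i
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y_t - mu_k - alpha_k * t) ** 2 / var
        for t, y_t in enumerate(y, start=1)
    )


def classification_probs(y, mu, alpha, sigma2, lam_i, prior):
    """Normalize density times prior over the K groups, working on the log scale."""
    logs = [
        log_density(y, m, a, sigma2, lam_i) + math.log(p)
        for m, a, p in zip(mu, alpha, prior)
    ]
    top = max(logs)  # log-sum-exp trick to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def draw_indicator(probs, rng):
    """Sample S_i from {1, ..., K} with the given probabilities."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


# Hypothetical series that follows group 1's level and trend exactly.
y_i = [10.5, 11.0, 11.5]
probs = classification_probs(
    y_i, mu=[10.0, 5.0], alpha=[0.5, 0.1], sigma2=1.0, lam_i=1.0, prior=[0.5, 0.5]
)
s_i = draw_indicator(probs, random.Random(0))
print(probs, s_i)
```

Working with log densities matters in practice because the product over \(T\) observations quickly underflows when evaluated directly.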

The MCMC estimation procedure described above is repeated \(M\) times, and the stacked draws from all iterations can be used to perform inference. However, the sampler may suffer from label switching, so the finite mixture model must be identified through an inequality constraint on the group-specific parameters. To handle label switching, we use the identifiability constraint \({\mu }^{1}>{\mu }^{2}>\dots >{\mu }^{K}\). In other words, the groups are identified by their level of self-employment productivity.

Once the model has been identified, we can perform inference regarding which time series belong to which group by using the posterior classification probability. In particular, we can estimate the posterior probability that a time series \({y}_{i}\) belongs to group \(k\) from the MCMC draws by averaging over the \(M\) iterations,

$$Pr\left({S}_{i}=k|{y}_{i},{Z}_{i},\vartheta ,\lambda ,\gamma \right)\approx \frac{1}{M}\sum\nolimits_{m=1}^{M}{I}_{\left\{{S}_{i}^{\left(m\right)}=k\right\}}$$
(6)
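The averaging in Eq. (6) amounts to counting how often each series is assigned to each group across the retained draws. A minimal sketch, with hypothetical function names and a toy set of draws, is:

```python
def posterior_membership(draws, K):
    """Estimate Pr(S_i = k | data) as the share of draws in which S_i = k.

    draws[m][i] is the sampled label (1..K) of series i at iteration m.
    Returns one list of K probabilities per series.
    """
    M = len(draws)
    N = len(draws[0])
    return [
        [sum(1 for d in draws if d[i] == k) / M for k in range(1, K + 1)]
        for i in range(N)
    ]


def assign(probs_i):
    """Return the group (1-based) with the highest posterior probability."""
    return max(range(len(probs_i)), key=probs_i.__getitem__) + 1


# Toy example: four retained draws for two series and K = 3 groups.
draws = [[1, 2], [1, 2], [1, 3], [2, 2]]
post = posterior_membership(draws, K=3)
labels = [assign(p) for p in post]
print(post, labels)
```

The helper `assign` applies the highest-posterior-probability rule; it presumes that the draws have already been identified, since averaging labels over draws affected by label switching would not be meaningful.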

2.4 The number of clusters

For exposition purposes, we have so far treated the number of components of the mixture, \(K\), as known. In practice, however, the number of groups will be unknown. To choose the number of groups in a straightforward way, one could select the number of components that maximizes the marginal likelihood from the set \(\{1,\dots ,{K}^{*}\}\), where \({K}^{*}\) is an upper bound. However, this method tends to result in a model with an arbitrarily large number of groups.

For this reason, we select the model with the number of groups that maximizes the quality of the classification, as measured by the entropy of the model. If we call \({EN}_{j}\) the entropy of a model with a fixed number of \(j\) groups, the method entails selecting \(K\) as the number of groups that minimizes

$${EN}_{j}=-\sum\nolimits_{i=1}^{N}\sum\nolimits_{k=1}^{j}Pr\left({S}_{i}=k|{y}_{i},{Z}_{i},\vartheta ,\lambda ,\gamma \right){\text{log}}Pr\left({S}_{i}=k|{y}_{i},{Z}_{i},\vartheta ,\lambda ,\gamma \right),$$
(7)

for \(j=1,\dots ,{K}^{*}\). In this expression, larger entropy values indicate worse clustering solutions in terms of classification quality, and the value would be 0 for a perfect classification.
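The entropy criterion in Eq. (7) can be computed directly from the posterior classification probabilities. The sketch below is illustrative only: the function names are hypothetical, and the probabilities are made-up values rather than estimates from the paper.

```python
import math


def entropy(post):
    """EN = -sum_i sum_k p_ik * log(p_ik), treating 0 * log(0) as 0.

    post[i][k] is the posterior probability that series i belongs to group k.
    """
    return -sum(p * math.log(p) for row in post for p in row if p > 0.0)


def select_k(posteriors_by_k):
    """Return the number of groups whose classification has the lowest entropy."""
    return min(posteriors_by_k, key=lambda k: entropy(posteriors_by_k[k]))


# Made-up posterior probabilities for two series under K = 2 and K = 3.
candidates = {
    2: [[0.60, 0.40], [0.50, 0.50]],              # fuzzy classification
    3: [[0.98, 0.01, 0.01], [0.02, 0.97, 0.01]],  # sharp classification
}
for k, post in candidates.items():
    print(k, round(entropy(post), 3))
print("selected K:", select_k(candidates))
```

Note that a one-group model has zero entropy by construction, because every series is assigned to the single group with probability 1, so the comparison is informative only across models with at least two groups.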

3 Empirical results

3.1 Model estimation

Estimation is based on the following priors. For group-specific parameters, we use \(\left({\mu }^{k}, {\alpha }^{k}\right)\sim N\left(0,1000\right)\). The priors of the variances are \({\sigma }^{2}\sim IG(1,1)\), and they are \({\lambda }_{i}\sim G(4,4)\) for the scale parameters. For the parameters of the logistic model, we use the prior \({\gamma }_{k}\sim N(0,\tau {I}_{g})\), with \(\tau =20\) and \(g=4\). For each run of the MCMC sampler, after conducting a burn-in phase of 2000 iterations to remove dependence on the starting condition, 8000 draws are kept to evaluate the estimation.

To select the number of groups, we set \({K}^{*}=6\). Table 1 presents the results of the marginal likelihood and entropy for models with up to six groups. As expected, the likelihood increases with the number of groups. However, the model specification that divides the data into three separate groups is preferred because this model reaches the lowest entropy value (0.38) among model specifications.

Table 1 Marginal likelihood of various model specifications

Table 2 gives the posterior means of the mean and slope coefficients associated with each of the three idiosyncratic groups, with their standard deviations in parentheses; all coefficients are significant at the 5% level. The table shows a division of countries into three distinct groups according to the average and trend of their respective self-employment productivity. The groups are ordered in decreasing order of entrepreneurship productivity levels and trends, with Group 1 being associated with countries with the highest levels of productivity and the steepest tendencies. In Group 2, we find countries with medium levels of productivity and mid-level tendencies. Finally, Group 3 contains the countries with the lowest productivity levels and the flattest tendencies.

Table 2 Group-specific model parameters

To complete the description of the groups, Table 3 illustrates the individual characteristics that drive group formation. For this purpose, the table shows the posterior means of the estimated logistic coefficients that influence the group probabilities and their standard deviations (in parentheses), with bold indicating that the coefficients are significant at the 95% confidence level. The structural variables, which appear in columns, represent the averages of the unemployment rate, the value added by industry, the Labor Market Rigidity Index, and the Digital Adoption Index.

Table 3 Parameters governing the membership probabilities

In accordance with the parameter estimates, we find an important role of the level of ability of individuals in a country to access and use new information and communication technologies (ICTs) to make the country less likely to be part of Groups 2 and 3. Thus, we identify Group 1 as the group of countries with the highest levels of adoption of digital technologies, which lead to high levels and an upward tendency of productive self-employment. To examine this finding more deeply, Fig. 1a shows the prior probability that country i belongs to Group 1, which is conditional on the structural variables, \(Pr({S}_{i}=1|{Z}_{i},\gamma )\), as a function of the Digital Adoption Index. The figure reveals the positive relationship between the adoption of digital technologies and the probability of being classified in the group with the highest level and steepest tendency of self-employment productivity.

Fig. 1 

Membership probabilities. Panel (a) shows the prior probability that country \(i\) belongs to Group 1, which is conditional on the structural variables \({Z}_{i}\), \(Pr({S}_{i}=1|{Z}_{i},\gamma )\), as a function of the Digital Adoption Index (DAI). Panel (b) shows the prior probability that country \(i\) belongs to Group 2, which is conditional on the structural variables \({Z}_{i}\), \(Pr({S}_{i}=2|{Z}_{i},\gamma )\), as a function of unemployment rate

In addition, Table 3 shows that countries with high unemployment rates have lower odds of being classified in the first group than of belonging to the second group. To illustrate this finding, Fig. 1b shows that the prior probability that country \(i\) belongs to Group 2, which is conditional on the structural variables, \(Pr({S}_{i}=2|{Z}_{i},\gamma )\), is positively correlated with unemployment.

The figures in Table 3 also point to a very interesting finding. The parameters governing the membership probabilities that relate to industry size and the rigidity of employment protection legislation are not statistically significant. Thus, neither the value added by industry nor labor market rigidity seems to play a statistically significant role in group formation.

To interpret the dynamics of group membership, Fig. 2 sketches the geographical distribution of the three groups based on each country's highest posterior probability. In particular, country \(i\) is classified in Group \(k\) if \(Pr\left({S}_{i} =k|{y}_{i},{Z}_{i},\vartheta ,\lambda ,\gamma \right)>Pr\left({S}_{i} =j|{y}_{i},{Z}_{i},\vartheta ,\lambda ,\gamma \right)\) for all \(j\ne k\). A visual examination of the map allows us to identify the highly productive countries of Group 1 as most of the European countries, the United States, Canada, Saudi Arabia, Oman, Japan and Australia.Footnote 4 These countries show the highest wages and associated standard of living, which can be sustained through businesses that compete with new and unique products and companies that compete through innovation and the production of new and different goods using the most sophisticated production processes.

Fig. 2 

Group membership. The map shows the geographical distribution of group membership, where country \(i\) is classified in Group \(k\) if \(Pr\left({S}_{i} =k|{y}_{i},{Z}_{i},\vartheta ,\lambda ,\gamma \right)>Pr\left({S}_{i} =j|{y}_{i},{Z}_{i},\vartheta ,\lambda ,\gamma \right)\), with \(j\ne k\).

The combination of medium levels and trends in self-employment productivity with high rates of unemployment plays a significant role in forming Group 2. In this group, we find the remaining European countries and some developing and emerging economies, such as southern Africa, Mexico, Brazil, Argentina, Chile, Uruguay, Algeria, Egypt, Turkey, Iran, Kazakhstan, South Korea and Malaysia. These countries produce standard products and services and are susceptible to external, sector-specific demand shocks.

Finally, the countries with the lowest entrepreneurship productivity and the flattest productivity trend appear in Group 3. In this group, we find Central and Middle African countries, such as Angola, Cameroon, the Central African Republic, Chad, and Nigeria; some Asian countries, such as India, Pakistan, Indonesia and Mongolia; and some South American countries, such as Paraguay, Bolivia, Peru, Ecuador, and Colombia. These are less developed economies, with limited access to digital technologies, low wages and few competitive advantages, together with a heavy reliance on unskilled labor and natural resources (see Footnote 5).

3.2 Connection with the literature

Our empirical findings can be read in light of the most closely related scholarly work on international self-employment development. First, our model-based clustering procedure splits the set of countries into three distinct groups according to their levels and trends in self-employment productivity. Under the classification of competitiveness across stages of economic development advocated by Porter et al. (2002), the high-productivity group aligns with economies in the innovation-driven stage, the medium-productivity group with efficiency-driven economies, and the low-productivity group with factor-driven economies.

Second, our results suggest that digitalization is a key factor favoring the transition from factor- and efficiency-driven economies to innovation-driven economies. This is likely because digitalization is a key competitive factor, both for a managed economy in which competitiveness is based on efficiency and for capturing the best profit opportunities that favor technological and economic leadership. This agrees with recent findings on the intersection of digital technologies and entrepreneurship and its impact on the pursuit of sustainable development; Nambisan (2017) and Jafari-Sadeghi et al. (2021) are two significant examples.

Third, our results suggest that there is a relationship between labor market dynamics (the reduction of unemployment) and the likelihood of a country moving from the medium- to the high-productivity self-employment group. At this point, we could argue that a well-functioning labor market generates sufficient wage employment opportunities to substantially reduce the relative weight of "necessity entrepreneurs," usually marginally attached to self-employment, with respect to "opportunity entrepreneurs." The resulting increase in the quality of entrepreneurship becomes a key element in capturing more and better profit opportunities and in transforming the economy into an entrepreneurial economy.

It follows that labor market reforms that promote employability and the provision of employment opportunities in a sufficiently dynamic labor market favor the transition to the innovation-driven group. These arguments are in line with the contributions of Acs (2006), Baptista and Thurik (2007), Baumol and Strom (2007), Acs et al. (2008) and Van der Zwan et al. (2016).

Fourth, our statistical evidence does not seem to support the idea that the industrial sector plays a significant role in the probability of a country belonging to the high-productivity self-employment group. This result contrasts with the argument, in line with Lucas (1988), that industrialization should be accompanied by an increase in self-employment productivity as low-skilled self-employed workers move to paid employment, attracted by the higher wages offered in routine industrial jobs.

However, our result agrees with the literature on employer-size wage differentials. Davis and Haltiwanger (1996) pointed out that larger employers do not necessarily pay substantially higher wages because the dispersion of wages exhibits a pronounced relationship to employer size. In this context, Poschke (2018) found that it is not only the average size of firms but also their dispersion that is significantly higher in developed countries, and Shi et al. (2020) recently suggested that innovation boosts firm wages, an effect that does not necessarily depend on firm size. In addition, Acs and Naudé (2013) recognized the complexity of the role of entrepreneurs in industrialization, as this role can be inhibited by, for example, market failures. For this reason, these authors do not see industrial policies as merely functional policies without consideration for firm or entrepreneurial specifics.

Finally, our results do not support the view that strict employment protection legislation promotes self-employment productivity, because it does not influence the transition toward the group of countries with the highest self-employment productivity. This is closely related to the work of Robson (2003), who found very limited evidence for a positive relationship between self-employment and the strictness of employment protection legislation, as this largely depends on the introduction of suitable control variables. Torrini (2005) also failed to find any robust relationship between the self-employment rate and employment protection legislation in a multivariate context.

3.3 Policy recommendations

This study provides guidelines that policymakers can use when drawing up effective national strategies for raising self-employment productivity and when reconsidering traditional policy prescriptions that appear to be less effective for transitioning into the group of highly productive countries.

Our analysis identifies three clusters of countries with some degree of similarity regarding their level and trend in self-employment productivity. This classification can help national policymakers to verify which group their own country belongs to and determine whether their country has performed on par with other countries in similar economic circumstances.

Unfortunately, our results provide evidence that the catch-up effect, which predicts that all economies will eventually converge in terms of self-employment productivity, does not apply. In this context, we consider policymakers' inaction to be unjustified and recognize that there is pressure on governments to provide resources that promote transitions toward more productive clusters.

According to our results, the first challenge for policy interventions is to improve the incentive structures for entrepreneurs associated with digitalization and to promote a new culture of digital entrepreneurship. This implies supporting the development of digital and entrepreneurship skills by addressing some key barriers with a range of policy actions. Examples include embedding digital entrepreneurship modules in entrepreneurship education, offering tailored digital entrepreneurship training programs and improving access to finance for digital entrepreneurship for underrepresented and disadvantaged groups.

The second challenge for policy interventions that seek to raise self-employment productivity and facilitate the transition into the innovation-driven group is to adopt decisive measures to reduce national unemployment. To name a few, we suggest increasing the attractiveness to private capital, removing the obstacles to labor mobility, improving the quality of formal job allocation mechanisms and reducing rigidities in the housing market. The effectiveness of national policies must also be enhanced by ensuring that funds devoted to reducing unemployment are well managed and that the monitoring and evaluation procedures are improved to guarantee a consistent long-term strategy for human capital.

Likewise, our results point to avoiding policy measures that are likely to be ineffective in promoting self-employment productivity. Our results do not support implementing industrial policies as merely functional policies without consideration of firm or entrepreneurial specifics. In fact, the patterns we find suggest that a larger industrial sector is not necessarily better for self-employment productivity.

In addition, our results suggest that changes in the rigidity of labor legislation per se do not serve to stimulate self-employment productivity. Despite the explicit set of rules that govern national employment protection legislation, different degrees of regulatory compliance and opportunities for evasion could explain this finding. In any case, we consider that the design of incentive schemes should not distort the allocation of talent between salaried employment and self-employment.

4 Conclusions

This paper reexamines the diversity in the level and dynamics of entrepreneurship across countries in terms of self-employment productivity. To this end, we applied a Bayesian finite mixture model for clustering time series to a dataset of internationally comparable indicators covering 121 countries over the last three decades.

Our empirical findings point to the existence of three homogeneous groups stratified by the following levels of entrepreneurship productivity: high-, medium-, and low-productivity countries. These clusters are roughly aligned with the three major groups of countries usually considered in the entrepreneurship literature, namely, factor-, efficiency-, and innovation-knowledge-driven countries, and with the literature on managed vs. entrepreneurial societies. In addition, these clusters parallel the three different stages of economic development, namely, developing, transitioning and developed countries.

In contrast to simpler clustering methods, our clustering approach allows us to examine the key structural variables that allocate each country to a particular cluster and regulate the transition to higher-productivity groups. In other words, our results not only provide homogeneous country groups regarding the quality of entrepreneurship but also point to the institutions or elements in the national entrepreneurial ecosystem that enable high self-employment productivity and determine the transition to higher-productivity groups. The identification of these factors might be particularly useful for policymakers interested in promoting self-employment productivity and for academics who strive to test previous theories and hypotheses.

Among the factors guiding the transition between groups, we consider some structural variables that strengthen or weaken the probability of a nation becoming an entrepreneurial economy. These variables are the unemployment rate, which reflects the labor market situation; average industry value added as a percentage of GDP, which measures the level of industrialization; the Digital Adoption Index, which measures the diffusion and adoption of digital technologies; and the Labor Market Rigidity Index, which measures labor market rigidities. These variables are usually viewed as common drivers of entrepreneurship (e.g., Audretsch and Thurik 2004) and are included in the literature on regional innovation systems and entrepreneurial ecosystems (see Cao and Shi 2021 and Qian and Acs 2023 for recent surveys).

Our results suggest that policy measures oriented toward (i) creating enabling environments that foster the accessibility of digital technologies and (ii) promoting initiatives for reducing unemployment are key elements for countries moving toward becoming highly productive economies. However, the results do not identify the share of industrial value added as a determinant of such transitions. In addition, we find that deregulation policies meant to reduce rigidities in the labor market are not key drivers of transitions between clusters.

According to our results, the proposed framework is a very promising tool for analyzing the determinants of self-employment productivity, and we look forward to future work addressing the following issues. First, although we focused on aggregate self-employment productivity, a natural extension is the exploration of disaggregated measures, mixed incomes, and nonagricultural self-employment. Second, the set of factors driving the transition between groups could be extended. Third, the method is suitable for exploring the determinants of self-employment productivity at the regional level. These extensions were not pursued in this paper because they would reduce the number of observational units, and they are explicitly left for further research.