1 Introduction

Extensive research over the past 25 years has looked for characteristics that drive individuals to choose self-employment instead of wage work. Blanchflower and Oswald (1990, 1998) define the typical research question in this research area as ‘what makes an entrepreneur’? Their later work notes that “The simplest kind of entrepreneurship is self-employment,” and we follow these authors, and many others, by using ‘entrepreneur’ and ‘self-employed’ interchangeably. In the literature it has been typical to use cross-sectional data, viewing individuals only at a particular point in time or comparing their behaviour between two points in time, so that dichotomous choice models can be used to identify characteristics associated with either state. This approach has its theoretical foundations in discrete models of career choice, such as Kihlstrom and Laffont (1979), Evans and Jovanovic (1989) and Blanchflower and Oswald (1990), and has been driven at least in part by the ease with which probit and logit analysis can be used to predict probable career choice between these two alternatives (for example, Evans and Leighton (1989), and Blanchflower and Oswald (1998)).

As a consequence, the much richer (and more realistic) dynamics of individual career choice remain an area that still awaits detailed research. It is well known that many entrepreneurs gain some prior experience in paid employment before starting their own business. Similarly, others use wage work as a means of saving for a new venture. Most new start-ups then fail within a few years, and many of the (formerly) self-employed who suffered from bad luck or bad judgement then switch to temporary or lasting employment, perhaps after spells of unemployment or inactivity (Holtz-Eakin et al. (1994)). Likewise, some individuals are serial entrepreneurs, and some of these, especially those who are unsuccessful, spend temporary periods in employment between ventures (for example, see Handy (1999) and Bridge (2006)). This depiction contrasts with the methodology of discrete models and empirical analysis where individuals are forced into mutually exclusive categories comprising pure wage-workers and pure entrepreneurs, ‘pure’ in the sense that individuals spend all of their time in either wage-work or self-employment. However, labour market data indicate that this is not only inaccurate but also very misleading. Over a 9-year period, our data set illustrates that while self-employment is a minority career activity, within this category ‘pure’ entrepreneurs are outnumbered by individuals who mix their career with spells in both self-employment and wage-work. In fact, patterns in the data suggest three types of people (although we only directly consider a period of 8–9 years): those who never try self-employment, those who are ‘die-hard’ entrepreneurs and spend their entire career in self-employment, and those who move between wage work and self-employment. This pattern in the data has prompted us to delve deeper into the question of what makes an entrepreneur in order to distinguish between the ‘die-hards’ and the ‘less persistent’ entrepreneurs.
Therefore, the aim of this article is to try to move beyond the dichotomous depiction of entrepreneurship and wage-work, and begin to explore the implications of entrepreneurial persistence. We want to differentiate the factors that make an individual try self-employment from those that make a persistent, dedicated or what we term a ‘die-hard’ entrepreneur. It is worth noting that even successful serial entrepreneurs may be associated with rapid change of ventures, giving rise to short-lived businesses alongside high persistence in self-employment. Similarly, relatively unsuccessful entrepreneurs who enjoy the non-pecuniary benefits of self-employment may still choose to spend long periods in self-employment. Therefore, at a theoretical level our concept of entrepreneurial persistence is not synonymous with firm survival. Footnote 1 However, at an empirical level survival and persistence are likely to be highly related as in this data nearly 90% of the time spent in self-employment is accounted for by a single continuous spell in self-employment. To our knowledge this is the first empirical analysis of the determinants of ‘die-hard’ entrepreneurs—although growth studies conditioned on the survival of a self-employed business fulfil a similar role. Before proceeding to outline the structure of the rest of the article we first spend a little time explaining terms associated with our reclassification of entrepreneurs from a single ‘pure’ type to a form which accounts for varying degrees of persistence (or ‘die-hardness’) in self-employment.

Not to overstate our contribution, here we shall only take a modest step beyond the static, binary choice approach by considering a reduced-form, ex post result of sequentially ‘optimal’ decisions, namely the total time spent in self-employment over a period of approximately 9 years. This summary measure can obviously include multiple spells, and thus does not address spell durations as such, but as we have noted that is not the purpose of the article. Footnote 2 The complementary measure “time not self-employed” includes employment, unemployment and inactivity—though for many (particularly among the males in our chosen cohort) it will be mainly employment; and, in any case, we include basic controls to provide crucial distinctions between those who are employed and those who are not.

Once we proceed beyond considering individuals only at a point in time, we could in principle examine all individuals on a continuum between choosing to spend their entire career history in self-employment, or none of it, or any fraction. However, the polarised nature of our data, the crucial elements of which are drawn from the sixth sweep of the UK National Child Development Study (NCDS), suggests that we ought to test whether a natural distinction exists between individuals who are pure wage-workers (no self-employment in their work history or likely future), and those who have ever been self-employed (or may be in the future). We label this latter group Entrepreneurial Types (ETs), some of whom may only be very briefly self-employed. The cross-section of those who are self-employed at any given point in time within our sample period will be a proper subset of the ETs, since some individuals may only be self-employed before or after the date of the cross-section. The more inclusive ET set should thus provide more insight into the fundamental determinants of a propensity for self-employment.

In the second stage of our analysis, we then estimate the total time spent in self-employment by ET individuals between the ages of 33 and 42—a measure of what we shall call Entrepreneurial Persistence (EP)—the highest levels of EP being for the die-hard entrepreneurs. This summary measure is dictated largely by data availability, and is clearly not a direct measure of survival or spell duration, although we note that for nearly 90% of the sample this is indeed the case. However, our measure is an indicator of entrepreneurial performance that should be correlated with the other measures, such as job creation that we studied previously in a cross-section of NCDS individuals at age 33—see Burke et al. (2000, 2002). We should be able to provide a more insightful perspective about the factors that determine entrepreneurship, as well as being able to assess the extent to which pre-existing discrete analyses—such as those drawing on the fifth sweep (1991) of the NCDS (Blanchflower and Oswald (1998) and Burke et al. (2000, 2002))—have been distorted by oversimplification of the empirical analysis.

Our two-regime approach allows us to distinguish between characteristics that encourage individuals to try self-employment (be of ET), and those that are associated with longer total times in self-employment (or greater EP)—the longest being the ‘die-hard’ entrepreneurs. The relevance of this distinction is ultimately an empirical question that is indeed confirmed by our data. We begin by modelling the count of quarters spent self-employed, using (two-regime) zero-inflated count data models that allow for a fundamental difference between those individuals who might be self-employed at some time, and others who never would. Results for an alternative modelling approach—a ‘hurdle’ model, based on the work of Cragg (1971), Lin and Schmidt (1984) and Jones (1989)—are also estimated. These provide some indications of robustness for our conclusions about the impacts, for a cohort of individuals, of various aspects of their background, experience and characteristics in determining time spent self-employed during nearly a decade of mature adult life.

There have been some other recent methodological advances. The work of Constant and Zimmerman (2006) uses a three-stage estimation approach, culminating in a structural probit that allows for the endogeneity of the role of earnings. Fraser and Greene (2006) employ heteroscedastic probit estimation of occupational choice, on the basis that entrepreneurial optimism diminishes with length of self-employment experience. Henley (2004) analyses panel data for self-employment by means of a two-step method proposed by Orme (1999)—which takes into account the ‘initial conditions’ problem, and identifies genuine state dependence by explicitly modelling unobserved heterogeneity. Parker and Belghitar (2006) utilise a multinomial logit model to investigate nascent entrepreneurship, and include a selectivity term to control for possible non-random attrition bias.

As we point out in Sect. 2, economic theory offers only limited guidance on modelling the likely determinants of self-employment. We proceed, in Sect. 3, to describe the data, drawn from the NCDS. Section 4 gives an account of the econometric methodology to be used. In Sect. 5, we describe our empirical results for gender-specific two-regime models on time spent self-employed, and offer our interpretation of their meaning. Conclusions are summarised in Sect. 6.

2 Economic background

Our central motivation stems from the fact that the empirical tests underlying models of self-employment (such as Kihlstrom and Laffont (1979), Jovanovic (1982), Evans and Jovanovic (1989), de Meza and Southey (1996), Blanchflower and Oswald (1998), and Burke et al. (2000)) can be improved in order to provide a more accurate and insightful perspective on both the determinants of self-employment as a career choice, and the subsequent time spent in self-employment. The conceptual background for our reduced-form empirical models is the dynamic programming problem under uncertainty faced by individuals with differing preferences and abilities for employment and self-employment. It is assumed that utility is affected by both the pecuniary, and the non-pecuniary, dimensions of each form of economic activity. For wage work, this is a standard approach—but it is worth elaborating a little for the case of self-employment. Footnote 3 Income in self-employment is related to entrepreneurial ability (itself, influenced by innate and acquired human capital), access to resources (including finance) Footnote 4 and the competitive environment in which the venture operates. Non-pecuniary factors could include the enjoyment of realising a vision, non-financial effects (e.g. helping others, promoting a philosophy or point of view), working in a sector or being a manager. Of course, non-pecuniary effects may be negative—such as the disutility from effort (like that experienced in wage work) or the negative side effects of pursuing the chosen strategy (e.g. family costs, job displacement in other firms, damage to the environment, etc.). The inherent risk attached to the (pecuniary and non-pecuniary) returns from an activity, and the individual’s attitude to risk, are also relevant.

In the first period, optimal choice of activity in employment or self-employment maximises the present value of expected discounted lifetime utility, given current knowledge about the effect of the initial decision on later career prospects. This effect can arise in many ways, through accumulation of human capital interacting with abilities and preferences. In the next period, random shocks are realised and new information is acquired, and the new best choice may differ from the Period 2 plan that was made initially with less information. Decisions are thus made sequentially, and the resulting sequence of activities can be summarised by the integer EP, defined as the (possibly zero) number of periods spent in self-employment by each individual. EP is thus a function of the identifying vector of individual characteristics \({\mathbf{x}}_i\) in our dataset, and of the realisations of all the random shocks over the person’s career during our overall time window. This picture contrasts with the stark representation often found elsewhere, which implies that individuals are either 100% pure wage-workers, or totally committed entrepreneurs. Roughly 9.7% of NCDS individuals spent some of our 9-year sample window in self-employment and another part of it in wage work. It is worthwhile to compare this to the 6.8% of ‘die-hards’ who spent the whole window in self-employment, because in effect this is the group most people have in mind when they think of a person who is an entrepreneur. However, to our knowledge this is the first article to attempt to isolate and estimate what makes this particular (‘die-hard’) type of entrepreneur.
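To make the definition of EP concrete, the following is a minimal sketch in Python. The state labels, function names and group labels are illustrative assumptions and do not correspond to any NCDS coding; the sketch simply counts periods of self-employment in an observed sequence and sorts the sequence into the three observed groups described above.

```python
from typing import Sequence

# Hypothetical label for a period of self-employment; any other label
# (e.g. "E" for employment, "U" for unemployment) counts as not self-employed.
SELF_EMPLOYED = "SE"


def entrepreneurial_persistence(states: Sequence[str]) -> int:
    """Return EP: the (possibly zero) number of periods spent self-employed."""
    return sum(1 for state in states if state == SELF_EMPLOYED)


def career_type(states: Sequence[str]) -> str:
    """Sort an observed activity sequence into one of three observed groups."""
    ep = entrepreneurial_persistence(states)
    if ep == 0:
        # Observed zero only: the individual may still be a potential entrepreneur.
        return "no observed self-employment"
    if ep == len(states):
        return "die-hard"
    return "mixed"


history = ["E", "E", "SE", "SE", "SE", "U", "E"]
print(entrepreneurial_persistence(history), career_type(history))
```

Note that a zero count only records that no self-employment was observed in the window; it does not by itself show that the individual would never choose self-employment.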

Choice models based on static utility functions, say of expected income and ‘job’ satisfaction in employment or self-employment, will generate either corner solutions or a unique interior optimum as in consumption theory, under the appropriate concavity assumptions. However, in our dynamic context, this standard approach can easily be misleading. Apart from a few part-time entrepreneurs, who also hold regular jobs, most people who switch between the two modes do so at discrete intervals. Planned transitions, such as learning skills in employment and then transferring human and other capital into an entrepreneurial venture, are the dynamic equivalent to the interior optimum in static choice. However, all these cases suggest a sequence of corner solutions with at least initially increasing returns to duration in any activity, and transitions motivated primarily by a combination of expectations and shocks.

This pattern may result from quite a number of sources. First, in view of the learning costs involved in any new activity, very short spell duration may be involuntary: the result of bad luck, over-optimism or error resulting in bankruptcy or redundancy. In such cases, the actual outcome from self-employment is worse than the expected outcome. Whatever its origins, this form of over-optimism may result in a pattern where individuals only learn the true value of the venture after actually starting it. Along the lines of Jovanovic (1982), they can reverse their decision to become self-employed if overly disappointed. Footnote 5 Thus, factors causing over-optimism, such as evangelical entrepreneurial role models, may be expected to increase the probability of an individual being an entrepreneurial type but have either negative or no impact on their persistence in self-employment.

Second, it is also plausible that a specific type of entrepreneurial ability may have a high rate of economic depreciation. This is especially likely if the business opportunity is short lived, or if the specific skills/knowledge of an individual (such as knowledge relating to a technology) are superseded in economic importance by other varieties. Third, the non-pecuniary vision or purpose of the venture (such as proving to family/friends that one is capable of running a business) may be realised quite quickly. As a result, a one-time entrepreneur may want to move on to other goals in life, and these might involve wage work. Fourth, under uncertainty, a move into self-employment may be a means of signalling managerial or other skills to employers in order to secure wage work once the ‘true’ value of the skills has been recognised. For example, this type of entrepreneurial activity is very common in media industries such as music, film and literature, where employers find it hard to select high quality employees in the absence of seeing some demonstrable market performance (usually demonstrated through a start-up). Finally, an individual may choose to become self-employed as a means of acquiring business experience (such as managerial skills or knowledge of a business sector) that it may not be possible to acquire in wage work. As we know, only a tiny fraction of employees ever get an opportunity to undertake learning by doing in the role of CEO. Yet, this is exactly what every entrepreneur can do, albeit usually in a smaller firm. Thus, an individual whose wage work career requires experience in a sector or senior managerial role may find that ‘barriers to learning’ are lower in self-employment. In such a case, the career path involves a transitory initial spell in self-employment. For example, employers in the venture capital and private equity industries frequently seek individuals with successful prior experience in entrepreneurship.
In sum, when one moves from a single- to a multi-period perspective on career choice between wage work and entrepreneurship, the process not only becomes richer but the dichotomous view of pure entrepreneurs versus wage-workers becomes misleading. Instead, self-employment and wage work are interrelated career options, frequently feeding off each other in terms of access to finance (for start-up), human capital and signalling. This career choice process is less about dichotomy and more about flexibility.

Many of the characteristics that determine career choice are in binary form, represented by dummy variables in estimation, and as usual there is unobserved heterogeneity between individuals, as well as the random influences on choices at each stage. If an individual has most of the characteristics associated with ET, but chooses zero EP (no self-employment), then it may be reasonable to ascribe this choice realisation to chance and classify the individual as a potential entrepreneur in an extended ET set; the details of this procedure will be discussed in Sect. 4 below on econometric methodology. The entrepreneurship literature has established that self-employment income is influenced by entrepreneurial ability (in turn, determined through elements such as education, previous work experience, family background and innate ability), available business opportunities and the cost and availability of capital. However, as we have outlined above, many of these same variables also affect wage work income, e.g. education, work experience, self-employment experience, etc. Likewise, many of these same factors potentially affect non-pecuniary income in both wage work and self-employment, e.g. education, parents’ career, personality type and work experience. Lazear (2004) suggests that pure entrepreneurs are ‘jacks of all trades’; in contrast, we argue that those who mix spells of self-employment and wage work, over a period of time, may be the ultimate ‘jacks of all trades’. Footnote 6

A key issue is the distinction that can be drawn between the traditional dichotomous and discrete approach, and our two-regime (ET, EP) framework. As already noted, the discrete approach misses those ETs who were self-employed at other times, and is generally more prone to severely misclassifying individuals. It also loses the persistence dimension of entrepreneurship. Furthermore, a specification test of the appropriateness of two-regime econometric models will actually confirm our approach against the alternative of the oft-used traditional binary choice logit or probit approach to self-employment.

In the two-regime model, it is also possible for an individual characteristic or element of the \({\mathbf{x}}_i\) vector to have different predicted effects on the ET and EP components. To classify these possibilities, recall that a binary variable such as ET is modelled econometrically as the probability of being self-employed or belonging to ET, say Pr(ET). For example, we find that higher education reduces the probability of self-employment or Pr(ET) for males, but does not affect the total time spent in self-employment by those who do make this choice, so it is insignificant in estimates of EP. Other variables raise Pr(ET), but have no influence on EP. Conversely, there are characteristics that seem to be irrelevant for ET, but have positive or negative effects on EP. We discuss these in more detail in Sect. 5, which deals with the interpretation of the results of the econometric analysis.

With this background, we turn next to our description of the data—before examining the issues of estimating these relationships and the appropriate econometric methodology, and then the results of our econometric estimation.

3 Data description

The data used for our empirical analysis are taken from the National Child Development Study (NCDS). The NCDS has obtained information about a cohort of individuals born in the week from 3rd March, 1958 to 9th March 1958 inclusive and living in Great Britain. Following an initial study in 1958, a series of surveys has been undertaken at irregular intervals—in 1965, 1969, 1974, 1981, 1991 and 1999/2000. This article focuses on the number of years, quarters or months (complete or incomplete) spent by individuals in self-employment in the period between sweeps 5 and 6 of data collection. Nonetheless, we consider many regressors for inclusion that refer to the characteristics and background of the individual over the entire period of their life up to 1999/2000.

In Table 1, below, we summarise the distribution of the number of quarters (periods of three months) of self-employment undertaken by NCDS individuals between March 1991 (NCDS5) and the time of the NCDS interview in 1999/2000 (variable between 102 and 113 months later).Footnote 7 Note that incomplete quarters are counted—so that, for example, a period of 4 months is recorded as 2 quarters, while a period of 50 months is recorded as 17 quarters. Table 1 results from NCDS6 variables concerning main economic activity—so our data do not necessarily refer to a sole economic activity at a particular point in time. The frequency distributions display the key features we would expect—a substantial majority of individuals who are not self-employed at any stage over a period of close to a decade; more self-employment among males than females; and a fairly small core of die-hard individuals who were self-employed throughout (less than half of those with some experience of self-employment in the years between NCDS5 and NCDS6). Figure 1 provides an illustration of the relative frequencies and provides some motivation for our central research question. It shows three groups, comprising a group who have mixed their career between wage work and self-employment walled in at either tail by die-hard entrepreneurs on the right and pure wage workers on the left who have never tried self-employment.
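The counting rule for incomplete quarters can be sketched as follows. This is a minimal illustration in Python rather than the procedure actually applied to the NCDS files, and the function name is an assumption: a duration in months is rounded up to the next whole quarter.

```python
def quarters_from_months(months: int) -> int:
    """Convert months of self-employment to quarters, counting incomplete quarters."""
    if months < 0:
        raise ValueError("months must be non-negative")
    # Integer ceiling of months / 3, so that any started quarter is counted.
    return (months + 2) // 3


for months in (0, 4, 50, 102, 113):
    print(months, quarters_from_months(months))
```

Under this rule the longest possible window of 113 months corresponds to 38 quarters.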

Fig. 1 Relative frequencies of time spent self-employed, NCDS5-NCDS6

Another issue that needs to be addressed is the number of spells of self-employment that each individual undertook to accumulate their observed number of quarters of self-employment. For example, were those who spent a modest amount of time self-employed repeatedly entering and exiting from self-employment? Were those occupying self-employment for long enough to be in the top (33–38 quarters) grouping doing so through a single spell in almost all cases? Table 2, below, allows us to answer these questions. Fewer than 20% of the individuals in the NCDS sample were clearly of Entrepreneurial Type (ET) through having some period of self-employment between NCDS5 and NCDS6. Of those undertaking some self-employment, nearly 90% had only one spell and 98% had two spells or fewer. Thus, empirical observation shows that our measure of Entrepreneurial Persistence (EP) is more useful in practice than we might have feared: we do not have to worry too much about controlling for the distinction between two individuals of ET with similar numbers of quarters self-employed, but very different numbers of spells of self-employment. This is a fortunate situation, since any regressor that simply measures the number of self-employment spells is definitionally forced towards a strong positive correlation with the number of quarters self-employed, since, when the latter is zero, the former must be zero also. However, conditional on some time having been spent self-employed, the number of self-employment spells might be negatively correlated with the number of quarters self-employed, if some individuals exhibit frequent transitions into, and out of, self-employment, while others undertake a single lengthy period in self-employment.

Table 1 Distribution of quarters of self-employment from 1991, by gender

Further context is provided by Table 3—which demonstrates the mix between self-employment and other main economic activity (or inactivity) states between NCDS5 and NCDS6, and largely the expected sorts of variations by gender:

Table 2 Distribution of quarters of self-employment—by gender and number of spells
Table 3 Months of self-employment and other states of economic activity—by gender
Table 4 Zero-inflated negative binomial model for males

We take the ‘general-to-specific’ approach for our estimation—starting by using data on as many available variables as possible that we might expect to be relevant in determining individual self-employment, but discarding some variables on the basis of the statistical evidence. Regressors to be considered for inclusion can be split into several categories, as follows:

  1. General controls: a gender dummy (where the sample is not split by gender); a dummy for self-employment at age 23 (NCDS4); eight English region dummies (SW England is the base region) and separate dummies for Scotland and Wales, to capture NCDS5 region of residence data and control for variations in costs (particularly housing) and regional demand conditions.

  2. Family background: a dummy captures non-white ethnicity; another dummy reflects family financial difficulties (NCDS1); up to four dummies are used to capture the social class (class I, the base case, is top) of the cohort member’s father in 1965 (NCDS1); several dummies are used to capture the occupation of the cohort member’s father Footnote 8 in 1969 (NCDS2); a dummy is used to indicate use of the English language at home (NCDS2); two grouped variables from NCDS3 indicate the age at which the cohort member’s father and mother left full-time education; another grouped variable indicates, for the cohort member’s 1974 school, the percentage of male parents in a non-manual job; a dummy (NCDS5) indicates whether the cohort member’s parents ever permanently separated or divorced.

  3. Education, ability and training: there are dummies to indicate highest academic qualification (CSE, Footnote 9 O level, A level, first degree or higher degree); four pairs of dummies capture performance in separate reading and maths tests at age seven (NCDS2) and age sixteen (NCDS3). For each test, a dummy is used to indicate a score definitively (not tied) in the top quintile of the cohort and another indicates a score in the bottom quintile, leaving the middle 60% (plus ties) of each ability distribution as the base case. A dummy variable captures embarkation by the cohort member on an apprenticeship by 1981; three others denote (respectively) receipt of vocational, professional and nursing qualifications by 1991.

  4. Non-cognitive attributes: several psychological measures are included as discrete scores. Creativity comes from NCDS1 (1965), a zero value denoting no creativity, and other values rescaled to a maximum of 0.4; while unforthcomingness, withdrawal, depression, anxiety acceptance and hostility towards (other) children are taken from NCDS2 (1969), each with a zero minimum; and caution, moodiness, timidity, sociability and laziness measures are derived from NCDS3 (1974), varying in the range [−2,+2]. There is a dummy for fear of new situations (1974). A number of dummies indicate the aspect that the cohort member regarded, in 1981 (NCDS4), as being most important when choosing a job. These include promotion, being in charge, being one’s own boss, lack of responsibility, job security and good pay (cohort members responding with some other job characteristic form the base group).

  5. Financial: real terms value of inheritance received by 1991 may enter both linearly (scaled in units of \(\pounds10000\)) and quadratically (scaled by a factor of \(10^{-10}\)), or as a dummy variable (above a threshold value level); the year in which inheritance was received (subtracting 1900 from the actual year, and then dividing by 100). See Burke et al. (2000), Taylor (2001), and Hurst and Lusardi (2004) for justification of non-linear effects. Although self-employment income is a potential determinant of EP, the NCDS data suffer from too many missing values (in addition to the guaranteed missing values for those who are not of ET, and possible measurement error); and also from the fact that income data are not available to cover the full NCDS5-NCDS6 period.

  6. Other: a regressor is defined as the number of spells of unemployment undergone between March 1981 and being surveyed in 1991 (NCDS5); a dummy captures not having at least one child by 1991, an alternative, related, measure being the number of children by 1981; another dummy indicates membership of a union or staff association in 1991 (NCDS5).

  7. Missing value dummies: for some individual regressors, and some groups of regressors, an extra dummy is used to indicate that relevant data were missing, and as a (rather limited) control for this fact. The effects of sample attrition are more important if the attrition is non-random. The issues of attrition and non-response at a particular sweep (wave), and the extent to which they are non-random in the case of the NCDS, are investigated in some detail by Hawkes and Plewis (2006). This builds on the more descriptive account by Plewis et al. (2004).

4 Econometric methodology

If the values taken by a dependent variable are non-negative integers, it is possible to improve on the simple least squares regression framework. For such count data (e.g. the number of workplace accidents in a year at a set of factories), the most straightforward alternative (see, for example, Greene (1997, 2002), Maddala (1983)) is the Poisson regression model—while an extension is offered by the negative binomial model. Negative binomial and Poisson random variables each have a single parameter, and their discrete probability mass function can be written as:

$$ P\left(Y_i = y_i\right)=\frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!};\quad y_i=0, 1, 2,\ldots $$
(1)

In each case, it is usual to specify the natural logarithm of the parameter as a linear regression function and estimate by the method of Maximum Likelihood (ML). The negative binomial model uses \(\lambda_i=\exp \left({\mathbf{x}}_i^{\prime} \varvec{\beta}+\ln \left(u_i\right)\right),\) where \(u_i\) is often assumed to follow a unit-mean gamma distribution with parameter θ, and accommodates heterogeneity not captured by the \({\mathbf{x}}_i\) vector of regressors. The conditional mean of \(Y_i\) is \(\lambda_i\), and the variance is \(\lambda_{i}(1 + \kappa\cdot\lambda_{i}),\) where \(\kappa= 1/\theta.\) It is common in actual data for the variance to exceed the mean, and the negative binomial model allows this, whereas the mean and variance of a Poisson variate are both \(\lambda_i\), where \(\lambda _i =\exp \left({\mathbf{x}}_i^{\prime}\varvec{\beta} \right),\) and unobserved heterogeneity is not addressed by the Poisson model.
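The contrast between the two variance assumptions can be checked numerically. The sketch below is illustrative only: the function names and the parameter values are assumptions, and the negative binomial probability is written in its standard marginal form (the Poisson probability integrated over a unit-mean gamma term with shape θ). It compares the variance of each distribution with the mean.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability of the count y with mean lam, as in Eq. (1)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def negbin_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial probability with mean mu and gamma shape theta."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def mean_and_variance(pmf, upper: int = 500):
    """Approximate the mean and variance by summing over counts 0..upper-1."""
    mean = sum(y * pmf(y) for y in range(upper))
    var = sum((y - mean) ** 2 * pmf(y) for y in range(upper))
    return mean, var


mu, theta = 4.0, 2.0  # illustrative values, not estimates from the NCDS
pois_mean, pois_var = mean_and_variance(lambda y: poisson_pmf(y, mu))
nb_mean, nb_var = mean_and_variance(lambda y: negbin_pmf(y, mu, theta))
print(pois_mean, pois_var)
print(nb_mean, nb_var, mu * (1 + mu / theta))
```

With these values the Poisson variance matches the mean, while the negative binomial variance matches the expression with \(\kappa = 1/\theta\), which is the overdispersion the text describes.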

In considering individual self-employment over a sample period, there may be two types of person not observed as self-employed—one that would never (seriously) consider becoming self-employed; and the other that would be willing, but was not self-employed at any point in the observed sample period. The zero-inflated Poisson and negative binomial models reflect this possibility—with a binary choice model (logit or probit) used to capture the difference between those who would never choose to be self-employed (thus inflating the number of zeros observed for the dependent variable), and those who might do so at least sometimes. Negative estimates here indicate a greater chance of being of ET. For both the zero-inflated negative binomial model and the zero-inflated Poisson model, with a logit component to inflate the number of zeros:

$$ P\left(Y_i=y_i\right)=\left(\frac{1}{1+e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}} \right)\left(\frac{e^{-\lambda_i}\lambda_i ^{y_i}}{y_i!} \right);\quad y_i =1, 2,\ldots$$
(2a)

For the case of \(Y_i = 0,\) the negative binomial probability is shown below; the Poisson probability differs only in omitting the final \(\ln \left(u_i\right)\) term:

$$ P\left(Y_i=0\right)=\left(\frac{1}{1+e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}} \right)\left(e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}+e^{-\lambda_i}\right)=\left(\frac{1}{1+e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}}\right)\left(e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}+\exp \left(-\exp\left({\mathbf{x}}_i^{\prime}\varvec{\beta}+\ln \left(u_i \right)\right)\right)\right). $$
(2b)

The zero-inflated models are also estimated using ML, and it is usual to use robust standard errors (see White (1980), for example) when reporting results for these models.
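As a rough check on (2a) and (2b), the following sketch evaluates the zero-inflated probabilities for a given λ and logit index. It treats λ as known, so in the negative binomial case it corresponds to conditioning on the heterogeneity term rather than integrating it out. The function name and argument names are assumptions made for illustration.

```python
import math

def zip_pmf(y, lam, w_alpha):
    """Zero-inflated Poisson probability following Eqs. (2a) and (2b).

    lam     : Poisson parameter lambda_i (must be positive)
    w_alpha : value of the logit index w_i' alpha
    """
    # Probability of belonging to the regime in which self-employment is possible
    count_share = 1.0 / (1.0 + math.exp(w_alpha))
    if y == 0:
        # Eq. (2b): zeros arise from both regimes
        return count_share * (math.exp(w_alpha) + math.exp(-lam))
    # Eq. (2a): positive counts arise only from the Poisson regime
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return count_share * poisson
```

A larger logit index puts more mass on zero, which is consistent with negative estimates in this component indicating a greater chance of being of ET.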

Prediction of the mean of the dependent variable is straightforward. For example, under the zero-inflated negative binomial model:

$$ E\left[Y_i\right]=0\cdot\left(\frac{e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}}{1+e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}}\right)+\sum\limits_{y_i=0}^{\infty}\left[\left(\frac{y_i}{1+e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}}\right)\left(\frac{e^{-\lambda_i}\lambda _i ^{y_i}}{y_i!}\right)\right]. $$
(3)

Estimated coefficients and regressor sample means can be used to provide estimated marginal effects (see Footnote 10).
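The sum in (3) can be evaluated in closed form. The step below is a short worked simplification that holds \(\lambda_i\) fixed (that is, it conditions on \(u_i\) in the negative binomial case) and uses only the fact that a Poisson variate with parameter \(\lambda_i\) has mean \(\lambda_i\):

$$ E\left[Y_i\right]=\frac{1}{1+e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}}\sum\limits_{y_i=0}^{\infty}y_i\,\frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}=\frac{\lambda_i}{1+e^{{\mathbf{w}}_i^{\prime}\varvec{\alpha}}}. $$

In words, the zero-inflation component scales the count mean by the probability of being in the regime where self-employment is possible.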

The Poisson regression model is nested within the negative binomial model, and the extra restriction imposed by it may be tested by means of the standard Likelihood Ratio test. If there is overdispersion, this test will favour the negative binomial model. However, another possible source of excess zeros is the scenario where there are two types of person—so that zero-inflated models are appropriate. Vuong (1989) provided a two-sided test applicable when choosing between a pair of non-nested models—either Poisson versus zero-inflated Poisson, or negative binomial versus zero-inflated negative binomial. Asymptotically, the Vuong test statistic has a standard normal distribution.
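The two tests described above can be sketched as follows. This is a minimal illustration, assuming the basic form of the Vuong statistic without any adjustment for the number of parameters; the function names and the per-observation log-likelihood values are hypothetical and are not taken from our estimates.

```python
import math
from statistics import NormalDist, fmean, pstdev

def likelihood_ratio_stat(loglik_restricted, loglik_unrestricted):
    """LR statistic for nested models, e.g. Poisson within negative binomial."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

def vuong_stat(loglik_a, loglik_b):
    """Vuong statistic from per-observation log-likelihoods of two models.

    Large positive values favour model A; large negative values favour model B.
    """
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    return math.sqrt(n) * fmean(diffs) / pstdev(diffs)

def two_sided_p_value(z):
    """Two-sided p-value under the asymptotic standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical per-observation log-likelihoods, for illustration only
ll_zero_inflated = [-1.0, -2.0, -1.5, -0.5]
ll_standard = [-1.2, -2.5, -1.4, -1.0]
v = vuong_stat(ll_zero_inflated, ll_standard)
p = two_sided_p_value(v)
```

Because the test is two-sided, swapping the order of the two models only changes the sign of the statistic.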

We also consider an alternative modelling approach, which is not specifically designed for the case of count data, as a comparator. Although a Tobit model (censored regression) is more applicable than standard least squares regression—since negative self-employment durations cannot be observed—it can readily be improved upon if a separate process determines zero and non-zero values of the self-employment duration (like the zero-inflated models for count data). Let us use a probit model for the individual’s choice of whether to be self-employed and a Tobit model for the subsequent choice of non-zero time spent self-employed:

$$ d_i^{\ast}={{\mathbf{w}}_i^{\prime}\varvec{\alpha}}+u_i;\quad d_i=\left\{ {{\begin{array}{l} {1\hbox{ if }d_i^\ast > 0} \\ {0\hbox{ otherwise.}} \\ \end{array} }} \right. $$
(4a)
$$ y_i^{\ast}={{\mathbf{x}}_i^{\prime}\varvec{\beta}}+v_i ;\quad y_i =\left\{ {{\begin{array}{l} {y_i^\ast \hbox{ if }y_i^\ast > 0\hbox{ }} \\ {0\hbox{ otherwise.}} \\ \end{array} }} \right.$$
(4b)

Lin and Schmidt (1984) consider a model proposed by Cragg (1971), using equations like (4a) and (4b). For an individual who chooses no self-employment via the probit model (4a), equation (4b), a truncated regression (with truncation to the left of zero), is not relevant. Summation of the respective log-likelihoods of the univariate probit model and the truncated regression model yields the overall log-likelihood for the two-regime model. This combination reduces to the log-likelihood of a Tobit model if \(\varvec{\alpha} = \varvec{\beta}/\sigma,\) where σ is the standard deviation of the disturbance term \(v_i\) in (4b). While Lin and Schmidt (1984) derive an LM test of the Tobit model against the two-regime Cragg model, Greene (1997) points out that a simple Likelihood Ratio test is possible (as an asymptotically equivalent alternative).

In a paper on cigarette smoking by individuals, Jones (1989) considers several alternative model structures. These differ with respect to the independence (or otherwise) of the disturbance terms in equations like (4a) and (4b); and in whether the participation decision ‘dominates’, in which case, in our example, only those not of ET could be observed spending zero time in self-employment. Slightly confusingly, the ‘Cragg model’ we have drawn from Lin and Schmidt (1984) is described by Jones (1989) as the ‘First hurdle dominance’ model. With Φ(.) as the cumulative distribution function of the standard normal, its log-likelihood function, across a sample of n individuals, is as follows:

$$ \ell=\sum\limits_{i=1}^n \left[\left(1-d_i\right)\ln \Upphi \left[-{\mathbf{w}}_i^{\prime}\varvec{\alpha}\right]+d_i\left[\ln \Upphi \left[{\mathbf{w}}_i^{\prime}\varvec{\alpha}\right]-\ln \Upphi \left[\frac{{\mathbf{x}}_i^{\prime}\varvec{\beta}}{\sigma}\right]-\ln \left( {\sigma \sqrt{2\pi}} \right)-\frac{1}{2\sigma^{2}}\left(y_i - {\mathbf{x}}_i^{\prime}\varvec{\beta}\right)^{2}\right]\right]. $$
(5)
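A minimal sketch of (5) is given below, together with a standard Tobit log-likelihood for comparison. It takes the fitted indices as given rather than estimating them, and it sets the participation indicator to one whenever the observed duration is positive. The function names and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def cragg_loglik(y, w_alpha, x_beta, sigma):
    """Log-likelihood of Eq. (5) for lists of durations and fitted indices.

    y       : observed durations (zero when never self-employed)
    w_alpha : probit index w_i' alpha for each individual
    x_beta  : truncated-regression index x_i' beta for each individual
    sigma   : standard deviation of the disturbance in (4b)
    """
    total = 0.0
    for yi, wa, xb in zip(y, w_alpha, x_beta):
        if yi > 0:  # d_i = 1
            total += (math.log(STD_NORMAL.cdf(wa))
                      - math.log(STD_NORMAL.cdf(xb / sigma))
                      - math.log(sigma * math.sqrt(2.0 * math.pi))
                      - (yi - xb) ** 2 / (2.0 * sigma ** 2))
        else:  # d_i = 0
            total += math.log(STD_NORMAL.cdf(-wa))
    return total

def tobit_loglik(y, x_beta, sigma):
    """Standard Tobit log-likelihood with censoring at zero, for comparison."""
    total = 0.0
    for yi, xb in zip(y, x_beta):
        if yi > 0:
            total += (-math.log(sigma * math.sqrt(2.0 * math.pi))
                      - (yi - xb) ** 2 / (2.0 * sigma ** 2))
        else:
            total += math.log(STD_NORMAL.cdf(-xb / sigma))
    return total

# Hypothetical values, chosen only to illustrate the Tobit restriction
y = [0.0, 3.0, 0.0, 7.5]
x_beta = [-1.0, 2.0, 0.5, 6.0]
sigma = 2.0
restricted = cragg_loglik(y, [xb / sigma for xb in x_beta], x_beta, sigma)
```

When the probit index is set equal to the regression index divided by σ, the two functions return the same value, which is the restriction underlying the Likelihood Ratio test of the Tobit model against the two-regime model.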

The predicted unconditional mean number of quarters spent in self-employment comes from the following expression—using ϕ to denote the probability density function of the standard normal:

$$ 0\cdot\Upphi \left[-{\mathbf{w}}_{i}^{\prime}\varvec{\alpha} \right]+\left({\mathbf{x}}_i^{\prime}\varvec{\beta}+\sigma\,\frac{\phi \left( {\mathbf{x}}_i^{\prime}\varvec{\beta}/\sigma\right)}{\Upphi \left({\mathbf{x}}_i^{\prime}\varvec{\beta}/\sigma\right)} \right)\Upphi \left[{\mathbf{w}}_{i}^{\prime}\varvec{\alpha}\right].$$
(6)

Use of estimated coefficients and regressor sample means again leads to estimated marginal effects (see Footnote 11).
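Expression (6) can be evaluated for one individual as in the sketch below. It takes the two fitted indices and σ as given; the function name and argument names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def predicted_quarters(w_alpha, x_beta, sigma):
    """Unconditional mean of expression (6) for one individual.

    w_alpha : probit index w_i' alpha
    x_beta  : truncated-regression index x_i' beta
    sigma   : standard deviation of the disturbance in (4b)
    """
    z = x_beta / sigma
    # Ratio of the standard normal density to its distribution function
    ratio = STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)
    conditional_mean = x_beta + sigma * ratio  # mean of y given y > 0
    return STD_NORMAL.cdf(w_alpha) * conditional_mean
```

The first term of (6) is zero, so the prediction is simply the participation probability multiplied by the mean of the truncated regression.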

Jones (1989) uses the ‘Cragg model’ title for a model in which the disturbance terms from (4a) and (4b) are independent (see Footnote 12). Given the discussion underlying the various model structures in Jones (1989), here we need to emphasise that certain aspects of self-employment enable a qualitative distinction between those who would never be self-employed and those who might be, even if only briefly. Although cigarette smoking and self-employment differ in many respects, attitudes to both exhibit considerable variation within society, including, to an extent, by social classification sub-group.

Jones (1989) also points out that participation in an activity at a point in time means that an individual has previously decided to commence it, and has also not quit from it. This is the basis for sample separation models explicitly modelling the individual’s decisions to commence and/or to quit. Blundell et al. (1987) note that such an approach should improve the efficiency of estimates. However, previous studies of individual self-employment at a particular time, such as Blanchflower and Oswald (1998) and Burke et al. (2000, 2002), have not used this method. This article needs a different approach, since we are not examining self-employment at some given instant, and multiple cycles of starting and quitting are observed for some individuals over our 9-year period.

5 Empirical results

This section presents the main results, summarising and discussing the content of Tables 4–7. We pay particular attention to the intriguing (but less obvious) effects of background characteristics from the early lives of members of our 1958 cohort. The effects of including a few additional controls are discussed briefly in an appendix. Some statistical diagnostics for our models, and a few predictions of time spent in self-employment (from Eq. 3) are also included there. However, to summarise, a two-regime approach is found to be justified throughout, for both males and females.

Table 5 First hurdle dominance model for males
Table 6 Zero-inflated negative binomial model for females
Table 7 First hurdle dominance model for females

5.1 Male self-employment

There is an unsurprisingly strong element of persistence in the tendency to be self-employed: self-employment status at age 23 makes some self-employment between NCDS5 and NCDS6 more likely, and likely to last longer (shown in both Tables 4 and 5). Other factors found to favour both being of ET and EP include having a father who was an employee-manager in farming, having an apprenticeship by age 23, being creative (as measured at age 7) and having expressed the view at age 23 that being one’s own boss is the most important aspect of a job. Notably, only the control for union membership at age 33 has the opposite effect on both being of ET and EP.

A male is more likely to be of ET if his father was the manager of a small firm, if his father was a worker or a farmer with his own account (Table 5), if he was lazy at age 16 (Table 5), or due to the receipt, timing and/or value of an inheritance received by age 33. He is predicted as less likely to be of ET if he possesses a first degree as his highest academic qualification, if he has a vocational qualification, if he was timid (Table 4) at age 16, or if he viewed either promotion or job security as being the most important aspect of a job when asked in 1981 (aged 23). Regressors that raise only EP include having had a father who was self-employed, and the value of an inheritance received at age 33 (but this effect is small in magnitude). Those that reduce only EP are having an O level equivalent as highest academic qualification, being flexible at age 16 (presumably too flexible), having the view at age 23 that lack of responsibility is the most important aspect of a job (Table 4), high maths ability at age 16 (Table 5) and the number of unemployment spells suffered up to the age of 23.

5.1.1 Interpretation of results for male self-employment

Our two-part econometric approach proves to be superior to the simple logit/probit and therefore supports our reclassification of pure wage-workers and self-employed into pure wage-workers and entrepreneurial types. Reassuringly, our estimation of entrepreneurial types arrives at a specification which is broadly similar to those of previous studies using this data set, namely Blanchflower and Oswald (1998) and Burke et al. (2000). Thus we confirm the existing literature on determinants of ET. Nevertheless, our results clearly indicate that causes of ET differ from those of EP. It follows that using probit/logit analysis to determine ‘what makes an entrepreneur’ is misleading if one is interested in ‘die-hard’ persistent entrepreneurs rather than individuals who hope to get around to it some day or who try it only fleetingly. The significance of this difference is underlined by the observation earlier in the article that, over the sample period, the population of individuals who move between self-employment and wage work is greater than the population of those who are classified as entirely self-employed. The results provide some interesting insights into this distinction, which we believe enriches our understanding of the process of what makes entrepreneurs.

Throughout our results for males, we note a high degree of consistency between the results from the zero-inflated and hurdle models (this feature also remains for females). In each form of estimation there is strong path dependence in terms of career choice early in each male’s life. We find that if a male was self-employed at the age of 23, he is not only more likely to be of ET from age 33 to 42, but is also more likely to persist in self-employment. The results raise some issues for entrepreneurship education as they suggest that awareness of self-employment as a career path early in a male’s life may be a key influence on an economy’s long-term enterprise base. Likewise, they may also indicate that ‘learning by doing’ in self-employment early in a career can be a useful driver of entrepreneurial human capital and/or that its specific nature may lock an individual into this form of career path.

Family background highlights some interesting intergenerational effects. A father who is self-employed or a manager of a small firm has a positive effect on his son being of ET. However, only a self-employed father has a positive effect on a son persisting in entrepreneurship. In terms of a human capital/mentor interpretation, this might indicate that there are valuable entrepreneurial skills, distinct from small business management skills, that only a self-employed father can pass on to a son. An alternative interpretation stems from a role model or ‘influenced expected utility’ effect where a father who is a manager of a small firm (without real experience of self-employment) may cause over-optimistic expectations (of the kind identified by de Meza and Southey (1996)) of utility from self-employment in his sons. If this is the case among a significant number of sons then they will not persist in self-employment, thereby generating insignificance (perhaps negating positive effects of mentoring by a father who is or was a small business manager) of the ‘dad manager of a small firm’ variable in the EP estimation. In contrast, self-employed fathers have real experience of self-employment and hence may pass on more realistic expectations of utility from self-employment to their sons, in which case the EP estimation is not affected by an outflow of those whose expected utility needed serious downward revision. Outside of these effects, the results seem to indicate that a father who works in the farming sector has a positive impact on the son being of ET and persisting in self-employment. This effect may be due to sons of farmers being more likely to enter the farming sector than non-farmers’ sons. In this case, with the high prevalence of small firms and self-employment in the farming sector, one might well expect this pattern of econometric results.

The education variables, while different in composition, broadly reflect the interpretation of previous logit/probit estimates of the same dataset provided by Blanchflower and Oswald (1998) and Burke et al. (2000). In general, higher levels of education are not associated with entrepreneurial types but low levels of education are negatively related to EP (or performance as in the case of Burke et al. 2000). The same type of observation applies to the role of ‘creativity’ among psychological profiles of males. We find that creativity has both a positive effect on an individual being of ET (similar to Blanchflower and Oswald (1998), and Burke et al. (2000)) and persisting in entrepreneurship (as found in the second stage estimation of performance in Burke et al. (2000)). However, some other psychological profiles show an interesting distinction between ETs and EP. Notably, ‘being cautious’ is found to be a positive attribute of persistence in entrepreneurship, which would make sense in terms of the impact of risk aversion on sample selection. However, this result is more interesting in light of the recent theory posited by Bhide (2000) who, on the basis of case study evidence, argued that entrepreneurs who ran high growth ventures were not typically risk lovers, but rather had a ‘heads I win, tails I do not lose very much’ approach. In this light, our results provide some statistical support for Bhide’s case study evidence. Less easy to interpret is the finding that ‘being flexible’ (usually believed to be of the essence of entrepreneurship) appears to be negatively related to EP. This may reflect an inverse effect, namely that inflexible individuals may be more die-hard/persistent types who might be willing to see a venture through ‘thick and thin’ hence giving rise to the negative relationship between flexibility and persistence in entrepreneurship.

The role of finance, as depicted through the exogenous measure of inheritance, has similar effects to those outlined in studies such as Evans and Jovanovic (1989), Evans and Leighton (1989), Blanchflower and Oswald (1998) and Burke et al. (2000). Simply put, receipt of an inheritance increases the likelihood that a male will be of ET. In terms of persistence it is also found to be significant, but the marginal effects show that it has only a minor role to play. This would seem to indicate that its effects are largely short term and are overtaken by other more pressing influences on the decision to persist in self-employment.

Finally, in terms of an auxiliary grouping of variables, some interesting results emerge. We find that having children seems to neither stimulate nor deter being of ET, and EP, among males. Given that we later find it has a negative effect on female EP, this suggests that despite changes in the labour market, females still bear the main economic burden of looking after children. Turning to the role of unemployment, we find that spells in unemployment do not appear to push individuals to become of ET and in fact appear to cause those who nonetheless choose to become self-employed to persist less in entrepreneurship. This result contrasts with the view originally put forward by Foreman-Peck (1985)—who, using UK data for the Interwar period, finds evidence of a push effect and speculates that these start-ups were more likely to be low quality. Our more direct evidence, for a more recent period, does not support the push hypothesis but does indicate that individuals with more early life experience of unemployment seem to have less staying power in entrepreneurship—being negatively related to EP. Thus, the results seem to indicate that unemployment weakens the enterprise economy. This is in contrast to the push hypothesis (see Storey (1994) for an overview).

5.2 Female self-employment

Persistence in self-employment is again evident, but the regressor ‘self-employed at age 23’ is the only one to have positive effects on both being of ET and EP. However, a nursing qualification acts against being of ET, and has a negative impact on EP.

A female is more likely to be of ET if her father was a manager of a small firm, a worker with his own account or an employee manager in farming. Her probability of being of ET is also positively linked to her father’s age when leaving full-time education. Other regressors that have a similar effect include a professional qualification, an apprenticeship, being in the top quintile on mathematical ability at age 7 and viewing (at age 23) being one’s own boss as the most important characteristic of a job. Unsurprisingly, a lower probability of being of ET is linked to union membership at age 33. Similar effects are also found for the English language being spoken at home (at age 11), being in the bottom quintile on reading ability at age 16, being cautious at age 16 and viewing job security as the most important aspect of a job (at age 23). The last two of these results in particular are very plausible, intuitively. Regressors that only act to increase EP are a strong desire for a lack of responsibility in a job (Table 6, not easily explained), having a first degree as highest academic qualification (Table 7) and the timing of the receipt of an inheritance (closer to 1991, rather than less recently). Reduced female EP only is linked to being in Social Classes II, III and IV at age 7 (Table 7), the mother’s age of departure from full-time education, having suffered a parental split (Table 7), the number of unemployment spells endured since age 23, the number of children borne by age 23 and poor health at age 33.

5.2.1 Interpretation of results for female self-employment

We find some stark differences between the female and male results which, we believe, underline the appropriateness of dividing the datasets (Burke et al. (2002); see Footnote 13). While males of ET and male EP can be largely explained within the confines of economic models of entrepreneurship augmented with psychological factors, the same approach is less satisfactory in explaining female entrepreneurship. Nonetheless, some generic features do emerge from the estimation process. As in the case of males, we find a strong degree of path dependence in terms of early career choice, with females who were self-employed at the age of 23 also being more likely to be of ET and to persist in self-employment over ages 33–42. We deduce similar implications for entrepreneurship education to those outlined above for males.

In the case of the family background variables, we do not find that gender differences undermine the influence of the father’s career on daughters. As in the case of males, we find that fathers who are either managers of small firms or self-employed (in the case of females only those who are ‘worker own account’) appear to have a positive impact on daughters being of ET. Moreover, as in the case of males, we find that the daughters of fathers who are managers of small firms are not any more persistent in self-employment than daughters without such a father. Thus, as before, we view this as either evidence of limited relevance of small firm managerial skills for persistence in entrepreneurship and/or evidence of a role model father causing over-optimism about self-employment utility among his daughters. However, the area where males and females diverge is that, unlike males, this same pattern also emerges for fathers who are self-employed, in that their daughters do not appear to persist longer in self-employment than those who do not have a self-employed (‘worker own account’) father. As before, this might again be due to a father role model/mentor causing over-optimism (of the de Meza and Southey (1996) form) among daughters, but it might also be due to key differences in human capital that are pivotal to typical male and female self-employment. Namely, the father’s human capital may be less applicable to a daughter’s career (compared to a son’s) and hence mentoring by a father becomes less useful for females. An alternative viewpoint could be that the human capital transmission channel might be generally stronger from father to son than from father to daughter. In other words, if fathers have closer and more communicative relationships with sons than daughters, then sons may receive a greater transfer of human capital from a father.

In the case of education, we note that higher levels of education—in the form of a first university degree—have a positive impact on EP. In this sense, the general pattern that education is good for persistence is similar to males. However, the pattern diverges in terms of determinants of being of ET—as university education has insignificant effects. We also find some polarised effects—with low levels of education (e.g. ‘O level highest’) appearing to be on the verge of significance in terms of a stimulus to be of ET; while a high level of education, in the form of a professional qualification, does likewise.

In terms of psychology scores, creativity is not a driver of female entrepreneurship in the same way as it is for males. It is insignificant and on the verge of a negative effect in terms of EP. Cautious females tend to avoid self-employment, as do those who value job promotion. However, like males, a desire to be ‘one’s own boss’ is positively related to being of ET—but, unlike males, it is not associated with being a die-hard entrepreneur.

In terms of the roles of finance and spells in unemployment, the difference between males and females only persists in the case of finance. Females are not stimulated to be of ET by receipt of an inheritance but are stimulated to persist longer in self-employment by such an event. Thus, the impact of an exogenous increase in access to finance appears to stimulate entrepreneurship among males and females, but in very different ways. In contrast, unemployment has similar effects in that it tends to decrease EP among both females and males. Poor health seems to constrain persistence in self-employment among females more than males while, as we noted before, having children (by age 23) seems to constrain persistence only in female self-employment. Thus, overall we note some key areas of difference between females and males regarding both being of ET and EP. The extent of these gender differences justifies the treatment of male and female self-employment as distinct processes in separate equations, a practice not often observed in the previous literature on entrepreneurship.

6 Conclusion

The article contributes to the literature on entrepreneurial choice by moving beyond a dichotomy between wage-workers and entrepreneurs. We note that the majority of entrepreneurs actually spend some of their career in wage work and hence we have distinguished between entrepreneurial types (individuals who either have been self-employed or, if not, would consider self-employment as a career option) and entrepreneurial persistence including die-hard entrepreneurs. We outline how entrepreneurial choice becomes richer when this distinction is considered. We also offer an econometric approach to test the appropriateness of this classification. To our knowledge, this article is the first empirical analysis of the determinants of persistent or ‘die-hard’ entrepreneurs. Using a recent update to the NCDS dataset we explore ET (individuals with an inclination for entrepreneurship), and EP, for both males and females across a 9-year period—from age 33–42. Diagnostic tests indicate clearly that a two-regime approach is indeed a superior specification to the simpler probit/logit dichotomous approach. The results have important ramifications because of the differences we find between factors that encourage individuals to try self-employment and those that determine persistence in self-employment. We find that the determinants of being of ET are similar to the results in the existing empirical literature based on a probit/logit estimation of self-employment choice. Given the superiority of our econometric approach, this finding is reassuring as it means that the pre-existing literature on self-employment choice is indeed a good guide to explaining ET. However, our results for EP or ‘die-hards’ are quite different and taking these results in conjunction with those on entrepreneurial types provides an enriched understanding of what makes an entrepreneur.

We find that male and female entrepreneurship are distinct—although with some common determinants. One of these is an early career experience of self-employment—which tends to encourage persistent entrepreneurship throughout the 33–42 career span. Similarly, higher levels of education tend to be associated with EP among both males and females. Access to finance encourages being of ET among males with only marginal effects on EP. In the case of females it only has the effect of increasing EP. We also find self-employed fathers tend to encourage more entrepreneurial types and more persistence among their sons, but only the former among their daughters. This may indicate more relevance of a father’s human capital for a son’s business. We also find that fathers who are managers of small firms encourage both sons and daughters to be of ET but have negligible effects on EP. This may be due to disparate skills for self-employment and small business management, and/or small business managers encouraging over-optimistic views (of the kind identified by de Meza and Southey (1996)) of prospects in self-employment. We believe that this effect does not occur for sons of self-employed fathers because the latter’s actual experience should provide their sons with a more realistic perspective. However, there may be less mentoring in the case of daughters—feeding through less additional realism and hence undermining the positive effect on persistence.

Our results are also consistent with children being a greater hindrance to female entrepreneurship. Having children by the age of 23 has no statistically significant effect on male entrepreneurship but is negatively related to female EP. We also find EP is hindered by poor health among females more than males.

Finally, our analysis sheds some interesting light on unemployment and entrepreneurship, which contrasts with previous views in the literature. We find that spells in unemployment do not increase ET, and decrease EP. Thus, we do not find an ‘unemployment push’ into self-employment, but it appears that early life experience in unemployment may reduce self-employment quality (leading to the drop in EP).

In sum, we offer a new theoretical perspective and empirical findings based on new data. However, this is only a first step beyond the pure wage-worker versus pure entrepreneur logit/probit approach, towards a multi-dimensional and dynamic analysis of different kinds of entrepreneurship.