1 Introduction

The self-employment rate in the Netherlands has risen substantially in recent years, from 11.6% in 2003 to 16.8% in 2016.Footnote 1 This increase has raised interest in the income vulnerability of the self-employed and their impact on the social security system. In particular, they are an important potentially vulnerable group when it comes to retirement income, since, like in many other countries, they are not obliged to accumulate the same occupational pensions that employees accumulate.Footnote 2 Policy makers are therefore concerned that the self-employed are not building up enough pension wealth until their retirement (See SZW 2016, Sect. 3.1). Indeed, Mastrogiacomo (2016) shows that the self-employed have the same pension ambitions as employees but tend to fall short on their goals. Similar results are found by other studies on retirement preparedness of the Dutch; see, e.g., de Bresser and Knoef (2015) or Knoef et al. (2016).

Mastrogiacomo et al. (2016) emphasize the large heterogeneity in wealth holdings of the self-employed. This raises the question how to characterize them, dividing them, for example, into groups that differ distinctly in their income, wealth or pension entitlements. Population data published by Statistics Netherlands show that the rise in the number of self-employed is due to the growth in solo self-employed (SSE), self-employed that work on their own and have no employees in their business. Earlier studies like Bosch and Van Vuuren (2010) or SER (2010) found that the SSE are themselves quite heterogeneous in terms of personal characteristics and background. This is in line with the international literature on entrepreneurship (Blanchflower 2000; Parker 2004). So far only few studies have differentiated between SSE and other self-employed. An exception is Mastrogiacomo et al. (2016) who show that heterogeneity in business wealth of the self-employed is mostly driven by SSE, while entrepreneurs who own a larger firm are more homogeneous. Zwinkels et al. (2017) focus on the heterogeneity of SSE in particular: the variation in their calculated pension wealth is larger than for employees. Given the heterogeneity of SSE it is questionable if a bisection of the self-employed into SSE and other self-employed individuals is sufficient to understand the heterogeneity in how the self-employed prepare for retirement. Bolhaar et al. (2016) split the SSE into two groups based on their business activity but do not find many differences. Zwinkels et al. classify the SSE into several groups based on income sources and household composition. They find that households consisting of only SSE in particular will often have a low replacement rate after retirement.

Unlike most of the existing studies, the current paper analyses individuals' labour market trajectories over a large number of years. It uses sequence analysis to construct different clusters of self-employment based on these trajectories. Sequence analysis was introduced in sociology by Abbott (1983) and became a popular way to study e.g. life-course and career trajectories in the social sciences. It is particularly well suited for the analysis of self-employment and pensions, since pension savings are typically determined by individuals’ complete labour market trajectories. We show for example that a large share of the self-employed remain in self-employment for only a few years, while studies on pension preparedness of the Dutch often assume that individuals remain in the same labour market state going forward. The only exception we are aware of is Mastrogiacomo et al. (2016) who distinguish long-term from short-term entrepreneurs.

Some related studies also use sequence analysis. Zacher et al. (2012) use data from the German Socio-Economic Panel and find that compared to those who are only self-employed for a short time, individuals who remain continuously in self-employment are more likely to be male, from older cohorts, and have a higher risk taking propensity. Humphries (2016) studies the 1970 cohort of Swedish men and distinguishes between incorporated and unincorporated self-employment. He finds clusters that differ both in income and wealth accumulation, in line with Levine and Rubinstein (2017) who found that in the US, the legal form chosen by self-employed explains most of the variation in income. Visser (2018) uses sequence analysis to investigate whether careers in the Netherlands have become more complex for individuals born in the early 1940s compared to those born in the 1930s. Munnell et al. (2019) follow individuals in the US from age 50–62 to assess how they use non-traditional jobs. They find that individuals who are more frequently employed in non-traditional work during their late career are more likely to have lower retirement income and higher rates of depression. Tophoven and Tisch (2016) analyse the implications of work trajectories for accrued statutory pension entitlements by the age of 42 for different cohorts in Germany. Their study does not offer any insights for the self-employed, since they are not included in the statutory pension system. Lastly, Madero-Cabib and Fasang (2016) study cohorts born in 1920–1950 in Germany and Switzerland to examine how work and family life affect financial well-being in retirement. Analysing labour market and family trajectories jointly, they find that breadwinner policies (i.e. supporting a system where men do and women do not do paid work) in combination with liberal pension policies later in life, as in Switzerland, intensify pension penalties for typical female work-family life courses.

Our study is descriptive and does not aim at identifying causal mechanisms. Our main aim is to analyse the heterogeneity of labour market careers involving self-employment. In order to partition individuals into distinct groups, we use optimal matching to first calculate dissimilarity measures for all observed employment paths. In a second step we cluster the individuals based on the dissimilarity measures. We then study the characteristics of the clusters, i.e. their income, wealth, and pension investments. Like other studies on the Dutch self-employed, we use data from Statistics Netherlands’ Income Panel (IPO), following nine 5-year birth cohorts over 29 years. Our sample consists of more than 50,000 individuals, of which more than 13,000 self-employed. Our definition of self-employment is similar to the one of Zwinkels et al. (2017), but there are small differences. We include all freelancers and all those with income from their own enterprise, even those with low income.

We find the following clusters involving self-employment: (1) self-employed with weak labour market attachment, (2) self-employed who spend a large part of their trajectory as benefit recipients, (3) employees with short self-employment spells, (4) employees that switch to self-employment later in their career, (5) always pure self-employed, (6) always hybrid self-employed (combining self-employment with income from another source), and (7) DGAs (“directeur-grootaandeelhouder”, directors and majority share holder of their firms). We find that DGAs are the “positive” outliers among the self-employed. They earn more, have higher wealth, and save more for retirement through the (voluntary) third pillar. At the other end we find the first three clusters, whose members do not spend a long period in self-employment. They have large gaps in their occupational pension accumulation and are not more likely than employees to participate in a third pillar solution. Similar findings hold for the late career switchers. The pure and hybrid self-employed have a slightly higher disposable income than employees and accumulate also significantly more non-pension wealth than employees. Like DGAs, these two clusters have a rather low risk of not having sufficient means after retirement.

Our study is structured as follows. In Sect. 2 we describe the different data sources and give our definition of self-employment. Section 3 introduces the methods and presents the self-employment clusters that we find. Section 4 discusses the differences between the clusters with respect to demographic characteristics, income and wealth, while Sect. 5 investigates pension accumulation across clusters. Section 6 concludes.

2 Data

This paper combines several sources of Dutch administrative data provided by Statistics Netherlands (CBS). These are all longitudinal data sets with annual-level data. The core is the Income Panel Study (IPO), which follows a representative sample of the Dutch population over time. Attrition only occurs if individuals decease or move abroad. As such, IPO contains information on more than 90,000 individuals of all ages. These are the so called “core individuals” (in Dutch: “kernpersonen”).Footnote 3

The main advantage of IPO over other data such as integral income data sets covering the complete Dutch population, is the long time period covered. IPO starts in 1989Footnote 4 and ends in 2014—when it was replaced by integral population data—and hence covers a period of 26 years, whereas available integral income data currently only cover 15 years. This allows us, for example, to differentiate between those who have always been self-employed and those who switch to self-employment later in life. We extend the time horizon of IPO by three years (2015–2017) by linking the IPO individuals to their records in the integral income data set (INPATAB).Footnote 5 Hence our analysis of trajectories covers the years 1989–2017.

For all these years IPO records detailed information on the individuals’ income from different sources and demographic characteristics such as age, gender, and marital status. There are two breaks in the series due to tax reforms, but neither of them has an impact on our analysis.Footnote 6 The integral data set with which we extend IPO also provides more detailed information on the self-employed starting from 2005: the type of self-employment (i.e., whether the individual is solo self-employed (SSE) or not), and the industry in which the individual is active. We also also link the individuals to data on household wealth (VEHTAB) and (for a subsample) to accrued pension rights in the second pillar (“Pensioendeelnemingen”).

We divide the IPO core individuals into 5-year birth cohorts and limit our analysis to those for whom we have at least 14 years of data during their working age. We consider the range 24–66 as individuals’ working age. This allows us to form symmetric cohorts preceding and following the three whose observation window falls completely on their working age.Footnote 7 Table 1 gives an overview of the nine cohorts that we retain. The oldest cohort includes individuals born in 1936–1940, and the youngest those born in 1976–1980. Cohorts 1–4, and partially cohort 5, fall into what Mastrogiacomo (2016) refers to as the old self-employed that are a last group of high income and wealth individuals; the remaining cohorts allow us to analyse whether the newer generations of self-employed are indeed different.

Table 1 Overview of cohorts in sample

We limit our analysis to individuals who remain in the Netherlands for the whole (cohort-specific) period of interest. We exclude both immigrants and emigrants if they enter or leave the Netherlands in the years we study, since we do not observe them before or after their move. Similarly, we drop everyone who temporarily leaves the country. All in all, this leaves us with approximately 70% of all individuals from the cohorts we consider—a sample of slightly less than 51,000 individuals.

2.1 Definition of Labour Market States

In order to construct employment trajectories we need to define possible labour market states. Each individual can only be in one state in a given year. The definition of the states is similar to the socio-economic categories (SEC) that CBS provides in IPO. Like the SEC, our definition is based on observed incomes. We use different categories to differentiate between those who are exclusively self-employed with non-zero income from self-employment only (referred to as “pure self-employed”)Footnote 8 and “hybrid self-employed” who, in addition to income from self-employment, also receive income from another source. The SEC in IPO makes this distinction until 2000 only.

Individuals who do not receive any income are categorized as having no income. Individuals with non-zero income from entrepreneurial activity are categorized as pure self-employed or hybrid self-employed (self-employed individuals with additional income from employment or pension income). All the other individuals are categorized according to their largest non-zero income as either employees (total income from employment in the private and public sector), freelancers (income from “other work”),Footnote 9DGAs (income as DGAFootnote 10), benefit recipients (based on the sum of all social benefits received,Footnote 11pensioners (sum of state and occupational pensions), or, finally, if all these incomes are zero but IPO indicates that the individual has an income,Footnote 12 as other income recipients.

The rightmost column in Table 1 shows that the shares of individuals in each cohort who have spent at least 1 year in self-employment vary between 17 and 32%, with an overall average of 26%. This is much larger than the average percentage in self-employment at a given point in time, which is 11%. The reason is mobility into and out of self-employment. Our analysis will focus on distinguishing between those who remain in self-employment persistently and those who only spend short periods in self-employment.

2.2 Descriptive Statistics

Table 2 shows the demographic characteristics of our sample overall and by labour market state. Even though our sample selection procedure discriminates against emigrants and immigrants, these groups are still represented fairly well. For the year 2000, CBS Statline reports a share of 82.51% Dutch, 3.43% first generation and 5.18% second generation Western immigrants, and 5.59% first generation and 3.29% second generation non-Western immigrants for all ages. The only group that is strongly under-represented are second generation non-Western immigrants, perhaps because our sample excludes cohorts born after 1980 while CBS Statline includes all individuals.

Table 2 Demographic characteristics across labour market states

Freelancers and DGAs are less frequently of non-Western origin than pure and hybrid self-employed. The self-employed are on average older than employees and are also more frequently male. Freelancers stand out in particular: they tend to be female and most of them can rely on a partner as the household’s breadwinner (defined as the adult with the highest gross income). There is less of a gender difference between pure and hybrid self-employed where roughly two thirds of the observations are for men. Among DGAs, almost three quarters are men. The three types of self-employed are more frequently than employees the breadwinner of their household and are less often single.

Figure 1 plots the distribution of labour market states over time for all individuals in our sample between age 24 and 60.Footnote 13 The graph confirms what we would expect: (female) labour market participation increases over time: the share of individuals without income declines. The share of self-employed increases. While the share of freelancers fluctuates somewhat over time, the shares of the other three self-employment types are all increasing. The total share of self-employed individuals in the sample rises steadily from approximately 6.6% in 1989 to 16.5% in 2017.

Fig. 1
figure 1

Distribution of labour market states over time

Figure 2 shows the share of self-employed individuals in the sample as a fraction of the working population (self-employed and employees), by cohort and for all cohorts.Footnote 14 The decline in the contribution of older cohorts to the self-employment share is partly by construction, as individuals older than 60 are not considered for the calculation of the shares. Hence each cohort fades out over 5 years. Similarly, the trough in the early 2000s is due to the youngest cohort, which enters the sample later than that the oldest cohort leaves; in addition, the share of self-employed is lower among individuals at the beginning of their careers. We have no explanation for the sharp increase in the self-employment rate in 1996. The plot shows that until the beginning of the 2000s, all cohorts contribute approximately equally to the self-employment share. Thereafter we see that the contribution of the younger cohorts 6–9 is increasing over time; these cohorts mainly drive the rise in self-employment.

Fig. 2
figure 2

Self-employment share and contributions of cohorts

How do the self-employed differ from individuals in other labour market states? Table 3 shows the sample distribution of taxable income at the individual level, and of disposable household income. To make values comparable, we adjust household income using the CBS equivalence scales, convert all values to euro and deflate with CPI data from The World Bank World Development Indicators (base year 2010). Two general observations can be made: First, women’s taxable income is on average lower than men’s. Second, women’s disposable income is generally higher than their taxable income while the opposite holds for men. This is as we would expect considering that the majority is married/living with a partner, and women less often participate in the labour market and are more likely to work part-time.Footnote 15 This is also reflected in the median and the 25th and 75th percentiles of taxable income: Median taxable income for female employees, self-employed (full-time and hybrid) and DGAs is between 55 and 60% of males and similar ratios are observed for the quartiles. This can also be observed for pensioners, in line with the fact that occupational pensions are related to earnings before retirement.

Table 3 Taxable individual and disposable household income, wealth and sample share by gender and labour state

Comparing taxable income of different groups shows that the DGAs earn much more than employees or other self-employed. This is partially due to the minimum salary requirement for DGAs imposed by the tax rules. Because it would be fiscally attractive for DGAs to pay themselves dividends instead of a salary, the tax rules set a minimum salary (“gebruikelijkloonregeling”) that was 45,000 euros in 2017.Footnote 16 Individuals would generally not incorporate their business if they could not afford to pay themselves this salary. This is in line with the findings by Levine and Rubinstein (2017) and Humphries (2016). The other types of self-employed (who are generally not incorporated), have lower median incomes than employees have. Finally, the differences between the labour states become smaller at the 75th percentile, showing that the dispersion in income among the self-employed is larger than among employees.

Closer inspection of the differences in taxable income between men and women shows that the largest difference is found for freelancers. For female freelancers, the median taxable income is less than 20% of that of employees, whereas for male freelancers the median income is 72% that of employees. This stands in stark contrast with hybrid self-employed women, whose median taxable income is 87% of that of employees and whose third quartile is even higher than that of female employees. The same holds for men, which makes the hybrid self-employed the second most successful group of self-employed after DGAs. The inter-quartile range of taxable income is largest for the pure self-employed and for freelancers, indicating large income dispersion in these groups.

A different picture is found for disposable household income. Adjusted for household size, the values are very similar across labour states, except for DGAs and freelancers. The higher taxable income of DGAs and lower income of freelancers are also reflected in their household’s disposable incomes. Disposable household income is substantially higher than personal income for women and vice versa for men, indicating that women are less frequently the breadwinner in the household. This holds in particular for the lower half of the income distribution of freelance, hybrid and pure self-employed women. Moreover, the dispersion of disposable income is larger for most self-employment types and while the median values of pure and hybrid self-employed are relatively close to the median of employees, the third quartile is much larger.

The lower panel of Table 3 reports household wealth adjusted for household size. Wealth includes both financial wealth and net value of real estate. The dispersion of wealth is larger for the self-employed than for employees. Once more, DGAs are outliers, with much more wealth than any other group of self-employed.Footnote 17 For all self-employed except male freelancers, the quartiles of household wealth are higher than those of employees. Self-employed women have higher median wealth than employees.

Table 4 shows the shares of individuals that make a contribution to the second pillar (occupational pensions) and third pillar (voluntary private pensions). By definition, the pure self-employed make no contribution to the second pillar. Contributions by pure self-employed (e.g., voluntary payments to the pension fund of an old employer) are recorded as third pillar contributions. The same rules also apply to the other self-employed, but as we can see in Table 4 both DGAs and freelancers still have a small share of individuals that contribute to a pension fund—they have an employment contract with an income above the fund’s franchise.

Table 4 Share of individuals (in %) contributing to pension pillars

The share with contributions to the second pillar is much larger for hybrid self-employed, as expected. Still, compared to employees, a larger share of hybrid self-employed is not contributing to an occupational pension. This is because among the hybrid self-employed, there are more individuals whose earnings as an employee are below the franchise or work for a company without occupational pension. For the same reasons, we do not see a 100% share for employees, particularly for women.

Men are more likely to contribute to either pillar than women. The differences are particularly large for the third pillar. For example, 25% of male pure self-employed make a contribution to the third pillar, compared to half as many women. Contributions to the third pillar are more common among the pure self-employed than for DGAs or hybrid self-employed. The freelancers are once more the worst performing group among the self-employed, with the largest difference between genders. While female freelancers make almost no contributions to the 3rd pillar, their male counterparts still do so in 15% of all cases.

3 Labour Market Trajectories Over the Life Cycle

This section describes the analysis of labour market trajectories. First, we show how self-employment spells vary across individuals’ working lives. Then we introduce sequence analysis and explain the optimal matching algorithm with which we calculate so-called edit distances between all pairs of trajectories. Finally, we explain how clustering can be used as a tool to build a data-driven classification of labour market trajectories.

3.1 Visualisation of Employment Trajectories

An employment trajectory is an individual’s sequences of labour market states over time. Figure 3 shows the distribution of those states over time for the individuals of the 1961–1965 birth cohort (cohort 6) who are categorised as self-employed in at least one of the 29 years in which we observe them. The figure shows that the share of individuals who are self-employed at a given point in time is increasing over time, in accordance with Figs. 1 and 2. In addition, the share of benefit recipients in this subsample is smaller than in the overall population shown in Fig. 1. Furthermore, the shares of all types of self-employed increase but the share of full-time self-employed increases most. The same patterns are also observed across other cohorts (not shown), though less pronounced in the oldest cohorts who often had attractive early retirement options. The graph does not show, however, what kind of labour market careers the self-employed have had or will have. For example, are the individuals that are self-employed early in their career also self-employed in later years? How much mobility is there into and out of self-employment? If so, what are the labor market states from which they enter self-employment, etc.?

Fig. 3
figure 3

Distribution of labour market states over time of cohort born in 1961–1965 (self-employed)

One way to answer these questions is to plot all labour market trajectories. Figure 4 essentially does that and shows a so-called index plotFootnote 18 of the trajectories of the same individuals as in Fig. 3. The individual’s paths are stacked vertically on top of each other and each path is a (thin) horizontal line on which each year is coloured according to the individual’s labour market state in that year.Footnote 19 The figure conveys two messages. First, only a small share spends the majority of the 29 years in self-employment. Overall, a quarter of the individuals spend no more than three years in self-employment, less than half spend more than 10 years in self-employment, and only around one third spend more than half of the 29 years in self-employment.Footnote 20 Second, the trajectories vary a lot across individuals and two trajectories of different individuals are hardly ever exactly the same. We want to know whether this large variation in trajectories can help us to better understand the large variation in income, wealth and pension savings of the self-employed.

Fig. 4
figure 4

Index plot of cohort born in 1961–1965 (self-employed)

3.2 Sequence Analysis and Optimal Matching

In order to have a better picture of how the trajectories differ, we turn to sequence analysis, a methodology that has been used in sociology for some years to study sequences of social events. It was first introduced in the field of social sciences by Abbott (1983) and further developed in, e.g., Abbott and Forrest (1986), Abbott and Hrycak (1990), Abbott (1995). Sequence analysis takes a holistic approach, not only taking account of the types of states that individuals experience over time, but also of their duration. It studies the complete sequence in order to understand the importance of different patterns that the trajectories may have. As argued by Studer and Ritschard (2016, p. 481), sequence analysis thus stands in contrast to, e.g., survival or event history analysis that focus on specific events rather than an overall view off the trajectories.

Sequence analysis compares all sequences pair-wise in order to find sequences that display similar patterns. In our case, the aim is to group individuals that share a similar history of labour market states at similar times and in a similar order. The first step is to compute a dissimilarity measure for all unique sequence pairs. Studer and Ritschard (2016) provide an overview of the dissimilarity measures that are used in the field. We use the most common measure for discrete sequences: Optimal Matching (OM).Footnote 21 OM provides a measure of “edit distance” between each pair of sequences: Given a pair of sequences, OM computes the least costly way in which one sequence can be converted into the other one. The operations used for this conversion are (1) substitutions, (2) deletions, and (3) insertions. Each of these is associated with a cost and the edit distance is the minimum cost at which the conversion can be achieved.

Consider the hypothetical example given by Table 5, which shows four individuals with different labour market trajectories. For the sake of simplicity, it covers only 8 years and three labour market states: employment (E), self-employment (S), and not working (N). Let us also assume unit costs for al operations. For trajectories 1 and 2 the computation is straightforward. The two trajectories only differ from each other in year 3. Substituting E with N at the third position in the first trajectory is the fastest manner to transform it into the second. The edit distance between the second and first trajectories is thus equal to 1. On the other hand, trajectories 2 and 3 only coincide in the last 4 years. Relying on substitution only would require four substitutions, resulting in a total cost of 4. Alternatively, using also insertion and deletion, we can achieve the same transformation with fewer steps. The solution with the fewest steps involves deleting the first period in trajectory 3, shifting the whole sequence one position to the left. Next the N, which is now in first position, is substituted with an E. Finally, an S is inserted in the seventh position. This gives an edit cost of 3 between trajectories 2 and 3.

Table 5 Hypothetical labour market trajectories

Our example used unit costs for all operations. It is easy to see that the optimal outcome and the distances between trajectories will change if costs are chosen differently. This raises the issue how the costs for the operations should be defined. A popular solution is to derive the substitution costs from observed transition rates. Low costs are then assigned to frequently observed transitions, and higher costs to rare transitions. Studer and Ritschard (2016) point out, however, that observed transition rates are generally low and the resulting substitution costs are close to 2. Because of this, the approach does not produce results that differ much from those obtained with fixed state-independent costs.

There is less discussion on the setting of insertion and deletion, or “indel” costs in the literature.Footnote 22 Most applications use the same value for insertion and deletion and the only choice therefore concerns the value that should be assigned to indel versus substitution costs. To understand this, consider the last of the example trajectories in Table 5. If we compare trajectories 1 and 4 we can see that both individuals spend half of their time in self-employment and the rest in employment. The shortest way to transform trajectory 4 into trajectory 1 involves six steps: either six substitutions or three deletions at the end of trajectory 4 together with three insertions at the beginning. If substitution and indel costs are the same, the algorithm is indifferent between the two. If insertion and deletion however are cheaper than substitution, indel operations will be the preferred way to make these two trajectories alike, etc. Thus as indel operations become cheaper compared to substitutions, OM will place less importance on the timing and more on the sequencing of events. Moreover, note that as soon as indel costs are less than half of substitution costs, any substitution is more costly than two indel operations achieving the same result.

Based on these considerations we decided to set substitution costs equal to 2 for all labour market state transitions, and we set the indel costs equal to 1.5. We then apply OM separately to each cohort, where before OM, we split each cohort into two samples separating those individuals that are at least 1 year self-employed from those that are never self-employed. The latter group will be used as a control group to which we compare the self-employed.Footnote 23

3.3 Clusters of Self-employment

The result of OM is a matrix containing all pairwise distances. The distance measures can be used to group all individuals into clusters using machine learning; see, e.g., Zacher et al. (2012), Madero-Cabib and Fasang (2016), Tophoven and Tisch (2016), Visser (2018), Munnell et al. (2019), or Humphries (2016). We follow this approach and use Ward’s Method (Ward 1963) to cluster the trajectories.Footnote 24

Ward’s Method produces a tree of potential groupings. In practice, the researcher has to choose how many clusters to use. This is not a straightforward decision—no “hard” statistical criteria exist and different criteria often lead to different numbers of clusters. Nevertheless, the applications in the literature show that clustering has its merits. In our application, it provides a tool to separate individuals with long periods in self-employment from those with short self-employment spells and helps to automatically distinguish groups that spend their time in different types of self-employment. We will choose cluster solutions that match ex-ante expectations of self-employment types. For example, we expect different clusters for long-term (hybrid) self-employed, freelance workers, and DGAs.

As with OM we apply the clustering algorithm to each cohort separately. Ward’s Method does not split up all cohorts in the same manner. If we would choose the number of clusters based upon popular criteria, we typically would get 5 or 6 clusters according to the Calinski–Harabasz index, and 6–8 according to the Average Silhouette Width (see Studer 2013 for a discussion of several measures). These clusters would not be the same across cohorts. Instead of strictly following the criteria, we choose a clustering that is harmonized across cohorts. We start with a rather fine partitioning into twelve groups for each cohort (Ward’s solution). This is the smallest partitioning that gives a meaningful grouping. Based on the Ward solution, we then build seven clusters of self-employment trajectories. As we will show below, this allows for a sufficient level of distinction between the different clusters while maintaining parsimony and manageability. We will use the clustering of the self-employed in cohort 6 as an example to illustrate this.Footnote 25

Figure 5 presents the same trajectories as shown in Fig. 4, but now broken down into Ward’s twelve groups. Several groups mainly contain one type of self-employed: Group 4 is dominated by hybrid self-employment, groups 8 and 9 are DGAs, and groups 11 and 12 are dominated by pure self-employment. Similarly, group 5 are individuals whose main income in most years are social benefits. We also see that timing of sequences matters: group 3 consists of individuals that start as self-employed and then switch to employment later, whereas groups 6 and 7 do the opposite. Finally, groups 1, 2 and 10 consist of more volatile trajectories which have short self-employment spells.

Fig. 5
figure 5

Index plot by clusters of cohort born in 1961–1965 (self-employed)

This leads to seven larger clusters involving self-employment that we construct from the 12 Ward groups (denoted in “[ ]”):

  1. 1.

    Weak labour market attachment/freelancer [1, 2]—In the older cohorts this cluster predominantly consists of individuals that spend a large share of their trajectory without any income. In younger cohorts this is less the case and most trajectories either start or end with a long spell without income. The predominant self-employment type observed in these trajectories is freelancer.

  2. 2.

    Benefit recipients [5]—This cluster contains the individuals who receive some type of benefit for most of the observed period.

  3. 3.

    Mostly employed with short SE spells [3, 10]—Almost no cohorts have a distinct cluster of individuals that start as self-employed and then switch to employment. We therefore also include these individuals in the cluster with short spells.

  4. 4.

    Employees that switch to SE later on [6, 7]—We do not differentiate between hybrid and (full-time) self-employed in this cluster.

  5. 5.

    Always pure SE [11, 12]—In particular in the younger cohorts (observed from age 30 or younger) this also includes trajectories where few of the initial years are spent in employment.

  6. 6.

    Always hybrid SE [4]—This cluster is defined similarly to the preceding cluster but for hybrid self-employed.

  7. 7.

    DGAs [8, 9]—For DGAs we do not differentiate with respect to the timing. Most DGAs either spend the whole trajectory in this state or the second half. Since DGAs are a small (and special) group, we merge both types into one cluster.

For the “control group” of individuals who are never self-employed, we follow a similar strategy. Their clusters are built on the six groups solution of Ward’s Method.Footnote 26 We retain four main clusters: (8) Weak labour market attachment (no SE); (9) Benefit recipient (no SE), (10) Employee with long spells as benefit recipient and (11) Employee (including short spells as benefit recipient). For cohorts 1–4 we also include a separate cluster (12) Pensioner, which consists of individuals that are pensioners for most of the years that we observe them.

4 Differences Among Self-employment Clusters

In this section we compare demographic characteristics, income and wealth of individuals across the clusters identified in Sect. 3. Table 6 shows the demographic characteristics across the twelve clusters involving (1–7) and not involving (8–12) self-employment. The patterns are largely similar to those for the labour market states shown in Table 2. For example, both clusters with weak labour market attachment (clusters 1 and 8) have a high proportion of women who frequently also have a partner that acts as the main income earner in their household. Similarly, women are also the majority (around 60%) in the clusters of benefit recipients (clusters 2 and 9). In the other self-employment clusters, the gender imbalance is less pronounced than in Table 2. It is slightly in favour of women in the cluster of individuals with short self-employment spells (3), whereas the other self-employment clusters contain more men than women. In particular, the DGA cluster (7) remains predominantly male (around 80%). For late switchers (4) and hybrid self-employed (6) the share of men is lower in Table 6 than in Table 2. This can be explained by the distribution of the length of self-employment spells.Footnote 27 Men generally spend more time in self-employment than women and hence make up a larger share of the self-employed observations in Table 2. For similar reasons we see a lower share of breadwinners in clusters 4–7, reflected in the relatively low share of breadwinners among individuals with short SE spells.

Table 6 Demographic characteristics across clusters in sample

We find similar differences for marital status. Individuals in clusters 1 and 8 are mostly married. For immigration status, there is no significant difference across clusters, and the share of both western and non-western immigrants (first and second generation) is slightly larger in the low attachment and benefit recipient clusters.

Average age is similar across clusters by construction, as all individuals are observed for a similar period in their lives. Still, the increase of self-employment in younger cohorts is reflected in a lower average age for the self-employment clusters (3–7). Furthermore we see the increase in labour market participation among women in younger cohorts reflected in a higher average age of cluster 1 in comparison to cluster 8. In the latter cluster, most individuals do not participate in the labour market and that cluster’s sample share is decreasing over cohorts.

4.1 Self-employment over time

Figure 6 shows the same overall self-employment shares as Fig. 2 and adds the contributions that different clusters make to the self-employment rate. Only about half of the share is due to the clusters in which individuals spend most of their working years in self-employment (hybrid and pure self-employed and DGAs). The share of individuals who become self-employed later in their career (cluster 4) is growing substantially over time, to approximately a quarter of the self-employment share in recent years. Finally, a constant share of approximately three to five percentage points is always attributed to the clusters of individuals that do not remain in self-employment (clusters 1–3). These findings are similar if we look at the cohorts separately (details available upon request).

Fig. 6
figure 6

Self-employment share and contributions of clusters

Table 1 in “Appendix A” shows that in most clusters, around three out of four individuals have only one self-employment spell, while roughly one in five have two. The exception is the cluster with weak labour market attachment where a larger share has more spells. In clusters 1, 2 and 3 the median spell length is only 2–3 years, whereas individuals in the other clusters have a median spell length of at least 7 years. For pure and hybrid self-employed, the top 25% spend almost all of the observed period in self-employment. The distribution of self-employment spell lengths hardly changes across cohorts.Footnote 28

4.2 Income

The upper and middle sections of Table 7 show individual taxable income and disposable household income (adjusted for household size) by gender and cluster. Comparing the results to Table 3 we see that most of the results for the clusters are similar to those for the corresponding labour market status. We still find that self-employment is different for men and women. Women who switch to self-employment after a career as an employee have higher incomes than other self-employed clusters, with the exception of DGAs. Furthermore, women with short self-employment spells also do relatively well in terms of income. The same holds for men where this is in fact the best earning cluster.

Table 7 Taxable individual and disposable household income, wealth and sample share (%) by gender and cluster

Disposable income for the household gives smaller differences among clusters. For women the starkest contrast is between individuals with short self-employment spells and the other groups. For men, both pure and hybrid self-employed have larger interquartile ranges than switchers and individuals with short spells.

To better understand the relation between our clusters and income, we estimate the following model:

$$\begin{aligned} y_{it} = \beta _0 + X_{it}'\beta _x + L_{it}'\beta _l + W_{i}'\beta _c + C_{i}'\gamma + T'\delta + \epsilon _{it} \end{aligned}$$
(1)

Here \(y_{it}\) is the income variable of interest.Footnote 29\(X_{it}\) is a vector with characteristics of individual i in period t such as age and its square, civil status and country of origin controls,Footnote 30 as well as dummy variables for whether the individual is their household’s main income earner, whether self-employed individuals are solo self-employed (SSE).Footnote 31 Finally, we include dummy variables indicating the household’s main income source in each period.

\(L_{it}\) is a set of labour market state dummies, while \(W_i\) controls for the clusters. The coefficients of the labour market states give the direct (current) impact, whereas the cluster coefficients say how income varies with different labour market careers, keeping current labour market state (and demographics etc.) constant. We take employees as the base category for labour market state, cluster, and main income source. Results should therefore be interpreted relative to employees. \(C_i\) and T capture cohort and year fixed effects, respectively.

As a baseline, we first estimate Eq. (1) without the cluster information. For both income measures, the year effects first increase over time and then decrease, returning to the 2006 levels in the last few years of IPO. For taxable income of males, the younger cohorts (5, 6, 7, 9) do significantly better than the older cohorts (particularly reference cohort 4). On the other hand, disposable household income is higher for cohorts 2 and 3, suggesting that older individuals have around 700–1000 euros more, ceteris paribus.

Table 8 shows the other regression results for men and women with taxable income as the dependent variable.Footnote 32 The results for disposable household income are shown in Table 9.

Table 8 Panel regression—taxable income
Table 9 Panel regression—disposable household income

For all specifications, the cluster dummies are jointly significant. Furthermore, the estimates for the other coefficients (e.g. the effects of current labour market status) do not change much. DGAs have significant and positive coefficients for their cluster as well as their labour market state in all specifications. That is, individuals in the DGA cluster earn more, ceteris paribus, than other employees, even if they are not currently self-employed, and earn even more once they become self-employed. This is in line with the result on incorporated individuals in Humphries (2016). Similarly, for the clusters with weak labour market attachment or benefit recipients, we find large negative coefficients.

The picture is less clear for other self-employed clusters. For example, we find negative cluster effects for the pure self-employed (larger in absolute terms for men), but the direct effects of the labour market states differ by gender and income measure: While there is no significant effect on women’s taxable income, we find a large negative effect for men. Vice versa, we find no significant effect on men’s disposable household income but a strong positive effect for women, which dominates the negative cluster effect.

Hybrid self-employed generally have insignificant cluster coefficients. We find positive direct effects of hybrid self-employment for both genders on disposable household income, but a negative direct effect on men’s taxable income. Similarly, we find no significant cluster effect for individuals that become self-employed later in their career except in the disposable income regression for women where the effect is positive. This cluster effect is small compared to the direct effects we estimate.

Finally, we find opposite signs for the cluster with short self-employment spells for both income measures. Women in this cluster earn less than employees while men earn more. In periods where men are self-employed, this will generally mean that they earn a lower taxable income, as the cluster effect is smaller than the direct effect.

On average, self-employed individuals earn less in terms of individual taxable income, but with the exception of freelancers, they have a slightly larger disposable household income. Since the difference between taxable and disposable income of employees includes pension contributions, the higher disposable income of self-employed individuals may not mean they are better off if they do not save for their pension. Furthermore, solo self-employment has a large significantly negative coefficient.

4.3 Wealth

The lower panel of Table 7 reports household wealth adjusted for household size by cluster. As for the values by labour market status, the clusters with the self-employed have higher wealth at each quartile. Wealth still has high variation within clusters, as shown by the large inter-quartile ranges. Median household wealth of individuals with short SE spells or with a switch from employment to self-employment later in their career, is much lower than median wealth of pure and hybrid self-employed, for both men and women. Overall, differences across clusters are larger for the higher quartiles than for the first quartile. Household wealth of women in a given cluster is generally higher than that of men, probably because of the partner’s income.

We estimate Eq. (1) with household wealth as the dependent variable, adding an additional control variable based on the standard industrial classification that is available for the self-employed starting from 2005: a dummy variable with value one for individuals active in the agricultural, forestry, or fishery sector. This aims at identifying farmers whose main wealth component often is their land. Table 10 shows the results.Footnote 33

Table 10 Panel regression—household wealth

We find negative year effects starting around 2009/2010, which can be attributed to housing wealth—There are no significant year effects in the regression that excludes owner-occupied housing from wealth. Interestingly, we do not find any significant cohort effects on household wealth. This stands in contrast to Mastrogiacomo et al. (2016) who suggest that younger cohorts are less wealthy. While our regression sample excludes the oldest two cohorts due to data availability, we would have expected cohort effects for, e.g., cohort 3 and 4.

As for income, we find that the cluster dummies are jointly significant. Unlike for income however, we find almost no differences between genders in signs and significance. All self-employed clusters except the first two have significant and positive coefficients. Current self-employment labour market states also have significantly positive coefficients, except freelancers for men. These direct effects have a clear ranking—DGAs have the highest household wealth and hybrid self-employed the lowest. The pattern of the cluster effects is clearer than for income: For both genders, pure and hybrid self-employed clusters have coefficients that are around twice as large as those of short self-employment spells or late switchers.

Compared to women, self-employed men seem to be in a better financial position. The magnitude of the coefficients for males who are always self-employed is around twice that of females. The results from the wealth regression therefore suggest that men who are always self-employed tend to accumulate more wealth than employees, and that this may partially offset shortages in their pension accumulation. The questions that remain are whether the approximately one hundred thousand euro extra that we find for them is sufficient to bridge the gap in pension accumulation, and how the other clusters fare with less wealth.

5 Pensions of the Self-employed

In this section we analyse the pension accumulation of the self-employed, considering pension investments in the second pillar (mandatory occupational pensions) and the third pillar (voluntary private pensions). We do not consider the first pillar as our sample is restricted to individuals who spend (almost) their complete working life in the Netherlands and therefore receive the full public pension.

Table 11 shows the share of individuals who make contributions to the second and third pillar in a given year, by gender and cluster. Women contribute less often than men in the same cluster, in line with Table 4. The cluster of DGAs contributes more often than what Table 4 would suggest. Approximately half of the DGA cluster spends the first half of their career as employees and accumulate an occupational pension during that time (Table 4), but they do not pay occupational pension premiums when they are DGA (Table 4). Self-employed individuals in the cluster with weak labour market attachment and those who are benefit recipients during most of their working life are comparable to their non-self-employed counterparts, though self-employed men in these groups prepare somewhat better for retirement through the third pillar. For men, the difference between shares of the short self-employment spells cluster and the cluster of employees is what we would expect given the median length of their self-employment spells.Footnote 34 For women the gap of approximately 20 percentage points is larger than expected. Neither men nor women bridge the gap in the second pillar by participating in the third pillar—Their 3rd pillar participation does not differ from that of employees. The same applies to the cluster that becomes self-employed later in their career, who have an even larger participation gap in the second pillar. A similar result holds for the clusters of pure and hybrid self-employed. Men in these clusters participate in the third pillar more often, but these differences are small compared to the differences in second pillar participation. For women, if they cannot count on a partner’s pension, the outlook is even bleaker, considering that their third pillar participation is very low.

Table 11 Percentage contributing to second and third pension pillar

The administrative data on pension participation (“pensioendeelnemingen”) contain information on second pillar entitlements. It covers the years 2005–2014 and is based on an annual survey among a representative sample of pension funds. Not all these funds are present in every year due to non-response. Because the data do not cover all pension funds we do not know whether an individual for whom we do not observe a pension account never participated or whether their pension fund is not in the sample. Moreover, if someone has more than one pension fund, the observed entitlement may not be their total entitlement. Moreover, due to the fluctuations in sampled pension funds across years, the time series of each individual gives excessive noise, preventing the use of panel data models.

To account for these data issues, we proceed as follows. We first add up the pension entitlements over all observed funds for each individual within a year. We then deflate monetary values and take the median by individual across all years. Overall, this gives pension information for about half of the individuals in our sample. Table 12 reports sample statistics by cluster and gender for three variables.Footnote 35 The first is pension contribution years, converted to full-time equivalents. The second variable is the pension annuity to which individuals are entitled given their accumulated pension rights at the time of reporting (calculated by the pension funds). Third, the current capital entitlement is reported (also based on pension funds’ calculations).

Table 12 Occupational pension entitlements

Note that all three variables refer to pension rights accumulated at the time of the survey, which is, for most individuals, quite a long time before retirement. Pension accumulation between the survey date and retirement will most likely be lower for self-employment clusters than for employees, so that the ultimate gaps will be larger than those considered here.

Table 12 shows the same picture for all three variables. Employees have much higher medians than any of the self-employed clusters and women almost always have lower values than men. The two self-employed clusters with weak ties to the labour market are comparable to their non-self-employed counterparts. For all other self-employed clusters we find large gaps in comparison with the employees’ values, particularly for men. For the cluster with short self-employment spells, the gap in contribution years corresponds approximately to the median self-employment spell length for women. For men the gap is much larger, in particular for the third quartile. The pure self-employed, who by definition spend at best a few years in employment, have almost no pension entitlements and will need to rely on a third pillar annuity or on other private wealth.

Table 13 reports the results of a regression that shows the ceteris paribus associations between the pension accrual variables and the clusters, similarly to the income and wealth regressions. We use the same demographic characteristics as in the income regressions and take their values from the last year in which we have information on an individual’s second pillar accrual. We add year effects based on the last year we observe an individual in the pension data to control for level differences due to different timing in observations. Because of the limitations of the data, we do not include the labour market state dummies.

Table 13 Regression—accruals in 2nd pillar

Both regressions show the same picture and the cluster coefficients confirm what we have learned from Table 12. Self-employed who mostly receive benefits are worse off than their non-self-employed counterparts. Among the other self-employed clusters the pure self-employed are worst off and will on average receive substantially lower pension annuities after their retirement—in the case of men a good 500 euro less per month. The cluster effect for career changers shows that the estimated average gap in pension capital for men is of similar magnitude to the corresponding coefficient estimated in the wealth regression, suggesting that this group may be particularly ill-prepared for retirement.

To investigate whether self-employed individuals use the third pillar to counter-balance gaps in the second pillar, we analyse the probability of making a payment into the third pillar using a probit model, and estimate a tobit model to understand how much they invest. In both models we use the same controls as in the income regressions as well as a dummy that takes value one when a positive employee contribution to the second pillar is observed.

Table 14 shows the results of the probit model. The dependent variable is a dummy that takes value one if a payment to a tax-deductible third pillar pension plan is observed and zero otherwise. Since individuals with no income do not make contributions to the third pillar, these individuals are excluded. Four results stand out: first, individuals who make a contribution to the second pillar in the same year are more likely to save in the third pillar. Second, among the different self-employed clusters, only DGAs are more likely to make a third pillar contribution than employees, whereas we find significantly negative effects for the weak labour market attachment and benefit clusters (1 and 2). Third, current solo self-employed contribute less often than others. Finally, individuals that live in households deriving their main income from pure or hybrid self-employment (main income source: profit) are more likely than their other self-employed peers to make payments to the third pillar.

Table 14 Probit regression—participation in 3rd pillar

The results of the tobit regressions are shown in Table 15. The results are largely in line with the probit results. Again, we find a positive association with contributing to the second pillar. As before, DGAs are the positive outliers. They are the only cluster with a large positive (and significant) coefficient, contributing much more than employees. The coefficients on labour market status tell us that once individuals are active as self-employed, their payment to the third pillar increases compared to employees. Furthermore, pure self-employed have the largest coefficients—they make the largest contributions to the third pillar. Again, we also find a negative effect for SSE.

Table 15 Tobit regression—contributions to 3rd pillar

6 Conclusion

To analyse the heterogeneity in labour market trajectories of the self-employed in the Netherlands, we have classified the self-employed into different clusters using sequence analysis. We have identified the following seven distinct clusters of self-employment across all cohorts considered: (1) self-employed with weak labour market attachment, (2) self-employed that spend a large portion of their trajectory as benefit recipients, (3) employees with short self-employment spells, (4) employees that switch to self-employment later in their career, (5) always pure self-employed, (6) always hybrid self-employed, and (7) DGAs (director and major shareholder).

Unlike cross-sectional studies, sequence analysis provides a holistic view that incorporates the whole labour market trajectory of each individual. We find that half of the individuals that are ever self-employed, (those in clusters 1–3) remain so for only a few years. In any given calender year, approximately one third of the self-employment rate can be attributed to these short-term self-employed.

For wealth, we find no statistically significant cohort effects within clusters. This suggests that the finding in earlier studies that the current self-employed are less wealthy than those in older cohorts is due to a shift in the share of different clusters in the total self-employed population rather than to a shift within clusters. In line with this, we find that the share of individuals that become self-employed later in their career has increased over time, and this cluster has accumulated less wealth than the other self-employed.

In terms of income, wealth, and pension accumulation, we show that clusters 1 and 2 are worse off than employees and do not differ a lot from their counterparts amongst the never self-employed. Their average income is lower than that of employees, also in the periods when they are self-employed, they accumulate less wealth, and, particularly, do not invest in voluntary (or mandatory) pensions. Policies targeted at the self-employed only are unlikely to affect their post-retirement outcomes—These individuals would benefit more from policies targeted at everyone at the lower end of the labour market.

Individuals in cluster 3 have unexpectedly large gaps in their occupational pension savings. They accumulate more financial wealth than employees, but this will probably not be sufficient to cover these gaps. Similar results are found for those who change to self-employment halfway through their career (cluster 4). Compared to employees, they neither invest more in a voluntary pension nor in other (non-pension) wealth. On the other hand, their income is similar to that of employees. This suggests that further research into the adequacy or inadequacy of their pensions is necessary.

The long-term self-employed, clusters 5–7, hold larger amounts of wealth than employees or clusters 3 and 4. Clusters 5 and 7 have almost no accruals in the second pension pillar, but the majority of them may still be well prepared for their retirement.

There is substantial difference by gender: the wealth differential between self-employed clusters and employees is much larger for men than for women. Women in the self-employment clusters also have less pension accruals in the mandatory occupational pensions and participate less in voluntary private pensions. They may often be able to rely on a partner as main earner in the household, as long as the couple remains intact. Studying household composition dynamics (i.e. divorce or widowhood) and its consequences for pension adequacy is left for future research.

Our findings on DGAs replicate results found for incorporated self-employed in the US and Sweden. Given that we still find large variation in wealth within our clusters and that there are different unincorporated business structures in the Netherlands, it seems worthwhile to study if using more information on the legal form can help to further differentiate among clusters of self-employed. The same holds for the solo self-employed. We find that they are worse off than the other self-employed but the short time period for which their information is available, did not allow to study them in detail. More work on this seems useful. Finally, better data on pension accruals is needed. Because of the current data limitations our results give us only a lower bound on accumulation of an occupational pension. Integral pension data is needed for a clearer understanding of how pension accruals differ across clusters.