1 Introduction

Economic dynamics are characterised by a noisy selection environment that (imperfectly) rewards superior performance. This paper investigates whether the selection environment becomes clearer, that is more predictable, in the years after entry. The benefits of greater predictability accrue to business owners, to providers of finance and to governments. For business owners, the value is the availability of a route-map to enable them to plan ahead and check progress over time (Dencker et al. 2009). For providers of finance, being able to estimate more accurately the optimal date to provide finance is valuable, because too early an investment may be too risky, whereas delay may mean the opportunity is seized by a rival (Cumming et al. 2015). Finally, governments are continually faced with the choice of using taxpayers’ funds to support and stimulate start-ups, or instead delaying support until performance metrics become clearer (Pons Rotger et al. 2012). An optimal combination of support at different stages as new ventures evolve could provide considerable social and economic returns.

This paper is motivated by a desire to explain post-start performance and then to use these explanations to predict it, so providing the benefits of greater predictability to all three parties. It takes two alternative measures of performance (Miller et al. 2013), survival and sales growth, and assesses whether, over time, our ability to explain these two performance variables improves. In the phrasing of this paper: does the fog lift with time? If so, when does this greater clarity appear? Is it after 1 year, or 10 years? Does the fog lift only gradually but continually, or is there a clear ‘step’ at a clearly identified point in time?

Our theoretical starting point is the Levinthal (1991) random walk model which we apply to new, as opposed to well-established, ventures. Here, each enterprise has an initial stock of resources which expands or contracts depending on post-entry growth, which is determined by stochastic shocks. Exit takes place when the stock falls below a minimum threshold.Footnote 1

Given these assumptions we distinguish between new venture sales growth and survival. Following Levinthal (1991), if the sales growth of a new venture follows a random walk, there is no improvement in our ability to predict growth in the years after entry—hence the fog is thick and remains thick over time. However, we assume new ventures have different (financial) resource endowments, enabling those with more resources to survive shocks that would lead to the exit of those with fewer resources. These financial endowments are either present at start-up, or accumulated through post-entry growth. Survival rates are therefore expected to increase and become more predictable, in the years after entry, as surviving new ventures acquire the financial resources that enable them to ‘ride out’ the inevitable vicissitudes of trade that characterise their early months and years.

These predictions for growth and survival are tested using a cohort of 6579 new ventures in the UK, all of which began to trade in the same quarter of 2004, where every financial transaction is tracked over 10 years. With this unique data we show that our ability to explain sales growth decreases as the venture ages, because as time goes by growth becomes more random. When we focus only on firms that survive until the end of year 10, however, our ability to predict growth remains constant over time. Regarding survival, our ability to predict which firms will remain in operation increases slightly in the years after entry. Our results are therefore broadly consistent with our model.

Our specific contribution is then to demonstrate that, even if the sales growth of a new venture increasingly approximates a random walk, its survival becomes more predictable. The growth fog becomes thicker over time, but the survival fog becomes less dense. Perhaps the paper most closely related to ours is Lotti et al. (2009), who present evidence that firms converge to a random growth model (i.e. Gibrat’s (1931) ‘Law of Proportionate Effect’) in the years after entry. In our paper, however, we look more widely at our ability to explain growth and survival in the years after entry. Another related paper is Wiklund et al. (2010), who observe that the explanatory power of financial indicators decreases in the years after entry, when the task is to explain survival. In our analysis, we include other variables (beyond financial indicators) as explanatory variables for performance (measured in terms of both survival and growth), and present finer-grained evidence on the year-by-year evolution of the model fit statistics.

The remainder of the paper is set out as follows: Sect. 2 provides the theoretical context that is used in Sect. 3 to derive hypotheses. Section 4 presents our methodology. Section 5 presents the dataset, and we test our hypotheses in Sect. 6. Section 7 concludes.

2 Theory development

Conceptualising firm performance as a random walk has a long history in economics (Gibrat 1931; Ijiri and Simon 1964; Levinthal 1991; Denrell et al. 2015). Random processes produce results that closely match the outcomes of many top performing companies, to the extent that investigating whether or not performance is purely random remains a valid research question (Henderson et al. 2012; Denrell et al. 2015; Storey 2011). This need not imply that managers do not put thoughtful planning and effort into their business decisions, because it could be that competition is so fierce, and businesses are all more or less ‘neck-and-neck’, that there may not be any easily observed systematic factors that allow new ventures to enjoy prolonged above-average performance in the years after entry. ‘Chance models are, in fact, compatible with effortful managers who carry out deliberate actions’ (Denrell et al. 2015, p. 936). Our preference for chance models in this paper is because random walk models offer useful approximations to real-world phenomena (Levinthal 1991; Henderson et al. 2012; Denrell et al. 2015), and also because random walk models can provide simple and clear theoretical predictions that can be developed into testable hypotheses.

Levinthal (1991) was amongst the first to formally explore how random processes could shed light on venture survival in a managerial context. His model had two key assumptions. The first was that firm growth was modelled as a random walk, and the second was that survival depended upon access to resources or assets that could be used to finance the shocks experienced by the business in a random walk.Footnote 2

Levinthal emphasised that the random walk model is compatible with variations in competence amongst enterprises. He writes (p. 399):

While variation in competence should shift the mean of the possible distribution of outcomes, and perhaps the variance as well, the presence or absence of competence does not fundamentally alter the stochastic nature of the process.

The Levinthal (1991) model also puts forward that the amount of the assets is determined by two factors: past performance and initial resources. We assume that access to more financial resources improves the chances of survival. However, there are two key respects in which the assets of a new venture differ from those of an established firm. The first is that, in an established venture, the assets primarily comprise those accumulated over time, whereas those available to the new venture are considerably more likely to be those in place when the venture begins. Second, in an established firm the accumulated assets constitute a ‘track-record’ which can help internal and external parties assess future performance, whereas no such record exists for a new venture. This is particularly problematic for external suppliers of finance (banks, trade creditors), who then seek ‘signals’ of credibility, such as collateral (Voordeckers and Steijvers 2006).

Our model therefore assumes the returns from venture creation are a random walk and this payoff structure attracts individuals who are optimistic and favour situations where, although the expected returns may be negative (Hamilton 2000), the variance is high and positively skewed. Survival, in turn, reflects the availability of resources (i.e. resources available at start-up, as well as those obtained from post-entry performance).

More formally expressed, growth occurs through the following random process:

$$x_{t} = \, x_{t - 1} + \, \varepsilon_{t} ,$$
(1)

where \(x_{t}\) is the logarithm of firm size at time t, and \(\varepsilon_{t}\) is a random shock (additive in logs, but multiplicative on a linear scale) with mean µ and standard deviation σ.

Survival is a function of the stock of accumulated resources, so survival, S, depends on whether a firm’s resources exceed a minimum threshold size x*:

$$S \, = \, 1\quad{\text{if}}\quad x^{l} > x^{*} ;\quad {\text{otherwise}}\;S = 0$$
(2)

where \(x^{l}\) is a latent variable that corresponds to x if \(x^{l} > x^{*}\), but remains unobserved if \(x^{l} \le x^{*}\). If the exit threshold is positive, i.e. \(x^{*} > 0\), then players will not persist until their resources reach zero, but quit the ‘gambling table’ even when resources are positive (Gimeno et al. 1997).

3 Hypotheses derivation

Our primary interest is in whether the selection environment for new ventures becomes more predictable in the years after entry. To investigate this we examine the explanatory power (or goodness-of-fit, represented by the \(R^{2}\) statistic) of models that seek to explain the growth and survival of new ventures.

3.1 Growth

If new venture growth is a random walk, à la Levinthal, the dynamics of (log) size are \(x_{t} = x_{t-1} + \varepsilon_{t}\), so the growth rate (in log-differences; Tornqvist et al. 1985) is expressed entirely in terms of a random shock: \(x_{t} - x_{t-1} = \varepsilon_{t}\). Growth is well approximated by a random walk in the years after start-up, and our inability to make systematic predictions for post-entry growth implies that the expected \(R^{2}\) from growth regressions is low and remains low in the years after entry.Footnote 3

Hypothesis 1

The \(R^{2}\) from regressions of the determinants of new venture growth does not increase in the years after entry.
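The logic behind Hypothesis 1 can be illustrated with a minimal sketch. Assuming a pure random walk with no exit, lagged size should explain almost none of the variation in growth, even though it explains much of the variation in current size. The parameter values, the Gaussian shocks and the helper `ols_r2` are illustrative assumptions, not the estimation code used in this paper.

```python
import random


def ols_r2(x, y):
    """R-squared from an OLS fit of y on a constant and a single regressor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # With one regressor, R-squared equals the squared correlation.
    return sxy ** 2 / (sxx * syy)


rng = random.Random(1)
n_firms = 5000  # hypothetical sample size, for illustration only

# Assumed values: log size around 10 and shocks with standard deviation 0.9.
lagged_size = [rng.gauss(10.0, 1.5) for _ in range(n_firms)]
shocks = [rng.gauss(0.0, 0.9) for _ in range(n_firms)]

size = [x + e for x, e in zip(lagged_size, shocks)]      # x_t = x_{t-1} + e_t
growth = [x1 - x0 for x0, x1 in zip(lagged_size, size)]  # log-difference

r2_growth = ols_r2(lagged_size, growth)  # close to zero under a random walk
r2_level = ols_r2(lagged_size, size)     # high, because size is persistent
print(r2_growth, r2_level)
```

The contrast between the two statistics is the point of the sketch: persistence in size does not translate into predictability of growth.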

3.2 Survival

To examine survival, firm size at time t is denoted \(x_{t}\), with start-up size denoted \(x_{0}\). In a random walk model à la Levinthal (1991), firm size evolves as \(x_{t} = x_{t-1} + \varepsilon_{t}\), where \(\varepsilon_{t}\) is distributed with mean μ and variance \(\sigma^{2}\). When μ = 0, we have a pure random walk, whereas when μ > 0 (following Le Mens et al. 2011) there is a steady increase in expected resource stock over time.

Firms are assumed to exit when their size (proxied by their resource stock) reaches zero. The time taken until the firm first exhausts its resources (i.e. \(x^{l} \le x^{*}\), for the case when \(x^{*} = 0\)) has a cumulative distribution function given by the Bachelier–Lévy formula:

$$F(T|x_{0} ,\mu ,\sigma ) = N \left( { - \frac{{\mu t + x_{0} }}{\sigma \sqrt t }} \right) + {\text{e}}^{{ - 2x_{0} \frac{\mu }{{\sigma^{2} }}}} \cdot N\left( {\frac{{\mu t - x_{0} }}{\sigma \sqrt t }} \right)$$
(3)

where N(·) represents the cumulative distribution function of the standard normal distribution (Levinthal 1991; Coad et al. 2014). Time to exit is thus a function of three parameters: the trend in the random walk μ, the variance \(\sigma^{2}\) of the growth shocks, and start-up size \(x_{0}\).Footnote 4 Even if growth is a random process, expected survival time can be increased by increasing the size at start-up \(x_{0}\) (Levinthal 1991; Coad et al. 2014). The \(R^{2}\) from survival regressions therefore depends on both start-up size and growth since start-up.
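Equation (3) can be evaluated directly, which makes the role of start-up size easy to see. The following is a minimal sketch using only the standard library; the function names `norm_cdf` and `exit_cdf` and the example parameter values are illustrative assumptions, and the time argument is written as `t` throughout.

```python
import math


def norm_cdf(z):
    """Cumulative distribution function of the standard normal, N(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def exit_cdf(t, x0, mu, sigma):
    """Probability of having exited by time t under Eq. (3), exit threshold at zero."""
    if t <= 0:
        return 0.0
    scale = sigma * math.sqrt(t)
    first = norm_cdf(-(mu * t + x0) / scale)
    second = math.exp(-2.0 * x0 * mu / sigma ** 2) * norm_cdf((mu * t - x0) / scale)
    return first + second


# Assumed illustrative values: drift -0.1, shock standard deviation 0.9.
for x0 in (2.0, 4.0):
    print(x0, [round(exit_cdf(t, x0, -0.1, 0.9), 3) for t in (1, 5, 10)])
```

Holding μ and σ fixed, a larger \(x_{0}\) lowers the probability of exit at every horizon, which is the sense in which start-up resources matter even when growth is random.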

We now apply a simulation model to derive implications of the Levinthal random walk model for the evolution of the \(R^{2}\). We generate an artificial dataset of 50,000 firms, whose start-up size is calibrated according to the lognormal distribution with mean 10.55 and standard deviation 1.5, in order to closely follow the start-up size distribution observed in our data. We then generate a distribution of growth rates, distributed according to the Laplace or ‘symmetric exponential’ distribution (Stanley et al. 1996; Bottazzi and Secchi 2006), with mean µ = −0.1 and standard deviation σ = 0.9 (again, closely following the values observed in our data).Footnote 5 Firm size evolves as a random walk, \(x_{t} = x_{t-1} + \varepsilon_{t}\), given the distributions of start-up size and growth rates above, for t = 60 periods. The exit threshold \(x^{*}\) is set at 7 in the baseline case, which is deliberately chosen to be a relatively high value that will guarantee that in each period some firms will exit (thus avoiding a degenerate value for the \(R^{2}\) in any year’s survival regression in which all firms survive). For each individual period up to t = 60, we estimate a probit survival regression (with a constant term and a single explanatory variable: lagged size) and record the Nagelkerke \(R^{2}\) statistic.
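The data-generating part of this simulation can be sketched as follows. This is not the code used for Fig. 1: it covers only the random walk and the exit rule, and leaves out the probit regressions. Several choices are assumptions made for illustration: the function name `simulate_cohort`, a smaller default cohort of 2000 firms for speed, the reading that log start-up size is normally distributed with mean 10.55 and standard deviation 1.5, and the exclusion of firms that would start at or below the threshold.

```python
import math
import random


def simulate_cohort(n_firms=2000, periods=60, threshold=7.0,
                    start_mean=10.55, start_sd=1.5,
                    mu=-0.1, sigma=0.9, seed=0):
    """Simulate log size as a random walk with exit at a threshold.

    Returns one list per period holding (lagged_size, survived) pairs for
    the firms that were alive at the start of that period.
    """
    rng = random.Random(seed)
    b = sigma / math.sqrt(2.0)  # Laplace scale, since variance = 2 * b**2
    # Assumption: log start-up size is normal (so size is lognormal).
    sizes = [rng.gauss(start_mean, start_sd) for _ in range(n_firms)]
    # Assumption: firms starting at or below the threshold never enter.
    alive = [x for x in sizes if x > threshold]
    panel = []
    for _ in range(periods):
        rows, survivors = [], []
        for x_lag in alive:
            # The difference of two Exp(1) draws is a standard Laplace draw.
            shock = mu + b * (rng.expovariate(1.0) - rng.expovariate(1.0))
            x_new = x_lag + shock
            survived = x_new > threshold
            rows.append((x_lag, int(survived)))
            if survived:
                survivors.append(x_new)
        panel.append(rows)
        alive = survivors
    return panel


panel = simulate_cohort()
for t in (1, 10, 30, 60):
    rows = panel[t - 1]
    rate = sum(s for _, s in rows) / len(rows) if rows else float("nan")
    print(t, len(rows), round(rate, 3))
```

Each element of `panel` is a cross section that could be passed to a binary-outcome regression of survival on lagged size, one period at a time.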

Figure 1 shows that the \(R^{2}\) clearly increases in the years following start-up. This is because, with the passage of time, surviving firms overcome the liability of newness and grow to become sufficiently large that they have accumulated a ‘buffer’ stock of resources, and no longer operate on the brink of the exit threshold. Firms that start small, on the other hand, are more likely to be quickly weeded out through a selection effect. As these chaotic, short-lived firms are removed, the selection environment becomes less ‘foggy’. The central point here is that the \(R^{2}\) value rises over time even when growth is a random walk.

Fig. 1 Evolution of the Nagelkerke \(R^{2}\) using simulated data, for 60 periods. y-axis: Nagelkerke \(R^{2}\) obtained from probit regressions where exit depends on lagged size. x-axis: time period. Baseline case (with exit threshold \(x^{*} = 7\)) appears as a solid line; \(x^{*} = 8\) for the long-dash line; \(x^{*} = 9\) for the short-dash line. Linear trend-line plotted for the baseline case

Hypothesis 2

The \(R^{2}\) from regressions of the determinants of new venture survival increases in the years after entry.

4 Testing for changes in the density of the fog

Of crucial interest for our paper is measuring what we call ‘fog’, for which we use the coefficient of determination, or \(R^{2}\) statistic. The standard \(R^{2}\) statistic is expressed in terms of how well an OLS regression model can explain the total variation in the data:

$$R^{2} \equiv \frac{{{\text{SS}}_{\text{reg}} }}{{{\text{SS}}_{\text{tot}} }} = \, 1 \, - \left\{ {\frac{{{\text{SS}}_{\text{res}} }}{{{\text{SS}}_{\text{tot}} }}} \right\}$$

where \({\text{SS}}_{\text{reg}}\) is the regression sum of squares (i.e. the explained sum of squares), \({\text{SS}}_{\text{tot}}\) is the total sum of squares, and \({\text{SS}}_{\text{res}}\) is the residual (i.e. unexplained) sum of squares. The \(R^{2}\) statistic provides meaningful information on how well a set of variables can explain a given outcome, or how well we can predict real-world outcomes on the basis of our available information (Bertrand and Schoar 2003; see also Syverson 2011, p. 340). Cox and Snell (1989) suggested that the \(R^{2}\) statistic be generalised to other regression models (such as regression models with binary dependent variables) where maximum likelihood is the criterion of fit. They suggested the following \(R^{2}\) statistic:

$${\text{Cox}}{\text{-}}{\text{Snell}}\,R^{2} = \, 1 \, - \left\{ L\left( 0 \right)/L\left( {\hat{\beta}} \right) \right\}^{{\frac{2}{n}}}$$

where L(\(\hat{\beta }\)) and L(0) denote the likelihoods of the fitted and ‘null’ models, respectively. The Cox–Snell \(R^{2}\) statistic has a number of desirable properties (e.g. it is asymptotically independent of the sample size), although a drawback is that it reaches a maximum value that is lower than unity for discrete models (Nagelkerke 1991). Therefore, it has been suggested that the Cox–Snell \(R^{2}\) be adjusted as follows, to obtain what has become known as the Nagelkerke \(R^{2}\) statistic, after Nagelkerke (1991):

$$\begin{aligned} & {\text{Nagelkerke}}\;R^{2} = {\text{ Cox}}{\text{-}}{\text{Snell }}R^{2} /\left[ {{ \hbox{max} }\;\left( {R^{2} } \right)} \right] \\ & {\text{where}}\;{ \hbox{max} }\;\left( {R^{2} } \right) = 1 - L\left( 0 \right)^{{\frac{2}{n}}}. \\ \end{aligned}$$

Because of its desirable statistical properties we use the Nagelkerke \(R^{2}\) statistic, although we check that our results are not sensitive to this choice of \(R^{2}\) statistic.
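Both statistics can be computed from the log-likelihoods that most estimation software reports. The sketch below assumes the inputs are natural-log likelihoods of the null and fitted models; the function names and the example values (100 firms, half of which survive, and a fitted log-likelihood of −50) are hypothetical.

```python
import math


def cox_snell_r2(ll_null, ll_fit, n):
    """Cox-Snell R-squared: 1 - {L(0) / L(beta_hat)}^(2/n), from log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_fit) / n)


def nagelkerke_r2(ll_null, ll_fit, n):
    """Cox-Snell R-squared rescaled by its maximum, 1 - L(0)^(2/n)."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_fit, n) / max_r2


# Hypothetical example: 100 firms, half of which survive, so the null model
# assigns probability 0.5 to every observation.
n = 100
ll_null = n * math.log(0.5)
ll_fit = -50.0  # assumed fitted log-likelihood, for illustration only

print(cox_snell_r2(ll_null, ll_fit, n), nagelkerke_r2(ll_null, ll_fit, n))
```

In this example the Cox–Snell statistic cannot exceed 0.75, even for a model that fits perfectly, whereas the Nagelkerke rescaling reaches unity in that case.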

We begin by running regressions on cross sections corresponding to each year, where the dependent variable is either growth rate or survival probability.Footnote 6 For each year we obtain a Nagelkerke \(R^{2}\) statistic. We then plot the evolution of the Nagelkerke \(R^{2}\) over time using line charts: one chart for growth, one for survival.

5 The dataset: Barclays bank customer accounts

5.1 Start-up: definition

We exploit a rich and unique dataset drawn from non-financial firms identified as start-ups or new ventures that entered the business customer base of Barclays Bank between March and May 2004. At that time about one in four UK start-ups banked with Barclays. The sample excludes established businesses that switched from another Bank. We are aware that a new business does not necessarily start trading immediately upon opening an account. Indeed, for Barclays’ customers, approximately five per cent of start-ups show no activity through their account in the subsequent 12 months. We addressed this by only including firms that showed activity in the month following entry to the customer base.Footnote 7 , Footnote 8

We therefore focus on a cohort of 6579 firms that have the same start date. We consider this to be important, because firms starting in different years may not be readily comparable (especially if the macroeconomic conditions at start-up have persistent effects on firm development in subsequent years). Focusing on a single cohort means that firms face the same macro-economic conditions at each year of their development and can therefore be meaningfully compared (Ryder 1965; Anyadike-Danes et al. 2015). We then track the cohort for a maximum of 10 years, a period of time that we consider to be sufficiently long for our purposes, given that over 80 % of the ventures will have exited in that time (Anyadike-Danes and Hart 2014).

5.2 Start-up: data

Prior to opening the new business account, data were collected on the founder(s)’ gender, age and highest level of educational qualifications; prior business experience; previous ownership; and ownership amongst immediate family members. Finally, to capture access to non-financial resources, owners were asked about the sources of advice and support they used prior to start-up.

These data were then supplemented by the bank as part of its general account opening process. This covers the legal form of the business, the activity type (sector/branch/market) and its location (standard region) within the UK. Table 2 in ‘Appendix’ sets out the data definitions in full.

5.3 Ongoing data

To measure the size of the business we used credit turnover (the value of payments into a current account),Footnote 9 which we will refer to as ‘sales’. This serves as a very close approximation to sales revenue inclusive of taxes.Footnote 10 The much greater granularity of sales compared with using measures of employee numbers is a particular strength.Footnote 11 It is also reliable, comprehensive and, because every financial transaction is documented, the scale of volatility can be reliably quantified. Although credit turnover was initially observed by the bank at monthly intervals, our data have been aggregated over 12 months to give annual values, since our focus here is to explain long-run, rather than short-run, changes.

5.4 Exit and closure

Establishing precisely when a business has closed is perhaps the most challenging aspect of any study of new ventures. Even for datasets taken from near comprehensive official sources, the date at which exit occurs may be some time after actual closure.Footnote 12

When using bank records, there are two main issues to resolve. The first is to distinguish between those businesses that have closed, and those that have switched to another bank. For our dataset we used Barclays closure-reason-codes that record why any given account has been closed. 1.38 % of our initial sample switched over the 10 years covered by the dataset, i.e. they had closed their account with Barclays, but continued to trade.Footnote 13 These were dropped from our sample before we started the analysis.

The second issue is judging when a given business has actually closed. While the majority of Barclays customers ceasing to trade clearly close at a specific time when no more transactions take place, an important minority become dormant, i.e. their account remains open, but with no activity.Footnote 14 For the firms in our sample we used a simple rule—if the business had shown no sales in consecutive 6-month periods, then it was deemed to have closed in the first of these periods.Footnote 15
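The dormancy rule can be sketched in code. The reading adopted here is an assumption: monthly sales are grouped into non-overlapping 6-month blocks counted from entry, and a business is deemed closed in the first block of the first pair of consecutive blocks with zero sales. The function name `closure_period` is hypothetical.

```python
def closure_period(monthly_sales, block=6):
    """Return the index of the 6-month block in which the firm is deemed closed.

    Assumed rule: the firm closes in block k if blocks k and k + 1 both show
    zero sales. Returns None if no such pair of blocks is observed.
    """
    n_blocks = len(monthly_sales) // block
    totals = [sum(monthly_sales[k * block:(k + 1) * block])
              for k in range(n_blocks)]
    for k in range(n_blocks - 1):
        if totals[k] == 0 and totals[k + 1] == 0:
            return k
    return None


# Six months of trading followed by a year of inactivity: closed in block 1.
print(closure_period([100] * 6 + [0] * 12))
# A single dormant half-year followed by renewed trading is not a closure.
print(closure_period([100] * 6 + [0] * 6 + [50] * 6))
```

Under this reading, a temporarily dormant account that resumes trading within a year is treated as a continuing business.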

It is important to note that this process identifies closures. It is not limited to business ‘failures’. By the latter we mean those firms that cease to trade with some external financial liability. Of course, as noted earlier, a closing firm may, or may not, have met the objectives of its owner(s), although closure may equally reflect that a better opportunity has presented itself to the business owner(s) (Headd 2003; Harada 2007). Finally, cases of entrepreneurial exit (but business continuation) such as an initial public offering (IPO), merger or acquisition (M&A) or trade sale will not have a confounding effect on our measurement of business exit (Wennberg et al. 2010; Coad 2014), because if the firm continues operations with the same bank account, it will be treated in our dataset as a continuing firm, whereas if it switches its bank account to a different bank, it will be treated in our dataset as a ‘switcher’ and dropped prior to analysis. Nevertheless, cases of IPOs, M&As and even trade sales are negligible because our new ventures are young, small and representative of all sectors (apart from financial services). The tech-based services in which these outcomes are particularly characteristic constitute only a tiny proportion of the sample.Footnote 16

5.4.1 Dependent variables

We take two dependent variables as alternative indicators of new venture ‘performance’ (Miller et al. 2013). Survival is a binary variable, equal to 1 if the enterprise continues to trade at the end of the period (and 0 if the enterprise exited). The Growth Rate is measured in terms of growth in credit turnover (or ‘sales’, the value of payments into a current account) excluding payments from a related account (deposit account).Footnote 17 Sales growth has many advantages over other metrics of growth, such as employment, for new ventures. The first is that growth in terms of employment is ‘clunky’ (Coad et al. 2015, p. 6) due to integer constraints in terms of employee headcounts. These are particularly important for new ventures (e.g. a solo self-employed individual contemplating her first hire, who can either remain static or double her size, with nothing in between).Footnote 18 Second, taking on a new or first employee is a major decision for a new venture (NV) and presents problems of interpretation, since it reflects a combination of past and current performance and future expectations that is difficult to specify. Finally, most new ventures in our sample are too small to employ others, certainly when they start to trade.Footnote 19

We calculate the annual growth rate in the usual way (see e.g. Tornqvist et al. 1985; Coad 2009) by taking log-differences of sales, i.e.

$${\text{Sales}}\;{\text{growth}}\left( {i,t} \right) \, = \, \log \left( {{\text{sales}}\left( {i,t} \right)} \right) - \log \left( {{\text{sales}}\left( {i,t - 1} \right)} \right)$$
(4)
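As a minimal sketch of Eq. (4), the functions below aggregate monthly credit turnover into annual sales and then take log-differences. The function names and the example figures are hypothetical, and the sketch assumes strictly positive annual sales, since the logarithm is undefined at zero.

```python
import math


def annual_sales(monthly, months_per_year=12):
    """Sum monthly credit turnover into annual totals (complete years only)."""
    n_years = len(monthly) // months_per_year
    return [sum(monthly[y * months_per_year:(y + 1) * months_per_year])
            for y in range(n_years)]


def log_growth(sales):
    """Annual growth as log(sales_t) - log(sales_{t-1}); needs positive sales."""
    return [math.log(cur) - math.log(prev)
            for prev, cur in zip(sales, sales[1:])]


# Hypothetical firm: turnover of 10 per month in year 1 and 20 per month in year 2.
sales = annual_sales([10] * 12 + [20] * 12)
print(sales, log_growth(sales))
```

One property of the log-difference measure is its symmetry: a doubling of sales and a halving of sales give growth rates of equal magnitude and opposite sign.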

We estimate regression equations 1 year at a time, one cross section at a time, to obtain an \(R^{2}\) statistic for each year. A logistic regression model is applied for our survival estimations (Jenkins 1995; Wiklund et al. 2010), which is compatible with our focus on survival/death within a single year (rather than survival durations over many years). Our regression equations for the growth and survival of firm i in year t are as follows:

$$\begin{aligned} {\text{Growth}}\left( {i,t} \right) \, & = \, \alpha_{1} + \, \beta_{1} \cdot \log \_{\text{sales}}\left( {i,t - 1} \right) \, + \, \beta_{2} \cdot {\text{Growth}}\left( {i,t - 1} \right) \\ & \quad + \, \gamma_{1} \cdot {\text{Entrepreneur}}\left( {i,t} \right) \, + \, \delta_{1} \cdot {\text{Business}}\left( {i,t} \right) \\ & \quad + \, \theta_{1} \cdot {\text{Account}}\left( {i,t} \right) \, + \, \varepsilon_{1} \left( {i,t} \right) \\ \end{aligned}$$
(5)
$$\begin{aligned} {\text{Survival}}\left( {i,t} \right) \, & = \, \alpha_{2} + \, \beta_{3} \cdot \log \_{\text{sales}}\left( {i,t - 1} \right) \, + \, \beta_{4} \cdot {\text{Growth}}\left( {i,t - 1} \right) \\ & \quad + \, \gamma_{2} \cdot {\text{Entrepreneur}}\left( {i,t} \right) \, + \, \delta_{2} \cdot {\text{Business}}\left( {i,t} \right) \, \\ & \quad + \, \theta_{2} \cdot {\text{Account}}\left( {i,t} \right) \, + \, \varepsilon_{2} \left( {i,t} \right) \\ \end{aligned}$$
(6)

where our explanatory variables can be grouped together at the entrepreneur level (age, education, business experience, sources of advice), the business level (number and gender of owner(s), legal form, industry, region), and the bank account level (volatility, overdraft behaviour).Footnote 20

5.4.2 Independent variables

The independent variables used in the analysis are defined in Table 2 in ‘Appendix’. It also sets out where these variables have been used in previous work on survival/growth of new/small enterprises and the results obtained. The first group are the ‘usual suspects’ such as Legal form (Company, Partnership, Sole Trader); Number of owners; Gender; Age (and Age squared); Education level categories; Sources of advice (EABL scheme, Accountant, Solicitor, College, SR Seminar, PYBT scheme, Family, or Other), and a full set of dummies for industry and geographical region.

A second group is information on bank account activity: sales volatility, availability and extent of use of authorised overdraft facilities, and the use and extent of use of unauthorised overdrafts. These variables have not been explored in previous work that seeks to explain firm growth and survival, and so their inclusion can be considered to be a strength of this paper.Footnote 21

Table 2 in ‘Appendix’ does not point to the omission of key variables that might cause our regression equations to be grossly misspecified. Prior work on firm growth generally has low values for the \(R^{2}\) statistic (usually lower than 15 %, see the survey in Coad 2009, Table 7.1) so although there remains a risk of specification error and omitted variable bias, there are no clear guidelines in the literature as to which (if any) variables or regression specifications would be more appropriate.

5.5 Summary statistics

Table 1 provides an overview of the size and growth of new ventures in our sample which, it will be recalled, all began trading in the second quarter of 2004. The median sales in year 1 (i.e. 2005) are £38,712 which is far smaller than the threshold for value-added tax (VAT) registration (set at £58,000 for the 12 months from 1 April 2004, and rising to £73,000 by 1 April 2011), above which firms start to appear in UK administrative datasets. Around 50 % of new ventures will exit within 3 years of starting to trade, which is similar to that observed from UK administrative data on new ventures (Anyadike-Danes and Hart 2014).

Table 1 Summary statistics for size and growth rates

To investigate the impact of our rich coverage of micro firms, we complement our baseline results with those obtained from restricting our sample of new ventures to those of above-median start-up size. This makes our sample more similar to other work on new ventures that has a disproportionate coverage of larger new ventures (Yang and Aldrich 2012). The significance of this is that only by year 10 would the median surviving firm in this dataset have had sales sufficient to be included in official data.

A second key fact to emerge from the lower section of Table 1 is that (positive) growth in sales is by no means the ‘norm’ for new ventures. The mean growth rate is negative in every single year, although the median growth rate is only negative in 3 years. The term ‘sales growth,’ when applied to NVs, is for this reason potentially misleading if it is not understood that growth rates can be negative (Davila et al. 2015). Indeed, negative growth rates (i.e. decline) are very common.

Figure 2 presents the growth rate distribution, which resembles the usual Laplace or symmetric exponential distribution found in other work (Bottazzi and Secchi 2006; Coad and Tamvada 2012; Daunfeldt and Halvarsson 2015). In every year, about half of the firms will have negative growth rates, which emphasises further that our use of the term ‘sales growth’ does not imply that new ventures all have (positive) growth, but that there are many cases of decline (i.e. negative growth rates).

Fig. 2 Growth rate distributions for different years. Note the log scale on the y-axis

Summary statistics for the explanatory variables are presented in Table 3 in ‘Appendix’.

6 Testing the hypotheses

This section presents the crux of our empirical contribution, which can be found in our plots of the evolution of the \(R^{2}\) statistic over time (see Figs. 3a, b, 4). We present the evolution of the Nagelkerke \(R^{2}\) statistic for four regression specifications: in some cases we include lagged growth as an explanatory variable (at the cost of losing an extra year’s results), and in some cases we focus on a subsample of relatively large firms (i.e. those with above-median sales in year 1). Regression results tables for the baseline specification are also presented for the sake of completeness as Tables 4, 5 and 6 in ‘Appendix’.

Fig. 3 (a) OLS growth regression Nagelkerke \(R^{2}\) statistics for individual cross sections for the first 10 years, for 4 different growth rate regression specifications and (b) OLS growth regression Nagelkerke \(R^{2}\) statistics for the first 10 years, for 4 different growth rate regressions (NVs that survive until the end of year 10). Key: Baseline: full sample. Baseline + lag: full sample controlling for lagged growth. Largest startup size: above-median start-up size only. Largest startup size + lag: above-median start-up size subsample, controlling for lagged growth

Fig. 4 Logit survival regression: Nagelkerke \(R^{2}\) statistics for individual cross sections for first 10 years, for 4 different survival regression specifications. Key to regression specifications: the baseline model refers to the full sample with or without controlling for lagged growth. Regressions labelled ‘large startup size’ refer to a subsample of firms with above-median start-up size (i.e. above-median values of sales in the first year)

6.1 Plotting the \(R^{2}\) statistics

6.1.1 Sales growth

Figure 3a shows how the Nagelkerke \(R^{2}\) statistic for sales growth regressions evolves over the first 10 years. It starts off in year 2 at values of 28–37 % (depending on the regression specification), which is considerably higher than normally found in the literature on growth rate regressions (no doubt due to our unusually rich information on business behaviour). A closer look at the regression coefficients, reported in Table 4 in ‘Appendix’ for the baseline model, shows that the most significant variables are the bank account activity variables (volatility and overdraft behaviour).

Figure 3a shows that the \(R^{2}\) decreases in the years after entry for the four specifications shown. In year 2 it is in the range of 27–37 %, whereas by year 10 it is in the range of 13–22 %. Year 5, which corresponds to the deep recession of 2009, does not stand out or interrupt the overall trend. As new ventures age, it seems to become increasingly difficult to accurately predict their growth. This implies that the fog thickens, which is in line with Lotti et al. (2009), who observe that Gibrat’s Law appears to hold as a ‘long-run regularity’ as time goes by, and growth becomes harder to predict.

Another observation, from both Fig. 3a and b, is that for nearly all years the Nagelkerke R² values for the equations that include only the larger enterprises (i.e. the ‘large startup size’ subsample) are higher than for the baseline sample. This implies it is harder to explain the growth performance of smaller firms, which exhibit greater volatility. This may explain, at least in part, why analyses using relatively large and well-established ‘new ventures’ (Hmieleski and Baron 2009; Dencker et al. 2009; Baum and Bird 2010) are able to show higher explanatory power than those included here.

Table 4 in ‘Appendix’ shows that this decreasing trend in the explanatory power of our regressions is also observed for alternative indicators of goodness of fit (the standard R² statistic as well as the Cox–Snell R² statistic) for the baseline case. Further explorations show that this is also the case for the 3 other regression specifications (results available upon request).

Figure 3a shows that the explanatory power of growth rate regressions decreases over time, but it does not explain why. Changes in the ‘fog’—our ability to predict growth—could be due to internal developmental factors in new firms, or they could reflect selection effects—whereby the composition of the sample of survivors is affected by the selective exit of certain types of firms.

Indeed, previous research has shown that many firms will exit in the years after entry (Audretsch et al. 1999; Santarelli and Vivarelli 2007), and it could be that changes in our ability to explain growth are due to changes in the sample composition over time. One way of eliminating the role of selection effects is to restrict the analysis to only those firms that survive the full 10-year period. Any change in the ability to explain growth for this subsample would then be due to internal developmental factors rather than selection effects.

The results are plotted in Fig. 3b (and the regression results for the baseline case are presented in Table 5 in ‘Appendix’). For the subsample of surviving firms, the R² shows no clear trend over time: there is no clear change in our ability to explain their growth in the years after entry. Any deterioration in our ability to explain growth in the years after entry (shown in Fig. 3a) would therefore seem to be driven by the relative ease of explaining the growth (or perhaps more precisely: the decline) of short-lived firms (see Footnote 22).
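The survivor restriction behind Fig. 3b can be illustrated with a minimal sketch. The firm-year records, the tuple layout and the function name `survivors_only` are hypothetical and chosen only for illustration; a firm is treated as a survivor if it is still observed in the final year, which is an assumption about how survival would be recorded.

```python
from collections import Counter

def survivors_only(records, last_year=10):
    """Keep rows for firms that are still observed in `last_year`."""
    surviving_ids = {firm for firm, year, _growth in records if year == last_year}
    return [row for row in records if row[0] in surviving_ids]

# Hypothetical firm-year records: (firm_id, year, sales_growth).
records = [
    ("A", 2, 0.10), ("A", 3, 0.05), ("A", 4, -0.02),
    ("B", 2, -0.30), ("B", 3, -0.45),          # firm B exits after year 3
    ("C", 2, 0.20), ("C", 3, 0.00), ("C", 4, 0.08),
]

# Observations per year in the full sample: shrinks as firms exit.
full_counts = Counter(year for _, year, _ in records)

# Observations per year among survivors: constant by construction.
balanced = survivors_only(records, last_year=4)
balanced_counts = Counter(year for _, year, _ in balanced)

print(dict(full_counts))
print(dict(balanced_counts))
```

Running year-by-year regressions on the restricted records holds sample composition fixed, so any remaining change in fit cannot come from selective exit.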

Overall, therefore, the evidence in Fig. 3a and b suggests that our ability to explain growth deteriorates in the years after entry, with growth coming to look closer to random over time. This seems to be driven by the changing composition of the sample of surviving firms (i.e. selection effects) rather than by any internal developmental factors within firms. When we focus on a core subsample of NVs that survive until the end of year 10 (and thus remove any sample composition effects, because we have the same number of observations in each year), our ability to explain growth remains roughly constant over time (Fig. 3b). This mixed evidence is in keeping with Hypothesis 1.

6.1.2 Survival

To test Hypothesis 2 we run year-by-year regressions (presented in detail in Table 6 in ‘Appendix’ for the baseline case) and plot the evolution of the Nagelkerke R² statistics in Fig. 4. The ‘fog’ regarding survival—i.e. our ability to explain the survival of firms—seems to clear in the years after entry. Figure 4 shows how the Nagelkerke R² starts off at around 15 % in year 2 and increases to 26–36 % by year 10. This is consistent with our simulation model and the predictions of Hypothesis 2.
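For readers less familiar with the statistic plotted in Fig. 4, the sketch below computes the Cox–Snell and Nagelkerke R² from the log-likelihoods of an intercept-only model and a fitted model, following the standard textbook definitions. The outcomes and fitted probabilities are made-up illustrative values and the function names are hypothetical; this is not the code used to produce Table 6.

```python
import math

def bernoulli_loglik(outcomes, probs):
    """Log-likelihood of 0/1 outcomes under predicted probabilities."""
    return sum(math.log(p if y == 1 else 1.0 - p)
               for y, p in zip(outcomes, probs))

def cox_snell_r2(ll_null, ll_model, n):
    """Cox-Snell pseudo-R2: 1 - (L_null / L_model)^(2/n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R2 divided by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Hypothetical data: 1 = survived the year, 0 = exited.
survived = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
# Hypothetical fitted survival probabilities from some logit model.
fitted = [0.9, 0.8, 0.7, 0.4, 0.8, 0.3, 0.6, 0.9, 0.5, 0.7]

n = len(survived)
base_rate = sum(survived) / n  # intercept-only model predicts the sample mean
ll_null = bernoulli_loglik(survived, [base_rate] * n)
ll_model = bernoulli_loglik(survived, fitted)

cs = cox_snell_r2(ll_null, ll_model, n)
nk = nagelkerke_r2(ll_null, ll_model, n)
print(round(cs, 3), round(nk, 3))
```

Because the Cox–Snell statistic cannot reach 1 for binary outcomes, the Nagelkerke rescaling makes values easier to compare across years in which the share of exiting firms differs.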

The key difference between the growth rate regressions (Fig. 3a, b), on the one hand, and the survival regressions (Fig. 4), on the other hand, is that ‘the fog clears’ in the years after entry when the task is to explain survival, yet it remains dense when the task is to explain growth.

6.2 Robustness analysis

Further evidence on the robustness of our findings comes from considering alternative measures of goodness of fit, in addition to the Nagelkerke R². These are shown at the bottom of Tables 4, 5 and 6 in ‘Appendix’. For the growth regressions, the R² and Cox–Snell R² statistics closely mirror the Nagelkerke R², with no clear trend in the R² statistic. For survival, we report the Cox–Snell R², as well as information on the percentage of cases correctly classified: the latter increases in most years after start-up, hence confirming our earlier results using the Nagelkerke R² statistic. The Cox–Snell measure provides results that are less clear-cut, however: although it rises during the early years, it reaches a peak in year 6.
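The percentage of cases correctly classified can be sketched as follows. The 0.5 cutoff is an assumption (a common default; the cutoff actually used is not stated here), and the data and function name are hypothetical.

```python
def percent_correctly_classified(outcomes, probs, cutoff=0.5):
    """Share (in %) of cases whose predicted class matches the outcome.

    A case is predicted to survive when its fitted probability is at
    least `cutoff`; the 0.5 default is an illustrative assumption.
    """
    hits = sum((p >= cutoff) == (y == 1) for y, p in zip(outcomes, probs))
    return 100.0 * hits / len(outcomes)

# Hypothetical outcomes (1 = survived) and fitted survival probabilities.
outcomes = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.4, 0.7]

pcc = percent_correctly_classified(outcomes, probs)
print(pcc)
```

Unlike the pseudo-R² measures, this statistic depends on the chosen cutoff and on the base rate of survival, which is one reason to report it alongside them rather than instead of them.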

Another way of exploring the robustness of our results is to take an alternative regression specification with a different set of explanatory variables. As noted earlier, our database contains a number of variables relating to bank account activity, which constitute a rich and unique source of information on firm behaviour. However, these variables remain little-known in the literature, and they may raise concerns of endogeneity (e.g. risky unauthorised overdraft behaviour may be a cause or a consequence of poor performance in terms of growth or survival prospects). We therefore repeated the analysis excluding the variables relating to bank account volatility, and obtained the following results. For the growth rate regressions, the Nagelkerke R² statistics were very low, in the range of 3–7 % for our baseline specification. If anything, they appeared to increase slightly in the years after entry, although this increase was not monotonic. When the growth rate regressions were performed on the core sub-sample of firms surviving until the end of the 10-year period, the Nagelkerke R² generally decreased, if anything, in the years after entry. Our clearest results were observed in our survival regressions, where the Nagelkerke R² followed an increasing trend in the years after entry. All in all, without the bank account activity variables, our results for survival were relatively clear in showing that the survival ‘fog’ tends to clear in the years after entry. Our results for growth were less clear-cut, probably because dropping the bank account variables left the overall explanatory power very low (Nagelkerke R² statistics of around 5 % or lower), and the resulting low signal-to-noise ratio made it hard to detect any clear trend.

7 Conclusion

Business owners, providers of finance, and governments have much to gain from developing a better understanding of the factors influencing the performance of new ventures (NVs) in the years after entry. The starting point for this paper was that the post-entry performance of new ventures is highly diverse, the selection environment is noisy (characterised by imperfect mechanisms of survival/growth of the fittest), and that our ability to explain and perhaps forecast the survival/growth in new ventures is weak. In the terminology of this paper, the fog was thick. The challenge therefore was to examine whether, as the new venture aged, it became easier to explain performance: did the fog lift? Our final question was, if the fog does lift with time, did visibility improve in steps or stages (Phelps et al. 2007; Levie and Lichtenstein 2010) or was the process more continuous?

To address these questions we primarily drew upon a theoretical framework that sees new venture sales growth as a random walk (Levinthal 1991; Le Mens et al. 2011), and survival being determined by the stock of available resources (proxied by size), where these resources are either present at start-up or accumulated after entry. We used this theory to derive testable hypotheses that our ability to explain growth (i.e. the R² from growth regressions) should remain low over time, but that our ability to explain survival should increase in the years after entry.
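The logic of this framework can be illustrated with a stylised simulation: each firm holds a resource stock that moves by a purely random shock each year, and the firm exits permanently once the stock falls below a threshold. This is a minimal sketch and not the simulation model used in this paper; the starting distribution, shock size, threshold and all names are illustrative assumptions.

```python
import random

def simulate_cohort(n_firms=1000, years=10, threshold=0.0, seed=42):
    """Random-walk resource stocks with a permanent exit threshold.

    All parameter values are illustrative assumptions, not estimates.
    Returns (survivors_by_year, final_stocks), where final_stocks holds
    the stock of every firm still alive after the last year.
    """
    rng = random.Random(seed)
    # Heterogeneous, strictly positive initial resource stocks.
    stocks = [rng.uniform(0.5, 3.0) for _ in range(n_firms)]
    alive = [True] * n_firms
    survivors_by_year = []
    for _ in range(years):
        for i in range(n_firms):
            if not alive[i]:
                continue
            # Growth is a pure stochastic shock, independent of the past.
            stocks[i] += rng.gauss(0.0, 1.0)
            if stocks[i] < threshold:
                alive[i] = False  # exit is permanent
        survivors_by_year.append(sum(alive))
    final_stocks = [s for s, a in zip(stocks, alive) if a]
    return survivors_by_year, final_stocks

survivors, final_stocks = simulate_cohort()
print(survivors)
```

In such a setting the yearly shocks are unpredictable by construction, whereas a firm's distance from the exit threshold (its accumulated stock) carries information about its survival prospects, which is the intuition behind the two hypotheses.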

We conducted our tests on 6579 new ventures which, because they were genuinely representative of NVs, were on balance considerably smaller than those identified in prior work. These NVs were tracked over the years 2004–2014, generating two key findings. First, in the sales growth regressions, the goodness-of-fit measure (Nagelkerke R²) decreases in value in the years after entry, implying that our ability to explain firm growth deteriorates, or that ‘the fog thickens’. However, when we sidestep issues of ‘selection’ and focus only on a subsample of NVs that we know will survive until the end of the period of observation, our ability to explain the growth in this subsample of survivors remains low but does not change over time. Hence, any decrease in our ability to explain growth in the years after entry appears to be driven by the presence of short-lived firms, rather than being due to internal developmental factors within surviving firms. In any case, our ability to explain growth remains low throughout the period investigated (see Footnote 23). Second, in the survival equations, using three performance metrics we find that, on balance, the goodness-of-fit increases in years since start-up. This suggests that the fog does lift somewhat with time when the task is to predict survival.

In terms of the questions posed at the start of the paper, we take our evidence as showing that the growth rate fog is always thick and shows no signs of improvement with time, in line with our theory. Survival visibility, however, does seem to improve with time, but not in a clear ‘step’ fashion.

Finally, we see important areas for developing this approach. Currently our data track a cohort of new ventures during an unusual period—beginning in benign macro-economic conditions that are followed by a deep recession. Ideally we would like to know whether our findings hold under different macro-conditions. However, future efforts in this direction will face challenges of obtaining comprehensive datasets on NVs (from year 1) that also include a rich set of explanatory variables.