1 Introduction

Over the last decade, a large body of literature has focused on determinants of firm growth. Convincing evidence has been provided that firm growth is affected by firm size and age, but the empirical results differ on the question of which additional firm characteristics are important determinants of firm growth (Coad 2007b).

There is a considerable literature that analyzes the relationship between financial performance and firm growth, as well as the impacts of agglomeration on firm growth. Specifically, Endogenous Growth Theory and New Economic Geography provide theoretical explanations. Since the work of Marshall (1890), it has been assumed that the spatial concentration of economic activity of firms in the same sector (specialization) can foster the emergence of economies of scale, which positively affects the performance of firms. Firms may locate in proximity to other firms to benefit from a more extensive base of suppliers and customers (Glaeser and Kerr 2009, p. 636). Firms can thereby reduce their costs of obtaining inputs, often referred to as input sharing, and the costs of shipping their goods to customers. Detailed evidence of the importance of knowledge spillovers is, for example, presented by Mathias et al. (2021). Agglomeration appears to lead to knowledge gains only if unique conditions prevail, such as firm age. There is also evidence of a positive relationship between agglomeration and innovation outcomes, e.g., patents.

Existing empirical analyses are primarily conducted on a regional level (e.g., city, county, or zip code level). Thus, findings may depend on the chosen regional level and provide only limited insights into the existence and effects of agglomeration on growth at the firm level (de Bok and van Oort 2011). Since agglomeration economies affect regional economic growth only indirectly, through their impact on the performance of firms, a firm-level agglomeration analysis is best suited to provide insight into the mechanism of agglomeration.

While some agglomeration effects, e.g., the positive effects of labor market pooling, can arise without direct interactions between firms, agglomeration effects may be based on firm-level interactions to a larger extent. As we have no direct information on supplier and customer interactions at the firm level, sectoral data provide valuable information on input (suppliers of firms’ materials and services for the production process) and output (customers of firms’ products) relations. To utilize this information, we combine firm-level information on location and firm characteristics obtained from balance sheets and profit and loss accounts with sectoral dependencies obtained from input and output tables provided in the national account statistics published by the Federal Statistical Office of Germany (Destatis). Using this relational information, we calculate firm-specific supplier and demand densities based on surrounding firms, taking into account their relevance revealed in the sectoral input-output relations.

This paper aims to contribute to both the firm-growth and agglomeration literature by estimating a firm-growth model, which regards firm-specific attributes, as well as agglomeration effects based on supplier and customer densities in Germany.

Although there is literature that examines the effect of input-output linkages (as well as other Marshallian agglomeration mechanisms) on coagglomeration, which is the tendency of two industries to locate in proximity (e.g., see Ellison et al. 2010 or Aleksandrova et al. 2020), this paper differs from these studies by using kernel densities to calculate supplier and customer densities specific to each individual firm’s location, thereby avoiding the arbitrariness of spatial boundaries and scales.

The paper is structured as follows. In the next section, we provide a literature review. In the third section, we present our database and the operationalization of supplier and customer densities. Section four includes descriptive evidence and regression results. We discuss our findings and provide conclusions in the last section.

2 Literature Overview

In our analysis, we examine the effect of the agglomeration of supplier and customer firms at the firm’s specific location on its employment growth. Hence, we discuss the firm growth literature as well as the agglomeration literature that is relevant to this specific linkage.

Firm age and firm size are the most frequently investigated determinants of firm growth in empirical studies. Age and size may be correlated, since large firms are mostly older than smaller firms (Coad 2007b, p. 51). Reviewing previous research on the growth of firms, Coad (2007b, p. 18) reveals that age affects firm growth negatively. Similar results are reported by Harhoff et al. (1998) and Audretsch and Dohse (2007), who analyze employment growth using a dataset of German firms. Concerning firm size (total assets), Harhoff et al. (1998), Audretsch and Dohse (2007), and Hoogstra and van Dijk (2004) report a negative impact of firm size on employment growth.

Another strand of the literature focuses on the link between financial performance and firm growth. For instance, previous sales growth may spur growth, since investors may consider firms with high growth in sales as secure investments (Levratto et al. 2010, p. 8). In this context, Fuertes-Callen and Cuellar-Fernandez (2019) analyze the impact of profitability, measured as the return on assets (earnings before interest and taxes divided by total assets), on the employment growth and sales growth of Spanish firms, using static and dynamic panel models. They find a positive effect of past profits on employment growth, but a negative effect on sales growth, suggesting that Spanish firms do not tend to invest in growth. This result is in line with Coad (2007b), who states that although financial performance is often found to be statistically significant, its impact on growth is small.

The financial structure of a firm may also influence its growth. A high debt ratio may indicate lower financial constraints, but could also increase the risk of future financial constraints and insolvency, whereas a high fixed assets ratio may lead to low flexibility and the need to cover recurring asset expenses. However, in empirical research, it remains unclear whether the financial structure is an essential determinant of firm growth. For example, Becchetti and Trovato (2002, p. 294) find that high-leverage firms tend to experience stronger growth than low-leverage firms, whereas in the study of Fuertes-Callen and Cuellar-Fernandez (2019, p. 96) the debt ratio does not have a statistically significant influence on employment growth.

A wide range of studies have analyzed the relationship between regional growth and agglomeration economies using aggregated regional data, such as zip-code-sectors or municipal-sectors (e.g., Glaeser et al. 1992; Combes 2000; van Oort 2007). Agglomeration economies may arise from various types of economic structure, for instance, the spatial concentration of economic activity in the same sector (specialization), in diverse sectors (diversity), or competition. Although there is no agreement on which type of economic structure leads to the emergence of agglomeration economies, many empirical studies have indicated a link between agglomeration economies and the uneven regional distribution of economic activity and growth (de Bok and van Oort 2011, pp. 5–6).

Mathias et al. (2021) review 42 studies on the relationship between agglomeration, which was operationalized by aggregated regional indices, and firm performance. They find that agglomeration is associated with innovation benefits, but that these benefits depend on unique conditions such as firm age or the institutional environment. Mathias et al. (2021, p. 436) conjecture that the positive relationship between the degree of agglomeration and financial performance outcomes will be stronger in younger firms than in older ones. Contrary to this conjecture, they find that older firms achieve greater financial performance (Mathias et al. 2021, p. 441).

However, the chosen spatial scale may affect the regional analysis (see the Modifiable Areal Unit Problem (MAUP)). Furthermore, these studies only provide limited information on the relationship between agglomeration and firm performance.

These limitations may be due to two reasons. On the one hand, agglomeration economies only indirectly affect regional economic growth through their impact on the performance of firms (de Bok and van Oort 2011, pp. 5–6). On the other hand, these studies often use economic structure indicators, which are hypothesized as fostering the emergence of agglomeration economies, to examine agglomeration forces. For example, the location quotient, a sectoral employment share indicator, is used to measure specialization. However, in theory, it is not the economic structure per se but agglomeration mechanisms that lead to externalities, potentially fostering firm performance. Moreover, specific economic structures, such as specialization, may also generate negative effects, e.g., congestion effects (Henderson et al. 1995) or vulnerability to sector-specific shocks (Duranton and Puga 2004), which is why measuring economic structures may only be second-best.

Over the last two decades, several scholars have focused on sources of agglomeration. Using U.S. manufacturing firm data, Rosenthal and Strange (2001) find that all three Marshallian agglomeration mechanisms (labor market pooling, input sharing, knowledge spillovers) positively affect the emergence of agglomeration economies.

Ellison et al. (2010) examine the relative importance of the Marshallian agglomeration mechanisms using OLS regressions based on firm-level manufacturing and input-output accounting data. Regressing two different coagglomeration indices, which measure the tendency of two industries to locate in proximity to one another, on variables measuring the extent of input-output linkages, separate input and output effects, labor market pooling, and technological flows, they find a positive and statistically significant effect of all three Marshallian mechanisms on the tendency to coagglomerate. Of these three mechanisms, however, input-output linkages are particularly important.

Jofre-Monseny et al. (2011) and Aleksandrova et al. (2020) also provide supportive evidence of the relevance of input-output linkages by analyzing the importance of the three Marshallian agglomeration mechanisms for the creation of firms across cities (between-cities) and across municipalities within large cities (within-cities) for Spanish and Russian manufacturing firms, respectively. Since the use of a coagglomeration index as the dependent variable may lead to simultaneity, i.e., customer-supplier relationships may be the result of coagglomeration, Jofre-Monseny et al. (2011) employ a Poisson regression to model the count of new firms. To define customer-supplier relations, they use data from the Catalan Input-Output Table to create variables measuring local employment in the industries that are the main input suppliers or customers, respectively, of an industry. Their results indicate that only input sharing and labor market pooling constitute important agglomeration mechanisms in the between-cities analysis, but all three mechanisms are relevant in the within-cities analysis.

Also addressing the micro-foundations of observed location patterns at state, county, and zip code level, Kolko (2010) focuses on the question of why services industries are more urbanized, but less highly concentrated than manufacturing industries. The author finds that services industries rely less on natural resources, and occupational specialization does not have a statistically significant effect on coagglomeration. Services industries also tend to urbanize, in order to minimize transport costs for their outputs.

To study sectoral localization, a strand of literature uses a nonparametric approach developed by Duranton and Overman (2005), which compares the distribution of pairwise distances between establishments in an industry to that of randomly chosen establishments. Regarding the colocalization patterns between vertically-linked industries, Duranton and Overman (2008) discover that on small spatial scales, establishments tend to locate closer to the same industry establishments, while on a greater spatial scale (around 150 km), they tend to locate closer to vertically-linked industries. Klier and McMillen (2008) find that U.S. auto supplier plants tend to locate in areas with good highway access, close to Detroit, and in proximity to assembly plants. Recent evidence from Goldman et al. (2019) suggests that R&D, administrative, and production workers are the most concentrated occupation groups for manufacturing industries. This result is in line with the argument that knowledge spillovers and labor market pooling lead to a high concentration of R&D and administrative workers.

The literature also provides empirical insights into the sources of agglomeration with respect to production and productivity. For instance, Feser (2002), Rigby and Essletzbichler (2002), and Greenstone et al. (2010) estimate production functions using plant-level data and find evidence for an influence of the traditional Marshallian sources of agglomeration in the U.S.

In conclusion, while many studies have examined the impact of agglomeration on regional growth, the empirical literature focusing on individual firms is rather limited. Existing studies have regressed (co‑) agglomeration indexes or (total factor) productivity on Marshall’s three sources of agglomeration. In particular, to gain deeper insight into the relationship between agglomeration and regional growth, it is necessary to analyze the effects of the micro-foundations of agglomeration economies on regional growth, in addition to the economic structures conducive to agglomeration. Therefore, analyzing how the locations of surrounding suppliers and customers affect the growth of the individual firm seems to be a promising research strategy.

3 Data and variable construction

3.1 Data

For the analysis, we use firm-level information from the Orbis database generated by “Bureau van Dijk” (BvD). This database includes companies of all sizes, but has a focus on private companies. Orbis contains firm-level balance sheet and income statement data. In addition, it provides data from the notes to the financial statements, such as the number of employees, and metadata such as information about a company’s industry. The database also includes the firm’s address, which was transformed into geographic coordinates (longitude and latitude).

The data requirements for firms entering the estimation of the supplier and customer densities are relatively low, as we only need information on the firm location and sales in 2013. Firms included in the final regression analysis have to fulfill more restrictive criteria, as we need valid information on all the variables included in the regression model. Hence, we constructed two datasets: one with fewer variables and less stringent selection criteria for the estimation of the input supplier and customer densities, and one more restricted dataset for the core analysis of growth determinants. The number of observations in the dataset for estimating supplier and customer densities is \(21\,053\), whereas the number of observations in the dataset finally used for the regression analysis is \(19\,275\).

We use input-output-accounts provided by the German Federal Statistical Office (Statistisches Bundesamt) to measure the input supplier and customer linkages between sectors. Sector information in the Orbis dataset is provided as a NACE code (Eurostat 2008), while the aggregation to 72 sectors in the specific advanced input matrix is based on CPA (Classification of Products by Activity). According to the Statistisches Bundesamt (2008, pp. 48–49), the structure of CPA generally corresponds to that of NACE up to the level “class”.

Starting with the advanced input matrix \((72\times 72)\) for domestic production (Table 2.3 provided in Statistisches Bundesamt 2017, pp. 60–71), we aggregated the 72 sectors to 12, using the conversion table provided in Behr and Rohwer (2019, p. 270) (see also Table 4). For the Orbis dataset, we first converted NACE to the 72 sectors, as given in the advanced input matrix, and then aggregated further to 12 sectors.
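The aggregation step can be illustrated with a minimal sketch. It assumes that aggregating a flow matrix means summing all entries whose row and column sectors fall into the same pair of aggregated sectors. The function name `aggregate_matrix`, the toy matrix, and the mapping are hypothetical; the actual conversion table of Behr and Rohwer (2019) is not reproduced here.

```python
def aggregate_matrix(A, mapping, n_groups):
    """Sum the entries of a square flow matrix into n_groups x n_groups blocks.

    mapping[j] is the (0-based) aggregated sector of detailed sector j.
    """
    if any(len(row) != len(A) for row in A) or len(mapping) != len(A):
        raise ValueError("A must be square and mapping must cover every sector")
    B = [[0.0] * n_groups for _ in range(n_groups)]
    for j, row in enumerate(A):
        for k, value in enumerate(row):
            # The flow from detailed sector j to k is added to the flow
            # between the aggregated sectors that j and k belong to.
            B[mapping[j]][mapping[k]] += value
    return B


# Toy example (hypothetical values): four detailed sectors, two aggregated ones.
A_detailed = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 3.0, 1.0, 0.0],
    [2.0, 0.0, 4.0, 1.0],
    [1.0, 1.0, 0.0, 2.0],
]
mapping = [0, 0, 1, 1]
A_aggregated = aggregate_matrix(A_detailed, mapping, 2)
```

Summation of this kind preserves the total value of all flows, which is a simple consistency check after aggregating from 72 to 12 sectors.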

We compare our data (dataset used for the regression analysis) with census data from the regional database (Regionaldatenbank Deutschland 2020) provided by the German Federal State Statistical Offices to assess the quality of the dataset. German census data is classified according to WZ 2008, whose structure corresponds to that of NACE according to Statistisches Bundesamt (2008, pp. 47–48). According to Table 5 and Table 6, our data reflects the employment share per sector and federal state of the census data. In Table 5, we find similar employment shares per sector for both datasets, and from Table 6 we find that employment shares per federal state are similarly distributed.

3.2 Variable construction

In this subsection, we describe the definition of variables obtained directly from firm-level balance sheets and profit and loss statements.

Employment growth. Using job creation as the general objective from a socio-economic perspective, in line with Illy et al. (2009), we use the employment growth indicator

$$\mathit{lgr}_{i}=\ln{(\mathit{emp}_{i,2017})}-\ln{(\mathit{emp}_{i,2013})}$$
(1)

as a proxy for economic growth, where \(\mathit{emp}_{i}\) represents (total) employment (number of employees) in firm \(i\).
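A short numerical illustration of Eq. (1) may help: the log difference approximates the percentage change, and the exact relative change can be recovered by exponentiating. The function name `log_growth` and the employment figures below are hypothetical.

```python
import math


def log_growth(emp_start: float, emp_end: float) -> float:
    """Log-difference growth rate as in Eq. (1)."""
    if emp_start <= 0 or emp_end <= 0:
        raise ValueError("employment must be positive to take logarithms")
    return math.log(emp_end) - math.log(emp_start)


# Hypothetical firm: 100 employees in 2013 and 110 employees in 2017.
lgr = log_growth(100, 110)
exact_pct = math.exp(lgr) - 1.0  # exact relative change implied by lgr
```

For small changes the two figures are close; for large changes the log difference understates positive growth relative to the exact percentage change.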

Firm age. Several studies have shown that young firms tend to grow faster than older ones, e.g., Harhoff et al. (1998). We define the age of the firm according to the available information in the Orbis data set in decades:

$$\mathit{age}_{i}=\text{(age in days)}_{i,2013}/3\,650.$$
(2)

The more dynamic growth process of younger companies tends to have a relatively high variance, see, e.g., Coad (2007b), Dunne et al. (1989) and Boeri and Cramer (1992).

Firm size. We define a firm’s size by the natural logarithm of its total assets

$$\mathit{ltoas}_{i}=\ln\text{(total assets)}_{i,2013}.$$
(3)

A firm’s age and size have been found to be important in firm-growth-related studies. Since larger firms are usually older than smaller ones, there is possibly a correlation between these two predictors (Coad 2007b, p. 18 and p. 51). Furthermore, small firms are under constant pressure to grow in order to reduce costs to the same level as other (larger) competitors (Coad 2007b, p. 51).

A specific minimal size is essential in fields with high fixed costs, whereas a relatively small size increases the firm’s flexibility (Schreyer 2000, p. 13). Consequently, large firms have longer planning horizons and more long-term investments, which pay off over several years and lead to a higher autocorrelation of growth (Coad 2007a, p. 74). In addition, national legislation is often firm-size-specific. For instance, larger firms in certain countries have higher firing costs and have to pay higher taxes. On the other hand, they also have greater lobbying power, which might facilitate growth (Coad 2007b, p. 27 and pp. 78–79 and Schreyer 2000, p. 7).

There seems to be no consensus in the literature as to whether a small or a large size facilitates high growth. For instance, Harhoff et al. (1998) find that a larger size leads to lower growth rates, whereas Henrekson and Johansson (2010, p. 1) find the opposite.

Sales growth. We consider the change in sales

$$\mathit{lsa}_{i}=\ln{(\mathit{sales}_{i,2013})}-\ln{(\mathit{sales}_{i,2012})}$$
(4)

as it can be considered a main determinant of firm growth: a firm may meet rising demand for its products by expanding its employment.

The debt ratio. The debt ratio is defined as follows:

$$\mathit{dr}_{i}=\frac{\text{(total liabilities)}_{i,2013}}{\text{(total assets)}_{i,2013}}$$
(5)

It measures a firm’s leverage, indicating the borrowed share of a firm’s funding. The remaining financing originates from past retained profits or has been introduced by the shareholders (Albrecht et al. 2007, p. 476; Penner 2004, p. 218). One important motivation to include \(\mathit{dr}\) has been put forward by Lopez-Garcia and Puente (2012, pp. 1036–1039). They argue that firm growth requires financing and that a high \(\mathit{dr}\) might lead to future financing constraints and an inability to realize all reputedly profitable projects. In their study, \(\mathit{dr}\) was found to have a significant non-linear influence when not controlling for firm-specific time-invariant factors. The importance of \(\mathit{dr}\) for firm growth was also confirmed by Fagiolo and Luzzi (2006, p. 33), Becchetti and Trovato (2002, p. 294) and Levratto et al. (2010, p. 10). However, Lopez-Garcia and Puente (2012, p. 1031) found no effect of \(\mathit{dr}\). One possible reason for doubt is that, especially in the case of start-ups, it is not bank credit but risk capital and internal finance that are important funding sources (Lopez-Garcia and Puente 2012, p. 1038).

The fixed assets ratio. The fixed assets ratio

$$\mathit{far}_{i}=\frac{\text{(fixed assets)}_{i,2013}}{\text{(total assets)}_{i,2013}}$$
(6)

indicates the degree of capital commitment. High values imply low flexibility and constant pressure to keep capacity utilization high, so as to cover the recurring assets’ expenses (e.g., for energy and maintenance). We assume that a high \(\mathit{far}\) might motivate a firm to grow in order to spread its high fixed costs over a greater number of products (Schneider and Lindner 2010, p. 317). Levratto et al. (2010, p. 5) added that excessively high fixed costs could be a severe threat to future growth.
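The firm-level regressors of Eqs. (2) to (6) can be computed directly from a handful of accounting figures. The following is a minimal sketch; the class `FirmAccounts`, its field names, and the example values are hypothetical and do not correspond to Orbis variable names.

```python
import math
from dataclasses import dataclass


@dataclass
class FirmAccounts:
    """Hypothetical container for the 2012/2013 figures used in Eqs. (2)-(6)."""
    age_days_2013: float
    total_assets_2013: float
    sales_2012: float
    sales_2013: float
    total_liabilities_2013: float
    fixed_assets_2013: float


def firm_variables(f: FirmAccounts) -> dict:
    """Return the regressors defined in Eqs. (2)-(6) for one firm."""
    return {
        "age": f.age_days_2013 / 3650,                            # Eq. (2), decades
        "ltoas": math.log(f.total_assets_2013),                   # Eq. (3)
        "lsa": math.log(f.sales_2013) - math.log(f.sales_2012),   # Eq. (4)
        "dr": f.total_liabilities_2013 / f.total_assets_2013,     # Eq. (5)
        "far": f.fixed_assets_2013 / f.total_assets_2013,         # Eq. (6)
    }


# Hypothetical firm, monetary values in thousands of euros.
example = FirmAccounts(
    age_days_2013=7300,
    total_assets_2013=1420.0,
    sales_2012=2000.0,
    sales_2013=2100.0,
    total_liabilities_2013=923.0,
    fixed_assets_2013=426.0,
)
variables = firm_variables(example)
```

In this example the firm is two decades old, has a debt ratio of 0.65 and a fixed assets ratio of 0.30.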

3.3 Firm-level supplier and customer densities

In this section, we describe our approach of combining firm-level and input-output data to obtain firm-level estimates of supplier and customer densities.

3.3.1 Sector densities

In order to calculate input supplier and customer densities, we first calculate \(n_{s}\) weighted sector densities \(d_{i}^{s}\) for each firm \(i\) at the firm’s coordinates. \(d_{i}^{s}\) is the density estimate, using all firms from sector \(s\) with \(s=1,\ldots,n_{s}\) (weighted with their amount of sales in 2013), evaluated at the firm’s location \(\{\mathit{longitude}_{i},\mathit{latitude}_{i}\}\).

Following the notation of Wand and Jones (1995, p. 90), the multivariate density estimator for a \(d\)-dimensional random sample \(\boldsymbol{X}_{1},\ldots,\boldsymbol{X}_{n};\boldsymbol{X}_{i}\in\mathbb{R}^{d}\) with density \(f\) can generally be written as

$$\hat{f}(\boldsymbol{x};\boldsymbol{H})=n^{-1}\sum_{i=1}^{n}K_{\boldsymbol{H}}(\boldsymbol{x}-\boldsymbol{X}_{i}),$$
(7)

with

$$K_{\boldsymbol{H}}(\boldsymbol{x})=|\boldsymbol{H}|^{-1/2}K\left(\boldsymbol{H}^{-1/2}\boldsymbol{x}\right),$$
(8)

where \(\boldsymbol{x}\in\mathbb{R}^{d}\), \(\boldsymbol{H}\) is a \(d\times d\) bandwidth matrix (symmetric positive definite) and \(K(\cdot)\) is a \(d\)-variate kernel function with \(\int\!K(\boldsymbol{x})\mathop{}\!\mathrm{d}\boldsymbol{x}=1\). In this paper, the standard normal density,

$$K(\boldsymbol{x})=(2\pi)^{-d/2}\mathrm{e}^{-(1/2)\boldsymbol{x}^{\prime}\boldsymbol{x}},$$
(9)

was used as kernel function. To account for weights, we can modify Eq. (7) towards

$$\hat{f}(\boldsymbol{x};\boldsymbol{H};\boldsymbol{y})=n^{-1}\sum_{i=1}^{n}\omega_{i}K_{\boldsymbol{H}}(\boldsymbol{x}-\boldsymbol{X}_{i}),\quad\text{with}\quad\omega_{i}=\frac{y_{i}n}{\sum_{l=1}^{n}y_{l}},$$
(10)

where \(\boldsymbol{y}\) is a vector with weights (sales).

We use the same bandwidth matrix \(\boldsymbol{H}\) for all twelve sector-specific density estimations to obtain an equal surrounding area. The bandwidth is estimated by a plug-in bandwidth selector, using all firm locations, regardless of the specific sector. Calculating the weighted densities with the fixed bandwidth matrix, we obtain a vector \(\boldsymbol{d}_{i}=(d_{i}^{1},\ldots,d_{i}^{s},\ldots,d_{i}^{n_{s}})^{\prime}\) for each firm \(i\) containing the sectoral densities.
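The weighted estimator of Eq. (10) with the Gaussian kernel of Eq. (9) can be sketched for the bivariate case as follows. For the Gaussian kernel, \(K_{\boldsymbol{H}}(\boldsymbol{u})=(2\pi)^{-1}|\boldsymbol{H}|^{-1/2}\exp(-\tfrac{1}{2}\boldsymbol{u}^{\prime}\boldsymbol{H}^{-1}\boldsymbol{u})\), so only the determinant and inverse of \(\boldsymbol{H}\) are needed. The sketch treats longitude and latitude as plane coordinates; the bandwidth matrix, firm locations, and sales figures are hypothetical, and the plug-in bandwidth selection itself is not implemented.

```python
import math


def gaussian_kernel_h(dx, dy, H):
    """Bivariate Gaussian K_H(u) for u = (dx, dy) and a 2x2 bandwidth matrix H."""
    (h11, h12), (h21, h22) = H
    det = h11 * h22 - h12 * h21
    if det <= 0:
        raise ValueError("H must be positive definite")
    # Quadratic form u' H^{-1} u, using the closed-form inverse of a 2x2 matrix.
    q = (h22 * dx * dx - (h12 + h21) * dx * dy + h11 * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def weighted_density(x, points, weights, H):
    """Weighted kernel density estimate of Eq. (10), evaluated at location x."""
    n = len(points)
    total = sum(weights)
    # omega_i = y_i * n / sum(y): the weights average to one over the sample.
    return sum(
        (y * n / total) * gaussian_kernel_h(x[0] - p[0], x[1] - p[1], H)
        for p, y in zip(points, weights)
    ) / n


# Hypothetical fixed bandwidth matrix, shared by all sectors.
H = [[0.25, 0.0], [0.0, 0.25]]

# Hypothetical firm locations (longitude, latitude) and sales for two sectors.
sectors = {
    "trade": {"points": [(13.40, 52.50), (9.99, 53.55)], "sales": [500.0, 1500.0]},
    "finance": {"points": [(8.68, 50.11)], "sales": [3000.0]},
}

firm_location = (13.40, 52.50)
# Vector of sectoral densities d_i at the firm's location, one entry per sector.
d_i = [
    weighted_density(firm_location, s["points"], s["sales"], H)
    for s in sectors.values()
]
```

Because the weights are normalized to average one, each sectoral density still integrates to one, whatever the sector's total sales.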

3.3.2 Taking into account sectoral supplier and customer interrelations

To construct input supplier and customer densities, we combine input-output data from the German Federal Statistical Office and firm-level data from the Orbis database. Note that we have only locational information on intermediate goods, and therefore, the obtained customer density has to be calculated based on surrounding firms buying intermediate products for their own production process.

Given a symmetric (i.e., with the same sector classification in rows and columns) \(n_{s}\times n_{s}\) advanced input matrix for \(n_{s}\) sectors

$$\boldsymbol{A}=\begin{pmatrix}a_{1,1}&\ldots&a_{1,n_{s}}\\ \vdots&\ddots&\vdots\\ a_{n_{s},1}&\ldots&a_{n_{s},n_{s}}\end{pmatrix},$$
(11)

where \(j\) \((j=1,\ldots,n_{s})\) is an index for rows and \(k\) \((k=1,\ldots,n_{s})\) is an index for columns, \(a_{j,k}\) is the value of goods of sector \(j\), which are used as the input for sector \(k\). Note that all entries of the matrix are measured in Euros, thus allowing for aggregation.

Thus, the \(j\)-th row sum \(\sum_{k=1}^{n_{s}}a_{j,k}\) represents the value of all advanced inputs that sector \(j\) delivers to its “customers”, analogously the column sum \(\sum_{j=1}^{n_{s}}a_{j,k}\) is the value of all advanced inputs that sector \(k\) uses for production.
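The distinction between row sums and column sums can be made concrete with a small sketch. The \(3\times 3\) matrix below is purely hypothetical.

```python
# Hypothetical 3x3 input matrix: A[j][k] is the value of goods of sector j
# that are used as input by sector k.
A = [
    [10.0, 4.0, 6.0],
    [2.0, 8.0, 5.0],
    [3.0, 1.0, 7.0],
]
n_s = len(A)

# Row sums: value of all inputs that sector j delivers to its customers.
deliveries = [sum(A[j][k] for k in range(n_s)) for j in range(n_s)]

# Column sums: value of all inputs that sector k uses for production.
usage = [sum(A[j][k] for j in range(n_s)) for k in range(n_s)]
```

Both vectors add up to the same grand total, since each flow is counted once as a delivery and once as a use.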

To account for the different sector shares in the economy, we define two sector-specific \(n_{s}\times 1\) weight vectors

$$\boldsymbol{S}^{\text{sup}}=\frac{1}{\sum_{j=1}^{n_{s}}\sum_{k=1}^{n_{s}}a_{j,k}}\left(\begin{matrix}\sum_{j=1}^{n_{s}}a_{j,1}\\ \vdots\\ \sum_{j=1}^{n_{s}}a_{j,{n_{s}}} \end{matrix}\right)$$
(12)

and

$$\boldsymbol{S}^{\text{cus}}=\frac{1}{\sum_{j=1}^{n_{s}}\sum_{k=1}^{n_{s}}a_{j,k}}\left(\begin{matrix}\sum_{k=1}^{n_{s}}a_{k,1}\\ \vdots\\ \sum_{k=1}^{n_{s}}a_{k,{n_{s}}} \end{matrix}\right)$$
(13)

where \(\boldsymbol{S}^{\text{sup}}\) is the weight vector for the input supplier density and \(\boldsymbol{S}^{\text{cus}}\) the weight vector for the customer density.

Assume that firm \(i\) at whose location the density is to be estimated belongs to sector \(s\). We use the \(s\)-th column of the input-output matrix to define the weight vector for its sectoral suppliers as

$$\boldsymbol{W}^{\text{sup}}_{s}=\frac{1}{\sum_{j=1}^{n_{s}}a_{j,s}}\begin{pmatrix}a_{1,s}\\ \vdots\\ a_{n_{s},s}\\ \end{pmatrix}$$
(14)

where \(\boldsymbol{W}^{\text{sup}}_{s}\) is the weight vector for the input supplier density for firm \(i\) located in sector \(s\).

Accordingly, we use the \(s\)-th row of the input-output matrix to define the weight vector for its sectoral customers as

$$\boldsymbol{W}^{\text{cus}}_{s}=\frac{1}{\sum_{k=1}^{n_{s}}a_{s,k}}\begin{pmatrix}a_{s,1}\\ \vdots\\ a_{s,n_{s}}\\ \end{pmatrix}$$
(15)

where \(\boldsymbol{W}^{\text{cus}}_{s}\) is the weight vector for customer density. Note that all firms located in the same sector are therefore assumed to have the identical sectoral supplier and customer weighting scheme for their surrounding firms. Note also the two different functions of the weighting schemes \(\boldsymbol{S}^{\text{sup}}\), \(\boldsymbol{S}^{\text{cus}}\) and \(\boldsymbol{W}^{\text{sup}}_{s}\), \(\boldsymbol{W}^{\text{cus}}_{s}\). As all sectoral densities have an identical volume of 1 by definition, \(\boldsymbol{S}^{\text{sup}}\) and \(\boldsymbol{S}^{\text{cus}}\) are used to account for the different shares of the sectors in total supply and demand. \(\boldsymbol{W}^{\text{sup}}_{s}\) and \(\boldsymbol{W}^{\text{cus}}_{s}\) account for the firm-specific supply and customer relations based on sectoral information.

Using the weighted density estimate, we can calculate the input supplier and customer density for a firm \(i\) belonging to sector \(s\) as

$$\begin{aligned}\displaystyle d^{\text{sup}}_{i}=(\boldsymbol{S}^{\text{sup}}\circ\boldsymbol{W}^{\text{sup}}_{s})^{\prime}\> \boldsymbol{d}_{i}\quad\text{and}\quad d^{\text{cus}}_{i}=(\boldsymbol{S}^{\text{cus}}\circ\boldsymbol{W}^{\text{cus}}_{s})^{\prime}\> \boldsymbol{d}_{i},\end{aligned}$$
(16)

where \(\circ\) is the Hadamard product.
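Eqs. (14) to (16) can be sketched for a toy economy with three sectors. The sector share vectors \(\boldsymbol{S}^{\text{sup}}\) and \(\boldsymbol{S}^{\text{cus}}\) of Eqs. (12) and (13) are taken as given inputs here; their values, the matrix, the sectoral densities, and all function names are hypothetical.

```python
def supplier_weights(A, s):
    """Eq. (14): column s of A divided by its column sum."""
    column = [row[s] for row in A]
    total = sum(column)
    return [a / total for a in column]


def customer_weights(A, s):
    """Eq. (15): row s of A divided by its row sum."""
    total = sum(A[s])
    return [a / total for a in A[s]]


def combined_density(S, W, d):
    """Eq. (16): inner product of the Hadamard product of S and W with d."""
    return sum(s_j * w_j * d_j for s_j, w_j, d_j in zip(S, W, d))


# Hypothetical 3x3 input matrix: A[j][k] is the input of sector j used by sector k.
A = [
    [10.0, 4.0, 6.0],
    [2.0, 8.0, 5.0],
    [3.0, 1.0, 7.0],
]

# Hypothetical sector share vectors, assumed to be computed beforehand.
S_sup = [0.40, 0.35, 0.25]
S_cus = [0.30, 0.30, 0.40]

# Hypothetical sectoral densities at the location of firm i, which is in sector 0.
d_i = [2.0, 1.0, 0.5]
s = 0

W_sup = supplier_weights(A, s)   # shares of each sector in the inputs of sector s
W_cus = customer_weights(A, s)   # shares of each sector in the deliveries of sector s

d_sup = combined_density(S_sup, W_sup, d_i)
d_cus = combined_density(S_cus, W_cus, d_i)
```

Each of the two weight vectors sums to one, so sectors that neither supply nor buy from sector \(s\) contribute nothing to the firm's supplier or customer density.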

3.3.3 Regression equations

Since the emergence of agglomeration economies appears to be affected by unique conditions, such as firm age (Mathias et al. 2021), we conjecture that nascent firms may benefit from higher supply and customer densities. Based on the age of the firms, we form an indicator function \(I_{a}\) to group the firms into “old” and “young” ones (1, if a firm belongs to the group of “old” companies, 0 otherwise), using the median age of the firms as threshold.

We estimate the following regression

$$\begin{aligned}[b]\mathit{lgr}_{i}&=\beta_{0}+\sum_{j=1}^{n_{s}-1}\beta_{j}S_{i,j}+\beta_{12}\mathit{age}_{i}+\beta_{13}\mathit{ltoas}_{i}+\beta_{14}\mathit{lsa}_{i}+\beta_{15}\mathit{dr}_{i}+\beta_{16}\mathit{far}_{i}\\&\quad\,+\beta_{17}d^{\text{sup}}_{i}+\beta_{18}I_{a}d^{\text{sup}}_{i}+\beta_{19}d^{\text{cus}}_{i}+\beta_{20}I_{a}d^{\text{cus}}_{i}+\epsilon_{i}.\end{aligned}$$
(17)

\(S_{i,1},\ldots,S_{i,11}\) take the value 1 if \(i\) belongs to sector \(S_{j}\) and 0 otherwise, using sector twelve (here: construction) as the reference sector, \(\epsilon_{i}\) is an independent and identically distributed (i.i.d.) error term and \(i\) is the firm index \((i=1,\ldots,n)\).
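The structure of the regressors in Eq. (17) can be sketched as follows. The sketch builds, for each firm, the row of the design matrix in the order of the coefficients \(\beta_{0},\ldots,\beta_{20}\). It assumes that a firm counts as “old” if its age lies strictly above the median, and it lists the sector names in an arbitrary order with construction as the reference; the firm records and all names are hypothetical.

```python
import statistics

# Eleven sector dummies plus the reference sector (order is an assumption).
SECTORS = [
    "agriculture", "mining", "food", "chemistry", "machinery", "vehicles",
    "processing", "energy", "trade", "finance", "services", "construction",
]
REFERENCE = "construction"


def old_indicator(ages):
    """I_a: 1 if a firm's age is above the sample median, 0 otherwise."""
    median_age = statistics.median(ages)
    return [1 if age > median_age else 0 for age in ages]


def design_row(firm, is_old):
    """Regressors of Eq. (17) for one firm, ordered as beta_0, ..., beta_20."""
    dummies = [1.0 if firm["sector"] == s else 0.0 for s in SECTORS if s != REFERENCE]
    return (
        [1.0]                                                   # intercept
        + dummies                                               # sector dummies
        + [firm["age"], firm["ltoas"], firm["lsa"], firm["dr"], firm["far"]]
        + [firm["d_sup"], is_old * firm["d_sup"]]               # supplier density
        + [firm["d_cus"], is_old * firm["d_cus"]]               # customer density
    )


# Hypothetical firms.
firms = [
    {"sector": "trade", "age": 0.5, "ltoas": 7.2, "lsa": 0.02,
     "dr": 0.65, "far": 0.30, "d_sup": 0.8, "d_cus": 0.4},
    {"sector": "construction", "age": 3.0, "ltoas": 8.1, "lsa": -0.01,
     "dr": 0.70, "far": 0.25, "d_sup": 0.5, "d_cus": 0.9},
    {"sector": "finance", "age": 2.1, "ltoas": 9.0, "lsa": 0.05,
     "dr": 0.60, "far": 0.10, "d_sup": 0.3, "d_cus": 0.2},
]

I_a = old_indicator([f["age"] for f in firms])
X = [design_row(f, old) for f, old in zip(firms, I_a)]
```

With this coding, the coefficient on a density applies to young firms, while the sum of that coefficient and the coefficient on the corresponding interaction term applies to old firms.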

4 Empirical findings

4.1 Descriptive results

For the regression dataset with \(19\,275\) observations, we find (see Table 7) that most firms belong to the trade and finance sectors (30.78% and 21.08%), while the fewest belong to vehicles and mining (0.6% and 0.43%). In terms of the number of employees per sector, finance and trade account for the largest shares (41.39% and 18.53%), and agriculture and mining for the smallest (0.52% and 0.27%).

For the density estimation dataset (\(21\,053\) observations), we show in Figs. 1 and 2 the sales-weighted density estimations for each sector. The darker the area, the higher the estimated weighted density in this region.

Fig. 1 Densities for sectors 1–6. a agriculture, b mining, c food, d chemistry, e machinery, f vehicles

Fig. 2 Densities for sectors 7–12. a processing, b energy, c construction, d trade, e finance, f services

Firms from the agriculture sector generally seem to be located in the new federal states, with some larger hotspots in the old federal states. The sectors of mining, chemistry, machinery, and processing are primarily concentrated in the old federal states. Among the remaining sectors, firms from the food and vehicles sectors show a rather spotty distribution in space, whereas energy, construction, trade, finance and services seem to be more evenly distributed.

Table 1 Descriptive statistics for the variables used in the regressions

In Table 1, descriptive statistics for the regression variables are presented. We find that the distribution of the approximate percentage growth rates for employees \(\mathit{lgr}\) is right-skewed with a mean of 8.23% and a median of 3.53%. The median firm age is about 21 years, whereas the youngest firm in the dataset has existed for less than a year, and the oldest firm for more than 525 years. Firm size varies widely, with total assets ranging from €23k to €20,617,000k and the median firm holding total assets of €1,420k. For approximate growth in sales, we find that half of the firms grew by more than about 1.31%. Mean leverage is about 65% and the mean fixed asset ratio across all sectors is 29.34%, although it varies considerably between sectors.

4.2 Regression results

To empirically determine the effects of supplier and customer relationships, we estimate regression Eq. (17) and show the results in Table 2 (see Footnote 8).

The estimated input supplier and customer densities both have a statistically significant effect on firm growth for younger firms, but with opposite signs. In theory, firms would benefit from a dense supplier and customer environment, which reduces the costs of obtaining inputs from suppliers and of shipping products to customers. For older firms, the effects have the same signs as for younger firms but are much smaller, indicating that younger firms are much more sensitive to agglomeration economies. In general, employment growth for German firms seems to benefit from a high input supplier density, whereas a high customer density seems to hamper growth.

For size (\(\mathit{ltoas}\)) and age, we find a significant negative relationship with firm growth, which is in line with previous literature. While Fuertes-Callen and Cuellar-Fernandez (2019) found a negative effect of sales growth on employment growth for Spanish manufacturing firms, we see a significant positive overall effect for German firms.

Relative to the reference sector construction, firm growth is higher in all other sectors except mining and agriculture; however, only the sector dummy for agriculture is statistically significant.

To measure the effect of the financial structure of a firm on employment growth, we included the debt ratio (\(\mathit{dr}\)) and the fixed asset ratio (\(\mathit{far}\)). For our dataset, both variables have no statistically significant effect on firm growth.

Table 2 Regression results (using sector construction as the reference sector)

To address concerns about potential endogeneity of agglomeration, i.e., that new firms may cluster close to fast-growing, overperforming firms and thereby reverse the causality, and to analyze the robustness of our regression results in this respect, we perform the following analysis.

First, we calculate the supplier and customer densities based on a subsample consisting only of firms aged 20 years or older. Second, the regression (Eq. 17) is now based on the subsample consisting only of firms younger than 20 years. Hence, all firms considered for estimating the densities chose their location prior to the existence of the younger firms used in the regression. The split into younger and older firms is now based on the median age of about 12 years within the subsample of younger firms.

Fig. 3 presents our identification approach using a simple example of five firms. We divided our sample into two groups based on firm age. Supplier and customer densities for firms 4 and 5 (\(d^{\text{sup}}_{4}\), \(d^{\text{sup}}_{5}\), \(d^{\text{cus}}_{4}\), \(d^{\text{cus}}_{5}\)) are then calculated using the subsample of firms aged 20 years or older (firms 1 to 3). This ensures that agglomeration effects affect the firms in the subsample for the regression analysis (firms 4 and 5) in only one direction. Thus, firms 1–3 can affect firms 4 and 5, but there is no effect of firm 4 (a hypothetical firm with above-average performance) on firm 5.

Fig. 3

The concept of the identification strategy using a simple example
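The five-firm example can be sketched in code as follows. This is a simplified illustration under stated assumptions: the ages, coordinates, Gaussian kernel, and bandwidth are hypothetical, and the density is unweighted, whereas the supplier and customer densities in the analysis additionally depend on sectoral interrelations that are omitted here.

```python
import math

# Hypothetical firms: (id, age in years, x, y). Firms 1-3 are old, 4-5 are young.
firms = [
    (1, 45, 0.0, 0.0),
    (2, 30, 1.0, 0.0),
    (3, 20, 8.0, 8.0),
    (4, 5, 0.5, 0.5),
    (5, 12, 7.0, 7.5),
]
AGE_CUTOFF = 20

old = [f for f in firms if f[1] >= AGE_CUTOFF]   # used only to estimate densities
young = [f for f in firms if f[1] < AGE_CUTOFF]  # used only in the regression


def density_from(sources, x, y, bandwidth=2.0):
    """Unweighted Gaussian kernel density at (x, y), built from `sources` only."""
    norm = len(sources) * 2.0 * math.pi * bandwidth ** 2
    kernel_sum = sum(
        math.exp(-0.5 * ((x - sx) ** 2 + (y - sy) ** 2) / bandwidth ** 2)
        for _, _, sx, sy in sources
    )
    return kernel_sum / norm


# Each young firm receives a density to which no young firm contributed,
# so firm 4 cannot influence the density assigned to firm 5.
density_for_young = {fid: density_from(old, x, y) for fid, _, x, y in young}
```

Because the density function only ever sees the older firms, relocating or removing firm 4 leaves the value assigned to firm 5 unchanged, which is the one-directional dependence the identification strategy relies on.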

We present these additional regression results in Table 9 and provide a robustness check in Table 10. The estimated regression coefficients resemble those of the original analysis very closely, as do the significance levels of the hypothesis tests. Because considering only younger firms roughly halves the number of observations, the standard errors and p‑values are slightly larger. Based on these regression findings, we conclude that our original analysis is not biased by potential endogeneity.

5 Conclusion

Since agglomeration economies affect regional growth through firm growth, we provide a firm-level approach to analyze the effects of firm-specific supplier and customer densities. Estimating these densities is enabled by combining sectoral supplier and customer interrelations of input-output tables with firm-level information obtained from balance sheets and profit and loss accounts from the Orbis database.

The use of kernel density estimates allows us to avoid the use of arbitrary spatial boundaries and scales and to use individual firms as the unit of analysis. Analyzing the effects of supplier and customer relations for firms is more appropriate than the traditional analysis based on region-sector combinations.

We observe that regional agglomeration patterns differ substantially between sectors. Our econometric analysis, which also considers firm-specific covariates such as firm size, age, and sales growth, shows that both input supplier and customer densities have a statistically significant effect on firm growth. As the effects differ in size for younger and older firms, firm age seems to be an important determinant when analyzing agglomeration economies. According to theory, a dense supplier and customer environment may be expected to reduce the costs of obtaining inputs and delivering goods to customers. Nevertheless, high regional densities may also entail disadvantages, such as traffic congestion, high competitive pressure, or high rents. Based on our German dataset, we find that a dense supplier environment contributes to firm growth, whereas a dense customer environment seems rather to hamper employment growth. One factor that is difficult to capture in data is that local policies may be designed to incentivize or discourage firms from locating in areas with a high supplier or customer density. Those policies may eventually lead to a regionally suboptimal relationship between suppliers and customers.

As our analysis might be prone to the problem of endogeneity, i.e., high growth may also attract other firms and thus result in higher agglomeration, we provide an additional analysis using only agglomeration information obtained from a subsample before the period of analysis. This analysis confirms the results of our main analysis, indicating the absence of endogeneity bias.

In summary, using the firm-level supplier and customer densities allows for a more profound understanding of the relationship between agglomeration mechanisms and specific firm characteristics. Future research might explore how the impacts of dense supplier and customer environments vary depending on a firm’s stage of development, market power, or size. Additionally, the interaction of agglomeration and firm age might be analyzed more closely in further research.