1 Introduction

Research on the relationship between new firm formation and geographical accessibility to banks has regained popularity in recent years (Backman 2015; Nguyen 2019; Ho and Berggren 2020; Agostino et al. 2021). However, an area that has received considerably less attention is the geographical accessibility to bank branches and its relationship with new firm formation across space in Sweden. Although bank loans are one of the most popular forms of finance for Swedish small and medium-sized enterprises (SMEs), almost half of Sweden’s local bank branches closed between 1990 and 2010, especially in rural areas (Swedish Agency for Economic and Regional Growth 2017; Backman and Wallin 2018).

This paper adds to the limited literature on the geographical accessibility to banks and its relationship with new firm formation. We use multiscale geographically weighted regression (MGWR) to analyse the relationship between the geographical accessibility to the nearest bank branch and new firm formation across all 290 Swedish municipalities in 2013, with 2007 as an alternative time frame for a robustness test. This paper also addresses the issue of endogeneity by using a 2SLS model to instrument the physical accessibility to the nearest bank branch in 2013 with two instrumental variables measured in 2000. The fitted values from the 2SLS model are then used in the MGWR modelling.

Using the MGWR model instead of a global model or a geographically weighted regression (GWR) model allows multiple spatial scales to be applied simultaneously to characterise spatial contexts when local parameter estimates are compared across space. Moreover, MGWR has not yet been applied to model new firm formation determinants. The results of the MGWR models in this paper reveal how the relationship between the proximity to bank branches and new firm formation varies across Swedish municipalities.

We hypothesized that the relationship between new firm formation and the proximity to the nearest bank branch differs across space, with more localised bank influence in rural areas. We also argued that spatial contexts can influence the relationship between the proximity to the nearest bank branch and new firm formation, which causes it to vary across space. The results of this paper show empirical evidence of a negative association between the proximity to the nearest bank branch and new firm formation in all Swedish municipalities, regardless of the location. However, the results also show that the relationship between the geographical distance to bank branches and new firm formation does not vary across space.

This paper is organized as follows. Section 2 presents a review of studies that investigate the relationship between geographical accessibility to bank branches and new firm formation. A discussion of the methodology follows in Section 3. Section 4 presents the data and the variables used in this paper. Section 5 presents the empirical results, namely the results obtained from the 2SLS, GWR and MGWR models. Section 6 concludes.

2 Relationship between geographical accessibility to bank branches and new firm formation

Geographical accessibility to banks at the local level has been argued by some researchers to be less important due to technological developments which reduce the “distance-related diseconomies” (Petersen and Rajan 2002; Berger 2003). Berger (2003) argued that advances in technology help banks to monitor loans to firms at long distances, allowing banks to issue credit at a greater distance. Technological progress also helps banks in carrying out traditional banking services which used to require physical proximity. For example, credit scoring can be used to screen a potential loan applicant without the need for geographical proximity or local knowledge. With the use of technology, bank managers may also find it easier to monitor and control staff located at a long distance.

Several studies have examined the relationship between the geographical accessibility to banks and lending to new firms in a region (Petersen and Rajan 2002; Brevoort et al. 2010; Backman 2015; Nguyen 2019; Kärnä et al. 2021). Petersen and Rajan (2002) examined the distance between small businesses and their bank lenders from 1973 to 1993. They found that the average distance between small firms and their bank lenders had increased and attributed the increase to improvements in bank productivity. Advances in computing and communications have increased the availability and timeliness of hard information, which facilitates impersonal and distant lending. Hence, distant firms do not necessarily need to have the highest credit quality, and even informationally opaque firms that are located far away can secure a loan from the bank (Petersen and Rajan 2002).

However, an updated study by Brevoort et al. (2010) revisits the data source used by Petersen and Rajan (2002), using data from the subsequent decade drawn from the Surveys of Small Business Finances in 1993, 1998 and 2003. The findings by Brevoort et al. (2010) conflict with those of Petersen and Rajan (2002). The results from the updated study show that firms with higher credit quality and more experienced ownership realize larger gains in distance compared with other firms (Brevoort et al. 2010). The updated study also finds that the distance between relatively old firms and their lenders increases more over time than the distance for younger firms, which are more informationally opaque than older firms (Brevoort et al. 2010). Hence, the updated study shows that even as the distance between small firms and their lenders has increased overall, the importance of geographical distance in small business lending should not be underestimated, especially for young firms. A Swedish study also shows that interest rates increase with the distance to the lender while the loan size decreases with the distance (Kärnä et al. 2021).

Thus, some researchers argue that geographical accessibility to banks is still important at the local level to new firm lending and that the advancement in communication technology does not necessarily mean an increased exchange of quality information (Degryse and Ongena 2004). Loan officers have limited incentives to transfer information and even with better communication platforms like emails, they might find it difficult to harden the soft information or transmit the information which is too sensitive to move through the bank branch organisation (Liberti 2011).

Soft information is often harder to communicate to others as it requires multiple face-to-face meetings and trust, which are often developed through a long-term relationship between the banks and the borrowers. It takes time for the loan officer to gather knowledge about the borrower’s personality, the quality of his or her firm’s management and his or her relationships with customers (Uchida et al. 2012). Therefore, proximity between the banks and borrower firms gives the banks an informational advantage by reducing search and monitoring costs (Geanakoplos and Milgrom 1991; Becker and Murphy 1992; Bolton and Dewatripont 1994). Nguyen’s (2019) study also highlights that geographical distance matters not only due to the improvement of accessibility but also in reducing the costs of information transmission.

Despite the advances in communication technology and Internet banking, banks continue to rely on personal relationships with borrower firms as it helps to facilitate monitoring and screening activities. Proximity helps in the forging of a long-term relationship between the lenders and the borrowers, and the closures of bank branches can have a disruptive effect on lender-specific relationships that have already been formed over a period of time (Nguyen 2019). The search for another lender located farther away and the formation of a new relationship with another lender takes a long time and effort. Whether the firm would be able to secure a loan with similar rates with another lender also involves uncertainty as the new lender does not know the borrower as well as the previous lender did. Moreover, soft information on distant firms is hard to obtain unless the loan officers who were previously in charge of the loan, transfer this soft information to the new lender (Sharpe 1990; Rajan 1992). However, previous loan officers have no incentives to do so due to the proprietary nature of soft information (Drexler and Schoar 2014).

Proximity also enables banks to access knowledge about the local market; distance erodes lenders’ ability to acquire valuable proprietary intelligence about potential borrowers in the local market (Berger and Udell 1995; Agarwal and Hauswald 2010). Several studies also emphasize the importance of a well-developed financial system at the local level, which can enhance the ability of a small firm to gain access to external financing (Cole et al. 2004; Arcuri and Levratto 2020). Small firms generally suffer from great difficulty in applying for loans due to the lack of track record (Cole et al. 2004; Arcuri and Levratto 2020).

In general, spatial contexts can influence the importance of geographic distance to bank branches on new firm formation but it is unclear in the literature which spatial scale the relationship between geographic distance to bank branches and new firm formation applies. For instance, the different spatial scales in which the relationship applies can be on a local, regional or global scale.

A limitation of previous studies is the assumption that each new firm formation determinant operates at the same spatial scale, whereas it is more likely that the relationships vary at different scales due to complex social, economic, and demographic factors (Backman 2015; Kärnä et al. 2021). For example, geographical accessibility to banks might be more important for new firm formation in rural areas than in urban areas. One reason for this difference in importance is the existence of alternative external financing in urban regions. The majority of the venture capital market is located in the two largest metropolitan cities in Sweden: Stockholm and Gothenburg (SVCA 2017). New firms in urban regions have access to a wider range of finance such as formal and informal venture capital and business capital, while new firms in rural regions have fewer financing options (Avdeitchikova 2009). Therefore, firms in urban regions should face fewer credit constraints as they have easier access to finance, while firms in remote regions have to rely on a few local banks.

Furthermore, a Swedish study finds that in non-metropolitan areas, soft and hard information are treated as equally important by the loan officer in the credit risk assessment of a firm, while in metropolitan areas the loan officer focuses much more on quantitative information such as the annual financial statement (Silver 2001). This is because, in smaller towns, it is easier to receive and verify soft information from another actor in the social network (Agarwal and Hauswald 2010). Silver (2001) also finds that bank branch managers in non-metropolitan areas make greater use of social network forums to get hold of interesting prospects and to maintain their relationships with their customers. This local context influences the importance of geographical distance to bank branches in the credit risk assessment process, which can affect the chances of a new firm getting a bank loan.

Another factor that contributes to the local context is the industrial composition of the local economy. Capital-intensive sectors such as transport, agriculture and manufacturing have a larger demand for external capital and hence a greater need for bank capital, which makes good accessibility to a bank branch even more important (Landström 2017). Furthermore, the nature of the industry also affects the importance of acquiring soft information from the firm, which in turn is related to the importance of physical accessibility to banks (Silver 2001). In metropolitan areas, where the service industry makes up the bulk of the industry composition, a firm visit by the loan officer usually means less for firms in the knowledge-intensive service industries. Moreover, in smaller communities, the loan officer probably finds it easier to understand the impact of the local environment on a firm due to the prior local knowledge they possess.

The hypothesis resulting from this section is as follows:

H1

The influence of the physical accessibility to bank branches on new firm formation is spatially-varying across different Swedish municipalities, with higher importance in rural municipalities.

3 Methodology

3.1 GWR model

A conventional global regression model assumes constant relationships across a study area and calculates the average relationship between the dependent variable and the independent variables across space. However, many spatial processes vary in reality, which makes the conventional global regression model unfit for estimation (Oshan et al. 2020). The GWR model relaxes the spatial homogeneity assumption by allowing the parameter estimates to vary locally across the spatial units of analysis.

The GWR model is represented in this paper by:

$$y_{r}=\beta _{0r}\left(u_{r},v_{r}\right)+{\sum }_{k=1}^{p}\beta _{kr}\left(u_{r},v_{r}\right)x_{kr}+\epsilon _{r}$$
(1)

where \(y_{r}\) is new firm formation per capita (NFFPC) at municipality \(r\), \((u_{r},v_{r})\) represents the coordinates of municipality \(r\)’s centroid, and \(\beta _{0r}(u_{r},v_{r})\) and \(\beta _{kr}(u_{r},v_{r})\) represent the local intercept and the local coefficient of the \(k\)-th independent variable \(x_{kr}\) in municipality \(r\), respectively. \(\epsilon _{r}\) is the random error at municipality \(r\).

For the calibration of a GWR model, a spatial weighting matrix is calculated with three different elements, namely the weighting matrix, its bandwidth, and the type of distance matrix (Brunsdon et al. 1996). The weighting matrix assumes that areas that are closer to municipality r have more influence on municipality r than areas that are farther away (Chasco et al. 2008). Therefore, more weight is given to observations near municipality r, and less weight is given to observations that are farther away. Two popular choices of weighting matrices are Gaussian and Bi-square kernel functions, both of which are spatial kernel functions (Brunsdon et al. 1998).

The bandwidth is calculated such that a certain proportion of observations that are nearest to municipality r will be included in the local regression for each municipality. An optimal bandwidth parameter is selected by minimizing a corrected Akaike information criterion (AICc), which strikes a balance between model variance and bias.
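For reference, the criterion can be written out as follows. This is the form of the AICc commonly given in the GWR literature; that the estimation software used here applies exactly this expression is an assumption. \(n\) is the number of municipalities, \(\hat{\sigma }\) is the estimated standard deviation of the error term and \(\mathrm{tr}(\mathbf{S})\) is the trace of the hat matrix, which measures the effective number of parameters:

$$\mathrm{AICc}=2n\ln \left(\hat{\sigma }\right)+n\ln \left(2\pi \right)+n\left(\frac{n+\mathrm{tr}\left(\mathbf{S}\right)}{n-2-\mathrm{tr}\left(\mathbf{S}\right)}\right)$$

A smaller bandwidth lowers \(\hat{\sigma }\) (less bias) but raises \(\mathrm{tr}(\mathbf{S})\) (more variance), so the minimum of the AICc marks the trade-off between the two.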

The adaptive Gaussian and Bi-square weighting functions are defined formally as:

Gaussian:

$$w_{rq}=\exp \left(-\frac{1}{2}\left(\frac{\mathrm{DIST}_{rq}}{h_{r}}\right)^{2}\right)$$
(2)

Bi-square:

$$w_{rq}=\begin{cases} \left[1-\left(\mathrm{DIST}_{rq}/h_{r}\right)^{2}\right]^{2}, & \text{if } \mathrm{DIST}_{rq}< h_{r}\\ 0, & \text{otherwise} \end{cases}$$
(3)

where \(w_{rq}\) represents the weight assigned to the data for municipality \(q\) when calibrating the model at municipality \(r\), \(\mathrm{DIST}_{rq}\) is the distance between municipalities \(r\) and \(q\), and \(h_{r}\) is the adaptive bandwidth for municipality \(r\), chosen so that the same proportion of municipalities is included in the local regression for each municipality \(r\). As for the distance matrix, GWR models usually use Euclidean distance in practice to measure spatial proximity.
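The two kernels and their role in a local regression can be illustrated with a minimal Python sketch. It is not the estimation code used in this paper: the coordinates and data are hypothetical, the function names are invented for illustration, the adaptive bandwidth is simplified to the distance to the k-th nearest neighbour, and the local regression contains a single covariate so that weighted least squares has a closed form.

```python
import math

def gaussian_weight(dist, h):
    """Gaussian kernel (Eq. 2): the weight decays smoothly and never reaches zero."""
    return math.exp(-0.5 * (dist / h) ** 2)

def bisquare_weight(dist, h):
    """Bi-square kernel (Eq. 3): the weight is exactly zero at or beyond the bandwidth."""
    if dist < h:
        return (1.0 - (dist / h) ** 2) ** 2
    return 0.0

def adaptive_bandwidth(dists_from_r, k):
    """Distance from r to its k-th nearest neighbour (index 0 is r itself)."""
    return sorted(dists_from_r)[k]

def local_fit(x, y, weights):
    """Weighted least squares of y on an intercept and one covariate x."""
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical toy data: five centroids, one covariate, y = 2 + 3x exactly.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (4.0, 4.0)]
x = [1.0, 2.0, 4.0, 3.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]

r = 0  # calibrate the local model at the first centroid
dists = [math.hypot(u - coords[r][0], v - coords[r][1]) for u, v in coords]
h_r = adaptive_bandwidth(dists, k=3)
w_gauss = [gaussian_weight(d, h_r) for d in dists]
w_bisq = [bisquare_weight(d, h_r) for d in dists]
intercept, slope = local_fit(x, y, w_gauss)
```

Repeating the last block for every municipality r yields one set of local coefficients per location. The Bi-square kernel discards observations at or beyond the bandwidth entirely, whereas the Gaussian kernel keeps every observation with a small positive weight.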

3.2 Multiscale geographically weighted regression model (MGWR)

However, a drawback of the GWR is that the same spatial scale is applied to all the local relationships within the model, which means that the same bandwidth is used (Oshan et al. 2020). A fixed spatial scale is not valid in the case where the phenomena involve multiple spatial processes with distinct spatial scales (Mansour et al. 2021).

MGWR extends the GWR model by allowing each local relationship in the model to vary at a unique spatial scale and is calculated as follows:

$$y_{r}={\sum }_{j=1}^{m}\beta _{\mathrm{b_{wj}}}\left(u_{r},v_{r}\right)x_{rj}+\epsilon _{r}$$
(4)

where \(\mathrm{b_{wj}}\) represents the bandwidth used for the \(j\)-th spatial process being modelled and is calibrated by using a back-fitting algorithm (Fotheringham et al. 2017). In comparison to GWR, MGWR has much less restrictive assumptions as the relationship between the response variable and a covariate is allowed to vary locally, vary regionally, or not vary at all (Oshan et al. 2019). Allowing each relationship in the model to vary at a unique spatial scale limits over-fitting, decreases the bias in the parameter estimates and mitigates collinearity issues (Oshan et al. 2019). Therefore, MGWR has been recommended by many scholars to be used in place of GWR in investigating process spatial heterogeneity and scale (Fotheringham et al. 2017; Oshan et al. 2019).
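The idea behind back-fitting can be sketched in a few lines of Python. The sketch below is a simplification and not the algorithm of Fotheringham et al. (2017) in full: the bandwidth of each term is taken as given rather than selected by AICc within each sweep, a fixed Gaussian kernel is used, and the data, coordinates and function names are hypothetical. Each term is updated in turn by a local regression of the partial residual (the response minus the fitted contribution of all other terms) on that term alone, using that term's own bandwidth.

```python
import math

def gaussian_weights(coords, r, h):
    """Gaussian weights of all locations relative to location r for bandwidth h."""
    ur, vr = coords[r]
    return [math.exp(-0.5 * (math.hypot(u - ur, v - vr) / h) ** 2) for u, v in coords]

def smooth_term(xj, partial_resid, coords, h):
    """Local coefficient of one term at every location, using that term's bandwidth."""
    betas = []
    for r in range(len(coords)):
        w = gaussian_weights(coords, r, h)
        num = sum(wq * xq * eq for wq, xq, eq in zip(w, xj, partial_resid))
        den = sum(wq * xq * xq for wq, xq in zip(w, xj))
        betas.append(num / den)
    return betas

def backfit(columns, y, coords, bandwidths, n_sweeps=50, init=None):
    """Cycle over the terms, refitting each one on its partial residual."""
    n, m = len(y), len(columns)
    if init is not None:
        betas = [list(b) for b in init]
    else:
        betas = [[0.0] * n for _ in range(m)]
    for _ in range(n_sweeps):
        for j in range(m):
            # Response minus the fitted contribution of every term except j.
            resid = [
                y[r] - sum(betas[l][r] * columns[l][r] for l in range(m) if l != j)
                for r in range(n)
            ]
            betas[j] = smooth_term(columns[j], resid, coords, bandwidths[j])
    return betas

# Hypothetical toy data: an intercept term and one covariate, y = 2 + 3x exactly.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (0.0, 2.0), (3.0, 3.0), (1.0, 3.0)]
x = [-2.0, 1.0, -1.0, 2.0, 0.5, -0.5]
columns = [[1.0] * len(x), x]
y = [2.0 + 3.0 * xi for xi in x]

# A wide bandwidth for the intercept and a narrow one for the slope.
local_betas = backfit(columns, y, coords, bandwidths=[5.0, 1.5])
```

A term with a very wide bandwidth behaves like a global coefficient, while a term with a narrow bandwidth is allowed to vary locally, which is the sense in which each relationship operates at its own spatial scale.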

4 Empirical data

4.1 Spatial unit of analysis

Sweden is divided into 290 municipalities, 5984 demographical areas, and 227,235 included areas. The included areas are represented by square areas of 250 m by 250 m in urban areas and 1000 m by 1000 m in rural areas as rural areas are usually more sparsely distributed. Fig. 1 shows the representation of a Swedish municipality—Stockholm in different spatial units of reference. The empty spaces as shown in Fig. 1 are uninhabited areas such as forests, nature reserves, and lakes.

Fig. 1 Map of Stockholm, represented in different spatial units of reference

The municipal level is chosen as the spatial unit of analysis in this study as municipalities are the smallest local authorities in Sweden, where policies are being carried out. Hence, the municipal level is considered to be localised and there can be significant differences across municipalities in the same region (Eliasson 2016). A larger spatial unit of reference relative to the municipal level like the functional region carries a larger risk of aggregation error. Furthermore, Backman (2015) found no significant relationship between the number of bank branches per capita and new firm formation on the functional regional level.

Smaller spatial units of reference like the demographical area and included areas can help to capture a higher degree of localness as compared to the municipal level. However, it is harder to obtain the data on a finer spatial level; some data are protected by personal data protection laws due to confidentiality issues, making it not possible to compare data sources from different units of reference (Eurostat 2021). Moreover, the primary data at the granular level consists of sparse counts of new firm formation and other variables, which makes it impractical to study small-scale geographical variation.

Thus, we utilize municipal level data in this study. The accessibility to bank branches is calculated by making use of the small included areas contained within the municipalities. In this way, we account for the spatial distribution of the population inside the municipality and minimise aggregation error when calculating the geographical proximity to bank branches on the municipal level.

4.2 Variables

The locations of all the local bank branch offices are captured in the form of geographical coordinates with data gathered from Argomento GIS & IT. Data on new firm formation are gathered from the Swedish Agency for Growth Policy Analysis, which do not include existing firms that are reorganised. Data related to the population are acquired from Statistics Sweden and they include information on the human capital level, changes in housing prices and the proportion of immigrants on the municipal level. From the Swedish Public Employment Service’s website, unemployment data are also obtained. Information about the firms’ perception of public attitudes towards entrepreneurship in each municipality is also obtained from an annual survey conducted by the Federation of Swedish Enterprise.

Table 1 summarises the description and expected signs of all the variables used in the models.

Table 1 Description of variables

4.2.1 Dependent variable: New firm formation per capita

NFFPC is the number of new firms formed per capita and can be calculated using two different approaches. Under the labour market approach, NFFPCLabour is the number of new firms formed per 1000 people in the labour population, since firms are usually started by people from the labour population and in areas close to where they live (Audretsch and Fritsch 1994a). The age group between 16 and 64 years old is used as a proxy for the labour population. Fig. 2 shows NFFPCLabour in 2013. The map shows that NFFPCLabour is higher in metropolitan municipalities and in municipalities that are popular for tourism.

Fig. 2 NFFPCLabour in 2013 (“Natural breaks” classification)

An alternative approach to calculating NFFPC is the ecological approach, which standardises the number of new firms by the number of existing firms (NFFPCEco) (Audretsch and Fritsch 1994b). However, the ecological approach has been criticised by several researchers as it yields higher values in areas with a larger mean firm size, where the number of new firms is standardised by a smaller number of larger firms (Garofoli 1994). As a robustness test, the analysis is also conducted using NFFPCEco as the dependent variable.
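The difference between the two approaches, and the criticism of the ecological approach, can be made concrete with a small Python sketch. The two municipalities and all figures are hypothetical, and the scaling of NFFPCEco (new firms per existing firm, without a multiplier) is an assumption, since only the labour market measure is stated per 1000 people.

```python
def nffpc_labour(new_firms, labour_population):
    """New firms per 1000 inhabitants aged 16 to 64 (labour market approach)."""
    return 1000.0 * new_firms / labour_population

def nffpc_eco(new_firms, existing_firms):
    """New firms per existing firm (ecological approach, scaling assumed)."""
    return new_firms / existing_firms

# Two hypothetical municipalities with identical entry and labour population
# but different mean firm size (20,000 workers spread over 2000 or 500 firms).
small_firms = {"new": 200, "labour": 20000, "existing": 2000}
large_firms = {"new": 200, "labour": 20000, "existing": 500}

labour_a = nffpc_labour(small_firms["new"], small_firms["labour"])
labour_b = nffpc_labour(large_firms["new"], large_firms["labour"])
eco_a = nffpc_eco(small_firms["new"], small_firms["existing"])
eco_b = nffpc_eco(large_firms["new"], large_firms["existing"])
```

Both municipalities have the same NFFPCLabour, but the one dominated by larger firms has an NFFPCEco four times as high, purely because its denominator is smaller.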

4.2.2 Independent variable: Weighted mean distance to the nearest bank branch

To measure the accessibility to bank branches, the following formula is used to calculate the weighted mean distance to the nearest bank branch for each municipality y:

$$\text{WeightedDist}_{y}=\left(\frac{\sum \left(D_{xy}\cdot P_{x}\right)}{\sum P_{xy}}\right),$$
(5)

where \(D_{xy}\) is the driving distance to the first nearest bank branch from each included area \(x\) in municipality \(y\), \(P_{x}\) represents the population in the included area \(x\), and \(\sum P_{xy}\) represents the total population in all the included areas \(x\) in municipality \(y\). Street network analysis is used to identify the shortest driving route from each included area \(x\) to its first nearest bank branch. By weighting the distance to the first nearest bank branch by the population, the distribution of the population within the municipality is taken into account. Moreover, this measure assumes that the larger the population, the higher the number of potential new firms that would apply for a bank loan. Hence, the relative location of the population and bank branches determines the pattern of spatial accessibility to the bank branches.
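A minimal Python sketch of Eq. 5 for a single municipality is given below. The three included areas, their driving distances and their populations are hypothetical; in the paper the distances come from street network analysis, which is not reproduced here.

```python
def weighted_mean_distance(distances, weights):
    """Eq. 5: mean distance to the nearest branch, weighted by e.g. population."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(d * w for d, w in zip(distances, weights)) / total

# Hypothetical municipality: a dense centre close to a branch and two
# sparsely populated outlying areas (distances in metres).
driving_distance_m = [500.0, 2000.0, 12000.0]
population = [800, 150, 50]

weighted = weighted_mean_distance(driving_distance_m, population)
unweighted = sum(driving_distance_m) / len(driving_distance_m)
```

Because most residents live close to the branch, the population-weighted distance (1300 m) is far below the unweighted mean of the three areas (about 4833 m). Passing the number of existing firms instead of the population as `weights` gives the firm-weighted variant of the measure.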

Similar to the labour market and ecological approaches in calculating NFFPC, the labour market population and the population of existing firms are considered as alternative weights in the calculation of WeightedDist in Eq. 5. The labour population is considered because bank loans are often applied for by individuals from the labour market in the area where they live, since they are the most likely to start a new firm.

The existing firm population is considered as the establishment of new firms might not necessarily follow the residential pattern of the labour force population but might instead follow the pattern of the locations with high firm density. A higher firm density in the region means that there is a potentially favourable environment for setting up a business, which encourages potential entrepreneurs to set up their business in the same region. Hence, there could be a higher demand for bank loans by these entrepreneurs. A high firm density can also be advantageous for new firm formation due to a higher probability of a match between firms, labour, suppliers and customers, which lowers transaction costs (Krugman 33,32,b, a). Furthermore, banks are often profit-driven and they may locate more branches in locations of high firm density (Okeahalam 2009). The weighted mean distance to the nearest bank branch weighted by the number of existing firms is included in the robustness test.

Fig. 3 shows the weighted mean distance to the nearest bank branch weighted by labour population (WeightedDistLabour) while Fig. 6 in the appendix shows the weighted mean distance to the nearest bank branch weighted by the number of existing firms (WeightedDistFirms). In both Figs. 3 and 6, the weighted mean distance to the nearest bank branch is generally higher in northern Sweden than in southern Sweden as bank branches in the north are more sparsely distributed.

Fig. 3 WeightedDistLabour in 2013 in m (“Natural breaks” classification)

4.2.3 Reverse causality problem

However, we faced the problem of reverse causality as it was not possible to establish the direction of causality between the accessibility to the nearest bank branch variable and new firm formation. If the accessibility to the nearest bank branch is endogenous due to reverse causality, the estimate of the effect of the accessibility to the nearest bank branch on new firm formation will be biased and inconsistent. To address this endogeneity concern, an instrumental variable approach is adopted in this paper by manually estimating the MGWR model in the same way a global two-stage least squares (2SLS) is performed.

In the first stage, the accessibility to the nearest bank branch, \(\text{WeightedDistLabour}_{r}\), is regressed on two instrumental variables \(z_{1}\) and \(z_{2}\) and on the other exogenous independent variables \(x_{rk}\) (\(k=3,\dots ,p+2\)):

$$\widehat{\text{WeightedDistLabour}_{r}}=\gamma _{0}+\gamma _{1}z_{1}+\gamma _{2}z_{2}+{\sum }_{k=3}^{p+2}\gamma _{k}\left(u_{r},v_{r}\right)x_{rk}$$
(6)

where \(\gamma _{k}(u_{r},v_{r})\) is the coefficient of the exogenous independent variable \(x_{rk}\) (\(k=3,\dots ,p+2\)) in municipality \(r\).

Second stage of 2SLS: NFFPCLabour is regressed on the fitted values of \(\text{WeightedDistLabour}_{r}\):

$$y_{r}=\beta _{0}\left(u_{r},v_{r}\right)+\beta _{1}\left(u_{r},v_{r}\right)\widehat{\text{WeightedDistLabour}_{r}}+{\sum }_{k=1}^{p}\beta _{k}\left(u_{r},v_{r}\right)\overline{x_{rk}}+v_{r}$$
(7)

where \(y_{r}\) is the NFFPCLabour and the final term \(v_{r}\) is a composite error term that is uncorrelated with \(\widehat{\text{WeightedDistLabour}_{r}}\).
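The plug-in logic of the two stages can be illustrated with a short Python sketch. It is a deliberately simplified, global version: both stages are ordinary least squares rather than MGWR, the data are hypothetical and constructed so that the unobserved confounder `u` is exactly orthogonal to the instruments, and the helper names are invented for illustration. It shows only how the fitted values from the first stage replace the endogenous variable in the second stage.

```python
def ols(columns, y):
    """OLS coefficients via the normal equations (Gauss-Jordan elimination)."""
    m = len(columns)
    a = [
        [sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(m)]
        + [sum(ci * yi for ci, yi in zip(columns[i], y))]
        for i in range(m)
    ]
    for i in range(m):
        p = max(range(i, m), key=lambda row: abs(a[row][i]))  # partial pivoting
        a[i], a[p] = a[p], a[i]
        for row in range(m):
            if row != i:
                f = a[row][i] / a[i][i]
                a[row] = [ar - f * ai for ar, ai in zip(a[row], a[i])]
    return [a[i][m] / a[i][i] for i in range(m)]

def fitted(columns, coefs):
    """Fitted values implied by the columns and their coefficients."""
    return [sum(b * c[r] for b, c in zip(coefs, columns)) for r in range(len(columns[0]))]

def manual_2sls(y, endog, instruments, exog):
    """Stage 1: endog on instruments and exog. Stage 2: y on fitted endog and exog."""
    ones = [1.0] * len(y)
    first_stage = [ones] + instruments + exog
    endog_hat = fitted(first_stage, ols(first_stage, endog))
    return ols([ones, endog_hat] + exog, y)

# Hypothetical data: z1, z2, x and the unobserved confounder u are orthogonal.
z1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
z2 = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
x = [a * b for a, b in zip(z1, z2)]
u = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
endog = [1.0 + 2.0 * a + 0.5 * b + c + d for a, b, c, d in zip(z1, z2, x, u)]
y = [4.0 - 1.5 * e + 0.7 * c + 3.0 * d for e, c, d in zip(endog, x, u)]

beta_2sls = manual_2sls(y, endog, [z1, z2], [x])       # intercept, endog, x
beta_ols = ols([[1.0] * len(y), endog, x], y)          # ignores endogeneity
```

In this constructed example the second-stage coefficient on the instrumented variable recovers the value of -1.5 used to generate the data, whereas the naive regression that ignores `u` is biased towards zero. The standard errors from such a plug-in second stage are not correct, which is the limitation discussed in the text.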

The first instrumental variable of the WeightedDistLabour in 2013 is a temporal lag of the WeightedDistLabour variable in 2000 (WeightedDistLabour_2000). The second instrumental variable measures the weighted mean distance to the nearest savings bank weighted by the labour population in 2000 (WeightedDistSavingsLabour_2000). As pointed out by Backman (2015), the historical localisation of savings banks provides insight into the present localisation of primary savings banks and other banks. There is a high bivariate correlation of 0.401 between the WeightedDistSavingsLabour_2000 and WeightedDistLabour in 2013. The original purpose of establishing the savings banks historically was to encourage people to save up for their retirement and to lower the cost of alms-houses, which did not follow the strict logic of profit (Backman 2015). Furthermore, the historical localisation of savings banks is unrelated to the present NFFPCLabour in 2013, which is supported by its low bivariate correlation of −0.054.

The manual estimation of Eqs. 6 and 7 is not ideal as the standard errors of the locally varying parameter estimates that are obtained from a manual second stage of the 2SLS MGWR model are likely to be incorrect. However, in the current MGWR framework, there is no known solution that can deal with both endogeneity and the incorrect standard errors of a manual 2SLS simultaneously (Bilgel 2020). Bilgel (2020) weighs three possible options. One option is to ignore endogeneity by estimating a conventional MGWR model as in Eq. 4 (i.e., locally varying but biased and inconsistent parameter estimates if the accessibility to the nearest bank branch is truly endogenous). Another option is to estimate a manual 2SLS MGWR model as in Eqs. 6 and 7 (i.e., locally varying and unbiased but imprecise parameter estimates due to incorrect standard errors). The third option is to estimate a global 2SLS model (i.e., unbiased but erroneously fixed and monotonic estimates with correct standard errors). Since the objective of this paper is to estimate the effect of the accessibility to bank branches on new firm formation for each municipality, we proceed with a manual 2SLS MGWR model and accept the imprecision caused by the potentially incorrect standard errors. Since the MGWR is linear, the manual estimation by plugging in the fitted values from the first stage leads to a consistent estimator.

The first-stage results of the 2SLS model for WeightedDistLabour are shown in Table 5. Both instruments, WeightedDistLabour_2000 and WeightedDistSavingsLabour_2000, fulfil the relevance condition because they are correlated with the endogenous variable, which is verified by significant estimates in the first-stage equation, high pairwise correlations, and a high F‑value (above 10), leading to a rejection of the null hypothesis of weak instruments. The validity condition implies that the instruments are not correlated with the error term. Similar instruments have been utilised in previous empirical work (Guiso et al. 2004; Alessandrini et al. 2010; Backman 2015).

The Sargan test of overidentifying restrictions is also conducted to test instrument exogeneity, since there are more instrumental variables than endogenous variables. Rejecting the null hypothesis would mean that at least one of our instruments is invalid. The null hypothesis of the Sargan test is not rejected as the p-value of this test is above 5 per cent for the estimated model. Using a Hausman test, we reject the null hypothesis that there is no systematic difference between the coefficients in the OLS and 2SLS models at the 5% level, supporting the use of the 2SLS model.
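For reference, the standard textbook form of the Sargan statistic is sketched below; that the software used here computes it in exactly this way is an assumption. Let \(\hat{u}_{r}\) denote the 2SLS residuals (computed with the observed rather than the fitted endogenous variable) and let \(R_{\hat{u}}^{2}\) be the R‑squared from regressing \(\hat{u}_{r}\) on both instruments and all exogenous variables. With \(n\) observations, two instruments and one endogenous variable:

$$S=nR_{\hat{u}}^{2}\sim \chi _{2-1}^{2}=\chi _{1}^{2}\quad \text{under the null hypothesis that all instruments are exogenous}$$

With one degree of freedom, the null hypothesis is not rejected at the 5% level when \(S\) is below the critical value of 3.84.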

4.2.4 Control variables: other determinants of new firm formation

Several additional explanatory variables were also obtained and included in the model explaining new firm formation as they were shown to determine regional differences in NFFPC in previous studies. These include firm density, establishment size, human capital level, unemployment rate, industry diversity index, industry specialization index, the percentage of immigrants, change in housing prices, income growth and entrepreneurial attitudes. The formulas for the calculation of the industry diversity index, industry specialisation index and income growth rate are found in the appendix.

5 Estimation procedure and empirical results

Log transformations are applied to all the variables to make their distributions closer to normal, except for the change in housing prices as it contains negative values. The dependent variable used in the models in this section is NFFPCLabour. The independent variables used are the WeightedDistLabour, firm density (FirmDensity), establishment size (EstSize), human capital (HumanCap), unemployment rate (UnempRate), regional industry diversity index (TheilIndex), industry specialisation index (SpecIndex), the share of immigrants in the municipality (ImmigrantsShare), change in housing price (ChangeHP), income growth (IncomeGrowth) and entrepreneurial attitudes (EntrepreneurialAttitudes). All the variables are measured in 2013.

5.1 Findings

Results from the second stage of the 2SLS model in Table 2 are first reported to provide context for the GWR and MGWR results. The independent variables do not pose a multicollinearity problem, as their global variance inflation factors (VIFs) when evaluated against one another are all less than 5 (O’Brien 2007). Overall, 8 out of 11 variables have significant relationships with NFFPCLabour with p-values less than 0.1 in Table 2. Table 2 also shows that WeightedDistLabour has a significant negative relationship with NFFPCLabour.
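The VIF threshold can be read through the standard definition of the statistic, where \(R_{j}^{2}\) is the R‑squared from regressing independent variable \(j\) on all the other independent variables:

$$\mathrm{VIF}_{j}=\frac{1}{1-R_{j}^{2}},\qquad \mathrm{VIF}_{j}<5\iff R_{j}^{2}<0.8$$

A VIF below 5 therefore means that less than 80% of the variation in each independent variable is explained by the remaining independent variables.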

Table 2 Second stage of 2SLS model

The global 2SLS model has a moderately low adjusted R‑squared of 0.468, which indicates that about 53% of the variation in NFFPCLabour across Swedish municipalities cannot be explained by the variables chosen in this study. One factor that may account for part of the unexplained variation in NFFPCLabour is spatial non-stationarity in the relationships between the dependent and the independent variables. As the global 2SLS model assumes that these relationships do not vary across space, it may not be sufficient to describe the underlying relationships (Mansour et al. 2021).

To explore the local spatial variation in the relationship between WeightedDistLabour and NFFPCLabour, GWR and MGWR models are estimated. The results of the GWR and MGWR models are shown in Tables 3 and 4 respectively. For brevity, the results are summarised using the minimum, median and maximum values of the parameter estimates for each variable. The GWR models using the Gaussian and Euclidean weighting matrices have slightly improved adjusted R‑squared values of 0.496 and 0.559 respectively, with an optimal bandwidth of 192 nearest neighbours.
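To make the role of the nearest-neighbour bandwidth concrete, the following is a minimal sketch of a GWR fit with an adaptive kernel. It is not the software used in the paper. The function names, the synthetic coordinates and the data-generating process are assumptions, and the bandwidth convention (counting the focal location as its own first neighbour) is one of several in use. At each location, a weighted least squares regression is estimated, with weights that decay with Euclidean distance.

```python
import numpy as np

def adaptive_weights(coords, i, k, kernel="gaussian"):
    """Kernel weights for location i. The bandwidth is the distance to the
    k-th nearest neighbour, counting location i itself (requires k >= 2)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    bw = np.sort(d)[k - 1]
    if kernel == "gaussian":
        return np.exp(-0.5 * (d / bw) ** 2)
    if kernel == "bisquare":
        return np.where(d < bw, (1.0 - (d / bw) ** 2) ** 2, 0.0)
    raise ValueError(f"unknown kernel: {kernel}")

def gwr_local_coefficients(coords, X, y, k, kernel="gaussian"):
    """Weighted least squares at every location; returns an (n, p) array."""
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        w = adaptive_weights(coords, i, k, kernel)
        XtW = X.T * w                       # scales each observation by its weight
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Synthetic example with a slope that drifts from west to east.
rng = np.random.default_rng(2)
n = 290
coords = rng.uniform(size=(n, 2))
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
slope = -0.2 + 0.3 * coords[:, 0]
y = 1.0 + slope * x + 0.1 * rng.normal(size=n)

betas = gwr_local_coefficients(coords, X, y, k=192)
summary = np.percentile(betas[:, 1], [0, 50, 100])   # min, median, max
```

The last line mirrors the way the local estimates are summarised in this section, by their minimum, median and maximum. Note that a single bandwidth k is shared by all columns of X, which is the restriction that MGWR relaxes.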

Table 3 GWR Results
Table 4 MGWR Results

The calibration of an MGWR model creates a set of optimal bandwidths which describe the scale of each spatially varying process in the model. In contrast to the single bandwidth of 192 nearest neighbours in the GWR models, MGWR yields a separate bandwidth for each explanatory variable; these are listed in Table 4. The MGWR models using the Gaussian and Euclidean weighting matrices show a much larger improvement, with adjusted R‑squared values of 0.723 and 0.683 respectively.
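The idea of covariate-specific bandwidths can be sketched with a simplified backfitting loop. This is an illustrative sketch only: the bandwidths are fixed by hand rather than selected by minimising a criterion such as AICc, the function names and synthetic data are assumptions, and it is not the calibration routine used for Table 4. Each term is updated in turn by a univariate weighted regression of the partial residual on its own variable, using that variable's bandwidth.

```python
import numpy as np

def gaussian_weights(coords, k):
    """(n, n) adaptive Gaussian weights. Row i uses the distance to the k-th
    nearest neighbour of location i (counting i itself, k >= 2) as bandwidth."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    bw = np.sort(d, axis=1)[:, k - 1][:, None]
    return np.exp(-0.5 * (d / bw) ** 2)

def mgwr_backfit(coords, X, y, bandwidths, n_iter=50, tol=1e-10):
    """Backfitting with one fixed bandwidth per column of X."""
    n, p = X.shape
    W = [gaussian_weights(coords, k) for k in bandwidths]
    betas = np.zeros((n, p))
    for _ in range(n_iter):
        previous = betas.copy()
        for j in range(p):
            # Partial residual: what is left for term j to explain.
            fitted_others = np.sum(np.delete(X * betas, j, axis=1), axis=1)
            partial = y - fitted_others
            xj = X[:, j]
            # Univariate weighted regression of the partial residual on x_j,
            # evaluated at every location with the bandwidth of term j.
            betas[:, j] = (W[j] @ (xj * partial)) / (W[j] @ (xj ** 2))
        if np.max(np.abs(betas - previous)) < tol:
            break
    return betas

# Synthetic example: a locally varying intercept and a constant slope.
rng = np.random.default_rng(3)
n = 290
coords = rng.uniform(size=(n, 2))
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
intercept = 1.0 + coords[:, 1]
y = intercept - 0.1 * x + 0.1 * rng.normal(size=n)

# A small bandwidth for the intercept and a near-global one for the slope.
betas = mgwr_backfit(coords, X, y, bandwidths=[38, 288])
intercept_range = betas[:, 0].max() - betas[:, 0].min()
slope_range = betas[:, 1].max() - betas[:, 1].min()
```

With a bandwidth close to the number of observations, the slope estimates are almost constant across locations, whereas the intercept, fitted with a small bandwidth, is free to vary locally.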

For the two MGWR models, which are both based on a Euclidean distance matrix, the Gaussian and bi-square weighting functions give the same set of bandwidths for each process in the model. Although the parameter estimates from models 3 and 4 in Table 4 are the same in terms of magnitudes and signs, model 3, which is based on a Gaussian weighting function, has the lower AICc of the two models and therefore the better fit. The subsequent discussion of the results is thus devoted to the MGWR model based on a Gaussian weighting function.

5.1.1 Local effects of WeightedDistLabour on NFFPCLabour

Fig. 4 maps the GWR and MGWR coefficients for the WeightedDistLabour variable, with the parameter estimate surface for GWR on the left and that for MGWR on the right. The parameter estimates in the grey shaded regions on the maps are not statistically different from zero, as their confidence intervals overlap with zero. The GWR and MGWR models exhibit similar patterns for WeightedDistLabour in explaining the spatial distribution of NFFPCLabour at the municipal level, as the coefficient estimates of WeightedDistLabour do not vary much across the study area. All statistically significant parameter estimates of WeightedDistLabour on NFFPCLabour are of the same sign, which indicates spatial monotonicity. The negative sign indicates that a longer distance to the nearest bank branch is associated with lower new firm formation in municipalities where the effect of WeightedDistLabour on NFFPCLabour is statistically significant.
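The rule used to grey out municipalities can be illustrated with a short sketch. The estimates and standard errors below are hypothetical, and the critical value of 1.96 (a 95% normal interval without any correction for multiple testing) is an assumption, since the confidence level is not stated here.

```python
import numpy as np

def significant_mask(estimates, std_errors, critical_value=1.96):
    """True where the confidence interval of a local estimate excludes zero."""
    estimates = np.asarray(estimates, dtype=float)
    half_width = critical_value * np.asarray(std_errors, dtype=float)
    lower, upper = estimates - half_width, estimates + half_width
    # The interval excludes zero when both bounds lie on the same side of it.
    return (lower > 0) | (upper < 0)

# Hypothetical local estimates for three municipalities.
est = np.array([-0.15, -0.02, -0.12])
se = np.array([0.05, 0.04, 0.07])
mask = significant_mask(est, se)

# Municipalities where mask is False would be shaded grey on the map.
same_sign = bool(np.all(est[mask] < 0))   # all significant estimates negative
```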

Fig. 4 Maps of GWR (a) and MGWR (b) parameter estimate surfaces for WeightedDistLabour, which tend to show global patterns of spatial heterogeneity. Grey regions are not statistically different from zero

5.1.2 Local effects of other variables on NFFPCLabour

Based on the results of the MGWR model with a Gaussian weighting function, six relationships occur at an effectively global scale (WeightedDistLabour, EstSize, HumanCap, SpecIndex, IncomeGrowth, and EntrepreneurialAttitudes), with large bandwidths indicating that almost all the municipalities are included in each local subset. Five processes seem to occur at a regional scale (FirmDensity, UnempRate, TheilIndex, ImmigrantsShare, and ChangeHP), with relatively smaller bandwidths of more than 50 nearest neighbours. Only one process, the intercept, varies locally, with a relatively small bandwidth of 38 nearest neighbours.

5.1.3 Model diagnostics

Overall, MGWR provides a better model fit than GWR, with lower AIC and AICc values and a much higher R‑squared. MGWR is also less prone to multicollinearity problems: its local condition numbers are lower than those of GWR and are all well below the rule-of-thumb threshold of 30, as shown in Fig. 5 (Wheeler and Tiefelsdorf 2005; Wheeler 2007).
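A local condition number can be computed along the following lines. This is a sketch of one common formulation, in which the columns of the kernel-weighted design matrix are scaled to unit length before the singular value decomposition; the function name, the uniform weights and the synthetic variables are assumptions, and the exact scaling used to produce Fig. 5 may differ.

```python
import numpy as np

def local_condition_number(X, weights):
    """Condition number of the kernel-weighted design matrix at one location.

    Columns are scaled to unit length first, so the result does not depend
    on the units in which the variables are measured.
    """
    Xw = X * np.sqrt(weights)[:, None]
    Xw = Xw / np.linalg.norm(Xw, axis=0)
    s = np.linalg.svd(Xw, compute_uv=False)   # singular values, descending
    return s[0] / s[-1]

# Synthetic example: c is almost an exact copy of a, b is unrelated.
rng = np.random.default_rng(4)
n = 290
a, b = rng.normal(size=n), rng.normal(size=n)
c = a + 0.01 * rng.normal(size=n)
w = np.ones(n)                                # uniform weights for simplicity

cn_ok = local_condition_number(np.column_stack([np.ones(n), a, b]), w)
cn_bad = local_condition_number(np.column_stack([np.ones(n), a, c]), w)
```

In practice the weights would be the kernel weights of the focal municipality, so that one condition number is obtained per location and can be mapped against the threshold of 30.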

Fig. 5 Maps of local condition numbers for GWR (a) and MGWR (b)

5.2 Robustness check

Three different robustness tests are conducted. The first uses data from 2007, the second uses the WeightedDistFirms variable along with the other data from 2013, and the third uses an ecological approach to calculate NFFPC in 2013 as the dependent variable, along with the other data from 2013. All three robustness tests are conducted with a Gaussian weighting function and a Euclidean distance matrix. The results of the MGWR models for all three robustness tests are reported in Table 6. The results are mapped in Fig. 7, with the parameter estimate surfaces for GWR on the left and those for MGWR on the right.

For all three robustness tests, the relationship between the weighted mean distance to the nearest bank branch and NFFPC occurs at an effectively global scale, with a bandwidth of 288 that includes almost all the municipalities in each local subset. This is consistent with the results in Table 4. However, the MGWR parameter estimate surfaces on the right side of Fig. 7 exhibit different patterns across the three robustness tests. The second robustness test shows results which are consistent with model 3 from Table 4.

Comparing the GWR and MGWR models in the second robustness test, the MGWR surfaces for WeightedDistFirms display little-to-no spatial heterogeneity and are statistically non-zero, as observed in Fig. 4. Moreover, the MGWR parameter estimates for WeightedDistFirms in the second robustness test lie between −0.177 and −0.167, which are larger in magnitude than the MGWR parameter estimates for WeightedDistLabour in model 3 from Table 4.

The MGWR parameter estimates for WeightedDistLabour in the other two robustness tests show hardly any statistical significance. In the first robustness test, which uses data from 2007, the MGWR parameter estimates for WeightedDistLabour lie between −0.047 and −0.010, roughly one third of the magnitude of the parameter estimates for WeightedDistLabour in model 3 from Table 4. The number of bank branches decreased drastically in Sweden from 2007 to 2013, especially in the countryside, where about 13% of the bank branches have shut down over the years. Possible reasons for the stronger association between the weighted mean distance to bank branches and NFFPCLabour in 2013 are therefore the drastic decrease in the number of bank branches and the financial crisis in 2008. A recent paper finds that an external event such as a financial crisis can affect the relative importance of location factors (Cruz and Teixeira 2021).

In the third robustness test, which uses NFFPCEco as the dependent variable, the MGWR parameter estimates for WeightedDistLabour lie between −0.08 and −0.071, which are also smaller in magnitude than the MGWR parameter estimates for WeightedDistLabour in model 3 from Table 4. Only some rural municipalities show statistically non-zero estimates for WeightedDistLabour in the third robustness test. However, the calculation of NFFPC using an ecological approach can be misleading due to a small denominator problem in regions with a small number of new firm formations and an even smaller number of existing firms, which might bias the results (Garofoli 1994).

6 Conclusion

This paper demonstrates the potential of MGWR to improve our understanding of how the weighted mean distance to the nearest bank branch is related to NFFPC. The results show that a mix of global and local processes can best model new firm formation, which provides a richer quantitative representation of the determinants compared to both GWR and global models. The MGWR model also mitigates the multicollinearity problems of GWR and has a better model fit. As MGWR is a multiscale method, it may be useful for developing more specific policies by framing new firm formation determinants through a mix of global, regional, and local spatial contexts (Oshan et al. 2020).

It is important to note that the purpose of the MGWR model is not to claim causality but to explore the spatial variation of the relationship between new firm formation and its independent variables, which can have important policy implications. The results of the MGWR model in this paper reject our hypothesis that the relationship between new firm formation and physical accessibility to banks differs across space. Regardless of the location, the association between the proximity to the nearest bank branch and new firm formation is the same.

A policy implication of this finding is that it is important to focus on the geographical accessibility to bank branches in all Swedish municipalities. Almi, a state-owned company, provides loans, venture capital and advice for start-ups and established companies. It has 16 regional subsidiaries in Sweden, as of February 2022. The business loan that Almi provides only acts as a complement to the company’s financing solution, where the bank is expected to be the other financier (Almi 2021). Almi also charges an interest rate above the average bank interest rate, and it usually puts together the entire financing solution in collaboration with the bank. Given the negative association between the weighted mean distance to the nearest bank branch and new firm formation in all Swedish municipalities and the complementary nature of the business loan provided by Almi, the geographical accessibility to bank branches would continue to play an important role in the financing of firms in Sweden, regardless of the location. Furthermore, it would benefit new companies if Almi could lower its interest rates, so that new companies are not deterred by the high interest rate.

One of the three robustness tests yields results consistent with the negative global association, while the other two yield largely insignificant results. The robustness test which uses 2007 data shows that the relationship between WeightedDistLabour and NFFPCLabour can change over time due to external changes such as the increasing closures of bank branches. Another finding of the paper is that the parameter estimates of some determinants vary over space. For instance, FirmDensity, UnempRate, TheilIndex, ImmigrantsShare and ChangeHP show regional variation in their parameter estimates. Hence, localised policy interventions can be designed for these variables.

To extend this research in the future, additional determinants can be identified and explored within MGWR models of new firm formation. Furthermore, the outcome of this research could be operationalised by collaborating with policymakers to formulate, deploy and evaluate specific policies to increase entrepreneurship and the accessibility to financing for entrepreneurs. Lastly, similar MGWR model specifications can be applied in other developed countries which are reliant on banks, such as Germany and Norway, to validate the conclusions of this paper and to compare the results across countries. These extensions would increase our understanding of multiscale processes, such as the relationship between the proximity to banks and new firm formation, and improve our ability to plan policy and increase entrepreneurship in the long run.