1 Introduction

In studies of urban and industrial clustering, the identification of clusters’ geographic boundaries is an important research task (Portnov and Erell 2001; Cortright 2006). Commuting patterns are often used for such identification. According to this approach, a cluster is defined as a group of urban settlements or places of employment located within a commuting range of each other (Portnov and Schwartz 2009). According to another approach, geographic contiguities of employment, productivity, wages, or population density are used for cluster identification (Cortright 2006; Desrochers and Sautet 2004; Ketels and Memedovic 2008). Alternatively, various indices of spatial association (such as, local Moran’s \(I\) and Geary’s \(C\)) can be used for the identification of geographic clusters of economic activities (Feser et al. 2001; Kies et al. 2009; Campos et al. 2012). However, these indices do not perform well if geographically referenced information on neighboring localities is sparse or unavailable (Morgenroth 2008).

In the present study, we propose and test a two-step approach to cluster identification. Using EU NUTS3 regional subdivisions, we first restore missing information on the geographic concentrations of economic activities and then identify their clusters, using spatial analysis tools. To accomplish the former task (i.e., restoring missing data on the geographic concentrations of economic activities), we use light-at-night measurements obtained from satellite imagery (DMSP 2014). The underlying assumption behind this approach is that light-at-night, emitted from geographic concentrations of economic activities, is characterized by different intensities, depending on its source: industries, commerce, services, etc. (Haim and Portnov 2013). As a result, light-at-night intensities can become a marker for different types of economic activities, helping to identify the areal concentrations of these activities and delineate their geographic clusters.

The specific objectives of the present study are:

  • To determine whether light-at-night intensities can help to identify, with a sufficient degree of accuracy, different types of economic activities concentrated in the European NUTS3 regions.

  • To identify geographic clusters of economic activities based on the economic activity data reconstructed using light-at-night measurements.

To achieve these objectives, we carried out our analysis in three steps. First, we used different concentration indices (such as density of the employed, gross value added, and location quotients) to determine which of them can best be predicted by average light-at-night intensities emitted from NUTS3 regions. Next, we used our prediction models to reconstruct data on specific economic activities for NUTS3 regions with missing observations. Then, we identified the geographic clusters of different economic activities, using spatial analysis tools.

2 Background studies

2.1 Empirical approaches to cluster identification

Since the early 1990s, clusters of economic activities have been a top research subject, both in studies of business competitiveness and in the field of New Economic Geography (Porter 1990; Krugman 1991; Fujita and Krugman 2004; Krugman 2011). Being organized as a cluster, enterprises often demonstrate higher efficiency than their individual counterparts (Cortright 2006). Such positive synergy is attributed to sharing natural resources, labor and infrastructure, as well as to knowledge spillover (Ketels and Memedovic 2008; Porter 2000; Ketels 2013; Desrochers and Sautet 2004; Gordon and McCann 2000; Cortright 2006).

The synergetic effect between urban and industrial clusters is also well documented for population growth, employment, income, and productivity (Atzema and Dijk 2005; Gil et al. 2005; Loikkanen et al. 2005; Meunier and Mignolet 2005; O’Leari 2005; Portnov 2005; Wostner 2005; Portnov and Schwartz 2009). Once formed, clusters of economic activities often give birth to urban clusters (Portnov and Erell 2001; Green 2010).

There are several empirical approaches to identifying the geographic boundaries of urban and industrial clusters. Thus, according to Portnov and Erell (2001) and Portnov and Schwartz (2009), the boundaries of such clusters can be identified using established commuting patterns, according to which urban settlements, located within commuting range of each other, are considered to be a part of the same cluster.

In economic geography studies, it is common to distinguish between the “bottom-up” and “top-down” approaches to clusters’ identification (Cortright 2006). According to the former approach, well-known economic activity clusters (such as Silicon Valley and Hollywood film industry clusters) are first identified (Saxenian 1994; Scott 2004). The inter-firm relations within such clusters are then studied and used as criteria for identifying other clusters. By contrast, according to the “top-down” approach, statistical data on continuous geographic concentrations of employment, productivity, wages, or, alternatively, inter-industry linkages are used for clusters’ identification (Desrochers and Sautet 2004). According to the latter approach, an elevated (that is, e.g., above the global average) concentration of economic activities (e.g., in terms of employment, productivity and/or wages) classifies an object as a part of a cluster (Ketels and Memedovic 2008).

Thus, in an early study, Feser et al. (2001) combined this “top-down” approach with spatial statistical analysis tools to identify employment patterns in Kentucky’s economy. In another study, Kies et al. (2009) used concentration indices of employment (empirically measured as location quotients and GINI coefficients), combining them with geo-statistical autocorrelation measures (such as Moran’s \(I\) and Getis-Ord \(G\)), to identify regional economic clusters in the German forest sector. In a separate study, Campos et al. (2012) used location quotients and the local Moran’s \(I\) statistic to identify the patterns of spatial concentration of industries in Great Britain.

2.2 Light-at-night as a development marker

In recent years, light-at-night data, generated by the U.S. Defense Meteorological Satellite Program (US-DMSP), have been used in several studies focusing on the economic and technological performance of countries and regions (see inter alia Elvidge et al. 1997; Imhoff et al. 1997; Amaral et al. 2006; Sutton et al. 2007; Henderson 2009; Ghosh et al. 2009, 2010; Chen and Nordhaus 2010; Zhao et al. 2011; Kulkarni et al. 2011; Mellander et al. 2013), as well as in health studies (Kloog et al. 2007, 2009, 2010), and in several other research applications (see Ghosh et al. 2013; Cauwels et al. 2014 for detailed reviews of various light-at-night-related research applications).

Thus, Sutton (1997) studied the association between light-at-night and population density in the continental USA and concluded that it is feasible to use light-at-night measurements to predict population density, using uniform, linear, parabolic, exponential and Gaussian dependences. In a separate study, Imhoff et al. (1997) employed the “threshold” technique to convert “city lights” data into a map of urban areas in the USA. According to the results of this study, light-at-night-based estimates showed only a 5 % difference compared to the actual data obtained from the 1990 US Population Census. Amaral et al. (2006) also found a highly significant correlation between light-at-night and density of urban population in the Brazilian Amazon region \((\hbox {P}<0.01)\).

In a separate study, Doll et al. (2000) revealed a strong country-level relationship between light-at-night and GDP \((\hbox {R}^{2}=0.85)\). In another study, Sutton et al. (2007) assessed time-related changes in nighttime satellite images and compared them with changes in population density and GDP in India, China, Turkey, and the United States. The regression models estimated for the four nations had a high explanatory power, helping to predict up to 60 % of the regional GDP variation in Turkey and up to 94 % of regional GDP variation in China. In a follow-up study, Doll et al. (2006) investigated light-at-night intensities emitted from NUTS2 regions in 11 European countries and found a strong association between light-at-night intensities and gross regional product in several aggregated economic sectors, ranging from \(\hbox {R}^{2}=0.85\) for industry to \(\hbox {R}^{2}=0.89\) for services.

To the best of our knowledge, however, there have been no systematic attempts to use light-at-night intensities to reconstruct data on specific types of economic activities and to use these data to identify the geographic clusters of these activities, as we attempt in the present study. The methodology and results of this attempt are discussed below.

3 Data and methods

3.1 Study area

In the present study, we use data for Europe, the world’s second-smallest continent (after Australia), whose land area is about 23.0 million \(\hbox {km}^{2}\) and whose combined population is about 740 million residents (WA 2014). The continent hosts 50 sovereign states, of which 28 are currently member states of the European Union (EU).

In the early 1970s, Eurostat established the Nomenclature of Territorial Units for Statistics (NUTS), which is used to collect and analyze regional statistical data. According to this nomenclature, all the member states of the EU, EU candidate countries and European Free Trade Association (EFTA) countries (i.e., Iceland, Norway, Principality of Liechtenstein, and Switzerland) are divided into a hierarchical system of NUTS units, ranging from NUTS1 units, which represent entire countries or regions, to NUTS2 and NUTS3 units, which correspond to regions, provinces or sub-regions (EP 2014a, b). In the present study, we use the most detailed classification, NUTS3, formed by 1315 regions in 2010.

Although data on specific economic activities, available from the Eurostat Portal (EP 2014), are arguably the best of their kind in the world, even this fairly comprehensive data source is sparse in its geographic coverage. Thus, according to the Eurostat database, out of 1315 NUTS3 regions in 2010, data on specific economic activities are available for only about 600 regions, that is, for less than 46 % of all regional subdivisions (see Fig. 1). Due to this limitation, we performed our study in two phases. First, we used light-at-night intensities and other readily available (or easily calculated) geographic and socio-economic attributes of NUTS3 regions to reconstruct missing data on specific economic activities. Then, we applied these reconstructed data to identify the geographic clusters of specific economic activities, as detailed in the following subsections.

Fig. 1

Map of EU NUTS3 regions showing the availability of economic activity data. Notes: Overseas territories of Spain, France and Portugal are not shown on the map. GVA gross value added; ED density of the employed in specific economic activities

3.2 Research variables and data sources

In order to represent different types of economic activities in multivariate analysis, we tested several alternative indices, including density of the employed (persons per \(\hbox {km}^{2})\), gross value added by economic activity per employee (€ per person), and location quotients of both the density of the employed and of gross value added (see Appendix B). The information on these indices for NUTS3 regions with available economic activity data (i.e., fewer than half of all NUTS3 regions; see Fig. 1) was obtained from the Eurostat Portal (EP 2013), while light-at-night intensity data, used in the analysis as an explanatory variable, were obtained from the US Defense Meteorological Satellite Program (DMSP 2014).
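As an illustration of the location quotient index mentioned above, the following minimal sketch uses the conventional textbook definition (the share of an activity in a region divided by its share in a reference area). Appendix B is not reproduced here, so this definition is an assumption and may differ from the exact formulation used in the study. The function name and all figures are hypothetical.

```python
def location_quotient(activity_in_region, total_in_region,
                      activity_in_reference, total_in_reference):
    """Regional share of an activity relative to its share in the reference area.

    Conventional definition, assumed here; the study's own formula is in Appendix B.
    """
    regional_share = activity_in_region / total_in_region
    reference_share = activity_in_reference / total_in_reference
    return regional_share / reference_share


# Hypothetical figures: 12,000 of 200,000 regional jobs are in the activity,
# against 3 million of 100 million jobs in the reference area.
lq = location_quotient(12_000, 200_000, 3_000_000, 100_000_000)
print(round(lq, 2))  # values above 1 indicate an above-average concentration
```

Under this definition, a quotient above 1 marks a region in which the activity is over-represented relative to the reference area.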

The DMSP’s satellites (coded F10, F12, F14, F15, F16 and F18) provide continuous reading of the entire Earth surface during nighttime as they circle around the globe. The satellite images, used in our study, were constructed by the DMSP by averaging daily readings of the satellite sensors and removing the cloud cover. The year 2010 satellite images we used in our study are of \(750\times 750\)-m resolution per pixel and report the light-at-night intensity in dimensionless units, ranging from 0 (the minimal light-at-night intensity) to about 1,000, which is the maximum light intensity detected by the US-DMSP satellite sensors (NOAA 2014). In particular, we calculated average light-at-night levels for individual NUTS3 regions by applying pixel-by-pixel averaging and using the “zonal statistics” tool in the \(\hbox {ArcGIS10.x}^{\mathrm{TM}}\) software.
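The pixel-by-pixel averaging described above can be sketched in plain Python as follows. This is only an illustration of the idea behind a zonal mean, not the ArcGIS tool itself: the function name, the tiny rasters and the zone labels are hypothetical, and a real workflow would first rasterize the NUTS3 polygons onto the grid of the satellite image.

```python
from collections import defaultdict


def zonal_mean(light, zones, nodata=None):
    """Average pixel values per zone; both inputs are equally shaped 2-D lists."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for light_row, zone_row in zip(light, zones):
        for value, zone in zip(light_row, zone_row):
            # Skip pixels outside any region and pixels flagged as missing
            if zone is None or value == nodata:
                continue
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Hypothetical 3 x 3 light-at-night raster and matching region labels
light = [[0, 10, 60],
         [5, 20, 63],
         [0,  0, 57]]
zones = [["A", "A", "B"],
         ["A", "A", "B"],
         [None, "C", "B"]]
means = zonal_mean(light, zones)
print(means)
```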

In addition, we computed several other NUTS3 attributes: latitude (decimal degrees); July and January average temperatures \((^{\circ }\hbox {C})\); distances from NUTS3 centroids to the nearest major city, to the seashore, to the main road, to the rail, and to the river (km); population density (persons per \(\hbox {km}^{2})\); and gross domestic product (€ per capita). These attributes were either obtained by combining data from the ESRI ArcGIS \({}^{\mathrm{TM}}\) database (ESRI 2013) and the Eurostat Portal (EP 2013) or calculated in the \(\hbox {ArcGIS10.x}^{\mathrm{TM}}\) software.

Geographic and socio-economic features of geographic areas are known to contribute to the locational patterns of economic activities. For instance, agriculture is often dependent on the latitude, average temperatures and the amount of precipitation (Reidsma et al. 2010), while good roads are needed to facilitate trade (Duranton et al. 2013). Finance, professional and scientific activities are often “tied” to major cities, which are loci of population density and productivity (Henderson 2010; Cuadrado-Roura and Rubalcaba-Bermejo 1998). Due to these considerations, we added the above factors as explanatory variables (in addition to light-at-night), as regional determinants of economic activities’ concentrations.

3.3 Study stages

As noted previously, the study was carried out in three stages. First, we analyzed what types of economic activities can be identified, with sufficient accuracy, by the light-at-night they emit. Next, we used prediction models for specific types of economic activities to reconstruct the economic activity data for NUTS3 regions with missing observations. Then, we applied spatial analysis tools to identify clusters of specific economic activities. To this end, we first computed the local Moran’s \(I\) statistic (Anselin 1995) and then applied kriging interpolation to identify areas with highly positive and significant Moran’s \(I\) values (Z-Moran’s \(I>1.96; \,\hbox { P}\le 0.05\)). As an alternative cluster identification approach, we applied kriging directly to the restored values of economic activities and then used the top 5 % of the estimated values as the selection threshold for cluster delineation, as detailed in Sect. 4.5.
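To make the local Moran’s \(I\) step more concrete, a minimal sketch of the statistic in its common row-standardized form is given below. It is an illustration under stated assumptions rather than the implementation used in the study: the neighbor lists and values are hypothetical, and the significance test (the Z-scores referred to above) is omitted.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I with row-standardized weights.

    values: list of floats, one per region.
    neighbors: dict mapping a region index to the indices of its neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # second moment of the deviations
    result = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Spatial lag: average deviation among the neighbors of region i
        lag = sum(dev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(dev[i] * lag / m2)
    return result


# Five hypothetical regions arranged in a row, each adjacent to the next
values = [10.0, 12.0, 11.0, 2.0, 1.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = local_morans_i(values, neighbors)
print([round(s, 3) for s in scores])
```

Positive scores mark regions surrounded by similar values (high next to high, or low next to low), whereas negative scores mark regions that differ from their neighbors. A cluster of high concentration therefore requires both a positive score and an above-average value.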

3.4 Statistical analysis

In order to determine the relative strength of the factors affecting economic activities’ variation across NUTS3 units, we used the following generic equation:

$$\begin{aligned} ED_{xi}= & {} b_{x0} +b_{x1} \cdot Ln(LAN)_i +b_{x2} \cdot LAT_i +b_{x3} \cdot DS_i\nonumber \\&+\,b_{x4} \cdot DM_i +b_{x5} \cdot DR_i +b_{x6} \cdot DC_i \nonumber \\&+\,b_{x7} \cdot DRI_i +b_{x8} \cdot T_{Julyi} +b_{x9} \cdot T_{Jani} +b_{x10} \cdot E_i\nonumber \\&+\,b_{x11} \cdot Ln\left( {GDPpc} \right) _i +b_{x12} \cdot PD_i +\varepsilon _x , \end{aligned}$$
(1)

where \(ED_{xi}\) is the average density of the employed, estimated by the model for economic activity \(x\) in region \(i\); \(b_{x0},\, b_{x1}{\ldots }b_{xl}\, (l=1..L)\) are regression coefficients; Ln(LAN) \(=\) natural logarithm of light-at-night intensity, measured in dimensionless units; LAT \(=\) latitude of a NUTS3 region’s centroid (decimal degrees); DS \(=\) distance to the seashore (km); DM \(=\) distance to the nearest major highway (km); DR \(=\) distance to the rail (km); DC \(=\) distance from NUTS3’s centroid to the nearest large city (km); DRI \(=\) distance to the nearest major river (km); \(T_{July}\) and \(T_{Jan}\) \(=\) average July and January temperatures \(({}^{\circ }\hbox {C});\,E = \hbox {elevation (m)}\); Ln(GDPpc) \(=\) natural logarithm of gross domestic product (€ per capita); PD \(=\) population density (persons per \(\hbox {km}^{2})\); and \(\varepsilon _{x}\) is the random error term.
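A reduced version of Eq. (1) can be estimated by ordinary least squares as sketched below. For brevity, the sketch keeps only two of the predictors, Ln(LAN) and PD, and the data are synthetic: the response is generated from known coefficients so that the fit can be checked. The helper names `solve` and `ols` are hypothetical, and a statistical package would normally be used instead of hand-written normal equations.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients for y = b0 + b1*x1 + ... via the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical regions: (mean light-at-night intensity, population density)
raw = [(5.0, 40.0), (12.0, 150.0), (30.0, 90.0),
       (55.0, 700.0), (80.0, 300.0), (20.0, 500.0)]
predictors = [(math.log(lan), dens) for lan, dens in raw]
# Synthetic response built from known coefficients (1.5, 2.0, 0.01)
ed = [1.5 + 2.0 * ln_lan + 0.01 * dens for ln_lan, dens in predictors]
b0, b_lan, b_pd = ols(predictors, ed)
print(round(b0, 4), round(b_lan, 4), round(b_pd, 4))
```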

Both OLS (with and without country dummies) and spatial dependency (SD) models were used in the analysis. The use of spatial dependency models (of the spatial error (SE) and spatial lag (SL) families) was necessitated by the fact that the analysis of the regression residuals from the OLS models, performed using the Moran’s \(I\) test (Ullah 1998), indicated a high degree of spatial correlation (Z-Moran’s \(I >15.0,\, \hbox {P}<0.001\)), which can potentially affect the robustness of regression estimates (Fotheringham 2009). From the above classes of spatial dependency models, we eventually chose the spatial error (SE) model of the following functional form:

$$\begin{aligned} Y_n =b_0 +b_n \cdot X_n +\varepsilon _n , \end{aligned}$$
(2)

where X is a vector of independent variables, \(b_{0},b_{1}, {\ldots }b_{n}\) are regression coefficients, and

$$\begin{aligned} \varepsilon _n =\lambda _n \cdot W\cdot \xi +\varsigma , \end{aligned}$$
(3)

where \(\lambda \) \(=\) spatial error coefficient; \(\xi \) \(=\) the vector of error terms, spatially weighted using the weights matrix (W); and \(\varsigma \) \(=\) the vector of uncorrelated error terms. To calculate W, we used the “queen” neighborhood matrix, which defines neighboring locations as those with either a shared border or a common vertex (GeoDa 2014) and is commonly used in empirical studies of geographically distributed data (see inter alia Pacheco and Tyrrel 2002; Roux et al. 2007).
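The “queen” neighborhood criterion can be illustrated on a regular grid, where cells sharing an edge or a corner count as neighbors. The sketch below is a simplification: real NUTS3 regions are irregular polygons whose contiguity has to be derived from their geometry, and the row standardization of W shown here is an assumption (a common default) rather than a detail reported in the text. Function names are hypothetical.

```python
def queen_neighbors(n_rows, n_cols):
    """Map each grid cell index to the cells sharing an edge or a corner with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            neighbors[cell] = [
                (r + dr) * n_cols + (c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < n_rows
                and 0 <= c + dc < n_cols
            ]
    return neighbors


def row_standardized_weights(neighbors):
    """Dense weights matrix W in which every row with neighbors sums to one."""
    n = len(neighbors)
    w = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbors.items():
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w


nb = queen_neighbors(3, 3)
w = row_standardized_weights(nb)
print(nb[0], len(nb[4]))  # corner cell and central cell of the 3 x 3 grid
```

By contrast, a “rook” criterion would count only shared edges, so the corner cell would have two neighbors instead of three.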

The models’ performance was compared using adjusted \(\hbox {R}^{2}\)’s as a measure of regression fit. Similarly, to assess the strength of light-at-night contribution to the observed variation of economic activities vs. that of other predictors, we applied the \(F\)-test of \(R^{2}\)-change, with \(F\)-values larger than 3.0 \((\hbox {P}<0.05)\) being considered as a statistically significant improvement.
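The \(F\)-test of \(R^{2}\)-change mentioned above compares a model with and without a set of added predictors. A minimal sketch of the standard formula follows; the function name and the numerical inputs are hypothetical and are not taken from the tables of the study.

```python
def f_change(r2_reduced, r2_full, n_obs, k_full, q_added):
    """F statistic for the gain in R-squared after adding q_added predictors.

    k_full is the number of predictors in the full model (intercept excluded).
    """
    numerator = (r2_full - r2_reduced) / q_added
    denominator = (1.0 - r2_full) / (n_obs - k_full - 1)
    return numerator / denominator


# Hypothetical case: adding one predictor raises R-squared from 0.50 to 0.90
# in a 12-predictor model estimated on 600 observations.
f_value = f_change(r2_reduced=0.50, r2_full=0.90, n_obs=600, k_full=12, q_added=1)
print(round(f_value, 1))
```

The resulting value is compared with the critical value of the \(F\) distribution with `q_added` and `n_obs - k_full - 1` degrees of freedom.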

During the analysis, the Kolmogorov–Smirnov/Lilliefors normality test was performed. Since we detected deviations from normality for several dependent variables, a Box-Cox transformation was applied to the original values of the dependent variables with significant deviations from normality. The results of the normality test before and after the Box-Cox transformation are reported in Appendix C.
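For reference, the Box-Cox transformation for a given power parameter can be sketched as follows. The sketch does not estimate the parameter, which is usually chosen by maximum likelihood, and the text does not report which values were used. The sample densities are hypothetical, and skewness serves here only as a rough indicator of the departure from symmetry, not as a substitute for the normality test.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or ln(y) when lam equals zero."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def skewness(values):
    """Population skewness, used here as a rough indicator of asymmetry."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed employment densities (persons per square km)
densities = [1.0, 2.0, 3.0, 5.0, 8.0, 20.0, 150.0]
transformed = box_cox(densities, 0)  # lam = 0 is the logarithmic special case
print(round(skewness(densities), 2), round(skewness(transformed), 2))
```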

4 Research results

4.1 Economic activity predictions: OLS with and without country dummies

Table 1 reports OLS models for specific types of economic activities, for which reasonably good explanatory power of the models could be obtained \((\hbox {R}^{2}\hbox {-}adj.>0.7)\). As Table 1 shows, the models reported help to explain up to 90 % of the economic activity variation, performing especially well for professional, scientific and technical activities \((\hbox {R}^{2}\hbox {-} adj.=0.899,\hbox { F}=702.976;\hbox { P}<0.001)\), public administration \((\hbox {R}^{2}\hbox {-}adj.=0.898, \hbox { F}=879.672;\hbox { P}<0.001)\), as well as arts, entertainment and recreation \((\hbox {R}^{2}\hbox {-}adj.=0.879,\hbox { F}=568.693;\hbox { P}<0.001)\). In these models, light-at-night emerged with the expected sign, indicating that densities of the employed in these activities across NUTS3 regions tend to increase with light-at-night intensities. The association is especially strong for wholesale trade (\(\hbox {t}>52.0;\hbox { P}<0.001\); see Model 1, Table 1), as well as for public administration, education and related activities (\(\hbox {t}>55.0;\hbox { P}<0.001\); Model 5, Table 1).

Table 1 Factors affecting the density of employed in different economic activities (persons per \(\hbox {km}^{2}\)) across the year-2010 NUTS regions (method—OLS regression without country dummies)

In addition to the above mentioned continuous variables, we also included in the models country dummies, taking on value 1 if a particular NUTS3 region is located in a given country, and zero otherwise. As we assumed, such dichotomous variables may reflect localized factors, not taken into account by other predictors, such as e.g., local mobility patterns, historical shares of traditional manufacturing industries, availability of natural resources, etc.

OLS models, incorporating countries’ fixed effects as additional predictors, are reported separately in Table 2. These models appear to increase the explained variance of density of economic activities across NUTS3 regions by an additional 3.5–8.4 %, from \(\hbox {R}^{2}\hbox {-} adj.=0.810{-}0.899\) (Models 1–6, Table 1) to \(\hbox {R}^{2}\hbox {-}adj.=0.894{-}0.939\) in the models with country dummies (see Models 7–12, Table 2). In these models, light-at-night also appears to be a strong positive predictor for all the economic activities analyzed, contributing to about 40 % of the explained variance (see Table 3).

Table 2 Factors affecting the density of employed in different economic activities (persons per km\(^{2})\) across the year-2010 NUTS regions (method—OLS regression; country dummies included)
Table 3 \(F\)-test of \(\hbox {R}^{2}\)-change for different prediction models

Although the estimated values of economic activities in the NUTS3 regions demonstrated highly significant correlations \((\hbox {r}>0.6; \hbox { p}<0.05)\), models estimated for individual types of economic activities are activity-specific, not generic, as our analysis demonstrates (see Table 4). In particular, as Table 4 shows, standard errors of the estimates are significantly lower in the models estimated for specific types of economic activities than standard errors in the models estimated for one type of economic activity and applied to the others.

Table 4 Standard errors of the estimates in different prediction models

4.2 Spatial dependency models

Since Moran’s \(I\) values were found to be high in the OLS models we estimated (Z-Moran’s \(I>15.0\)), we applied, at the next stage of the analysis, spatial dependency models, aimed at minimizing the information loss attributed to the inter-dependence of values of the dependent variable observed in neighboring locations (Anselin 1995). Although both spatial lag (SL) and spatial error (SE) models were tested, only the latter are reported in the following discussion, as they demonstrated about 5 % improvement in the regression fit, compared to the SL models.

As Table 5 shows, compared to the OLS models reported in Table 1, SE models improve the regression fit to \(\hbox {R}^{2}\hbox {-}adj.=0.881{-}0.936\) (Models 13–18, Table 5), as opposed to \(\hbox {R}^{2}\hbox {-}adj.=0.810{-}0.899\) (Models 1–6, Table 1), that is, by about 4–9 %. Compared to OLS models with country dummies, for which \(\hbox {R}^{2}\hbox {-}adj.=0.894{-}0.939\) (Models 7–12, Table 2), SE models appeared to be of similar accuracy. In these models, light-at-night also emerges as the strongest positive predictor for the economic activities analyzed.

Table 5 Factors affecting the density of employed in different economic activities (persons per km\(^{2})\) across the year-2010 NUTS regions (method—spatial error (SE) regression; dependent variables—densities of employed in different economic activities, persons per \(\hbox {km}^{2})\)

4.3 Models’ verification

In order to determine whether our model estimates are sufficiently accurate and can be used for forecasting, we selected a random sample of about 10 % of cases, estimated the models anew based on the remaining 90 % of observations, and calculated the model predictions for the control cases. Then, we applied the \(t\)-test to determine whether there are statistically significant differences between the actually observed values in the “control” subset of regions and our model estimates. The results of the test are reported in Table 6, which shows that the mean differences from zero are statistically insignificant in all the models \((P>0.51)\), thus indicating that our models are essentially robust.

Table 6 \(t\)-test for differences in the regression models’ residuals (test value—difference from zero; see text for explanations)
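The verification procedure described in this subsection can be sketched as a random hold-out split followed by a one-sample \(t\)-test of the control residuals against zero. The sketch is an illustration only: the function names, the random seed and the residual values are hypothetical, and the model re-estimation step between the split and the test is omitted.

```python
import math
import random


def holdout_split(n_cases, share=0.10, seed=0):
    """Randomly set aside a share of case indices as a control subset."""
    rng = random.Random(seed)
    idx = list(range(n_cases))
    rng.shuffle(idx)
    n_control = round(n_cases * share)
    return idx[n_control:], idx[:n_control]  # (estimation cases, control cases)


def one_sample_t(values, mu=0.0):
    """t statistic for the difference between the sample mean and mu."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu) / math.sqrt(var / n)


train, control = holdout_split(600)
# Hypothetical residuals (observed minus predicted) for a few control regions
residuals = [0.5, -0.3, 0.2, -0.1, 0.2]
t_value = one_sample_t(residuals)
print(len(train), len(control), round(t_value, 3))
```

A small absolute value of the statistic, relative to the critical value of the \(t\) distribution with \(n-1\) degrees of freedom, indicates that the mean residual does not differ significantly from zero.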

4.4 Mapping clusters of economic activities

Following the model verification stage (see the previous Subsection), we used the models reported in Table 1 to reconstruct densities of economic activities for NUTS3 regions with missing data. In particular, the calculations were performed for three types of economic activities—information and communication (Model 2, Table 1), financial and insurance activities (Model 3, Table 1) and professional, scientific and technical activities, administrative and support service activities (Model 4, Table 1). The reconstructed economic activity data are shown in Figs. 2, 3 and 4.

4.5 Identifying economic activities’ clusters

Since clusters are characterized, according to our operational definition, by high concentrations of economic activities and similarity of neighboring observations, we carried out our analysis in several steps. First, we identified “stand-out” concentrations of economic activities using the reconstructed data on their geographic concentrations, and applying a top 5 % concentration criterion. Second, we applied the local Moran’s \(I\) statistic to the reconstructed economic activity maps, so as to identify regions that are similar in terms of the economic activities’ densities.

Geographic boundaries of the clusters of economic activities may be sensitive to the choice of territorial units, reflecting the well-known modifiable areal unit problem or MAUP (Openshaw 2014; Portnov 2012; Jacobs-Crisioni et al. 2014). Therefore, we applied kriging interpolation to identify clusters’ boundaries more precisely. In particular, kriging smoothing was applied both to the reconstructed values of economic activities, using the above-mentioned top 5 % concentration criterion, and to the observed local Moran’s \(I\) values, using only positive and statistically significant values of this index (Z-Moran’s \(I>1.96\); \(\hbox {P}\le 0.05\)). The results of both identification approaches appear to be similar, as the outcome maps featured in Figs. 5 and 6 demonstrate.
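The top 5 % concentration criterion can be illustrated with a simple ranking, as sketched below. In the study, the criterion is applied to kriging-smoothed values, whereas the sketch applies it directly to region-level values and leaves the interpolation step out. Region codes and densities are hypothetical.

```python
import math


def top_share(values_by_region, percent=5):
    """Return the regions whose values fall within the top `percent` of all regions."""
    n_keep = max(1, math.ceil(len(values_by_region) * percent / 100))
    ranked = sorted(values_by_region, key=values_by_region.get, reverse=True)
    return set(ranked[:n_keep])


# Forty hypothetical regions with densities of 1 to 40 persons per square km
densities = {f"R{i:02d}": float(i) for i in range(1, 41)}
selected = top_share(densities)
print(sorted(selected))
```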

Fig. 2

Density of the employed in information and communication (persons per \(\hbox {km}^{2})\) a data available for 2010; b predicted values. Notes: The estimates of the employed are based on Model 2 (Table 1). Both available and estimated values are grouped into five classes using the quintile method

Fig. 3

Density of employed in financial and insurance activities (persons per \(\hbox {km}^{2})\) a data available for 2010; b predicted values. Notes: The estimates of the employed are based on Model 3 (Table 1). Both available and estimated values are grouped into five classes using the quintile method

Fig. 4

Density of employed in professional, scientific and technical activities; administrative and support service activities (persons per \(\hbox {km}^{2})\) a data available for 2010; b predicted values. Notes: The estimates of the employed are based on Model 4 (Table 1). Both available and estimated values are grouped into five classes using the quintile method

Fig. 5

Clusters of information and communication (a), finance and insurance (b), and professional, scientific, technical activities, administrative and support services (c), identified by kriging with subsequent application of the top 5 % concentration criterion (see text for explanations). Notes: The color scales are based on the normalized values of densities of the employed in given activities, reconstructed using Models 2, 3 and 4 (see Table 1). Higher values on all scales signify higher concentrations, while lower values mark lower concentrations of economic activities in the area

Fig. 6

Clusters of information and communication (a), finance and insurance (b), and professional, scientific, technical activities; administrative and support services (c), identified by combination of kriging and high positive values of local Moran’s \(I\) index. Notes: Color scales are based on local Moran’s \(I\) values, calculated for the density of the employed in a given economic activity (see text for explanations)

5 Discussion and conclusions

Clusters of economic activities are an important research subject in economic geography. According to previous studies (Porter 2000; Desrochers and Sautet 2004; Cortright 2006; Ketels and Memedovic 2008), such clusters can create appropriate conditions for further development, helping to formulate policy tools and capitalize on such positive effects. However, a major problem with clusters’ identification often stems from the scarcity of data about specific economic activities, attributed to limited reporting by individual countries and administrative entities.

In the present study, we propose and empirically test a research approach which helps to reconstruct missing data on specific economic activities, using readily available and/or easily calculated data on light-at-night intensities, as well as other general attributes of geographic areas (such as GDPpc, population density, and various geographical characteristics of the area: latitude, temperatures, distances to the nearest major city, etc.). According to our models, light-at-night, in combination with other readily available (or easy-to-calculate) data, helps to explain up to 94 % of the economic activity variation, performing especially well for professional, scientific and technical activities, public administration and arts, entertainment and recreation. Moreover, in all the models we estimated, light-at-night appeared to be the strongest predictor for several types of economic activities, with its inclusion into the models helping to explain up to 40 % of the variation in economic activities that is left unexplained by other predictors.

The idea of using light-at-night as a marker for specific economic activities is a novel one, used in the present study, to the best of our knowledge, for the first time. The logic behind this idea is relatively straightforward: light-at-night, captured by satellite sensors, is likely to differ by intensity, depending on its source, viz., industrial concentrations, agriculture, services, etc.; as a result, light-at-night levels can become a marker for specific economic activities, helping to distinguish between different types of economic activities on the ground.

Generally speaking, light-at-night cannot be viewed as a predictor for economic activities per se, because light-at-night is determined by economic activities’ concentrations. However, light-at-night intensities do appear to differ across different types of economic activities, thus helping to differentiate between them. The analysis of a subset of control cases for models’ verification indicated that our estimates are essentially robust, and that the light-at-night variable can help to differentiate between specific types of economic activities on the ground and thus complete, with sufficient accuracy, missing observations.

To the best of our knowledge, there are only a handful of studies, attempting to assess the association between economic performance and light-at-night levels (Ebener et al. 2005; Doll et al. 2006; Bhandari and Roychowdhury 2011; Xiangdi et al. 2012). However, unlike the present analysis, these studies focused on aggregated economic sectors only.

In the present study, we followed the general suggestion by Ebener et al. (2005) and Bhandari and Roychowdhury (2011) that economic activities’ variation can successfully be explained by light-at-night. However, we advance this idea by using the most precise classification of economic activities available for the EU countries, and by testing alternative indices to estimate these activities’ concentrations in the NUTS3 regions, such as density of the employed in the activity, GVA per employed in the activity, and corresponding location quotients, which has not been done before. We took the analysis even further and used light-at-night-reconstructed economic activities’ data for the identification of geographic clusters of specific types of economic activities using spatial analysis tools.

Several limitations of the study should be mentioned. Several predictors of economic activities may influence (or be influenced by) the others. For instance, light-at-night, emitted by geographic concentrations of economic activities, may be affected by population density and/or proximity to the nearest major city. In our analysis, we tried to address this issue by checking the predictors’ multi-collinearity and removing from the models predictors with high variance inflation. NUTS3 regions may also differ in size, and reported (or reconstructed) data on economic activities may, therefore, produce relatively coarse clusters’ delineations. While in the present analysis we used NUTS3 regions, due to unavailability of data for smaller geographic units, and then applied kriging to delineate clusters of economic activities more precisely, future studies should attempt to use finer geographic units, such as municipality-level data or information on individual enterprises and facilities. This approach can also help to resolve another potential problem, namely the insufficiently detailed classification of economic activities currently available for NUTS3 regional units.

It is also important to mention that light-at-night, emitted from on-ground economic activities’ concentrations, differs not only by intensity, which was analyzed in the present study, but also by spectral properties. Accounting for these properties may further improve the accuracy of economic activities’ identification and help to delineate their geographic clusters more precisely.