The electricity consumption of enterprises and its development over time are relevant information for utility companies. This holds for large and small enterprises alike. Enterprises with an electricity consumption of more than 100,000 kWh are relevant because the energy retail market in this segment is competitive. In Germany, for instance, the churn rate among such firms, which are likely to switch their electricity supplier, is constantly at a high level of 10% (Bundesnetzagentur 2017, p. 205), and each of the four largest energy utilities (Footnote 1) offers special tariffs for them. Besides that, utility companies face a different effort in handling the supply and invoicing of electricity customers above this level (StromNZV 2005, §12). In Switzerland, the electricity retail market is only liberalized above this consumption level. Identifying these large electricity consumers is therefore relevant for utilities’ sales departments, whereas the short-term load forecasting that has attracted much attention from researchers in the past is less relevant to them.

Another relevant customer segment is Small and Medium Enterprises (SMEs), because they account for at least one third of the global energy demand in industry and service (International Energy Agency 2015). In some countries, the estimated share of energy consumption of SMEs is even higher, with contributions of over 60% of the industrial sector in Italy (Trianni and Cagno 2011) and 50% in the manufacturing sector in the U.S. (Trombley 2014). Numerous new enterprises are created constantly. In Switzerland, up to 40,000 new ones are founded every year (Swiss Federal Statistical Office 2017). For those newly created firms, no data on electricity consumption is available, although such data would be beneficial for load planning and grid operation. For example, when a new enterprise is founded or a new industrial area is designated in a city, it is relevant for local utility companies to estimate the upcoming load. Available synthetic standard load profiles for typical businesses usually cover only a limited number of consumer types (‘general business’, ‘shop’, ‘bakery’) and focus on the daily load distribution, but do not help to predict the overall annual consumption of enterprises.

Besides that, information on typical electricity consumption in economic branches is interesting for enterprises themselves, given that they can compare their consumption to branch standards and take action when competitors have lower energy demand. Likewise, Simpson et al. (2004) state that implementing environmentally friendly practices to gain higher energy efficiency can lead to a competitive advantage for SMEs. To estimate the electricity consumption of enterprises, it seems appropriate to make use of publicly available data, such as the economic branch of the enterprise and open big data from online sources. The meaningful use of open big data sources can lead to value-adding applications (LaValle et al. 2011; Davenport 2014), in particular in the energy retail industry (Hopf 2018), and the use of big data analytics is becoming increasingly important to firms (Constantiou and Kallinikos 2015). The challenge in analyzing big data sources lies not only in the amount of data, but also in its variety, velocity and veracity. Thus, analysis and sensemaking of the raw data are necessary.

By means of a dataset with 1810 enterprise addresses in a typical town in Switzerland, we investigate the annual electricity consumption of these enterprises and use the industry branch together with open big data (geographic data, online content, social media data and governmental statistical data) to explain it. We also evaluate to what extent a statistical model based on the publicly available data sources can be used to predict the electricity consumption of the enterprise customers of a utility company.

The results of this study help to better understand enterprises’ electricity consumption per se, allow utility companies to better plan upcoming loads or changes caused by economic developments, and help companies to identify the group of high-consumption customers.

Moreover, the results of this study may also help to improve modeling the electricity load in the grid. To the best of our knowledge, there is no similar study that investigates the electricity consumption of enterprises in the context of open data.

The remainder of the paper is structured as follows: First, we formulate the three research questions we will answer. Next, we provide an overview of existing works and show that this is the first study investigating the annual electricity consumption of enterprises together with economic branch information and open big data. Thereafter, we explain the predictor variables in detail and answer the three research questions. We discuss the findings, their implications, and future research in the concluding section.

Research goal

We formulate three Research Questions (RQs) that guide our paper, of which the first is:

RQ 1

To what extent can the electricity consumption of enterprises be explained with the base area of the enterprise building, economic branch affiliation, opening hours and online user-reviews?

Besides that, we analyze the development of the electricity consumption over 5 years, identify trends that are reflected in the data and compare the trends with governmental statistical data. The correlations between trends in electricity consumption and these statistics may indicate a decoupling of economic development and electricity demand, give further insights into the electricity consumption of enterprises, lead to the identification of further influencing factors for prediction models and help to assess the reliability of the presented models. Thus, the second RQ is:

RQ 2

Are economic trends (e.g., in turnover statistics or job opportunities) reflected in the electricity consumption of enterprises in different industries?

Finally, we investigate to what extent the available data can be used to predict the average annual electricity consumption of enterprises and whether this can be used as an alternative to prediction models with historical consumption data. This raises the third and last RQ:

RQ 3

How well can the annual power consumption of enterprises be predicted by using the given data sources?

Related work

A large body of research investigates the modeling and forecasting of energy demand with various purposes (Jebaraj and Iniyan 2006). However, to the best of our knowledge, our study is the first empirical work trying to explain annual electricity consumption of enterprises on an individual level with open big data that is available online to the public.

Existing studies often have a macroeconomic and long-term focus, explaining or predicting electricity consumption for whole countries (Wolde-Rufael 2006; Al-Bajjali and Shamayleh 2018; Bianco et al. 2009; Mohamed and Bodger 2005), sectors (Al-Ghandoor and Samhouri 2009) or cities (Farahat 2004).

In a comprehensive study, Schlomann et al. (2013) describe the main electricity consumption and structural data of companies in the German trade, commerce and services sector and provide an extrapolation for final energy consumption by energy source.

Besides that, several works aim at modeling the electric grid, with various focuses and research goals. Those include improving grid stability (Kinney et al. 2005), integrating renewable energy on a large scale (Pruckner et al. 2012) and advancing communication in smart grid systems (Godfrey et al. 2010).

Further related works focus on the micro-level energy demand of different consumer groups. Besides the electricity consumption of residential customers (Kavousian et al. 2013; Apadula et al. 2012), the consumption of enterprises has also been investigated. For short-term predictions of the electricity consumption of enterprises, Gundin et al. (2002) investigate three industrial electricity consumers and use variables such as historic demand, the number of production days, capacity utilization, size and sector of the enterprises to predict the weekly power consumption of individual companies with a Relative Root Mean Squared Error (RRMSE) of 12–18%. On the level of individual enterprises, Braun et al. (2014) predict the energy consumption of a supermarket with linear regression models using weather and consumption data with a Root Mean Squared Error (RMSE) of less than 4%.

For SMEs (Footnote 2) specifically, research has focused on improving energy efficiency (Trianni and Cagno 2011; Bradford and Fraser 2008; Thollander and Dotzauer 2010). Lee et al. (2014) estimate the weekly electricity profile of SMEs based on the mean daily consumption and operational hours of an enterprise in combination with clusters obtained from smart meter data of 196 known SMEs. However, no further studies on modeling the electricity consumption of enterprises on a micro-level could be identified that include a reasonable number of companies.

In summary, numerous studies on modeling and forecasting energy demand exist, either aggregated or on the level of individual consumers. However, we could not identify studies that explain or predict the annual electricity demand of individual enterprises with data present at utility companies and open big data.

Predictors for SME electricity consumption and modeling

As a first step of our research, we identified online data sources that are publicly available and may serve as predictors for enterprise electricity demand. We identified the free geographic data from OpenStreetMap (OSM) as the first data source, which we use to obtain the basal area of the main company building. Further sources are the economic branch and opening hours, which can be retrieved from a company’s website or from business directories, and user ratings from social media platforms. We underline that the investigated data sources comply with the characteristics of big data (LaValle et al. 2011), known as the four V’s (volume, variety, velocity, veracity). Even though the investigated data is not ‘big’ in terms of volume, the other characteristics are fulfilled: online content is mostly unstructured or semi-structured and changes over time, different data types are considered, user-generated content may contain errors or wrong information, and the amount of data increases with the number of companies investigated.

Figure 1 illustrates the identified predictors and the relationships between the variables that are investigated in this study. We justify the relations between the investigated factors and the electricity demand of enterprises below.

Fig. 1 Conceptual model

Building size and energy consumption

The size of the companies’ building(s) has a significant influence on the electricity consumption. For instance, the annual electricity consumption per square meter in company buildings in Germany is estimated to lie between 155 and 183 kWh/m² (Schlomann et al. 2013). Similarly, in residential buildings, the size of the house is one of the most important factors influencing the electricity consumption (Kavousian et al. 2013).

As a proxy for the actual building size, we consider the basal area of the building next to the company address, as mapped in OSM. We select OSM as the geographic information data source because it is currently the largest free mapping website and its data quality is high (Jokar et al. 2015). OSM does allow the number of building floors to be stored in its database, which would make it possible to obtain the actual floor area of the whole enterprise building, but this functionality is only rarely used (Footnote 3).

Economic branch

As a second influencing factor, we consider the economic branch a company belongs to, given that the electricity demand strongly depends on the kind of business conducted. We adopt the “General Classification of Economic Activities” scheme from the Swiss Federal Statistical Office (Swiss Federal Statistical Office 2008). This allows us to compare the development of energy consumption across years and with several economic trends that we investigate later in this study. The different branches are listed in Table 1.

Table 1 Economic branch classification and number of companies in the dataset with the different open big data variables available

Opening hours

We assume that longer opening hours lead to higher electricity consumption. This information can be retrieved using the Google Places API (Footnote 4). The information from this service contains opening and closing times for each day of the week. Based on this information, the number of opening hours per week can be calculated.
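As a minimal sketch of this calculation, the following Python snippet sums the weekly opening hours from per-day opening and closing times. The input layout (a list of `(day, open, close)` tuples with `HHMM` strings) and the function names are assumptions made for illustration; they do not reproduce the actual response format of the Google Places API. Periods that close after midnight are handled by wrapping around to the next day.

```python
def minutes(hhmm: str) -> int:
    """Convert an 'HHMM' string such as '0830' to minutes after midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def weekly_open_hours(periods):
    """Sum the open hours over all (day, open, close) periods of one week.

    `periods` is a hypothetical, simplified structure: one tuple per opening
    period, with times given as 'HHMM' strings. A closing time that is not
    later than the opening time is treated as closing on the following day.
    """
    total_minutes = 0
    for _day, open_time, close_time in periods:
        duration = minutes(close_time) - minutes(open_time)
        if duration <= 0:  # the period ends after midnight
            duration += 24 * 60
        total_minutes += duration
    return total_minutes / 60


# Illustrative example: open Mon-Fri 08:00-18:30 and Sat 09:00-16:00.
example = [(day, "0800", "1830") for day in range(5)] + [(5, "0900", "1600")]
print(weekly_open_hours(example))
```

A company with several opening periods on the same day (for example, a lunch break) would simply contribute one tuple per period.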

Online user ratings

As a fourth influencing factor of the electricity consumption of enterprises, we take user ratings on companies’ social media websites into account.

Several popular online services offer built-in rating functionalities that make statements about the quality or price level of companies possible. These evaluations, which were originally intended as a recommendation for other users, represent the popularity of places and might therefore serve as explanatory variables for the electricity consumption. We assume that companies with numerous ratings and activity on social media are more popular and have more customers than comparable companies lacking such an online presence. Consequently, comparable companies with more customers should also exhibit a higher electricity demand.

Such user ratings also served as predictors in other studies. Ye et al. (2011), for example, show that user ratings and the number of reviews have a positive impact on online hotel bookings. Facebook activity can be used to predict attendance of football matches (Egebjerg et al. 2017), user-generated content related to music albums has a positive correlation with sales (Dhar and Chang 2009) and movie ticket sales can also be predicted using online ratings (Duan et al. 2008). Social media content was also used in other areas including the prediction of election results or macroeconomic developments (Yu and Kak 2012).

We select the platforms Facebook, Yelp and Google as sources for user-generated content in this work.


In this section, we describe the available datasets, our data preparation steps and present our analysis. We use explanatory linear regression models to answer the first RQ, correlation analysis to answer the second RQ, and evaluate predictive models to answer the third RQ.

Experimental data and data preparation

For our study, a dataset with 2282 names and addresses of enterprise locations together with their annual electricity consumption in the years 2010–2014 was available. This dataset is typical of the data available to any energy retailer that has enterprises as customers.

All enterprises are located in an exemplary city in Switzerland (Footnote 5). We converted each address into geographic coordinates using a geocoding service, which enabled us to retrieve further online location data.

The electricity consumption per year was normalized by the number of consumption days, giving us the Consumption per Day (CPD). This CPD (M = 284.58 kWh, SD = 1379.07 kWh) is suspected to contain a number of extremely high values. Initially, we transformed the consumption with the natural logarithm, resulting in an approximately normal distribution. Following Tukey (1977), we replaced the consumption in 38 cases, where the log-transformed consumption was more than 1.5 times the inter-quartile range higher than the median, with the value of the 95th percentile (1091.46 kWh). This replacement was performed to remove extreme values that might distort the linear models and leads to an adjusted CPD of M = 171.66 kWh (SD = 371.07 kWh).
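The replacement rule described above can be sketched as follows. This is an illustrative reading of the text, not the original analysis code: the function name, the default parameters and the choice of the `inclusive` quantile method are assumptions, and the toy data is made up.

```python
import math
import statistics


def cap_extreme_values(cpd, k=1.5, cap_percentile=95):
    """Replace extreme consumption values with a high percentile of the data.

    A value is flagged when its natural logarithm lies more than k times the
    inter-quartile range above the median of the log-transformed values (the
    rule as described in the text). Flagged values are replaced with the
    `cap_percentile`-th percentile of the untransformed values.
    """
    logs = [math.log(v) for v in cpd]
    q1, _q2, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    threshold = statistics.median(logs) + k * (q3 - q1)
    # quantiles(..., n=100) returns the 1st to 99th percentiles.
    percentiles = statistics.quantiles(cpd, n=100, method="inclusive")
    cap = percentiles[cap_percentile - 1]
    return [cap if math.log(v) > threshold else v for v in cpd]


# Made-up daily consumption values (kWh) with one extreme observation.
cpd = [10, 12, 15, 18, 20, 22, 25, 30, 35, 40,
       45, 50, 60, 70, 80, 90, 100, 120, 150, 50000]
capped = cap_extreme_values(cpd)
print(max(cpd), max(capped))
```

In this sketch the percentile is computed on data that still contains the extreme values, so the cap itself is influenced by them; whether the study computed it this way is not stated.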

We obtained the branch membership for each company location by collecting a number of words describing the business activity from three data sources. First, we used the words in the company name. Second, a business directory (Footnote 6) was used to obtain descriptions of each company. Third, keywords from the Google Places API (Footnote 7) were retrieved.

From the collection of all words describing the business activities of the companies, we associated each company with the respective economic branch when the textual description contained a certain keyword (see Table 1). In some cases, the branch was attributed manually. This mapping enabled us to assign economic branches to 1810 of the 2282 company locations.
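A keyword-based assignment of this kind could look like the sketch below. The keyword lists are hypothetical placeholders (the keywords actually used are not reproduced here), only three branch codes are shown, and the first matching branch wins, which is an assumption about tie-breaking.

```python
# Hypothetical keyword lists for three branch codes of the classification.
BRANCH_KEYWORDS = {
    "I": ["restaurant", "hotel", "cafe"],   # accommodation and food service
    "G": ["shop", "store", "garage"],       # trade and repair of motor vehicles
    "F": ["construction", "roofing"],       # construction
}


def assign_branch(words, branch_keywords=BRANCH_KEYWORDS):
    """Return the first branch whose keyword occurs among the words, else None.

    `words` is the collection of words gathered for one company location,
    e.g. from the company name, a directory entry and place keywords.
    """
    tokens = {word.lower() for word in words}
    for branch, keywords in branch_keywords.items():
        if tokens & set(keywords):
            return branch
    return None  # left for manual attribution


print(assign_branch(["Hotel", "Bellevue", "AG"]))
print(assign_branch(["Muster", "Treuhand", "GmbH"]))
```

Locations for which the function returns `None` correspond to the cases that need manual attribution or remain without a branch.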

We exclude all branches with fewer than 25 company locations from our analysis because of the low statistical validity of the findings. To give an impression of the data, Table 2 shows descriptive statistics for all variables and the correlation between each variable and the logarithmized electricity consumption. Following Cohen (1988), all variables show a weak positive correlation with the electricity consumption, which suggests a further examination of the relationship using linear regression models.

Table 2 Open big data variables with presence for the company locations, descriptive statistics and the correlation with normalized electricity consumption (log)

We have no information on the size of the enterprises (turnover or number of employees), but we assume that a large portion are SMEs and we find evidence for this in two descriptive facts about the data. First, we found 1467 unique enterprise names, enabling us to group the addresses into enterprises. Each enterprise has M = 1.65 (SD = 3.79) locations, but the majority (80%) of enterprises have only one address. The grouping of addresses was only a descriptive analysis, and we use the company locations independently of their affiliation to an enterprise in the remaining analysis of the paper. Second, the median base area of all enterprises is 476.28 m² (equivalent to a square with a side length of about 22 m).

Explanatory models of the electricity consumption

In this first analysis, we use linear regression models with ordinary least squares estimation (Footnote 8) and answer RQ 1 based on the data. The regression models are described in Eq. 1 in a general form. For each observation \(i\), we consider the mean \({CPD}_{i}\) over all years as the dependent variable and transform the values with the natural logarithm, given that the distribution of this variable is approximately log-normal. In different models, we use \(n\) explanatory variables \(x_{j}, j \in \{1,\ldots,n\}\), to investigate combinations of them. While \(\beta_{0}\) represents the intercept, \(\beta_{j}, j \in \{1,\ldots,n\}\), are regression coefficients that describe the size of the effect of the variables \(x_{j}\).

$$ \log({CPD}_{i}) = \beta_{0}+\beta_{1}x_{1i} + \ldots + \beta_{n}x_{ni} + \epsilon_{i} $$

The explanatory variables basal area, opening hours, user ratings and Facebook visits are numeric and are used as we obtained the values from the open data sources. The industry branch is a categorical variable, which we represent as binary dummy variables for all branches, whereas the economic branch “S” (other service activities) serves as the default and is encoded by all dummy variables being zero. \(\epsilon_{i}\) denotes the error term in the regression model. We first estimate separate models for the different influencing factors (Models 1–5) to see the direct effect of the variables on the electricity consumption and the amount of explained variance (\(R^{2}\)). Models 6 and 7 combine the different variables.
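The dummy encoding with a reference category can be illustrated with a short sketch. The function name, the `branch_` prefix and the subset of branch codes are assumptions for illustration; only the choice of branch “S” as the default follows the text.

```python
def dummy_encode(branch, branches, reference="S"):
    """Encode one branch label as 0/1 dummy variables.

    The reference branch gets no column of its own, so an observation of the
    reference branch is represented by all dummy variables being zero.
    """
    return {f"branch_{b}": int(b == branch) for b in branches if b != reference}


branches = ["F", "G", "I", "S"]  # illustrative subset of branch codes
print(dummy_encode("G", branches))
print(dummy_encode("S", branches))
```

With this encoding, each estimated branch coefficient is read as the difference in logarithmized CPD relative to the reference branch.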

Table 3 shows the estimated coefficients for linear regression models for the variables base area, opening hours, number of visitors on Facebook and the combined number of reviews on Yelp, Google and Facebook independently. All variables have a statistically significant effect in the individual models. The estimated effects can be interpreted as follows: Per m² of basal area, the electricity consumption increases by \(e^{0.239}=1.269979\) kWh; per additional opening hour, the consumption increases by 1.0% (\(e^{0.009937}=1.009987\)). Per additional online rating, the consumption increases by 2.5% (\(e^{0.02429}=1.024587\)). The increase in consumption per additional Facebook visit is small at 0.14% (\(e^{0.001366}=1.001367\)) and is only estimated based on a smaller sample, but the effect is statistically significant.
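To make the percentage interpretation concrete, the sketch below fits a single-predictor model of the form of Eq. 1 with the closed-form least-squares solution and converts a coefficient into a percentage change via the exponential function. The function names and the synthetic data are assumptions for illustration; the coefficients passed in at the end are the ones quoted in the text.

```python
import math


def fit_log_linear(x, cpd):
    """Least-squares fit of log(cpd) = b0 + b1 * x for a single predictor."""
    y = [math.log(v) for v in cpd]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def percent_change_per_unit(beta):
    """Percentage change in consumption per one-unit increase of a predictor."""
    return (math.exp(beta) - 1) * 100


# Synthetic data: consumption grows by a factor exp(0.01) per opening hour.
hours = [20, 40, 60, 80]
cpd = [50 * math.exp(0.01 * h) for h in hours]
b0, b1 = fit_log_linear(hours, cpd)
print(round(b1, 4), round(percent_change_per_unit(b1), 2))

# Coefficients quoted in the text for opening hours and online ratings.
print(round(percent_change_per_unit(0.009937), 2))
print(round(percent_change_per_unit(0.02429), 2))
```

Because the dependent variable is logarithmized, the exponentiated coefficient acts as a multiplicative factor on the consumption rather than as an additive amount.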

Table 3 Linear regression models explaining logarithmized CPD with each influencing factor separately

In line with the low estimates of the coefficients in the models, the explained variance (\(R^{2}\)) of the logarithmized CPD is quite low, ranging from 2% to 8%. The \(R^{2}\) for Model 4 is slightly higher than for Models 1–3, even though the effect of Facebook visits is small. We assume that this is a result of the different number of observations that are available (202 instead of 1810), given that only those companies had a Facebook page.

The influence of the economic branches is included in Model 5 (Table 4).

Table 4 Linear regression models explaining logarithmized CPD with the branch information and combined models with multiple influencing factors

In this model, the branch membership has a significant influence on the electricity consumption and the explained variance is higher than in Models 1–4.

Models 6 and 7 in Table 4 show the estimates for multiple linear regression that also includes variables from online data sources. By adding the number of opening hours, Facebook visits and the basal area to the model, the estimates for branches M and O are no longer significant, but the explained variance increases (adjusted \(R^{2}=0.13\)).

In Model 7, we consider only service-oriented enterprises with direct customer contact, because these companies also have a sufficient number of online ratings and social media data present. Interestingly, the opening hours have a slightly higher influence in this model and the explained variance could be further increased (adjusted \(R^{2}=0.18\)). One reason for this may be that the companies in these branches are more homogeneous. We conclude that we can explain the electricity consumption of enterprises to some extent and thereby answer our first RQ.

Reflection of economic trends in electricity consumption of enterprises

In the available dataset, the annual electricity consumption for the years 2010–2014 is available. In this analysis, we want to see whether economic trends are reflected in the energy consumption of typical enterprises in different economic branches and thus answer RQ 2.

For data on economic trends, the Swiss Federal Statistical Office offers numerous official statistics. For the years 2010–2014, datasets on employment, turnover and electricity consumption were retrieved, where the same branch classification as in Table 1 was used (Footnote 9). All statistics are aggregated on the level of the local canton of the city, except for energy consumption, where the data for the whole of Switzerland was used. We answer our second RQ for each of the considered statistics below.

Labor market statistics. No significant correlation between labor market statistics and the electricity consumption exists in most branches. However, in the construction branch a strong and significant correlation (p<0.1) is present.

Turnover statistics. Turnover statistics are available for the secondary sector (manufacturing, industry, crafts, energy and construction) in Switzerland. Sales for each quarter were reported as indices (annual average 2010 corresponds to 100%). The annual average was calculated from these quarterly figures, which in turn was used to calculate the correlation with electricity consumption. The results are shown in Fig. 2. No significant correlations (p<0.1) could be found for the sectors C (manufacturing industry / manufacture of goods) and D (energy supply). However, there is a strong linear correlation for the construction industry (F).
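The two computation steps described here, averaging the quarterly indices per year and correlating the annual series with electricity consumption, can be sketched as follows. All numbers are made up for illustration, the function names are hypothetical, and Pearson's coefficient is assumed because the text speaks of a linear correlation without naming the coefficient used.

```python
import math


def annual_means(quarterly_by_year):
    """Average the quarterly index values of each year, ordered by year."""
    return [sum(q) / len(q) for _year, q in sorted(quarterly_by_year.items())]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up quarterly turnover indices (annual average 2010 = 100).
turnover_index = {
    2010: [98.0, 100.0, 101.0, 101.0],
    2011: [102.0, 104.0, 104.0, 106.0],
    2012: [103.0, 105.0, 106.0, 106.0],
    2013: [106.0, 107.0, 109.0, 110.0],
    2014: [108.0, 110.0, 111.0, 111.0],
}
# Made-up mean daily consumption (kWh) of one branch in the same years.
consumption = [150.0, 156.0, 155.0, 161.0, 163.0]

print(round(pearson(annual_means(turnover_index), consumption), 3))
```

With only five annual observations per branch, such a coefficient rests on very few data points, which is one reason to interpret the significance levels cautiously.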

Fig. 2 Correlation of electricity consumption with governmental statistical data in the years 2010–2014

Nationwide electricity consumption. The majority of economic branches (12 of 16) show a positive correlation, of which D, F and M have a very strong and significant correlation with ρ>0.7. The relationship between nationwide consumption and that of the enterprises in our dataset can give an impression of how representative they are for all of Switzerland. While a positive correlation leads to the assumption that findings from those branches have more general importance, this assumption cannot be made for branches with a strong negative correlation (K and S).

In summary, some interesting points have emerged from the study of the links between the electricity consumption and other statistical surveys. In some sectors, for example, there are strong and significant correlations between electricity consumption and various labor market statistics. However, there is no uniform picture of the nature of the interrelationships: whereas there is a strongly positive correlation in the retail sector, the correlations in the other sectors are usually negative. A further investigation of these interrelationships and the causalities behind them can be a goal of further research.

In addition, there is a positive correlation for most industries between the development of electricity consumption of enterprises in our dataset and the development of consumption throughout Switzerland.

Prediction of annual power consumption

In this final analysis, we answer RQ 3 and test how well our presented models can be used to predict the electricity consumption of an enterprise for which no electricity consumption data is known.

For prediction, we consider the linear regression models 5 and 6 (see Table 4). In previous studies, linear regression models showed a good prediction performance, even in comparison with neural network and decision tree machine learning algorithms (Al-Ghandoor and Samhouri 2009; Tso and Yau 2007). Nevertheless, we compare the prediction performance of the linear regression models with a Random Forest (Breiman 2001) regression model, trained with the same data as model 6.

To measure the prediction error, we use the actual electricity consumption per day \(y_{i}\) and compare it to the predicted consumption \(\hat {y_{i}}\) for every company \(i \in \{1,\ldots,n\}\). We can then compute the Mean Absolute Percentage Error (MAPE):

$$ MAPE = \frac{100}{n} \cdot \sum\limits_{i=1}^{n}\left|\frac{y_{i}-\hat{y_{i}}}{y_{i}}\right| $$

To get an impression to what extent the prediction deviates from the average electricity consumption \(\overline {y}\), we consider the RRMSE:

$$ RRMSE=\frac{\sqrt{\frac{\sum_{i=1}^{n}\left(\hat{y_{i}}-y_{i}\right)^{2}}{n}}}{\overline{y}} $$
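Both error measures translate directly into code. The following sketch implements the two formulas as written; the function names and the example values are illustrative assumptions.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    n = len(actual)
    return 100 / n * sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted))


def rrmse(actual, predicted):
    """Root mean squared error divided by the mean actual consumption."""
    n = len(actual)
    rmse = math.sqrt(sum((y_hat - y) ** 2 for y, y_hat in zip(actual, predicted)) / n)
    return rmse / (sum(actual) / n)


# Made-up daily consumption values (kWh) and predictions.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), rrmse(actual, predicted))
```

Note that the MAPE weights each company's error relative to its own consumption, so large relative errors on small consumers can dominate it, whereas the RRMSE relates the absolute error to the average consumption of all companies.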

For an unbiased estimation of the errors, we use 10-fold cross-validation (Footnote 10). As a benchmark measure, we consider a random predictor taking the average electricity consumption of all company locations.
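The evaluation scheme can be illustrated with a small sketch that builds ten folds and produces out-of-fold predictions for the average-based benchmark. The function names, the shuffling seed and the use of the training-fold mean (rather than the mean over all locations) are assumptions for illustration, not details taken from the study.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Split the indices 0..n-1 into k disjoint, shuffled folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mean_benchmark(values, k=10, seed=0):
    """Predict each held-out value with the mean of the remaining folds."""
    predictions = [None] * len(values)
    for fold in k_fold_indices(len(values), k, seed):
        held_out = set(fold)
        train = [v for i, v in enumerate(values) if i not in held_out]
        train_mean = sum(train) / len(train)
        for i in fold:
            predictions[i] = train_mean
    return predictions


# Made-up consumption values for 100 company locations.
values = [float(v) for v in range(1, 101)]
benchmark = cross_validated_mean_benchmark(values)
print(benchmark[:3])
```

The same fold structure would be reused for the regression and Random Forest models, so that all predictors are scored on identical held-out observations.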

We show the results in Figs. 3 and 4. The prediction error is high for all considered models. As expected, the random predictor has the worst performance in all metrics and the Random Forest model shows the best performance, with both regression models in between. Interestingly, the inclusion of open big data (basal area and opening hours) in the regression model 6 leads to a higher predictive error than only using economic branches (model 5) as a predictor. However, this could also be a result of model overfitting. We could not achieve significantly lower prediction errors by considering only the companies with strong relations to consumers (those in economic branches I, G, Q or S).

Fig. 3 Mean Absolute Percentage Error (MAPE)

Fig. 4 Relative Root Mean Squared Error (RRMSE)

Previous literature achieved forecast errors for long-term power consumption in the industrial sector of approximately 2% (Farahat 2004) and suggests that for energy suppliers an error of up to 10% is acceptable in long-term forecasts, which is clearly exceeded here. In addition, Savka (2005, p. 52ff) shows that predicting electricity consumption one year in advance in the industrial and commercial sector is possible with errors of 6% and 3%, respectively. Those accurate load forecasts have been enabled by time series data of past consumption, which was not used for our predictions. We conclude that the detailed prediction of the actual electricity consumption based on open big data is not reliable, but can give a first estimate when the historic consumption of a potential customer is not available.

In some cases, the actual electricity consumption of enterprises is not necessary and it is sufficient to identify high energy consumers with an annual electricity consumption of more than 100,000 kWh. We therefore train a Random Forest classification model with the branch information and open big data features and use the Receiver Operating Characteristic (ROC) curve for evaluation (see Fig. 5). This curve shows the performance of a binary classifier by plotting the true positive rate against the false positive rate of classification. The Area Under ROC Curve (AUC) is a well-known metric to evaluate classifiers (Fawcett 2006) and is AUC = 0.74 in our case. A random classification corresponds to a diagonal line from (0,0) to (1,1) in the plot, with an AUC = 0.5. For further information, we provide the feature importance scores of the Random Forest prediction model in Table 5.
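For readers less familiar with the metric, the AUC can be computed without plotting the curve, since it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses this pairwise formulation; it only illustrates the evaluation metric and the labeling threshold, not the Random Forest model, and all names and numbers in it are illustrative assumptions.

```python
def is_high_consumer(annual_kwh, threshold=100_000):
    """Label a company as a high consumer above the annual threshold (kWh)."""
    return int(annual_kwh > threshold)


def roc_auc(labels, scores):
    """AUC as the share of positive-negative pairs that are ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    correct = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return correct / (len(positives) * len(negatives))


# Made-up annual consumption (kWh) and classifier scores for four companies.
annual_consumption = [250_000, 120_000, 80_000, 15_000]
labels = [is_high_consumer(kwh) for kwh in annual_consumption]
scores = [0.9, 0.4, 0.6, 0.1]
print(labels, roc_auc(labels, scores))
```

The pairwise computation is quadratic in the number of observations, which is unproblematic for datasets of a few thousand company locations.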

Fig. 5 Prediction performance of high consumption enterprises as ROC curve

Table 5 Random Forest feature importance scores for the prediction of high consumption enterprises

In conclusion, we can answer RQ 3 as follows: The prediction of the annual power consumption of enterprises based on publicly available data is possible with better-than-random accuracy, but is still associated with a high prediction error. Nevertheless, the identification of companies with a high electricity consumption of more than 100,000 kWh annually is possible based on branch information and open big data.

Discussion and conclusion

In this paper, we investigated the annual electricity consumption of 1810 company addresses in an exemplary Swiss city together with information on the economic branch and open big data from various sources (geographic information, online content, social media data and governmental statistical data). In contrast to previous studies, we used only explanatory variables from publicly available online sources. Based on the data, we answered three research questions and can draw the following three conclusions from our research:

First, the electricity consumption of SMEs can be explained with open big data and information on the company branch using linear regression models. In detail, the size of the companies’ buildings increases the electricity consumption by 1.27 kWh per additional m², each online review increases the consumption by 2.5%, each opening hour by 1.0% and each Facebook visit by 0.14%, when using the variables as single predictors. Nevertheless, only a small part of the variance in electricity demand can be explained (from 2% to 8%) with the simple models using only one explanatory variable. By using all variables and adding the branch information to a combined model, our linear regression analysis shows that up to 19% of the variance in electricity consumption can be explained among the service-oriented enterprises with direct customer contact, and up to 13% of the variance considering all branches.

Second, economic trends in different industries (e.g., in turnover statistics or job opportunities) are reflected in the electricity consumption of SMEs to some extent, especially in the labor-intensive construction industry. The electricity consumption of enterprises in some economic branches developed alongside open statistical surveys (such as economic development or labor market statistics) over time with strong and significant correlation.

Third, the annual power consumption of enterprises can be predicted by using the considered publicly available data sources. The exact prediction of the electricity consumption using linear regression and Random Forest regression led, however, to a high average forecasting error of 340%. A random predictor, which always assumes the average as a prediction, has an error of 360%. Nevertheless, the identification of companies with a high energy consumption of more than 100,000 kWh is possible with an AUC = 0.74.

Implications and contribution

Our study contributes to the sparse literature on explaining and predicting the electricity consumption of enterprises by investigating new predictor variables for their electric load and by investigating the topic with a comprehensive dataset of 1810 company addresses.

Our results have implications for grid planning, load forecasting and energy modeling in utility companies. Competitors may use the publicly available data for benchmarking. We show that the explanation and prediction of enterprise energy consumption can be supported by open big data: firms or researchers can include the estimated influence of basal area, industry branch, opening hours, number of user ratings and Facebook visits in their energy models. Besides that, we showed how companies with a high energy consumption (>100,000 kWh) can be identified, which is a beneficial insight for electricity retailers.

We underline that all data for the considered predictor variables stems from publicly available online sources and is available to researchers and practitioners for future works.

Our results extend findings from the most comprehensive study investigating the electricity demand of enterprises (Lee et al. 2014), which uses data from 196 Irish SMEs. We find support in our data that the operational hours of enterprises are valid predictors of the electricity demand, and we also find evidence for the intuitive fact that the economic branch of an enterprise affects the electricity demand to a large extent (for which Lee et al. (2014) found no evidence).

Limitations and future research

With an explained variance of up to 19%, the identified factors do not provide a full explanation of the electricity consumption of companies and further factors should be considered for a complete picture. Possible ones include the annual revenue, the number and size of production equipment or the number of employees. We encourage further research to investigate such factors.

Given that a large portion of companies in our dataset are SMEs, the results presented are especially valid for SMEs and can explain the energy consumption for the companies that account for a large proportion of overall electricity consumption.

A subject of future research can be the extension of our analysis of enterprises to a broader geographic scope. So far, only companies from a single municipality in Switzerland have been considered in our case study. To lower the forecasting error of our prediction of enterprise energy consumption, further advanced prediction models (such as artificial neural networks or recurrent neural networks) could be tested. For the analysis of the reflection of economic trends in the energy consumption of enterprises, we used a correlation analysis. However, a panel data analysis using regression models with a time dimension would be helpful to further verify the findings and could be the subject of future work.

Furthermore, more open big data sources could be examined as influencing factors of enterprise electricity consumption. This research could be inspired by previous work on analyzing household electricity consumption with open geographic data. Hopf et al. (2016), for example, used features derived from OSM to a much greater extent than this paper, including topological features, land use and landmarks in their analysis of household consumption.