1 Introduction

1.1 Energy efficiency, sustainability and entrepreneurship

Industrial efficiency can contribute decisively to reaching sustainability goals (UN, 2015). Environmental innovation can drive businesses towards improved international standards while also letting them benefit from technological spillovers (Gatto and Drago, 2021). This includes energy policy objectives at both firm and country level (Aldieri et al., 2021). It is made possible by investments in cleaner technologies, which improve the resilience and reduce the vulnerability of the firms that invest in the transition (Gnansounou, 2008; Yong et al., 2016). In this framework, competitiveness requires tailored policy strategies (Farinha et al., 2020). Ultimately, promoting entrepreneurship can contribute to solving major socioeconomic and environmental issues, acting as a driver for the empowerment of vulnerable socioeconomic categories in both developed and developing countries (Gatto and Drago, 2021).

Boosting entrepreneurship while relieving vulnerable groups can happen in a number of ways. Several strategies in the energy sector have been strengthened to promote rural electrification, local development and social change (Sadik-Zada et al., 2022). Renewable energy programmes are, indeed, opening new possibilities for the poor, rural people, women and other vulnerable categories. Often, these projects are accompanied by microfinance plans, including microcredit, microinsurance and remittance strategies. These programmes contribute to energy efficiency, ethics and sustainability (Drago and Gatto, 2022). In addition, entrepreneurship promotion can help to preserve the environment and mitigate climate change, offering new solutions for the governance of common-pool resources (Morrow et al., 2022; Gatto, 2022a; Cisco and Gatto, 2021).

Novel investigations also have an important role in guiding scholarship, policymaking and practical action towards the energy transition (Gatto, 2022b; Sadik-Zada and Gatto, 2021). This process requires combined action striving for business, societal and ecological improvements, whereby entrepreneurship retains a lead role in activating the innovation potential and enhancing employment (Boons et al., 2013). Entrepreneurship is of primary importance for the energy sector (Aldieri and Vinci, 2018). Indeed, innovation is increasingly being coupled with sustainable industrial development models, such as the circular economy (Veleva and Bodkin, 2018). These industrial patterns have been shown to yield a significant sustainable competitive advantage (Lopes and Farinha, 2019).

Given these dynamics, there is a need for studies and data on efficiency, sustainability and innovation. Above all, it is important to stimulate the publication of new data and descriptors on resource policy, energy economics and econometrics. Such resources can be put to practical use and can help detect cross-cutting development issues, such as vulnerability, resilience and related topics.

1.2 The importance of extensive data on energy

This work is motivated by the need for new, comprehensive data on innovation, entrepreneurship and sustainability topics, especially for panel data econometrics. Among other purposes, such publications can provide fresh data and insights and stimulate additional studies on resource policy, economics and econometrics. Previous papers have, indeed, proposed extensive datasets for energy policy, furnishing up-to-date analyses on diverse issues and evidencing this need (Masoudinejad, 2020; Blomqvist and Thollander, 2015). In particular, past scholarship has built panel datasets to perform energy economics and econometrics analyses (Gatto et al., 2021).

To this end, this work provides: i) A new extensive panel dataset on energy economics (n = 4896). It comprises both developing and developed economies, covering 136 countries worldwide. The dataset focuses on 6 variables related to energy economics and spans 6 years, from 2009 to 2014 (136 countries × 6 variables × 6 years = 4896 observations). Given the original databases, this methodological and applied choice preserved the largest amount of data at the best quality. The whole panel dataset is appended in the supplementary material (supplementary material 1) of this paper. ii) An exhaustive data descriptor furnishing theoretical and practical analysis of the new dataset to guide and inform scholars, practitioners and policymakers. This feature may serve both as a data descriptor and as a research note. The chosen variables refer to energy econometrics, economics and policy and can be exploited for environmental, efficiency and innovation inquiries.

The proposed panel dataset and descriptor are original. The new dataset has already been used to investigate environmental innovation and energy resilience dynamics (Aldieri et al., 2021). Following this example, this work may inform new publications in energy economics, policy and econometrics.

2 Data description

The data cover countries worldwide for the most representative time span (2009 to 2014). Data were acquired from 4 large open-access sources: i) Doing Business database (WB) (WB, 2019); ii) OECD-IEA Statistics (OECD-IEA, 2014); iii) Sustainable Energy for All (a joint effort of multiple intergovernmental agencies) (IEA, World Bank, 2017); and iv) World Development Indicators (WB, Eurostat and OECD) (WB, OECD, 2020).

The methodological choices allowed for parsimonious management of the initial data, preserving a total of 136 countries from both the developing and the developed world while keeping the risk of bias low. This advantage was due to both data selection and data imputation choices. The dataset comprises 4896 observations.

Figure 1 and Table 1 present the variables used, the related databases and the data description.

Fig. 1 Variables and databases

Table 1 Variables, databases and data description

Descriptive statistics of the dataset are reported to sketch key data features, i.e. mean, standard deviation, minimum and maximum value. Tables 2 to 4 summarise the descriptive statistics of the dataset. In particular, Table 2 shows the values of the variables for the full sample, whereas Tables 3 and 4 highlight the differences in the economic context between developed countries (284 x 6 years = 1,704) and developing and transition economies (532 x 6 years = 3,192). As one can observe, developed countries exhibit higher resilience and lower vulnerability to economic shocks, as they show a higher value of ‘access to electricity’ and lower values of ‘time to get electricity’ and ‘electric power losses’ than developing countries. Moreover, these variables could be useful to control for energy efficiency, while ‘energy intensity’ could be evaluated as an environmental innovation component (Chen et al., 2020).

Table 2 Descriptive statistics
Table 3 Descriptive statistics by developed countries
Table 4 Descriptive statistics by developing countries
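
The four summary measures reported in these tables can be computed per variable and per country group along the following lines. This is a minimal sketch using only the Python standard library; the record layout, the group labels, the variable name and the sample values are hypothetical and are not taken from the dataset.

```python
from statistics import mean, stdev

# Hypothetical records: (country group, variable name, value).
# The values are illustrative only and do not come from the dataset.
records = [
    ("developed", "access_to_electricity", 100.0),
    ("developed", "access_to_electricity", 99.0),
    ("developing", "access_to_electricity", 60.0),
    ("developing", "access_to_electricity", 80.0),
    ("developing", "access_to_electricity", 70.0),
]


def describe(values):
    """Return mean, sample standard deviation, minimum and maximum."""
    return {
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def describe_by_group(rows):
    """Group values by (group, variable) and summarise each cell."""
    cells = {}
    for group, variable, value in rows:
        if value is not None:  # skip missing observations
            cells.setdefault((group, variable), []).append(value)
    return {key: describe(vals) for key, vals in cells.items()}


summary = describe_by_group(records)
```

Passing all records under a single group label would give the full-sample figures, while the two-group split mirrors the developed versus developing comparison.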

3 Data collection and treatment

Data were selected and gathered from the 4 databases mentioned above. To ensure a well-populated dataset, the years with the best data coverage were selected (2009 to 2014), since the years preceding 2009 and following 2014 presented a large number of missing values. Given the already vast data availability ensured by the constructed panel dataset, it was preferred to avoid large numbers of missing values for specific years or countries, as well as the methodological biases connected with large-scale data imputation. Data were harmonised, aggregated and transcribed in Excel. They were then cleaned, assembled and imputed. Finally, they were transposed and ordered to generate a panel dataset suited to econometric analysis and software requirements.
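
The final transposition step, from a country-by-year layout to a long panel with one row per country and year, can be sketched as follows. The wide layout, the country and variable names and the values are assumptions made for illustration; the text does not specify how the Excel sheets were organised.

```python
# Hypothetical wide layout: one table per variable, one row per country,
# one column per year. Names and values are illustrative only.
wide = {
    "energy_intensity": {
        "Country A": {2009: 3.0, 2010: 2.9},
        "Country B": {2009: 8.0, 2010: 7.8},
    },
    "access_to_electricity": {
        "Country A": {2009: 100.0, 2010: 100.0},
        "Country B": {2009: 20.0, 2010: 22.0},
    },
}


def to_long_panel(wide_tables):
    """Stack variable tables into rows keyed by (country, year)."""
    panel = {}
    for variable, by_country in wide_tables.items():
        for country, by_year in by_country.items():
            for year, value in by_year.items():
                row = panel.setdefault((country, year), {})
                row[variable] = value
    # Order by country, then year, as panel estimators usually expect.
    return [
        {"country": c, "year": y, **panel[(c, y)]}
        for c, y in sorted(panel)
    ]


panel_rows = to_long_panel(wide)
```

Each resulting row carries the country and year identifiers alongside all variables, which is the shape most econometric software expects for panel estimation.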

It should be noted that the resulting panel dataset presented a modest number of missing values, a common issue for large datasets. More specifically, only one variable presented a sizeable number of missing values: ‘time required to get electricity’, with 55. The remaining 5 variables had, on average, 6 missing values each: ‘energy intensity’, 6; ‘GDP per capita PPP’, 7; ‘electric power consumption’, 7; ‘electric power losses’, 6; ‘access to electricity’, 4. This returned a total of 85 missing values out of 4896 observations. In addition, no country or year presented a substantial number of missing data points to be imputed, a further reason that made data imputation viable.
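
As a check on these figures, the per-variable counts can be summed and set against the panel size. The sketch below uses the counts reported in the text and the design of 136 countries, 6 variables and 6 years; the dictionary keys are shorthand labels chosen here.

```python
# Missing values per variable, as reported in the text.
missing = {
    "time_to_get_electricity": 55,
    "energy_intensity": 6,
    "gdp_per_capita": 7,
    "electric_power_consumption": 7,
    "electric_power_losses": 6,
    "access_to_electricity": 4,
}

# Panel size: 136 countries x 6 variables x 6 years (2009 to 2014).
n_observations = 136 * 6 * 6

total_missing = sum(missing.values())
others = [v for k, v in missing.items() if k != "time_to_get_electricity"]
mean_other = sum(others) / len(others)
share_missing = total_missing / n_observations

print(n_observations)          # 4896
print(total_missing)           # 85
print(mean_other)              # 6.0
print(f"{share_missing:.2%}")  # 1.74%
```

The imputed share of under 2% of all observations is consistent with the description of the missing values as modest.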

For this reason, appropriate data treatment techniques were applied in order to preserve as much data as possible and to avoid biases due to massive or inappropriate data imputation. The methods selected to impute the missing data were: i) imputation of the single observation; and ii) cold deck imputation. The former was based on the mean of the two values. The latter kept the same country’s value, substituting the missing year (t) with the closest available year (t + 1 or t - 1). Single imputation was the primary choice. When this was not possible, cold deck imputation was used. These techniques were preferred to alternative data treatment methods, namely case deletion, which would have yielded a lower number of countries and observations, and different hot/cold deck imputations or Monte Carlo simulations, which might introduce greater bias into the data.
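
The two-step rule described above can be sketched as follows. The sketch assumes that "the two values" are the observations for the adjacent years t - 1 and t + 1 of the same country, and that only originally observed values serve as donors; neither detail is stated in the text. The function name and the sample series are hypothetical.

```python
def impute_series(values):
    """Impute gaps in one country's yearly series for one variable.

    values: list ordered by year, with None marking a missing year.
    Only originally observed values are used as donors.
    """
    result = list(values)
    for t, value in enumerate(values):
        if value is not None:
            continue
        prev_val = values[t - 1] if t > 0 else None
        next_val = values[t + 1] if t + 1 < len(values) else None
        if prev_val is not None and next_val is not None:
            # Single imputation: mean of the two adjacent years (assumed).
            result[t] = (prev_val + next_val) / 2
        elif next_val is not None:
            # Cold deck: carry back the value of year t + 1.
            result[t] = next_val
        elif prev_val is not None:
            # Cold deck: carry forward the value of year t - 1.
            result[t] = prev_val
        # Otherwise no adjacent donor exists and the gap is left as is.
    return result


# Hypothetical series for the years 2009 to 2014.
print(impute_series([None, 4.0, None, 6.0, 7.0, None]))
# [4.0, 4.0, 5.0, 6.0, 7.0, 7.0]
```

In this example the interior gap is filled by the mean of its neighbours, while the gaps at the first and last year, where only one neighbour exists, fall back to the cold deck rule.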

4 Concluding remarks – research, practical and policy relevance of the data

This paper provided two contributions to the existing energy economics and econometrics literature: i) a new panel dataset; and ii) an analysis of the data and methodology of this dataset. The whole process of data collection, cleaning, treatment and harmonisation was shown in detail. Both items are related to the paper “Evaluation of energy resilience and adaptation policies: An energy efficiency analysis” (Aldieri et al., 2021).

The work presented a brand-new extensive dataset on energy economics. To this end, the original data organisation was reconceived. The dataset was assembled by exploiting pre-existing large databases from the statistics of official intergovernmental organisations (WB and OECD-IEA). Selected aspects of energy economics and sustainability were highlighted, with a focus on sustainability, resilience and vulnerability.

Indeed, energy for mobility and power is very important for country development. The proposed analysis could be the basis for further research on how energy policy can ensure affordable and secure energy services, address climate policy and support the sustainable development goals. It is important to identify alternative pathways to meet objectives concerning the vulnerability of the supply system. The findings can suggest some improvements in terms of cost-effective supplies and carbon trading. In this perspective, further research is needed.

Limitations exist. Databases issued by the World Bank and other intergovernmental organisations are often not updated on a regular, yearly basis. Also, there are recurrent data collection issues in developing and transition economies. That is why this work had to consider the most representative and recent years available. This choice has both methodological and applied implications. It is a common limitation when working with large datasets covering the vast majority of the world’s countries.

The data and data descriptor presented here can be a valid toolkit for investigating the determinants of energy vulnerability and resilience, focusing on firms’ behaviour and policy response as well as sustainable development dynamics. These selected features may be explored to detect behaviours and determinants in both the private and the public sectors. This is why both the dataset and the paper can be relevant for improving our understanding of the micro-, meso- and macroeconomic dynamics of energy, development, environmental, business and industrial economics. The data and the data descriptor can be used for research, policymaking and practical analyses.