Panel data and descriptor for energy econometrics – an efficiency, resilience and innovation analysis

The work at hand presents a new, extensive panel dataset for energy economics, econometrics and policy. The dataset comprises approximately 5,000 observations on 6 energy economics variables for the majority of the world's countries (n = 136) over 6 years (2009–2014). The data can be used for diverse energy econometrics studies, especially investigations of the socioeconomic and environmental aspects of energy innovation and efficiency. They can also support further analyses that improve our understanding of the resilience and vulnerability of domestic industry at the global scale. To this end, several databases from the IEA, the World Bank and their partners were selected. The data were collected, cleaned, treated, harmonised and analysed to produce a new panel dataset. Both the new data organisation and the descriptor can serve as tools and guidance for sustainability, innovation and entrepreneurship inquiries and analyses focusing on energy economics, econometrics and development policy.


Energy efficiency, sustainability and entrepreneurship
Industrial efficiency can decisively contribute to reaching sustainability goals (UN, 2015). Environmental innovation can drive businesses towards improved international standards while also allowing them to benefit from technological spillovers (Gatto and Drago, 2021). This includes energy policy objectives at both the firm and the country level (Aldieri et al., 2021). This is made possible by investments in cleaner technologies, which improve resilience and reduce vulnerability for the firms that invest in the transition (Gnansounou, 2008; Yong et al., 2016). In this framework, competitiveness requires tailored policy strategies (Farinha et al., 2020). Ultimately, fostering entrepreneurship can contribute to solving major socioeconomic and environmental issues, acting as a driver for the empowerment of vulnerable socioeconomic groups in both developed and developing countries (Gatto and Drago, 2021).
Boosting entrepreneurship while supporting vulnerable groups can happen in several ways. A number of energy-sector strategies have been strengthened to promote rural electrification, local development and social change (Sadik-Zada et al., 2022). Renewable energy programmes are, indeed, creating new opportunities for the poor, rural populations, women and other vulnerable groups. These projects are often accompanied by microfinance schemes, including microcredit, microinsurance and remittance strategies. Such programmes contribute to energy efficiency, ethics and sustainability (Drago and Gatto, 2022). Moreover, promoting entrepreneurship can help preserve the environment and mitigate climate change, offering new solutions for the governance of common-pool resources (Morrow et al., 2022; Gatto, 2022a; Cisco and Gatto, 2021).
Novel investigations also play an important role in guiding a growing body of scholarship, policymaking and practical action towards the energy transition (Gatto, 2022b; Sadik-Zada and Gatto, 2021). This process requires combined action striving for business, societal and ecological improvements, in which entrepreneurship plays a leading role in activating innovation potential and enhancing employment (Boons et al., 2013). Entrepreneurship is of primary importance for the energy sector (Aldieri and Vinci, 2018). Indeed, innovation is increasingly coupled with sustainable industrial development models, such as the circular economy (Veleva and Bodkin, 2018). These industrial patterns have been shown to yield a significant sustainable competitive advantage (Lopes and Farinha, 2019).
Based on these dynamics, a need arises for studies and data on efficiency, sustainability and innovation. Above all, it is important to stimulate the publication of new data and descriptors on resource policy, energy economics and econometrics. These items can be exploited for practical purposes and to detect cross-cutting development issues, such as vulnerability, resilience and related topics.

The importance of extensive data on energy
This work's motivation stems from the need for new, comprehensive data on innovation, entrepreneurship and sustainability, especially for panel data econometrics. Among other purposes, such publications can provide fresh data and insights and generate additional studies on resource policy, economics and econometrics. Previous papers have, indeed, proposed extensive datasets for energy policy, furnishing up-to-date analyses on diverse issues and evidencing this need (Masoudinejad, 2020; Blomqvist and Thollander, 2015). In particular, past scholarship built panel datasets to perform energy econometrics and economics analyses.
To this end, this work provides: i) a new, extensive panel dataset on energy economics (n = 4896). It covers 136 countries, including both developing and developed economies. It focuses on 6 variables related to energy economics and spans 6 years, i.e. from 2009 to 2014. Given the original databases, this methodological and applied choice preserved the largest possible amount of good-quality data. The whole panel dataset is appended as Supplementary Material 1 of this paper. ii) An exhaustive data descriptor furnishing a theoretical and practical analysis of the new dataset to guide and inform scholars, practitioners and policymakers. This item may serve both as a data descriptor and as a research note. The chosen variables refer to energy econometrics, economics and policy and can be exploited for environmental, efficiency and innovation inquiries.
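The dimensions above imply a balanced panel. The following minimal Python sketch assumes one row per country-year and one column per variable. It checks that 136 countries × 6 years × 6 variables gives the reported n = 4896. The function name `panel_size` is hypothetical. The developed/developing split uses the counts reported for Tables 3 and 4.

```python
# Minimal sketch: balanced panel dimensions as reported in the text.
# Assumption: the panel is balanced (every country observed in every year for every variable).
YEARS = list(range(2009, 2015))  # 2009 to 2014 inclusive
N_COUNTRIES = 136
N_VARIABLES = 6


def panel_size(n_units: int, n_periods: int, n_vars: int) -> int:
    """Number of data points in a balanced panel: units x periods x variables."""
    return n_units * n_periods * n_vars


total = panel_size(N_COUNTRIES, len(YEARS), N_VARIABLES)
print(total)  # 4896

# Subsample sizes as reported for Tables 3 and 4 (284 x 6 and 532 x 6).
developed = 284 * 6
developing = 532 * 6
print(developed, developing, developed + developing)  # 1704 3192 4896
```

The two subsamples add up exactly to the full sample, which is a quick consistency check on the reported figures.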
The proposed panel dataset and descriptor are original. The new dataset has already been used to investigate environmental innovation and energy resilience dynamics (Aldieri et al., 2021). Following this example, this work may inform new publications in energy economics, policy and econometrics. The methodological choices allowed for parsimonious management of the initial data, preserving a total of 136 countries from both the developing and the developed world while keeping the risk of bias low. This advantage derives from both the data selection and the data imputation choices. The data comprise approximately 5,000 observations. Figure 1 and Table 1 present the variables used, the related databases and the data description.
Table 1. Variables, databases and data description.
Time required to get electricity (days): the number of days to obtain a permanent electricity connection. The measure captures the median duration that the electricity utility and experts indicate is necessary in practice, rather than required by law, to complete a procedure (WB, 2019).
Electric power consumption (kWh per capita), OECD-IEA Statistics: electric power consumption measures the production of power plants and combined heat and power plants less transmission, distribution, and transformation losses and own use by heat and power plants (OECD IEA, 2014).
Energy intensity level of primary energy: the ratio between energy supply and gross domestic product measured at purchasing power parity. Energy intensity is an indication of how much energy is used to produce one unit of economic output. A lower ratio indicates that less energy is used to produce one unit of output (IEA, WB, 2017).
GDP per capita, PPP: this indicator provides per capita values for gross domestic product (GDP) expressed in current international dollars converted by the purchasing power parity (PPP) conversion factor. GDP is the sum of gross value added by all resident producers in the country plus any product taxes and minus any subsidies not included in the value of the products. The PPP conversion factor is a spatial price deflator and currency converter that controls for price level differences between countries. Total population is a mid-year population based on the de facto definition of population, which counts all residents regardless of legal status or citizenship (WB, OECD, 2020).
Descriptive statistics of the dataset were computed to outline key data features, i.e. mean, standard deviation, minimum and maximum values. Tables 2–4 summarise the descriptive statistics of the dataset. In particular, Table 2 reports the values of the variables for the full sample, whereas Tables 3 and 4 highlight the differences in economic context between developed countries (284 x 6 years = 1,704) and developing and transition economies (532 x 6 years = 3,192). As these tables show, developed countries exhibit higher resilience and lower vulnerability to economic shocks. This is reflected in their higher values of 'access to electricity' and lower values of 'time required to get electricity' and 'electric power losses' compared with developing countries.
Moreover, these variables could be useful to control for energy efficiency, while 'energy intensity' could be treated as a component of environmental innovation (Chen et al., 2020).
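The group-wise descriptive statistics in Tables 2 to 4 (mean, standard deviation, minimum and maximum) could be reproduced along the lines of the following Python sketch. It uses only the standard library. The row layout, the `group` field, the snake_case variable name and the toy values are hypothetical and do not come from the dataset. Using the sample (n - 1) standard deviation is also an assumption, since the text does not state which estimator was used.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Mean, sample standard deviation (n - 1, an assumption), minimum and maximum."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # requires at least two values
        "min": min(values),
        "max": max(values),
    }


def describe_by_group(rows: List[dict], variable: str,
                      group_key: str = "group") -> Dict[str, Dict[str, float]]:
    """Summarise one variable separately for each country group, skipping missing values."""
    groups: Dict[str, List[float]] = {}
    for row in rows:
        value = row.get(variable)
        if value is None:  # missing observation: excluded from the statistics
            continue
        groups.setdefault(str(row[group_key]), []).append(float(value))
    return {name: describe(vals) for name, vals in groups.items()}


# Hypothetical toy rows (not real data): one variable, two country groups.
rows = [
    {"country": "A", "group": "developed", "access_to_electricity": 100.0},
    {"country": "B", "group": "developed", "access_to_electricity": 99.0},
    {"country": "C", "group": "developing", "access_to_electricity": 60.0},
    {"country": "D", "group": "developing", "access_to_electricity": 80.0},
]
summary = describe_by_group(rows, "access_to_electricity")
print(summary["developed"]["mean"], summary["developing"]["mean"])  # 99.5 70.0
```

Running the same function over each variable, and over the whole sample with a single group label, would yield the layout of a full-sample table plus one table per country group.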

Data collection and treatment
Data were selected and gathered from the 4 mentioned databases. To make the dataset as complete as possible, the years with the best coverage were selected (2009 to 2014), because the years preceding 2009 and following 2014 presented a large number of missing values. This methodological decision was thus guided by data availability. Given the already wide coverage of the constructed panel, it was preferable to avoid large gaps for specific years or countries, as well as the methodological biases associated with extensive data imputation. Data were harmonised, aggregated and transcribed in Excel. They were then cleaned, assembled and imputed. Finally, the data were transposed and ordered to generate a panel dataset suited to econometric analysis and software requirements. It should be noted that the resulting panel dataset presented a modest number of missing values, a common issue for large datasets. More specifically, only one variable, 'time required to get electricity', presented a substantial number of missing values (55). The remaining 5 variables had, on average, 6 missing values each: 'energy intensity', 6; 'GDP per capita PPP', 7; 'electric power consumption', 7; 'electric power losses', 6; 'access to electricity', 4. This returned a total of 85 missing values out of almost 5000 observations. In addition, no country or year presented a substantial concentration of missing data points, which was a further reason for choosing data imputation.
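The transposition into a panel and the missing-value tally described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure, which was carried out in Excel. Each source table is assumed to be wide, with one row per country and one column per year. The helper names `to_long_panel` and `count_missing` and the snake_case variable labels are hypothetical. The last lines only re-derive the figures reported in the text: 85 missing values out of 4896 data points, roughly 1.7%.

```python
from typing import Dict, List, Optional

# Hypothetical wide layout: country -> {year: value}; None marks a missing value.
Wide = Dict[str, Dict[int, Optional[float]]]


def to_long_panel(tables: Dict[str, Wide], years: List[int]) -> List[dict]:
    """Merge per-variable wide tables into long rows ordered by country, then year."""
    countries = sorted({country for table in tables.values() for country in table})
    panel = []
    for country in countries:
        for year in years:
            row = {"country": country, "year": year}
            for variable, table in tables.items():
                row[variable] = table.get(country, {}).get(year)  # None if absent
            panel.append(row)
    return panel


def count_missing(panel: List[dict], variables: List[str]) -> Dict[str, int]:
    """Number of missing (None) entries per variable."""
    return {v: sum(1 for row in panel if row[v] is None) for v in variables}


# Toy example with hypothetical values.
tables = {
    "energy_intensity": {"B": {2009: 4.1, 2010: None}, "A": {2009: 5.0, 2010: 4.8}},
    "access_to_electricity": {"A": {2009: 100.0, 2010: 100.0}, "B": {2009: 95.0, 2010: 96.0}},
}
panel = to_long_panel(tables, [2009, 2010])
print(count_missing(panel, list(tables)))  # {'energy_intensity': 1, 'access_to_electricity': 0}

# Missing-value counts as reported in the text.
reported = {
    "time_required_to_get_electricity": 55,
    "energy_intensity": 6,
    "gdp_per_capita_ppp": 7,
    "electric_power_consumption": 7,
    "electric_power_losses": 6,
    "access_to_electricity": 4,
}
total_missing = sum(reported.values())
print(total_missing, f"{total_missing / 4896:.2%}")  # 85 1.74%
```

Sorting by country and then by year gives the stacked layout that most panel econometrics software expects.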
That is why appropriate data treatment techniques were applied, to preserve as much data as possible and to avoid biases due to massive or inappropriate imputation. The methods selected to impute the missing data were: i) single imputation of the observation; and ii) cold deck imputation. The former was based on the mean of the two adjacent yearly values. The latter kept the same country value, substituting the missing year (t) with the closest available year (t + 1 or t - 1). Single imputation was the primary choice; when it was not possible, cold deck imputation was used. These techniques were preferred to alternative data treatment methods. These alternatives include case deletion, which would have yielded fewer countries and observations. They also include other hot/cold deck imputations or Monte Carlo simulations, which might introduce greater bias into the data.
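The two-step imputation rule can be sketched for a single country-variable series keyed by year. This is a hypothetical illustration of the described logic, not the authors' code. Step i) fills a gap with the mean of the values at t - 1 and t + 1 when both are observed. Otherwise, step ii) copies the closest observed year of the same country. Two behaviours are assumptions not stated in the text: searching beyond t ± 1 when both neighbours are missing, and preferring the earlier year when two candidates are equally close.

```python
from typing import Dict, Optional

Series = Dict[int, Optional[float]]  # year -> value for one country and one variable


def impute_series(series: Series) -> Series:
    """Fill gaps: mean of adjacent years first, closest available year as fallback."""
    years = sorted(series)
    # Only originally observed values are used, so imputed values never feed other imputations.
    observed = {y: v for y, v in series.items() if v is not None}
    result = dict(series)
    for t in years:
        if series[t] is not None:
            continue
        prev, nxt = observed.get(t - 1), observed.get(t + 1)
        if prev is not None and nxt is not None:
            # i) single imputation: mean of the two adjacent observed years
            result[t] = (prev + nxt) / 2
            continue
        # ii) cold deck fallback: copy the closest observed year of the same country.
        # Assumption: ties are broken in favour of the earlier year.
        for d in range(1, len(years)):
            if t - d in observed:
                result[t] = observed[t - d]
                break
            if t + d in observed:
                result[t] = observed[t + d]
                break
    return result


# Hypothetical series with gaps in 2010, 2013 and 2014.
s = {2009: 10.0, 2010: None, 2011: 14.0, 2012: 15.0, 2013: None, 2014: None}
print(impute_series(s))
# {2009: 10.0, 2010: 12.0, 2011: 14.0, 2012: 15.0, 2013: 15.0, 2014: 15.0}
```

A series with no observed values at all is returned unchanged. This mirrors the text's point that imputation was only viable because no country or year was missing en bloc.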

Concluding remarks: research, practical and policy relevance of the data
This paper provided two contributions to the existing energy economics and econometrics literature: i) a new panel dataset; and ii) an analysis of the data and methodology behind this dataset. The whole process of data collection, cleaning, treatment and harmonisation was shown in detail. Both items are related to the paper "Evaluation of energy resilience and adaptation policies: An energy efficiency analysis" (Aldieri et al., 2021).
The work presented a new, extensive dataset on energy economics. To this end, the original data organisation was reconceived. The dataset was assembled from large pre-existing databases of official intergovernmental statistics (WB and OECD-IEA). Selected aspects of energy economics and sustainability were highlighted, with particular attention to sustainability, resilience and vulnerability. Indeed, energy for mobility and power is very important for a country's development. The proposed analysis could form the basis for further research, so that energy policy can ensure affordable and secure energy services, address climate policy objectives and manage the sustainable development goals. Alternative pathways are needed to meet objectives concerning the vulnerability of the supply system. The findings may suggest improvements in terms of cost-effective supplies and carbon trading. From this perspective, further research is needed.
Limitations exist. Databases issued by the World Bank and other intergovernmental organisations are usually not collected on a regular, yearly basis. There are also recurrent collection issues in developing and transition economies. That is why this work had to consider the most representative and recent years available. This choice has both a methodological and an applied meaning. It is a common limitation when working with large datasets that cover the vast majority of the world's countries.
The presented data and data descriptor can be a valid toolkit for investigating the determinants of energy vulnerability and resilience, focusing on firms' behaviour and policy responses as well as sustainable development dynamics. These features may be explored to detect behaviours and determinants in both the private and the public sectors. This is why both the dataset and the paper can be relevant for improving our understanding of the micro-, meso- and macroeconomic dynamics of energy, development, environmental, business and industrial economics. The data and the data descriptor can be used for research, policymaking and practical analyses.