One of the most resource-intensive tasks in KGEMM, as in all MEMs, is the collection, updating, revision, and maintenance of data. In econometric modeling, data are the key elements in determining the statistical properties of relationships. Data availability therefore plays an important role in establishing linkages between variables in time-series-based MEMs. As discussed in the literature review, MEMs are highly data-intensive, and obtaining comprehensive results depends on the accuracy and time span of the data. MEMs are also data-dependent: data updates and revisions require the behavioral equations to be re-estimated.

The fifth version of KGEMM contains 828 annual time-series variables. Of these, 397 are endogenous: 96 are determined by behavioral equations, and the remaining 301 by identities. The endogenous variables are those for which we are interested in examining the impacts of other variables, including domestic policy variables and variables from the rest of the world. Two main types of identities appear across the blocks of the model: System of National Accounts (SNA) identities (e.g., total demand is the sum of private and government consumption, investment, and net exports) and definitional identities (e.g., nominal value added is the product of real value added and the respective price deflator). The other 431 variables are exogenous. Many of these are dummy variables that capture permanent and temporary changes in relationships (that cannot be explained by the data)Footnote 1, as well as discrepancy or error terms used to balance relationships. Rest-of-the-world variables, which provide a comprehensive picture of Saudi Arabia's global economic and energy ties, are also treated as exogenous.Footnote 2 The remaining exogenous variables are policy-related variables and energy prices.
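To make the variable bookkeeping and the two identity types concrete, the following is a minimal sketch under stated assumptions. The function names (`total_demand`, `balancing_discrepancy`, `nominal_value_added`) and all numbers are hypothetical placeholders, not KGEMM mnemonics or data; the deflator is assumed to be expressed as a ratio equal to 1.0 in the base year. It shows how an exogenous discrepancy term can close an SNA-style identity exactly on observed data, and how a definitional identity links nominal and real values.

```python
"""Hypothetical sketch of KGEMM-style identities.

Names and numbers are illustrative placeholders, not KGEMM mnemonics or data.
"""

# Variable counts reported for the fifth version of KGEMM
TOTAL_VARIABLES = 828
ENDOGENOUS = 397
BEHAVIORAL_EQUATIONS = 96
IDENTITY_DETERMINED = ENDOGENOUS - BEHAVIORAL_EQUATIONS  # endogenous variables closed by identities
EXOGENOUS = TOTAL_VARIABLES - ENDOGENOUS


def total_demand(private_cons, gov_cons, investment, net_exports, discrepancy=0.0):
    """SNA-style identity: total demand as the sum of its expenditure components.

    `discrepancy` stands in for an exogenous balancing (error) term.
    """
    return private_cons + gov_cons + investment + net_exports + discrepancy


def balancing_discrepancy(observed_total, private_cons, gov_cons, investment, net_exports):
    """Exogenous term that makes the identity hold exactly for observed data."""
    return observed_total - (private_cons + gov_cons + investment + net_exports)


def nominal_value_added(real_va, deflator):
    """Definitional identity: nominal value added = real value added x deflator.

    Assumes the deflator is a ratio equal to 1.0 in the base year.
    """
    return real_va * deflator


if __name__ == "__main__":
    print(IDENTITY_DETERMINED, EXOGENOUS)
    d = balancing_discrepancy(1000.0, 400.0, 250.0, 300.0, 45.0)
    print(d)  # 5.0
    print(total_demand(400.0, 250.0, 300.0, 45.0, d))
    print(nominal_value_added(500.0, 1.2))
```

Treating the discrepancy as an exogenous input, rather than solving for it inside the model, keeps the identity exact in the historical sample while leaving the behavioral equations to explain the components.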

The data were collected from various domestic and external sources. Most of the domestic data come from the General Authority for Statistics (GASTAT), formerly the Central Department of Statistics and Information (CDSI), and the Saudi Arabian Monetary Agency (SAMA). These two sources provide a crucial portion of the country's data. Some domestic data are collected from the Ministry of Energy (MoE), Saudi Aramco, the Ministry of Economy and Planning (MEP), and the Ministry of Finance (MoF). External data mainly come from the Oxford Economics Global Economic Model database, the World Bank, the United Nations, the International Monetary Fund, and the International Energy Agency. The KGEMM database includes both aggregate and disaggregated (sector-level) data. It contains nominal, real (usually at 2010 prices), index, ratio, and other user-calculated variables for the real, monetary, fiscal, external, and energy sectors, as well as for consumer and producer prices, the labor market, and population. The mnemonics and descriptions of the variables used in the fifth version of KGEMM are documented in Appendix B.
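Because real series in the database are usually expressed at 2010 prices, source data published at other price bases typically have to be deflated consistently. The following is a minimal sketch under stated assumptions, not the procedure used to build the KGEMM database: the helper names `to_constant_prices` and `implicit_deflator` are hypothetical, and the series and price index are illustrative placeholders. It uses real_t = nominal_t × P_2010 / P_t, so nominal and real values coincide in 2010 whatever reference year the price index uses.

```python
"""Hypothetical sketch: expressing a nominal series at constant 2010 prices.

Values and the price index are illustrative placeholders, not KGEMM data.
"""


def to_constant_prices(nominal, price_index, base_year=2010):
    """Deflate a nominal series {year: value} to constant base-year prices.

    real_t = nominal_t * P_base / P_t, where P is any price index.
    """
    if base_year not in price_index:
        raise KeyError(f"price index has no observation for base year {base_year}")
    p_base = price_index[base_year]
    return {year: value * p_base / price_index[year] for year, value in nominal.items()}


def implicit_deflator(nominal, real):
    """Implicit price deflator (equal to 1.0 in the base year) as nominal / real."""
    return {year: nominal[year] / real[year] for year in nominal}


if __name__ == "__main__":
    nominal = {2010: 100.0, 2011: 110.0, 2012: 126.0}
    # Hypothetical price index whose own reference year is not 2010
    price_index = {2010: 80.0, 2011: 88.0, 2012: 96.0}
    real = to_constant_prices(nominal, price_index)
    print(real)
    print(implicit_deflator(nominal, real))
```

The implicit deflator recovered this way satisfies the definitional identity nominal = real × deflator by construction.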