1 Introduction

International high-dimensional datasets with relatively large cross-sectional (countries/regions; \(N\)) and time (\(T\)) dimensions are frequently used in economics. These datasets provide a wealth of information to both academics and practitioners in the field, helping them to understand economic developments. Although such large datasets provide more information, they also reveal interdependencies, such as how individual economies depend on the global system through various channels. Factors like resource sharing, political developments, trade, labour, and capital movements affect relations between countries. Global shocks and technological advances can also cause economies to become interdependent. It is therefore important to take account of these interdependencies in models with a large number of variables.

Due to increased economic and financial integration among countries, macroeconomic policy spillovers have received significant attention in research over recent decades. Various approaches have been employed to address these channels: decompositions such as Kalman filtering (e.g., Canova & Marrinan, 1998; Gregory et al., 1997; Kose et al., 2003; Lumsdaine & Prasad, 2003), unobserved factor models (e.g., Bai & Ng, 2007; Giannone et al., 2005; Stock & Watson, 1999, 2002), Factor Augmented Vector Autoregressions (FAVAR) (e.g., Bernanke et al., 2005; Stock & Watson, 2005), large-scale Bayesian VARs (e.g., Banbura et al., 2010; Carriero et al., 2009; De Mol et al., 2008; Giacomini & White, 2006), and Panel VARs (PVAR) (e.g., Abrigo & Love, 2016; Canova & Ciccarelli, 2013; Love & Zicchino, 2006).

Dees et al. (2007) (DdPS) highlight the usefulness of unobserved factor models but note the difficulties in identifying these factors and assigning economic interpretations to them. DdPS also emphasize that even when all the common factors are considered, significant residual interdependencies remain unexplained due to policy and trade spillovers. In an analysis of the UK economy using the FAVAR approach, Lagana and Mountford (2005) found that while the inclusion of additional variables through factor augmentation helps to solve the price puzzle, it creates new puzzles related to the counterintuitive effects of interest rate changes on house prices, equity prices, and the exchange rate. Furthermore, Kapetanios and Pesaran (2007) conducted Monte Carlo experiments and found that estimators using cross-sectional averages perform better in capturing common correlated effects than those based on principal components.

Another widely used econometric method for analyzing high-dimensional time series data is the Global Vector Autoregressive (GVAR) modelling approach. First proposed by Pesaran et al. (2004) (PSW), it was presented as a workable solution to create a coherent global model of the world economy. DdPS developed the theoretical justification for this approach to approximate a global factor model.Footnote 1 The GVAR approach addresses the challenge of dimensionality by decomposing large-dimensional VARs into a smaller set of conditional models, which are interconnected through cross-sectional averages. Compared to Bayesian and Panel VARs, the GVAR approach provides an intuitive framework for understanding cross-country linkages and does not impose any restrictions on the dynamics of individual country sub-models. However, modelling a complex system like the global economy using the GVAR approach invariably involves many difficulties.

The most fundamental source of difficulty is the size and structure of the cross-sectional dimension \(N\). After PSW and DdPS, many studies have aimed to explain the basic variables of a target country. This involves prioritizing the model results of a single country in the dataset to which the GVAR modelling approach is applied. When the data contain a large \(N\), data management and computation in a GVAR model are very time-consuming. Hence, the complexity of a GVAR model stems from the composition of \(N\), i.e., the aggregation of the cross-sections.

Since PSW introduced the model, there have been extensive applications of cross-country aggregation in the GVAR approach. Cesa-Bianchi et al. (2012) examined the interdependence between China, Latin America, and the rest of emerging Asia, excluding China and India. They found that the long-run impact of a shock to China’s GDP on Latin American economies has tripled since the mid-1990s, while the long-run impact of a shock to U.S. GDP has decreased by about 50%. Assenmacher (2013) re-examined the DdPS dataset to model the Swiss economy with only its three largest trading partners (accounting for about 80% of trade), excluding other countries. Cashin et al. (2014) utilized the GVAR model to examine the economic effects of oil-price shocks, distinguishing between supply- and demand-driven shocks. They examined 38 countries/regions, from 1979Q2 to 2011Q2, and created two regions: the six countries of the Gulf Cooperation Council (GCC) and the Euro Area block, as in DdPS. They found that the macroeconomic consequences of oil price shocks varied significantly, depending on the source of the shock and whether the country was an oil importer or exporter.

De Waal et al. (2015) estimated two models for South Africa, one restricted to South Africa and its three main trading partners (accounting for about 55% of trade) and the other considering all countries in the DdPS data. This approach contrasts with the lower bound on the trading partners' share reported in Assenmacher (2013). Georgiadis (2016) employed a GVAR modelling approach that includes 61 economies from 1999Q1 to 2009Q4. He aggregated the economies of Estonia, Latvia, and Lithuania into a Baltic (BAL) region, while the economies of Venezuela, Ecuador, and Saudi Arabia were classified as an Oil Exporting Countries (OPC) block. His study focused on the important role of the US in the global economy, in particular the cross-border spillovers of its economic activities. The findings suggest that the economic influence of the US is as strong abroad as it is domestically, with the exception of a few economies such as some Australasian, African, and Latin American emerging markets and China. Spillovers were particularly significant for Russia, the Baltic States, Greece, Ireland, and Luxembourg. In general, spillovers to non-advanced economies were smaller than those to advanced economies, but still significant. The study raises the question of whether global welfare could be improved if US policymakers took these spillovers into account, given the role of the US dollar as the global reserve currency.

A recent study by Zahedi et al. (2022) focused on the growing influence of China on the global economy. They constructed a Global Vector Autoregressive (GVAR) dataset with data from 42 countries covering the period 2000Q1–2019Q3. The study treated China and the US as individual countries, while European countries were grouped together, and the rest were considered as the rest of the world (ROW). The findings show that China’s increasing economic prosperity and global trade share have led to statistically significant spillover effects on the global economy. This effect is tied to China’s substantial growth in investment and activity, which has boosted its global trade.

Modelling the target country and other important countries, such as a reference country, individually and aggregating the remaining countries into 5–10 blocks or regions makes the analysis more manageable. We consider the challenge of determining the appropriate dimension for cross-country aggregation in GVAR models and selecting the optimal model from the available options when using the same dataset. In our world of advanced computational capabilities, models of any size can be estimated for policy analysis. Policy-making institutions do not face practical difficulties in developing high-dimensional models and using them to generate forecasts. However, the availability of high computing power should not diminish the importance of the principle of parsimony. Therefore, in this paper, we propose an Akaike-type information criterion (AIC) and an ad hoc modification of this criterion to identify the optimal model from a set of competing models. The best approximating model is obtained by minimizing this criterion.

From a purely economic point of view, it is essential to reduce a GVAR model to its most parsimonious form. As will be discussed in detail in the subsequent sections, the GVAR approach involves a network comprising a target country, its influential economic and trade partners, and an aggregated entity referred to as ROW. For any target country, it is intuitive that the inclusion of more partners would improve the statistical quality of a GVAR. However, improvements in the statistical and analytical quality of scenario analyses do not necessarily occur simultaneously. In order to ensure a robust scenario analysis with concrete assumptions, it is important to distinguish carefully between the most impactful partners and the rest of the world. This raises the question of whether to include a partner country as a separate entity or as part of the rest of the world in the GVAR model. This paper aims to provide a reliable starting point for answering this question by proposing a useful modification of the AIC.

In this paper, we also examine individually the selection of the dimension of the GVAR model for developing countries. Our analysis is based on the updated DdPS dataset, which includes data from 33 countries from 1979Q2 to 2019Q4. The empirical applications yield findings that underscore the importance of the US, China, the Euro Area, and Japan for developing economies. As major participants in the global economy, these economies exert significant influence on developing economies worldwide through their economic relationships and interactions. They play a crucial role as major trading partners, sources of investment, and providers of technology and expertise for many developing countries.

The rest of the paper is structured as follows: Sect. 2 outlines the well-known GVAR model settings and some preliminary remarks on information criteria. Section 3 presents an AIC-type information criterion and its ad hoc version. Section 4 conducts Monte Carlo simulation studies to examine the performance of the proposed criterion. Section 5 contains a real data analysis. Section 6 provides a conclusion. The proofs are provided in the Appendix, and further details of the simulation experiments and empirical results are given in the Supplementary Materials.

2 An Overview of the GVAR Modelling Approach

In this section we provide an overview of the GVAR modelling approach due to Pesaran et al. (2004) to lay the basis for our work in the subsequent sections. We consider \(N\) countries (or regions), \(i=1,2,\dots ,N\). The aim is to explain the sources of the changes in the \(\left({k}_{i}\times 1\right)\) vector of country-specific domestic (endogenous) variables, \({\mathbf{x}}_{it}\). If a total of \(k={\sum }_{i=1}^{N}{k}_{i}\) endogenous variables are to be modelled using the standard VAR model, the problem generally known as the curse of dimensionality arises.Footnote 2 The GVAR modelling approach solves this problem in two steps. In the first step, each country-specific conditional \({{\text{VARX}}}^{\mathbf{*}}\) model, in which domestic variables are modelled as a VAR augmented by the vector of country-specific foreign variables, is estimated individually. In the second step, the estimated conditional country models are combined into a large global model.

2.1 Country-Specific \({\mathbf{V}\mathbf{A}\mathbf{R}\mathbf{X}}^{\mathbf{*}}\) Models

We consider the following \({{\text{VARX}}}^{*}\left({p}_{i},{q}_{i}\right)\) model:

$$ {\mathbf{x}}_{it} = {\mathbf{a}}_{i0} + {\mathbf{a}}_{i1} t + \mathop \sum \limits_{\ell = 1}^{{p_{i} }} {{\varvec{\Phi}}}_{i\ell } {\mathbf{x}}_{i,t - \ell } + {{\varvec{\Lambda}}}_{i0} {\mathbf{x}}_{it}^{*} + \mathop \sum \limits_{\ell = 1}^{{q_{i} }} {{\varvec{\Lambda}}}_{i\ell } {\mathbf{x}}_{i,t - \ell }^{*} + \epsilon_{it} $$
(2.1)

where \({\mathbf{x}}_{it}\) is the \(\left({k}_{i}\times 1\right)\) vector of country-specific domestic variables and \({\mathbf{x}}_{it}^{*}\) is the \(\left({k}_{i}^{*}\times 1\right)\) vector of foreign variables.Footnote 3 \({{\varvec{\Phi}}}_{i{\ell}}\) are \(\left({k}_{i}\times {k}_{i}\right)\) matrices of lagged coefficients, and \({{\varvec{\Lambda}}}_{i{\ell}}\) are \(\left({k}_{i}\times {k}_{i}^{*}\right)\) matrices of coefficients associated with the foreign-specific variables.

The foreign variables, denoted by \({\mathbf{x}}_{it}^{*}\), are weighted averages of the domestic variables of the other countries:

$$ {\mathbf{x}}_{it}^{*} = \mathop \sum \limits_{n = 1}^{N} w_{in} {\mathbf{x}}_{nt} $$

where \({w}_{in}\) is a set of weights with \({w}_{ii}=0\) and \({\sum }_{n=1}^{N}{w}_{in}=1\). \({\epsilon }_{it}\) is a \(\left({k}_{i}\times 1\right)\) vector of country-specific errors with \(E[{\epsilon }_{it}]=0\) and a nonsingular \(\left({k}_{i}\times {k}_{i}\right)\) covariance matrix \({{\varvec{\Sigma}}}_{ii}\). The covariance between the shocks to the \(h\)th and \({h}^{\prime}\)th variables of the \(i\)th country is \(\sigma_{{ii,hh^{\prime } }} = cov\left( {\epsilon_{iht} ,\epsilon_{{ih^{\prime } t}} } \right)\). More specifically,

$${\epsilon }_{it}\sim IID\left(0,{{\varvec{\Sigma}}}_{ii}\right)$$
(2.2)

where \({{\varvec{\Sigma}}}_{ii} = \left( {\sigma_{{ii,hh^{\prime } }} } \right)\), and the cross-sectional dependence of shocks between the \(i\)th and \(n\)th countries is expressed as

$$ {{\varvec{\Sigma}}}_{in} = cov\left( {\epsilon_{it} ,\epsilon_{nt} } \right) = E\left( {\epsilon_{it} \epsilon_{nt}^{\prime } } \right) $$
(2.3)

where \(i \ne n\) and \({{\varvec{\Sigma}}}_{in} = \left( {\sigma_{{in,hh^{\prime } }} } \right)\).
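The construction of the foreign variables as weighted averages can be illustrated with a short numerical sketch. It is not part of the original derivation: the function name `foreign_variables`, the array layout, and the trade figures are hypothetical, and for simplicity it assumes that every country has the same \(k\) domestic variables, so that \(k_{i}^{*}=k_{i}=k\).

```python
import numpy as np

def foreign_variables(x, w):
    """Compute x*_it = sum_n w_in x_nt for every country i and period t.

    x : array of shape (T, N, k), domestic variables (assumed layout)
    w : array of shape (N, N), weights with w_ii = 0 and unit row sums
    """
    w = np.asarray(w, dtype=float)
    if not np.allclose(np.diag(w), 0.0):
        raise ValueError("own-country weights w_ii must be zero")
    if not np.allclose(w.sum(axis=1), 1.0):
        raise ValueError("each row of the weight matrix must sum to one")
    # Contract over the partner index n: (N, N) x (T, N, k) -> (T, N, k)
    return np.einsum("in,tnk->tik", w, x)

# Hypothetical example: 3 countries, 2 variables, 8 periods
rng = np.random.default_rng(0)
T, N, k = 8, 3, 2
x = rng.normal(size=(T, N, k))
trade = np.array([[0.0, 2.0, 2.0],
                  [3.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])       # made-up bilateral trade flows
w = trade / trade.sum(axis=1, keepdims=True)
x_star = foreign_variables(x, w)
```

Normalising each row of the (made-up) trade matrix enforces \({\sum }_{n=1}^{N}{w}_{in}=1\), and the zero diagonal enforces \({w}_{ii}=0\).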

The parameters of the \({\text{VARX}}^{*} \left( {p_{i} ,q_{i} } \right)\) model in Eq. (2.1) are estimated separately for each country. To do this, the \(\left( {k_{i} + k_{i}^{*} } \right) \times 1\) dimensional vector of domestic and country-specific foreign variables, \({\mathbf{z}}_{it} = ({\mathbf{x}}_{it}^{\prime } ,{\mathbf{x}}_{it}^{{{*}\prime }} )^{\prime }\), is defined and Eq. (2.1) is rewritten as

$$ {\mathbf{A}}_{i0} {\mathbf{z}}_{it} = {\mathbf{a}}_{i0} + {\mathbf{a}}_{i1} t + \mathop \sum \limits_{\ell = 1}^{p} {\mathbf{A}}_{i\ell } {\mathbf{z}}_{i,t - \ell } + \epsilon_{it} $$
(2.4)

where \({\mathbf{A}}_{i0}=\left({\mathbf{I}}_{{k}_{i}},-{{\varvec{\Lambda}}}_{i0}\right),{\mathbf{A}}_{i{\ell}}=\left({{\varvec{\Phi}}}_{i{\ell}},{{\varvec{\Lambda}}}_{i{\ell}}\right), p={{\text{max}}}_{i}\left({p}_{i},{q}_{i}\right)\), and, for \({\ell}>{p}_{i}\), \({{\varvec{\Phi}}}_{i{\ell}}=0\), and, for \({\ell}>{q}_{i}\), \({{\varvec{\Lambda}}}_{i{\ell}}=0\).

The error-correction representation, \({\text{VECMX}}^{*}\), of Eq. (2.1) can be equivalently written as

$$ {\Delta }{\mathbf{x}}_{it} = {\mathbf{a}}_{i0} + {\mathbf{a}}_{i1} t + {{\varvec{\Lambda}}}_{i0} {\Delta }{\mathbf{x}}_{it}^{*} - {{\varvec{\Pi}}}_{i} {\mathbf{z}}_{i,t - 1} + \mathop \sum \limits_{\ell = 1}^{p} {\mathbf{H}}_{i\ell } {\Delta }{\mathbf{z}}_{i,t - \ell } + \varepsilon_{it} $$
(2.5)

where

$$ {{\varvec{\Pi}}}_{i} = {\mathbf{A}}_{i0} - \mathop \sum \limits_{\ell = 1}^{p} {\mathbf{A}}_{i\ell } ,\quad {\text{and}}\quad {\mathbf{H}}_{i\ell } = - \left( {{\mathbf{A}}_{i,\ell + 1} + {\mathbf{A}}_{i,\ell + 2} + \cdots + {\mathbf{A}}_{i,\ell + p} } \right). $$

The estimation theory of VAR models with weakly exogenous \(I(1)\) variables was developed by Harbo et al. (1998) and PSW. PSW indicate that the weak exogeneity assumption can be tested. This assumption is generally not rejected when the weights used to create the starred (\(*\)) variables are granular,Footnote 4 and when the target country’s economy is small compared to the rest of the world.

With an unrestricted constant and a restricted trend coefficient, the error correction model in Eq. (2.5) can be written as

$$ {\Delta }{\mathbf{x}}_{it} = {\mathbf{c}}_{i0} - {\varvec{\alpha}}_{i} {\varvec{\beta}}_{i}^{\prime } \left[ {{\mathbf{z}}_{i,t - 1} - \gamma_{i} \left( {t - 1} \right)} \right] + {{\varvec{\Lambda}}}_{i0} {\Delta }{\mathbf{x}}_{it}^{*} + \mathop \sum \limits_{\ell = 1}^{p} {{\varvec{\Gamma}}}_{i\ell } {\Delta }{\mathbf{z}}_{i,t - \ell } + \epsilon_{it} $$
(2.6)

where, for all \(i=1,\dots ,N\), the dimensions of \({\boldsymbol{\alpha }}_{i}\) and \({{\varvec{\beta}}}_{i}\) are \({k}_{i}\times {r}_{i}\) and \(({k}_{i}+{k}_{i}^{*})\times {r}_{i}\), respectively, and both matrices have full column rank, \(rank({\boldsymbol{\alpha }}_{i})=rank({{\varvec{\beta}}}_{i})={r}_{i}\).

By partitioning \({\varvec{\beta}}_{i}\) as \({\varvec{\beta}}_{i} = \left( {{\varvec{\beta}}_{{i{\mathbf{x}}}}^{\prime } ,{\varvec{\beta}}_{{i{\mathbf{x}}{*}}}^{\prime } } \right)^{\prime }\), the model allows for cointegration both within domestic variables and between domestic and foreign variables, and consequently across domestic variables of different countries:

$$ {\varvec{\beta}}_{i}^{\prime } \left( {{\mathbf{z}}_{it} - \gamma_{i} t} \right) = {\varvec{\beta}}_{ix}^{\prime } {\mathbf{x}}_{it} + {\varvec{\beta}}_{{ix{*}}}^{\prime } {\mathbf{x}}_{it}^{*} - \left( {{\varvec{\beta}}_{i}^{\prime } {\varvec{\gamma}}_{i} } \right)t $$

The error correction model in Eq. (2.6) is estimated by reduced rank regression.Footnote 5

2.2 Solution Strategy

In the second step of the GVAR approach, the individually estimated country models are combined to create a large global VAR model. For this purpose, “link” matrices \({\mathbf{W}}_{i}\), which contain the country-specific weights, are used. For all \(i\), the dimension of \({\mathbf{W}}_{i}\) is \(\left({k}_{i}+{k}_{i}^{*}\right)\times k\), where \(k={\sum }_{i=1}^{N}{k}_{i}\) and \({k}^{*}={\sum }_{i=1}^{N}{k}_{i}^{*}\) are the total numbers of endogenous and exogenous variables in the entire system, respectively. This arrangement enables the rewriting of the global model in terms of \({\mathbf{x}}_{t}\), which is a \(\left(k\times 1\right)\) vector of all endogenous variables:

$${\mathbf{z}}_{it}={\mathbf{W}}_{i}{\mathbf{x}}_{t}$$
(2.7)

Substituting the right-hand side of this expression into Eq. (2.4) produces

$$ {\mathbf{A}}_{i0} {\mathbf{W}}_{i} {\mathbf{x}}_{t} = {\mathbf{a}}_{i0} + {\mathbf{a}}_{i1} t + \mathop \sum \limits_{\ell = 1}^{p} {\mathbf{A}}_{i\ell } {\mathbf{W}}_{i} {\mathbf{x}}_{t - \ell } +\epsilon_{it} $$
(2.8)

Stacking these equations for \(i=1,\dots ,N\) gives

$$ {\mathbf{G}}_{0} {\mathbf{x}}_{t} = {\mathbf{a}}_{0} + {\mathbf{a}}_{1} t + \mathop \sum \limits_{\ell = 1}^{p} {\mathbf{G}}_{\ell } {\mathbf{x}}_{t - \ell } +\epsilon_{t} $$
(2.9)

where, for \(\ell = 0,1,2, \ldots ,p\),

$$ {\mathbf{G}}_{\ell } = \left( {\begin{array}{*{20}c} {{\mathbf{A}}_{1,\ell } {\mathbf{W}}_{1} } \\ {{\mathbf{A}}_{2,\ell } {\mathbf{W}}_{2} } \\ \vdots \\ {{\mathbf{A}}_{N,\ell } {\mathbf{W}}_{N} } \\ \end{array} } \right),\;{\mathbf{a}}_{0} = \left( {\begin{array}{*{20}c} {{\mathbf{a}}_{10} } \\ {{\mathbf{a}}_{20} } \\ \vdots \\ {{\mathbf{a}}_{N0} } \\ \end{array} } \right),{\mathbf{a}}_{1} = \left( {\begin{array}{*{20}c} {{\mathbf{a}}_{11} } \\ {{\mathbf{a}}_{21} } \\ \vdots \\ {{\mathbf{a}}_{N1} } \\ \end{array} } \right)\;{\text{and}}\;\epsilon_{t} = \left( {\begin{array}{*{20}c} {\epsilon_{1t} } \\ {\epsilon_{2t} } \\ \vdots \\ {\epsilon_{Nt} } \\ \end{array} } \right) $$

If \({\mathbf{G}}_{0}\) is a non-singular matrix, which depends on the weight matrices and the parameter estimates, the GVAR(\(p\)) model is obtained by premultiplying both sides of Eq. (2.9) by \({\mathbf{G}}_{0}^{-1}\):

$$ {\mathbf{x}}_{t} = {\mathbf{b}}_{0} + {\mathbf{b}}_{1} t + \mathop \sum \limits_{\ell = 1}^{p} {\mathbf{H}}_{\ell } {\mathbf{x}}_{t - \ell } + {\mathbf{u}}_{t} $$
(2.10)

where

$$\begin{array}{l}{\mathbf{b}}_{0}={\mathbf{G}}_{0}^{-1}{\mathbf{a}}_{0},{\mathbf{b}}_{1}={\mathbf{G}}_{0}^{-1}{\mathbf{a}}_{1}\\ {\mathbf{H}}_{{\ell}}={\mathbf{G}}_{0}^{-1}{\mathbf{G}}_{{\ell}}, {\mathbf{u}}_{t}={\mathbf{G}}_{0}^{-1}{\epsilon }_{t}\end{array}$$

PSW state that the number of cointegrating relations in the global model cannot exceed the sum of the numbers of long-run cointegrating relations in the conditional country models.
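The solution strategy of Eqs. (2.7)–(2.10) can be sketched numerically as follows. This is only an illustration under simplifying assumptions that are not in the text: every country has the same \(k_{0}\) domestic variables with \(k_{i}^{*}=k_{i}=k_{0}\), deterministic terms are omitted, and the coefficient values are made up rather than estimated. The helper names `link_matrix` and `solve_gvar` are hypothetical.

```python
import numpy as np

def link_matrix(i, w, k0):
    """Build W_i of dimension (2*k0) x (N*k0) such that z_it = W_i x_t."""
    N = w.shape[0]
    select = np.zeros((k0, N * k0))
    select[:, i * k0:(i + 1) * k0] = np.eye(k0)      # picks x_it out of x_t
    foreign = np.kron(w[i][None, :], np.eye(k0))     # forms x*_it = sum_n w_in x_nt
    return np.vstack([select, foreign])

def solve_gvar(Phi, Lam0, Lam, w):
    """Stack country models into G_0, G_l and return H_l = G_0^{-1} G_l.

    Phi, Lam : arrays of shape (N, p, k0, k0); Lam0 : array of shape (N, k0, k0)
    """
    N, p, k0, _ = Phi.shape
    W = [link_matrix(i, w, k0) for i in range(N)]
    # A_i0 = (I, -Lambda_i0) and A_il = (Phi_il, Lambda_il), as in Eq. (2.4)
    G0 = np.vstack([np.hstack([np.eye(k0), -Lam0[i]]) @ W[i] for i in range(N)])
    G = [np.vstack([np.hstack([Phi[i, l], Lam[i, l]]) @ W[i] for i in range(N)])
         for l in range(p)]
    # Solve G0 H_l = G_l instead of forming the inverse explicitly
    H = [np.linalg.solve(G0, G_l) for G_l in G]
    return G0, G, H

# Made-up coefficients for N = 3 countries, k0 = 2 variables, p = 1 lag
N, k0, p = 3, 2, 1
w = np.array([[0.0, 0.5, 0.5],
              [0.75, 0.0, 0.25],
              [0.5, 0.5, 0.0]])
Phi = np.tile(0.5 * np.eye(k0), (N, p, 1, 1))
Lam0 = np.tile(0.3 * np.eye(k0), (N, 1, 1))
Lam = np.tile(0.1 * np.eye(k0), (N, p, 1, 1))
G0, G, H = solve_gvar(Phi, Lam0, Lam, w)
```

Because \({w}_{ii}=0\), the diagonal blocks of \({\mathbf{G}}_{0}\) are identity matrices in this sketch; whether \({\mathbf{G}}_{0}\) is non-singular still depends on the weights and on the contemporaneous coefficients.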

2.3 Cross-Country Aggregation in GVAR Modelling

The GVAR modelling approach is a flexible method for taking into account various global links. However, modelling a large number of countries can pose difficulties in terms of data management, computation and data analysis. To make the analysis more manageable, one possible approach is to model the countries of high importance individually, and aggregate the remaining countries into one or more different regions. This section discusses how to perform such regional aggregation.

Suppose the dataset contains \(R\) countries/regions, \(i=1, 2,\dots ,R\), and the \(i\)th region is composed of \({N}_{i}\) countries. Let \({\mathbf{x}}_{ijt}\) and \({\mathbf{x}}_{ijt}^{*}\), for \(j=1,2,\dots ,{N}_{i}\), denote the vectors of country-specific domestic and foreign variables in the \(j\)th country of the \(i\)th region, respectively. In this case, Eq. (2.1) can be rewritten as:

$$ {\mathbf{x}}_{ijt} = {\mathbf{a}}_{ij0} + {\mathbf{a}}_{ij1} t + \mathop \sum \limits_{\ell = 1}^{{p_{i} }} {{\varvec{\Phi}}}_{ij\ell } {\mathbf{x}}_{ij,t - \ell } + {{\varvec{\Lambda}}}_{ij0} {\mathbf{x}}_{ijt}^{*} + \mathop \sum \limits_{\ell = 1}^{{q_{i} }} {{\varvec{\Lambda}}}_{ij\ell } {\mathbf{x}}_{ij,t - \ell }^{*} +\epsilon_{ijt} $$
(2.11)

Here, the source of heterogeneity in the parameter vectors \({{\varvec{\Phi}}}_{ij{\ell}}\), \({{\varvec{\Lambda}}}_{ij0}\) and \({{\varvec{\Lambda}}}_{ij{\ell}}\) comes from distinct country-specific model coefficients. Regional weights are used to aggregate the model.

The weight of the \(j\) th country in the \(i\) th region is \({w}_{ij}^{0}\), and the sum of the weights of the countries in a region is \(\sum_{j=1}^{{N}_{i}}{w}_{ij}^{0}=1.\) Using the weighted regional variables,

$${\mathbf{x}}_{it}=\sum_{j=1}^{{N}_{i}}{w}_{ij}^{0}{\mathbf{x}}_{ijt}$$

we can derive the regional model similar to the \({{\text{VARX}}}^{\mathbf{*}}\) model provided in Eq. (2.1):

$$ {\mathbf{x}}_{it} = {\mathbf{a}}_{i0} + {\mathbf{a}}_{i1} t + \mathop \sum \limits_{\ell = 1}^{{p_{i} }} {{\varvec{\Phi}}}_{i\ell } {\mathbf{x}}_{i,t - \ell } + {{\varvec{\Lambda}}}_{i0} {\mathbf{x}}_{it}^{*} + \mathop \sum \limits_{\ell = 1}^{{q_{i} }} {{\varvec{\Lambda}}}_{i\ell } {\mathbf{x}}_{i,t - \ell }^{*} +\epsilon_{it} $$
(2.12)

where

$$ {\mathbf{a}}_{i0} = \mathop \sum \limits_{j = 1}^{{N_{i} }} w_{ij}^{0} {\mathbf{a}}_{ij0} ,\quad {\mathbf{a}}_{i1} = \mathop \sum \limits_{j = 1}^{{N_{i} }} w_{ij}^{0} {\mathbf{a}}_{ij1} $$

and

$$ \epsilon_{it} = \sum\limits_{j = 1}^{N_{i}} w_{ij}^{0} \epsilon_{ijt} $$
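A minimal sketch of the regional aggregation \({\mathbf{x}}_{it}=\sum_{j=1}^{{N}_{i}}{w}_{ij}^{0}{\mathbf{x}}_{ijt}\) is given below. The function name `aggregate_region`, the array layout, and the size measures used to form the weights are assumptions made for illustration only; the text does not specify how \({w}_{ij}^{0}\) is measured.

```python
import numpy as np

def aggregate_region(x_members, w0):
    """Aggregate member-country series into one regional series.

    x_members : array of shape (T, N_i, k), variables of the N_i member countries
    w0        : array of shape (N_i,), regional weights w_ij^0 summing to one
    """
    w0 = np.asarray(w0, dtype=float)
    if not np.isclose(w0.sum(), 1.0):
        raise ValueError("regional weights must sum to one")
    # Weighted sum over the member index j: (N_i,) x (T, N_i, k) -> (T, k)
    return np.einsum("j,tjk->tk", w0, x_members)

# Hypothetical region with 3 member countries, 2 variables, 6 periods
rng = np.random.default_rng(2)
x_members = rng.normal(size=(6, 3, 2))
size = np.array([4.0, 1.0, 5.0])       # made-up size measures for the members
w0 = size / size.sum()
x_region = aggregate_region(x_members, w0)
```

The same weights applied to the member-country intercepts, trends, and errors give the regional quantities \({\mathbf{a}}_{i0}\), \({\mathbf{a}}_{i1}\) and \({\epsilon }_{it}\) defined above.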

Region-specific foreign variables, \({\mathbf{x}}_{it}^{*}\), can be formed using regional trade weights or country-specific trade weights. After PSW and DdPS, most studies in the literature focused on explaining the basic variables of a target country. This raises the question of how to organize countries other than the target one in the cross-section dimension. Assenmacher (2013) modelled Switzerland together with only three trading partners, using the DdPS dataset (GVAR-DdPS). The model included the three countries with the largest trade shares (approximately 80% in total) and excluded the others. This more restricted version of the GVAR-DdPS dataset was justified by its ease of analysis, with fewer parameters estimated for a smaller number of countries. However, the study did not elaborate on alternative model specifications or the reasons for choosing one.

Similarly, De Waal et al. (2015) aimed to explain the South African economy using two forms of GVAR-DdPS, restricted and unrestricted. The restricted model included GVAR-DdPS data only for South Africa and its three main trading partners (approximately 55% of trade), while the unrestricted model used all GVAR-DdPS data. This setup contrasts with the lower limit of the trade partnership share in Assenmacher (2013). The study stated that reaching a trade share closer to 80% for South Africa would require nearly the entire GVAR-DdPS dataset, eliminating the distinction between the restricted and unrestricted model. However, no conclusion was reached as to which of the two models should be selected.

A common feature of these studies is the exclusion of what is referred to in the models as the ‘rest of the world’. This refers to the aggregation of countries into a region, achieved by restricting the country dimension in the dataset, which may lead to model specification errors. Competing models with different country components are generally not considered when explaining the dynamics of the target country’s economy. To the best of our knowledge, the literature on the GVAR modelling approach has not explored the strengths and weaknesses of multiple potential models derived from geographical, cultural or political regions. Given multiple model options, choosing the most suitable one is a model selection problem. The selection of the most appropriate model among GVAR models containing different countries and/or regions, all derived from the same dataset, is identified as a significant gap in the literature. Model selection criteria can therefore be used to eliminate this uncertainty. Specifically, an Akaike-type information criterion and an ad hoc modification of this criterion are proposed to allow selection among all possible GVAR model specifications that can be generated within the same dataset.

3 Information Criteria in Model Selection of GVAR

Based on the framework of Pesaran et al. (2004) outlined in Sect. 2, we introduce our proposed information criteria in this section. We first adopt the general principle of Gourieroux and Monfort (1995) to derive the Akaike-type information criterion. To simplify the computations, we assume that \({p}_{i}={q}_{i}=p\) and that the cross-sectional dimension contains \(N\) countries (or regions), with \(i\in {\mathcal{S}}_{(N)}\equiv \{1,\dots ,N\}\subseteq {\mathbb{N}}\).Footnote 6 The vector \({\mathbf{z}}_{it}\) was defined for Eq. (2.4). Its system-wide counterpart, which contains all variables in the whole system, can be expressed as \({\mathbf{z}}_{t} = \left( {{\mathbf{x}}_{t}^{\prime } ,{\mathbf{x}}_{t}^{{{*}\prime }} } \right)^{\prime }\), where \({\mathbf{x}}_{t}\) and \({\mathbf{x}}_{t}^{*}\) are \(\left(k\times 1\right)\) and \(\left({k}^{*}\times 1\right)\) vectors, respectively, compatible with the definitions given in Sect. 2.2. The VAR(\(p\)) model for \({\mathbf{z}}_{t}\), namely

$$ {\mathbf{z}}_{t} = \mathop \sum \limits_{\ell = 1}^{p} {{\varvec{\Phi}}}_{\ell } {\mathbf{z}}_{t - \ell } + {\mathbf{a}}_{0} + {\mathbf{a}}_{1} t +\epsilon_{t} $$
(3.1)

is subject to the curse of dimensionality introduced in Sect. 2.1. The unit root and cointegration properties of \({\mathbf{z}}_{t}\) are expressed with the vector error correction (VEC) model structure of the VAR(\(p\)) model:

$$ {\Delta }{\mathbf{z}}_{t} = - {\Pi }{\mathbf{z}}_{t - 1} + \mathop \sum \limits_{\ell = 1}^{p - 1} {\Gamma }_{z\ell } {\Delta }{\mathbf{z}}_{t - \ell } + {\mathbf{a}}_{0} + {\mathbf{a}}_{1} t +\epsilon_{t} $$
(3.2)

where

$$ {\Pi } = {\mathbf{I}}_{{\left( {k + k^{*} } \right)}} - \mathop \sum \limits_{\ell = 1}^{p} {{\varvec{\Phi}}}_{\ell } ,\; {\Gamma }_{z\ell } = - \mathop \sum \limits_{\tau = \ell + 1}^{p} {{\varvec{\Phi}}}_{\tau } $$
(3.3)

for \({\ell}=1,\dots ,p-1\), \({\left\{{\Gamma }_{z{\ell}}\right\}}_{{\ell}=1}^{p-1}\) and \(\Pi \) are \((k+{k}^{*})\times (k+{k}^{*})\) matrices of short and long run responses.
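The mapping in Eq. (3.3) from the VAR(\(p\)) coefficients to the VEC coefficients can be checked numerically. The sketch below uses made-up coefficient matrices, omits the deterministic terms, and uses a hypothetical helper name `var_to_vec`; it verifies that Eqs. (3.1) and (3.2) produce the same \(\Delta {\mathbf{z}}_{t}\) for an arbitrary history of \({\mathbf{z}}_{t}\).

```python
import numpy as np

def var_to_vec(Phi):
    """Map VAR(p) coefficients Phi (shape (p, m, m)) to (Pi, Gamma) of Eq. (3.3)."""
    p, m, _ = Phi.shape
    Pi = np.eye(m) - Phi.sum(axis=0)            # Pi = I - sum_l Phi_l
    Gamma = np.zeros((p - 1, m, m))
    for l in range(p - 1):                      # Gamma_l = -sum_{tau > l} Phi_tau
        Gamma[l] = -Phi[l + 1:].sum(axis=0)
    return Pi, Gamma

rng = np.random.default_rng(1)
p, m = 3, 4
Phi = 0.2 * rng.normal(size=(p, m, m))          # made-up VAR coefficients
Pi, Gamma = var_to_vec(Phi)

# Arbitrary history: z_lags[l] holds z_{t-1-l} for l = 0, ..., p-1
z_lags = rng.normal(size=(p, m))
eps = rng.normal(size=m)

# VAR form (3.1) without deterministic terms
z_t = sum(Phi[l] @ z_lags[l] for l in range(p)) + eps
dz_var = z_t - z_lags[0]

# VEC form (3.2) without deterministic terms
dz_lags = z_lags[:-1] - z_lags[1:]              # Delta z_{t-1}, ..., Delta z_{t-p+1}
dz_vec = -Pi @ z_lags[0] + sum(Gamma[l] @ dz_lags[l] for l in range(p - 1)) + eps
```

Both forms give the same first difference, so `dz_var` and `dz_vec` agree up to floating-point error.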

3.1 Information Criterion

The \((k+{k}^{*})\times 1\) dimensional vector \({\epsilon }_{t}\) in Eq. (3.1) is the system-wide counterpart of the country-specific errors \({\epsilon }_{it}\) given in Eq. (2.2). We make the following assumption.

Assumption 1

(Weak Cross-Sectional Dependence in Errors). \({\epsilon }_{t}\) is independently and identically multivariate normally distributed, \({\epsilon }_{t}\sim IIDN(0,\boldsymbol{\Sigma })\), and \(\boldsymbol{\Sigma }\) is positive definite, \(\parallel \boldsymbol{\Sigma }{\parallel }_{2}>0\).

Remark 1

High-dimensional time series models, such as GVAR, are widely subject to cross-sectional dependence in the errors. In addition, Assumption 1 allows the likelihood analysis of the VEC model given in (3.2) and the conditional expression of the model.Footnote 7

When \({\epsilon }_{t}\) is partitioned conformably with \({\mathbf{z}}_{t} = \left( {{\mathbf{x}}_{t}^{\prime } ,{\mathbf{x}}_{t}^{{{*}\prime }} } \right)^{\prime }\), that is, \( \epsilon _{t} = \left( {\epsilon _{{xt}}^{\prime } ,\;\epsilon _{{x*t}}^{\prime } } \right)^{\prime } \), the variance–covariance matrix of the model in accordance with Assumption 1 is

$$ E\left( {\epsilon_{t} \epsilon_{t}^{\prime } } \right) = {{\varvec{\Sigma}}} = \left( {\begin{array}{*{20}c} {{{\varvec{\Sigma}}}_{xx} } & {{{\varvec{\Sigma}}}_{xx*} } \\ {{{\varvec{\Sigma}}}_{x*x} } & {{{\varvec{\Sigma}}}_{x*x*} } \\ \end{array} } \right) $$

where the dimensions of the \({{\varvec{\Sigma}}}_{xx}\), \({{\varvec{\Sigma}}}_{xx*}\), \({{\varvec{\Sigma}}}_{x*x}\) and \({{\varvec{\Sigma}}}_{x*x*}\) matrices are \(\left(k\times k\right)\), \(\left(k\times {k}^{*}\right)\), \(\left({k}^{*}\times k\right)\) and \(\left({k}^{*}\times {k}^{*}\right)\), respectively. Then, \({\epsilon }_{xt}\) can be expressed in terms of \({\epsilon }_{x*t}\):Footnote 8

$$ \epsilon_{xt} = {{\varvec{\Sigma}}}_{xx*} {{\varvec{\Sigma}}}_{x*x*}^{ - 1} \epsilon_{x*t} + {\varvec{v}}_{t} $$
(3.4)

where \({{\varvec{v}}}_{t}\sim IIDN(0,{{\varvec{\Sigma}}}_{{\varvec{v}}{\varvec{v}}})\), \({{\varvec{\Sigma}}}_{{\varvec{v}}{\varvec{v}}}={{\varvec{\Sigma}}}_{xx}-{{\varvec{\Sigma}}}_{xx*}{{\varvec{\Sigma}}}_{x*x*}^{-1}{{\varvec{\Sigma}}}_{x*x}\), and \({{\varvec{v}}}_{t}\) and \({\epsilon }_{x*t}\) are uncorrelated.
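The decomposition in Eq. (3.4) can be illustrated with a small numerical sketch. The covariance matrix below is made up, and the helper name `conditional_split` is hypothetical; the sketch computes the projection coefficient \({{\varvec{\Sigma}}}_{xx*}{{\varvec{\Sigma}}}_{x*x*}^{-1}\) and the conditional covariance \({{\varvec{\Sigma}}}_{{\varvec{v}}{\varvec{v}}}\), and checks that the implied covariance between \({{\varvec{v}}}_{t}\) and \({\epsilon }_{x*t}\) is zero.

```python
import numpy as np

def conditional_split(Sigma, k):
    """Partition Sigma and return (B, Sigma_vv) with B = Sigma_xx* Sigma_x*x*^{-1}."""
    S_xx, S_xs = Sigma[:k, :k], Sigma[:k, k:]
    S_sx, S_ss = Sigma[k:, :k], Sigma[k:, k:]
    # Sigma_x*x* is symmetric, so (Sigma_x*x*^{-1} Sigma_x*x)' = Sigma_xx* Sigma_x*x*^{-1}
    B = np.linalg.solve(S_ss, S_sx).T
    Sigma_vv = S_xx - B @ S_sx
    return B, Sigma_vv

rng = np.random.default_rng(4)
k, k_star = 3, 2
M = rng.normal(size=(k + k_star, k + k_star))
Sigma = M @ M.T + np.eye(k + k_star)            # made-up positive definite matrix

B, Sigma_vv = conditional_split(Sigma, k)

# Population covariance of v_t = eps_x - B eps_x* with eps_x*:
# Sigma_xx* - B Sigma_x*x*, which should vanish
cov_v_eps_star = Sigma[:k, k:] - B @ Sigma[k:, k:]
```

Since \({{\varvec{\Sigma}}}_{{\varvec{v}}{\varvec{v}}}\) is the Schur complement of \({{\varvec{\Sigma}}}_{x*x*}\) in a positive definite \({{\varvec{\Sigma}}}\), it is itself positive definite.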

For \(\Delta {\mathbf{x}}_{t}\), we have the conditional model:

$$ {\Delta }{\mathbf{x}}_{t} = - {{\varvec{\Pi}}}_{{xx,x{*}}} {\mathbf{z}}_{t - 1} + {{\varvec{\Lambda}}}_{0} {\Delta }{\mathbf{x}}_{t}^{*} + \mathop \sum \limits_{\ell = 1}^{p - 1} {{\varvec{\Lambda}}}_{\ell } {\Delta }{\mathbf{z}}_{t - \ell } + {\mathbf{c}}_{0} + {\mathbf{c}}_{1} t + {\varvec{v}}_{t} $$
(3.5)

where \({{\varvec{\Pi}}}_{xx,x*}={{\varvec{\Pi}}}_{x}-{{\varvec{\Sigma}}}_{xx*}{{\varvec{\Sigma}}}_{x*x*}^{-1}{{\varvec{\Pi}}}_{x*}\), \({{\varvec{\Lambda}}}_{0}={{\varvec{\Sigma}}}_{xx*}{{\varvec{\Sigma}}}_{x*x*}^{-1}\), for \({\ell}=1,\dots p-1\), \({{\varvec{\Lambda}}}_{{\ell}}={\Gamma }_{{\ell}}-{{\varvec{\Sigma}}}_{xx*}{{\varvec{\Sigma}}}_{x*x*}^{-1}{\Gamma }_{{\ell}}^{*}\), \({\mathbf{c}}_{0}={\mathbf{a}}_{\mathbf{x}0}-{{\varvec{\Sigma}}}_{xx*}{{\varvec{\Sigma}}}_{x*x*}^{-1}{\mathbf{a}}_{\mathbf{x}\mathbf{*}0}\) and \({\mathbf{c}}_{1}={\mathbf{a}}_{\mathbf{x}1}-{{\varvec{\Sigma}}}_{xx*}{{\varvec{\Sigma}}}_{x*x*}^{-1}{\mathbf{a}}_{\mathbf{x}\mathbf{*}1}\).

Following Pesaran et al. (2000), we assume that \({\left\{{\mathbf{x}}_{t}^{*}\right\}}_{t=1}^{\infty }\) is weakly exogenous, namely

$${{\varvec{\Pi}}}_{x*}=0$$

so \({{\varvec{\Pi}}}_{xx,x*}={{\varvec{\Pi}}}_{x}\).

More specifically, we have

$$ {\Delta }{\mathbf{x}}_{t} = - {{\varvec{\Pi}}}_{x} {\mathbf{z}}_{t - 1} + {{\varvec{\Lambda}}}_{0} {\Delta }{\mathbf{x}}_{t}^{*} + \mathop \sum \limits_{\ell = 1}^{p - 1} {{\varvec{\Lambda}}}_{\ell } {\Delta }{\mathbf{z}}_{t - \ell } + {\mathbf{c}}_{0} + {\mathbf{c}}_{1} t + {\varvec{v}}_{t} $$
(3.6)

and

$$ {\Delta }{\mathbf{x}}_{t}^{*} = \mathop \sum \limits_{\ell = 1}^{p - 1} {\Gamma }_{\ell }^{*} {\Delta }{\mathbf{z}}_{t - \ell } + {\mathbf{a}}_{{{\mathbf{x}}*0}} +\epsilon_{{x{*}t}} $$
(3.7)

Garratt et al. (2006) state that trend coefficients must be restricted in order to eliminate the quadratic trend and ensure that the trend in the deterministic part of \({\mathbf{x}}_{t}\) is linear. In this case, \({\mathbf{a}}_{\mathbf{x}\mathbf{*}1}=0\) and so \({\mathbf{c}}_{1}={\mathbf{a}}_{\mathbf{x}1}\).

The equation system given in Eq. (3.2) has different dynamics for the endogenous variables, \(\Delta {\mathbf{x}}_{t}\), and for the exogenous variables, \(\Delta {\mathbf{x}}_{t}^{*}\), and thus consists of two parts. Granger and Lin (1995) and Pesaran et al. (2000) describe variables with \(I(1)\) characteristics associated with the concept of weak exogeneity as “long-run forcing”.

The \({\left\{{\mathbf{x}}_{t}\right\}}_{t=1}^{\infty }\) and \({\left\{{\mathbf{x}}_{t}^{*}\right\}}_{t=1}^{\infty }\) processes are consistent with the conditional and the marginal model definitions, respectively.Footnote 9 Thus, the information criteria proposed for the GVAR modelling approach can be considered within the framework of the likelihood approach described by Pesaran et al. (2000).

To simplify the derivation, the unrestricted intercept and restricted trend option given in Eq. (2.6), with \({\mathbf{c}}_{0}\ne 0\) and \({\mathbf{c}}_{1}={{\varvec{\Pi}}}_{x}\gamma \), is used in Eq. (3.6), and we have

$$ {\Delta }{\mathbf{x}}_{t} = {\mathbf{c}}_{0} + {{\varvec{\Pi}}}_{{\tilde{x}}} {\tilde{\mathbf{z}}}_{t - 1} + {{\varvec{\Lambda}}}_{0} {\Delta }{\mathbf{x}}_{t}^{*} + \mathop \sum \limits_{\ell = 1}^{p - 1} {{\varvec{\Lambda}}}_{\ell } {\Delta }{\mathbf{z}}_{t - \ell } + {\varvec{v}}_{t} $$
(3.8)

where \({\tilde{\mathbf{z}}}_{t - 1} = (t,{\mathbf{z}}_{t - 1}^{\prime } )^{\prime }\) and \({{\varvec{\Pi}}}_{{\tilde{x}}} = {{\varvec{\Pi}}}_{x} \left( { - \gamma ,I_{{\left( {k + k^{*} } \right)}} } \right)\). In addition, \({{\varvec{\Pi}}}_{x} = {\varvec{\alpha}}_{x} {\varvec{\beta}}^{\prime }\), where \({{\varvec{\Pi}}}_{x}\), \({\varvec{\alpha}}_{x}\) and \({\varvec{\beta}}\) are \(k \times \left( {k + k^{*} } \right)\), \(k \times r\) and \(\left( {k + k^{*} } \right) \times r\) matrices, respectively, and \(r = \mathop \sum \limits_{i = 1}^{N} r_{i}\) is the total number of cointegrating relations in the GVAR system.

When the VEC model given in Eq. (3.8) is stacked over the \(T\) observations, we obtain

$$ {\Delta }{\mathbf{X}} = {\mathbf{c}}_{0} l_{T}^{\prime } + {{\varvec{\Lambda}}}{\Delta }{\mathbf{Z}}_{ - } + {{\varvec{\Pi}}}_{{\tilde{x}}} {\tilde{\mathbf{Z}}}_{ - 1} + {\mathbf{V}} $$
(3.9)

where \({\Delta }{\mathbf{X}} \equiv \left( {{\Delta }{\mathbf{x}}_{1} , \ldots ,{\Delta }{\mathbf{x}}_{T} } \right)\), \(l_{T}\) is a \(T \times 1\) vector of ones, \({\Delta }{\mathbf{X}}^{*} \equiv \left( {{\Delta }{\mathbf{x}}_{1}^{*} , \ldots ,{\Delta }{\mathbf{x}}_{T}^{*} } \right)\), \({\Delta }{\mathbf{Z}}_{ - \ell } \equiv \left( {{\Delta }{\mathbf{z}}_{1 - \ell } , \ldots ,{\Delta }{\mathbf{z}}_{T - \ell } } \right)\) for \(\ell = 1, \ldots ,p - 1\), \({{\varvec{\Lambda}}} \equiv \left( {{{\varvec{\Lambda}}}_{0} ,{{\varvec{\Lambda}}}_{1} , \ldots ,{{\varvec{\Lambda}}}_{p - 1} } \right)\), \({\Delta }{\mathbf{Z}}_{ - } \equiv ({\Delta }{\mathbf{X}}^{*\prime } ,{\Delta }{\mathbf{Z}}_{ - 1}^{\prime } , \ldots ,{\Delta }{\mathbf{Z}}_{1 - p}^{\prime } )^{\prime }\), \({\tilde{\mathbf{Z}}}_{ - 1} \equiv ({\varvec{\tau}}_{T} ,{\mathbf{Z}}_{ - 1}^{\prime } )^{\prime }\) with the trend vector \({\varvec{\tau}}_{T} \equiv (1, \ldots ,T)^{\prime }\), \({\mathbf{Z}}_{ - 1} \equiv \left( {{\mathbf{z}}_{0} , \ldots ,{\mathbf{z}}_{T - 1} } \right)\), and \({\mathbf{V}} \equiv \left( {{\varvec{v}}_{1} , \ldots ,{\varvec{v}}_{T} } \right)\).

In order to utilize the model presented in Eq. (3.9) instead of the unconditional VEC(\(p-1\)) model given in Eq. (3.2), which presents a dimension problem when estimated directly, the log-likelihood function is split into two parts: the conditional and the marginal model.

We can further explain the above expression with the following lemma:

Lemma 1

If the assumption of weak exogeneity is valid, \({\boldsymbol{\Pi }}_{x*}=0\), the log-likelihood function of the unconditional model in Eq. (3.2) can be derived from the conditional and the marginal models given in Eq. (3.6) and Eq. (3.7):

$${\ell}({{\varvec{\theta}}}_{\Delta z},{\varvec{\Sigma}})={{\ell}}_{v}({\varvec{\theta}},{{\varvec{\Sigma}}}_{vv})+{{\ell}}_{*}({{\varvec{\theta}}}^{*},{{\varvec{\Sigma}}}_{x*x*})$$

where \({{\varvec{\theta}}}_{\Delta z}\), \({\varvec{\theta}}\) and \({{\varvec{\theta}}}^{*}\) are vectors of parameters in unconditional, conditional and marginal models, respectively.

The proof of Lemma 1 is given in the Appendix.

The log-likelihood function of the conditional model in Eq. (3.9) is

$$ l\left( {\varvec{\theta}} \right) = \frac{ - kT}{2}{\text{log}}\left( {2\pi } \right) - \frac{T}{2}{\text{log}}\left| {{{\varvec{\Sigma}}}_{vv} } \right| - \frac{1}{2}tr\left( {{{\varvec{\Sigma}}}_{vv}^{ - 1} {\varvec{VV}}^{\prime } } \right) $$
(3.10)

where \({\varvec{\theta}}\) is a \(K \times 1\) dimensional vector,

$$ {\varvec{\theta}} = \left( {vec({{\varvec{\Lambda}}})^{\prime } ,vec\left( {{\varvec{\alpha}}_{x} } \right)^{\prime } ,vec({\varvec{\beta}}_{x} )^{\prime } ,vech\left( {{{\varvec{\Sigma}}}_{{{\varvec{vv}}}} } \right)^{\prime } } \right)^{\prime } $$

and

$$K=\left[k\times \left[{k}^{*}+\left(k+{k}^{*}\right)(p-1)\right]+kr+\left(k+{k}^{*}\right)\times r+\left(\frac{k(k+1)}{2}\right)\right]$$
(3.11)
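The parameter count in Eq. (3.11) can be read off term by term from the components of \({\varvec{\theta}}\). The following short sketch makes that bookkeeping explicit; the function name `conditional_param_count` and the example values are illustrative assumptions.

```python
def conditional_param_count(k, k_star, p, r):
    """Dimension K of theta in Eq. (3.11) (hypothetical helper)."""
    short_run = k * (k_star + (k + k_star) * (p - 1))  # vec(Lambda): Lambda_0 plus p-1 lag blocks
    loadings = k * r                                    # vec(alpha_x)
    coint = (k + k_star) * r                            # vec(beta)
    cov = k * (k + 1) // 2                              # vech(Sigma_vv), distinct elements only
    return short_run + loadings + coint + cov

# Example: k = 2 endogenous, k* = 1 foreign variable, lag order p = 2, rank r = 1
K_example = conditional_param_count(k=2, k_star=1, p=2, r=1)
```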

In Eq. (3.2), we treat \(\Delta {\mathbf{z}}_{t}\), where \({\Delta }{\mathbf{z}}_{t} = \left( {{\Delta }{\mathbf{x}}_{t}^{\prime } ,{\Delta }{\mathbf{x}}_{t}^{{{*}\prime }} } \right)^{\prime }\), as a realization of a continuous random variable \(\Delta \mathbf{Z}\) characterized by the probability density function \(f(\Delta \mathbf{z},{{\varvec{\theta}}}_{\Delta z})\). The parameter vector \({{\varvec{\theta}}}_{\Delta z}\) is \(\left({K}_{\Delta z}\times 1\right)\) dimensional, with \({{\varvec{\theta}}}_{\Delta z}\in\Theta \subset {\mathbb{R}}^{{K}_{\Delta z}}\). Let the true value of \({{\varvec{\theta}}}_{\Delta z}\), which is unknown, be \({{\varvec{\theta}}}_{\Delta z0}\), so that the probability density function \(f(\Delta \mathbf{z},{{\varvec{\theta}}}_{\Delta z0})\) correctly reflects the observed sample data. We select a \({{\varvec{\theta}}}_{\Delta z}\) that approximates the true parameter vector \({{\varvec{\theta}}}_{\Delta z0}\) as closely as possible. Thus, the probability density functions \(f(\Delta \mathbf{z},{{\varvec{\theta}}}_{\Delta z})\) and \(f(\Delta \mathbf{z},{{\varvec{\theta}}}_{\Delta z0})\) are referred to as the rival model and the actual model, respectively. Using Lemma 1, these models are described as \(f(\Delta \mathbf{x},{\varvec{\theta}})\) and \(f(\Delta \mathbf{x},{{\varvec{\theta}}}_{0})\).

The Kullback–Leibler (K-L) information distance is generally used to derive an Akaike-type information criterion.Footnote 10 In order to use the K-L information to measure the distance between the two models, the following assumptions should be made. These assumptions are also known as regularity conditions (RC).Footnote 11 Let \(\theta_{{i^{\prime } }}\) denote the \(i^{\prime }\)th element of \({\varvec{\theta}}\).

Assumption 2

(RC1) \(\forall{\varvec{\theta}}\in \Theta \),

$$ \frac{{\partial {\text{log}}\;f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}} \right)}}{{\partial \theta_{{i^{\prime } }} }},\;\frac{{\partial^{2} {\text{log}}\;f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}} \right)}}{{\partial \theta_{{i^{\prime } }} \partial \theta_{{j^{\prime } }} }},\;\frac{{\partial^{3} {\text{log}}\;f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}} \right)}}{{\partial \theta_{{i^{\prime } }} \partial \theta_{{j^{\prime } }} \partial \theta_{{k^{\prime } }} }} $$

the derivatives exist for all \({\Delta }{\mathbf{x}}\) and \(i^{\prime } ,j^{\prime } ,k^{\prime } = 1, \ldots ,K\).

Assumption 3

(RC2) For \({\varvec{\theta}}_{0} \in \Theta\), there exist functions \(G\left( {\Delta {\varvec{x}}} \right)\), \(H\left( {\Delta {\varvec{x}}} \right)\) and \(K\left( {\Delta {\varvec{x}}} \right)\) such that the inequalities

$$ \left| {\frac{{\partial f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}} \right)}}{{\partial \theta_{{i^{\prime } }} }}} \right| \le G_{{i^{\prime } }} \left( {{\Delta }{\mathbf{x}}} \right),\;\left| {\frac{{\partial^{2} f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}} \right)}}{{\partial \theta_{{i^{\prime } }} \partial \theta_{{j^{\prime } }} }}} \right| \le H_{{i^{\prime } j^{\prime } }} \left( {{\Delta }{\mathbf{x}}} \right),\;\left| {\frac{{\partial^{3} f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}} \right)}}{{\partial \theta_{{i^{\prime } }} \partial \theta_{{j^{\prime } }} \partial \theta_{{k^{\prime } }} }}} \right| \le K_{{i^{\prime } j^{\prime } k^{\prime } }} \left( {{\Delta }{\mathbf{x}}} \right) $$

hold for all \({\Delta }{\mathbf{x}}\) and \(i^{\prime } ,j^{\prime } ,k^{\prime } = 1, \ldots ,K\), and

$$ \int {G_{{i^{\prime } }} \left( {\Delta {\mathbf{x}}} \right){\text{d}}\Delta {\mathbf{x}}} < \infty ,\;\int {H_{{i^{\prime } j^{\prime } }} \left( {\Delta {\mathbf{x}}} \right){\text{d}}\Delta {\mathbf{x}}} < \infty ,\;\int {K_{{i^{\prime } j^{\prime } k^{\prime } }} \left( {\Delta {\mathbf{x}}} \right){\text{d}}\Delta {\mathbf{x}}} < \infty $$

Assumption 4

(RC3) For \({\varvec{\theta}}\in \Theta \),

$$ {\mathbf{B}}\left( {\varvec{\theta}} \right) = E\left[ {\left( {\frac{{\partial {\text{log}}f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}} \right)}}{{\partial {\varvec{\theta}}}}} \right)\left( {\frac{{\partial {\text{log}}f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}} \right)}}{{\partial {\varvec{\theta}}^{\prime } }}} \right)} \right] $$
(3.12)

exists and \(\parallel \mathbf{B}({\varvec{\theta}}){\parallel }_{2}>0\).

Remark 2

According to RC1, the score vector \(\frac{\partial logf(\Delta {\varvec{x}},{\varvec{\theta}})}{\partial{\varvec{\theta}}}\) has a Taylor series expansion as a function of \({\varvec{\theta}}\). RC2 allows \(\int f(\Delta {\varvec{x}},{\varvec{\theta}})d\Delta {\varvec{x}}\) and \(\int \left[\frac{\partial logf(\Delta {\varvec{x}},{\varvec{\theta}})}{\partial{\varvec{\theta}}}\right]d\Delta {\varvec{x}}\) to be differentiated with respect to \({\varvec{\theta}}\). Finally, RC3 requires the score vector to have a finite variance.

Lemma 2

If the regularity conditions RC1-RC3 hold, the score vector, \({\varvec{S}}({\varvec{\theta}})=\frac{\partial logf(\Delta {\varvec{x}},{\varvec{\theta}})}{\partial{\varvec{\theta}}}\), has zero mean and constant variance.

The proof of Lemma 2 is given in the Appendix.

For the VEC model given in Eq. (3.9), the K-L information is

$$ \begin{aligned} K - L\left( {{\varvec{\theta}}_{0} ;{\varvec{\theta}}} \right) & = E\left[ {\log f\left( {\Delta {\mathbf{X}},{\varvec{\theta}}_{0} } \right) - \log f\left( {\Delta {\mathbf{X}},{\varvec{\theta}}} \right)} \right] \\ & = \int {f\left( {\Delta {\mathbf{x}},{\varvec{\theta}}_{0} } \right)\log f\left( {\Delta {\mathbf{x}},{\varvec{\theta}}_{0} } \right)d\Delta {\mathbf{x}}} - \int {f\left( {\Delta {\mathbf{x}},{\varvec{\theta}}_{0} } \right)\log f\left( {\Delta {\mathbf{x}},{\varvec{\theta}}} \right)d\Delta {\mathbf{x}}} \\ & = H\left( {{\varvec{\theta}}_{0} ;{\varvec{\theta}}_{0} } \right) - H\left( {{\varvec{\theta}}_{0} ;{\varvec{\theta}}} \right) \\ \end{aligned} $$
(3.13)

where \(E\) indicates the expectation with respect to \(f(\Delta \mathbf{X},{{\varvec{\theta}}}_{0})\), the density of \(\Delta \mathbf{X}\).

Rearranging this expression, the K-L information is

$$ {\text{K}} - {\text{L}}\left( {{\varvec{\theta}}_{0} ;{\varvec{\theta}}} \right) = \int {f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}_{0} } \right){\text{log}}\left( {\frac{{f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}_{0} } \right)}}{{f\left( {{\Delta }{\mathbf{x}},{\varvec{\theta}}} \right)}}} \right)d{\Delta }{\mathbf{x}}} $$
(3.14)

Proposition 1

If the regularity conditions RC1-RC3 hold, then for the VEC model given in Eq. (3.9), \(K-L({{\varvec{\theta}}}_{0};{\varvec{\theta}})\) is a non-negative measure of the information distance from the probability density function \(f(\Delta {\varvec{x}},{\varvec{\theta}})\) to \(f(\Delta {\varvec{x}},{{\varvec{\theta}}}_{0})\).

The proof of Proposition 1 is given in the Appendix.

Corollary 1

Proposition 1 implies the following analytical properties of the K-L information \(K-L({{\varvec{\theta}}}_{0};{\varvec{\theta}})\) for the VEC model given in Eq. (3.9):

(i) If \(f(\Delta \mathbf{x},{{\varvec{\theta}}}_{0})\ne f(\Delta \mathbf{x},{\varvec{\theta}})\), \({\text{K}}-{\text{L}}({{\varvec{\theta}}}_{0};{\varvec{\theta}})>0\).

(ii) \({\text{K}}-{\text{L}}({{\varvec{\theta}}}_{0};{\varvec{\theta}})=0\) if and only if \(f(\Delta \mathbf{x},{{\varvec{\theta}}}_{0})=f(\Delta \mathbf{x},{\varvec{\theta}})\), that is, if the model is specified correctly.

(iii) Because \(\Delta \mathbf{X}\) is independent and identically distributed, \({\text{K}}-{\text{L}}({{\varvec{\theta}}}_{0};{\varvec{\theta}})\) is additive.
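Properties (i) and (ii) can be checked numerically in a simple special case. The sketch below assumes that both densities are multivariate normal with known moments, for which the K-L information has a standard closed form; this is a simplification for illustration and not the conditional VEC density itself. The function name `gaussian_kl` and all numerical values are assumptions.

```python
import numpy as np

def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """K-L(f0; f) for f0 = N(mu0, sigma0) and f = N(mu1, sigma1)."""
    k = mu0.shape[0]
    diff = mu1 - mu0
    sigma1_inv = np.linalg.inv(sigma1)
    logdet0 = np.linalg.slogdet(sigma0)[1]
    logdet1 = np.linalg.slogdet(sigma1)[1]
    return 0.5 * (np.trace(sigma1_inv @ sigma0)
                  + diff @ sigma1_inv @ diff
                  - k + logdet1 - logdet0)

mu0 = np.zeros(2)
sigma0 = np.array([[1.0, 0.3],
                   [0.3, 1.0]])

# Property (ii): identical densities give zero distance
kl_same = gaussian_kl(mu0, sigma0, mu0, sigma0)
# Property (i): a rival model with a different mean and covariance gives a positive distance
kl_diff = gaussian_kl(mu0, sigma0, np.array([0.5, 0.0]), np.eye(2))
```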

\({f}_{0}\) and \(f\) will be used instead of \(f(\Delta \mathbf{x},{{\varvec{\theta}}}_{0})\) and \(f(\Delta \mathbf{x},{\varvec{\theta}})\) for ease of notation. If \(g\) is another competing model other than \(f\), Eq. (3.14) is rewritten as

$${\text{K}}-{\text{L}}({f}_{0};f)=H({f}_{0})-H({f}_{0};f)$$
$${\text{K}}-{\text{L}}({f}_{0};g)=H({f}_{0})-H({f}_{0};g)$$
$${\text{K}}-{\text{L}}({f}_{0};f)-{\text{K}}-{\text{L}}({f}_{0};g)=H({f}_{0};g)-H({f}_{0};f).$$

When comparing competing models, the true model \({f}_{0}\) does not need to be known; indeed, it is never truly known. Therefore, the focus in the K-L information given in Eq. (3.14) is on the estimation of the competing model \(f\). The K-L information is then expressed as:

$$ {\text{K}} - {\text{L}}\left( {f_{0} ;\hat{f}} \right) = H\left( {f_{0} } \right) - H\left( {f_{0} ;\hat{f}} \right). $$

The estimate of the amount of K-L information is random because it depends on \(\widehat{f}\). We focus on the expected amount of K-L informationFootnote 12

$$ E\left[ {{\text{K}} - {\text{L}}\left( {f_{0} ;\hat{f}} \right)} \right] = \int {f_{0} \left( {{\mathbf{\Delta x}}} \right){\text{log}}f_{0} \left( {{\mathbf{\Delta x}}} \right)d{\mathbf{\Delta x}} - E\left[ {\int {f_{0} \left( {{\mathbf{\Delta x}}} \right){\text{log}}\hat{f}\left( {{\mathbf{\Delta x}}} \right)d{\mathbf{\Delta x}}} } \right]} $$
(3.15)

where the expectation is taken over \(\widehat{f}\).

As mentioned earlier, the first term is not necessary when choosing between competing models. Thus, minimizing the expected amount of K-L information equates to maximizing the second term. Multiplying the second term by − 2, the target functionFootnote 13 becomes:

$$ A = - 2E\left[ {\int {f_{0} \left( {{\mathbf{\Delta x}}} \right){\text{log}}\;\hat{f}\left( {{\mathbf{\Delta x}}} \right)d{\mathbf{\Delta x}}} } \right] $$
(3.16)

where the integral is taken with respect to \({f}_{0}\), the true density of the population random variable \(\Delta \mathbf{X}\). As mentioned earlier, the true probability density function will never be known. However, estimates can be made based on a sample \(\Delta \widetilde{\mathbf{x}}\) drawn from \(\Delta \mathbf{X}\). Hence, Eq. (3.16) can be rewritten as:

$$ A = - 2E\left[ {\log \hat{f}\left( {{\Delta }{\tilde{\mathbf{x}}}} \right)} \right] $$
(3.17)

Here, \(\Delta \widetilde{\mathbf{x}}\) is an independent sample drawn from the random variable \(\Delta \mathbf{X}\) with probability density function \({f}_{0}\). Thus, \(A\) represents the expected estimate of the log-likelihood, scaled by \(-2\). Burnham and Anderson (2002) argue that models with low \(A\) values have good out-of-sample log-likelihood.

From Eq. (3.8), the residuals are

$$ {\varvec{v}}_{t} = {\Delta }{\mathbf{x}}_{t} - {\mathbf{c}}_{0} - {{\varvec{\Pi}}}_{{\tilde{x}}} {\tilde{\mathbf{z}}}_{t - 1} - {{\varvec{\Lambda}}}_{0} {\Delta }{\mathbf{x}}_{t}^{*} - \mathop \sum \limits_{\ell = 1}^{p - 1} {{\varvec{\Lambda}}}_{\ell } {\Delta }{\mathbf{z}}_{t - \ell } $$

and the log-likelihood function of the conditional model given in Eq. (3.10) can be rewritten as:

$$ l\left( {\varvec{\theta}} \right) = \frac{ - kT}{2}\log \left( {2\pi } \right) - \frac{T}{2}\log \left| {{{\varvec{\Sigma}}}_{vv} } \right| - \frac{1}{2}\mathop \sum \limits_{t = 1}^{T} {\varvec{v}}_{t}^{\prime } {{\varvec{\Sigma}}}_{vv}^{ - 1} {\varvec{v}}_{t} $$
(3.18)
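The equivalence of the trace form in Eq. (3.10) and the summation form in Eq. (3.18) can be verified numerically. The following sketch does so on simulated residuals; the function names and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def loglik_trace(V, sigma_vv):
    """Eq. (3.10): V is k x T, with the residual vectors v_t as columns."""
    k, T = V.shape
    logdet = np.linalg.slogdet(sigma_vv)[1]
    quad = np.trace(np.linalg.solve(sigma_vv, V @ V.T))
    return -0.5 * k * T * np.log(2 * np.pi) - 0.5 * T * logdet - 0.5 * quad

def loglik_sum(V, sigma_vv):
    """Eq. (3.18): the same value written as a sum of quadratic forms."""
    k, T = V.shape
    logdet = np.linalg.slogdet(sigma_vv)[1]
    quad = sum(v @ np.linalg.solve(sigma_vv, v) for v in V.T)
    return -0.5 * k * T * np.log(2 * np.pi) - 0.5 * T * logdet - 0.5 * quad

rng = np.random.default_rng(0)       # simulated residuals, illustration only
V = rng.standard_normal((3, 50))     # k = 3, T = 50
sigma_hat = V @ V.T / V.shape[1]     # sample covariance of the residuals
ll_trace = loglik_trace(V, sigma_hat)
ll_sum = loglik_sum(V, sigma_hat)
```

When the covariance is evaluated at the sample estimate \({\mathbf{V}}{\mathbf{V}}^{\prime }/T\), the trace term equals \(kT\), so that \(-2\) times the log-likelihood reduces to \(kT(\log (2\pi )+1)+T\log |{\hat{\mathbf{\Sigma }}}_{vv}|\).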

The maximum value of the log-likelihood function of the system is obtained at the true parameter values:

$$ l\left( {{\varvec{\theta}}_{0} } \right) = \frac{ - kT}{2}\log \left( {2\pi } \right) - \frac{T}{2}\log \left| {{{\varvec{\Sigma}}}_{vv,0} } \right| $$

Therefore, the desired value of the \(A\) function is:

$$ A_{0} = kT\log \left( {2\pi } \right) + T\log \left| {{{\varvec{\Sigma}}}_{vv,0} } \right| $$
(3.19)

This will be the value obtained if there is no estimation error.Footnote 14

Lemma 3

If the conditional VEC model given in Eq. (3.9) with a parameter count of \(K\) and variance–covariance matrix \({\varvec{\varSigma}}_{vv,0}\) is actually estimated with \(\hat{f}\left( {\Delta {\varvec{x}}} \right)\) while the correct model is \(f_{0}\), then

$$ A = kT\left( {\log \left( {2\pi } \right) + 1} \right) + T\log \left| {{{\varvec{\Sigma}}}_{vv,0} } \right| + K $$
(3.20)

and

$$ E\left[ { - 2\ell_{T} \left( {\hat{\varvec{\theta }}} \right)} \right] = kT\left( {\log \left( {2\pi } \right) + 1} \right) + T\log \left| {{{\varvec{\Sigma}}}_{vv,0} } \right| - K $$
(3.21)

The proof of Lemma 3 is provided in the Appendix.

The target function \(A\) in Eq. (3.20) appears to exceed the desired value of \({A}_{0}\) by \(K\), the number of parameters. This additional term represents the cost of parameter estimation, measured by the expected amount of K-L information. When parameters are estimated instead of using actual parameter values, the K-L amount increases linearly with the number of parameters, \(K\).

This result is represented differently in Eq. (3.21). There, the expected value of \(-2\) times the sample log-likelihood falls short of the desired value of \({A}_{0}\) by \(K\). This represents the cost of over-fitting within the sample: the sample log-likelihood is a measure of in-sample fit and is therefore overly optimistic relative to the population log-likelihood. Taken together, these two statements indicate that the expected value of \(-2\) times the sample log-likelihood is \(2K\) less than \(A\). This reflects the combined cost of over-fitting and parameter estimation.

Combining Eqs. (3.20) and (3.21), we get:

$$ {\text{AIC}} = - 2 \times \left( {\text{log-likelihood}} \right) + 2 \times \left( {\text{number of parameters}} \right) $$

which forms the general structure of the AIC. The likelihood is obtained by estimating parameters, where \(- 2 \times \left( {\text{log-likelihood}} \right)\) signifies the goodness of fit and \(2\times \left( {\text{number of parameters}} \right)\) is considered the penalty term.Footnote 15

Theorem 1

GVAR-Akaike information criterion (GAIC): In the VEC model given in Eq. (3.9), under the validity of Lemmas 1–3, \(\left\{ {{\mathfrak{M}}_{{\tilde{k}_{\left( N \right)} }} :\tilde{k}_{\left( N \right)} = \tilde{k}_{\left( N \right),1} ,\tilde{k}_{\left( N \right),2} , \cdots ,\tilde{k}_{\left( N \right),K} \in {\mathbb{Z}}^{ + } ,N \in {\mathbb{N}}} \right\}\) is the set of all competing models generated according to the country/regional structure and

$$ \begin{aligned} {\text{GAIC}}\left( {\tilde{k}_{\left( N \right)} } \right) & = - 2\log \ell \left( {\hat{\varvec{\theta }}_{{\tilde{k}_{\left( N \right)} }} } \right) + 2\tilde{k}_{\left( N \right)} \\ & = k\left( {\log \left( {2\pi } \right) + 1} \right) + \log \left| {{\hat{\mathbf{\Sigma }}}_{{{\varvec{vv}},0}} } \right| + \frac{{2\tilde{k}_{\left( N \right)} }}{T} \\ \end{aligned} $$
(3.22)

is an estimator of twice the target function given in Eq. (3.21), which is minimized to select a model \({\mathfrak{M}}_{{\tilde{k}_{\left( N \right)} }}\) from among the competing models.

\(k\) is the number of endogenous variables and \(\tilde{k}_{\left( N \right)}\) is the number of parameters that vary depending on the country/region dimension \(N\).

The proof of Theorem 1 is given in the Appendix. All models are estimated, and the one with the lowest \({\text{GAIC}}\) value is considered the best. The advantages of AIC include its ease of calculation, implementation, and interpretation. Essentially, it provides an intuitive approach to model selection, accounting for the cost (or penalty) of each additional parameter estimate.
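As a minimal sketch of the selection rule, the following code evaluates the per-observation form in the second line of Eq. (3.22) for each candidate and picks the smallest value. The function names `gaic` and `select_model`, the candidate labels, and the simulated residuals are all hypothetical; in practice the residuals would come from the estimated country/region-specific models.

```python
import numpy as np

def gaic(residuals, n_params):
    """Second line of Eq. (3.22); residuals is a T x k array of estimated residuals."""
    T, k = residuals.shape
    sigma_hat = residuals.T @ residuals / T
    sign, logdet = np.linalg.slogdet(sigma_hat)
    if sign <= 0:
        raise ValueError("residual covariance matrix is not positive definite")
    return k * (np.log(2 * np.pi) + 1) + logdet + 2 * n_params / T

def select_model(candidates):
    """candidates maps a label to (residuals, n_params); returns the label with the lowest GAIC."""
    scores = {name: gaic(res, n) for name, (res, n) in candidates.items()}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
res = rng.standard_normal((100, 3))   # simulated residuals, T = 100, k = 3
# Two hypothetical candidates with identical fit but different parameter counts
best, scores = select_model({"small": (res, 10), "large": (res, 20)})
```

With identical residuals, the criterion differs only through the penalty term, so the candidate with fewer parameters is selected.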

Despite these advantages, the criterion has well-documented disadvantages in the literature. Cavanaugh and Neath (2019) argue that when the number of parameters is larger than the number of observations, the penalty term on the right-hand side of the criterion is significantly affected, leading to biased results. This situation is particularly evident when the sample size is increased. Shibata (1989) emphasizes that the AIC criterion is sensitive to the number of parameters, and Linhart and Zucchini (1986) claim that the criterion can be heavily biased towards incorrect model selection as the number of parameters increases.

The \({\text{GAIC}}\) criterion, developed for the GVAR modelling approach, is also likely to face similar challenges. This is particularly true as the number of individual country/region models increases, along with the number of endogenous and, therefore, exogenous variables, causing the size of the parameter vector to exceed the number of observations. This feature is particularly prominent in macroeconomic analyses and in datasets used in the GVAR modelling approach. The finite sample characteristics observed for the Akaike-type criteria were also observed for the \({\text{GAIC}}\) criterion. Section 3.2 proposes an \(ad-hoc\) solution for the \({\text{GAIC}}\) criterion based on these observations.

3.2 An Ad hoc Modification to Information Criterion

For a GVAR model with \(K\) parameters, the \({\text{GAIC}}\) in Eq. (3.22) is:

$$ {\text{GAIC}}\left( K \right) = k\left( {\log \left( {2\pi } \right) + 1} \right) + \log \left| {{\hat{\mathbf{\Sigma }}}_{{{\varvec{vv}},0}} } \right| + \frac{2K}{T} $$
(3.23)

and the row-dimension of the vector of parameters is defined as:

$$ K = \left[ {k \times \left[ {k^{*} + \left( {k + k^{*} } \right)\left( {p - 1} \right)} \right] + kr + \left( {k + k^{*} } \right) \times r + \left( {\frac{{k\left( {k + 1} \right)}}{2}} \right)} \right] $$
(3.24)

The total number of endogenous and exogenous variables in the GVAR system is defined by average values as follows:

$$ k = \mathop \sum \limits_{i = 1}^{N} k_{i} = N\overline{k} $$

and

$$ k^{*} = \mathop \sum \limits_{i = 1}^{N} k_{i}^{*} = N\overline{k}^{*} $$

So we have

$$ {\text{GAIC}}\left( K \right) = N\overline{k}\left( {\log \left( {2\pi } \right) + 1} \right) + \log \left| {{\hat{\mathbf{\Sigma }}}_{vv,0} } \right| + \frac{2K}{T} $$
(3.25)

Applying the same approach to \(K\) in Eq. (3.24) yields the following equation:

$$ K = \left( {p - 1} \right)N^{2} \overline{k}^{2} + pN^{2} \overline{k}\,\overline{k}^{*} + 2rN\overline{k} + rN\overline{k}^{*} + \frac{1}{2}N^{2} \overline{k}^{2} + \frac{1}{2}N\overline{k} $$
(3.26)
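The expansion in Eq. (3.26) can be checked against Eq. (3.24) by substituting \(k = N\overline{k}\) and \(k^{*} = N\overline{k}^{*}\). The sketch below performs this check for one set of values; the function names and the numbers chosen are illustrative assumptions.

```python
def k_direct(k, k_star, p, r):
    """Eq. (3.24) evaluated with the system totals k and k*."""
    return (k * (k_star + (k + k_star) * (p - 1))
            + k * r + (k + k_star) * r + k * (k + 1) / 2)

def k_expanded(N, k_bar, k_star_bar, p, r):
    """Eq. (3.26) written in terms of N and the averages k_bar and k*_bar."""
    return ((p - 1) * N**2 * k_bar**2
            + p * N**2 * k_bar * k_star_bar
            + 2 * r * N * k_bar
            + r * N * k_star_bar
            + 0.5 * N**2 * k_bar**2
            + 0.5 * N * k_bar)

# Illustrative values: N = 7 countries, 3 endogenous and 2 foreign variables on average
N, k_bar, k_star_bar, p, r = 7, 3, 2, 2, 5
K_from_totals = k_direct(N * k_bar, N * k_star_bar, p, r)
K_from_averages = k_expanded(N, k_bar, k_star_bar, p, r)
```

Both expressions agree, and the leading terms show that \(K\) grows with \(N^{2}\) while the sample size \(T\) stays fixed.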

Thus, the \({\text{GAIC}}\left( K \right)\) criterion given in Eq. (3.25) is defined in terms of three different functions as follows:

$$ {\text{GAIC}}\left( K \right) = f_{1} \left( {N,\overline{k}} \right) + f_{2} \left( {{\hat{\mathbf{\Sigma }}}_{vv,0} } \right) + f_{3} \left( {N^{2} ,N,p,r,\overline{k},\overline{k}^{*} ,2T^{ - 1} } \right) $$

Firstly, we consider the function \(f_{2} \left( {{\hat{\mathbf{\Sigma }}}_{vv,0} } \right)\) that takes into account the stochastic structure. This function is defined as follows:

$$ f_{2} \left( {{\hat{\mathbf{\Sigma }}}_{vv,0} } \right) = {\text{log}}\left| {{\hat{\mathbf{\Sigma }}}_{{{\varvec{vv}},0}} } \right| $$

where \({\hat{\mathbf{\Sigma }}}_{vv,0}\) is a \(k \times k\) matrix. The estimate of the variance–covariance matrix,

$$ {\hat{\mathbf{\Sigma }}}_{vv,0} = \frac{1}{T}\mathop \sum \limits_{t = 1}^{T} \hat{\varvec{v}}_{t} \hat{\varvec{v}}_{t}^{\prime } $$
(3.27)

is calculated from the estimated residuals of country/region specific models.

If the total number of endogenous variables in the GVAR model, \(k = N\overline{k}\), is greater than the time dimension \(T\), \({\hat{\mathbf{\Sigma }}}_{vv,0}\) may not be positive definite.Footnote 16 This is a serious problem for the calculation of the criterion, because the log-determinant in Eq. (3.23) is then not well defined.
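The rank deficiency can be illustrated with simulated residuals. In the sketch below, the dimensions and the random data are assumptions chosen only to produce a case with more variables than observations.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 20, 30                                 # more variables than time periods
residuals = rng.standard_normal((T, k))       # simulated residuals, illustration only
sigma_hat = residuals.T @ residuals / T       # k x k sample covariance as in Eq. (3.27)

# The rank of a sum of T outer products cannot exceed T, so the matrix is singular
rank = np.linalg.matrix_rank(sigma_hat)
smallest_eig = np.linalg.eigvalsh(sigma_hat).min()
```

Because the rank is at most \(T < k\), the smallest eigenvalue is numerically zero and the log-determinant cannot be used as a measure of fit.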

The matrix given in Eq. (3.27) is an estimate of a large-dimensional variance–covariance matrix of the kind discussed by Bailey et al. (2019), and it is subject to the problems and solutions encountered in that setting. It can be written as follows:

$$ {\hat{\mathbf{\Sigma }}}_{vv,0} = \left( {\begin{array}{*{20}c} {\widehat{cov}\left( {\hat{\varvec{v}}_{1t} ,\hat{\varvec{v}}_{1t} } \right)} & {\widehat{cov}\left( {\hat{\varvec{v}}_{1t} ,\hat{\varvec{v}}_{2t} } \right)} & \cdots & {\widehat{cov}\left( {\hat{\varvec{v}}_{1t} ,\hat{\varvec{v}}_{Nt} } \right)} \\ {\widehat{cov}\left( {\hat{\varvec{v}}_{2t} ,\hat{\varvec{v}}_{1t} } \right)} & {\widehat{cov}\left( {\hat{\varvec{v}}_{2t} ,\hat{\varvec{v}}_{2t} } \right)} & \cdots & {\widehat{cov}\left( {\hat{\varvec{v}}_{2t} ,\hat{\varvec{v}}_{Nt} } \right)} \\ \vdots & \vdots & {} & \vdots \\ {\widehat{cov}\left( {\hat{\varvec{v}}_{Nt} ,\hat{\varvec{v}}_{1t} } \right)} & {\widehat{cov}\left( {\hat{\varvec{v}}_{Nt} ,\hat{\varvec{v}}_{2t} } \right)} & \cdots & {\widehat{cov}\left( {\hat{\varvec{v}}_{Nt} ,\hat{\varvec{v}}_{Nt} } \right)} \\ \end{array} } \right) $$
(3.28)

One recommended solution is to restrict attention to the diagonal blocks of this matrix. In addition, PSW state that the GVAR modelling approach allows for weak cross-sectional dependence, so that as \(N \to \infty\),

$$ \frac{{\sum\nolimits_{i,n = 1}^{N} {\sigma_{{in,hh^{\prime } }} } }}{N} \to 0 $$

In the framework of these approaches, to derive an \(ad - hoc\) modification of \(GAIC\), a transformation of the function \(f_{2} \left( {{\hat{\mathbf{\Sigma }}}_{vv,0} } \right)\) is applied as follows.Footnote 17

$$ f_{2}^{ad - hoc} \left( {Bdiag{\hat{\mathbf{\Sigma }}}_{vv,0} } \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{N} log\left| {{\hat{\mathbf{\Sigma }}}_{{{\varvec{v}}_{{\varvec{i}}} {\varvec{v}}_{{\varvec{i}}} ,0}} } \right|}}{N} $$
(3.29)

Secondly, we examine the \(f_{1} \left( {N,\overline{k}} \right)\) and \(f_{3} \left( {N^{2} ,N,p,r,\overline{k},\overline{k}^{*} ,2T^{ - 1} } \right)\) functions, which are deterministic structures depending on \(N\). To obtain \(ad - hoc\) versions of these functions, we rearrange them to remove the cross-sectional dimension effect so that they do not depend on \(N\). Thus, these functions can be obtained as follows:

$$ f_{1}^{ad - hoc} \left( {\overline{k}} \right) = \frac{{N\overline{k}\left( {log\left( {2\pi } \right) + 1} \right)}}{N} = \overline{k}\left( {log\left( {2\pi } \right) + 1} \right) $$
(3.30)

and

$$ f_{3}^{ad - hoc} \left( {\overline{K}} \right) = \frac{{\varrho \times {\text{dim}}\left( K \right)}}{N} $$
(3.31)

where \(dim\) denotes the dimension expression and \(\varrho =2/T\).

Finally, \({\text{GAIC}}(K)^{ad - hoc}\) is obtained as the sum of these three functions as follows:

$$ {\text{GAIC}}(K)^{ad - hoc} = f_{1}^{ad - hoc} \left( {\overline{k}} \right) + f_{2}^{ad - hoc} \left( {Bdiag{\hat{\mathbf{\Sigma }}}_{vv,0} } \right) + f_{3}^{ad - hoc} \left( {\overline{K}} \right) $$
(3.32)
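A minimal sketch of how the three components in Eq. (3.32) could be assembled is given below. It assumes that the residuals of each country/region model are available as a separate \(T \times k_{i}\) array and that the total parameter count \(K\) has already been computed from Eq. (3.24); the function name `gaic_ad_hoc` and its arguments are hypothetical.

```python
import numpy as np

def gaic_ad_hoc(country_residuals, n_params):
    """Sum of Eqs. (3.29)-(3.31); country_residuals is a list of N arrays, each T x k_i."""
    N = len(country_residuals)
    T = country_residuals[0].shape[0]
    k_bar = sum(res.shape[1] for res in country_residuals) / N

    f1 = k_bar * (np.log(2 * np.pi) + 1)           # Eq. (3.30)

    logdets = []
    for res in country_residuals:
        sigma_ii = res.T @ res / T                 # diagonal block for country i
        logdets.append(np.linalg.slogdet(sigma_ii)[1])
    f2 = sum(logdets) / N                          # Eq. (3.29)

    f3 = (2.0 / T) * n_params / N                  # Eq. (3.31) with rho = 2 / T
    return f1 + f2 + f3
```

Only the \(k_{i} \times k_{i}\) diagonal blocks enter the calculation, so each log-determinant stays well defined as long as \(k_{i} < T\), even when the full system dimension \(k\) exceeds \(T\).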

4 Small Sample Evidence

This section examines the finite sample performances of the model selection criteria previously defined as \({\text{GAIC}}\) and \({\text{GAIC}}(K)^{ad - hoc}\), respectively. We adopt an approach similar to that of Fujikoshi and Satoh (1997). In this approach, we consider the performance of the information criteria when the country dimension of the true model is larger (smaller) than the country dimension of the candidate models. The objective of this approach is to generate competing models against the true model in all circumstances, whether the dimension of the model parameters is large or small. Our main aim is to measure the sensitivity of these criteria across all possible model alternatives, as might be encountered in empirical studies. We will first introduce the setting, then present and discuss the findings.

4.1 Design of Simulation

The Data Generation Process (DGP) is based on four main scenarios. Our aim is to simulate events commonly found in macroeconometric applications when designing these scenarios. The primary reason for this approach is the variability in the structure and content of the cross-sectional country dimension within the same dataset in the GVAR model, depending on the target country whose economy is being analyzed. For instance, when considering the economy of target country A, some countries need to be modeled individually, while others within the same dataset must be aggregated. Conversely, the content and composition of the cross-sectional dimension should be changed when the aim is to analyze the target country B in the same dataset.Footnote 18 As a result, our primary focus is on target and reference countries, as stated in Sect. 1. The following scenarios will outline the main considerations for the formulation of the DGPs:

Scenarios

1. The size of the target country’s economy is smaller than that of the other countries. Furthermore, the relationship between the target and reference countries is stronger than the relationship between the other countries and the reference country.

2. The size of the target country’s economy is smaller than that of the other countries. Furthermore, the relationship between the target and reference countries is weaker than the relationship between the other countries and the reference country.

3. The size of the target country’s economy is smaller than that of the reference country, but it is bigger than that of the other countries. Furthermore, the relationship between the target and reference countries is stronger than the relationship between the other countries and the reference country.

4. The size of the target country’s economy is smaller than that of the reference country, but it is bigger than that of the other countries. Furthermore, the relationship between the target and reference countries is weaker than the relationship between the other countries and the reference country.

The DGPs are created in response to the aforementioned scenarios. The cross-sectional units, which represent the countries, are numbered and labeled as follows: \(i=1,2,3,4,5,6,7\). Four different DGPs are simulated as follows:

DGP1: The economy of the target country, labeled as 6, is smaller than all the other countries’ economies (\(6<1<2<3<4<5<7\)). The relationship between the target country (label 6) and the countries labeled 1, 2, 3, 4, and 5 is weak, while the relationship between the reference country (label 7) and the target country is strong.

DGP2: The economy of the target country, labeled as 6, is smaller than all the other countries’ economies (\(6<1<2<3<4<5<7\)). The relationship between the target country and the other countries is strong, but the relationship between the target country and the reference country (label 7) is weak.

DGP3: The economy of the target country, labeled as 5, is the second largest economy after the reference country’s economy (\(6<1<2<3<4<5<7\)). The relationship between the target country (label 5) and the countries labeled 1, 2, 3, 4, and 6 is weak, while its relationship with the reference country (label 7) is strong.

DGP4: The economy of the target country, labeled as \(5\), is the second largest economy after the reference country’s economy (\(6<1<2<3<4<5<7\)). The relationship between the target country (label \(5\)) and countries labeled as 1, 2, 3, 4, and 6 is strong, but its relationship with the reference country (label 7) is weak.

A common feature is that the same economic size ranking of the countries holds in all DGPs. Table 1 illustrates this ranking through predetermined values assigned to the variable \(D\). The values assigned to this variable serve only to provide an order of magnitude (Footnote 19): \(6<1<2<3<4<5<7\).

Table 1 Predetermined values of \(D\)

The predetermined variable “trade” has been generated to represent the link matrix \({\mathbf{W}}_{i}\), as described in Sect. 2.2. In DGP1, the relationship between the target country (label 6) and the countries labeled \(1, 2, 3, 4\) and \(5\) is weak. However, its relationship with the reference country (label 7) is stronger, as shown in Table 2 below. For example, the amount of artificial trade that the target country (label 6) conducts with the reference country (label 7) is 800.

Table 2 Values for country-level trade relations for DGP1

Trade weights are then calculated from these values and presented in Table 3 (Footnote 20). As can be seen from Table 3, the weight of the reference country (label 7) for the target country (label 6) is 53%. Thus, the reference country is more important for the target country than the other countries are.

Table 3 Values for country level trade rates for DGP1

As can be seen from Table 4, in DGP2 the relationship between the target country (label 6) and the countries labeled 1, 2, 3, 4 and 5 is strong, while its relationship with the reference country (label 7) is weak. The amount of artificial trade that the target country (label 6) conducts with the reference country (label 7) is 50.

Table 4 Values for country-level trade relations for DGP2 and DGP3

The trade weights are calculated using these values and are shown in Table 5. In DGP2, the weight of the reference country (label 7) for the target country (label 6) is 7%. The reference country is therefore less important for the target country than the other countries are.

Table 5 Values for country level trade rates for DGP2 and DGP3

In DGP3, we use the trade weights obtained for DGP2. The relationship between the target country in DGP3 (label 5) and the countries labeled \(1, 2, 3, 4\) and \(6\) is weak, but its relationship with the reference country (label 7) is strong, at 27%, as shown in Table 5.

In Table 6, the amount of artificial trade carried out by the target country (label 5) with the reference country (label 7) is 500. In DGP4, the relationship between the target country (label 5) and the countries labeled \(1, 2, 3, 4\) and \(6\) is strong, but its relationship with the reference country (label 7) is weak, at 13%, as shown in Table 7.

Table 6 Values for country-level trade relations for DGP4
Table 7 Values for country-level trade rates for DGP4

In all DGPs, we consider two endogenous (domestic) variables per country \(i\) (where \(i=1,\dots ,N\) and \({k}_{i}=2\)), denoted as \({\mathbf{x}}_{it} = \left( {x_{i1} ,x_{i2} } \right)^{\prime }\). Therefore, the vector of all endogenous variables, \({\mathbf{x}}_{t} = \left( {{\mathbf{x}}_{1t}^{\prime } ,{\mathbf{x}}_{2t}^{\prime } , \ldots ,{\mathbf{x}}_{Nt}^{\prime } } \right)^{\prime }\), has dimension \(\left( {2N \times 1} \right)\). The initial value of all variables for all countries is set to zero, \({\mathbf{x}}_{ - 101} = 0\). In the simulations, \(T \in \left\{ {100,200,500} \right\}\) and 2000 replications are run for each experiment.

Because DdPS derive the GVAR as an approximation to a global factor model, all DGPs are generated from the canonical global factor model, for \(t = - 101, - 100, \ldots ,0,1,2, \ldots ,T\),

$$ {\mathbf{x}}_{it} = \delta_{i0} + {{\varvec{\Gamma}}}_{if} f_{t} + \xi_{it} $$
(4.1)

and

$$ \delta_{i0} \sim IIDN\left( {1,1} \right), $$
$$ {{\varvec{\Gamma}}}_{if} = \left( {\begin{array}{*{20}l} {\gamma_{i11} } \hfill \\ {\gamma_{i21} } \hfill \\ \end{array} } \right) \sim IID\left( {\begin{array}{*{20}l} {N\left( {0,0.50} \right)} \hfill \\ {N\left( {0.5,0.50} \right)} \hfill \\ \end{array} } \right) $$

where \(\delta_{i0}\) and \({{\varvec{\Gamma}}}_{if}\) are \(\left( {2 \times 1} \right)\) vectors of constant coefficients and factor loadings, respectively.

The unobservable global factor \(f_{t}\) is generated as

$$ f_{t} = f_{t - 1} + \eta_{ft} $$

where \(\eta_{ft} \sim IIDN\left( {0,1} \right)\). The \(\left( {2 \times 1} \right)\) vectors of country/region-specific effects \(\xi_{it}\), stacked as \(\xi_{t} = \left( {\xi_{1t}^{\prime } ,\xi_{2t}^{\prime } , \ldots ,\xi_{Nt}^{\prime } } \right)^{\prime }\), are generated as

$$ \xi_{t} = \left( {{\mathbf{I}}_{2N} - {{\varvec{\Pi}}}_{\xi } } \right)\xi_{t - 1} + {\mathbf{u}}_{t} $$
(4.2)

where \({\mathbf{u}}_{t} = \left( {{\mathbf{u}}_{1t}^{\prime } ,{\mathbf{u}}_{2t}^{\prime } , \ldots ,{\mathbf{u}}_{Nt}^{\prime } } \right)^{\prime }\), and \({\mathbf{u}}_{t} \sim IIDN_{2N} \left( {0,{\mathbf{I}}} \right)\) (Footnote 21). For \(i,n = 1,2, \ldots ,N\), the \(\left( {2N \times 2N} \right)\) matrix of long-run relationships, \({{\varvec{\Pi}}}_{\xi } = \left\{ {{\Pi }_{\xi ,in} } \right\}\), is generated as

$$ {{\varvec{\Pi}}}_{\xi } = {\varvec{\alpha}}_{\xi } {\varvec{\beta}}_{\xi }^{\prime } $$

where \({\varvec{\alpha}}_{\xi } = \left\{ {\alpha_{\xi ,in} } \right\}\) contains the speed-of-adjustment coefficients and \({\varvec{\beta}}_{\xi } = \left\{ {\beta_{\xi ,in} } \right\}\) contains the long-run coefficients. Both matrices have dimension \(\left( {2N \times r} \right)\). We set \(r_{i} = 1\) for all \(i\), so that \(r = \mathop \sum \limits_{i = 1}^{N} r_{i} = N\).

To satisfy the following rank condition,

$$ Rank\left( {{{\varvec{\Pi}}}_{\xi } } \right) = Rank\left( {{\varvec{\alpha}}_{\xi } } \right) = Rank\left( {{\varvec{\beta}}_{\xi } } \right) = r < 2N $$

for all \(i = 1, \ldots ,N\), we set

$$ \alpha_{\xi ,i} = \left( {\begin{array}{*{20}c} { - 0.5} \\ 0 \\ \end{array} } \right),\;\beta_{\xi ,i} = \left( {\begin{array}{*{20}c} 1 \\ { - 1} \\ \end{array} } \right), $$
$$ {\varvec{\alpha}}_{\xi } = \left( {\begin{array}{*{20}c} {{\varvec{\alpha}}_{\xi 1} } & 0 & \ldots & 0 \\ 0 & {{\varvec{\alpha}}_{\xi 2} } & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & {{\varvec{\alpha}}_{\xi N} } \\ \end{array} } \right) $$

and

$$ {\varvec{\beta}}_{\xi } = \left( {\begin{array}{*{20}c} {{\varvec{\beta}}_{\xi 1} } & 0 & \ldots & 0 \\ 0 & {{\varvec{\beta}}_{\xi 2} } & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & {{\varvec{\beta}}_{\xi N} } \\ \end{array} } \right) $$
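The building blocks defined above can be illustrated with a short Python sketch. It draws the intercepts and loadings, generates the random-walk factor with a burn-in from \(t = -101\), and assembles the block-diagonal \({\varvec{\alpha}}_{\xi }\), \({\varvec{\beta}}_{\xi }\) and \({{\varvec{\Pi}}}_{\xi }\) so that the rank condition can be checked numerically. The function names are hypothetical, the second argument of \(N(\cdot ,0.50)\) is assumed to be a variance, the factor is assumed to start at zero, and the idiosyncratic component \(\xi_{it}\) is left out; this is a sketch under those assumptions, not the code used in the paper.

```python
import numpy as np


def build_long_run_matrix(n_countries):
    """Return block-diagonal alpha, beta and Pi = alpha @ beta.T with r_i = 1."""
    alpha_i = np.array([[-0.5], [0.0]])  # (2 x 1) adjustment vector of one country
    beta_i = np.array([[1.0], [-1.0]])   # (2 x 1) long-run vector of one country
    eye = np.eye(n_countries)
    alpha = np.kron(eye, alpha_i)        # (2N x N), one column per country
    beta = np.kron(eye, beta_i)          # (2N x N), one column per country
    return alpha, beta, alpha @ beta.T   # Pi is (2N x 2N)


def simulate_common_component(n_countries, t_obs, burn_in=102, seed=0):
    """Simulate delta_i0 + Gamma_if * f_t for t = 1, ..., T (xi_it is omitted)."""
    rng = np.random.default_rng(seed)
    total = burn_in + t_obs              # periods t = -101, ..., 0, 1, ..., T
    delta = rng.normal(1.0, 1.0, size=(n_countries, 2))
    sd = np.sqrt(0.50)                   # assumption: 0.50 is a variance
    gamma = np.column_stack(
        [rng.normal(0.0, sd, n_countries), rng.normal(0.5, sd, n_countries)]
    )
    eta = rng.standard_normal(total)
    eta[0] = 0.0                         # assumption: the factor starts at zero
    factor = np.cumsum(eta)              # random walk f_t = f_{t-1} + eta_ft
    common = delta[None, :, :] + factor[:, None, None] * gamma[None, :, :]
    return common[burn_in:]              # drop the burn-in periods


alpha, beta, pi = build_long_run_matrix(7)
print(np.linalg.matrix_rank(pi), pi.shape)
```

With \(N = 7\), the rank of each of the three matrices equals \(r = N = 7\), which is below \(2N = 14\).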

The details of the simulation are as follows. For \(y = 1,2,3,4\), \({\mathcal{S}}_{N}^{{DGP_{y} }} = \left\{ {i:i \in N} \right\}\) denotes the set of countries/regions in the \(y\)-th \(DGP\). The true model is denoted as \(TM\). For \(m = 1,2,3,4,5\), \(EM_{m}\) refers to the \(m\)-th Estimated Model (that is, a rival or candidate model), and \(ROW_{b}\) refers to the \(b\)-th Rest of the World (Footnote 22). To illustrate the approach in the simulation and to facilitate the interpretation of the results, we discuss a sample DGP (Footnote 23):

  • In the \(y\)-th DGP, the \(TM\) contains 4 countries/regions: the \(ROW\), country 5, the target country (label 6), and the reference country (label 7):

    $$ {\mathcal{S}}_{N,TM = 4}^{{DGP_{y = 1} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{1} }}^{{DGP_{y = 1} }} ,5,6,7} \right\}. $$
  • In the \(y\)-th DGP, the \(ROW\) region is the aggregation of countries 1, 2, 3 and 4:

    $$ {\mathcal{S}}_{{N,ROW_{1} }}^{{DGP_{y = 1} }} = \left\{ {1,2,3,4} \right\}. $$
  • In the \(y\)-th DGP, the countries estimated in the first rival model are:

    $$ {\mathcal{S}}_{{N,EM_{1} }}^{{DGP_{y = 1} }} = \left\{ {1,2,3,4,5,6,7} \right\}, $$
    $$ {\mathcal{S}}_{{N,ROW_{0} }}^{{DGP_{y = 1} }} = \{ \} . $$
  • In the \(y\)-th DGP, the countries estimated in the second rival model are:

    $$ {\mathcal{S}}_{{N,EM_{2} }}^{{DGP_{y = 1} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{1} }}^{{DGP_{y = 1} }} ,3,4,5,6,7} \right\}, $$
    $$ {\mathcal{S}}_{{N,ROW_{1} }}^{{DGP_{y = 1} }} = \left\{ {1,2} \right\}. $$
  • In the \(y\)-th DGP, the countries estimated in the third rival model are:

    $$ {\mathcal{S}}_{{N,EM_{3} }}^{{DGP_{y = 1} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{2} }}^{{DGP_{y = 1} }} ,4,5,6,7} \right\}, $$
    $$ {\mathcal{S}}_{{N,ROW_{2} }}^{{DGP_{y = 1} }} = \left\{ {1,2,3} \right\}. $$
  • In the \(y\)-th DGP, the countries estimated in the fourth rival model are:

    $$ {\mathcal{S}}_{{N,EM_{4} }}^{{DGP_{y = 1} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{3} }}^{{DGP_{y = 1} }} ,5,6,7} \right\}, $$
    $$ {\mathcal{S}}_{{N,ROW_{3} }}^{{DGP_{y = 1} }} = \left\{ {1,2,3,4} \right\}. $$
  • In the \(y\)-th DGP, the countries estimated in the fifth rival model are:

    $$ {\mathcal{S}}_{{N,EM_{5} }}^{{DGP_{y = 1} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{4} }}^{{DGP_{y = 1} }} ,6,7} \right\}, $$
    $$ {\mathcal{S}}_{{N,ROW_{4} }}^{{DGP_{y = 1} }} = \left\{ {1,2,3,4,5} \right\}. $$
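The nesting of the five rival models listed above can be reproduced with a few lines of Python. The sketch assumes that the ROW is always filled with the smallest remaining economies, excluding the target and reference countries, and that a ROW made up of a single country is not considered; the function name and the dictionary layout are hypothetical.

```python
def candidate_models(size_ranking, target, reference):
    """List nested candidate models, from all countries separate to the largest ROW.

    size_ranking: country labels ordered from the smallest to the largest economy.
    """
    others = [c for c in size_ranking if c not in (target, reference)]
    models = []
    # ROW sizes 0, 2, 3, ...: a one-country ROW would not be an aggregate.
    for row_size in [0] + list(range(2, len(others) + 1)):
        row = sorted(others[:row_size])
        separate = sorted(others[row_size:] + [target, reference])
        dimension = len(separate) + (1 if row else 0)
        models.append({"row": row, "separate": separate, "dimension": dimension})
    return models


ranking = [6, 1, 2, 3, 4, 5, 7]  # ordering 6 < 1 < 2 < 3 < 4 < 5 < 7
for model in candidate_models(ranking, target=6, reference=7):
    print(model["dimension"], model["row"], model["separate"])
```

Applied to the ranking \(6<1<2<3<4<5<7\) with target country 6, the loop returns the sets of \(EM_{1}\) to \(EM_{5}\); with target country 5 it returns ROW regions such as \(\left\{ {1,2,6} \right\}\), in line with the sets used for \(DGP_{y = 3,4}\).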

4.2 The Results of Monte Carlo Experiments

Tables 8, 9, 10, 11, 12, 13 and 14 report the results of the \({\text{GAIC}}\) and \({\text{GAIC}}^{ad - hoc}\) criteria in Monte Carlo experiments with the DGP specifications described above (Footnote 24). These tables display the frequencies with which the criteria select correct and incorrect models. The rows indicate the TM, while the columns show the EM. Cells marked with * give the frequency with which the criterion correctly selects the true model. The other cells give the selection frequencies of the remaining candidate models.
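A selection frequency of this kind can be computed as sketched below in Python. The sketch assumes that, as with the standard AIC, the candidate with the smallest criterion value is selected in each replication; the function name and the input layout (one row per replication, one column per candidate model) are hypothetical.

```python
import numpy as np


def selection_frequencies(criterion_values):
    """Share of replications in which each candidate attains the smallest value.

    criterion_values: array of shape (replications, candidates).
    """
    values = np.asarray(criterion_values, dtype=float)
    chosen = values.argmin(axis=1)  # index of the selected model per replication
    counts = np.bincount(chosen, minlength=values.shape[1])
    return counts / values.shape[0]


# Illustrative numbers only: 4 replications, 3 candidate models.
example = [[1.0, 2.0, 3.0], [2.0, 1.0, 3.0], [0.0, 5.0, 5.0], [4.0, 4.5, 9.0]]
print(selection_frequencies(example))
```

In the tables, the entry marked with * corresponds to the element of this vector that belongs to the true model.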

Table 8 Average selected model frequencies of \({\text{GAIC}}\) and \({\text{GAIC}}^{ad - hoc}\) in DGP1
Table 9 The relationship with countries for \({\mathcal{S}}_{N,TM}^{{DGP_{y = 1} }}\)
Table 10 The relationship with countries for \({\mathcal{S}}_{{N,EM_{3} }}^{{DGP_{y = 1} }}\)
Table 11 The relationship with countries for \({\mathcal{S}}_{N,TM = 4}^{{DGP_{y = 1} }}\) and \({\mathcal{S}}_{N,TM = 4}^{{DGP_{y = 2} }}\)
Table 12 Average selected model frequencies of \({\text{GAIC}}\) and \({\text{GAIC}}^{ad - hoc}\) in DGP2
Table 13 Average selected model frequencies of \({\text{GAIC}}\) and \({\text{GAIC}}^{ad - hoc}\) in DGP3
Table 14 Average selected model frequencies of \({\text{GAIC}}\) and \({\text{GAIC}}^{ad - hoc}\) in DGP4

Table 8 reports the results for \(DGP_{y = 1}\). When the dimension of the \(TM\) is 3 and \(T \in \left\{ {100,200,500} \right\}\), the \({\text{GAIC}}\) criterion selects this model approximately 100% of the time. However, for the other results marked with *, increasing the dimension of the \(TM\) worsens the performance of the \({\text{GAIC}}\) criterion, which is also biased towards selecting the alternative candidate model with 3 countries. This reflects the over-parameterization of an Akaike-based criterion, as described in Sect. 3.2. For example, when the dimension of the \(TM\) is 4 and \(T \in \left\{ {100,200,500} \right\}\), the value of the criterion increases, favoring the selection of the nearest alternative, which is the candidate model with 3 countries. More specifically, the \({\text{GAIC}}\) criterion is inherently sensitive to the number of parameters.

When the dimension of \(TM\) is 3 and \(T \in \left\{ {100,\;200,\;500} \right\}\), the \({\text{GAIC}}^{ad - hoc}\) results in Table 8 reveal a selection frequency of approximately 91%, similar to the \({\text{GAIC}}\). If the dimension of \(TM\) is 4 and \(T = 100\), the selection frequencies of \(TM\) for \({\text{GAIC}}\) and \({\text{GAIC}}^{ad - hoc}\) are 4% and 50%, respectively. Increasing \(T\) diminishes the selection frequency of \(TM\) for \({\text{GAIC}}^{ad - hoc}\) to 51%, 22% and 1.10%. This pattern is similar to that of \({\text{GAIC}}\), but the rate of decrease is slower. When evaluated in \(DGP_{y = 2}\), the frequencies are approximately 56%, 37% and 15%, respectively.

Tables 9 and 10 show that, in \(DGP_{y = 1}\) when the dimension of \(TM\) is 4, the relationship between the target country (label 6) and country 5 is weak. In contrast, the target country (label 6) has a strong relationship with the ROW. Table 11 reveals that the relationship between the target country (label 6) and the ROW increases from 34% to 67%. These observations demonstrate that the strength of relationships between countries significantly influences model selection.

For \(TM = 4,\;5\) in \(DGP_{y = 1,2}\), with \(T = 100,\;200\) and \(500\), \({\text{GAIC}}^{ad - hoc}\) is sensitive to candidate models with \(EM = 3\). In contrast, for \(TM = 6,\;7\) in \(DGP_{y = 1,2}\), when \(T = 100,\;200\) and \(500\), \({\text{GAIC}}^{ad - hoc}\) is more robust and shows less sensitivity to candidate models with \(EM = 3\). Comparing the performance of \({\text{GAIC}}^{ad - hoc}\) for various sets (Footnote 25) reveals some important findings. The most notable difference between \(TM = 4,\;5\) and \(TM = 6,\;7\) in \(DGP_{y = 1,2}\) is the need to consider almost all countries individually. Under these sets, \({\text{GAIC}}^{ad - hoc}\) performs well in small samples. Therefore, the ad hoc modification of \({\text{GAIC}}\) is more successful in situations where the country dimension is large and countries need to be treated individually in the GVAR modelling approach. Indeed, this is also observed for \(TM = 5,\;6\) in \(DGP_{y = 3,4}\).

A result similar to that of the \({\text{GAIC}}\) criterion presented in Table 8 can be found in Tables 12, 13 and 14. As mentioned earlier, this pattern is consistent with the Akaike literature.

In \(DGP_{y = 3}\) with \(T = 100,\;200\) and \(500\), the results in Table 13 show that, for \(TM = 5\), the selection frequencies of \(TM\) for \({\text{GAIC}}^{ad - hoc}\) are 87%, 80% and 78%. For \(TM = 6\), they are 90%, 87% and 78%, respectively. Similar results are given in Table 14 for \(DGP_{y = 4}\). For \(T = 100,\;200\) and \(500\), and \(TM = 5\), the selection frequencies of \(TM\) for \({\text{GAIC}}^{ad - hoc}\) are 87%, 76% and 77%. If \(TM = 6\), these proportions are 92%, 88% and 77%, respectively. Comparing these findings with the results for \(TM = 5,\;6\) in \(DGP_{y = 1,2}\), \(T = 100,\;200\) and \(500\), we conclude that:

  • An increase in the economic size of the target country,

  • A decrease in the economic size of the ROW,

can enhance the success of the \({\text{GAIC}}^{ad - hoc}\).

We also examine the structure used in these DGPs,

$$ {\mathcal{S}}_{N,TM = 5}^{{DGP_{y = 1,2} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{TM = 5} }}^{{DGP_{y = 1,2} }} ,4,5,6,7} \right\}, $$
$$ {\mathcal{S}}_{N,TM = 5}^{{DGP_{y = 3,4} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{TM = 5} }}^{{DGP_{y = 3,4} }} ,3,4,5,7} \right\}, $$

and

$$ {\mathcal{S}}_{{N,ROW_{TM = 5} }}^{{DGP_{y = 1,2} }} = \left\{ {1,2,3} \right\}, $$
$$ {\mathcal{S}}_{{N,ROW_{TM = 5} }}^{{DGP_{y = 3,4} }} = \left\{ {1,2,6} \right\}, $$

and the following transformation,

$$ f_{2}^{ad - hoc} \left( {Bdiag{\hat{\mathbf{\Sigma }}}_{vv,0} } \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{N} {\text{log}}\left| {{\hat{\mathbf{\Sigma }}}_{{{\varvec{v}}_{{\varvec{i}}} {\varvec{v}}_{{\varvec{i}}} ,0}} } \right|}}{N}. $$
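As an illustration of the transformation above, the following Python sketch averages the log-determinants of the country-specific covariance blocks. It assumes that log denotes the natural logarithm and that every block is positive definite; the function name `f2_ad_hoc` and the input format (a list of square arrays) are hypothetical.

```python
import numpy as np


def f2_ad_hoc(cov_blocks):
    """Average log-determinant over the N country-specific covariance blocks."""
    logdets = []
    for block in cov_blocks:
        sign, logdet = np.linalg.slogdet(np.asarray(block, dtype=float))
        if sign <= 0:
            raise ValueError("each covariance block must be positive definite")
        logdets.append(logdet)
    return float(np.mean(logdets))


# Illustrative blocks only: determinants 1 and 4, so the average is log(4) / 2.
blocks = [np.eye(2), 2.0 * np.eye(2)]
print(f2_ad_hoc(blocks))
```

Using `slogdet` rather than taking the logarithm of `det` avoids numerical underflow when the residual variances are small.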

Several concluding remarks follow. For a country with a large economy, shocks occurring in countries with smaller economies are often too inconsequential to warrant individual treatment. Thus, these countries can be aggregated. The economic size of the target country serves as a crucial indicator in the aggregation process of smaller economies. This approach allows us to obtain the appropriate model selection without having to estimate numerous models. A similar scenario has been observed under conditions where the 6-country model is correct. Furthermore, our findings are consistent with those obtained in \(DGP_{y = 3,4}\) and \(DGP_{y = 1,2}\) where \(TM=3\). It is also worth noting that the size of the ROW region is an important indicator for the target country.

Considering the case where \(TM = 3\),

$$ {\mathcal{S}}_{N,TM = 3}^{{DGP_{y = 1,2} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{TM = 3} }}^{{DGP_{y = 1,2} }} ,6,7} \right\}, $$
$$ {\mathcal{S}}_{N,TM = 3}^{{DGP_{y = 3,4} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{TM = 3} }}^{{DGP_{y = 3,4} }} ,5,7} \right\}, $$

and

$$ {\mathcal{S}}_{{N,ROW_{TM = 3} }}^{{DGP_{y = 1,2} }} = \left\{ {1,2,3,4,5} \right\}, $$
$$ {\mathcal{S}}_{{N,ROW_{TM = 3} }}^{{DGP_{y = 3,4} }} = \left\{ {1,2,3,4,6} \right\}. $$

In \(DGP_{y = 1,2}\), when the country with the second largest economy (label 5) is part of the ROW region, the country with the smallest economy (label 6) becomes less important. As mentioned above, the modification made with \(f_{2}^{ad - hoc} \left( {Bdiag{\hat{\mathbf{\Sigma }}}_{vv,0} } \right)\) inherently depends on the importance of the shocks from the countries. The results observed in the case of \(TM = 4\) are similar. When constructing the model for the target country, the difference in economic size between the ROW region and the target country should not be too large.

In \(DGP_{y = 3,4}\) for \(TM = 7\), the selection of the candidate model \(EM = 6\) stands out. For this case, we have

$$ {\mathcal{S}}_{N,TM = 7}^{{DGP_{y = 3,4} }} = \left\{ {1,2,3,4,5,6,7} \right\}, $$
$$ {\mathcal{S}}_{N,EM = 5}^{{DGP_{y = 3,4} }} = \left\{ {{\mathcal{S}}_{{N,ROW_{EM = 5} }}^{{DGP_{y = 3,4} }} ,2,3,4,5,7} \right\}, $$

and

$$ {\mathcal{S}}_{{N,ROW_{TM = 7} }}^{{DGP_{y = 3,4} }} = \{ \} , $$
$$ {\mathcal{S}}_{{N,ROW_{EM = 5} }}^{{DGP_{y = 3,4} }} = \left\{ {1,6} \right\}. $$

In the alternative candidate model, the ROW region comprises the countries with the smallest economic size, labelled ‘1’ and ‘6’. Table 7 shows that the relationship between the target country (label 5) and country 1 is stronger than the relationship between the target country (label 5) and the reference country (label 7). Moreover, country 6 is the second most closely associated country with country 1. In cases where individual country models need to be considered, \({\text{GAIC}}^{ad - hoc}\) shows sensitivity to the complexity of the relationships between countries. Even though it may not be quantifiable, the involvement of countries other than the target country, due to factors such as geography, cannot be filtered out in the GVAR modelling approach. This affects the criteria.

5 Empirical Application: How Many Countries Should be Individually Included in GVAR Models for Developing Countries?

In what follows, we conduct an analysis to demonstrate the usefulness and effectiveness of the model selection criteria by investigating the structure of modelling in developing countries within a GVAR framework. Our objective is to highlight the empirical performance of our approach. In doing so, we analyse each developing country individually. We propose a modelling strategy for constructing dynamic multi-country frameworks that considers the structure of trade relations and draws upon the existing GVAR literature. Before presenting our proposed modelling framework, we provide a brief overview of the methodology and data employed.

5.1 Data and Variables

In our econometric analysis, we utilize the updated GVAR-DdPS dataset, which includes data from 33 countries and covers the period from 1979Q2 to 2019Q4. This dataset is drawn from Mohaddes and Raissi (2020) (Footnote 26). Table 15 presents the list of countries in the updated GVAR-DdPS database.

Table 15 Countries in the GVAR Model

The primary focus of this paper lies on developing countries, specifically: Argentina, Brazil, Chile, China, India, Indonesia, Korea, Malaysia, Mexico, Peru, Philippines, Saudi Arabia, South Africa, Thailand, and Turkey. Our modelling strategy is to conduct an analysis and derive results for each country individually.

For country \(i\) in period \(t\), the updated GVAR dataset comprises quarterly macroeconomic and financial variables. These variables include log real GDP (\({y}_{it}\)), the inflation rate (\({p}_{it}\)), short-term interest rate (\({\rho }_{it}^{S}\)), long-term interest rate (\({\rho }_{it}^{L}\)), log deflated exchange rate (\({e}_{it}\)), log real equity prices (\({q}_{it}\)), and quarterly data on oil prices (\(poi{l}_{t}\)). Consequently, the vector of endogenous variables, \({\mathbf{x}}_{it}\), contains the following variables:

$$ \begin{array}{l} y_{it} = \ln \left( {GDP_{it} /CPI_{it} } \right),\quad p_{it} = \ln \left( {CPI_{it} } \right),\quad q_{it} = \ln \left( {EQ_{it} /CPI_{it} } \right), \\ e_{it} = \ln \left( {E_{it} } \right),\quad \rho_{it}^{S} = 0.25 \times \ln \left( {1 + R_{it}^{S} /100} \right),\quad \rho_{it}^{L} = 0.25 \times \ln \left( {1 + R_{it}^{L} /100} \right) \end{array} $$

where \(GDP_{it}\) is the nominal gross domestic product, \(CPI_{it}\) is the consumer price index, \(EQ_{it}\) is the nominal equity price index, \(E_{it}\) is the exchange rate in terms of US dollars, \(R_{it}^{S}\) is the short rate, and \(R_{it}^{L}\) is the long rate.
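The transformations above translate directly into code. The following Python sketch applies them to raw series for one country; the function name, the argument names and the dictionary keys are hypothetical, and the interest rates are assumed to be quoted in percent per annum, as the division by 100 and the factor 0.25 suggest.

```python
import numpy as np


def transform_variables(gdp, cpi, eq, e, r_short, r_long):
    """Apply the log and quarterly-rate transformations to raw series."""
    gdp, cpi, eq, e = (np.asarray(a, dtype=float) for a in (gdp, cpi, eq, e))
    r_short = np.asarray(r_short, dtype=float)
    r_long = np.asarray(r_long, dtype=float)
    return {
        "y": np.log(gdp / cpi),                     # log real GDP
        "p": np.log(cpi),                           # log consumer price index
        "q": np.log(eq / cpi),                      # log real equity prices
        "e": np.log(e),                             # log exchange rate
        "rho_s": 0.25 * np.log(1 + r_short / 100),  # quarterly short rate
        "rho_l": 0.25 * np.log(1 + r_long / 100),   # quarterly long rate
    }


# Illustrative inputs only, not taken from the dataset.
out = transform_variables([200.0], [100.0], [150.0], [2.0], [4.0], [6.0])
print(out["y"], out["rho_s"])
```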

We use country-specific trade-weighted averages to calculate country-specific foreign variables:

$$ {\mathbf{x}}_{it}^{*} = \mathop \sum \limits_{j = 1}^{N} w_{ij} {\mathbf{x}}_{jt} , w_{ii} = 0 $$

where, as mentioned above, \(w_{ij}\) is a set of weights such that \(\mathop \sum \nolimits_{j = 1}^{N} w_{ij} = 1\). For the empirical applications, the trade weights are calculated as the average of the years 2014–2016:

$$ w_{ij} = \frac{{T_{ij,2014} + T_{ij,2015} + T_{ij,2016} }}{{T_{i,2014} + T_{i,2015} + T_{i,2016} }} $$

where \({T}_{ij,t}\) represents the bilateral trade between country \(i\) and country \(j\) in year \(t\), calculated as the average of country \(i\)’s exports to and imports from country \(j\), and \({T}_{i,t}=\sum_{j=1}^{N}{T}_{ij,t}\) is the total trade of country \(i\), for \(t=2014, 2015, 2016\). In the supplementary materials, we provide concise information and specifications of the individual models for developing countries.
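The weight formula and the construction of the foreign variables can be sketched in Python as follows. The sketch assumes the bilateral trade flows are stored as one \(N \times N\) matrix per year; the function names and the three-country numbers are hypothetical and serve only to show the mechanics.

```python
import numpy as np


def trade_weights(bilateral_by_year):
    """Row-normalised trade weights w_ij from yearly bilateral trade matrices.

    bilateral_by_year: array of shape (years, N, N) holding T_ij,t.
    """
    total = np.asarray(bilateral_by_year, dtype=float).sum(axis=0)
    np.fill_diagonal(total, 0.0)  # enforce w_ii = 0
    return total / total.sum(axis=1, keepdims=True)


def foreign_variables(weights, x_t):
    """Star variables x*_it = sum_j w_ij x_jt for an (N x k) array x_t."""
    return weights @ np.asarray(x_t, dtype=float)


# Illustrative flows only: the same matrix for each of the three years.
one_year = np.array([[0.0, 10.0, 30.0], [10.0, 0.0, 10.0], [30.0, 10.0, 0.0]])
w = trade_weights(np.stack([one_year] * 3))
x_star = foreign_variables(w, [[1.0], [2.0], [3.0]])
print(w[0], x_star[0])
```

Summing the flows over the three years before normalising reproduces the ratio of sums in the weight formula, which in general differs from averaging three yearly weight matrices.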

5.2 Modelling Strategy and Results

We have formulated a strategy for constructing competing models to analyze individual countries. The key components of this strategy are as follows: (i) a developing country’s trade with countries in the 33-country GVAR should not be less than 50% of its total trade with all trading partners, (ii) when a developing country’s trade with its trading partners exceeds 80%, we build a model with the largest country size, and (iii) the US is always included as a separate entity in these models. For demonstration purposes, Table 16 shows the cumulative trade weights with other countries for Argentina, as an example from the GVAR DdPS dataset.

Table 16 Argentina’s cumulative trade weights with trading partners

Using the information given in Table 16, Fig. 1 illustrates the identification of models for Argentina utilizing this strategy.

Fig. 1
figure 1

Competing models for Argentina based on trade relations

According to the modelling strategy based on trade relations, four different competing models are developed. These models consist of countries represented by square boxes. Model 1 comprises Brazil, China, the Euro area, the US, and the ROW because the two minimum conditions are satisfied. The Euro area, as in DdPS, comprises eight of the eleven countries that initially adopted the Euro on 1 January 1999. These are Germany, France, Italy, Spain, the Netherlands, Belgium, Austria, and Finland. The ROW region is formed by combining all the remaining countries listed in Table 15, except Brazil, China, the Euro area, and the US. Similarly, in Model 2, the ROW region is formed by combining all the other countries in Table 15, except Brazil, China, the Euro area, the US, and Chile. In Model 3, India, like Chile, is added to the model individually. Using the same approach, the fourth model is constructed by including Mexico as the final individually modelled country, which meets the maximum country size condition (exceeding 80%). Figures 2 and 3 display the results obtained using the above modelling strategy for 15 developing countries, using \({\text{GAIC}}\) and \({\text{GAIC}}^{ad - hoc}\) respectively. The values of the criteria for these models are shown in the supplementary materials.
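One possible reading of this strategy is sketched below in Python. It assumes that partners are added in descending order of trade share, that the first competing model is the smallest set whose cumulative share reaches 50%, and that the last model is the first set whose cumulative share exceeds 80%. The function name, the thresholds as parameters, the partner labels P1 to P7 and the shares are all hypothetical; they are not the values in Table 16.

```python
def competing_models(trade_shares, always_include=("US",), lower=0.50, upper=0.80):
    """Nested lists of individually modelled partners under the trade-share rules."""
    selected = list(always_include)  # rule (iii): always modelled separately
    cumulative = sum(trade_shares[p] for p in selected)
    queue = sorted(
        (p for p in trade_shares if p not in selected),
        key=trade_shares.get,
        reverse=True,
    )
    models = []
    while True:
        if cumulative >= lower:            # rule (i): at least 50% covered
            models.append(list(selected))
        if cumulative > upper or not queue:  # rule (ii): stop above 80%
            break
        partner = queue.pop(0)
        selected.append(partner)
        cumulative += trade_shares[partner]
    return models


# Hypothetical shares for illustration only.
shares = {"US": 0.12, "P1": 0.20, "P2": 0.15, "P3": 0.14,
          "P4": 0.08, "P5": 0.07, "P6": 0.06, "P7": 0.05}
for number, model in enumerate(competing_models(shares), start=1):
    print(number, model)
```

With these hypothetical shares, the sketch yields four nested models, each adding one partner to the previous one, which mirrors the structure of the four Argentina models; all partners not listed in a model would form its ROW.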

Fig. 2
figure 2

The results of \({\text{GAIC}}\)-model selection criterion for developing countries

Fig. 3
figure 3

The results of \({\text{GAIC}}^{ad - hoc}\)-model selection criterion for developing countries

On the one hand, the \({\text{GAIC}}\) criterion tends to favor the selection of models corresponding to the largest countries. This tendency aligns with the \({\text{AIC}}\) criterion’s behavior in other econometric model structures, such as lag length selection in VAR models. It is commonly observed that the \({\text{AIC}}\) criterion tends to favor models with a larger number of parameters. This observation is also consistent with the findings presented in the theoretical and simulation sections of this paper.

On the other hand, the \({\text{GAIC}}^{ad - hoc}\) criterion tends to prioritize the selection of models with fewer parameters. This implies that when the trade partnership of developing countries exceeds 50%, sufficient information on the economic structure of these countries is collected. This observation is also consistent with the theoretical framework outlined in the study and the results of the simulation experiments.

Figures 2 and 3 also show findings that reinforce the significance of the US, China, the Euro area, and Japan for developing economies. As major players in the global economy, these developed countries have a significant impact on developing countries through their economic relations and interactions. They play a crucial role as major trading partners, sources of investment, and providers of technology and expertise for many developing countries.

Developing economies have significant links with major economic powers, including the US, China, the Euro area and Japan. The US holds a central position in the global economy and is an important trading partner for many developing economies. China, characterized by rapid economic growth, has emerged as a crucial player in global trade. Developing economies have expanded their trade relations with China and integrated themselves into its production chain. The Euro area is also a trading partner for many developing economies. In addition, Japan is a notable trading partner for developing countries, with a robust economy in sectors such as high technology, automobiles and finance. The economic policies of these developed countries, such as interest rates, exchange rates, and trade policies, can have direct and significant effects on the growth and competitiveness of developing countries.

These relationships with major economic powers present both opportunities and challenges for developing countries. Depending on global economic developments, these developing economies may need to adapt their economic policies and refine their foreign trade strategies. At the same time, these relationships can provide incentives for developing countries to strengthen their domestic economic foundations and enhance their competitiveness.

6 Conclusion

The main purpose of this study is to propose an approach to the problem of cross-country aggregation within GVAR models. GVAR models are superior to FAVAR models and PVAR models both in terms of modelling flexibility and the ability to aggregate cross-country estimates. As a result, academic researchers and practitioners tend to favour GVAR models and enjoy the ability to include virtually all countries’ data in their models. The availability of vast computational resources, including both software and hardware, turns the task of multi-country modelling into a trivial one.

Nevertheless, in GVAR modelling, the modeler is free to consider the other economies in the world either individually or within a bundle of countries labelled the ‘Rest of the World’ (ROW). The common practice is to take into account the close trade partners individually and to bundle the others as ROW. While such an approach seems both realistic and practical, it undermines a deep-rooted principle of econometrics, namely ‘parsimony’. In line with the principle of parsimony, model specification errors can be avoided by means of proper model selection criteria. Owing to the serious transaction costs in GVAR models, determining the appropriate model at the least cost must be considered a priority for effective day-to-day use of these models. Having noticed the gap in the literature on GVAR model selection, this study develops, examines and suggests an information criterion for parsimonious model selection in GVARs.

Our proposed information criteria have been developed in two stages. In the first stage, we considered a relatively straightforward Akaike Information Criterion-type criterion, which does not distinguish between the cross-section and time dimensions. After observing the dominance of the cross-section dimension in the values of our first criterion, we adjusted it to arrive at our second, i.e., proposed, information criterion. The proposed criterion, by construction, is capable of handling the cross-section and time dimensions in a balanced fashion. That is, when determining the appropriate model dimensions, expanding a GVAR model along either dimension is penalized to the same degree.

Owing to the ad hoc character of the proposed information criterion, it is not feasible to test its qualities by means of formal statistical hypothesis tests. We therefore resort to two separate exercises in which we examine whether the criterion works sufficiently well. In the first exercise, the small sample characteristics of the developed criterion are examined through a series of simulation experiments. The basic principle in designing the experiments is that the model dimension taken to be correct both nests some of the competing models and is nested in others. Thus, the effects of all possible situations in high-dimensional time series data are reflected, eventually underlining the viability of the proposed criterion. Namely, the proposed criterion has provided results that coincide with the simulated data generating processes at hand.

The findings of our first exercise can be summarized as follows. First, the economic size of the target country that is being explained affects the country/region structure of the GVAR model that should be generated. Compared with a small economy, a country with a large economy is less vulnerable to cross-country relations. Secondly, the results of the GVAR modelling approach are sensitive to the relationship of countries with the country designated as the reference country, regardless of their economic size. A weaker relationship between the target country and the reference country allows other countries to enter the model individually to a greater extent. Finally, the success of the GVAR modelling approach, and with it that of the model selection criteria, is higher if the country/region models are individually included in the system rather than aggregated into more compact regions.

In the second exercise, based on an updated DdPS dataset of 33 countries from 1979Q2 to 2019Q4, the model selection problem for individual GVAR models is considered. The proposed criterion has successfully yielded GVAR models that reflect the real-life trade relations of developing countries with their counterparts. In this exercise, the results obtained exhibit a common characteristic: developing countries are more exposed to developments in the US, China, European countries and Japan than to those in the rest of the world. In all these exercises, parsimonious as well as economically intuitive GVAR specifications have been obtained.

All in all, the proposed information criterion of this study seems to be a promising first step in sustaining parsimony in GVARs. Future work may shed more light on a standard set of principles to be applied within the GVAR modelling approach in a way to bring together the forecast performance of models and their degree of parsimony.