1 Introduction

The relationship between financial development and economic growth has been a subject of research for several decades. The significance of the insurance sector as an integral part of this research has been recognised ever since the first United Nations Conference on Trade and Development (UNCTAD 1964). The increasing size of this sector and its importance as a part of the financial sector have led to the analysis of the relation between insurance market development and economic activity becoming a separate field of interest. In particular, the issue of causality between the two variables has received considerable interest in recent decades, with cointegration being a necessary part of the analysis. In spite of this interest, however, there is no consensus on this matter (see Outreville 2013, for a survey of this literature).

Most early papers concerning cointegration between the insurance market and economic activity investigate time series data for a single country (see, for example, Ward and Zurbruegg 2000; Kugler and Ofoghi 2005; Adams et al. 2009). The development of panel data techniques has had a significant impact on this research. The main reason for this is the possibility to increase the precision of econometric techniques by exploiting both the time series and cross-sectional dimensions, the sizes of which are henceforth denoted as T and N, respectively. Cointegration analysis has, therefore, also been considered in a panel data framework. A number of tests have been developed, the most commonly applied one in practice being a set of residual-based statistics by Pedroni (2004). These tests owe their popularity to their high degree of flexibility: the procedure can accommodate panels with both heterogeneous dynamics and heterogeneous slope coefficients. However, whilst the tests are flexible in some respects, a number of restrictions must be satisfied if the asymptotic properties are to hold. This can be a significant complication when it comes to analysing real data. Problems arise since available datasets are often relatively small in both T and N, while the theoretical results are derived using the assumption that both T and N are large with \(N/T \rightarrow 0\), which in practice means that \(T>N\). We provide a more detailed discussion on the methodology and the required assumptions in Sect. 2 of this paper.

Such a discrepancy between theory and practice can very well result in incorrect inference. In particular, if the constraint that \(T>N\) is not met, tests may suffer from severe size distortions (see, for example, Wagner and Hlouskova 2010, for the case of cointegration, or Hlouskova and Wagner 2006; Westerlund and Breitung 2013, for the unit root case). For the purpose of illustrating the problems that may arise we conduct a Monte Carlo simulation for a wide range of constellations of T and N. This is done in Sect. 3. The results show that even for the simplest data generating process the tests tend to heavily over-reject the null of no cointegration when T is small relative to N. The size distortions tend to disappear as T grows, but only very slowly. This means that a rejection by these tests cannot be taken as unreserved evidence of cointegration, since the possibility remains that the result is due to the smallness of T.

In Sect. 4, we illustrate the above-mentioned problem using data on insurance market activity and economic activity. In this literature, Pedroni’s tests are the workhorse approach for examining cointegration relationships between economic activity and different macroeconomic variables (Lee 2005; Tatahi et al. 2016; Fowowe 2011; Costantini and Martini 2010; Kim et al. 2017), including variables for the insurance sector (Lee 2005, 2011, 2013; Lee et al. 2013; Liu and Zhang 2016; Pradhan et al. 2017). The problem is that the restrictions on N/T are almost never met. For example, Lee et al. (2013) look at 41 countries over the time period 1979–2007; Liu and Zhang (2016) examine 45 countries over 1980–2011; Pradhan et al. (2017) examine cointegration for 18 middle-income countries across the years 1980–2012. In the case of the insurance-economic activity nexus, the main data restriction lies within the insurance sector. The reason for this is that there is limited data available at the country level. The only provider of data for a relatively broad selection of countries is the Swiss Re Institute. However, the data is only available annually from 1980 onwards, which results in a relatively small T. In our empirical analysis, we consider 49 countries over 36 years, 1980–2015. The fact that we use data until 2015 means that our T is larger than in previously used datasets (see, for example, Lee 2013). Details of the data and the description of the variables can be found in Sect. 4.1. When considering the full sample, so that \(N>T\), the results show that Pedroni’s tests reject the null hypothesis of no cointegration. This is no longer true if the relationship between N and T is reversed. In particular, we consider two subsamples dividing the initial set of countries into advanced and developing economies, 25 and 24, respectively. In those cases, the \(T>N\) restriction is met. In both cases, however, we do not find any evidence of cointegration.

As a response to the size issues with Pedroni’s tests, we consider a different cointegration testing method that is more suitable for the given data size. We apply a residual-based panel cointegration test by Banerjee and Carrion-i-Silvestre (2017), which exhibits better size performance in panels where T is relatively small. Applying this test to our data sample, we do not find any evidence of cointegration between the variables. This strengthens our claim that much of the existing evidence for cointegration between the insurance sector and economic activity is likely to be due to the limited time dimension of the datasets used in practice.

2 Cointegration testing

We observe panel data for \(i=1,...,N\) cross section units and \(t=1,...,T\) time periods. For each member of the panel, we consider a scalar dependent variable \(y_{it}\) and an \(m \times 1\) vector of explanatory variables \(\mathbf {x}_{it}\). All variables are assumed to be integrated of order one, I(1). The estimated regression is of the following form:

$$\begin{aligned} y_{it}=\alpha _i + \varvec{\beta }_i' \mathbf {x}_{it}+\epsilon _{it} \end{aligned}$$
(1)

where \(\alpha _i\) is a member-specific fixed effect, while \(\varvec{\beta }_i\) is a vector of member-specific slope coefficients.

We are interested in testing the null hypothesis of no cointegration for all individuals in the panel. Under this null, the residuals from the estimated regression in (1) will be I(1). In order to test if this is true, Pedroni (1999, 2004) considers seven different residual-based test statistics which pool the residuals of the regression either along the within dimension of the panel or along the between dimension. In this paper, however, we focus on the panel and group mean t-statistics, which are by far the two most popular test statistics. These statistics are the only ones shown to exhibit relatively high power for small T (Pedroni 1999; Gutierrez 2003). Both t-statistics are formulated as panel analogues of the conventional time series semiparametric t-statistics studied in Phillips and Ouliaris (1990). Those statistics are based on testing a unit root hypothesis for the residual series obtained from the cointegration relation. The difference between the two statistics lies in the type of pooling used. The panel t-statistic, \(Z_{{\hat{t}}_{NT}}\), is “based on the estimator that effectively pools the autoregressive coefficient across different members for the unit root tests on the estimated residuals” (Pedroni 1999). Meanwhile, the group mean statistic, \({\tilde{Z}}_{{\hat{t}}_{NT}}\), is based on an estimator that estimates the autoregressive coefficient individually for each member of the cross section and then simply averages the estimates. Asymptotically both statistics follow a standard normal distribution. However, for this to hold there are at least three assumptions that must be met.

The first assumption is that an invariance principle holds for each individual time series of the panel separately as \(T \rightarrow \infty \), which is not very restrictive. The second assumption is more problematic in this regard and requires no correlation between the units of the panel, which is clearly very restrictive. In his application, Pedroni (2004) suggests a simple solution for partially dealing with the problem, namely to subtract the cross-sectional averages from the data. However, such demeaning is unlikely to be enough in general. The third assumption is that not only should both T and N be large, but T should be significantly larger than N, a condition that is unlikely to hold in applications to insurance market data. Such a dimensionality assumption, although restrictive, is not unique to Pedroni’s tests, but is actually a foundation for the results of the majority of panel cointegration tests (see Baltagi and Choi 2015, for a survey of this literature). All these tests are therefore likely to suffer similar consequences as Pedroni’s tests when the dimensionality restriction is not met.

Recently, Banerjee and Carrion-i-Silvestre (2017) suggested a test procedure that is not only more flexible with regard to the relative size of T and N, but is also very general as regards the type of cross-sectional dependence that can be permitted. This test procedure is based on the common correlated effects (CCE) estimator of Pesaran (2006), which assumes that the cross-sectional dependence has a common factor representation. These factors can be approximated by the cross section averages of the variables in the model. The appropriate test regression is therefore given by (1) with the right-hand side augmented with said averages:

$$\begin{aligned} y_{it}=\alpha _i + \varvec{\beta }_i' \mathbf {x}_{it}+\varvec{\lambda }_{i}'\bar{\mathbf {z}}_t+\upsilon _{it}, \end{aligned}$$
(2)

where \(\bar{\mathbf {z}}_t=({\bar{y}}_t,\bar{\mathbf {x}}'_t)'\) is an \((m+1) \times 1\) vector collecting the cross section averages of the dependent variable and the stochastic regressors. The model is estimated using the pooled CCE estimator, and the residuals are then tested for stationarity. In practice, the test statistic builds upon the cross section augmented Dickey–Fuller statistic in the spirit of Pesaran (2007) and Pesaran et al. (2013). That said, its critical values in a small sample setting differ from those of Pesaran (2007) and Pesaran et al. (2013). Therefore, Banerjee and Carrion-i-Silvestre (2017) tabulate the critical values for a wide range of different combinations of T and N. The test exhibits good size performance for small samples even when T is smaller than N. This test, therefore, fits well with the application considered in this paper.
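The augmentation in (2) can be illustrated with a short sketch. It assumes NumPy, uses a hypothetical function name, and, for simplicity, fits the augmented regression unit by unit with OLS rather than with the pooled CCE estimator used in the test itself; it is meant only to show how the cross section averages enter the regression, not to reproduce the Banerjee and Carrion-i-Silvestre (2017) statistic.

```python
import numpy as np

def cce_augmented_residuals(y, X):
    """Residuals from unit-by-unit OLS of y_it on a constant, x_it and
    the cross section averages (ybar_t, xbar_t).

    y : array of shape (T, N)
    X : array of shape (T, N, m)
    Note: unit-by-unit OLS is a simplification (assumption of this sketch);
    the test in the text relies on the pooled CCE estimator.
    """
    T, N = y.shape
    # zbar_t = (ybar_t, xbar_t')', one row per time period: shape (T, m + 1)
    zbar = np.column_stack([y.mean(axis=1), X.mean(axis=1)])
    resid = np.empty((T, N))
    for i in range(N):
        W = np.column_stack([np.ones(T), X[:, i, :], zbar])
        coef, *_ = np.linalg.lstsq(W, y[:, i], rcond=None)
        resid[:, i] = y[:, i] - W @ coef
    return resid
```

The residual series returned for each unit are what a subsequent unit root test would be applied to.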

3 Monte Carlo simulations

In this section, we study the small sample performance of Pedroni’s (2004) tests for a wide range of constellations of T and N. We consider \(N=\{10,\ 20,\ 30,\ 40,\ 50\}\) and \(T=\{20,\ 30,\ 40,\ 50,\ 100,\ 1000\}\), which cover the usual range for the number of units and time periods in a typical macro type panel.

The data generating process (DGP) is a modification of the one used by Pedroni (2004). He considers the case of a single stochastic regressor, where both \(y_{it}\) and \(\mathbf {x}_{it}\) are generated by random walk processes. We also consider our variables to be random walks. However, the number of regressors is set to two in order to match our empirical application. Hence, letting \( \mathbf {z}_{it}=(y_{it}, \mathbf {x}'_{it})'\) the DGP under the null hypothesis is given by

$$\begin{aligned} \begin{aligned} \mathbf {z}_{it}=\mathbf {z}_{it-1}+\varvec{\eta }_{it}, \end{aligned} \end{aligned}$$
(3)

where \(\varvec{\eta }_{it}\sim N(0,I_{3})\). Pedroni allows for a particular type of autocorrelation in \(\varvec{\eta }_{it}\). In this section, however, we assume that the error terms are serially uncorrelated and normally distributed. This is a conservative approach since the simplification implies that the resulting DGP is ideal for the tests at hand. If the tests do not perform well under such circumstances, they are hardly likely to perform well under more realistic DGP settings. Assuming such errors also allows us to isolate the effect of N and T by abstracting from other problems, such as autocorrelation.

We simulate the data 1000 times and calculate the panel and group mean residual-based t-statistics of Pedroni (2004). We calculate the rejection rate at the 5% significance level. Given that the data is generated as independent random walks, we expect to reject the null hypothesis of no cointegration 5% of the time. The obtained rejection rates are presented in Table 1.
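The structure of this experiment can be sketched as follows. The sketch assumes NumPy, and all function names are hypothetical. The unit-level statistic shown is a plain Dickey–Fuller t-ratio on the OLS residuals; it is not Pedroni’s standardised panel or group mean statistic, which require nuisance-parameter corrections and adjustment terms not reproduced here. The panel statistic and its critical value are therefore left as user-supplied arguments.

```python
import numpy as np

def simulate_panel(N, T, k=3, rng=None):
    """Independent random walks z_it = z_it-1 + eta_it, eta_it ~ N(0, I_k).
    Returns an array of shape (N, T, k); z_i0 = 0 is an assumption."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal((N, T, k)).cumsum(axis=1)

def residual_df_tstat(y, X):
    """Regress y on a constant and X, then compute the Dickey-Fuller
    t-ratio (no lags, no deterministic terms) on the residuals."""
    W = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    e = y - W @ coef
    e_lag, de = e[:-1], np.diff(e)
    rho = (e_lag @ de) / (e_lag @ e_lag)
    u = de - rho * e_lag
    s2 = (u @ u) / (len(de) - 1)
    return rho / np.sqrt(s2 / (e_lag @ e_lag))

def rejection_rate(panel_stat, crit, N, T, reps=1000, seed=0):
    """Share of replications in which panel_stat(z) falls below crit.
    panel_stat is any function mapping a simulated panel to a scalar."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        z = simulate_panel(N, T, rng=rng)
        rejections += int(panel_stat(z) < crit)
    return rejections / reps
```

A panel statistic would typically be built from the unit-level values, for example by collecting `residual_df_tstat(z[i, :, 0], z[i, :, 1:])` over all units `i` before standardising.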

Table 1 Size (%) of the Pedroni (2004) t-statistics in a simple spurious regression model

The first thing that strikes us is that the simulated sizes are considerably larger than the expected 5%. In fact, the size distortions are extremely large when N and T are of comparable size, and they become even larger when \(N>T\). The test statistics we have considered therefore tend to over-reject the null when the condition that \(T>N\) is not met. However, we also see that the distortions do tend to come down with increases in T. Consider, for example, the case when \(N=30\). While \(T=50\) yields unacceptably high size distortions, when T increases to 1000 the distortions are much smaller. Size accuracy is not perfect, though, and some distortions remain even when T is as large as 1000. These results demonstrate quite clearly the consequences of applying Pedroni’s cointegration test to panels where T is not large enough relative to N. Given that the results are obtained for such a simple DGP, it is highly unlikely that the test performance is reliable under any other more realistic settings. This leads us to the conclusion that inferences based on these tests are also likely to be misleading in applications to insurance market activity and economic development, where T is usually smaller than N. In the next section, we elaborate on this.

4 Empirical results

4.1 Data and preliminary results

In order to re-examine the relationship between insurance market development and economic activity, we consider a panel of 49 countries covering the years 1980–2015. The data for the insurance sector, as already mentioned, is taken from the largest currently existing database, provided by the Swiss Re Institute. Life (\(\mathrm{LP}_{it}\)) and non-life (\(\mathrm{NLP}_{it}\)) insurance sectors are examined separately, since these two types of insurance are commonly known to exhibit different activity patterns depending on the level of economic development. The sectors’ activity is measured in terms of insurance density, that is, insurance premiums per capita. Gross domestic product (\(\mathrm{GDP}_{it}\)) is also taken as a per capita variable in order to eliminate the population growth effect. Both insurance density and GDP per capita are given in US dollars at constant 2010 prices and exchange rates. The data on GDP, consumer price index, exchange rates and population are taken from the World Development Indicators database. Following common practice, insurance variables and GDP are transformed into logarithms in the analysis. To be able to separate the effect of the insurance sector from that of the rest of the financial sector, we include a proxy for banking sector development, which is measured as private credit by deposit money, \(\mathrm{credit}_{it}\) (see Loayza and Ranciere 2006; Lee 2013). This data is taken from the Financial Development and Structure Dataset (June, 2017) and is measured as a percentage of GDP.

Table 2 Countries considered according to the level of economic development

One advantage of the dataset being considered here is that it is relatively large in both N and T. To the best of our knowledge, 36 years is the longest time span considered so far when examining the insurance-economic activity nexus. The size of N allows us not only to examine the sample as a whole, but also to consider smaller subsamples. Given that different types of insurance (life versus non-life) have different development trajectories depending on a country’s level of economic development, we then look at advanced and developing economies separately (Ward and Zurbruegg 2000; Arena 2008; Haiss and Sümegi 2008). The number of advanced (developing) economies is given by 25 (24), see Table 2. The division into advanced and developing countries is interesting in its own right, but also for what it implies for N/T. In fact, while \(T<N\) for the full sample, in the two subsamples \(T>N\). Hence, if the relative size of N and T is indeed an important factor in determining the performance of Pedroni’s test, we would expect the evidence of cointegration to be strongest for the full sample where N is largest relative to T.

Before we can actually test for cointegration, there are a number of prerequisite conditions that need to be satisfied. As stated earlier, Pedroni’s tests rely on the assumption that there is no cross-unit correlation in the data. In order to check the suitability of this assumption in our data, we consider the cross-sectional dependence (CD) test of Pesaran (2004), which tests the null hypothesis of cross-sectional independence. This test exhibits good small sample performance for a wide range of constellations of T and N and is therefore applicable in our case.

Table 3 Evidence on cross section dependence in insurance activity, as indicated by Pesaran’s (2004) CD test

Applying the CD test to our raw data shows that the null is rejected at all conventional levels of significance (see Table 3); the data can therefore be considered to be cross-sectionally dependent, which means that we cannot apply Pedroni’s tests directly. We therefore follow Pedroni’s advice and demean the data. When testing the transformed data, we see that we can no longer reject the null for some subsamples, suggesting that the demeaning is quite effective in accounting for cross-correlation in the data. However, as expected, for most subsamples the null is still rejected, which shows that demeaning might not always work and that one should actually consider more general testing procedures.
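For reference, the CD statistic and the demeaning step can be sketched as below. The sketch assumes NumPy, uses hypothetical function names, and applies the statistic directly to a balanced \(T \times N\) array of series; which series or residuals the statistic is computed from in the empirical analysis is not specified by this sketch. The statistic is \(\sqrt{2T/(N(N-1))}\) times the sum of the pairwise correlations over all pairs \(i<j\), and is asymptotically standard normal under the null of cross-sectional independence.

```python
import numpy as np

def pesaran_cd(X):
    """Pesaran (2004) CD statistic for a balanced panel.
    X : array of shape (T, N), one column per cross section unit."""
    T, N = X.shape
    R = np.corrcoef(X, rowvar=False)      # N x N pairwise correlations
    upper = np.triu_indices(N, k=1)       # index pairs with i < j
    return np.sqrt(2.0 * T / (N * (N - 1))) * R[upper].sum()

def cross_section_demean(X):
    """Subtract the cross section average at each time period."""
    return X - X.mean(axis=1, keepdims=True)
```

Comparing `pesaran_cd(X)` with `pesaran_cd(cross_section_demean(X))` gives a rough indication of how much of the cross-unit correlation the demeaning removes.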

The next step of the preliminary analysis is to check if all the variables are indeed I(1). For this purpose, we apply the extended cross-sectionally augmented panel unit root test (CIPS) of Pesaran et al. (2013). The CIPS test statistic can be seen as a version of the popular IPS test of Im et al. (2003), where the fitted test regression is augmented with a set of cross section averages to account for cross-sectional dependence (see Pesaran 2007; Pesaran et al. 2013). In particular, the auxiliary regression includes the cross section average of the inspected series together with cross section averages of series sharing the unobserved common component in order to allow a multifactor error structure. One reason for choosing the CIPS test is therefore that it allows for the presence of cross-unit correlation in the data. Another reason is that it is very flexible in terms of the allowable values of T and N. Moreover, the test exhibits good small sample performance, which is an important condition when applying the test to real data.

Table 4 Results for the Pesaran et al. (2013) CIPS panel unit root tests (Full sample)

The unit root test results for the whole sample are provided in Table 4. The results for the subsamples of advanced and developing economies are available upon request. We include up to 3 lags. For all these cases, the test cannot reject the null of a unit root in levels. Depending on the choice of lag length, there is some variation in the results for the data in first differences; however, for the most commonly used lag length of one the null is rejected. We therefore proceed with the analysis considering all the variables to be I(1).

4.2 Empirical results

The hypothesized cointegration relationship is given by

$$\begin{aligned} \log ({\hbox {GDP}}_{it})=\alpha _i + \beta _i \log ({\hbox {IP}}_{it})+\gamma _{i} {\hbox {credit}}_{it}+\epsilon _{it} \end{aligned}$$
(4)

where \({\hbox {IP}}_{it}\) is either life sector or non-life sector insurance density.

We first apply Pedroni’s tests to the demeaned data. The results for both types of insurance and for different groups of countries can be found in Table 5. When looking at the whole sample of 49 countries, we see that we reject the null of no cointegration for both life and non-life insurance sectors at the 5% level. This is completely in line with previous research, which typically finds evidence of cointegration. It is important to note, however, that the conclusions change when looking at the subsamples for advanced and developing economies. For these subsamples, we are not able to reject the null. That is, the evidence of cointegration goes away when N decreases relative to T, which is consistent with the Monte Carlo simulations. The rejection of the null for the full sample is therefore likely a result of size distortions when the requirement that \(T>N\) is not met.

Table 5 Pedroni (2004) cointegration test on demeaned data

Since Pedroni’s tests cannot be trusted for these values of T and N, we focus instead on the test of Banerjee and Carrion-i-Silvestre (2017). We apply this test to the raw data (not demeaned) since the test allows for the presence of cross section correlation between units of the panel. Banerjee and Carrion-i-Silvestre (2017) tabulate appropriate critical values, which depend on T and N, the number of variables tested for cointegration, the number of lags included in the test regression, and the number of common factors. As usual, the result can be sensitive with regard to the choice of lag length, p, which is needed to control for serial correlation. Banerjee and Carrion-i-Silvestre (2017) do not provide a lag selection procedure and consider a set of possible lag lengths in their application. We follow their analysis and consider the following lag lengths: \(p=0,1,2,3\), where the maximum lag length to consider is set equal to the integer part of \(4(T/100)^{1/4}\), in line with Pesaran et al. (2013). The calculated statistics are provided in Table 6, while the critical values for the test can be found in Banerjee and Carrion-i-Silvestre (2017) for each particular case. In all of the considered cases, we are not able to reject the null hypothesis of no cointegration.
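As a quick check of the lag rule for the panel at hand, with \(T=36\) the bound works out as follows (the symbol \(p_{\max }\) is shorthand introduced here for the maximum lag length):

$$\begin{aligned} p_{\max }=\left\lfloor 4\left( \frac{36}{100}\right) ^{1/4}\right\rfloor \approx \lfloor 4\times 0.775\rfloor =\lfloor 3.10\rfloor =3, \end{aligned}$$

which is why the lag lengths considered stop at \(p=3\).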

Table 6 Banerjee and Carrion-i-Silvestre (2017) cointegration test

Hence, the results of the Banerjee and Carrion-i-Silvestre (2017) cointegration test do not provide any evidence in favour of cointegration between insurance market activity and economic development. Again, this reinforces our suspicion that the existing cointegration results are misleading due to the size distortions induced by the smallness of T. This is important in itself, but also for what it implies with regard to studies of causality, which is the main focus area of current research. In particular, if the test for cointegration is misleading, then the evidence of causality will also be misleading. Therefore, as far as future research is concerned, it is of high importance to re-examine this aspect of the insurance-economic activity nexus.

5 Concluding remarks

In this paper, we look at the relationship between the insurance market and economic activity in terms of cointegration. The existing literature typically reports a cointegrating relationship between the two variables. However, the main workhorse of this literature is the set of panel cointegration tests of Pedroni (2004), which require that T is substantially larger than N. This condition is commonly ignored in practice, and the tests are applied to panels of any dimensionality. Using Monte Carlo simulations, we demonstrate how Pedroni’s tests tend to over-reject the null hypothesis of no cointegration when T is not significantly larger than N. We then re-examine cointegration between insurance market development and economic activity for 49 countries over 36 years, using a popular database provided by the Swiss Re Institute. We find that if Pedroni’s tests are used, we can reject the null of no cointegration. If, however, more suitable tests are used, we are no longer able to reject the null. This suggests that much of the earlier evidence on cointegration between insurance and economic activity needs to be re-evaluated, since it is quite possible that the results are due to the size of the panel. Equally important, if the evidence on cointegration is misleading, then all subsequent causality results based on taking cointegration as given are likely to be misleading, too.