Inferences on the regression coefficients in panel data models: parametric bootstrap approach

This article presents a parametric bootstrap approach to inference on the regression coefficients in panel data models. We aim to propose a method that is easy to apply for testing hypotheses and constructing confidence regions for the regression coefficient vector of balanced and unbalanced panel data models. A Monte Carlo simulation study compares our parametric bootstrap approach with other approaches and approximate methods.


Introduction
Panel data combine observations on a cross section of individuals, cities, factories, etc., over many time periods. Panel data have been applied extensively in economics. The Panel Study of Income Dynamics (PSID) from the Survey Research Center at the University of Michigan and the gasoline demand panel of annual observations across 18 Organisation for Economic Cooperation and Development (OECD) countries, covering the period 1960-1978, are two famous examples of panel data. Panel data models have been studied in statistics and econometrics by many researchers ([1,5,14], and references therein). Baltagi [2] presents an excellent overview of panel data models. Precise inference in panel data models is difficult when these models have nuisance parameters. Zhao [23] suggested inferences in panel data models based on generalized p values. When nuisance parameters are present in models, generalized p values are effective for solving testing problems [8,11,16-19,24]. Parametric bootstrap approaches are another method for inference in panel data models when unknown parameters are present. Xu et al. [21] and [22] provided parametric bootstrap inferences for linear combinations of regression coefficients in balanced and unbalanced panel data models, respectively.
In this article, we aim to propose a method that is easy to apply for testing hypotheses and constructing confidence regions for the regression coefficient vector of balanced and unbalanced panel data models. Our procedure is based on a new parametric bootstrap (PB) pivot variable. The performance of our PB method is compared with the generalized p value approach introduced by [23]. The numerical results in section "Simulation study" show that, in terms of type I error rate and power, our method performs better than the generalized p value (GPV) inferences and the approximate (AP) method.
The rest of this paper is organized as follows. Our PB approaches for hypothesis testing and for constructing confidence regions for the regression coefficient vector are presented for the balanced and unbalanced panel data models in section "PB inferences for the regression coefficients". In section "Simulation study", the proposed PB methods are evaluated in terms of type I error rates and powers. The suggested PB approaches are illustrated with a real data example in section "Example". Some conclusions are given in section "Conclusions".

Balanced panel data models
Panel data regression models describe the effect of several explanatory variables on the response variable for N individuals over T time periods. A panel data model is $Y_{it} = \alpha + X_{it}'\beta + u_{it}$ with $u_{it} = \mu_i + \nu_{it}$, where $Y_{it}$ and $X_{it}$ are the response value and the K explanatory variables on the ith individual for the tth time period, respectively, $u_{it}$ is the regression disturbance, $\mu_i$ denotes the unobservable individual specific effect and $\nu_{it}$ denotes the remainder disturbance. Usually, in the random effects model, we suppose that $\mu_i \sim N(0, \sigma_\mu^2)$ and $\nu_{it} \sim N(0, \sigma_\nu^2)$ vary independently. Here $\alpha$ is the intercept and $\beta$ is a K × 1 vector of unknown coefficients. Let $y_{it}$ denote the observed values of $Y_{it}$ for i = 1, 2, … , N; t = 1, 2, … , T.
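As a rough illustration of the random effects model described above, the following sketch draws one balanced panel. The function name, the normally distributed regressors and all numerical values are assumptions made for the example only; they are not taken from the article.

```python
import numpy as np

def simulate_balanced_panel(N, T, alpha, beta, sigma_mu, sigma_nu, rng):
    """Draw one balanced panel from a one-way random effects model (illustrative)."""
    beta = np.asarray(beta, dtype=float)
    K = beta.size
    X = rng.normal(size=(N, T, K))                 # hypothetical regressors X_it
    mu = rng.normal(0.0, sigma_mu, size=(N, 1))    # individual effects mu_i, constant over t
    nu = rng.normal(0.0, sigma_nu, size=(N, T))    # remainder disturbances nu_it
    Y = alpha + X @ beta + mu + nu                 # Y_it = alpha + X_it' beta + mu_i + nu_it
    return Y, X

rng = np.random.default_rng(0)
Y, X = simulate_balanced_panel(N=10, T=5, alpha=2.0, beta=[3.0, 1.0, 5.0],
                               sigma_mu=1.0, sigma_nu=0.5, rng=rng)
print(Y.shape, X.shape)
```

Broadcasting the (N, 1) array of individual effects across the T columns is what makes every observation of individual i share the same effect.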
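Equation (3) is the spectral decomposition representation of $\Omega$, which is the main key to the following inferences. Both $P$ and $Q$ are symmetric and idempotent matrices, such that $PQ = QP = 0$. Using the properties of $P$ and $Q$, [15] show that $\Omega^r = (\sigma_1^2)^r P + (\sigma_\nu^2)^r Q$, where r is an arbitrary scalar. Hence, $\Omega^{-1} = \frac{1}{\sigma_1^2} P + \frac{1}{\sigma_\nu^2} Q$. The generalized least squares estimator (GLSE) of the regression coefficients is obtained by [12].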
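The spectral decomposition property can be checked numerically. The sketch below assumes the standard one-way error component forms, namely $P = I_N \otimes \bar{J}_T$ (with $\bar{J}_T$ the T × T matrix whose entries all equal 1/T), $Q = I_{NT} - P$ and $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$; these definitions and the numerical values are assumptions for illustration and may differ from the article's exact notation.

```python
import numpy as np

N, T = 4, 3
sigma_mu2, sigma_nu2 = 2.0, 0.5            # illustrative variance components
sigma_12 = T * sigma_mu2 + sigma_nu2       # assumed: sigma_1^2 = T*sigma_mu^2 + sigma_nu^2

J_bar = np.full((T, T), 1.0 / T)           # averaging matrix over time
P = np.kron(np.eye(N), J_bar)              # assumed form of P
Q = np.eye(N * T) - P                      # assumed form of Q

# Covariance matrix of the stacked disturbances u_it = mu_i + nu_it
Omega = sigma_mu2 * np.kron(np.eye(N), np.ones((T, T))) + sigma_nu2 * np.eye(N * T)

def omega_power(r):
    """Omega^r computed through the spectral decomposition."""
    return sigma_12 ** r * P + sigma_nu2 ** r * Q

# P and Q are idempotent and mutually orthogonal
assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q) and np.allclose(P @ Q, 0.0)
# r = 1 recovers Omega, r = -1 gives its inverse
assert np.allclose(omega_power(1), Omega)
assert np.allclose(omega_power(-1), np.linalg.inv(Omega))
```

The practical benefit is that powers of the NT × NT matrix, including the inverse, never require a numerical matrix inversion.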

It is easy to verify
To obtain the estimators of $\sigma_\nu^2$ and $\sigma_1^2$, the model (2.2) is transformed. It is easy to show that the two transformed vectors are normally distributed, with covariance matrices scaled by $\sigma_\nu^2$ and $\sigma_1^2$ respectively, and that they are mutually independent. Therefore, we can define two quadratic forms $S_1^2$ and $S^2$ that are independently distributed in terms of chi-square variables, where $\chi^2(m)$ denotes a central chi-square random variable with m degrees of freedom. The unbiased estimators of $\sigma_\nu^2$ and $\sigma_1^2$ are then given in (2.12).
The values of $\sigma_\mu^2$, $\sigma_\nu^2$ and hence $\Omega$ are usually unknown in practice. Therefore, we propose to replace the variance components with their unbiased estimators, which leads to the random quantity H in (2.15). We can construct an approximate (AP) confidence region from H evaluated at the observed values of the estimators. This approximate method is applicable when the sample size is large. Since the distribution of H is unknown and the approximate method performs poorly (based on simulation results), we use a parametric bootstrap approach to approximate the distribution of H.
Let $s^2$ and $s_1^2$ be the observed values of $S^2$ and $S_1^2$ in (2.9), respectively. For a given (̃ The null hypothesis (2.18) is rejected at the nominal level when $D_0 > H_B$, where $D_0$ is the observed value of D. A PB p value can also be defined, and $H_0$ is then rejected at the nominal level when p is smaller than that level.
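The general mechanics of a PB p value can be sketched as follows. Because the article's statistic is not reproduced here, the example uses a toy stand-in (a squared standardized mean of a normal sample with unknown variance); the function names, the toy statistic and the number of bootstrap draws are all assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def pb_p_value(h_obs, draw_h_star, n_boot, rng):
    """Monte Carlo estimate of P(H_B > h_obs) from n_boot bootstrap draws."""
    h_star = np.array([draw_h_star(rng) for _ in range(n_boot)])
    return float(np.mean(h_star > h_obs))

# Toy data and toy statistic (NOT the panel data statistic of the article)
rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=30)
n, m0 = x.size, 0.0                      # m0: hypothesized mean under H0
s2 = x.var(ddof=1)                       # observed unbiased variance estimate
h_obs = n * (x.mean() - m0) ** 2 / s2    # observed value of the toy statistic

def draw_h_star(rng):
    """One bootstrap replicate, with the unknown variance replaced by s2."""
    xbar_star = rng.normal(m0, np.sqrt(s2 / n))      # sample mean under H0
    s2_star = s2 * rng.chisquare(n - 1) / (n - 1)    # bootstrap variance estimate
    return n * (xbar_star - m0) ** 2 / s2_star

p = pb_p_value(h_obs, draw_h_star, n_boot=5000, rng=rng)
reject = p < 0.05                        # reject H0 at the 5% level when p < 0.05
print(p, reject)
```

The key step is that bootstrap replicates are generated from the fitted parametric model under the null hypothesis, with nuisance parameters fixed at their observed estimates.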

Unbalanced panel data models
The unbalanced panel data model is given by $Y_{it} = \alpha + X_{it}'\beta + u_{it}$ with $u_{it} = \mu_i + \nu_{it}$, for $i = 1, \dots, N$ and $t = 1, \dots, T_i$, where $Y_{it}$, $X_{it}$ and so on are defined as in the balanced case, with the difference that in the unbalanced case the number of time periods differs across cross sections and equals $T_i$ for the ith cross section. Equation (2.21) can also be expressed in matrix notation. The generalized least squares estimator (GLSE) of the regression coefficients and its distribution are then obtained. Similar to the balanced case, we consider two quadratic forms defining the Between and Within residual sums of squares to obtain the estimators of $\sigma_\mu^2$ and $\sigma_\nu^2$.
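To make the difference from the balanced case concrete, the sketch below builds the disturbance covariance matrix when each cross section has its own number of periods $T_i$. It assumes the usual one-way random effects structure, in which the block for individual i is $\sigma_\mu^2 J_{T_i} + \sigma_\nu^2 I_{T_i}$; the function name and the numerical values are hypothetical.

```python
import numpy as np

def unbalanced_omega(T_list, sigma_mu2, sigma_nu2):
    """Block-diagonal covariance of stacked disturbances for an unbalanced panel."""
    n = sum(T_list)
    Omega = np.zeros((n, n))
    start = 0
    for T_i in T_list:
        # Block for individual i: shared effect mu_i plus idiosyncratic noise
        block = sigma_mu2 * np.ones((T_i, T_i)) + sigma_nu2 * np.eye(T_i)
        Omega[start:start + T_i, start:start + T_i] = block
        start += T_i
    return Omega

Omega = unbalanced_omega([2, 4, 3], sigma_mu2=2.0, sigma_nu2=0.5)
print(Omega.shape)
```

Because the blocks have different sizes, the matrix can no longer be written with a single Kronecker product, which is why the unbalanced case needs separate treatment.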
According to [12], unbiased estimators of $\sigma_\mu^2$ and $\sigma_\nu^2$ can be given, which yield the natural estimators of $\Omega$ and $\Omega^{-1}$. To construct a confidence region for the regression coefficients in this case, we propose to use a random quantity similar to H in (2.15) and a PB approach to approximate its distribution.

Simulation study
In this section, we present the results of a Monte Carlo simulation study comparing the sizes and powers of our PB approach with the generalized p value method of [23] and the approximate method. We use the abbreviations PB, GPV and AP to refer to these three methods. First, we briefly review the GPV method.
[23] proposed a generalized p value method only for testing the null hypothesis that the regression coefficient vector equals a specified vector against the two-sided alternative, in the balanced panel data case. He proposed a generalized F test for this null hypothesis. For a given (N, T) and given values of the model parameters, the simulation generates the data and computes $s_1^2$, $s^2$, the observed values of the estimators and the observed value of H from (2.15), i.e. $h_0$. The results are reported in Table 1. We take the hypothesized coefficient vector to be (2, 3, 1, 5) and let the true coefficient vector take various values. Note that, in this simulation, we have used the three columns of the panel data reported in Table 2 in place of the matrix $X$, that is, (ln Y/N, ln $P_{MG}/P_{GDP}$, ln Car/N); this example is described in section "Example".
The first column of Table 1 shows the estimated type I error rates (sizes) of the tests and the other three columns show the estimated powers. We use the following criterion for comparing the methods. First, a method is preferred to the other methods when its estimated size is not significantly different from 0.05; we refer to such a method as a reliable method. Second, the best method is the one with the largest power among the reliable methods; see [7,9,10,20] and [6].
In addition, by the central limit theorem, estimated sizes between 0.0428 and 0.0572 have 98% confidence intervals that cover the nominal level 0.05. In other words, if the estimated size of a test is below the lower bound or above the upper bound, we conclude that the test is conservative or liberal, respectively. In Table 1, estimated sizes in boldface are significantly less than or greater than 0.05.
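The bounds 0.0428 and 0.0572 quoted above can be reproduced with a normal approximation to the binomial proportion. The sketch assumes that the sizes are estimated from 5000 simulated samples (the number stated later for the power estimates); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def size_bounds(nominal=0.05, n_sim=5000, confidence=0.98):
    """Range of estimated sizes consistent with the nominal level (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)       # two-sided critical value
    half_width = z * sqrt(nominal * (1 - nominal) / n_sim)   # z times the standard error
    return nominal - half_width, nominal + half_width

lower, upper = size_bounds()
print(round(lower, 4), round(upper, 4))  # 0.0428 0.0572
```

An estimated size below the lower bound flags a conservative test, and one above the upper bound flags a liberal test.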

Note that the estimated powers vary slightly from one simulation to another [9]. Therefore, we used the well-known z test to compare the powers of two methods. The powers of two test procedures differ significantly at a given level when $|p_1 - p_2| > Z\sqrt{p(1 - p)/5000}$, where $Z$ is the upper standard normal quantile at half that level, $p = (p_1 + p_2)/2$, and $p_1$ and $p_2$ denote the estimated powers of the two test procedures based on 5000 samples. In the following remarks, we discuss the simulation results.
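The comparison rule stated above can be written as a small helper. The threshold follows the expression in the text as written; the function name, the default 5% level and the example powers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def powers_differ(p1, p2, level=0.05, n_sim=5000):
    """Return True when two estimated powers differ significantly at the given level."""
    p_bar = (p1 + p2) / 2                          # pooled power estimate
    z = NormalDist().inv_cdf(1 - level / 2)        # upper level/2 standard normal quantile
    threshold = z * sqrt(p_bar * (1 - p_bar) / n_sim)
    return abs(p1 - p2) > threshold

print(powers_differ(0.80, 0.83))    # difference of 0.03 exceeds the threshold
print(powers_differ(0.80, 0.805))   # difference of 0.005 does not
```

A difference that does not exceed the threshold is treated as simulation noise rather than a real difference in power.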

Remark 1
In all cases that we considered here, the estimated sizes of our PB test vary between 0.0446 and 0.0547, which shows that our proposed test behaves like an exact test.

Remark 2
The simulated sizes of the GPV and AP methods often exceed the upper limit of the reliability range (0.0572), so these methods are liberal. Therefore, their powers cannot be meaningfully compared with those of our parametric bootstrap approach.

Remark 3
In the cases where the estimated size of GPV is close to 0.05, the powers of the PB test and the GPV test are not significantly different.

Remark 4
Overall, the proposed PB method appears to perform better than the two other methods in terms of both controlling the type I error rate and power.

Example
To illustrate our suggested approach to inference on the regression coefficients of a panel data model, we consider the following gasoline demand equation, as in [3]: $\ln(Gas/Car)_{it} = \alpha + \beta_1 \ln(Y/N)_{it} + \beta_2 \ln(P_{MG}/P_{GDP})_{it} + \beta_3 \ln(Car/N)_{it} + u_{it}$, where Gas/Car is motor gasoline consumption per auto, Y/N is real per capita income, $P_{MG}/P_{GDP}$ is real motor gasoline price and Car/N denotes the stock of cars per capita. This panel consists of annual observations across 18 OECD countries, covering the period 1960-1978. We take the part of the panel reported in Table 3 by [23]. At first, let = ( , 1 Thus, for the problem of testing the regression coefficient vector, all three methods reject the corresponding null hypothesis at the nominal level of 5%, but the PB method does not reject the null hypothesis at the 1% level.
In this example, for obtaining PB confidence region for the regression coefficient vector = ( , 1

Conclusions
In this article, we propose a parametric bootstrap method for testing hypotheses as well as constructing confidence regions for the regression coefficient vector in balanced and unbalanced panel data models. We compare the performance of our PB method with the GPV and AP methods through a simulation study in the balanced case, in terms of type I error rate and power. The simulation results show that the estimated size of our PB test is close to the nominal level (0.05), whereas the two other methods are often liberal (size significantly greater than 0.05). However, in the cases where the estimated size of GPV is close to 0.05, the powers of the PB and GPV tests are not significantly different. Therefore, we recommend the PB method for testing or constructing confidence regions for the regression coefficient vector.