Introduction

Panel data combine observations on a cross section of individuals, cities, factories, etc., over several time periods, and have been applied extensively in economics. The Panel Study of Income Dynamics (PSID) from the Survey Research Center at the University of Michigan and the gasoline demand panel of annual observations across 18 Organisation for Economic Co-operation and Development (OECD) countries, covering the period 1960–1978, are two famous examples of panel data. Panel data models have been studied by many researchers in statistics and econometrics ([1, 5, 14], and references therein). Baltagi [2] presents an excellent overview of panel data models. Precise inference in panel data models is difficult when nuisance parameters are present. Zhao [23] suggested inferences in panel data models based on generalized p values. When nuisance parameters are present in a model, generalized p values are effective for solving testing problems [8, 11, 16,17,18,19, 24]. Parametric bootstrap approaches are another method for inference in panel data models with unknown nuisance parameters. Xu et al. [21] and [22] provided parametric bootstrap inferences for a linear combination of the regression coefficients in balanced and unbalanced panel data models, respectively.

In this article, we propose an easily applicable method for testing hypotheses and constructing confidence regions for the regression coefficient vector of balanced and unbalanced panel data models. Our procedure is based on a new parametric bootstrap (PB) pivot variable. The performance of our PB method is compared with the generalized p value approach introduced by [23]. The numerical results in section “Simulation study” show that, in terms of type I error rate and power, our method outperforms the generalized p value (GPV) and approximate (AP) methods.

The rest of this paper is organized as follows. Our PB approaches for hypothesis testing and constructing a confidence region for the regression coefficient vector are presented for balanced and unbalanced panel data models in section “PB inferences for the regression coefficients”. In section “Simulation study”, the proposed PB methods are evaluated in terms of type I error rates and powers. The suggested PB approaches are illustrated with a real data example in section “Example”. Some conclusions are given in section “Conclusions”.

PB inferences for the regression coefficients

Balanced panel data models

Panel data regression models describe the effect of several explanatory variables on a response variable across N individuals over T time periods. A panel data model is

$$\begin{aligned} Y_{it} ={\alpha }+{{{\mathbf{x}}}^{\prime }_{it}}{\varvec{\beta }}+u_{it}, \end{aligned}$$
(2.1)

with

$$\begin{aligned} u_{it}=\mu _{i}+\nu _{it}, \quad i=1,2,\ldots ,N ; t=1,2,\ldots ,T, \end{aligned}$$

where \(Y_{it}\) and \({{\mathbf{x}}}_{it}\) are the response value and the vector of K explanatory variables for the ith individual at the tth time period, respectively, \(u_{it}\) is the regression disturbance, \(\mu _i\) denotes the unobservable individual-specific effect and \(\nu _{it}\) denotes the remainder disturbance. In the random effects model, we usually suppose that \(\mu _i\sim N(0,\sigma ^2_\mu )\) and \(\nu _{it}\sim N(0,\sigma ^2_\nu )\) are mutually independent. \(\alpha\) is the intercept and \(\varvec{\beta }\) is a \(K \times 1\) vector of unknown coefficients. Let \(y_{it}\) denote the observed value of \(Y_{it}\) for \(i=1,2,\ldots ,N ; t=1,2,\ldots ,T\).

Equation (2.1) can also be expressed in matrix notation as

$$\begin{aligned} {{\mathbf{Y}}}=\alpha {{\mathbf{1}}}_{NT}+{\mathbf{X}}\varvec{\beta }+{\mathbf{Z}}_{\varvec{\mu }}{\varvec{\mu }}+\varvec{\nu }={\mathbf{Z}}\varvec{\delta }+{\mathbf{u}}, \end{aligned}$$
(2.2)

where \({\mathbf{Y}}={(Y_{11},\ldots ,Y_{1T},\ldots ,Y_{N1},\ldots ,Y_{NT})}^{\prime }\), \({\mathbf{X}}\) is a \(NT \times K\) matrix, \({{\mathbf{Z}}}=[{{\mathbf{1}}}_{NT},\mathbf{X}]\), \(\varvec{\delta }=(\alpha ,{\varvec{\beta }}^{\prime })^{\prime }\) is an unknown regression coefficients vector, \({\mathbf{Z}}_{\varvec{\mu }}={{\mathbf{I}}}_{N}\otimes {{\mathbf{1}}}_{T}\), \({\varvec{\mu }}={(\mu _1,\mu _2,\ldots ,\mu _N)}^{\prime }\), \({\varvec{\nu }}={(\nu _{11},\ldots ,\nu _{1T},\ldots ,\nu _{N1},\ldots ,\nu _{NT})}^{\prime }\), \({\mathbf{u}}=\mathbf{Z}_{\varvec{\mu }}{\varvec{\mu }}+\varvec{\nu }\), \({\mathbf{I}}_N\) is an identity matrix of order N, \({{\mathbf{1}}}_T\) denotes the \(T\times 1\) vector whose elements are all ones and \(\otimes\) denotes the Kronecker product.

Let \({{\mathbf{J}}}_T={{{\mathbf{1}}}_T}{{\mathbf{1}}}_T^{\prime }\), \(\overline{{\mathbf{J}}}_{T}={\dfrac{1}{T}}{{\mathbf{J}}}_T\) and \({{\mathbf{E}}}_T={{\mathbf{I}}}_T-\overline{{\mathbf{J}}}_{T}\). Then, the covariance matrix of \({\mathbf{Y}}\) is

$$\begin{aligned} Cov({{\mathbf{Y}}})={\varvec{{\varSigma }}}=\sigma _\mu ^2({\mathbf{I}}_N\otimes {{\mathbf{J}}}_T)+\sigma _\nu ^2({{\mathbf{I}}}_N\otimes {\mathbf{I}}_T)= {\sigma _1^2}{{\mathbf{P}}}+{\sigma _\nu ^2}{\mathbf{Q}}, \end{aligned}$$
(2.3)

where \({\sigma _1^2}=T{\sigma _\mu ^2}+{\sigma _\nu ^2}\), \({{\mathbf{P}}}={{\mathbf{I}}}_N\otimes \overline{{{\mathbf{J}}}}_T\) and \({{\mathbf{Q}}}={{\mathbf{I}}}_N\otimes {{\mathbf{E}}}_T\). Equation (2.3) is the spectral decomposition of \({{{\varvec{\varSigma }}}}\), which is the key to the following inferences. Both \({\mathbf{P}}\) and \({\mathbf{Q}}\) are symmetric idempotent matrices with \({\mathbf{PQ=QP}}=0\). Using these properties of \({\mathbf{P}}\) and \({\mathbf{Q}}\), [15] shows that

$$\begin{aligned} {{{\varvec{\Sigma }}}}^{r}={\left( \sigma _1^2\right) }^{r}{{\mathbf{P}}}+{\left( \sigma _\nu ^2\right) }^{r}{{\mathbf{Q}}}, \end{aligned}$$
(2.4)

where r is an arbitrary scalar. Hence,

$$\begin{aligned} {{{\varvec{\Sigma }}}}^{-1}={\left( \sigma _1^2\right) }^{-1}{{\mathbf{P}}}+{\left( \sigma _\nu ^2\right) }^{-1}{{\mathbf{Q}}}. \end{aligned}$$
(2.5)

The generalized least squares estimator (GLSE) of \(\varvec{\delta }\) is obtained by [12] as

$$\begin{aligned} \hat{\varvec{\delta }}\left( \sigma _1^2,\sigma _{\nu }^2,{\mathbf{Y}}\right) ={({\mathbf{Z}}^{\prime }{{{\varvec{\Sigma }}}}^{-1}Z)}^{-1}{\mathbf{Z}}^{\prime } {{{\varvec{\Sigma }}}}^{-1}{{\mathbf{Y}}}. \end{aligned}$$
(2.6)

It is easy to verify that

$$\begin{aligned} \hat{\varvec{\delta }}\left( \sigma _1^2,\sigma _{\nu }^2,{{\mathbf{Y}}}\right) \sim N(\varvec{\delta },{({\mathbf{Z^{\prime }}{{{\varvec{\Sigma }}}}^{-1}Z})}^{-1}). \end{aligned}$$
(2.7)

To obtain estimators of \(\sigma _\nu ^2\) and \(\sigma _1^2\), model (2.2) is transformed as follows:

$$\begin{aligned} \begin{pmatrix} {{\mathbf{QY}}} \\ {\mathbf{PY}} \end{pmatrix} =\begin{pmatrix} {\mathbf{QZ}} \\ {\mathbf{PZ}} \end{pmatrix} \varvec{\delta }+ \begin{pmatrix} {\mathbf{Qu}} \\ {\mathbf{Pu}} \end{pmatrix} = \begin{pmatrix} {\mathbf{QX}}\varvec{\beta } \\ {\mathbf{PZ}}\varvec{\delta } \end{pmatrix} + \begin{pmatrix} {\mathbf{Q}}\varvec{\nu } \\ {\mathbf{Pu}} \end{pmatrix}. \end{aligned}$$
(2.8)

It is easy to show that \({{\mathbf{QY}}}\sim N({{\mathbf{QX}}}\varvec{\beta },\sigma ^2_\nu {\mathbf{Q}})\) and \({{\mathbf{PY}}}\sim N({{\mathbf{PZ}}}\varvec{\delta },{\sigma ^2_1}{\mathbf{P}})\), such that \({\mathbf{PY}}\) and \({\mathbf{QY}}\) are mutually independent, since

$$\begin{aligned} \mathrm{Cov} \begin{pmatrix} {\mathbf{QY}} \\ {\mathbf{PY}} \end{pmatrix} =\begin{pmatrix} \sigma _\nu ^2{\mathbf{Q}} &{} {\mathbf{0}} \\ {\mathbf{0}} &{} \sigma _1^2{{\mathbf{P}}} \end{pmatrix}. \end{aligned}$$

Therefore, we can define

$$\begin{aligned}&S_1^2 = {{\mathbf{Y}}}^{\prime }{{\mathbf{PY}}}-{{\mathbf{Y}}}^{\prime } {\mathbf{PZ}}({{\mathbf{Z}}}^{\prime }{{\mathbf{PZ}}})^{-1}{{\mathbf{Z}}}^{\prime }{\mathbf{PY}}, \nonumber \\&S_\nu ^2 = {{\mathbf{Y}}}^{\prime }{{\mathbf{QY}}}-{{\mathbf{Y}}}^{\prime } {{\mathbf{QX}}}({{\mathbf{X}}}^{\prime }{{\mathbf{QX}}})^{-1}{{\mathbf{X}}}^{\prime }{{\mathbf{QY}}}, \end{aligned}$$
(2.9)

such that \(S_1^2\) and \(S_\nu ^2\) are independently distributed as

$$\begin{aligned} \dfrac{S_1^2}{\sigma _1^2}\sim \chi _{(N-K-1)}^2, \dfrac{S_{\nu }^2}{\sigma _{\nu }^2}\sim \chi _{(N(T-1)-K)}^2, \end{aligned}$$
(2.10)

where \(\chi _{(m)}^2\) denotes a central chi-square random variable with m degrees of freedom. Then, unbiased estimators of \(\sigma _1^2\), \(\sigma _{\nu }^2\) and \(\sigma _{\mu }^2\) are given by

$$\begin{aligned} {{\tilde{\sigma }}}_1^2=\dfrac{S_1^2}{N-K-1}, {{\tilde{\sigma }}}_\nu ^2=\dfrac{S_\nu ^2}{N(T-1)-K}, \quad \text {and} \quad {{\tilde{\sigma }}}_\mu ^2=\dfrac{1}{T}\left( {{\tilde{\sigma }}}_1^2-{{\tilde{\sigma }}}_\nu ^2\right). \end{aligned}$$
(2.11)
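To make the estimators concrete, the following Python sketch simulates a small balanced panel (the design and all parameter values are arbitrary illustrations, not from the paper), computes \(S_1^2\) and \(S_\nu ^2\) from (2.9), and forms the unbiased estimators (2.11).

```python
import numpy as np

rng = np.random.default_rng(0)

# Small balanced panel simulated for illustration; all values are arbitrary.
N, T, K = 10, 5, 2
sigma2_mu, sigma2_nu = 1.0, 0.5
alpha, beta = 2.0, np.array([1.0, -1.0])

X = rng.normal(size=(N * T, K))
Z = np.column_stack([np.ones(N * T), X])
mu = rng.normal(scale=np.sqrt(sigma2_mu), size=N)
nu = rng.normal(scale=np.sqrt(sigma2_nu), size=N * T)
y = Z @ np.concatenate([[alpha], beta]) + np.repeat(mu, T) + nu

Jbar = np.ones((T, T)) / T
P = np.kron(np.eye(N), Jbar)
Q = np.eye(N * T) - P                         # equals I_N kron E_T

# Between and within residual sums of squares, eq. (2.9)
S1 = y @ P @ y - y @ P @ Z @ np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
Snu = y @ Q @ y - y @ Q @ X @ np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)

# Unbiased estimators, eq. (2.11)
sig1_t = S1 / (N - K - 1)
signu_t = Snu / (N * (T - 1) - K)
sigmu_t = (sig1_t - signu_t) / T
```

Note that \({{\tilde{\sigma }}}_\mu ^2\) can be negative in small samples, a well-known feature of ANOVA-type variance component estimators.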

According to (2.4) and (2.5), the natural estimators of \({\varvec{{\varSigma }}}\) and \({\varvec{{\varSigma }}}^{-1}\) are, respectively,

$$\begin{aligned} {{\tilde{\varvec{\varSigma }}}}={{\tilde{\sigma }}}_1^2{\mathbf{P}}+{{\tilde{\sigma }}}_\nu ^2{{\mathbf{Q}}} \quad \text {and} \quad {{\tilde{\varvec{\varSigma }}}}^{-1}=\left( {{\tilde{\sigma }}}_{1}^{2}\right) ^{-1}{\mathbf{P}}+\left( {{{\tilde{\sigma }}}_\nu ^2}\right) ^{-1}{{\mathbf{Q}}}. \end{aligned}$$
(2.12)

When \({\varvec{{\varSigma }}}\) is known, a natural pivotal quantity for inferences on \({\varvec{\delta }}\) is given by

$$\begin{aligned} H^*=(\hat{\varvec{\delta }}-{\varvec{\delta }})^{\prime }({\mathbf{Z}}^{\prime }{\varvec{{\varSigma }}}^{-1}{\mathbf{Z}})(\hat{\varvec{\delta }}-{\varvec{\delta }})\sim \chi _{(K+1)}^2. \end{aligned}$$
(2.13)

Then,

$$\begin{aligned} R_{\varvec{\delta }}=\left\{ {\varvec{\delta }}\vert ({\hat{\varvec{\delta }}}^{0}-{\varvec{\delta }})^{\prime }({\mathbf{Z}}^{\prime }{\varvec{{\varSigma }}}^{-1}{\mathbf{Z}})({\hat{\varvec{\delta }}}^{0}-{\varvec{\delta }})<\chi _{(\gamma ,K+1)}^2\right\} \end{aligned}$$
(2.14)

is an exact \(100(1-\gamma )\%\) confidence region for \({\varvec{\delta }}\), where \({\hat{\varvec{\delta }}}^{0}\) is the observed value of \(\hat{\varvec{\delta }}\) obtained by replacing \({\mathbf{Y}}\) in (2.6) with \({\mathbf{y}}\), and \(\chi _{(\gamma ,m)}^2\) stands for the lower \((1-\gamma )\)th quantile of the central chi-square distribution with m degrees of freedom.

The values of \(\sigma _\nu ^2\), \(\sigma ^2_\mu\) and then \({\varvec{{\varSigma }}}\) are usually unknown in practice. Therefore, we propose to replace \(\sigma _\nu ^2\) and \(\sigma ^2_\mu\) with their unbiased estimators, which leads to

$$\begin{aligned} H=(\tilde{\varvec{\delta }}-{\varvec{\delta }})^{\prime }({\mathbf{Z}}^{\prime }{{\tilde{\varvec{\varSigma }}}}^{-1}{\mathbf{Z}})(\tilde{\varvec{\delta }}-{\varvec{\delta }}), \end{aligned}$$
(2.15)

where \(\tilde{\varvec{\delta }}={({\mathbf{Z}}^{\prime }{{\tilde{\varvec{\varSigma }}}}^{-1}{{\mathbf{Z}}})}^{-1}{\mathbf{Z}}^{\prime } {{{\tilde{\varvec{\varSigma }}}}}^{-1}{{\mathbf{Y}}}\) is a feasible GLSE.

We can construct an approximate (AP) confidence region as

$$\begin{aligned} R_{\varvec{\delta }}^{AP}=\lbrace {\varvec{\delta }}\vert ({\tilde{\varvec{\delta }}}^{0}-{\varvec{\delta }})^{\prime }({\mathbf{Z}}^{\prime }{{\tilde{\varvec{\varSigma }}}}^{-1}{\mathbf{Z}})({\tilde{\varvec{\delta }}}^{0}-{\varvec{\delta }})<\chi _{(\gamma ,K+1)}^2\rbrace , \end{aligned}$$

where \({\tilde{\varvec{\delta }}}^{0}\) is the observed value of \({\tilde{\varvec{\delta }}}\). This approximate method is applicable when the sample size is large. Since the distribution of H is unknown and the approximate method performs poorly (based on our simulation results), we use a parametric bootstrap approach to approximate the distribution of H.

Let \(s_\nu ^2\) and \(s_1^2\) be the observed values of \(S_\nu ^2\) and \(S_1^2\) in (2.9), respectively. For a given \(({\tilde{\varvec{\delta }}}^0,s_1^2,s_\nu ^2)\), let \({{\mathbf{Y}}}_{B}\sim N({{\mathbf{Z}}}{\tilde{\varvec{\delta }}}^0,{{\tilde{\varvec{\varSigma }}}}_{0})\), where \({{\tilde{\varvec{\varSigma }}}}_{0}\) is the observed value of \({{\tilde{\varvec{\varSigma }}}}\). Then, the PB pivot variable based on the random quantity (2.15) is

$$\begin{aligned} H^{B}=({\tilde{\varvec{\delta }}}_{B}-{\tilde{\varvec{\delta }}}^0)^{\prime }({\mathbf{Z}}^{\prime }{{\tilde{\varvec{\varSigma }}}}^{-1}_{B}{\mathbf{Z}})({\tilde{\varvec{\delta }}}_{B}-{\tilde{\varvec{\delta }}}^0), \end{aligned}$$
(2.16)

where

$$\begin{aligned} {\tilde{\varvec{\delta }}}_{B}={({\mathbf{Z}}^{\prime }{{\tilde{\varvec{\varSigma }}}}^{-1}_{B}{{\mathbf{Z}}})}^{-1} {{\mathbf{Z}}}^{\prime }{{{\tilde{\varvec{\varSigma }}}}}^{-1}_{B}{\mathbf{Y}}_{B},\quad {{\tilde{\varvec{\varSigma }}}}_{B}={{\tilde{\sigma }}}_{1B}^2{\mathbf{P}}+{{\tilde{\sigma }}}_{\nu B}^2{{\mathbf{Q}}}, \end{aligned}$$
$$\begin{aligned} {{\tilde{\sigma }}}_{1B}^2=\dfrac{S_{1B}^2}{N-K-1}, {{\tilde{\sigma }}}_{\nu B}^2=\dfrac{S_{\nu B}^2}{N(T-1)-K}, \end{aligned}$$
$$\begin{aligned} S_{1B}^2={{\mathbf{Y}}}^{\prime }_{B}{{\mathbf{PY}}}_{B}-{{\mathbf{Y}}}_{B}^{\prime } {{\mathbf{PZ}}}({{\mathbf{Z}}}^{\prime }{{\mathbf{PZ}}})^{-1}{{\mathbf{Z}}}^{\prime }{{\mathbf{PY}}}_{B} \end{aligned}$$

and

$$\begin{aligned} S_{\nu B}^2={{\mathbf{Y}}}^{\prime }_{B}{{\mathbf{QY}}}_{B}-{\mathbf{Y}}^{\prime }_{B}{{\mathbf{QX}}}({{\mathbf{X}}}^{\prime }{{\mathbf{QX}}})^{-1}{\mathbf{X}}^{\prime }{{\mathbf{QY}}}_{B}. \end{aligned}$$

The distribution of \(H^{B}\) for a given \(({\tilde{\varvec{\delta }}}^0,s_1^2,s_\nu ^2)\) in (2.16) does not depend on any unknown parameters. Therefore, we can construct a PB confidence region for \({\varvec{\delta }}\) based on the distribution of \(H^{B}\). Let \(H^B_\gamma\) denote the lower \((1-\gamma )\)th quantile of \(H^{B}\). Then, we propose a \(100(1-\gamma )\%\) confidence region for \({\varvec{\delta }}\) given by

$$\begin{aligned} R^{B}_{\varvec{\delta }}=\left\{ {\varvec{\delta }}\vert (\tilde{\varvec{\delta }}^{0}-{\varvec{\delta }})^{\prime } ({{\mathbf{Z}}}^{\prime }{{\tilde{\varvec{\varSigma }}}}^{-1}_{0}{\mathbf{Z}})({\tilde{\varvec{\delta }}}^{0}-{\varvec{\delta }})< H^B_\gamma \right\}. \end{aligned}$$
(2.17)

Next, we consider the problem of hypothesis testing about \({\varvec{\delta }}\) as

$$\begin{aligned} H_{0}:{\varvec{\delta }}={\varvec{\delta }}^* \quad \text {vs.} \quad H_{1}:{\varvec{\delta }}\ne {\varvec{\delta }}^*, \end{aligned}$$
(2.18)

where \({\varvec{\delta }}^* = (\alpha ^*,\beta _1^*,\ldots ,\beta _K^*)^\prime\) is a vector of pre-specified values. Our proposed test statistic is

$$\begin{aligned} D=(\tilde{\varvec{\delta }}-{\varvec{\delta }}^{*})^{\prime }({\mathbf{Z}}^{\prime }{{\tilde{\varvec{\varSigma }}}}^{-1}{\mathbf{Z}})(\tilde{\varvec{\delta }}-{\varvec{\delta }}^{*}). \end{aligned}$$
(2.19)

The null hypothesis in (2.18) is rejected at level \(\gamma\) when \(D_0>H_\gamma ^B\), where \(D_0\) is the observed value of D. Also, a PB p value can be defined as

$$\begin{aligned} \textit{p}=P(H^{B}>D_0). \end{aligned}$$
(2.20)

Therefore, \(H_0\) is rejected at level \(\gamma\) when \(\textit{p}<\gamma\).

Unbalanced panel data models

The unbalanced panel data model is given by:

$$\begin{aligned} Y_{it}={\alpha }+{\mathbf{x }}^{\prime }_{it}\varvec{\beta }+u_{it}, \end{aligned}$$
(2.21)

with

$$\begin{aligned} u_{it}=\mu _{i}+\nu _{it}, \quad i=1,2,\ldots ,N ; t=1,2,\ldots ,T_i, \end{aligned}$$

where \(Y_{it}\), \({\mathbf{x}}_{it}\) and the remaining quantities are defined as in the balanced case, except that the number of time periods for the ith cross section is \(T_i\), which may differ across individuals. In matrix notation, equation (2.21) can be expressed as

$$\begin{aligned} {{\mathbf{Y}}}=\alpha {{\mathbf{1}}}_{n}+{\mathbf{X}}\varvec{\beta }+\mathbf{Z}_{\varvec{\mu }}{\varvec{\mu }}+\varvec{\nu }=\mathbf{Z}\varvec{\delta }+{\mathbf{u}}, \end{aligned}$$
(2.22)

where \(n=\sum ^N_{i=1}T_{i}\), \({\mathbf{Y}}={(Y_{11},\ldots ,Y_{1T_1},\ldots ,Y_{N1},\ldots ,Y_{NT_N})}^{\prime }\), \({\mathbf{X}}\) is an \(n \times K\) matrix, \({{\mathbf{Z}}}=[{{\mathbf{1}}}_{n},\mathbf{X}]\), \(\varvec{\delta }=(\alpha ,{\varvec{\beta }}^{\prime })^{\prime }\), \({{\mathbf{Z}}}_{\varvec{\mu }}=diag({{\mathbf{1}}}_{T_1},\ldots ,{\mathbf{1}}_{T_N})\), \({\varvec{\mu }}={(\mu _1,\mu _2,\ldots ,\mu _N)}^{\prime }\), \({\varvec{\nu }}=(\nu _{11},\ldots ,\nu _{1T_1},\ldots ,\nu _{N1},\ldots ,\nu _{NT_N})^{\prime }\) and \({\mathbf{u}}={\mathbf{Z}}_{\varvec{\mu }}{\varvec{\mu }}+\varvec{\nu }\).

Let \({{\mathbf{J}}}_{T_i}={{{\mathbf{1}}}_{T_i}}{{{\mathbf{1}}}_{T_i}^{\prime }}\), \(\overline{{\mathbf{J}}}_{T_i}={\frac{1}{T_i}}{{\mathbf{J}}}_{T_i}\) and \({{\mathbf{E}}}_{T_i}={{\mathbf{I}}}_{T_i}-\overline{{\mathbf{J}}}_{T_i}\), for \(i=1,\ldots ,N\). Then, the covariance matrix of \({\mathbf{Y}}\) is

$$\begin{aligned} \begin{aligned} Cov({{\mathbf{Y}}})&={{{\varvec{\Sigma }}}}={\sigma _{\mu }^{2}}\mathrm{diag}({\mathbf{J}}_{T_1},\ldots , {{\mathbf{J}}}_{T_N})+{\sigma _\nu ^2}{{\mathbf{I}}}_n\\&=\mathrm{diag}[(T_{1}{\sigma _{\mu }^{2}}+{\sigma _\nu ^2})\overline{\mathbf{J}}_{T_1},\ldots ,(T_{N}{\sigma _{\mu }^{2}}+{\sigma _\nu ^2})\overline{\mathbf{J}}_{T_N}]+{\sigma _\nu ^2}{\mathbf{Q}}, \end{aligned} \end{aligned}$$
(2.23)

where \({{\mathbf{Q}}}=\mathrm{diag}({{\mathbf{E}}}_{T_1},\ldots ,{{\mathbf{E}}}_{T_N})\). It is established that

$$\begin{aligned} {{{\varvec{\varSigma }}}^{-1}}=\mathrm{diag}\left[ {(T_{1}{\sigma _{\mu }^{2}}+{\sigma _\nu ^2})}^{-1}\overline{{\mathbf{J}}}_{T_1},\ldots ,{(T_{N}{\sigma _{\mu }^{2}}+{\sigma _\nu ^2})}^{-1}\overline{{\mathbf{J}}}_{T_N}\right] +({\sigma _\nu ^2})^{-1}{\mathbf{Q}}. \end{aligned}$$
(2.24)

Then, the generalized least square estimator (GLSE) of \({\varvec{\delta }}\) is

$$\begin{aligned} \hat{\varvec{\delta }}(\sigma _1^2,\sigma _{\nu }^2,{\mathbf{Y}})={({\mathbf{Z^{\prime }}{{{\varvec{\Sigma }}}}^{-1}Z})}^{-1}\mathbf{Z}^{\prime }{{{\varvec{\Sigma }}}}^{-1}{{\mathbf{Y}}}. \end{aligned}$$
(2.25)

Also, the GLSE of \({\varvec{\delta }}\) is distributed as

$$\begin{aligned} \hat{\varvec{\delta }}(\sigma _1^2,\sigma _{\nu }^2,{{\mathbf{Y}}})\sim N(\varvec{\delta },{({\mathbf{Z^{\prime }}{{{\varvec{\Sigma }}}}^{-1}Z})}^{-1}). \end{aligned}$$
(2.26)

Similar to the balanced case, we consider the following two quadratic forms, the between and within residual sums of squares, to obtain estimators of \(\sigma _{\mu }^2\) and \(\sigma _{\nu }^2\):

$$\begin{aligned} S_1^2= & {} {{\mathbf{Y}}}^{\prime }{{\mathbf{PY}}}-{{\mathbf{Y}}}^{\prime } {\mathbf{PZ}}({{\mathbf{Z}}}^{\prime }{{\mathbf{PZ}}})^{-1}{{\mathbf{Z}}}^{\prime }{\mathbf{PY}}, \nonumber \\ S_2^2= & {} {{\mathbf{Y}}}^{\prime }{{\mathbf{QY}}}-{{\mathbf{Y}}}^{\prime } {{\mathbf{QX}}}({{\mathbf{X}}}^{\prime }{{\mathbf{QX}}})^{-1}{{\mathbf{X}}}^{\prime }{{\mathbf{QY}}}, \end{aligned}$$
(2.27)

where \({{\mathbf{P}}}=\mathrm{diag}({\overline{{\mathbf{J}}}}_{T_1},\ldots ,{\overline{\mathbf{J}}}_{T_N})\) and \({S_2^2}/{\sigma ^2_\nu }\sim \chi _{(n-N-K)}^2\). According to [12], the unbiased estimators of \({\sigma ^2_\nu }\) and \({\sigma ^2_\mu }\) can be given as

$$\begin{aligned} {{{\tilde{\sigma }}}^2_\nu }=\dfrac{S_2^2}{n-N-K}, \quad {{{\tilde{\sigma }}}^2_\mu }=\dfrac{{S_1^2}-(N-K-1){{{\tilde{\sigma }}}^2_\nu }}{n-tr({(\mathbf{Z^{\prime }PZ})}^{-1}{\mathbf{Z}}^{\prime }Z_\mu Z^{\prime }_{\mu }{\mathbf{Z}})}. \end{aligned}$$
(2.28)

Therefore, the natural estimators of \({\varvec{{\varSigma }}}\) and \({\varvec{{\varSigma }}}^{-1}\) are

$$\begin{aligned} {{\tilde{\varvec{\varSigma }}}}= & {} \mathrm{diag}[(T_{1}{{{\tilde{\sigma }}}^2_{\mu }}+{{{\tilde{\sigma }}}^2_{\nu }})\overline{\mathbf{J}}_{T_1},\ldots ,(T_{N}{{{\tilde{\sigma }}}^2_{\mu }}+{{{\tilde{\sigma }}}^2_{\nu }})\overline{\mathbf{J}}_{T_N}]+{{{\tilde{\sigma }}}^2_{\nu }}{\mathbf{Q}}, \nonumber \\ {{\tilde{\varvec{\varSigma }}}}^{-1}= & {} \mathrm{diag}[(T_{1}{{{\tilde{\sigma }}}^2_{\mu }}+{{{\tilde{\sigma }}}^2_{\nu }})^{-1}\overline{\mathbf{J}}_{T_1},\ldots ,(T_{N}{{{\tilde{\sigma }}}^2_{\mu }}+{{{\tilde{\sigma }}}^2_{\nu }})^{-1}\overline{\mathbf{J}}_{T_N}]+({{{\tilde{\sigma }}}^2_{\nu }})^{-1}{\mathbf{Q}}. \end{aligned}$$
(2.29)
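The block structure of the unbalanced case is conveniently handled with block-diagonal matrices. The sketch below (group sizes \(T_i\) and all parameter values are our own illustrative choices) constructs \({\mathbf{Z}}_{\varvec{\mu }}\), \({\mathbf{P}}\) and \({\mathbf{Q}}\), computes \(S_1^2\) and \(S_2^2\) from (2.27), the unbiased estimators (2.28), and the natural estimators (2.29).

```python
import numpy as np
from scipy.linalg import block_diag

# Illustrative unbalanced panel; group sizes T_i and parameter values are ours.
rng = np.random.default_rng(0)
Ti = [4, 6, 3, 5, 7, 4, 5, 6]                 # T_i, i = 1, ..., N
N, n, K = len(Ti), sum(Ti), 2

X = rng.normal(size=(n, K))
Z = np.column_stack([np.ones(n), X])
Z_mu = block_diag(*[np.ones((t, 1)) for t in Ti])     # diag(1_{T_1}, ..., 1_{T_N})
P = block_diag(*[np.ones((t, t)) / t for t in Ti])    # diag(Jbar_{T_1}, ..., Jbar_{T_N})
Q = np.eye(n) - P                                     # diag(E_{T_1}, ..., E_{T_N})

delta = np.array([2.0, 1.0, -1.0])
y = Z @ delta + Z_mu @ rng.normal(size=N) + 0.5 * rng.standard_normal(n)

# between and within residual sums of squares (2.27)
S1 = y @ P @ y - y @ P @ Z @ np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
S2 = y @ Q @ y - y @ Q @ X @ np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)

# unbiased estimators (2.28)
signu_t = S2 / (n - N - K)
tr = np.trace(np.linalg.solve(Z.T @ P @ Z, Z.T @ Z_mu @ Z_mu.T @ Z))
sigmu_t = (S1 - (N - K - 1) * signu_t) / (n - tr)

# natural estimators (2.29); c_i = T_i * sigmu_t + signu_t
c = [t * sigmu_t + signu_t for t in Ti]
Sigma_t = block_diag(*[ci * np.ones((t, t)) / t for ci, t in zip(c, Ti)]) + signu_t * Q
Sigma_t_inv = block_diag(*[np.ones((t, t)) / (t * ci) for ci, t in zip(c, Ti)]) + Q / signu_t
```

Since the \(\overline{{\mathbf{J}}}_{T_i}\) blocks and \({\mathbf{Q}}\) are orthogonal projections, \({{\tilde{\varvec{\varSigma }}}}{{\tilde{\varvec{\varSigma }}}}^{-1}={\mathbf{I}}_n\) holds by construction, block by block.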

To construct a confidence region for \({\varvec{\delta }}\) in this case, we propose to use a random quantity similar to H in (2.15) and a PB approach to approximate its distribution.

Simulation study

In this section, we compare the sizes and powers of our PB approach with the generalized p value method of [23] and the approximate method, based on a Monte Carlo simulation study. We use the abbreviations PB, GPV and AP to refer to these three methods. First, we briefly review the GPV method.

[23] proposed a generalized p value method for testing \(H_{0}: {\varvec{\delta }} = {\varvec{\delta }}^*\) vs. \(H_{1}: {\varvec{\delta }} \ne {\varvec{\delta }}^*\) only in the balanced panel data case. He proposed the generalized F test statistic for the null hypothesis as

$$\begin{aligned} {{\tilde{T}}}_T({{\mathbf{Y}}}; {{\mathbf{y}}}, \sigma _1^2, \sigma _\nu ^2, {\varvec{\delta }})=\frac{\left( \hat{\varvec{\delta }}(\sigma _1^2, \sigma _\nu ^2, {\mathbf{Y}})-{\varvec{\delta }}^*)^{\prime }S_{\varvec{\delta }}^{-2}(\sigma _1^2, \sigma _\nu ^2)(\hat{\varvec{\delta }}(\sigma _1^2, \sigma _\nu ^2, {\mathbf{Y}})-{\varvec{\delta }}^*\right) }{\left( \hat{\varvec{\delta }}(\sigma _1^2\frac{ss_1}{SS_1}, \sigma _\nu ^2\frac{ss_\nu }{SS_\nu }, {\mathbf{y}})-{\varvec{\delta }}^*)^{\prime }S_{\varvec{\delta }}^{-2}(\sigma _1^2\frac{ss_1}{SS_1}, \sigma _\nu ^2\frac{ss_\nu }{SS_\nu })(\hat{\varvec{\delta }}(\sigma _1^2\frac{ss_1}{SS_1}, \sigma _\nu ^2\frac{ss_\nu }{SS_\nu }, {\mathbf{y}})-{\varvec{\delta }}^*\right) }. \end{aligned}$$

Subsequently, the generalized p value can be computed as

$$\begin{aligned} \textit{p}=P\left( {{\tilde{T}}}_T\ge 1\mid H_0\right) =P\left( \frac{\chi ^2}{(\hat{\varvec{\delta }}(\frac{ss_1}{U}, \frac{ss_\nu }{V}, {\mathbf{y}})-{\varvec{\delta }}^*)^{\prime }S_{\varvec{\delta }}^{-2}(\frac{ss_1}{U}, \frac{ss_\nu }{V})(\hat{\varvec{\delta }}(\frac{ss_1}{U}, \frac{ss_\nu }{V}, {{\mathbf{y}}})-{\varvec{\delta }}^*)}\ge 1\right) , \end{aligned}$$

where \(S_{\varvec{\delta }}^2(\sigma _1^2,\sigma _\nu ^2)=(\mathbf{Z }^\prime {\varvec{{\varSigma }}}^{-1} \mathbf{Z })^{-1}, U\sim \chi ^2_{(N-K-1)}, V\sim \chi ^2_{(N(T-1)-K)}, \chi ^2\sim \chi ^2_{(K+1)}\) and \(\chi ^2, U, V\) are mutually independent.
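This generalized p value is evaluated by Monte Carlo. The sketch below simulates a small data set under the null (the design and all values are illustrative, not from the paper), computes the observed \(ss_1\) and \(ss_\nu\), and then draws \((U, V, \chi ^2)\) to estimate p.

```python
import numpy as np

# Monte Carlo evaluation of the generalized p value above; the data set is
# simulated and all design values are illustrative, not from the paper.
rng = np.random.default_rng(0)
N, T, K = 10, 5, 2
X = rng.normal(size=(N * T, K))
Z = np.column_stack([np.ones(N * T), X])
P = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - P

delta_star = np.array([2.0, 1.0, -1.0])       # null value; data generated under H0
y = Z @ delta_star + np.repeat(rng.normal(size=N), T) + 0.7 * rng.standard_normal(N * T)

# observed sums of squares ss_1, ss_nu from (2.9)
ss1 = y @ P @ y - y @ P @ Z @ np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
ssnu = y @ Q @ y - y @ Q @ X @ np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)

n_mc = 2000
U = rng.chisquare(N - K - 1, n_mc)
V = rng.chisquare(N * (T - 1) - K, n_mc)
W = rng.chisquare(K + 1, n_mc)                # the chi^2_{(K+1)} variable

hits = 0
for u, v, w in zip(U, V, W):
    Sig_inv = P * (u / ss1) + Q * (v / ssnu)  # sigma_1^2 -> ss1/U, sigma_nu^2 -> ssnu/V
    A = Z.T @ Sig_inv @ Z                     # S_delta^{-2}(ss1/U, ssnu/V)
    d = np.linalg.solve(A, Z.T @ Sig_inv @ y) # delta_hat(ss1/U, ssnu/V, y)
    denom = (d - delta_star) @ A @ (d - delta_star)
    hits += w >= denom
p_gpv = hits / n_mc
```

The estimate `p_gpv` is the fraction of draws in which the \(\chi ^2_{(K+1)}\) variable exceeds the observed-data quadratic form, matching the probability statement above.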

Algorithm: We use the following steps to estimate powers of the PB and GPV methods.

1. For given \((N, T)\) and \(({{\mathbf{Z}}}, \varvec{\delta }, \sigma _\mu ^2,\sigma _\nu ^2)\), generate \({\mathbf{y}}\) and compute \(s_1^2,s_\nu ^2,{{\tilde{\varvec{\Sigma }}}}_{0}\), \({\tilde{\varvec{\delta }}^{0}}\) and the observed value \(h_0\) of H from (2.15).

2. Generate \({{\mathbf{Y}}}_B\sim N({\mathbf{Z}}{\tilde{\varvec{\delta }}^{0}},{{\tilde{\varvec{\Sigma }}}}_{0})\), \(U\sim \chi ^2_{(N-K-1)}, V\sim \chi ^2_{(N(T-1)-K)}, \chi ^2\sim \chi ^2_{(K+1)}\).

3. Repeat step 2 many times (\(n=5000\)) to obtain values \(H^B_1,\ldots ,H^B_n\) and \(T_{T_1},\ldots ,T_{T_n}\), and compute estimates of the p values of the PB and GPV methods.

4. Repeat steps (1) to (3) \(\textit{m}=5000\) times to obtain estimates of the powers of the two tests.

For power estimation of the AP method, we compute the fraction of times that the observed value \(D_0\) exceeds \(\chi ^2_{(\gamma , K+1)}.\)
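The AP part of this procedure can be sketched at reduced scale as follows; the number of runs and the design values below are far smaller than the paper's settings and purely illustrative.

```python
import numpy as np
from scipy.stats import chi2

# Reduced-scale sketch of AP size estimation: fraction of simulated samples
# whose statistic D0 exceeds the chi-square critical value.
rng = np.random.default_rng(0)
N, T, K, gamma = 10, 5, 2, 0.05
m = 200                                       # Monte Carlo runs (paper uses 5000)

X = rng.normal(size=(N * T, K))
Z = np.column_stack([np.ones(N * T), X])
P = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - P
delta_star = np.array([2.0, 1.0, -1.0])       # null value; data generated under H0

crit = chi2.ppf(1 - gamma, K + 1)             # chi^2_{(gamma, K+1)}
rej = 0
for _ in range(m):
    y = Z @ delta_star + np.repeat(rng.normal(size=N), T) + 0.7 * rng.standard_normal(N * T)
    S1 = y @ P @ y - y @ P @ Z @ np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
    Snu = y @ Q @ y - y @ Q @ X @ np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)
    Sig_inv = P / (S1 / (N - K - 1)) + Q / (Snu / (N * (T - 1) - K))
    A = Z.T @ Sig_inv @ Z
    d = np.linalg.solve(A, Z.T @ Sig_inv @ y)
    rej += (d - delta_star) @ A @ (d - delta_star) > crit
size_ap = rej / m                              # estimated size of the AP test
```

Generating data under the null, as here, estimates the size; replacing `delta_star` in the data-generation line with an alternative value estimates power.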

The simulation results for different values of \(N, T, \sigma _\nu ^2, \sigma _\mu ^2\) are shown in Table 1. We take \(\varvec{\delta }^*\) equal to (2, 3, 1, 5) and let \(\varvec{\delta }\) take various values. Note that in this simulation, instead of generating the matrix \(\mathbf X\), we used the three columns of the panel data reported in Table 2, that is, \((\ln Y/N, \ln P_{MG}/P_{GDP}, \ln Car/N)\); this example is described in section “Example”. The first column of Table 1 shows the estimated type I error rate (actual size) of the tests, and the other three columns show estimated powers. We consider the following criterion for comparing the methods: first, a method is preferred when its estimated size is not significantly different from 0.05; we refer to such a method as reliable. Second, the candidate for the best method must have the largest power among the reliable methods; see [7, 9, 10, 20] and [6]. By the central limit theorem, estimated sizes between 0.0428 and 0.0572 are not significantly different from the nominal level 0.05 at the 2% level. In other words, if the estimated size of a test falls below or above these bounds, we conclude that the test is conservative or liberal, respectively. In Table 1, estimated sizes in boldface are significantly smaller or greater than 0.05.

Table 1 Simulated powers of the GPV, PB and AP tests at 5% nominal level
Table 2 Data of motor gasoline consumption

Note that estimated powers vary slightly from one simulation to another [9]. Therefore, we use the well-known z test to compare the powers of two methods. The powers of two test procedures differ significantly at the 100\(\alpha\)% level when \(|{{\hat{p}}_1}-{{\hat{p}}_2}|>Z_{\alpha /2}\sqrt{{\hat{p}}(1-{\hat{p}})/5000}\), where \({\hat{p}}=({\hat{p}}_1+{\hat{p}}_2)/2\) and \({\hat{p}}_1\) and \({\hat{p}}_2\) denote the estimated powers of the two procedures based on 5000 samples. In the following remarks, we discuss the simulation results.

Remark 1

In all cases considered here, the estimated sizes of our PB test vary between 0.0446 and 0.0547, which shows that our proposed test behaves like an exact test.

Remark 2

The simulated sizes of the GPV and AP tests often exceed the upper limit of this range, so these methods are deemed liberal. Therefore, the powers of these methods are not directly comparable with those of our parametric bootstrap approach.

Remark 3

Comparing estimated powers, in the cases where the estimated size of the GPV test is close to 0.05, the PB and GPV tests do not have significantly different powers.

Remark 4

Overall, the proposed PB method performs better than the other two methods in terms of both controlling the type I error rate and power.

Example

To illustrate our suggested approach to inference on the regression coefficients of a panel data model, we consider the following gasoline demand equation, as in [3]:

$$\begin{aligned} \ln \frac{\rm Gas}{\text{Car}}=\alpha +\beta _1 \ln \frac{Y}{N}+\beta _2 \ln \frac{P_{\rm MG}}{P_{\rm GDP}}+\beta _3 \ln \frac{\rm Car}{N}+u, \end{aligned}$$

where Gas/Car is motor gasoline consumption per auto, Y/N is real per capita income, \({P_\mathrm{MG}}/{P_\mathrm{GDP}}\) is the real motor gasoline price and \(\mathrm{Car}/N\) denotes the stock of cars per capita. This panel consists of annual observations across 18 OECD countries, covering the period 1960–1978. We use the part of the panel reported in Table 3 of [23]. Letting \({\varvec{\delta }} = (\alpha ,\beta _1,\beta _2,\beta _3)^{\prime }\), the computed GLSE of \({\varvec{\delta }}\) is \(\tilde{\varvec{\delta }} = (0.765, 0.323, -0.469, -0.578)^\prime\), and the unbiased estimates of \(\sigma _\mu ^2\) and \(\sigma _\nu ^2\) are 0.0552 and 0.0012, respectively. The p values of the three methods are computed using simulations of 20,000 runs.

For testing \(H_0: {\varvec{\delta }}=(1.7, 0.55,-0.42, -0.61)^{\prime }\), the p values of the PB, GPV and AP methods for the regression coefficient vector are computed to be 0.014, 0.0006 and 0.0002, respectively. Thus, all three methods reject the null hypothesis at the nominal 5% level, but the PB method does not reject it at the 1% level.

In this example, to obtain the PB confidence region for the regression coefficient vector \({\varvec{\delta }} = (\alpha ,\beta _1,\beta _2,\beta _3)^{\prime }\) at the 0.95 confidence level by (2.17), we have

$$\begin{aligned} R_{\varvec{\delta }}^B=\lbrace {\varvec{\delta }}\vert (\tilde{\varvec{\delta }}^{0}-{\varvec{\delta }})^{\prime } ({{\mathbf{Z}}}^{\prime }{{\tilde{\varvec{\varSigma }}}}^{-1}_{0}{\mathbf{Z}})({\tilde{\varvec{\delta }}}^{0}-{\varvec{\delta }})< H^B_{0.05}\rbrace , \end{aligned}$$
(4.1)

where

$$\begin{aligned} {{\mathbf{Z}}}^{\prime }{{\tilde{\varvec{\varSigma }}}}^{-1}_{0}{{\mathbf{Z}}} = \begin{pmatrix} 216.53 &{} -1375.44 &{} -117.73 &{} -2041.19\\ -1375.44 &{} 9036.17 &{} 703.35 &{} 13488.31 \\ -117.73 &{} 703.35 &{} 342.23 &{} 667.97\\ -2041.19 &{} 13488.31 &{} 667.97 &{} 20852.25 \end{pmatrix}, \end{aligned}$$

and the lower 0.95th quantile of \(H^{B}\), i.e. \(H^B_{0.05}\), computed from 20,000 simulations, is around 13.74.
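To see how the region (4.1) is used, the quadratic form for the null value \({\varvec{\delta }}^*\) tested above can be evaluated against \(H^B_{0.05}\approx 13.74\) directly:

```python
import numpy as np

# Membership check for the 95% PB confidence region (4.1), using the matrix
# Z' Sigma0^{-1} Z and the simulated quantile H^B_{0.05} ~ 13.74 given above.
M = np.array([
    [ 216.53,  -1375.44,  -117.73,  -2041.19],
    [-1375.44,  9036.17,   703.35,  13488.31],
    [ -117.73,   703.35,   342.23,    667.97],
    [-2041.19, 13488.31,   667.97,  20852.25],
])
delta0 = np.array([0.765, 0.323, -0.469, -0.578])     # feasible GLSE
delta_star = np.array([1.7, 0.55, -0.42, -0.61])      # null value tested above

d = delta0 - delta_star
D0 = d @ M @ d                  # observed quadratic form, roughly 22.2
in_region = D0 < 13.74          # False: delta* lies outside the region
```

The quadratic form exceeds 13.74, so \({\varvec{\delta }}^*\) lies outside the 95% region, consistent with the 5% rejection reported above.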

Conclusions

In this article, we propose a parametric bootstrap method for hypothesis testing and for constructing confidence regions for the regression coefficient vector (\({\varvec{\delta }}\)) in balanced and unbalanced panel data models. We compare the performance of our PB method with the GPV and AP methods in a simulation study for the balanced case, in terms of type I error rate and power. The simulation results show that the estimated size of our PB test is close to the nominal level (0.05), while the other two methods are often liberal (estimated sizes significantly greater than 0.05). Moreover, in the cases where the estimated size of the GPV test is close to 0.05, the PB and GPV tests do not have significantly different powers. Therefore, for testing or constructing a confidence region for \({\varvec{\delta }}\), we recommend the PB method.