Abstract
Considering linear dynamic panel data models with fixed effects, existing outlier-robust estimators based on the median ratio of two consecutive pairs of first-differenced data are extended to higher-order differencing. The estimation procedure is thus based on many pairwise differences and their ratios and is designed to combine high precision with good robustness properties. In particular, the proposed two-step GMM estimator based on the corresponding moment equations relies on an innovative weighting scheme reflecting both the variance and bias of those moment equations, where the bias is assumed to stem from data contamination. To estimate the bias, the influence function is derived and evaluated. The robustness properties of the estimator are characterized under contamination both by independent additive outliers and by patches of additive outliers. The proposed estimator is also compared with existing methods by means of Monte Carlo simulations.
1 Introduction
Dynamic panel data models with fixed effects have been used in many empirical applications in economics; see Bun and Sarafidis (2015) and Harris et al. (2008) for an overview of the methodology and applications. Despite the complex data structure of dynamic panels, the vast majority of the literature focuses on models assuming that data are free of influential observations or outliers. This is often not the case in reality (Janz 2002; Verardi and Wagner 2011; Zaman et al. 2001), and procedures robust to outliers are thus very important in the case of panel data, where erroneous observations can easily be masked by the complex data structure.
Robust methods for panel data have so far been studied only to a limited extent. Some methods are available for static models (e.g., Bramati and Croux 2007; Aquaro and Čížek 2013) and just a handful for dynamic models. Locally robust estimation procedures have been proposed by Lucas et al. (2007), based on the generalized method of moments estimator with a bounded influence function, and by Galvao (2011), using quantile regression techniques. On the other hand, Dhaene and Zhu (2017) and Aquaro and Čížek (2014) propose globally robust estimators that are based on the median ratios of the first differences of the dependent variable and of the first- or higher-order differences of the lagged dependent variable [note that previously studied median-unbiased estimation such as Cermeño (1999) was based on the least squares method and was thus not robust to outliers]. The main shortcomings of these methods follow from the use of a fixed number of differences and their ratios. On the one hand, using just the first differences as in Dhaene and Zhu (2017) can be beneficial for the robustness of the estimator, but it results in a lower precision of estimates. On the other hand, Aquaro and Čížek (2014) employ multiple differences of the explanatory variables to improve the precision of estimation, but this leads to a high sensitivity to sequences of outliers. Additionally, estimation using higher-order differences of the dependent variable has not been explored in either case.
Our aim is to extend these median-based estimators of Dhaene and Zhu (2017) and Aquaro and Čížek (2014) by employing multiple pairwise difference transformations in such a way that the resulting estimator is robust and also exhibits good finite-sample performance in data without outliers. The use of higher-order differences of the dependent variable is not new (see Aquaro and Čížek 2013), but presents two major challenges when applied in dynamic models. In particular, higher-order differences have not been used previously since (1) they can result in a substantial increase in bias in the presence of particular types of outliers and (2) their number grows quadratically with the number of time periods, which can lead to additional biases due to weak identification or outliers. We address this by proposing a data-driven weighting and selection of the median ratios of differenced data, since the traditional strategy used in robust statistics—using an initial robust estimator to detect outlying observations and, after removing them, applying an efficient non-robust estimator (cf. Gervini and Yohai 2002)—is not feasible in this context. Even if only the first differences are used, removing a single observation means that the observation and its two or three following data points (depending on the actual estimation method) cannot be used in estimation. Especially in short panels with fewer than five time periods, removing a single observation for a given individual thus means that no observations of that individual can be used in estimation, and the problem gets worse if higher-order differences are used.
In this paper, we generalize the estimation method of Dhaene and Zhu (2017) to a combination of the pth and sth order differences, \(p,s\in \mathbb {N}\), and combine multiple pairwise differences by means of the generalized method of moments (GMM). To account for the shortcomings of the current methods and to extend the analysis of Aquaro and Čížek (2014), we first analyze the robustness of the median-based moment conditions, derive their influence functions, and quantify the bias caused by data contamination. Subsequently, we use the maximum bias and propose a two-step GMM estimator, which weights the (median-based) moment conditions both by their variance and bias; this guarantees that imprecise or biased moment conditions get low weights in estimation. Finally, as the number of applicable moment conditions grows quadratically with the number of time periods, a suitable number of moment conditions for the underlying data generating process needs to be selected using a robust version of moment selection procedure of Hall et al. (2007).
In the rest of the paper, the new estimator is introduced first in Sect. 2. Its robust properties are studied in Sect. 3 and are used to define the data-dependent GMM weights. The existing and proposed methods are then compared by means of Monte Carlo simulations in Sect. 4 and the proofs can be found in the “Appendix”.
2 Median-based estimation of dynamic panel models
The dynamic panel data model (Sect. 2.1) and its median-based estimation (Sect. 2.2) will be now discussed. Later, the two-step GMM estimation procedure (Sect. 2.3) and the moment selection method (Sect. 2.4) will be introduced.
2.1 Dynamic panel data model
Consider the simple dynamic panel data model \((i=1,\ldots ,n; t=1,\ldots ,T; T\ge 3)\)

$$y_{ it } = \alpha y_{ it -1} + \eta _i + \varepsilon _{ it }, \qquad (1)$$
where \(y_{ it }\) denotes the response variable, \(\eta _i\) is the unobservable fixed effect, and \(\varepsilon _{ it }\) represents the idiosyncratic error. The parameter satisfies \(|\alpha |<1\) so that this data generating process can be stationary. The number T of time periods is fixed, which implies that fixed or stochastic effects \(\eta _i\) are nuisance parameters and cannot be consistently estimated. Finally, note that the extension of the discussed estimators to a model with exogenous covariates is straightforward (see Dhaene and Zhu 2017, Section 4.1).
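To make the setup concrete, model (1) under the stationarity Assumption A.2 can be simulated as follows. This is an illustrative sketch: the function name and the \(\mathrm{N}(0,1)\) choice for the fixed effects are our assumptions, not part of the paper.

```python
import numpy as np

def simulate_panel(n, T, alpha, sigma_eps=1.0, seed=0):
    """Simulate y_it = alpha * y_{i,t-1} + eta_i + eps_it  (model (1)).

    The initial observation is drawn from the stationary distribution given
    eta_i (mean eta_i / (1 - alpha), variance sigma_eps^2 / (1 - alpha^2)),
    so the generated series satisfies the time-stationarity Assumption A.2.
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, 1.0, size=n)  # fixed effects; their N(0,1) law is our choice
    prev = eta / (1.0 - alpha) + rng.normal(0.0, sigma_eps / np.sqrt(1.0 - alpha**2), size=n)
    y = np.empty((n, T))
    for t in range(T):
        prev = alpha * prev + eta + rng.normal(0.0, sigma_eps, size=n)
        y[:, t] = prev
    return y
```

Drawing the initial observation from its stationary distribution given \(\eta _i\) makes Assumption A.2 hold exactly rather than only asymptotically.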
As in Aquaro and Čížek (2014) and similarly to Cermeño (1999) and Han et al. (2014), we will consider model (1) under the following assumptions:
A.1
Errors \(\varepsilon _{ it }\) are independent across \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\) and possess finite second moments. Errors \(\{\varepsilon _{ it }\}_{t=1}^T\) are also independent of fixed effects \(\eta _i\).
A.2
The sequences \(\{y_{ it }\}_{t=1}^T\) are time stationary for all \(i=1,\ldots ,n\). In particular, the first and second moments of \(y_{ it }\) conditional on \(\eta _i\) exist and do not depend on time.
A.3
Errors \(\varepsilon _{ it }\sim \mathrm {N}(0,\sigma _{\varepsilon }^2)\), \(\sigma _{\varepsilon }^2>0\), for all \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\).
Except for the independence in Assumption A.1, no assumptions are made about the unobservable fixed effects \(\eta _i\). Although we impose rather strict Assumptions A.1 and A.3 on the idiosyncratic errors, they can be relaxed. The errors \(\varepsilon _{ it }\) do not have to follow the same distribution across cross-sectional units i, allowing for heteroscedasticity. Additionally, the consistency of the estimators introduced below requires that the joint distributions of errors \(\{\varepsilon _{ it }\}_{t=1}^T\) are elliptically contoured, making the normality Assumption A.3 sufficient, but not necessary (see Dhaene and Zhu 2017, Section 4.2). On the other hand, the violation of the time-homoscedasticity in Assumption A.3 leads to the inconsistency of the discussed estimators. If \(\varepsilon _{ it }\sim \mathrm {N}(0,\sigma _{\varepsilon t}^2)\) for \(t=1,\ldots ,T\), the model equation (1) therefore has to be rescaled by the unknown standard deviations \(\sigma _{\varepsilon t}\), which can be treated as unknown parameters and estimated along with \(\alpha \) by GMM. Finally, the stationarity Assumption A.2 is used not only by the proposed estimators, but also by frequently applied GMM estimators such as Blundell and Bond (1998), and it is implied by the assumptions of Han et al. (2014) if \(|\alpha |<1\).
2.2 Median-based moment conditions
To generalize the estimator by Dhaene and Zhu (2017), let \(\Delta ^s\) denote the sth difference operator, that is, \(\Delta ^s\upsilon _{t}:=\upsilon _{t}-\upsilon _{t-s}\) (cf. Abrevaya 2000; Aquaro and Čížek 2013). Given model (1), stationarity Assumption A.2 implies for any integers \(s,q,p\in \mathbb {N}\) that
where the triplet \(\varvec{j}=(s,q,p)\) and \( r_{\varvec{j}} = {\text {cov}}(\Delta ^sy_{ it },\Delta ^py_{it-q}){/}{{\text {var}}(\Delta ^py_{it-q})}\) are independent of i and t, \(\max \{s,p+q\}<T\). Consequently, the variables \(\Delta ^sy_{ it }-r_{\varvec{j}}\Delta ^py_{it-q}\) and \(\Delta ^py_{it-q}\) are uncorrelated, and by Assumption A.3, independent and symmetrically distributed around zero. Hence, \( {\text {E}}[{\text {sgn}}(\Delta ^sy_{ it }-r_{\varvec{j}}\Delta ^py_{it-q}){\text {sgn}}(\Delta ^py_{it-q})]=0 \) and \( {\text {E}}\left[ {\text {sgn}}\left( {\Delta ^sy_{ it }}/{\Delta ^py_{it-q}}-r_{\varvec{j}}\right) \right] =0. \) The estimation of \(r_{\varvec{j}}\) can therefore be based on the sample analog of this moment condition:

$$\hat{r}_{n\varvec{j}} = \mathop {\mathrm {med}}_{i,t}\left( \Delta ^sy_{ it }/\Delta ^py_{it-q}\right) . \qquad (3)$$
To relate \(r_{\varvec{j}}\) to the autoregressive coefficient \(\alpha \) in (1), Aquaro and Čížek (2014) derived under Assumptions A.1 and A.2 that the correlation coefficient \(r_{\varvec{j}}\) satisfies the moment condition
If \(s=q=p=1\), (4) defines Dhaene and Zhu (2017)’s estimator: \(\alpha \in (-1,1)\) is identified by \(g_{111}(\alpha )=(1-\alpha )(2r_{111}+1-\alpha )=0\). Dhaene and Zhu’s (DZ) estimator \(\hat{\alpha }_n\) therefore simply equals \(2\hat{r}_{n111}+1\) and was proved to be consistent and asymptotically normal. Aquaro and Čížek (2014)’s estimator (AC-DZ) of \(\alpha \) uses \(s=q=1\) and odd p, \(p<T-1\). They do not use differences with \(s>1\) because of their robustness properties: while such differences seem robust to sequences of outliers, they can lead to large biases if outliers occur at random times.
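The DZ special case can be sketched in a few lines; pooling all individuals and time periods into a single median is our reading of the sample analog (3), and the function names are ours.

```python
import numpy as np

def r_hat(y, s, q, p):
    """Median-ratio estimator of r_j for the triplet j = (s, q, p):
    the median over i and t of Delta^s y_it / Delta^p y_{i,t-q},
    pooling all available (i, t) pairs into one median."""
    T = y.shape[1]
    ratios = []
    for t in range(max(s, p + q), T):          # 0-based time index
        num = y[:, t] - y[:, t - s]            # Delta^s y_it
        den = y[:, t - q] - y[:, t - q - p]    # Delta^p y_{i,t-q}
        ratios.append(num / den)
    return float(np.median(np.concatenate(ratios)))

def dz_estimator(y):
    """Dhaene and Zhu (2017) estimator: alpha_hat = 2 * r_hat_{n111} + 1."""
    return 2.0 * r_hat(y, 1, 1, 1) + 1.0
```

For \(s=q=p=1\), the population median ratio equals \((\alpha -1)/2\), so the map \(\hat{\alpha }_n = 2\hat{r}_{n111}+1\) recovers \(\alpha \).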
2.3 Two-step GMM estimation
To increase the precision and robustness of the estimation, we propose to extend the (AC-)DZ estimator by allowing for multiple differences with \(s = q \ge 1\) and \(p \ge 1\). We consider only \(s=q\) as the moment conditions (4) do not allow distinguishing outlying and regular observations for \(s \not = q\) as shown in Aquaro and Čížek (2014). For \(s=q\), (4) simplifies after dividing by \(1-\alpha ^p\) and accordingly redefining \(g_{\varvec{j}}(\alpha )\) to
The full set of moment conditions in (5) can then be written as

$$\varvec{g}(\alpha ) = \varvec{0}, \qquad (6)$$
where \(\varvec{g}(\alpha )=\{g_{\varvec{j}}(\alpha )\}_{\varvec{j}\in \mathcal {J}}\) and a fixed finite set \(\mathcal {J}\) contains all triplets \(\varvec{j}= (s,q,p)\) that are considered in estimation. The DZ estimator then corresponds to the special case \(\mathcal{J} = \{ (1,1,1) \}\) and the AC-DZ relies on a set \(\mathcal{J} = \{ (1,1,p){:}\,1 \le p < T-1 \text{ odd } \}\). Here we consider all combinations with any odd \(s=q\) and odd p, \(\mathcal{J} \subseteq \mathcal{J}_o = \{ (s,s,p){:}\,s\in \mathbb {N} \text{ odd }, p\in \mathbb {N} \text{ odd }, 1 \le s+p < T\}\), as the single moment conditions do not uniquely identify \(\alpha \) for even values of s or p, which could negatively affect the bias caused by contamination.
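A minimal sketch of enumerating the index set \(\mathcal {J}_o\) for a given T (the function name is ours):

```python
def triplets_Jo(T):
    """All triplets (s, s, p) with s and p odd and 1 <= s + p < T,
    i.e., the set J_o of moment conditions usable with T time periods."""
    return [(s, s, p)
            for s in range(1, T, 2)
            for p in range(1, T, 2)
            if s + p < T]
```

Consistent with Sect. 2.4, the number of such triplets is approximately \(T(T-1)/8\) and grows quadratically in T.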
Given the system of equations in (6), the parameter \(\alpha \) can be estimated by the GMM procedure. This GMM estimator is referred to here as the pairwise-difference DZ (PD-DZ) estimator and is defined by

$$\hat{\alpha }_n = \mathop {\mathrm {arg\,min}}_{c\in (-1,1)} \varvec{g}_n(c)'\varvec{A}_n\varvec{g}_n(c), \qquad (7)$$
where \(\varvec{g}_n(c)=(g_{n\varvec{j}}(c))_{\varvec{j}\in \mathcal {J}}\) is the sample analog of \(\varvec{g}(\alpha )\) and corresponds to (5) with \(r_{\varvec{j}}\) being replaced by \(\hat{r}_{n\varvec{j}}\) defined in (3).
The weighting matrix \(\varvec{A}_n\) can initially be chosen, as in Aquaro and Čížek (2014), proportional to the number of observations available for the estimation of each moment equation: \(\varvec{A}_n = \varvec{A}= {\text {diag}}\{(T - p - s)/T \}\). However, the traditional variance-minimizing choice of the GMM weighting matrix \(\varvec{A}_n\) equals the inverse of the variance matrix \(\varvec{V}_n\) of the moment conditions \(\varvec{g}_n(\alpha )\), which converges to the asymptotic variance matrix \(\varvec{V}\) of the moment conditions (5); see the “Appendix” for the asymptotic distribution of \(\hat{\alpha }_n\) and the asymptotic variance matrix \(\varvec{V}\) previously obtained by Aquaro and Čížek (2014).
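The initial diagonal weighting \(\varvec{A}= {\text {diag}}\{(T - p - s)/T \}\) can be formed directly from the selected triplets; a short sketch (the function name is ours):

```python
import numpy as np

def initial_weights(triplets, T):
    """Initial diagonal GMM weighting matrix with entries (T - p - s) / T,
    proportional to the number of observations available per moment condition."""
    return np.diag([(T - p - s) / T for (s, q, p) in triplets])
```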
On the other hand, we also aim to account for the presence of outlying observations that can substantially bias the estimates. Since simply removing outliers would result in a substantial data loss as explained in the introduction, we instead propose to keep the moment conditions (5) and to minimize the mean squared error (MSE) of the estimates rather than their asymptotic variance. First, let us denote the MSE of \(\varvec{g}_n(\alpha )\) by \(\varvec{W}_n\),

$$\varvec{W}_n = \varvec{b}_n \varvec{b}_n' + \varvec{V}_n, \qquad (8)$$

where \(\varvec{b}_n\) and \(\varvec{V}_n\) denote the bias and the variance matrix of the moment conditions \(\varvec{g}_n(\alpha )\), respectively.
Given a weighting matrix \(\varvec{A}_n\) and the asymptotic linearity of \(\hat{\alpha }_n\) (see Aquaro and Čížek 2014, the proof of Theorem 1),

$$\hat{\alpha }_n - \alpha = -(\varvec{d}'\varvec{A}_n\varvec{d})^{-1}\varvec{d}'\varvec{A}_n\varvec{g}_n(\alpha ) + o_p(n^{-1/2})$$

as \(n\rightarrow \infty \) (with \(\varvec{d}\) defined in Theorem 5), it immediately follows that the MSE of \(\hat{\alpha }_n\) equals

$${\text {MSE}}(\hat{\alpha }_n) = (\varvec{d}'\varvec{A}_n\varvec{d})^{-1}\varvec{d}'\varvec{A}_n\varvec{W}_n\varvec{A}_n\varvec{d}\,(\varvec{d}'\varvec{A}_n\varvec{d})^{-1},$$
which is (asymptotically) minimized by choosing \(\varvec{A}_n = \varvec{W}_n^{-1}\) (Hansen 1982, Theorem 3.2).
Next, to create a feasible procedure, both the variance and the squared bias matrices have to be estimated. The estimation thus proceeds in two steps: first, the (AC-)DZ estimator is applied to obtain an initial parameter estimate; then—after estimating the bias \(\varvec{b}_n\) and variance \(\varvec{V}_n\) of the moment conditions—the GMM estimator with all applicable pairwise differences is evaluated using an estimate of the weighting matrix \(\varvec{A}_n = [\varvec{b}_n \varvec{b}_n' + \varvec{V}_n]^{-1}\). On the one hand, the estimate \(\hat{\varvec{V}}_n\) of \(\varvec{V}_n\) can be directly obtained from Theorem 5 in the “Appendix” using initial estimates of \(r_{\varvec{j}}\) and \(\alpha \) because both the responses \(y_{ it }\) and the estimates \(\hat{\alpha }_n\) are continuously distributed with bounded densities due to the stationarity Assumptions A.2 and A.3. On the other hand, estimating \(\varvec{b}_n\) by \(\hat{\varvec{b}}_n\) requires first studying the biases of the median-based moment conditions and constructing a feasible estimate thereof in Sect. 3. Using the estimates \(\hat{\varvec{V}}_n\) and \(\hat{\varvec{b}}_n\) to construct \(\hat{\varvec{W}}_n = \hat{\varvec{b}}_n \hat{\varvec{b}}_n' + \hat{\varvec{V}}_n\) and \(\hat{\varvec{A}}_n=\hat{\varvec{W}}_n^{-1}\) then leads to the proposed second-step GMM estimator

$$\hat{\alpha }_n = \mathop {\mathrm {arg\,min}}_{c\in (-1,1)} \varvec{g}_n(c)'\hat{\varvec{A}}_n\varvec{g}_n(c). \qquad (9)$$
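The two GMM steps differ only in the weighting matrix, so both can be sketched with one generic routine. The moment function is passed in as a callable because the exact form of (5) is taken as given here, and the grid search is our simplification (any one-dimensional optimizer would do):

```python
import numpy as np

def gmm_estimate(g_n, A, grid=None):
    """Minimize the GMM objective g_n(c)' A g_n(c) over c in (-1, 1).

    g_n : callable returning the |J|-vector of sample moment conditions at c.
    A   : weighting matrix, e.g. the initial diagonal weights or the
          estimated inverse-MSE weights of the second step.
    """
    if grid is None:
        grid = np.linspace(-0.999, 0.999, 1999)
    objective = np.array([g @ A @ g for g in (g_n(c) for c in grid)])
    return float(grid[np.argmin(objective)])
```

As a usage example, the DZ special case \(g_{111}(c)=(1-c)(2r_{111}+1-c)\) with \(r_{111}=(\alpha -1)/2\) recovers \(\alpha \) on the interior of the parameter space.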
2.4 Robust moment selection
The proposed two-step GMM estimator is based on the moment conditions (5), and given that we consider only odd s and p, their number equals approximately \(T(T-1)/8\) and grows quadratically with the number of time periods. Although the extra moment conditions based on higher-order differences might improve the precision of estimation for larger values of \(|\alpha |\), their usefulness is rather limited if \(\alpha \) is close to zero. At the same time, a large number of moment conditions might increase the estimation bias due to outliers. More specifically, Aquaro and Čížek (2014) showed, for instance, that for \(\alpha \) close to 0 the original moment condition of the DZ estimator (\(s=q=p=1\)) is least sensitive to random outliers; including higher-order moment conditions then just increases the bias without improving the variance and is thus harmful.
To account for this, we propose to select the moment conditions used in estimation by a robust analog of a moment selection criterion (e.g., see Cheng and Liao 2015, for an overview). Since all moments are valid and no weak instruments are involved, the information content of the moment equations and their number have to be balanced as in Hall et al. (2007), whose approach to moment selection in the presence of nearly redundant moment conditions can be adapted to robust estimation. They propose the so-called relevant moment selection criterion (RMSC) that—for a given set of moment conditions defined by triplets \(\mathcal J\) in our case—equals

$${\text {RMSC}}(\mathcal {J}) = \ln |\hat{\varvec{V}}_{n,\mathcal {J}}| + \kappa (|\mathcal {J}|, n). \qquad (10)$$
Matrix \(\hat{\varvec{V}}_{n,\mathcal{J}}\) represents an estimate of the variance matrix \(\varvec{V}_{\mathcal{J}}\) of the moment conditions (6) defined by triplets \(\mathcal J\), and \(\kappa (\cdot ,\cdot )\) is a deterministic penalty term depending on the number \(|\mathcal{J}|\) of triplets (or moment conditions) and on the sample size n used for estimating the elements of \(\varvec{V}_n\) (see Theorem 5). To select relevant moment conditions, this criterion has to be minimized:

$$\hat{\mathcal {J}} = \mathop {\mathrm {arg\,min}}_{\mathcal {J}\subseteq \mathcal {J}_o} {\text {RMSC}}(\mathcal {J}).$$
Two examples of the penalization term used by Hall et al. (2007) are the Bayesian information criterion (BIC) with \(\kappa (c,n) = (c-K) \cdot \ln (\sqrt{n}){/}\sqrt{n}\) and the Hannan–Quinn information criterion (HQIC) with \(\kappa (c,n) = (c-K) \cdot \kappa _c \ln (\ln (\sqrt{n})){/}\sqrt{n}\), where the number of estimated parameters \(K=1\) in model (1) and constant \(\kappa _c>2\).
As in Sect. 2.3, the proposed robust estimator (9) should minimize the MSE rather than just the variance of the estimates. We therefore suggest to use the relevant robust moment selection criterion (RRMSC),

$${\text {RRMSC}}(\mathcal {J}) = \ln |\hat{\varvec{W}}_{n,\mathcal {J}}| + \kappa (|\mathcal {J}|, n),$$
which is based on the determinant of an estimate \(\hat{\varvec{W}}_n\) of the MSE matrix \(\varvec{W}_n\) rather than on the variance matrix estimate \(\hat{\varvec{V}}_n\) of the moment conditions. The relevant robust moment conditions are then obtained by minimizing this criterion:

$$\hat{\mathcal {J}} = \mathop {\mathrm {arg\,min}}_{\mathcal {J}\subseteq \mathcal {J}_o} {\text {RRMSC}}(\mathcal {J}).$$
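An exhaustive-search sketch of the selection step follows. Treating the criterion as the log-determinant of the estimated MSE matrix plus the BIC-type penalty is our reading of the RRMSC, and the brute-force search over subsets is feasible only for small candidate sets:

```python
import numpy as np
from itertools import combinations

def kappa_bic(c, n, K=1):
    """BIC-type penalty kappa(c, n) = (c - K) * ln(sqrt(n)) / sqrt(n)."""
    return (c - K) * np.log(np.sqrt(n)) / np.sqrt(n)

def select_moments(triplets, W_hat, n, penalty=kappa_bic):
    """Exhaustive search for the subset J minimizing an RRMSC-style criterion
    ln det(W_hat_J) + kappa(|J|, n)."""
    indices = range(len(triplets))
    best_subset, best_value = None, np.inf
    for size in range(1, len(triplets) + 1):
        for subset in combinations(indices, size):
            sub = list(subset)
            sign, logdet = np.linalg.slogdet(W_hat[np.ix_(sub, sub)])
            if sign <= 0:
                continue  # skip non-positive-definite submatrices
            value = logdet + penalty(size, n)
            if value < best_value:
                best_value, best_subset = value, [triplets[i] for i in sub]
    return best_subset
```

In this toy setting, a precise moment condition (small MSE) is retained while an imprecise one is dropped once the penalty outweighs its contribution.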
3 Robustness properties
There are many measures of robustness that are related to the bias of an estimator or, more typically, to the worst-case bias of an estimator due to an unknown form of outlier contamination. In this section, various kinds of contamination are introduced and some relevant measures of robustness are defined (Sect. 3.1). Using these measures, we characterize the robustness of the moment conditions (5) in Sect. 3.2 and the robustness of the GMM estimator (7) in Sect. 3.3. Next, we use these results to estimate the bias of the moment conditions (5), as discussed in Sect. 3.4. Finally, the whole estimation procedure is summarized in Sect. 3.5.
3.1 Measures of robustness
Given that the analyzed data from model (1) are dependent, the effect of outliers can depend on their structure. Therefore, we first describe the considered contamination schemes and then the relevant measures of robustness.
More formally, let \(\mathcal {Z}\) be the set of all possible samples \(Z = \{z_{ it }\}\) of size (n, T) following model (1) and let \(Z_\epsilon = \{ z_{ it }^\epsilon \}\) be a contaminating sample of size (n, T) following a fixed data-generating process, where the index \(\epsilon \) of \(Z_\epsilon \) indicates the probability that an observation in \(Z_\epsilon \) is different from zero. The observed contaminated sample is \(Z+Z_\epsilon =\{z_{ it } + z_{ it }^{\epsilon }\}_{i=1,\,t=1}^{n,\,T}\). Similarly to Dhaene and Zhu (2017), we consider the contamination by independent additive outliers following a degenerate distribution with the point mass at \(\zeta \),
and by patches of k additive outliers,
where \(\nu _{ it }^\epsilon \) follows the Bernoulli distribution with the parameter \(\tilde{\epsilon }\) such that \((1-\tilde{\epsilon })^k=1-\epsilon \). Additionally, a third contamination scheme \(Z^3_{\epsilon ,\zeta }=\{z_{ it }^\epsilon \}_{i=1,\,t=1}^{n,\,T}\) is considered, where
where \(\Pr \left( a_{it-l}=\zeta \right) =\Pr \left( a_{it-l}=-\zeta \right) =1/2\) and \(\nu _{ it }^{\epsilon }\) is defined as in \(Z_{\epsilon ,\zeta }^2\). Note that (12) and (13) are special cases of a more general type of contamination \(Z_{\epsilon ,\zeta }^4=\{z_{ it }^\epsilon \}_{i=1,\,t=1}^{n,\,T}\), where
and \(-1\le \rho \le 1\). This general type of contamination closely corresponds to the contamination by innovation outliers for large k and \(\rho =\alpha \), and it is therefore important to study. However, since Dhaene and Zhu (2017)’s results for \(s=p=1\) suggest that the contamination scheme \(Z_{\epsilon ,\zeta }^4\) biases estimates towards \(\rho \) for \(\zeta \rightarrow +\infty \) and \(\rho \) is unknown in practice, we do not analyse this most general case with \(\rho \in [-1,1]\). Instead, we concentrate on the most extreme cases \(\rho =1\) and \(\rho =-1\), as they can arguably bias the estimate most. Hence, the contamination schemes \(Z^1_{\epsilon ,\zeta }\), \(Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\) bias the DZ estimates of \(\alpha \) towards 0, 1, and \(-1\), respectively—see Sect. 3.2 and Dhaene and Zhu (2017).
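The contamination schemes can be mimicked numerically, e.g., for Monte Carlo experiments. The following is a sketch of \(Z^1_{\epsilon ,\zeta }\) and of a \(Z^2_{\epsilon ,\zeta }\)-style patch scheme in which overlapping patches simply accumulate (a simplification of ours; function names are also ours):

```python
import numpy as np

def contaminate_independent(y, eps, zeta, seed=0):
    """Scheme Z^1: each observation is independently shifted by zeta
    with probability eps (point-mass contamination at zeta)."""
    rng = np.random.default_rng(seed)
    return y + zeta * (rng.random(y.shape) < eps)

def contaminate_patches(y, eps, zeta, k, seed=0):
    """Scheme Z^2-style: a patch of k consecutive outliers of size zeta starts
    at (i, t) with probability eps_tilde, where (1 - eps_tilde)^k = 1 - eps;
    patches are truncated at the last period and overlaps add up."""
    rng = np.random.default_rng(seed)
    eps_tilde = 1.0 - (1.0 - eps) ** (1.0 / k)
    n, T = y.shape
    starts = rng.random((n, T)) < eps_tilde
    z = np.zeros_like(y)
    for t in range(T):
        block = z[:, t:t + k]            # view into the patch window
        block[starts[:, t], :] += zeta
    return y + z
```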
Given the contamination schemes, one of the traditional measures of the global robustness of an estimator is the breakdown point. It can be defined as the smallest fraction of the data that can be changed in such a way that the estimator will not reflect any information concerning the remaining (non-contaminated) observations. Aquaro and Čížek (2014) derived the breakdown points of the estimators \(\hat{r}_{\varvec{j}}\), \(\varvec{j}\in \mathcal{J}\), for contamination schemes \(Z^1_{\epsilon ,\zeta }, Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\), and under some regularity conditions, proved that the breakdown point of the GMM estimator (7) equals the breakdown point of the DZ estimator \(\hat{r}_{(1,1,1)}\) if \((1,1,1)\in \mathcal{J}\). While such results characterize the global robustness of the PD-DZ estimators, they are not informative about the size of the bias caused by outliers.
We therefore base the estimation of the bias due to contamination on the influence function. It is a traditional measure of local robustness and can be defined as follows. Let \(\mathcal {T}(Z+Z_\epsilon )\) denote a generic estimator of an unknown parameter \(\theta \) based on a contaminated sample \(Z+Z_\epsilon =\{z_{ it }+z_{ it }^{\epsilon }\}_{i=1,\,t=1}^{n,\,T}\), where Z and \(Z_\epsilon \) have been defined at the beginning of Sect. 3.1. As the definition is asymptotic, let \(\mathcal {T}(\theta ,\zeta ,\epsilon ,T)\) be the probability limit of \(\mathcal {T}(Z+Z_\epsilon )\) when T is fixed and \(n\rightarrow \infty \). Note that \(\mathcal {T}(\theta ,\zeta ,\epsilon ,T)\) depends on the unknown parameter \(\theta \) describing the data generating process, on the fraction \(\epsilon \) of data contamination, on the non-zero value \(\zeta \) characterizing the outliers, and on the number of time periods T. Assume \(\mathcal {T}\) is consistent under non-contaminated data, that is, \(\mathcal {T}(\theta ,\zeta ,0,T)=\theta \). The influence function (IF) of estimator \(\mathcal {T}\) at data generating process Z due to contamination \(Z_\epsilon \) is defined as

$${\text {IF}}(\mathcal {T};\theta ,\zeta ,T) = \lim _{\epsilon \rightarrow 0^+} \frac{\mathcal {T}(\theta ,\zeta ,\epsilon ,T)-\theta }{\epsilon } = \lim _{\epsilon \rightarrow 0^+} \frac{{\text {bias}}(\mathcal {T};\theta ,\zeta ,\epsilon ,T)}{\epsilon }, \qquad (15)$$
where the equality follows by the definition of asymptotic bias of \(\mathcal {T}\) due to the data contamination \(Z_\epsilon \), \( {\text {bias}}(\mathcal {T};\theta ,\zeta ,\epsilon ,T):=\mathcal {T}(\theta ,\zeta ,\epsilon ,T)-\theta . \) (If IF does not depend on the number T of time periods, T can be omitted from its arguments.)
Clearly, the knowledge of the influence function allows us to approximate the bias of an estimator \(\mathcal {T}\) at \(Z+Z_\epsilon \) by \(\epsilon \cdot {\text {IF}}(\mathcal {T};\theta ,\zeta ,T)\). Although such an approximation is often valid only for small values of \(\epsilon >0\) (e.g., in the linear regression model, where the bias can become infinite), it is relevant in a much wider range of contamination levels \(\epsilon \) in model (1) given that the parameter space \((-1,1)\) is bounded and so is the bias (the dependence of the bias on the contamination level \(\epsilon \) has been studied by Dhaene and Zhu 2017).
The disadvantage of approximating bias by \(\epsilon \cdot {\text {IF}}(\mathcal {T};\theta ,\zeta ,T)\) is that it depends on the unknown magnitude \(\zeta \) of outliers. We therefore suggest to evaluate the supremum of the influence function, the gross error sensitivity (GES),

$${\text {GES}}(\mathcal {T};\theta ,T) = \sup _{\zeta }\left| {\text {IF}}(\mathcal {T};\theta ,\zeta ,T)\right| , \qquad (16)$$
and approximate the worst-case bias by \(\epsilon \cdot {\text {GES}}(\mathcal {T};\theta ,T)\). For the moment conditions and the PD-DZ estimator, IF and GES are derived in the following Sects. 3.2 and 3.3, where \(\mathcal {T}\) equals \(\hat{r}_{\varvec{j}}\) and \(\hat{\alpha }\), respectively (without the subscript n since the IF and GES definitions depend only on the probability limits of the estimators).
3.2 Influence function
The GMM estimator (7) is based on moment conditions that depend on the data only through the medians \(r_{\varvec{j}}\). We therefore first derive the influence functions of the estimators \(\hat{r}_{\varvec{j}}\) and then combine them to derive the influence function of the GMM estimator. Building on Dhaene and Zhu (2017, Theorems 3.2 and 3.7), the IFs of \(\hat{r}_{\varvec{j}}\) in model (1) under contamination schemes \(Z^1_{\epsilon ,\zeta }\), \(Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\) are derived in the following Theorems 1–3. Only the point-mass distribution \(G_\zeta \) with the mass at \(\zeta \in \mathbb {R}\) is considered. In all theorems, \(\Phi \) denotes the cumulative distribution function of the standard normal distribution \(\mathrm {N}(0,1)\).
Theorem 1
Let Assumptions A.1–A.3 hold and \(\varvec{j}\in \mathcal{J}_o\). Then it holds in model (1) under the independent-additive-outlier contamination \(Z_{\epsilon ,\zeta }^1\) with point-mass distribution at \(\zeta \ne 0\) that
Theorem 2
Let Assumptions A.1–A.3 hold and \(\varvec{j}\in \mathcal{J}_o\). Then it holds in model (1) under the patched-additive-outlier contamination \(Z_{\epsilon ,\zeta }^2\) with point-mass distribution at \(\zeta \ne 0\) and patch length \(k\ge 2\) that
where \(\mathfrak {p}_C'(0)\), \(\mathfrak {p}_D'(0)\), \(C(r_{\varvec{j}};\zeta ,0)\), and \(D(r_{\varvec{j}};\zeta ,0)\) are defined in (52), (53), (56), and (57), respectively.
Theorem 3
Let Assumptions A.1–A.3 hold and \(\varvec{j}\in \mathcal{J}_o\). Then it holds in model (1) under the patched-additive-outlier contamination \(Z_{\epsilon ,\zeta }^3\) with point-mass distribution at \(\zeta \ne 0\) and patch length \(k\ge 2\) that
where \(\mathfrak {p}_L'\), \(L\in \{C,D,E,G,I\}\), are defined in Eqs. (73), (74), (75), (77), (79), \(\mathcal {L}(1/2)=L(r_{\varvec{j}};\zeta ,0)-1/2\) for \(\mathcal {L}\in \{\mathcal {C},\mathcal {D},\mathcal {E},\mathcal {G},\mathcal {I}\}\) and \(L\in \{C,D,E,G,I\}\), and \(L(r_{\varvec{j}};\zeta ,0)\) for \(L\in \{C,D,E,G,I\}\) are defined in Eqs. (82)–(86) in “Appendix A.3”.
The influence functions reported in Theorems 1–3 are complicated objects, both due to their algebraic forms and due to their dependence on the unknown parameter values \(\alpha \) and \(\zeta \). As \(\zeta \) is generally unknown, we characterize the worst-case scenario by means of the gross error sensitivity: recall that \( {\text {GES}}(\hat{r}_{\varvec{j}};\alpha ) = \sup _{\zeta }\left| {\text {IF}}(\hat{r}_{\varvec{j}};\alpha ,\zeta )\right| \) by Eq. (16). Inspection of the influence functions and their elements in Theorems 1–3 reveals, though, that the largest effect can be attributed to outliers with magnitude \(|\zeta | \rightarrow +\infty \) (with the possible exception of the term \(\mathcal {E}(1/2)\) in Theorem 3).
Given the results in Theorems 1–3, we thus have to compute the GES of the estimators \(\hat{r}_{\varvec{j}}\) numerically for each \(\varvec{j}=(s,s,p)\in \mathcal {J}_o\) and \(\alpha \in (-1,1)\). Although this might be relatively demanding if T is large and a dense grid for \(\alpha \) is used, note that the GES values are asymptotic and independent of a particular data set. They therefore have to be evaluated just once and can then be used repeatedly in any application of the proposed PD-DZ estimator. We computed the GES of \(\hat{r}_{\varvec{j}}\) for \(\varvec{j}\in \{ (s,s,p); s=1,3,5,7 \text{ and } p=1,3,5,7,9,11\}\) with the variance \(\sigma _\varepsilon ^2\) set equal to one without loss of generality. The results corresponding to Theorems 1–3 are depicted in Figs. 1, 2 and 3. Irrespective of the contamination scheme, most GES curves typically display higher sensitivity to outliers for \(|\alpha |\) close to one than for values of the autoregressive parameter around zero. One can also see that the DZ estimator corresponding to \(s=1\) and \(p=1\) is indeed biased towards 0, 1, and \(-1\) under the contamination schemes \(Z^1_{\epsilon ,\zeta }\), \(Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\), respectively. Concerning the higher-order differences that we propose to add to the (AC-)DZ methods, Fig. 1 documents that they do exhibit high sensitivity to independent outliers. On the other hand, their sensitivity to the patches of outliers in Fig. 2, for instance, decreases with increasing s and becomes very low (relative to \(s=1\) and \(p\ge 1\)) if s is larger than the patch length k, for example, \(s = 7 > k = 6\).
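The numerical GES evaluation can be sketched as a grid search over \(\zeta \). The influence function is passed in as a callable (implementing one of the formulas of Theorems 1–3), the grid bounds are our choice, and the IF used in the usage example below is purely synthetic:

```python
import numpy as np

def ges_numeric(influence_fn, alpha, zeta_grid=None):
    """Approximate GES = sup_zeta |IF(.; alpha, zeta)| on a finite grid.

    Since the largest effects typically come from large |zeta|, the default
    grid covers a wide range of outlier magnitudes.
    """
    if zeta_grid is None:
        zeta_grid = np.linspace(-50.0, 50.0, 2001)
    return float(max(abs(influence_fn(alpha, z)) for z in zeta_grid))
```

As the GES values are asymptotic and data-independent, such a grid search needs to be run only once per \((\varvec{j}, \alpha )\) pair and can then be tabulated.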
3.3 Robust properties of the GMM estimator \(\hat{\alpha }_{n}\)
Given the results of the previous sections, we will now analyze the robust properties of the general GMM estimator \(\hat{\alpha }\) based on moment equations (6) for \(\varvec{j}=(s,s,p)\in \mathcal {J}_o\). The results are stated first for the initial PD-DZ estimator (7) with a deterministic weight matrix and later for the second step of the PD-DZ estimator (9). Since the weight matrix and the bias in particular can be estimated in different ways, we consider in the latter case a weight matrix as a general function of the parameter \(\alpha \) and the considered fraction \(\epsilon \) of outliers.
Theorem 4
Consider a particular additive outlier contamination \(Z_{\epsilon }\) occurring with probability \(\epsilon \), where \(0<\epsilon <1\). Further, let \(\mathcal {J}\subseteq \mathcal {J}_o\).
First, assume that \(\varvec{A}_n=\varvec{A}^0\) is a positive definite diagonal matrix. Then the influence function of the GMM estimator \(\hat{\alpha }^0\) using moment conditions indexed by \(\mathcal J\) is given by

$${\text {IF}}(\hat{\alpha }^0;\alpha ,\zeta ,T) = -(\varvec{d}'\varvec{A}^0\varvec{d})^{-1}\varvec{d}'\varvec{A}^0\varvec{\psi },$$
where \(\varvec{d}\) is defined in Theorem 5 and \(\varvec{\psi }\) is the \(|\mathcal {J}|\times 1\) vector of the influence functions of each single \(\hat{r}_{\varvec{j}}\), \(\varvec{\psi }=\big ({\text {IF}}(\hat{r}_{\varvec{j}};\alpha ,\zeta )\big )_{\varvec{j}\in \mathcal {J}}\).
Next, assume that \(\varvec{A}_n=\varvec{A}(\hat{\alpha }_n^0, \epsilon )\) is a positive definite matrix function of the initial estimate \(\hat{\alpha }_n^0\) based on the deterministic weight matrix \(\varvec{A}^0\). If \(\varvec{A}_n = \varvec{A}(\hat{\alpha }_n^0, \epsilon ) \rightarrow \varvec{A}= \varvec{A}(\alpha ,\epsilon )\) has a finite probability limit and bounded influence function as \(n\rightarrow \infty \), then the influence function of \(\hat{\alpha }\) using moment conditions indexed by \(\mathcal J\) is again given by

$${\text {IF}}(\hat{\alpha };\alpha ,\zeta ,T) = -(\varvec{d}'\varvec{A}\varvec{d})^{-1}\varvec{d}'\varvec{A}\varvec{\psi }. \qquad (21)$$
Contrary to the breakdown point of Aquaro and Čížek (2014) mentioned earlier, the bias of the proposed PD-DZ estimators is a linear combination of the biases of the individual moment conditions depending on \(\hat{r}_{\varvec{j}}\). To minimize the influence of outliers on the estimator, one could theoretically select the moment condition with the smallest IF value, which could, however, result in poor estimation if the moment condition is not very informative about the parameter \(\alpha \). As suggested in Sect. 2.3, we aim to minimize the MSE of the estimates and thus downweight the individual moment conditions if their biases or variances are large. Obviously, this will also lead to lower effects of biased or imprecise moment conditions on the IF in Theorem 4. To quantify the maximum influence of generally unknown outliers on the estimate, the GES function of the GMM estimator, that is, the supremum of the IF in (21) with respect to \(\zeta \), can be used again.
3.4 Estimating the bias
The IF and GES derived in Sect. 3.2 characterize only the derivative of the bias caused by outlier contamination. We will refer to them in the case of contamination schemes \(Z^1_{\epsilon ,\zeta }, Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\) by \({\text {IF}}^c_k\) and \({\text {GES}}^c_k\), \(c=1,2,3\), respectively, where k denotes the number of consecutive outliers (patch length) in schemes \(Z^2_{\epsilon ,\zeta }\) and \(Z^3_{\epsilon ,\zeta }\). Whenever a sequence of consecutive outliers is mentioned in this section, we understand by that a sequence of observations \(y_{ it }, t=t_1,\ldots ,t_2\), that can all be considered outliers.
To approximate \(\varvec{b}_n = Bias \{\varvec{g}_n(\alpha )\}\) introduced in Sect. 2.3, we therefore need to estimate the type and amount of outliers in a given sample. Assuming that the consecutive outliers form sequences of length k and the fraction of such outliers in data is denoted \(\epsilon _k\), the bias can be approximated using the \(\epsilon _k\)-multiple of \(|{\text {IF}}_1^1|\) or \({\text {GES}}_1^1\) if \(k=1\) and of \(\max \{ |{\text {IF}}_k^{2}|, |{\text {IF}}_k^{3}| \}\) or \(\max \{ {\text {GES}}_k^{2}, {\text {GES}}_k^{3} \}\) if \(k>1\) since we cannot reliably distinguish contamination \(Z^2_{\epsilon ,\zeta }\) and \(Z^3_{\epsilon ,\zeta }\). Given that the outlier locations cannot be reliably computed either, GES is preferred for estimating the bias due to contamination.
We therefore suggest computing the bias vector \(\varvec{b}_n\) in the following way, provided that the estimates \(\hat{\epsilon }_k\) of the fractions of outliers forming sequences or patches of length k are available:
where \(\hat{\alpha }_n^0\) is an initial estimate of the parameter \(\alpha \) and the inner maximum is taken over \(c\in \{1\}\) for \(k=1\) and \(c\in \{2,3\}\) for \(k>1\). Note that if outliers (or particular types of outliers) are not present, \(\hat{\epsilon }_k = 0\) and the corresponding bias term is zero.
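The rule just described, that is, weighting the worst-case sensitivity by the estimated fraction of outliers of each patch length, can be sketched as follows; `ges_eval` is a hypothetical callable standing in for the GES values of Theorems 1–3 evaluated at the initial estimate \(\hat{\alpha }_n^0\):

```python
def bias_component(eps_hat, ges_eval):
    """Approximate the bias of one moment condition as described in the
    text: a sum over patch lengths k of eps_k times the worst-case GES.

    eps_hat  -- dict mapping patch length k to the estimated fraction eps_k
    ges_eval -- hypothetical callable (c, k) -> GES^c_k evaluated at the
                initial estimate alpha_0 (stand-in for Theorems 1-3)
    """
    b = 0.0
    for k, eps_k in eps_hat.items():
        if eps_k == 0.0:
            continue  # no outliers of this patch length: zero bias term
        # Z^2 and Z^3 cannot be reliably distinguished, so take the worse
        # of the two for k > 1; only scheme 1 applies for k = 1.
        schemes = (1,) if k == 1 else (2, 3)
        b += eps_k * max(ges_eval(c, k) for c in schemes)
    return b
```

For instance, `bias_component({1: 0.05, 3: 0.02}, lambda c, k: float(c * k))` combines independent outliers (k = 1, scheme 1 only) with patches of length 3 (the worse of schemes 2 and 3).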
To estimate \(\epsilon _k\), an initial estimate \(\hat{\alpha }_n^0\) is needed. Once it is obtained by the DZ or AC-DZ estimator, the regression residuals \(\hat{\varepsilon }_{ it }\) can be constructed, for example, by \(\hat{u}_{ it } = y_{ it } - \hat{\alpha }_n^0 y_{it-1}\) and \(\hat{\varepsilon }_{ it } = \hat{u}_{ it } - {{\text {med}}}_{t=2,\ldots ,T} \hat{u}_{ it }\) for any \(i=1,\ldots ,n\) and \(t=2,\ldots ,T\); the median \({\text {med}}_{t=2,\ldots ,T} \hat{u}_{ it }\) is used here as an estimate of the individual effect \(\eta _i\) similarly to Bramati and Croux (2007). Having estimated the residuals \(\hat{\varepsilon }_{ it }\), the outliers are detected and the fractions \(\epsilon _k\) of outliers in the data forming patches or sequences of k consecutive outliers are computed. We consider as outliers all observations with \(|\hat{\varepsilon }_{ it } |> \gamma \hat{\sigma }_\varepsilon \), where \(\hat{\sigma }_\varepsilon \) estimates the standard deviation of \(\varepsilon _{ it }\), for example, by the median absolute deviation \(\hat{\sigma }_\varepsilon = \text{ MAD }(\hat{\varepsilon }_{ it }) / \Phi ^{-1}(3/4)\), and \(\gamma \) is a cut-off point. Although one typically uses a fixed cut-off point such as \(\gamma =2.5\), it can be chosen in a data-adaptive way, for instance, by determining the fraction of residuals compatible with the normal distribution function of the errors. This approach, pioneered by Gervini and Yohai (2002), determines the cut-off point as a quantile of the empirical distribution function \(F_n^+\) of \(|\hat{\varepsilon }_{ it }|/\hat{\sigma }_\varepsilon \):
for
where \(F_0^+(t) = \Phi (t) - \Phi (-t), t\ge 0\), denotes the distribution function of |V|, \(V \sim N(0,1)\).
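The detection step can be sketched for a single individual's residual series as follows; the fixed cut-off \(\gamma =2.5\) is used here instead of the data-adaptive choice (23), and the run-length bookkeeping that yields the fractions \(\hat{\epsilon }_k\) is our illustrative implementation of the description above:

```python
import statistics

def detect_outlier_fractions(resid, gamma=2.5):
    """Flag residuals with |e| > gamma * sigma_hat and return, for each
    run length k, the fraction of observations lying in maximal runs of
    k consecutive flagged values.  resid is one individual's residual
    series e_{it}, t = 2..T."""
    med = statistics.median(resid)
    # robust scale: MAD / Phi^{-1}(3/4); 0.6745 approximates Phi^{-1}(3/4)
    mad = statistics.median(abs(e - med) for e in resid)
    sigma = mad / 0.6745
    flags = [abs(e) > gamma * sigma for e in resid]
    # count maximal runs of consecutive flagged observations
    fractions, run = {}, 0
    for f in flags + [False]:          # sentinel closes a trailing run
        if f:
            run += 1
        elif run > 0:
            fractions[run] = fractions.get(run, 0) + run
            run = 0
    n = len(resid)
    return {k: cnt / n for k, cnt in fractions.items()}
```

Applied to `[0.1, -0.1, 0.05, 30.0, 30.0, -0.05, 0.1, 0.0]`, the function flags the two consecutive large residuals as a single patch of length 2.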
3.5 Algorithm
The whole procedure of the bias estimation and, subsequently, the proposed GMM estimation with the robust moment selection can be summarized as follows.
1. Obtain an initial estimate \(\hat{\alpha }_n^0\) by the DZ or AC-DZ estimator.
2. Compute the residuals \(\hat{u}_{ it } = y_{ it } - \hat{\alpha }_n^0 y_{it-1}\) and \(\hat{\varepsilon }_{ it } = \hat{u}_{ it } - {\text {med}}_{t=2,\ldots ,T} \hat{u}_{ it }\) and estimate their standard deviation \(\hat{\sigma }_\varepsilon \).
3. Using the data-adaptive cut-off point (23), determine the fractions \(\hat{\epsilon }_k\) of outliers present in the data in the form of outlier sequences of length k.
4. Approximate the bias \(\varvec{b}_n\) due to outliers by \(\hat{\varvec{b}}_n\) using (22) and estimate the variance matrix \(\varvec{V}_n\) in Theorem 5 by \(\hat{\varvec{V}}_n\) for all moment conditions (5) defined for indices \(\varvec{j}\in \mathcal{J}_o\).
5. For all \(\varvec{j}=(s,s,p) \in \mathcal{J}_o\),
   (a) set \(\mathcal{J} = \{(k,k,l){:}\,1\le k \le s \text{ is odd}, 1 \le l \le p \text{ is odd}\}\);
   (b) compute the GMM estimate \(\hat{\alpha }_{n,\mathcal J}\) defined in (9) using the moment conditions selected by \(\mathcal J\) and the weighting matrix defined as the inverse of the corresponding submatrix of \(\hat{\varvec{W}}_n = \hat{\varvec{b}}_n\hat{\varvec{b}}_n' + \hat{\varvec{V}}_n\);
   (c) evaluate the criterion \( RRMSC (\mathcal{J})\) defined in (10).
6. Select the set of moment conditions by
$$\begin{aligned} \hat{\mathcal{J}} = \mathop {{\text {arg}}\,{\text {min}}}\limits _{\mathcal{J} \subseteq \mathcal{J}_o} RRMSC (\mathcal{J}). \end{aligned}$$
7. The final estimate equals \(\hat{\alpha }_{n,\hat{\mathcal{J}}}\).
Let us note that the algorithm in step 5 does not evaluate the GMM estimates for all subsets of indices \(\mathcal{J} \subseteq \mathcal{J}_o\) and the corresponding moment conditions as that would be very time-consuming. It is therefore suggested to limit the number of considered subsets of \(\mathcal{J}_o\); one possible proposal, which always includes the DZ condition in the estimation, is described in step 5 of the algorithm. If an extensive evaluation of many GMM estimators has to be avoided, it is possible to opt for a simple selection between the DZ, AC-DZ, and PD-DZ estimators, where PD-DZ uses all moment conditions defined by \(\mathcal{J}_o\).
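The restricted search of step 5 can be sketched as follows, with `estimate` and `criterion` as hypothetical callables standing in for the GMM estimation (9) and the RRMSC criterion (10):

```python
def select_moments(J_o, estimate, criterion):
    """Restricted moment selection of step 5: for each j = (s, s, p) in
    J_o, form the nested set J of all odd (k, k, l) with k <= s, l <= p
    (this always contains the DZ condition (1, 1, 1)), estimate, and keep
    the set with the smallest RRMSC-type criterion.  `estimate` and
    `criterion` are hypothetical stand-ins for (9) and (10)."""
    best = None
    for (s, _, p) in J_o:
        J = frozenset((k, k, l) for k in range(1, s + 1, 2)
                      for l in range(1, p + 1, 2))
        alpha_J = estimate(J)            # GMM estimate using conditions J
        crit = criterion(J, alpha_J)     # RRMSC-type model selection value
        if best is None or crit < best[0]:
            best = (crit, J, alpha_J)
    return best[1], best[2]
```

Since every nested set contains (1, 1, 1), the DZ condition is always part of the estimation, as required.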
4 Monte Carlo simulation
In this section, we evaluate the finite sample performance of the proposed and existing estimators by Monte Carlo simulations to see whether the proposed method can weight the moment conditions so that it picks and mimics the performance of the better estimator (e.g., out of those with fixed sets of moment conditions such as DZ and AC-DZ) for each considered data generating process.
Let \(\{y_{ it }\}\) follow model (1). We generate \(T+100\) observations for each i and discard the first 100 observations to reduce the effect of the initial observations and to achieve stationarity. We consider cases with \(\alpha =0.1,0.5,0.9\), \(n=25,50,100,200\), \(T=6,12\), \(\eta _i\sim \mathrm {N}(0,\sigma _\eta ^2)\), and \(\varepsilon _{ it }\sim \mathrm {N}(0,1)\). If data contamination is present, it follows the contamination schemes (11) and (12) for \(\epsilon =0.20\). More specifically, \(Z_{\epsilon ,\zeta }^1\) and \(Z_{\epsilon ,\zeta }^2\) used with \(k=3\) are both based on \(\zeta \) drawn for each outlier or patch of outliers randomly from U(10, 90); \(U(\cdot ,\cdot )\) denotes here the uniform distribution. The extreme values of outliers are chosen as they are supposed to have the largest influence on the estimates—cf. Theorem 1, for instance. Note that we have also considered mixes of two contamination schemes, for example, mixing equally independent additive outliers and patches of outliers, but the results are not reported as they are just convex combinations of the corresponding results obtained with only the first and only the second contamination scheme.
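A sketch of this data generating process in Python; the patch-start probability \(\epsilon /k\) (giving an expected contaminated fraction of roughly \(\epsilon \)) and the function interface are our illustrative choices, not the authors' exact simulation code:

```python
import random

def simulate_panel(n, T, alpha, sigma_eta=1.0, eps=0.0, k=1, burn=100,
                   seed=0):
    """Generate y_it = alpha * y_{it-1} + eta_i + e_it with a 100-period
    burn-in, then optionally add contamination: with probability eps/k a
    patch of k consecutive additive outliers of magnitude zeta ~ U(10, 90)
    starts (k = 1 gives independent additive outliers)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        eta = rng.gauss(0.0, sigma_eta)
        y, series = 0.0, []
        for t in range(T + burn):
            y = alpha * y + eta + rng.gauss(0.0, 1.0)
            if t >= burn:                # discard the burn-in periods
                series.append(y)
        t = 0
        while t < T:                     # additive outlier contamination
            if eps > 0 and rng.random() < eps / k:
                zeta = rng.uniform(10.0, 90.0)
                for s in range(t, min(t + k, T)):
                    series[s] += zeta
                t += k
            else:
                t += 1
        panel.append(series)
    return panel
```

With `eps=0.0` the function returns clean data; `eps=0.2, k=3` mimics the patch contamination design described above.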
All estimators are compared by means of the mean bias and the root mean squared error (RMSE) evaluated using 1000 replications. The included estimators are chosen as follows. The non-robust estimators are represented by the Arellano–Bond (AB) two-step GMM estimatorFootnote 1 (Arellano and Bond 1991), the system Blundell and Bond (BB) estimatorFootnote 2 (Blundell and Bond 1998), and the X-differencing (XD) estimator (Han et al. 2014). The globally robust estimators are represented by the original DZ and AC-DZ estimators and by the proposed PD-DZ estimator. For the latter, we consider two different moment selection criteria RRMSC: BIC and HQIC with \(\kappa _c = 2.1\) introduced in Sect. 2.4.
Considering the clean data first (see Table 1), most estimators exhibit small RMSEs except for the AB estimator, which is usually strongly negatively biased if \(\alpha \) is close to 1. The BB estimator performs well under these circumstances as expected, but is outperformed by the XD estimator. Regarding the robust estimators, the results are closer to each other for \(T=6\) than for \(T=12\) since there are only three possible moment conditions (5) if \(T=6\). The DZ estimator, based on the first moment condition only, lags behind AC-DZ and PD-DZ when \(\alpha \) is not close to zero, where the additional higher-order moment conditions thus improve estimation. The results for AC-DZ and PD-DZ are rather similar in most situations, with PD-DZ becoming relatively more precise as n increases due to less noisy moment selection. Overall, adding moment conditions improves the performance of AC-DZ and PD-DZ relative to DZ; the performance of PD-DZ is worse than that of the AB and BB estimators for \(\alpha =0.1\), matches them for \(\alpha =0.5\), and outperforms them for \(\alpha =0.9\).
Next, the two different data contamination schemes are considered: independent additive outliers and the patches of additive outliers. Considering the independent additive outliers first (see Table 2), which generally bias estimates toward zero and thus lead to larger biases especially for values of \(\alpha \) close to 1, AB, BB, and XD are strongly biased in all cases as expected. In the case of robust estimators, the negative biases of DZ, AC-DZ, and PD-DZ are rather small, although increasing with \(\alpha \). As AC-DZ outperforms DZ in this case, PD-DZ should and does exhibit performance more similar to AC-DZ than to DZ; PD-DZ even outperforms AC-DZ for \(\alpha =0.9\) or the largest sample size. This confirms the functionality of the weighting as the inclusion of higher-order differences with \(s>1\) in PD-DZ could lead to large biases due to independent additive outliers, especially for \(\alpha =0.9\); see Fig. 1.
On the other hand, the higher-order differences with \(s>1\) should provide benefits when the data are contaminated by the patches of additive outliers, see Table 3. This type of contamination again leads to substantially biased non-robust estimates by XD, AB, and BB. In the case of the robust estimators, the patches of outliers tend to bias them toward 1 and thus have the largest effect for \(\alpha \) close to 0. Hence, the biases of, and more generally the differences among, the robust estimators are smallest for \(\alpha =0.9\). For smaller values of \(\alpha \), DZ outperforms AC-DZ, in particular for \(\alpha =0.1\), as the patches of outliers have a larger impact on the higher-order differences of AC-DZ—see Fig. 2a. Thus, the proposed PD-DZ should and does perform similarly to DZ and actually outperforms it in most cases with \(\alpha \le 0.5\), which again confirms that the proposed weighting scheme is able to choose moment conditions that are less affected by the outliers. Note that the largest difference between DZ and PD-DZ is observed for \(T=12\) and \(\alpha =0.5\) as the higher-order differences can be used only if the number T of time periods is sufficiently large and they have a reasonable precision only if \(\alpha \) is not close to zero.
5 Concluding remarks
In this paper, we propose an extension of the median-based robust estimator for the dynamic panel data model of Dhaene and Zhu (2017) by means of multiple pairwise differences. The newly proposed GMM estimation procedure, which uses weights accounting both for the variance and the outlier-related bias of the moment conditions, is combined with a moment selection method. As a result, the estimator performs well in non-contaminated data as well as in data containing independent outliers or patches of outliers.
Notes
The (optimal) inverse weight matrix, which is used here, is \(\sum _i\varvec{Z}_i^{\text {AB}\prime }\varvec{H}\varvec{Z}_i^{\text {AB}}\), where \(\varvec{Z}_i^{\text {AB}}\) is the matrix of instruments per individual and \(\varvec{H}\) is a \((T-1)\times (T-1)\) tridiagonal matrix with 2 in the main diagonal, \(-\,1\) in the first two sub-diagonals, and zeros elsewhere (see Arellano and Bond 1991, p. 279).
The inverse weight matrix is \(\sum _i\varvec{Z}_i^{\text {BB}\prime }\varvec{G}\varvec{Z}_i^{\text {BB}}\), where \(\varvec{Z}_i^{\text {BB}}\) is the matrix of instruments per individual and \(\varvec{G}\) is a partitioned matrix, \(\varvec{G}={\text {diag}}(\varvec{H},\varvec{I})\), where \(\varvec{H}\) is as in Arellano–Bond and \(\varvec{I}\) is the identity matrix [see Kiviet 2007, Eq. (38)].
References
Abrevaya J (2000) Rank estimation of a generalized fixed-effects regression model. J Econ 95(1):1–23
Aquaro M, Čížek P (2013) One-step robust estimation of fixed-effects panel data models. Comput Stat Data Anal 57(1):536–548
Aquaro M, Čížek P (2014) Robust estimation of dynamic fixed-effects panel data models. Stat Pap 55:169–186
Arellano M, Bond S (1991) Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Rev Econ Stud 58(2):277–297
Blundell R, Bond S (1998) Initial conditions and moment restrictions in dynamic panel data models. J Econ 87(1):115–143
Bramati MC, Croux C (2007) Robust estimators for the fixed effects panel data model. Econ J 10(3):521–540
Bun MJG, Sarafidis V (2015) Dynamic panel data models. In: Baltagi BH (ed) The Oxford handbook of panel data. The Oxford University Press, Oxford, pp 76–110
Cermeño R (1999) Median-unbiased estimation in fixed-effects dynamic panels. Annales d'Économie et de Statistique 55–56:351–368
Cheng X, Liao Z (2015) Select the valid and relevant moments: an information-based LASSO for GMM with many moments. J Econ 186(2):443–464
Dhaene G, Zhu Y (2017) Median-based estimation of dynamic panel models with fixed effects. Comput Stat Data Anal 113:398–423
Galvao AF Jr (2011) Quantile regression for dynamic panel data with fixed effects. J Econ 164(1):142–157
Gervini D, Yohai VJ (2002) A class of robust and fully efficient regression estimators. Ann Stat 30(2):583–616
Hall AR, Inoue A, Jana K, Shin C (2007) Information in generalized method of moments estimation and entropy-based moment selection. J Econ 138:488–512
Han C, Phillips PCB, Sul D (2014) X-differencing and dynamic panel model estimation. Econ Theory 30:201–251
Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica 50(4):1029–1054
Harris MN, Mátyás L, Sevestre P (2008) Dynamic models for short panels, chapter 8. In: Mátyás L, Sevestre P (eds) The econometrics of panel data. Springer, Berlin, pp 249–278
Janz N (2002) Outlier robust estimation of an Euler equation investment model with German firm level panel data. In: Klein I, Mittnik S (eds) Contributions to modern econometrics. Dynamic modeling and econometrics in economics and finance, vol 4. Springer, Boston
Kiviet JF (2007) Judging contending estimators by simulation: tournaments in dynamic panel data models, chapter 11. In: Phillips G, Tzavalis E (eds) The refinement of econometric estimation and test procedures. Cambridge University Press, Cambridge, pp 282–318
Lucas A, van Dijk R, Kloek T (2007) Outlier robust GMM estimation of leverage determinants in linear dynamic panel data models. https://doi.org/10.2139/ssrn.20611
Verardi V, Wagner J (2011) Robust estimation of linear fixed effects panel data models with an application to the exporter productivity premium. J Econ Stat 231(4):546–557
Zaman A, Rousseeuw PJ, Orhan M (2001) Econometric applications of high-breakdown robust regression techniques. Econ Lett 71(1):1–8
Additional information
This research was supported by the Czech Science Foundation Project No. 13-01930S: “Robust methods for nonstandard situations, their diagnostics, and implementations.” We are grateful to Bertrand Melenberg, Christophe Croux, and the participants of the workshop “Robust methods for Dependent Data” of the German Statistical Society and SFB 823 in Witten, Germany, 2012, for helpful suggestions on an early version of this paper. The scientific output expressed does not imply a policy position of the European Commission. Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use which might be made of this publication.
Appendix
The outlier contamination schemes \(Z^1_{\epsilon ,\zeta }, Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\) are generally described by the contamination fraction \(\epsilon \) and the magnitude of outliers \(\zeta \) (recall that only the point-mass distribution \(G_\zeta \) is considered here). Therefore, we will denote the non-contaminated sample observations following model (1) by \(y_{ it }\) and the contaminated sample observations by \(y_{ it }^{\zeta ,\epsilon }\). By definition of \(Z^1_{\epsilon ,\zeta }, Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\), the difference \(w_{ it } = y_{ it }^{\zeta ,\epsilon } - y_{ it }\) can only equal \(-\zeta \), 0, or \(\zeta \).
In order to prove the theorems concerning the influence function of \(\hat{\alpha }\), it is useful to derive first the asymptotic bias of \(\hat{r}_{\varvec{j}}\) as an estimator of \(r_{\varvec{j}}\). Similarly to Sect. 3.1, it is defined as
where \({\text {plim}}\) denotes the probability limit operator. Let \(b:=b(r_{\varvec{j}},\zeta ,\epsilon )\) be a short-hand notation for (24). Then, b solves the following equation:
Since \(r_{\varvec{j}}\) is considered only for \(\varvec{j}=(s,s,p)\in \mathcal {J}_o\), where both s and p are odd, \(r_{\varvec{j}} = -(1-\alpha ^s)/2\). This mapping of \(\alpha \) to \(r_{\varvec{j}}=-(1-\alpha ^s)/2\) has the same important properties for \(s=1\) and any odd \(s>1\): it maps the interval \((-\,1,0)\) to \((-\,1,-\,1/2)\) and the interval (0, 1) to \((-\,1/2,0)\), it is continuous, and it is strictly increasing on \((-\,1,1)\). One can thus follow the proofs in Dhaene and Zhu (2017, Theorems 3.5 and 3.8) and apply them not only to the case of \(s=p=1\), but to any odd s and p with only two adjustments: (i) the variables \(\Delta ^sy_{ it }-r_{\varvec{j}}\Delta ^py_{it-s}\) and \(\Delta ^py_{it-s}\) have to be standardized (Dhaene and Zhu 2017, equation (A.3)) and their variances generally depend on the values of s and p, and (ii) in the case of patches of outliers, the probability that a patch contaminates the ratio \(\Delta ^sy_{ it }/\Delta ^py_{it-s}\) needs to be generalized.
As for (i), note that, by Eq. (2), the variables \(\Delta ^sy_{ it }-r_{\varvec{j}}\Delta ^py_{it-s}\) and \(\Delta ^py_{it-s}\) are uncorrelated, and by Assumption A.3, they are independent and normally distributed around zero. Additionally, the stationarity Assumptions A.1 and A.2 imply that, after substituting from the model equation, \({\text {cov}}(y_{ it }, \eta _i) = \alpha {\text {cov}}(y_{it-1}, \eta _i) + {\text {var}}(\eta _i)\),
and subsequently, \((1-\alpha ^2) {\text {var}}(y_{ it }) = (1 + \alpha ) {\text {cov}}(y_{ it }, \eta _i) + {\text {var}}(\varepsilon _{ it })\) and \({\text {var}}(y_{ it }) - {\text {cov}}(y_{ it }, \eta _i){/}(1 - \alpha ) = {\text {var}}(\varepsilon _{ it }) / (1 - \alpha ^2) \). From this result and Aquaro and Čížek (2014, Equation (24)), we can thus conclude that
(the diagonal structure of the covariance matrix can also be seen from Eq. (2), which implies \({\text {cov}}(\Delta ^sy_{ it },\Delta ^py_{it-s}) = r_{\varvec{j}}{\text {var}}(\Delta ^py_{it-s})\)).
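The stationarity identity \((1-\alpha ^2){\text {var}}(y_{ it }) = (1+\alpha ){\text {cov}}(y_{ it },\eta _i) + {\text {var}}(\varepsilon _{ it })\) can be checked numerically by simulating a cross-section of individuals to near-stationarity; the sample sizes and tolerances below are illustrative choices:

```python
import random

def stationary_moments(alpha, sigma_eta=1.0, n=20000, burn=60, seed=0):
    # Simulate n individuals of y_it = alpha*y_{it-1} + eta_i + e_it,
    # e_it ~ N(0,1), run each for `burn` periods, and return the
    # cross-sectional var(y) and cov(y, eta) at the final period.
    rng = random.Random(seed)
    ys, etas = [], []
    for _ in range(n):
        eta = rng.gauss(0.0, sigma_eta)
        y = 0.0
        for _ in range(burn):
            y = alpha * y + eta + rng.gauss(0.0, 1.0)
        ys.append(y)
        etas.append(eta)
    my, me = sum(ys) / n, sum(etas) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov_ye = sum((y - my) * (e - me) for y, e in zip(ys, etas)) / n
    return var_y, cov_ye

alpha = 0.5
var_y, cov_ye = stationary_moments(alpha)
lhs = (1 - alpha ** 2) * var_y      # (1 - a^2) var(y)
rhs = (1 + alpha) * cov_ye + 1.0    # (1 + a) cov(y, eta) + var(e), var(e)=1
```

With \(\alpha =0.5\) and \(\sigma _\eta ^2=1\), both sides should be close to the theoretical value 4 up to simulation noise.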
Based on these observations, Aquaro and Čížek (2014, Theorem 1) derived the asymptotic distribution of the PD-DZ estimator defined by Eq. (7), which is presented here for the case of the triplet sets \(\mathcal{J} \subseteq \mathcal{J}_o\).
Theorem 5
Suppose that Assumptions A.1–A.3 hold and that \(\varvec{A}_n\rightarrow \varvec{A}>0\) in probability as \(n\rightarrow \infty \). Let \((1,1,1) \in \mathcal{J} \subseteq \mathcal{J}_o\) and \(\varvec{d}=\partial \varvec{g}(\alpha )/\partial \alpha \), where \(\alpha \) represents the true parameter value.
Then for a fixed T and \(n\rightarrow \infty \), \(\hat{\alpha }_n\) is consistent and asymptotically normal,
where \(\varvec{d}= \partial \varvec{g}(\alpha )/\partial \alpha = \{-s\alpha ^{s-1}\}_{\varvec{j}=(s,s,p)\in \mathcal{J}}\) and \(\varvec{V}\) has a typical element with indices \(\varvec{j}=(s,s,p)\in \mathcal{J},\varvec{j}'=(s',s',p')\in \mathcal{J}\) defined by
1.1 Independent additive outlier contamination \(Z_{\epsilon ,\zeta }^1\)
Under independent additive outlier contamination \(Z_{\epsilon ,\zeta }^1\), Eq. (25) can be written as
where residual \(u_{it\varvec{j}} = \Delta ^sy_{ it }-r_{\varvec{j}}\Delta ^py_{it-s}\), \(w_{ it } \in \{0,\zeta \}\), \(\varvec{w}_{ it }=(w_{ it },w_{it-s},w_{it-s-p})'\) is a random vector, and \(f(\varvec{w}_{ it })\) is a random scalar. Let \(\Omega _{\varvec{w}_{ it }}\) be the set of the eight possible outcomes of \(\varvec{w}_{ it }\), that is,
where the number of elements is \(\#\Omega _{\varvec{w}_{ it }}=8\). To simplify the notation, let us refer to (29) as \(\Omega _{ it }\) and denote each of its elements by \(\varvec{\omega }_{itj}\), \(j=1,\ldots ,8\). Then it holds
Note that \(\Pr \left( \varvec{w}_{ it }=\varvec{\omega }_{itj}\right) =\Pr \left( \varvec{w}_{ it }=\varvec{\omega }_{itj'}\right) \) for some j and \(j'\) because the data contamination \(Z_{\epsilon ,\zeta }^1\) is characterized by outliers occurring independently from each other. For instance, \(\Pr [(\zeta ,0,0)']=\Pr [(0,\zeta ,0)']=\Pr [(0,0,\zeta )']=(1-\epsilon )^2\epsilon \). Moreover, \(f[(0,0,0)']=f[(\zeta ,\zeta ,\zeta )']\). Therefore, Eq. (28) can be decomposed as
where A, B, and C are defined for \(r_{\varvec{j}}\), \(\zeta \), and b as follows:
These probabilities are all of the form
for given k, l, and b, and they can be conveniently standardized by using (26) as follows:
where X and Y are independent \(\mathrm {N}(0,1)\) variables and
and
where \(\sigma _{u}:=\sqrt{{\text {var}}(u_{it\varvec{j}})}\) and \(\sigma _{\Delta ^p}:=\sqrt{{\text {var}}(\Delta ^py_{it-s})}\) can be found in (26). Finally, note that \(L(k,l,b)=L(-k,-l,b)\), hence \(B=C\) and (28) becomes
Proof of Theorem 1
As in Dhaene and Zhu (2017, proof of Theorem 3.2), it follows from the definition of influence function that
where the equality follows from the implicit function theorem applied to (37) and where
As in Dhaene and Zhu (2017, Equation (A.4)),
where \(\sigma ^*\) is defined in (36) and \(X,Y\sim \mathrm {N}(0,1)\). Hence, \(A(r_{\varvec{j}},0)=1/2\) and
(recall that \(r_{\varvec{j}}=-(1-\alpha ^s)/2\)). Next, Dhaene and Zhu (2017, Lemma A.1) implies that, for \(X,Z\sim \mathrm {N}(0,1)\) and constants \(c,c',c''\), \( P\{(X+c)/Z \le 0\} = 1/2 \) and \( P\{(X+c')/(Z-c) \le 0\} + P\{(X+c'')/(Z-c) \le 0\} = 1 + [\Phi (c')-\Phi (-c'')][\Phi (c)-\Phi (-c)]. \) Hence, the definition of \(B(r_{\varvec{j}},\zeta ,b)\) and the standardization (34) imply
Substituting for \(\sigma _{u}:=\sqrt{{\text {var}}(u_{it\varvec{j}})}\) and \(\sigma _{\Delta ^p}:=\sqrt{{\text {var}}(\Delta ^py_{it-s})}\) from (26) and \(r_{\varvec{j}}=-(1-\alpha ^s)/2\) into (42) and for terms \(A(r_{\varvec{j}},0)\), \(B(r_{\varvec{j}},\zeta ,0)\), and \(A_b'(r_{\varvec{j}},0)\) in (38) completes the proof. \(\square \)
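The outcome probabilities used in the decomposition above, for example \(\Pr [(\zeta ,0,0)']=(1-\epsilon )^2\epsilon \) and the equality of probabilities across outcomes with the same number of outlying coordinates, can be verified by direct enumeration (assuming \(\zeta \ne 0\) so that the eight outcomes are distinct):

```python
from itertools import product

def outcome_probs(eps, zeta):
    # Probabilities of the eight outcomes of w_it = (w_t, w_{t-s}, w_{t-s-p})'
    # under Z^1: each coordinate equals zeta independently with prob. eps.
    # Assumes zeta != 0 so that all eight outcomes are distinct.
    probs = {}
    for w in product((0.0, zeta), repeat=3):
        m = sum(1 for x in w if x != 0.0)   # number of outlying coordinates
        probs[w] = eps ** m * (1 - eps) ** (3 - m)
    return probs

p = outcome_probs(0.2, 5.0)
```

The enumeration confirms that the probabilities sum to one and that outcomes with one outlying coordinate are equally likely.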
1.2 Patch additive outlier contamination \(Z_{\epsilon ,\zeta }^2\)
As in “Appendix A.1”, it is useful to derive first the asymptotic bias of \(\hat{r}_{\varvec{j}}\) under the outlier contamination \(Z_{\epsilon ,\zeta }^2\) as defined in (12). This is given by \(b:=b(r_{\varvec{j}},\zeta ,\epsilon ,k)\) solving the equation
where the notation is defined below. Note that the decomposition in the second equality follows along the same lines as in “Appendix A.1”, in particular Eq. (30). In this case, the only difference is that outliers no longer occur independently but in patches. The number of elements of \(\Omega _{ it }\) increases to \(\#\Omega _{ it }=13\) as now, if we observe multiple outliers, we must distinguish the event of the outliers belonging to the same patch from the event of these outliers belonging to different patches. For instance, \((0,\zeta ,\zeta )'\) may be the result of one patch only, \((0,\zeta _1,\zeta _1)'\), or of two patches, \((0,\zeta _2,\zeta _1)'\), where the subscript of \(\zeta \) indicates the patch. Recalling that \((1-\tilde{\epsilon })^k=1-\epsilon \),
and \(\mathfrak {p}_A=1-\mathfrak {p}_B-\mathfrak {p}_C-\mathfrak {p}_D\). Next, the terms A, B, C, D are defined for \(r_{\varvec{j}}\), \(\zeta \), and b as follows:
where the symmetry \(L(k,l,b)=L(-k,-l,b)\) has been used, recall Eq. (33).
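The relation \((1-\tilde{\epsilon })^k=1-\epsilon \) used throughout this appendix links the per-period patch-start probability \(\tilde{\epsilon }\) to the overall contamination fraction \(\epsilon \) for patches of length k; inverting it is a one-liner (the function name is ours):

```python
def patch_start_prob(eps, k):
    # Solve (1 - eps_tilde)**k == 1 - eps for eps_tilde.
    return 1.0 - (1.0 - eps) ** (1.0 / k)
```

For k = 1 it reduces to \(\tilde{\epsilon }=\epsilon \), and for longer patches \(\tilde{\epsilon }<\epsilon \).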
Proof of Theorem 2
By the definition of influence function in (15),
where b denotes the bias of \(\hat{r}_{\varvec{j}}\). Given that \((1-\tilde{\epsilon })^k=1-\epsilon \), it holds
The derivative in (49) can be obtained by applying the implicit function theorem to (43),
where \(A'_b(r_{\varvec{j}},0)\) is the same as in (39) and where \(\mathfrak {p}_j'\), \(j\in \{B,C,D\}\), denote the derivatives of \(\mathfrak {p}_j\) in Eqs. (44)–(46) with respect to \(\tilde{\epsilon }\), that is,
and
As in “Appendix A.1”, \(A(r_{\varvec{j}};0)=1/2\). Further, it follows from Dhaene and Zhu (2017, Lemma A.1) that, for \(X,Z\sim \mathrm {N}(0,1)\) and constants \(c,c'\), \( P\{(X+c')/(Z-c) \le 0\} = \Phi (-c')\Phi (-c) + \Phi (c')\Phi (c). \) Hence, the definition (47) and the standardization (34)–(36) imply
where \(\sigma _{u}:=\sqrt{{\text {var}}(u_{it\varvec{j}})}\) and \(\sigma _{\Delta ^p}:=\sqrt{{\text {var}}(\Delta ^py_{it-s})}\) are given in (26) and \(r_{\varvec{j}}=-(1-\alpha ^s)/2\). Substituting (49)–(57) in (48) completes the proof. \(\square \)
1.3 Patch additive outlier contamination \(Z_{\epsilon ,\zeta }^3\)
This case is a generalization of the \(Z_{\epsilon ,\zeta }^2\) contamination. The proof structure is very similar to that in “Appendices A.1 and A.2”, although the algebra is somewhat lengthier. As before, it is useful to derive first the bias of \(\hat{r}_{\varvec{j}}\) under the outlier contamination \(Z_{\epsilon ,\zeta }^3\) as defined in (13). This is given by \(b:=b(r_{\varvec{j}},\zeta ,\epsilon ,k)\) solving the equation
where the notation is explained below. Note that the set \(\Omega _{ it }\) in (29) is different than it was for previous types of contaminations as now outliers can be either negative or positive multiple of \(\zeta \). Also recall that \((1-\tilde{\epsilon })^k=1-\epsilon \).
By using the results in Table 4, we have that
and
where \(\mathcal {I}:=\{A,B,C,D,E,F,G,H,I,J\}\). Moreover,
where the symmetry \(L(k,l,b)=L(-k,-l,b)\) has been used, recall Eq. (33).
Proof of Theorem 3
Denote
where \(\mathfrak {p}_j(\cdot )\), \(j\in \mathcal {I}\), are defined in (59)–(68). Given that \((1-\tilde{\epsilon })^k=1-\epsilon \), it holds
Differentiating (58) with respect to \(\epsilon \) and evaluating it at \(\epsilon =0\) yields
where \(A'_b(r_{\varvec{j}},0)\) is defined in (39) and where (see results in Table 4)
As in “Appendix A.1”, \(A(r_{\varvec{j}};0)=1/2\). Further, it follows from Dhaene and Zhu (2017, Lemma A.1) that, for \(X,Z\sim \mathrm {N}(0,1)\) and constants \(c,c'\), \( P\{(X+c')/(Z-c) \le 0\} = \Phi (-c')\Phi (-c) + \Phi (c')\Phi (c). \) Hence, the definition (69) and the standardization (34)–(36) imply
where \(\sigma _{u}:=\sqrt{{\text {var}}(u_{it\varvec{j}})}\) and \(\sigma _{\Delta ^p}:=\sqrt{{\text {var}}(\Delta ^py_{it-s})}\) are given in (26) and \(r_{\varvec{j}}=-(1-\alpha ^s)/2\). Substituting (71)–(86) in (70) completes the proof. \(\square \)
1.4 General results
Proof of Theorem 4
Given a non-stochastic weighting matrix \(\varvec{A}^0\), the proof follows directly from Eq. (8). The estimator \(\hat{\alpha }^0\) is defined by the solution of the sample analogs of Eq. (5), which are deterministic functions of \(\hat{r}_{\varvec{j}}\). Thus the influence function of \(\hat{\alpha }^0\) is fully determined by the influence functions of each \(\hat{r}_{\varvec{j}}\) being an element of \(\varvec{g}(\alpha )\):
where \(\varvec{\psi }:=\big ({\text {IF}}(\hat{r}_{\varvec{j}};r_{\varvec{j}},\zeta )\big )_{\varvec{j}\in \mathcal {J}_o}\) is a \(\#\mathcal {J}_o\times 1\) vector whose elements \({\text {IF}}(\hat{r}_{\varvec{j}};r_{\varvec{j}},\zeta )\), \(\varvec{j}\in \mathcal {J}_o\), are derived for each considered data contamination \(Z_{\epsilon ,\zeta }^1\), \(Z_{\epsilon ,\zeta }^2\), and \(Z_{\epsilon ,\zeta }^3\) in Theorem 1, 2, and 3, respectively.
If the weight matrix \(\varvec{A}_n=\varvec{A}(\hat{\alpha }_n^0, \epsilon )\), then it follows by the same argument as above (\(\varvec{d}\) is still deterministic) and the matrix differentiation rules that
where \(\dot{\varvec{A}} = {\text {IF}}(\varvec{A}_n;\varvec{A},\zeta )\). Since this influence function is bounded, the result follows from the asymptotic validity of the moment conditions \(\varvec{g}(\alpha )=0\). \(\square \)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Čížek, P., Aquaro, M. Robust estimation and moment selection in dynamic fixed-effects panel data models. Comput Stat 33, 675–708 (2018). https://doi.org/10.1007/s00180-017-0782-7