1 Introduction

Dynamic panel data models with fixed effects have been used in many empirical applications in economics; see Bun and Sarafidis (2015) and Harris et al. (2008) for an overview of the methodology and applications. Despite the complex data structure of dynamic panels, the vast majority of the literature focuses on models that assume the data are free of influential observations or outliers. This is often not the case in reality (Janz 2002; Verardi and Wagner 2011; Zaman et al. 2001), and procedures robust to outliers are thus very important for panel data, where erroneous observations can easily be masked by the complex data structure.

Robust methods for panel data have so far been studied only to a limited extent. Some methods are available for static models (e.g., Bramati and Croux 2007; Aquaro and Čížek 2013), but just a handful exist for dynamic models. Locally robust estimation procedures have been proposed by Lucas et al. (2007), based on a generalized method of moments estimator with a bounded influence function, and by Galvao (2011), using quantile regression techniques. On the other hand, Dhaene and Zhu (2017) and Aquaro and Čížek (2014) propose globally robust estimators based on the median ratios of the first differences of the dependent variable and of the first- or higher-order differences of the lagged dependent variable [note that previously studied median-unbiased estimation such as Cermeño (1999) was based on the least squares method and was thus not robust to outliers]. The main shortcomings of these methods follow from the use of a fixed number of differences and their ratios. On the one hand, using just the first differences as in Dhaene and Zhu (2017) can be beneficial for the robustness of the estimator, but it results in a lower precision of estimates. On the other hand, Aquaro and Čížek (2014) employ multiple differences of the explanatory variables to improve the precision of estimation, but this leads to a high sensitivity to sequences of outliers. Additionally, estimation using higher-order differences of the dependent variable has not been explored in either case.

Our aim is to extend the median-based estimators of Dhaene and Zhu (2017) and Aquaro and Čížek (2014) by employing multiple pairwise difference transformations in such a way that the resulting estimator is robust and also exhibits good finite-sample performance in data without outliers. The use of higher-order differences of the dependent variable is not new (see Aquaro and Čížek 2013), but it presents two major challenges when applied in dynamic models. In particular, higher-order differences have not been used previously since (1) they can result in a substantial increase in bias in the presence of particular types of outliers and (2) their number grows quadratically with the number of time periods, which can lead to additional biases due to weak identification or outliers. We address this by proposing a data-driven weighting and selection of the median ratios of differenced data, since the traditional strategy used in robust statistics—using an initial robust estimator to detect outlying observations and, after removing them, applying an efficient non-robust estimator (cf. Gervini and Yohai 2002)—is not feasible in this context. Even if only the first differences are used, removing a single observation means that the observation and its two or three following data points (depending on the actual estimation method) cannot be used in estimation. Especially in short panels with less than five time periods, removing a single observation for a given individual thus means that no observations of that individual can be used in estimation, and the problem gets worse if higher-order differences are used.

In this paper, we generalize the estimation method of Dhaene and Zhu (2017) to a combination of the pth and sth order differences, \(p,s\in \mathbb {N}\), and combine multiple pairwise differences by means of the generalized method of moments (GMM). To account for the shortcomings of the current methods and to extend the analysis of Aquaro and Čížek (2014), we first analyze the robustness of the median-based moment conditions, derive their influence functions, and quantify the bias caused by data contamination. Subsequently, we use the maximum bias to propose a two-step GMM estimator, which weights the (median-based) moment conditions by both their variance and their bias; this guarantees that imprecise or biased moment conditions receive low weights in estimation. Finally, as the number of applicable moment conditions grows quadratically with the number of time periods, a suitable number of moment conditions for the underlying data generating process needs to be selected using a robust version of the moment selection procedure of Hall et al. (2007).

In the rest of the paper, the new estimator is introduced first in Sect. 2. Its robust properties are studied in Sect. 3 and are used to define the data-dependent GMM weights. The existing and proposed methods are then compared by means of Monte Carlo simulations in Sect. 4 and the proofs can be found in the “Appendix”.

2 Median-based estimation of dynamic panel models

The dynamic panel data model (Sect. 2.1) and its median-based estimation (Sect. 2.2) will now be discussed. Later, the two-step GMM estimation procedure (Sect. 2.3) and the moment selection method (Sect. 2.4) will be introduced.

2.1 Dynamic panel data model

Consider the simple dynamic panel data model \((i=1,\ldots ,n; t=1,\ldots ,T; T\ge 3)\)

$$\begin{aligned} y_{ it }=\alpha y_{it-1}+\eta _i+\varepsilon _{ it }, \end{aligned}$$
(1)

where \(y_{ it }\) denotes the response variable, \(\eta _i\) is the unobservable fixed effect, and \(\varepsilon _{ it }\) represents the idiosyncratic error. The parameter \(\alpha \) satisfies \(|\alpha |<1\) so that this data generating process can be stationary. The number T of time periods is fixed, which implies that the fixed or stochastic effects \(\eta _i\) are nuisance parameters and cannot be consistently estimated. Finally, note that the extension of the discussed estimators to a model with exogenous covariates is straightforward (see Dhaene and Zhu 2017, Section 4.1).

As in Aquaro and Čížek (2014) and similarly to Cermeño (1999) and Han et al. (2014), we will consider model (1) under the following assumptions:

A.1

Errors \(\varepsilon _{ it }\) are independent across \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\) and possess finite second moments. Errors \(\{\varepsilon _{ it }\}_{t=1}^T\) are also independent of fixed effects \(\eta _i\).

A.2

The sequences \(\{y_{ it }\}_{t=1}^T\) are time stationary for all \(i=1,\ldots ,n\). In particular, the first and second moments of \(y_{ it }\) conditional on \(\eta _i\) exist and do not depend on time.

A.3

Errors \(\varepsilon _{ it }\sim \mathrm {N}(0,\sigma _{\varepsilon }^2)\), \(\sigma _{\varepsilon }^2>0\), for all \(i=1,\ldots ,n\) and \(t=1,\ldots ,T\).

Except for the independence in Assumption A.1, no assumptions are made about the unobservable fixed effects \(\eta _i\). Although we impose rather strict Assumptions A.1 and A.3 on the idiosyncratic errors, they can be relaxed. The errors \(\varepsilon _{ it }\) do not have to follow the same distribution across cross-sectional units i, allowing for heteroscedasticity. Additionally, the consistency of the estimators introduced below requires that the joint distributions of errors \(\{\varepsilon _{ it }\}_{t=1}^T\) are elliptically contoured, making the normality Assumption A.3 sufficient, but not necessary (see Dhaene and Zhu 2017, Section 4.2). On the other hand, the violation of time homoscedasticity in Assumption A.3 leads to the inconsistency of the discussed estimators. If \(\varepsilon _{ it }\sim \mathrm {N}(0,\sigma _{\epsilon t}^2)\) for \(t=1,\ldots ,T\), the model equation (1) therefore has to be rescaled by the unknown standard deviations \(\sigma _{\epsilon t}\), which can be treated as unknown parameters and estimated along with \(\alpha \) by GMM. Finally, the stationarity Assumption A.2 is used not only by the proposed estimators, but also by frequently applied GMM estimators such as Blundell and Bond (1998), and it is implied by the assumptions of Han et al. (2014) if \(|\alpha |<1\).

2.2 Median-based moment conditions

To generalize the estimator by Dhaene and Zhu (2017), let \(\Delta ^s\) denote the sth difference operator, that is, \(\Delta ^s\upsilon _{t}:=\upsilon _{t}-\upsilon _{t-s}\) (cf. Abrevaya 2000; Aquaro and Čížek 2013). Given model (1), stationarity Assumption A.2 implies for any integers \(s,q,p\in \mathbb {N}\) that

$$\begin{aligned} {\text {E}}(\Delta ^sy_{ it }|\Delta ^py_{it-q})=r_{\varvec{j}}\Delta ^py_{it-q}, \end{aligned}$$
(2)

where the triplet \(\varvec{j}=(s,q,p)\) and \( r_{\varvec{j}} = {\text {cov}}(\Delta ^sy_{ it },\Delta ^py_{it-q}){/}{{\text {var}}(\Delta ^py_{it-q})}\) are independent of i and t, \(\max \{s,p+q\}<T\). Consequently, the variables \(\Delta ^sy_{ it }-r_{\varvec{j}}\Delta ^py_{it-q}\) and \(\Delta ^py_{it-q}\) are uncorrelated and, by Assumption A.3, independent and symmetrically distributed around zero. Hence, \( {\text {E}}[{\text {sgn}}(\Delta ^sy_{ it }-r_{\varvec{j}}\Delta ^py_{it-q}){\text {sgn}}(\Delta ^py_{it-q})]=0 \) and \( {\text {E}}\left[ {\text {sgn}}\left( {\Delta ^sy_{ it }}/{\Delta ^py_{it-q}}-r_{\varvec{j}}\right) \right] =0. \) The estimation of \(r_{\varvec{j}}\) can therefore be based on the sample analog of this moment condition:

$$\begin{aligned} \hat{r}_{n\varvec{j}} = {\text {med}}\left\{ \frac{\Delta ^sy_{ it }}{\Delta ^py_{it-q}};\, t=p+q+1,\dots ,T;\ i=1,\dots ,n \right\} . \end{aligned}$$
(3)

To relate \(r_{\varvec{j}}\) to the autoregressive coefficient \(\alpha \) in (1), Aquaro and Čížek (2014) derived under Assumptions A.1 and A.2 that the coefficient \(r_{\varvec{j}}\) satisfies the moment condition

$$\begin{aligned} g_{\varvec{j}}(\alpha ) = 2(1-\alpha ^p)r_{\varvec{j}}-\alpha ^q+\alpha ^{q+p}+\alpha ^{|s-q|}-\alpha ^{|s-p-q|} =0. \end{aligned}$$
(4)

If \(s=q=p=1\), (4) defines Dhaene and Zhu (2017)’s estimator: \(\alpha \in (-1,1)\) is identified by \(g_{111}(\alpha )=(1-\alpha )(2r_{111}+1-\alpha )=0\). Dhaene and Zhu’s (DZ) estimator \(\hat{\alpha }_n\) therefore simply equals \(2\hat{r}_{n111}+1\) and was proved to be consistent and asymptotically normal. Aquaro and Čížek (2014)’s estimator (AC-DZ) of \(\alpha \) uses \(s=q=1\) and odd p, \(p<T-1\). They do not use differences with \(s>1\) because of their robustness properties: while such differences seem robust to sequences of outliers, they can lead to large biases if outliers occur at random times.
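As an illustration, the median-ratio estimator (3) and the resulting DZ closed form can be sketched in a short simulation (a minimal sketch under Assumptions A.1–A.3; the function names, panel dimensions, and seed are our arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_panel(n, T, alpha, sigma_eps=1.0, sigma_eta=1.0):
    """Simulate the stationary dynamic panel (1): y_it = alpha*y_i,t-1 + eta_i + eps_it."""
    eta = sigma_eta * rng.standard_normal(n)
    y = np.empty((n, T))
    # stationary start: y_i1 ~ N(eta_i/(1-alpha), sigma_eps^2/(1-alpha^2)) given eta_i
    y[:, 0] = eta / (1 - alpha) + rng.standard_normal(n) * sigma_eps / np.sqrt(1 - alpha**2)
    for t in range(1, T):
        y[:, t] = alpha * y[:, t - 1] + eta + sigma_eps * rng.standard_normal(n)
    return y

def median_ratio(y, s, q, p):
    """Sample median ratio (3): med{Delta^s y_it / Delta^p y_i,t-q} over all i and t."""
    n, T = y.shape
    ratios = []
    for t in range(max(s, p + q), T):        # 0-based t: both differences must exist
        num = y[:, t] - y[:, t - s]           # Delta^s y_it
        den = y[:, t - q] - y[:, t - q - p]   # Delta^p y_i,t-q
        ratios.append(num / den)
    return np.median(np.concatenate(ratios))

y = simulate_panel(n=10000, T=6, alpha=0.5)
alpha_dz = 2 * median_ratio(y, 1, 1, 1) + 1   # DZ estimator: alpha_hat = 2*r_111 + 1
```

Since \(g_{111}(\alpha )=0\) implies \(r_{111}=(\alpha -1)/2\), the simulated median ratio should be close to \(-0.25\) and the DZ estimate close to \(\alpha =0.5\).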

2.3 Two-step GMM estimation

To increase the precision and robustness of the estimation, we propose to extend the (AC-)DZ estimator by allowing for multiple differences with \(s = q \ge 1\) and \(p \ge 1\). We consider only \(s=q\) because the moment conditions (4) do not allow distinguishing between outlying and regular observations for \(s \not = q\), as shown in Aquaro and Čížek (2014). For \(s=q\), (4) simplifies, after dividing by \(1-\alpha ^p\) and accordingly redefining \(g_{\varvec{j}}(\alpha )\), to

$$\begin{aligned} g_{\varvec{j}}(\alpha ) = 2r_{\varvec{j}} + 1 - \alpha ^s =0. \end{aligned}$$
(5)

The full set of moment conditions in (5) can be then written as

$$\begin{aligned} \varvec{g}(\alpha )=\varvec{0}, \end{aligned}$$
(6)

where \(\varvec{g}(\alpha )=\{g_{\varvec{j}}(\alpha )\}_{\varvec{j}\in \mathcal {J}}\) and a fixed finite set \(\mathcal {J}\) contains all triplets \(\varvec{j}= (s,q,p)\) that are considered in estimation. The DZ estimator then corresponds to the special case \(\mathcal{J} = \{ (1,1,1) \}\) and the AC-DZ estimator relies on a set \(\mathcal{J} = \{ (1,1,p){:}\,1 \le p < T-1 \text{ odd } \}\). Here we consider all combinations with both \(s=q\) and p odd, \(\mathcal{J} \subseteq \mathcal{J}_o = \{ (s,s,p){:}\,s\in \mathbb {N} \text{ odd }, p\in \mathbb {N} \text{ odd }, 1 \le s+p < T\}\), as the individual moment conditions do not uniquely identify \(\alpha \) for even values of s or p, which could negatively affect the bias caused by contamination.
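The candidate set \(\mathcal{J}_o\) and the moment function (5) are straightforward to enumerate; a minimal sketch (the function names are ours):

```python
def triplets_Jo(T):
    """All triplets (s, s, p) with s and p odd and s + p < T, i.e. the set J_o."""
    return [(s, s, p) for s in range(1, T, 2) for p in range(1, T, 2) if s + p < T]

def g_moment(alpha, r_j, s):
    """Moment condition (5) for s = q: g_j(alpha) = 2*r_j + 1 - alpha**s."""
    return 2 * r_j + 1 - alpha**s

# For T = 8 the candidates are (1,1,1), (1,1,3), (1,1,5), (3,3,1), (3,3,3), (5,5,1).
```

Plugging the population value \(r_{\varvec{j}}=(\alpha ^s-1)/2\) into `g_moment` returns zero, which is exactly the identification used below.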

Given the system of equations in (6), the parameter \(\alpha \) can be estimated by the GMM procedure. This GMM estimator is referred to here as the pairwise-difference DZ (PD-DZ) estimator and is defined by

$$\begin{aligned} \hat{\alpha }_n = \mathop {{\text {arg}}\,{\text {min}}}\limits _{c\in (-1,1)} \varvec{g}_n(c)'\varvec{A}_n\varvec{g}_n(c), \end{aligned}$$
(7)

where \(\varvec{g}_n(c)=(g_{n\varvec{j}}(c))_{\varvec{j}\in \mathcal {J}}\) is the sample analog of \(\varvec{g}(\alpha )\) and corresponds to (5) with \(r_{\varvec{j}}\) being replaced by \(\hat{r}_{n\varvec{j}}\) defined in (3).

The weighting matrix \(\varvec{A}_n\) can initially be chosen, as in Aquaro and Čížek (2014), proportional to the number of observations available for the estimation of each moment equation: \(\varvec{A}_n = \varvec{A}= {\text {diag}}\{(T - p - s)/T \}\). The traditional variance-minimizing choice of the GMM weighting matrix \(\varvec{A}_n\), however, equals the inverse of the variance matrix \(\varvec{V}_n\) of the moment conditions \(\varvec{g}_n(\alpha )\), which converges to the asymptotic variance matrix \(\varvec{V}\) of the moment conditions (5); see the Appendix for the asymptotic distribution of \(\hat{\alpha }_n\) and the asymptotic variance matrix \(\varvec{V}\) previously obtained by Aquaro and Čížek (2014).
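A sketch of the first-step estimator (7) under the initial diagonal weighting \(\varvec{A} = {\text {diag}}\{(T-p-s)/T\}\); the grid search merely stands in for a proper numerical optimizer, and the median ratios are set to their population values \(r_{\varvec{j}}=(\alpha ^s-1)/2\) implied by (5), so the minimizer should recover \(\alpha \):

```python
import numpy as np

def pd_dz_first_step(r_hat, triplets, T):
    """First-step PD-DZ estimator (7) with A = diag{(T - p - s)/T}, by grid search."""
    s_arr = np.array([s for (s, q, p) in triplets])
    w = np.array([(T - p - s) / T for (s, q, p) in triplets])   # diagonal of A
    grid = np.linspace(-0.999, 0.999, 4001)
    # objective g_n(c)' A g_n(c) with diagonal A and g from (5)
    obj = [np.sum(w * (2 * np.asarray(r_hat) + 1 - c**s_arr) ** 2) for c in grid]
    return grid[int(np.argmin(obj))]

T, alpha = 8, 0.6
trip = [(s, s, p) for s in (1, 3, 5) for p in (1, 3, 5) if s + p < T]
r_pop = [(alpha**s - 1) / 2 for (s, q, p) in trip]   # population median ratios
alpha_hat = pd_dz_first_step(r_pop, trip, T)
```

With sample median ratios \(\hat{r}_{n\varvec{j}}\) in place of the population values, the same function yields the feasible first-step estimate.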

On the other hand, we also aim to account for the presence of outlying observations, which can substantially bias the estimates. Since simply removing outliers would result in a substantial data loss, as explained in the introduction, we propose instead to use the moment conditions (5) and to minimize the mean squared error (MSE) of the estimates rather than the asymptotic variance. First, let us denote the MSE of \(\varvec{g}_n(\alpha )\) by \(\varvec{W}_n\),

$$\begin{aligned} \varvec{W}_n = MSE \{\varvec{g}_n(\alpha )\} = Bias \{\varvec{g}_n(\alpha )\} Bias \{\varvec{g}_n(\alpha )\}' + Var \{\varvec{g}_n(\alpha )\} = \varvec{b}_n\varvec{b}_n' + \varvec{V}_n. \end{aligned}$$

Given a weighting matrix \(\varvec{A}_n\) and the asymptotic linearity of \(\hat{\alpha }_n\) (see Aquaro and Čížek 2014, the proof of Theorem 1)

$$\begin{aligned} \hat{\alpha }_n - \alpha = (\varvec{d}' \varvec{A}_n \varvec{d})^{-1} \varvec{d}' \varvec{A}_n \varvec{g}_n(\alpha ) + o_p(1) \end{aligned}$$
(8)

as \(n\rightarrow \infty \), it immediately follows that the MSE of \(\hat{\alpha }_n\) equals

$$\begin{aligned} (\varvec{d}'\varvec{A}_n\varvec{d})^{-1}\varvec{d}'\varvec{A}_n \varvec{W}_n \varvec{A}_n \varvec{d}(\varvec{d}'\varvec{A}_n\varvec{d})^{-1} + o_p(1), \end{aligned}$$

which is (asymptotically) minimized by choosing \(\varvec{A}_n = \varvec{W}_n^{-1}\) (Hansen 1982, Theorem 3.2).

Next, to create a feasible procedure, both the variance and the squared bias matrices have to be estimated. The estimation thus proceeds in two steps: first, the (AC-)DZ estimator is applied to obtain an initial parameter estimate; then—after estimating the bias \(\varvec{b}_n\) and variance \(\varvec{V}_n\) of the moment conditions—the GMM estimator with all applicable pairwise differences is evaluated using an estimate of the weighting matrix \(\varvec{A}_n = [\varvec{b}_n \varvec{b}_n' + \varvec{V}_n]^{-1}\). On the one hand, the estimate \(\hat{\varvec{V}}_n\) of \(\varvec{V}_n\) can be directly obtained from Theorem 5 in the “Appendix” using initial estimates of \(r_{\varvec{j}}\) and \(\alpha \) because both the responses \(y_{ it }\) and the estimates \(\hat{\alpha }_n\) are continuously distributed with bounded densities due to the stationarity Assumptions A.2 and A.3. On the other hand, estimating \(\varvec{b}_n\) by \(\hat{\varvec{b}}_n\) requires first studying the biases of the median-based moment conditions and constructing a feasible estimate thereof in Sect. 3. Using the estimates \(\hat{\varvec{V}}_n\) and \(\hat{\varvec{b}}_n\) to construct \(\hat{\varvec{W}}_n = \hat{\varvec{b}}_n \hat{\varvec{b}}_n' + \hat{\varvec{V}}_n\) and \(\hat{\varvec{A}}_n=\hat{\varvec{W}}_n^{-1}\) then leads to the proposed second-step GMM estimator

$$\begin{aligned} \hat{\alpha }_n = \mathop {{\text {arg}}\,{\text {min}}}\limits _{c\in (-1,1)} \varvec{g}_n(c)'\hat{\varvec{A}}_n\varvec{g}_n(c) = \mathop {{\text {arg}}\,{\text {min}}}\limits _{c\in (-1,1)} \varvec{g}_n(c)'[\hat{\varvec{b}}_n \hat{\varvec{b}}_n' + \hat{\varvec{V}}_n]^{-1}\varvec{g}_n(c). \end{aligned}$$
(9)
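Assuming estimates \(\hat{\varvec{b}}_n\) and \(\hat{\varvec{V}}_n\) are already available (their construction is the subject of Sect. 3 and the Appendix), the second step (9) only changes the weighting matrix; a sketch with illustrative placeholder values for the bias and variance rather than estimates from data:

```python
import numpy as np

def pd_dz_second_step(r_hat, s_orders, b_hat, V_hat):
    """Second-step PD-DZ estimator (9): weight by the inverse estimated MSE matrix."""
    A_hat = np.linalg.inv(np.outer(b_hat, b_hat) + V_hat)   # A = (b b' + V)^{-1}
    s_arr = np.asarray(s_orders)
    r_arr = np.asarray(r_hat)
    grid = np.linspace(-0.999, 0.999, 4001)
    obj = []
    for c in grid:
        g = 2 * r_arr + 1 - c**s_arr        # sample moment conditions (5)
        obj.append(g @ A_hat @ g)
    return grid[int(np.argmin(obj))]

# illustrative inputs: population median ratios for alpha = 0.4, a unit variance
# matrix, and a nonzero bias placeholder for the highest-order condition
alpha = 0.4
s_orders = [1, 1, 3]
r_pop = [(alpha**s - 1) / 2 for s in s_orders]
b_hat = np.array([0.0, 0.0, 0.2])
alpha_hat = pd_dz_second_step(r_pop, s_orders, b_hat, np.eye(3))
```

Because \(\hat{\varvec{W}}_n\) is positive definite, reweighting does not move the minimizer when all moment conditions hold exactly; it only changes the relative influence of noisy or biased conditions in finite samples.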

2.4 Robust moment selection

The proposed two-step GMM estimator is based on the moment conditions (5), and given that we consider only odd s and p, their number equals approximately \(T(T-1)/8\) and grows quadratically with the number of time periods. Although the extra moment conditions based on higher-order differences might improve the precision of estimation for larger values of \(|\alpha |\), their usefulness is rather limited if \(\alpha \) is close to zero. At the same time, a large number of moment conditions might increase the estimation bias due to outliers. More specifically, Aquaro and Čížek (2014) showed that, for \(\alpha \) close to 0, the original moment condition of the DZ estimator (\(s=q=p=1\)) is the least sensitive to random outliers; including higher-order moment conditions then merely increases the bias without improving the variance and is thus harmful.

To account for this, we propose to select the moment conditions used in estimation by a robust analog of a moment selection criterion (e.g., see Cheng and Liao 2015, for an overview). Since all moments are valid and no weak instruments are involved, the information content of the moment equations and their number have to be balanced as in Hall et al. (2007), whose approach to moment selection in the presence of nearly redundant moment conditions can be adapted to robust estimation. They propose the so-called relevant moment selection criterion (RMSC) that—for a given set of moment conditions defined by triplets \(\mathcal J\) in our case—equals

$$\begin{aligned} RMSC (\mathcal{J}) = \ln ( | \hat{\varvec{V}}_{n,\mathcal{J}} | ) + \kappa ( |\mathcal{J}|, n ). \end{aligned}$$

Matrix \(\hat{\varvec{V}}_{n,\mathcal{J}}\) represents an estimate of the variance matrix \(\varvec{V}_{\mathcal{J}}\) of moment conditions (6) defined by triplets \(\mathcal J\) and \(\kappa (\cdot ,\cdot )\) is a deterministic penalty term depending on the number \(|\mathcal{J}|\) of triplets (or moment conditions) and on the sample size n used for estimating the elements of \(\varvec{V}_n\) (see Theorem 5). To select relevant moment conditions, this criterion has to be minimized:

$$\begin{aligned} \hat{\mathcal{J}} = \mathop {{\text {arg}}\,{\text {min}}}\limits _{\mathcal{J} \subseteq \mathcal{J}_o} RMSC (\mathcal{J}). \end{aligned}$$

Two examples of the penalization term used by Hall et al. (2007) are the Bayesian information criterion (BIC) with \(\kappa (c,n) = (c-K) \cdot \ln (\sqrt{n}){/}\sqrt{n}\) and the Hannan–Quinn information criterion (HQIC) with \(\kappa (c,n) = (c-K) \cdot \kappa _c \ln (\ln (\sqrt{n})){/}\sqrt{n}\), where the number of estimated parameters \(K=1\) in model (1) and constant \(\kappa _c>2\).
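The two penalty terms might be coded as follows, with \(K=1\) as in model (1) and an arbitrary constant \(\kappa _c = 2.01 > 2\):

```python
import math

def kappa_bic(c, n, K=1):
    """BIC-type penalty: (c - K) * ln(sqrt(n)) / sqrt(n)."""
    return (c - K) * math.log(math.sqrt(n)) / math.sqrt(n)

def kappa_hqic(c, n, K=1, kappa_c=2.01):
    """HQIC-type penalty: (c - K) * kappa_c * ln(ln(sqrt(n))) / sqrt(n), kappa_c > 2."""
    return (c - K) * kappa_c * math.log(math.log(math.sqrt(n))) / math.sqrt(n)
```

Both penalties vanish at \(c=K\) and shrink as n grows, so the number of moment conditions is penalized more heavily in moderate samples than in very large ones.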

As in Sect. 2.3, the proposed robust estimator (9) should minimize the MSE rather than just the variance of the estimates. We therefore suggest using the relevant robust moment selection criterion (RRMSC),

$$\begin{aligned} RRMSC (\mathcal{J}) = \ln ( | \hat{\varvec{W}}_{n,\mathcal{J}} | ) + \kappa ( |\mathcal{J}|, n ), \end{aligned}$$
(10)

which is based on the determinant of an estimate \(\hat{\varvec{W}}_n\) of the MSE matrix \(\varvec{W}_n\) rather than on the variance matrix estimate \(\hat{\varvec{V}}_n\) of the moment conditions. The relevant robust moment conditions are then obtained by minimizing

$$\begin{aligned} \hat{\mathcal{J}} = \mathop {{\text {arg}}\,{\text {min}}}\limits _{\mathcal{J} \subseteq \mathcal{J}_o} RRMSC (\mathcal{J}). \end{aligned}$$
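Since \(\mathcal{J}_o\) is small in short panels, the minimization of the RRMSC can be carried out by exhaustive search; a sketch, using the fact that the MSE matrix of a subset of moment conditions is the corresponding submatrix of the full \(\hat{\varvec{W}}_n\) (function names and the illustrative matrix are ours):

```python
from itertools import combinations
import numpy as np

def select_moments(triplets, W_hat, n, kappa):
    """Minimize RRMSC (10) over all non-empty subsets of the candidate triplets."""
    best_crit, best_J = np.inf, None
    for r in range(1, len(triplets) + 1):
        for sub in combinations(range(len(triplets)), r):
            W_sub = W_hat[np.ix_(sub, sub)]          # MSE matrix of the subset
            crit = np.log(np.linalg.det(W_sub)) + kappa(len(sub), n)
            if crit < best_crit:
                best_crit, best_J = crit, [triplets[i] for i in sub]
    return best_J

# illustrative: a precise first condition and a very imprecise second one
W = np.diag([0.1, 10.0])
kappa_bic = lambda c, n: (c - 1) * np.log(np.sqrt(n)) / np.sqrt(n)
chosen = select_moments([(1, 1, 1), (1, 1, 3)], W, n=100, kappa=kappa_bic)
```

In this toy example the criterion drops the imprecise condition, since adding it raises \(\ln |\hat{\varvec{W}}_{n,\mathcal{J}}|\) as well as the penalty.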

3 Robustness properties

There are many measures of robustness that are related to the bias of an estimator, or more typically, the worst-case bias of an estimator due to an unknown form of outlier contamination. In this section, various kinds of contamination are introduced and some relevant measures of robustness are defined (Sect. 3.1). Using these measures, we characterize the robustness of the moment conditions (5) in Sect. 3.2 and the robustness of the GMM estimator (7) in Sect. 3.3. Next, we use these results to estimate the bias of the moment conditions (5), as discussed in Sect. 3.4. Finally, the whole estimation procedure is summarized in Sect. 3.5.

3.1 Measures of robustness

Given that the analyzed data from model (1) are dependent, the effect of outliers can depend on their structure. Therefore, we first describe the considered contamination schemes and then the relevant measures of robustness.

More formally, let \(\mathcal {Z}\) be the set of all possible samples \(Z = \{z_{ it }\}\) of dimensions \((n,T)\) following model (1) and let \(Z_\epsilon = \{ z_{ it }^\epsilon \}\) be a contaminating sample of dimensions \((n,T)\) following a fixed data-generating process, where the index \(\epsilon \) of \(Z_\epsilon \) indicates the probability that an observation in \(Z_\epsilon \) is different from zero. The observed contaminated sample is \(Z+Z_\epsilon =\{z_{ it } + z_{ it }^{\epsilon }\}_{i=1,t=1}^{n,T}\). Similarly to Dhaene and Zhu (2017), we consider contamination by independent additive outliers following a degenerate distribution with the point mass at \(\zeta \),

$$\begin{aligned}&Z^1_{\epsilon ,\zeta } = \{z_{ it }^\epsilon \}_{i=1,t=1}^{n,T} = \{ \zeta \cdot I(\nu _{ it }^\epsilon =1)\}_{i=1,t=1}^{n,T}, \nonumber \\&\quad P(\nu _{ it }^\epsilon =1)=\epsilon , \quad P(\nu _{ it }^\epsilon =0)=1-\epsilon , \end{aligned}$$
(11)

and by patches of k additive outliers,

$$\begin{aligned} Z^2_{\epsilon ,\zeta }= \{z_{ it }^\epsilon \}_{i=1,t=1}^{n,T} = \left\{ \zeta \cdot I\left( \nu _{ it }^\epsilon =1\hbox { or }\ldots \hbox { or }\nu _{it-k+1}^\epsilon =1 \right) \right\} _{i=1,t=1}^{n,T}, \end{aligned}$$
(12)

where \(\nu _{ it }^\epsilon \) follows the Bernoulli distribution with the parameter \(\tilde{\epsilon }\) such that \((1-\tilde{\epsilon })^k=1-\epsilon \). Additionally, a third contamination scheme \(Z^3_{\epsilon ,\zeta }=\{z_{ it }^\epsilon \}_{i=1,t=1}^{n,T}\) is considered, where

$$\begin{aligned} z_{ it }^{\epsilon }= {\left\{ \begin{array}{ll} a_{it-l}(-1)^l &{}\quad \text {if the smallest index }l\ge 0\text { with }\nu _{it-l}^\epsilon =1\text { satisfies }l\le k-1,\\ 0 &{}\quad \text {otherwise,} \end{array}\right. } \end{aligned}$$
(13)

where \(\Pr \left( a_{it-l}=\zeta \right) =1/2\) and \(\Pr \left( a_{it-l}=-\zeta \right) =1/2\) and where \(\nu _{ it }^{\epsilon }\) is defined as in \(Z_{\epsilon ,\zeta }^2\). Note that (12) and (13) are special cases of a more general type of contamination \(Z_{\epsilon ,\zeta }^4=\{z_{ it }^\epsilon \}_{i=1,t=1}^{n,T}\), where

$$\begin{aligned} z_{ it }^{\epsilon }= {\left\{ \begin{array}{ll} a_{it-l}\rho ^l &{}\quad \text {if the smallest index }l\ge 0\text { with }\nu _{it-l}^\epsilon =1\text { satisfies }l\le k-1,\\ 0 &{}\quad \text {otherwise,} \end{array}\right. } \end{aligned}$$
(14)

and \(-1\le \rho \le 1\). Note that this general type of contamination closely corresponds to contamination by innovation outliers for large k and \(\rho =\alpha \), and it is therefore important to study. However, since Dhaene and Zhu (2017)’s results for \(s=p=1\) suggest that the contamination scheme \(Z_{\epsilon ,\zeta }^4\) biases estimates towards \(\rho \) as \(\zeta \rightarrow +\infty \), and since \(\rho \) is unknown in practice, we do not analyze this most general case with \(\rho \in [-1,1]\). Instead, we concentrate on the most extreme cases \(\rho =1\) and \(\rho =-1\), as they can arguably bias the estimate most. Hence, the contamination schemes \(Z^1_{\epsilon ,\zeta }, Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\) bias the DZ estimates of \(\alpha \) towards 0, 1, and \(-1\), respectively—see Sect. 3.2 and Dhaene and Zhu (2017).
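The contamination schemes are straightforward to simulate; a sketch of \(Z^1_{\epsilon ,\zeta }\) in (11) and \(Z^2_{\epsilon ,\zeta }\) in (12) (the scheme \(Z^3_{\epsilon ,\zeta }\) differs from \(Z^2_{\epsilon ,\zeta }\) only by the alternating random signs in (13); function names and the seed are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def contaminate_independent(n, T, eps, zeta):
    """Z^1 in (11): independent additive outliers of size zeta with probability eps."""
    return zeta * (rng.random((n, T)) < eps)

def contaminate_patches(n, T, eps, zeta, k):
    """Z^2 in (12): patches of k consecutive outliers; patch starts are Bernoulli
    with eps_tilde solving (1 - eps_tilde)**k = 1 - eps."""
    eps_tilde = 1 - (1 - eps) ** (1 / k)
    starts = rng.random((n, T)) < eps_tilde      # nu_it = 1 marks a patch start
    hit = np.zeros((n, T), dtype=bool)
    for l in range(k):                           # z_it is outlying if nu_{i,t-l} = 1
        hit[:, l:] |= starts[:, : T - l]
    return zeta * hit

Z1 = contaminate_independent(1000, 10, eps=0.05, zeta=50.0)
Z2 = contaminate_patches(1000, 10, eps=0.05, zeta=50.0, k=3)
```

Adding either array to a clean simulated panel produces the contaminated sample \(Z+Z_\epsilon \) used throughout this section.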

Given the contamination schemes, one of the traditional measures of the global robustness of an estimator is the breakdown point. It can be defined as the smallest fraction of the data that can be changed in such a way that the estimator will not reflect any information concerning the remaining (non-contaminated) observations. Aquaro and Čížek (2014) derived the breakdown points of the estimators \(\hat{r}_{\varvec{j}}\), \(\varvec{j}\in \mathcal{J}\), for contamination schemes \(Z^1_{\epsilon ,\zeta }, Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\), and under some regularity conditions, proved that the breakdown point of the GMM estimator (7) equals the breakdown point of the DZ estimator \(\hat{r}_{(1,1,1)}\) if \((1,1,1)\in \mathcal{J}\). While such results characterize the global robustness of the PD-DZ estimators, they are not informative about the size of the bias caused by outliers.

We therefore base the estimation of the bias due to contamination on the influence function. It is a traditional measure of local robustness and can be defined as follows. Let \(\mathcal {T}(Z+Z_\epsilon )\) denote a generic estimator of an unknown parameter \(\theta \) based on a contaminated sample \(Z+Z_\epsilon =\{z_{ it }+z_{ it }^{\epsilon }\}_{i=1,t=1}^{n,T}\), where Z and \(Z_\epsilon \) have been defined at the beginning of Sect. 3.1. As the definition is asymptotic, let \(\mathcal {T}(\theta ,\zeta ,\epsilon ,T)\) be the probability limit of \(\mathcal {T}(Z+Z_\epsilon )\) when T is fixed and \(n\rightarrow \infty \). Note that \(\mathcal {T}(\theta ,\zeta ,\epsilon ,T)\) depends on the unknown parameter \(\theta \) describing the data generating process, on the fraction \(\epsilon \) of data contamination, on the non-zero value \(\zeta \) characterizing the outliers, and on the number of time periods T. Assume \(\mathcal {T}\) is consistent under non-contaminated data, that is, \(\mathcal {T}(\theta ,\zeta ,0,T)=\theta \). The influence function (IF) of estimator \(\mathcal {T}\) at data generating process Z due to contamination \(Z_\epsilon \) is defined as

$$\begin{aligned} {\text {IF}}\big (\mathcal {T};\theta ,\zeta ,T\big ) :=\lim _{\epsilon \rightarrow 0}\frac{\mathcal {T}(\theta ,\zeta ,\epsilon ,T)-\theta }{\epsilon } =\left. \frac{\partial {\text {bias}}(\mathcal {T};\theta ,\zeta ,\epsilon ,T)}{\partial \epsilon }\right| _{\epsilon =0}, \end{aligned}$$
(15)

where the equality follows by the definition of asymptotic bias of \(\mathcal {T}\) due to the data contamination \(Z_\epsilon \), \( {\text {bias}}(\mathcal {T};\theta ,\zeta ,\epsilon ,T):=\mathcal {T}(\theta ,\zeta ,\epsilon ,T)-\theta . \) (If IF does not depend on the number T of time periods, T can be omitted from its arguments.)

Clearly, the knowledge of the influence function allows us to approximate the bias of an estimator \(\mathcal {T}\) at \(Z+Z_\epsilon \) by \(\epsilon \cdot {\text {IF}}(\mathcal {T};\theta ,\zeta ,T)\). Although such an approximation is often valid only for small values of \(\epsilon >0\) (e.g., in the linear regression model, where the bias can become infinite), it is relevant for a much wider range of contamination levels \(\epsilon \) in model (1) given that the parameter space \((-1,1)\) is bounded and so is the bias (the dependence of the bias on the contamination level \(\epsilon \) has been studied by Dhaene and Zhu 2017).

The disadvantage of approximating the bias by \(\epsilon \cdot {\text {IF}}(\mathcal {T};\theta ,\zeta ,T)\) is that it depends on the unknown magnitude \(\zeta \) of the outliers. We therefore suggest evaluating the supremum of the influence function, the gross error sensitivity (GES)

$$\begin{aligned} {\text {GES}}(\mathcal {T};\theta ,T) = \sup _{\zeta }\left| {\text {IF}}(\mathcal {T};\theta ,\zeta ,T)\right| , \end{aligned}$$
(16)

and approximate the worst-case bias by \(\epsilon \cdot {\text {GES}}(\mathcal {T};\theta ,T)\). For the PD-DZ estimator and the corresponding moment conditions, the IF and GES are derived in the following Sects. 3.2 and 3.3, where \(\mathcal {T}\) will equal \(\hat{\alpha }\) and \(\hat{r}_{\varvec{j}}\), respectively (without the subscript n, since the IF and GES definitions depend only on the probability limits of the estimators).

3.2 Influence function

The GMM estimator (7) is based on moment conditions that depend on the data only by means of the medians \(r_{\varvec{j}}\). We therefore first derive the influence functions of the estimators \(\hat{r}_{\varvec{j}}\) and then combine them to derive the influence function of the GMM estimator. Building on Dhaene and Zhu (2017, Theorems 3.2 and 3.7), the IFs of \(\hat{r}_{\varvec{j}}\) in model (1) under the contamination schemes \(Z^1_{\epsilon ,\zeta }, Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\) are derived in the following Theorems 1–3. Only the point-mass distribution \(G_\zeta \) with the mass at \(\zeta \in \mathbb {R}\) is considered. In all theorems, \(\Phi \) denotes the cumulative distribution function of the standard normal distribution \(\mathrm {N}(0,1)\).

Theorem 1

Let Assumptions A.1–A.3 hold and \(\varvec{j}\in \mathcal{J}_o\). Then it holds in model (1) under the independent-additive-outlier contamination \(Z_{\epsilon ,\zeta }^1\) with point-mass distribution at \(\zeta \ne 0\) that

$$\begin{aligned} {\text {IF}}(\hat{r}_{\varvec{j}};\alpha ,\zeta )= & {} -\pi \sqrt{\frac{1-\alpha ^s}{1-\alpha ^p}-\frac{1}{4}(1-\alpha ^s)^2}\nonumber \\&\times \,\left[ \Phi \left( \frac{\zeta (1+\alpha ^s)/2}{\sqrt{2\frac{\sigma _{\varepsilon }^2}{1-\alpha ^2}\left( 1-\alpha ^s-\frac{(1-\alpha ^s)^2}{4}(1-\alpha ^p)\right) }}\right) \right. \nonumber \\&\quad \left. -\,\Phi \left( \frac{-\zeta (1-\alpha ^s)/2}{\sqrt{2\frac{\sigma _{\varepsilon }^2}{1-\alpha ^2}\left( 1-\alpha ^s-\frac{(1-\alpha ^s)^2}{4}(1-\alpha ^p)\right) }}\right) \right] \nonumber \\&\times \,\left[ \Phi \left( \frac{\zeta }{\sqrt{2\sigma _{\varepsilon }^2\frac{1-\alpha ^p}{1-\alpha ^2}}}\right) - \Phi \left( -\frac{\zeta }{\sqrt{2\sigma _{\varepsilon }^2\frac{1-\alpha ^p}{1-\alpha ^2}}}\right) \right] . \end{aligned}$$
(17)

Theorem 2

Let Assumptions A.1–A.3 hold and \(\varvec{j}\in \mathcal{J}_o\). Then it holds in model (1) under the patched-additive-outlier contamination \(Z_{\epsilon ,\zeta }^2\) with point-mass distribution at \(\zeta \ne 0\) and patch length \(k\ge 2\) that

$$\begin{aligned} {\text {IF}}(\hat{r}_{\varvec{j}};\alpha ,\zeta )= & {} -\frac{\pi }{k}\sqrt{\frac{1-\alpha ^s}{1-\alpha ^p}-\frac{(1-\alpha ^s)^2}{4}}\nonumber \\&\times \,\left[ \mathfrak {p}_C'(0)\left( C(r_{\varvec{j}};\zeta ,0)-\frac{1}{2}\right) + \mathfrak {p}_D'(0)\left( D(r_{\varvec{j}};\zeta ,0)-\frac{1}{2}\right) \right] ,\nonumber \\ \end{aligned}$$
(18)

where \(\mathfrak {p}_C'(0)\), \(\mathfrak {p}_D'(0)\), \(C(r_{\varvec{j}};\zeta ,0)\), and \(D(r_{\varvec{j}};\zeta ,0)\) are defined in (52), (53), (56), and (57), respectively.

Theorem 3

Let Assumptions A.1–A.3 hold and \(\varvec{j}\in \mathcal{J}_o\). Then it holds in model (1) under the patched-additive-outlier contamination \(Z_{\epsilon ,\zeta }^3\) with point-mass distribution at \(\zeta \ne 0\) and patch length \(k\ge 2\) that

$$\begin{aligned} {\text {IF}}(\hat{r}_{\varvec{j}};\alpha ,\zeta )= & {} -\frac{\pi }{k}\sqrt{\frac{1-\alpha ^s}{1-\alpha ^p}-\frac{(1-\alpha ^s)^2}{4}}\nonumber \\&\times \,\left[ \mathfrak {p}_C'\mathcal {C}\left( \frac{1}{2}\right) + \mathfrak {p}_D'\mathcal {D}\left( \frac{1}{2}\right) + \mathfrak {p}_E'\mathcal {E}\left( \frac{1}{2}\right) + \mathfrak {p}_G'\mathcal {G}\left( \frac{1}{2}\right) + \mathfrak {p}_I'\mathcal {I}\left( \frac{1}{2}\right) \right] \nonumber \\ \end{aligned}$$
(19)

where \(\mathfrak {p}_L'\), \(L\in \{C,D,E,G,I\}\), are defined in Eqs. (73), (74), (75), (77), (79), \(\mathcal {L}(1/2)=L(r_{\varvec{j}};\zeta ,0)-1/2\) for \(\mathcal {L}\in \{\mathcal {C},\mathcal {D},\mathcal {E},\mathcal {G},\mathcal {I}\}\) and \(L\in \{C,D,E,G,I\}\), and \(L(r_{\varvec{j}};\zeta ,0)\) for \(L\in \{C,D,E,G,I\}\) are defined in Eqs. (82)–(86) in “Appendix A.3”.

The influence functions reported in Theorems 1–3 are complicated objects, both due to their algebraic forms and due to their dependence on the unknown parameter values \(\alpha \) and \(\zeta \). As \(\zeta \) is generally unknown, we characterize the worst-case scenario by means of the gross error sensitivity: recall that \( {\text {GES}}(\hat{r}_{\varvec{j}};\alpha ) = \sup _{\zeta }\left| {\text {IF}}(\hat{r}_{\varvec{j}};\alpha ,\zeta )\right| \) by Eq. (16). Inspection of the influence functions and their elements in Theorems 1–3 reveals, though, that the largest effect can be attributed to outliers with magnitude \(|\zeta | \rightarrow +\infty \) (possibly with the exception of term \(\mathcal {E}(1/2)\) in Theorem 3).

Given the results in Theorems 1–3, we thus have to compute the GES of the estimators \(\hat{r}_{\varvec{j}}\) numerically for each \(\varvec{j}=(s,s,p)\in \mathcal {J}_o\) and \(\alpha \in (-\,1,1)\). Although this might be relatively demanding if T is large and a dense grid for \(\alpha \) is used, note that the GES values are asymptotic and independent of a particular data set. They therefore have to be evaluated just once and can then be reused in any application of the proposed PD-DZ estimator. We computed the GES of \(\hat{r}_{\varvec{j}}\) for \(\varvec{j}\in \{ (s,s,p); s=1,3,5,7 \text{ and } p=1,3,5,7,9,11\}\) with the variance \(\sigma _\varepsilon ^2\) set equal to one without loss of generality. The results corresponding to Theorems 1–3 are depicted in Figs. 1, 2 and 3. Irrespective of the contamination scheme, most GES curves typically display higher sensitivity to outliers for \(|\alpha |\) close to one than for values of the autoregressive parameter around zero. One can also see that the DZ estimator corresponding to \(s=1\) and \(p=1\) is indeed biased towards 0, 1, and \(-\,1\) under the contamination schemes \(Z^1_{\epsilon ,\zeta }, Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\), respectively. Concerning the higher-order differences we propose to add to the (AC-)DZ methods, Fig. 1 documents that they do exhibit high sensitivity to independent outliers. On the other hand, their sensitivity to the patches of outliers in Fig. 2, for instance, decreases with increasing s and becomes very low (relative to \(s=1\) and \(p\ge 1\)) if s is larger than the patch length k, for example, \(s = 7 > k = 6\).
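Since the GES has to be evaluated numerically anyway, the computation is easy to sketch. The following snippet (our own illustrative code, not from the paper) transcribes the influence function (17) of Theorem 1 for the independent-outlier scheme \(Z^1_{\epsilon ,\zeta }\) and takes the supremum of its absolute value over a grid of outlier magnitudes \(\zeta \); the function names and the grid are our choices.

```python
import numpy as np
from scipy.stats import norm

def if_indep_outlier(alpha, zeta, s, p, sigma2=1.0):
    """Influence function of r_j under independent additive outliers, Eq. (17)."""
    a_s, a_p = alpha**s, alpha**p
    pref = -np.pi * np.sqrt((1 - a_s) / (1 - a_p) - 0.25 * (1 - a_s)**2)
    # common denominator of the first bracket in (17)
    d1 = np.sqrt(2 * sigma2 / (1 - alpha**2)
                 * (1 - a_s - (1 - a_s)**2 / 4 * (1 - a_p)))
    b1 = norm.cdf(zeta * (1 + a_s) / 2 / d1) - norm.cdf(-zeta * (1 - a_s) / 2 / d1)
    # second bracket in (17)
    d2 = np.sqrt(2 * sigma2 * (1 - a_p) / (1 - alpha**2))
    b2 = norm.cdf(zeta / d2) - norm.cdf(-zeta / d2)
    return pref * b1 * b2

def ges(alpha, s, p, zeta_grid=np.linspace(0.01, 100, 2000)):
    """Gross-error sensitivity: sup over zeta of |IF|, cf. Eq. (16)."""
    return max(abs(if_indep_outlier(alpha, z, s, p)) for z in zeta_grid)
```

For this scheme, \(|{\text {IF}}|\) increases in \(|\zeta |\) (both bracketed terms are monotone), so the supremum is effectively attained at the upper end of the grid and equals the absolute value of the prefactor.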

Fig. 1

Gross-error sensitivity of \(\hat{r}_{\varvec{j}}\), \(\varvec{j}=(s,s,p)\in \mathcal {J}_o\), under contamination \(Z_{\epsilon ,\zeta }^1\) by independent additive outliers. a \(s=1\), b \(s=3\), c \(s=5\), d \(s=7\)

Fig. 2

Gross-error sensitivity of \(\hat{r}_{\varvec{j}}\), \(\varvec{j}=(s,s,p)\in \mathcal {J}_o\), under contamination \(Z_{\epsilon ,\zeta }^2\) by patches of additive outliers, patch length \(k=6\). a \(s=1\), b \(s=3\), c \(s=5\), d \(s=7\)

Fig. 3

Gross-error sensitivity of \(\hat{r}_{\varvec{j}}\), \(\varvec{j}=(s,s,p)\in \mathcal {J}_o\), under contamination \(Z_{\epsilon ,\zeta }^3\) by patches of additive outliers, patch length \(k=6\). a \(s=1\), b \(s=3\), c \(s=5\), d \(s=7\)

3.3 Robust properties of the GMM estimator \(\hat{\alpha }_{n}\)

Given the results of the previous sections, we will now analyze the robustness properties of the general GMM estimator \(\hat{\alpha }\) based on moment equations (6) for \(\varvec{j}=(s,s,p)\in \mathcal {J}_o\). The results are stated first for the initial PD-DZ estimator (7) with a deterministic weight matrix and later for the second step of the PD-DZ estimator (9). Since the weight matrix, and the bias in particular, can be estimated in different ways, we consider in the latter case a weight matrix that is a general function of the parameter \(\alpha \) and the assumed fraction \(\epsilon \) of outliers.

Theorem 4

Consider a particular additive outlier contamination \(Z_{\epsilon }\) occurring with probability \(\epsilon \), where \(0<\epsilon <1\). Further, let \(\mathcal {J}\subseteq \mathcal {J}_o\).

First, assume that \(\varvec{A}_n=\varvec{A}^0\) is a positive definite diagonal matrix. Then the influence function of the GMM estimator \(\hat{\alpha }^0\) using moment conditions indexed by \(\mathcal J\) is given by

$$\begin{aligned} {\text {IF}}(\hat{\alpha }^0;\alpha ,\zeta )=-(\varvec{d}'\varvec{A}^0\varvec{d})^{-1}\varvec{d}'\varvec{A}^0\varvec{\psi }, \end{aligned}$$
(20)

where \(\varvec{d}\) is defined in Theorem 5 and \(\varvec{\psi }\) is the \(|\mathcal {J}|\times 1\) vector of the influence functions of the individual estimators \(\hat{r}_{\varvec{j}}\), \(\varvec{\psi }=\big ({\text {IF}}(\hat{r}_{\varvec{j}};\alpha ,\zeta )\big )_{\varvec{j}\in \mathcal {J}}\).

Next, assume that \(\varvec{A}_n=\varvec{A}(\hat{\alpha }_n^0, \epsilon )\) is a positive definite matrix function of the initial estimate \(\hat{\alpha }_n^0\) based on the deterministic weight matrix \(\varvec{A}^0\). If \(\varvec{A}_n = \varvec{A}(\hat{\alpha }_n^0, \epsilon ) \rightarrow \varvec{A}= \varvec{A}(\alpha ,\epsilon )\) has a finite probability limit and bounded influence function as \(n\rightarrow \infty \), then the influence function of \(\hat{\alpha }\) using moment conditions indexed by \(\mathcal J\) is again given by

$$\begin{aligned} {\text {IF}}(\hat{\alpha };\alpha ,\zeta )= -(\varvec{d}'\varvec{A}\varvec{d})^{-1}\varvec{d}'\varvec{A}\varvec{\psi }. \end{aligned}$$
(21)

Contrary to the breakdown point of Aquaro and Čížek (2014) mentioned earlier, the bias of the proposed PD-DZ estimators is a linear combination of the biases of the individual moment conditions depending on \(\hat{r}_{\varvec{j}}\). To minimize the influence of outliers on the estimator, one could theoretically select the moment condition with the smallest IF value, but this could result in poor estimation if that moment condition is not very informative about the parameter \(\alpha \). As suggested in Sect. 2.3, we aim to minimize the MSE of the estimates and thus downweight the individual moment conditions if their biases or variances are large. Obviously, this also reduces the effect of biased or imprecise moment conditions on the IF in Theorem 4. To quantify the maximum influence of generally unknown outliers on the estimate, the GES function of the GMM estimator, that is, the supremum of the IF in (21) with respect to \(\zeta \), can be used again.
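As a small illustration (our own sketch, not code from the paper), the mapping in Eqs. (20)–(21) from the per-condition influence functions to the IF of the GMM estimator is a single weighted projection:

```python
import numpy as np

def gmm_influence(d, A, psi):
    """IF of the GMM estimator, Eqs. (20)-(21): -(d'Ad)^{-1} d'A psi,
    i.e., a weighted linear combination of the per-condition IFs in psi."""
    d = np.asarray(d, float).ravel()
    A = np.asarray(A, float)
    psi = np.asarray(psi, float).ravel()
    return -(d @ A @ psi) / (d @ A @ d)
```

Because the result is a weighted average of the per-condition IFs, downweighting a condition with a large bias (a small corresponding entry of A) directly shrinks its contribution, which is exactly the mechanism the MSE-based weighting exploits.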

3.4 Estimating the bias

The IF and GES derived in Sect. 3.2 characterize only the derivative of the bias caused by outlier contamination. In the case of the contamination schemes \(Z^1_{\epsilon ,\zeta }\), \(Z^2_{\epsilon ,\zeta }\), and \(Z^3_{\epsilon ,\zeta }\), we denote them by \({\text {IF}}^c_k\) and \({\text {GES}}^c_k\), \(c=1,2,3\), respectively, where k denotes the number of consecutive outliers (patch length) in schemes \(Z^2_{\epsilon ,\zeta }\) and \(Z^3_{\epsilon ,\zeta }\). Whenever a sequence of consecutive outliers is mentioned in this section, we mean a sequence of observations \(y_{ it }, t=t_1,\ldots ,t_2\), that can all be considered outliers.

To approximate \(\varvec{b}_n = Bias \{\varvec{g}_n(\alpha )\}\) introduced in Sect. 2.3, we therefore need to estimate the type and amount of outliers in a given sample. Assuming that the consecutive outliers form sequences of length k and denoting the fraction of such outliers in the data by \(\epsilon _k\), the bias can be approximated using the \(\epsilon _k\)-multiple of \(|{\text {IF}}_1^1|\) or \({\text {GES}}_1^1\) if \(k=1\) and of \(\max \{ |{\text {IF}}_k^{2}|, |{\text {IF}}_k^{3}| \}\) or \(\max \{ {\text {GES}}_k^{2}, {\text {GES}}_k^{3} \}\) if \(k>1\), since we cannot reliably distinguish contaminations \(Z^2_{\epsilon ,\zeta }\) and \(Z^3_{\epsilon ,\zeta }\). Given that the outlier magnitudes cannot be reliably estimated either, the GES is preferred for estimating the bias due to contamination.

We therefore suggest computing the bias vector \(\varvec{b}_n\) in the following way, provided that the estimates \(\hat{\epsilon }_k\) of the fractions of outliers forming sequences or patches of length k are available:

$$\begin{aligned} \hat{\varvec{b}}_n = \left\{ \max _{k=1,\ldots ,T} \left[ \hat{\epsilon }_k \cdot \max _{c} {\text {GES}}^c_k(\hat{r}_{\varvec{j}};\hat{\alpha }_n^0)\right] \right\} _{\varvec{j}\in \mathcal{J}}, \end{aligned}$$
(22)

where \(\hat{\alpha }_n^0\) is an initial estimate of the parameter \(\alpha \) and the inner maximum is taken over \(c\in \{1\}\) for \(k=1\) and \(c\in \{2,3\}\) for \(k>1\). Note that if outliers (or particular types of outliers) are not present, \(\hat{\epsilon }_k = 0\) and the corresponding bias term is zero.
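A literal transcription of (22) is straightforward once the GES values have been tabulated. The sketch below is ours; the dictionary layout of the precomputed GES table and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bias_vector(eps_hat, ges_table):
    """Approximate the bias b_n per Eq. (22).

    eps_hat   : dict {k: eps_k}, estimated fractions of outlier runs of length k
    ges_table : dict {(j, c, k): GES^c_k(r_j; alpha0)}, precomputed GES values,
                with c = 1 admissible for k = 1 and c in {2, 3} for k > 1
    """
    js = sorted({j for (j, _, _) in ges_table})
    b = []
    for j in js:
        cand = []
        for k, eps_k in eps_hat.items():
            cs = (1,) if k == 1 else (2, 3)
            # eps_k-multiple of the worst GES over admissible schemes c
            cand.append(eps_k * max(ges_table[(j, c, k)] for c in cs))
        b.append(max(cand))          # outer maximum over run lengths k
    return np.array(b)
```

Note that, as in the text, an estimated fraction \(\hat{\epsilon }_k = 0\) makes the corresponding bias contribution vanish.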

To estimate \(\hat{\epsilon }_k\), an initial estimate \(\hat{\alpha }_n^0\) is needed. Once it is obtained by the DZ or AC-DZ estimator, the regression residuals \(\hat{\varepsilon }_{ it }\) can be constructed, for example, by \(\hat{u}_{ it } = y_{ it } - \hat{\alpha }_n^0 y_{it-1}\) and \(\hat{\varepsilon }_{ it } = \hat{u}_{ it } - {{\text {med}}}_{t=2,\ldots ,T} \hat{u}_{ it }\) for any \(i=1,\ldots ,n\) and \(t=2,\ldots ,T\); the median \({\text {med}}_{t=2,\ldots ,T} \hat{u}_{ it }\) is used here as an estimate of the individual effect \(\eta _i\), similarly to Bramati and Croux (2007). Having estimated the residuals \(\hat{\varepsilon }_{ it }\), the outliers are detected and the fractions \(\epsilon _k\) of outliers in the data forming patches or sequences of k consecutive outliers are computed. We consider as outliers all observations with \(|\hat{\varepsilon }_{ it } |> \gamma \hat{\sigma }_\varepsilon \), where \(\hat{\sigma }_\varepsilon \) estimates the standard deviation of \(\varepsilon _{ it }\), for example, by the median absolute deviation \(\hat{\sigma }_\varepsilon = \text{ MAD }(\hat{\varepsilon }_{ it }) / \Phi ^{-1}(3/4)\), and \(\gamma \) is a cut-off point. Although one typically uses a fixed cut-off point such as \(\gamma =2.5\), it can also be chosen in a data-adaptive way, for instance, by determining the fraction of residuals compatible with the normal distribution of the errors. This approach, pioneered by Gervini and Yohai (2002), determines the cut-off point as a quantile of the empirical distribution function \(F_n^+\) of \(|\hat{\varepsilon }_{ it }|/\hat{\sigma }_\varepsilon \):

$$\begin{aligned} \hat{\gamma }_n = \min \left\{ t{:}\,F_n^+(t)\ge 1-d_n \right\} \end{aligned}$$
(23)

for

$$\begin{aligned} d_n = \sup _{t\ge 2.5} \max \left\{ 0, F_0^+(t) - F_n^+(t) \right\} , \end{aligned}$$

where \(F_0^+(t) = \Phi (t) - \Phi (-t), t\ge 0\), denotes the distribution function of |V|, \(V \sim N(0,1)\).
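For concreteness, the residual construction and the adaptive cut-off (23) can be sketched as follows. This is our own illustration under stated assumptions (residuals are pooled across all units when computing the MAD and the empirical distribution function), not the paper's code.

```python
import numpy as np
from scipy.stats import norm

def standardized_residuals(y, alpha0):
    """Residuals of Sect. 3.4: u_it = y_it - alpha0 * y_{i,t-1}, centred by
    the per-unit median (a robust stand-in for eta_i), scaled by the MAD."""
    u = y[:, 1:] - alpha0 * y[:, :-1]
    e = u - np.median(u, axis=1, keepdims=True)
    sigma = np.median(np.abs(e - np.median(e))) / norm.ppf(0.75)
    return e / sigma

def adaptive_cutoff(e_std):
    """Data-adaptive cut-off of Eq. (23) in the spirit of Gervini and Yohai
    (2002), applied to standardized residuals e_std."""
    x = np.sort(np.abs(np.ravel(e_std)))
    n = len(x)
    Fn = np.arange(1, n + 1) / n                 # empirical cdf F_n^+
    F0 = norm.cdf(x) - norm.cdf(-x)              # cdf of |N(0,1)|
    mask = x >= 2.5
    # d_n = sup_{t >= 2.5} (F0 - F_n^+)^+, attained just below order statistics
    dn = max(0.0, (F0 - np.arange(n) / n)[mask].max()) if mask.any() else 0.0
    idx = np.searchsorted(Fn, 1.0 - dn)          # smallest t: F_n^+(t) >= 1 - d_n
    return x[min(idx, n - 1)]
```

With clean data, \(d_n\) is close to zero and the cut-off settles near the largest standardized residuals, so that (almost) no observation is flagged; gross outliers inflate \(d_n\) and push the cut-off back into the clean part of the sample.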

3.5 Algorithm

The whole procedure of bias estimation and, subsequently, of the proposed GMM estimation with robust moment selection can be summarized as follows.

  1. Obtain an initial estimate \(\hat{\alpha }_n^0\) by the DZ or AC-DZ estimator.

  2. Compute residuals \(\hat{u}_{ it } = y_{ it } - \hat{\alpha }_n^0 y_{it-1}\) and \(\hat{\varepsilon }_{ it } = \hat{u}_{ it } - {\text {med}}_{t=2,\ldots ,T} \hat{u}_{ it }\) and estimate their standard deviation \(\hat{\sigma }_\varepsilon \).

  3. Using the data-adaptive cut-off point (23), determine the fractions \(\hat{\epsilon }_k\) of outliers present in the data in the form of outlier sequences of length k.

  4. Approximate the bias \(\varvec{b}_n\) due to outliers by \(\hat{\varvec{b}}_n\) using (22) and estimate the variance matrix \(\varvec{V}_n\) in Theorem 5 by \(\hat{\varvec{V}}_n\) for all moment conditions (5) defined for indices \(\varvec{j}\in \mathcal{J}_o\).

  5. For all \(\varvec{j}=(s,s,p) \in \mathcal{J}_o\),

     (a) set \(\mathcal{J} = \{(k,k,l){:}\,1\le k \le s \text{ is odd}, 1 \le l \le p \text{ is odd}\}\);

     (b) compute the GMM estimate \(\hat{\alpha }_{n,\mathcal J}\) defined in (9) using the moment conditions selected by \(\mathcal J\) and the weighting matrix defined as the inverse of the corresponding submatrix of \(\hat{\varvec{W}}_n = \hat{\varvec{b}}_n\hat{\varvec{b}}_n' + \hat{\varvec{V}}_n\);

     (c) evaluate the criterion \( RRMSC (\mathcal{J})\) defined in (10).

  6. Select the set of moment conditions by

     $$\begin{aligned} \hat{\mathcal{J}} = \mathop {{\text {arg}}\,{\text {min}}}\limits _{\mathcal{J} \subseteq \mathcal{J}_o} RRMSC (\mathcal{J}). \end{aligned}$$

  7. The final estimate equals \(\hat{\alpha }_{n,\hat{\mathcal{J}}}\).

Let us note that the algorithm in step 5 does not evaluate the GMM estimates for all subsets of indices \(\mathcal{J} \subseteq \mathcal{J}_o\) and the corresponding moment conditions, as that would be very time-consuming. We therefore suggest limiting the number of subsets of \(\mathcal{J}_o\); one possible choice, which always includes the DZ moment condition in the estimation, is described in step 5 of the algorithm. If an extensive evaluation of many GMM estimators has to be avoided, it is possible to opt for a simple selection among the DZ, AC-DZ, and PD-DZ estimators, where PD-DZ uses all moment conditions defined by \(\mathcal{J}_o\).
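The control flow of the algorithm above can be sketched as a short driver function. This skeleton is our own illustration: every component (the DZ/AC-DZ fit, the residual and outlier-fraction steps, the bias and variance estimates, the GMM step, and the RRMSC criterion) is passed in as a placeholder callable, since the paper's estimators are not reproduced here.

```python
import numpy as np

def pd_dz_estimate(y, J_o, init_est, residuals, outlier_fracs,
                   bias_fn, var_fn, gmm_fit, rrmsc):
    """Skeleton of the algorithm in Sect. 3.5; all callables are placeholders."""
    alpha0 = init_est(y)                          # step 1: DZ or AC-DZ
    e = residuals(y, alpha0)                      # step 2: residuals
    eps_k = outlier_fracs(e)                      # step 3: fractions eps_k
    b = bias_fn(eps_k, alpha0)                    # step 4: bias ...
    V = var_fn(y, alpha0)                         # ... and variance
    W = np.outer(b, b) + V                        # W_n = b b' + V
    best = None
    for (s, _, p) in J_o:                         # step 5: nested candidate sets
        J = [(k, k, l) for k in range(1, s + 1, 2)
             for l in range(1, p + 1, 2)]         # odd k <= s, odd l <= p
        alpha_J = gmm_fit(y, J, W)                # 5(b): GMM on selected moments
        crit = rrmsc(alpha_J, J)                  # 5(c): RRMSC criterion
        if best is None or crit < best[0]:        # step 6: minimize RRMSC
            best = (crit, alpha_J)
    return best[1]                                # step 7: final estimate
```

The nesting in step 5 guarantees that the DZ condition \((1,1,1)\) belongs to every candidate set \(\mathcal{J}\).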

4 Monte Carlo simulation

In this section, we evaluate the finite-sample performance of the proposed and existing estimators by Monte Carlo simulations to see whether the proposed method can weight the moment conditions so that it picks and mimics the performance of the better estimator (e.g., out of those with fixed sets of moment conditions such as DZ and AC-DZ) for each considered data generating process.

Let \(\{y_{ it }\}\) follow model (1). We generate \(T+100\) observations for each i and discard the first 100 observations to reduce the effect of the initial observations and to achieve stationarity. We consider cases with \(\alpha =0.1,0.5,0.9\), \(n=25,50,100,200\), \(T=6,12\), \(\eta _i\sim \mathrm {N}(0,\sigma _\eta ^2)\), and \(\varepsilon _{ it }\sim \mathrm {N}(0,1)\). If data contamination is present, it follows the contamination schemes (11) and (12) for \(\epsilon =0.20\). More specifically, \(Z_{\epsilon ,\zeta }^1\) and \(Z_{\epsilon ,\zeta }^2\) (the latter used with patch length \(k=3\)) are both based on \(\zeta \) drawn for each outlier or patch of outliers randomly from U(10, 90); \(U(\cdot ,\cdot )\) denotes here the uniform distribution. Such extreme values of outliers are chosen because they are supposed to have the largest influence on the estimates—cf. Theorem 1, for instance. Note that we have also considered mixtures of two contamination schemes, for example, mixing equally the independent additive outliers and the patches of outliers, but the results are not reported as they are just convex combinations of the corresponding results obtained with only the first and only the second contamination scheme.
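A rough sketch of this data generating process is given below. This is our own code under simplifying assumptions: the exact contamination schemes (11)–(12) are not reproduced, purely additive contamination with \(\zeta \sim U(10,90)\) is assumed, and outlier runs are placed uniformly at random (so runs may occasionally overlap).

```python
import numpy as np

def simulate_panel(n, T, alpha, sigma_eta=1.0, eps=0.2, patch=None, seed=0):
    """AR(1) panel of Sect. 4: 100 burn-in periods are generated and discarded,
    then a fraction eps of observations receives additive outliers zeta ~
    U(10, 90), singly (patch=None) or in runs of the given patch length."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0, sigma_eta, n)             # fixed effects
    y = np.zeros((n, T + 100))
    for t in range(1, T + 100):
        y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(0, 1, n)
    y = y[:, 100:]                                # drop burn-in
    k = patch or 1
    n_runs = int(eps * n * T / k)                 # number of outlier runs
    for _ in range(n_runs):
        i = rng.integers(n)
        t0 = rng.integers(T - k + 1)
        y[i, t0:t0 + k] += rng.uniform(10, 90)    # additive outlier(s)
    return y
```

Setting `eps=0.0` reproduces the clean-data design of Table 1, while `patch=3` mimics the patched contamination of Table 3.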

All estimators are compared by means of the mean bias and the root mean squared error (RMSE) evaluated using 1000 replications. The included estimators are chosen as follows. The non-robust estimators are represented by the Arellano–Bond (AB) two-step GMM estimator (Arellano and Bond 1991), the system Blundell and Bond (BB) estimator (Blundell and Bond 1998), and the X-differencing (XD) estimator (Han et al. 2014). The globally robust estimators are represented by the original DZ and AC-DZ estimators and by the proposed PD-DZ estimator. For the latter, we consider two different moment selection criteria RRMSC: BIC and HQIC with \(\kappa _c = 2.1\) introduced in Sect. 2.4.

Considering the clean data first (see Table 1), most estimators exhibit small RMSEs except for the AB estimator, which is usually strongly negatively biased if \(\alpha \) is close to 1. The BB estimator performs well under these circumstances as expected, but is outperformed by the XD estimator. Regarding the robust estimators, the results are closer to each other for \(T=6\) than for \(T=12\) since there are only three possible moment conditions (5) if \(T=6\). The DZ estimator, based on the first moment condition only, lags behind AC-DZ and PD-DZ when \(\alpha \) is not close to zero; additional higher-order moment conditions thus improve estimation. The results for AC-DZ and PD-DZ are rather similar in most situations, with PD-DZ becoming relatively more precise as n increases due to less noisy moment selection. Overall, adding moment conditions improves the performance of AC-DZ and PD-DZ relative to DZ; the performance of PD-DZ is worse than that of the AB and BB estimators for \(\alpha =0.1\), matches them for \(\alpha =0.5\), and outperforms them for \(\alpha =0.9\).

Table 1 RMSE for all estimators in model with \(\varepsilon _{ it }\sim \mathrm {N}(0,1)\) and \(\eta _{i}\sim \mathrm {N}(0,1)\) under different sample sizes

Next, the two different data contamination schemes are considered: independent additive outliers and patches of additive outliers. Consider the independent additive outliers first (see Table 2); they generally bias estimates toward zero and thus lead to larger biases especially for values of \(\alpha \) close to 1. As expected, AB, BB, and XD are strongly biased in all cases. In the case of the robust estimators, the negative biases of DZ, AC-DZ, and PD-DZ are rather small, although increasing with \(\alpha \). As AC-DZ outperforms DZ in this case, PD-DZ should and does exhibit performance more similar to AC-DZ than to DZ; PD-DZ even outperforms AC-DZ for \(\alpha =0.9\) or the largest sample size. This confirms the functionality of the weighting, as the inclusion of higher-order differences with \(s>1\) in PD-DZ could otherwise lead to large biases due to independent additive outliers, especially for \(\alpha =0.9\); see Fig. 1.

Table 2 Biases and RMSE for all estimators in data with \(\varepsilon _{ it }\sim \mathrm {N}(0,1)\), \(\eta _{i}\sim \mathrm {N}(0,1)\), and 20% contamination by independent additive outliers under different sample sizes
Table 3 Biases and RMSE for all estimators in data with \(\varepsilon _{ it }\sim \mathrm {N}(0,1)\), \(\eta _{i}\sim \mathrm {N}(0,1)\), and 20% contamination by the patches of 3 additive outliers under different sample sizes

On the other hand, the higher-order differences with \(s>1\) should provide benefits when the data are contaminated by patches of additive outliers, see Table 3. This type of contamination again leads to substantially biased non-robust estimates by XD, AB, and BB. In the case of the robust estimators, the patches of outliers tend to bias them toward 1 and thus have the largest effect for \(\alpha \) close to 0. Hence, the biases of, and more generally the differences among, the robust estimators are smallest for \(\alpha =0.9\). For smaller values of \(\alpha \), DZ outperforms AC-DZ, in particular for \(\alpha =0.1\), as the patches of outliers have a larger impact on the higher-order differences of AC-DZ—see Fig. 2a. Thus, the proposed PD-DZ should and does perform similarly to DZ and actually outperforms it in most cases with \(\alpha \le 0.5\), which again confirms that the proposed weighting scheme is able to choose moment conditions that are less affected by the outliers. Note that the largest difference between DZ and PD-DZ is observed for \(T=12\) and \(\alpha =0.5\), as the higher-order differences can be used only if the number T of time periods is sufficiently large and they have a reasonable precision only if \(\alpha \) is not close to zero.

5 Concluding remarks

In this paper, we propose an extension of the median-based robust estimator for the dynamic panel data model of Dhaene and Zhu (2017) by means of multiple pairwise differences. The newly proposed GMM estimation procedure, which uses weights accounting for both the variance and the outlier-related bias of the moment conditions, is combined with a moment selection method. As a result, the estimator performs well both in non-contaminated data and in data containing either independent outliers or patches of outliers.