On robust estimation of negative binomial INARCH models

We discuss robust estimation of INARCH models for count time series, where each observation, conditionally on its past, follows a negative binomial distribution with a constant scale parameter and a conditional mean that depends linearly on previous observations. We develop several robust estimators, some of them computationally fast modifications of methods of moments, others rather efficient modifications of conditional maximum likelihood. These estimators are compared to related recent proposals using simulations. The usefulness of the proposed methods is illustrated by a real data example.


Introduction
Let Y_1, …, Y_n be a time series of counts, like the weekly number of people falling ill in epidemiology, the number of transactions per minute in finance, or the number of jobs sent to a server during an hour. Zhu [28] extends the Poisson integer-valued GARCH (more briefly called INGARCH) model, which has been put forward by [12,14], among others, to scenarios where the conditional distribution of Y_t given the past of the process exhibits overdispersion. In the arising NBINGARCH model this conditional distribution is assumed to belong to the negative binomial family,

Y_t | F_{t−1} ~ NB(r, p_t).

Here, F_{t−1} is the σ-algebra describing the information provided by the past of the process up to time t − 1, r ∈ ℕ is the constant number of successes, and p_t is the time-varying probability of success. He assumes a linear model for the inverse odds λ_t = (1 − p_t)/p_t, which is regressed on past values Y_{t−1}, …, Y_{t−p}, λ_{t−1}, …, λ_{t−q}. The negative binomial is a natural extension of the Poisson and covers a broad variety of overdispersed unimodal distributions. Zhu [28] studies Yule-Walker estimators for the parameter θ = (α_0, …, α_p)′ of NBINGARCH(p, 0) models, also called NBINARCH(p) models, as well as conditional least squares and conditional maximum likelihood estimators for the parameters of NBINGARCH(p, q) models for given values of r.
A first approach for robust fitting of NBINGARCH(p, q) models has been published by [27]. Like these authors we parameterize the negative binomial distribution in terms of the conditional mean μ_t ≥ 0 and the parameter κ = 1/r ≥ 0. The conditional mean and variance of Y_t are

E(Y_t | F_{t−1}) = μ_t and Var(Y_t | F_{t−1}) = μ_t + κ μ_t²,

respectively, i.e., κ measures the amount of overdispersion. The conditional Poisson model corresponds to the limiting special case κ = 0. An analogous parametrization has also been used by [1] in the context of generalized linear models for independent observations because of its greater numerical stability. While κ can take any positive value, r is usually restricted to be an integer. The linear model of [28] mentioned above can be expressed equivalently in terms of μ_t as

μ_t = α_0 + α_1 Y_{t−1} + ⋯ + α_p Y_{t−p} + β_1 μ_{t−1} + ⋯ + β_q μ_{t−q},

with α_0, …, α_p, β_1, …, β_q being positive parameters. The interest of [27] is in transaction counts, where very large numbers of observations are available for model fitting. Accordingly, they focus on scenarios with at least n = 1000 observations. Moreover, they usually fix the dispersion parameter κ at the true value in their simulations. Our interest is also in scenarios with only a few hundred observations, where robust joint estimation of the shape parameter κ and the regression parameters β_1, …, β_q measuring the effects of unobserved past conditional means becomes very difficult. We avoid the latter and focus on the simpler class of NBINARCH(p) models, where we regress on past observations only. NBINARCH models offer the advantage that the observed process (Y_t : t ∈ ℕ_0) forms a p-th order Markov chain, which implies some simplifications in practice. Furthermore, Xiong and Zhu recommend the MCD [24] for dealing with outlying past observations, although the MCD has been designed for multivariate elliptically symmetric continuous measurements [17]. This may eliminate observations from the estimating equations although just a single preceding value is spurious.
We instead shrink outlying past values towards an estimate of the marginal mean to avoid unnecessary eliminations.
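For concreteness, the NBINARCH(p) data-generating process described above can be simulated as follows. This is an illustrative sketch (the helper name and defaults are ours, not from the paper), drawing from numpy's negative binomial with r = 1/κ successes and success probability 1/(1 + κμ_t), which yields conditional mean μ_t and conditional variance μ_t + κμ_t².

```python
import numpy as np

def simulate_nbinarch(alpha, kappa, n, burn=200, seed=0):
    """Simulate an NBINARCH(p) series (illustrative helper, not the paper's code).

    alpha = (alpha_0, alpha_1, ..., alpha_p); the conditional mean is
    mu_t = alpha_0 + sum_i alpha_i * Y_{t-i}, and the conditional variance is
    mu_t + kappa * mu_t**2 (kappa = 0 gives the limiting Poisson INARCH case).
    """
    rng = np.random.default_rng(seed)
    a0, a = alpha[0], np.asarray(alpha[1:])
    p = len(a)
    y = np.zeros(n + burn, dtype=int)
    for t in range(p, n + burn):
        mu = a0 + a @ y[t - p:t][::-1]
        if kappa == 0:
            y[t] = rng.poisson(mu)
        else:
            r = 1.0 / kappa                      # NB "number of successes"
            prob = 1.0 / (1.0 + kappa * mu)      # NB success probability
            y[t] = rng.negative_binomial(r, prob)
    return y[burn:]

y = simulate_nbinarch((0.55, 0.4), kappa=0.3, n=5000)
```

A burn-in period is discarded so that the retained stretch is approximately stationary.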
The rest of the paper is organized as follows. In Sect. 2, we investigate robust versions of method of moment estimators, which are useful to get a first idea about a suitable model and its order p. In Sect. 3, we discuss robustifications of conditional maximum likelihood estimators for NBINARCH models. Section 4 illustrates the methods by a data example and Sect. 5 concludes.

Estimation derived from method of moments
Moment type estimators are computationally tractable even in case of high model orders if the moments are estimated individually, without the need for multidimensional optimization. A possible drawback arises if the parameters of interest are nonlinear transforms of the moments used, since estimation errors can then be inflated substantially by this transformation. Nevertheless, moment type estimators are convenient for getting a first idea about suitable model orders and for the initialization of more complex estimators like those discussed in the next section.
Here and in the following, we make use of popular ψ-functions as proposed by Huber and by Tukey. The Huber ψ-function compromises between least squares and least absolute deviations,

ψ_c(x) = max(−c, min(x, c)),

and Tukey's biweight ψ-function is

ψ_c(x) = x (1 − (x/c)²)² I_{[−c,c]}(x),

where I_A is the indicator function of a set A. The ψ-functions are applied to standardized values (y − μ)/σ. The tuning constant c regulates the robustness and the efficiency of the estimators. For both these ψ-functions, larger values of c increase the efficiency but reduce the robustness to outliers. The Poisson and the negative binomial distributions do not form location-scale families and their tail behaviour depends on the parameters, so that suitable choices of the tuning constant in principle depend on the parameters. For our calculations we need the formula for the conditional probabilities corresponding to the negative binomial distribution in terms of μ_t and κ, which is

P(Y_t = y | F_{t−1}) = Γ(y + 1/κ) / (Γ(y + 1) Γ(1/κ)) · (1/(1 + κμ_t))^{1/κ} · (κμ_t/(1 + κμ_t))^y, y ∈ ℕ_0.
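In code, these two ψ-functions and the negative binomial probabilities read as follows (a sketch; the pmf works on the log scale via the log-gamma function for numerical stability):

```python
import numpy as np
from math import lgamma, log, exp

def psi_huber(x, c):
    """Huber's psi: identity near zero, clipped at +/- c."""
    return np.clip(x, -c, c)

def psi_tukey(x, c):
    """Tukey's biweight psi: redescends to zero for |x| > c."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= c, x * (1.0 - (x / c) ** 2) ** 2, 0.0)

def nb_pmf(y, mu, kappa):
    """P(Y_t = y | F_{t-1}) for mean mu and dispersion kappa (Var = mu + kappa*mu^2)."""
    r = 1.0 / kappa
    logp = (lgamma(y + r) - lgamma(y + 1) - lgamma(r)
            + r * log(1.0 / (1.0 + kappa * mu))
            + y * log(kappa * mu / (1.0 + kappa * mu)))
    return exp(logp)
```

Note that 1/(1 + κμ) equals r/(r + μ), so this is the usual NB(r, p) pmf rewritten in the (μ, κ) parametrization.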

Conditions for mean and second-order stationarity
Many of the formulae of [28] simplify if the parametrization in terms of κ and the regression parameters θ = (α_0, …, α_p)′ for the conditional mean is used. The necessary and sufficient condition for the NBINARCH(p) model to be stationary in the mean stated in his Theorem 1 then becomes that all roots of the equation 1 − α_1 z^{−1} − ⋯ − α_p z^{−p} = 0 lie inside the unit circle. Under this assumption, the marginal mean is

μ = α_0/(1 − α_1 − ⋯ − α_p),

as in the limiting conditional Poisson case. The additional condition for second-order stationarity is that all roots of 1 − C_1 z^{−1} − ⋯ − C_p z^{−p} = 0 lie inside the unit circle, where the coefficients C_1, …, C_p are determined by α_1, …, α_p and κ through the elements b_vu of the inverse of the matrix (β_vu). The above conditions simplify if p is small. An NBINARCH(2) model is mean stationary iff α_1 + α_2 < 1, and it is second-order stationary iff an additional inequality involving α_1, α_2 and κ holds. The restrictions on α_1 and α_2 to achieve second-order stationarity are thus more stringent for larger values of κ.
In the simplest interesting case, the second-order stationary NBINARCH(1) model, the marginal mean, variance and autocorrelations are

μ = α_0/(1 − α_1), σ_Y² = (μ + κμ²)/(1 − (1 + κ)α_1²), ρ_Y(h) = α_1^h, h ∈ ℕ,

imposing a tendency to overestimate κ if the autocorrelation is neglected.
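These moments can be checked by a quick Monte Carlo experiment. The sketch below assumes the formulas μ = α_0/(1 − α_1), σ_Y² = (μ + κμ²)/(1 − (1 + κ)α_1²) and ρ_Y(h) = α_1^h, which follow from the conditional moment structure of the model; parameter values and sample size are ours.

```python
import numpy as np

# NBINARCH(1) with a moderate overdispersion, simulated and compared
# against the stationary moment formulas stated in the text.
a0, a1, kappa = 0.55, 0.4, 0.3
rng = np.random.default_rng(1)
n, burn = 100_000, 500
y = np.zeros(n + burn)
for t in range(1, n + burn):
    mu_t = a0 + a1 * y[t - 1]
    # NB draw with conditional mean mu_t and variance mu_t + kappa*mu_t^2
    y[t] = rng.negative_binomial(1.0 / kappa, 1.0 / (1.0 + kappa * mu_t))
y = y[burn:]

mu = a0 / (1 - a1)                                          # marginal mean
var = (mu + kappa * mu ** 2) / (1 - (1 + kappa) * a1 ** 2)  # marginal variance
rho1_hat = np.corrcoef(y[:-1], y[1:])[0, 1]                 # empirical rho_Y(1)
```

With 100,000 observations the empirical mean, variance and lag-one autocorrelation should land close to μ, σ_Y² and α_1.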

Definition of the estimators
Zhu [28] studies Yule-Walker (YW) estimation of α_1, …, α_p based on the ordinary sample autocovariances in NBINARCH(p) models. He suggests combining them with a method of moments (MoM) estimator of α_0 based on the formula (3) for the marginal mean μ. A moment type estimator of κ [4,8,19] can be obtained by solving

Σ_{t=p+1}^n (Y_t − μ̂_t)² / σ̂_t² = n − p − (p + 1),   (5)

where p + 1 is the dimension of θ. Remember that σ_t² = μ_t + κμ_t². We can use the autocorrelations (ρ_Y(h), h ∈ ℕ) instead of the autocovariance function to write down the Yule-Walker estimators. We plug the sample autocorrelations ρ̂_Y(1), …, ρ̂_Y(p) into the equations (4) for ρ_Y(1), …, ρ_Y(p) and solve for the unknown parameters α_1, …, α_p, imposing the basic positivity restriction. A suitable model order p can be identified from the partial autocorrelation function, as it is zero for all lags h > p. The partial autocorrelations can be estimated from estimates of the ordinary autocorrelations using the Durbin-Levinson algorithm, setting negative estimates to 0 because of the positivity restrictions.
Suitable score transforms can improve the asymptotic efficiencies of rank based autocorrelation estimators in case of autoregressive models with continuous innovation densities; see e.g. [16]. Let R_t be the rank of Y_t and J a score function, which is often chosen as the quantile function F^{−1} of a suitable distribution function F. Using this notation, the concept of empirical autocorrelation can be applied to the transformed ranks J(R_t/(n + 1)), t = 1, …, n. The negative binomial distribution functions F_{μ,κ} do not form a location-scale family and there is no linear relationship between the quantiles of different negative binomials. A suitable score function will thus depend on μ and κ, which will be unknown in practice. We investigate rank estimators with negative binomial (NB) scores, applying a stepwise approach: for initialization we calculate estimates μ̂, κ̂ of the negative binomial parameters under the simplifying assumption of observing independent data. Then we estimate α_1, …, α_p from ranks transformed by the quantile function J = F^{−1}_{μ̂,κ̂}. Thereafter we use these NB scores estimators to estimate α_0 and κ as described below. The process can be iterated using the new estimates of μ and κ to improve on the estimates of α_1, …, α_p.
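The rank machinery used here can be sketched as follows: midranks handle the many ties that are typical of count data, and the Durbin-Levinson recursion turns (estimated) autocorrelations into partial autocorrelations. The helper names are ours.

```python
import numpy as np

def midranks(y):
    """Average ranks (1-based), handling ties as needed for count data."""
    y = np.asarray(y)
    order = np.argsort(y, kind="stable")
    ranks = np.empty(len(y))
    sy = y[order]
    i = 0
    while i < len(y):
        j = i
        while j + 1 < len(y) and sy[j + 1] == sy[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average rank of the tie group
        i = j + 1
    return ranks

def rank_acf(y, max_lag):
    """Spearman rank autocorrelations rho_hat(1..max_lag)."""
    r = midranks(y)
    r = (r - r.mean()) / r.std()
    n = len(y)
    return np.array([np.mean(r[:n - h] * r[h:]) for h in range(1, max_lag + 1)])

def pacf_from_acf(rho):
    """Durbin-Levinson recursion: partial autocorrelations from rho(1..m)."""
    m = len(rho)
    phi = np.zeros((m + 1, m + 1))
    pacf = np.zeros(m)
    for k in range(1, m + 1):
        if k == 1:
            phi[1, 1] = rho[0]
        else:
            num = rho[k - 1] - np.sum(phi[k - 1, 1:k] * rho[k - 2::-1])
            den = 1.0 - np.sum(phi[k - 1, 1:k] * rho[:k - 1])
            phi[k, k] = num / den
            for j in range(1, k):
                phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        pacf[k - 1] = phi[k, k]
    return pacf
```

For an NBINARCH(1) fit, the YW step then reduces to α̂_1 = max(ρ̂_Y(1), 0), in line with the positivity restriction mentioned above.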
A robust method of moments type estimator of α_0 can be derived by estimating the marginal mean μ for a given value of κ, modifying the Huber or the Tukey M-estimator for the Poisson distribution investigated by [5,11], respectively. Setting σ² = μ + κμ², we can estimate μ by solving

(1/n) Σ_{t=1}^n ψ_c((Y_t − μ)/σ) = a,   (6)

where a = a(μ, κ) = Eψ_c((Y_1 − μ)/σ) is a bias correction to achieve Fisher consistency. In our calculations, we fix the value of κ at an initial estimate and ignore the effects of the temporal dependence in this part of the algorithm for simplicity. Elsaied and Fried [11] suggest the tuning constants c = 1.8 for the Huber and c = 5.5 for the Tukey M-estimator in case of independent Poisson data. Since negative binomial distributions are more heavy-tailed than the Poisson, we choose larger tuning constants c ≥ 2 and c ≥ 6, respectively. Moreover, we suggest a robustification of the moment type estimator of κ based on equation (5), applying a robust ψ-function and solving

(1/(n − p − (p + 1))) Σ_{t=p+1}^n ψ_c((Y_t − μ_t)/σ_t)² = d(n, p),   (7)

where d = d(n, p) is the expectation of the term on the left hand side. Since our interest in this equation is in a robust estimate of the amount of overdispersion κ, a rather large tuning constant like c = 10 can be chosen for the Tukey ψ-function here to restrict the resulting negative bias, setting d(n, p) = 1, which is its limiting value as c → ∞. We combine the ideas presented above as follows, initializing with an a-priori guess of κ (e.g. κ = 0) and estimating α_1, …, α_p by plugging the first p Spearman rank autocorrelations into the YW equations. In steps 1 and 3 we use the R (R Core Team [21]) function optimise to minimize the squared difference between the terms on the left and the right hand side of the corresponding estimation equation.

0. Estimate α_1, …, α_p using the YW equations and the first p Spearman rank autocorrelations.
1. Estimate μ from (6) using the current estimate of κ.
2. Estimate α_0 from (3) using the current estimates of μ and α_1, …, α_p.
3. Estimate κ from (7) using the current estimate of θ = (α_0, …, α_p)′.
4. (Optional) Estimate α_1, …, α_p using the YW equations by plugging in the NB score transforms of the first p Spearman rank autocorrelations based on the current estimates of μ and κ.
Steps 1.-4. can be iterated until some convergence criterion is met. We found the changes of the parameter estimates to be very small after the second iteration. We call the estimator obtained after the first iteration the 1-step NB scores estimator. The estimators of μ and κ corresponding to (6) and (7), respectively, can be shown to be consistent at least for known values of the other parameters, using similar arguments as for the joint estimator of all model parameters presented in the next section. In the next subsection we will inspect the performance of the estimators presented here via simulations.
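Step 1 of the scheme above can be sketched as follows, mimicking the one-dimensional use of `optimise` by a grid search: we solve a version of Eq. (6) with Tukey's ψ for i.i.d. data, computing the Fisher-consistency correction a(μ, κ) numerically under the NB model. The function names, the grid and the truncation point are ours.

```python
import numpy as np
from math import lgamma, log, exp

def psi_tukey(x, c=6.0):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= c, x * (1.0 - (x / c) ** 2) ** 2, 0.0)

def nb_pmf(y, mu, kappa):
    r = 1.0 / kappa
    return exp(lgamma(y + r) - lgamma(y + 1) - lgamma(r)
               + r * log(1.0 / (1.0 + kappa * mu))
               + y * log(kappa * mu / (1.0 + kappa * mu)))

def fisher_correction(mu, kappa, c=6.0, y_max=200):
    """a(mu, kappa) = E psi_c((Y - mu)/sigma) under NB(mu, kappa), truncated sum."""
    sigma = np.sqrt(mu + kappa * mu ** 2)
    ys = np.arange(y_max + 1)
    probs = np.array([nb_pmf(int(v), mu, kappa) for v in ys])
    return float(np.sum(psi_tukey((ys - mu) / sigma, c) * probs))

def estimate_mu(y, kappa, c=6.0, grid=np.linspace(0.5, 5.0, 226)):
    """Tukey M-estimate of the marginal mean for fixed kappa (grid version of step 1):
    minimize the squared difference between the two sides of Eq. (6)."""
    y = np.asarray(y, dtype=float)
    best, best_val = grid[0], np.inf
    for mu in grid:
        sigma = np.sqrt(mu + kappa * mu ** 2)
        lhs = np.mean(psi_tukey((y - mu) / sigma, c))
        val = (lhs - fisher_correction(mu, kappa, c)) ** 2
        if val < best_val:
            best, best_val = mu, val
    return best
```

By construction the estimating equation is satisfied at the true μ, so for simulated i.i.d. NB data the grid minimizer should land near the true mean.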

Simulations
We evaluate the performance of the estimators derived from methods of moments, applying them to time series of length n = 200 and n = 500 from several NBINARCH(1) and NBINARCH(2) models. We consider eight first order models with parameters (α_0, α_1, κ) chosen as follows. The two models with κ = 0 correspond to Poisson INARCH models with low or high lag-one autocorrelation, α_1 = 0.2 or α_1 = 0.7. Two other models with κ ∈ {0.2, 0.3} correspond to moderate deviations from the Poisson case, again one of them with low and one with high lag-one autocorrelation. Four other models with κ ∈ {0.7, 0.8} correspond to substantial deviations from the Poisson case, two of them with low and two with high lag-one autocorrelation, combined with a small or a moderately large intercept α_0 ∈ {0.55, 1.5}. Varying these models, we also consider eight second order models with parameters (α_0, α_1, α_2, κ) chosen analogously. We consider time series of length n ∈ {200, 500} and four different data scenarios for each of these models, namely clean time series without outliers, time series with 5% additive outliers of size either 4 marginal standard deviations or 8 marginal standard deviations, rounded to the nearest integer, at time points chosen independently from a uniform distribution, and time series with 5% additive outliers occurring as a block at the end of the time series. The latter scenario resembles the onset of a level shift.
For each of these 128 combinations of an NBINARCH model, an outlier scenario and a sample size, we generate 1000 time series and fit NBINARCH(2) models. From the derived estimates we calculate the empirical mean square errors (MSE) for the estimation of each of the parameters α_0, α_1, α_2 and κ, and take the efficiency (as measured by the ratio of the MSEs) relative to the estimator with the smallest MSE for the respective scenario among those considered here. Then we calculate summary measures like the average mean square error or the median relative efficiency of an estimator over sets of data scenarios. Table 1 summarizes the simulation results for the estimation of α_1 and α_2 by showing the average root of the mean square error and the median relative efficiencies separately for the scenarios without outliers, with isolated additive outliers of size 4σ or 8σ, or with a patch of outliers. We set the tuning constant of the Tukey score function to c = 6 if not stated otherwise. Results for c = 7 are quite similar and not shown here.
The YW estimators based on the ordinary sample autocorrelations offer little advantage as compared to the rank based estimators considered here in terms of efficiency in clean data scenarios, and they are usually inferior in the outlier scenarios. Transforming the ranks by the NB scores improves the efficiency of the rank based YW estimators in case of clean data, but reduces their robustness against outliers to some extent. Similarly, the 1-step NB scores estimator shows somewhat better efficiencies in case of clean data but somewhat larger MSE in case of outliers as compared to Spearman's rank autocorrelation. This difference appears to be smaller for the larger sample size n = 500.
The significance of the coefficients of the sample partial autocorrelation function is commonly checked by comparing their absolute values to the 97.5% percentile of the standard normal distribution divided by the square root of the sample size. We apply this rule to the different estimators of α_2 and calculate the percentage of time series for which the estimate of α_2 is found to be significant, separately for first and second order models. This gives us an idea of the size and the power of such tests based on the different estimators; see Table 1. The rank based tests work similarly well as the one based on the ordinary sample autocorrelations in case of clean data, and better in the presence of outliers. Spearman's rank autocorrelation can be recommended as it works well in all outlier scenarios considered here if n = 200, and it shows the least size distortions in case of n = 500 with outliers.

Table 2 shows the results for the estimation of α_0 and κ. Combining the Tukey M-estimator of the marginal mean with Spearman's rank autocorrelation leads to the most stable results for α_0 among the methods considered here, followed by the estimators based on the NB scores transformed ranks. Increasing the tuning constant to c = 7 (not shown here) improved the efficiency for α_0 in case of clean data and patchy outliers, but increased the MSE in case of isolated outliers. The iterated NB scores estimator gives the best results for the estimation of κ. Starting the iterated estimator from Spearman's rank autocorrelation (not shown here) leads to rather similar results.
These findings indicate that the partial autocorrelation estimates arising from Spearman's rank autocorrelation can be used to get a first idea about a suitable NBINARCH model order p even in the presence of some outlier contamination. Simple yet quite robust estimates of the autoregressive parameters α_1, …, α_p can then be obtained by applying the Yule-Walker equations to the first p Spearman rank autocorrelations, and of α_0 by combining these estimates α̂_1, …, α̂_p with Tukey's M-estimator of the marginal mean using a tuning constant c ≥ 6. Finally, κ can be estimated by solving Eq. (7) using the estimates μ̂_t = α̂_0 + α̂_1 y_{t−1} + ⋯ + α̂_p y_{t−p} and σ̂_t² = μ̂_t + κμ̂_t².

Joint M-estimation
In Sect. 2 we have seen that rank based autocorrelation estimators perform similarly well as ordinary sample autocorrelations in terms of efficiency in NBINARCH models, while being more robust. Now we investigate whether the efficiency and robustness of these simple estimators can be improved further if we robustify the conditional maximum likelihood estimators for a given model order p. We expect substantial improvements in particular concerning the estimation of α_0, as the simple method of moments type estimator μ̂/(1 − α̂_1 − ⋯ − α̂_p) suffers from the non-linear effects of estimation errors in α̂_1, …, α̂_p, especially if the sum of these parameters is close to 1. This can possibly be improved by joint estimation of α_0, α_1, …, α_p as considered in the following, but at the price of larger computational costs. We focus on first order models for reasons of simplicity. After reviewing conditional maximum likelihood estimation, we summarize the estimation approach of [27] before explaining ours. Then we compare the estimators in a simulation study, including also some of the estimators which showed the best performance in Sect. 2.

M-estimation using multivariate outlyingness
Xiong and Zhu [27] robustify the first of the score equations of the CML estimator using the Pearson residuals r_t = (y_t − μ_t)/σ_t and Mallows' quasi-likelihood estimator proposed by Cantoni and Ronchetti (2001), in which the contribution of each observation is weighted and a conditional expectation is subtracted as a bias correction to achieve Fisher consistency. They consider weights w_t = 1 − h_{t,t} based on the diagonal elements h_{t,t} of the hat matrix H = X(X′X)^{−1}X′, with X = (X_1, …, X_{n−p})′, as well as hard rejection weights w_t = I(D_t² ≤ χ_p²(0.95)) based on the squared Mahalanobis distances D_t² measuring multivariate outlyingness.
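The soft rejection weights based on the hat matrix are straightforward to compute. The sketch below assumes the design rows X_t collect an intercept and the lagged counts; the helper names and the 0-based indexing convention are ours.

```python
import numpy as np

def lagged_design(y, p):
    """Rows X_t = (1, y_{t-1}, ..., y_{t-p}) for t = p, ..., n-1 (our convention)."""
    y = np.asarray(y, dtype=float)
    cols = [np.ones(len(y) - p)] + [y[p - i:len(y) - i] for i in range(1, p + 1)]
    return np.column_stack(cols)

def hat_weights(X):
    """Soft rejection weights w_t = 1 - h_tt, where h_tt are the diagonal
    elements of the hat matrix H = X (X'X)^{-1} X'."""
    X = np.asarray(X, dtype=float)
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return 1.0 - np.diag(H)
```

Since trace(H) equals the number of columns of X, the weights sum to n minus the number of regressors, and high-leverage rows (large h_tt) receive small weights.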
For the estimation of κ, they suggest the weighted maximum likelihood estimator proposed by [1], using an estimating equation with another Fisher consistency correction b_t(κ) = E(S_{t,κ} w_t ψ(r_t)/r_t | F_{t−1}). Xiong and Zhu [27] perform a simulation study for time series consisting of n = 1000 observations without outliers, with up to 40 isolated outliers, or with a patch of 52 subsequent outliers. Based on this they recommend hard rejection weights using the minimum covariance determinant (MCD) estimator [25] in combination with Tukey's ψ-function for both the estimation of θ and of κ, using the tuning constants c_1 = 7 and c_2 = 6, respectively.
Using multivariate outlyingness allows taking positive correlations among the regressors into account, but it may downweight the contribution of an observation substantially although just one of its regressor components looks suspicious; see the R package cellWise [22] and the references cited therein for related discussions. Moreover, the MCD has been designed for (continuous) unimodal elliptically symmetric data, not for discrete data, see e.g. [17]. Indeed, methods based on subsets of the observations like the MCD may have problems with many identical values, as they can occur in count data; see Duerre et al. (2015) for a discussion in the context of robust autocorrelation estimation. These are possible drawbacks of the method by Xiong and Zhu.

M-estimation using componentwise shrinking
We propose another variant of robust M-estimation for NBINARCH(p) models, combining the M-estimation approaches developed by [10] for the limiting Poisson INARCH model and by [1] for the negative binomial regression model. We robustify the CML estimator along the same lines as [10] in the Poisson case, applying M-estimation to the Pearson residuals r_t = (y_t − μ_t)/σ_t for t = p + 1, …, n. Hereby, we use bias corrections c_{t,0}(ω), …, c_{t,p}(ω) to achieve Fisher consistency. While [27] downweight the whole vector S_{t,θ}(ω) in the score equation according to a measure of the multivariate outlyingness of the corresponding regressor values, including the component for the intercept, we shrink each regressor towards the center of its distribution. Note that replacing c_{t,i}(ω) by its expectation c_i(ω) = E(c_{t,i}(ω)) is simpler and asymptotically equivalent because of the general ergodic theorem [18] and Slutsky's Lemma.
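The componentwise shrinking can be sketched as follows: each lagged count is pulled toward an estimate of the marginal mean before entering the score equation. This sketch uses a Huber-type clamp; the exact shrinkage function used in the paper may differ in detail.

```python
import numpy as np

def shrink_toward_mean(y, mu_hat, sd_hat, c=2.0):
    """Shrink outlying values toward the marginal mean estimate mu_hat:
    values within c estimated standard deviations are left unchanged,
    values outside are clamped (componentwise, one value at a time)."""
    z = (np.asarray(y, dtype=float) - mu_hat) / sd_hat
    return mu_hat + sd_hat * np.clip(z, -c, c)
```

In an NBINARCH(1) score equation the regressor y_{t−1} would be replaced by its shrunken counterpart, so that a single spurious past value cannot dominate the estimating equations while clean observations enter unchanged, in contrast to the all-or-nothing downweighting of whole regressor vectors.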
For the estimation of κ we also make use of the M-estimator of [1]. To obtain starting values for an iterative scheme to solve these equations, [1] recommend plugging the estimates resulting from Poisson M-estimation into the equation for κ and then iterating. Alternatively, we can use the estimator of κ proposed in the previous section.
In the Appendix we prove the consistency of the joint estimator ω̂_n = (θ̂_n, κ̂_n)′ and conjecture its asymptotic normality,

√n (ω̂_n − ω^(0)) →_d N(0, A(ω^(0))^{−1} B(ω^(0)) A(ω^(0))^{−1}),

with score contributions s_t(y_t, ω) = (s_{t,0}(y_t, ω), …, s_{t,p}(y_t, ω))′ and ω^(0) = (θ^(0), κ^(0))′ being the true parameter vector. The matrices A(ω^(0)) and B(ω^(0)) can be estimated using their empirical counterparts as in [10], replacing the unknown parameters by their robust estimates and the expectations by the corresponding averages across the realizations; the arising estimate inherits some robustness when using a bounded ψ-function with a bounded derivative.

Simulations
We perform some simulations to compare the performance of the CML estimator, the robust method of moments estimator based on Spearman's rank autocorrelation combined with the Tukey M-estimator of the marginal mean, the 1-step NB scores transformed rank estimator, as well as our Tukey M-estimator and the M-estimators of [27] using the hat matrix or the MCD. We consider the performance in scenarios without outliers as well as with an increasing number of isolated or patchy additive outliers of the same size. We focus on scenarios similar to those inspected by [27] with time series of length n = 1000. This is in part due to problems when running their algorithms with the recommended MCD for shorter series lengths n, presumably caused by the discreteness of the data. We applied several estimators for the initialization of our joint M-estimator, namely the estimators presented in Sect. 2, in particular that using the ordinary Spearman rank autocorrelations, and the robust Poisson INARCH estimator of [10] assuming κ = 0. We can combine these possibilities to check for the possibility of multiple roots. The solutions obtained by the different initializations in the following simulations can usually be regarded as identical.

Scenarios without outliers
First we inspect the efficiency in case of data without outliers, generating 1000 time series of length n = 1000 from an NBINARCH(1) model with parameter ω = (0.55, 0.4, 0.3)′, see Fig. 1. The initial method of moments estimators achieve about 50% efficiency for α_1 and about 30% efficiency for α_0. The 1-step NB scores rank estimator improves on this, as its efficiencies are about 90% for α_1 and 60% for α_0. The estimator of [27] with a recommended tuning constant c of about 6 or 7 achieves an efficiency of more than 70% using hard rejection based on the MCD and even about 80% efficiency when using soft rejection based on the hat matrix. Our estimator shows a stronger dependence on c. It achieves high efficiencies of about 90% when using c = 10, like Xiong and Zhu's method, while for c = 7 it is somewhat less efficient than their estimator. Figure 2 compares normal qq-plots of our joint M-estimator with c = 10 and of the conditional maximum likelihood estimator in case of time series of length n = 200, n = 500 or n = 1000. The finite-sample distributions of both estimators are right-skewed for α_0 if n = 200 but reasonably close to normality for the larger sample sizes, with little difference between the estimators.

Scenarios with an increasing number of isolated outliers
Now we analyze the behavior of the estimators in the presence of outliers. In a first experiment we generate a time series of length n = 300 from an NBINARCH(1) model with parameters α_0 = 0.55, α_1 = 0.4 and κ = 0.3 and include a single additive outlier of increasing size. Figure 3 shows the arising sensitivity curves of several estimators for the estimation of α_0 and α_1, corresponding to the difference between the estimated value with and without this outlier, multiplied by n. Except for the CMLE, the estimators show bounded sensitivities, with the effects being largest for the 1-step NB score estimator, followed by our joint M-estimator.
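The sensitivity curve just described is easy to compute for any estimator T; the sketch below follows the definition given in the text, with T as a placeholder for, e.g., a lag-one autocorrelation estimator or a full model fit.

```python
import numpy as np

def sensitivity_curve(y, T, pos, sizes):
    """n * (T(y with an additive outlier of the given size at pos) - T(y)),
    following the definition of the sensitivity curve in the text."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    base = T(y)
    out = []
    for s in sizes:
        yc = y.copy()
        yc[pos] += s          # additive outlier of size s
        out.append(n * (T(yc) - base))
    return np.array(out)
```

A handy correctness check: for T equal to the sample mean the curve is exactly the identity, SC(s) = s.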
Next we include an increasing number of 4, 8, …, 40 isolated additive outliers of size 5 or 10 at positions chosen at random, analyzing 500 time series for each scenario. Figure 4 depicts the resulting biases as a function of the number of outliers. The lack of robustness of the CMLE and the Poisson quasi-likelihood estimator manifests itself in an increasingly large positive (negative) bias for α_0 (α_1), particularly for the larger outlier size. Spearman's rank autocorrelation is negatively biased for α_1, and the resulting estimator for α_0 thus positively biased, which explains its moderately large efficiency for large sample sizes observed before. The 1-step NB score transform estimator reduces this bias in case of clean data but is more affected by the outliers. Our Tukey M-estimator shows some bias for the estimation of α_1 in case of the smaller outlier size considered here, but performs better than the other estimators otherwise, except for the M-estimator with hard rejection based on the MCD [27]. These two estimators perform similarly well, although we have chosen a larger tuning constant c = 10 for our estimator to obtain a high efficiency in case of clean data, instead of the c = 6 recommended by Xiong and Zhu for their estimator. Qualitatively similar results have been obtained for time series with parameters α_0 = 1, α_1 = 0.6 and κ = 0.3 (not shown here).

Scenarios with a patch of additive outliers
We also investigate scenarios with a patch of consecutive additive outliers of the same size, which can arise for instance because of a temporary disturbance or a temporary level shift. Figure 5 depicts the results obtained for an increasing number of 4, 8, …, 52 additive outliers of identical size 5 or 10 starting at time point 250 in NBINARCH(1) time series with parameters α_0 = 0.55, α_1 = 0.4, κ = 0.3. A patch of consecutive outliers imposes a positive bias on non-robust estimators of α_1, which can then lead to a negative bias for the estimation of α_0. This can be seen for the 1-step NB scores rank estimator, which shows a similarly strong bias for α_1 as the QMLE and the CMLE here. Estimation using Spearman's rank autocorrelation performs better than this in terms of bias, and similar to our Tukey M-estimator with a tuning constant c = 10, which is large relative to the height of the shift. Application of the latter with a smaller tuning constant c = 6 resists these outliers better and similarly well as the M-estimator with hard rejection proposed by [27] in the scenarios considered here. The version of their estimator applying soft rejection based on the hat matrix shows considerable problems for the estimation of the intercept here, and this also applies to the version using the MCD in case of the smaller outlier size.

Data example
For illustration we analyze the number of campylobacterosis infections from January 1990 to the end of October 2000 in the north of the Province of Québec, Canada. There is one observation every 4 weeks, that is, 13 observations per year. [12] used this data set, shown in Fig. 6, to exemplify the INGARCH model with a conditional Poisson distribution. Under the same basic model assumption, [13] found a possible level shift at time t = 84 and a so-called spiky outlier at time t = 100 by applying an iterative procedure for the detection and elimination of different types of intervention effects. Later on, [20] found evidence that the conditional distribution might be better described by a negative binomial than by a Poisson distribution, albeit their estimate of the overdispersion parameter was small (κ ≈ 0.0297). In consequence, the findings of [13] might have been influenced by using the wrong conditional distribution, and in turn the findings of [20] might be in part due to extraordinary effects in these data, which might have been estimated incorrectly or even been missed. We re-analyze these data using robust methods to get additional insights. Further work is needed to include structural changes or covariate information into the methods presented in Sect. 3. We instead concentrate on the methods presented in Sect. 2.2, as they can be modified easily to account for a level shift:

1. The likelihood estimate of α_1 from the full data set leads to a much larger value than this, namely about 0.64, which is just within the range of a 95% confidence interval based on the first six years.
2. Second we fit an INARCH(1) model with a level shift after t = 84 to the data, i.e., we allow for a change of the intercept α_0. In order to make use of the full data set in spite of a possible level shift, we split the time series into several non-overlapping subsequences and estimate the parameters of interest from each of them separately.
This approach has been analysed by [23] for estimation of the Hurst parameter and by [3] for estimation of the variance. Here, we split the time series into five subsequences of length 28 each, which fits well to the total sample size of 140 and also to a possible level shift right at the end of the third subsequence. Due to the robustness properties of ranks we expect the effect of a single spiky outlier on the estimate to be small, and we take the average of the estimates obtained from the five subsequences as our final estimate. The average lag one Spearman rank autocorrelation is α̂_1 = 0.368, which is again large as compared to the asymptotic standard error 0.085 under white noise and very close to the value 0.369 obtained by [20] using an iterative procedure for detecting and modelling different types of intervention effects.
Our robust method of moment type estimator κ̂ of κ is obviously biased, with a bias that is larger for smaller tuning constants c. The smaller estimate obtained for c = 10 might thus be due to a negative bias, but the difference from the less robust estimates could also be due to outlier effects. To correct the bias of κ̂ we can multiply it by a correction factor f_c(θ, κ), which will likely depend on the true parameters θ = (α0, α1) and κ as well. To derive suitable finite sample correction factors we generate 1000 NBINARCH(1) time series with the same length n = 140, the same parameters and the same level shift as found for the real data, and estimate the value of κ for each of them. Then we set f_c(θ, κ̂) to the ratio between the value of κ from which these artificial data sets have been generated and the average of these estimates. Multiplying κ̂ by this factor gives a new bias-reduced estimate. Since f_c(θ, κ̂) only approximates the factor f_c(θ, κ) needed, we can iterate this process until it stabilizes after a few steps. This leads us to an estimate of about 0.04, a little larger than those reported before but well within the bootstrap CI.
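The iterative simulation-based bias correction can be sketched as follows. Everything here is an assumption-laden stand-in: `kappa_moment` is a simple moment estimator of κ (not the paper's robust estimator), the level shift is omitted, and the NBINARCH(1) parameterization Var = μ + κμ² is the one used throughout the paper. The iteration itself mirrors the text: simulate from the current corrected value, compare the Monte Carlo average of the estimates with the generating value, and rescale κ̂.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_nbinarch1(alpha0, alpha1, kappa, n):
    """Simulate an NBINARCH(1) series: Y_t | past is negative binomial
    with mean mu_t = alpha0 + alpha1 * Y_{t-1} and Var = mu_t + kappa*mu_t^2."""
    y = np.empty(n, dtype=int)
    y[0] = rng.poisson(alpha0 / (1 - alpha1))  # rough stationary-mean start
    for t in range(1, n):
        mu = alpha0 + alpha1 * y[t - 1]
        r = 1.0 / kappa          # number of successes
        p = r / (r + mu)         # success probability, so that E = mu
        y[t] = rng.negative_binomial(r, p)
    return y

def kappa_moment(y, alpha0, alpha1):
    """Stand-in moment estimator of kappa: matches Var = mu + kappa*mu^2
    using the conditional means implied by (alpha0, alpha1)."""
    y = np.asarray(y, dtype=float)
    mu = alpha0 + alpha1 * y[:-1]
    resid2 = (y[1:] - mu) ** 2
    return float(np.sum(resid2 - mu) / np.sum(mu ** 2))

def bias_correct(kappa_hat, estimator, theta, n, reps=200, iters=3):
    """Iterative Monte Carlo bias correction: multiply kappa_hat by the
    ratio of the generating kappa to the average simulated estimate."""
    kappa_c = kappa_hat
    for _ in range(iters):
        sims = [estimator(simulate_nbinarch1(theta[0], theta[1], kappa_c, n))
                for _ in range(reps)]
        mean_est = np.mean(sims)
        if mean_est <= 0:        # guard against a degenerate Monte Carlo batch
            break
        kappa_c = (kappa_c / mean_est) * kappa_hat
    return kappa_c
```

A call such as `bias_correct(0.25, lambda s: kappa_moment(s, 2.0, 0.5), (2.0, 0.5), 140)` then returns the stabilized bias-reduced estimate after a few iterations.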
Overall our analysis confirms that an NBINGARCH model describes this data set well after taking a possible level shift at observation 84 and an outlier at time 100 into account. Figure 6 illustrates that all observations, except for a few at or right after time point 100, are below or at least close to the respective 95% percentile of the fitted 1-step ahead predictive distribution. The observations at t = 100 and t = 101 are far outlying even compared to the 99% percentile. Note that the differences between these percentiles and those obtained by a Poisson INARCH(1) model with the same conditional mean sequence are at most 2 in case of the 95% and at most 3 in case of the 99% percentile, so that a Poisson model suits most purposes here.
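The percentile comparison between the negative binomial and Poisson predictive distributions can be reproduced in a minimal sketch, again assuming the parameterization Var = μ + κμ² from the paper; the helper name `predictive_percentiles` is ours.

```python
from scipy.stats import nbinom, poisson

def predictive_percentiles(mu, kappa, q=0.95):
    """q-percentile of the one-step-ahead predictive distribution:
    negative binomial with mean mu and Var = mu + kappa*mu^2,
    alongside a Poisson with the same mean for comparison."""
    r = 1.0 / kappa          # scipy's nbinom 'n' parameter
    p = r / (r + mu)         # success probability giving mean mu
    return nbinom.ppf(q, r, p), poisson.ppf(q, mu)

# With a small overdispersion kappa, the two percentiles differ
# by only a few counts, as observed for the campylobacterosis data:
nb95, po95 = predictive_percentiles(10.0, 0.04, 0.95)
```

Scanning such percentile bands over the fitted conditional mean sequence is what flags the observations around t = 100 as outlying.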

Conclusions
We have proposed robustifications of method of moments and ML-estimation for fitting INARCH models with conditional negative binomial distributions. The former avoid multidimensional optimizations and are useful for getting information about a suitable model and as initial estimates for calculation of the more efficient and robust joint M-estimators of the autoregressive parameters.
Our proposal of negative binomial scores increases the efficiency of rank based autocorrelation and partial autocorrelation estimators in the NBINARCH model substantially but reduces their robustness. Similar findings are known for van der Waerden scores under Gaussian model assumptions. If robustness against outliers is important, we prefer the ordinary Spearman rank (partial) autocorrelation. Our joint M-estimation approach is an alternative to the one suggested by [27]. Our method seems to be computationally simpler, so that it can be applied to moderately large data sets, where we encountered problems with Xiong and Zhu's algorithm. A drawback of our approach is that the choice of tuning constants to achieve both high efficiency and high robustness against different outlier scenarios (like isolated and patchy outliers) seems to be more difficult. Nevertheless, in our experiments it achieves better efficiency and robustness than the Yule-Walker estimates based on Spearman's rank autocorrelation if the model order is known.
For illustration we have analyzed a famous data set from the literature, the campylobacterosis data studied e.g. by [12]. In doing so we have modified the robust method of moment estimators such that patterns of change like a level shift found in these data by several authors (e.g. [13]) can be incorporated in the estimation. The results obtained agree well with those obtained by [20] applying sophisticated stepwise detection and correction procedures.
Funding Open Access funding enabled and organized by Projekt DEAL.
The condition E sup_{ω∈Θ} |g(Y; ω)| < ∞ is obviously fulfilled, because |s_t(Y, ω)′ s_t(Y, ω)| is bounded with probability 1 in view of the boundedness of the ψ-function and the parameter space.
Since Q(ω) ≤ 0 = Q(ω^(0)) by construction, there is a global maximum at ω^(0), so that ω̂_n is consistent if it is initialized by a consistent estimator.
For asymptotic normality, note that ∂S_n(y, ω)/∂ω exists and is continuous in an open convex neighborhood of ω^(0) if we use a differentiable ψ-function, since the other terms μ_t, σ_t, μ and σ involved in the construction of S_n(y, ω) are also differentiable. The condition that n^(−1) ∂S_n(y, ω)/∂ω converges to a finite nonsingular matrix A(ω^(0)) = E(∂s_t(Y_t, ω)/∂ω)|_{ω=ω^(0)} in probability can be deduced by applying the general ergodic theorem [18] to the Markov chain (Y_t, . . . , Y_{t−p})′, t ∈ N.
Proof By construction (s_t(Y, ω) − E(s_t(Y, ω)|F_{t−1})) is a martingale difference sequence and (S_n(Y, ω) − Σ_{t=p+1}^{n} E(s_t(Y, ω)|F_{t−1})) is a martingale. We thus make use of the strong law of large numbers [7] and the central limit theorem for martingales.
The boundedness of E‖u_t(Y, ω)‖² has already been verified by [27]. For the other terms E‖s_{t,i}‖², i = 0, . . . , p, this is obvious due to the boundedness of the ψ-function, as long as σ and σ_t are bounded away from 0. This is guaranteed since α0 > 0.