
Mallows’ quasi-likelihood estimation for log-linear Poisson autoregressions

Statistical Inference for Stochastic Processes

Abstract

We consider the problems of robust estimation and testing for a log-linear model with feedback for the analysis of count time series. We study inference for contaminated data with transient shifts, level shifts and additive outliers. It turns out that the case of additive outliers deserves special attention. We propose a robust method for estimating the regression coefficients in the presence of interventions. The resulting robust estimators are asymptotically normally distributed under some regularity conditions. A robust score type test statistic is also examined. The methodology is applied to real and simulated data.


References

  • Barczy M, Ispány M, Pap G, Scotto M, Silva ME (2012) Additive outliers in INAR(1) models. Stat Papers 53:935–949

  • Breslow N (1990) Tests of hypotheses in overdispersed Poisson regression and other quasi-likelihood models. J Am Stat Assoc 85:565–571

  • Brockwell PJ, Davis RA (1991) Time series: theory and methods, 2nd edn. Springer, New York

  • Cantoni E, Ronchetti E (2001) Robust inference for generalized linear models. J Am Stat Assoc 96:1022–1030

  • Chen C, Liu L (1993) Joint estimation of model parameters and outlier effects in time series. J Am Stat Assoc 88:284–297

  • Chow YS (1967) On a strong law of large numbers for martingales. Ann Math Stat 38:610

  • Christou V, Fokianos K (2015) Estimation and testing linearity for mixed Poisson autoregressions. Electron J Stat 9:1357–1377

  • Douc R, Doukhan P, Moulines E (2013) Ergodicity of observation-driven time series models and consistency of the maximum likelihood estimator. Stoch Process Appl 123:2620–2647

  • El Saied H (2012) Robust modelling of count time series: applications in medicine. Ph.D. thesis, TU Dortmund University, Germany

  • El Saied H, Fried R (2014) Robust fitting of INARCH models. J Time Ser Anal 35:517–535

  • Ferland R, Latour A, Oraichi D (2006) Integer-valued GARCH process. J Time Ser Anal 27:923–942

  • Fokianos K, Fried R (2010) Interventions in INGARCH processes. J Time Ser Anal 31:210–225

  • Fokianos K, Fried R (2012) Interventions in log-linear Poisson autoregression. Stat Model 12:299–322

  • Fokianos K, Rahbek A, Tjøstheim D (2009) Poisson autoregression. J Am Stat Assoc 104:1430–1439

  • Fokianos K, Tjøstheim D (2011) Log-linear Poisson autoregression. J Multivar Anal 102:563–578

  • Francq C, Zakoïan J-M (2010) GARCH models: structure, statistical inference and financial applications. Wiley, Hoboken

  • Fried R, Elsaied H, Liboschik T, Fokianos K, Kitromilidou S (2014) On outliers and interventions in count time series following GLMs. Austrian J Stat 43:181–193

  • Hall P, Heyde CC (1980) Martingale limit theory and its application. Academic Press, New York

  • Harvey AC (1990) The econometric analysis of time series, 2nd edn. MIT Press, Cambridge

  • Heritier S, Ronchetti E (1994) Robust bounded-influence tests in general parametric models. J Am Stat Assoc 89:897–904

  • Huber PJ, Ronchetti E (2009) Robust statistics, 2nd edn. Wiley, New York

  • Kedem B, Fokianos K (2002) Regression models for time series analysis. Wiley, New York

  • Kitromilidou S, Fokianos K (2016) Robust estimation methods for a class of count time series log-linear models. J Stat Comput Simul 86:740–755

  • Klimko LA, Nelson PI (1978) On conditional least squares estimation for stochastic processes. Ann Stat 6:629–642

  • Künsch HR, Stefanski LA, Carroll RJ (1989) Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J Am Stat Assoc 84:460–466

  • Lô SN, Ronchetti E (2009) Robust and accurate inference for generalized linear models. J Multivar Anal 100:2126–2136

  • Maronna RA, Martin RD, Yohai VJ (2006) Robust statistics. Wiley, Hoboken

  • McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, London

  • Mukherjee K (2008) \(M\)-estimation in GARCH models. Econ Theory 24:1530–1553

  • Muler N, Yohai VJ (2002) Robust estimates for ARCH processes. J Time Ser Anal 23:341–375

  • Muler N, Yohai VJ (2008) Robust estimates for GARCH models. J Stat Plan Inference 138:2918–2940

  • Rousseeuw PJ, van Zomeren BC (1990) Unmasking multivariate outliers and leverage points. J Am Stat Assoc 85:633–639

  • Seber GF, Lee AJ (2003) Linear regression analysis, 2nd edn. Wiley, New York

  • Taniguchi M, Kakizawa Y (2000) Asymptotic theory of statistical inference for time series. Springer, New York

  • Tjøstheim D (2012) Some recent theory for autoregressive count time series. TEST 21:413–438 (with discussion)

  • Valdora M, Yohai VJ (2014) Robust estimators for generalized linear models. J Stat Plan Inference 146:31–48

  • van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge

  • Woodard DW, Matteson DS, Henderson SG (2011) Stationarity of count-valued and nonlinear time series models. Electron J Stat 5:800–828


Acknowledgments

We cordially thank two anonymous reviewers for several useful comments that improved the article considerably. The authors would like to acknowledge the project eMammoth - Compute and Store on Grids and Clouds infrastructure (ANABATHMISI/06609/09), which is co-funded by the Republic of Cyprus and the European Regional Development Fund of the EU. This work was also supported by the Cyprus Research Promotion Foundation grant TEXNOLOGIA/THEPIS/0609(BE)/02.

Author information


Corresponding author

Correspondence to Konstantinos Fokianos.

Appendix

In the following, the symbol C denotes a constant which depends upon the context. Define also \(d_M=\max (|d_L|,|d_U|)\), \(a_M=\max (|a_L|,|a_U|)\) and \(b_M=\max (|b_L|,|b_U|)\). In addition, when a quantity is evaluated at the true value of the parameter \(\varvec{\theta }\), denoted by \(\varvec{\theta }_{0}\), the notation is simplified by dropping \(\varvec{\theta }_{0}\). For instance, \(m_{t} \equiv m_{t}(\varvec{\theta }_{0})\) and so on. The following two results are taken from Fokianos and Tjøstheim (2011) and are included for completeness.
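
For readers who want to experiment with the results below, here is a minimal simulation sketch in Python, assuming model (1) is the log-linear Poisson autoregression \(\nu _t = d + a \nu _{t-1} + b \log (1+Y_{t-1})\) with \(Y_t \mid \mathcal{F}_{t-1} \sim \mathrm {Poisson}(\exp (\nu _t))\); the function name and the parameter values are ours and purely illustrative.

```python
import numpy as np

def simulate_loglinear_poisson(n, d, a, b, nu0=0.0, rng=None):
    """Simulate a path from the log-linear Poisson autoregression
    nu_t = d + a*nu_{t-1} + b*log(1 + Y_{t-1}),
    Y_t | F_{t-1} ~ Poisson(exp(nu_t)).
    Returns (Y, nu) as arrays of length n."""
    rng = np.random.default_rng() if rng is None else rng
    nu = np.empty(n)
    Y = np.empty(n, dtype=int)
    nu_prev, y_prev = nu0, 0
    for t in range(n):
        nu[t] = d + a * nu_prev + b * np.log(1.0 + y_prev)
        Y[t] = rng.poisson(np.exp(nu[t]))
        nu_prev, y_prev = nu[t], Y[t]
    return Y, nu

# Illustrative parameters satisfying |a + b| < 1 (a, b of the same sign)
Y, nu = simulate_loglinear_poisson(1000, d=0.5, a=0.3, b=0.4,
                                   rng=np.random.default_rng(1))
```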

Lemma 6.1

Assume model (2) and suppose that \(|a|<1\). In addition, assume that \(|a+b|<1\) when \(b>0\), and that \(|a||a+b|<1\) when \(b<0\). Then the following conclusions hold:

  1. The process \(\{\nu _t^m,t \ge 0\}\) is a geometrically ergodic Markov chain with finite moments of order k, for an arbitrary k.

  2. The process \(\{(Y_t^m,U_t,\nu _t^m),t \ge 0\}\) is a \(V_{(Y,U,\nu )}\)-geometrically ergodic Markov chain with \(V_{(Y,U,\nu )}(Y,U,\nu )=1+\log ^{2k}(1+Y)+\nu ^{2k}+U^{2k}\), k being a positive integer.

Lemma 6.2

Suppose that \((Y_t,\nu _t)\) and \((Y_t^m,\nu _t^m)\) are defined by (1) and (2) respectively. Assume that \(|a+b|<1\), if a and b have the same sign, and \(a^2+b^2<1\) if a and b have different signs. Then the following statements are true:

  1. \(E|\nu _t^m-\nu _t| \rightarrow 0\) and \(|\nu _t^m-\nu _t|<\delta _{1,m}\) almost surely for m large,

  2. \(E(\nu _t^m-\nu _t)^2 \le \delta _{2,m}\),

  3. \(E|\lambda _t^m-\lambda _t| \le \delta _{3,m}\),

  4. \(E|Y_t^m-Y_t| \le \delta _{4,m}\),

  5. \(E(\lambda _t^m-\lambda _t)^2 \le \delta _{5,m}\),

  6. \(E(Y_t^m-Y_t)^2 \le \delta _{6,m}\),

where \(\delta _{i,m} \rightarrow 0\) as \(m \rightarrow \infty \) for \(i=1,\ldots ,6\). Furthermore, almost surely, with m large enough

$$\begin{aligned} |\lambda _t^m-\lambda _t| \le \delta \quad \text {and} ~~ |Y_t^m-Y_t| \le \delta , \quad \text {for any} \, \delta >0. \end{aligned}$$

We will also need the following lemma whose proof is given below.

Lemma 6.3

Define the Pearson residuals for both perturbed and unperturbed models by

$$\begin{aligned} r_t^m=\frac{Y_t^m-e^{\nu _t^m}}{e^{{\nu _t^m}/2}}, ~~~ r_t=\frac{Y_t-e^{\nu _t}}{e^{{\nu _t}/2}} \end{aligned}$$

respectively. Suppose that \((Y_t,\nu _t)\) and \((Y_t^m,\nu _t^m)\) are defined by (1) and (2) respectively. Assume that \(|a+b|<1\), if a and b have the same sign, and \(a^2+b^2<1\) if a and b have different signs. Then,

  1. \(E|r_t^m-r_t| \rightarrow 0\),

  2. \(E(r_t^m-r_t)^2 \le \delta _{7,m}\),

where \(\delta _{7,m} \rightarrow 0\) as \(m \rightarrow \infty \). Furthermore, almost surely, with m large enough

$$\begin{aligned} |r_t^m-r_t| \le \delta ~~~ \text {for any} ~~ \delta >0. \end{aligned}$$

Proof of Lemma 6.3

We have that

$$\begin{aligned} |r_t^m-r_t|= & {} \left| \frac{Y_t^m-e^{\nu _t^m}}{e^{{\nu _t^m}/2}} - \frac{Y_t-e^{\nu _t}}{e^{{\nu _t}/2}} \right| = \left| \frac{Y_t^m}{e^{{\nu _t^m}/2}} - \frac{Y_t}{e^{{\nu _t}/2}} + \left( e^{{\nu _t}/2} - e^{{\nu _t^m}/2} \right) \right| \\\le & {} \left| \frac{Y_t^m e^{{\nu _t}/2} - Y_t e^{{\nu _t^m}/2} \pm Y_t^m e^{{\nu _t^m}/2}}{e^{{\nu _t^m}/2} e^{{\nu _t}/2}}\right| + \left| e^{{\nu _t}/2} - e^{{\nu _t^m}/2} \right| \\\le & {} \left| \frac{Y_t^m-Y_t}{e^{{\nu _t}/2}} \right| + \left| \frac{Y_t^m \left( e^{{\nu _t}/2} - e^{{\nu _t^m}/2} \right) }{e^{{\nu _t^m}/2} e^{{\nu _t}/2}} \right| + \left| e^{{\nu _t^m}/2}- e^{{\nu _t}/2} \right| \\\le & {} |Y_t^m-Y_t| + \left( |Y_t^m|+1 \right) \left| \lambda _{t}^{m} -\lambda _{t} \right| ^{1/2} \\< & {} \delta , \end{aligned}$$

for any \(\delta >0\) almost surely and for m large enough by using the results of Lemma 6.2. The claims follow. \(\square \)
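
As a computational aside, the Pearson residuals of Lemma 6.3 and a bounded \(\psi \)-function are straightforward to evaluate. The sketch below assumes the Huber function \(\psi _c(x)=\max (-c,\min (x,c))\), which is one bounded choice consistent with the boundedness of \(\psi (\cdot )\) used above; the tuning constant is illustrative.

```python
import numpy as np

def pearson_residuals(Y, nu):
    """Pearson residuals of Lemma 6.3: r_t = (Y_t - e^{nu_t}) / e^{nu_t/2}."""
    lam = np.exp(np.asarray(nu, dtype=float))
    return (np.asarray(Y, dtype=float) - lam) / np.sqrt(lam)

def psi_huber(r, c=1.345):
    """Huber psi-function: identity on [-c, c], clipped outside."""
    return np.clip(r, -c, c)

# Example with the path (Y, nu) simulated in the sketch above
r = pearson_residuals(Y, nu)
print(r[:5], psi_huber(r)[:5])
```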

Proof of Lemma 2.1

We will show that

$$\begin{aligned} E \left( m_t^m(\varvec{\theta }) \left( m_t^m(\varvec{\theta }) \right) ^T \right) - E \left( m_t(\varvec{\theta }) \left( m_t(\varvec{\theta }) \right) ^T \right) \rightarrow 0 \end{aligned}$$
(9)

and

$$\begin{aligned} E \left( m_t^m(\varvec{\theta }) \right) E^{T} \left( m_t^m(\varvec{\theta }) \right) - E \left( m_t(\varvec{\theta }) \right) E^{T} \left( m_t(\varvec{\theta }) \right) \rightarrow 0, \end{aligned}$$
(10)

as \( m \rightarrow \infty \). Consider first (9). Working along the lines of Fokianos and Tjøstheim (2011), we consider the differences between the perturbed and unperturbed matrices along the diagonal, individually for \(\theta _i=d,a,b\). Then, we need to evaluate

$$\begin{aligned} E \left| (Z_t^m)^2 \left( \frac{\partial \nu _t^m}{\partial \theta _i} \right) ^2 - Z_t^2 \left( \frac{\partial \nu _t}{\partial \theta _i} \right) ^2 \right| , \end{aligned}$$

with \(Z_t=\psi _c(r_t)w_t e^{\nu _t/2}\) and similarly for \(Z_t^m\). We have,

$$\begin{aligned}&E\left| \left( Z_t^m\right) ^2 \left( \frac{\partial \nu _t^m}{\partial \theta _i} \right) ^2 - Z_t^2 \left( \frac{\partial \nu _t}{\partial \theta _i} \right) ^2 \right| \\&\quad = E \left| \left( Z_t^m\right) ^2 \left[ \left( \frac{\partial \nu _t^m}{\partial \theta _i} \right) ^2 - \left( \frac{\partial \nu _t}{\partial \theta _i} \right) ^2 \right] + \left( \left( Z_t^m\right) ^2 - Z_t^2\right) \left( \frac{\partial \nu _t}{\partial \theta _i} \right) ^2 \right| \\&\quad \le E \left| \left( Z_t^m\right) ^2 \left[ \left( \frac{\partial \nu _t^m}{\partial \theta _i} \right) ^2- \left( \frac{\partial \nu _t}{\partial \theta _i} \right) ^2 \right] \right| + E\left| \left( \left( Z_t^m\right) ^2-Z_t^2 \right) \left( \frac{\partial \nu _t}{\partial \theta _i} \right) ^2 \right| \\&\quad \le C E \left| \exp \left( \nu _{t}^{m}\right) \Bigl ( \left( \frac{\partial \nu _t^m}{\partial \theta _i} \right) -\left( \frac{\partial \nu _t}{\partial \theta _i} \right) \Bigr ) \Bigl ( \left( \frac{\partial \nu _t^m}{\partial \theta _i} \right) +\left( \frac{\partial \nu _t}{\partial \theta _i} \right) \Bigr ) \right| \\&\qquad +\, \sqrt{ E \Bigl ( \left( Z_t^{m}\right) ^{2}- Z_{t}^{2} \Bigr )^{2} E \left( \frac{\partial \nu _t}{\partial \theta _i} \right) ^{4} } \end{aligned}$$

The first term can become arbitrarily small because it can be shown (following the proof of Fokianos et al. (2009, Lemma 3.1)) that

$$\begin{aligned} \left| \frac{\partial \nu _t^m}{\partial \theta _i}- \frac{\partial \nu _t}{\partial \theta _i} \right|< \delta ,~~ E \left| \frac{\partial \nu _t^m}{\partial \theta _i}- \frac{\partial \nu _t}{\partial \theta _i} \right|< \delta ,~~ E \left( \frac{\partial \nu _t^m}{\partial \theta _i}- \frac{\partial \nu _t}{\partial \theta _i} \right) ^{2} < \delta . \end{aligned}$$
(11)

For the second term, note first that \(E (\partial \nu _t / \partial \theta _i)^{4}\) is bounded by a finite constant for \(i=1,2,3\) since

$$\begin{aligned} \frac{\partial \nu _t}{\partial d}\le & {} \frac{1}{(1-a_M)},~~ \frac{\partial \nu _t}{\partial a}\le \frac{c_0}{(1-a_M)}+b_M \sum \limits _{i=1}^{t-1} ia_M^{i-1} \log (1+Y_{t-i-1}),\nonumber \\ \frac{\partial \nu _t}{\partial b}\le & {} \sum \limits _{i=0}^{t-1} a_M^{i} \log (1+Y_{t-i-1}), \end{aligned}$$
(12)

by using Lemma 6.1. Furthermore

$$\begin{aligned} \left| \left( Z_t^{m}\right) ^{2}- Z_{t}^{2} \right|\le & {} \left| \psi ^2\left( r_t^m\right) e^{\nu _t^m} - \psi ^2(r_t) e^{\nu _t} \right| \\= & {} \left| \psi ^2\left( r_t^m\right) e^{\nu _t^m} - \psi ^2(r_t) e^{\nu _t} \pm \psi ^2(r_t) e^{\nu _t^m} \right| \\\le & {} \left| \psi ^2\left( r_t^m\right) - \psi ^2(r_t)\right| \lambda _{t}^{m} + \psi ^2(r_t)\left| e^{\nu _t^m} -e^{\nu _t}\right| \\\le & {} C \Bigl (\Bigl |r_t^m - r_t\Bigr | \lambda _{t}^{m} + \Bigl |\lambda _{t}^{m} - \lambda _t\Bigr | \Bigr ) < \delta \end{aligned}$$

where we have used the boundedness of the function \(\psi (\cdot )\), the mean-value theorem and Lemmas 6.2 and 6.3. Hence (9) follows. To prove (10), consider

$$\begin{aligned}&\Bigl | E^{2} \Bigl ( Z_t^m \frac{\partial \nu _t^m}{\partial \theta _{i}} \Bigr )- E^{2} \Bigl ( Z_t \frac{\partial \nu _t}{\partial \theta _{i}} \Bigr ) \Bigr | \\&\quad \le E \left| Z_t^m \frac{\partial \nu _t^m}{\partial \theta _{i}} - Z_t \frac{\partial \nu _t}{\partial \theta _{i}} \right| E \left| Z_t^m \frac{\partial \nu _t^m}{\partial \theta _{i}} + Z_t \frac{\partial \nu _t}{\partial \theta _{i}} \right| \\&\quad \le C E \left| Z_t^m \frac{\partial \nu _t^m}{\partial \theta _{i}} - Z_t \frac{\partial \nu _t}{\partial \theta _{i}} \pm Z_t^m \frac{\partial \nu _t}{\partial \theta _{i}} \right| E \left| e^{\nu _t^m/2} \frac{\partial \nu _t^m}{\partial \theta _{i}} + e^{\nu _t/2} \frac{\partial \nu _t}{\partial \theta _{i}} \pm e^{\nu _t^m/2} \frac{\partial \nu _t}{\partial \theta _{i}} \right| \\&\quad = C E \left| Z_t^m \left( \frac{\partial \nu _t^m}{\partial \theta _{i}} -\frac{\partial \nu _t}{\partial \theta _{i}} \right) + \left( Z_t^m-Z_t \right) \frac{\partial \nu _t}{\partial \theta _{i}}\right| \\&\qquad \times \,E\left| e^{\nu _t^m/2}\left( \frac{\partial \nu _t^m}{\partial \theta _{i}} +\frac{\partial \nu _t}{\partial \theta _{i}} \right) +\left( e^{\nu _t^m/2}- e^{\nu _t/2}\right) \frac{\partial \nu _t}{\partial \theta _{i}}\right| . \end{aligned}$$

The above quantity can be made arbitrarily small because of the finite moments of \(\partial \nu _t / \partial \theta _{i}\), \(\partial \nu _t^{m} / \partial \theta _{i}\), \(\exp (\nu _{t}^{m})\), (11) and the fact that \(E \left| \left( Z_t^m-Z_t \right) \partial \nu _t / \partial \theta _{i} \right| \rightarrow 0\) as \(m \rightarrow \infty \), which is proved following the previous arguments. \(\square \)
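
For concreteness, the quantities \(Z_t\) and \(\partial \nu _t/\partial \theta _i\) used in this proof can be computed recursively: differentiating \(\nu _t = d + a \nu _{t-1} + b \log (1+Y_{t-1})\) gives \(\partial \nu _t/\partial d = 1 + a\,\partial \nu _{t-1}/\partial d\), \(\partial \nu _t/\partial a = \nu _{t-1} + a\,\partial \nu _{t-1}/\partial a\) and \(\partial \nu _t/\partial b = \log (1+Y_{t-1}) + a\,\partial \nu _{t-1}/\partial b\). The sketch below implements these recursions, again assuming the Huber \(\psi _c\) and unit Mallows weights \(w_t \equiv 1\) for illustration.

```python
import numpy as np

def nu_gradients(Y, d, a, b, nu0=0.0):
    """Joint recursion for nu_t and its gradient, obtained by
    differentiating nu_t = d + a*nu_{t-1} + b*log(1 + Y_{t-1})
    with respect to theta = (d, a, b)."""
    n = len(Y)
    nu = np.empty(n)
    grad = np.zeros((n, 3))                      # columns: d, a, b
    nu_prev, g_prev, y_prev = nu0, np.zeros(3), 0.0
    for t in range(n):
        x = np.log(1.0 + y_prev)
        nu[t] = d + a * nu_prev + b * x
        grad[t] = np.array([1.0, nu_prev, x]) + a * g_prev
        nu_prev, g_prev, y_prev = nu[t], grad[t], Y[t]
    return nu, grad

nu, grad = nu_gradients(Y, d=0.5, a=0.3, b=0.4)  # path Y from the first sketch
r = (Y - np.exp(nu)) / np.exp(nu / 2.0)
Z = np.clip(r, -1.345, 1.345) * np.exp(nu / 2.0) # Z_t = psi_c(r_t) w_t e^{nu_t/2}, w_t = 1
m = Z[:, None] * grad                            # m_t(theta) = Z_t * dnu_t/dtheta
```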

Proof of Lemma 2.2

The score function \(S_n^m\) for the perturbed model is a martingale sequence at the true value \(\varvec{\theta }=\varvec{\theta }_0\), that is \(E(S_n^m \mid \mathcal{F}_{n-1}^m)=S_{n-1}^m\), where \(\mathcal{F}_{t-1}^m\) denotes the \(\sigma \)-field generated by \(\{Y_0^m,\ldots ,Y_{t-1}^m,\mathcal{U}_0,\ldots ,\mathcal{U}_{t-1}\}\). We will show that it is square integrable. Proving that \(E||m_t^m|| ^2\) is finite at \(\varvec{\theta }_0=(d_0,a_0,b_0)\) guarantees an application of the strong law of large numbers for martingales (Chow 1967), which gives almost sure convergence of \(S_n^m/n\) to 0 as \(n \rightarrow \infty \). But

$$\begin{aligned} E\left\{ \Bigl ( \psi \left( r_t^m\right) w_t e^{\nu _t^m/2} \frac{\partial \nu _t^m}{\partial \theta _{i}} \Bigr )^{2} \right\} \le C \left( E| e^{\nu _t^m} |^2 \right) ^{1/2} \left( E \left( \frac{\partial \nu _t^m}{\partial \theta _{i}} \right) ^4 \right) ^{1/2} \end{aligned}$$

and this is finite because of Lemma 6.1 and (12). To show asymptotic normality of the perturbed score function \(S_n^m\) we apply the CLT for martingales (Hall and Heyde 1980, Cor. 3.1): \((S_n^m)_{n \ge 1}\) is a zero-mean, square integrable martingale sequence with martingale differences \((s_t^m)_{t \in \mathbb {N}}\). To verify the conditional Lindeberg condition, note that

$$\begin{aligned} \frac{1}{n} \sum \limits _{t=1}^n E\left( \parallel s_t^m \parallel ^2 \mathbbm {1} \left( \parallel s_t^m \parallel > \sqrt{n} \delta \right) \mid \mathcal{F}_{t-1}^m \right) \rightarrow 0, \end{aligned}$$

since \(E||s_t^m||^4 < \infty \). In addition,

$$\begin{aligned} \frac{1}{n} \sum _{t=1}^n { Var\left( s_t^m \mid \mathcal{F}_{t-1}^m\right) } \xrightarrow {p}&E \Bigl \{E\left[ \left( s_t^m\right) \left( s_t^m\right) ^T \mid \mathcal{F}_{t-1}^m\right] \Bigr \}=W^{m}. \end{aligned}$$

This establishes the second result of the Lemma.

The third result of the Lemma follows from Lemma 2.1 together with Brockwell and Davis (1991, Prop. 6.4.9). Consider now the last result of the Lemma.

$$\begin{aligned} \frac{1}{\sqrt{n}}\left( S_n^m-S_n\right)= & {} \frac{1}{\sqrt{n}} \sum \limits _{t=1}^n \left\{ s_t^m - s_t \right\} = \frac{1}{\sqrt{n}}\sum _{t=1}^{n} \left( W_{t}^{m}\frac{\partial \nu _{t}^{m}}{\partial \varvec{\theta } }-W_{t}\frac{\partial \nu _{t}}{\partial \varvec{\theta } }\right) \\= & {} \frac{1}{\sqrt{n}}\sum _{t=1}^{n} \left[ W_{t}^{m}\left( \frac{\partial \nu _{t}^{m}}{\partial \varvec{\theta } }-\frac{\partial \nu _{t}}{\partial \varvec{\theta }} \right) + \left( W_{t}^{m}-W_{t}\right) \frac{\partial \nu _{t}}{\partial \varvec{\theta }} \right] , \end{aligned}$$

where \(W_{t}= Z_{t}-E[Z_{t} \mid \mathcal{F}_{t-1}]\) and similarly for the perturbed model. For the first summand in the above representation, we obtain that

$$\begin{aligned} {P}\left( \left\| \sum _{t=1}^{n}W_{t}^{m}\left( \frac{\partial \nu _{t}^{m}}{\partial \varvec{\theta } }-\frac{\partial \nu _{t}}{\partial \varvec{\theta } } \right) \right\| >\delta \sqrt{n}\right)\le & {} {P}\left( \epsilon _{m} \left| \sum _{t=1}^{n} W_{t}^{m}\right| >\delta \sqrt{n}\right) \\\le & {} \frac{\epsilon _{m}^{2}}{\delta ^{2}n}\sum _{t=1}^{n} E \left| W_{t}^{m}\right| ^{2}\le C\epsilon _{m}^{2}\rightarrow 0, \end{aligned}$$

as \(m \rightarrow \infty \), for some sequence \(\epsilon _{m} \rightarrow 0\). For the second summand, note that

$$\begin{aligned}&\Bigl |\Bigl (\psi \left( r_t^m\right) - E\left( \psi \left( r_t^m\right) \mid \mathcal{F}_{t-1}^m \right) \Bigr ) - \Bigl (\psi (r_t) - E\left( \psi (r_t) \mid \mathcal{F}_{t-1} \right) \Bigr ) \Bigr |\nonumber \\&\quad \le C \Bigl (|r_t^m-r_t| + E\left( |r_t^m-r_t| \mid \mathcal{F}_{t-1}^m \right) \Bigr ) \end{aligned}$$
(13)

and therefore its expected value tends to 0 by Lemma 6.3. The fact that \(E||\partial \nu _t / \partial \varvec{\theta }||^2< \infty \) yields the desired conclusion. \(\square \)

Proof of Lemma 2.3

Because \(S_n(\varvec{\theta })\) is an unbiased estimating function, it holds that

$$\begin{aligned} -E\left( \frac{\partial }{\partial \varvec{\theta }}s_t(\varvec{\theta }) \mid \mathcal{F}_{t-1} \right) =E\left( s_t(\varvec{\theta }) \frac{\partial l_t(\varvec{\theta })}{\partial \varvec{\theta }} \mid \mathcal{F}_{t-1} \right) , \end{aligned}$$

where \(l_t(\varvec{\theta })= \nu _{t}(\varvec{\theta })Y_{t}- \exp ( \nu _{t}(\varvec{\theta }))\) is the logarithm (up to a constant) of the conditional probability of \(Y_t\) given \(\mathcal{F}_{t-1}\) under the Poisson assumption. Then, the matrix \(V_n(\varvec{\theta })\) is rewritten in the form

$$\begin{aligned} V_n(\varvec{\theta })=\frac{1}{n} \sum \limits _{t=1}^n E\left( s_t(\varvec{\theta }) \frac{\partial l_t(\varvec{\theta })}{\partial \varvec{\theta }} \mid \mathcal{F}_{t-1} \right) \end{aligned}$$

and the matrix \(V_n^m(\varvec{\theta })\) for the perturbed model is defined analogously. We again examine the difference \(s_t^m ({\partial l_t^m}/{\partial \theta _{i}}) - s_t ({\partial l_t}/{\partial \theta _i})\). Notice that

$$\begin{aligned} s_t \frac{\partial l_t}{\partial \theta _i}= & {} (m_t-E(m_t \mid \mathcal{F}_{t-1}))(Y_t-e^{\nu _t}) \frac{\partial \nu _t}{\partial \theta _i}\\= & {} \left( \psi (r_t) w_t e^{\nu _t/2} \frac{\partial \nu _t}{\partial \theta _i}- E\left( \psi (r_t) w_t e^{\nu _t/2} \frac{\partial \nu _t}{\partial \theta _i} \mid \mathcal{F}_{t-1} \right) \right) \left( Y_t-e^{\nu _t} \right) \frac{\partial \nu _t}{\partial \theta _i}\\= & {} w_t e^{\nu _t} r_t \Bigl ( \psi (r_t) - E\left( \psi (r_t) \mid \mathcal{F}_{t-1} \right) \Bigr ) \left( \frac{\partial \nu _t}{\partial \theta _i} \right) ^2. \end{aligned}$$

Then,

$$\begin{aligned} E\left| s_t^m \frac{\partial l_t^m}{\partial \theta _{i}} - s_t \frac{\partial l_t}{\partial \theta _{i}} \right|&\le E\left| e^{\nu _t^m} r_t^m \left( \frac{\partial \nu _t^m}{\partial \theta _{i}} \right) ^2 \Bigl (\left[ \psi (r_t^m) - E\left( \psi (r_t^m) \mid \mathcal{F}_{t-1}^m \right) \right] \right. \\&\quad \left. - \left[ \psi (r_t) - E\left( \psi (r_t) \mid \mathcal{F}_{t-1} \right) \right] \Bigr ) \right| \\&\quad + E\left| \Bigl ( e^{\nu _t^m} r_t^m \left( \frac{\partial \nu _t^m}{\partial \theta _{i}} \right) ^2 - e^{\nu _t} r_t \left( \frac{\partial \nu _t}{\partial \theta _{i}} \right) ^2 \Bigr ) \left( \psi (r_t) - E\left( \psi (r_t) \mid \mathcal{F}_{t-1} \right) \right) \right| . \end{aligned}$$

For the first summand, (13) shows that it tends to zero. We work similarly for the second summand to obtain the desired result. To show that \(V_n\) is positive definite it is sufficient to show that \(z^T \left( {\partial \nu _t}/{\partial \varvec{\theta }} \right) \left( {\partial \nu _t}/{\partial \varvec{\theta }} \right) ^T z >0\) for any non-zero three-dimensional real vector z. If \( z^{T} {\partial \nu _{t}}/{\partial \varvec{\theta }}=0\), then we obtain that \(z^{T} (1, \nu _{t-1}, \log (Y_{t-1}+1))^{T}=0\). But if the last equation holds, then \(z =0\), because \(\nu _{t}\) is a function of past values of \(\log (Y_{t}+1)\) and \(Y_{t}\) is non-zero for some t. The same reasoning holds for \(V_{n}^{m}\). \(\square \)
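
Numerically, the centering terms \(E(\psi (r_t) \mid \mathcal{F}_{t-1})\) appearing in \(s_t\) are expectations over the conditional Poisson law and can be approximated by truncated sums. A sketch follows, assuming the Huber \(\psi _c\) and unit weights as before; the truncation point is a heuristic of ours, not part of the paper.

```python
import numpy as np
from scipy.stats import poisson

def cond_exp_psi(lam, c=1.345, width=10.0):
    """E[ psi_c((Y - lam)/sqrt(lam)) | F_{t-1} ] for Y | F_{t-1} ~ Poisson(lam),
    evaluated by a truncated sum over the Poisson support (heuristic cut-off)."""
    upper = int(lam + width * np.sqrt(lam) + 10)
    y = np.arange(upper + 1)
    r = (y - lam) / np.sqrt(lam)
    return float(np.sum(np.clip(r, -c, c) * poisson.pmf(y, lam)))

lam = np.exp(nu)                                  # nu, grad, r from the sketch above
center = np.array([cond_exp_psi(l) for l in lam])
# s_t = (psi_c(r_t) - E(psi_c(r_t) | F_{t-1})) * w_t * e^{nu_t/2} * dnu_t/dtheta, w_t = 1
s = ((np.clip(r, -1.345, 1.345) - center) * np.exp(nu / 2.0))[:, None] * grad
```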

Proof of Lemma 2.4

The first assertion of the Lemma holds by the law of large numbers. For the second, the Hessian matrix \(H_n\) can be represented as

$$\begin{aligned} H_n=\frac{1}{n} \sum \limits _{t=1}^n s_t \frac{\partial l_t}{\partial \varvec{\theta }} = \frac{1}{n} \sum \limits _{t=1}^n \left\{ w_t e^{\nu _t} r_t \left[ \psi (r_t) - E\left( \psi (r_t) \mid \mathcal{F}_{t-1} \right) \right] \left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) ^T \right\} . \end{aligned}$$

The matrix \(H_n^m\) for the perturbed model is defined analogously. Examining the difference \(H_n^m-H_n\), we obtain that

$$\begin{aligned} H_n^m-H_n= & {} \frac{1}{n} \sum \limits _{t=1}^n w_t \left\{ e^{\nu _t^m} r_t^m \left[ \psi (r_t^m) - E\left( \psi (r_t^m) \mid \mathcal{F}_{t-1}^m \right) \right] \left( \frac{\partial \nu _t^m}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t^m}{\partial \varvec{\theta }} \right) ^T\right. \\&- \left. e^{\nu _t} r_t \left[ \psi (r_t) - E\left( \psi (r_t) \mid \mathcal{F}_{t-1} \right) \right] \left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) ^T \right\} \\= & {} \frac{1}{n} \sum \limits _{t=1}^n w_t \left\{ e^{\nu _t^m} r_t^m \left[ \psi (r_t^m) - E\left( \psi (r_t^m) \mid \mathcal{F}_{t-1}^m \right) \right] \right. \\&\left. \times \, \left[ \left( \frac{\partial \nu _t^m}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t^m}{\partial \varvec{\theta }} \right) ^T - \left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) ^T \right] \right. \\&\left. +\,\left( e^{\nu _t^m} r_t^m \left[ \psi (r_t^m) - E\left( \psi (r_t^m) \mid \mathcal{F}_{t-1}^m \right) \right] - e^{\nu _t} r_t \left[ \psi (r_t) - E\left( \psi (r_t) \mid \mathcal{F}_{t-1} \right) \right] \right) \right. \\&\left. \times \,\left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) ^T \right\} . \end{aligned}$$

The second term in the above representation tends to zero as \(m \rightarrow \infty \) because of the previous Lemma and the fact that \(E \left\| \left( {\partial \nu _{t}}/{\partial \varvec{\theta }} \right) \left( {\partial \nu _t}/{\partial \varvec{\theta }} \right) ^T \right\| < \infty \).

For the first term in the representation of \(H_n^m-H_n\), we obtain the following

$$\begin{aligned}&P\left( \left\| \sum \limits _{t=1}^n e^{\nu _t^m} r_t^m \left[ \psi (r_t^m) - E\left( \psi (r_t^m) \mid \mathcal{F}_{t-1}^m \right) \right] \right. \right. \\&\left. \left. \qquad \times \, \left[ \left( \frac{\partial \nu _t^m}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t^m}{\partial \varvec{\theta }} \right) ^T -\left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) ^T\right] \right\| \ge \epsilon n \right) \\&\quad \le \frac{1}{\epsilon n } \sum \limits _{t=1}^n E \left\| e^{\nu _t^m} r_t^m \left[ \psi (r_t^m) - E\left( \psi (r_t^m) \mid \mathcal{F}_{t-1}^m \right) \right] \right. \\&\left. \qquad \times \,\left[ \left( \frac{\partial \nu _t^m}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t^m}{\partial \varvec{\theta }} \right) ^T -\left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) \left( \frac{\partial \nu _t}{\partial \varvec{\theta }} \right) ^T\right] \right\| \\&\quad \rightarrow 0. \end{aligned}$$

\(\square \)

Proof of Lemma 2.5

Recall that the components of the MQLE score are given by

$$\begin{aligned} s_{ti}(\varvec{\theta })=m_{ti}(\varvec{\theta })-E({m}_{ti}(\varvec{\theta }) \mid \mathcal{F}_{t-1}), \quad \text {where} \quad m_{ti}(\varvec{\theta })=\psi (r_t(\varvec{\theta }))w_te^{\nu _t(\varvec{\theta })/2} \frac{\partial \nu _t(\varvec{\theta })}{\partial \theta _i}, \quad i=1,2,3. \end{aligned}$$

The second derivative of the i-th component of the MQLE score \(\partial ^2 s_{ti}(\varvec{\theta })/\partial \theta _k \partial \theta _j\) is given by

$$\begin{aligned} \frac{\partial ^2 s_{ti}(\varvec{\theta })}{\partial \theta _k \partial \theta _j}=\frac{\partial ^2 m_{ti}(\varvec{\theta })}{ \partial \theta _k \partial \theta _j}-E\left( \frac{\partial ^2 m_{ti}(\varvec{\theta })}{\partial \theta _k \partial \theta _j} \mid \mathcal{F}_{t-1}\right) , \end{aligned}$$

where

$$\begin{aligned} \frac{\partial ^2 m_{ti}(\varvec{\theta })}{\partial \theta _k \partial \theta _j}= & {} \xi _{1t}(\varvec{\theta }) \frac{\partial \nu _t(\varvec{\theta })}{\partial \theta _k}\frac{\partial \nu _t(\varvec{\theta })}{\partial \theta _j}\frac{\partial \nu _t(\varvec{\theta })}{\partial \theta _i}\\&+\,\xi _{2t}(\varvec{\theta }) \left\{ \frac{\partial ^2 \nu _t(\varvec{\theta })}{\partial \theta _k \partial \theta _j}\frac{\partial \nu _t(\varvec{\theta })}{\partial \theta _i}+\frac{\partial ^2 \nu _t(\varvec{\theta })}{\partial \theta _k \partial \theta _i}\frac{\partial \nu _t(\varvec{\theta })}{\partial \theta _j}+\frac{\partial ^2 \nu _t(\varvec{\theta })}{\partial \theta _j \partial \theta _i}\frac{\partial \nu _t(\varvec{\theta })}{\partial \theta _k} \right\} \\&+\,\xi _{3t}(\varvec{\theta }) \frac{\partial ^3 \nu _t(\varvec{\theta })}{\partial \theta _k \partial \theta _j \partial \theta _i}, \end{aligned}$$

with

$$\begin{aligned} \xi _{1t}(\varvec{\theta })= & {} -\frac{w_{t}}{2} \left\{ \psi '(r_t(\varvec{\theta }))e^{\nu _t(\varvec{\theta })}-\frac{1}{2} \psi ''(r_t(\varvec{\theta }))(Y_t+e^{\nu _t(\varvec{\theta })})(Y_t e^{-\nu _t(\varvec{\theta })/2} + e^{\nu _t(\varvec{\theta })/2}) \right. \\&\left. +\, \frac{1}{2} \psi '(r_t(\varvec{\theta })) (Y_t+e^{\nu _t(\varvec{\theta })})-\frac{1}{2}\psi (r_t(\varvec{\theta })) e^{\nu _t(\varvec{\theta })/2} \right\} ,\\ \xi _{2t} (\varvec{\theta })= & {} - \frac{w_{t}}{2} \left\{ (Y_t+e^{\nu _t(\varvec{\theta })})\psi '(r_t(\varvec{\theta }))- \psi (r_t(\varvec{\theta })) e^{\nu _t(\varvec{\theta })/2} \right\} ,\\ \xi _{3t} (\varvec{\theta })= & {} w_{t} \psi (r_t(\varvec{\theta })) e^{\nu _t(\varvec{\theta })/2}. \end{aligned}$$

Without loss of generality, we only consider derivatives with respect to a. For the derivatives with respect to d and b we use identical arguments. For the derivatives of \(\nu _t\) with respect to the parameter a we obtain the following bounds

$$\begin{aligned}&\nu _t \le \mu _{0t}:=b_M \sum \limits _{j=1}^{t-1} a_M^{j} \log (1+Y_{t-j-1})+c_0,~~~\text {where}~~~c_0=d_M/(1-a_M)+\nu _0,\\&\frac{\partial \nu _t}{\partial a} \le \mu _{1t}:=b_M \sum \limits _{j=1}^{t-1} ja_M^{j-1} \log (1+Y_{t-j-1})+c_1,~~~\text {where}~~~c_1=c_0/(1-a_M),\\&\frac{\partial ^2 \nu _t}{\partial a^2} \le \mu _{2t}:=b_M \sum \limits _{j=1}^{t-2} j(j+1)a_M^{j-1} \log (1+Y_{t-j-2})+c_2,~~~\text {where}~~~c_2=2c_0/(1-a_M)^2,\\&\frac{\partial ^3 \nu _t}{\partial a^3} \le \mu _{3t}:=b_M \sum \limits _{j=1}^{t-3} j(j+1)(j+2)a_M^{j-1} \log (1+Y_{t-j-3})+c_3,~~~\text {where}~~~c_3=6c_0/(1-a_M)^3. \end{aligned}$$

With \(\theta _i=\theta _j=\theta _k=a\),

$$\begin{aligned}&\Bigl |\frac{\partial ^2 m_{ti}(\varvec{\theta })}{\partial \theta _k \partial \theta _j}-E\left( \frac{\partial ^2 m_{ti}(\varvec{\theta })}{\partial \theta _k \partial \theta _j} \mid \mathcal{F}_{t-1}\right) \Bigr |\nonumber \\&\quad < C \Bigl \{ \left| \xi _{1t}(\varvec{\theta })- E(\xi _{1t}(\varvec{\theta }) \mid \mathcal{F}_{t-1}) \right| \mu _{1t}^3+\left| \xi _{2t}(\varvec{\theta })- E(\xi _{2t}(\varvec{\theta }) \mid \mathcal{F}_{t-1})\right| \mu _{1t}\mu _{2t}\nonumber \\&\qquad +\,\left| \xi _{3t}(\varvec{\theta })-E(\xi _{3t}(\varvec{\theta }) \mid \mathcal{F}_{t-1}) \right| \mu _{3t} \Bigr \}\nonumber \\&\quad \equiv \tilde{m}_t. \end{aligned}$$
(14)

By defining \(\tilde{M}_{n}^{m}\) analogously and working as before, the result of the Lemma follows. \(\square \)

Proof of Theorem 3.1

The first assertion of the Theorem follows from arguments given in Francq and Zakoïan (2010, Prop. 8.3). For the second assertion of the Theorem, we consider the difference

$$\begin{aligned} ST_n^m(\tilde{\varvec{\theta }}_n)-ST_n(\tilde{\varvec{\theta }}_n)=\left[ S_n^{m(2)}(\tilde{\varvec{\theta }}_n)\right] ^2 \frac{\tilde{\sigma }-\tilde{\sigma }^m}{\tilde{\sigma }^m \tilde{\sigma }} + \frac{\left[ S_n^{m(2)}(\tilde{\varvec{\theta }}_n)\right] ^2-\left[ S_n^{(2)}(\tilde{\varvec{\theta }}_n)\right] ^2}{\tilde{\sigma }}. \end{aligned}$$

The above representation is composed of the differences \(W_{22}^m-W_{22}\), \(W_{12}^m-W_{12}\), \(W_{21}^m-W_{21}\), \(W_{11}^m-W_{11}\), \({V_{11}^m}^{-1}V_{12}^m-V_{11}^{-1}V_{12}\) and \(V_{21}^m{V_{11}^m}^{-1}-V_{21}V_{11}^{-1}\), which all converge to zero by Lemmas 2.1 and 2.3. \(\square \)
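
As a computational postscript, once \(S_n^{(2)}\) and the partitioned matrices \(V\) and \(W\) are estimated, the statistic can be assembled mechanically. The variance formula below is our reading of the blocks listed in this proof (the variance of the effective score \(S_n^{(2)} - V_{21}V_{11}^{-1}S_n^{(1)}\)); it is a sketch under that assumption, not a restatement of the paper's definition of \(ST_n\).

```python
import numpy as np

def score_test(S2, V11, V12, V21, W11, W12, W21, W22):
    """Score-type statistic for the tested sub-vector, using the variance of
    the effective score S2 - A S1 with A = V21 V11^{-1}:
        sigma = W22 - A W12 - W21 A' + A W11 A'.
    Block structure assumed from the differences listed in the proof above."""
    A = V21 @ np.linalg.inv(V11)
    sigma = W22 - A @ W12 - W21 @ A.T + A @ W11 @ A.T
    S2 = np.atleast_1d(S2)
    return float(S2 @ np.linalg.solve(sigma, S2))

# Example with scalar blocks (1x1 arrays), values purely illustrative
V11 = np.array([[2.0]]); V12 = V21 = np.array([[0.3]])
W11 = np.array([[1.5]]); W12 = W21 = np.array([[0.2]]); W22 = np.array([[1.0]])
print(score_test(np.array([0.8]), V11, V12, V21, W11, W12, W21, W22))
```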


Cite this article

Kitromilidou, S., Fokianos, K. Mallows’ quasi-likelihood estimation for log-linear Poisson autoregressions. Stat Inference Stoch Process 19, 337–361 (2016). https://doi.org/10.1007/s11203-015-9131-z
