Poisson autoregressive process modeling via the penalized conditional maximum likelihood procedure

Abstract

In this paper, we consider a penalized estimation procedure for the Poisson autoregressive model with a sparse parameter structure. We study the theoretical properties of the penalized conditional maximum likelihood (PCML) estimator under several different penalties. We show that the penalized estimators perform as well as if the true model were known, and we establish the oracle properties of the PCML estimators. Simulation studies are conducted to assess the proposed procedure, and a real data example is also provided.
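
For readers who wish to experiment with the procedure, the following is a minimal, self-contained sketch (not the authors' code) of the penalized conditional log-likelihood \(Q_{n}(\varvec{\theta })=L_{n}(\varvec{\theta })-n\sum _{i}P_{\lambda _{n}}(|\theta _{i}|)\) that appears in the appendix. It assumes a Poisson AR(\(p\)) model with linear conditional mean \(\gamma _{t}=\theta _{1}+\theta _{2}X_{t-p}+\cdots +\theta _{p+1}X_{t-1}\), a form read off the score and Fisher information in the appendix, and it uses the LASSO penalty \(P_{\lambda }(u)=\lambda u\) purely as one admissible choice; the function names and the tuning value are illustrative.

```python
# Sketch of PCML for a Poisson AR(p) model (assumptions: linear intensity
# gamma_t = theta_1 + theta_2*X_{t-p} + ... + theta_{p+1}*X_{t-1}, LASSO penalty).
import numpy as np
from scipy.optimize import minimize

def simulate_poisson_ar(theta, n, p, burn=200, rng=None):
    """Simulate a Poisson AR(p) path; theta = (theta_1, ..., theta_{p+1})."""
    rng = np.random.default_rng(rng)
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        gamma_t = theta[0] + x[t - p:t] @ theta[1:]   # lags (X_{t-p}, ..., X_{t-1})
        x[t] = rng.poisson(gamma_t)
    return x[burn:]

def neg_pcml(theta, x, p, lam):
    """Negative penalized conditional log-likelihood, -Q_n(theta)."""
    n_eff = len(x) - p
    # rows are Y_t = (1, X_{t-p}, ..., X_{t-1})^T for t = p, ..., n-1
    Y = np.column_stack([np.ones(n_eff)] + [x[j:j + n_eff] for j in range(p)])
    gamma = Y @ theta
    if np.any(gamma <= 0):                 # the intensity must stay positive
        return np.inf
    loglik = np.sum(x[p:] * np.log(gamma) - gamma)           # Poisson kernel
    return -(loglik - n_eff * lam * np.sum(np.abs(theta)))   # subtract penalty

x = simulate_poisson_ar(np.array([1.0, 0.0, 0.3]), n=1000, p=2, rng=1)
fit = minimize(neg_pcml, x0=np.full(3, 0.2), args=(x, 2, 0.02), method="Nelder-Mead")
print(fit.x)   # the coefficient whose true value is zero is shrunk toward zero
```

In practice the tuning parameter \(\lambda _{n}\) would be chosen data-adaptively (for example by a BIC-type criterion, as in Wang et al. 2007b), and a nonconcave penalty such as SCAD would be used to obtain the oracle properties; the LASSO is used above only to keep the sketch short.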


References

  • Al-Osh MA, Alzaid AA (1987) First-order integer-valued autoregressive (INAR(1)) process. J Time Ser Anal 8(3):261–275

  • Al-Osh MA, Alzaid AA (1988) Integer-valued moving average (INMA) process. Stat Pap 29(1):281–300

  • Brillinger DR (2001) Time series: data analysis and theory. SIAM

  • Davis RA, Dunsmuir WTM, Streett SB (2003) Observation-driven models for Poisson counts. Biometrika 90(4):777–790

  • Dicker L, Huang B, Lin X (2013) Variable selection and estimation with the seamless-L0 penalty. Stat Sin:929–962

  • Doukhan P, Fokianos K, Tjøstheim D (2012) On weak dependence conditions for Poisson autoregressions. Stat Probab Lett 82:942–948

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Fan J, Lv J (2010) A selective overview of variable selection in high dimensional feature space. Stat Sin 20:101–148

  • Fan J, Peng H (2004) On nonconcave penalized likelihood with diverging number of parameters. Ann Stat 32:928–961

  • Ferland R, Latour A, Oraichi D (2006) Integer-valued GARCH process. J Time Ser Anal 27(6):923–942

  • Fokianos K, Rahbek A, Tjøstheim D (2009) Poisson autoregression. J Am Stat Assoc 104(488):1430–1439

  • Hall P, Heyde CC (2014) Martingale limit theory and its application. Academic Press, New York

  • Kedem B, Fokianos K (2005) Regression models for time series analysis. Wiley, New York

  • Khoo WC, Ong SH, Biswas A (2017) Modeling time series of counts with a new class of INAR(1) model. Stat Pap 58:393–416

  • Nardi Y, Rinaldo A (2011) Autoregressive process modeling via the LASSO procedure. J Multivar Anal 102(3):528–549

  • Steutel FW, Van Harn K (1979) Discrete analogues of self-decomposability and stability. Ann Prob:893–899

  • Wang H, Li G, Tsai CL (2007a) Regression coefficient and autoregressive order shrinkage and selection via the LASSO. J R Stat Soc Ser B 69(1):63–78

  • Wang H, Li R, Tsai CL (2007b) Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94(3):553–568

  • Yang K, Wang D, Jia B, Li H (2016) An integer-valued threshold autoregressive process based on negative binomial thinning. Stat Pap. doi:10.1007/s00362-016-0808-1

  • Yoon YJ, Park C, Lee T (2013) Penalized regression models with autoregressive error terms. J Stat Comput Simul 83(9):1756–1772

  • Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat:894–942

  • Zhang H, Wang D, Zhu F (2010) Inference for INAR(p) processes with signed generalized power series thinning operator. J Stat Plan Inference 140(3):667–683

  • Zheng H, Basawa IV, Datta S (2006) Inference for pth-order random coefficient integer-valued autoregressive processes. J Time Ser Anal 27(3):411–440

  • Zhu F, Wang D (2011) Estimation and testing for a Poisson autoregressive model. Metrika 73(2):211–230

  • Zou H (2006) The adaptive LASSO and its oracle properties. J Am Stat Assoc 101(476):1418–1429

Acknowledgements

We thank the Editor and two reviewers for their valuable suggestions and comments, which greatly improved the article. This work was supported by the National Natural Science Foundation of China (Nos. 11271155, 11371168, J1310022, 11571138, 11501241, 11571051, 11301137, 11301212 and 11401146), the National Social Science Foundation of China (16BTJ020), the Science and Technology Research Program of the Education Department of Jilin Province for the 12th Five-Year Plan (440020031139), and the Jilin Province Natural Science Foundation (20150520053JH).

Author information

Correspondence to Dehui Wang.

Appendix

To prove Theorem 1, we need the following lemma.

Lemma A.1

Under condition (C.1), as \(n\rightarrow \infty \) we have

$$\begin{aligned} \frac{1}{\sqrt{n}}B(\varvec{\theta _{0}})\mathop {\longrightarrow }\limits ^{D}N(\mathbf 0 ,\varvec{\Sigma }(\varvec{\theta _{0}})), \end{aligned}$$

where \(B({\varvec{\theta }}_{\varvec{0}})=\sum ^{n}_{t=1}\frac{\partial l_{t}({\varvec{\theta }}_{\varvec{0}})}{\partial {\varvec{\theta }}}\) is the score evaluated at the true parameter; the Fisher information matrix \({\varvec{\Sigma }}({\varvec{\theta }_{\varvec{0}}})=E\left( \frac{{\varvec{Y}}_{t}{\varvec{Y}}_{t}^{\mathrm {T}}}{\gamma _{t}}\right) \) with \({\varvec{Y}}_{t}=(1,X_{t-p},\ldots ,X_{t-1})^{\mathrm {T}}\).

Proof of Lemma A.1

Let

$$\begin{aligned}&T_{n1}=\sum _{t=1}^{n}\left( \frac{X_{t}}{\gamma _{t}}-1\right) ,\\&T_{ni}=\sum _{t=1}^{n}\left( \frac{X_{t}}{\gamma _{t}}-1\right) X_{t-(p+2-i)},~2\le i\le p+1, \end{aligned}$$

Through some calculation, we can derive that

$$\begin{aligned}&E\left( \left( \frac{X_{n}}{\gamma _{n}}-1\right) \Big |\mathscr {F}_{n-1}\right) =0~,\\&E\bigl (T_{n1}|\mathscr {F}_{n-1}\bigr )=E\left( T_{(n-1)1}+\left( \frac{X_{n}}{\gamma _{n}}-1\right) \Big |\mathscr {F}_{n-1}\right) =T_{(n-1)1}, \end{aligned}$$

which implies that \(\{T_{n1},\mathscr {F}_{n},n\ge 1\}\) is a martingale with \(\mathscr {F}_{n}=\sigma \left( X_{n},X_{n-1},\ldots ,X_{0}\right) \). By \(E|X_{t}|^{4}<\infty \), the strict stationarity of \(\{X_{t}\}\), and the ergodic theorem, we obtain that

$$\begin{aligned}&E\left( \frac{X_{n}}{\gamma _{n}}-1\right) ^{2}<\infty ,\\&\frac{1}{n}\sum ^{n}_{t=1}\left( \frac{X_{t}}{\gamma _{t}}-1\right) ^{2}\mathop {\longrightarrow }\limits ^{a.s.}E\left( \frac{X_{n}}{\gamma _{n}}-1\right) ^{2}=E(\frac{1}{\gamma _{n}})=\sigma _{11}. \end{aligned}$$

Using the martingale central limit theorem (Hall and Heyde 2014), we get that

$$\begin{aligned} \frac{1}{\sqrt{n}}T_{n1}\mathop {\longrightarrow }\limits ^{D}N(0,\sigma _{11}). \end{aligned}$$

Similarly, we can prove that \(\{T_{ni},\mathscr {F}_{n},n\ge 1\}\), \(i=2,\ldots ,p+1\), are martingales and

$$\begin{aligned} \frac{1}{\sqrt{n}}T_{ni}\mathop {\longrightarrow }\limits ^{D}N(0,\sigma _{ii}). \end{aligned}$$

For any \(\mathbf c =(c_{1},\ldots ,c_{p+1})^{\mathrm {T}}\in \mathbb {R}^{p+1}\backslash (0,\ldots ,0)^{\mathrm {T}}\), we get

$$\begin{aligned} \frac{1}{\sqrt{n}}{} \mathbf c ^{\mathrm {T}}\begin{pmatrix} T_{n1}\\ T_{n2}\\ \vdots \\ T_{n(p+1)} \end{pmatrix}&=\frac{1}{\sqrt{n}}\sum ^{n}_{t=1}\left( c_{1}+c_{2}X_{t-p}+\cdots +c_{p+1}X_{t-1}\right) \left( \frac{X_{t}}{\gamma _{t}}-1\right) \\&\mathop {\longrightarrow }\limits ^{D}N \left( 0 ,E\left[ \left( c_{1}+c_{2}X_{0}+\cdots +c_{p+1}X_{p-1}\right) ^{2}\left( \frac{X_{p}}{\gamma _{p}}-1\right) ^{2}\right] \right) . \end{aligned}$$

Thus, by the Cramér-Wold device,

$$\begin{aligned} \frac{1}{\sqrt{n}}\begin{pmatrix} T_{n1}\\ T_{n2}\\ \vdots \\ T_{n(p+1)} \end{pmatrix}=\frac{1}{\sqrt{n}}B(\varvec{\theta _{0}})\mathop {\longrightarrow }\limits ^{D}N(\mathbf 0 ,\varvec{\Sigma }(\varvec{\theta _{0}})). \end{aligned}$$

This completes the proof. \(\square \)
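
As a purely numerical illustration of Lemma A.1 (a Monte Carlo sketch under the same assumed linear-intensity form as above, not part of the paper), one can check that the sample covariance of the scaled score \(n^{-1/2}B(\varvec{\theta _{0}})\) across replications is close to the plug-in estimate of \(\varvec{\Sigma }(\varvec{\theta _{0}})=E\left( \varvec{Y}_{t}\varvec{Y}_{t}^{\mathrm {T}}/\gamma _{t}\right) \).

```python
# Monte Carlo check of Lemma A.1 for a Poisson AR(1) model with
# gamma_t = theta_1 + theta_2 * X_{t-1} (an assumed intensity form):
# the scaled score n^{-1/2} B(theta_0), B(theta_0) = sum_t (X_t/gamma_t - 1) Y_t,
# should be approximately N(0, Sigma(theta_0)) with Sigma = E(Y_t Y_t^T / gamma_t).
import numpy as np

rng = np.random.default_rng(0)
theta0 = np.array([1.0, 0.4])                  # true (theta_1, theta_2)
n, reps, burn = 2000, 500, 200

scores = np.zeros((reps, 2))
sigma_hat = np.zeros((2, 2))
for r in range(reps):
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = rng.poisson(theta0[0] + theta0[1] * x[t - 1])
    x = x[burn:]
    Y = np.column_stack([np.ones(n - 1), x[:-1]])          # Y_t = (1, X_{t-1})^T
    gamma = Y @ theta0
    scores[r] = (x[1:] / gamma - 1) @ Y / np.sqrt(n - 1)   # n^{-1/2} B(theta_0)
    sigma_hat += (Y / gamma[:, None]).T @ Y / (n - 1) / reps

print(np.cov(scores, rowvar=False))   # empirical covariance of the scaled score
print(sigma_hat)                      # plug-in estimate of Sigma(theta_0)
```

The two printed matrices should agree up to Monte Carlo error, in line with the lemma.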

Proof of Theorem 1

Let \(\beta _{n}=n^{-1/2}+a_{n}\). Following Fan and Li (2001), it suffices to show that for any \(\varepsilon >0\) there exists a constant \(d\) such that

$$\begin{aligned} \mathbf P \left[ \sup _{\Vert \mathbf u \Vert =d}\left\{ Q_{n}(\varvec{\theta _{0}}+\beta _{n}{} \mathbf u )\right\} <Q_{n}(\varvec{\theta _{0}}) \right] \ge 1-\varepsilon , \end{aligned}$$
(7)

which implies that, with probability at least \(1-\varepsilon \), there exists a local maximum in the ball \(\{\varvec{\theta _{0}}+\beta _{n}{} \mathbf u :\Vert \mathbf u \Vert \le d\}\); hence there exists a local maximizer \(\varvec{\hat{\theta }}\) with \(\Vert \varvec{\hat{\theta }}-\varvec{\theta _{0}}\Vert =O_{p}(\beta _{n})\). Note that

$$\begin{aligned} D_{n}(\mathbf u )&=Q_{n}(\varvec{\theta _{0}}+\beta _{n}{} \mathbf u )-Q_{n}(\varvec{\theta _{0}})\nonumber \\&=L_{n}(\varvec{\theta _{0}}+\beta _{n}{} \mathbf u )-L_{n}(\varvec{\theta _{0}})-n\sum ^{p+1}_{i=1}\left( P_{\lambda _{n}}(|\theta _{i}^{0}+\beta _{n}u_{i}|)-P_{\lambda _{n}}(|\theta _{i}^{0}|)\right) \nonumber \\&\le L_{n}(\varvec{\theta _{0}}+\beta _{n}{} \mathbf u )-L_{n}(\varvec{\theta _{0}})-n\sum ^{s}_{i=1}\left( P_{\lambda _{n}}(|\theta _{i}^{0}+\beta _{n}u_{i}|)-P_{\lambda _{n}}(|\theta _{i}^{0}|)\right) . \end{aligned}$$
(8)

By a Taylor series expansion of the right-hand side of (8), we obtain

$$\begin{aligned} D_{n}(\mathbf u )\le \,&\beta _{n}{} \mathbf u ^{\mathrm {T}}B(\varvec{\theta _{0}})+\frac{1}{2}\beta _{n}^{2}{} \mathbf u ^{\mathrm {T}}\frac{\partial ^{2}L_{n}(\varvec{\theta _{0}})}{\partial \varvec{\theta }\partial \varvec{\theta }^{\mathrm {T}}}{} \mathbf u \bigl \{1+o(1)\bigr \}\nonumber \\&-\sum ^{s}_{i=1}\left\{ n\beta _{n}\dot{P}_{\lambda _{n}}(|\theta ^{0}_{i}|)\text {sgn}(\theta ^{0}_{i})u_{i}+n\beta ^{2}_{n}\ddot{P}_{\lambda _{n}}(|\theta ^{0}_{i}|)u^{2}_{i}\left[ 1+o(1)\right] \right\} \nonumber \\ =\,&A_{1}+A_{2}+A_{3}, \end{aligned}$$

where

$$\begin{aligned}&A_{1}=\beta _{n}{} \mathbf u ^{\mathrm {T}}B(\varvec{\theta _{0}}),\\&A_{2}=\frac{1}{2}\beta _{n}^{2}{} \mathbf u ^{\mathrm {T}}\frac{\partial ^{2}L_{n}(\varvec{\theta _{0}})}{\partial \varvec{\theta }\partial \varvec{\theta }^{\mathrm {T}}}{} \mathbf u \bigl \{1+o(1)\bigr \},\\&A_{3}=-\sum ^{s}_{i=1}\left\{ n\beta _{n}\dot{P}_{\lambda _{n}}(|\theta ^{0}_{i}|)\text {sgn}(\theta ^{0}_{i})u_{i}+n\beta ^{2}_{n}\ddot{P}_{\lambda _{n}}(|\theta ^{0}_{i}|)u^{2}_{i}\left[ 1+o(1)\right] \right\} . \end{aligned}$$

From Lemma A.1, we know that \(n^{-1/2}B(\varvec{\theta _{0}})=O_{p}(1)\); hence \(A_{1}=O_{p}(n^{1/2}\beta _{n})=O_{p}(n\beta ^{2}_{n})\). By ergodicity, \(A_{2}=-\frac{1}{2}n\beta _{n}^{2}{} \mathbf u ^{\mathrm {T}}\varvec{\Sigma }(\varvec{\theta _{0}})\mathbf u \{1+o_{p}(1)\}\) as \(n\rightarrow \infty \). From conditions (C.2) and (C.3), \(A_{3}\) is bounded by \(\sqrt{s}n\beta _{n}a_{n}\Vert \mathbf u \Vert +n\beta _{n}^{2}b_{n}\Vert \mathbf u \Vert ^{2}\). By choosing a sufficiently large \(d\), both \(A_{1}\) and \(A_{3}\) are dominated by \(A_{2}\), which is negative. The proof is completed. \(\square \)
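
As a concrete example of a penalty to which an argument of this type applies (the example is taken from Fan and Li (2001) rather than from the paper's main text, and the precise conditions (C.1)-(C.3) are stated there), the SCAD penalty is defined through its derivative

$$\begin{aligned} \dot{P}_{\lambda }(\theta )=\lambda \left\{ I(\theta \le \lambda )+\frac{(a\lambda -\theta )_{+}}{(a-1)\lambda }I(\theta >\lambda )\right\} ,\quad \theta >0, \end{aligned}$$

with \(a>2\) (Fan and Li suggest \(a=3.7\)) and \(P_{\lambda }(0)=0\). Since \(\dot{P}_{\lambda }(\theta )=\ddot{P}_{\lambda }(\theta )=0\) whenever \(\theta >a\lambda \), taking \(\lambda _{n}\rightarrow 0\) makes \(\dot{P}_{\lambda _{n}}(|\theta _{i}^{0}|)\) and \(\ddot{P}_{\lambda _{n}}(|\theta _{i}^{0}|)\) vanish at the nonzero true coefficients for all sufficiently large \(n\); the quantities \(a_{n}\) and \(b_{n}\) are then eventually zero, and \(\beta _{n}=n^{-1/2}+a_{n}\) reduces to the parametric rate \(n^{-1/2}\).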

Proof of Lemma 1

We need to prove that, with probability tending to one as \(n\rightarrow \infty \), for any \(\varvec{\theta _{1}}\) satisfying \(\Vert \varvec{\theta _{1}}-\varvec{\theta _{10}}\Vert =O_{p}(n^{-1/2})\), for some small \(\epsilon _{n}=\eta n^{-1/2}\), and for \(j=s+1,\ldots ,p+1\),

$$\begin{aligned}&\frac{\partial Q_{n}(\varvec{\theta })}{\partial \theta _{j}}<0, \quad \mathrm{for}~0<\theta _{j}<\epsilon _{n},\end{aligned}$$
(9)
$$\begin{aligned}&\frac{\partial Q_{n}(\varvec{\theta })}{\partial \theta _{j}}>0, \quad \mathrm{for}~-\epsilon _{n}<\theta _{j}<0. \end{aligned}$$
(10)

To show (9) and (10), by Taylor’s expansion,

$$\begin{aligned} \frac{\partial Q_{n}(\varvec{\theta })}{\partial \theta _{j}}=\,&\frac{\partial L_{n}(\varvec{\theta })}{\partial \theta _{j}}-n\dot{P}_{\lambda _{n}}(|\theta _{j}|)\text {sgn}(\theta _{j})\nonumber \\ =\,&\frac{\partial L_{n}(\varvec{\theta _{0}})}{\partial \theta _{j}}+\sum ^{p+1}_{i=1}\frac{\partial ^{2} L_{n}(\varvec{\theta _{0}})}{\partial \theta _{i}\partial \theta _{j}}(\theta _{i}-\theta _{i}^{0})\left\{ 1+o(1)\right\} \nonumber \\&-n\dot{P}_{\lambda _{n}}(|\theta _{j}|)\text {sgn}(\theta _{j}). \end{aligned}$$
(11)

From Lemma A.1, we know that \(\frac{\partial L_{n}(\varvec{\theta _{0}})}{\partial \theta _{j}}=O_{p}(n^{1/2})\). By the law of large numbers, strict stationarity, and \(\Vert \varvec{\theta _{1}}-\varvec{\theta _{10}}\Vert =O_{p}(n^{-1/2})\), we have

$$\begin{aligned} \sum ^{p+1}_{i=1}\frac{\partial ^{2} L_{n}(\varvec{\theta _{0}})}{\partial \theta _{i}\partial \theta _{j}}(\theta _{i}-\theta _{i}^{0})\left\{ 1+o(1)\right\} =O_{p}(n^{1/2}). \end{aligned}$$

Thus, \(\frac{\partial Q_{n}(\varvec{\theta })}{\partial \theta _{j}}=n\lambda _{n}\left\{ O_{p}(n^{-1/2}/\lambda _{n})-\lambda _{n}^{-1}\dot{P}_{\lambda _{n}}(|\theta _{j}|)\text {sgn}(\theta _{j})\right\} \). Since \(n^{-1/2}/\lambda _{n}\rightarrow 0\) and \(\lambda _{n}^{-1}\dot{P}_{\lambda _{n}}(|\theta _{j}|)>0\) as \(n\rightarrow \infty \), the sign of (11) is completely determined by that of \(\theta _{j}\) for all sufficiently large \(n\). Hence, (9) and (10) follow. This completes the proof. \(\square \)

Proof of Theorem 2

Part (\(i\)) holds by Lemma 1, so we only need to prove part (\(ii\)). From part (\(i\)), we know that \(\varvec{\hat{\theta }_{2}}=\mathbf 0 \) with probability tending to 1. Thus, there exists a root-\(n\) consistent local maximizer \(\varvec{\hat{\theta }_{1}}\) that satisfies the following equation:

$$\begin{aligned} \frac{\partial Q_{n}(\varvec{\theta })}{\partial \theta _{j}}\Bigg |_{\varvec{\theta }=\begin{pmatrix}\varvec{\hat{\theta }_{1}}\\ \mathbf 0 \end{pmatrix}}=0,~~for~j=1,\ldots ,s. \end{aligned}$$

By the Taylor expansion, we have

$$\begin{aligned} 0=&\,\frac{\partial L_{n}(\varvec{\theta _{0}})}{\partial \theta _{j}}-n\dot{P}_{\lambda _{n}}(|\hat{\theta }_{j}|)\text {sgn}(\hat{\theta }_{j})\\ =&\,\frac{\partial L_{n}(\varvec{\theta _{0}})}{\partial \theta _{j}}+\sum ^{s}_{l=1}\biggl \{\frac{\partial ^{2}L_{n}(\varvec{\theta _{0}})}{\partial \theta _{l}\partial \theta _{j}}+o_{p}(1)\biggr \}(\hat{\theta }_{l}-\theta _{l}^{0})\\&-n\left\{ \dot{P}_{\lambda _{n}}(|\theta _{j}^{0}|)\text {sgn}(\theta _{j}^{0})+\left( \ddot{P}_{\lambda _{n}}(|\theta _{j}^{0}|)+o_{p}(1)\right) (\hat{\theta }_{j}-\theta _{j}^{0})\right\} . \end{aligned}$$

This indicates

$$\begin{aligned} \sqrt{n}(\varvec{\Sigma }^{s}(\varvec{\theta _0})+\varvec{\Lambda })\left\{ (\varvec{\hat{\theta }_{1}}-\varvec{\theta _{10}})+(\varvec{\Sigma }^{s}(\varvec{\theta _0})+\varvec{\Lambda })^{-1}{} \mathbf b \right\} =\frac{1}{\sqrt{n}}B^{s}(\varvec{\theta }_{0})+o_{p}(1), \end{aligned}$$

where \(B^{s}(\varvec{\theta }_{\varvec{0}})=\sum ^{n}_{t=1}\frac{1}{\gamma _{t}}\left( X_{t}-\gamma _{t}\right) {\varvec{Y}}^{s}_{t}\) and \({\varvec{Y}}^{s}_{t}=(1,X_{t-p},\ldots ,X_{t-p+s-2})^{\mathrm {T}}\). From Slutsky's theorem and the martingale central limit theorem, we complete the proof. \(\square \)
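
For completeness (a routine consequence, assuming \(\varvec{\Sigma }^{s}(\varvec{\theta _0})+\varvec{\Lambda }\) is nonsingular, as is standard in oracle-property statements of this kind): since \(n^{-1/2}B^{s}(\varvec{\theta }_{0})\mathop {\longrightarrow }\limits ^{D}N(\mathbf 0 ,\varvec{\Sigma }^{s}(\varvec{\theta _0}))\) by the argument of Lemma A.1, the last display gives

$$\begin{aligned} \sqrt{n}\left\{ (\varvec{\hat{\theta }_{1}}-\varvec{\theta _{10}})+(\varvec{\Sigma }^{s}(\varvec{\theta _0})+\varvec{\Lambda })^{-1}{} \mathbf b \right\} \mathop {\longrightarrow }\limits ^{D}N\left( \mathbf 0 ,(\varvec{\Sigma }^{s}(\varvec{\theta _0})+\varvec{\Lambda })^{-1}\varvec{\Sigma }^{s}(\varvec{\theta _0})(\varvec{\Sigma }^{s}(\varvec{\theta _0})+\varvec{\Lambda })^{-1}\right) , \end{aligned}$$

which is the form in which the asymptotic normality in part (\(ii\)) is usually stated (cf. Fan and Li 2001).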

Cite this article

Wang, X., Wang, D. & Zhang, H. Poisson autoregressive process modeling via the penalized conditional maximum likelihood procedure. Stat Papers 61, 245–260 (2020). https://doi.org/10.1007/s00362-017-0938-0
