
Dynamic Networks with Multi-scale Temporal Structure

Abstract

We describe a novel method for modeling non-stationary multivariate time series, with time-varying conditional dependencies represented through dynamic networks. Our proposed approach combines traditional multi-scale modeling and network based neighborhood selection, aiming at capturing temporally local structure in the data while maintaining sparsity of the potential interactions. Our multi-scale framework is based on recursive dyadic partitioning, which recursively partitions the temporal axis into finer intervals and allows us to detect local network structural changes at varying temporal resolutions. The dynamic neighborhood selection is achieved through penalized likelihood estimation, where the penalty seeks to limit the number of neighbors used to model the data. We present theoretical and numerical results describing the performance of our method, which is motivated and illustrated using task-based magnetoencephalography (MEG) data in neuroscience.



References

  • Amano, K., Takeda, T., Haji, T., Terao, M., Maruya, K., Matsumoto, K., Murakami, I. and Nishida, S. (2012). Human neural responses involved in spatial pooling of locally ambiguous motion signals. Journal of Neurophysiology 107, 3493–3508.

  • Bach, F.R. (2008). Consistency of the group lasso and multiple kernel learning. Journal of Machine Learning Research 9, 1179–1225.

  • Barigozzi, M. and Brownlees, C.T. (2014). NETS: network estimation for time series. Available at SSRN 2249909.

  • Basu, S., Shojaie, A. and Michailidis, G. (2015). Network Granger causality with inherent grouping structure. Journal of Machine Learning Research 16, 417–453. http://jmlr.org/papers/v16/basu15a.html.

  • Betancourt, B., Rodríguez, A. and Boyd, N. (2017). Bayesian fused lasso regression for dynamic binary networks. Journal of Computational and Graphical Statistics.

  • Bettencourt, K.C. and Xu, Y. (2016). Decoding the content of visual short-term memory under distraction in occipital and parietal areas. Nature Neuroscience 19, 150–157.

  • Bolstad, A., Van Veen, B.D. and Nowak, R. (2011). Causal network inference via group sparse regularization. IEEE Transactions on Signal Processing 59, 2628–2641.

  • Braddick, O., O'Brien, J., Wattam-Bell, J., Atkinson, J. and Turner, R. (2000). Form and motion coherence activate independent, but not dorsal/ventral segregated, networks in the human brain. Current Biology 10, 731–734.

  • Bullmore, E. and Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10, 186–198.

  • Calabro, F. and Vaina, L. (2012). Interaction of cortical networks mediating object motion detection by moving observers. Experimental Brain Research 221, 177–189.

  • Davis, R.A., Lee, T. and Rodriguez-Yam, G.A. (2008). Break detection for a class of nonlinear time series models. Journal of Time Series Analysis 29, 834–867.

  • Donoho, D.L. (1993). Unconditional bases are optimal bases for data compression and for statistical estimation. Applied and Computational Harmonic Analysis 1, 100–115.

  • Donoho, D.L. (1997). CART and best-ortho-basis: a connection. Annals of Statistics 25, 1870–1911.

  • Fouque, J.-P., Papanicolaou, G., Sircar, R. and Sølna, K. (2011). Multiscale stochastic volatility for equity, interest rate, and credit derivatives. Cambridge University Press, Cambridge.

  • Granger, C.W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 3, 424–438.

  • Hamilton, J.D. (1983). Oil and the macroeconomy since World War II. The Journal of Political Economy 91, 2, 228–248.

  • Hiemstra, C. and Jones, J.D. (1994). Testing for linear and nonlinear Granger causality in the stock price–volume relation. Journal of Finance 49, 1639–1664.

  • Honey, C.J., Kötter, R., Breakspear, M. and Sporns, O. (2007). Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proceedings of the National Academy of Sciences of the United States of America 104, 10240–10245.

  • Killick, R., Fearnhead, P. and Eckley, I.A. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association 107, 1590–1598.

  • Kolaczyk, E.D. (2009). Statistical Analysis of Network Data: Methods and Models, 1st edn. Springer, New York.

  • Kolaczyk, E.D. and Nowak, R.D. (2005). Multiscale generalised linear models for nonparametric function estimation. Biometrika 92, 119–133.

  • Li, J.Q. and Barron, A.R. (2000). Mixture density estimation. In Advances in Neural Information Processing Systems, 279–285.

  • Long, C., Brown, E., Triantafyllou, C., Aharon, I., Wald, L. and Solo, V. (2005). Nonstationary noise estimation in functional MRI. Neuroimage 28, 890–903.

  • Louie, M.M. and Kolaczyk, E.D. (2006). A multiscale method for disease mapping in spatial epidemiology. Statistics in Medicine 25, 1287–1306.

  • Mallat, S.G. (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11, 674–693.

  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34, 3, 1436–1462.

  • Mukhopadhyay, N.D. and Chatterjee, S. (2007). Causality and pathway search in microarray time series experiment. Bioinformatics 23, 442–449.

  • Müller, A. (2001). Stochastic ordering of multivariate normal distributions. Annals of the Institute of Statistical Mathematics 53, 567–575.

  • Rana, K.D. and Vaina, L.M. (2014). Functional roles of 10 Hz alpha-band power modulating engagement and disengagement of cortical networks in a complex visual motion task. PLoS One 9, e107715.

  • Sims, C.A. (1972). Money, income, and causality. American Economic Review 62, 540–552.

  • Willett, R.M. and Nowak, R.D. (2007). Multiscale Poisson intensity and density estimation. IEEE Transactions on Information Theory 53, 3171–3187.

  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B 68, 49–67.

  • Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. The Journal of Machine Learning Research 7, 2541–2563.


Acknowledgements

We would like to thank Lucia Vaina and Kunjan Rana for providing the MEG data and offering helpful discussion throughout. This work was supported in part by funding under AFOSR award 12RSL042 and NIH award 1R01NS095369-01.

Author information

Correspondence to Eric D. Kolaczyk.


Appendices

Appendix A

A.1 Algorithm using RDP

Here we provide the algorithm for implementation based on recursive dyadic partitions. Assume the length of the time series equals \(T = 2^{J}\), and let \(j_{min}\) be the smallest \(j\) such that \(2^{j} > p + 1\). Note that p + 1 is the minimum number of observations required to fit the restricted VAR(p) model. Assume \(J > j_{min}\).


Algorithm 2 splits only at dyadic positions. The candidate partitions \(\mathcal {P} \preceq \mathcal {P}_{D_{y}}^{*}\) can be represented as subtrees of a binary tree of depth \(\log _{2} T\). Given a dataset of length \(T = 2^{J}\), we have \(2^{0}\) root node, \(2^{1}\) nodes at level 1, \(2^{2}\) nodes at level 2, and so on, until we reach the leaf level, which has \(2^{J-1}\) nodes. The complexity of the algorithm is then of order \(\mathcal {O}(T)\) calls to fit the group lasso regression and \(\mathcal {O}(T)\) calls for comparisons.
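The pruning logic behind this dyadic traversal can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the hypothetical `fit_cost` stands in for the group lasso penalized-likelihood fit on a segment, and `penalty` plays the role of the \(C_{3}\log T\) complexity term charged per extra split.

```python
import numpy as np

def rdp_partition(x, fit_cost, penalty, min_len=4):
    """Recursively split [0, len(x)) at dyadic midpoints.

    fit_cost(segment) -> cost of fitting one model on the segment
    (a stand-in for the group-lasso penalized likelihood).
    A split is kept only if it lowers the total cost by more than
    `penalty`.  Returns a list of (start, end) intervals.
    """
    def recurse(lo, hi):
        n = hi - lo
        whole = fit_cost(x[lo:hi])
        if n // 2 < min_len:               # too short to split further
            return [(lo, hi)], whole
        mid = lo + n // 2                  # dyadic midpoint
        left_parts, left_cost = recurse(lo, mid)
        right_parts, right_cost = recurse(mid, hi)
        split_cost = left_cost + right_cost + penalty
        if split_cost < whole:             # splitting pays for itself
            return left_parts + right_parts, split_cost
        return [(lo, hi)], whole

    parts, _ = recurse(0, len(x))
    return parts
```

With a simple sum-of-squares cost, a series with one abrupt level shift is split exactly at the dyadic change point, while a homogeneous series is left unsplit.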

A.2 Proof of Theorem 3.1

Proof.

Theorem 3.1

The proof contains two parts. In the first part, we show that Eq. 3.1 holds, under H0. In the second part, we show that Eq. 3.2 holds, under H1.

Part 1

We begin by defining the group lasso penalized likelihood on an interval I:

$$ \begin{array}{@{}rcl@{}} PL_{I} = \frac{1}{|I|}\left\|\mathbf{X}_{I}(u) - \sum\limits_{v \in V\backslash\{u\}}\mathbf{X}_{I}(v)\boldsymbol\theta_{I}(u,v) \right\|{~}_{2}^{2} + \lambda_{I} \sum\limits_{v \in V\backslash \{u\}}\left\|\boldsymbol\theta_{I}(u,v)\right\|{~}_{2}. \end{array} $$
(6.1)
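For concreteness, the objective in Eq. 6.1 can be evaluated directly. The sketch below makes our assumptions explicit: coefficient groups are stored per neighbor v, and the name `group_lasso_objective` is illustrative rather than taken from the paper.

```python
import numpy as np

def group_lasso_objective(Xu, X_groups, thetas, lam):
    """Penalized likelihood PL_I of Eq. 6.1 on one interval.

    Xu       : (|I|,) response series X_I(u)
    X_groups : dict v -> (|I|, p) lagged design block X_I(v)
    thetas   : dict v -> (p,) coefficient group theta_I(u, v)
    lam      : interval-specific penalty lambda_I
    """
    n = len(Xu)
    fit = Xu - sum(X_groups[v] @ thetas[v] for v in X_groups)
    penalty = lam * sum(np.linalg.norm(thetas[v]) for v in thetas)
    return (fit @ fit) / n + penalty
```

The first term is the averaged squared residual; the second sums the (unsquared) Euclidean norms of the coefficient groups, which is what induces group-level sparsity.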

Let \(\boldsymbol {\hat {\theta }}_{1:T}\) be the 𝜃 that minimizes the penalized likelihood Eq. 6.1 on the interval from 1 to T and \(\hat {PL}_{1:T}\) be the quantity upon substituting \(\boldsymbol {\hat {\theta }}_{1:T}\) in Eq. 6.1. Consider any alternative model with a change point detected at point \(\hat {\tau }\in (1, T)\). Denote by \(\boldsymbol {\hat {\theta }}_{1:\hat {\tau }}\) and \(\boldsymbol {\hat {\theta }}_{\hat {\tau }:T}\) the coefficients 𝜃 that minimize Eq. 6.1 over intervals \([1, \hat {\tau }]\) and \((\hat {\tau }, T]\), respectively. Given our model, Eq. 3.1 in Theorem 3.1 is equivalent to

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{0}}(\hat{PL}_{1:T} \leq \hat{PL}_{1:\hat{\tau}} + \hat{PL}_{\hat{\tau}:T} + C_{3}\log T) \longrightarrow 1. \end{array} $$

The additional term \(C_{3}\log T\) comes from the fact that the alternative model has one more partition than the null model, with C3 = 1/2 using RDP and C3 = 3/2 using RP. We expand \(\hat {PL}_{1:\hat {\tau }} + \hat {PL}_{\hat {\tau }:T} - \hat {PL}_{1:T} +C_{3}\log T\) and obtain:

$$ \begin{array}{@{}rcl@{}} &&{} \frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&+ \frac{1}{T- \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2} + \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2} \\ &&- \frac{1}{T}\left\|\mathbf{X}_{1:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} - \lambda_{1:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:T}(u,v)\right\|{~}_{2} \\ && + C_{3}\log T. \end{array} $$
(6.2)

By rewriting the last line of Eq. 6.2, we have

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&~~+ \frac{1}{T - \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2}\\ &&+ \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2}\! \\ &&- \frac{1}{T}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{1}{T}\left\|\mathbf{X}_{\hat{\tau} :T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau} :T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \lambda_{1:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:T}(u,v)\right\|{~}_{2} + C_{3}\log T. \end{array} $$
(6.3)

We then add and subtract a term in both line 3 and line 4 of Eq. 6.3. In doing so, we have:

$$ \begin{array}{@{}rcl@{}} &&{} \frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - {\sum}_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:\hat{\tau}}{\sum}_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&+ \frac{1}{T - \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2} + \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2} \\ &&- \frac{1}{T}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) + \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right.\\&&\left.- \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{1}{T}\left\|\mathbf{X}_{\hat{\tau} :T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) + \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right.\\&&\left.- \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau} :T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \lambda_{1:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:T}(u,v)\right\|{~}_{2} + C_{3}\log T. \end{array} $$
(6.4)

Expanding the squared norms and applying the Cauchy–Schwarz inequality, we have:

$$ \begin{array}{@{}rcl@{}} &&\text{equation~(6.4)} \\ &\geq & \frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} \\ &&+ \frac{1}{T- \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{1}{T}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{1}{T}\left \| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)- \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{2}{T}\left( \left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2} \right. \\ &&\times \left. \left\| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)- \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2} \right) \\ &&- \frac{1}{T}\left\|\mathbf{X}_{\hat{\tau} :T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right \|{~}_{2}^{2} \\ &&- \frac{1}{T}\left\| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau} :T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{2}{T}\left( \left\|\mathbf{X}_{\hat{\tau} :T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right \|{~}_{2} \right. 
\\ &&\times \left. \left\|\sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau} :T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2} \right) \\ &&+ \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2}+ \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\&&- \lambda_{1:T}{\sum}_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:T}(u,v)\right\|{~}_{2} \\ &&+ C_{3}\log T. \end{array} $$
(6.5)

Under assumptions (1) to (5), Bach (2008) reformulated the group lasso penalized likelihood (6.1) as:

$$ PL_{I} = \hat{\boldsymbol{\Sigma}}_{\mathbf{X}(u)\mathbf{X}(u)} - 2\hat{\boldsymbol{\Sigma}}_{\mathbf{X}(-u)\mathbf{X}(u)}^{\prime}\boldsymbol{\theta} + \boldsymbol{\theta}^{\prime}\boldsymbol{\hat{\Sigma}}_{\mathbf{X}(-u)\mathbf{X}(-u)}\boldsymbol\theta + \lambda_{I} \sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\theta}(u,v)\right\|{~}_{2} $$
(6.6)

where \(\hat {\boldsymbol {\Sigma }}_{\mathbf {X}(u)\mathbf {X}(u)} = \frac {1}{|I|}\mathbf {X}(u)^{\prime } {\Pi }_{|I|}\mathbf {X}(u)\), \(\hat {\boldsymbol {\Sigma }}_{\mathbf {X}(-u) \mathbf {X}(u)} = \frac {1}{|I|}\mathbf {X}(-u)^{\prime }{\Pi }_{|I|}\mathbf {X}(u)\) and \(\hat {\boldsymbol {\Sigma }}_{\mathbf {X}(-u)\mathbf {X}(-u)} = \frac {1}{|I|}\mathbf {X}(-u)^{\prime }{\Pi }_{|I|}\mathbf {X}(-u)\) are the empirical covariance matrices, with the centering matrix \({\Pi }_{|I|}\) defined as \({\Pi }_{|I|} = \mathbf {I}_{|I|}-\frac {1}{|I|}\mathbf {1}_{|I|}\mathbf {1}_{|I|}^{\prime }\); Bach (2008) showed that the group lasso estimator \(\boldsymbol {\hat {\theta }}\) converges in probability to 𝜃. Using the expression in Eq. 6.6 and collecting similar terms, we can then rewrite (6.5) as:

$$ \begin{array}{@{}rcl@{}} &&{} \frac{T-\hat{\tau}}{T}\left\{\hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{1:\hat{\tau}}(u) \mathbf{X}_{1:\hat{\tau}}(u)} - 2\hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{1:\hat{\tau}}(-u) \mathbf{X}_{1:\hat{\tau}}(u)} \hat{\boldsymbol\theta}_{1:\hat{\tau}} + \hat{\boldsymbol\theta}_{{1:\hat{\tau}}}^{\prime} \hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{1:\hat{\tau}}(-u)\mathbf{X}_{1:\hat{\tau}}(-u)} \hat{\boldsymbol\theta}_{1:\hat{\tau}} \right\} \\ &&{}+ \frac{\hat{\tau}}{T}\left\{\hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{\hat{\tau}:T}(u) \mathbf{X}_{\hat{\tau}:T}(u)} - 2\hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{\hat{\tau}:T}(-u) \mathbf{X}_{\hat{\tau}:T}(u)} \hat{\boldsymbol\theta}_{\hat{\tau}:T} + \hat{\boldsymbol\theta}_{{\hat{\tau}:T}}^{\prime} \hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{\hat{\tau}:T}(-u)\mathbf{X}_{\hat{\tau}:T}(-u)} \hat{\boldsymbol\theta}_{\hat{\tau}:T} \right\} \end{array} $$
(6.7)
$$ \begin{array}{@{}rcl@{}} &&{}- \left\| \hat{\boldsymbol{\Sigma}}^{1/2}_{\mathbf{X}_{1:\hat{\tau}}(-u) \mathbf{X}_{1:\hat{\tau}}(-u)} \left( \hat{\boldsymbol\theta}_{1:\hat{\tau}} - \hat{\boldsymbol\theta}_{1:T} \right)\right\|{~}_{2}^{2} - \left\| \hat{\boldsymbol{\Sigma}}^{1/2}_{\mathbf{X}_{\hat{\tau}:T}(-u) \mathbf{X}_{\hat{\tau}:T}(-u)} \left( \hat{\boldsymbol\theta}_{\hat{\tau}:T} - \hat{\boldsymbol\theta}_{1:T} \right)\right\|{~}_{2}^{2} \end{array} $$
(6.8)
$$ \begin{array}{@{}rcl@{}} &&{}- \frac{2}{T}\left( \left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}\right.\\&&\left. \left \| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\left( \boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)- \boldsymbol{\hat{\theta}}_{1:T}(u,v)\right) \right\|{~}_{2}\right) \end{array} $$
(6.9)
$$ \begin{array}{@{}rcl@{}} &&{}- \frac{2}{T}\left( \left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}\right.\\&&\left. \left \| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\left( \boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)- \boldsymbol{\hat{\theta}}_{1:T}(u,v)\right) \right\|{~}_{2}\right) \end{array} $$
(6.10)
$$ \begin{array}{@{}rcl@{}} &&{}+ \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\hat{\boldsymbol\theta}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} +\lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\hat{\boldsymbol\theta}_{\hat{\tau}:T}(u,v)\right\|{~}_{2}\\ &&{}- \lambda_{1:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\hat{\boldsymbol\theta}_{1:T}(u,v)\right\|{~}_{2} + C_{3}\log T. \end{array} $$
(6.11)

Note that in the previous expression, the two quadratic forms in Eq. 6.7 are by definition non-negative. Eq. 6.11 is composed of the group lasso penalty terms together with the complexity term; the penalties all converge to zero asymptotically assuming λ(⋅)→0 and λ(⋅)N→0.

Since \(\hat {\boldsymbol \theta }_{1:\hat {\tau }} \stackrel {P}{\longrightarrow } \boldsymbol \theta \), \(\hat {\boldsymbol \theta }_{\hat {\tau }:T} \stackrel {P}{\longrightarrow } \boldsymbol \theta \) and \(\hat {\boldsymbol \theta }_{1:T} \stackrel {P}{\longrightarrow } \boldsymbol \theta \), so that \(\hat {\boldsymbol \theta }_{1:\hat {\tau }} - \hat {\boldsymbol \theta }_{1:T} \stackrel {P}{\longrightarrow } 0\), and the X’s have finite moments up to order 4, each term in Eqs. 6.8, 6.9 and 6.10 converges to 0 in probability.
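The empirical covariance matrices entering Eq. 6.6 are straightforward to verify numerically. A minimal sketch under our own naming (`centered_cov` is illustrative), for column-matrix inputs:

```python
import numpy as np

def centered_cov(A, B):
    """(1/|I|) A' Pi_{|I|} B, with Pi_{|I|} = I - (1/|I|) 1 1'."""
    n = A.shape[0]
    Pi = np.eye(n) - np.ones((n, n)) / n   # centering projection
    return A.T @ Pi @ B / n
```

Because \({\Pi}_{|I|}\) is the projection that removes column means, this agrees with first centering each column and then forming the cross-product: `((A - A.mean(0)).T @ (B - B.mean(0))) / n`.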

Putting everything together, we then complete the proof of the first part of the theorem:

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{0}}(\hat{PL}_{1:T} \leq \hat{PL}_{1:\hat{\tau}} + \hat{PL}_{\hat{\tau}:T} + C_{3}\log T) \longrightarrow 1. \end{array} $$

Part 2

Suppose H1 is true. We denote the estimated change point by \(\hat {\tau }\). We show that \(\hat {PL}_{1:\hat {\tau }} + \hat {PL}_{\hat {\tau }:T}\) is minimized at \(\hat {\tau } = \tau \). Assume we have a competing estimator \(\tilde \tau \) with change point detected at time \(\tilde \tau = s\) with sτ. We show that

$$ \begin{array}{@{}rcl@{}} \hat{PL}_{1:\hat{\tau}} + \hat{PL}_{\hat{\tau}:T} \leq \hat{PL}_{1:s} + \hat{PL}_{s:T} \end{array} $$
(6.12)

holds with high probability under H1. Without loss of generality, we assume that \(\tau - s = \delta\) for some δ > 0, as shown in Fig. 6; a similar argument holds for the case s > τ.

Figure 6: Relative position of two detected change points

Denote by \(\boldsymbol {\hat {\theta }}_{1:\hat {\tau }}\) and \(\boldsymbol {\hat {\theta }}_{\hat {\tau }:T}\) the estimated coefficients that minimize the penalized likelihoods, given that \(I = \{t: t \in [1,\hat {\tau })\}\) and \(I = \{t: t \in [\hat {\tau }, T]\}\). We also define \(\boldsymbol {\hat {\theta }}_{1:s}\) and \(\boldsymbol {\hat {\theta }}_{s:T}\) to be the estimated coefficients that minimize the penalized likelihoods in Eq. 6.1, given that I = {t : t ∈ [1,s)} and I = {t : t ∈ [s, T]}. The key idea is that \(\hat {\boldsymbol \theta }_{1:\hat {\tau }}\) and \(\hat {\boldsymbol \theta }_{\hat {\tau }:T}\) are consistent estimators of 𝜃1:τ and 𝜃τ:T, but \(\hat {\boldsymbol \theta }_{s:T}\) is a consistent estimator of neither 𝜃1:τ nor 𝜃τ:T, due to misspecification. Therefore, for s < τ, at least one of the estimators \(\boldsymbol {\hat {\theta }}_{1:s}\) and \(\boldsymbol {\hat {\theta }}_{s:T}\) is not consistent on its corresponding interval. Formally, we have that

$$ \begin{array}{@{}rcl@{}} &&{}\hat{PL}_{1:s} + \hat{PL}_{s:T}\\ &&{}= \frac{1}{s}\left\|\mathbf{X}_{1:s}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:s}(v)\boldsymbol{\hat{\theta}}_{1:s}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2}\\ &&{\kern2pt}+ \frac{1}{T-s}\left\|\mathbf{X}_{s:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:T}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} + \lambda_{s:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2}\\ &&{}= \frac{1}{s}\left\|\mathbf{X}_{1:s}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:s}(v)\boldsymbol{\hat{\theta}}_{1:s}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2}\\ &&{}+ \frac{1}{T - s}\left\|\mathbf{X}_{s:\tau}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\tau}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} \\&&{}+ \frac{\delta \lambda_{s:T}}{T-s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2} \end{array} $$
(6.13)
$$ \begin{array}{@{}rcl@{}} &&{}+ \frac{1}{T-s}\left\|\mathbf{X}_{\tau:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\tau:T}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} \\&&{}+ \frac{(T-s-\delta)\lambda_{s:T}}{T-s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2} \end{array} $$
(6.14)

and

$$ \begin{array}{@{}rcl@{}} &&{}\hat{PL}_{1:\hat{\tau}} + \hat{PL}_{\hat{\tau}:T} \\ &&{}= \frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&{}+ \frac{1}{T - \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2} + \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2}. \end{array} $$

We write expression (6.13) as \(\hat {PL}_{1:s} + \hat {PL}_{s:\hat {\tau }}\), and expression (6.14) as \(\tilde {PL}_{s:T}\). We show (6.12) holds by first showing that \(\hat {PL}_{1:s} + \hat {PL}_{s:\hat {\tau }} \geq \hat {PL}_{1:\hat {\tau }}\), and then showing \(\tilde {PL}_{s:T} \geq \hat {PL}_{\hat {\tau }:T}\). We first compute \(\hat {PL}_{1:s} + \hat {PL}_{s:\hat {\tau }} - \hat {PL}_{1:\hat {\tau }}\):

$$ \begin{array}{@{}rcl@{}} &&{}= \frac{1}{s}\left\|\mathbf{X}_{1:s}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:s}(v)\boldsymbol{\hat{\theta}}_{1:s}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2} \\ &&{}+ \frac{1}{T - s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} + \frac{\delta \lambda_{s:T}}{T - s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2} \\ &&{}-\frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} - \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2}. \end{array} $$

Define another group lasso estimator on the interval between s and \(\hat {\tau }\):

$$ \begin{array}{@{}rcl@{}} \boldsymbol{\hat{\theta}}_{s:\hat{\tau}} \!\!\!&=&\!\!\! \operatornamewithlimits{arg min}_{\boldsymbol{\theta}} \frac{1}{\hat{\tau}-s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\theta}_{s:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} \\&&\!\!\!+ \lambda_{s:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol\theta_{s:\hat{\tau}}(u,v)\right\|{~}_{2}. \end{array} $$

The estimator \(\boldsymbol {\hat {\theta }}_{s:\hat {\tau }} \) is again a consistent estimator of \(\boldsymbol {\theta }_{1:\hat {\tau }}\) and we have that:

$$ \begin{array}{@{}rcl@{}} &&{}\frac{1}{\hat{\tau} - s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{s:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{s:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:\hat{\tau}}(u,v)\right\|{~}_{2} \end{array} $$
(6.15)
$$ \begin{array}{@{}rcl@{}} &&{}\leq \frac{1}{T - s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} \\&&{}+ \frac{\delta \lambda_{s:T}}{T - s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2}. \end{array} $$
(6.16)

These inequalities follow directly from Theorem 2 in Bach (2008), given that \(\boldsymbol {\hat {\theta }}_{s:T}\) is not consistent in the \(\ell_{2}\) sense for estimating \(\boldsymbol \theta _{1:\hat {\tau }}\) whenever \(s \neq \hat {\tau }\). Given (6.16), we have that

$$ \begin{array}{@{}rcl@{}} &&{}\hat{PL}_{1:s} + \hat{PL}_{s:\hat{\tau}} - \hat{PL}_{1:\hat{\tau}} \\ &&{}\geq \frac{1}{s}\left\|\mathbf{X}_{1:s}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:s}(v)\boldsymbol{\hat{\theta}}_{1:s}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2} \\ &&{\kern2pt}+ \frac{1}{\hat{\tau}-s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{s:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{s:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&{\kern2pt}-\frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} - \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2}. \\ \end{array} $$

The same argument as in Part 1 applies here, and we have

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{1}}\left( \hat{PL}_{1:s} + \hat{PL}_{s:\hat{\tau}} \geq \hat{PL}_{1:\hat{\tau}}\right) \longrightarrow 1. \end{array} $$

Note that \(\boldsymbol {\hat {\theta }}_{s:T}\) is not a consistent estimator of \(\boldsymbol {\theta }_{\hat {\tau }:T}\) given the change point. Therefore, similar to Eq. 6.16, we have

$$ \begin{array}{@{}rcl@{}} & &{}\frac{1}{T - \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2} + \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2} \\ &&{}\leq \frac{1}{T - s}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} \\&&~~~~~~~~~~+ \frac{(T-s-\delta)\lambda_{s:T}}{T-s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2} \\ \end{array} $$

and so

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{1}}\left( \tilde{PL}_{s:T} \geq \hat{PL}_{\hat{\tau}:T}\right) \longrightarrow 1. \end{array} $$

Putting the two parts together, we have

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{1}}\left( \hat{PL}_{1:s} + \hat{PL}_{s:T}\geq \hat{PL}_{1:\hat{\tau}} + \hat{PL}_{\hat{\tau}:T} \right) \longrightarrow 1 \end{array} $$

for any \(s < \hat {\tau }\). □

A.3 Proof of Theorem 9

Under the assumption of stationarity, we may omit the time index in this section; that is, \(\boldsymbol{\theta} = \boldsymbol{\theta}_{t}\) for all t. To show Theorem 3.3, we begin with the following lemma.

Lemma 8.1.

Given \(\boldsymbol {\theta } \in \mathbb {R}^{(N-1)p}\), let G(𝜃(u, v)) be a p-dimensional vector with elements

$$ \begin{array}{@{}rcl@{}} G(\boldsymbol{\theta}(u,v)) &= -2T^{-1}\left( \mathbf{X}(v)^{\prime}(\mathbf{X}(u) - {\sum}_{v\in V\backslash\{u\}}\mathbf{X}(v)\boldsymbol{\theta}(u,v))\right). \end{array} $$
(6.17)

A vector \(\boldsymbol {\hat {\theta }}\) is a solution to the group lasso type of estimator if and only if, for all \(v \in V \backslash \{u\}\), \( G(\boldsymbol {\hat {\theta }}(u,v)) + \lambda \mathbf {D}(\boldsymbol {\hat {\theta }}(u,v)) = \mathbf {0}\), where \(\|\mathbf {D}(\boldsymbol {\hat {\theta }}(u,v))\|{~}_{2} = 1\) in the case of \(\|\boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} > 0\) and \(\|\mathbf {D}(\boldsymbol {\hat {\theta }}(u,v))\|{~}_{2} < 1\) in the case of \(\|\boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} = 0\).

Proof of Lemma 8.1.

By the KKT conditions, using subdifferential calculus, the subdifferential of

$$ \frac{1}{T}\left\|\mathbf{X}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}(v)\boldsymbol{\theta}(u,v)\right\|{~}_{2}^{2} + \lambda\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\theta}(u,v)\right\|{~}_{2} $$

is given by \(G(\boldsymbol {\theta }(u,v)) + \lambda \mathbf {D}(\boldsymbol {\theta }(u,v))\), where \(\|\mathbf {D}(\boldsymbol {\theta }(u,v))\|{~}_{2} = 1\) if \(\|\boldsymbol{\theta}(u,v)\|{~}_{2} > 0\) and \(\|\mathbf {D}(\boldsymbol {\theta }(u,v))\|{~}_{2} < 1\) if \(\|\boldsymbol{\theta}(u,v)\|{~}_{2} = 0\). The lemma follows. □
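The subgradient condition of Lemma 8.1 lends itself to a quick numerical sanity check. The sketch below is illustrative only: it uses synthetic Gaussian data, a single lag (so each group norm reduces to an absolute value), and hypothetical sizes. It verifies that when λ dominates every block of the gradient G at zero, the all-zero vector cannot be improved by small perturbations, as the lemma predicts.

```python
import random

random.seed(0)
T, N = 200, 4          # hypothetical sample size and number of nodes; one lag
X_u = [random.gauss(0, 1) for _ in range(T)]                          # X(u): pure noise
X_v = [[random.gauss(0, 1) for _ in range(T)] for _ in range(N - 1)]  # X(v), v != u

def G(theta):
    # gradient blocks of the squared loss (Eq. 6.17): -2/T * X(v)'(X(u) - sum_v X(v) theta_v)
    resid = [X_u[t] - sum(th * X_v[v][t] for v, th in enumerate(theta)) for t in range(T)]
    return [-2.0 / T * sum(X_v[v][t] * resid[t] for t in range(T)) for v in range(N - 1)]

def objective(theta, lam):
    resid = [X_u[t] - sum(th * X_v[v][t] for v, th in enumerate(theta)) for t in range(T)]
    # with a single lag, each group norm ||theta(u,v)||_2 is just |theta_v|
    return sum(r * r for r in resid) / T + lam * sum(abs(th) for th in theta)

lam = 2 * max(abs(g) for g in G([0.0] * (N - 1)))  # lambda above every gradient block
base = objective([0.0] * (N - 1), lam)
# Lemma 8.1: with ||G(0)||_2 < lam for every group, theta = 0 solves the problem,
# so no small one-coordinate perturbation should decrease the objective
for v in range(N - 1):
    for eps in (1e-3, -1e-3):
        theta = [0.0] * (N - 1)
        theta[v] = eps
        assert objective(theta, lam) >= base
print("zero solution satisfies the subgradient condition")
```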

We now prove Theorem 3.3.

Proof.

Suppose that \(\hat {C}_{u} \nsubseteq C_{u}\); then there must exist at least one estimated edge joining two nodes in different connectivity components. Given the assumptions, we use arguments similar to those in the proof of Theorem 3 in Meinshausen and Bühlmann (2006). Hence we have

$$ \mathbb{P}(\exists u \in V: \hat{C}_{u} \nsubseteq C_{u}) \leq N \max_{u \in V} \mathbb{P}(\exists v \in V \backslash C_{u}: v \in \hat{\text{ne}}_{u}), $$

where \(\hat {\text {ne}}_{u}\) is the estimated neighborhood of node u and \(v \in \hat {\text {ne}}_{u}\) means \(\| \boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} > 0\).

Let \({\mathscr{E}}\) be the event that

$$ \max\limits_{v\in V \backslash C_{u}} \left\|G \left( \boldsymbol{\hat{\theta}}(u,v)\right) \right\|{~}_{2}^{2} < \lambda^{2}. $$

Conditional on the event \({\mathscr{E}}\), the vector \(\boldsymbol {\hat {\theta }}\) with \(\|\boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} = 0\) for all \(v \in V \backslash C_{u}\) is, by Lemma 8.1, also a solution to the group lasso problem. Hence

$$ \begin{array}{@{}rcl@{}} \mathbb{P}(\exists v\in V \backslash C_{u}: \|\boldsymbol{\hat{\theta}}(u,v)\|{~}_{2} > 0) &\leq & 1 - \mathbb{P}(\mathscr{E})\\ &=& P\left( \max\limits_{v \in V \backslash C_{u}} \left\|G\left( \boldsymbol{\hat{\theta}}(u,v)\right) \right\|{~}_{2}^{2} \geq \lambda^{2} \right). \end{array} $$

It is then sufficient to show that

$$ N^{2} \max_{u \in V\text{, } v\in V \backslash C_{u}} \mathbb{P}\left( \left\|G(\boldsymbol{\hat{\theta}}(u,v))\right\|{~}_{2}^{2} \geq \lambda^{2}\right) \leq \alpha. $$

Note that v and Cu now lie in different connected components, which means that X(v) is conditionally independent of X(Cu). Hence, conditional on X(Cu), we have

$$ \begin{array}{@{}rcl@{}} \left\|G(\boldsymbol{\hat{\theta}}(u,v))\right\|{~}_{2}^{2} &=& \left\|-2T^{-1}\left( \mathbf{X}(v)^{\prime}(\mathbf{X}(u) - \sum\limits_{i \in C_{u}}\mathbf{X}(i)\boldsymbol{\hat{\theta}}(u,i))\right)\right\|{~}_{2}^{2}\\ &=& 4T^{-2}\left\|(\mathbf{\hat R}_{1}, \cdots, \mathbf{\hat R}_{p} )^{\prime} \right\|{~}_{2}^{2} \end{array} $$

where \(\mathbf {\hat R}_{\ell } =X_{-\ell }(v)^{\prime } \left (\mathbf {X}(u) - {\sum }_{i \in C_{u}}\mathbf {X}(i)\boldsymbol {\hat {\theta }}(u,i)\right )\), and the residual \(\mathbf {X}(u) - {\sum }_{i \in C_{u}}\mathbf {X}(i)\boldsymbol {\hat {\theta }}(u,i)\) is independent of X(v) at all lags \(\ell = 1,\cdots,p\). It follows that the joint distribution

$$ (\mathbf{\hat R}_{1}, \cdots, \mathbf{\hat R}_{p}| \mathbf{X}(C_{u})) \sim N(\mathbf{0}, \mathbf{\Omega}) $$

for some covariance matrix Ω. Note that this is a conditional distribution given X(Cu). Hence, in the expression for Ω, every term indexed by u is constant and every term indexed by v is a normalized random variable; this simplifies the covariance term. Note that

$$ {\boldsymbol{\Omega}}_{p \times p} = \textbf{Cov}\left( \mathbf{\hat R}_{1}, \cdots, \mathbf{\hat R}_{p} \right) $$

and

$$ \begin{array}{@{}rcl@{}} \textbf{tr}\left( \boldsymbol{\Omega}\right) &=& \sum\limits_{\ell=1}^{p} \textbf{Var} (\mathbf{\hat R}_{\ell}) = \sum\limits_{\ell=1}^{p} \textbf{Var}\left( \sum\limits_{t=1}^{T}\left( X_{t}(u) - \sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)X_{t-\ell}(v)\right) \\ &=&\sum\limits_{\ell=1}^{p}\sum\limits_{s=1}^{T}\sum\limits_{t=1}^{T}\textbf{Cov}\left[\left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)X_{t-\ell}(v),\right.\\&& \left.\left( X_{s}(u)-\sum\limits_{i\in C_{u}}X_{s-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)X_{s-\ell}(v)\right]. \end{array} $$
(6.18)

Conditional on X(Cu), Eq. 6.18 can be further simplified as:

$$ \begin{array}{@{}rcl@{}} \textbf{tr}\left( \boldsymbol{\Omega}\right) \!\!\!&=&\!\!\! \sum\limits_{\ell=1}^{p}\sum\limits_{t=1}^{T}\sum\limits_{s=1}^{T}\left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)\\&& \left( X_{s}(u)-\sum\limits_{i\in C_{u}}X_{s-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right) \textbf{Cov} \left[ X_{t-\ell}(v) ,X_{s-\ell}(v) \right] \\ \!\!\!&\leq&\!\!\! \sum\limits_{\ell=1}^{p}\sum\limits_{t=1}^{T}\sum\limits_{s=1}^{T}\left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)\\&& \left( X_{s}(u)-\sum\limits_{i\in C_{u}}X_{s-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)\sqrt{\textbf{Var}(X_{t-\ell}(v)) \textbf{Var}(X_{s-\ell}(v))}. \end{array} $$

We have the above bounded by

$$ \begin{array}{@{}rcl@{}} &\leq& p \sum\limits_{s=1}^{T}\sum\limits_{t=1}^{T} \left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right) \left( X_{s}(u)-\sum\limits_{i\in C_{u}}X_{s-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right) \\ &=& p \left[\sum\limits_{t=1}^{T} \left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)\right]^{2} \\ & \leq& Tp \sum\limits_{t=1}^{T}\left[X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i) \right]^{2}\\ &\leq& Tp \|\mathbf{X}(u)\|{~}_{2}^{2}.\\ \end{array} $$

The last inequality comes from the Cauchy-Schwarz inequality. Denote by \(\nu_{max}\) the largest eigenvalue of the covariance matrix Ω, so that \((\nu_{max}\mathbf{I} - \boldsymbol{\Omega})\) is PSD. Following the argument of Müller (2001), we can show \((\hat {\mathbf {R}}_{1}, \cdots , \hat {\mathbf {R}}_{p}) \leq _{cx} \mathbf {Y}\) for some random vector \(\mathbf {Y} \sim N(\mathbf {0}, \nu _{max}\mathbf {I}_{p})\), where ≤cx denotes the convex order, under which X ≤cx Y implies μx = μy and \({\sigma _{x}^{2}} \leq {\sigma _{y}^{2}}\). It follows that

$$ \begin{array}{@{}rcl@{}} \max_{u \in V, v\in V \backslash C_{u} } \mathbb{P}\left( \left\|G(\hat{\boldsymbol\theta}(u,v))\right\|{~}_{2}^{2} \geq \lambda^{2}\right) \!\!\!&\leq&\!\!\! \max_{u \in V, v\in V \backslash C_{u}} \mathbb{P}(4T^{-2}(\mathbf{Y}^{\prime}\mathbf{Y}) \geq \lambda^{2})\\ \!\!\!&=&\!\!\! \max_{u \in V, v\in V \backslash C_{u}} \mathbb{P}\left( \frac{1}{\nu_{max}}\mathbf{Y}^{\prime}\mathbf{Y} \geq \frac{\lambda^{2}T^{2}}{4\nu_{max}}\right) . \end{array} $$

Note that \(\frac {1}{\nu _{max}}\mathbf {Y}^{\prime }\mathbf {Y}\) is a sum of p squared independent standard normal variables and thus follows a χ2(p) distribution, and \(\nu _{max} \leq \textbf {tr}(\boldsymbol {\Omega }) \leq Tp\|\mathbf {X}(u)\|{~}_{2}^{2}\). Putting everything together, we have

$$ \begin{array}{@{}rcl@{}} \max_{u \in V, v\in V \backslash C_{u}} \mathbb{P}\left( \|G(\hat{\boldsymbol\theta}(u,v))\|{~}_{2}^{2} \geq \lambda^{2}\right) \!\!\!&\leq&\!\!\! \max_{u \in V, v\in V \backslash C_{u}} \mathbb{P}\left( \chi^{2}(p) \geq \frac{\lambda^{2}T^{2}}{4\nu_{max}} \right)\\ \!\!\!&\leq&\!\!\! \max_{u \in V, v\in V \backslash C_{u}} \mathbb{P}\left( \chi^{2}(p) \geq \frac{\lambda^{2}T^{2}}{4Tp\|\mathbf{X}(u)\|{~}_{2}^{2}} \right) \\&\leq&\!\!\! \frac{\alpha}{N(N-1)} \end{array} $$

and thus we have the desired λ(α)

$$ \lambda(\alpha) = 2\hat{\sigma}_{u}\sqrt{pQ\left( 1-\frac{\alpha}{N(N-1)}\right)}. $$
(6.19)
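Eq. 6.19 is direct to evaluate numerically. In the sketch below, the Wilson-Hilferty approximation stands in for the exact χ²(p) quantile function Q, and the values of α, \(\hat{\sigma}_u\), p, and N are hypothetical choices for illustration only.

```python
import math
from statistics import NormalDist

def chi2_quantile(prob, df):
    # Wilson-Hilferty approximation to the chi-square(df) quantile,
    # standing in for the exact quantile function Q in Eq. 6.19
    z = NormalDist().inv_cdf(prob)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3

def lambda_alpha(alpha, sigma_hat_u, p, N):
    # lambda(alpha) = 2 * sigma_hat_u * sqrt(p * Q(1 - alpha / (N(N-1))))
    return 2 * sigma_hat_u * math.sqrt(p * chi2_quantile(1 - alpha / (N * (N - 1)), p))

# hypothetical values for illustration
lam = lambda_alpha(alpha=0.05, sigma_hat_u=1.0, p=3, N=10)
print(round(lam, 3))
```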

A.4 Proof of Theorem 3.3

The proof of the theorem is in line with the work of Kolaczyk and Nowak (2005). The core idea is to bound the expected Hellinger loss in terms of the Kullback-Leibler distance. This approach, building on the original work of Li and Barron (2000), leverages a union bound after discretizing the underlying parameter space. We assume a similar discretization here, while omitting the straightforward but tedious numerical analysis arguments that accompany it. See, for example, Kolaczyk and Nowak (2005) for details. Our fundamental bound is given by the following theorem.

Theorem 8.2.

Let \({\Gamma }_{T}^{(N-1)p}\) be a finite collection of estimators \(\boldsymbol {\tilde \theta }\) for 𝜃, and pen(⋅) a function on \({{\Gamma }_{T}^{p}}\) satisfying the condition

$$ \begin{array}{@{}rcl@{}} \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}}e^{-pen(\boldsymbol{\tilde\theta}(u, v))} \leq 1. \end{array} $$
(6.20)

Let \(\hat {\boldsymbol \theta }\) be a penalized maximum likelihood estimator of the form

$$ \begin{array}{@{}rcl@{}} \hat{\boldsymbol\theta} \equiv \operatornamewithlimits{arg min}_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}}\left\{ -\log p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol{\tilde\theta}) + 2\sum\limits_{v \in V\backslash \{u\}} \text{Pen}(\boldsymbol{\tilde\theta}(u,v))\right\}. \end{array} $$

Then

$$ \begin{array}{@{}rcl@{}} \mathbb{E}[H^{2}(p_{\hat{\boldsymbol\theta}},p_{\boldsymbol\theta})] \leq \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}}\left\{K(p_{\boldsymbol\theta},p_{\boldsymbol{\tilde\theta}}) + 2\sum\limits_{v \in V\backslash \{u\}}\text{Pen}(\boldsymbol{\tilde\theta}(u,v))\right\}. \end{array} $$
(6.21)

Note that the result of Theorem 8.2 requires that inequality (6.20) holds; Lemma 8.3 below shows that our proposed penalty satisfies this inequality. We now prove Theorem 8.2.

Proof.

Note that we have

$$ \begin{array}{@{}rcl@{}} H^{2}(p_{\hat{\boldsymbol\theta}},p_{\boldsymbol\theta}) &=& \int \left[\sqrt{p(\mathbf{x}|\mathbf{X}(-u), \boldsymbol{\hat{\theta}})} - \sqrt{p(\mathbf{x}|\mathbf{X}(-u), \boldsymbol\theta)}\right]^{2} d\nu(\mathbf{x}) \\ &=& 2\left( 1 - \int \sqrt{p(\mathbf{x}|\mathbf{X}(-u), \hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u), \boldsymbol\theta)}d\nu(\mathbf{x}) \right)\\ &\leq &-2 \log \int \sqrt{p(\mathbf{x}|\mathbf{X}(-u), \hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u), \boldsymbol\theta)} d\nu(\mathbf{x}). \end{array} $$

Taking the conditional expectation with respect to X(u)|X(−u), we then have

$$ \begin{array}{@{}rcl@{}} \mathbb{E}[H^{2}(p_{\hat{\boldsymbol\theta}},p_{\boldsymbol\theta})] \!\!\!&\leq&\!\!\! 2\mathbb{E}\log \left( \frac{1}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \right) \\ \!\!\!& \leq&\!\!\! 2\mathbb{E}\log \left( \frac{p^{1/2}(\mathbf{X}(u)|\mathbf{X}(-u),\hat{\boldsymbol\theta})e^{- \sum\limits_{v} pen(\hat{\boldsymbol\theta}(u,v))}}{p^{1/2}(\mathbf{X}(u)|\mathbf{X}(-u),\check{\boldsymbol\theta})e^{- \sum\limits_{v} pen(\check{\boldsymbol\theta}(u,v))}}\right.\\&&\left. \frac{1}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \right), \end{array} $$

where the collection of \(\check {\boldsymbol \theta }(u,v)\)’s are the arguments that minimize the right-hand side of expression (6.21). The last expression can be written in two pieces, that is

$$ \begin{array}{@{}rcl@{}} &&{} \mathbb{E}\left[ \log \frac{p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}{p(\mathbf{X}(u)|\mathbf{X}(-u),\check{\boldsymbol\theta})}\right] + 2 \sum\limits_{v} pen(\check{\boldsymbol\theta}(u,v)) \end{array} $$
(6.22)
$$ \begin{array}{@{}rcl@{}} &&{}+ 2\mathbb{E} \log \left( \frac{p^{1/2}(\mathbf{X}(u)|\mathbf{X}(-u),\hat{\boldsymbol\theta})}{p^{1/2}(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}\frac{\prod\limits_{v} \prod\limits_{\ell} e^{-pen(\hat{\boldsymbol\theta}^{(\ell)}(u,v))}}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \right).\\ \end{array} $$
(6.23)

Note that expression (6.22) is the right-hand side of Eq. 6.21. What we need to show, then, is that expression (6.23) is bounded above by zero. By applying Jensen’s inequality, we have Eq. 6.23 bounded by:

$$ \begin{array}{@{}rcl@{}} 2\log \mathbb{E}\left[\prod\limits_{v}e^{-pen(\hat{\boldsymbol\theta}(u,v))}\frac{\sqrt{p(\mathbf{X}(u)|\mathbf{X}(-u),\hat{\boldsymbol\theta})/p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \right]. \end{array} $$
(6.24)

The integrand in the expectation in Eq. 6.24 can be bounded by

$$ \begin{array}{@{}rcl@{}} \sum\limits_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}} \prod\limits_{v}e^{-pen(\tilde{\boldsymbol\theta}(u,v))}\frac{\sqrt{p(\mathbf{X}(u)|\mathbf{X}(-u),\tilde{\boldsymbol\theta})/p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\tilde{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})}. \end{array} $$

Given the fact that \(\tilde {\boldsymbol \theta }\) does not depend on X(−u), Eq. 6.24 can be bounded by

$$ \begin{array}{@{}rcl@{}} &&{} 2\log \sum\limits_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}} \prod\limits_{v}e^{-pen(\tilde{\boldsymbol\theta}(u,v))}\frac{\mathbb{E}\left[\sqrt{p(\mathbf{X}(u)|\mathbf{X}(-u),\tilde{\boldsymbol\theta})/p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}\right]}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\tilde{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \\ &&{}= 2\log \sum\limits_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}} \prod\limits_{v}e^{-pen(\tilde{\boldsymbol\theta}(u,v))}. \end{array} $$
(6.25)

Since \(e^{-pen(\tilde {\boldsymbol \theta }(u,v))} > 0\) for any \(\boldsymbol {\tilde \theta }(u,v)\), and using the inequality \({\sum }_{i} a_{i} b_{i} \leq {\sum }_{i} a_{i} {\sum }_{i}b_{i}\) for any ai > 0,bi > 0, we can bound (6.25) by:

$$ \begin{array}{@{}rcl@{}} 2\log \prod\limits_{v} \sum\limits_{\boldsymbol{\tilde\theta}(u,v) \in {{\Gamma}_{T}^{p}}} e^{- pen(\tilde{\boldsymbol\theta}(u,v))}. \end{array} $$

From the condition in Eq. 6.20, we see that the above expression is bounded above by zero. We now show, via the following lemma, that our proposed penalty satisfies condition (6.20).

Lemma 8.3.

Let ΓT be the collection of all \(\boldsymbol {\tilde \theta }^{(\ell )}(u, v)\) with components \(\boldsymbol {\tilde \theta }_{t}^{(\ell )}(u, v) \in D_{T}[-C, C]\) possessing a Haar-like expansion through a common partition, using either RDP (see expression (2.2)) or RP (see expression (2.4)), where \(D_{T}[-C, C]\) denotes a discretization of the interval [−C, C] into \(T^{1/2}\) equispaced values. For any penalty such that

$$ \begin{array}{@{}rcl@{}} Pen(\boldsymbol{\tilde\theta}(u, v)) = C_{3}\log T \#\{\mathcal{P}(\boldsymbol{\tilde\theta})\} + \lambda\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})} \|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u, v)\|{~}_{2}, \end{array} $$

where C3 = 1/2 for recursive dyadic partitioning and C3 = 3/2 for recursive partitioning, we have

$$ \begin{array}{@{}rcl@{}} \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}} e^{-pen(\boldsymbol{\tilde\theta}(u, v))} \leq 1, \end{array} $$

for T > ⌈e2p/3⌉.

Proof.

We prove Lemma 8.3 for the case of recursive partitioning. We write \({\Gamma }_{T} = \bigcup _{d_{\ell }=1}^{T} {\Gamma }_{T}^{(d_{\ell })}\), where \({\Gamma }_{T}^{(d_{\ell })}\) is the subset of sequences \(\boldsymbol {\tilde \theta }_{t}^{(\ell )}(u, v)\) composed of \(d_{\ell }\) constant-valued pieces; that is, \({\Gamma }_{T}^{(d_{\ell })}\) consists of all length-T sequences with exactly \(d_{\ell }\) alternating runs of zero and nonzero elements. So, for example, (0,0,4,0,0) and (2,0,1,1,1) might be two such sequences in \({\Gamma }_{5}^{(3)}\). Then we have

$$ \begin{array}{@{}rcl@{}} \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}} e^{-pen(\boldsymbol{\tilde\theta}(u, v))} &=& \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}} e^{-(3/2)\log T\,\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\} - \lambda\sum\limits_{\mathcal{I}\in \mathcal{P}(\boldsymbol{\tilde\theta})}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u, v) \|{~}_{2}} \\ & \leq& \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}} e^{-(3/2)\log T\,\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}}\\ & \leq& \prod\limits_{\ell=1}^{p} \sum\limits_{\boldsymbol{\tilde\theta}^{(\ell)}(u, v) \in {\Gamma}_{T}} e^{-(3/(2p))\log T\,\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}}\\ & =& \prod\limits_{\ell=1}^{p} \sum\limits_{d_{\ell}=1}^{T}\binom{T-1}{d_{\ell}-1} e^{-d_{\ell}(3/(2p))\log T}\\ & =& \prod\limits_{\ell=1}^{p} \sum\limits_{d_{\ell}^{\prime}=0}^{T-1}\binom{T-1}{d_{\ell}^{\prime}}e^{-(d_{\ell}^{\prime}+1)(3/(2p))\log T} \\ & =& \prod\limits_{\ell=1}^{p} \sum\limits_{d_{\ell}^{\prime}=0}^{T-1} \frac{(T-1)!}{d_{\ell}^{\prime} ! (T-d_{\ell}^{\prime}-1)!} T^{-(d_{\ell}^{\prime}+1)(3/(2p))}\\ & \leq& \prod\limits_{\ell=1}^{p} T^{-3/(2p)} \sum\limits_{d_{\ell}^{\prime}=0}^{T-1} \frac{(T-1)^{d_{\ell}^{\prime}}}{d_{\ell}^{\prime}!}\frac{1}{T^{(3/(2p))d_{\ell}^{\prime}}}\\ & \leq& T^{-(3/2)} e^{p} \end{array} $$

which is bounded by 1 for any T > ⌈e2p/3⌉. The argument follows analogously for the case of recursive dyadic partitioning. □
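The combinatorics in the proof are easy to check by brute force. The sketch below verifies the run-counting convention on the two example sequences, the binomial count of zero/nonzero patterns (up to a factor 2 recording whether the leading run is zero, a constant the proof's bound absorbs), and the final chain of inequalities in the single-lag case p = 1, the only case checked here.

```python
from itertools import groupby, product
from math import ceil, comb, exp

def n_runs(seq):
    # number of maximal alternating runs of zero / nonzero entries
    return sum(1 for _ in groupby(seq, key=lambda x: x != 0))

# the two example sequences from the text both lie in Gamma_5^(3)
assert n_runs((0, 0, 4, 0, 0)) == 3
assert n_runs((2, 0, 1, 1, 1)) == 3

# zero/nonzero patterns of length T with exactly d runs: 2 * C(T-1, d-1),
# the factor 2 recording whether the leading run is zero; check for T = 5
T = 5
counts = {}
for pattern in product([0, 1], repeat=T):
    d = n_runs(pattern)
    counts[d] = counts.get(d, 0) + 1
assert all(counts[d] == 2 * comb(T - 1, d - 1) for d in range(1, T + 1))

def per_lag_sum(T, p):
    # sum_{d=1}^{T} C(T-1, d-1) e^{-d (3/(2p)) log T}, one factor of the product
    return sum(comb(T - 1, d - 1) * T ** (-d * 3 / (2 * p)) for d in range(1, T + 1))

p = 1  # single-lag case, where the whole chain of bounds is easy to verify
for T in range(ceil(exp(2 * p / 3)) + 1, 30):
    assert per_lag_sum(T, p) ** p <= T ** (-1.5) * exp(p) <= 1.0
print("Lemma 8.3 chain of bounds holds for p = 1 on the tested grid")
```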

Using the loss function and the corresponding risk function defined earlier, recovering the neighborhood of node u is essentially a univariate Gaussian time series problem, and thus the KL divergence of the conditional likelihood function takes the form:

$$ \begin{array}{@{}rcl@{}} K(p_{\boldsymbol\theta},p_{\boldsymbol{\tilde\theta}}) = \mathbb{E}\left\{\log \frac{p_{\boldsymbol\theta}(\mathbf{x})}{p_{\boldsymbol{\tilde\theta}}(\mathbf{x})}\right\} = \mathbb{E}\left\{\sum\limits_{t=1}^{T} \log \frac{p_{\boldsymbol\theta}(X_{t}(u))}{p_{\boldsymbol{\tilde\theta}}(X_{t}(u))}\right\} = \sum\limits_{t=1}^{T} (\tilde\mu_{t}-\mu_{t})^{2} / (2\sigma^{2}) \end{array} $$

where each μt is the mean of Xt(u), and \(\tilde \mu _{t}\) is an approximation/estimate thereof, for a given estimator \(\boldsymbol {\tilde \theta }\). Since these means in turn are based on linear combinations of all neighborhood observations, over p lags, we have:

$$ \begin{array}{@{}rcl@{}} \tilde\mu_{t} - \mu_{t} = \sum\limits_{v\in V \backslash \{u\}}\sum\limits_{\ell=1}^{p} X_{t-\ell}(v)[\tilde\theta_{t}^{(\ell)}(u, v) - \theta_{t}^{(\ell)}(u, v)]. \end{array} $$

So the KL divergence for each neighborhood problem involves values at other nodes.
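The display above is the equal-variance specialization of the standard Gaussian KL divergence. A small numerical check, with synthetic means and hypothetical sizes, confirms that summing the general closed form across t with a common σ reproduces \(\sum_t (\tilde\mu_t - \mu_t)^2/(2\sigma^2)\):

```python
import math
import random

def kl_gauss(m0, s0, m1, s1):
    # KL( N(m0, s0^2) || N(m1, s1^2) ), the standard closed form
    return math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2 * s1 ** 2) - 0.5

random.seed(1)
sigma, T = 1.0, 50                                   # hypothetical values
mu = [random.gauss(0, 1) for _ in range(T)]          # true conditional means mu_t
mu_tilde = [m + random.gauss(0, 0.1) for m in mu]    # approximating means

# the conditional likelihood factorizes over t, so the KL adds across t and,
# with a common variance, reduces to sum_t (mu_tilde_t - mu_t)^2 / (2 sigma^2)
total = sum(kl_gauss(m, sigma, mt, sigma) for m, mt in zip(mu, mu_tilde))
display = sum((mt - m) ** 2 for m, mt in zip(mu, mu_tilde)) / (2 * sigma ** 2)
assert math.isclose(total, display)
print(round(total, 6))
```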

Assume without loss of generality that σ ≡ 1. From Eq. 6.21 and the fact that the K-L divergence in the Gaussian case is simply proportional to a squared 2-norm, the risk of estimating 𝜃 by \(\boldsymbol {\hat {\theta }}\) takes the form:

$$ \begin{array}{@{}rcl@{}} \mathbb{R}(\hat{\boldsymbol\theta}, \boldsymbol\theta) &\leq& \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}}\left\{\frac{1}{T}K(p_{\boldsymbol\theta},p_{\boldsymbol{\tilde\theta}}) + \frac{2}{T}\sum\limits_{v=1}^{N-1} Pen(\boldsymbol{\tilde\theta}(u,v))\right\} \\ &\leq& \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}} \left\{\frac{1}{2T}\left\|\boldsymbol{\tilde\mu}-\boldsymbol\mu\right\|{~}_{2}^{2} + \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} \right.\\&&\left.+ \frac{2}{T}\sum\limits_{v=1}^{N-1}(3/2)\log T \#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\}.\\ \end{array} $$

From Cauchy-Schwarz, we have that

$$ \begin{array}{@{}rcl@{}} \mathbb{R}(\hat{\boldsymbol\mu}, \boldsymbol\mu) \!\!\!\!\!&\leq&\!\!\!\! \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N\!-1)p}}\left\{\frac{1}{2T}\|{\mathbf{X}(\!-u)}^{\prime}\mathbf{X}(\!-u)\|{~}_{2} \sum\limits_{t=1}^{T}\sum\limits_{v=1}^{N-1}\sum\limits_{\ell=1}^{p}\left( \!\tilde\theta_{t}^{(\ell)}(u,v) - \theta_{t}^{(\ell)}(u,v)\!\right)^{2} \right.\!\\ && \!\!\!\!+\left. \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} + 3(N-1)\frac{\log T}{T}\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\} \\ \!\!\!\!&\leq&\!\!\!\! \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}}\left\{\frac{1}{2}{\Lambda} \sum\limits_{v=1}^{N-1}\sum\limits_{\ell=1}^{p}\left\|\boldsymbol{\tilde\theta}_{t}^{(\ell)}(u,v) - \boldsymbol{\theta}_{t}^{(\ell)}(u,v)\right\|{~}_{2}^{2} \right.\\ &&\left. \!\!\!\!+ \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} + 3(N-1)\frac{\log T}{T}\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\}. \end{array} $$
(6.26)

The minimization of the expression (6.26) tries to find the optimal balancing of bias and variance. To bound it, the following L2 result from Donoho (1993) plays the core role.

Lemma 8.4.

Let \(\theta _{(\cdot )}^{(\ell )}(u,v) \in BV(C)\). Define \({\theta _{bd}}_{(\cdot )}^{(\ell )}(u,v)\) to be the best d-term approximant to \(\theta _{(\cdot )}^{(\ell )}(u,v)\) in the dyadic Haar basis for L2([0,1]). Then \(\|{\theta _{bd}}^{(\ell )}(u,v) - \theta ^{(\ell )}(u,v)\|{~}_{L_{2}} = \mathcal {O}(d^{-1})\).
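Lemma 8.4 can be illustrated numerically: for a sampled BV function, keep the d largest Haar coefficients and watch the normalized ℓ2 error fall. In the sketch below, the ramp standing in for the BV(1) function and the grid size are illustrative choices, not part of the original argument.

```python
import math

def haar_forward(x):
    # orthonormal discrete Haar transform; len(x) must be a power of two
    out, x = [], list(x)
    while len(x) > 1:
        a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
        d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
        out = d + out  # details, coarsest level first
        x = a
    return x + out     # [scaling coefficient, details coarse -> fine]

T = 1024
f = [(i + 0.5) / T for i in range(T)]  # a ramp: a BV(1) function on a dyadic grid
c = haar_forward(f)
order = sorted(range(T), key=lambda i: -abs(c[i]))

# in an orthonormal basis, the best d-term approximation error equals the norm
# of the dropped coefficients; track the normalized l2 error as d grows
errors = []
for d in (4, 8, 16, 32, 64):
    errors.append(math.sqrt(sum(c[i] ** 2 for i in order[d:]) / T))
assert all(e1 >= e2 for e1, e2 in zip(errors, errors[1:]))  # error decreases in d
assert errors[-1] < errors[0] / 4  # consistent with the O(d^{-1}) decay
print([round(e, 5) for e in errors])
```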

Define \({\boldsymbol {\theta }_{bd}}^{(\ell )}(u,v)\) to be the average sampling of \({\theta _{bd}}^{(\ell )}(u,v)\) on the interval Ii, that is, \({\boldsymbol {\theta }_{bd}}^{(\ell )}(u,v) = T{\int \limits }_{I_{i}}{\theta _{bd}}^{(\ell )}(u,v)(t) dt\). Then let \({\boldsymbol {\tilde \theta }_{bd}}^{(\ell )}(u,v)\) be the result of discretizing the elements of \({\boldsymbol {\theta }_{bd}}^{(\ell )}(u,v)\) to the set \(D_{T}[-C, C]\), where C is the radius of the bounded variation ball defined in Assumption 6. By the triangle inequality, we have:

$$ \begin{array}{@{}rcl@{}} &&\left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - \boldsymbol{\theta}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}^{2} \\&\leq& \left\|{\boldsymbol{\theta}_{bd}}^{(\ell)}(u,v) - \boldsymbol{\theta}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}^{2} + \left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - {\boldsymbol{\theta}_{bd}}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}^{2} \\ &&+ 2\left\|{\boldsymbol{\theta}_{bd}}^{(\ell)}(u,v) - \boldsymbol{\theta}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}\left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - {\boldsymbol{\theta}_{bd}}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}. \end{array} $$
(6.27)

For the sequences \({\boldsymbol {\theta }_{bd}}^{(\ell )}(u,v)\) and \({\boldsymbol {\tilde \theta }_{bd}}^{(\ell )}(u,v)\) obtained from average sampling, a simple argument relating Haar functions on the discrete set \(D_{T}[-C, C]\) to Haar functions on the interval [0,1] shows that

$$ \begin{array}{@{}rcl@{}} \frac{1}{T}\left\|{\boldsymbol{\tilde\theta}_{bd}}^{(\ell)}(u,v) - \boldsymbol{\theta}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}^{2} \leq \left\|\theta_{bd}^{(\ell)}(u,v) - \theta^{(\ell)}(u,v) \right\|{~}_{L2}^{2}. \end{array} $$

See equation (27) of Kolaczyk and Nowak (2005). On the right-hand side of Eq. 6.27, the first squared term is, by Lemma 8.4, of order \(\mathcal {O}(Td^{-2})\). The second term is a discretization error of order \(\mathcal {O}(1)\). The third cross-term is therefore of order \(\mathcal {O}(T^{1/2}d^{-1})\).

Given these results, we bound Eq. 6.26 by bounding the bias term over each \({\Gamma }_{T}^{(d)}\), where \(d = \bigcup _{i} d_{i}\), for each \(d_{i}\) with i = 1,⋯ ,(N − 1)p. We then optimize over d:

$$ \begin{array}{@{}rcl@{}} &&{}\min\limits_{\boldsymbol{\tilde\theta} \in {{\Gamma}_{T}^{(N-1)p}}^{(d)}} \left\{\frac{1}{2}{\Lambda} \sum\limits_{v=1}^{N-1}\sum\limits_{\ell=1}^{p}\left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - \boldsymbol\theta^{(\ell)}(u,v)\right\|{~}_{2}^{2} \right.\\ &&{}\quad\quad \left. + \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} + 3(N-1)\frac{\log T}{T}\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\}. \end{array} $$
(6.28)

The first term is dominated by the first part of expression (6.27) and is of order \(\mathcal {O}({\Lambda } Td^{-2})\). The second term, \(\frac {\lambda }{T}{\sum }_{\mathcal {I} \in \mathcal {P}(\boldsymbol {\tilde \theta })}{\sum }_{v=1}^{N-1}\|\boldsymbol {\tilde \theta }_{\mathcal {I}}(u,v)\|{~}_{2}\), is the group lasso term. Given the fact that \(\theta _{(\cdot )}^{(\ell )}(u,v)\) is of BV(C), we have that \(T^{-1/2}\|\boldsymbol {\tilde \theta }_{\mathcal {I}}(u,v)\|{~}_{2}\) is of order \(\mathcal {O}(C + d^{-1})\). Note that λ is of order \(T^{-1/2}\) and the number of intervals \(\#\{\mathcal {P}(\boldsymbol {\tilde \theta })\}\) is proportional to d. So the second term is of order \(\mathcal {O}(T^{-1} d (C + d^{-1}))\). The third term is of order \(\mathcal {O}(dT^{-1}\log T)\). Combining the above results, we have:

$$ \begin{array}{@{}rcl@{}} && \min\limits_{\boldsymbol{\tilde\theta} \in {{\Gamma}_{T}^{(N-1)p}}^{(d)}} \left\{\frac{1}{2}{\Lambda} \sum\limits_{v=1}^{N-1}\sum\limits_{\ell=1}^{p}\left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - \boldsymbol\theta^{(\ell)}(u,v)\right\|{~}_{2}^{2} \right.\\ &&\quad\quad \left. + \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} + 3(N-1)\frac{\log T}{T}\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\}\\ &&\leq \mathcal{O}({\Lambda} Td^{-2}) + \mathcal{O}(T^{-1} d (C + d^{-1})) + \mathcal{O}(dT^{-1}\log T), \end{array} $$

which is minimized for \(d \sim ({\Lambda } T^{2}/\log T)^{1/3}\). Substitution then yields the result that the risk is bounded by a quantity of order \(\mathcal {O}(({\Lambda }\log ^{2}T/T)^{1/3})\). For estimation via recursive dyadic partitioning, where \(\#\{\mathcal {P}(\tilde \theta )\}\) is proportional to \(d\log T\), the expression is minimized at \(d \sim ({\Lambda } T^{2} /\log ^{2} T)^{1/3}\), which gives a risk bound of order \(\mathcal {O}(({\Lambda } \log ^{4}T/T)^{1/3})\).
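The bias-variance trade-off driving the choice of d can be checked numerically. The sketch below minimizes a proxy containing only the two dominant terms of the bound (constants set to 1; the values of Λ and T are hypothetical) and compares the integer minimizer with the stated rate:

```python
import math

def risk_proxy(d, Lam, T):
    # two dominant terms of the bound: Lam*T*d^{-2} (approximation bias)
    # and d*log(T)/T (partition penalty); constants set to 1 for illustration
    return Lam * T / d ** 2 + d * math.log(T) / T

Lam, T = 1.0, 100_000  # hypothetical values
d_star = min(range(1, 10_000), key=lambda d: risk_proxy(d, Lam, T))
d_theory = (Lam * T ** 2 / math.log(T)) ** (1 / 3)  # the stated rate
# the integer minimizer should track (Lam T^2 / log T)^{1/3} up to a constant
assert 0.5 * d_theory < d_star < 2 * d_theory
print(d_star, round(d_theory))
```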


About this article

Cite this article

Kang, X., Ganguly, A. & Kolaczyk, E.D. Dynamic Networks with Multi-scale Temporal Structure. Sankhya A 84, 218–260 (2022). https://doi.org/10.1007/s13171-021-00256-1


Keywords

  • Dynamic network
  • multiscale modeling
  • vector autoregressive model

AMS (2000) subject classification

  • Primary: 62M10
  • Secondary: 05C82
  • 62P10