Dynamic Networks with Multi-scale Temporal Structure

Kang, Xinyu; Ganguly, Apratim; Kolaczyk, Eric D.

doi:10.1007/s13171-021-00256-1

Dynamic Networks with Multi-scale Temporal Structure

Published: 28 July 2021

Volume 84, pages 218–260, (2022)
Cite this article

Sankhya A Aims and scope Submit manuscript

339 Accesses
Explore all metrics

Abstract

We describe a novel method for modeling non-stationary multivariate time series, with time-varying conditional dependencies represented through dynamic networks. Our proposed approach combines traditional multi-scale modeling and network based neighborhood selection, aiming at capturing temporally local structure in the data while maintaining sparsity of the potential interactions. Our multi-scale framework is based on recursive dyadic partitioning, which recursively partitions the temporal axis into finer intervals and allows us to detect local network structural changes at varying temporal resolutions. The dynamic neighborhood selection is achieved through penalized likelihood estimation, where the penalty seeks to limit the number of neighbors used to model the data. We present theoretical and numerical results describing the performance of our method, which is motivated and illustrated using task-based magnetoencephalography (MEG) data in neuroscience.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Figure 1

Figure 2

An Invitation to the Study of Brain Networks, with Some Statistical Analysis of Thresholding Techniques

Neurocraft: software for microscale brain network dynamics

Article Open access 20 October 2021

Dimitris Fotis Sakellariou, Angeliki Vakrinou, … Mark Phillip Richardson

δ-MAPS: from spatio-temporal data to a weighted and lagged network between functional domains

Article Open access 31 July 2018

Ilias Fountalis, Constantine Dovrolis, … Shella Keilholz

References

Amano, K., Takeda, T., Haji, T., Terao, M., Maruya, K., Matsumoto, K., Murakami, I. and Nishida, S. (2012). Human neural responses involved in spatial pooling of locally ambiguous motion signals. Journal of Neurophysiology107, 3493–3508.
Article Google Scholar
Bach, F.R. (2008). Consistency of the group lasso and multiple kernel learning. The Journal of Machine Learning Research 9, 1179–1225.
MathSciNet MATH Google Scholar
Barigozzi, M. and Brownlees, C.T. (2014). Nets: network estimation for time series. Available at SSRN 2249909.
Basu, S., Shojaie, A. and Michailidis, G. (2015). Network granger causality with inherent grouping structure. Journal of Machine Learning Research 16, 417–453. http://jmlr.org/papers/v16/basu15a.html.
MathSciNet MATH Google Scholar
Betancourt, B., Rodríguez, A. and Boyd, N. (2017). Bayesian fused lasso regression for dynamic binary networks. Journal of Computational and Graphical Statistics.
Bettencourt, K.C. and Xu, Y. (2016). Decoding the content of visual short-term memory under distraction in occipital and parietal areas. Nature Neuroscience 19, 150–157.
Article Google Scholar
Bolstad, A., Van Veen, B.D. and Nowak, R. (2011). Causal network inference via group sparse regularization. IEEE Transactions on Signal Processing 59, 2628– 2641.
Article MathSciNet Google Scholar
Braddick, O., O'Brien, J., Wattam-Bell, J., Atkinson, J. and Turner, R. (2000). Form and motion coherence activate independent, but not dorsal/ventral segregated, networks in the human brain. Current Biology 10, 731–734.
Article Google Scholar
Bullmore, E. and Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience10, 186– 198.
Article Google Scholar
Calabro, F. and Vaina, L. (2012). Interaction of cortical networks mediating object motion detection by moving observers. Experimental Brain Research 221, 177– 189.
Article Google Scholar
Davis, R.A., Lee, T. and Rodriguez-Yam, G.A. (2008). Break detection for a class of nonlinear time series models. Journal of Time Series Analysis 29, 834–867.
Article MathSciNet Google Scholar
Donoho, D.L. (1993). Unconditional bases are optimal bases for data compression and for statistical estimation. Applied and Computational Harmonic Analysis1, 100– 115.
Article MathSciNet Google Scholar
Donoho, D.L. (1997). Cart and best-ortho-basis: a connection. Annals of Statistics 25, 1870–1911.
Article MathSciNet Google Scholar
Fouque, J.-P., Papanicolaou, G., Sircar, R. and Sølna, K. (2011). Multiscale stochastic volatility for equity, interest rate, and credit derivatives. Cambridge University Press, Cambridge.
Book Google Scholar
Granger, C.W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 3, 424–438.
Article Google Scholar
Hamilton, J.D. (1983). Oil and the macroeconomy since world war ii. The Journal of Political Economy 91, 2, 228–248.
Article Google Scholar
Hiemstra, C. and Jones, J.D. (1994). Testing for linear and nonlinear granger causality in the stock price-volume relation. Journal of Finance 49, 1639–1664.
Google Scholar
Honey, C.J., Kötter, R., Breakspear, M. and Sporns, O. (2007). Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proceedings of the National Academy of Sciences of the United States of America 104, 10240– 10245.
Article Google Scholar
Killick, R., Fearnhead, P. and Eckley, I.A. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association 107, 1590–1598.
Article MathSciNet Google Scholar
Kolaczyk, E.D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer Publishing Company, Incorporated, 1st edn.
Kolaczyk, E.D. and Nowak, R.D. (2005). Multiscale generalised linear models for nonparametric function estimation. Biometrika 92, 119–133.
Article MathSciNet Google Scholar
Li, J.Q. and Barron, A.R. (2000). Mixture density estimation, p. 279–285.
Long, C., Brown, E., Triantafyllou, C., Aharon, I., Wald, L. and Solo, V. (2005). Nonstationary noise estimation in functional mri. Neuroimage28, 890–903.
Article Google Scholar
Louie, M.M. and Kolaczyk, E.D. (2006). A multiscale method for disease mapping in spatial epidemiology. Statistics in Medicine 25, 1287–1306.
Article MathSciNet Google Scholar
Mallat, S.G. (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11, 674–693.
Article Google Scholar
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34, 3, 1436–1462.
Article MathSciNet Google Scholar
Mukhopadhyay, N.D. and Chatterjee, S. (2007). Causality and pathway search in microarray time series experiment. Bioinformatics 23, 442–449.
Article Google Scholar
Müller, A. (2001). Stochastic ordering of multivariate normal distributions. Annals of the Institute of Statistical Mathematics 53, 567–575.
Article MathSciNet Google Scholar
Rana, K.D. and Vaina, L.M. (2014). Functional roles of 10 hz alpha-band power modulating engagement and disengagement of cortical networks in a complex visual motion task. PloS One 9, e107715.
Article Google Scholar
Sims, C.A. (1972). Money, income, and causality. American Economic Review 62, 540–552.
Google Scholar
Willett, R.M. and Nowak, R.D. (2007). Multiscale poisson intensity and density estimation. IEEE Transactions on Information Theory 53, 3171–3187.
Article MathSciNet Google Scholar
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B 68, 49–67.
Article MathSciNet Google Scholar
Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. The Journal of Machine Learning Research 7, 2541–2563.
MathSciNet MATH Google Scholar

Download references

Acknowledgements

We would like to thank Lucia Vaina and Kunjan Rana for providing the MEG data and offering helpful discussion throughout. This work was supported in part by funding under AFOSR award 12RSL042 and NIH award 1R01NS095369-01.

Author information

Authors and Affiliations

Boston University, Boston, MA, USA
Xinyu Kang & Apratim Ganguly
Department of Mathematics & Statistics, Boston University, 111 Cummington Mall, Boston, MA, 02215, USA
Eric D. Kolaczyk

Authors

Xinyu Kang
View author publications
You can also search for this author in PubMed Google Scholar
Apratim Ganguly
View author publications
You can also search for this author in PubMed Google Scholar
Eric D. Kolaczyk
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Eric D. Kolaczyk.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

A.1 Algorithm using RDP

Here we provide the algorithm for implementation based on recursive dyadic partitions. Assume the length of the time series equals T = 2^J and $j_{min} = \min \limits _{j}$ such that 2^j > p + 1. Note that p + 1 is the minimum required number of observations to fit the restricted VAR(p) model. Assume J > j_min,

Algorithm 2 splits only at dyadic positions. The candidate partitions $\mathcal {P} \preceq \mathcal {P}_{D_{y}}^{*}$ can be represented as subtrees of a binary tree of depth $\log _{2} T$. Given a dataset of length T = 2^J, we have 2⁰ root node, 2¹ nodes at level 1, 2² nodes, 2³ nodes, and so on, at the following levels, until we reach the leaf level, which has 2^(J− 1) nodes. The complexity of the algorithm is then of order $\mathcal {O}(T)$ calls to fit the group lasso regression and $\mathcal {O}(T)$ calls for comparisons.

A.2 Proof of Theorem 3.1

Proof.

Theorem 3.1

The proof contains two parts. In the first part, we show that Eq. 3.1 holds, under H₀. In the second part, we show that Eq. 3.2 holds, under H₁.

Part 1

We begin by defining the group lasso penalized likelihood on an interval I:

$$ \begin{array}{@{}rcl@{}} PL_{I} = \frac{1}{|I|}\left\|\mathbf{X}_{I}(u) -\mathbf{X}_{I}(-u)\boldsymbol\theta_{I}(u,v) \right\|{~}_{2}^{2} + \lambda_{I} \sum\limits_{v \in V\backslash \{u\}}\left\|\boldsymbol\theta_{I}(u,v)\right\|{~}_{2}. \end{array} $$

(6.1)

Let $\boldsymbol {\hat {\theta }}_{1:T}$ be the 𝜃 that minimizes the penalized likelihood Eq. 6.1 on the interval from 1 to T and $\hat {PL}_{1:T}$ be the quantity upon substituting $\boldsymbol {\hat {\theta }}_{1:T}$ in Eq. 6.1. Consider any alternative model with a change point detected at point $\hat {\tau }\in (1, T)$. Denote by $\boldsymbol {\hat {\theta }}_{1:\hat {\tau }}$ and $\boldsymbol {\hat {\theta }}_{\hat {\tau }:T}$ the coefficients 𝜃 that minimize Eq. 6.1 over intervals $[1, \hat {\tau }]$ and $(\hat {\tau }, T]$, respectively. Given our model, Eq. 3.1 in Theorem 3.1 is equivalent to

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{0}}(\hat{PL}_{1:T} \leq \hat{PL}_{1:\hat{\tau}} + \hat{PL}_{\hat{\tau}:T} + C_{3}\log T) \longrightarrow 1. \end{array} $$

The additional term $C_{3}\log T$ comes from the fact that the alternative model has 1 more partition than the null model, with C₃ = 1/2 using RDP and C₃ = 3/2 using RP. We expand $\hat {PL}_{1:\hat {\tau }} + \hat {PL}_{\hat {\tau }:T} - \hat {PL}_{1:T} +C_{3}\log T$ and get:

$$ \begin{array}{@{}rcl@{}} &&{} \frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&+\! \frac{1}{T- \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2}\\ &&+ \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2}- \frac{1}{T}\left\|\vphantom{\left.\sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2}}\mathbf{X}_{1:T}(u)\right.\\&&\left.- \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2}- \lambda_{1:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:T}(u,v)\right\|{~}_{2}\\ && + C_{3}\log T. \end{array} $$

(6.2)

By rewriting the last line of Eq. 6.2, we have

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&~~+ \frac{1}{T - \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2}\\ &&+ \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2}\! \\ &&- \frac{1}{T}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{1}{T}\left\|\mathbf{X}_{\hat{\tau} :T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau} :T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \lambda_{1:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:T}(u,v)\right\|{~}_{2} + C_{3}\log T. \end{array} $$

(6.3)

We then add and subtract a term in both line 3 and line 4 of Eq. 6.3. In doing so, we have:

$$ \begin{array}{@{}rcl@{}} &&{} \frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - {\sum}_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:\hat{\tau}}{\sum}_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&+ \frac{1}{T - \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2} + \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2} \\ &&- \frac{1}{T}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) + \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right.\\&&\left.- \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{1}{T}\left\|\mathbf{X}_{\hat{\tau} :T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) + \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right.\\&&\left.- \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau} :T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \lambda_{1:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:T}(u,v)\right\|{~}_{2} + C_{3}\log T. \end{array} $$

(6.4)

From which we have that:

$$ \begin{array}{@{}rcl@{}} &&\text{equation~(6.4)} \\ &\geq & \frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} \\ &&+ \frac{1}{T- \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{1}{T}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{1}{T}\left \| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)- \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{2}{T}\left( \left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2} \right. \\ &&\times \left. \left\| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)- \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2} \right) \\ &&- \frac{1}{T}\left\|\mathbf{X}_{\hat{\tau} :T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right \|{~}_{2}^{2} \\ &&- \frac{1}{T}\left\| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau} :T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2}^{2} \\ &&- \frac{2}{T}\left( \left\|\mathbf{X}_{\hat{\tau} :T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right \|{~}_{2} \right. \\ &&\times \left. \left\|\sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau} :T}(v)\boldsymbol{\hat{\theta}}_{1:T}(u,v) \right\|{~}_{2} \right) \\ &&+ \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2}+ \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\&&- \lambda_{1:T}{\sum}_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:T}(u,v)\right\|{~}_{2} \\ &&+ C_{3}\log T. \end{array} $$

(6.5)

Under assumptions (1) to (5), Bach (2008) reformulated the group lasso penalized likelihood (6.1) as:

$$ PL_{I} = \hat{\boldsymbol{\Sigma}}_{\mathbf{X}(u)\mathbf{X}(u)} - 2\hat{\boldsymbol{\Sigma}}_{\mathbf{X}(-u)\mathbf{X}(u)}^{\prime}\boldsymbol{\theta} + \boldsymbol{\theta}^{\prime}\boldsymbol{\hat{\Sigma}}_{\mathbf{X}(-u)\mathbf{X}(-u)}\boldsymbol\theta + \lambda_{I} \sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\theta}(u,v)\right\|{~}_{2} $$

(6.6)

where $\hat {\boldsymbol {\Sigma }}_{\mathbf {X}(u)\mathbf {X}(u)} = \frac {1}{|I|}\mathbf {X}(u)^{\prime } {\Pi }_{|I|}\mathbf {X}(u)$, $\hat {\boldsymbol {\Sigma }}_{\mathbf {X}(-u) \mathbf {X}(u)} = \frac {1}{|I|}\mathbf {X}(-u)^{\prime }{\Pi }_{|I|}\mathbf {X}(u)$ and $\boldsymbol \theta ^{\prime }\hat {\boldsymbol {\Sigma }}_{\mathbf {X}(-u)\mathbf {X}(-u)}\boldsymbol \theta = \frac {1}{|I|}\mathbf {X}(-u)^{\prime }{\Pi }_{|I|}\mathbf {X}(-u)$ are the empirical covariance matrices with π_|I| defined as ${\Pi }_{|I|} = \mathbf {I}_{|I|}-\frac {1}{|I|}\mathbf {1}_{|I|}\mathbf {1}_{|I|}^{\prime }$ and showed that the group lasso estimator $\boldsymbol {\hat {\theta }}$ converges in probability to 𝜃. Using expression in Eq. 6.6 and collecting similar terms, we could then rewrite (6.5) as:

$$ \begin{array}{@{}rcl@{}} &&{} \frac{T-\hat{\tau}}{T}\left\{\hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{1:\hat{\tau}}(u) \mathbf{X}_{1:\hat{\tau}}(u)} - 2\hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{1:\hat{\tau}}(-u) \mathbf{X}_{1:\hat{\tau}}(u)} \hat{\boldsymbol\theta}_{1:\hat{\tau}} + \hat{\boldsymbol\theta}_{{1:\hat{\tau}}}^{\prime} \hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{1:\hat{\tau}}(-u)\mathbf{X}_{1:\hat{\tau}}(-u)} \hat{\boldsymbol\theta}_{1:\hat{\tau}} \right\} \\ &&{}+ \frac{\hat{\tau}}{T}\left\{\hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{\hat{\tau}:T}(u) \mathbf{X}_{\hat{\tau}:T}(u)} - 2\hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{\hat{\tau}:T}(-u) \mathbf{X}_{\hat{\tau}:T}(u)} \hat{\boldsymbol\theta}_{\hat{\tau}:T} + \hat{\boldsymbol\theta}_{{\hat{\tau}:T}}^{\prime} \hat{\boldsymbol{\Sigma}}_{\mathbf{X}_{\hat{\tau}:T}(-u)\mathbf{X}_{1:\hat{\tau}}(-u)} \hat{\boldsymbol\theta}_{\hat{\tau}:T} \right\} \end{array} $$

(6.7)

$$ \begin{array}{@{}rcl@{}} &&{}- \left\| \hat{\boldsymbol{\Sigma}}^{1/2}_{\mathbf{X}_{1:\hat{\tau}}(-u) \mathbf{X}_{1:\hat{\tau}}(-u)} \left( \hat{\boldsymbol\theta}_{1:\hat{\tau}} - \hat{\boldsymbol\theta}_{1:T} \right)\right\|{~}_{2}^{2} - \left\| \hat{\boldsymbol{\Sigma}}^{1/2}_{\mathbf{X}_{\hat{\tau}:T}(-u) \mathbf{X}_{\hat{\tau}:T}(-u)} \left( \hat{\boldsymbol\theta}_{\hat{\tau}:T} - \hat{\boldsymbol\theta}_{1:T} \right)\right\|{~}_{2}^{2} \end{array} $$

(6.8)

$$ \begin{array}{@{}rcl@{}} &&{}- \frac{2}{T}\left( \left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}\right.\\&&\left. \left \| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\left( \boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)- \boldsymbol{\hat{\theta}}_{1:T}(u,v)\right) \right\|{~}_{2}\right) \end{array} $$

(6.9)

$$ \begin{array}{@{}rcl@{}} &&{}- \frac{2}{T}\left( \left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}\right.\\&&\left. \left \| \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\left( \boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)- \boldsymbol{\hat{\theta}}_{1:T}(u,v)\right) \right\|{~}_{2}\right) \end{array} $$

(6.10)

$$ \begin{array}{@{}rcl@{}} &&{}+ \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\hat{\boldsymbol\theta}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} +\lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\hat{\boldsymbol\theta}_{\hat{\tau}:T}(u,v)\right\|{~}_{2}\\ &&{}- \lambda_{1:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\hat{\boldsymbol\theta}_{1:T}(u,v)\right\|{~}_{2} + C_{3}\log T. \end{array} $$

(6.11)

Note that in the previous expression, the first two lines are by definition non-negative. The expression in the last line is composed of a collection of penalty terms. They are the group lasso penalties, and all of them converge to zero asymptotically assuming λ_(⋅)→0 and λ_(⋅)N→0.

Since $\hat {\boldsymbol \theta }_{1:\hat {\tau }} \stackrel {P}{\longrightarrow } \boldsymbol \theta $, $\hat {\boldsymbol \theta }_{\hat {\tau }:T} \stackrel {P}{\longrightarrow } \boldsymbol \theta $ and $\hat {\boldsymbol \theta }_{1:T} \stackrel {P}{\longrightarrow } \boldsymbol \theta $, $\hat {\boldsymbol \theta }_{1:\hat {\tau }} - \hat {\boldsymbol \theta }_{1:T} \stackrel {P}{\longrightarrow } 0$ and X’s have finite moments up to order 4, each term in Eqs. 6.8, 6.9 and 6.10 converges to 0 in probability.

Putting everything together, we then complete the proof of the first part of the theorem:

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{0}}(\hat{PL}_{1:T} \leq \hat{PL}_{1:\hat{\tau}_{i}} + \hat{PL}_{\hat{\tau}:T} + C_{3}\log T) \longrightarrow 1. \end{array} $$

Part 2

Suppose H₁ is true. We denote the estimated change point by $\hat {\tau }$. We show that $\hat {PL}_{1:\hat {\tau }} + \hat {PL}_{\hat {\tau }:T}$ is minimized at $\hat {\tau } = \tau $. Assume we have a competing estimator $\tilde \tau $ with change point detected at time $\tilde \tau = s$ with s≠τ. We show that

$$ \begin{array}{@{}rcl@{}} \hat{PL}_{1:\hat{\tau}} + \hat{PL}_{\hat{\tau}:T} \leq \hat{PL}_{1:s} + \hat{PL}_{s:T} \end{array} $$

(6.12)

holds with high probability under H₁. Without loss of generality, we assume that τ − s = δ, for some δ > 0 as shown in Fig. 6. For the case that s > τ, a similar argument holds.

Denote by $\boldsymbol {\hat {\theta }}_{1:\hat {\tau }}$ and $\boldsymbol {\hat {\theta }}_{\hat {\tau }:T}$ the estimated coefficients that minimize the penalized likelihoods, given that $I = \{t: t \in [1,\hat {\tau })\}$ and $I = \{t: t \in [\hat {\tau }, T]\}$. We also define $\boldsymbol {\hat {\theta }}_{1:s}$ and $\boldsymbol {\hat {\theta }}_{s:T}$ to be the estimated coefficients that minimize the penalized likelihoods in Eq. 6.1, given that I = {t : t ∈ [1,s)} and I = {t : t ∈ [s, T]}. The key idea is that $\hat {\boldsymbol \theta }_{1:\hat {\tau }}$ and $\hat {\boldsymbol \theta }_{\hat {\tau }:T}$ are consistent estimators of 𝜃_1:τ and 𝜃_τ:T but $\hat {\boldsymbol \theta }_{s:T}$ is not a consistent estimator of 𝜃_1:τ nor 𝜃_τ:T due to the mis-specification error. Therefore, one of the estimators from $\boldsymbol {\hat {\theta }}_{1:s}$ and $\boldsymbol {\hat {\theta }}_{s:T}$ such that s < τ is not a consistent estimator on the corresponding intervals. Formally, we have that

$$ \begin{array}{@{}rcl@{}} &&{}\hat{PL}_{1:s} + \hat{PL}_{s:T}\\ &&{}= \frac{1}{s}\left\|\mathbf{X}_{1:s}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:s}(v)\boldsymbol{\hat{\theta}}_{1:s}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2}\\ &&{\kern2pt}+ \frac{1}{T-s}\left\|\mathbf{X}_{s:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:T}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} + \lambda_{s:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2}\\ &&{}= \frac{1}{s}\left\|\mathbf{X}_{1:s}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:s}(v)\boldsymbol{\hat{\theta}}_{1:s}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2}\\ &&{}+ \frac{1}{T - s}\left\|\mathbf{X}_{s:\tau}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\tau}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} \\&&{}+ \frac{\delta \lambda_{s:T}}{T-s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2} \end{array} $$

(6.13)

$$ \begin{array}{@{}rcl@{}} &&{}+ \frac{1}{T-s}\left\|\mathbf{X}_{\tau:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\tau:T}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} \\&&{}+ \frac{(T-s-\delta)\lambda_{s:T}}{T-s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2} \end{array} $$

(6.14)

and

$$ \begin{array}{@{}rcl@{}} &&{}\hat{PL}_{1:\hat{\tau}} + \hat{PL}_{\hat{\tau}:T} \\ &&{}= \frac{1}{\tau}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&{}+ \frac{1}{T - \tau}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2} + \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2}. \end{array} $$

We write expression (6.13) as $\hat {PL}_{1:s} + \hat {PL}_{s:\hat {\tau }}$, and expression (6.14), as $\tilde {PL}_{s:T}$. We show (6.12) holds by first showing that $\hat {PL}_{1:s} + \hat {PL}_{s:\hat {\tau }} \geq \hat {PL}_{1:\hat {\tau }}$, and then showing $\tilde {PL}_{s:T} \geq \hat {PL}_{\hat {\tau }:T}$. We first compute $\hat {PL}_{1:s} + \hat {PL}_{s:\hat {\tau }} - \hat {PL}_{1:\hat {\tau }}$:

$$ \begin{array}{@{}rcl@{}} &&{}= \frac{1}{s}\left\|\mathbf{X}_{1:s}(u) - \sum\limits_{v\in V\backslash\{u\}}v_{1:s}(v)\boldsymbol{\hat{\theta}}_{1:s}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:s}\sum\limits_{\mathbf{X}\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2} \\ &&{}+ \frac{1}{T - s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} + \frac{\delta \lambda_{s:T}}{T - s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2} \\ &&{}-\frac{1}{\tau}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} - \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2}. \end{array} $$

Assuming there is another group-lasso estimator defined on the the interval between s and $\hat {\tau }$, which is given by

$$ \begin{array}{@{}rcl@{}} \boldsymbol{\hat{\theta}}_{s:\hat{\tau}} \!\!\!&=&\!\!\! \operatornamewithlimits{arg min}_{\boldsymbol{\theta}} \frac{1}{\hat{\tau}-s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\theta}_{s:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} \\&&\!\!\!+ \lambda_{s:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol\theta_{s:\hat{\tau}}(u,v)\right\|{~}_{2}. \end{array} $$

The estimator $\boldsymbol {\hat {\theta }}_{s:\hat {\tau }} $ is again a consistent estimator of $\boldsymbol {\theta }_{1:\hat {\tau }}$ and we have that:

$$ \begin{array}{@{}rcl@{}} &&{}\frac{1}{\hat{\tau} - s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{s:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{s:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:\hat{\tau}}(u,v)\right\|{~}_{2} \end{array} $$

(6.15)

$$ \begin{array}{@{}rcl@{}} &&{}\leq \frac{1}{T - s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} \\&&{}+ \frac{\delta \lambda_{s:T}}{T - s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:T}(u,v)\right\|{~}_{2}. \end{array} $$

(6.16)

These are directly implied by Theorem (2) in Bach (2008) given that $\boldsymbol {\hat {\theta }}_{s:T}$ is not consistent in the ℓ₂ sense of estimating $\boldsymbol \theta _{1:\hat {\tau }}$ whenever $s \neq \hat {\tau }$. Given (6.16), we have that

$$ \begin{array}{@{}rcl@{}} &&{}\hat{PL}_{1:s} + \hat{PL}_{s:\hat{\tau}} - \hat{PL}_{1:\hat{\tau}} \\ &&{}\geq \frac{1}{s}\left\|\mathbf{X}_{1:s}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:s}(v)\boldsymbol{\hat{\theta}}_{1:s}(u,v) \right\|{~}_{2}^{2} + \lambda_{1:s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2} \\ &&{\kern2pt}+ \frac{1}{\hat{\tau}-s}\left\|\mathbf{X}_{s:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{s:\hat{\tau}}(v)\boldsymbol{\hat{\theta}}_{s:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} + \lambda_{s:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{s:\hat{\tau}}(u,v)\right\|{~}_{2} \\ &&{\kern2pt}-\frac{1}{\hat{\tau}}\left\|\mathbf{X}_{1:\hat{\tau}}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{1:\hat{\hat{\tau}}}(v)\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v) \right\|{~}_{2}^{2} - \lambda_{1:\hat{\tau}}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:\hat{\tau}}(u,v)\right\|{~}_{2}. \\ \end{array} $$

The same argument in Part 1 holds here and we have

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{1}}\left( \hat{PL}_{1:s} + \hat{PL}_{s:\hat{\tau}} \geq \hat{PL}_{1:\hat{\tau}}\right) \longrightarrow 1. \end{array} $$

Note that $\boldsymbol {\hat {\theta }}_{s:T}$ is not a consistent estimator of $\boldsymbol {\theta }_{\hat {\tau }:T}$ given the change point. Therefore, similar to Eq. 6.16, we have

$$ \begin{array}{@{}rcl@{}} & &{}\frac{1}{T - \hat{\tau}}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v) \right\|{~}_{2}^{2} + \lambda_{\hat{\tau}:T}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{\hat{\tau}:T}(u,v)\right\|{~}_{2} \\ &&{}\leq \frac{1}{T - s}\left\|\mathbf{X}_{\hat{\tau}:T}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}_{\hat{\tau}:T}(v)\boldsymbol{\hat{\theta}}_{s:T}(u,v) \right\|{~}_{2}^{2} \\&&~~~~~~~~~~+ \frac{(T-s-\delta)\lambda_{s:T}}{T-s}\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\hat{\theta}}_{1:s}(u,v)\right\|{~}_{2} \\ \end{array} $$

and so

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{1}}\left( \tilde{PL}_{s:T} \geq \hat{PL}_{\hat{\tau}:T}\right) \longrightarrow 1. \end{array} $$

Putting the two parts together, we have

$$ \begin{array}{@{}rcl@{}} \mathbb{P}_{H_{1}}\left( \hat{PL}_{1:s} + \hat{PL}_{s:T}\geq \hat{PL}_{1:\hat{\tau}} + \hat{PL}_{\hat{\tau}:T} \right) \longrightarrow 1 \end{array} $$

for any $s < \hat {\tau }$. □

A.3 Proof of Theorem 9

Under the assumption of stationarity, we could omit the time index in this section, that is 𝜃 = 𝜃_t,∀t. To show Theorem 3.3, we begin with the following lemma.

Lemma 8.1.

Given $\boldsymbol {\theta } \in \mathbb {R}^{(N-1)p}$, let G(𝜃(u, v)) be a p-dimensional vector with elements

$$ \begin{array}{@{}rcl@{}} G(\boldsymbol{\theta}(u,v)) &= -2T^{-1}\left( \mathbf{X}(v)^{\prime}(\mathbf{X}(u) - {\sum}_{v\in V\backslash\{u\}}\mathbf{X}(v)\boldsymbol{\theta}(u,v))\right). \end{array} $$

(6.17)

A vector $\boldsymbol {\hat {\theta }}$ with $\|\boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} = 0$, ∀v ∈ V ∖{u} is a solution to the group lasso type of estimator iff for all v ∈ V ∖{u}, $ G(\boldsymbol {\hat {\theta }}(u,v)) + \lambda \mathbf {D}(\boldsymbol {\hat {\theta }}(u,v)) = \mathbf {0}$, where $\|\mathbf {D}(\boldsymbol {\hat {\theta }}(u,v))\|{~}_{2} = 1$ in the case of $\|\boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} > 0$ and $\|\mathbf {D}(\boldsymbol {\hat {\theta }}(u,v) \|{~}_{2} < 1$ in the case of $\|\boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} = 0$.

Proof.

Lemma 8.1

Under KKT conditions, using subdifferential methods, the subdifferential of

$$ \frac{1}{T}\left\|\mathbf{X}(u) - \sum\limits_{v\in V\backslash\{u\}}\mathbf{X}(v)\boldsymbol{\theta}(u,v)\right\|^{2} + \lambda\sum\limits_{v\in V\backslash\{u\}}\left\|\boldsymbol{\theta}(u,v)\right\|{~}_{2} $$

is given by $G(\boldsymbol {\theta }(u,v)) + \lambda \mathbf {D}(\boldsymbol {\hat {\theta }}(u,v))$, where $\|\mathbf {D}(\boldsymbol {\hat {\theta }}(u,v))\|{~}_{2} = 1$ if ∥𝜃(u, v)∥ ₂ > 0 and $\|\mathbf {D}(\boldsymbol {\hat {\theta }}(u,v))\|{~}_{2} < 1$ if ∥𝜃(u, v)∥ ₂ = 0. The lemma follows. □

We now proof Theorem 3.3.

Proof 3.

Assuming that $\hat {C}_{u} \nsubseteq C_{u}$, there must exist at least one estimated edge that joins two nodes in two different connectivity components. Given the assumptions, we use similar arguments as in the proof of Theorem 3 in Meinshausen and Bühlmann (2006). Hence we have

$$ \mathbb{P}(\exists u \in V: \hat{C}_{u} \nsubseteq C_{u}) \leq N \max_{u \in V} \mathbb{P}(\exists v \in V \backslash C_{u}: v \in \hat{\text{ne}}_{u}), $$

where $\hat {\text {ne}}_{u}$ is the estimated neighborhood of node u and $v \in \hat {\text {ne}}_{u}$ means $\| \boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} > 0$.

Let ${\mathscr{E}}$ be the event that

$$ \max\limits_{u\in V \backslash C_{u}} \left\|G \left( \boldsymbol{\hat{\theta}}(u,v)\right) \right\|{~}_{2}^{2} < \lambda^{2}. $$

Conditional on the event ${\mathscr{E}}$, $\boldsymbol {\hat {\theta }}$ is also a solution to the group lasso problem. As $\|\boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} = 0$ for all v ∈ V ∖C_u, it follows from Lemma 8.1 that $\|\boldsymbol {\hat {\theta }}(u,v)\|{~}_{2} = 0$ for all v ∈ V ∖C_u. Hence

$$ \begin{array}{@{}rcl@{}} \mathbb{P}(\exists v\in V \backslash C_{u}: \|\boldsymbol{\hat{\theta}}(u,v)\|{~}_{2} > 0) &\leq & 1 - \mathbb{P}(\mathscr{E})\\ &=& P\left( \max\limits_{v \in V \backslash C_{u}} \left\|G\left( \boldsymbol{\hat{\theta}}(u,v)\right) \right\|{~}_{2}^{2} \geq \lambda^{2} \right). \end{array} $$

It is then sufficient to show that

$$ N^{2} \max_{u \in V\text{, } v\in V \backslash C_{u}} \mathbb{P}\left( \left\|G(\boldsymbol{\hat{\theta}}(u,v))\right\|{~}_{2}^{2} \geq \lambda^{2}\right) \leq \alpha. $$

Note that now the v and C_u are in different connected components, which means that X(v) is conditionally independent of X(C_u). Hence, conditional on all X(C_u), we have

$$ \begin{array}{@{}rcl@{}} \left\|G(\boldsymbol{\hat{\theta}}(u,v))\right\|{~}_{2}^{2} &=& \left\|-2T^{-1}\left( \mathbf{X}(v)^{\prime}(\mathbf{X}(u) - \sum\limits_{i \in C_{u}}\mathbf{X}(i)\boldsymbol{\hat{\theta}}(u,i))\right)\right\|{~}_{2}^{2}\\ &=& 4T^{-2}\left\|(\mathbf{\hat R}_{1}, \cdots, \mathbf{\hat R}_{p} )^{\prime} \right\|{~}_{2}^{2} \end{array} $$

where $\mathbf {\hat R}_{\ell } =X_{-\ell }(v)^{\prime } \left (\mathbf {X}(u) - {\sum }_{i \in C_{u}}\mathbf {X}(i)\boldsymbol {\hat {\theta }}(u,i)\right )$ is the remainder term and is independent of X(v), at all lags ℓ, for ℓ = 1,⋯ ,p. It follows that the joint distribution

$$ (\mathbf{\hat R}_{1}, \cdots, \mathbf{\hat R}_{p}| \mathbf{X}(C_{u})) \sim N(\mathbf{0}, \mathbf{\Omega}) $$

for some covariance matrix Ω. Note that this is a conditional distribution given X(C_u). Hence, in the expression of Ω, every term appearing with a suffix u is constant and every term appearing with a suffix v is a normalized random variable. This simplifies the covariance term. Note that

$$ {\boldsymbol{\Omega}}_{p \times p} = \textbf{Cov}\left( \mathbf{\hat R}_{1}, \cdots, \mathbf{\hat R}_{p} \right) $$

and

$$ \begin{array}{@{}rcl@{}} &&{}\textbf{tr}\left( \boldsymbol{\Omega}\right) = \sum\limits_{\ell=1}^{p} \textbf{Var} (\boldsymbol{\hat R}_{\ell}) = \sum\limits_{\ell=1}^{p} \textbf{Var}\!\left( \!{\sum}_{t=1}^{T}\left( \!\!X_{t}(u) - \sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\!\right)X_{t-\ell}(v)\!\!\right)\! \\ &&{\kern7pt}=\sum\limits_{\ell=1}^{p}\sum\limits_{s=1}^{T}\sum\limits_{t=1}^{T}\textbf{Cov}\left[\left( \left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)X_{t-\ell}(v)\right),\right.\\&&{} \left.\left( \left( X_{s}(u)-\sum\limits\limits_{i\in C_{u}}X_{s-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)X_{s-\ell}(v)\right)\right]. \end{array} $$

(6.18)

Conditional on X(C_u), Eq. 6.18 can be further simplified as:

$$ \begin{array}{@{}rcl@{}} \textbf{tr}\left( \boldsymbol{\Omega}\right) \!\!\!&=&\!\!\! \sum\limits_{\ell=1}^{p}\sum\limits_{t=1}^{T}\sum\limits_{s=1}^{T}\left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)\\&& \left( X_{s}(u)-\sum\limits_{i\in C_{u}}X_{s-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right) \textbf{Cov} \left[ X_{t-\ell}(v) ,X_{s-\ell}(v) \right] \\ \!\!\!&\leq&\!\!\! \sum\limits_{\ell=1}^{p}\sum\limits_{t=1}^{T}\sum\limits_{s=1}^{T}\left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)\\&& \left( X_{s}(u)-\sum\limits_{i\in C_{u}}X_{s-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)\sqrt{\textbf{Var}(X_{t-\ell}(v)) \textbf{Var}(X_{s-\ell}(v))}. \end{array} $$

We have the above bounded by

$$ \begin{array}{@{}rcl@{}} &\leq& p \sum\limits_{s=1}^{T}\sum\limits_{t=1}^{T} \left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)\\&& \left( X_{s}(u)-\sum\limits_{i\in C_{u}}X_{s-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right) \\ &=& p \left[{\sum}_{t=1}^{T} \left( X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i)\right)\right]^{2} \\ & \leq& Tp \left[X_{t}(u)-\sum\limits_{i\in C_{u}}X_{t-\ell}(i)\hat{\theta}^{(\ell)}(u,i) \right]^{2}\\ &\leq& Tp \|\mathbf{X}(u)\|^{2}_{2}.\\ \end{array} $$

The last inequality comes from the Cauchy-Schwarz inequality. Denote by ν_max the largest eigenvalue of the covariance matrix Ω. Since Ω is PSD, we have (ν_maxI −Ω) is also PSD. Following Müller (2001)’s argument, we can show $(\hat {\mathbf {R}}_{1}, \cdots , \hat {\mathbf {R}}_{p}) \leq _{cx} \mathbf {Y}$ for some random vector $\mathbf {Y} \sim N(\mathbf {0}, \nu _{max}\mathbf {I}_{p})$, where ≤_cx is the convex order that means X ≤Y, if and only if μ_x = μ_y and ${\sigma _{x}^{2}} \leq {\sigma _{y}^{2}}$. It follows that

$$ \begin{array}{@{}rcl@{}} \max_{u \in V, v\in V \backslash C_{u} } \mathbb{P}\left( \left\|G(\hat{\boldsymbol\theta}(u,v))\right\|{~}_{2}^{2} \geq \lambda^{2}\right) \!\!\!&\leq&\!\!\! \max_{u \in V, v\in V \backslash C_{u}} \mathbb{P}(4T^{-2}(\mathbf{Y}^{\prime}\mathbf{Y}) \geq \lambda^{2})\\ \!\!\!&=&\!\!\! \max_{u \in V, b\in V \backslash C_{u}} \mathbb{P}\left( \frac{1}{\nu_{max}}\mathbf{Y}^{\prime}\mathbf{Y} \geq \frac{\lambda^{2}T^{2}}{4\nu_{max}}\right) . \end{array} $$

Note that the matrix $\frac {1}{\nu _{max}}\mathbf {Y}^{\prime }\mathbf {Y}$ is idempotent and thus it follows a χ²(p) distribution, and $\nu _{max} \leq \textbf {tr}(\boldsymbol {\Omega }) \leq Tp\|\mathbf {X}(u)\|{~}_{2}^{2}$. Put everything together, we have

$$ \begin{array}{@{}rcl@{}} \max_{u \in V, b\in V \backslash C_{u}} \mathbb{P}\left( \|G(\hat{\boldsymbol\theta}(u,v))\|{~}_{2}^{2} \geq \lambda^{2}\right) \!\!\!&\leq&\!\!\! \max_{u \in V, v\in V \backslash C_{u}} \mathbb{P}\left( \chi^{2}(p) \geq \frac{\lambda^{2}T^{2}}{4\nu_{max}} \right)\\ \!\!\!&\leq&\!\!\! \max_{u \in V, v\in V \backslash C_{u}} \mathbb{P}\left( \chi^{2}(p) \geq \frac{\lambda^{2}T^{2}}{4Tp\|\mathbf{X}(u)\|{~}_{2}^{2}} \right) \\&\leq&\!\!\! \frac{\alpha}{N(N-1)} \end{array} $$

and thus we have the desired λ(α, a)

$$ \lambda(\alpha) = 2\hat{\sigma}_{u}\sqrt{pQ\left( 1-\frac{\alpha}{N(N-1)}\right)}. $$

(6.19)

□

A.4 Proof of Theorem 3.3

The proof of the theorem is in line with the work in Kolaczyk and Nowak (2005). The core idea is to bound the expected Hellinger loss in terms of the Kullback-Leibler distance. This approach, building on the original work of Li and Barron (2000), leverages the union of unions bound, after discretizing the underlying parameter space. We assume a similar discretization here, while omitting the straightforward but tedious numerical analysis arguments that accompany. See, for example, Kolaczyk and Nowak (2005) for details. Our fundamental bound is given by the following theorem.

Theorem 8.2.

Let ${\Gamma }_{T}^{(N-1)p}$ be a space of finite collection of estimators $\boldsymbol {\tilde \theta }$ for 𝜃, and pen(⋅) a function on ${{\Gamma }_{T}^{p}}$ satisfying the condition

$$ \begin{array}{@{}rcl@{}} \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}}e^{-pen(\boldsymbol{\tilde\theta}(u, v))} \leq 1. \end{array} $$

(6.20)

Let $\hat {\boldsymbol \theta }$ be a penalized maximum likelihood estimator of the form

$$ \begin{array}{@{}rcl@{}} \hat{\boldsymbol\theta} \equiv \operatornamewithlimits{arg min}_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}}\left\{ -\log p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol{\tilde\theta}) + 2\sum\limits_{v \in V\backslash \{u\}} \text{Pen}(\boldsymbol{\tilde\theta}(u,v))\right\}. \end{array} $$

Then

$$ \begin{array}{@{}rcl@{}} \mathbb{E}[H^{2}(p_{\hat{\boldsymbol\theta}},p_{\boldsymbol\theta})] \leq \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}}\left\{K(p_{\boldsymbol\theta},p_{\boldsymbol{\tilde\theta}}) + 2\sum\limits_{v \in V\backslash \{u\}}\text{Pen}(\boldsymbol{\tilde\theta}(u,v))\right\}. \end{array} $$

(6.21)

Note that the result of Theorem 6 requires that inequality (6.20) holds. Lemma 8.3 shows that our proposed penalty satisfies inequality (6.20). We now prove Theorem 6.

Proof.

Note that we have

$$ \begin{array}{@{}rcl@{}} H^{2}(p_{\hat{\boldsymbol\theta}},p_{\boldsymbol\theta}) &=& \int \left[\sqrt{p(\mathbf{x}|\mathbf{X}(-u), \boldsymbol{\hat{\theta}})} - \sqrt{p(\mathbf{x}|\mathbf{X}(-u), \boldsymbol\theta)}\right]^{2} d\nu(\mathbf{x}) \\ &=& 2\left( 1 - \int \sqrt{p(\mathbf{x}|\mathbf{X}(-u), \hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u), \boldsymbol\theta)}d\nu(\mathbf{x}) \right)\\ &\leq &-2 \log \int \sqrt{p(\mathbf{x}|\mathbf{X}(-u), \hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u), \boldsymbol\theta)} d\nu(\mathbf{x}). \end{array} $$

Taking the conditional expectation respect to X(u)|X(−u), we then have

$$ \begin{array}{@{}rcl@{}} \mathbb{E}[H^{2}(p_{\hat{\boldsymbol\theta}},p_{\boldsymbol\theta})] \!\!\!&\leq&\!\!\! 2\mathbb{E}\log \left( \frac{1}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \right) \\ \!\!\!& \leq&\!\!\! 2\mathbb{E}\log \left( \frac{p^{1/2}(\mathbf{X}(u)|\mathbf{X}(-u),\hat{\boldsymbol\theta})e^{- \sum\limits_{v} pen(\hat{\boldsymbol\theta}(u,v))}}{p^{1/2}(\mathbf{X}(u)|\mathbf{X}(-u),\check{\boldsymbol\theta})e^{- \sum\limits_{v} pen(\check{\boldsymbol\theta}(u,v))}}\right.\\&&\left. \frac{1}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \right), \end{array} $$

where the collection of $\check {\boldsymbol \theta }(u,v)$’s are the arguments that minimize the right-hand side of the expression (6.21). The last expression can be written in two pieces, that is

$$ \begin{array}{@{}rcl@{}} &&{} \mathbb{E}\left[ \log \frac{p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}{p(\mathbf{X}(u)|\mathbf{X}(-u),\check{\boldsymbol\theta})}\right] + 2 \sum\limits_{v} pen(\check{\boldsymbol\theta}(u,v)) \end{array} $$

(6.22)

$$ \begin{array}{@{}rcl@{}} &&{}+ 2\mathbb{E} \log \left( \frac{p^{1/2}(\mathbf{X}(u)|\mathbf{X}(-u),\hat{\boldsymbol\theta})}{p^{1/2}(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}\frac{\prod\limits_{v} \prod\limits_{\ell} e^{-pen(\hat{\boldsymbol\theta}^{(\ell)}(u,v))}}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \right).\\ \end{array} $$

(6.23)

Note that the expression (6.22) is the right hand side of Eq. 6.21. What we need to show then is that expression (6.23) is bounded above by zero. By applying Jensen’s inequality, we have Eq. 6.23 bounded by:

$$ \begin{array}{@{}rcl@{}} 2\log \mathbb{E}\left[\prod\limits_{v}e^{-pen(\hat{\boldsymbol\theta}(u,v))}\frac{\sqrt{p(\mathbf{X}(u)|\mathbf{X}(-u),\hat{\boldsymbol\theta})/p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\hat{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \right]. \end{array} $$

(6.24)

The integrand in the expectation in Eq. 6.24 can be bounded by

$$ \begin{array}{@{}rcl@{}} \sum\limits_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}} \prod\limits_{v}e^{-pen(\tilde{\boldsymbol\theta}(u,v))}\frac{\sqrt{p(\mathbf{X}(u)|\mathbf{X}(-u),\tilde{\boldsymbol\theta})/p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\tilde{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})}. \end{array} $$

Given the fact that $\tilde {\boldsymbol \theta }$ does not depend on the X(−u), Eq. 6.24 can be bounded by

$$ \begin{array}{@{}rcl@{}} &&{} 2\log \sum\limits_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}} \prod\limits_{v}e^{-pen(\tilde{\boldsymbol\theta}(u,v))}\frac{\mathbb{E}\left[\sqrt{p(\mathbf{X}(u)|\mathbf{X}(-u),\tilde{\boldsymbol\theta})/p(\mathbf{X}(u)|\mathbf{X}(-u),\boldsymbol\theta)}\right]}{\int \sqrt{p(\mathbf{x}|\mathbf{X}(-u),\tilde{\boldsymbol\theta})p(\mathbf{x}|\mathbf{X}(-u),\boldsymbol\theta)}d\nu(\mathbf{x})} \\ &&{}= 2\log \sum\limits_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}} \prod\limits_{v}e^{-pen(\tilde{\boldsymbol\theta}(u,v))}. \end{array} $$

(6.25)

Since $e^{-pen(\tilde {\boldsymbol \theta }(u,v))} > 0$ for any $\boldsymbol {\tilde \theta }(u,v)$, and using the inequality ${\sum }_{i} a_{i} b_{i} \leq {\sum }_{i} a_{i} {\sum }_{i}b_{i}$ for any a_i > 0,b_i > 0, we can bound (6.25) by:

$$ \begin{array}{@{}rcl@{}} 2\log \prod\limits_{v} \sum\limits_{\boldsymbol{\tilde\theta}(u,v) \in {{\Gamma}_{T}^{p}}} e^{- pen(\tilde{\boldsymbol\theta}(u,v))}. \end{array} $$

□

From the condition in Eq. 6.20, we see that the above expression is bounded by zero. We now show that our proposed estimator satisfies condition (6.20) by the following lemma.

Lemma 8.3.

Let Γ_T be the collection of all $\boldsymbol {\tilde \theta }^{(\ell )}(u, v)$ with components $\boldsymbol {\tilde \theta }_{t}^{(\ell )}(u, v) \in D_{T}[-C, C]$ and possessed of a Haar like expansion through a common partition, using either RDP (see expression (2.2)) or RP (see expression (2.4)), where D_T[−C, C] denotes a discretization of the interval [−C, C] into T^1/2 equispaced values. For any type of penalty such that

$$ \begin{array}{@{}rcl@{}} Pen(\boldsymbol{\tilde\theta}(u, v)) = C_{3}\log T \#\{\mathcal{P}(\boldsymbol{\tilde\theta)}\} + \lambda\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})} \|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u, v)\|{~}_{2}, \end{array} $$

where C₃ = 1/2 for recursive dyadic partitioning and C₃ = 3/2 for recursive partitioning, we have

$$ \begin{array}{@{}rcl@{}} \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}} e^{-pen(\boldsymbol{\boldsymbol{\tilde\theta}}(u, v))} \leq 1, \end{array} $$

for T > ⌈e^2p/3⌉.

Proof.

We prove Lemma 8.3 for the case of recursive partitioning. We write ${\Gamma }_{T} = \bigcup _{d_{\ell }=1}^{T} {\Gamma }_{T}^{(d_{\ell })}$ where ${\Gamma }_{T}^{(d_{\ell })}$ is the subset of values $\boldsymbol {\tilde \theta }_{t}^{(\ell )}(u, v)$ that is composed of d_ℓ constant valued sequences. For example, ${\Gamma }_{T}^{(d_{\ell })}$ consists of all length T sequences such that there are exactly d_ℓ alternating sequences of zero and nonzero elements. So, for example, (0,0,4,0,0) and (2,0,1,1,1) might be two such sequences in ${\Gamma }_{5}^{(3)}$. Then we have

$$ \begin{array}{@{}rcl@{}} \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}} e^{-pen(\boldsymbol{\tilde\theta}(u, v))} &=& \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}} e^{-(3/2)\log T\{\#\mathcal{P}(\boldsymbol{\tilde\theta)}\} - \lambda\sum\limits_{\mathcal{I}\in \mathcal{P}(\boldsymbol{\tilde\theta})}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u, v) \|{~}_{2}} \\ & \leq& \sum\limits_{\boldsymbol{\tilde\theta}(u, v) \in {{\Gamma}_{T}^{p}}} e^{-(3/2)\log T\{\#\mathcal{P}(\boldsymbol{\tilde\theta)}\}}\\ & \leq& \prod\limits_{\ell=1}^{p} \sum\limits_{\boldsymbol{\tilde\theta}^{(\ell)}(u, v) \in {\Gamma}_{T}} e^{-(3/2p)\log T\{\#\mathcal{P}(\boldsymbol{\tilde\theta)}\}}\\ & =& \prod\limits_{\ell=1}^{p} \sum\limits_{d_{\ell}=1}^{T}\binom{T-1}{d_{\ell}-1} e^{-d_{\ell}(3/2p)\log T}\\ & =& \prod\limits_{\ell=1}^{p} \sum\limits_{d^{\ell^{\prime}}=0}^{T-1}\binom{T-1}{d^{\ell^{\prime}}}e^{-(d^{\ell^{\prime}}+1)(3/2p)\log T} \\ & =& \prod\limits_{\ell=1}^{p} \sum\limits_{d\prime=0}^{T-1} \frac{(T-1)!}{d^{\ell^{\prime}} ! (T-d^{\ell^{\prime}}-1)!} T^{-(d^{\ell^{\prime}}+1)(3/2p)}\\ & \leq& \prod\limits_{\ell=1}^{p} T^{-(3/2p)} \sum\limits_{d^{\ell^{\prime}}=0}^{T-1} \frac{(T-1)^{d^{\ell^{\prime}}}}{d^{\ell^{\prime}}!}\frac{1}{T^{(3/2p)d^{\ell^{\prime}}}}\\ & \leq& T^{-(3/2)} e^{p} \end{array} $$

which is bounded by 1 for any T > ⌈e^2p/3⌉. The argument follows analogously for the case of recursive dyadic partitioning. □

Using the loss function and the corresponding risk function we defined before, recovering the neighborhood of node u is essentially a univariate Gaussian time series problem, and thus the KL divergence of the conditional likelihood function takes the form:

$$ \begin{array}{@{}rcl@{}} K(p_{\boldsymbol\theta},p_{\boldsymbol{\tilde\theta}}) = \mathbb{E}\left\{\log \frac{p_{\boldsymbol\theta}(\mathbf{x})}{p_{\boldsymbol{\tilde\theta}}(\mathbf{x})}\right\} = \mathbb{E}\left\{\sum\limits_{t=1}^{T} \log \frac{p_{\boldsymbol\theta}(X_{t}(u))}{p_{\boldsymbol{\tilde\theta}}(X_{t}(u))}\right\} = \sum\limits_{t=1}^{T} (\tilde\mu_{t}-\mu_{t})^{2} / (2\sigma^{2}) \end{array} $$

where each μ_t is the mean of X_t(u), and $\tilde \mu _{t}$ is an approximation/estimate thereof, for a given estimator $\boldsymbol {\tilde \theta }$. Since these means in turn are based on linear combinations of all neighborhood observations, over p lags, we have:

$$ \begin{array}{@{}rcl@{}} \tilde\mu_{t} - \mu_{t} = \sum\limits_{v\in V \backslash \{u\}}\sum\limits_{\ell=1}^{p} X_{t-\ell}(v)[\tilde\theta_{t}^{(\ell)}(u, v) - \theta_{t}^{(\ell)}(u, v)]. \end{array} $$

So the KL divergence for each neighborhood problem involves values at other nodes.

Assume without loss of generality that σ ≡ 1. From Eq. 6.21 and the fact that the K-L divergence in the Gaussian case is simply proportional to a squared ℓ₂-norm, the risk of estimating 𝜃 by $\boldsymbol {\hat {\theta }}$ should be in the form:

$$ \begin{array}{@{}rcl@{}} \mathbb{R}(\hat{\boldsymbol\theta}, \boldsymbol\theta) &\leq& \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}}\left\{\frac{1}{T}K(p_{\boldsymbol\theta},p_{\boldsymbol{\tilde\theta}}) + \frac{2}{T}\sum\limits_{v=1}^{N-1} Pen(\boldsymbol{\tilde\theta}(u,v))\right\} \\ &\leq& \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}} \left\{\frac{1}{2T}\left\|\boldsymbol{\tilde\mu}-\boldsymbol\mu\right\|{~}_{2}^{2} + \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} \right.\\&&\left.+ \frac{2}{T}\sum\limits_{v=1}^{N-1}(3/2)\log T \#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\}.\\ \end{array} $$

From Cauchy-Schwarz, we have that

$$ \begin{array}{@{}rcl@{}} \mathbb{R}(\hat{\boldsymbol\mu}, \boldsymbol\mu) \!\!\!\!\!&\leq&\!\!\!\! \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N\!-1)p}}\left\{\frac{1}{2T}\|{\mathbf{X}(\!-u)}^{\prime}\mathbf{X}(\!-u)\|{~}_{2} \sum\limits_{t=1}^{T}\sum\limits_{v=1}^{N-1}\sum\limits_{\ell=1}^{p}\left( \!\tilde\theta_{t}^{(\ell)}(u,v) - \theta_{t}^{(\ell)}(u,v)\!\right)^{2} \right.\!\\ && \!\!\!\!+\left. \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} + 3(N-1)\frac{\log T}{T}\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\} \\ \!\!\!\!&\leq&\!\!\!\! \min_{\boldsymbol{\tilde\theta} \in {\Gamma}_{T}^{(N-1)p}}\left\{\frac{1}{2}{\Lambda} \sum\limits_{v=1}^{N-1}\sum\limits_{\ell=1}^{p}\left\|\boldsymbol{\tilde\theta}_{t}^{(\ell)}(u,v) - \boldsymbol{\theta}_{t}^{(\ell)}(u,v)\right\|{~}_{2}^{2} \right.\\ &&\left. \!\!\!\!+ \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} + 3(N-1)\frac{\log T}{T}\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\}. \end{array} $$

(6.26)

The minimization of the expression (6.26) tries to find the optimal balancing of bias and variance. To bound it, the following L₂ result from Donoho (1993) plays the core role.

Lemma 8.4.

Let $\theta _{(\cdot )}^{(\ell )}(u,v) \in BV(C)$. Define ${\theta _{bd}}_{(\cdot )}^{(\ell )}(u,v)$ to be the best d-term approximant to $\theta _{(\cdot )}^{(\ell )}(u,v)$ in the dyadic Haar basis for L₂([0,1]). Then $\|{\theta _{bd}}^{(\ell )}(u,v) - \theta ^{(\ell )}(u,v)\|{~}_{L_{2}} = \mathcal {O}(d^{-1})$.

Define ${\boldsymbol {\theta }_{bd}}^{(\ell )}(u,v)$ to be the average sampling of ${\theta _{bd}}^{(\ell )}(u,v)$ on the interval I_i, that is ${\boldsymbol {\theta }_{bd}}^{(\ell )}(u,v) = T{\int \limits }_{I_{i}}{\theta _{bd}}^{(\ell )}(u,v)(t) dt$. Then let ${\boldsymbol {\tilde \theta }_{bd}}^{(\ell )}(u,v)$ be the result of discretizing the elements of ${\boldsymbol {\theta }_{bd}}^{(\ell )}(u,v)$ to the set D_T[−C, C], where C is the radius of the bounded variation ball defined in Assumption 6. We have the following by triangle inequality:

$$ \begin{array}{@{}rcl@{}} &&\left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - \boldsymbol{\theta}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}^{2} \\&\leq& \left\|{\boldsymbol{\theta}_{bd}}^{(\ell)}(u,v) - \boldsymbol{\theta}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}^{2} + \left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - {\boldsymbol{\theta}_{bd}}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}^{2} \\ &&+ 2\left\|{\boldsymbol{\theta}_{bd}}^{(\ell)}(u,v) - \boldsymbol{\theta}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}\left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - \boldsymbol{\theta}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}. \end{array} $$

(6.27)

For sequence ${\boldsymbol {\theta }_{bd}}^{(\ell )}(u,v)$ and ${\boldsymbol {\tilde \theta }_{bd}}^{(\ell )}(u,v)$ obtained from average sampling, a simple argument relating Haar function on the discrete set D_T[−C, C] to the functions on the interval [0,1] is to show that

$$ \begin{array}{@{}rcl@{}} \frac{1}{T}\left\|{\boldsymbol{\tilde\theta}_{bd}}^{(\ell)}(u,v) - \boldsymbol{\theta}^{(\ell)}(u,v)\right\|{~}_{\ell_{2}}^{2} \leq \left\|\theta_{bd}^{(\ell)}(u,v) - \theta^{(\ell)}(u,v) \right\|{~}_{L2}^{2}. \end{array} $$

See equation (27) of Kolaczyk and Nowak (2005). On the right hand side of Eq. 6.27, the first resulting squared term will be of order $\mathcal {O}(Td^{-2})$. The second term is a discretization error and by lemma (8.4) is of order $\mathcal {O}(1)$. The third cross-term is therefore of order $\mathcal {O}(T^{1/2}d^{-1})$.

Given these results, we have the following bound of Eq. 6.26 by bounding the bias term over each ${\Gamma }_{T}^{(d)}$, where $d = \bigcup _{i} d_{i}$, for each d_i and i = 1,⋯ ,(N − 1)p. We then we optimize for d:

$$ \begin{array}{@{}rcl@{}} &&{}\min\limits_{\boldsymbol{\tilde\theta} \in {{\Gamma}_{T}^{(N-1)p}}^{(d)}} \left\{\frac{1}{2}{\Lambda} \sum\limits_{v=1}^{N-1}\sum\limits_{\ell=1}^{p}\left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - \boldsymbol\theta^{(\ell)}(u,v)\right\|{~}_{2}^{2} \right.\\ &&{}\quad\quad \left. + \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} + 3(N-1)\frac{\log T}{T}\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\}. \end{array} $$

(6.28)

The first term is dominated by the first part of expression (6.27) and is of order $\mathcal {O}({\Lambda } Td^{-2})$. In the second term, we have $\frac {\lambda }{T}{\sum }_{\mathcal {I} \in \mathcal {P}(\boldsymbol {\tilde \theta })}{\sum }_{v=1}^{N-1}\|\boldsymbol {\tilde \theta }_{\mathcal {I}}(u,v)\|{~}_{2}$, which are the group lasso terms. Given the fact that $\theta _{(\cdot )}^{(\ell )}(u,v)$ is of BV (C), we have that $1/(T^{1/2})\|\boldsymbol {\tilde \theta }_{\mathcal {I}}(u,v)\|{~}_{2}$ is of order $\mathcal {O}(C + d^{-1})$. Note that λ is of order T^− 1/2 and the number of interval $\#\{\mathcal {P}(\boldsymbol {\tilde \theta })\}$ is proportional to d. So the second term is of order $\mathcal {O}(T^{-1} *d * (C + d^{-1}) )$. The third term is of order $\mathcal {O}(dT^{-1}\log T)$. Combining the above results, we have that:

$$ \begin{array}{@{}rcl@{}} && \min\limits_{\boldsymbol{\tilde\theta} \in {{\Gamma}_{T}^{(N-1)p}}^{(d)}} \left\{\frac{1}{2}{\Lambda} \sum\limits_{v=1}^{N-1}\sum\limits_{\ell=1}^{p}\left\|\boldsymbol{\tilde\theta}^{(\ell)}(u,v) - \boldsymbol\theta^{(\ell)}(u,v)\right\|{~}_{2}^{2} \right.\\ &&\quad\quad \left. + \frac{\lambda}{T}\sum\limits_{\mathcal{I} \in \mathcal{P}(\boldsymbol{\tilde\theta})}\sum\limits_{v=1}^{N-1}\|\boldsymbol{\tilde\theta}_{\mathcal{I}}(u,v)\|{~}_{2} + 3(N-1)\frac{\log T}{T}\#\{\mathcal{P}(\boldsymbol{\tilde\theta})\}\right\}\\ &&\leq \mathcal{O}({\Lambda} Td^{-2}) + \mathcal{O}(T^{-1} *d * (C + d^{-1}) ) + \mathcal{O}(dT^{-1}\log T), \end{array} $$

which is minimized for $d \sim ({\Lambda } T^{2}/\log T)^{1/3}$. Substitution then yields the result that the risk is bounded by a quantity of order $\mathcal {O}(({\Lambda }\log ^{2}T/T)^{1/3})$. For estimation via recursive dyadic partitioning, where $\#\{\mathcal {P}(\tilde \theta )\}$ is proportional to $d\log T$, the expression is minimized at $d \sim ({\Lambda } T^{2} /\log ^{2} T)^{1/3}$, which gives the bound of the risk of order $\mathcal {O}({\Lambda } \log ^{4}T/T)^{1/3}$.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kang, X., Ganguly, A. & Kolaczyk, E.D. Dynamic Networks with Multi-scale Temporal Structure. Sankhya A 84, 218–260 (2022). https://doi.org/10.1007/s13171-021-00256-1

Download citation

Received: 04 September 2020
Accepted: 08 June 2021
Published: 28 July 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s13171-021-00256-1

Keywords

AMS (2000) subject classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Dynamic Networks with Multi-scale Temporal Structure

Abstract

Access this article

Similar content being viewed by others

An Invitation to the Study of Brain Networks, with Some Statistical Analysis of Thresholding Techniques

Neurocraft: software for microscale brain network dynamics

δ-MAPS: from spatio-temporal data to a weighted and lagged network between functional domains

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Appendices

Appendix A

A.1 Algorithm using RDP

A.2 Proof of Theorem 3.1

Proof.

A.3 Proof of Theorem 9

Lemma 8.1.

Proof.

Proof 3.

A.4 Proof of Theorem 3.3

Theorem 8.2.

Proof.

Lemma 8.3.

Proof.

Lemma 8.4.

Rights and permissions

About this article

Cite this article

Keywords

AMS (2000) subject classification

Navigation

Abstract

Access this article

Similar content being viewed by others

An Invitation to the Study of Brain Networks, with Some Statistical Analysis of Thresholding Techniques

Neurocraft: software for microscale brain network dynamics

δ-MAPS: from spatio-temporal data to a weighted and lagged network between functional domains

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Appendices

Appendix A

A.1 Algorithm using RDP

A.2 Proof of Theorem 3.1

Proof.

A.3 Proof of Theorem 9

Lemma 8.1.

Proof.

Proof 3.

A.4 Proof of Theorem 3.3

Theorem 8.2.

Proof.

Lemma 8.3.

Proof.

Lemma 8.4.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

AMS (2000) subject classification

Search

Navigation