
A Blockwise Consistency Method for Parameter Estimation of Complex Models


Abstract

The drastic improvement in data collection and acquisition technologies has enabled scientists to collect vast amounts of data. With growing dataset sizes typically comes growing complexity of data structures and of the models needed to account for them. Estimating the parameters of such complex models poses a great challenge to current statistical methods. This paper proposes a blockwise consistency approach as a potential solution to the problem, which works by iteratively finding consistent estimates for each block of parameters conditional on the current estimates of the parameters in the other blocks. The blockwise consistency approach decomposes the high-dimensional parameter estimation problem into a series of lower-dimensional parameter estimation problems, which often have much simpler structures than the original problem and thus can be solved easily. Moreover, under the framework provided by the blockwise consistency approach, a variety of methods, such as Bayesian and frequentist methods, can be used jointly to achieve a consistent estimator for the original high-dimensional complex model. The blockwise consistency approach is illustrated using high-dimensional linear regression with both univariate and multivariate responses. The results of both problems show that the blockwise consistency approach can provide drastic improvements over existing methods. Extension of the blockwise consistency approach to many other complex models is straightforward.
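In outline, for the univariate-response linear regression illustration mentioned above, the approach amounts to sweeping over blocks of coefficients and re-estimating each block conditional on the current estimates of the others. The sketch below shows the shape of this iteration in Python, assuming a Lasso fit on partial residuals as a stand-in for whatever consistent block-level estimator is chosen; the function name `blockwise_estimate` and all tuning choices are illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso


def blockwise_estimate(X, y, blocks, n_sweeps=10, alpha=0.1):
    """Iteratively re-estimate each block of coefficients given the others.

    blocks : list of integer index arrays partitioning the columns of X.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for idx in blocks:
            others = np.setdiff1d(np.arange(p), idx)
            # Partial residual: remove the fitted contribution of the other blocks.
            r = y - X[:, others] @ beta[others]
            # Consistent estimation within the block; Lasso is used here only
            # as a placeholder for the chosen block-level estimator.
            fit = Lasso(alpha=alpha, fit_intercept=False).fit(X[:, idx], r)
            beta[idx] = fit.coef_
    return beta


# Toy usage: 60 predictors split into three blocks of 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 60))
true_beta = np.zeros(60)
true_beta[[3, 25, 47]] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.5 * rng.standard_normal(200)
blocks = [np.arange(0, 20), np.arange(20, 40), np.arange(40, 60)]
beta_hat = blockwise_estimate(X, y, blocks)
```

Any block-level estimator with the required consistency property could be substituted in the inner step; as the abstract notes, Bayesian and frequentist estimators may even be mixed across blocks.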

Acknowledgments

Liang’s research was supported in part by grants DMS-1612924, DMS/NIGMS R01-GM117597, and NIGMS R01-GM126089. The authors thank the editor, associate editor, and two referees for their helpful comments, which have led to significant improvement of this paper.

Author information

Corresponding author

Correspondence to Faming Liang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 2.1

We follow the proof of Theorem 2.4.3 of van der Vaart and Wellner (1996). By the symmetrization Lemma 2.3.1 of van der Vaart and Wellner (1996), measurability of the class \(\mathcal {F}_{n}\), and Fubini’s theorem,

$$\begin{array}{@{}rcl@{}} &&E^{*}\sup\limits_{\theta^{(s)} \in {\Theta}_{n}^{(s)},\hat{\boldsymbol{\theta}}_{t-1}^{(-s)}\in {\Theta}_{n,T}^{(-s)}} \left| \widehat{G}_{n}(\theta^{(s)}|\hat{\boldsymbol{\theta}}_{t-1}^{(-s)}) - G_{n}(\theta^{(s)}|\hat{\boldsymbol{\theta}}_{t-1}^{(-s)}) \right|\\ &\leq& 2 E_{x} E_{\epsilon} \sup\limits_{q\in \mathcal{F}} \left\|\frac{1}{n}\sum\limits_{i = 1}^{n} \epsilon_{i} q(x_{i}) \right\| \leq 2 E_{x}E_{\epsilon} \sup\limits_{q\in \mathcal{G}_{n,M}} \left\|\frac{1}{n}\sum\limits_{i = 1}^{n} \epsilon_{i} q(x_{i}) \right\|\\ &&+ 2 E^{*}[m_{n}(x)1(m_{n}(x)>M)], \\ \end{array} $$

where the \(\epsilon_{i}\) are i.i.d. Rademacher random variables with \(P(\epsilon_{i} = +1) = P(\epsilon_{i} = -1) = 1/2\), and \(E^{*}\) denotes the outer expectation.

By condition (B2)-(a), \(2E^{*}[m_{n}(x)1(m_{n}(x)>M)] \to 0\) for sufficiently large \(M\). To prove convergence in mean, it suffices to show that the first term converges to zero for fixed \(M\). Fix \(x_{1},\ldots,x_{n}\), and let \(\mathcal{H}\) be an \(\epsilon\)-net in \(L_{1}(\mathbb{P}_{n})\) over \(\mathcal{G}_{n,M}\); then

$$E_{\epsilon} \sup\limits_{q\in \mathcal{G}_{n,M}}\left\|\frac{1}{n}\sum\limits_{i = 1}^{n} \epsilon_{i} q(x_{i}) \right\|\leq E_{\epsilon} \sup\limits_{q\in \mathcal{H}}\left\|\frac{1}{n}\sum\limits_{i = 1}^{n} \epsilon_{i} q(x_{i}) \right\|+\epsilon, $$

where the cardinality of \(\mathcal{H}\) can be chosen equal to \(N(\epsilon,\mathcal{G}_{n,M},L_{1}(\mathbb{P}_{n}))\). Bounding the \(\ell_{1}\)-norm on the right by the Orlicz norm \(\psi_{2}\) and using the maximal inequality (Lemma 2.2.2 of van der Vaart and Wellner (1996)) together with Hoeffding’s inequality, it can be shown that

$$ E_{\epsilon} \sup\limits_{q\in \mathcal{G}_{n,M}}\left\|\frac{1}{n}\sum\limits_{i = 1}^{n} \epsilon_{i} q(x)\right\| \leq K \sqrt{1+\log N(\epsilon,\mathcal{G}_{n,M},L_{1}(\mathbb{P}_{n}))}\sqrt{\frac{6}{n}}M +\epsilon \rightarrow_{P^{*}} \epsilon, $$
(A.1)

where \(K\) is a constant, and \(P^{*}\) denotes the outer probability. It has been shown that the left-hand side of Eq. A.1 converges to zero in probability. Since it is bounded by \(M\), its expectation with respect to \(x_{1},\ldots,x_{n}\) converges to zero by the dominated convergence theorem.

This concludes the proof that \(\sup_{\theta^{(s)} \in {\Theta}_{n}^{(s)},\hat{\boldsymbol{\theta}}_{t-1}^{(-s)}\in {\Theta}_{n,T}^{(-s)}} \big| \widehat{G}_{n}(\theta^{(s)}|\hat{\boldsymbol{\theta}}_{t-1}^{(-s)}) - G_{n}(\theta^{(s)}|\hat{\boldsymbol{\theta}}_{t-1}^{(-s)}) \big|\) converges to zero in mean. Further, by Markov’s inequality, we conclude that Eq. 2.6 holds.

Proof of Theorem 2.2

Since both \(\widehat{G}_{n}(\theta^{(s)}|\hat{\boldsymbol{\theta}}_{t-1}^{(-s)})\) and \(G_{n}(\theta^{(s)}|\hat{\boldsymbol{\theta}}_{t-1}^{(-s)})\) are continuous in \(\theta^{(s)}\), as implied by the continuity of \(\log \pi(x|\boldsymbol{\theta})\) in \(\boldsymbol{\theta}\), the remaining part of the proof follows from Lemma A.1.

Lemma A.1.

Consider a sequence of functions \(Q_{t}(\boldsymbol{\theta}, \boldsymbol{X}_{n})\) for \(t = 1,2,\ldots,T\). Suppose that the following conditions are satisfied:

(C1) For each \(t\), \(Q_{t}(\boldsymbol{\theta}, \boldsymbol{X}_{n})\) is continuous in \(\boldsymbol{\theta}\) and there exists a function \(Q_{t}^{*}(\boldsymbol{\theta})\), which is continuous in \(\boldsymbol{\theta}\) and uniquely maximized at \(\boldsymbol{\theta}_{*}^{(t)}\).

(C2) For any \(\epsilon > 0\), \(\sup_{\boldsymbol{\theta} \in {\Theta}_{n}\setminus B_{t}(\epsilon)} Q_{t}^{*}(\boldsymbol{\theta})\) exists, where \(B_{t}(\epsilon)=\{\boldsymbol{\theta}: \|\boldsymbol{\theta}-\boldsymbol{\theta}_{*}^{(t)}\| < \epsilon\}\); let \(\delta_{t}=Q_{t}^{*}(\boldsymbol{\theta}_{*}^{(t)})- \sup_{\boldsymbol{\theta} \in {\Theta}_{n}\setminus B_{t}(\epsilon)} Q_{t}^{*}(\boldsymbol{\theta})\) and \(\delta = \min_{t\in\{1,2,\ldots,T\}}\delta_{t} > 0\).

(C3) \(\sup_{t\in \{1,2,\ldots,T\}} \sup_{\boldsymbol{\theta} \in {\Theta}_{n}} |Q_{t}(\boldsymbol{\theta}, \boldsymbol{X}_{n})-Q_{t}^{*}(\boldsymbol{\theta})| \to_{p} 0\) as \(n \to \infty\).

(C4) The penalty function \(P_{\lambda_{n}}(\boldsymbol{\theta})\) is non-negative and converges to 0 uniformly over the set \(\{\boldsymbol{\theta}_{*}^{(t)}: t = 1,2,\ldots,T\}\) as \(n \to \infty\), where \(\lambda_{n}\) is a regularization parameter whose value can depend on the sample size \(n\).

Let \(\boldsymbol{\theta}_{n}^{(t)}=\arg\max_{\boldsymbol{\theta}\in {\Theta}_{n}} \{ Q_{t}(\boldsymbol{\theta}, \boldsymbol{X}_{n})-P_{\lambda_{n}}(\boldsymbol{\theta})\}\). Then the uniform convergence holds, i.e., \(\sup_{t \in \{1,2,\ldots,T\}} \|\boldsymbol{\theta}_{n}^{(t)}- \boldsymbol{\theta}_{*}^{(t)}\|\to_{p} 0\).

Proof.

Consider two events: (i) \(\sup_{t \in \{1,2,\ldots,T\}} \sup_{\boldsymbol{\theta} \in {\Theta}_{n}\setminus B_{t}(\epsilon)} |Q_{t}(\boldsymbol{\theta},\boldsymbol{X}_{n})-Q_{t}^{*}(\boldsymbol{\theta})| < \delta/2\), and (ii) \(\sup_{t\in\{1,2,\ldots,T\}} \sup_{\boldsymbol{\theta} \in B_{t}(\epsilon)} |Q_{t}(\boldsymbol{\theta},\boldsymbol{X}_{n})-Q_{t}^{*}(\boldsymbol{\theta})| < \delta/2\). From event (i), we can deduce that for any \(t \in \{1,2,\ldots,T\}\) and any \(\boldsymbol{\theta} \in {\Theta}_{n}\setminus B_{t}(\epsilon)\), \(Q_{t}(\boldsymbol{\theta}, \boldsymbol{X}_{n}) < Q_{t}^{*}(\boldsymbol{\theta})+\delta/2 \leq Q_{t}^{*}(\boldsymbol{\theta}_{*}^{(t)}) -\delta_{t} +\delta/2 \leq Q_{t}^{*}(\boldsymbol{\theta}_{*}^{(t)}) -\delta/2\). Therefore, \(Q_{t}(\boldsymbol{\theta}, \boldsymbol{X}_{n}) -P_{\lambda_{n}}(\boldsymbol{\theta}) < Q_{t}^{*}(\boldsymbol{\theta}_{*}^{(t)}) -\delta/2 -o(1)\) by condition (C4).

From event (ii), we can deduce that for any \(t \in \{1,2,\ldots,T\}\) and any \(\boldsymbol{\theta} \in B_{t}(\epsilon)\), \(Q_{t}(\boldsymbol{\theta}, \boldsymbol{X}_{n})> Q_{t}^{*}(\boldsymbol{\theta}) -\delta/2\) and \(Q_{t}(\boldsymbol{\theta}_{*}^{(t)}, \boldsymbol{X}_{n})> Q_{t}^{*}(\boldsymbol{\theta}_{*}^{(t)}) -\delta/2\). Therefore, \(Q_{t}(\boldsymbol{\theta}_{*}^{(t)}, \boldsymbol{X}_{n})-P_{\lambda_{n}}(\boldsymbol{\theta}_{*}^{(t)}) > Q_{t}^{*}(\boldsymbol{\theta}_{*}^{(t)}) -\delta/2- o(1)\) by condition (C4).

If both events hold simultaneously, then we must have \({\boldsymbol{\theta}}_{n}^{(t)} \in B_{t}(\epsilon)\) for all \(t \in \{1,2,\ldots,T\}\) as \(n \to \infty\). By condition (C3), the probability that both events hold tends to 1. Therefore,

$$P(\boldsymbol{\theta}_{n}^{(t)} \in B_{t}(\epsilon) \text{ for all } t = 1,2,\ldots,T) \to 1, $$

which concludes the lemma. □

Proof of Theorem 2.3

Applying a Taylor expansion to \(G_{n}(\theta^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)})\) at \(\theta_{t*}^{(s)}\), we get \(G_{n}(\theta^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) - G_{n}(\theta_{t*}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) = O_{p}(1/n)\), which follows from condition (B5) and condition (B3), the latter stating that \(G_{n}(\theta^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)})\) is maximized at \(\theta_{t*}^{(s)}\). Therefore,

$$\begin{array}{@{}rcl@{}} n[\widehat{G}_{n}(\theta_{t,g}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) - G_{n}(\theta_{t*}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)})] &=&Z_{t,1}+\cdots+Z_{t,n}+n[G_{n}(\theta_{t,g}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)})\\&&- G_{n}(\boldsymbol{\theta}_{t*}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)})]= Z_{t,1}+\cdots+Z_{t,n}+ \epsilon_{n}, \end{array} $$

where \(\epsilon_{n} = O_{p}(1)\), and

$$ P(n|\widehat{G}_{n}(\boldsymbol{\theta}_{t,g}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) - G_{n}(\theta_{t*}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)})|>nz) \leq\! P(|Z_{t,1}+\cdots+Z_{t,n}|>nz-|\epsilon_{n}|). $$
(A.2)

By Bernstein’s inequality,

$$ P(|Z_{t,1}+\cdots+Z_{t,n}|>nz-|\epsilon_{n}|) \leq 2 \exp\left\{-\frac{1}{2} \frac{(z-|\epsilon_{n}|/n)^{2}}{ \tilde{v}^{\prime}+\tilde{M}_{b}^{\prime}(z-|\epsilon_{n}|/n)} \right\}, $$
(A.3)

for \(\tilde{v}^{\prime} \geq (\tilde{v}_{1}+\cdots+\tilde{v}_{n})/n^{2}\) and \(\tilde{M}_{b}^{\prime}=\tilde{M}_{b}/n\). Applying a Taylor expansion to the right-hand side of Eq. A.3 at \(z\) and combining with Eq. A.2 leads to

$$ P(|\widehat{G}_{n}(\theta_{t,g}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) - G_{n}(\theta_{t*}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)})| >z) \leq K \exp\left\{-\frac{1}{2} \frac{z^{2}}{\tilde{v}^{\prime}+\tilde{M}_{b}^{\prime} z} \right\}, $$
(A.4)

where \(K = 2+\frac {3}{\tilde {M}_{b}^{\prime }}O_{p}(1/n)= 2+\frac {3}{\tilde {M}_{b}}O_{p}(1)\), since the derivative \(|d[z^{2}/(\tilde {v}^{\prime }+\tilde {M}_{b}^{\prime } z)]/dz| \leq 3/\tilde {M}_{b}^{\prime }\).
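One way to check the derivative bound used here is to note that, for \(z > 0\) (writing \(v=\tilde{v}^{\prime}\) and \(M=\tilde{M}_{b}^{\prime}\)),

$$ \left|\frac{d}{dz}\,\frac{z^{2}}{v+Mz}\right| = \frac{2zv+Mz^{2}}{(v+Mz)^{2}} \leq \frac{2zv}{4vMz}+\frac{Mz^{2}}{(Mz)^{2}} = \frac{1}{2M}+\frac{1}{M} = \frac{3}{2M} \leq \frac{3}{M}, $$

where the first term is bounded using \((v+Mz)^{2} \geq 4vMz\) and the second using \((v+Mz)^{2} \geq (Mz)^{2}\).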

Applying Lemma 2.2.10 of van der Vaart and Wellner (1996) with the Orlicz norm \(\psi_{1}\), we have

$$\begin{array}{@{}rcl@{}} &&\left\| \sup\limits_{\theta_{t,g}^{(s)} \in {\Theta}_{n}^{(s)}, t = 1,2,\ldots,T} \left |\widehat{G}_{n}(\theta_{t,g}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) - G_{n}(\theta_{t*}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) \right| \right\|_{\psi_{1}}\\ &&\leq \epsilon + K^{\prime}\left( \tilde{M}_{b}^{\prime} \log (1+TN(\epsilon,\mathcal{G}_{n,M},L_{1}(\mathbb{P}_{n}))) +\sqrt{\tilde{v}^{\prime}}\sqrt{\log(1+TN(\epsilon,\mathcal{G}_{n,M},L_{1}(\mathbb{P}_{n})))} \right),\\ \end{array} $$
(A.5)

for a constant \(K^{\prime}\) and any \(\epsilon > 0\). Since \(\tilde{v}^{\prime}=O(1/n)\), \(\tilde{M}_{b}^{\prime}=O(1/n)\), \(\log(T) = o(n)\), and \(\log N(\epsilon,\mathcal{G}_{n,M},L_{1}(\mathbb{P}_{n}))=o(n)\), we have

$$\left\| \sup\limits_{\theta_{t,g}^{(s)} \in {\Theta}_{n}^{(s)}, t = 1,2,\ldots,T} \left|\widehat{G}_{n}(\theta_{t,g}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) - G_{n}(\theta_{t*}^{(s)}|\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) \right| \right\|_{\psi_{1}} \rightarrow_{P^{*}} \epsilon. $$

Therefore,

$$ \sup\limits_{\theta_{t,g}^{(s)} \in {\Theta}_{n}^{(s)}, t \in \{1,2,\ldots,T\}} \left|\widehat{G}_{n}(\theta_{t,g}^{(s)}| \hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)})- G_{n}(\theta_{t*}^{(s)} |\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) \right| \to_{p} 0. $$
(A.6)

Note that, as implied by the proof of Lemma 2.2.10 of van der Vaart and Wellner (1996), Eq. A.5 holds for a general constant \(K\) in Eq. A.4. Then, by condition (B3), we must have the uniform convergence that \(\theta_{t,g}^{(s)} \in B_{t}(\epsilon)\) for all \(t\) as \(n \to \infty\), where \(B_{t}(\epsilon)\) is as defined in (B3). This statement can be proved by contradiction as follows.

Assume \(\theta _{t,g}^{(s)} \notin B_{i}(\epsilon )\) for some i ∈{1,2,…,T}. By the uniform convergence established in Theorem 2.1, \(\left |\widehat {G}_{n}(\theta _{t,g}^{(s)}| \hat {\boldsymbol {\theta }}_{t-1,g}^{(-s)})- G_{n}(\theta _{t,g}^{(s)} |\hat {\boldsymbol {\theta }}_{t-1,g}^{(-s)}) \right | =o_{p}(1)\). Further, by condition (B3) and the assumption \(\theta _{t,g}^{(s)} \notin B_{i}(\epsilon )\),

$$\begin{array}{@{}rcl@{}} \left| \widehat{G}_{n}(\theta_{t,g}^{(s)}| \hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)})- G_{n}(\theta_{t*}^{(s)} |\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) \right| & \geq& \left|G_{n}(\theta_{t,g}^{(s)} |\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) - G_{n}(\theta_{t*}^{(s)} |\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) \right|\\ &&- \left|\widehat{G}_{n}(\theta_{t,g}^{(s)}| \hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) - G_{n}(\theta_{t,g}^{(s)} |\hat{\boldsymbol{\theta}}_{t-1,g}^{(-s)}) \right| \\ & \geq& \delta-o_{p}(1), \end{array} $$

which contradicts the uniform convergence established in Eq. A.6. This concludes the proof.

Proof of Theorem 2.4

Define \(d_{t}^{(n)}:=\|\hat {\boldsymbol {\theta }}_{t}-\widetilde {\boldsymbol {\theta }}_{t}\|\), where n indicates the implicit dependence of \(\hat {\boldsymbol {\theta }}_{t}\) on n. Then

$$ d_{t}^{(n)}:=\|\hat{\boldsymbol{\theta}}_{t}-\widetilde{\boldsymbol{\theta}}_{t}\| \leq \|\hat{\boldsymbol{\theta}}_{t}-M_{s}(\hat{\boldsymbol{\theta}}_{t-1})\| + \|M_{s}(\hat{\boldsymbol{\theta}}_{t-1})-\widetilde{\boldsymbol{\theta}}_{t}\|. $$
(A.7)

For the first component of the inequality (A.7), we define

$$g_{n}:=\sup\limits_{t \in \Bbb{N},\hat{\boldsymbol{\theta}}_{t-1} \in {\Theta}_{n}} \|\hat{\boldsymbol{\theta}}_{t}-M_{s}(\hat{\boldsymbol{\theta}}_{t-1})\|, $$

which converges to zero in probability as \(n \to \infty\), following from Theorems 2.2 and 2.3 for both types of consistent estimation procedures considered in the paper. For the second component of the inequality (A.7), we have

$$\|M_{s}(\hat{\boldsymbol{\theta}}_{t-1})-\widetilde{\boldsymbol{\theta}}_{t}\|=\|M_{s}(\hat{\boldsymbol{\theta}}_{t-1})-M_{s}(\widetilde{\boldsymbol{\theta}}_{t-1})\| \leq \rho^{*} \|\hat{\boldsymbol{\theta}}_{t-1}-\widetilde{\boldsymbol{\theta}}_{t-1} \|=\rho^{*} d_{t-1}^{(n)}, $$

following from condition (B6).
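The two displays combine into the recursion \(d_{t}^{(n)} \leq g_{n} + \rho^{*} d_{t-1}^{(n)}\); spelling this intermediate step out explicitly,

$$ d_{t}^{(n)} \leq \|\hat{\boldsymbol{\theta}}_{t}-M_{s}(\hat{\boldsymbol{\theta}}_{t-1})\| + \|M_{s}(\hat{\boldsymbol{\theta}}_{t-1})-\widetilde{\boldsymbol{\theta}}_{t}\| \leq g_{n} + \rho^{*} d_{t-1}^{(n)}. $$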

Combining this recursion with the fact that \(d_{0}^{(n)} = 0\), i.e., the two paths \(\{\hat{\boldsymbol{\theta}}_{t}\}\) and \(\{\widetilde{\boldsymbol{\theta}}_{t}\}\) start from the same point, we have

$$ d_{t}^{(n)}\leq \sum\limits_{l = 0}^{t-1} g_{n} (\rho^{*})^{l} \leq \frac{g_{n}}{1-\rho^{*}} \stackrel{p}{\to} 0, $$
(A.8)

where the convergence is uniform over t as \(g_{n} \stackrel {p}{\to } 0\). Moreover, since \(\widetilde {\boldsymbol {\theta }}_{t}\) converges to a coordinatewise maximum point of \(E_{\boldsymbol {\theta }_{*}} \log \pi (X|\boldsymbol {\theta })\) under conditions (A1) and (A2), \(\hat {\boldsymbol {\theta }}_{t}\) will converge to the same point in probability. That is, \(\hat {\boldsymbol {\theta }}_{\infty }:=\lim _{t\rightarrow \infty }\hat {\boldsymbol {\theta }}_{t} \stackrel {p}{\to } \widetilde {\boldsymbol {\theta }}_{\infty }: =\lim _{t\rightarrow \infty } \widetilde {\boldsymbol {\theta }}_{t}\).

Cite this article

Shi, R., Liang, F., Song, Q. et al. A Blockwise Consistency Method for Parameter Estimation of Complex Models. Sankhya B 80 (Suppl 1), 179–223 (2018). https://doi.org/10.1007/s13571-018-0183-0
