Abstract
Estimating the dimension of a model along with its parameters is fundamental to many statistical learning problems. Traditional model selection methods often approach this task with a two-step procedure: first estimate the model parameters under every candidate model dimension, then select the best model dimension according to an information criterion. When the number of candidate models is large, however, this two-step procedure is highly inefficient and not scalable. We develop a novel automated and scalable approach with theoretical guarantees, called mixed-binary simultaneous perturbation stochastic approximation (MB-SPSA), to simultaneously estimate the dimension and parameters of a statistical model. To demonstrate the broad applicability of the MB-SPSA algorithm, we apply MB-SPSA to various classic statistical models, including K-means clustering, Gaussian mixture models with an unknown number of components, sparse linear regression, and latent factor models with an unknown number of factors. We evaluate the performance of MB-SPSA through simulation studies and an application to a single-cell sequencing dataset in terms of accuracy, running time, and scalability. The code implementing the MB-SPSA is available at http://github.com/wanglong24/MB-SPSA.
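At its core, MB-SPSA builds on the simultaneous perturbation gradient estimate of standard SPSA (Spall, 1992), which approximates a gradient from only two loss evaluations per iteration regardless of dimension. The following is a minimal sketch of that continuous SPSA core only; the mixed-binary perturbation scheme for the dimension variables is not shown, and the function names, gain sequences, and toy quadratic loss are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def spsa_step(theta, loss, a_k, c_k, rng):
    """One standard SPSA iteration: estimate the gradient from two
    loss measurements under a random simultaneous perturbation."""
    # Rademacher (+/-1) perturbation of every coordinate at once
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    # Two-sided measurements, L_k^(+) and L_k^(-) in the paper's notation
    l_plus = loss(theta + c_k * delta)
    l_minus = loss(theta - c_k * delta)
    # Simultaneous perturbation gradient estimate
    g_hat = (l_plus - l_minus) / (2.0 * c_k * delta)
    return theta - a_k * g_hat

# Usage: minimize a toy quadratic ||theta - (1, 2)||^2
rng = np.random.default_rng(0)
target = np.array([1.0, 2.0])
loss = lambda t: float(np.sum((t - target) ** 2))
theta = np.array([5.0, -3.0])
for k in range(5000):
    a_k = 0.2 / (k + 10) ** 0.602  # standard SPSA gain-decay exponents
    c_k = 0.1 / (k + 1) ** 0.101
    theta = spsa_step(theta, loss, a_k, c_k, rng)
print(theta)  # close to the minimizer (1, 2)
```

Only two loss evaluations are needed per iteration, which is what makes the approach attractive when each evaluation requires a pass over a large dataset.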
Acknowledgements
The work of Xu was supported by NSF 1918854 and NSF 1940107.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix: Proof of Theorem 1
Proof
Denote \( L_k^{(+)} = L({\hat{{\varvec{\theta }}}}_k^{(+)}) \) and \( L_k^{(-)} = L({\hat{{\varvec{\theta }}}}_k^{(-)}) \). Before starting the main proof, we first define some useful notation below
where the expectation in (12) is taken over both the perturbation vector \( {\varvec{\Delta }}_k \) and the noise term \( \epsilon _k \). Using (12), (13) and (14), we can write the updating equation as
Let \( \Omega _0 \subset \Omega \) be such that \( P(\Omega _0) = 1 \). For any \( \omega \in \Omega _0 \), since \( \{{\hat{{\varvec{\theta }}}}_k(\omega )\} \) is a bounded sequence by Assumption 2, the Bolzano–Weierstrass Theorem implies that there exists \( \Omega _1 \subset \Omega \) with \( P(\Omega _1) = 1 \) such that for any \( \omega \in \Omega _1 \) there exists a convergent subsequence \( \{{\hat{{\varvec{\theta }}}}_{k_s}(\omega )\} \). Denote the limit point of this convergent subsequence by \( {\varvec{\theta }}'(\omega ) \). For simplicity, the argument \( \omega \) is suppressed below.
According to (15), we can write
Since \( {\varvec{\theta }}' - {\hat{{\varvec{\theta }}}}_{k_s} \rightarrow \varvec{0} \) as \( s \rightarrow \infty \), we show below that all three terms on the right-hand side of (16) must also converge to \( \varvec{0} \).
First note that by Assumption 3 and (13), we have
which implies that \( \{\sum _{i=k}^{m} a_i{\varvec{b}}_i\}_{m\ge k} \) is a martingale sequence as
Given that \( \{\sum _{i=k}^{m} a_i{\varvec{b}}_i\}_{m\ge k} \) is a martingale sequence, Doob’s martingale inequality implies that for any \( \eta > 0 \)
where the last equality is due to Assumption 3, since
By Assumption 1, we have \( b_k > 0 \) and \( c_k \le c_0 \). Hence, there exists a constant \( {\bar{c}} \) such that we can write (17) as
which further implies that
For any \( \eta > 0 \) and all \( k \ge n \), since
we can use (17) to get
As \( n \rightarrow \infty \), for all \( k \ge n \),
and
Therefore, we conclude
and
Similarly, we can also show that
Combining (16) with results in (18) and (19), we have
Suppose \( {\varvec{\theta }}' \ne {\varvec{\theta }}^* \). Given \( \lim _{s\rightarrow \infty } {\hat{{\varvec{\theta }}}}_{k_s} = {\varvec{\theta }}' \), for any \( \delta > 0 \) there exists an \( S \) such that for any \( s > S \), \( \Vert {\hat{{\varvec{\theta }}}}_{k_s} - {\varvec{\theta }}' \Vert \le \delta \). Taking \( \delta \) sufficiently small, we have \( {\hat{{\varvec{\theta }}}}_{k_s} \in B_r({\varvec{\theta }}') \). By Assumptions 1 and 6, we must have \( \sum _{i=s}^\infty a_{k_i} = \infty \), which implies
which contradicts (20). Hence, we conclude that \( {\varvec{\theta }}' = {\varvec{\theta }}^* \). Since \( {\varvec{\theta }}' \) was chosen as the limit point of an arbitrary convergent subsequence, every convergent subsequence converges to the same limit point, and consequently \( {\hat{{\varvec{\theta }}}}_k \rightarrow {\varvec{\theta }}^* \) a.s. as \( k \rightarrow \infty \). \(\square \)
Cite this article
Wang, L., Xie, F. & Xu, Y. Simultaneous Learning the Dimension and Parameter of a Statistical Model with Big Data. Stat Biosci 15, 583–607 (2023). https://doi.org/10.1007/s12561-021-09324-4