
Simultaneous Learning the Dimension and Parameter of a Statistical Model with Big Data

Statistics in Biosciences

Abstract

Estimating the dimension of a model along with its parameters is fundamental to many statistical learning problems. Traditional model selection methods often approach this task via a two-step procedure: first estimate model parameters under every candidate model dimension, then select the best model dimension based on an information criterion. When the number of candidate models is large, however, this two-step procedure is highly inefficient and does not scale. We develop a novel automated and scalable approach with theoretical guarantees, called mixed-binary simultaneous perturbation stochastic approximation (MB-SPSA), to simultaneously estimate the dimension and parameters of a statistical model. To demonstrate the broad applicability of the MB-SPSA algorithm, we apply it to various classic statistical models including K-means clustering, Gaussian mixture models with an unknown number of components, sparse linear regression, and latent factor models with an unknown number of factors. We evaluate the performance of the MB-SPSA through simulation studies and an application to a single-cell sequencing dataset in terms of accuracy, running time, and scalability. The code implementing the MB-SPSA is available at http://github.com/wanglong24/MB-SPSA.
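
To make the approach concrete, below is a minimal, hypothetical sketch (in Python) of a mixed-binary simultaneous-perturbation update of the kind the abstract describes: continuous parameters and binary dimension indicators are perturbed together, so each iteration costs only two loss evaluations regardless of the number of candidate dimensions. It is not the authors' implementation (see the repository above); the loss signature, gain sequences, the rounding of the binary block, and all names are illustrative assumptions.

```python
import numpy as np

def mb_spsa_sketch(loss, theta0, z0, n_iter=2000, a=0.1, c=0.1,
                   alpha=0.602, gamma=0.101, seed=0):
    """Hypothetical mixed-binary SPSA loop: theta holds continuous
    parameters, z holds binary indicators of which dimensions (e.g.,
    mixture components or factors) are active."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    z = np.asarray(z0, dtype=float)                # binary block relaxed to [0, 1]
    for k in range(1, n_iter + 1):
        a_k, c_k = a / k**alpha, c / k**gamma      # standard SPSA gain sequences
        d_t = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbations
        d_z = rng.choice([-1.0, 1.0], size=z.shape)
        # Two perturbed evaluations; the binary block is rounded back to {0, 1}
        # before it reaches the loss.
        z_p = np.round(np.clip(z + c_k * d_z, 0.0, 1.0))
        z_m = np.round(np.clip(z - c_k * d_z, 0.0, 1.0))
        diff = (loss(theta + c_k * d_t, z_p)
                - loss(theta - c_k * d_t, z_m)) / (2.0 * c_k)
        theta -= a_k * diff / d_t                  # elementwise SPSA gradient step
        z = np.clip(z - a_k * diff / d_z, 0.0, 1.0)
    return theta, np.round(z).astype(int)          # parameters and dimension
```

For sparse linear regression, say, `z` could index which coefficients are included and `loss` could be a penalized residual sum of squares; these choices are illustrative rather than prescribed by the paper.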


Acknowledgements

The work of Xu was supported by NSF 1918854 and NSF 1940107.

Author information

Correspondence to Yanxun Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix: Proof of Theorem 1

Proof

Denote \( L_k^{(+)} = L({\hat{{\varvec{\theta }}}}_k^{(+)}) \) and \( L_k^{(-)} = L({\hat{{\varvec{\theta }}}}_k^{(-)}) \). Before starting the main proof, we first define some useful notation:

$$\begin{aligned} {\bar{\varvec{g}}}_k&= {\mathbb {E}}\left[ {\hat{\varvec{g}}}_k \mid {\hat{{\varvec{\theta }}}}_k\right] , \end{aligned}$$
(12)
$$\begin{aligned} {\hat{{\varvec{b}}}}_k&= \frac{1}{2}(\varvec{C}_k \circ {\varvec{\Delta }}_k)^{-1}\left[ L_k^{(+)} - L_k^{(-)}\right] - {\bar{\varvec{g}}}_k, \end{aligned}$$
(13)
$$\begin{aligned} {\hat{{\varvec{e}}}}_k&= {\hat{\varvec{g}}}_k - \frac{1}{2}(\varvec{C}_k \circ {\varvec{\Delta }}_k)^{-1}\left[ L_k^{(+)} - L_k^{(-)}\right] = \frac{1}{2}(\varvec{C}_k \circ {\varvec{\Delta }}_k)^{-1}\left[ \epsilon _k^{(+)} - \epsilon _k^{(-)}\right] , \end{aligned}$$
(14)

where the inverses are taken elementwise and the expectation in (12) is taken over both the perturbation vector \( {\varvec{\Delta }}_k \) and the noise terms \( \epsilon _k^{(\pm )} \). Using (12), (13) and (14), we can write the updating equation as

$$\begin{aligned} {\hat{{\varvec{\theta }}}}_{k+1}&= {\hat{{\varvec{\theta }}}}_k - a_k {\hat{\varvec{g}}}_k \nonumber \\&= {\hat{{\varvec{\theta }}}}_k - a_k \left( \frac{1}{2}(\varvec{C}_k \circ {\varvec{\Delta }}_k)^{-1}\left[ L_k^{(+)} - L_k^{(-)}\right] +\frac{1}{2}(\varvec{C}_k \circ {\varvec{\Delta }}_k)^{-1}\left[ \epsilon _k^{(+)} - \epsilon _k^{(-)}\right] \right) \nonumber \\&={\hat{{\varvec{\theta }}}}_k - a_k \left( {\bar{\varvec{g}}}_k + {\hat{{\varvec{b}}}}_k + {\hat{{\varvec{e}}}}_k\right) . \end{aligned}$$
(15)
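
As an aside, not part of the formal argument: the decomposition in (15) can be checked numerically. The sketch below (Python, with assumed ingredients: a quadratic loss, Rademacher perturbations, a scalar gain \( c \) standing in for the vector gain \( \varvec{C}_k \), and i.i.d. Gaussian measurement noise) verifies that the raw estimate \( {\hat{\varvec{g}}}_k \) splits exactly into the three terms of (15), and that a Monte Carlo average over perturbations recovers \( {\bar{\varvec{g}}}_k \approx \) the true gradient.

```python
import numpy as np

rng = np.random.default_rng(1)
p, c = 5, 0.01                                     # dimension and gain c_k
A = np.diag(np.arange(1.0, p + 1))                 # L(t) = t'At/2, gradient A t
L = lambda t: 0.5 * t @ A @ t
theta = rng.normal(size=p)

def g_hat(delta, eps_p, eps_m):
    """Raw SPSA estimate from two noisy loss measurements."""
    y_p = L(theta + c * delta) + eps_p
    y_m = L(theta - c * delta) + eps_m
    return (y_p - y_m) / (2.0 * c * delta)         # elementwise; delta is +-1

# Monte Carlo approximation of g_bar = E[g_hat | theta]:
draws = [g_hat(rng.choice([-1.0, 1.0], size=p),
               rng.normal(scale=0.01), rng.normal(scale=0.01))
         for _ in range(100000)]
g_bar = np.mean(draws, axis=0)
print(np.max(np.abs(g_bar - A @ theta)))           # small: g_bar ~ true gradient

# Exact decomposition (15) for one draw: g_hat = g_bar + b_hat + e_hat.
delta = rng.choice([-1.0, 1.0], size=p)
eps_p, eps_m = rng.normal(scale=0.01), rng.normal(scale=0.01)
noiseless = (L(theta + c * delta) - L(theta - c * delta)) / (2.0 * c * delta)
b_hat = noiseless - g_bar                          # bias of the noiseless part
e_hat = (eps_p - eps_m) / (2.0 * c * delta)        # noise contribution
print(np.max(np.abs(g_hat(delta, eps_p, eps_m) - (g_bar + b_hat + e_hat))))  # ~0
```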

For any \( \omega \) in a set \( \Omega _0 \) with \( P(\Omega _0) = 1 \), the sequence \( \{{\hat{{\varvec{\theta }}}}_k(\omega )\} \) is bounded by Assumption 2, so the Bolzano–Weierstrass Theorem implies that there exists \( \Omega _1 \subset \Omega _0 \) with \( P(\Omega _1) = 1 \) such that for any \( \omega \in \Omega _1 \) there exists a convergent subsequence \( \{{\hat{{\varvec{\theta }}}}_{k_s}(\omega )\} \). Denote the limit of this subsequence by \( {\varvec{\theta }}'(\omega ) \). For brevity, the argument \( \omega \) is suppressed below.

According to (15), we can write

$$\begin{aligned} {\varvec{\theta }}' - {\hat{{\varvec{\theta }}}}_{k_s} = \lim _{n\rightarrow \infty } \sum _{i=s}^{n} ({\hat{{\varvec{\theta }}}}_{k_{i+1}} - {\hat{{\varvec{\theta }}}}_{k_i}) = -\lim _{n\rightarrow \infty } \sum _{i=s}^{n} a_{k_i}\left( {\bar{\varvec{g}}}_{k_i} + {\hat{{\varvec{b}}}}_{k_i} + {\hat{{\varvec{e}}}}_{k_i}\right) . \end{aligned}$$
(16)

Since \( {\varvec{\theta }}' - {\hat{{\varvec{\theta }}}}_{k_s} \rightarrow \varvec{0} \) as \( s \rightarrow \infty \), we show below that all three terms on the right-hand side of (16) must also converge to \( \varvec{0} \).

First note that by Assumption 3 and (13), we have

$$\begin{aligned} {\mathbb {E}}\left[ {\hat{{\varvec{b}}}}_{k+1} \mid {\mathcal {F}}_k \right] = \varvec{0}, \end{aligned}$$

which implies that \( \{\sum _{i=k}^{m} a_i{\hat{{\varvec{b}}}}_i\}_{m\ge k} \) is a martingale sequence, since

$$\begin{aligned} {\mathbb {E}}\left[ \sum _{i=k}^{m+1} a_i{\hat{{\varvec{b}}}}_i \mid {\mathcal {F}}_m\right] = \sum _{i=k}^{m} a_i{\hat{{\varvec{b}}}}_i + a_{m+1}{\mathbb {E}}\left[ {\hat{{\varvec{b}}}}_{m+1} \mid {\mathcal {F}}_m\right] = \sum _{i=k}^{m} a_i{\hat{{\varvec{b}}}}_i. \end{aligned}$$

Given that \( \{\sum _{i=k}^{m} a_i{\hat{{\varvec{b}}}}_i\}_{m\ge k} \) is a martingale, Doob's martingale inequality implies that for any \( \eta > 0 \),

$$\begin{aligned} P\left( \sup _{m\ge k} \left\| \sum _{i=k}^{m} a_i{\hat{{\varvec{b}}}}_i \right\| \ge \eta \right) \le \eta ^{-2} {\mathbb {E}}\left[ \left\| \sum _{i=k}^{\infty } a_i{\hat{{\varvec{b}}}}_i \right\| ^2 \right] = \eta ^{-2} \sum _{i=k}^{\infty }a_i^2{\mathbb {E}}\left[ \left\| {\hat{{\varvec{b}}}}_i \right\| ^2 \right] , \end{aligned}$$
(17)

where the last equality follows from Assumption 3 because the cross terms vanish: for any \( i < j \),

$$\begin{aligned} {\mathbb {E}}\left[ {\hat{{\varvec{b}}}}_i^T {\hat{{\varvec{b}}}}_j \right] = {\mathbb {E}}\left[ {\mathbb {E}}\left[ {\hat{{\varvec{b}}}}_i^T {\hat{{\varvec{b}}}}_j \mid {\mathcal {F}}_{j-1}\right] \right] = {\mathbb {E}}\left[ {\hat{{\varvec{b}}}}_i^T {\mathbb {E}}\left[ {\hat{{\varvec{b}}}}_j \mid {\mathcal {F}}_{j-1}\right] \right] = 0. \end{aligned}$$

By Assumption 1, we have \( c_k > 0 \) and \( c_k \le c_0 \). Hence, there exists a constant \( {\bar{c}} \) such that the summation in (17) can be bounded as

$$\begin{aligned} \sum _{i=k}^{\infty }a_i^2{\mathbb {E}}\left[ \left\| {\hat{{\varvec{b}}}}_i \right\| ^2 \right]&\le \sum _{i=k}^{\infty }a_i^2 {\mathbb {E}}\left[ \left( L_i^{(+)} - L_i^{(-)}\right) ^2(2\varvec{C}_i\circ {\varvec{\Delta }}_i)^{-T}(2\varvec{C}_i\circ {\varvec{\Delta }}_i)^{-1}\right] \\&\le \sum _{i=k}^{\infty }{\bar{c}}^2 \frac{a_i^2}{c_i^2} {\mathbb {E}}\left[ \left( L_i^{(+)} - L_i^{(-)}\right) ^2{\varvec{\Delta }}_i^{-T}{\varvec{\Delta }}_i^{-1}\right] < \infty , \end{aligned}$$

which further implies that

$$\begin{aligned} \lim _{k\rightarrow \infty }\sum _{i=k}^{\infty }a_i^2{\mathbb {E}}\left[ \left\| {\hat{{\varvec{b}}}}_i \right\| ^2 \right] = 0. \end{aligned}$$

For any \( \eta > 0 \) and any \( n \), the triangle inequality gives the inclusion

$$\begin{aligned} \left\{ \sup _{k\ge n} \left\| \sum _{i=k}^{\infty } a_i{\hat{{\varvec{b}}}}_i \right\| \ge \eta \right\} \subset \left\{ \sup _{m\ge n} \left\| \sum _{i=n}^{m} a_i{\hat{{\varvec{b}}}}_i \right\| \ge \frac{\eta }{2} \right\} , \end{aligned}$$

so we can use (17) to get

$$\begin{aligned} P\left( \sup _{k\ge n} \left\| \sum _{i=k}^{\infty } a_i{\hat{{\varvec{b}}}}_i \right\| \ge \eta \right) \le P\left( \sup _{m\ge n} \left\| \sum _{i=n}^{m} a_i{\hat{{\varvec{b}}}}_i \right\| \ge \frac{\eta }{2} \right) \le 4\eta ^{-2} \sum _{i=n}^{\infty }a_i^2{\mathbb {E}}\left[ \left\| {\hat{{\varvec{b}}}}_i \right\| ^2 \right] . \end{aligned}$$

Letting \( n \rightarrow \infty \) and applying the vanishing-tail result above,

$$\begin{aligned} \lim _{n\rightarrow \infty } 4\eta ^{-2} \sum _{i=n}^{\infty }a_i^2{\mathbb {E}}\left[ \left\| {\hat{{\varvec{b}}}}_i \right\| ^2 \right] = 0, \end{aligned}$$

and hence

$$\begin{aligned} \lim _{n\rightarrow \infty } P\left( \sup _{k\ge n} \left\| \sum _{i=k}^{\infty } a_i{\hat{{\varvec{b}}}}_i \right\| \ge \eta \right) = 0. \end{aligned}$$

Therefore, we conclude that, almost surely,

$$\begin{aligned} \lim _{k\rightarrow \infty } \sum _{i=k}^{\infty } a_i{\hat{{\varvec{b}}}}_i = \varvec{0}, \end{aligned}$$

and

$$\begin{aligned} \lim _{s\rightarrow \infty } \sum _{i=s}^{\infty } a_{k_i}{\hat{{\varvec{b}}}}_{k_i} = \varvec{0}. \end{aligned}$$
(18)

Similarly, we can also show that

$$\begin{aligned} \lim _{s\rightarrow \infty } \sum _{i=s}^{\infty } a_{k_i}{\hat{{\varvec{e}}}}_{k_i} = \varvec{0}. \end{aligned}$$
(19)

Combining (16) with (18) and (19), we obtain

$$\begin{aligned} \lim _{s\rightarrow \infty } \sum _{i=s}^{\infty } a_{k_i} {\bar{\varvec{g}}}_{k_i} = \varvec{0}. \end{aligned}$$
(20)

Suppose \( {\varvec{\theta }}' \ne {\varvec{\theta }}^* \). Since \( \lim _{s\rightarrow \infty } {\hat{{\varvec{\theta }}}}_{k_s} = {\varvec{\theta }}' \), for any \( \delta > 0 \) there exists an \( S \) such that \( \Vert {\hat{{\varvec{\theta }}}}_{k_s} - {\varvec{\theta }}' \Vert \le \delta \) for all \( s > S \). Taking \( \delta \) sufficiently small, we have \( {\hat{{\varvec{\theta }}}}_{k_s} \in B_r({\varvec{\theta }}') \). By Assumptions 1 and 6, \( \sum _{i=s}^\infty a_{k_i} = \infty \) implies

$$\begin{aligned} \lim _{s\rightarrow \infty } \sum _{i=s}^{\infty } a_{k_i} {\bar{\varvec{g}}}_{k_i}^T({\varvec{\theta }}' - {\varvec{\theta }}^*) = \infty , \end{aligned}$$

which contradicts (20). Hence, we conclude that \( {\varvec{\theta }}' = {\varvec{\theta }}^* \). Since \( {\varvec{\theta }}' \) was taken to be the limit of an arbitrary convergent subsequence, all convergent subsequences share the same limiting point, and consequently \( {\hat{{\varvec{\theta }}}}_k \rightarrow {\varvec{\theta }}^* \) almost surely as \( k \rightarrow \infty \). \(\square \)
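
As an informal numerical illustration of the martingale-tail argument in (17) and its consequence (18), the sketch below uses assumed stand-ins (i.i.d. mean-zero terms for \( {\hat{{\varvec{b}}}}_i \) and gains \( a_i = 1/i \), so that \( \sum _i a_i^2 {\mathbb {E}}\Vert {\hat{{\varvec{b}}}}_i\Vert ^2 < \infty \)): both the realized supremum of the tail sums and the tail variance sum driving the Doob bound shrink as \( k \) grows.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100000
a = 1.0 / np.arange(1, n + 1)                  # gains with summable squares
b = rng.normal(size=n)                         # i.i.d. mean-zero stand-ins for b_hat_i
s = np.concatenate([[0.0], np.cumsum(a * b)])  # s[m] = sum_{i<=m} a_i * b_i

for k in (1, 100, 10000):
    # sup_{m >= k} |sum_{i=k}^m a_i b_i| = sup_{m >= k} |s[m] - s[k-1]|
    tail_sup = np.max(np.abs(s[k:] - s[k - 1]))
    tail_var = np.sum(a[k - 1:] ** 2)          # sum_{i>=k} a_i^2 E[b_i^2], as in (17)
    print(f"k={k:6d}  sup of tail sums={tail_sup:.4f}  tail variance sum={tail_var:.5f}")
```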


About this article


Cite this article

Wang, L., Xie, F. & Xu, Y. Simultaneous Learning the Dimension and Parameter of a Statistical Model with Big Data. Stat Biosci 15, 583–607 (2023). https://doi.org/10.1007/s12561-021-09324-4

