
Information-based optimal subdata selection for non-linear models


Abstract

Subdata selection methods provide flexible tradeoffs between computational complexity and statistical efficiency in analyzing big data. In this work, we investigate a new algorithm for selecting informative subdata from massive data for a broad class of models, including generalized linear models as special cases. A connection between the proposed method and many widely used optimal design criteria such as A-, D-, and E-optimality criteria is established to provide a comprehensive understanding of the selected subdata. Theoretical justifications are provided for the proposed method, and numerical simulations are conducted to illustrate the superior performance of the selected subdata.



Acknowledgements

The authors sincerely thank the editors and referees for their valuable comments and insightful suggestions, which led to further improvement of this article. This work was supported by Beijing Municipal Natural Science Foundation No. 1232019. Yu’s work was supported by NSFC Grant 12001042 and the Beijing Institute of Technology Research Fund Program for Young Scholars. Liu and Wang’s research was supported by NSF Grant CCF 2105571 and UConn CLAS Research in Academic Themes funding.

Author information

Correspondence to HaiYing Wang.


Appendix A Proofs

A.1 Proof of Theorem 1

Proof

Note that

$$\begin{aligned} \varvec{\delta }^{opt}_{T}= \arg \max _{\varvec{\delta }}\textrm{tr}\left( {\mathcal {I}}(\varvec{\delta })\right) =\arg \max _{\varvec{\delta }}\sum _{i=1}^{N}\delta _i\Vert \varvec{x}_i\Vert ^2. \end{aligned}$$

Thus, the T-optimal subdata consists of the k data points with the largest values of \(\Vert \varvec{x}_i\Vert ^2\). \(\square \)
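For readers who want to experiment, the resulting selection rule is a one-liner; below is a minimal sketch in Python/NumPy (the function name t_optimal_subdata is ours), using a partial sort so the k largest norms are found in O(N) time rather than O(N log N):

```python
import numpy as np

def t_optimal_subdata(X, k):
    """Return the k rows of X with the largest squared Euclidean norms.

    By Theorem 1, these rows form the T-optimal subdata, maximizing
    tr(I(delta)) over all subdata of size k.
    """
    norms2 = np.einsum("ij,ij->i", X, X)    # ||x_i||^2 for every row
    idx = np.argpartition(norms2, -k)[-k:]  # partial sort: k largest, O(N)
    return X[idx], idx

# usage: N = 10^5 points in d = 5 dimensions, subdata of size k = 100
rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 5))
subdata, idx = t_optimal_subdata(X, k=100)
```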

A.2 Proof of Theorem 2

Before proving Theorem 2, we need the following lemma.

Lemma A.1

Let \(\lambda _{1},\ldots ,\lambda _{d}\) be the d eigenvalues of \({\mathcal {I}}(\varvec{\delta })\) with \(\lambda _{\min }=\lambda _{1}\le \ldots \le \lambda _{d}=\lambda _{\max }\). Assume that \(\textrm{var}(\varvec{Z}_{1}^*)\le \ldots \le \textrm{var}(\varvec{Z}_{d}^*)\) with \(\textrm{var}(\varvec{Z}_j^*)\) being the sample variance for the jth column of \(\varvec{Z}^*\). For any subdata of size n, represented by \(\varvec{\delta }\), it holds that

$$\begin{aligned} (n-1)\lambda _{\min }(\varvec{R}^*)\textrm{var}(\varvec{Z}_{j}^*)\le \lambda _j({\mathcal {I}}(\varvec{\delta })) \le (n-1)\lambda _{\max }(\varvec{R}^*)\textrm{var}(\varvec{Z}_{j}^*), \end{aligned}$$
(21)

where \(\varvec{R}^*\) is the sample correlation matrix of \(\varvec{Z}^*\).

Proof

Recall \(\varvec{Z}^*=(\varvec{z}_{1}^*,\ldots , \varvec{z}_{n}^*)^\textrm{T}\). Let \({\textbf{1}}\) be an \(n\times 1\) vector of ones, and \(\textrm{var}(\varvec{Z}_j^*)\) be the sample variance for the jth column of \(\varvec{Z}^*\), \(j=1,\ldots , d\). It follows that

$$\begin{aligned} {\mathcal {I}}(\varvec{\delta })&=\varvec{Z}^{*\textrm{T}}\varvec{Z}^*\\ &\ge \varvec{Z}^{*\textrm{T}}\Big (\varvec{I}-\frac{1}{n}{\textbf{1}}{\textbf{1}}^\textrm{T}\Big )\varvec{Z}^*\\ &=(n-1)\,\textrm{diag}\Big (\sqrt{\textrm{var}(\varvec{Z}_{1}^*)},\ldots ,\sqrt{\textrm{var}(\varvec{Z}_{d}^*)}\Big )\,\varvec{R}^*\,\textrm{diag}\Big (\sqrt{\textrm{var}(\varvec{Z}_{1}^*)},\ldots ,\sqrt{\textrm{var}(\varvec{Z}_{d}^*)}\Big ). \end{aligned}$$

Note that for any symmetric matrices \(A\ge 0\) and \(B\ge 0\), it holds that \(\lambda _{\min }(B)\lambda _j(A^2)\le \lambda _j(ABA)\le \lambda _{\max }(B)\lambda _j(A^2)\). The desired result follows immediately by taking \(B=\varvec{R}^*\) and \(A=\textrm{diag}(\sqrt{\textrm{var}(\varvec{Z}_{1}^*)},\ldots ,\sqrt{\textrm{var}(\varvec{Z}_{d}^*)})\). \(\square \)
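The eigenvalue sandwich invoked here is easy to check numerically; a quick sketch in Python/NumPy (a random positive definite B stands in for \(\varvec{R}^*\), and A is a nonnegative diagonal matrix as in the lemma):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
A = np.diag(rng.uniform(0.5, 2.0, size=d))  # A >= 0, diagonal
G = rng.standard_normal((d, d))
B = G @ G.T + np.eye(d)                     # B > 0

lam_ABA = np.linalg.eigvalsh(A @ B @ A)     # eigenvalues in ascending order
lam_A2 = np.linalg.eigvalsh(A @ A)
lam_B = np.linalg.eigvalsh(B)

# lambda_min(B) lambda_j(A^2) <= lambda_j(ABA) <= lambda_max(B) lambda_j(A^2)
assert np.all(lam_B[0] * lam_A2 <= lam_ABA + 1e-9)
assert np.all(lam_ABA <= lam_B[-1] * lam_A2 + 1e-9)
```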

Proof of Theorem 2

Recall that \(\lambda _{1},\ldots ,\lambda _{d}\) are the d eigenvalues of \({\mathcal {I}}(\varvec{\delta })\) with \(0<\lambda _{\min }=\lambda _{1}\le \ldots \le \lambda _{d}=\lambda _{\max }\). We have

$$\begin{aligned} \{\textrm{tr}(d^{-1}{\mathcal {I}}^{-1}(\varvec{\delta }))\}^{-1}&=\left( d^{-1}\sum _{j=1}^d\lambda _j^{-1}\right) ^{-1},\end{aligned}$$
(22)
$$\begin{aligned} \{\det ({\mathcal {I}}(\varvec{\delta }))\}^{1/d}&=\left( \prod _{j=1}^d\lambda _j\right) ^{1/d}. \end{aligned}$$
(23)

Thus (13)–(15) follow directly from Lemma A.1. \(\square \)
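In code, the three criteria compared in Theorem 2 are simple functions of the eigenvalue spectrum of the subdata information matrix. A sketch in Python/NumPy mirroring (22)–(23) (the function name design_criteria is ours):

```python
import numpy as np

def design_criteria(info):
    """A-, D-, and E-criterion values of an information matrix,
    computed from its eigenvalues as in (22)-(23)."""
    lam = np.linalg.eigvalsh(info)        # ascending eigenvalues
    a_val = 1.0 / np.mean(1.0 / lam)      # {tr(d^{-1} I^{-1}(delta))}^{-1}
    d_val = np.exp(np.mean(np.log(lam)))  # {det(I(delta))}^{1/d}
    e_val = lam[0]                        # lambda_min(I(delta))
    return a_val, d_val, e_val

# usage with the information matrix Z*^T Z* of a candidate subdata
rng = np.random.default_rng(2)
Z_sub = rng.standard_normal((100, 5))
print(design_criteria(Z_sub.T @ Z_sub))
```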

A.3 Proof of Theorem 3

Proof

For each sample variance,

$$\begin{aligned} \textrm{var}(\tilde{\varvec{Z}}_{j}^*) =&\frac{1}{n-1}\sum _{i=1}^{n}({\tilde{{z}}}_{ij}^*-\bar{{\tilde{z}}}_j^*)^2\nonumber \\ =&\frac{({\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j})^2}{n-1}\sum _{i=1}^{n}\left( \frac{{\tilde{z}}_{ij}^*-\bar{{\tilde{z}}}_j^*}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}\right) ^2\nonumber \\ \ge&\frac{({\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j})^2}{n-1} \left( \sum _{i=1}^{r}+\sum _{i=N-r+1}^N\right) \left( \frac{{\tilde{z}}_{(i)j}-\bar{{\tilde{z}}}_j^*}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}\right) ^2. \end{aligned}$$
(A.1)

For the first summation in (A.1),

$$\begin{aligned}&\sum _{i=1}^{r}\left( \frac{{\tilde{z}}_{(i)j}-\bar{{\tilde{z}}}_j^*}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}\right) ^2 =\frac{1}{n^2}\sum _{i=1}^{r}\left( \frac{\sum _{s=1}^n{{\tilde{z}}}_{sj}^*-n{\tilde{z}}_{(i)j}}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}\right) ^2. \end{aligned}$$
(A.2)

Each term in the summation of (A.2) can be written as

$$\begin{aligned}&\frac{\sum _{s=1}^n{{\tilde{z}}}_{sj}^*-n{\tilde{z}}_{(i)j}}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}\nonumber \\&\quad =\sum _{s=N-r+1}^N\frac{{\tilde{z}}_{(s)j}-{\tilde{z}}_{(i)j}}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}} +\sum _{s=1}^r\frac{{\tilde{z}}_{(s)j}-{\tilde{z}}_{(i)j}}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}\nonumber \\&\qquad +\sum _{l\ne j}\left( \sum _{s=1}^r +\sum _{s=N-r+1}^N\right) \frac{{\tilde{z}}_j^{(s)l}-{\tilde{z}}_{(i)j}}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}, \end{aligned}$$
(A.3)

where \({\tilde{z}}_j^{(s)l}\) is the jth dimension of the subdata point selected according to \(\{\tilde{{z}}_{il},i=1,\ldots ,N\}\) in the second step of Algorithm 1. From Assumption 1, we have that for \(s,i\le r\), \({({\tilde{z}}_{(s)j}-{\tilde{z}}_{(i)j})}/{({\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j})}=o_{P}(1)\) and \({({\tilde{z}}_j^{(s)l}-{\tilde{z}}_{(i)j})}/{({\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j})}\) is either positive or \(o_{P}(1)\). Thus (A.3) implies

$$\begin{aligned}&\frac{\sum _{s=1}^n{{\tilde{z}}}_{sj}^*-n{\tilde{z}}_{(i)j}}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}} \ge \sum _{s=N-r+1}^N\frac{{\tilde{z}}_{(s)j}-{\tilde{z}}_{(i)j}}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}. \end{aligned}$$
(A.4)

From Assumptions 1 and 2, for \(s\ge N-r+1\) and \(i\le r\), as \(N\rightarrow \infty \),

$$\begin{aligned} \frac{\tilde{{z}}_{(s)j}-\tilde{{z}}_{(i)j}}{\tilde{{z}}_{(N)j}-\tilde{{z}}_{(1)j}} =\frac{\tilde{{z}}_{(s)j}-\tilde{{z}}_{(N)j}}{\tilde{{z}}_{(N)j}-\tilde{{z}}_{(1)j}} +\frac{\tilde{{z}}_{(N)j}-\tilde{{z}}_{(1)j}}{\tilde{{z}}_{(N)j}-\tilde{{z}}_{(1)j}} +\frac{\tilde{{z}}_{(1)j}-\tilde{{z}}_{(i)j}}{\tilde{{z}}_{(N)j}-\tilde{{z}}_{(1)j}} =1+o_{P}(1). \end{aligned}$$
(A.5)

From (A.2), (A.4) and (A.5),

$$\begin{aligned} \sum _{i=1}^{r}\left( \frac{{\tilde{z}}_{(i)j}-\bar{{\tilde{z}}}_j^*}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}\right) ^2 \ge&\frac{1}{n^2}\sum _{i=1}^{r}\left( \sum _{s=N-r+1}^N \frac{{\tilde{z}}_{(s)j}-{\tilde{z}}_{(i)j}}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}+o_{P}(1)\right) ^2 =\frac{r^3}{n^2}+o_{P}(1). \end{aligned}$$
(A.6)

Similarly,

$$\begin{aligned} \sum _{i=N-r+1}^N\left( \frac{{\tilde{z}}_{(i)j}-\bar{{\tilde{z}}}_j^*}{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}\right) ^2 \ge \frac{r^3}{n^2}+o_{P}(1). \end{aligned}$$
(A.7)

Combining (A.1), (A.6) and (A.7),

$$\begin{aligned} \textrm{var}(\tilde{\varvec{Z}}_{j}^*) \ge \frac{2r^3({\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j})^2}{n^2(n-1)}(1+o_{P}(1)). \end{aligned}$$
(A.8)

If \({\tilde{z}}_{(1)j}/{\tilde{z}}_{(N)j}\overset{P}{\rightarrow }0\) or \(\pm \infty \), then

$$\begin{aligned} \frac{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}{|{\tilde{z}}|_{(N)j}}=1+o_{P}(1). \end{aligned}$$
(A.9)

Combining (A.8) and (A.9) shows that

$$\begin{aligned} \textrm{var}(\tilde{\varvec{Z}}_{j}^*) \ge \frac{2r^3|{\tilde{z}}|_{(N)j}^2}{n^2(n-1)}(1+o_{P}(1)). \end{aligned}$$
(A.10)

In this case, the desired results follow from Theorem 2 and Slutsky’s theorem.

If \({\tilde{z}}_{(N)j}\rightarrow \infty \) and \({\tilde{z}}_{(1)j}\) is bounded below, or \({\tilde{z}}_{(1)j}\rightarrow -\infty \) and \({\tilde{z}}_{(N)j}\) is bounded above, then

$$\begin{aligned} \frac{{\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j}}{|{\tilde{z}}|_{(N)j}}=1+o_{P}(1). \end{aligned}$$
(A.11)

From (A.8) and (A.11), it follows that

$$\begin{aligned} \textrm{var}(\tilde{\varvec{Z}}_{j}^*) \ge \frac{2r^3|{\tilde{z}}|_{(N)j}^2}{n^2(n-1)}(1+o_{P}(1)). \end{aligned}$$
(A.12)

Again, the desired results follow from Theorem 2 and Slutsky’s theorem.

\(\square \)
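As a sanity check on the lower bound in (A.8), here is a small Monte Carlo sketch in Python/NumPy for the simplest one-dimensional setting, in which the subdata consists of the r smallest and r largest values so that \(n=2r\) (this is our simplification; Algorithm 1 applies the same extreme-value selection coordinate-wise):

```python
import numpy as np

rng = np.random.default_rng(3)
r = 20

for N in (10**4, 10**6, 10**7):
    z = np.sort(rng.standard_normal(N))
    sub = np.concatenate([z[:r], z[-r:]])   # r smallest and r largest, n = 2r
    n = sub.size
    bound = 2 * r**3 * (z[-1] - z[0]) ** 2 / (n**2 * (n - 1))
    # ratio of the subdata variance to the bound; it climbs toward 1 slowly,
    # since Gaussian extreme order statistics converge at a logarithmic rate
    print(N, np.var(sub, ddof=1) / bound)
```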

A.4 Some details of Example 3 and Proposition 4

Lemma A.2

Suppose \(\varvec{x}\sim N(\varvec{\mu },\Sigma )\) with \(\Sigma >0\) and \(\varvec{\theta }\) has more than one nonzero element. Then \((x_j,\varvec{\theta }^\textrm{T}\varvec{x})^\textrm{T}\) follows a nondegenerate normal distribution for all j.

Lemma A.3

Suppose \(\varvec{x}\sim N(\varvec{\mu },\Sigma )\) with \(\Sigma >0\), and \(\varvec{\theta }\), which has more than one nonzero element, lies in a compact ball. Then, for any given M and \(C>0\), it holds that

$$\begin{aligned}&\Pr \Big (\frac{x_{j}}{e^{-\varvec{x}^\textrm{T}\varvec{\theta }/2}+e^{\varvec{x}^\textrm{T}\varvec{\theta }/2}}> M,\ |\varvec{x}^\textrm{T}\varvec{\theta }|\le 2C\Big )>0,\quad \\&\Pr \Big (\frac{x_{j}}{e^{-\varvec{x}^\textrm{T}\varvec{\theta }/2}+e^{\varvec{x}^\textrm{T}\varvec{\theta }/2}}< -M,\ |\varvec{x}^\textrm{T}\varvec{\theta }|\le 2C\Big )>0, \end{aligned}$$

for all j.

Proof of Proposition 4

Note that the jth dimension of \(\tilde{\varvec{z}}\) can be written as

$$\begin{aligned}\tilde{{z}}_j=\frac{x_{j}}{e^{-\varvec{x}^\textrm{T}{\tilde{\varvec{\theta }}}/2}+e^{\varvec{x}^\textrm{T}{\tilde{\varvec{\theta }}}/2}}.\end{aligned}$$

For any j, one can see that

$$\begin{aligned} \Pr \Big (\frac{x_{j}}{e^{-\varvec{x}^\textrm{T}\varvec{\theta }/2}+e^{\varvec{x}^\textrm{T}\varvec{\theta }/2}}> M\Big )\ge \Pr \Big (\frac{x_{j}}{e^{-\varvec{x}^\textrm{T}\varvec{\theta }/2}+e^{\varvec{x}^\textrm{T}\varvec{\theta }/2}}> M,\ |\varvec{x}^\textrm{T}\varvec{\theta }|\le 2C\Big )>0, \end{aligned}$$

where M and C are constants independent of \(\varvec{x}\).

Similarly, one can show that

$$\begin{aligned} \Pr \Big (\frac{x_{j}}{e^{-\varvec{x}^\textrm{T}\varvec{\theta }/2}+e^{\varvec{x}^\textrm{T}\varvec{\theta }/2}}< -M\Big )\ge \Pr \Big (\frac{x_{j}}{e^{-\varvec{x}^\textrm{T}\varvec{\theta }/2}+e^{\varvec{x}^\textrm{T}\varvec{\theta }/2}}< -M,\ |\varvec{x}^\textrm{T}\varvec{\theta }|\le 2C\Big )>0. \end{aligned}$$

From Lemma A.3, the desired result follows. \(\square \)

Details of Example 3

Based on the two lemmas, we can see that for any \(M>0\) and \(j=1,\ldots ,d\),

$$\begin{aligned} \Pr (\tilde{{z}}_{(N)j}>M)&=1-\Pr (\tilde{{z}}_{(N)j}\le M)\end{aligned}$$
(A.13)
$$\begin{aligned}&=1-{\Pr }^N(\tilde{{z}}_{j}\le M)\end{aligned}$$
(A.14)
$$\begin{aligned}&=1-{\Pr }^N\left( \frac{x_{j}}{e^{-\varvec{x}^\textrm{T}\varvec{\theta }/2}+e^{\varvec{x}^\textrm{T}\varvec{\theta }/2}}\le M\right) . \end{aligned}$$
(A.15)

From Lemma A.3, it is clear that

$$\begin{aligned}&{\Pr }^N\Big (\frac{x_{j}}{e^{-\varvec{x}^\textrm{T}\varvec{\theta }/2}+e^{\varvec{x}^\textrm{T}\varvec{\theta }/2}}\le M\Big ) \rightarrow 0. \end{aligned}$$
(A.16)

Thus the result follows. \(\square \)
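To make this concrete, here is a small numerical sketch in Python/NumPy (the choices \(\varvec{\theta }=(0.5,-0.5,1)^\textrm{T}\) and \(M=1\) are ours, for illustration only) that estimates \(p=\Pr (\tilde{z}_{j}>M)\) by Monte Carlo and then evaluates \(\Pr (\tilde{z}_{(N)j}>M)=1-(1-p)^N\) as in (A.13)–(A.16):

```python
import numpy as np

rng = np.random.default_rng(4)
theta = np.array([0.5, -0.5, 1.0])  # more than one nonzero element
M = 1.0

# Monte Carlo estimate of p = Pr(tilde z_1 > M); Lemma A.3 guarantees p > 0
x = rng.standard_normal((10**6, 3))
eta = x @ theta
p = np.mean(x[:, 0] / (np.exp(-eta / 2) + np.exp(eta / 2)) > M)

# Pr(tilde z_(N)1 > M) = 1 - (1 - p)^N -> 1 as N -> infinity
for N in (10**2, 10**3, 10**4):
    print(N, 1 - (1 - p) ** N)
```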

Proof of Lemma A.2

Note that \(\varvec{x}\) follows a multivariate normal distribution. Let \({\varvec{e}_j}\) be the unit vector with jth element equal to one, and let \(C=(\varvec{e}_j,\varvec{\theta })\). Since \(\varvec{\theta }\) has more than one nonzero element by assumption, the rank of C equals two. Since \(\Sigma >0\), the rank of \(C^\textrm{T}\Sigma C\) is also two. Hence \(\check{\varvec{x}}=C^\textrm{T}\varvec{x}=(x_j,\varvec{\theta }^\textrm{T}\varvec{x})^\textrm{T}\) also follows a nondegenerate normal distribution with mean \(C^\textrm{T}\varvec{\mu }=(\mu _{j},\varvec{\mu }^\textrm{T}\varvec{\theta })^\textrm{T}\) and variance

$$\begin{aligned} C^\textrm{T}\Sigma C={\check{\Sigma }}=\left( \begin{array}{cc} \sigma _{jj} & {{\check{\sigma }}}_{12}\\ {{\check{\sigma }}}_{12} & {{\check{\sigma }}}_{22} \end{array} \right) , \end{aligned}$$

where \(\mu _{j}\) is the jth element of \(\varvec{\mu }\), \(\sigma _{jj}\) is the (j, j)th element of \(\Sigma \), \({{\check{\sigma }}}_{12}=\Sigma _{\cdot j}^\textrm{T}\varvec{\theta }\) with \(\Sigma _{\cdot j}\) being the jth column of \(\Sigma \), and \({{\check{\sigma }}}_{22}=\varvec{\theta }^\textrm{T}\Sigma \varvec{\theta }\).

\(\square \)

Proof of Lemma A.3

For the first result, a simple calculation yields that

$$\begin{aligned}&\Pr \Big (\frac{x_{j}}{e^{-\varvec{x}^\textrm{T}\varvec{\theta }/2}+e^{\varvec{x}^\textrm{T}\varvec{\theta }/2}}> M,\ |\varvec{x}^\textrm{T}\varvec{\theta }|\le 2C\Big )\end{aligned}$$
(A.17)
$$\begin{aligned}&\ge \Pr \Big (x_{j}> 2e^{C}M,\ |\varvec{x}^\textrm{T}\varvec{\theta }|\le 2C\Big )>0, \end{aligned}$$
(A.18)

since \((x_j,\varvec{x}^\textrm{T}\varvec{\theta })^\textrm{T}\) follows a nondegenerate normal distribution, as shown in Lemma A.2. The proof of the second result is similar and is omitted.

\(\square \)



Cite this article

Yu, J., Liu, J. & Wang, H. Information-based optimal subdata selection for non-linear models. Stat Papers 64, 1069–1093 (2023). https://doi.org/10.1007/s00362-023-01430-3
