Abstract
Subdata selection methods provide flexible tradeoffs between computational complexity and statistical efficiency in analyzing big data. In this work, we investigate a new algorithm for selecting informative subdata from massive data for a broad class of models, including generalized linear models as special cases. A connection between the proposed method and many widely used optimal design criteria such as A-, D-, and E-optimality criteria is established to provide a comprehensive understanding of the selected subdata. Theoretical justifications are provided for the proposed method, and numerical simulations are conducted to illustrate the superior performance of the selected subdata.
Acknowledgements
The authors sincerely thank the editors and referees for their valuable comments and insightful suggestions, which led to further improvement of this article. This research was partially supported by Beijing Municipal Natural Science Foundation Grant No. 1232019. Yu's work was supported by NSFC Grant 12001042 and the Beijing Institute of Technology Research Fund Program for Young Scholars. Liu and Wang's research was supported by NSF Grant CCF-2105571 and UConn CLAS Research in Academic Themes funding.
Appendix A Proofs
A.1 Proof of Theorem 1
Proof
Note that
Thus, the T-optimal subdata consists of the k points with the largest values of \(\Vert \varvec{x}_i\Vert ^2\). \(\square \)
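The selection rule in Theorem 1 reduces to a partial sort of the squared row norms. A minimal sketch in Python (the function name `t_optimal_subdata` and the simulated data are ours, not from the paper):

```python
import numpy as np

def t_optimal_subdata(X, k):
    """Select the k rows of X with the largest squared Euclidean norms.

    Under the T-optimality criterion of Theorem 1, these rows maximize
    the trace of the subdata information matrix.
    """
    norms = np.sum(X**2, axis=1)           # ||x_i||^2 for each row
    idx = np.argpartition(norms, -k)[-k:]  # indices of the k largest norms
    return X[idx], idx

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
sub, idx = t_optimal_subdata(X, 20)

# Every selected norm is at least as large as every unselected norm.
mask = np.ones(len(X), dtype=bool)
mask[idx] = False
assert np.min(np.sum(sub**2, axis=1)) >= np.max(np.sum(X[mask]**2, axis=1))
```

Note that `np.argpartition` avoids a full sort, matching the partial-sorting spirit of the selection algorithms cited in the paper.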
A.2 Proof of Theorem 2
Before proving Theorem 2, we need the following lemma.
Lemma A.1
Let \(\lambda _{1},\ldots ,\lambda _{d}\) be the d eigenvalues of \({\mathcal {I}}(\varvec{\delta })\) with \(\lambda _{\min }=\lambda _{1}\le \ldots \le \lambda _{d}=\lambda _{\max }\). Assume that \(\textrm{var}(\varvec{Z}_{1}^*)\le \ldots \le \textrm{var}(\varvec{Z}_{d}^*)\) with \(\textrm{var}(\varvec{Z}_j^*)\) being the sample variance for the jth column of \(\varvec{Z}^*\). For any subdata of size n, represented by \(\varvec{\delta }\), it holds that
where \(\varvec{R}^*\) is the sample correlation matrix of \(\varvec{Z}^*\).
Proof
Recall \(\varvec{Z}^*=(\varvec{z}_{1}^*,\ldots , \varvec{z}_{n}^*)^\textrm{T}\). Let \({\textbf{1}}\) be an \(n\times 1\) vector of ones, and \(\textrm{var}(\varvec{Z}_j^*)\) be the sample variance for the jth column of \(\varvec{Z}^*\), \(j=1,\ldots , d\). It follows that
Note that for any positive semidefinite matrices \(A\) and \(B\), it holds that \(\lambda _{\min }(B)\lambda _j(A^2)\le \lambda _j(ABA)\le \lambda _{\max }(B)\lambda _j(A^2)\). The desired result follows immediately by letting \(B=\varvec{R}^*\) and \(A=\textrm{diag}(\sqrt{\textrm{var}(\varvec{Z}_{1}^*)},\ldots ,\sqrt{\textrm{var}(\varvec{Z}_{d}^*)})\). \(\square \)
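The eigenvalue sandwich used in this proof can be checked numerically. A small sketch with random positive semidefinite inputs (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# B: a random positive semidefinite matrix (playing the role of R*)
M = rng.normal(size=(d, d))
B = M @ M.T

# A: a positive semidefinite diagonal matrix, as in the proof,
# where the diagonal holds square roots of sample variances
A = np.diag(rng.uniform(0.5, 2.0, size=d))

# eigvalsh returns eigenvalues in ascending order, matching index j
lam_ABA = np.linalg.eigvalsh(A @ B @ A)
lam_A2 = np.linalg.eigvalsh(A @ A)
lam_B = np.linalg.eigvalsh(B)

# lambda_min(B) * lambda_j(A^2) <= lambda_j(ABA) <= lambda_max(B) * lambda_j(A^2)
assert np.all(lam_B.min() * lam_A2 <= lam_ABA + 1e-10)
assert np.all(lam_ABA <= lam_B.max() * lam_A2 + 1e-10)
```

The inequality is an Ostrowski-type bound: \(ABA\) has the same eigenvalues as \(B^{1/2}A^2B^{1/2}\), so each \(\lambda _j(A^2)\) is scaled by a factor lying between \(\lambda _{\min }(B)\) and \(\lambda _{\max }(B)\).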
Proof of Theorem 2
Recall that \(\lambda _{1},\ldots ,\lambda _{d}\) are the d eigenvalues of \({\mathcal {I}}(\varvec{\delta })\) with \(0<\lambda _{\min }=\lambda _{1}\le \ldots \le \lambda _{d}=\lambda _{\max }\). We have
Thus (13)–(15) follow directly from Lemma A.1. \(\square \)
A.3 Proof of Theorem 3
Proof
For each sample variance,
For the first summation in (A.4),
Each term in the summation of (A.5) can be written as
where \({\tilde{z}}_j^{(s)l}\) is the jth dimension of the subdata point selected according to \(\{\tilde{{z}}_{il},i=1,\ldots ,N\}\) in the second step of Algorithm 1. From Assumption 1, for \(s,i\le r\), \({({\tilde{z}}_{(s)j}-{\tilde{z}}_{(i)j})}/{({\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j})}=o_{P}(1)\) and \({({\tilde{z}}_j^{(s)l}-{\tilde{z}}_{(i)j})}/{({\tilde{z}}_{(N)j}-{\tilde{z}}_{(1)j})}\) is either positive or \(o_{P}(1)\). Thus (A.6) implies
From Assumptions 1 and 2, for \(s\ge N-r+1\) and \(i\le r\), as \(N\rightarrow \infty \),
Similarly,
Combining (A.4), (A.9) and (A.10),
If \({\tilde{z}}_{(1)j}/{\tilde{z}}_{(N)j}\overset{P}{\rightarrow }0\) or \(\pm \infty \), then
Combining (A.11) and (A.12) shows that
Thus, the desired results follow from Theorem 2 and Slutsky's theorem.
If \({\tilde{z}}_{(N)j}\rightarrow \infty \) and \({\tilde{z}}_{(1)j}\) is bounded below, or \({\tilde{z}}_{(1)j}\rightarrow -\infty \) and \({\tilde{z}}_{(N)j}\) is bounded above, then
From (A.11) and (A.14), it follows that
Thus, the desired results follow from Theorem 2 and Slutsky's theorem.
\(\square \)
A.4 Some details of Example 3 and Proposition 4
Lemma A.2
Suppose \(\varvec{x}\sim N(\varvec{\mu },\Sigma )\) with \(\Sigma >0\) and \(\varvec{\theta }\) has more than one nonzero element. Then \((x_j,\varvec{\theta }^\textrm{T}\varvec{x})^\textrm{T}\) follows a nondegenerate normal distribution for all j.
Lemma A.3
Suppose \(\varvec{x}\sim N(\varvec{\mu },\Sigma )\) with \(\Sigma >0\), and \(\varvec{\theta }\) lies in a compact ball and has more than one nonzero element. Then for any given \(M\) and \(C>0\), it holds that
for all j.
Proof of Proposition 4
Note that the jth dimension of \(\tilde{\varvec{z}}\) can be written as
For any j, one can see that
where \(M\) and \(C\) are constants independent of \(\varvec{x}\).
Similarly, one can show that
From Lemma A.3, the desired result follows. \(\square \)
Details of Example 3
Based on Lemmas A.2 and A.3, we can see that for any \(M>0\) and \(j=1,\ldots ,d\),
From Lemma A.3, it follows that
Thus the result follows. \(\square \)
Proof of Lemma A.2
Without loss of generality, we only consider the case \(j=1\) here. Note that \(\varvec{x}\) follows a multivariate normal distribution. Let \({\varvec{e}_j}\) be a unit vector with jth element being one and \(C=(\varvec{e}_j,\varvec{\theta })\). Note that \(\varvec{\theta }\) has more than one nonzero element by assumption, so the rank of \(C\) is two. Since \(\Sigma >0\), the rank of \(C^\textrm{T}\Sigma C\) is also two. Thus \(\check{\varvec{x}}=C^\textrm{T}\varvec{x}=(x_j,\varvec{\theta }^\textrm{T}\varvec{x})^\textrm{T}\) follows a nondegenerate normal distribution with mean \(C^\textrm{T}\varvec{\mu }=(\mu _{j},\varvec{\mu }^\textrm{T}\varvec{\theta })^\textrm{T}\) and variance
where \(\mu _{j}\) is the jth element of \(\varvec{\mu }\), \(\sigma _{jj}\) is the (j, j)th element of \(\Sigma \), \({{\check{\sigma }}}_{12}\) equals \(\Sigma _{\cdot j}^\textrm{T}\varvec{\theta }\) with \(\Sigma _{\cdot j}\) being the jth column of \(\Sigma \), and \({{\check{\sigma }}}_{22}=\varvec{\theta }^\textrm{T}\Sigma \varvec{\theta }\).
\(\square \)
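The rank argument in Lemma A.2 can be illustrated numerically: the covariance \(C^\textrm{T}\Sigma C\) of \((x_j,\varvec{\theta }^\textrm{T}\varvec{x})^\textrm{T}\) has rank two whenever \(\Sigma \) is positive definite and \(\varvec{\theta }\) has more than one nonzero element. A sketch with arbitrary illustrative values (all numbers are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
d, j = 5, 1                                   # dimension and coordinate index (0-based)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)               # positive definite covariance
theta = np.array([1.0, -2.0, 0.0, 0.5, 0.0])  # more than one nonzero element

e_j = np.zeros(d)
e_j[j] = 1.0
C = np.column_stack([e_j, theta])             # C = (e_j, theta), a d x 2 matrix

cov_check = C.T @ Sigma @ C                   # covariance of (x_j, theta^T x)

# Rank two <=> the bivariate normal is nondegenerate.
assert np.linalg.matrix_rank(cov_check) == 2
# Entries match the closed forms given after the lemma's variance display.
assert np.isclose(cov_check[0, 0], Sigma[j, j])          # sigma_jj
assert np.isclose(cov_check[0, 1], Sigma[:, j] @ theta)  # Sigma_{.j}^T theta
assert np.isclose(cov_check[1, 1], theta @ Sigma @ theta)
```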
Proof of Lemma A.3
For the first result, simple calculation yields that
since \((x_j,\varvec{x}^\textrm{T}\varvec{\theta })^\textrm{T}\) follows a nondegenerate normal distribution by Lemma A.2. The proof of the second result is similar and is omitted.
\(\square \)
Cite this article
Yu, J., Liu, J. & Wang, H. Information-based optimal subdata selection for non-linear models. Stat Papers 64, 1069–1093 (2023). https://doi.org/10.1007/s00362-023-01430-3