Self-normalization: Taming a wild population in a heavy-tailed world

Shao, Qi-man; Zhou, Wen-xin

doi:10.1007/s11766-017-3552-y

Published: 07 September 2017

Volume 32, pages 253–269, (2017)

Applied Mathematics-A Journal of Chinese Universities

Qi-man Shao¹ & Wen-xin Zhou²

Abstract

The past two decades have witnessed the active development of a rich probability theory of Studentized statistics or self-normalized processes, typified by Student’s t-statistic, introduced by W. S. Gosset more than a century ago, and of their applications to statistical problems in high dimensions, including feature selection and ranking, large-scale multiple testing, and sparse, high-dimensional signal detection. Many of these applications rely on the robustness of Studentization/self-normalization against heavy-tailed sampling distributions. This paper gives an overview of the salient progress of self-normalized limit theory, from Student’s t-statistic to more general Studentized nonlinear statistics. Prototypical examples include Studentized one- and two-sample U-statistics. Finally, we go beyond independence and take a brief look at some very recent advances in self-normalized moderate deviations under dependence.
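To make the central objects of the abstract concrete: for a sample X_1, ..., X_n write S_n = X_1 + ... + X_n and V_n^2 = X_1^2 + ... + X_n^2. The self-normalized sum S_n/V_n and Student’s t-statistic T_n = √n · X̄ / s (s the sample standard deviation) are linked by the classical algebraic identity T_n = (S_n/V_n) · √((n − 1)/(n − (S_n/V_n)^2)), so limit results for one transfer to the other. By the Cauchy-Schwarz inequality |S_n/V_n| ≤ √n holds for every sample, however heavy the tails. The Python sketch below is illustrative only (standard library; the function names are hypothetical, not taken from the paper): it computes both statistics and checks the identity on a standard Cauchy sample, a population with no finite mean or variance.

```python
import math
import random


def self_normalized_sum(xs):
    """Return S_n / V_n with S_n = sum of x and V_n = sqrt(sum of x**2)."""
    s = sum(xs)
    v = math.sqrt(sum(x * x for x in xs))
    return s / v


def student_t(xs):
    """Student's t-statistic sqrt(n) * mean / s for testing mean zero."""
    n = len(xs)
    mean = sum(xs) / n
    # Unbiased sample variance with divisor n - 1.
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def t_from_self_normalized(xs):
    """Recover Student's t from S_n / V_n via the algebraic identity."""
    n = len(xs)
    r = self_normalized_sum(xs)
    # Requires n > r**2, i.e. the observations are not all equal.
    return r * math.sqrt((n - 1) / (n - r * r))


def cauchy_sample(n, rng):
    """Draw n standard Cauchy variates by inverting the Cauchy CDF."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


if __name__ == "__main__":
    xs = cauchy_sample(200, random.Random(1))
    print("S_n/V_n      :", self_normalized_sum(xs))
    print("t directly   :", student_t(xs))
    print("t via S_n/V_n:", t_from_self_normalized(xs))
    print("bound sqrt(n):", math.sqrt(len(xs)))
```

Because S_n/V_n is unchanged when every observation is multiplied by the same positive constant, no scale parameter has to be estimated separately, which is why the ratio remains well defined even when the population variance is infinite.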

References

T W Anderson. An Introduction to Multivariate Statistical Analysis, 3rd ed, Wiley, Hoboken, 2003.
J N Arvesen. Jackknifing U-statistics, Ann Math Statist, 1969, 40(6): 2076–2100.
Z Bai, H Saranadasa. Effect of high dimension: by an example of a two sample problem, Statist Sinica, 1996, 6(2): 311–329.
A Belloni, V Chernozhukov, L Wang. Square-root lasso: pivotal recovery of sparse signals via conic programming, Biometrika, 2011, 98(4): 791–806.
V Bentkus, M Bloznelis, F Götze. A Berry-Esseen bound for Student’s statistic in the non-i.i.d. case, J Theoret Probab, 1996, 9(3): 765–796.
V Bentkus, F Götze. The Berry-Esseen bound for Student’s statistic, Ann Probab, 1996, 24(1): 491–503.
V Bentkus, B Y Jing, Q M Shao, W Zhou. Limiting distributions of the non-central t-statistic and their applications to the power of t-tests under non-normality, Bernoulli, 2007, 13(2): 346–364.
B Bercu, E Gassiat, E Rio. Concentration inequalities, large and moderate deviations for self-normalized empirical processes, Ann Probab, 2002, 30(4): 1576–1604.
B Bercu, A Touati. Exponential inequalities for self-normalized martingales with applications, Ann Appl Probab, 2008, 18(5): 1848–1869.
M Bloznelis, H Putter. Second-order and bootstrap approximation to Student’s t-statistic, Theory Probab Appl, 2003, 47(2): 300–307.
J F Box. Gosset, Fisher, and t distribution, Amer Statist, 1981, 35(2): 61–66.
P Bühlmann. Bootstrap for time series, Statist Sci, 2002, 17(1): 52–72.
H Cao, M R Kosorok. Simultaneous critical values for t-tests in very high dimensions, Bernoulli, 2011, 17(1): 347–394.
J Chang, Q M Shao, W X Zhou. Cramér-type moderate deviations for Studentized two-sample U-statistics with applications, Ann Statist, 2016, 44(5): 1931–1956.
J Chang, C Y Tang, Y Wu. Marginal empirical likelihood and sure independence feature screening, Ann Statist, 2013, 41(4): 2123–2148.
J Chang, C Y Tang, Y Wu. Local independence feature screening for nonparametric and semiparametric models by marginal empirical likelihood, Ann Statist, 2016, 44(2): 515–539.
S Chatterjee, Q M Shao. Nonnormal approximation by Stein’s method of exchangeable pairs with application to the Curie-Weiss model, Ann Appl Probab, 2011, 21(2): 464–483.
L H Y Chen, Q M Shao. Normal approximation for nonlinear statistics using a concentration inequality approach, Bernoulli, 2007, 13(2): 581–599.
X Chen, Q M Shao, W B Wu, L Xu. Self-normalized Cramér-type moderate deviations under dependence, Ann Statist, 2016, 44(4): 1593–1617.
H Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann Math Statist, 1952, 23(4): 493–507.
G P Chistyakov, F Götze. On bounds for moderate deviations for Student’s statistic, Theory Probab Appl, 2004, 48(3): 528–535.
G P Chistyakov, F Götze. Limit distributions of Studentized means, Ann Probab, 2004, 32(1A): 28–77.
E Chung, J Romano. Asymptotically valid and exact permutation tests based on two-sample U-statistics, J Statist Plann Inference, 2016, 168: 97–105.
S Clarke, P Hall. Robustness of multiple testing procedures against dependence, Ann Statist, 2009, 37(1): 332–358.
M Csörgő, L Horváth. Asymptotic representations of self-normalized sums, Probab Math Statist, 1988, 9: 15–24.
M Csörgő, B Szyszkowicz, Q Wang. Donsker’s theorem for self-normalized partial sums processes, Ann Probab, 2003, 31(3): 1228–1240.
V H de la Peña, T L Lai, Q M Shao. Self-Normalized Processes: Theory and Statistical Applications, Springer, Berlin, 2009.
A Delaigle, P Hall, J Jin. Robustness and accuracy of methods for high dimensional data analysis based on Student’s t-statistic, J Roy Statist Soc Ser B, 2011, 73(3): 283–301.
A Dembo, Q M Shao. Large and moderate deviations for Hotelling’s T²-statistic, Electron Commun Probab, 2006, 11: 149–159.
B Efron. Student’s t-test under symmetry conditions, J Amer Statist Assoc, 1969, 64: 1278–1302.
V A Egorov. Estimation of distribution tails for normalized and self-normalized sums, J Math Sci, 2005, 127(1): 1717–1722.
C Eisenhart. On the transition from “Student’s” z to “Student’s” t, Amer Statist, 1979, 33(1): 6–10.
J Fan, Y Fan. High-dimensional classification using features annealed independence rules, Ann Statist, 2008, 36(6): 2605–2637.
J Fan, P Hall, Q Yao. To how many simultaneous hypothesis tests can normal, Student’s t or bootstrap calibration be applied?, J Amer Statist Assoc, 2007, 102: 1282–1288.
J Fan, J Lv. A selective overview of variable selection in high dimensional feature space, Statist Sinica, 2010, 20(1): 101–148.
R A Fisher. Applications of “Student’s” distribution, Metron, 1925, 5: 90–104.
L Gao, Q M Shao, J S Shi. Cramér moderate deviations for a general self-normalized sum, Preprint, 2017.
E Giné, F Götze, D M Mason. When is the Student t-statistic asymptotically standard normal?, Ann Probab, 1997, 25(3): 1514–1531.
P S Griffin, J D Kuelbs. Self-normalized laws of the iterated logarithm, Ann Probab, 1989, 17(4): 1571–1601.
P S Griffin, D M Mason. On the asymptotic normality of self-normalized sums, Math Proc Cambridge Philos Soc, 1991, 109(3): 597–610.
P Hall. Edgeworth expansion for Student’s t statistic under minimal moment conditions, Ann Probab, 1987, 15(3): 920–931.
P Hall. On the effect of random norming on the rate of convergence in the central limit theorem, Ann Probab, 1988, 16(3): 1265–1280.
P Hall, Q Wang. Exact convergence rate and leading term in central limit theorem for Student’s t statistic, Ann Probab, 2004, 32(2): 1419–1437.
W Hoeffding. A class of statistics with asymptotically normal distribution, Ann Math Statist, 1948, 19(3): 293–325.
H Hotelling. The generalization of Student’s ratio, Ann Math Statist, 1931, 2(3): 360–378.
B Y Jing, Q M Shao, Q Wang. Self-normalized Cramér-type large deviations for independent random variables, Ann Probab, 2003, 31(4): 2167–2215.
B Y Jing, Q M Shao, W Zhou. Saddlepoint approximation for Student’s t-statistic with no moment conditions, Ann Statist, 2004, 32(6): 2679–2711.
B Y Jing, Q M Shao, W Zhou. Towards a universal self-normalized moderate deviation, Trans Amer Math Soc, 2008, 360: 4263–4285.
M Juodis, A Račkauskas. A remark on self-normalization for dependent random variables, Lith Math J, 2005, 45(2): 142–151.
S N Lahiri. Resampling Methods for Dependent Data, Springer, New York, 2003.
T L Lai, Q M Shao, Q Wang. Cramér type moderate deviations for Studentized U-statistics, ESAIM Probab Stat, 2011, 15: 168–179.
J V Linnik. On the probability of large deviations for the sums of independent variables, Proc 4th Berkeley Sympos Math Statist and Prob, Vol II, Univ California Press, Berkeley, 1961, 289–306.
W Liu, Q M Shao. A Cramér type moderate deviation theorem for Hotelling’s T²-statistic with applications to global tests, Ann Statist, 2013, 41(1): 296–322.
W Liu, Q M Shao. Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control, Ann Statist, 2014, 42(5): 2003–2025.
B F Logan, C L Mallows, S O Rice, L A Shepp. Limit distributions of self-normalized sums, Ann Probab, 1973, 1(5): 788–809.
R A Maller. A theorem on products of random variables with application to regression, Aust N Z J Stat, 1981, 23(2): 177–185.
D M Mason. The asymptotic distribution of self-normalized triangular arrays, J Theoret Probab, 2005, 18(4): 853–870.
S Y Novak. On self-normalized sums of random variables and the Student’s statistic, Theory Probab Appl, 2005, 49(2): 336–344.
G M Pan, W Zhou. Central limit theorem for Hotelling’s T² statistic under large dimension, Ann Appl Probab, 2011, 21(5): 1860–1910.
E S Pearson. Some reflections on continuity in the development of mathematical statistics, 1885–1920, Biometrika, 1967, 54: 341–355.
D N Politis, J P Romano, M Wolf. Subsampling, Springer, New York, 1999.
J Robinson, Q Wang. On the self-normalized Cramér-type large deviation, J Theoret Probab, 2005, 18(4): 891–909.
Q M Shao. Self-normalized large deviations, Ann Probab, 1997, 25(1): 285–328.
Q M Shao. A Cramér type large deviation result for Student’s t statistic, J Theoret Probab, 1999, 12(2): 385–398.
Q M Shao. An explicit Berry-Esseen bound for Student’s t-statistic via Stein’s method, In: Stein’s Method and Applications, Lect Notes Ser Inst Math Sci Natl Univ Singap 5, Singapore University Press, Singapore, 2005, 143–155.
Q M Shao, Q Wang. Self-normalized limit theorems: a survey, Probab Surv, 2013, 10: 69–93.
Q M Shao, Z S Zhang. Identifying the limiting distribution by a general approach of Stein’s method, Sci China Math, 2016, 59(12): 2379–2392.
Q M Shao, K Zhang, W X Zhou. Stein’s method for nonlinear statistics: a brief survey and recent progress, J Statist Plann Inference, 2016, 168: 68–89.
Q M Shao, W X Zhou. Cramér type moderate deviation theorems for self-normalized processes, Bernoulli, 2016, 22(4): 2029–2079.
V V Slavova. On the Berry-Esseen bound for Student’s statistics, In: Stability Problems for Stochastic Models, Lecture Notes in Math, 1155, Springer, Berlin, 1985, 355–390.
Student. The probable error of a mean, Biometrika, 1908, 6: 1–25.
M Vandemaele, N Veraverbeke. Cramér type large deviations for Studentized U-statistics, Metrika, 1985, 32(1): 165–179.
Q Wang. Bernstein type inequalities for degenerate U-statistics with applications, Chin Ann Math, 1998, 19(2): 157–166.
Q Wang. Limit theorems for self-normalized large deviation, Electron J Probab, 2005, 10: 1260–1285.
Q Wang. Refined self-normalized large deviations for independent random variables, J Theoret Probab, 2011, 24(2): 307–329.
Q Wang, B Y Jing. An exponential nonuniform Berry-Esseen bound for self-normalized sums, Ann Probab, 1999, 27(4): 2068–2088.
Q Wang, B Y Jing, L Zhao. The Berry-Esseen bound for Studentized statistics, Ann Probab, 2000, 28(1): 511–535.
S L Zabell. On Student’s 1908 article “The probable error of a mean”, J Amer Statist Assoc, 2008, 103: 1–7.
W Zhou, B Y Jing. Tail probability approximations for Student’s t-statistics, Probab Theory Related Fields, 2006, 136(4): 541–559.

Author information

Authors and Affiliations

Department of Statistics, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
Qi-man Shao
Department of Mathematics, University of California, San Diego, La Jolla, CA, 92093, USA
Wen-xin Zhou

Corresponding author

Correspondence to Qi-man Shao.

Additional information

Supported by Hong Kong RGC GRF 14302515.

About this article

Cite this article

Shao, Qm., Zhou, Wx. Self-normalization: Taming a wild population in a heavy-tailed world. Appl. Math. J. Chin. Univ. 32, 253–269 (2017). https://doi.org/10.1007/s11766-017-3552-y

Received: 01 June 2017
Revised: 13 July 2017
Published: 07 September 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s11766-017-3552-y
