Abstract
We derive limiting distributions of symmetrized estimators of scatter. Instead of considering all \(n(n-1)/2\) pairs of the n observations, we only use nd suitably chosen pairs, where \(d \ge 1\) is substantially smaller than n. It turns out that the resulting estimators are asymptotically equivalent to the original one whenever \(d = d(n) \rightarrow \infty\), no matter how slowly. We also investigate the asymptotic properties for arbitrary fixed d. These considerations and numerical examples indicate that for practical purposes, moderate fixed values of d between 10 and 20 already yield estimators which are computationally feasible and rather close to the original ones.
References
Barbour, A. D., Chen, L. H. Y. (eds.) (2005). An introduction to Stein’s method, Lecture Notes Series, Vol. 4, Institute for Mathematical Sciences, National University of Singapore, Singapore University Press.
Blom, G. (1976). Some properties of incomplete \(U\)-statistics. Biometrika, 63, 573–580.
Brown, B. M., Kildea, D. G. (1978). Reduced \(U\)-statistics and the Hodges–Lehmann estimator. The Annals of Statistics, 6, 828–835.
Dudley, R. M., Sidenko, S., Wang, Z. (2009). Differentiability of \(t\)-functionals of location and scatter. The Annals of Statistics, 37, 939–960.
Dümbgen, L. (1998). On Tyler’s \(M\)-functional of scatter in high dimension. Annals of the Institute of Statistical Mathematics, 50, 471–491.
Dümbgen, L., Nordhausen, K., Schuhmacher, H. (2014). fastM: Fast computation of multivariate M-estimators. R package, https://cran.r-project.org/web/packages/fastM
Dümbgen, L., Pauly, M., Schweizer, T. (2015). M-functionals of multivariate scatter. Statistics Surveys, 9, 32–105.
Dümbgen, L., Nordhausen, K., Schuhmacher, H. (2016). New algorithms for M-estimation of multivariate scatter and location. Journal of Multivariate Analysis, 144, 200–217.
Feller, W. (1945). The fundamental limit theorems in probability. Bulletin of the American Mathematical Society, 51, 800–832.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics, 19, 293–325.
Hoeffding, W. (1951). A combinatorial Central Limit Theorem. The Annals of Mathematical Statistics, 22, 558–566.
Kent, J. T., Tyler, D. E. (1991). Redescending \(M\)-estimates of multivariate location and scatter. The Annals of Statistics, 19, 2102–2119.
Lee, A. J. (1990). U-statistics—theory and practice (Vol. 110). New York: Marcel Dekker, Inc.
Miettinen, J., Nordhausen, K., Taskinen, S., Tyler, D. E. (2016). On the computation of symmetrized \(M\)-estimators of scatter. In C. Agostinelli, A. Basu, P. Filzmoser, D. Mukherjee (Eds.), Recent Advances in Robust Statistics: Theory and Applications (pp. 151–167). India: Springer.
Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B. (2017). Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10, 1–141.
Nordhausen, K., Tyler, D. E. (2015). A cautionary note on robust covariance plug-in methods. Biometrika, 102, 573–588.
Nordhausen, K., Oja, H., Ollila, E. (2008). Robust independent component analysis based on two scatter matrices. Austrian Journal of Statistics, 37, 91–100.
Paindaveine, D. (2008). A canonical definition of shape. Statistics and Probability Letters, 78, 2240–2247.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Wiley series in probability and mathematical statistics, New York: Wiley.
Sirkiä, S., Taskinen, S., Oja, H. (2007). Symmetrised M-estimators of multivariate scatter. Journal of Multivariate Analysis, 98, 1611–1629.
Stein, C. (1986). Approximate computation of expectations, Institute of mathematical statistics lecture notes—monograph series, Vol. 7, Hayward, CA: Institute of Mathematical Statistics.
Tyler, D. E. (1987). A distribution-free \(M\)-estimator of multivariate scatter. The Annals of Statistics, 15, 234–251.
Tyler, D. E., Critchley, F., Dümbgen, L., Oja, H. (2009). Invariant coordinate selection (with discussion). Journal of the Royal Statistical Society, Series B: Statistical Methodology, 71, 549–592.
Acknowledgements
We thank Sara Taskinen for stimulating discussions. Constructive comments of three referees are gratefully acknowledged.
Additional information
L. Dümbgen, work supported by the Swiss National Science Foundation.
A auxiliary results
A.1 A particular coupling of random permutations
Preparations. For an integer \(n \ge 1\), let \({\mathcal {S}}_n\) be the set of all permutations of \(\langle n\rangle := \{1,2,\ldots ,n\}\). A cycle in \({\mathcal {S}}_n\) is a permutation \(\sigma \in {\mathcal {S}}_n\) such that for \(m \ge 1\) pairwise different points \(a_1,\ldots ,a_m \in \langle n\rangle\),
\[ \sigma (a_i) = a_{i+1} \ \ \text{for} \ 1 \le i < m \quad \text{and} \quad \sigma (a_m) = a_1, \]
while \(\sigma (i) = i\) for \(i \in \langle n\rangle {\setminus } \{a_1,\ldots ,a_m\}\). (In the case of \(m = 1\), \(\sigma (i) = i\) for all \(i \in \langle n\rangle\).) We write
\[ \sigma = (a_1,\ldots ,a_m)_{\textrm{c}} \]
for this mapping and note that it has m equivalent representations
\[ (a_1,a_2,\ldots ,a_m)_{\textrm{c}} = (a_2,\ldots ,a_m,a_1)_{\textrm{c}} = \cdots = (a_m,a_1,\ldots ,a_{m-1})_{\textrm{c}}. \]
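Any permutation \(\sigma \in {\mathcal {S}}_n\) can be written as
\[ \sigma = (a_{11},\ldots ,a_{1m(1)})_{\textrm{c}} \circ (a_{21},\ldots ,a_{2m(2)})_{\textrm{c}} \circ \cdots \circ (a_{k1},\ldots ,a_{km(k)})_{\textrm{c}}, \]
where the sets \(\{a_{j1},\ldots ,a_{jm(j)}\}\), \(1 \le j \le k\), form a partition of \(\langle n\rangle\). Note that the cycles \((a_{j1},\ldots ,a_{jm(j)})_{\textrm{c}}\), \(1\le j\le k\), commute. This representation of \(\sigma\) as a combination of cycles is unique if we require, for instance, that
\[ a_{jm(j)} = \min \{a_{j1},\ldots ,a_{jm(j)}\} \quad \text{for} \ 1 \le j \le k \]
and
\[ a_{1m(1)}< a_{2m(2)}< \cdots < a_{km(k)}. \]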
In what follows, let \({\mathcal {S}}_n^*\) be the set of all permutations \(\sigma \in {\mathcal {S}}_n\) consisting of just one cycle, i.e.,
\[ \sigma = (a_1,a_2,\ldots ,a_n)_{\textrm{c}} \]
with pairwise different numbers \(a_1, a_2, \ldots , a_n \in \langle n\rangle\).
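The coupling. The standardized cycle representation of \(\sigma \in {\mathcal {S}}_n\) gives rise to a particular mapping \({\mathcal {S}}_n \ni \pi \mapsto (\sigma ,\sigma ^*) \in {\mathcal {S}}_n \times {\mathcal {S}}_n^*\) such that \(\pi \mapsto \sigma\) is bijective. For fixed \(\pi \in {\mathcal {S}}_n\) and any index \(i \in \langle n\rangle\), let
\[ M_i := \langle n\rangle {\setminus } \{\pi (s) : 1 \le s < i\} = \{\pi (i),\pi (i+1),\ldots ,\pi (n)\}, \]
i.e., \(\langle n\rangle = M_1 \supset M_2 \supset \cdots \supset M_n = \{\pi (n)\}\), and \(\# M_i = n+1-i\). Let \(1 \le t_1< t_2< \cdots < t_k = n\) be those indices i such that \(\pi (i) = \min (M_i)\). Then,
\[ \sigma := (\pi (1),\ldots ,\pi (t_1))_{\textrm{c}} \circ (\pi (t_1+1),\ldots ,\pi (t_2))_{\textrm{c}} \circ \cdots \circ (\pi (t_{k-1}+1),\ldots ,\pi (t_k))_{\textrm{c}} \]
defines a permutation of \(\langle n\rangle\) with standardized cycle representation. This is essentially the construction used by Feller (1945) to investigate the number of cycles of a random permutation. Moreover,
\[ \sigma ^* := (\pi (1),\pi (2),\ldots ,\pi (n))_{\textrm{c}} \]
defines a permutation in \({\mathcal {S}}_n^*\) such that
\[ \{i \in \langle n\rangle : \sigma (i) \ne \sigma ^*(i)\} \subset \{\pi (t_1),\pi (t_2),\ldots ,\pi (t_k)\}. \]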
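The construction can be checked numerically for small n. The following Python sketch is an illustration, not code from the paper. It assumes that \(M_i = \{\pi (i),\ldots ,\pi (n)\}\), that \(\sigma\) closes a cycle at every index i with \(\pi (i) = \min (M_i)\), and that \(\sigma ^*\) is the single cycle through \(\pi (1),\ldots ,\pi (n)\) in this order. The function names `coupling`, `is_single_cycle` and `check_coupling` are hypothetical.

```python
from itertools import permutations


def coupling(pi):
    """Map pi (tuple with pi[i-1] = pi(i)) to (sigma, sigma_star, k).

    sigma and sigma_star are dicts mapping each point to its image;
    k is the number of indices i with pi(i) = min(M_i).
    Assumption: M_i is the set {pi(i), ..., pi(n)}.
    """
    remaining = set(pi)      # M_1 = {1, ..., n}
    sigma = {}
    start = 0                # 0-based position where the current cycle begins
    k = 0
    for i, value in enumerate(pi):
        if value == min(remaining):          # pi(i) = min(M_i): close a cycle
            cycle = pi[start:i + 1]
            for a, b in zip(cycle, cycle[1:] + cycle[:1]):
                sigma[a] = b
            start = i + 1
            k += 1
        remaining.remove(value)              # pass from M_i to M_{i+1}
    # sigma_star is the single cycle (pi(1), pi(2), ..., pi(n)).
    sigma_star = dict(zip(pi, pi[1:] + pi[:1]))
    return sigma, sigma_star, k


def is_single_cycle(perm):
    """Return True if the permutation (dict) consists of exactly one cycle."""
    start = min(perm)
    current, steps = perm[start], 1
    while current != start:
        current = perm[current]
        steps += 1
    return steps == len(perm)


def check_coupling(n):
    """Enumerate all permutations of {1, ..., n} and check three properties."""
    results = [coupling(p) for p in permutations(range(1, n + 1))]
    images = {tuple(s[i] for i in range(1, n + 1)) for s, _, _ in results}
    bijective = len(images) == len(results)
    one_cycle = all(is_single_cycle(ss) for _, ss, _ in results)
    few_changes = all(sum(s[i] != ss[i] for i in s) <= k for s, ss, k in results)
    return bijective, one_cycle, few_changes
```

Calling `check_coupling(4)` enumerates all 24 permutations and reports whether \(\pi \mapsto \sigma\) is injective, whether every \(\sigma ^*\) is a single cycle, and whether \(\sigma\) and \(\sigma ^*\) differ in at most k points.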
Suppose that \(\pi\) is a random permutation with uniform distribution on \({\mathcal {S}}_n\). Then, \(\sigma\) is a random permutation with uniform distribution on \({\mathcal {S}}_n\) too, because \(\pi \mapsto \sigma\) is a bijection. Since the conditional distribution of \(\pi (i)\), given \((\pi (s))_{1 \le s < i}\), is the uniform distribution on \(M_i\), the random variables
\[ Y_i := 1_{[\pi (i) = \min (M_i)]}, \quad 1 \le i \le n, \]
are stochastically independent Bernoulli random variables with \(\textrm{I}\!\textrm{P}(Y_i = 1) = (n+1-i)^{-1} = 1 - \textrm{I}\!\textrm{P}(Y_i = 0)\). Consequently, the number \(k = \sum _{i=1}^n Y_i\) of indices i with \(\pi (i) = \min (M_i)\) satisfies
\[ \textrm{I}\!\textrm{E}(k) = \sum _{i=1}^n \textrm{I}\!\textrm{P}(Y_i = 1) = \sum _{j=1}^n j^{-1} \le 1 + \log (n), \]
because \(j^{-1} \le \int _{j-1}^j x^{-1} \, dx = \log (j) - \log (j-1)\) for \(2 \le j \le n\).
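The harmonic-sum bound can be illustrated by brute force. The sketch below is not from the paper. It assumes that \(M_i = \{\pi (i),\ldots ,\pi (n)\}\) and counts, for every permutation of a small set, the indices i with \(\pi (i) = \min (M_i)\). The exact mean of this count is then compared with \(\sum _{j=1}^n j^{-1}\) and with \(1 + \log (n)\). The helper names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import permutations


def count_min_indices(pi):
    """Number of indices i with pi(i) = min(M_i), where M_i = {pi(i), ..., pi(n)}."""
    remaining = set(pi)
    count = 0
    for value in pi:
        if value == min(remaining):
            count += 1
        remaining.remove(value)
    return count


def exact_mean_count(n):
    """Exact expectation of the count under the uniform distribution on S_n."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(count_min_indices(p) for p in perms)
    return Fraction(total, len(perms))


def harmonic(n):
    """Harmonic number 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, n + 1))


def log_bound_holds(n):
    """Check the inequality harmonic(n) <= 1 + log(n) numerically."""
    return float(harmonic(n)) <= 1 + math.log(n)
```

Exact rational arithmetic via `fractions.Fraction` avoids rounding issues when comparing the enumerated mean with the harmonic number. Full enumeration is only feasible for small n, since it visits all n! permutations.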
A.2 Some inequalities related to Lindeberg-type conditions
In connection with Gaussian approximations and Stein’s method, see Stein (1986) or Barbour and Chen (2005), the quantity
for a square-integrable random variable X plays an important role. Elementary considerations show that
for arbitrary \(x \in {\mathbb {R}}\). Moreover, \(h: {\mathbb {R}}\rightarrow [0,\infty )\) is an even, convex function such that \(h(2x) \le 8 h(x)\). Consequently, for arbitrary \(x,y \in {\mathbb {R}}\), Jensen’s inequality implies that
\[ h(x+y) = h\Bigl (\frac{2x + 2y}{2}\Bigr ) \le \frac{h(2x) + h(2y)}{2} \le 4 h(x) + 4 h(y). \]
For a symmetric matrix \(A \in {\mathbb {R}}^{n\times n}\), we define its row means \(\bar{A}_i:= n^{-1} \sum _{j=1}^n A_{ij}\) and its overall mean \(\bar{A}:= n^{-2} \sum _{i,j=1}^n A_{ij}\). Let \(\tilde{A}:= (A_{ij} - \bar{A}_i - \bar{A}_j + \bar{A})_{i,j=1}^n\). Then, elementary calculations and the previous inequalities reveal that
and
About this article
Cite this article
Dümbgen, L., Nordhausen, K. Approximating symmetrized estimators of scatter via balanced incomplete U-statistics. Ann Inst Stat Math 76, 185–207 (2024). https://doi.org/10.1007/s10463-023-00879-1
DOI: https://doi.org/10.1007/s10463-023-00879-1