1 Introduction

In many psychological, biological and medical experiments, data are collected in a matched pairs design, e.g. when a homogeneous group of subjects is repeatedly observed under two conditions, called time points in the terminology of repeated measures designs. In such designs, different variances of the observations occur in a natural way, e.g. when data are collected over time. The data of such trials can be modeled by independent and identically distributed random vectors

\[
\mathbf{X}_i = (X_{i,1}, X_{i,2})', \quad i=1,\ldots,n, \tag{1.1}
\]

with expectation \(E(\mathbf{X}_1)=\mu=(\mu_1,\mu_2)'\) and an arbitrary positive definite covariance matrix \(\operatorname{Var}(\mathbf{X}_1)=\Sigma\). Our aim is to test the null hypothesis \(H_0:\mu_1=\mu_2\), or \(H_{0}^{(1)}:\mu_{1}\leq\mu_{2}\), in this semi-parametric framework.

The paired t-test type statistic \(|T_{n,stud}|\) with

\[
T_{n,stud} = \sqrt{n}\,\frac{\overline{D}_n}{V_n} \tag{1.2}
\]

is the commonly used statistic for testing \(H_0\), where \(D_i = X_{i,1}-X_{i,2}\) denote the differences of the pairs for \(i=1,\ldots,n\), \(\overline{D}_{n}=n^{-1}\sum_{i=1}^{n} D_{i} = \overline{X}_{1} - \overline{X}_{2}\) is the difference of the means, and \(V_{n}^{2}=(n-1)^{-1}\sum_{i=1}^{n} (D_{i}-\overline{D}_{n})^{2}\) denotes the sample variance of the \(D_i\)'s. As is commonly known, \(T_{n,stud}\) is exactly \(T(n-1)\)-distributed under \(H_0\) if the differences are normal, even for arbitrary \(\Sigma\). Under non-normality, \(T_{n,stud}\) is asymptotically standard normal by the central limit theorem, and its distribution may be approximated by a \(T(n-1)\)-distribution. For large sample sizes, the null hypothesis \(H_0:\mu_1=\mu_2\) will be rejected if \(|T_{n,stud}|\geq t_{1-\alpha/2}\), where \(t_{1-\alpha/2}\) denotes the \((1-\alpha/2)\)-quantile of the \(T(n-1)\)-distribution. Thus, the t-test can be equivalently written as

\[
\varphi_t = \mathbf{1}\{|T_{n,stud}| \geq t_{1-\alpha/2}\}. \tag{1.3}
\]

For testing \(H_{0}^{(1)}\), the t-test \(\varphi_t\) can be redefined by using \(T_{n,stud}\) as the test statistic in (1.3) and replacing the critical value \(t_{1-\alpha/2}\) by \(t_{1-\alpha}\). In a variety of papers and applications, however, it has been shown that the convergence of \(T_{n,stud}\) to its normal limit is rather slow, particularly for skewed distributions of the differences. For a detailed explanation we refer the reader to Munzel (1999).
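For illustration, the paired t-test in (1.2) and (1.3) can be sketched as follows (a minimal sketch; the function name and the toy data are our own, and the critical value is tabulated for this fixed \(n=7\)):

```python
import math

def paired_t_stat(x1, x2):
    """T_{n,stud} = sqrt(n) * mean(D) / V_n, cf. (1.2)."""
    d = [a - b for a, b in zip(x1, x2)]            # differences D_i
    n = len(d)
    dbar = sum(d) / n                              # mean difference
    v2 = sum((di - dbar) ** 2 for di in d) / (n - 1)   # sample variance V_n^2
    return math.sqrt(n) * dbar / math.sqrt(v2)

# hypothetical paired measurements, n = 7
x1 = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2]
x2 = [4.9, 4.7, 5.8, 5.6, 4.8, 5.5, 5.0]

t_stat = paired_t_stat(x1, x2)
t_crit = 2.447                       # tabulated t_{0.975} quantile, T(6)-distribution
reject = abs(t_stat) >= t_crit       # the test phi_t in (1.3)
```

For normal differences this is the exact level \(\alpha\) test; otherwise it relies on the large-sample approximation discussed above.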

It is the aim of the present paper to discuss the limit behaviour of various resampling versions of \(T_{n,stud}\) in order to improve its small sample properties under non-normality. Specific examples are various kinds of bootstrap and permutation resampling statistics. Although the data may not be exchangeable in model (1.1), an accurate and (asymptotically) valid level \(\alpha\) resampling test for \(H_0\) can be derived if (i) the resampling distribution of the statistic is asymptotically independent of the distribution of the data; (ii) the resampling distribution has a limit; and (iii) the distribution of the test statistic and the conditional resampling distribution (asymptotically) coincide, see Janssen (1997, 1999a, 1999b, 2005), Janssen and Pauls (2003, 2005), Neubert and Brunner (2007), Pauly (2011) or Omelka and Pauly (2012). The items (i)–(iii) will be referred to as the permanence property of resampling tests.

More details on theory and applications of bootstrap and permutation tests can be found in the monographs of Basso et al. (2009), Good (2005) as well as Pesarin and Salmaso (2010b). Moreover, when comparing more than one aspect of the data, Brombin et al. (2011) also discuss permutation tests for paired observations with a useful application. In particular, permutation approaches for multivariate data are intensively discussed by Pesarin and Salmaso (2012) and Brombin and Salmaso (2009). Both papers provide a detailed summary of existing procedures and some new developments. Regarding repeated measures designs, Pesarin and Salmaso (2010a) apply permutation tests and investigate their finite-sample properties.

The intuitive resampling and permutation strategies are to draw the differences \(D_i\) with replacement from the data, or to permute the variables \(X_{i,1}\) and \(X_{i,2}\) within the pairs, respectively. The drawback of both resampling schemes is that only a few permutations (\(2^n\)) are available, or that only a small variety within the resamples occurs, when \(n\) is rather small. The counterintuitive resampling or permutation strategies are either drawing the variables \(X_{i,s}^{\ast}\) with replacement from all \(2n\) observations \(X_{1,1},\ldots,X_{n,2}\), drawing them with replacement from each centered marginal sample \(X_{1,s}-\overline{X}_{s},\ldots, X_{n,s}-\overline{X}_{s}\), \(s=1,2\), separately, or permuting all \(2n\) observations in \(\mathbf{X}=(X_{1,1},X_{1,2},\ldots,X_{n,2})'\), and then repeatedly computing (e.g. 10,000 times) the paired t-test statistic. On the one hand, these counterintuitive resampling methods increase the resampling variability; on the other hand, the dependency structure within the pairs is neglected. In this paper, it will be shown that the intuitive as well as the counterintuitive resampling strategies, which neglect the dependency structure in the data, fulfill the permanence property, and thus the corresponding resampling tests are asymptotically valid. Extensive simulation studies show that especially the permutation-based approaches improve the paired t-test, even for extremely small sample sizes. The paper is organized as follows: In Sect. 2 we explain how resampling and permutation tests work and explain in detail why the resulting tests are asymptotically valid. In Sect. 3 extensive simulations are conducted to compare the different resampling schemes with the paired t-test. The paper closes with a discussion of the results. All technical details and proofs are given in the Appendix.

2 How do paired bootstrap and permutation tests work?

In this section we study various resampling versions of the paired t-test. Among other things, we point out why certain bootstrap and permutation tests, which neglect the dependency structure of the data within their resampling scheme, are asymptotically valid level \(\alpha\) tests for \(H_0\). Let \(\mathbf{X}^{\ast}=(\mathbf{X}_{1}^{\ast}, \ldots, \mathbf{X}_{n}^{\ast})'\), with \(\mathbf{X}_{i}^{\ast}=(X_{i,1}^{\ast},X_{i,2}^{\ast})\), denote \(n\) resampling vectors for \(i=1,\ldots,n\), given the original data \(\mathbf{X}\), where

  1. (I)

    \(\mathbf{X}^{\ast}\) is a random permutation of all data \(\mathbf{X}=(X_{1,1},X_{1,2},\ldots,X_{n,2})'\), or

  2. (II)

    \(\mathbf{X}_{i}^{\ast}\) is a random permutation of the sample unit \(\mathbf{X}_{i}'=(X_{i,1},X_{i,2})\), or

  3. (III)

    \(X_{i,s}^{\ast}\) is randomly drawn with replacement from all data X, or

  4. (IV)

    \(X_{i,s}^{\ast}\) is randomly drawn with replacement from each centered marginal sample \(\mathbf{X}_{s}=(X_{1,s}-\overline{X}_{s},\ldots, X_{n,s}-\overline{X}_{s})',s=1,2\), respectively.

The conditional resampling statistic of T n,stud is then given by

\[
T_{n,stud}^{\ast} = \sqrt{n}\,\frac{\overline{D}_{n}^{\ast}}{V_{n}^{\ast}}, \tag{2.1}
\]

where \(D_{i}^{\ast}= X_{i,1}^{\ast}- X_{i,2}^{\ast}\) denotes the differences of the resampling variables for i=1,…,n, \(\overline{D}_{n}^{\ast}= n^{-1}\sum_{i=1}^{n} D_{i}^{\ast}\) denotes their mean, and \(V_{n}^{\ast 2}=(n-1)^{-1}\sum_{i=1}^{n}(D_{i}^{\ast}-\overline{D}_{n}^{\ast})^{2}\) denotes the sample variance of the differences \(D_{i}^{\ast}\).
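The resampling schemes (I)–(IV) and the statistic (2.1) can be sketched as follows (a sketch with our own helper names, not the authors' implementation; `x` holds the pairs row-wise):

```python
import numpy as np

rng = np.random.default_rng(1)

def t_stud(d):
    """Studentized statistic sqrt(n) * mean / sd, cf. (1.2) and (2.1)."""
    n = d.size
    return np.sqrt(n) * d.mean() / d.std(ddof=1)

def resample_pairs(x, scheme, rng):
    """One resampling draw X* under schemes (I)-(IV); x has shape (n, 2)."""
    n = x.shape[0]
    if scheme == "I":        # permute all 2n observations
        return rng.permutation(x.ravel()).reshape(n, 2)
    if scheme == "II":       # permute within each pair (random swap)
        swap = rng.integers(0, 2, size=n).astype(bool)
        out = x.copy()
        out[swap] = out[swap][:, ::-1]
        return out
    if scheme == "III":      # draw with replacement from all 2n observations
        return rng.choice(x.ravel(), size=(n, 2), replace=True)
    if scheme == "IV":       # draw from each centered marginal sample
        cent = x - x.mean(axis=0)
        return np.column_stack([rng.choice(cent[:, s], size=n, replace=True)
                                for s in (0, 1)])
    raise ValueError(scheme)

x = rng.normal(size=(10, 2))                       # toy data, n = 10
schemes = ("I", "II", "III", "IV")
stats = {s: [t_stud(np.subtract(*resample_pairs(x, s, rng).T))
             for _ in range(200)] for s in schemes}   # T*_{n,stud} values
```

The empirical distribution of each `stats[s]` plays the role of the conditional resampling distribution used below.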

We point out that the studentization in the denominator of (2.1) is part of the resampling procedure, which is in accordance with the guidelines for bootstrap testing, see Hall and Wilson (1991), Beran (1997), Bickel and Freedman (1981), and Janssen (2005). Delaigle et al. (2011) have further shown that studentized resampling t-statistics are more robust and accurate than non-studentized statistics. The following explains how the corresponding resampling tests can be computed.

The introduced conditional resampling tests rely on a reference distribution \(\mathcal{L}(T_{n,stud}^{\ast}|\mathbf{X})\) given the data X. This means that the data are treated as fixed values, and quantiles from the conditional resampling distribution of \(T_{n,stud}^{\ast}\) are estimated to compute critical values. Denote by \(c_{n}^{\ast}(1-\alpha)\) the (1−α)-quantile of \(\mathcal {L}(T_{n,stud}^{\ast}|\mathbf{X})\). Then, according to the definition of the paired t-test in (1.3), conditional resampling tests can be written as

\[
\varphi_{n}^{\ast} = \mathbf{1}\{|T_{n,stud}| \geq c_{n}^{\ast}(1-\alpha/2)\}. \tag{2.2}
\]

Next we prove that \(T_{n,stud}^{\ast}\) as given in (2.1) is asymptotically standard normal under all of the different resampling schemes described above. In particular, we show that the permanence property is fulfilled and thus that \(\varphi_{n}^{\ast}\) is an asymptotically valid test for \(H_0\). The asymptotic normality is derived under arbitrary alternatives, i.e. we do not assume that \(H_0\) is true. To answer the question “How do paired bootstrap and permutation tests work?” we introduce the following criterion from Janssen and Pauly (2010), which uses the paired t-test as a benchmark for the resampling procedures.

Definition 2.1

The conditional tests \(\varphi_{n}^{*}\) defined in (2.2) are called

  1. (i)

    asymptotically effective under \(H_0\) with respect to the paired t-test, iff

    \[
    E_{H_0}\bigl(\bigl|\varphi_{n}^{\ast}-\varphi_{t}\bigr|\bigr) \longrightarrow 0 \quad \text{as } n\to\infty; \tag{2.3}
    \]
  2. (ii)

    consistent iff

    \[
    E\bigl(\varphi_{n}^{\ast}\bigr) \longrightarrow 1 \tag{2.4}
    \]

    for \(\mu_1 \neq \mu_2\) as \(n\to\infty\).

Now we can formulate

Theorem 2.1

The resampling tests \(\varphi_{n}^{*}\) defined in (2.2), based on the resampling statistic in (2.1), are asymptotically effective with respect to \(\varphi_t\) and consistent under all resampling schemes (I) through (IV).

From the proof it can be seen that a similar result also holds for one-sided versions of the tests. For further details see the Appendix. Specifically, Theorem 2.1 shows that the counterintuitive resampling procedures (I), (III) and (IV) are asymptotically valid, because studentized statistics are resampled. Roughly speaking, the studentization of the resampling variables “deletes” the dependency structure in the data when n is sufficiently large.

2.1 Resampling the differences \(D_i\)

In this subsection we introduce further resampling methods, particularly wild bootstrap methods, which are based on the differences \(D_i\). The wild bootstrap technique is motivated by the residual bootstrap commonly applied in regression analysis, see Wu (1986), Mammen (1992) and Beran (1997), and in time-series testing problems, see Kreiss and Paparoditis (2011). It has also been proposed in the context of survival analysis, see Lin (1997) or Beyersmann et al. (2012). Here, we adapt the wild bootstrap to the simple matched pairs design, and we will compare the accuracy of the resulting test procedures with the resampling tests based on (2.1) in extensive simulation studies. Let \(D_{1}^{\ast},\ldots,D_{n}^{\ast}\) denote \(n\) resampling variables given the original differences \(\mathbf{D}=(D_{1},\ldots,D_{n})'\), where \(D_{i}^{\ast}\) denotes the observed value from

  1. (V)

    drawing with replacement from all differences D, or

  2. (VI)

    from a wild bootstrap method with \(D_{i}^{\ast}= W_{i}D_{i}\), where \(W_i\), \(i=1,\ldots,n\), denote independent and identically distributed random variables, which are independent of the \(D_i\)'s, with \(E(W_1)=0\) and \(\operatorname{Var}(W_1)=1\).

The corresponding resampling tests are then defined as in (2.2) with the paired t-test type resampling statistic

\[
T_{n,stud}^{\ast} = \sqrt{n}\,\frac{\overline{D}_{n}^{\ast}}{V_{n}^{\ast}}, \tag{2.5}
\]

where now \(\overline{D}_{n}^{\ast}= n^{-1}\sum_{i=1}^{n} D_{i}^{\ast}\) denotes the mean of the resampled differences, and \(V_{n}^{\ast2} = (n-1)^{-1}\sum_{i=1}^{n}(D_{i}^{\ast}-\overline{D}_{n}^{\ast})^{2}\) denotes the sample variance of the \(D_{i}^{\ast}\)'s. The effectiveness of these resampling procedures is established in the next theorem.

Theorem 2.2

The resampling tests \(\varphi_{n}^{*}\) defined in (2.2) with the resampling statistic in (2.5) are asymptotically effective with respect to \(\varphi_t\) and consistent under both resampling schemes (V) and (VI).

Example and Remark 2.1

In our simulation study in Sect. 3, we will focus on the following weight examples. However, there are of course others that may be of interest for particular situations.

  1. (a)

    \(W_i\), \(i=1,\ldots,n\), is a sequence of independent and identically distributed two-point random variables with

    \[
    P\biggl(W_{1} = \frac{1+\sqrt{5}}{2}\biggr) = \frac{\sqrt{5}-1}{2\sqrt{5}}, \qquad P\biggl(W_{1} = \frac{1-\sqrt{5}}{2}\biggr) = \frac{\sqrt{5}+1}{2\sqrt{5}}.
    \]

    In this case it even holds that \(E(W_{1}^{3})=1\), in addition to \(E(W_1)=0\) and \(\operatorname{Var}(W_1)=1\). These wild bootstrap weights are typically used for studentized test statistics, see e.g. Kreiss and Paparoditis (2011). We will call the corresponding test Rademacher wild bootstrap.

  2. (b)

    \(W_i\), \(i=1,\ldots,n\), is a sequence of independent and identically distributed Gaussian random variables, i.e. \(W_i \sim N(0,1)\). This corresponds to the resampling procedure proposed by Lin (1997).

We note that Arlot et al. (2010a, 2010b) investigate wild bootstrap methods for multiple comparisons and confidence intervals in high-dimensional data using random signs W i ,i=1,…,n, with distribution P(W 1=−1)=P(W 1=1)=1/2. This resampling method, however, is equivalent to the resampling scheme (II). For further details we refer the reader to Janssen (1999b).
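Schemes (V) and (VI) can be sketched analogously (a sketch with our own names; the Gaussian weights of Remark 2.1(b) are used for the wild bootstrap, and the toy differences are simulated skewed data):

```python
import numpy as np

rng = np.random.default_rng(7)

def t_stud(d):
    """Studentized statistic sqrt(n) * mean / sd, cf. (2.5)."""
    n = d.size
    return np.sqrt(n) * d.mean() / d.std(ddof=1)

def resample_diffs(d, scheme, rng):
    """One draw D* under scheme (V) or (VI)."""
    n = d.size
    if scheme == "V":                 # draw differences with replacement
        return rng.choice(d, size=n, replace=True)
    if scheme == "VI":                # wild bootstrap: D*_i = W_i * D_i
        w = rng.standard_normal(n)    # Gaussian weights, E(W)=0, Var(W)=1
        return w * d
    raise ValueError(scheme)

d = rng.exponential(size=12) - 1.0    # skewed toy differences, n = 12
t_star_V = [t_stud(resample_diffs(d, "V", rng)) for _ in range(500)]
t_star_VI = [t_stud(resample_diffs(d, "VI", rng)) for _ in range(500)]
```

The empirical distributions of `t_star_V` and `t_star_VI` again serve as the conditional reference distributions for the tests in (2.2).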

Theorems 2.1 and 2.2 state that all the considered procedures fulfill the permanence property; thus, the corresponding tests \(\varphi_{n}^{\ast}\) are asymptotically valid. The numerical algorithm for the computation of the p-value is as follows:

  1. (1)

    Given the data X, compute the paired t-test statistic T n,stud as given in (1.2).

  2. (2)

    Repeat the resampling steps \(N\) times (e.g. \(N=10{,}000\)), compute the values \(T_{n,stud}^{\ast}\) and save them in \(A_{1},\ldots,A_{N}\).

  3. (3)

    Estimate the two-sided p-value by

    \[
    \hat{p} = 2\min(p_1,\, 1-p_1), \quad \text{where } p_1 = \frac{1}{N}\sum_{k=1}^{N} \mathbf{1}\{A_k \geq T_{n,stud}\}.
    \]

    The corresponding one-sided p-value is given by \(p_1\).
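Steps (1)–(3) can be combined into a single routine (a sketch with our own names and toy data; it uses the within-pair permutation scheme (II) for the resampling step, and estimates the two-sided p-value as 2 min(p₁, 1−p₁), one common choice):

```python
import numpy as np

rng = np.random.default_rng(42)

def resampling_p_value(x1, x2, n_resample=10_000, rng=rng):
    """Steps (1)-(3): permutation p-value, permuting within pairs (scheme II)."""
    d = np.asarray(x1) - np.asarray(x2)
    n = d.size
    t_obs = np.sqrt(n) * d.mean() / d.std(ddof=1)        # step (1): T_{n,stud}
    a = np.empty(n_resample)
    for k in range(n_resample):                          # step (2): A_1,...,A_N
        d_star = rng.choice([-1.0, 1.0], size=n) * d     # random within-pair swap
        a[k] = np.sqrt(n) * d_star.mean() / d_star.std(ddof=1)
    p1 = np.mean(a >= t_obs)                             # one-sided p-value
    return 2 * min(p1, 1 - p1)                           # step (3): two-sided

# hypothetical paired measurements, n = 10
x1 = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 5.4, 5.0, 5.3]
x2 = [4.9, 4.7, 5.8, 5.6, 4.8, 5.5, 5.0, 5.1, 4.9, 5.1]
p = resampling_p_value(x1, x2, n_resample=2000)
```

Any of the other schemes can be substituted in step (2) by replacing the line that generates `d_star`.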

3 Simulations

For testing the two-sided null hypothesis \(H_0:\mu_1=\mu_2\) formulated above, we consider the unconditional t-test \(\varphi_t\) based on the \(T(n-1)\)-approximation of the statistic \(T_{n,stud}\) in (1.2) and the various conditional resampling tests \(\varphi_{n}^{\ast}\) based on the resampling schemes (I) through (VI) as described in Sect. 2. The simulation studies investigate their behaviour with regard to maintaining the pre-assigned type-I error level under the hypothesis, and the power of the tests under alternatives. The observations \(\mathbf{X}_i=(X_{i,1},X_{i,2})'\), \(i=1,\ldots,n\), were generated using marginal distributions \(F_s\) and varying correlations \(\rho\in(-1,1)\). We generate exchangeable matched pairs having a bivariate normal, exponential, log-normal or uniform distribution, each with correlation \(\rho\in(-1,1)\), as well as non-exchangeable data by simulating

  1. (a)

    \(F_1=N(0,1)\) and \(F_2=N(0,2)\),

  2. (b)

    \(F_1=N(0,1)\) and \(F_2=N(0,4)\),

  3. (c)

    \(F_1=N(3,4)\) and \(F_2=\chi_{3}^{2}\), and

  4. (d)

    \(F_1=N(\exp(0.5),3)\) and \(F_2=LN(0,1)\),

each with correlation \(\rho\). Routine calculations show that \(\mu_1=\mu_2\) is fulfilled in all of these settings. We only consider the small sample sizes \(n=7\) and \(n=10\) throughout this paper. All simulations were conducted in the R computing environment, version 2.13.2 (www.r-project.org), each with \(nsim=10{,}000\) simulation runs and \(N=10{,}000\) bootstrap runs. The simulation results for exchangeable normally, exponentially, log-normally, and uniformly distributed matched pairs with the very small sample size of \(n=7\) and different correlations \(\rho\) are displayed in Table 1.
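Exchangeable matched pairs of this kind can be generated, for instance, from a bivariate normal distribution with correlation \(\rho\), applying the same monotone transform to both coordinates (a sketch with our own helper name; note that for non-normal margins the transform changes the correlation, so \(\rho\) then refers to the underlying normal scale):

```python
import numpy as np

rng = np.random.default_rng(0)

def bivariate_pairs(n, rho, transform=None, rng=rng):
    """n matched pairs from a bivariate normal with correlation rho;
    an optional transform (e.g. np.exp for log-normal margins) is applied
    to both coordinates, so the pairs stay exchangeable."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    return x if transform is None else transform(x)

x = bivariate_pairs(2000, rho=0.5, transform=np.exp)   # log-normal margins
emp_rho = np.corrcoef(x[:, 0], x[:, 1])[0, 1]          # attained correlation
```

For the log-normal case the attained product-moment correlation is smaller than the underlying \(\rho=0.5\), which is one reason correlations are varied over a grid in the simulations.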

Table 1 Type-I error level (α=5 %) simulations for very small sample sizes (n=7) with exchangeable distributions

It follows from Table 1 that the paired t-test is an accurate procedure for symmetric distributions (normal and uniform), even for the very small sample size of \(n=7\). When the data are skewed (exponential and log-normal), the t-test tends to be conservative. It is apparent that both wild bootstrap methods, using the Rademacher weights as defined in Remark 2.1(a) and the Gaussian weights given in Remark 2.1(b), are inappropriate tests for such small sample sizes. The resampling test with Rademacher weights is very liberal. This can be explained by the fact that these weights are highly skewed. Roughly speaking, both wild bootstrap resampling distributions are too far away from the distribution of \(T_{n,stud}\) when \(n\) is rather small and the original data are not resampled. Simply drawing the differences from the data with replacement cannot be recommended either. The corresponding test tends to be quite liberal when the data are skewed. This occurs because the resampling variability (i.e. the variability within the resampling variables \(D_{i}^{\ast}\)) is rather small when \(n=7\). However, drawing with replacement either from all \(2n\) observations or from each marginal sample separately results in more accurate test decisions. Comparing these results with the permutation-based approaches, it is easily seen that both kinds of permutation tests (i.e. permuting all data, or permuting within the sample unit) control the type-I error level for all distributions and all dependencies \(\rho\) in the data. Next we investigate the behaviour of the different resampling tests for the larger sample size \(n=10\). The simulation results are displayed in Table 2.

Table 2 Type-I error level (α=5 %) simulations for moderate sample sizes (n=10) with exchangeable distributions

From Table 2 an interesting phenomenon of the procedures based on the resampling schemes (IV) and (V) can be observed: the rejection rates do not converge monotonically in \(n\) to \(\alpha\). The tests are more liberal with \(n=10\) than with \(n=7\). Their liberality increases with increasing \(n\) up to a breakpoint of about \(n\approx 15\). For larger \(n\) (e.g. \(n\geq 30\)), all resampling tests based on drawing with replacement are accurate. The liberality of the wild bootstrap tests using Rademacher or Gaussian weights decreases. Both kinds of permutation approaches, however, are still the most accurate procedures.

Now we investigate how accurately the tests control the type-I error level when the two marginal distributions differ. The simulation results for the non-exchangeable distributions (a) through (d) with \(n=7\) and varying correlations are displayed in Table 3.

Table 3 Type-I error level (α=5 %) simulations for very small sample sizes (n=7) with non-exchangeable distributions (a) through (d) as described in the text

It follows from Table 3 that both permutation approaches are accurate, even for non-exchangeable distributions, \(n=7\), and permutations of all data \(\mathbf{X}\). When two distributions with extremely different shapes are compared at negative correlations (normal versus log-normal), they tend to be slightly liberal. The same conclusion, however, applies to the t-test. In Table 4 the simulation results for \(n=10\) and the same non-exchangeable distributions are given.

Table 4 Type-I error level (α=5 %) simulations for moderate sample sizes (n=10) with non-exchangeable distributions (a) through (d) as described in the text

For the larger sample size \(n=10\), both permutation approaches are accurate and demonstrate a behaviour similar to that of the t-test.

To compare the power of the tests, we generate bivariate normally and log-normally distributed matched pairs with \(n=10\) and \(n=20\), respectively, each with correlation \(\rho=1/2\). Here, the data at time point 2 were shifted by \(\delta\in(0,1)\). The simulation results for \(n=10\) are displayed in Table 5. Although the wild bootstrap methods using Rademacher and Gaussian weights, as well as the resampling tests based on schemes (III)–(V), were quite liberal in these situations, we included them in the power simulation study. To give a fair comparison between the procedures, however, we will not grade them in detail and concentrate on the t-test and the permutation-based approaches.

Table 5 Power (α=5 %) simulations for moderate sample sizes (n=10) and ρ=1/2

It follows from Table 5 that both permutation approaches have a power comparable to the t-test under normality. Under non-normality, the power of the permutation-based approaches is remarkably higher. The same conclusions can be drawn for \(n=20\), as can be seen from Table 6.

Table 6 Power (α=5 %) simulations for moderate sample sizes (n=20) and ρ=1/2

4 Discussion

We analyzed two different permutation approaches for testing \(H_0:\mu_1=\mu_2\) with paired data under non-normality. In particular, we demonstrated that the usual assumption of exchangeability is not necessary for the construction of permutation tests. We have analytically shown that permutation approaches which are based on permutations of all observed data (i.e. which neglect the dependency structure) are asymptotically valid procedures. The results are obtained by investigating the conditional permutation distribution of studentized statistics; none of the results in this paper would hold without the studentization. The investigation of permutation techniques in heteroscedastic repeated measures designs will be part of future research.

In this paper, only mean based approaches were considered. Rank-based studentized permutation tests are proposed by Konietschke and Pauly (2012).