1 Introduction

Suppose we observe data \(X_{1},\dots ,X_{n}\in \mathcal {X}\), and would like to test the null hypothesis

$$ H_{0}: X_{1},\dots,X_{n}\ \text{are exchangeable}. $$
(1.1)

(Note that the hypothesis that the Xi’s are i.i.d. is a special case of this null.) We assume that we have a pre-specified test statistic, which is a function \(T: \mathcal {X}^{n} \rightarrow \mathbbm {R}\), where, without loss of generality, we let larger values of \(T(X) = T(X_{1},\dots ,X_{n})\) indicate evidence in favor of an alternative hypothesis.

Since the null distribution of the Xi’s is not specified exactly, we usually do not know the null distribution of T(X). The permutation test avoids this difficulty by comparing T(X) against the same function applied to permutations of the data. To elaborate, let \(\mathcal {S}_{n}\) denote the set of all permutations on \([n]:=\{1,\dots ,n\}\), and define

$$ x_{\sigma} := (x_{\sigma(1)},\dots,x_{\sigma(n)}) $$

for any \(x\in \mathcal {X}^{n}\) and any \(\sigma \in \mathcal {S}_{n}\). Then, we can compute a p-value

$$ P = \frac{1}{n!}\sum\limits_{\sigma\in\mathcal{S}_{n}} \mathbbm{1}\left\{T(X_{\sigma})\geq T(X)\right\}, $$
(1.2)

which ranks T(X) amongst \(\{T(X_{\sigma })\}_{\sigma \in \mathcal {S}_{n}}\) sorted in decreasing order. Then, under the null hypothesis H0, P is a valid p-value, meaning \(\mathbb {P}_{H_{0}}\left \{P\leq \alpha \right \}\leq \alpha \) for all α ∈ [0,1].
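
For concreteness, the following minimal Python sketch (ours, purely illustrative; the data and statistic are placeholders) evaluates the p-value in Eq. 1.2 by brute-force enumeration, which is feasible only for very small n.

```python
import itertools
import math

import numpy as np


def full_permutation_pvalue(X, T):
    """Eq. 1.2: the fraction of all n! permutations sigma with T(X_sigma) >= T(X)."""
    X = np.asarray(X)
    n = len(X)
    t_obs = T(X)
    count = sum(T(X[np.array(sigma)]) >= t_obs
                for sigma in itertools.permutations(range(n)))
    return count / math.factorial(n)


# Toy usage: a statistic that is large when the first half of the sample is large.
rng = np.random.default_rng(0)
X = rng.normal(size=6)
print(full_permutation_pvalue(X, lambda x: x[:3].sum()))
```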

As an example, suppose that the observed data set actually consists of pairs (Xi,Yi), which are assumed to be i.i.d. from some joint distribution. If we are interested in testing the hypothesis \(H_{0}^{\prime }\) that \(X_{i}\) and \(Y_{i}\) are independent, we can reframe this question as testing whether \(X_{1},\dots ,X_{n}\) are i.i.d. conditional on \(Y_{1},\dots ,Y_{n}\)—in particular, under \(H_{0}^{\prime }\), it holds that X follows an exchangeable distribution conditional on Y. Our test statistic T might be chosen as

$$T(X) = \left|\text{Corr}\big((X_{1}, \dots,X_{n}),(Y_{1},\dots,Y_{n})\big)\right|.$$

In order to see whether the observed correlation is sufficiently large to be statistically significant, we would compare T(X) to the correlations computed on permuted data,

$$T(X_{\sigma})=\left|\text{Corr}\big((X_{\sigma(1)},\dots,X_{\sigma(n)}),(Y_{1},\dots,Y_{n})\big)\right|.$$

The resulting p-value computed as in Eq. 1.2 is then a valid p-value under the null hypothesis \(H_{0}^{\prime }\). In addition to testing independence, permutation tests are also commonly used for testing other hypotheses, such as whether two samples follow the same distribution.
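
Continuing the independence example, here is a small sketch (ours; the simulated data, sample size, and seed are arbitrary) of the correlation-based permutation test, again by full enumeration over a small sample.

```python
import itertools
import math

import numpy as np

rng = np.random.default_rng(1)
n = 7
Y = rng.normal(size=n)
X = 0.5 * Y + rng.normal(size=n)        # X correlated with Y, so H0' is false


def T(x):
    """|Corr| between the (permuted) X and the fixed Y."""
    return abs(np.corrcoef(x, Y)[0, 1])


t_obs = T(X)
count = sum(T(X[np.array(sigma)]) >= t_obs
            for sigma in itertools.permutations(range(n)))
print("permutation p-value:", count / math.factorial(n))
```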

The p-value P computed in Eq. 1.2 requires computing T(Xσ) for every \(\sigma \in \mathcal {S}_{n}\). One may naturally be interested in reducing the computational cost of this procedure, since computing T(Xσ) for all \(|\mathcal {S}_{n}|=n!\) permutations may be prohibitive for even moderately large n. As is well known, we can obtain valid p-values by uniformly randomly sampling permutations from \(\mathcal {S}_{n}\) and computing

$$ P = \frac{1+{\sum}_{m=1}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}})\geq T(X)\right\}}{M+1}, $$
(1.3)

in which σ1,…,σM are i.i.d. uniform draws from \(\mathcal {S}_{n}\).
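
The Monte Carlo version in Eq. 1.3 can be sketched as follows (an illustrative implementation of ours; the difference-in-means statistic in the usage line is a toy choice).

```python
import numpy as np


def sampled_permutation_pvalue(X, T, M=999, rng=None):
    """Eq. 1.3: P = (1 + #{m : T(X_{sigma_m}) >= T(X)}) / (M + 1),
    with sigma_1, ..., sigma_M drawn i.i.d. and uniformly from S_n."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X)
    t_obs = T(X)
    count = sum(T(rng.permutation(X)) >= t_obs for _ in range(M))
    return (1 + count) / (M + 1)


# Toy usage on a sample of size 50.
X = np.random.default_rng(2).normal(size=50)
print(sampled_permutation_pvalue(X, lambda x: x[:25].mean() - x[25:].mean()))
```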

In a different direction, one can also restrict the set of permutations to a subset of \(\mathcal {S}_{n}\). Specifically, let \(G\subseteq \mathcal {S}_{n}\) be any subset, and define

$$ P = \frac{1}{|G|}\sum\limits_{\sigma\in G} \mathbbm{1}\left\{T(X_{\sigma})\geq T(X)\right\}, $$
(1.4)

where |G| is the cardinality of G.

Clearly, if G does not contain the identity permutation, then P cannot be a p-value because it could potentially take on the value zero. However, including the identity permutation is not sufficient. The literature repeatedly emphasizes that P defined in Eq. 1.4 is a valid p-value only if the subset \(G\subseteq \mathcal {S}_{n}\) is in fact a subgroup (Hemerik and Goeman, 2018, Theorem 1).

The subgroup G mentioned above may be chosen strategically to balance between computational efficiency and the power of the test (see, e.g., Hemerik and Goeman (2018) and Koning and Hemerik (2022)). In case G has a large cardinality, the aforementioned references show that sampling permutations uniformly at random from G also yields valid p-values—that is, the randomized p-value P from Eq. 1.3 is valid if \(\sigma _{1},\dots ,\sigma _{M}\) are i.i.d. samples drawn uniformly from G rather than from \(\mathcal {S}_{n}\). Again, choosing G to be a subgroup (rather than an arbitrary subset), and sampling uniformly rather than from an arbitrary distribution, are both important for the validity of the resulting p-value.

1.1 Contributions

The background above naturally leads to the following question: while it is indeed correct that P from Eq. 1.4 is not, in general, a valid p-value for subsets \(G \subseteq \mathcal {S}_{n}\) that are not subgroups, is it possible to slightly modify the definition of P so that it retains its validity for such subsets? Further, while sampling the σm’s nonuniformly from G would destroy the validity of P from Eq. 1.3, can we modify the definition of P so that nonuniform sampling from a set is allowed? The first question is addressed by Hemerik and Goeman (2018), as we will describe below; to our knowledge, the second question has not been addressed in the literature.

In this paper, we will broaden the applicability of permutation tests and present generalizations, which yield valid p-values even when we sample permutations—with or without replacement—from a non-uniform distribution over all permutations or from an arbitrary subset of permutations. In doing so, we shall carefully explain how this generalization relates to and extends all previous options. This generalization yields new and more flexible permutation test methods; we leave a detailed study of the pros and cons of these generalizations (such as how they trade off the two types of errors) to future work.

2 A Generalized Permutation Test

2.1 Testing with an Arbitrary Distribution

We now present our first generalization of the permutation test. It allows us to use any (not necessarily uniform) distribution over \(\mathcal {S}_{n}\) in order to construct our permutation p-value.

Theorem 2.1.

Let q be any distribution over \(\sigma \in \mathcal {S}_{n}\). Let \(\sigma _{0}\sim q\) be a random draw, and define

$$ P = \sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma\circ\sigma_{0}^{-1}})\geq T(X)\right\}. $$
(2.1)

Then P is a valid p-value, i.e., \(\mathbb {P}_{H_{0}}\left \{P\leq \alpha \right \}\leq \alpha \) for all α ∈ [0,1].

In this theorem, validity is retained when conditioning on the order statistics of the data, meaning that \(\mathbb {P}_{H_{0}}\left \{P\leq \alpha | X_{(1)},\dots ,X_{(n)}\right \}\leq \alpha \), where \(X_{(1)}\leq {\dots } \leq X_{(n)}\) are the order statistics of \(X=(X_{1},\dots ,X_{n})\). The reason that this holds is simply because H0 remains true even conditional on the order statistics—that is, if X is exchangeable, then \(X\mid (X_{(1)},\dots ,X_{(n)})\) is again exchangeable. The same conditional validity holds for all results to follow. However, one cannot condition on σ0; the result only holds marginally over σ0, and this external randomization is key to retaining validity.

We defer the proof to Section 2.3—we will first discuss connections to the existing literature in order to provide more context and intuition for the above theorem.
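
To make Theorem 2.1 concrete, the following sketch (ours, not part of the original development) evaluates Eq. 2.1 when q is supported on a small, explicitly listed set of permutations. A permutation σ is stored as a 0-indexed array `sigma` with \(x_{\sigma}\) given by `x[sigma]`, so that \(\sigma_{0}^{-1}\) is `np.argsort(sigma0)` and \(\sigma\circ\sigma_{0}^{-1}\) is `sigma[inv0]`.

```python
import numpy as np


def generalized_pvalue(X, T, perms, weights, rng=None):
    """Eq. 2.1 (sketch): draw sigma_0 ~ q, then
    P = sum_sigma q(sigma) * 1{T(X_{sigma o sigma_0^{-1}}) >= T(X)}.
    `perms` lists the permutations supporting q; `weights` are their probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X)
    perms = [np.asarray(s) for s in perms]
    weights = np.asarray(weights, dtype=float)
    sigma0 = perms[rng.choice(len(perms), p=weights)]   # sigma_0 ~ q
    inv0 = np.argsort(sigma0)                           # sigma_0^{-1}
    t_obs = T(X)
    return float(sum(w * (T(X[s[inv0]]) >= t_obs) for w, s in zip(weights, perms)))


# Toy usage: a nonuniform q supported on three permutations of n = 4 points.
perms = [np.array([0, 1, 2, 3]), np.array([2, 3, 0, 1]), np.array([1, 0, 3, 2])]
X = np.random.default_rng(3).normal(size=4)
print(generalized_pvalue(X, lambda x: x[0] + x[1], perms, [0.5, 0.25, 0.25]))
```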

Uniform Distribution Over a Subgroup

To begin with, assume q is the uniform distribution over a fixed subgroup G of \(\mathcal {S}_{n}\). Then in this case, the p-value in Eq. 2.1 takes the special form

$$ P = \frac{1}{|G|}\sum\limits_{\sigma\in G} \mathbbm{1}\left\{T(X_{\sigma\circ\sigma_{0}^{-1}})\geq T(X)\right\} = \frac{1}{|G|}\sum\limits_{\sigma\in G} \mathbbm{1}\left\{T(X_{\sigma})\geq T(X)\right\}, $$

where the second equality holds because a subgroup G is closed under inverses and composition, so \(\{\sigma \circ \sigma _{0}^{-1} : \sigma \in G\} = G\) for any σ0 ∈ G. This simple observation recovers a well-known fact we discussed earlier; namely, one can restrict the set of permutations to an arbitrary subgroup, and the p-value P defined in Theorem 2.1 will then coincide with our earlier definition Eq. 1.4 (proved to be a valid p-value in (Hemerik and Goeman, 2018, Theorem 1)).

Uniform Distribution Over a Subset

Consider now a uniform distribution over an arbitrary subset S that is not a subgroup. In this case, the definition of P in Theorem 2.1 is equal to

$$ P = \frac{1}{|S|}\sum\limits_{\sigma\in S} \mathbbm{1}\left\{T(X_{\sigma\circ\sigma_{0}^{-1}})\geq T(X)\right\}, $$
(2.2)

as proposed earlier by Hemerik and Goeman (2018). This is, in general, not the same as

$$ P^{\prime} = \frac{1}{|S|}\sum\limits_{\sigma\in S} \mathbbm{1}\left\{T(X_{\sigma})\geq T(X)\right\} $$
(2.3)

(which is equivalent to the quantity defined in Eq. 1.4 earlier, with the subset S in place of a subgroup G). As we shall see below, \(P^{\prime }\) is generally not a valid p-value, a fact which can cause serious issues in practice, as has been emphasized in the literature. For example, consider the tool of balanced permutations: in the setting of testing whether a randomly assigned treatment has a zero or nonzero effect, this method has been proposed as a variant of the permutation test, where the subset S consists of all permutations such that the permuted treatment group contains exactly half of the original treatment group, and half of the original control group. Southworth et al. (2009) show that the quantity \(P^{\prime }\) computed in Eq. 2.3 for this choice of subset S can be substantially anti-conservative, i.e., \(\mathbb {P}\left \{P^{\prime }\leq \alpha \right \}> \alpha \), particularly for low significance levels α. (See also Hemerik and Goeman (2018) for additional discussion of this issue.)

A simple example may help to illustrate this point, and to give intuition for the role of the random permutation σ0.

Example 2.1.

Let n = 4, and consider the set

$$ S = \{\text{Id} , \sigma_{1\leftrightarrow 3, 2\leftrightarrow 4}, \sigma_{1\leftrightarrow 4, 2\leftrightarrow 3}\},$$

where, e.g., \(\sigma _{1\leftrightarrow 3, 2\leftrightarrow 4}\) is the permutation swapping entries 1 and 3 and also swapping 2 and 4. Let \(X_{1},X_{2},X_{3},X_{4}\overset {iid}{\sim } \mathcal {N}(0,1)\) be standard normal random variables (so that the null hypothesis of exchangeability, H0, is satisfied), and set T(X) = X1 + X2. Then the quantity \(P^{\prime }\) defined in Eq. 2.3 is equal to

$$ P^{\prime} = \frac{\mathbbm{1}\left\{X_{1}+X_{2}\geq X_{1}+X_{2}\right\} + \mathbbm{1}\left\{X_{3}+X_{4}\geq X_{1}+X_{2}\right\} + \mathbbm{1}\left\{X_{4}+X_{3}\geq X_{1}+X_{2}\right\}}{3}. $$

This gives

$$ P^{\prime} = \begin{cases}\frac{1+0+0}{3} = \frac{1}{3}, & \text{ if }X_{3} + X_{4} < X_{1} + X_{2} , \\ \frac{1+1+1}{3}= 1, & \text{ otherwise}\end{cases} $$

and, therefore,

$$P^{\prime} = \begin{cases} \frac{1}{3},& \text{with probability } \frac{1}{2},\\ 1,& \text{with probability } \frac{1}{2}.\end{cases}$$

We can thus see that \(P^{\prime }\) is anti-conservative at the threshold \(\alpha = \frac {1}{3}\).

Next, we will see how the correction in Eq. 2.2 fixes the failure described above. Denote by Pσ the p-value in Eq. 2.2 calculated conditional on the random σ0 being equal to σ, so that

$$ P = \begin{cases} P_{\text{Id}}, & \text{ w.p. } 1/3, \\ P_{\sigma_{1\leftrightarrow 4, 2\leftrightarrow 3}}, & \text{ w.p. } 1/3, \\ P_{\sigma_{1\leftrightarrow 3, 2\leftrightarrow 4}}, & \text{ w.p. } 1/3. \end{cases} $$
(2.4)

The calculation performed above for \(P^{\prime }\) then shows that

$$ P_{\text{Id}} = \begin{cases} \frac{1}{3}, & \text{ if }X_{3} + X_{4} < X_{1} + X_{2} , \\ 1, & \text{ otherwise}.\end{cases} $$

A similar straightforward calculation then yields

$$ P_{\sigma_{1\leftrightarrow 3, 2\leftrightarrow 4}} = P_{\sigma_{1\leftrightarrow 4, 2\leftrightarrow 3}} = \begin{cases} \frac{2}{3}, & \text{ if }X_{3} + X_{4} < X_{1} + X_{2} , \\ 1, & \text{ otherwise}.\end{cases} $$

Put together, we obtain

$$ P = \begin{cases} \frac{1}{3}, & \text{ w.p. } 1/6, \\ \frac{2}{3}, & \text{ w.p. } 1/3, \\ 1, & \text{ w.p. } 1/2. \end{cases} $$
(2.5)

This is indeed stochastically larger than uniform, and is thus a valid p-value.
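
A quick simulation sketch (ours) reproduces these numbers, estimating the null probabilities that the uncorrected \(P^{\prime}\) of Eq. 2.3 and the corrected P of Eq. 2.2 fall at or below 1/3.

```python
import numpy as np

rng = np.random.default_rng(0)
S = [np.array([0, 1, 2, 3]),     # Id
     np.array([2, 3, 0, 1]),     # swap 1<->3 and 2<->4
     np.array([3, 2, 1, 0])]     # swap 1<->4 and 2<->3
T = lambda x: x[0] + x[1]
reps = 20000
p_prime = np.empty(reps)
p_corrected = np.empty(reps)
for r in range(reps):
    X = rng.normal(size=4)                                   # H0 holds: X is exchangeable
    t_obs = T(X)
    p_prime[r] = np.mean([T(X[s]) >= t_obs for s in S])      # Eq. 2.3 (no sigma_0)
    sigma0 = S[rng.integers(len(S))]                         # sigma_0 ~ Unif(S)
    inv0 = np.argsort(sigma0)
    p_corrected[r] = np.mean([T(X[s[inv0]]) >= t_obs for s in S])   # Eq. 2.2
print("P{P' <= 1/3}:", np.mean(p_prime <= 1 / 3))        # approx 1/2 (anti-conservative)
print("P{P  <= 1/3}:", np.mean(p_corrected <= 1 / 3))    # approx 1/6 (valid)
```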

The role of \(\sigma_{0}\)

To better understand the role of the random permutation σ0, let us consider Example 2.1 again, and look more closely at what goes wrong there. We observe that \(P^{\prime }\) compares the observed statistic T(X) against the set \( \{T(X_{\sigma })\}_{\sigma \in S} = \{T(X_{\text {Id}}),T(X_{\sigma _{1\leftrightarrow 3, 2\leftrightarrow 4}}) , T(X_{\sigma _{1\leftrightarrow 4, 2\leftrightarrow 3}})\}\). For \(P^{\prime }\) to be a valid p-value, given the (unordered) set of potential data vectors \(\{X_{\text {Id}},X_{\sigma _{1\leftrightarrow 3, 2\leftrightarrow 4}} , X_{\sigma _{1\leftrightarrow 4, 2\leftrightarrow 3}}\}\), it suffices that the actual observed data X is equally likely to be any one of these three. Now suppose this set is equal to {(0.8,0.5,0.2,1.0),(1.0,0.2,0.5,0.8),(0.5,0.8,1.0,0.2)}, in no particular order. Each of these three vectors has equal likelihood under H0 (due to exchangeability). Counterintuitively, however, our knowledge of the subset of permutations S implies that we must have X = (1.0,0.2,0.5,0.8)—otherwise we could not have obtained this particular set. For instance, if X = (0.8,0.5,0.2,1.0), then we would have \(X_{\sigma _{1\leftrightarrow 3, 2\leftrightarrow 4}} = (0.2, 1.0, 0.8, 0.5)\)—but this does not lie in our set, so it cannot be the correct value of X. In other words, if we condition on the unordered set {Xσ}σS, which is the orbit of the data X under the actions of permutations σS, our intuition tells us that X is equally likely to be any element of this orbit—but in fact, for a non-subgroup S, X might be uniquely identified from its orbit.

Now consider what happens if we compute the corrected p-value P in Eq. 2.2 and let us once more examine the question of identifying the data X from its orbit. The p-value P compares the observed statistic T(X) against the set \(\{T(X_{\sigma \circ \sigma _{0}^{-1}})\}_{\sigma \in S}\), and so now the question is whether we can identify X from the set \(\{X_{\sigma \circ \sigma ^{-1}_{0}}\}_{\sigma \in S}\), which is the orbit of \(X_{\sigma _{0}^{-1}}\) for a randomly drawn σ0S. Identifying X is no longer possible because of the random σ0. For instance, working again with Example 2.1, suppose this set \(\{X_{\sigma \circ \sigma ^{-1}_{0}}\}_{\sigma \in S}\) is equal to {(0.8,0.5,0.2,1.0),(1.0,0.2,0.5,0.8),(0.5,0.8,1.0,0.2)}, in no particular order. We can identify that this is the orbit of x = (1.0,0.2,0.5,0.8) under S—that is, this set is equal to {xσ}σS. Then the following three possibilities are equally likely:

  • σ0 = Id and so \(X = x_{\text {Id}^{-1}} = x = (1.0, 0.2, 0.5, 0.8)\);

  • \(\sigma _{0} = \sigma _{1\leftrightarrow 3, 2\leftrightarrow 4}\) and so \(X = x_{\sigma _{1\leftrightarrow 3, 2\leftrightarrow 4}^{-1}} = (0.5,0.8,1.0,0.2)\);

  • \(\sigma _{0} = \sigma _{1\leftrightarrow 4, 2\leftrightarrow 3}\) and so \(X = x_{\sigma _{1\leftrightarrow 4, 2\leftrightarrow 3}^{-1}} = (0.8,0.5,0.2,1.0)\).

In other words, X is now equally likely to be any of the three values in our set, and validity is restored.

2.2 Random Samples from an Arbitrary Distribution

Our second generalization concerns permutations that are randomly chosen from an arbitrary distribution.

Theorem 2.2.

Let q be any distribution over \(\sigma \in \mathcal {S}_{n}\). Let \(\sigma _{0},\sigma _{1},\dots ,\) \(\sigma _{M}\overset {iid}{\sim } q\), and define

$$ P = \frac{1}{M+1}\sum\limits_{m=0}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{0}^{-1}})\geq T(X)\right\} = \frac{1+{\sum}_{m=1}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{0}^{-1}})\geq T(X)\right\}}{M+1}. $$
(2.6)

Then P is a valid p-value, i.e., \(\mathbb {P}_{H_{0}}\left \{P\leq \alpha \right \}\leq \alpha \) for all α ∈ [0,1].

This result is closely related to Besag and Clifford (1989)’s well known construction for obtaining exchangeable samples from Markov chain Monte Carlo (MCMC) sampling—the details are deferred to Section 4.2.
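
As an illustration of Theorem 2.2, the following sketch (ours; the `local_shuffle` sampler is a hypothetical, deliberately nonuniform choice of q) draws \(\sigma_{0},\sigma_{1},\dots,\sigma_{M}\) i.i.d. from a user-supplied sampler and computes the p-value of Eq. 2.6.

```python
import numpy as np


def sampled_generalized_pvalue(X, T, sample_perm, M=999, rng=None):
    """Eq. 2.6 (sketch): sigma_0, ..., sigma_M i.i.d. from q (via `sample_perm`), and
    P = (1 + #{m >= 1 : T(X_{sigma_m o sigma_0^{-1}}) >= T(X)}) / (M + 1)."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X)
    sigma0 = sample_perm(rng)
    inv0 = np.argsort(sigma0)                  # sigma_0^{-1} as an index array
    t_obs = T(X)
    count = sum(T(X[sample_perm(rng)[inv0]]) >= t_obs for _ in range(M))
    return (1 + count) / (M + 1)


def local_shuffle(rng, n=30, swaps=5):
    """A hypothetical nonuniform q: compose a few random adjacent transpositions."""
    sigma = np.arange(n)
    for _ in range(swaps):
        i = rng.integers(n - 1)
        sigma[[i, i + 1]] = sigma[[i + 1, i]]
    return sigma


X = np.random.default_rng(4).normal(size=30)
print(sampled_generalized_pvalue(X, lambda x: x[:15].mean(), local_shuffle, M=499))
```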

Just as before, some special cases of this result are well known to statisticians.

Random permutations from \(\mathcal {S}_{n}\)

In the simple case where q is the uniform distribution over \(\mathcal {S}_{n}\), Theorem 2.2 states that

$$ P = \frac{1+{\sum}_{m=1}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{0}^{-1}})\geq T(X)\right\}}{M+1} \overset{d}{=} \frac{1+{\sum}_{m=1}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}})\geq T(X)\right\}}{M+1} $$
(2.7)

is a valid p-value. The equality in distribution above holds because the \({\sigma _{m}\circ \sigma _{0}^{-1}}\)’s are i.i.d. draws from \(\mathcal {S}_{n}\). Hence, this recovers the most commonly implemented form of the permutation test.

Random permutations from a subgroup

The distributional equality Eq. 2.7 extends to any uniform distribution q over a subgroup G of \(\mathcal {S}_{n}\) since in this case, as before, the random variables \({\sigma _{m}\circ \sigma _{0}^{-1}}\) are i.i.d. draws from G. This gives the following well-known result (see, e.g., Hemerik and Goeman (2018, Theorem 2)):

Corollary 2.1.

Let \(G\subseteq \mathcal {S}_{n}\) be a subgroup, and sample \(\sigma _{1},\dots ,\sigma _{M}\overset {iid}{\sim }\text {Unif}(G)\). Then

$$ P = \frac{1+{\sum}_{m=1}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}})\geq T(X)\right\}}{M+1} $$
(2.8)

is a valid p-value, i.e., \(\mathbb {P}_{H_{0}}\left \{P\leq \alpha \right \}\leq \alpha \) for all α ∈ [0,1].

Random permutations from a subset

Consider now the case where q is a uniform distribution over an arbitrary subset S. When S is not a subgroup, the p-value

$$ P = \frac{1+{\sum}_{m=1}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{0}^{-1}})\geq T(X)\right\}}{M+1} $$

may not have the same distribution as

$$ \frac{1+{\sum}_{m=1}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}})\geq T(X)\right\}}{M+1}. $$

Here, Theorem 2.2 gives:

Corollary 2.2.

Let \(S\subseteq \mathcal {S}_{n}\) be any fixed subset of permutations. Sample \(\sigma _{0},\sigma _{1},\dots ,\sigma _{M}\overset {iid}{\sim }\text {Unif}(S)\). Then

$$ P = \frac{1+{\sum}_{m=1}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{0}^{-1}})\geq T(X)\right\}}{M+1} $$
(2.9)

is a valid p-value, i.e., \(\mathbb {P}_{H_{0}}\left \{P\leq \alpha \right \}\leq \alpha \) for all α ∈ [0,1].

To the best of our knowledge, this statement has not previously been recorded in the literature. As a variant, the same result holds if we instead draw the permutations without replacement.

Corollary 2.3.

Consider the variant of Corollary 2.2 in which the permutations are drawn without replacement. Then the p-value P defined in Eq. 2.9 is a valid p-value, i.e., \(\mathbb {P}_{H_{0}}\left \{P\leq \alpha \right \}\leq \alpha \) for all α ∈ [0,1]. In the special case where S is a subgroup, the same conclusion also applies for the p-value P defined in Eq. 2.8.

Proof 1.

Let \(S^{\prime }\subseteq S\) be a subset of size M + 1 chosen uniformly at random. Let \(\sigma _{0},\sigma _{1},\dots ,\sigma _{M}\) be a random ordering of \(S^{\prime }\)—in particular, this means that σ0 is drawn uniformly from \(S^{\prime }\). Then by Theorem 2.1, applied with q taken to be the uniform distribution over \(S^{\prime }\), P is a valid p-value.

The second claim follows from the fact that

$$(\sigma_{1}\circ\sigma_{0}^{-1},\dots,\sigma_{M}\circ\sigma_{0}^{-1})\overset{d}{=} (\sigma_{1},\dots,\sigma_{M}),$$

whenever S is a subgroup. □
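
A short sketch (ours) of the without-replacement variant in Corollary 2.3, for a fixed finite set S of permutations.

```python
import numpy as np


def without_replacement_pvalue(X, T, S, M, rng=None):
    """Corollary 2.3 (sketch): draw sigma_0, ..., sigma_M uniformly without replacement
    from the fixed set S (requires M + 1 <= |S|), then compute Eq. 2.9."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X)
    idx = rng.choice(len(S), size=M + 1, replace=False)
    sigmas = [np.asarray(S[i]) for i in idx]
    inv0 = np.argsort(sigmas[0])                       # sigma_0^{-1}
    t_obs = T(X)
    count = sum(T(X[s[inv0]]) >= t_obs for s in sigmas[1:])
    return (1 + count) / (M + 1)


# Toy usage with S consisting of four cyclic shifts of n = 6 points.
S = [np.roll(np.arange(6), k) for k in range(4)]
X = np.random.default_rng(5).normal(size=6)
print(without_replacement_pvalue(X, lambda x: x[0], S, M=3))
```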

To guide the reader, Fig. 1 summarizes the connections between all the results presented thus far in the paper. Interestingly, as highlighted in the figure, Theorems 2.1 and 2.2 can be derived from each other; we will elaborate on this connection below.

Figure 1

A flowchart to illustrate the connections between (most of) the results presented in this paper. Arrows point from more general results to their special cases

Finally, we present another simple example to highlight the necessity of the σ0 term, in the case of nonuniform sampling. Indeed, even “intuitive” modifications of the uniform sampling scheme may fail to produce valid p-values.

Example 2.2.

If one considers the p-value in Eq. 1.3 to be a Monte Carlo estimate of the p-value computed in Eq. 1.2, then a lower-variance estimate may be obtained by “antithetic sampling”—that is, pairing a random draw \(\sigma _{m}\in \mathcal {S}_{n}\) with its reverse \(\text {Rev}(\sigma _{m}) = (\sigma _{m}(n),\dots ,\sigma _{m}(1))\) (see, e.g., Mitchell et al. (2022) for an example of this variance reduction technique). However, using antithetic sampling can lead to an invalid p-value—specifically, if \(\sigma _{1},\dots ,\sigma _{M/2}\) are drawn uniformly at random from \(\mathcal {S}_{n}\) (or from some subgroup \(G\subseteq \mathcal {S}_{n}\)), and we then set σM/2+i = Rev(σi) for each \(i=1,\dots ,M/2\), then the quantity \(P^{\prime }\), defined as in Eq. 1.3 but with these antithetically sampled permutations, may not be a valid p-value. For instance, suppose we take M = 2, so that σ2 = Rev(σ1) where σ1 is drawn uniformly from \(\mathcal {S}_{n}\). Take \(T(X) = X_{1} + X_{n}\), and draw \(X_{i}\overset {iid}{\sim } \mathcal {N}(0,1)\). Then

$$ P^{\prime} = \frac{1 + \mathbbm{1}\left\{X_{\sigma_{1}(1)}+X_{\sigma_{1}(n)}\geq X_{1}+X_{n}\right\} + \mathbbm{1}\left\{X_{\sigma_{1}(n)}+X_{\sigma_{1}(1)}\geq X_{1}+X_{n}\right\}}{3}. $$

We can then verify that, conditional on σ1, if {σ1(1),σ1(n)} = {1,n} then \(P^{\prime }=1\), while if {σ1(1),σ1(n)}≠{1,n} then \(P^{\prime }=\frac {1}{3}\) or \(P^{\prime }=1\) each with probability \(\frac {1}{2}\), which yields

$$\mathbb{P}\left\{P^{\prime} = \frac{1}{3}\right\} = \frac{1}{2} - \frac{1}{n(n-1)} > \frac{1}{3},$$

with the last step holding if n > 3. We can thus see that \(P^{\prime }\) is anti-conservative at the threshold \(\alpha = \frac {1}{3}\).

2.3 Proof of Theorem 2.1

First, for any fixed \(\sigma ^{\prime }\in \mathcal {S}_{n}\), we have

$$ \mathbb{P}_{H_{0}}\left\{\sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma\circ(\sigma^{\prime})^{-1}})\geq T(X)\right\}\leq \alpha\right\} = \mathbb{P}_{H_{0}}\left\{\sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma})\geq T(X_{\sigma^{\prime}})\right\}\leq \alpha\right\}, $$
(2.10)

because \(X \overset {d}{=} X_{\sigma ^{\prime }}\) under H0 (and note that \((X_{\sigma ^{\prime }})_{\sigma \circ \sigma '{}^{-1}} = X_{\sigma }\)). Next, we will apply a deterministic inequality by Harrison (2012): for all \(t_{1},\dots ,t_{N} \in [-\infty ,\infty ]\) and all \(\alpha ,w_{1},\dots ,w_{N} \in [0,\infty ]\),

$$ \sum\limits_{i=1}^{N} w_{i}\, \mathbbm{1}\left\{\sum\limits_{j=1}^{N} w_{j}\, \mathbbm{1}\left\{t_{j}\geq t_{i}\right\}\leq \alpha\right\} \leq \alpha. $$

Applying this bound with the \(q(\sigma)\)’s in place of the \(w_{i}\)’s, and the \(T(X_{\sigma})\)’s in place of the \(t_{i}\)’s, we obtain

$$ \sum\limits_{\sigma^{\prime}\in\mathcal{S}_{n}} q(\sigma^{\prime})\, \mathbbm{1}\left\{\sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma})\geq T(X_{\sigma^{\prime}})\right\}\leq \alpha\right\} \leq \alpha. $$
(2.11)

Finally, we have

$$ \begin{aligned} \mathbb{P}_{H_{0}}\left\{P\leq \alpha\right\} &= \mathbb{P}_{H_{0}}\left\{\sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma\circ\sigma_{0}^{-1}})\geq T(X)\right\}\leq \alpha\right\}\\ &= \sum\limits_{\sigma^{\prime}\in\mathcal{S}_{n}} q(\sigma^{\prime})\, \mathbb{P}_{H_{0}}\left\{\sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma\circ(\sigma^{\prime})^{-1}})\geq T(X)\right\}\leq \alpha\right\}\\ &= \sum\limits_{\sigma^{\prime}\in\mathcal{S}_{n}} q(\sigma^{\prime})\, \mathbb{P}_{H_{0}}\left\{\sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma})\geq T(X_{\sigma^{\prime}})\right\}\leq \alpha\right\}\\ &\leq \alpha, \end{aligned} $$

where the first step plugs in the definition of P in Eq. 2.1, the second step holds since \(\sigma _{0}\sim q\) is drawn independently of the data X, while the last two steps apply Eq. 2.10 and Eq. 2.11, respectively.

2.4 Connecting Theorems 2.1 and 2.2

As mentioned earlier, Theorems 2.1 and 2.2 can be derived from each other. We now give these proofs to show the connection.

Proof 2 (Alternative proof of Theorem 2.1 (via Theorem 2.2 )).

Let \(\sigma _{0},\sigma _{1},\sigma _{2},\dots \overset {iid}{\sim } q\), and for any fixed M, define

$$ P_{M} = \frac{1}{M+1}\sum\limits_{m=0}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{0}^{-1}})\geq T(X)\right\}. $$

By the Law of Large Numbers, we see that \(\frac{1}{M+1}{\sum}_{m=0}^{M}\mathbbm{1}\{\sigma_{m} = \sigma\}\rightarrow q(\sigma)\) almost surely for all \(\sigma \in \mathcal {S}_{n}\), and therefore, \(P_{M}\rightarrow P\) almost surely, where P is the p-value defined in Eq. 2.1. In particular, this implies that \(P_{M}\) converges to P in distribution, and therefore

$$\mathbb{P}\left\{P\leq \alpha\right\} = \lim_{M\rightarrow\infty}\mathbb{P}\left\{P_{M}\leq \alpha\right\} \leq \alpha,$$

where the last step holds since, for every M ≥ 1, PM is a valid p-value by Theorem 2.2. □

Proof 3 (Proof of Theorem 2.2 (via Theorem 2.1 )).

Let \(\sigma _{0},\sigma _{1},\dots ,\sigma _{M}\overset {iid}{\sim } q\), and define the empirical distribution

$$\widehat{q} = \frac{1}{M+1}\sum\limits_{m=0}^{M} \delta_{\sigma_{m}},$$

where δσ is the point mass at σ. Now we treat \(\widehat {q}\) as fixed. Let k be drawn uniformly from \(\{0,\dots ,M\}\) (that is, \(\sigma_{k}\) is drawn at random from \(\widehat {q}\)). Applying Theorem 2.1 with \(\widehat {q}\) in place of q, we then see that

$$ P_{k} := \sum\limits_{\sigma\in\mathcal{S}_{n}} \widehat{q}(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma\circ\sigma_{k}^{-1}})\geq T(X)\right\} = \frac{1}{M+1}\sum\limits_{m=0}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{k}^{-1}})\geq T(X)\right\} $$

is a valid p-value conditional on \(\widehat {q}\), and therefore also valid after marginalizing over \(\widehat {q}\). Since \(\sigma _{0},\dots ,\sigma _{M}\) are drawn i.i.d. and are therefore in a random order, we see that

$$ P_{k} \overset{d}{=} \frac{1}{M+1}\sum\limits_{m=0}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{0}^{-1}})\geq T(X)\right\}, $$

which is the desired p-value. □

2.5 Another Perspective: Exchangeable Permutations

Many of the results described above can be viewed through the lens of exchangeability—not on the data X (which we assume to be exchangeable under the null hypothesis H0), but on the collection of permutations used to define the p-value P.

Theorem 2.3.

Let \(\sigma _{0},\sigma _{1},\dots ,\sigma _{M}\in \mathcal {S}_{n}\) be a random set of permutations, which are exchangeable, i.e.,

$$ (\sigma_{0},\sigma_{1},\dots,\sigma_{M})\overset{d}{=} (\sigma_{\pi(0)},\sigma_{\pi(1)},\dots,\sigma_{\pi(M)}) $$

for any fixed permutation π on \(\{0,\dots ,M\}\). Then

$$ P = \frac{1}{M+1}\sum\limits_{m=0}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{0}^{-1}})\geq T(X)\right\} $$

is a valid p-value, i.e., \(\mathbb {P}_{H_{0}}\left \{P\leq \alpha \right \}\leq \alpha \) for all α ∈ [0,1].

Many of the results stated earlier can be viewed as special cases—in particular, the results for a subgroup G, or for a subset S, as well as our more general result Theorem 2.2 for permutations drawn i.i.d. from q.

Proof 4.

To be clear, this theorem is essentially just a new perspective, and can be proved as a corollary to Theorem 2.1. To see why, let \(\sigma _{0},\dots ,\sigma _{M}\) be exchangeable, and let \(\widehat {q} = \frac {1}{M+1}{\sum }_{m=0}^{M}\delta _{\sigma _{m}}\) be the empirical distribution induced by the unordered set of drawn permutations. Then since \(\sigma _{0},\dots ,\sigma _{M}\) is exchangeable, conditional on \(\widehat {q}\) it holds that σ0 is a random draw from \(\widehat {q}\). Applying Theorem 2.1 with \(\widehat {q}\) in place of q gives the conclusion. □

However, we can also prove this result in a more intuitive way, using the framework of exchangeability:

Proof 5 (Alternative proof of Theorem 2.3 ).

Since the sequence \(\sigma _{0},\sigma _{1},\dots ,\sigma _{M}\) is exchangeable,

$$T(X_{\sigma_{0}}),T(X_{\sigma_{1}}),\dots,T(X_{\sigma_{M}})$$

is also exchangeable conditional on X. It is thus still exchangeable after marginalizing over X. Therefore, under the null hypothesis H0, the test statistic values

$$ T(X)=T(X_{\sigma_{0}\circ\sigma_{0}^{-1}}), \ T(X_{\sigma_{1}\circ\sigma_{0}^{-1}}),\ \dots, \ T(X_{\sigma_{M}\circ\sigma_{0}^{-1}}) $$
(2.12)

are also exchangeable—this follows immediately from the previous line because \(X \overset {d}{=} X_{\sigma _{0}^{-1}}\) under H0. This shows that the p-value P defined in Theorem 2.3 is valid. □

3 Averaging to Reduce Variance

The p-value P defined in Eq. 2.1 can equivalently be written as

$$P = \mathbb{P}_{\sigma \sim q}\left\{T(X_{\sigma\circ\sigma_{0}^{-1}})\geq T(X) | X,\sigma_{0}\right\}.$$

It is clear that P is random even if we condition on the observed data X, because of the randomness due to σ0. Consequently, in some settings P may be quite variable conditional on the data X, and this may be undesirable.

To address this issue, we can also consider averaging over σ0 (in addition to averaging over σ) in the calculation of P. This alternative definition is now a deterministic function of the observed data X, but may no longer be a valid p-value. Nonetheless, the following theorem shows a bound on the Type I error.

Theorem 3.1.

Let q be any distribution over \(\sigma \in \mathcal {S}_{n}\). Define

$$ \bar P = \sum\limits_{\sigma_{0}\in\mathcal{S}_{n}} \sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma_{0})\, q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma\circ\sigma_{0}^{-1}})\geq T(X)\right\}, $$
(3.1)

or equivalently,

$$\bar P = \mathbb{P}_{\sigma,\sigma_{0} \overset{iid}{\sim} q}\left\{\left.T(X_{\sigma\circ\sigma_{0}^{-1}})\geq T(X)\right|X\right\}.$$

Then \(\bar P\) is a valid p-value up to a factor of 2, i.e., \(\mathbb {P}_{H_{0}}\left \{\bar P\leq \alpha \right \}\leq 2\alpha \) for all α ∈ [0,1]. In other words, the quantity \(\min \limits \{2\bar P,1\}\) is a valid p-value.

Proof 6.

Draw \(\sigma _{0}^{(1)},\sigma _{0}^{(2)},\dots \overset {iid}{\sim } q\). Let

$$ P_{m} = \sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma\circ(\sigma_{0}^{(m)})^{-1}})\geq T(X)\right\} $$

for each m ≥ 1. Then by Theorem 2.1, each \(P_{m}\) is a valid p-value. It is known (Rüschendorf, 1982; Vovk and Wang, 2020) that the average of valid p-values is valid up to a factor of 2, i.e., for any M ≥ 1 the average \(\bar {P}_{M} = \frac {1}{M}{\sum }_{m=1}^{M} P_{m}\) satisfies \(\mathbb {P}\left \{\bar {P}_{M}\leq \alpha \right \}\leq 2\alpha \) for all α ∈ [0,1]. We can equivalently write

$$ \bar{P}_{M} = \frac{1}{M}\sum\limits_{m=1}^{M}\sum\limits_{\sigma\in\mathcal{S}_{n}} q(\sigma)\, \mathbbm{1}\left\{T(X_{\sigma\circ(\sigma_{0}^{(m)})^{-1}})\geq T(X)\right\}. $$

By the Law of Large Numbers, \(\bar {P}_{M}\) converges almost surely to the p-value \(\bar P\) defined in Eq. 3.1, which completes the proof. □
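
When q has small support, the averaged quantity in Eq. 3.1 can be computed exactly; the sketch below (ours) returns \(\bar P\), from which \(\min\{2\bar P,1\}\) gives the valid p-value of Theorem 3.1.

```python
import numpy as np


def averaged_pvalue(X, T, perms, weights):
    """Eq. 3.1 (sketch): Pbar = sum over sigma_0, sigma of q(sigma_0) q(sigma) *
    1{T(X_{sigma o sigma_0^{-1}}) >= T(X)}; min(2 * Pbar, 1) is then valid by Theorem 3.1."""
    X = np.asarray(X)
    perms = [np.asarray(s) for s in perms]
    weights = np.asarray(weights, dtype=float)
    t_obs = T(X)
    pbar = 0.0
    for w0, s0 in zip(weights, perms):
        inv0 = np.argsort(s0)
        pbar += w0 * sum(w * (T(X[s[inv0]]) >= t_obs) for w, s in zip(weights, perms))
    return pbar


# With the set S and statistic of Example 2.1, Pbar comes out as 5/9 or 1 (see below).
S = [np.array([0, 1, 2, 3]), np.array([2, 3, 0, 1]), np.array([3, 2, 1, 0])]
X = np.random.default_rng(6).normal(size=4)
pbar = averaged_pvalue(X, lambda x: x[0] + x[1], S, [1 / 3, 1 / 3, 1 / 3])
print(pbar, min(2 * pbar, 1.0))
```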

Returning to Example 2.1, we see that while P was a mixture of PId, \(P_{\sigma _{1\leftrightarrow 4, 2\leftrightarrow 3}}, P_{\sigma _{1\leftrightarrow 3, 2\leftrightarrow 4}}\), we now have that \(\bar P\) is an average of these, meaning \(\bar P = \tfrac 13(P_{\text {Id}} + P_{\sigma _{1\leftrightarrow 4, 2\leftrightarrow 3}} + P_{\sigma _{1\leftrightarrow 3, 2\leftrightarrow 4}})\). Simplifying, we get

$$ \bar P = \begin{cases} \frac{5}{9}, & \text{ w.p. } 1/2, \\ 1, & \text{ w.p. } 1/2. \end{cases} $$

It is worth noting that this new quantity \(\bar P\) is neither more conservative nor more anti-conservative than the p-value P in Eq. 2.5 from earlier. This is perhaps a more general phenomenon: the average of valid p-values need not be anti-conservative, and indeed it can often be more conservative than the original p-values.

Analogously, the p-value in Theorem 2.2, computed via random samples from q, can also be averaged to reduce variance.

Theorem 3.2.

Let q be any distribution over \(\sigma \in \mathcal {S}_{n}\). Let \(\sigma _{0},\sigma _{1},\dots ,\) \(\sigma _{M}\overset {iid}{\sim } q\), and define

$$ \bar P = \frac{1}{(M+1)^{2}}\sum\limits_{k=0}^{M}\sum\limits_{m=0}^{M} \mathbbm{1}\left\{T(X_{\sigma_{m}\circ\sigma_{k}^{-1}})\geq T(X)\right\}. $$
(3.2)

Then \(\bar P\) is a valid p-value up to a factor of 2, i.e., \(\mathbb {P}_{H_{0}}\left \{\bar P\leq \alpha \right \}\leq 2\alpha \) for all α ∈ [0,1]. Thus, as before, the quantity \(\min \limits \{2\bar P,1\}\) is a valid p-value.

The proof is similar to that of Theorem 3.1, and we omit it for brevity.

4 Connections to the Literature

We next mention a few connections to the broader literature.

4.1 Permutation Tests vs. Randomization Tests

Hemerik and Goeman (2021) describe the difference between two testing frameworks, permutation tests (as studied in our present work) versus randomization tests. The difference is subtle, because randomization tests may still use permutations. Specifically, Hemerik and Goeman (2021) highlight

an important difference in mathematical reasoning between these classes: a permutation test fundamentally requires that the set of permutations has a group structure, in the algebraic sense; the reasoning behind a randomisation test is not based on such a group structure, and it is possible to use an experimental design that does not correspond to a group.

To better understand this distinction, we can consider a scenario where a fixed subset \(S\subseteq \mathcal {S}_{n}\), which is not a subgroup, is used for a randomization test rather than a permutation test. Consider a study comparing a treatment versus a placebo, with n/2 subjects assigned to each of the two groups. We can use a permutation σ to denote the treatment assignments, with σ(i) ≤ n/2 indicating that subject i receives the treatment, and σ(i) > n/2 indicating that subject i receives the placebo. Now we switch notation, to be able to compare to permutation tests more directly—writing \(X= (1,\dots ,1,0,\dots ,0)\), suppose that we will assign treatments via the permuted vector Xσ, i.e., for each subject \(i=1,\dots ,n\), under this permutation σ the ith subject will receive the treatment if Xσ(i) = 1, or the placebo if Xσ(i) = 0.

Now suppose that we draw a random treatment assignment \(\sigma _{\text {asgn}}\sim \text {Unif}(S)\), from a fixed subset \(S\subseteq \mathcal {S}_{n}\) (for example, S may be chosen to restrict to treatment assignments that are equally balanced across certain subpopulations). After the treatments are administered, the measured response variable is given by \(Y=(Y_{1},\dots ,Y_{n})\). Fix any test statistic T(X) = T(X,Y ) (we will implicitly condition on Y ), and compute

$$ P = \frac{1}{|S|}\sum\limits_{\sigma\in S} \mathbbm{1}\left\{T(X_{\sigma})\geq T(X_{\sigma_{\text{asgn}}})\right\}. $$
(4.1)

Since σasgn was drawn uniformly from S, this quantity P is a valid p-value. In the terminology of Hemerik and Goeman (2021), this test is a randomization test, not a permutation test. While the set of possible treatment assignments {Xσ : σS} happens to be indexed by permutations σ, the group structure of permutations is not used in any way, and we do not rely on any invariance properties.

Comparing to the invalid p-value considered in Eq. 2.3, we can easily see the distinction: for a randomization test, the observed statistic is \(T(X_{\sigma _{\text {asgn}}})\) for a randomly drawn \(\sigma _{\text {asgn}}\sim \text {Unif}(S)\), while in the permutation test in Eq. 2.3, the observed statistic is T(X) (i.e., using the fixed permutation Id in place of a randomly drawn σasgn). For this reason, the randomization test p-value in Eq. 4.1 is valid, while the permutation test calculation in Eq. 2.3 is not valid in general.
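
To illustrate the distinction in code, here is a sketch (ours; the balanced design, the cyclic-shift assignment set, and the difference-in-means statistic are all toy choices) of the randomization test p-value in Eq. 4.1, where the observed statistic is computed under the randomly drawn assignment \(\sigma_{\text{asgn}}\).

```python
import numpy as np


def randomization_pvalue(Y, T, S, sigma_asgn):
    """Eq. 4.1 (sketch): rank T(X_{sigma_asgn}, Y) among {T(X_sigma, Y) : sigma in S}."""
    n = len(Y)
    X = np.r_[np.ones(n // 2), np.zeros(n - n // 2)]     # base labels: 1 = treatment, 0 = placebo
    t_obs = T(X[np.asarray(sigma_asgn)], Y)
    return np.mean([T(X[np.asarray(s)], Y) >= t_obs for s in S])


# Toy usage: S is an arbitrary fixed set of allowed assignments (it need not be a subgroup);
# sigma_asgn is drawn uniformly from S before the responses Y are generated.
rng = np.random.default_rng(7)
n = 10
S = [np.roll(np.arange(n), k) for k in (0, 1, 3, 4, 7)]
sigma_asgn = S[rng.integers(len(S))]
Y = rng.normal(size=n)                                    # null responses: no treatment effect
T = lambda x, y: y[x == 1].mean() - y[x == 0].mean()      # difference in group means
print(randomization_pvalue(Y, T, S, sigma_asgn))
```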

Now we again consider Hemerik and Goeman (2018)’s method using a fixed subset. This test in Eq. 2.2 is a permutation test, not a randomization test—the observed data X, and its corresponding statistic T(X), do not arise from a random treatment assignment. More generally, our proposed test in Eq. 2.1 using an arbitrary distribution q on \(\mathcal {S}_{n}\) is again a permutation test rather than a randomization test—that is, the observed data is given by X itself, not by a randomly chosen treatment assignment \(X_{\sigma _{\text {asgn}}}\) for \(\sigma _{\text {asgn}}\sim q\). Nonetheless, we are able to produce a valid p-value without assuming an underlying group structure or uniform sampling for the permutations considered by the test.

4.2 Exchangeable MCMC

The result of Theorem 2.2, which allows for random samples drawn from an arbitrary distribution q on \(\mathcal {S}_{n}\), is closely connected to Besag and Clifford (1989)’s well known construction for obtaining exchangeable samples from Markov chain Monte Carlo (MCMC) sampling.

Consider a distribution Q0 on \(\mathcal {Z}\), and suppose we want to test

$$H_{0}: \ Z\sim Q_{0}$$

with some test statistic T(Z). To find a significance threshold for T(Z), we would ideally like to draw from the null distribution, i.e., compare T(Z) against \(T(Z_{1}),\dots ,T(Z_{M})\) for \(Z_{1},\dots ,Z_{M} \overset {iid}{\sim } Q_{0}\). However, in many settings, sampling directly from Q0 is impossible, but we instead have access to a Markov chain whose stationary distribution is Q0. If we run the Markov chain initialized at Z to obtain draws \(Z_{1},\dots ,Z_{M}\) (say, running the Markov chain for some fixed number of steps s between each draw), then dependence among these sequentially drawn samples means that \(Z,Z_{1},\dots ,Z_{M}\) are not i.i.d., and are not even exchangeable. Without studying the mixing properties of the Markov chain, we cannot determine how large the number of steps needs to be for the dependence to become negligible. Instead, Besag and Clifford (1989) propose a construction where the samples are drawn in parallel (rather than sequentially), which ensures exchangeability:

Theorem 4.1 (Besag and Clifford (1989, Section 2)).

Let Q0 be any distribution on a probability space \(\mathcal {Z}\). Construct a Markov chain on \(\mathcal {Z}\) with stationary distribution Q0, whose forward and backward transition distributions (initialized at \(z\in \mathcal {Z}\)) are denoted by \(Q_{\rightarrow }(\cdot |z)\) and \(Q_{\leftarrow }(\cdot |z)\). Let \(Q_{\rightarrow }^{s}(\cdot |z)\) and \(Q_{\leftarrow }^{s}(\cdot |z)\) denote the forward and backward transition distributions after running s steps of the Markov chain, for some fixed s ≥ 1. Given an initialization Z, suppose we generate data as in the left plot of Fig. 2:

$$\begin{cases}\text{First, draw } Z_{*} \sim Q^{s}_{\leftarrow}(\cdot|Z);\\ \text{Then, draw } Z_{1},\dots,Z_{M}\overset{iid}{\sim} Q^{s}_{\rightarrow}(\cdot|Z_{*}).\end{cases}$$

If it holds marginally that \(Z\sim Q_{0}\), then the draws \(Z,Z_{1},\dots ,Z_{M}\) are exchangeable.

Figure 2

Left: Besag and Clifford (1989)’s parallel construction (with s = 1). Right: the construction used in Theorem 2.2

Given this exchangeability property, the quantity \(P = \frac{1+{\sum }_{m=1}^{M} \mathbbm{1}\left\{T(Z_{m})\geq T(Z)\right\}}{M+1}\) is then a valid p-value for testing \(H_{0}: Z\sim Q_{0}\).
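
A generic sketch (ours) of this parallel construction is given below; `step_forward` and `step_backward` are user-supplied one-step transition samplers whose stationary law is the null distribution Q0, and the random-transposition kernel in the usage example is a hypothetical choice that is its own reversal.

```python
import numpy as np


def besag_clifford_pvalue(Z, T, step_forward, step_backward, M=99, s=1, rng=None):
    """Theorem 4.1 (sketch): take s backward steps from the observed Z to a hidden Z_*,
    run M independent forward chains of s steps each from Z_*, and rank T(Z) among
    T(Z_1), ..., T(Z_M)."""
    rng = np.random.default_rng() if rng is None else rng
    z_star = Z
    for _ in range(s):
        z_star = step_backward(z_star, rng)
    t_obs = T(Z)
    count = 0
    for _ in range(M):
        z = z_star
        for _ in range(s):
            z = step_forward(z, rng)
        count += T(z) >= t_obs
    return (1 + count) / (M + 1)


def random_transposition_step(z, rng):
    """One step of the permutation chain: swap two coordinates chosen at random."""
    z = np.array(z, copy=True)
    i, j = rng.choice(len(z), size=2, replace=False)
    z[[i, j]] = z[[j, i]]
    return z


Z = np.random.default_rng(8).normal(size=20)
print(besag_clifford_pvalue(Z, lambda z: z[:10].mean(), random_transposition_step,
                            random_transposition_step, M=199, s=3))
```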

Now we will see how Theorem 2.2 is related to this result. Let \(\mathcal {Z} = \mathcal {X}^{n}\), and let Q0 be any exchangeable distribution. In the setting of this paper, we do not know Q0 precisely, which makes it a bit different from a typical setting where Besag and Clifford (1989)’s method is applied. However, we will work with a Markov chain for which any exchangeable distribution Q0 is stationary, and in fact, Theorem 4.1 holds regardless of whether Q0 is the unique stationary distribution for the Markov chain.

Consider the Markov chain given by applying a randomly chosen permutation \(\sigma \sim q\), that is, for \(x = (x_{1},\dots ,x_{n})\),

$$ Q_{\rightarrow}(\cdot|x) = \sum\limits_{\sigma \in \mathcal{S}_{n}} q(\sigma) \cdot\delta_{x_{\sigma}}, $$

where \(\delta _{x_{\sigma }}\) is the point mass at xσ, while the backward transition probabilities are given by

$$ Q_{\leftarrow}(\cdot|x) = \sum\limits_{\sigma \in \mathcal{S}_{n}}q(\sigma) \cdot\delta_{x_{\sigma^{-1}}}. $$

Then, to implement the test described in Theorem 2.2, we run Besag and Clifford (1989)’s method (with s = 1): we define \(X_{*} = X_{\sigma _{0}^{-1}}\), and then define \(X_{m} = (X_{*})_{\sigma _{m}} = X_{\sigma _{m}\circ \sigma _{0}^{-1}}\) for \(m=1,\dots ,M\). This is illustrated on the right-hand side of Fig. 2. If X is exchangeable (that is, it is drawn from some exchangeable Q0), then the exchangeability of \(X,X_{1},\dots ,X_{M}\) follows by Theorem 4.1, and this verifies that P is a valid p-value, thus completing the proof of Theorem 2.2.

Of course, we have only written out our method for the s = 1 case (where s is the number of steps of the Markov chain). New variants of our method can be constructed by taking s > 1 backward steps to the hidden node, and the same number s of forward steps to the permuted data. All of these are valid for the same reason as the s = 1 case.

5 Conclusion

We proposed a new method for permutation testing that generalizes previous methods. This idea naturally opens up new lines of theoretical and practical enquiry. In this work, we have focused on validity, but it is of course also important to examine the consistency and power of such methods. In particular, Dobriban (2022) and Kim et al. (2022) study the power of the permutation test when using the full permutation group \(\mathcal {S}_{n}\); it would be interesting to examine this question in the context of using only a subset \(S\subseteq \mathcal {S}_{n}\) or a nonuniform distribution over \(\mathcal {S}_{n}\). In addition, the theoretical guarantees for all the permutation tests considered here ensure a p-value P that is valid in the sense of satisfying \(\mathbb {P}_{H_{0}}\left \{P\leq \alpha \right \}\leq \alpha \), which means that P could potentially be quite conservative under the null (for instance, we saw this behavior when ‘fixing’ the failure example in Section 2.1). It would also be interesting to understand which types of tests reduce overly conservative outcomes.

In conclusion, it is perhaps remarkable that one can still gain new understanding about classical permutation methods. In turn, this enhanced understanding can inform other areas of inference. As an example, the results from this paper were motivated by questions in conformal prediction (Vovk et al. 2005), a method for distribution-free predictive inference. Classically, conformal prediction has relied on exchangeability of data points (e.g., training and test data are drawn i.i.d. from the same unknown distribution), and thus the joint distribution of the data (including both training samples and a test point) is invariant under an arbitrary permutation. In contrast, in our recent work (Barber et al. 2023), we studied the problem of constructing prediction intervals when the data do not satisfy exchangeability; for instance, the distribution of observations may simply drift over time in an unknown fashion. Thus the data is no longer invariant under an arbitrary permutation, and so we instead restrict attention to a weighted distribution over simple permutations that only swap the test point with a random training point, which at least approximately preserve the distribution of the data. These swaps clearly do not form a subgroup of permutations, and are weighted non-uniformly; understanding how permutation tests operate in this setting, as in Theorem 2.1, is key to the findings in our aforementioned work.