1 Introduction

In many fields, statistical testing is classically used as an exploratory tool to look for associations between a variable of interest and many possible explanatory variables. For example, in transcriptomics, the link between a phenotype and the expression of tens of thousands of genes is tested (McLachlan et al. 2005); in Genome Wide Association Studies (GWAS), the association between millions of markers and a phenotype is tested (Bush and Moore 2012); and in functional Magnetic Resonance Imaging (fMRI), the goal is to identify voxels that are differentially activated between two conditions (Cremers et al. 2017).

This problem of multiple comparisons dates back to the work of Tukey (Tukey 1953). It has since been the subject of an abundant literature that aims at controlling a probability of error of some sort. Most of the literature focuses on the control of the Family-Wise Error Rate (FWER) (Bland and Altman 1995), defined as the probability of at least one false discovery among the detections, or of the False Discovery Rate (FDR) (Benjamini and Hochberg 1995), defined as the expected proportion of false positives among the detections.

Most of the correction procedures for controlling the FWER or the FDR, such as the popular Benjamini-Hochberg (BH) procedure, rely on independence, or some form of weak dependence, among the hypotheses, which is rarely observed in practice. Multiple testing under dependence is a difficult problem occurring in many fields. In transcriptomics, differential analysis has to deal with gene expressions that are often highly correlated. When performing GWAS, linkage disequilibrium imposes a strong spatial dependence between markers, and in fMRI, two spatially close voxels often have comparable activation.

The control of the FDR remains valid under arbitrary dependency structures by replacing the BH procedure with the more conservative BY procedure of Benjamini and Yekutieli (2001). However, results obtained on simulated datasets show a substantial loss of power when the true dependency structure is not taken into account, as discussed in depth in Blanchard et al. (2020).

An alternative approach for dealing with multiple testing is to reduce the number of tests by aggregating certain hypotheses. Aggregation strategies vary and can be based on a priori knowledge (e.g. metabolic pathways, functional modules of genes) or on clustering algorithms (Sankaran and Holmes 2014; Renaux et al. 2020).

This article aims to take into account the dependencies between variables in order to offer a powerful statistical procedure for multiple testing. A hierarchical dependency structure between variables is assumed to be known up to certain constants. This assumption is common in our motivating example of microbiome studies (Sankaran and Holmes 2014; Xiao et al. 2017; Huang et al. 2021; Matsen IV and Evans 2013; Silverman et al. 2017), where the phylogeny is a natural hierarchical structure encoding similarities between variables (namely species in that context). The hypotheses tested can then be organized in a tree structure which captures correlations at different scales of observation. This type of hierarchical structure is also observable in transcriptomics differential analysis, where gene expressions can easily be represented by a hierarchy based on expression correlation. In GWAS and fMRI, spatial dependence also proves to be very suitable for hierarchical modeling (Ambroise et al. 2019; Eickhoff et al. 2015; Sesia et al. 2020).

We propose to model the hierarchical structure of the multiple tests through an Ornstein-Uhlenbeck process on a tree. The process correlation structure is used for smoothing the p-values, after conversion to z-scores, similarly to the algorithm proposed in Xiao et al. (2017) but with an explicit underlying model.

We then consider a three-stage approach for our differential analysis procedure. The first stage reframes the initial problem as a linear regression problem that preserves the hierarchical structure. This linear problem is ill-posed (\(p \sim 2n\)) and we therefore resort to an \(\ell _1\)-penalized estimation of the mean of the Ornstein-Uhlenbeck process. The second stage produces asymptotically valid p-values. The estimates produced by \(\ell _1\)-penalized estimation are indeed biased and offer no theoretical guarantees about their asymptotic distribution; we therefore correct them using a debiasing procedure (Javanmard and Montanari 2013, 2014; Zhang and Zhang 2014) to compute valid p-values. The third and final stage controls the FDR of the overall procedure, using the tuning strategy of Javanmard et al. (2019).

The selection strength of the Ornstein-Uhlenbeck process and the penalty parameter are hyperparameters of our model, whose selection is achieved via a Bayesian Information Criterion (BIC). We provide some background on hierarchical procedures in Sect. 2, introduce the model and statistical procedure in Sect. 3 and detail the computational steps in Sect. 4. The performance of the algorithm is assessed via simulations in Sect. 5. The use of the proposed model is illustrated in Sect. 6, where we demonstrate its ability to discover novel associations in a metagenomic dataset.

2 Background

2.1 Examples of multiple testing strategies

A classic example in genomics consists in grouping the markers according to whether they belong to the same gene (aggregation based on a priori knowledge). The genes can then be grouped according to their similarity, computed for example from expression profiles. Kim et al. (2010), for example, proposed a hierarchical testing strategy that controls the FWER by first testing clusters of genes and then individual genes associated with a phenotype, with the goal of finding genomic regions associated with a specific type of cancer. This type of top-down approach relies on the sequential rejection principle (Goeman and Finos 2012; Meinshausen 2008; Renaux et al. 2020).

fMRI is another domain where tests are aggregated: neighboring voxels that are highly correlated are aggregated into a single voxel cluster. Benjamini and Heller (2007) propose an adaptation of the False Discovery Rate (FDR) to allow for cluster-level multiple testing for fMRI data.

Ad hoc aggregation methods for multiple testing also exist in metagenomics. LEfSe (Segata et al. 2011) follows a bottom-up approach where a factorial Kruskal-Wallis rank sum test is applied to each feature with respect to a class factor, followed by a pairwise Wilcoxon test and a linear discriminant analysis. MiLineage (Tang et al. 2017) performs multivariate tests on multiple taxa within a lineage to test the association of lineages with a phenotypic outcome.

2.2 Independence assumption

The assumption of independence of the tests is convenient as it enables both exact analyses and simple error bounds for classical procedures (e.g. Benjamini and Hochberg 1995). It is however unrealistic in practice. In many fields, including all the previous examples, measurements typically exhibit strong correlations. Some correction procedures, like the one proposed by Benjamini and Yekutieli (2001), make few assumptions while guaranteeing control of the FDR. Those general guarantees come with a high cost in terms of statistical power: the nominal FDR is typically much smaller than the target, resulting in many false negatives. Permutation procedures are an appealing alternative that can automatically adapt to the dependence structure of the p-values (Tusher et al. 2001) but may fail when confronted with unbalanced designs or correlated data. Knowledge of the correlation structure can be leveraged to increase the power while still controlling the FDR below a given target. Several approaches have been developed along those lines when the tests are organized along a hierarchical structure, typically encoded in a tree.

2.3 Hierarchical testing

The Hierarchical FDR (hFDR) introduced by Yekutieli (2008), and implemented in the R package structSSI (Sankaran and Holmes 2014), relies on a top-down algorithm to sequentially reject hypotheses organized in a tree. The same approach is used in Renaux et al. (2020) to select groups of variables arranged in a clustering tree. However, this approach suffers from some limitations, as shown in Bichat et al. (2020) and Huang et al. (2021). First, the algorithm in its vanilla formulation commonly fails to move down the tree because of a failure to reject the topmost node. Second, it only controls an a posteriori FDR level, which is a complex function of the (user-defined) a priori FDR level and the structure of rejected nodes. This makes it difficult to calibrate the a priori FDR that would achieve a target a posteriori FDR and thus to compare it to other correction methods. Finally, it does not produce a corrected p-value, or q-value, per leaf, but only a reject / no reject decision, and was shown in Bichat et al. (2020) to perform no better than BH in many instances. Given all these drawbacks, we did not include the hFDR in our benchmark and use BH as a baseline instead.

StructFDR (Xiao et al. 2017) was developed for metagenomics Differential Abundance Testing (DAT) and relies on z-score / p-value smoothing followed by permutation correction. Given any taxa-wise DAT procedure, p-values \({\mathfrak {p}}\) are first computed for all m taxa (i.e. leaves of the tree) and then transformed to z-scores \({\mathfrak {z}}\). The tree is used to compute a distance matrix \(\left( {\mathbf {D}}_{i,j}\right) \), which is then turned into a correlation matrix \({\mathbf {C}}_{\rho } = \left( \exp \left( -2\rho {\mathbf {D}}_{i,j}\right) \right) \) between taxa using a Gaussian kernel. The z-scores are then smoothed using the following hierarchical model:

$$\begin{aligned}&\left. {\mathfrak {z}}\mid \mu \right. \sim \mathcal {N}_m\left( \mu ,\sigma ^2{\mathbf {I}}_m\right) , \\&\quad \mu \sim \mathcal {N}_m\left( \gamma {\mathbf {1}}_m,\tau ^2{\mathbf {C}}_{\rho }\right) , \end{aligned}$$

where \(\mu \) captures the effect size of each taxon and \({\mathfrak {z}}\) is a noisy observation of \(\mu \). The maximum a posteriori estimator \(\mu ^*\) of \(\mu \) is given by

$$\begin{aligned} \mu ^* = \left( {\mathbf {I}}_m+ k{\mathbf {C}}_{\rho }^{-1}\right) ^{-1}\left( k{\mathbf {C}}_{\rho }^{-1}\gamma {\mathbf {1}}_m+{\mathfrak {z}}\right) \quad \text {where} \quad k = \sigma ^2 / \tau ^2 . \end{aligned}$$

The FDR is controlled by means of a resampling procedure to estimate the distribution of \(\mu ^*\) under \(H_0\) and estimate adjusted p-values \({\mathfrak {q}}^{\text {sf}}\). This method is implemented in the StructFDR package (Chen 2018).
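To make the smoothing step concrete, here is a minimal numpy sketch of the MAP estimator \(\mu ^*\) above. The patristic distance matrix D, the z-scores and the values of \(\rho \), k and \(\gamma \) are illustrative placeholders; in StructFDR these quantities are estimated from the data.

```python
import numpy as np

def smooth_zscores(z, D, rho=1.0, k=1.0, gamma=0.0):
    """MAP smoothing of z-scores on a tree:
    C_rho = exp(-2 rho D) and mu* = (I + k C^-1)^-1 (k C^-1 gamma 1 + z)."""
    m = len(z)
    C = np.exp(-2.0 * rho * D)          # tree-induced correlation matrix
    C_inv = np.linalg.inv(C)
    A = np.eye(m) + k * C_inv
    b = k * C_inv @ (gamma * np.ones(m)) + z
    return np.linalg.solve(A, b)        # mu*, the smoothed z-scores

# toy example: 3 taxa, the first two are close on the tree
D = np.array([[0.0, 0.1, 1.0],
              [0.1, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
print(smooth_zscores(np.array([-2.5, -0.5, 0.3]), D))
```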

TreeclimbR (Huang et al. 2021) is a bottom-up approach also developed for metagenomics DAT but with a broader scope. It relies on aggregating abundances at each node of the tree (understood as a cluster of taxa) and performing a test to compute one p-value per node (compared to one test per leaf for StructFDR). The main idea is then to use those p-values to compute a score for node i

$$\begin{aligned} U_i(t) = \left| \frac{\sum _{k\in B(i)} {\mathfrak {s}}_k \mathbb {1}_{\{{\mathfrak {p}}_k \le t\}}}{\#B(i)}\right| \end{aligned}$$

where B(i) is the set of descendants of node i, \({\mathfrak {p}}_k\) and \({\mathfrak {s}}_k \in \{-1, +1\}\) are the p-value of node k and the sign of the associated effect, and t is a tuning parameter. A node i is considered a candidate if \(U_i(t) \simeq 1\) and \({\mathfrak {p}}_i < \alpha \). This ensures that all its descendants are (i) significant at level t and (ii) have effects of consistent sign. Finally, multiplicity correction is only applied to nodes (including leaves) that do not descend from another candidate.
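For illustration, a minimal Python sketch of the node score \(U_i(t)\); it assumes the descendant p-values and effect signs of a node are already available (the choice of t and the candidate selection logic of treeclimbR are not reproduced here).

```python
import numpy as np

def node_score(p_desc, s_desc, t):
    """U_i(t): absolute mean of sign * 1{p <= t} over the descendants B(i) of node i."""
    p_desc = np.asarray(p_desc, dtype=float)
    s_desc = np.asarray(s_desc, dtype=float)  # signs in {-1, +1}
    return abs(np.mean(s_desc * (p_desc <= t)))

# all descendants significant with the same sign: U close to 1
print(node_score([0.001, 0.01, 0.02], [-1, -1, -1], t=0.05))  # 1.0
# one non-significant descendant with an opposite sign: U drops to 2/3
print(node_score([0.001, 0.60, 0.02], [-1, +1, -1], t=0.05))  # 0.67
```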

3 Models and algorithms

Our correction method assumes that p-values, or rather z-scores, evolve according to an Ornstein-Uhlenbeck process on a tree. We thus use the corresponding correlation structure to decorrelate the z-scores and, in turn, the p-values. This is similar in spirit to the smoothing algorithm of Xiao et al. (2017) but we derive our procedure from first principles and explicit assumptions. We first recall a few properties of Ornstein-Uhlenbeck processes before proceeding to our model and procedure.

3.1 Ornstein-Uhlenbeck process on a tree

An Ornstein-Uhlenbeck (OU) process \((W_t)\) with optimal value (also called drift) \({\beta }_{\text {ou}}\), selection strength (also called mean reversion parameter) \({\alpha }_{\text {ou}}\) and white noise variance \({\sigma }_{\text {ou}}^2\), is a Gaussian process that satisfies the stochastic differential equation:

$$\begin{aligned} \mathrm {d}W_t = - {\alpha }_{\text {ou}} (W_t - {\beta }_{\text {ou}}) \, \mathrm {d}t + {\sigma }_{\text {ou}} \, \mathrm {d}B_t. \end{aligned}$$

The important properties of OU processes are bounded variance and convergence to a stationary distribution centered on the optimal value \({\beta }_{\text {ou}}\), namely \(W_t\xrightarrow []{(d)} \mathcal {N} \left( {\beta }_{\text {ou}}, {\sigma }_{\text {ou}}^2/ 2{\alpha }_{\text {ou}}\right) \) when \(t \rightarrow \infty \). Thanks to those properties, OU processes have become a popular model in various subfields of biology, ranging from the evolution of continuous traits, such as body mass (Freckleton et al. 2003), fitness (Lande 1976) or CpG enrichment in viral sequences (MacLean et al. 2021), to animal movement (Dunn and Gipson 1977) and epidemiology (Nåsell 1999). They naturally emerge as the continuous limit of a broad range of discrete-time evolution models (Lande 1976). Ornstein-Uhlenbeck processes can be readily adapted to tree-like structures as illustrated in Fig. 1.
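For intuition, the sketch below simulates a single OU path using its exact Gaussian transition over a time step (the same conditional distribution that appears in Eq. (1) below for the tree version); the parameter values are arbitrary.

```python
import numpy as np

def simulate_ou(alpha, beta, sigma, w0=0.0, T=10.0, n_steps=1000, seed=0):
    """Simulate an OU path with the exact transition
    W_{t+dt} | W_t ~ N(lam W_t + (1 - lam) beta, sigma^2/(2 alpha) (1 - lam^2)),
    where lam = exp(-alpha dt)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    lam = np.exp(-alpha * dt)
    sd = np.sqrt(sigma**2 / (2 * alpha) * (1 - lam**2))
    w = np.empty(n_steps + 1)
    w[0] = w0
    for i in range(n_steps):
        w[i + 1] = lam * w[i] + (1 - lam) * beta + sd * rng.normal()
    return w

path = simulate_ou(alpha=1.0, beta=2.0, sigma=1.0)
# the path fluctuates around beta = 2 with stationary variance sigma^2 / (2 alpha)
print(path[-5:])
```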

Formally, we consider a rooted ultrametric tree \(\mathcal {T}\) with m leaves and n branches (\(n = 2m - 1\) for binary trees). The internal nodes are labeled \(N_1\) (the root) to \(N_{n-m}\) and the leaves \(T_1\) to \(T_m\). Let i be a node, \(W_i\) the value of the trait at that node and denote pa(i) its unique parent. By convention, we set \(t_{N_1} = 0\) and assume \(W_{N_1} = 0\). The branch leading to i from pa(i) is denoted \(b_i\) and has length \(l_i = t_i -t_{pa(i)}\) where \(t_i\) is the time elapsed between the root and node i. Since the tree is ultrametric, \(t_i = h\) for all \(i \in \{T_1, \dots , T_{m}\}\). For any pair of nodes (i, j), let \(t_{ij}\) be the time elapsed between the root and the most recent common ancestor of i and j and denote \(d_{ij} = t_i + t_j - 2t_{ij}\) the distance in the tree between nodes i and j. The distribution of the trait at node i is given by:

$$\begin{aligned} W_{i} | W_{pa(i)} \sim \mathcal {N} \left( \lambda _i W_{pa(i)} + (1 - \lambda _i) {\beta }_{\text {ou},i}, \frac{{\sigma }_{\text {ou}}^2}{2{\alpha }_{\text {ou}}} (1 - \lambda _i^2) \right) \end{aligned}$$
(1)

where \(\lambda _i = \exp (-{\alpha }_{\text {ou}} l_i)\) and \({\beta }_{\text {ou},i}\) is the optimal value on branch i. Remark that the process mean value does not immediately shift to \({\beta }_{\text {ou},i}\) but lags behind it with a shrinkage parameter controlled by \(1 - \lambda _i\). If \({\beta }_{\text {ou},i} = 0\) for all i, straightforward computations show that \(W = (W_{T_1}, \dots , W_{T_m})\) is a gaussian vector with distribution

$$\begin{aligned} W \sim \mathcal {N}(0, \varSigma ) \quad \text {where} \quad \varSigma _{ij} = \frac{{\sigma }_{\text {ou}}^2}{2{\alpha }_{\text {ou}}} e^{-2{\alpha }_{\text {ou}}d_{ij}} (1 - e^{-2{\alpha }_{\text {ou}}t_{ij}}). \end{aligned}$$

When the optimal value can shift on a branch (e.g. the branch \(b_{N_4}\) leading to \(N_4\) in Fig. 1), the mean vector of W is slightly more complex and depends on both the tree topology and the location and magnitude of the shifts. Denote U the \(m \times n\) incidence matrix of \(\mathcal {T}\) with rows labeled by leaves (\(i \in \{T_1, \dots , T_{m}\}\)) and columns labeled by internal nodes and leaves (\(j \in \{N_1, \dots , N_{n-m}, T_1, \dots , T_{m}\}\)), with entries defined as \(U_{ij} = 1\) if and only if leaf i is in the subtree rooted at node j. Intuitively, column \(U_{.j}\) encodes all leaves descending from node j and row \(U_{i.}\) encodes all ancestors of leaf i. Denote \(\varDelta \) the dimension n column vector with entries defined as \(\varDelta _{i} = {\beta }_{\text {ou},i} - \beta _{\text {ou},pa(i)}\) where \(i \in \{N_1, \dots , N_{n-m}, T_1, \dots , T_{m}\}\). Non-zero entries of \(\varDelta \) correspond to shift locations, i.e. nodes for which the optimal value \({\beta }_{\text {ou},i}\) differs from that of their parent, and their values give the shift magnitudes (see Fig. 2 for an example). Finally let \(\varLambda \) be the \(n \times n\) diagonal matrix with diagonal entries \(\varLambda _{i} = 1 - \exp (-{\alpha }_{\text {ou}}(h - t_{pa(i)}))\) where \(i \in \{N_1, \dots , N_{n-m}, T_1, \dots , T_{m}\}\). Straightforward computations (see Bastide et al. (2017) for detailed derivations) show that W is a gaussian vector with joint distribution:

$$\begin{aligned} W \sim \mathcal {N}(\mu , \varSigma ) \quad \text {where} \quad \mu = U \varLambda \varDelta \quad \text {and} \quad \varSigma _{ij} = \frac{{\sigma }_{\text {ou}}^2}{2{\alpha }_{\text {ou}}} e^{-2{\alpha }_{\text {ou}}d_{ij}} (1 - e^{-2{\alpha }_{\text {ou}}t_{ij}}).\nonumber \\ \end{aligned}$$
(2)
Fig. 1

(A) Phylogenetic tree with 5 leaves and 4 internal nodes (root \(N_1\) included). A shift occurs on the branch leading to \(N_4\). (B) Ornstein-Uhlenbeck process with shifts on the tree defined in the left panel. At each node, the process spawns two independent processes with the same initial value. The shift in the optimal value on the branch leading to \(N_4\) results in a different mean value for \(N_4\) and all its offspring (\(T_1\) and \(T_2\))

Fig. 2

Incidence matrix U, shift vector \(\varDelta \) and mean vector \(\mu \) associated with Fig. 1. \(\varLambda _{N_4} = 1 - e^{-{\alpha }_{\text {ou}}(h - t_{N_3})}\) is the shrinkage parameter from equation (1)

When \(\mathcal {T}\) is known, the matrix \(T = U \varLambda \) is completely specified up to the parameter \({\alpha }_{\text {ou}}\). The shifted Ornstein-Uhlenbeck model, with parameters \({\alpha }_{\text {ou}}\), \({\sigma }_{\text {ou}}^2\) and shift vector \(\varDelta \), has been used (Bastide et al. 2017; Khabbazian et al. 2016) to find adaptive events, modeled as non-zero values in \(\varDelta \), in the evolution of continuous traits of interest (turtle shell size, monkey brain shape, etc.). In this work, we apply the same mathematical framework to the joint distribution of p-values transformed to z-scores.
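To fix ideas, the following numpy sketch assembles \(\mu = U \varLambda \varDelta \) and \(\varSigma \) on a small 3-leaf ultrametric tree. The tree, the parameter values and the root convention (\(t_{pa(N_1)} = 0\), allowing a shift on the root branch) are illustrative choices of ours, not the tree of Fig. 1; the covariance follows Eq. (2) as written.

```python
import numpy as np

alpha, sigma2 = 1.0, 1.0
h = 1.0  # height of the ultrametric tree

# node order: [N1 (root), N2, T1, T2, T3]; N2 (at time 0.5) is the parent of T1 and T2,
# T3 branches directly off the root. t_pa[i] is the start time of the branch leading to i.
t_pa = np.array([0.0, 0.0, 0.5, 0.5, 0.0])

# incidence matrix U (leaves x nodes): U[i, j] = 1 iff leaf i lies in the subtree rooted at j
U = np.array([[1, 1, 1, 0, 0],   # T1
              [1, 1, 0, 1, 0],   # T2
              [1, 0, 0, 0, 1]])  # T3

# shrinkage factors Lambda_i = 1 - exp(-alpha (h - t_pa(i)))
Lam = np.diag(1 - np.exp(-alpha * (h - t_pa)))

# a single shift of -3 on the branch leading to N2 (so T1 and T2 are shifted, T3 is not)
Delta = np.array([0.0, -3.0, 0.0, 0.0, 0.0])
mu = U @ Lam @ Delta
print(mu)

# leaf covariance following Eq. (2), with t_ij the time of the most recent common ancestor
t_mrca = np.array([[1.0, 0.5, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
d = 2 * h - 2 * t_mrca  # patristic distances d_ij = t_i + t_j - 2 t_ij between leaves
Sigma = sigma2 / (2 * alpha) * np.exp(-2 * alpha * d) * (1 - np.exp(-2 * alpha * t_mrca))
print(Sigma)
```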

3.2 Procedure

We show here how to use the previously described Ornstein-Uhlenbeck process to incorporate the tree structure \(\mathcal {T}\) in the correction of the p-values vector \({\mathfrak {p}}\).

Framework. Denoting by \(m_{i}^1\) (resp. \(m_{i}^2\)) the median count (or relative abundance) of taxon i under condition 1 (resp. condition 2), we want to test \(\mathcal {H}_{i0}: m_{i}^1 = m_{i}^2\) against \(\mathcal {H}_{i1}: m_{i}^1 \ne m_{i}^2\) and assume that we have a testing procedure that outputs p-values, e.g. the Wilcoxon-Mann-Whitney test (Mann and Whitney 1947; Wilcoxon 1992). We first convert the p-values to z-scores using the quantile function \(\varPhi ^{-1}\) of the standard gaussian:

$$\begin{aligned} {\mathfrak {z}}= \varPhi ^{-1}({\mathfrak {p}}). \end{aligned}$$

Provided the use of a correct statistical test, we know that \({\mathfrak {p}}_i \sim \mathcal {U}([0, 1])\) under \(\mathcal {H}_{i0}\), so that \({\mathfrak {z}}_i \sim \mathcal {N}(0, 1)\). We also know that \({\mathfrak {p}}_i \preccurlyeq \mathcal {U}([0, 1])\) and thus \({\mathfrak {z}}_i \preccurlyeq \mathcal {N}(0, 1)\) under \(\mathcal {H}_{i1}\). We could also test \(\mathcal {H}_{i0}: m_{i}^1 = m_{i}^2\) against \(\mathcal {H}_{i1}: m_{i}^1 < m_{i}^2\) or \(\mathcal {H}_{i1}: m_{i}^1 > m_{i}^2\); we only require the procedure to output p-values that satisfy the previous distributional assumptions for these \(\mathcal {H}_{i0}\) and \(\mathcal {H}_{i1}\). Note that, even if the test statistic is itself a z-score before being transformed to a p-value, the z-score \({\mathfrak {z}}_i\) may differ from the raw test statistic \(z_i\) because of the intermediate p-value \({\mathfrak {p}}_i\). Indeed, when considering the simple case of testing equality of means in two samples of size n, with gaussian distributions and known variance \(\sigma \), the relation between \({\mathfrak {z}}_i\) and \(z_i = \sqrt{n}({\hat{m}}_i^1 - {\hat{m}}_i^2)/2\sigma \) is given by:

$$\begin{aligned} {\mathfrak {z}}_i = \varPhi ^{-1}({\mathfrak {p}}_i) = {\left\{ \begin{array}{ll} \varPhi ^{-1}(\varPhi (z_i)) = z_i &{} \text {if } \mathcal {H}_{i1}: m_{i}^1 < m_{i}^2 \\ \varPhi ^{-1}(1 - \varPhi (z_i)) = -z_i &{} \text {if } \mathcal {H}_{i1}: m_{i}^1 > m_{i}^2 \\ \varPhi ^{-1}(2\varPhi (- |z_i|)) &{} \text {if } \mathcal {H}_{i1}: m_{i}^1 \ne m_{i}^2 \\ \end{array}\right. } \end{aligned}$$
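A minimal sketch of the p-value to z-score conversion, using scipy's standard normal quantile function for \(\varPhi ^{-1}\):

```python
import numpy as np
from scipy.stats import norm

def p_to_z(p):
    """z = Phi^{-1}(p): under H0, p ~ U(0, 1) so z ~ N(0, 1);
    small p-values map to strongly negative z-scores."""
    return norm.ppf(np.asarray(p, dtype=float))

print(p_to_z([0.5, 0.05, 1e-4]))  # approximately 0.00, -1.64, -3.72
```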

After transformation, the test can thus always be reframed as one-sided on \({\mathfrak {z}}_i\): \(\mathcal {H}_{i0}: E[{\mathfrak {z}}_i] = 0\) against \(\mathcal {H}_{i1}: E[{\mathfrak {z}}_i] < 0\). We make two assumptions regarding the distribution of \({\mathfrak {z}}\).

  1. (A1) Under \(\mathcal {H}_{i1}\), \({\mathfrak {z}}_i \sim \mathcal {N}(\mu _i, 1)\) where \(\mu _i \le 0\);

  2. (A2) \({\mathfrak {z}}\) arises from a shifted Ornstein-Uhlenbeck process on an ultrametric tree \(\mathcal {T}\) with parameters \({\alpha }_{\text {ou}}\), \({\sigma }_{\text {ou}}^2\) and \(\varDelta \).

Assumption (A1) is classical when working with z-scores (McLachlan and Peel 2000): finding the alternative hypotheses is equivalent to finding the negative entries of \(\mu \). Assumption (A2) allows us to specify the joint distribution of \({\mathfrak {z}}\) as:

$$\begin{aligned} {\mathfrak {z}}\sim \mathcal {N}_{{m}} \left( {\mu }, {\varSigma }\right) \end{aligned}$$
(3)

where \(\varSigma \) is fully specified by the parameters \({\sigma }_{\text {ou}}\) and \({\alpha }_{\text {ou}}\). Note that the diagonal coefficients of \(\varSigma \) are all equal to \({\sigma }_{\text {ou}}^2 / 2{\alpha }_{\text {ou}} \, (1 - e^{-2{\alpha }_{\text {ou}}h})\). As they correspond to marginal variances, which equal 1 under (A1), this forces the equality \({\sigma }_{\text {ou}}^2 = 2{\alpha }_{\text {ou}} / (1 - e^{-2{\alpha }_{\text {ou}}h})\) so that \(\varSigma \) depends only on \({\alpha }_{\text {ou}}\), i.e. \(\varSigma = \varSigma ({\alpha }_{\text {ou}})\). Finally, the decomposition \(\mu = T \varDelta \), where T acts as a phylogenetic design matrix, favors configurations where the alternative hypotheses form clades, i.e. groups of leaves obtained by cutting a single branch in the tree.

This framework allows us to use \(\mathcal {T}\) as a prior structure in the mean vector \(\mu \) and variance matrix \(\varSigma \) and to recast the hypothesis testing problem as a regression problem.

3.2.1 Parameter estimation

Estimation of \({\hat{\mu }}\). Assume first that \(\varSigma \), or equivalently \({\alpha }_{\text {ou}}\), is known. Our main goal is to estimate the negative components of \(\mu \).

To leverage the known tree structure, we use the decomposition \(\mu = T\varDelta \) and estimate \(\mu \) by means of \(\varDelta \). Since \(\varDelta \) has dimension n compared to dimension m for \(\mu \), we force \({\hat{\varDelta }}\) to be sparse using a constrained lasso penalty (Tibshirani 1996):

$$\begin{aligned} {\hat{\varDelta }} = \mathop {\mathrm{argmin}}\limits _{\varDelta \in {\mathbb {R}}^{n} \; \text {s.t.} \; T\varDelta \in {\mathbb {R}}^m_-} \frac{1}{2} \left\| {\mathfrak {z}}- T\varDelta \right\| _{\varSigma ^{-1},2}^2 + \lambda \Vert \varDelta \Vert _1. \end{aligned}$$
(4)

where \({\mathbb {R}}_- = \{x \in {\mathbb {R}}\; \text {s.t.} \; x \le 0\}\).

Intuitively, the decomposition together with the \(\ell _1\) penalty works as a nested group lasso penalty for the components of \(\mu \), where the groups correspond to clades of \(\mathcal {T}\), while the constraint \(T\varDelta \in {\mathbb {R}}^m_-\) forces the components of \(\mu \) to be non-positive. For compactness, we define the feasible set \(\mathcal {D}= \{ \varDelta \in {\mathbb {R}}^n \; \text {s.t.} \; T\varDelta \in {\mathbb {R}}_-^m\}\). Finally, we use the Cholesky decomposition \(\varSigma ^{-1} = R^TR\) to simplify the problem into the very well studied optimisation problem:

$$\begin{aligned} {\hat{\varDelta }} = \mathop {\mathrm{argmin}}\limits _{\varDelta \in \mathcal {D}} \frac{1}{2} \left\| y - X\varDelta \right\| _2^2 + \lambda \Vert \varDelta \Vert _1 \end{aligned}$$
(5)

with \(y = R{\mathfrak {z}}\in {\mathbb {R}}^{m}\) and \(X = RT \in {\mathbb {R}}^{m \times n}\). Note that y is a whitened version of \({\mathfrak {z}}\), with independent components and spherical covariance matrix. This is a lasso problem with a convex feasibility constraint on \(\varDelta \). The optimisation algorithm used to solve this problem is detailed in Sect. 4.
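As an illustration, a minimal numpy sketch of the whitening step that turns problem (4) into problem (5); the random \(\varSigma \), T, \({\mathfrak {z}}\) and \(\varDelta \) below are placeholders, used only to check that the two objectives coincide.

```python
import numpy as np

def whiten(z, T, Sigma):
    """With Sigma^{-1} = R^T R (Cholesky), set y = R z and X = R T so that
    ||z - T Delta||^2_{Sigma^{-1}} = ||y - X Delta||^2_2."""
    L = np.linalg.cholesky(np.linalg.inv(Sigma))  # Sigma^{-1} = L L^T
    R = L.T                                       # hence Sigma^{-1} = R^T R
    return R @ z, R @ T

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + np.eye(4)                       # some positive definite matrix
T = rng.normal(size=(4, 7))
z, Delta = rng.normal(size=4), rng.normal(size=7)

y, X = whiten(z, T, Sigma)
lhs = (z - T @ Delta) @ np.linalg.inv(Sigma) @ (z - T @ Delta)
rhs = np.sum((y - X @ Delta) ** 2)
print(np.isclose(lhs, rhs))  # True
```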

Estimation of \({\hat{\varSigma }}\) and tuning of \(\lambda \).

Remember first that \(\varSigma \) is completely determined by \({\alpha }_{\text {ou}}\) because of the link between \({\alpha }_{\text {ou}}\) and \({\sigma }_{\text {ou}}^2\). There is no closed-form expression for the maximum likelihood estimator of \({\alpha }_{\text {ou}}\). We therefore resort to numerical optimisation. To tune the parameter \(\lambda \), we test several values to estimate models with different sparsity levels and select the best one using a modified BIC criterion:

$$\begin{aligned} ({{\hat{\alpha }}}_{\text {ou}}, {\hat{\lambda }}) = \mathop {\mathrm{argmin}}\limits _{\alpha > 0, \lambda \ge 0} \left\| {\mathfrak {z}}- T\varDelta _{\alpha , \lambda }\right\| _{\varSigma ^{-1}(\alpha ),2}^2 + \log |\varSigma (\alpha )| + \Vert \varDelta _{\alpha , \lambda }\Vert _0 \log (\log {m})\log {m}\nonumber \\ \end{aligned}$$
(6)

where \(\varDelta _{\alpha , \lambda }\) is the solution of problem (4) for \(\varSigma (\alpha )\) and \(\lambda \). In practice, \(\alpha \) and \(\lambda \) vary over a two-dimensional grid and we select the values that minimize the objective. We use a modified BIC, where \(\log (\log {m})\log {m}\) replaces \(\log {m}\), to account for the fact that m scales like n as suggested in Fan and Tang (2013).
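For concreteness, a small sketch of the criterion in Eq. (6) for one candidate pair \((\alpha , \lambda )\). It assumes that the constrained-lasso solution \(\varDelta _{\alpha , \lambda }\) and the matrix \(\varSigma (\alpha )\) have already been computed (by the solver of Sect. 4 and Eq. (2) respectively); in practice the criterion is evaluated over the whole \((\alpha , \lambda )\) grid and the minimizer is kept.

```python
import numpy as np

def modified_bic(z, T, Delta_hat, Sigma):
    """Modified BIC of Eq. (6) for one candidate (alpha, lambda):
    Mahalanobis fit + log-determinant of Sigma(alpha)
    + ||Delta_hat||_0 * log(log m) * log m."""
    m = len(z)
    resid = z - T @ Delta_hat
    fit = resid @ np.linalg.solve(Sigma, resid)   # ||z - T Delta||^2_{Sigma^{-1}}
    _, logdet = np.linalg.slogdet(Sigma)
    df = np.count_nonzero(Delta_hat)
    return fit + logdet + df * np.log(np.log(m)) * np.log(m)

# typical use inside the grid search (constrained-lasso solver and Sigma(alpha) assumed available):
#   bic[(a, l)] = modified_bic(z, T, solve_constrained_lasso(z, T, Sigma(a), l), Sigma(a))
```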

3.2.2 Confidence intervals

Lasso procedures are known to produce biased estimators and do not return confidence intervals for the point estimate \({\hat{\mu }}_i\). Instead of simply returning all negative components of \({\hat{\mu }} = T{\hat{\varDelta }}\), we first debias the estimates and construct confidence intervals for the components of \(\varDelta \), and in turn of \({\hat{\mu }}\), using the debiasing procedures of Javanmard and Montanari (2013, 2014) and Zhang and Zhang (2014).

Debiasing. All debiasing procedures assume a model \(Y \sim \mathcal {N}_{{m}} \left( {X\varDelta }, {\sigma ^2 I_m}\right) \) and require both an initial estimator \({\hat{\varDelta }}^{\text {(init)}}\) of \(\varDelta \) and an estimator \({\hat{\sigma }}\) of \(\sigma \). We use the scaled lasso (Sun and Zhang 2012) with the same negativity constraint as in (4):

$$\begin{aligned} \left( {\hat{\varDelta }}^{\text {(init)}}, {\hat{\sigma }}\right) = \mathop {\mathrm{argmin}}\limits _{\varDelta \in \mathcal {D}, \sigma > 0} \frac{\Vert y - X \varDelta \Vert _2^2}{2\sigma m} + \frac{\sigma }{2} + \lambda _{scaled} \Vert \varDelta \Vert _1 . \end{aligned}$$
(7)

Problem (7) can be solved efficiently by iterating between updates of (i) \({\hat{\sigma }}\), using the closed-form expression \({\hat{\sigma }} = \Vert y - X {\hat{\varDelta }}\Vert _2 / \sqrt{m}\), and (ii) \({\hat{\varDelta }}\), by solving the constrained lasso problem (5) with tuning parameter \(\lambda = m {\hat{\sigma }} \lambda _{scaled}\). Debiasing is achieved by the corrected update:

$$\begin{aligned} {\hat{\varDelta }}_j = {\hat{\varDelta }}_j^{\text {(init)}} + \frac{\langle s_j,y-X{\hat{\varDelta }}^{(\text {init})}\rangle }{\langle s_j,x_j\rangle }. \end{aligned}$$
(8)

where the \(s_j\) form a score-system (SS). Intuitively, \(s_j\) should be a relaxed orthogonalization of \(x_j\) against the other column-vectors of X. The \(s_j\) are used to decorrelate the estimators. We use the strategy of Zhang and Zhang (2014) and take the residuals of a lasso regression of \(x_j\) against \(X_{-j}\). We also considered the alternative debiasing strategy of Javanmard and Montanari (2013, 2014), which is based on a pseudo-inverse of \({\hat{\varSigma }} = \frac{X^TX}{m}\). Their debiased estimate is again a simple update of the initial scaled lasso estimator:

$$\begin{aligned} {\hat{\varDelta }} = {\hat{\varDelta }}^{(\text {init})} + \frac{1}{m}SX^T \left( y-X{\hat{\varDelta }}^{(\text {init})}\right) \end{aligned}$$

but the decorrelation matrix S is computed in a so-called colwise inverse approach (CI), by inverting \({\hat{\varSigma }}\) in a columnwise fashion. Column \(s_j\) is the solution of the optimization problem:

$$\begin{aligned} \left\{ \begin{aligned}&s_j = \text {argmin}_{s \in {\mathbb {R}}^{n}} \ s^T{\hat{\varSigma }}s \\&\text {s.t.}\ \Vert {\hat{\varSigma }}s - e_j\Vert _{\infty } \le \gamma . \end{aligned} \right. \end{aligned}$$
(9)

where \(e_j\) is the \(j^\text {th}\) canonical vector and \(\gamma \ge 0\) is a slack hyperparameter. If \(\gamma \) is too small, the problem is not feasible (unless \({\hat{\varSigma }}\) is non-singular). If \(\gamma \) is too large, the unique solution is \(s_j = 0\).

Confidence Interval. Zhang and Zhang (2014) showed that asymptotically \({\hat{\varDelta }} \sim \mathcal {N}\left( \varDelta , V\right) \) with the covariance matrix V defined by

$$\begin{aligned} v_{ij} = {\hat{\sigma }}^2 \frac{\langle s_i,s_j\rangle }{\langle s_i,x_i\rangle \langle s_j,x_j\rangle } . \end{aligned}$$
(10)

Similarly, the columnwise-inverse estimator of Javanmard and Montanari (2013) has asymptotic distribution \(\mathcal {N}\left( \varDelta , V\right) \) with variance matrix \(V = {\hat{\sigma }}^2 S {\hat{\varSigma }} S^T / m\). For both procedures, the bilateral confidence interval at level \(\alpha \) for \({\hat{\varDelta }}_j\) is

$$\begin{aligned} IC_{\alpha }({\hat{\varDelta }}_j) = \left[ {\hat{\varDelta }}_j \pm \varPhi ^{-1}\left( 1-\frac{\alpha }{2}\right) \sqrt{v_{jj}} \right] . \end{aligned}$$

Note that the estimator of the \(i^{\text {th}}\) component of \(\mu \) can be written \({\hat{\mu }}_i = t_{i.}^T{\hat{\varDelta }}\) with \(t_{i.}^T\) the \(i^{\text {th}}\) row of T. Its unilateral confidence interval at level \(\alpha \) is thus given by \(\left[ -\infty , {\hat{\mu }}_i + \sqrt{t_{i.}^T V t_{i.}} \, \varPhi ^{-1}\left( 1-\alpha \right) \right] \). We can thus simply check whether 0 falls in the interval to test \(\mathcal {H}_{i0} : \{\mu _i = 0\}\) versus \(\mathcal {H}_{i1}: \{\mu _i < 0\}\) at level \(\alpha \) or compute the p-value of the one-sided test as:

$$\begin{aligned} {\mathfrak {p}}^\text {ss}_i = \varPhi \left( \frac{t_{i.}^T{\hat{\varDelta }}}{\left( t_{i.}^TVt_{i.}\right) ^{1/2}}\right) . \end{aligned}$$
(11)
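The sketch below illustrates the score-system variant (Eqs. (8), (10) and (11)). The score vectors are taken as lasso residuals of \(x_j\) on the remaining columns, here computed with scikit-learn and an arbitrary penalty; the initial estimate and \({\hat{\sigma }}\) are assumed to come from the constrained scaled lasso (7), and the sign constraint itself is not handled here.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

def score_system(X, lam=0.1):
    """Score vectors s_j: residuals of a lasso regression of x_j on the other columns of X."""
    m, n = X.shape
    S = np.empty((m, n))
    for j in range(n):
        X_minus_j = np.delete(X, j, axis=1)
        fit = Lasso(alpha=lam, fit_intercept=False).fit(X_minus_j, X[:, j])
        S[:, j] = X[:, j] - fit.predict(X_minus_j)
    return S

def debias(X, y, delta_init, sigma_hat, S):
    """Debiased estimate (Eq. 8) and its asymptotic covariance (Eq. 10).
    Requires <s_j, x_j> != 0 for every column j."""
    resid = y - X @ delta_init
    num = S.T @ resid                    # <s_j, y - X delta_init>
    den = np.sum(S * X, axis=0)          # <s_j, x_j>
    delta_deb = delta_init + num / den
    V = sigma_hat**2 * (S.T @ S) / np.outer(den, den)
    return delta_deb, V

def one_sided_pvalues(T_mat, delta_deb, V):
    """p_i = Phi( t_i' delta / sqrt(t_i' V t_i) ) for mu = T delta (Eq. 11)."""
    mu_hat = T_mat @ delta_deb
    se = np.sqrt(np.einsum("ij,jk,ik->i", T_mat, V, T_mat))
    return norm.cdf(mu_hat / se)
```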

3.2.3 FDR control

The debiasing procedure achieves marginally consistent interval estimation of the shifts \(\varDelta \) but additional care is required to control the FDR when testing all components of \(\mu \) simultaneously. We use the procedure proposed in Javanmard et al. (2019), which is specific to debiased lasso estimators, and relies on the t-scores \({\mathfrak {t}}_i = \frac{t_{i.}^T{\hat{\varDelta }}}{\left( t_{i.}^TVt_{i.}\right) ^{1/2}}\). Briefly, for FDR control at a given level \(\alpha \), let \(t_{\text {max}} = \sqrt{2 \log m - 2 \log \log m}\) and set:

$$\begin{aligned} t^{\star } = \inf \left\{ 0 \le t \le t_{\max } : \frac{2m(1 - \varPhi (t))}{R(t) \vee 1} \le \alpha \right\} \end{aligned}$$

where \(R(t) = \sum _{i = 1}^m 1_{\{{\mathfrak {t}}_i \le -t\}}\) is the total number of rejections at threshold t, or \(t^{\star } = \sqrt{2 \log m}\) if the previous set is empty. Applying the procedure of Javanmard et al. (2019) strictly would replace 2m with m in the numerator, as we consider one-sided tests instead of two-sided ones for \(\mu _i\). However, numerical analysis showed that the extra factor of 2 led to better control of the FDR and we thus kept it. Hypothesis \(\mathcal {H}_{i0}\) is rejected if \({\mathfrak {t}}_i \le -t^{\star }\) or, in terms of q-values, if

$$\begin{aligned} {\mathfrak {q}}^{\text {ss}}_i := \frac{{\mathfrak {p}}^{\text {ss}}_i \alpha }{\varPhi (-t^{\star })} \le \alpha . \end{aligned}$$
(12)

Since \(t^{\star }\) itself depends on \(\alpha \), the corrected p-values depend on \(\alpha \), unlike in the standard BH procedure, where they only depend on the order statistics.
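A minimal sketch of this thresholding rule, given the vector of t-scores; a grid search over \([0, t_{\max }]\) stands in for the infimum, and the simulated t-scores are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

def fdr_threshold(t_scores, alpha=0.05):
    """Smallest t in [0, t_max] with 2 m (1 - Phi(t)) / max(R(t), 1) <= alpha,
    where R(t) = #{i : t_i <= -t}; falls back to sqrt(2 log m) otherwise."""
    t_scores = np.asarray(t_scores)
    m = len(t_scores)
    t_max = np.sqrt(2 * np.log(m) - 2 * np.log(np.log(m)))
    for t in np.linspace(0.0, t_max, 10_000):
        R = np.sum(t_scores <= -t)
        if 2 * m * (1 - norm.cdf(t)) / max(R, 1) <= alpha:
            return t
    return np.sqrt(2 * np.log(m))

rng = np.random.default_rng(0)
t_scores = np.concatenate([rng.normal(-5, 1, 30), rng.normal(0, 1, 70)])  # 30 signals, 70 nulls
t_star = fdr_threshold(t_scores)
print(t_star, np.sum(t_scores <= -t_star))  # threshold and number of rejections
```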

3.2.4 Algorithm

Algorithm 1 summarises our procedure. We call it zazou for "z-scores az Ornstein-Uhlenbeck".


4 Sign-constrained lasso

Our inference procedure is based on very standard estimates but requires solving the following constrained lasso problem:

$$\begin{aligned} {\hat{\varDelta }} = \mathop {\mathrm{argmin}}\limits _{\varDelta \; \text {s.t.} \; T\varDelta \in {\mathbb {R}}_-^m} \frac{1}{2} \left\| y - X\varDelta \right\| _2^2 + \lambda \Vert \varDelta \Vert _1 . \end{aligned}$$

Here y is an arbitrary vector and X and T are arbitrary matrices. This is a convex problem as both the objective function and the feasibility set are convex. We therefore adapt the shooting algorithm (Fu 1998), an iterative algorithm used to solve the standard lasso by looping over coordinates and solving simpler one-dimensional problems, to our constrained problem.

Let \(X_{-j}\) (resp. \(\varDelta _{-j}\)) be the matrix X (resp. vector \(\varDelta \)) deprived of its \(j^\text {th}\) column (resp. \(j^\text {th}\) coordinate). We can isolate \(\varDelta _j\) in (5) and decompose the objective as \(\Vert y - X\varDelta \Vert ^2_2 + \lambda \Vert \varDelta \Vert _1 = \Vert y - z_j - x_j \varDelta _j \Vert ^2_2 + \lambda |\varDelta _j| + \lambda \Vert \varDelta _{-j}\Vert _1\) where \(z_j = X_{-j}\varDelta _{-j} \in {\mathbb {R}}^{m}\). We can likewise decompose \(T\varDelta = u_j + v_j\varDelta _j\) where \(u_j = T_{-j}\varDelta _{-j}\in {\mathbb {R}}^{m}\) and \(v_j = t_j\). When updating \(\varDelta _j\), we can thus consider the simpler univariate problem in \(\theta \):

$$\begin{aligned} {\left\{ \begin{array}{ll} &{} \mathop {\mathrm{argmin}}\limits _{\theta \in {\mathbb {R}}} h(\theta ) = \frac{1}{2} \Vert y - z - x\theta \Vert ^2_2 + \lambda |\theta | \\ &{}\text {s.t.}\ u + v\theta \le 0. \end{array}\right. } \end{aligned}$$
(13)

Let \(I_+ = \{i: v_i > 0\}\) and \(I_- = \{i: v_i < 0\}\) and denote \(\theta _{\max } = \min _{I_{+}} \{ {-u_i}/{v_i} \}\) and \(\theta _{\min } = \max _{I_{-}} \{ {-u_i}/{v_i} \}\) with the usual conventions that \(\max (\emptyset ) = -\infty \) and \(\min (\emptyset ) = +\infty \). Problem (13) is feasible only if (i) \(\theta _{\min } \le \theta _{\max }\) and (ii) for all i, \(v_i = 0 \Rightarrow u_i \le 0\), in which case the feasible region is \([\theta _{\min }, \theta _{\max } ]\). Computing the subgradient \(\partial h(\theta )\) of h and looking for values \(\theta \) such that \(0 \in \partial h(\theta )\) leads to the usual shrunken estimates:

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{(y-z)^Tx+\lambda }{x^Tx} &{} \text {if } (y-z)^Tx< -\lambda , \\ \frac{(y-z)^Tx-\lambda }{x^Tx} &{} \text {if } (y-z)^Tx > \lambda , \\ 0 &{} \text {if } | (y-z)^Tx | < \lambda . \end{array}\right. } \end{aligned}$$

By convexity of h, the solution of (13) can be found by projecting the previous unconstrained minimum to the feasibility set. If problem (13) is feasible, its solution is thus given by

$$\begin{aligned} \theta ^{\star } = {\left\{ \begin{array}{ll} P_\mathcal {I} \left( \frac{(y-z)^Tx+\lambda }{x^Tx}\right) &{} \text {if } (y-z)^Tx< -\lambda , \\ P_\mathcal {I}\left( \frac{(y-z)^Tx-\lambda }{x^Tx}\right) &{} \text {if } (y-z)^Tx > \lambda , \\ P_\mathcal {I} (0) &{} \text {if } | (y-z)^Tx | < \lambda , \end{array}\right. } \end{aligned}$$

where \(P_\mathcal {I} : u \mapsto \max (\theta _{\min }, \min (u, \theta _{\max }))\) is the projection of u on the segment \(\mathcal {I} = [\theta _{\min }, \theta _{\max } ]\).
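Putting the pieces together, a minimal Python sketch of this coordinate update: soft-threshold the unconstrained minimizer and project it onto \([\theta _{\min }, \theta _{\max }]\). This is the inner step that the constrained shooting algorithm loops over; the small vectors in the usage example are arbitrary.

```python
import numpy as np

def update_coordinate(y, z, x, u, v, lam):
    """Solve the univariate problem (13): soft-thresholding followed by
    projection onto the feasible interval [theta_min, theta_max]."""
    pos, neg = v > 0, v < 0
    theta_max = np.min(-u[pos] / v[pos]) if pos.any() else np.inf
    theta_min = np.max(-u[neg] / v[neg]) if neg.any() else -np.inf
    if theta_min > theta_max or np.any((v == 0) & (u > 0)):
        raise ValueError("problem (13) is infeasible")

    # unconstrained minimizer: soft-thresholding of (y - z)'x
    r, xx = (y - z) @ x, x @ x
    if r < -lam:
        theta = (r + lam) / xx
    elif r > lam:
        theta = (r - lam) / xx
    else:
        theta = 0.0

    # projection onto [theta_min, theta_max]
    return min(max(theta, theta_min), theta_max)

y, z, x = np.array([1.0, -2.0]), np.zeros(2), np.array([1.0, 1.0])
u, v = np.array([-1.0, -1.0]), np.array([1.0, 0.0])
print(update_coordinate(y, z, x, u, v, lam=0.5))  # -0.25
```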

5 Synthetic data

5.1 Metagenomics

Metagenomics data are made up of three components. The first component is the count or abundance matrix \(X = (x_{ij})\), with \(1 \le i \le m\) and \(1 \le j \le p\), which represents the quantity of taxon i in sample j. The second component is a set of sample covariates, such as disease status, environmental conditions, group, etc. The final component is a phylogenetic tree which captures the shared evolutionary history of all taxa. When performing DAT, we are interested in taxa whose abundance is significantly associated with a covariate.

Most DAT procedures proceed with univariate tests (one test per species) followed by a correction procedure. In the synthetic datasets, we consider discrete covariates only. Dozens of full-fledged testing pipelines are published each year, including some designed with omics data in mind. Since our goal in this study is to compare correction procedures rather than full testing procedures, we use Wilcoxon or Kruskal-Wallis tests, which are classical and widespread non-parametric tests in metagenomics.

5.2 Simulations

Simulation scheme. We use the following simulation scheme:

  1. start with a homogeneous dataset,

  2. assign each sample to group A or B at random,

  3. select differentially abundant taxa in a phylogenetically consistent manner,

  4. apply a fold-change to the observed abundance of differentially abundant taxa in group B.

This non-parametric simulation scheme was previously used in Bichat et al. (2020). We considered two variants for step 3, respectively called positive and negative. In the negative variant, differentially abundant taxa were selected randomly across the tree, so that the phylogeny is not informative. In the positive variant, taxa were instead selected in a phylogenetically consistent manner. Formally, the phylogeny was first used to compute the cophenetic (Sneath et al. 1973) distance matrix between taxa. A partitioning around medoids algorithm was then used to create clusters of related species. One or more clusters were then picked at random and all species in those clusters were selected as differentially abundant.

For each fold-change (\(\text {fc} \in \{3, 5, 10\}\)), 500 simulated datasets were created, with a proportion of differentially abundant species ranging from 3 % to 35 %. For each simulation, we corrected p-values using no correction (Raw), the BH procedure (BH), the BY procedure (BY), StructFDR (TF) or our procedure with either score system (SS) or colwise inverse debiasing (CI), targeting in all instances a 5% FDR level. We compared the 6 procedures in terms of True Positive Rate (TPR), nominal FDR and AUC (Area Under the Curve).

Positive simulations.

The results of positive simulations (i.e. where the phylogeny is informative) are shown in Fig. 3. All correction methods controlled the FDR at the target rate or below when the fold change was larger than 5. For smaller fold changes, both the SS and CI variations of zazou exhibited nominal FDR slightly above the target level (up to 9% in the worst case). In all settings, BY had the lowest TPR, whereas TF was comparable to vanilla BH, in line with the results of Bichat et al. (2020). Finally, zazou (both SS and CI variations) had the best overall TPR, with the largest gains observed in the lowest fold-change setting.

Fig. 3

Boxplots and average (red point) TPR and FDR across positive simulation settings. Each facet corresponds to a different fold-change (fc) and each boxplot is computed over 500 simulation replicates. All corrections control the FDR at the target level or slightly above but zazou (SS and CI) achieve higher TPR, especially for small fold changes

The higher than intended FDR of the zazou methods suggests that the problem of finding an adequate threshold for \({\mathfrak {p}}_i^{ss}\) is not completely solved by the procedure of Javanmard et al. (2019). To assess the performance of zazou in a threshold-independent manner, we also compared the AUC of all procedures. Fig. 4 shows that zazou (both variants) has higher AUC than all other methods. As reported previously, TF and BH are at the same level and BY has the lowest ROC curve. A focus on the left-hand side of the curves shows that zazou is more efficient starting from the first discoveries.

Fig. 4

AUC boxplots (top) and average ROC curves (bottom) across positive simulations settings. Facets correspond to fold-changes (fc). ROC curves are computed for each simulation and linearly interpolated over a fixed grid before being averaged. Each boxplot and each curve are computed over 500 replicates. In all settings, SS/CI have the highest AUC / ROC curve, followed by BH/TF while BY has the lowest values

Negative simulations. The negative simulations are designed to assess the robustness of our algorithm with respect to uninformative phylogenies, or equivalently misspecified hierarchies. Fig. 5 shows that, as expected, standard BH outperforms competing methods (in terms of AUC) when the tree is misspecified. Forcing an inadequate tree structure results in AUC losses ranging from 15 to 20 percentage points compared to no structure. The puzzling lack of AUC loss for the TF procedure is explained by an implementation trick: TF always performs BH correction in parallel to its hierarchical procedure and falls back to BH when the hierarchical procedure detects far fewer species than BH (Bichat et al. 2020; Xiao et al. 2017).

Fig. 5

AUC boxplots (computed over 500 replicates) in negative simulations. BH outperforms SS and CI, highlighting the cost of imposing a misspecified hierarchical structure

6 Application

We use our zazou procedure on a gut microbiota dataset from the Fiji Islands (Brito et al. 2016; Pasolli et al. 2017) to identify species that are differentially abundant between adults and children. The dataset consists of the abundances of \(p=387\) species among \(n = 146\) islanders, split into 112 adults and 34 children.

To mimic the simulation study, we used Wilcoxon tests for the univariate tests. Without correction, 21 species were detected as differentially abundant at the 5% level. None of them remained significant after correction by BH, BY, TreeFDR or treeclimbR. By contrast, zazou detected differentially abundant species with both debiasing methods: 17 for SS and 6 for CI.

Fig. 6 shows that they are not a strict subset of the 21 detected with no correction. Smoothing salvages some species that are closely related to one of the 21 without being significant on their own (red box in the figure). It also illustrates some numerical problems associated with colwise-inverse debiasing, which is highly sensitive to the choice of the slack hyperparameter \(\gamma \). The window of relevant values for \(\gamma \) is narrow, and too large or too small values of \(\gamma \) respectively lead to no correction or to a faulty p-value correction.

Fig. 6

Phylogeny of the 387 species from the Fiji dataset with associated z-scores (inner circle), evidence (middle circle) and detection status (outer circle) under different correction procedures. Species detected by zazou are generally close-by on the tree and often, but not always, detected by raw p-values. The red strip highlights the smoothing property of the procedure in a subtree where individual species are not detected when using independent univariate tests but are detected when accounting for the hierarchical structure

7 Conclusion

In this work, we introduced zazou, a new method for correcting p-values in a hierarchical context. zazou is based on recasting the testing problem as a regression problem, under the framework of stochastic processes on an ultrametric tree, and using the tree topology for regularization.

It outperforms competing methods, hierarchical (TreeFDR, treeclimbR) or not (BH, BY), in terms of AUC, but this does not translate immediately to superior results in terms of FDR and TPR. The threshold for rejecting hypotheses turned out to be quite difficult to calibrate while controlling the FDR and warrants further work.

There are several other parts of the procedure that are not as powerful as expected. First, the BIC step used to select \(\lambda \), and in turn the number of shifts, tends to choose models with very few shifts, and sometimes even none. In such instances, the relevance of the debiasing step is limited. Second, the correction procedure proposed by Javanmard et al. (2019) is too conservative for our purpose. It was indeed developed to control both the FDR and the directional FDR (i.e. the proportion of Type S errors, where the effect sizes have the wrong sign, among the discoveries) whereas we only need to control the former. For both these steps, specific developments taking into account the sign constraint on \({\hat{\mu }}\) and the structure of the topology matrix of tree \(\mathcal {T}\) could lead to better performance for zazou.