Discovery of shared genomic loci using the conditional false discovery rate approach

  • Review
Human Genetics

Abstract

In recent years, genome-wide association study (GWAS) sample sizes have become larger, the statistical power has improved and thousands of trait-associated variants have been uncovered, offering new insights into the genetic etiology of complex human traits and disorders. However, a large fraction of the polygenic architecture underlying most complex phenotypes still remains undetected. We here review the conditional false discovery rate (condFDR) method, a model-free strategy for analysis of GWAS summary data, which has improved yield of existing GWAS and provided novel findings of genetic overlap between a wide range of complex human phenotypes, including psychiatric, cardiovascular, and neurological disorders, as well as psychological and cognitive traits. The condFDR method was inspired by Empirical Bayes approaches and leverages auxiliary genetic information to improve statistical power for discovery of single-nucleotide polymorphisms (SNPs). The cross-trait condFDR strategy analyses separate GWAS data, and leverages overlapping SNP associations, i.e., cross-trait enrichment, to increase discovery of trait-associated SNPs. The extension of the condFDR approach to conjunctional FDR (conjFDR) identifies shared genomic loci between two phenotypes. The conjFDR approach allows for detection of shared genomic associations irrespective of the genetic correlation between the phenotypes, often revealing a mixture of antagonistic and agonistic directional effects among the shared loci. This review provides a methodological comparison between condFDR and other relevant cross-trait analytical tools and demonstrates how condFDR analysis may provide novel insights into the genetic relationship between complex phenotypes.

References

Acknowledgements

National Institutes of Health (NS057198; EB00790); National Institutes of Health NIDA/NCI: U24DA041123; the Research Council of Norway (229129; 213837; 248778; 223273; 249711); the South-East Norway Regional Health Authority (2017-112); KG Jebsen Stiftelsen (SKGJ-2011-36).

Author information

Corresponding authors

Correspondence to Olav B. Smeland or Ole A. Andreassen.

Ethics declarations

Conflict of interest

O.A.A. has received speaker’s honoraria from Lundbeck and is a consultant for Healthlytix. C.C.F. is employed by Multimodal Imaging Service, dba Healthlytix, in addition to his research appointment at the University of California, San Diego. A.M.D. is a founder of and holds equity interest in CorTechs Labs and serves on its scientific advisory board. He is also a member of the Scientific Advisory Board of Healthlytix and receives research funding from General Electric Healthcare (GEHC). The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. The remaining authors have no conflicts of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The condFDR/conjFDR software is available at https://github.com/precimed/pleiofdr as a MATLAB package, under the GPL v3 license.

Box 1: Conditional and conjunctional false discovery rate

The ‘enrichment’ seen in the conditional Q–Q plots can be interpreted directly as a Bayesian true discovery rate (TDR = 1 – false discovery rate (FDR)) (Efron 2010). More specifically, for a given p value, under a simple two-group (null and non-null) model, Bayes' rule gives the posterior probability of being null as:

$${\text{FDR}}\left( p \right)\, = \,\pi_{0} F_{0} \left( p \right)/F\left( p \right),$$
(1)

where π0 is the proportion of null SNPs, F0 is the cumulative distribution function (cdf) of the null SNPs, and F is the cdf of all SNPs, both null and non-null (Efron 2007). Here, we assume the SNP p values are a priori independent and identically distributed. Under the null hypothesis, F0 is the cdf of the uniform distribution on the unit interval [0,1], so that Eq. (1) reduces to:

$${\text{FDR}}\left( p \right)\, = \,\pi_{0} p/F\left( p \right).$$
(2)

F can be estimated by the empirical cdf q = Np/N, where Np is the number of SNPs with p values less than or equal to p, and N is the total number of SNPs. Replacing F by q in Eq. (2), we get:

$${\text{Estimated FDR}}\left( p \right)\, = \,\pi_{0} p/q,$$
(3)

which is biased upwards as an estimate of the FDR (Efron and Tibshirani 2002). Replacing π0 in Eq. (3) with unity gives an estimated FDR that is further biased upward:

$$q^*\, = \,p/q.$$
(4)
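
As a rough illustration of Eqs. (3) and (4), the sketch below computes q* = p/q from a vector of p values using the empirical cdf. It is not taken from the pleioFDR package; the function name `empirical_fdr` and the cap at 1 are assumptions made here for illustration.

```python
import numpy as np

def empirical_fdr(pvals):
    """Conservative FDR estimate q* = p / q (Eq. 4), i.e. Eq. (3) with pi0 = 1.

    q is the empirical cdf: the fraction of SNPs with p value <= p.
    The function name and the cap at 1 are illustrative assumptions.
    """
    p = np.asarray(pvals, dtype=float)
    sorted_p = np.sort(p)
    # Np: number of SNPs with p value <= each p (ties counted inclusively).
    n_le = np.searchsorted(sorted_p, p, side="right")
    q = n_le / p.size
    # An FDR cannot exceed 1, so cap the ratio.
    return np.minimum(p / q, 1.0)
```

For four SNPs with p values 0.001, 0.01, 0.5 and 1.0, the empirical cdf values are 0.25, 0.5, 0.75 and 1, so the estimates are 0.004, 0.02, about 0.67 and 1.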

If π0 is close to one, the increase in bias going from Eq. (3) to Eq. (4) is minimal. The quantity 1 – p/q is, therefore, biased downward, and hence a conservative estimate of the TDR. Referring to the Q–Q plots, we see that q* is equivalent to the nominal p value divided by the empirical quantile, as defined earlier. We can thus read the FDR estimate directly off the Q–Q plot as:

$$- \log_{10} \left( q^* \right)\, = \,\log_{10} \left( q \right) - \log_{10} \left( p \right),$$
(5)

i.e., the horizontal shift of the curves in the Q–Q plots from the expected line x = y, with a larger shift corresponding to a smaller FDR. To estimate the conditional FDR of a given SNP, we repeat the above procedure for a subset of SNPs with p values in the secondary GWAS equal to or lower than that observed for the given SNP. Formally, this is given by:

$${\text{FDR}}\left( {p_{1} |p_{2} } \right)\, = \,\pi_{0} \left( {p_{2} } \right)p_{1} /F\left( {p_{1} |p_{2} } \right),$$
(6)

where p1 is the p value for the first phenotype, p2 is the p value for the second, and F(p1 | p2) is the conditional cdf and π0 (p2) the conditional proportion of null SNPs for the first phenotype, given that p values for the second phenotype are p2 or smaller. The condFDR framework is closely related to the stratified FDR method developed by Sun et al. (2006). Whereas they propose computing FDR separately conditional on membership in pre-defined discrete strata of p values, here, we condition the estimated FDR on a continuous random variable, the SNP p values with respect to a second phenotype.
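
The conditioning step in Eq. (6) can be sketched in the same way. The brute-force loop below is only an illustration under stated assumptions: π0(p2) is set to one (the conservative choice of Eq. 4), the function name `cond_fdr` is hypothetical, and the quadratic running time would be impractical for a real GWAS with millions of SNPs.

```python
import numpy as np

def cond_fdr(p1, p2):
    """Estimate FDR(p1 | p2) per SNP with pi0(p2) set to 1 (conservative).

    For SNP i the conditional empirical cdf is
        (# SNPs with p1 <= p1[i] and p2 <= p2[i]) / (# SNPs with p2 <= p2[i]).
    Illustrative O(N^2) sketch; not the pleioFDR implementation.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    out = np.empty(p1.size)
    for i in range(p1.size):
        # SNPs at least as significant as SNP i in the secondary GWAS.
        in_subset = p2 <= p2[i]
        n_subset = in_subset.sum()  # always >= 1: SNP i is in its own subset
        n_both = (in_subset & (p1 <= p1[i])).sum()
        out[i] = min(1.0, p1[i] * n_subset / n_both)
    return out
```

With p1 = (0.001, 0.5, 0.002, 0.9) and p2 = (0.01, 0.02, 0.6, 0.8), the first SNP has an unconditional estimate of 0.001/(1/4) = 0.004, whereas conditioning on its small p2 gives 0.001, showing how enrichment in the secondary GWAS lowers the estimated FDR.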

To identify SNPs jointly associated with two phenotypes using conjunctional FDR, the conditional FDR procedure is repeated after inverting the roles of the primary and secondary phenotypes. Similar to previous conjunction tests for p value statistics (Nichols et al. 2005), the conjunctional FDR estimate is defined as the maximum of the two conditional FDR values, which minimizes the effect of a single phenotype driving the common association signal. Formally, the conjunctional FDR is given by:

$$\begin{aligned}& {\text{FDR}}_{{{\text{Phenotype1}}\& {\text{Phenotype2}}}} \left( {p_{1} , \, p_{2} } \right)\,\\ & \quad = \,\pi_{0} F_{0} \left( {p_{1} , \, p_{2} } \right)/F\left( {p_{1} , \, p_{2} } \right)\, \\&\quad+ \,\pi_{1} F_{1} \left( {p_{1} , \, p_{2} } \right)/F\left( {p_{1} , \, p_{2} } \right)\, \\&\quad+ \,\pi_{2} F_{2} \left( {p_{1} , \, p_{2} } \right)/F\left( {p_{1} , \, p_{2} } \right), \end{aligned}$$
(7)

where π0 is the a priori proportion of SNPs null for both phenotypes simultaneously and F0(p1, p2) is the joint null cdf, π1 is the a priori proportion of SNPs non-null for the first phenotype and null for the second with F1(p1, p2) the joint cdf of these SNPs, and π2 is the a priori proportion of SNPs non-null for the second phenotype and null for the first, with joint cdf F2(p1, p2). F(p1, p2) is the joint overall mixture cdf for all phenotype 1 and 2 SNPs.

Conditional empirical cdfs provide a model-free method to obtain conservative estimates of Eq. (7). This can be seen as follows: estimate the conjunctional FDR by:

$$\begin{aligned} &{\rm{Estimated\, FDR}}_{{{{\rm Phenotype1}}} \& {{{\rm Phenotype2}}}}\\&\quad = {{\text{max}}} \, \left\{ {{\text{Estimated FDR}}_{{\rm Phenotype1} | {\rm Phenotype2}} ,}\right. \\&\quad\left.{{{\rm{ Estimated\, FDR}}}_{{{\rm Phenotype2} | {\rm Phenotype}1}} } \right\}, \end{aligned}$$
(8)

where Estimated FDR_{Phenotype1|Phenotype2} and Estimated FDR_{Phenotype2|Phenotype1} are conservative (upwardly biased) estimates of Eq. (6). Thus, Eq. (8) is a conservative estimate of max{p1/F(p1|p2), p2/F(p2|p1)} = max{p1F2(p2)/F(p1, p2), p2F1(p1)/F(p1, p2)}, with F1(p1) and F2(p2) the marginal non-null cdfs of SNPs for phenotypes 1 and 2, respectively. For enriched samples, p values will tend to be smaller than predicted from the uniform distribution, so that F1(p1) ≥ p1 and F2(p2) ≥ p2. Then, since π0 + π1 + π2 ≤ 1, max{p1F2(p2)/F(p1, p2), p2F1(p1)/F(p1, p2)} ≥ [π0 + π1 + π2] max{p1F2(p2)/F(p1, p2), p2F1(p1)/F(p1, p2)} ≥ [π0p1p2 + π1p2F1(p1) + π2p1F2(p2)]/F(p1, p2).

Under the assumption that SNPs are independent if one or both are null, reasonable for disjoint samples, this last quantity is precisely the conjunctional FDR given in Eq. (7). Thus, Eq. (8) is a conservative model-free estimate of the conjunctional FDR.
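
Equation (8) amounts to running the conditional estimate in both directions and keeping the larger value. The self-contained sketch below illustrates this; the names `cond_fdr` and `conj_fdr` are hypothetical, π0 is set to one in both directions, and the N × N comparison matrices are used only for brevity, not as a suggestion of how large data sets should be handled.

```python
import numpy as np

def cond_fdr(p_primary, p_secondary):
    """Brute-force estimate of FDR(p_primary | p_secondary) with pi0 = 1."""
    a = np.asarray(p_primary, dtype=float)
    b = np.asarray(p_secondary, dtype=float)
    # le_b[i, j] is True when SNP j belongs to SNP i's conditioning subset,
    # i.e. its secondary p value is <= that of SNP i.
    le_b = b[None, :] <= b[:, None]
    le_a = a[None, :] <= a[:, None]
    n_subset = le_b.sum(axis=1)
    n_both = (le_a & le_b).sum(axis=1)
    return np.minimum(1.0, a * n_subset / n_both)

def conj_fdr(p1, p2):
    """Conjunctional FDR estimate (Eq. 8): max of the two conditional FDRs."""
    return np.maximum(cond_fdr(p1, p2), cond_fdr(p2, p1))
```

Taking the maximum means a SNP receives a small conjunctional FDR only when both conditional estimates are small, so a strong signal in one phenotype alone is not enough.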

Cite this article

Smeland, O.B., Frei, O., Shadrin, A. et al. Discovery of shared genomic loci using the conditional false discovery rate approach. Hum Genet 139, 85–94 (2020). https://doi.org/10.1007/s00439-019-02060-2

  • DOI: https://doi.org/10.1007/s00439-019-02060-2