Introduction

GWAS are commonly used to identify genetic associations with phenotypes of interest and have been successful in identifying a large number of associations with diverse traits and diseases. The GWAS models used to connect genotypes with phenotypes are based on linear regression (LR) models that can fail to capture more complex and dynamic relationships. Despite awareness of this complexity1,2, the standard models remain highly simplified, largely because the very large sample sizes required for genetic discovery preclude incorporating known complexities into the analytic framework (e.g. relevant environmental measures may not be available at this scale).

Linear regression models allow one to test whether genetic variants are associated with the mean of a phenotype distribution. However, the effect of a genetic variant can go beyond the mean and can include the lower or upper parts of the phenotype distribution. Several known examples include an earlier study by Yang et al.3 that showed that an SNP in the FTO gene is associated with both the mean and the variance of BMI, as well as several variance quantitative trait loci (vQTL) studies4,5,6,7,8. Genetic associations can be complex due to underlying heterogeneity in the population and disease model, and the effects of gene-gene and gene-environment interactions9,10. For eQTLs, several recent studies have shown the dynamic nature of eQTLs depending on the context or environment, providing evidence of widespread gene–environment interactions that can play a role in disease development11. LR, with its focus on the central part of the phenotype distribution, may not capture these more complex associations.

There are several existing methods for detecting genetic variants that control phenotypic variance4,12,13,14,15,16,17. However, the detected variance effects in practice are very few, and reflect only to a limited extent the existing effect heterogeneity, as we show in simulations and applications below. An alternative, natural extension of LR in the context of quantitative traits is based on quantile regression (QR18). Different from testing moments, quantile regression-based tests, as discussed here, target distribution-wise differences and can lead to the detection of more general distributional differences. In particular, QR allows one to model any quantile of the phenotype distribution and can be used to measure how a genetic variant associates with various quantiles of a trait.

QR techniques have only sporadically been utilized in GWAS studies, most of them of modest sample sizes6,19,20,21,22, despite the similarity to LR, the availability of efficient implementations for genome-wide analysis in standard statistical packages6,23,24, and some notable advantages as follows. First, QR provides a better characterization of the underlying genotype-phenotype association, even at those associations identified by LR. Second, QR does not have assumptions on the error distribution, which means that it is robust to non-Gaussian errors and outliers. While in linear regression, traits are usually normalized using, for example, rank-based inverse normal transformation, and results (type-1 error and power) can depend on the particular transformation25,26, with quantile regression results are invariant to monotonic transformations and therefore, no particular trait transformation is required. It is, of course, highly desirable that trait transformations do not affect our conclusions based on the regression results. Furthermore, this invariance property also helps when certain transformations are preferable to get a more natural interpretation of the association patterns.

Furthermore, QR can be helpful in downstream analyses, such as polygenic risk score (PRS) construction. Specifically, QR provides estimates of regression coefficients at specific quantile levels, with a similar interpretation as in LR. Namely, a QR coefficient represents how much a specific quantile of the phenotype distribution changes with a 1-unit shift in the genotype distribution (e.g. for the 0.5 quantile, this would correspond to the change in median). While the PRS derived from LR models in GWAS are based on predicting the conditional phenotypic mean (conditional on genotypes), QR allows PRS construction at different quantile levels, essentially allowing us to predict the entire conditional phenotypic distribution, not just the mean.

We illustrate below the performance of QR in simulations using several homogeneous and heterogeneous models, and compare it with LR and a recent variance-based test, QUAIL17. We also illustrate the practical benefits in the GWAS setting, by applying it to biomarkers from the UKBB27.

Results

Overview of quantile regression techniques

We describe here briefly the quantile regression model as a complementary strategy to the commonly used linear regression model in GWAS. More details are in the Methods section. We assume that we have n independent samples from a population. We denote by \({\bf{Y}}=({Y}_{1},\cdots \,,{Y}_{n})^{\prime}\) the n × 1 sample phenotype vector, by X the n × p genotype matrix, and by C the n × q matrix for covariates. We denote the τth conditional quantile function of Y as QY(τXC). Then we can write the conditional quantile regression model for jth variant and for a specific quantile level τ ∈ (0, 1) as

$${Q}_{{\rm {Y}}}(\tau | {{\bf{X}}}_{j},\,{\bf{C}})={{\bf{X}}}_{j}{{\boldsymbol{\beta }}}_{j}(\tau )+{\bf{C}}{\boldsymbol{\alpha }}(\tau ),$$
(1)

where βj(τ) and α(τ) are quantile-specific coefficients and can be estimated by minimizing the pinball loss function18 as implemented in the R package quantreg. For testing H0βj(τ) = 0, a commonly used tool in quantile regression is the rank score test6,28, implemented in the R package QRank. The rank score test statistic for a fixed quantile level τ is defined as

$${{\bf{S}}}_{{\rm{QRank}},j,\tau }={n}^{-1/2}\mathop{\sum }\limits_{i=1}^{n}{{\bf{X}}}_{ij}^{*}{\phi }_{\tau }({{\bf{Y}}}_{i}-{{\bf{C}}}_{i}\hat{{\boldsymbol{\alpha }}}(\tau ))\,{\rm{and}}\,{{\bf{V}}}_{{\rm{QRank}},j}={n}^{-1}\tau (1-\tau ){{\bf{X}}}_{j}^{*^{\prime} }{{\bf{X}}}_{j}^{*}.$$
(2)

where X* = PCX, \({{\bf{P}}}_{{\rm {C}}}={\bf{I}}-{\bf{C}}{({\bf{C}}^{\prime} {\bf{C}})}^{-1}{\bf{C}}^{\prime}\), and ϕτ(u) = τI(u < 0) is the τ-pinball loss. Under the null hypothesis H0βj(τ) = 0, \({{\bf{V}}}_{{\rm{QRank}},j}^{-1/2}{{\bf{S}}}_{{\rm{QRank}},j,\tau } \sim N(0,1)\). Note that the asymptotic distribution of the test statistics is independent of the particular distribution of the phenotype, and hence the rank score test can be applied to any phenotype without requiring a pre-transformation to achieve normality. Throughout this paper, we use nine quantiles spaced equally at 0.1 intervals and combine quantile-specific p-values via Cauchy combination29.

QR is powerful across different homogeneous and heterogeneous models

We perform simulations based on UKBB genotype data on 20,000 white British unrelated individuals and focus on one locus (chr8:19,291,998–20,291,997; GRCh37/hg19) with 4148 variants. We assume only one causal variant selected at random among variants with minor allele frequency (MAF) > 1% and with varying effect size: 0.01 ≤ β ≤ 0.2. We simulate phenotype data using several models, including a homogeneous linear model (with effects being constant across quantiles) with the error distribution being normal or a heavy tail Cauchy distribution; a location-scale model; a local model where effects only exist at the upper quantiles; and a model with both additive and dominance effects. Trait is normalized using rank-based inverse normal transformation for LR only; raw trait values are used for QR and QUAIL. We assess the power of LR, QUAIL, and QR based on 100 Monte Carlo replicates. Note that the type-1 error is well controlled under both normal and Cauchy error distribution at individual quantile levels (Table S1). There is slight inflation at the 0.05 level when using the Cauchy combination method to combine p-values across quantile levels, due to the fact that the Cauchy combination method is known to be most accurate in the tail of the null distribution29, and therefore smaller p-values are approximated better. Given that in GWAS, we are interested in identifying variants at stringent significance levels, the Cauchy combination method should be an appropriate strategy to combine p-values.

Homogeneous linear model with normal or Cauchy error distribution

we assume the phenotype Y follows the model:

$${\bf{Y}}=\beta {\bf{X}}+{\boldsymbol{\epsilon }},$$
(3)

where X is the genotype at the causal variant, ϵ ~ N(0, 1) or ϵ ~ Cauchy(0, 1). For the normal error, LR has a slight power increase over QR (Fig. 1a), while for the Cauchy error (heavy tail distribution), QR has higher power than LR (Fig. 1b). QUAIL has no power in these scenarios.

Fig. 1: Power results in simulations.
figure 1

a Homogeneous model, normal error distribution; b homogeneous model, Cauchy error distribution; c location-scale model; d local model; e model with additive and dominance effects. Power is shown when beta varies from 0.01 to 0.2. For the model with additive and dominance effects, the dominance effect is fixed at 0.2, while the additive effect varies from 0.01 to 0.2. Figures on the left show densities of the raw trait values by genotype group for one replicate. Figures in the middle show the empirical conditional quantile functions for τ = 0.1−0.9 for the same replicate for beta = 0.1 (betaA = 0.1 and betaD = 0.2 for the additive + dominance model). Statistical significance was determined by the QR rank score test described in the Methods section. Shown in black is also the LR line fitted to the data.

Location-scale model

we also simulate the phenotype Y from the following model:

$${\bf{Y}}=\beta {\bf{X}}+(1+0.5\beta {\bf{X}}){\boldsymbol{\epsilon }},$$
(4)

where ϵ ~ N(0, 1). In this model, X is associated with both the mean and the variance of Y. QR has the highest power in this scenario, followed by QUAIL, while LR has the lowest power, as expected (Fig. 1c).

Local model with effects only at upper quantiles

we additionally simulate data under a local model with effects only at the upper quantiles τ ∈ [0.7, 1]:

$${Q}_{{\rm {Y}}}(\tau | {\bf{X}})={\boldsymbol{\beta }}(\tau ){\bf{X}}+{{\boldsymbol{\alpha }}}_{0}(\tau ),$$
(5)

where β(τ) = 5β(τ−0.7)/(1−0.7) when τ > 0.7 and β(τ) = 0 otherwise. The error distribution (\({{\boldsymbol{\alpha }}}_{0}^{-1}(\tau )\)) is assumed to be normal. Such a model can correspond to a gene–environment interaction, where a genetic variant only shows an effect on a phenotype in a certain environment. Under this local model, QR can have significant increase in power over LR (Fig. 1c). QUAIL has similar power as QR.

Model with both additive and dominance effects

finally, we consider a model that includes both additive and dominance effects as follows:

$${\bf{Y}}={\beta }^{{\rm {a}}}{{\bf{X}}}^{{\rm {a}}}+{\beta }^{{\rm {d}}}{{\bf{X}}}^{{\rm {d}}}+{\boldsymbol{\epsilon }},$$
(6)

where Xa corresponds to the additive coding 0, 1 and 2, while Xd is coded as \(-p/(1-p),\, 1,-\left.(1-p)/p\right)\) with p being the MAF, and ϵ ~ N(0, 1). When the dominance effect is strong enough, QR can indeed be more powerful (Figs. 1d, S1). QUAIL overall has lower power than both LR and QR.

Applications to UKBB GWAS

To illustrate QR in a modern GWAS setting and contrast the results with LR and QUAIL, we use the UKBB data (between 264,854 and 325,825 white British, unrelated/independent individuals and  ~8.6 million SNPs), and perform GWAS for 39 quantitative traits (Fig. S2). We adjust for several covariates, including age, sex, batch, and 10 principal components (PC) of genetic variation. We identify significant SNPs using a genome-wide significance threshold of 5 × 10−8 for each trait (Fig. S4S13). We define a locus as a 1Mb segment around the top p-value SNP in a region. LR detects a larger number of significant associations relative to QR (2,039,518 genome-wide significant SNPs and 9964 loci vs. 1,517,669 SNPs and 7297 loci). Most of the QR discoveries are shared with LR, but QR also identifies new loci that LR misses (Fig. 2a, S2, and Supplementary Data 1). In contrast, QUAIL detects a few significant associations across these 39 biomarkers, most of which are already detected by QR (Fig. S3). Specifically, on average, only 9% (ranging from 0% to 33%) of the QR associations were also detected by QUAIL.

Fig. 2: Number of genome-wide significant loci identified by LR and QR for each of 39 traits, effect size heterogeneity, and functional score comparison.
figure 2

a Number of genome-wide significant loci. Stacked barplots show the corresponding number of significant loci for LR&QR, LR–QR, and QR–LR. b Densities of heterogeneity indexes for GWAS associations for 39 traits. Densities are shown for three sets of associated SNPs: LR&QR, LR–QR, and QR–LR. c Functional scores (EigenPC and CADD) for the associated SNPs detected by QR–LR compared to those from LR–QR. Each box plot represents the median (line inside the box) and interquartile range (box top and bottom) and extends to 1.5 times the interquartile range (whiskers). One-sided p-values (QR–LR > LR–QR) from the Kolmogorov–Smirnov test are also shown comparing the functional effects for 42,291 vs. 564,140 associations identified by QR–LR vs. LR–QR.

We replicate the genome-wide significant associations above in an independent dataset of European individuals from UKBB (n ≈ 34,000–39,000). Given the much smaller sample sizes for the replication datasets relative to the discovery datasets, we report replication at nominal significance level (p-value < 0.05). One challenge with QR is that the result is a composite of results across multiple quantile levels and the p-value is obtained by Cauchy combination of individual quantile-level p-values which makes it more difficult to take the sign of the associations into account when considering replication. To overcome this challenge, we used the sign of the estimated effect β(τ) for the quantile with the lowest QR p-value in the discovery dataset and compared it to the sign of the estimated effect for the same quantile level in the replication dataset. QR has replication rates at least as high as LR (68% vs. 62% on average across the 39 traits, Fig. S14, Table S2). For both LR and QR, sign consistency tends to be very high (≈96%). Note also that most of the lead QR variants at loci only identified by QR still have suggestive signals in LR, suggesting that many of them could be identified in larger GWAS (Fig. S15).

Since QR allows us to estimate quantile specific effects across a whole range of quantiles, we can quantify the degree of heterogeneity in effects across the quantiles for individual SNPs. Specifically, for a SNP we compute a heterogeneity index as the log transformed ratio between the standard deviation and the absolute mean of the estimated quantile coefficients β(τ)’s: \(\log | {\rm{sd}}(\beta (\tau ))/{\rm{mean}}(\beta (\tau ))|\). The higher the value, the more heterogeneity in the quantile effects. As expected, SNPs identified by QR but missed by LR (QR p-value < 5e−08, LR p-value > 5e−08) are the most heterogeneous (Fig. 2b). Furthermore, they have significantly higher functional scores relative to those only identified by LR (Eigen-PC30 p = 0.031, Fig. 2c). Together with the higher replication rates and the high sign consistency mentioned above, these results suggest that the new QR-LR loci are at least as reliable as the loci identified only by LR.

Next, we focus on two traits, BMI and LDL, to provide more specific results. Both traits have a right-skewed distribution. For BMI, QR identifies 188 loci, of which 172 are shared with LR, and 16 are unique to QR (Figs. 3a and S16, Table S3). Variants identified by QR but missed by LR have heterogeneous effects across quantile levels as expected, with larger effects at the upper quantiles (Fig. S17), likely due to the right skewness of the BMI distribution. For LDL, QR identifies 93 loci, with 87 of them being shared with LR, and 6 being unique to QR (Fig. 3b and S18, Table S3). As with BMI, variants identified by QR but missed by LR have larger effects at the upper quantiles (Fig. S17). Moreover, variants identified by QR but missed by LR have more heterogeneity (Fig. S19a) and higher functional scores (Fig. S19b) relative to those only identified by LR.

Fig. 3: Manhattan sunset plot for BMI and LDL.
figure 3

\(-{\log }_{10}(p\,\text{-value}\,)\) of the genome-wide significant associations (p-value < 5 × 10−8) detected by the LR (blue bars) and QR (red bars) for a BMI and b LDL are plotted against the genomic positions. The lead QR association at a locus detected by QR–LR is represented by a red diamond with the name of the closest protein-coding gene indicated.

Next, we illustrate the advantages of QR when there is effect heterogeneity by showing results at two loci detected by QR but missed by LR for two traits, BMI and LDL. For BMI, we show the locus corresponding to the lead QR SNP rs12037905 (Fig. 4a), which falls within LYPLAL1-antisense RNA1 (LYPLAL1-AS1), a long noncoding RNA gene, which has been found previously to be associated with BMI31 and lipid traits such as high-density lipoprotein (HDL) cholesterol and triglycerides (TG) levels32. Yang et al. described a mechanism by which long noncoding RNA regulates adipogenic differentiation of human mesenchymal stem cells and suggested that LYPLAL1-AS1 may serve as a novel therapeutic target for preventing and combating diseases related to abnormal adipogenesis, such as obesity33.

Fig. 4: Examples of QR–LR and QR& LR significant loci for BMI and LDL.
figure 4

a and b QR–LR significant loci; c and d QR& LR significant loci. The regional plots show \(-{\log }_{10}(p\,\text{-value}\,)\) of QR with the lead QR SNP per locus shown as diamond. Colors indicate the LD (r2) to the lead SNP for each variant. For the lead QR SNP at each locus, the estimated quantile specific effects ± SE are shown as dots and error bars in the middle panels, and the empirical conditional quantile functions for τ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 quantiles of BMI and LDL by genotype group (one gray dot per individual) are shown in the right panels. Sample size n = 325,306 (BMI) and 310,574 (LDL). Statistical significance was determined by the QR rank score test described in the “Methods” section.

For LDL, we show the lipoprotein lipase (LPL) locus with lead QR SNP rs326 (Fig. 4b), which is an intron variant within LPL. LPL, encoding lipoprotein lipase, plays an important role in the clearance of TG and the maturation of HDL. A higher concentration of serum LPL is associated with a lower/higher level of TG/HDL cholesterol and contributes to a reduced risk of coronary artery disease (CAD)34,35. rs326 at LPL is a pleiotropic variant associated with CAD36 and lipid metabolism37. Large lipid genetic studies have shown that the rs326 G allele is associated with increased HDL cholesterol levels and decreased TG, suggesting a protective effect for CAD risk37. In our QR analyses, the rs326 G allele is significantly associated with decreased LDL at upper quantiles (Fig. 4b), suggesting a protective effect against CAD risk among individuals with higher levels of LDL.

The advantage of QR is not restricted to identifying novel loci, missed by LR. QR can provide additional insights even for loci detected by LR. We show here two examples corresponding to well known associations: the FTO locus and BMI, and the PCSK9 locus and LDL. These loci have strong associations from the conventional LR approach, yet QR provides additional information on the heterogeneity of effects across different quantile levels beyond what one can get from LR (Fig. 4c, d). In particular, the rs11642015 T allele at the FTO locus (intron variant) has higher effects at the upper quantiles of the BMI distribution, while the rs11591147 T allele at the PCSK9 locus (missense variant) has stronger negative (protective) effect at the upper quantiles of the LDL distribution. Both variants have been reported before as likely to play a functional role38,39. In particular, the FTO rs11642015 resides in an enhancer that regulates the expression of the neighboring IRX3/IRX5 cluster through long-range interactions with the IRX3/IRX5 gene promoters, and was shown to repress mitochondrial thermogenesis in adipocyte precursor cells40.

Heterogeneous associations identified by QR could be the result of gene-environment interactions. Even though gene–environment interactions are believed to be common, detecting them is challenging due to several factors, including lower statistical power relative to detecting main effects, the low prior probability of specific gene–environment interactions, etc. Furthermore, the relevant environmental exposures can be missing from the study, and even when measured, accurate measurements of exposure may not be available. The significant SNPs identified by QR can be subsequently tested for interactions with particular environmental factors using conventional gene–environment tests. Note that this is a two-stage design: first, we identify significant SNPs from a QR scan (only for SNP effect, without any environmental factor in the model), and second, we test for interactions between these SNPs and particular environmental factors using well established gene–environment tests with Bonferroni method to adjust for the number of gene–environment tests in the second stage (note that the two stages are independent).

We selected five environmental factors that are available in the UKBB, including sex, age, physical activity (PA), sedentary behavior (SB) and smoking, and tested for interactions with top SNPs detected by QR. Overall, using lead QR SNPs at 7297 loci across 39 traits we identify 139 significant gene-environment interactions (significant after multiple testing adjustment for tests for each phenotype), 58 of which are significant at the study-wide level (p < 0.05/(7297 × 5) = 1.37e−06), after adjusting for all gene–environment tests across all phenotypes (Table S4 and Fig. 5a). To test the enrichment of gene–environment interaction effects among the lead QR variants, we repeated the gene–environment tests using the same number of randomly selected significant variants (LR or QR) across 39 traits. Over 1000 replicates, we observed, on average, 16 significant gene–environment interaction effects, significantly lower than the 58 observed interactions with leading QR variants (Fig. 5b). If we focus on QR–LR loci, only two study-wide significant interactions are detected among 259 × 5 tests: ARHGEF3 with age for platelet distribution width (p = 9.09e−16), and RSPO3 with sex for waist circumference (p = 7.49e−07). Note that this is still an 11-fold enrichment over the 2 study-wide significant interactions detected at the LR-QR loci among 2926 × 5 tests (p = 0.03). Similar results hold if we restrict to BMI and LDL (Fig. S20), for which most of the significant interactions are with age or sex. For example, for LDL we detected significant interactions between variants in PCSK9, APOB, CELSR2, LPA, and age. For BMI, we identified significant interactions between a variant in FTO and physical activity, and a variant in FAIM2 and sedentary behavior.

Fig. 5: Significant gene–environment interaction effects of the lead QR variants for 39 traits.
figure 5

a The heatmap shows \(-{\log }_{10}(p\,\text{-value}\,)\) of gene-environment interaction tests for the lead QR association at the loci detected by QR. * denotes nominally significant GxE interaction effects (p-value  < 0.05) and ** denotes significant effects after Bonferroni correction (p < 0.05/(5 ⋅ 7, 297) = 1.37e − 06). Sample size by trait can be found in Table S4. b The empirical distribution of the number of significant gene-environment interactions when variants are chosen randomly among significant variants identified by LR or QR.

We attempted to replicate the study-wide significant gene-environment interactions above in an independent dataset of European individuals from UKBB (Table S4). There is strong correlation among estimated effect sizes from the discovery and replication (r = 0.77) with almost perfect sign consistency with 56 out of 58 interaction effects showing the same sign (p < 2.2e − 308). In terms of p-value, 25 out of 58 tests are nominally significant (p < 2.2e − 308), with 6 out of 58 being significant after adjusting for 58 tests, including variants in TCF7L2 and age for Glycated hemoglobin (HbA1c), variants in SMOX and sex for High light scatter reticulocyte count, Reticulocyte percentage, and Immature reticulocyte fraction, variants in CSP1 and sex for Mean platelet (thrombocyte) volume. Similar results are obtained when relaxing the list of discovered interactions to the ones that are significant for each trait individually Supplementary Data 2.

As mentioned above, one highly significant interaction was detected with rs1354034, a SNP with heterogeneous quantile effects that was completely missed by LR (Fig. 6). Interestingly, a strong non-additive association was previously observed for platelet distribution width at the same variant rs1354034, associated with increased expression of ARHGEF3 in platelets, the association being highly specific to platelets and not detected in other major blood cell types41,42. Platelet distribution width describes how similar the platelets are in size, with high values meaning great variation is size and being associated with vascular disease or certain cancers. The interaction of rs1354034 with age may be indicative of increased platelet activation with age which leads to more variable platelet width43. Specifically, at younger age there are low levels of platelet activation and the risk genotype may be silent. As platelet activation increases with age, the genotype expresses itself, which leads to stronger age-phenotype associations in the presence of the risk genotype. rs1354034 is a putative functional variant and has been studied before in connection with the platelet-related phenotypes. In particular, rs1354034 disrupts a conserved GATA motif and resides in a regulatory element that has previously been shown to affect the transcription of nearby ARHGEF344. Additionally, rs1354034 is associated in trans with the expression of von Willebrand factor (VWF) and other key platelet/megakaryocyte genes found at other loci45.

Fig. 6: Association of rs1354034 at the ARHGEF3 locus with Platelet distribution width.
figure 6

a \(-{\log }_{10}(p\,\text{-value}\,)\) of QR with the lead QR SNP per locus shown as diamond. b \(-{\log }_{10}(p\,\text{-value}\,)\) of LR with the lead QR SNP per locus shown as diamond. Colors indicate the LD (r2) to the lead SNP for each variant. c The estimated quantile specific effects (dots) and the SE (error bars) are shown at τ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9. d The curves are the τ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 empirical conditional quantile functions. Shown in black is also the LR line fitted to the data. e gene-environment interaction between rs1354034 and age. The statistics in a-e were derived from n = 316,707 individuals used in the platelet distribution width GWAS. Statistical significance was determined by QR rank score test in panels a-d and two-sided t-test in panel e.

Discussion

In this paper we demonstrate the usefulness of QR in the context of GWAS for quantitative phenotypes, and how it can complement conventional GWAS based on LR. Although QR has been employed before in the context of genetic studies, previous applications were limited to small GWAS studies of single phenotypes19,20 or small simulation studies22. The current paper expands significantly on the existing work by providing applications of QR at biobank scale and for many phenotypes. Identifying heterogeneous associations as made possible by QR should be of wide interest to genetic association studies of medically relevant traits. In particular, it is often the case that high-risk subgroups (e.g. high risk for heart disease) have either high or low values for continuous risk factors, such as BMI or LDL cholesterol. Furthermore, QR is robust to the non-normality of traits (including heavy-tailed distributions) and outliers, a common practical problem when conducting quantitative trait analyses in large, heterogeneous GWAS studies. Although we have focused our discussion on GWAS with independent subjects, it is possible to extend QR to studies with related individuals by including random effects whose covariance structure depends on the genetic relatedness matrix46. However, these existing methods are not computationally feasible at the biobank scale, and we plan to work on developing scalable methods in this setting in the future.

SNPs with heterogeneous effects that are detected by QR can arise due to various factors, including gene-gene or gene-environment interactions, time-dependent effects, or measurement errors. Similarly, genetic nurturing effects47, where sequence variants in the parental genomes that are not transmitted to a child interact with the child’s genotype to affect the phenotype, can also lead to heterogeneous effects. Elucidating the precise source of heterogeneity is difficult in general but may be possible by, for example, post-hoc incorporation of explicit interactions (when available) in the model as we have done here, and can suggest particular contexts in which GWAS-discovered variants manifest their effects.

An appealing aspect of QR relative to other non-LR tests in the literature is its parallel to widely used LR techniques. Like LR, QR is a regression framework and inherits its advantages, such as the ability to do covariate adjustment and to estimate effect sizes of genetic variants. However, whereas LR only produces regression coefficients that quantify the genetic effect on the phenotypic mean, QR produces quantile-specific regression coefficients across a range of quantile levels which characterize the genetic effects on the conditional quantiles instead. Such rich information allows not only better characterization of the genotype-phenotype relationship but also allows one to generalize the phenotypic prediction based on genetic variants, from predicting the conditional mean to predicting the entire conditional distribution of the phenotype. Specifically, in the usual polygenic risk scores, we aim to predict the conditional mean of a phenotype, given the genotype profile of an individual: E(YX)48. QR provides the opportunity to generalize this phenotypic prediction to the entire conditional distribution: QY(τX) for any quantile level τ ∈ (0, 1). That allows us to, for example, compute for a given genotype profile, the probability that an individual with that profile has PRS above a given threshold, or the proportion of individuals with that profile expected to be in a high risk group. Similar to confidence intervals for the conventional PRS, we can construct quantile prediction intervals using the predicted conditional quantiles. It will be of interest to investigate this direction in detail in future work.

There are several potential limitations of QR. As discussed, QR is able to provide quantile-specific p-values, and type-1 error is well controlled at the individual quantile levels. There are different ways to combine correlated quantile-specific p-values into one single p-value. Here, we illustrated one such approach based on the Cauchy combination method, which provides an analytical approximation to the combined p-value. Note that this approximation works best for very small p-values (in the tails) as one can see also from the empirical results presented, and may show slight inflation at larger significance levels (e.g. 0.05). Given that we are interested in identifying SNPs significant at stringent significance levels, this approximation is a reasonable method to combine p-values across quantile levels. Another limitation of QR is that heterogeneous effects, as discussed above, can arise due to various factors, and hence care needs to be taken when interpreting the results. Careful investigation in post-hoc analyses (beyond QR) needs to be undertaken to understand the source of heterogeneity.

QR is computationally more expensive than LR due to iterative interior point methods used to optimize the loss function, but it can be scaled up to biobank-sized datasets, as illustrated here. To give a specific example, for BMI (with 325,306 individuals and 8,572,925 variants), LR with PLINK249 took 92 h of CPU time, and QR (nine quantile-specific tests) with the QRank package in R took 9534 h of CPU time. This runtime includes both the estimation of effect sizes at every SNP and the rank score testing at nine quantile levels. It is possible to speed up the time for QR analyses by focusing on genome-wide testing, while only estimating effects for regions of interest, such as genome-wide significant SNPs or loci. This way, the runtime can be reduced to a total of 1946 h of CPU time, i.e. an average of 216 h per quantile level. Moreover, both LR and QR can be accelerated by performing tests in parallel on small genomic regions. In our implementation, the genome-wide variant set was split into 1,000 segments of ~8.57K variants, with each segment taking about 0.1 h for LR and 9.5 hours for QR (1.9 hours if only performing testing) with a single CPU core. With this data scatter and parallel strategy, the elapsed wall-clock time with 1000 CPU cores became reasonable (~1.9 h for nine quantile levels). It is also possible to use more computationally efficient optimization techniques for the loss function in QR based on stochastic gradient descent50 to reduce the computational requirements, and this will be the subject of future work.

In summary, we showed that QR is a useful yet underutilized complement to LR in GWAS of biomarkers and quantitative traits, and can lead to biologically meaningful discoveries. QR bears many similarities to LR and can be run efficiently using available R packages at biobank scale. We recommend adding QR to the toolbox when performing biomarker GWAS studies.

Methods

We first provide some basic details on the linear and quantile regression models as they relate to the GWAS setting.

Linear regression (LR)

We assume that we have n independent samples from a population, and for the ith subject, we denote by Yi a continuous (quantitative) phenotype, by \({{\bf{X}}}_{i}=({X}_{i1},\cdots \,,{X}_{ip})^{\prime}\) the genotypes at p genetic variants, and by \({{\bf{C}}}_{i}=({C}_{i1},\cdots \,,{C}_{iq})^{\prime}\) covariates we want to adjust for, such as age, gender and principal components of genetic variation. Furthermore, we denote by \({\bf{Y}}=({Y}_{1},\cdots \,,{Y}_{n})^{\prime}\) the n × 1 sample phenotype vector, by X the n × p genotype matrix, and by C the n × q matrix for covariates. We assume that Y is normalized.

First, we consider fitting all p variants in a genetic region using a joint linear model:

$${\bf{Y}}={\bf{X}}{{\boldsymbol{\beta }}}_{{\rm{all}}}+{\bf{C}}{\boldsymbol{\alpha }}+{\boldsymbol{\epsilon }},$$
(7)

where α ∈ Rq and βall ∈ Rp are coefficients, and \({\boldsymbol{\epsilon }} \sim N(0,{\sigma }_{Y}^{2})\). We define the error projection matrix \({{\bf{P}}}_{{\rm {C}}}={\bf{I}}-{\bf{C}}{({\bf{C}}^{\prime} {\bf{C}})}^{-1}{\bf{C}}^{\prime}\). We project Y and X onto the orthogonal complement of the column space of C, then the estimate for βall and its variance can be obtained as

$${\hat{{\boldsymbol{\beta }}}}_{{\rm{all}}}={({\bf{X}}^{\prime} {{\bf{P}}}_{{\rm {C}}}{\bf{X}})}^{-1}{\bf{X}}^{\prime} {{\bf{P}}}_{\rm {{C}}}{\bf{Y}}\,{\rm{and}}\,{\rm {var}}({\hat{{\boldsymbol{\beta }}}}_{{\rm{all}}})={({\bf{X}}^{\prime} {{\bf{P}}}_{{\rm {C}}}{\bf{X}})}^{-1}{\sigma }_{Y}^{2}.$$
(8)

The z score test statistic is defined as

$${{\bf{Z}}}_{{\rm{all}}}=\frac{{\hat{{\boldsymbol{\beta }}}}_{{\rm{all}}}}{{\rm {se}}({\hat{{\boldsymbol{\beta }}}}_{{\rm{all}}})}={\sigma }_{Y}^{-1}{({\bf{X}}^{\prime} {{\bf{P}}}_{{\rm {C}}}{\bf{X}})}^{-1/2}{\bf{X}}^{\prime} {{\bf{P}}}_{{\rm {C}}}{\bf{Y}}.$$
(9)

In GWAS, the typical approach is to perform marginal (unconditional) testing for each variant j at a time. Therefore the model becomes:

$${\bf{Y}}={{\bf{X}}}_{j}{{\boldsymbol{\beta }}}_{j}+{\bf{C}}{\boldsymbol{\alpha }}+{\boldsymbol{\epsilon }}.$$
(10)

Similarly, as above, we have

$${\hat{{\boldsymbol{\beta }}}}_{{\rm{GWAS,j}}}={({\bf{X}}^{\prime} {{\bf{P}}}_{{\rm {C}}}{{\bf{X}}}_{j})}^{-1}{\bf{X}}^{\prime} {{\bf{P}}}_{{\rm {C}}}{\bf{Y}}\,{\rm{and}}\,{\rm {var}}({\hat{{\boldsymbol{\beta }}}}_{{\rm{GWAS,j}}})={({\bf{X}}^{\prime} {{\bf{P}}}_{{\rm {C}}}{{\bf{X}}}_{j})}^{-1}{\sigma }_{Y}^{2}.$$
(11)

We denote by \({\hat{{\boldsymbol{\beta }}}}_{{\rm{GWAS}}}=({\hat{{\boldsymbol{\beta }}}}_{{\rm{GWAS,1}}},\cdots \,,{\hat{{\boldsymbol{\beta }}}}_{{\rm{GWAS,p}}})^{\prime}\) and the vector of z scores as

$${{\bf{Z}}}_{{\rm{GWAS}}}=\left(\frac{{\hat{{\boldsymbol{\beta }}}}_{{\rm{GWAS,1}}}}{{\rm {se}}({\hat{{\boldsymbol{\beta }}}}_{{\rm{GWAS,1}}})},\cdots \,,\frac{{\hat{{\boldsymbol{\beta }}}}_{{\rm{GWAS,p}}}}{se({\hat{{\boldsymbol{\beta }}}}_{{\rm{GWAS,p}}})}\right)^{\prime} .$$
(12)

Then, the following is true:

$${{\bf{Z}}}_{{\rm{GWAS}}}={{\bf{D}}}^{*1/2}{{\bf{Z}}}_{{\rm{all}}},$$
(13)

where D* is the correlation matrix of X* ≔ PCX. When X is independent of C, X* = X and D* is simply the correlation (LD) matrix among p SNPs. Under the null hypothesis H0βall = 0, Zall ~ N(0Ip) whereas ZGWAS ~ N(0D*).

Quantile regression (QR)

We denote the τth conditional quantile function of Yi as \({Q}_{{Y}_{i}}(\tau | {{\bf{X}}}_{i},{{\bf{C}}}_{i})\). Then in linear quantile regression we can write the conditional quantile regression model for p joint variants and given a specific quantile level τ ∈ (0, 1) as

$${Q}_{{Y}_{i}}(\tau | {{\bf{X}}}_{i},\,{{\bf{C}}}_{i})={{\bf{X}}}_{i}{{\boldsymbol{\beta }}}_{{\rm{all}}}(\tau )+{{\bf{C}}}_{i}{\boldsymbol{\alpha }}(\tau ),$$
(14)

where βall(τ), and α(τ) are quantile-specific coefficients and can be estimated by solving the following optimization problem18:

$$({\hat{{\boldsymbol{\beta }}}}_{{\rm{all}}}(\tau ),\, \hat{{\boldsymbol{\alpha }}}(\tau ))=\arg \min E{\rho }_{\tau }({{\bf{Y}}}_{i}-{{\bf{X}}}_{i}{{\boldsymbol{\beta }}}_{{\rm{all}}}(\tau )-{{\bf{C}}}_{i}{\boldsymbol{\alpha }}(\tau )),$$
(15)

where ρτ(u) = u(τI(u < 0)) is the pinball loss function and I(⋅) is the indicator function. When the error ϵ is independent and identically distributed (iid), we have the asymptotic normality for \({\hat{{\boldsymbol{\beta }}}}_{{\rm{all}}}(\tau )\)28 such that

$$\sqrt{n}\left\{{\hat{{\boldsymbol{\beta }}}}_{{\rm{all}}}(\tau )-{{\boldsymbol{\beta }}}_{{\rm{all}}}(\tau )\right\}\to N\left(0,\frac{\tau (1-\tau )}{{f}_{i}^{2}({\xi }_{i}(\tau ))}{{\bf{D}}}_{0}^{-1}\right),$$
(16)

where \({{\bf{D}}}_{0}: \!=\mathop{\lim }\limits_{n\to \infty }{n}^{-1}{\sum }_{i}{{\bf{X}}}_{i}^{\top }{{\bf{X}}}_{i}\), \({\xi }_{i}(\tau ): \!={F}_{{Y}_{i}}^{-1}(\tau | {{\bf{X}}}_{i},{{\bf{C}}}_{i})={Q}_{{Y}_{i}}(\tau | {{\bf{X}}}_{i},\, {{\bf{C}}}_{i})\), and f(⋅) represents the density function. Thus, the asymptotic distribution of \({\hat{{\boldsymbol{\beta }}}}_{{\rm{all}}}\) depends on the behavior of the unknown conditional density of the response Y in a neighborhood of the conditional quantile model.

For quantile regression, a commonly used hypothesis testing tool is the rank score test28. The rank score test statistic for a fixed quantile level τ is defined as:

$${{\bf{S}}}_{{\rm{all,\tau }}}={n}^{-1/2}\mathop{\sum }\limits_{i=1}^{n}{{\bf{X}}}_{i}^{*^{\prime} }{\phi }_{\tau }({{\bf{Y}}}_{i}-{{\bf{C}}}_{i}\hat{{\boldsymbol{\alpha }}}(\tau )),$$
(17)

where ϕτ(u) = τI(u < 0). The covariance matrix of Sn,τ is

$${{\bf{V}}}_{{\rm{all}}}={n}^{-1}\tau (1-\tau ){{\bf{X}}}^{*\top }{{\bf{X}}}^{*},$$
(18)

where X* = PCX is the same as defined previously. If we fit a marginal model for the jth SNP (i.e., QRank6), we have

$${Q}_{{Y}_{i}}(\tau | {{\bf{X}}}_{ij},{{\bf{C}}}_{i})={{\bf{X}}}_{ij}{{\boldsymbol{\beta }}}_{{\rm{QRank}},j}(\tau )+{{\bf{C}}}_{i}{\boldsymbol{\alpha }}(\tau ),$$
(19)

and

$${{\bf{S}}}_{{\rm{QRank,j,\tau }}}={n}^{-1/2}\mathop{\sum }\limits_{i=1}^{n}{{\bf{X}}}_{ij}^{*}{\phi }_{\tau }({{\bf{Y}}}_{i}-{{\bf{C}}}_{i}\hat{{\boldsymbol{\alpha }}}(\tau ))\,{\rm{and}}\,{{\bf{V}}}_{{\rm{QRank,j}}}={n}^{-1}\tau (1-\tau ){{\bf{X}}}_{j}^{*^{\prime} }{{\bf{X}}}_{j}^{*}.$$
(20)

Under the null hypothesis H0 : βall,τ = 0, \({{\bf{V}}}_{{\rm{all}}}^{-1/2}{{\bf{S}}}_{{\rm{all,\tau }}} \sim N({\bf{0}},{{\bf{I}}}_{{\rm {p}}})\) and \({{\bf{V}}}_{{\rm{QRank}},j}^{-1/2}{{\bf{S}}}_{{\rm{QRank}},j,\tau } \sim N(0,1)\), where VQRank,j is the jth diagonal element of Vall. We note that the asymptotic distribution of the test statistics is independent of the particular distribution of the phenotype. Hence it can be applied to any phenotype without requiring a pre-transformation to achieve normality.

Denote \({{\bf{Z}}}_{{\rm{QRank,\tau }}}=\left(\frac{{{\bf{S}}}_{{\rm{QRank,1,\tau }}}}{\sqrt{{{\bf{V}}}_{{\rm{QRank,1}}}}},\cdots \,,\frac{{{\bf{S}}}_{{\rm{QRank,p,\tau }}}}{\sqrt{{{\bf{V}}}_{{\rm{QRank,p}}}}}\right)^{\prime}\) and \({{\bf{Z}}}_{{\rm{all,\tau }}}={{\bf{V}}}_{{\rm{all}}}^{-1/2}{{\bf{S}}}_{{\rm{all,\tau }}}\), then we have

$${{\bf{Z}}}_{{\rm{QRank,\tau }}}={{{\bf{D}}}^{*}}^{1/2}{{\bf{Z}}}_{{\rm{all,\tau }}},$$
(21)

where D* is the correlation matrix of X*.

Hence, we conclude that the rank score test statistics from a marginal quantile regression model can be adjusted to the ones from a joint quantile regression model by the correlation matrix D*, the same as the z scores in the linear regression models above. Finally, we combine quantile-specific p-values from the marginal tests via the Cauchy combination29.

UK Biobank analyses

We used the imputed genotype data of 488,377 participants from the UK Biobank (UKBB). The UKBB samples were directly genotyped with UK BiLEVE and UK Biobank Axiom arrays. Genotype imputation was carried out by using the Haplotype Reference Consortium panel and the merge reference panel from UK10K and 1000 Genomes phase 3 with the IMPUTE4 software27. We selected unrelated white British individuals by excluding third-degree or closer relatives (kinship coefficient > 0.0442 estimated with KING51) as described in refs. 27,52 for discovery analysis and unrelated individuals with non-British European ancestry for replication. The numbers of post-QC samples and variants vary depending on the availability of phenotypic data (Fig. S2). We retained variants with minor allele frequency (MAF) > 1%, Hardy–Weinberg equilibrium (HWE) test p-value > 1 × 10−6, and imputation quality (INFO) score > 0.8. Positions of the genetic variants are given in the coordinates of the GRCh37 human reference genome. All analyses were done while adjusting for age, sex, and 10 principal components of genetic variation. For linear regression, traits were normalized using rank-based inverse normal transformation for LR only; raw trait values are used for quantile regression. We analyzed 39 quantitative phenotypes in the UK Biobank, including three body size measures, eight blood biochemistry markers (clinical markers related to disease risk factors, diagnostic measures, and renal and liver function), and 28 hematological markers (blood counts and related parameters commonly used in clinical practice). The body size measures were collected by the UK Biobank assessment centre, and the biomarkers were measured by the UK Biobank central laboratory. UKBB field IDs for the phenotypes analyzed in this study are available in Supplementary Data 1. Details on each phenotype can be accessed by searching the field ID in the UKBB online showcase (https://biobank.ndph.ox.ac.uk/ukb/).

Gene–environment interactions

We tested for interactions with top SNPs detected by QR using the following model:

$${\bf{Y}}={{\bf{X}}}_{{\rm {g}}}{\beta }_{{\rm {g}}}+{{\bf{X}}}_{{\rm {e}}}{\beta }_{{\rm {e}}}+{{\bf{X}}}_{{\rm {g}}}{{\bf{X}}}_{{\rm {e}}}{\beta }_{{\rm {ge}}}+{\boldsymbol{\epsilon }}.$$
(22)

We tested βge = 0 using linear regression in PLINK2 and adjusted for multiple testing using Bonferroni method. Note that we also performed the robust GxE interaction tests with the Huber-White sandwich variance estimator and the results were consistent with those from the naive variance estimator. (Fig. S21).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.