On the Transformation of Genetic Effect Size from Logit to Liability Scale


Genetic effects on the liability scale are informative for describing the genetic architecture of binary traits, typically diseases. However, most genetic association analyses on binary traits are performed by logistic regression, and there is no straightforward method that transforms both the effect size estimate and its standard error from the logit scale to the liability scale. Here, we derive a simple linear transformation of the log odds ratio and its standard error for a single nucleotide polymorphism (SNP) to an effect size and standard error on the liability scale. We show by analytic calculations and simulations that this approximation is accurate when the disease is common and the SNP effect is small. We also apply this method to estimate the contribution of a SNP near the RET gene to the variance of Hirschsprung disease liability, and the age-specific contributions of APOE4 to the variance of Alzheimer’s disease liability. We discuss the approximate linear inter-relationships between genotype and effect sizes on the observed binary, logit, and liability scales, and the potential applications of the linear approximation to statistical power calculation for binary traits.


Code availability

The R code to implement the four transformation methods is provided on the author’s GitHub page.


We thank Dr. Zipeng Liu for useful discussions and suggestions.


This study was funded by the Research Grants Council, University Grants Committee (HK) (Grant Numbers 17128515, 17124017).

Derivation of the variance of the estimated lnOR for standardized genotype under case–control design

Allele counts in cases and controls, by effect allele status.

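| Allele status \ Affection status | Case | Control |
| --- | --- | --- |
| With effect allele | a | c |
| Without effect allele | b | d |

The sampling variance of the estimated lnOR is approximately the sum of the reciprocals of the four cell counts:

$$var(\widehat{lnOR})\approx \frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}$$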
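The reciprocal-count formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' R implementation: the function name `log_odds_ratio` and the example counts are hypothetical, and the argument order follows the table (cases with and without the effect allele, then controls with and without it).

```python
import math


def log_odds_ratio(case_with, case_without, control_with, control_without):
    """Return (lnOR, approximate variance) from the four allele counts.

    The variance is the sum of the reciprocals of the four cell counts.
    """
    counts = (case_with, case_without, control_with, control_without)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    # Odds of carrying the effect allele in cases divided by the odds in controls.
    ln_or = math.log((case_with * control_without) / (case_without * control_with))
    variance = sum(1.0 / x for x in counts)
    return ln_or, variance


# Hypothetical allele counts, for illustration only.
ln_or, variance = log_odds_ratio(120, 280, 100, 300)
standard_error = math.sqrt(variance)
```

The standard error is simply the square root of the variance, which is the quantity that the remainder of the derivation re-expresses in terms of sample size, case proportion, and allele frequency.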

We denote the total sample size by \(n\), so the total number of alleles is \(2n\). The numbers of cases and controls are \({n}_{A}\) and \({n}_{U}\), contributing \(2{n}_{A}\) and \(2{n}_{U}\) alleles, respectively. \(f\) is the frequency of the effect allele, which for a small SNP effect is approximately the same in cases and controls. For the alleles carried by cases,

$$\frac{1}{a}+\frac{1}{b}= \frac{1}{2{n}_{A}f}+\frac{1}{2{n}_{A}(1-f)}=\frac{1}{2{n}_{A}f(1-f)}$$

For a case–control design, \({n}_{A}=nw\) and \({n}_{U}=n(1-w)\), where \(w\) is the proportion of cases in the sample.

$$\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}= \frac{1}{2nwf(1-f)}+\frac{1}{2n(1-w)f(1-f)}=\frac{1}{2nf(1-f)}\frac{1}{w(1-w)}$$

When the genotype is standardized to unit variance, the genotype variance \(2f(1-f)\) is replaced by 1.

$$var(\widehat{lnOR})\approx \frac{1}{nw(1-w)}$$
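As a numerical check of the algebra, the sketch below builds the expected allele counts from \(n\), \(w\), and \(f\), and compares the reciprocal-count sum with the two closed forms. It assumes, as a simplification, that the effect allele frequency is the same in cases and controls (small SNP effect). The function names are hypothetical, and treating standardization as a rescaling of the per-allele variance by \(2f(1-f)\) is an interpretation of the final step, not code taken from the authors.

```python
import math


def reciprocal_count_variance(n, w, f):
    """Sum of reciprocal expected allele counts for n subjects,
    case proportion w, and effect allele frequency f."""
    n_cases, n_controls = n * w, n * (1 - w)
    case_with, case_without = 2 * n_cases * f, 2 * n_cases * (1 - f)
    control_with, control_without = 2 * n_controls * f, 2 * n_controls * (1 - f)
    return 1 / case_with + 1 / case_without + 1 / control_with + 1 / control_without


def per_allele_variance(n, w, f):
    """Closed form: 1 / (2 n f (1 - f) w (1 - w))."""
    return 1 / (2 * n * f * (1 - f) * w * (1 - w))


def standardized_variance(n, w):
    """Closed form for a unit-variance genotype: 1 / (n w (1 - w))."""
    return 1 / (n * w * (1 - w))


# Hypothetical design: 2000 subjects, half of them cases, allele frequency 0.3.
n, w, f = 2000, 0.5, 0.3
direct = reciprocal_count_variance(n, w, f)
closed = per_allele_variance(n, w, f)
# Rescaling the per-allele variance by the genotype variance 2f(1-f)
# gives the variance per standard deviation of genotype.
rescaled = closed * 2 * f * (1 - f)
agree = math.isclose(direct, closed) and math.isclose(rescaled, standardized_variance(n, w))
```

With these inputs `agree` evaluates to `True`, and the standardized variance no longer depends on the allele frequency, only on the sample size and the proportion of cases.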

