Multi-SNP Haplotype Analysis Methods for Association Analysis
Haplotype analysis forms the basis of much of genetic association analysis using both related and unrelated individuals (we concentrate on unrelated individuals). For example, haplotype analysis indirectly underlies the SNP imputation methods that are used for testing trait associations with known but unmeasured variants and for performing collaborative post-GWAS meta-analysis. This chapter focuses on the direct use of haplotypes in association testing. It reviews the rationale for haplotype-based association testing, discusses statistical issues related to haplotype uncertainty that affect the analysis, and then gives practical guidance for testing haplotype-based associations with a phenotype or outcome trait, first for candidate gene regions and then for the genome as a whole. Haplotypes are interesting for two reasons. First, they may be in closer LD with a causal variant than any single measured SNP, and therefore may extend the coverage provided by the genotypes beyond what single-SNP analysis achieves. Second, haplotypes may themselves be the causal variants of interest, and some solid examples of this have appeared in the literature.
This chapter discusses three possible approaches to incorporation of SNP haplotype analysis into generalized linear regression models: (1) a simple substitution method involving imputed haplotypes, (2) simultaneous maximum likelihood (ML) estimation of all parameters, including haplotype frequencies and regression parameters, and (3) a simplified approximation to full ML for case–control data.
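As a rough illustration of the first approach, the sketch below estimates haplotype frequencies by EM for two biallelic SNPs and returns each individual's expected haplotype dosages, which could then be substituted as covariates into a regression model. It is a minimal sketch under stated assumptions (two SNPs only, Hardy-Weinberg equilibrium, genotypes coded as minor-allele counts); the function names `compatible_pairs` and `em_expected_dosage` are hypothetical and are not taken from the chapter or from any published package.

```python
import numpy as np

# The four possible haplotypes over two biallelic SNPs (allele at SNP1, allele at SNP2).
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(g):
    """Ordered haplotype-index pairs (i, j) whose alleles sum to genotype g."""
    return [
        (i, j)
        for i in range(4)
        for j in range(4)
        if HAPS[i][0] + HAPS[j][0] == g[0] and HAPS[i][1] + HAPS[j][1] == g[1]
    ]


def em_expected_dosage(genotypes, n_iter=50):
    """EM haplotype frequencies and expected dosages (assumes HWE, no missing data)."""
    freqs = np.full(4, 0.25)  # start from equal haplotype frequencies
    dosage = np.zeros((len(genotypes), 4))
    for _ in range(n_iter):
        # E-step: posterior weight of each compatible haplotype pair.
        dosage = np.zeros((len(genotypes), 4))
        for n, g in enumerate(genotypes):
            pairs = compatible_pairs(g)
            w = np.array([freqs[i] * freqs[j] for i, j in pairs])
            w /= w.sum()
            for (i, j), p in zip(pairs, w):
                dosage[n, i] += p
                dosage[n, j] += p
        # M-step: frequencies are the average expected counts per chromosome.
        freqs = dosage.sum(axis=0) / (2 * len(genotypes))
    return freqs, dosage


# Toy data: only the double heterozygote (1, 1) has ambiguous phase.
genotypes = [(0, 0), (0, 0), (2, 2), (1, 1)]
freqs, dosage = em_expected_dosage(genotypes)
print(freqs)
print(dosage)
```

Each row of `dosage` sums to 2 and can serve as the imputed haplotype covariates in a generalized linear model. This substitution step ignores the uncertainty in the phase estimates, which is the issue the ML-based approaches are meant to address.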
Examples of the various approaches for a haplotype analysis of a candidate gene are provided. We compare the behavior of the approximation-based methods and argue that in most instances the simpler methods hold up well in practice. We also describe the practical implementation of genome-wide haplotype risk estimation and discuss several shortcuts that reduce what would otherwise be a very heavy computational burden.
Key words: Haplotype-specific risk estimation; Phase estimation; Genetic association testing; Expectation-substitution methods; Maximum likelihood; Uncertainty analysis