Association Mapping and Disease: Evolutionary Perspectives
In this chapter, we give a short introduction to the genetics of complex diseases, emphasizing evolutionary models for disease genes and the effect of different models on the genetic architecture, and we survey the state of the art of genome-wide association studies (GWASs).
Key words: Complex diseases, Association mapping, Genome-wide association studies, Common disease/common variant
A combination of genes and environment determines our phenotype. The degree to which genotype or environment influences our phenotype—the balance of nature versus nurture—varies from trait to trait, with some traits independent of genotype and determined by the environment alone and others determined by the genotype alone and independent of the environment.
A measure quantifying the importance of genotype compared to the environment is the so-called heritability: the fraction of the total phenotypic variation in the population that is explained by variation in genotype within the population. If a trait of interest, say a common disease, exhibits a nontrivial heritability, we know that genes are important for understanding this trait and that it is worthwhile to identify the specific genetic polymorphisms influencing it. The first step toward this is association mapping: searching for genetic polymorphisms that, statistically, associate with the trait. Polymorphisms associated with a given phenotype need not influence that phenotype directly, but it is among the associated polymorphisms that we will find the causal ones.
Genetic variants are correlated, a phenomenon called linkage disequilibrium (LD), so by examining the trait association of a few variants, we learn about the association of many others. Examining the association between a phenotypic trait and a few hundred thousand to a million genetic variants suffices to capture how most of the common variation in the entire genome associates with the trait [2, 3, 4]. When we find a genetic variant associated with the trait, we have not necessarily located a variant that has any functional effect on the trait, but we have located a genomic region containing genetic variation that does. LD is predominantly a local phenomenon, so correlated genetic variants tend to be physically near each other on the genome. If we observe an association between the phenotype and a variant, and the variant is not causally affecting the trait but is merely in LD with a causal variant, the causal variant is likely nearby. Further examination of the region might reveal which variants affect the trait, and how, but that often involves functional characterization and is beyond association mapping. With association mapping, we merely seek to identify genetic variation that associates with a trait.
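To illustrate how this correlation is quantified, the squared correlation coefficient (r²) between two biallelic loci can be computed from phased haplotypes using the standard D = pAB − pA·pB formula. The sketch below is our own illustration with hypothetical data; the function name is ours, not from the chapter.

```python
def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic loci, given
    phased two-locus haplotypes coded as (0/1, 0/1) pairs."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # freq of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n   # freq of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b                      # coefficient of linkage disequilibrium
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Two perfectly correlated loci: knowing one genotype tells you the other
haps = [(0, 0)] * 60 + [(1, 1)] * 40
print(round(ld_r2(haps), 6))  # -> 1.0
```

A tag SNP with r² = 1.0 to a causal variant carries the full association signal of that variant; as r² drops, the signal is diluted accordingly.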
2 The Allelic Architecture of Genetic Determinants for Disease
Many complex diseases show a high heritability, typically ranging between 20% and 80%. Each genetic variant that increases the risk of disease contributes to the measured heritability of the disease and thus explains some fraction of the estimated total heritability of the trait. For most diseases investigated, many variants contribute, and the fraction of the heritability explained by each is therefore low. The number of contributing variants, their individual effects on the disease probability, their selection coefficients, and their dominance relations can be collectively termed the genetic architecture of a common disease. Insights into this architecture are slowly emerging and reveal differences between diseases.
Below we first consider two proposed genetic architectures based on theoretical arguments: the common disease common variant (CDCV) architecture and the common disease rare variant (CDRV) architecture. CDCV states that most of the heritability can be explained by a few high-frequency variants with moderate effects, while CDRV states that most of the heritability can be explained by moderate- or low-frequency variants with large effects. We present population genetic arguments for the two architectures and the consequences of the two architectures for association mapping. Later, in Subheading 5.1, we present empirical knowledge we have obtained about the genetic architectures of common diseases.
2.1 Theoretical Models for the Allelic Architecture of Common Diseases
Understanding the distribution of the number and frequency of genetic variants in a population is the purview of population genetics. Using diffusion approximations, we can derive the expected frequency distribution of independent mutations under mutation-drift-selection balance in a stable population (see, e.g., Wright). Central parameters are the mutation rate, u, and the selection for or against an allele, measured by s, both scaled with the effective population size, N. Mutations enter a population at a rate determined by Nu, and subsequently, their frequencies change in a stochastic manner. If a mutant allele is not subject to natural selection, for example, if it does not lead to any change in function, it is selectively neutral, and its frequency rises and falls with equal probability. If the allele is under selection, it is more likely to increase than to decrease in frequency under positive selection (s > 0), and conversely under negative selection (s < 0).
The range of frequencies where drift dominates, or selection dominates, is determined by the strength of selection (Ns) and the genotypic characteristics of selection, e.g., dominance relations between alleles. For strong selection or in large populations, the process is predominantly deterministic for most frequencies, while for weak selection or small populations, the process is highly stochastic for most frequencies. The time an allele can spend at moderate frequencies is likewise determined by Ns and the characteristics of selection.
Implicitly, this model assumes a population in mutation-selection equilibrium, which does not necessarily match human populations. Humans have recently expanded considerably in numbers, and changes in our lifestyle, e.g., the shift from hunter-gatherers to farmers, might have changed the adaptive landscape acting on our genes.
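The stochastic dynamics described above can be made concrete with a toy Wright-Fisher simulation under genic selection. This is our own illustrative sketch, not a model from the chapter; the function name and parameter values are ours.

```python
import random

def wright_fisher(N, s, p0, generations, seed=1):
    """Toy Wright-Fisher trajectory for one biallelic locus under genic
    selection: each generation, the allele frequency p is moved
    deterministically by selection and then perturbed by binomial drift."""
    random.seed(seed)
    p = p0
    traj = [p]
    for _ in range(generations):
        # Deterministic selection step: fitness 1 + s for the focal allele
        p = p * (1 + s) / (1 + s * p)
        # Drift step: binomial sampling of 2N gametes
        k = sum(random.random() < p for _ in range(2 * N))
        p = k / (2 * N)
        traj.append(p)
        if p in (0.0, 1.0):  # absorption: allele lost or fixed
            break
    return traj

# Strong selection relative to drift (Ns = 50) starting from 5% frequency
traj = wright_fisher(N=1000, s=0.05, p0=0.05, generations=200)
print(len(traj), traj[-1])
```

With Ns large, most runs carry the allele upward almost deterministically; with Ns near zero, the trajectories are dominated by drift and most new alleles are lost.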
Depending on which architecture underlies a given disease, different strategies are needed to discover the genetic variants involved. When genome-wide association mapping was proposed as a strategy for discovering disease variants, the proposal was based on the hypothesis that, at least for some common diseases, the CDCV architecture underlies them. GWAS relies on the CDCV hypothesis for two practical reasons. The first is that the LD patterns across the genome make it possible to restrict examination to only a small fraction of the total variation: it is feasible to probe the common variants of a genome from a small selection of representative variants, but associations with rare variants are far less detectable this way. Second, statistical analysis of the association between polymorphism and disease is rather straightforward for moderate-frequency alleles but has far less power to detect associations with low-frequency alleles.
Since the GWAS approach is only practical for discovering common disease alleles, it was necessary to hypothesize that the CDCV architecture underlies the diseases of interest. The actual genetic architecture behind common diseases was unknown, but there were no comparable methods aimed at CDRV, so GWAS was the only show in town.
2.2 The Allelic Frequency Spectrum in Humans
The vast majority of human nucleotide variation is very rare because of our history of population bottlenecks followed by rapid growth. For instance, among the roughly 2500 individuals of the 1000 Genomes study, 64 million SNVs have a frequency <0.5%, and 20 million SNVs have a frequency >0.5%. Nevertheless, the majority of heterozygous variants observed within a single individual are not rare. The rare variants are most often very recent and therefore specific to populations, and they are also more often deleterious because selection has not yet acted on them. This is particularly clear for loss-of-function variants and other protein-coding variants. A study of 2636 Icelanders found that the fraction of variants with a minor allele frequency (MAF) below 0.1% was 62% for protein-truncating variants, 46% for missense variants, and 38% for synonymous variants.
The strong recent population expansions have also allowed variants to increase in frequency by surfing on the population expansion wave front, even if they would be selected against in a population of stable size. Thus, rare variants with large effects on disease may exist. GWASs so far have been successful in identifying a large set of common variants associated with disease, so common variants contributing to disease do exist. It is likely that rare variants with large phenotypic effects also contribute to the heritability of many common diseases, but the extent is likely to be disease specific.
3 The Basic GWAS
The first GWASs were published around 2006 [14, 15], when Illumina and Affymetrix first introduced genotyping chips that made it possible to test hundreds of thousands of SNPs quickly and inexpensively. The GWAS approach to finding susceptibility variants for diseases boils down to testing approximately 0.3–2 million SNPs (depending on chip type) for differences in allele frequencies between cases and controls, adjusting for the large number of tests performed. This is a wonderfully simple procedure that requires no complicated statistics or algorithms, only well-known statistical tests and a minimum of computing power. Despite the simplicity, some issues remain, such as faulty genotype data and confounding factors that can result in erroneous findings if not handled properly. The most important aspects of any GWAS are, therefore, thorough quality control of the data and measures to avoid or reduce the effect of confounding factors.
3.1 Statistical Tests
Contingency table for allele counts in case/control data:

              Allele A      Allele B      Total
  Cases       nA,cases      nB,cases      Ncases
  Controls    nA,controls   nB,controls   Ncontrols
  Total       NA            NB            N

Expected allele counts in case/control data, under the null hypothesis of no association:

              Allele A                Allele B
  Cases       (Ncases · NA)/N         (Ncases · NB)/N
  Controls    (Ncontrols · NA)/N      (Ncontrols · NB)/N
This statistic approximately follows a χ2 distribution with 1 degree of freedom, but if any expected allele count is very low (<10), the approximation breaks down. This means that if the MAF is very low or the total sample size, N, is small, an exact test, such as Fisher's exact test, should be applied. An alternative to the tests that use the 2 × 2 allelic contingency table, and thereby assume a multiplicative model, is the Cochran–Armitage trend test, which assumes an additive risk model. This test is preferred by some since it does not require an assumption of Hardy–Weinberg equilibrium in cases and controls combined.
While a 1 degree of freedom test that assumes an additive or multiplicative model is usually the first analysis, some studies also perform a test that would be better at picking up associations following a dominant or recessive pattern, for instance, by performing a 2 degrees of freedom test of the null hypothesis of no association between rows and columns in the 2 × 3 contingency table that counts genotypes instead of alleles.
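As a minimal sketch, the 1-degree-of-freedom allelic χ² test can be computed directly from the 2 × 2 table. The counts below are hypothetical; the 1-df p-value is obtained from the complementary error function, and an exact test or the trend test would be coded analogously.

```python
from math import erfc, sqrt

def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """1-df chi-square test on the 2x2 allele-count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    n_case, n_ctrl = case_a + case_b, ctrl_a + ctrl_b
    n_a, n_b = case_a + ctrl_a, case_b + ctrl_b
    chi2 = 0.0
    for obs, row, col in [(case_a, n_case, n_a), (case_b, n_case, n_b),
                          (ctrl_a, n_ctrl, n_a), (ctrl_b, n_ctrl, n_b)]:
        exp = row * col / n              # expected count under no association
        chi2 += (obs - exp) ** 2 / exp
    # For 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, erfc(sqrt(chi2 / 2))

chi2, p = allelic_chi2(case_a=210, case_b=190, ctrl_a=180, ctrl_b=220)
print(round(chi2, 2), round(p, 3))  # -> 4.5 0.034
```

Note that a nominal p-value like this must still survive the genome-wide multiple-testing correction (commonly a threshold of 5 × 10⁻⁸) before it counts as an association.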
3.2 Effect Estimates
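A central effect estimate in case/control GWAS is the allelic odds ratio (OR). As a minimal sketch with hypothetical counts, the OR and a 95% Wald confidence interval can be computed from the same 2 × 2 allele-count table:

```python
from math import exp, log, sqrt

def odds_ratio(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic odds ratio with a 95% Wald confidence interval
    (computed on the log scale, then exponentiated)."""
    or_ = (case_a * ctrl_b) / (case_b * ctrl_a)
    # Standard error of log(OR) from the Woolf formula
    se = sqrt(1 / case_a + 1 / case_b + 1 / ctrl_a + 1 / ctrl_b)
    ci = (exp(log(or_) - 1.96 * se), exp(log(or_) + 1.96 * se))
    return or_, ci

or_, ci = odds_ratio(240, 160, 180, 220)
print(round(or_, 2))  # -> 1.83
```

Unlike the relative risk, the OR can be estimated from case/control data even though cases are heavily oversampled relative to their population frequency, which is why the OR is the standard effect estimate in GWAS.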
3.3 Quality Control
Data quality problems can be either variant specific or individual specific, and inspection usually results in the removal of both problematic individuals and problematic variants from the data set.
Individual-specific problems can be caused by low DNA quality or contamination with foreign DNA. A sample of low DNA quality results in a high rate of missing data, where particular variants cannot be called, and in a higher risk of miscalled variants. It is, therefore, recommended that individuals lacking calls for more than 2–3% of the variants be removed from the analysis. Excess heterozygosity is an indicator of sample contamination, and individuals displaying it should also be disregarded. Sex checks and other kinds of phenotype tests may also be applied to remove individuals whose genotype information does not match their phenotype information due to a sample mix-up.
For a given variant, the data from an individual can be suspicious in two ways: the variant can fail to be called by the genotype-calling program, or it can be miscalled. Typically, a conservative cutoff is used in the calling process, ensuring that most problems show up as missing data rather than miscalls. Most problematic variants therefore show a high fraction of missing data, and variants with a missing-call rate above a given threshold (typically 1–5%) are removed. Miscalls typically occur when the homozygotes are hard to distinguish from the heterozygotes, so that some heterozygotes are misclassified as homozygotes or vice versa. Both biases manifest as deviations from Hardy–Weinberg equilibrium, and SNPs that show large deviations from Hardy–Weinberg equilibrium within the controls should be removed.
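The two variant-level filters described above, missingness and Hardy–Weinberg equilibrium, can be sketched as follows. Thresholds and function names are illustrative, and we use a simple 1-df χ² HWE test; production pipelines typically prefer an exact HWE test when genotype counts are small.

```python
from math import erfc, sqrt

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg equilibrium from
    genotype counts (an exact test is preferable for rare alleles)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    if p == 0 or p == 1:                     # monomorphic: nothing to test
        return 1.0
    expected = [p * p * n, 2 * p * (1 - p) * n, (1 - p) ** 2 * n]
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip([n_aa, n_ab, n_bb], expected))
    return erfc(sqrt(chi2 / 2))

def keep_variant(calls, max_missing=0.02, hwe_alpha=1e-6):
    """calls: genotypes coded 0/1/2 (copies of allele B), None = missing."""
    missing = sum(c is None for c in calls) / len(calls)
    if missing > max_missing:                # missingness filter
        return False
    observed = [c for c in calls if c is not None]
    counts = [observed.count(g) for g in (0, 1, 2)]
    return hwe_chi2_p(*counts) >= hwe_alpha  # HWE filter

# A fully called variant whose genotype counts match HWE exactly
calls = [0] * 49 + [1] * 42 + [2] * 9
print(keep_variant(calls))  # -> True
```

In practice the HWE filter is applied to controls only, as the text notes, since a true disease association can itself distort HWE among cases.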
3.4 Confounding Factors
Confounding in GWAS can arise if there are genotyping batch effects or if there is population or family structure in the sample. For example, if cases and controls are predominantly collected from geographically distinct areas, association signals can arise from genetic differences caused by geographic variation, and most such signals are unlikely to be causal. Such confounding due to population structure typically occurs when samples have different genetic ancestry, e.g., if the sample contains individuals of both European and Asian ancestry. Population structure confounding can also occur when the structure is more subtle, especially for large sample sizes. Methods for inferring population substructure, such as principal component analysis, are useful for detecting outliers we can remove from the data. However, this approach is not suitable when dealing with subtle structure, as a small bias can become significant in a large enough sample of individuals of similar genetic ancestry.
Several approaches for accounting for population structure in GWAS have been proposed. Devlin and Roeder [21, 22] proposed genomic control, i.e., shrinking the observed χ2 test statistics so that the median coincides with the expected value under the null model. However, studies by Yang et al. and Bulik-Sullivan et al. pointed out that the median and mean χ2 statistics are expected to be inflated for polygenic traits, even when there is no population structure confounding. With that in mind, we recommend adjusting for the confounders in the statistical model instead of performing genomic control. One such approach is to include covariates that capture the relevant structure in the model. Price et al. proposed including the largest principal components as covariates to adjust for population structure. This approach has proved effective in most cases. However, if the sample includes related individuals or is very large, controlling for the top PCs may not capture subtle structure. An alternative is to use mixed models [26, 27], where the expected genetic relatedness between individuals is included in the model. Advances in the computational efficiency of mixed models now enable analysis of very large and complex data sets, such as the UK Biobank data set.
Besides population structure, family structure or cryptic relatedness can also confound the analyses. Here one can identify closely related individuals by calculating a genetic relatedness matrix and prune the data so that it contains no close relatives. Lastly, genotyping batch effects due to incomplete randomization can introduce structure, unrelated to genetics, that confounds the analysis. A study on polygenic prediction of longevity by Sebastiani et al. serves as a warning. The researchers used two different kinds of chips and failed to remove several SNPs that exhibited bad quality on only one of the chips. Had the two chip types been used in the same proportions in cases and controls, this would probably not have produced false signals; unfortunately, the chip with the bad SNPs was used in twice as many cases as controls. When this genotyping batch effect was discovered, the authors had to retract their publication from Science. The type and frequency of errors occurring during sample preparation and SNP calling are likely to vary through time and space, so case and control samples should be completely randomized as early as possible in the genotyping procedure. Failure to plan this aspect of an investigation carefully introduces errors in the data that are hard, if not impossible, to detect, and they may reduce interesting findings to mere artifacts.
3.5 Meta-analysis of GWAS
The statistical power to detect association depends directly on the sample size used, all other things being equal. This fact has driven researchers to collaborate across institutions and countries in GWAS consortia, where they combine multiple cohorts in one large analysis. However, for logistical and legal reasons, it may not be possible to share individual-level genotypes, which are required for all of the GWAS approaches covered so far. Meta-analyses of GWASs performed in each cohort are a solution to this problem. These require coordination between the researchers, who share GWAS summary statistics instead of individual-level genotypes. The summary statistics are then meta-analyzed using statistical approaches that either assume a constant (fixed) effect across cohorts or allow the effect to vary (random effects). In recent years, many large-scale GWAS meta-analyses have been published, and the resulting summary statistics are often made public, providing a treasure trove for understanding the genetics of common diseases and traits.
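A minimal sketch of the fixed-effect (inverse-variance weighted) meta-analysis mentioned above, which assumes a constant true effect across cohorts. The per-cohort effect estimates and standard errors below are hypothetical.

```python
from math import erfc, sqrt

def fixed_effect_meta(betas, ses):
    """Fixed-effect (inverse-variance weighted) meta-analysis of
    per-cohort effect estimates; assumes one constant true effect."""
    weights = [1 / se ** 2 for se in ses]          # precision weights
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = sqrt(1 / sum(weights))                    # pooled standard error
    z = beta / se
    p = erfc(abs(z) / sqrt(2))                     # two-sided normal p-value
    return beta, se, p

# Three hypothetical cohorts reporting log-odds-ratio estimates
beta, se, p = fixed_effect_meta(betas=[0.10, 0.14, 0.08],
                                ses=[0.05, 0.06, 0.04])
print(round(beta, 3))  # -> 0.099
```

Note how the pooled standard error is smaller than any single cohort's, which is exactly the power gain that motivates consortium meta-analyses.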
3.6 Replication

The best way to make sure that a finding is real is to replicate it. If the same signal is found in an independent set of cases and controls, the association is unlikely to be the result of a confounding factor specific to the original data. Likewise, if the association persists after typing the markers with another genotyping method, it is not a false positive due to some artifact of the original genotyping method.
When trying to replicate a finding, the best strategy is to try in a population of similar ancestry. A marker that correlates with a true causal variant in one population might not be correlated with the same variant in a population of different ancestry, where the LD structure can differ. This is especially problematic when trying to replicate an association found in a non-African population in an African population. A marker might easily have 20 completely correlated markers in a European population but no good correlates in an African population. To replicate a European-population finding for one of these variants, it does not suffice to test a single variant in an African population; all 20 variants must be tested. This, however, also offers a way to fine-map the signal and possibly find the causative variant.
Before spending time and effort replicating an association signal in a foreign cohort, it is a good idea to look for partial replication of the marker within the existing data. Usually, a marker is surrounded by several correlated markers on the genotyping chip, and if one marker shows a significant association, the correlated markers should show an association too. If a marker is significantly associated with a disease but no other marker in the region is, it should be viewed as suspicious.
4 Imputation: Squeezing More Information Out of Your Data
The current generation of SNP chips includes only 0.3–2 million of the nine to ten million common SNPs in the human genome (i.e., SNPs with a MAF of more than 5%). Because of the correlation between SNPs in LD, however, the SNP chips can still claim to assay most of the common variants in the genome (in European populations at least). Although the Illumina HumanHap300 chip directly tests only about 3% of the ten million common SNPs, it still covers 77% of the SNPs in HapMap with a squared correlation coefficient (r2) of at least 0.8 in a population of European ancestry. The corresponding fraction in a population of African ancestry is only 33%, however.
These numbers expose two limitations of the basic GWAS strategy. First, a substantial fraction of the common SNPs are not well covered by the SNP chips even in European populations (23% in the case of the HumanHap300 chip). Second, we rely on tagging to test a large fraction of the common SNPs, and the diluted signal from correlated SNPs inevitably causes us to overlook true associations in many instances. An efficient way of alleviating these limitations is genotype imputation, where genotypes that are not directly assayed are predicted using information from a reference data set covering a large number of variants. Imputation improves a GWAS in multiple ways: it boosts the power to detect associations, localizes associations more precisely, and makes it possible to perform meta-analyses between studies that used different SNP chips.
4.1 Selection of Reference Data Set
The two important choices when performing imputation are which reference data set to use and which software to use. Usually, a publicly available reference data set, such as the 1000 Genomes Project or the large Haplotype Reference Consortium, is used. Alternatively, researchers sequence a part of their study cohort and thus create their own reference data set. The latter strategy has the advantage that one can be certain that the ancestry of the reference data matches the ancestry of the study cohort. It is important that the reference data be from a population similar to the study population; if the reference population is too distantly related, the reliability of the imputed data will be reduced. The quality and nature of the reference data also limit the quality of the imputed data in other ways. A reference data set consisting of only a small number of individuals cannot reliably estimate the frequency of rare variants, which in turn means that the imputation of rare variants lacks accuracy. There is, therefore, a natural limit to how low a frequency a variant can have and still be reliably imputed.
The largest publicly available reference data set is the Haplotype Reference Consortium (HRC), which combines whole-genome sequence data from 20 studies of predominantly European ancestry. The first release of this reference panel has data from 32,611 samples at 39,235,157 SNPs. The large sample size means that variants with minor allele frequencies as low as 0.1% can be imputed accurately using this data set.
Imputation does not only offer increased SNP coverage; given the right reference data, it also eases the analysis of common non-SNP variation, such as indels and copy number variants (CNVs). So far, however, some reference panels have only included SNVs and disregarded indels and structural variants. The increasing quality of whole-genome sequencing and of software for calling structural variants means that better data sets including structural variants should soon become available. Imputation will then make it possible to use SNP chips to test many indels and structural variants that are not (routinely) tested today.
4.2 Imputation Software
The commonly applied genotype imputation methods, such as IMPUTE2, BIMBAM, MaCH-Admix, and minimac3, are all based on hidden Markov models (HMMs). Comparisons of these software packages have shown that they produce data of broadly similar quality and that they are superior to imputation software based on other methodological approaches [36, 43]. The basic HMMs used in these programs are similar to earlier HMMs developed to model LD patterns and estimate recombination rates.
When the sample size is large, imputation using these HMM-based methods imposes a high computational burden. One way of decreasing this burden is to pre-phase the samples so that resolved haplotypes, rather than genotypes, are used as input for the imputation software. But even with pre-phasing, the computational task is far from trivial, and whole-genome imputation is not a task that can be performed on a single computer. This computational problem can be solved by using one of the two free imputation services that have recently been launched (https://imputationserver.sph.umich.edu, https://imputation.sanger.ac.uk). These services allow users to upload their data through a web interface and choose between a set of reference panels. The data set is then imputed on a high-performance computing cluster, and the user receives an email when the imputed data are ready for download.
4.3 Testing Imputed Variants
Since imputation is based on probabilistic models, the output is a probability for each possible genotype at each untyped variant in a given individual. That is, instead of reporting the genotype of an individual as AG, say, the program reports that the probability of the genotype being AA is 5%, AG is 93%, and GG is 2%. This probabilistic output complicates the association analysis. The simplest way of analyzing imputed data is to use the "best guess" genotype, i.e., take the genotype with the highest probability and ignore the others. In the example above, the individual would be assigned genotype AG at the SNP in question, and usually an individual's genotype is considered missing if no genotype has a probability above a certain threshold (e.g., 90%). The "best guess" genotype is problematic since it does not take the uncertainty of the imputed genotypes into account and may introduce a systematic bias, leading to false positives and false negatives. A better way is to perform a logistic regression on the expected allele count, or dosage; in the example above, the expected count of allele A would be 2pAA + pAG = 1.03. This method has proved surprisingly robust, at least when the effect of the risk allele is small, which is the case for most variants found through GWAS. An even better solution is to use methods that fully account for the uncertainty of the imputed genotypes [45, 46, 47].
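The dosage computation is simple enough to show directly; the sketch below reproduces the worked example from the text (the function name is ours).

```python
def dosage(p_aa, p_ab, p_bb):
    """Expected count of allele A given imputed genotype probabilities
    for the three genotypes AA, AB, and BB."""
    return 2 * p_aa + p_ab

# The example from the text: P(AA) = 0.05, P(AG) = 0.93, P(GG) = 0.02
print(round(dosage(0.05, 0.93, 0.02), 2))  # -> 1.03
```

The dosage is a single number per individual per variant, so it slots into a standard logistic regression exactly where a hard 0/1/2 genotype count would go, while still reflecting the imputation uncertainty.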
5 Current Status
5.1 Polygenic Architecture of Common Diseases
GWASs have consistently shown that most complex traits and diseases have very polygenic architectures with a large number of causal variants with small effects. The small effect sizes mean that enormous sample sizes are needed to detect the associated variants and that each variant only explains a small fraction of the heritability. Even though large sample sizes have led to the discovery of many loci affecting common diseases, the aggregated effect of all these loci still only explains a small fraction of the heritability.
A good example is type 2 diabetes, where researchers by 2012 had identified 63 associated loci that collectively explained only 5.7% of the liability-scale variance. Such results led to much discussion about the possible sources of the remaining "missing heritability" [49, 50]. A significant contribution to this debate came when researchers in 2010 started using mixed linear models to estimate the heritability explained by all common variants, not only those that surpass a conservative significance threshold. These studies showed that a significant fraction of the so-called missing heritability was not truly missing from the GWAS data sets but only hidden due to small effect sizes. This was first illustrated for height, where 180 statistically significant SNPs could explain only 10% of the heritability, but the fraction increased to 45% when all genotyped variants were considered.
For common diseases, such analyses have typically shown that around half of the heritability can be explained by considering all common variants. Given the small individual contribution of each of the discovered variants, and that the individual contributions of the variants yet to be found will be even smaller, it is likely that the actual number of causal variants is well above a thousand for many common diseases. Recent data show that for many diseases these causal variants are relatively uniformly distributed along the genome. It has, for instance, been estimated that 71–100% of 1 Mb windows in the genome contribute to the heritability of schizophrenia. Another article recently estimated that most 100 kb windows contribute to the variation in height and that more than 100,000 markers have an independent effect on height. This strikingly large number led the authors to propose a new "omnigenic" model in which most genes expressed in a cell type relevant for a given disease make a nonzero contribution to the heritability of that disease.
5.2 Pleiotropy

The variants discovered by GWASs so far reveal numerous examples where one genetic locus affects multiple, often seemingly unrelated, traits [54, 55]. One explanation for such a shared association between a pair of traits is mediation, where the shared locus affects the risk of one of the traits, and that trait is causal for the other. Another possible explanation is pleiotropy, where the shared locus is independently causal for both traits. It is possible to distinguish between mediation and true pleiotropy by adjusting or stratifying for one trait while testing the other. In the case of mediation, it is also possible to determine the direction of causation. In general, it is difficult to make such causal inferences from observational data, but Mendelian randomization, which uses significantly associated variants as instrumental variables, can in some circumstances be used to assess a causal relationship between a potential risk factor and a disease. For instance, Voight and colleagues used SNPs associated with lipoprotein levels to assess whether the correlation between different forms of lipoprotein and myocardial infarction risk was causal. They found that while low-density lipoprotein (LDL) had a causal effect on disease risk, high-density lipoprotein (HDL) did not.
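The simplest Mendelian randomization estimator, the Wald ratio for a single instrument, divides the variant-outcome effect by the variant-exposure effect. The sketch below is our own illustration; all numbers are hypothetical and do not come from the Voight et al. study.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Wald ratio MR estimate: the causal effect of
    the exposure on the outcome, with a first-order (delta-method) SE
    that ignores uncertainty in the variant-exposure estimate."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se

# Hypothetical instrument: the variant raises the exposure (e.g., LDL)
# by 0.20 SD and the outcome (e.g., MI log-odds) by 0.08
est, se = wald_ratio(beta_exposure=0.20, beta_outcome=0.08, se_outcome=0.02)
print(round(est, 2))  # -> 0.4
```

In practice, many independent instruments are combined (e.g., by inverse-variance weighting of their Wald ratios), and the validity of the conclusion rests on the instruments affecting the outcome only through the exposure.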
The fact that pleiotropy is widespread has several implications. One is that variants that have already been found to affect one trait can be prioritized in other studies since they are more likely also to affect another trait than a random variant is. Another implication is that we cannot always examine the effect of selection by studying one trait in isolation. There are multiple examples of antagonistic pleiotropy where a variant increases the risk of one disease while decreasing the risk of another.
5.3 Differences Between Diseases
Because of differences in age of onset and severity, we do not expect identical allelic architectures across common diseases. Using the currently available GWAS data sets, we can now start to identify these differences, but because of the large differences in sample sizes and in the number of tested variants, this is not an easy task.
The data available to date show that the degree of polygenicity differs between diseases, with schizophrenia, for example, having more predicted loci than immune disorders and hypertension. Results also show that rare variants play a larger role in some diseases than in others. Rare variants, for example, have a greater role in amyotrophic lateral sclerosis than in schizophrenia and are even less important in lifestyle-dependent diseases such as type 2 diabetes.
The price of whole-genome sequencing is still declining, and it is not unreasonable to expect that at some point in the future, a majority of people will have their genomes sequenced. At that point, the availability of genetic data will no longer be a limiting factor in studies of common human diseases. To make the most of such huge data sets, the genetic information needs to be combined with high-quality phenotypic and environmental information. If that is achieved, we will be able to explain most, if not all, of the additive genetic variance for common human diseases. Large population data sets where genetic data are combined with extensive phenotypic data, including information about lifestyle, diet, and other environmental risk factors, will also enable much better studies of pleiotropy and gene-environment interactions. A few such data sets are already available, with the UK Biobank, a prospective study of 500,000 individuals, being the best example.
While GWASs have identified many loci associated with common diseases, the actual causal variant and the functional mechanism driving the causation are still unknown for a large fraction of the loci. To understand the functional mechanism of a specific locus, it is necessary to combine sequence data with other types of data, including gene expression data (from the correct tissue) and epigenetic data such as methylation. Such data sets are fortunately becoming cheaper to produce, and thus more abundant, as a result of falling sequencing costs. Furthermore, large consortium data sets such as GTEx, ENCODE, and Roadmap Epigenomics mean that labs studying these mechanisms need not produce all the data themselves but can in part rely on these public data sets. It is thus likely that in the future we will not only find many more GWAS loci for each common disease but also gain a much better understanding of how each of these loci affects the disease.
How can you distinguish causal variants from other variants when all variants have been typed? Is there any statistical way of distinguishing between correlation and causality just from genotype data? Could you use functional annotations?
Consider a GWAS data set, where in the top ten ranked statistics you have five markers that are close together and the remaining five scattered across the genome. Would you consider the five close markers more or less likely to be a true positive? Why? If one of them is a false positive, what would you think about the others?
Why is the RR but not the OR estimate affected by a biased case/control sample?
How would you test for, e.g., dominant or recessive effects in a contingency table?
- 30. Sebastiani P, Solovieff N, Puca A et al (2010) Genetic signatures of exceptional longevity in humans. Science. https://doi.org/10.1126/science.1190532
- 32. Zheng J, Erzurumluoglu AM, Elsworth BL et al (2017) LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33:272–279. https://doi.org/10.1093/bioinformatics/btw613
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.