Human Genetics, Volume 125, Issue 2, pp 131–151

Strengthening the reporting of genetic association studies (STREGA): an extension of the STROBE Statement

  • Julian Little
  • Julian P. T. Higgins
  • John P. A. Ioannidis
  • David Moher
  • France Gagnon
  • Erik von Elm
  • Muin J. Khoury
  • Barbara Cohen
  • George Davey-Smith
  • Jeremy Grimshaw
  • Paul Scheet
  • Marta Gwinn
  • Robin E. Williamson
  • Guang Yong Zou
  • Kim Hutchings
  • Candice Y. Johnson
  • Valerie Tait
  • Miriam Wiens
  • Jean Golding
  • Cornelia van Duijn
  • John McLaughlin
  • Andrew Paterson
  • George Wells
  • Isabel Fortier
  • Matthew Freedman
  • Maja Zecevic
  • Richard King
  • Claire Infante-Rivard
  • Alex Stewart
  • Nick Birkett
Open Access
Original Investigation


Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modeling haplotype variation, Hardy–Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and issues of data volume, all of which are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct, or analysis.


Keywords: Gene-disease associations; Genetics; Gene-environment interaction; Systematic review; Meta-analysis; Reporting recommendations; Epidemiology; Genome-wide association


Rapidly evolving evidence on genetic associations is crucial to integrating human genomics into the practice of medicine and public health (Khoury et al. 2004; Genomics Health and Society Working Group 2004). Genetic factors are likely to affect the occurrence of numerous common diseases; identifying and characterizing the associated risk (or protection) will therefore be important for improving understanding of etiology and, potentially, for developing interventions based on genetic information. The number of publications on associations between genes and diseases has increased tremendously: more than 34,000 articles have been published, and the annual number more than doubled between 2001 and 2008 (Lin et al. 2006; Yu et al. 2008). Articles on genetic associations have appeared in about 1,500 journals and in several languages.

Despite many similarities between genetic association studies and “classical” observational epidemiologic studies (that is, cross-sectional, case–control, and cohort) of lifestyle and environmental factors, genetic association studies present several specific challenges including an unprecedented volume of new data (Lawrence et al. 2005; Thomas 2006) and the likelihood of very small individual effects. Genes may operate in complex pathways with gene-environment and gene–gene interactions (Khoury et al. 2007). Moreover, the current evidence base on gene-disease associations is fraught with methodological problems (Little et al. 2003; Ioannidis et al. 2005, 2006). Inadequate reporting of results, even from well-conducted studies, hampers assessment of a study’s strengths and weaknesses, and hence the integration of evidence (von Elm and Egger 2004).

Although several commentaries on the conduct, appraisal and/or reporting of genetic association studies have so far been published (Nature Genetics 1999; Cardon and Bell 2001; Weiss 2001; Weiss et al. 2001; Cooper et al. 2002; Hegele 2002; Little et al. 2002; Romero et al. 2002; Colhoun et al. 2003; van Duijn and Porta 2003; Crossman and Watkins 2004; Huizinga et al. 2004; Little 2004; Rebbeck et al. 2004; Tan et al. 2004; Anonymous 2005; Ehm et al. 2005; Freimer and Sabatti 2005; Hattersley and McCarthy 2005; Manly 2005; Shen et al. 2005; Vitali and Randolph 2005; Wedzicha and Hall 2005; Hall and Blakey 2005; DeLisi and Faraone 2006; Saito et al. 2006; Uhlig et al. 2007; NCI-NHGRI Working Group on Replication in Association Studies et al. 2007), their recommendations differ. For example, some papers suggest that replication of findings should be part of the publication (Nature Genetics 1999; Cardon and Bell 2001; Cooper et al. 2002; Hegele 2002; Huizinga et al. 2004; Tan et al. 2004; Wedzicha and Hall 2005; Hall and Blakey 2005; DeLisi and Faraone 2006), whereas others consider this suggestion unnecessary or even unreasonable (van Duijn and Porta 2003; Begg 2005; Byrnes et al. 2005; Pharoah et al. 2005; Wacholder 2005; Whittemore 2005). In many publications, the guidance has focused on genetic association studies of specific diseases (Weiss 2001; Weiss et al. 2001; Hegele 2002; Romero et al. 2002; Crossman and Watkins 2004; Huizinga et al. 2004; Rebbeck et al. 2004; Tan et al. 2004; Manly 2005; Shen et al. 2005; Vitali and Randolph 2005; Wedzicha and Hall 2005; Hall and Blakey 2005; DeLisi and Faraone 2006; Saito et al. 2006; Uhlig et al. 2007) or the design and conduct of genetic association studies (Cardon and Bell 2001; Weiss 2001; Weiss et al. 2001; Hegele 2002; Romero et al. 2002; Colhoun et al. 2003; Crossman and Watkins 2004; Huizinga et al. 2004; Rebbeck et al. 2004; Hattersley and McCarthy 2005; Manly 2005; Shen et al. 2005; Hall and Blakey 2005; DeLisi and Faraone 2006) rather than on the quality of the reporting.

Despite increasing recognition of these problems, the reporting of genetic association studies still needs to be improved (Bogardus et al. 1999; Peters et al. 2003; Clark and Baudouin 2006; Lee et al. 2007; Yesupriya et al. 2008). For example, an assessment of a random sample of 315 genetic association studies published from 2001 to 2003 found that most studies provided some qualitative descriptions of the study participants (for example, origin and enrollment criteria), but reporting of quantitative descriptors such as age and sex was variable (Yesupriya et al. 2008). In addition, completeness of reporting of methods that allow readers to assess potential biases (for example, number of exclusions or number of samples that could not be genotyped) varied (Yesupriya et al. 2008). Only some studies described the methods used to validate genotyping or stated whether research staff were blinded to outcome status. The same problems persisted in a smaller sample of studies published in 2006 (Yesupriya et al. 2008). Lack of transparency and incomplete reporting have raised concerns in a range of health research fields (von Elm and Egger 2004; Reid et al. 1995; Brazma et al. 2001; Pocock et al. 2004; Altman and Moher 2005), and poor reporting has been associated with biased estimates of effects in clinical intervention studies (Gluud 2006).

The main goal of this article is to propose and justify a set of guiding principles for reporting results of genetic association studies. The epidemiology community has recently developed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement for cross-sectional, case–control, and cohort studies (von Elm et al. 2007; Vandenbroucke et al. 2007). Given the relevance of general epidemiologic principles for genetic association studies, we propose recommendations in an extension of the STROBE statement called the STrengthening the REporting of Genetic Association studies (STREGA) Statement. The recommendations of the STROBE Statement have a strong foundation because they are based on empirical evidence on the reporting of observational studies and were developed through extensive consultation in the epidemiologic research community (Vandenbroucke et al. 2007). We have sought to identify gaps and areas of controversy in the evidence regarding potential biases in genetic association studies. Alongside the recommendations, we have indicated available empirical or theoretical work that has demonstrated or suggested that a methodological feature of a study can influence the direction or magnitude of the association observed. We acknowledge that for many items, no such evidence exists. The intended audience for the reporting guideline is broad and includes epidemiologists, geneticists, statisticians, clinician scientists, and laboratory-based investigators who undertake genetic association studies. In addition, it includes “users” of such studies who wish to understand the basic premise, design, and limitations of genetic association studies in order to interpret the results.
The field of genetic associations is evolving very rapidly with the advent of genome-wide association investigations, high-throughput platforms assessing genetic variability beyond common single-nucleotide polymorphisms (SNPs) (for example, copy number variants, rare variants), and eventually routine full sequencing of samples from large populations. Our recommendations are not intended to support or oppose the choice of any particular study design or method. Instead, they are intended to maximize the transparency, quality and completeness of reporting of what was done and found in a particular study.


A multidisciplinary group developed the STREGA Statement using literature review, workshop presentations and discussion, and iterative electronic correspondence after the workshop. Thirty-three of 74 invitees participated in the STREGA workshop in Ottawa, Ontario, Canada, in June, 2006. Participants included epidemiologists, geneticists, statisticians, journal editors, and graduate students.

Before the workshop, an electronic search was performed to identify existing reporting guidance for genetic association studies. Workshop participants were also asked to identify any additional guidance. They prepared brief presentations on existing reporting guidelines, empirical evidence on reporting of genetic association studies, the development of the STROBE Statement, and several key areas for discussion that were identified on the basis of consultations before the workshop. These areas included the selection and participation of study participants, rationale for choice of genes and variants investigated, genotyping errors, methods for inferring haplotypes, population stratification, assessment of Hardy–Weinberg equilibrium (HWE), multiple testing, reporting of quantitative (continuous) outcomes, selectively reporting study results, joint effects and inference of causation in single studies. Additional resources to inform workshop participants were the HuGENet handbook (Little and Higgins 2006; Higgins et al. 2007), examples of data extraction forms from systematic reviews or meta-analyses, articles on guideline development (Altman et al. 2001; Moher et al. 2001) and the checklists developed for STROBE. To harmonize our recommendations for genetic association studies with those for observational epidemiologic studies, we communicated with the STROBE group during the development process and sought their comments on the STREGA draft documents. We also provided comments on the developing STROBE Statement and its associated explanation and elaboration document (Vandenbroucke et al. 2007).


In Table 1, we present the STREGA recommendations, an extension to the STROBE checklist (von Elm et al. 2007) for genetic association studies. The resulting STREGA checklist provides additions to 12 of the 22 items on the STROBE checklist. During the workshop and subsequent consultations, we identified five main areas of special interest that are specific to, or especially relevant in, genetic association studies: genotyping errors, population stratification, modeling haplotype variation, HWE, and replication. We elaborate on each of these areas, starting each section with the corresponding STREGA recommendation, followed by a brief outline of the issue and an explanation for the recommendations. Complementary information on these areas and the rationale for additional STREGA recommendations relating to selection of participants, choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and issues of data volume are presented in Table 2.
Table 1

STREGA reporting recommendations, extended from STROBE Statement

Item (item number) | STROBE guideline | Extension for Genetic Association Studies (STREGA)

Title and Abstract (1)
  STROBE: (a) Indicate the study’s design with a commonly used term in the title or the abstract
  STROBE: (b) Provide in the abstract an informative and balanced summary of what was done and what was found

Introduction
 Background rationale (2)
  STROBE: Explain the scientific background and rationale for the investigation being reported
 Objectives (3)
  STROBE: State specific objectives, including any pre-specified hypotheses
  STREGA: State if the study is the first report of a genetic association, a replication effort, or both

Methods
 Study design (4)
  STROBE: Present key elements of study design early in the paper
 Setting (5)
  STROBE: Describe the setting, locations and relevant dates, including periods of recruitment, exposure, follow-up, and data collection
 Participants (6)
  STROBE: (a) Cohort study: give the eligibility criteria, and the sources and methods of selection of participants. Describe methods of follow-up
          Case–control study: give the eligibility criteria, and the sources and methods of case ascertainment and control selection. Give the rationale for the choice of cases and controls
          Cross-sectional study: give the eligibility criteria, and the sources and methods of selection of participants
  STREGA: Give information on the criteria and methods for selection of subsets of participants from a larger study, when relevant
  STROBE: (b) Cohort study: for matched studies, give matching criteria and number of exposed and unexposed
          Case–control study: for matched studies, give matching criteria and the number of controls per case
 Variables (7)
  STROBE: (a) Clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable
  STREGA: (b) Clearly define genetic exposures (genetic variants) using a widely-used nomenclature system. Identify variables likely to be associated with population stratification (confounding by ethnic origin)
 Data sources/measurement (8, see footnote a)
  STROBE: (a) For each variable of interest, give sources of data and details of methods of assessment (measurement). Describe comparability of assessment methods if there is more than one group
  STREGA: (b) Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates and call rates. State the laboratory/center where genotyping was done. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches
 Bias (9)
  STROBE: (a) Describe any efforts to address potential sources of bias
  STREGA: (b) For quantitative outcome variables, specify if any investigation of potential bias resulting from pharmacotherapy was undertaken. If relevant, describe the nature and magnitude of the potential bias, and explain what approach was used to deal with this
 Study size (10)
  STROBE: Explain how the study size was arrived at
 Quantitative variables (11)
  STROBE: Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen, and why
  STREGA: If applicable, describe how effects of treatment were dealt with
 Statistical methods (12)
  STROBE: (a) Describe all statistical methods, including those used to control for confounding
  STREGA: State software version used and options (or settings) chosen
  STROBE: (b) Describe any methods used to examine subgroups and interactions
  STROBE: (c) Explain how missing data were addressed
  STROBE: (d) Cohort study: if applicable, explain how loss to follow-up was addressed
          Case–control study: if applicable, explain how matching of cases and controls was addressed
          Cross-sectional study: if applicable, describe analytical methods taking account of sampling strategy
  STROBE: (e) Describe any sensitivity analyses
  STREGA: (f) State whether Hardy–Weinberg equilibrium was considered and, if so, how
  STREGA: (g) Describe any methods used for inferring genotypes or haplotypes
  STREGA: (h) Describe any methods used to assess or address population stratification
  STREGA: (i) Describe any methods used to address multiple comparisons or to control risk of false-positive findings
  STREGA: (j) Describe any methods used to address and correct for relatedness among subjects

Results
 Participants (13, see footnote a)
  STROBE: (a) Report the numbers of individuals at each stage of the study (e.g., numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analyzed)
  STREGA: Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful
  STROBE: (b) Give reasons for non-participation at each stage
  STROBE: (c) Consider use of a flow diagram
 Descriptive data (14, see footnote a)
  STROBE: (a) Give characteristics of study participants (e.g., demographic, clinical, social) and information on exposures and potential confounders
  STREGA: Consider giving information by genotype
  STROBE: (b) Indicate the number of participants with missing data for each variable of interest
  STROBE: (c) Cohort study: summarize follow-up time, e.g., average and total amount
 Outcome data (15, see footnote a)
  STROBE: Cohort study: report numbers of outcome events or summary measures over time
  STREGA: Report outcomes (phenotypes) for each genotype category over time
  STROBE: Case–control study: report numbers in each exposure category, or summary measures of exposure
  STREGA: Report numbers in each genotype category
  STROBE: Cross-sectional study: report numbers of outcome events or summary measures
  STREGA: Report outcomes (phenotypes) for each genotype category
 Main results (16)
  STROBE: (a) Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their precision (e.g., 95% confidence intervals). Make clear which confounders were adjusted for and why they were included
  STROBE: (b) Report category boundaries when continuous variables were categorized
  STROBE: (c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period
  STREGA: (d) Report results of any adjustments for multiple comparisons
 Other analyses (17)
  STROBE: (a) Report other analyses done, e.g., analyses of subgroups and interactions, and sensitivity analyses
  STREGA: (b) If numerous genetic exposures (genetic variants) were examined, summarize results from all analyses undertaken
  STREGA: (c) If detailed results are available elsewhere, state how they can be accessed

Discussion
 Key results (18)
  STROBE: Summarize key results with reference to study objectives
 Limitations (19)
  STROBE: Discuss limitations of the study, taking into account sources of potential bias or imprecision. Discuss both direction and magnitude of any potential bias
 Interpretation (20)
  STROBE: Give a cautious overall interpretation of results considering objectives, limitations, multiplicity of analyses, results from similar studies, and other relevant evidence
 Generalizability (21)
  STROBE: Discuss the generalizability (external validity) of the study results

Other information
 Funding (22)
  STROBE: Give the source of funding and the role of the funders for the present study and, if applicable, for the original study on which the present article is based

STREGA Strengthening the REporting of Genetic Association studies, STROBE Strengthening the Reporting of Observational Studies in Epidemiology

a Give information separately for cases and controls in case–control studies and, if applicable, for exposed and unexposed groups in cohort and cross-sectional studies
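Item 12(f) of Table 1 asks authors to state whether Hardy–Weinberg equilibrium was considered and, if so, how. To illustrate what a "how" might involve, the sketch below implements one widely used option, a Pearson chi-square goodness-of-fit test with one degree of freedom for a single biallelic variant. It is an illustrative assumption, not a method prescribed by STREGA: it assumes genotype counts from unrelated individuals (often controls only), the counts in the usage lines are hypothetical, and for rare variants an exact test is generally preferred over the chi-square approximation.

```python
"""Sketch: chi-square goodness-of-fit test of Hardy-Weinberg equilibrium (HWE)
for one biallelic variant, using only the Python standard library.

Assumptions (illustrative only): counts come from unrelated individuals, and
the chi-square approximation is adequate (not true for very rare alleles).
"""
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Return (chi-square statistic, P value) for HWE with 1 degree of freedom.

    n_aa and n_bb are the two homozygote counts; n_ab is the heterozygote count.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p                      # estimated frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        raise ValueError("monomorphic variant: HWE test undefined")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, AB, BB), both with allele frequency 0.6.
    for counts in [(360, 480, 160), (400, 400, 200)]:
        stat, pval = hwe_chi_square(*counts)
        print(counts, f"chi2 = {stat:.2f}", f"P = {pval:.2g}")
```

The first set of hypothetical counts matches HWE expectations exactly, so the statistic is essentially zero; the second has the same allele frequency but a deficit of heterozygotes, giving a statistic of about 27.8 and a P value far below 0.001. Whichever test is used, item 12(f) asks that it be stated, together with how departures were handled.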

Table 2

Rationale for inclusion of topics in the STREGA recommendations

Specific issue in genetic association studies

Rationale for inclusion in STREGA

Item(s) in STREGA

Specific suggestions for reporting

Main areas of special interest

 Genotyping errors (misclassification of exposure)

Non-differential genotyping errors will usually bias associations towards the null (Rothman et al. 1993; Garcia-Closas et al. 2004). When there are systematic differences in genotyping according to outcome status (differential error), bias in any direction may occur

8(b) Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates and call rates. State the laboratory/center where genotyping was done. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches

13(a) Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful

Factors affecting the potential extent of misclassification (information bias) of genotype include the types and quality of samples, timing of collection, and the method used for genotyping (Little et al. 2002; Pompanon et al. 2005; Steinberg and Gallagher 2004)

When high-throughput platforms are used, it is important to report not only the platform used but also the allele calling algorithm and its version. Different calling algorithms have different strengths and weaknesses [(McCarthy et al. 2008) and supplementary information in (Wellcome Trust Case Control Consortium 2007)]. For example, some of the currently used algorithms are notably less accurate in assigning genotypes to single-nucleotide polymorphisms with low minor allele frequencies (<0.10) than to single nucleotide polymorphisms with higher minor allele frequencies (Pearson and Manolio 2008). Algorithms are continually being improved. Reporting the allele calling algorithm and its version will help readers to interpret reported results, and it is critical for reproducing the results of the study given the same intermediate output files summarizing intensity of hybridization

For some high-throughput platforms, the user may choose to assign genotypes using all of the data from the study simultaneously, or in smaller batches, such as by plate (Clayton et al. 2005; Plagnol et al. 2007) and supplementary information (Wellcome Trust Case Control Consortium 2007)). This choice can affect both the overall call rate and the robustness of the calls

For case–control studies, whether genotyping was done blind to case–control status should be reported, along with the reason for this decision

 Population stratification (confounding by ethnic origin)

When study sub-populations differ both in allele (or genotype) frequencies and disease risks, then confounding will occur if these sub-populations are unevenly distributed across exposure groups (or between cases and controls)

12(h) Describe any methods used to assess or address population stratification

In view of the debate about the potential implications of population stratification for the validity of genetic association studies, transparent reporting of the methods used, or stating that none was used, to address this potential problem is important for allowing the empirical evidence to accrue

Ethnicity information should be presented (see for example (Winker 2006)), as should genetic markers or other variables likely to be associated with population stratification. Details of case-family control designs should be provided if they are used

As several methods of adjusting for population stratification have been proposed (Balding 2006), explicit documentation of the methods is needed

 Modeling haplotype variation

In designs considered in this article, haplotypes have to be inferred because of lack of available family information. There are diverse methods for inferring haplotypes.

12(g) Describe any methods used for inferring genotypes or haplotypes.

When discrete “windows” are used to summarize haplotypes, variation in the definition of these may complicate comparisons across studies, as results may be sensitive to choice of windows. Related “imputation” strategies are also in use (Wellcome Trust Case Control Consortium 2007; Scott et al. 2007; Scuteri et al. 2007).

It is important to give details on haplotype inference and, when possible, uncertainty. Additional considerations for reporting include the strategy for dealing with rare haplotypes, window size and construction (if used) and choice of software

 Hardy–Weinberg equilibrium (HWE)

Departure from Hardy–Weinberg equilibrium may indicate errors or peculiarities in the data (Salanti et al. 2005). Empirical assessments have found that 20–69% of genetic associations were reported with some indication about conformity with Hardy–Weinberg equilibrium, and that among some of these, there were limitations or errors in its assessment (Salanti et al. 2005)

12(f) State whether Hardy–Weinberg equilibrium was considered and, if so, how

Any statistical tests or measures should be described, as should any procedure to allow for deviations from Hardy–Weinberg equilibrium in evaluating genetic associations (Zou and Donner 2006)


Publications that present and synthesize data from several studies in a single report are becoming more common

3: State if the study is the first report of a genetic association, a replication effort, or both

The selected criteria for claiming successful replication should also be explicitly documented

Additional issues

 Selection of participants

Selection bias may occur if (i) genetic associations are investigated in one or more subsets of participants (sub-samples) from a particular study; or (ii) there is differential non-participation in groups being compared; or, (iii) there are differential genotyping call rates in groups being compared

6(a) Give information on the criteria and methods for selection of subsets of participants from a larger study, when relevant

13(a) Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful

Inclusion and exclusion criteria, sources and methods of selection of sub-samples should be specified, stating whether these were based on a priori or post hoc considerations

 Rationale for choice of genes and variants investigated

Without an explicit rationale, it is difficult to judge the potential for selective reporting of study results. There is strong empirical evidence from randomised controlled trials that reporting of trial outcomes is frequently incomplete and biased in favor of statistically significant findings (Chan et al. 2004a, b; Chan and Altman 2005). Some evidence is also available in pharmacogenetics (Contopoulos-Ioannidis et al. 2006)

7(b) Clearly define genetic exposures (genetic variants) using a widely-used nomenclature system. Identify variables likely to be associated with population stratification (confounding by ethnic origin)

The scientific background and rationale for investigating the genes and variants should be reported

For genome-wide association studies, it is important to specify what initial testing platforms were used and how gene variants are selected for further testing in subsequent stages. This may involve statistical considerations (for example, selection of P value threshold), functional or other biological considerations, fine mapping choices, or other approaches that need to be specified

Guidelines for human gene nomenclature have been published by the Human Gene Nomenclature Committee (Wain et al. 2002a, b). Standard reference numbers for nucleotide sequence variations, largely but not only SNPs are provided in dbSNP, the National Center for Biotechnology Information’s database of genetic variation (Sherry et al. 2001). For variations not listed in dbSNP that can be described relative to a specified version, guidelines have been proposed (Antonarakis 1998; den Dunnen and Antonarakis 2000)

 Treatment effects in studies of quantitative traits

A study of a quantitative variable may be compromised when the trait is subjected to the effects of a treatment for example, the study of a lipid-related trait for which several individuals are taking lipid-lowering medication. Without appropriate correction, this can lead to bias in estimating the effect and loss of power

9(b) For quantitative outcome variables, specify if any investigation of potential bias resulting from pharmacotherapy was undertaken. If relevant, describe the nature and magnitude of the potential bias, and explain what approach was used to deal with this

11: If applicable, describe how effects of treatment were dealt with

Several methods of adjusting for treatment effects have been proposed (Tobin et al. 2005). As the approach to deal with treatment effects may have an important impact on both the power of the study and the interpretation of the results, explicit documentation of the selected strategy is needed

Statistical methods

Analysis methods should be transparent and replicable, and genetic association studies are often performed using specialized software

12(a) State software version used and options (or settings) chosen



The methods of analysis used in family-based studies are different from those used in studies that are based on unrelated cases and controls. Moreover, even in the studies that are based on apparently unrelated cases and controls, some individuals may have some connection and may be (distant) relatives, and this is particularly common in small, isolated populations, for example, Iceland. This may need to be probed with appropriate methods and adjusted for in the analysis of the data

12(j) Describe any methods used to address and correct for relatedness among subjects

For the great majority of studies in which samples are drawn from large, non-isolated populations, relatedness is typically negligible and results would not be altered depending on whether relatedness is taken into account. This may not be the case in isolated populations or those with considerable inbreeding. If investigators have assessed for relatedness, they should state the method used (Lynch and Ritland 1999; Slager and Schaid 2001; Voight and Pritchard 2005) and how the results are corrected for identified relatedness

 Reporting of descriptive and outcome data

The synthesis of findings across studies depends on the availability of sufficiently detailed data

14(a) Consider giving information by genotype

15: Cohort study: Report outcomes (phenotypes) for each genotype category over time

Case-control study: Report number in each genotype category

Cross-sectional study: Report outcomes (phenotypes) for each genotype category


 Volume of data

The key problem is of possible false-positive results and selective reporting of these. Type I errors are particularly relevant to the conduct of genome-wide association studies. A large search among hundreds of thousands of genetic variants can be expected by chance alone to find thousands of false-positive results (odds ratios significantly different from 1.0)

12(i) Describe any methods used to address multiple comparisons or to control risk of false-positive findings

16(d) Report results of any adjustments for multiple comparisons

17(b) If numerous genetic exposures (genetic variants) were examined, summarize results from all analyses undertaken

17(c) If detailed results are available elsewhere, state how they can be accessed

Genome-wide association studies collect information on a very large number of genetic variants concomitantly. Initiatives to make the entire database transparent and available online may supply a definitive solution to the problem of selective reporting (Khoury et al. 2007)

Availability of raw data may help interested investigators reproduce the published analyses and pursue additional analyses. A potential drawback of public data availability is that investigators using the data second-hand may not be aware of limitations or other problems that were originally encountered, unless these are also transparently reported. In this regard, collaboration between the data users and the original investigators may be beneficial. Issues of consent and confidentiality (Homer et al. 2008; Zerhouni and Nabel 2008) may also complicate what data can be shared, and how. It would be useful for published reports to specify not only what data can be accessed and where, but also to describe briefly the procedure for obtaining access. For articles that have used publicly available data, it would be useful to clarify whether the original investigators were also involved and, if so, how.

The volume of data analyzed should also be considered in the interpretation of findings.

Examples of methods of summarizing results include giving the distribution of P values (frequentist statistics), giving the distribution of effect sizes, and specifying false discovery rates.
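Specifying false discovery rates usually involves a procedure such as that of Benjamini and Hochberg, which is one widely used option rather than the only one. A minimal Python sketch of the step-up procedure follows; it assumes independent (or positively dependent) tests, and the function name and the default level of 0.05 are illustrative choices.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    k_max = 0
    # Find the largest rank k with P_(k) <= (k / m) * fdr
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            k_max = rank
    # Reject the k_max hypotheses with the smallest P values
    return sorted(order[:k_max])


print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
print(benjamini_hochberg([0.03, 0.04]))
```

In the second call, the smallest P value (0.03) exceeds its own threshold (0.025), but both hypotheses are rejected because the larger one meets the threshold at rank 2; this is what distinguishes a step-up procedure from stopping at the first failure.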

Genotyping errors

Recommendation for reporting of methods (Table 1, item 8(b)): Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates, and call rates. State the laboratory/center where genotyping was done. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches.

Recommendation for reporting of results (Table 1, item 13(a)): Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful.
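The quantities named in items 8(b) and 13(a) can be computed directly from the genotype calls. The Python sketch below counts attempted and successful genotypes for one SNP and estimates a discordance rate from blind duplicate samples. It assumes a hypothetical coding in which a failed call is stored as None; duplicate discordance is only a proxy for the true error rate, since both runs can share the same error.

```python
def genotyping_summary(calls):
    """Return (attempted, successful, call_rate) for a list of genotype calls."""
    attempted = len(calls)
    if attempted == 0:
        raise ValueError("no genotyping attempts")
    successful = sum(1 for c in calls if c is not None)  # None marks a failed call
    return attempted, successful, successful / attempted


def duplicate_discordance(run1, run2):
    """Proportion of discordant calls among duplicate pairs called in both runs."""
    pairs = [(a, b) for a, b in zip(run1, run2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no duplicate pairs were called in both runs")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


print(genotyping_summary(["AA", "AG", None, "GG"]))
print(duplicate_discordance(["AA", "AG", "GG", None], ["AA", "GG", "GG", "AA"]))
```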

Genotyping errors can occur as a result of effects of the DNA sequence flanking the marker of interest, poor quality or quantity of the DNA extracted from biological samples, biochemical artefacts, poor equipment precision or equipment failure, or human error in sample handling, conduct of the array or handling the data obtained from the array (Pompanon et al. 2005). A commentary published in 2005 on the possible causes and consequences of genotyping errors observed that an increasing number of researchers were aware of the problem, but that the effects of such errors had largely been neglected (Pompanon et al. 2005). The magnitude of genotyping errors has been reported to vary between 0.5 and 30% (Pompanon et al. 2005; Akey et al. 2001; Dequeker et al. 2001; Mitchell et al. 2003). In high-throughput centers, an error rate of 0.5% per genotype has been observed for blind duplicates that were run on the same gel (Mitchell et al. 2003). This lower error rate reflects an explicit choice of markers for which genotyping rates have been found to be highly repeatable and whose individual polymerase chain reactions (PCR) have been optimized. Non-differential genotyping errors, that is, those that do not differ systematically according to outcome status, will usually bias associations towards the null (Rothman et al. 1993; Garcia-Closas et al. 2004), just as for other non-differential errors. The most marked bias occurs when genotyping sensitivity is poor and genotype prevalence is high (>85%) or, as the corollary, when genotyping specificity is poor and genotype prevalence is low (<15%) (Rothman et al. 1993). When measurement of the environmental exposure has substantial error, genotyping errors of the order of 3% can lead to substantial under-estimation of the magnitude of an interaction effect (Wong et al. 2004). When there are systematic differences in genotyping according to outcome status (differential error), bias in any direction may occur. 
Unblinded assessment may lead to differential misclassification. For genome-wide association studies of SNPs, differential misclassification between comparison groups (for example, cases and controls) can occur because of differences in DNA storage, collection or processing protocols, even when the genotyping itself meets the highest possible standards (Clayton et al. 2005). In this situation, using samples blinded to comparison group to determine the parameters for allele calling could still lead to differential misclassification. To minimize such differential misclassification, it would be necessary to calibrate the software separately for each group. This is one of the reasons for our recommendation to specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches.
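The direction of bias from non-differential error can be checked with a short calculation. The Python sketch below treats genotype as a binary exposure (for example, carrier versus non-carrier), applies the same genotyping sensitivity and specificity to cases and controls, and compares the odds ratio that would be observed with the true one. The prevalence, sensitivity and specificity values are illustrative assumptions, not estimates from any study.

```python
def classified_prevalence(p, sensitivity, specificity):
    """Proportion classified as carriers when the true carrier proportion is p."""
    return sensitivity * p + (1 - specificity) * (1 - p)


def odds(p):
    return p / (1 - p)


def observed_or(p_cases, p_controls, sensitivity, specificity):
    """Odds ratio after identical (non-differential) misclassification in both groups."""
    pc = classified_prevalence(p_cases, sensitivity, specificity)
    pk = classified_prevalence(p_controls, sensitivity, specificity)
    return odds(pc) / odds(pk)


true_or = odds(0.30) / odds(0.20)
biased_or = observed_or(0.30, 0.20, sensitivity=0.95, specificity=0.95)
print(round(true_or, 3), round(biased_or, 3))
```

With these values, 5% misclassification in each direction pulls the odds ratio from about 1.71 toward the null, to about 1.58.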

Population stratification

Recommendation for reporting of methods (Table 1, item 12(h)): Describe any methods used to assess or address population stratification.

Population stratification is the presence within a population of subgroups among which allele (or genotype; or haplotype) frequencies and disease risks differ. When the groups compared in the study differ in their proportions of the population subgroups, an association between the genotype and the disease being investigated may reflect the genotype being an indicator identifying a population subgroup rather than a causal variant. In this situation, population subgroup is a confounder because it is associated with both genotype frequency and disease risk. The potential implications of population stratification for the validity of genetic association studies have been debated (Knowler et al. 1988; Gelernter et al. 1993; Kittles et al. 2002; Thomas and Witte 2002; Wacholder et al. 2002; Cardon and Palmer 2003; Wacholder et al. 2000; Ardlie et al. 2002; Edland et al. 2004; Millikan 2001; Wang et al. 2004; Ioannidis et al. 2004; Marchini et al. 2004; Freedman et al. 2004; Khlat et al. 2004). Modeling the possible effect of population stratification (when no effort has been made to address it) suggests that the effect is likely to be small in most situations (Wacholder et al. 2000; Ardlie et al. 2002; Millikan 2001; Wang et al. 2004; Ioannidis et al. 2004). Meta-analyses of 43 gene-disease associations comprising 697 individual studies showed consistent associations across groups of different ethnic origin (Ioannidis et al. 2004), and thus provide evidence against a large effect of population stratification, hidden or otherwise. However, as studies of association and interaction typically address moderate or small effects and hence require large sample sizes, a small bias arising from population stratification may be important (Marchini et al. 2004). Study design (case-family control studies) and statistical methods (Balding 2006) have been proposed to address population stratification, but so far few studies have used these suggestions (Yesupriya et al. 2008). 
Most of the early genome-wide association studies used family-based designs or such methods as genomic control and principal components analysis (Wellcome Trust Case Control Consortium 2007; Ioannidis 2007) to control for stratification. These approaches are particularly appropriate for addressing bias when the identified genetic effects are very small (odds ratio < 1.20), as has been the situation in many recent genome-wide association studies (Wellcome Trust Case Control Consortium 2007; Parkes et al. 2007; Todd et al. 2007; Zeggini et al. 2007; Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research et al. 2007; Scott et al. 2007; Helgadottir et al. 2007; McPherson et al. 2007; Easton et al. 2007; Hunter et al. 2007; Stacey et al. 2007; Gudmundsson et al. 2007; Haiman et al. 2007b; Yeager et al. 2007; Zanke et al. 2007; Tomlinson et al. 2007; Haiman et al. 2007a; Rioux et al. 2007; Libioulle et al. 2007; Duerr et al. 2006). In view of the debate about the potential implications of population stratification for the validity of genetic association studies, we recommend transparent reporting of the methods used, or stating that none was used, to address this potential problem. This reporting will enable empirical evidence to accrue about the effects of population stratification and methods to address it.
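Genomic control, one of the approaches mentioned above, estimates an inflation factor λ from the test statistics of all markers and rescales them. A minimal Python sketch follows. It assumes 1-degree-of-freedom chi-square association statistics (for example, from allelic tests) and the common convention of leaving statistics unchanged when λ is below 1; it is not the implementation used in any of the studies cited, and it needs Python 3.8 or later for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 df: (z_0.75)^2, about 0.455
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def chi2_1df_pvalue(x):
    """Upper-tail P value of a 1-df chi-square statistic (x = z^2)."""
    return math.erfc(math.sqrt(x / 2))


def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics, adjusted P values)."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # convention assumed here: never inflate statistics
    adjusted = [x / lam for x in chi2_stats]
    return lam, adjusted, [chi2_1df_pvalue(x) for x in adjusted]


lam, adj, p = genomic_control([0.2, 0.9, 1.1, 3.5, 12.0])
print(round(lam, 3), [round(v, 4) for v in p])
```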

Modeling haplotype variation

Recommendation for reporting of methods (Table 1, item 12(g)): Describe any methods used for inferring genotypes or haplotypes.

A haplotype is a combination of specific alleles at neighboring loci that tends to be inherited together. There has been considerable interest in modeling haplotype variation within candidate genes. Typically, the number of haplotypes observed within a gene is much smaller than the theoretical number of all possible haplotypes (Zhao et al. 2003; International HapMap Consortium et al. 2007). Motivation for using haplotypes comes, in large part, from the fact that multiple SNPs may “tag” an untyped variant more effectively than a single typed variant. The SNPs used in such an approach are called “haplotype tagging” SNPs. Implicitly, an aim of haplotype tagging is to reduce the number of SNPs that have to be genotyped while maintaining statistical power to detect an association with the phenotype. Maps of human genetic variation are becoming more complete, and large-scale genotypic analysis is becoming increasingly feasible. In consequence, it is possible that modeling haplotype variation will become more focused on rare causal variants, because these may not be included in the genotyping platforms.
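Whether one SNP can stand in for another is usually judged from the linkage disequilibrium between them, commonly summarized as r². The Python sketch below computes r² for two biallelic SNPs from the four haplotype frequencies. The 0.8 cut-off for treating a SNP as an adequate proxy is an assumed, commonly used value rather than a STREGA recommendation.

```python
import math


def ld_r2(p00, p01, p10, p11):
    """r^2 between two biallelic SNPs from haplotype frequencies.

    pXY is the frequency of the haplotype with allele X at SNP 1 and
    allele Y at SNP 2 (alleles coded 0/1).
    """
    if not math.isclose(p00 + p01 + p10 + p11, 1.0):
        raise ValueError("haplotype frequencies must sum to 1")
    p = p00 + p01                 # frequency of allele 0 at SNP 1
    q = p00 + p10                 # frequency of allele 0 at SNP 2
    d = p00 * p11 - p01 * p10     # disequilibrium coefficient D
    denominator = p * (1 - p) * q * (1 - q)
    if denominator == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denominator


def tags(r2, threshold=0.8):
    """Assumed rule: treat one SNP as a proxy for the other if r^2 >= threshold."""
    return r2 >= threshold


r2 = ld_r2(0.45, 0.05, 0.05, 0.45)
print(round(r2, 3), tags(r2))
```

In this example 90% of haplotypes carry matching alleles, yet r² is only about 0.64, below the assumed threshold, so neither SNP would be taken as a proxy for the other.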

In most current large-scale genetic association studies, data are collected as unphased multilocus genotypes (that is, it is unknown which alleles lie together on the same chromosome). It is common in such studies to use statistical methods to estimate haplotypes (Stephens et al. 2001; Qin et al. 2002; Scheet and Stephens 2006; Browning 2008), and the accuracy and efficiency of these methods have been discussed (Huang et al. 2003; Kamatani et al. 2004; Zhang et al. 2004; Carlson et al. 2004; van Hylckama Vlieg et al. 2004). Some methods attempt to make use of a concept called haplotype “blocks” (Greenspan and Geiger 2004; Kimmel and Shamir 2005), but the results of these methods are sensitive to the specific definitions of the “blocks” (Cardon and Abecasis 2003; Ke et al. 2004). Reporting of the methods used to infer individual haplotypes and population haplotype frequencies, along with their associated uncertainties, should enhance our understanding of the possible effects of different methods of modeling haplotype variation on study results, as well as enable comparison and synthesis of results from different studies.
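To make the idea of estimating haplotypes from unphased genotypes concrete, the Python sketch below estimates the four haplotype frequencies for two biallelic SNPs with a basic expectation-maximization (EM) algorithm. Only individuals heterozygous at both SNPs have ambiguous phase, and EM shares them between the two possible haplotype pairs in proportion to the current frequency estimates. This is a deliberately simple illustration under an assumed 0/1/2 genotype coding and random mating; it is not any of the cited methods, which handle many markers and characterize uncertainty more fully.

```python
from collections import Counter

HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def alleles(g):
    """Alleles carried at one SNP, given the count (0, 1 or 2) of allele 1."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]


def em_haplotypes(genotypes, max_iter=200, tol=1e-10):
    """EM estimates of two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2) allele-1 counts at SNP 1 and SNP 2.
    """
    known = Counter()       # haplotypes whose phase is unambiguous
    n_double_het = 0        # individuals heterozygous at both SNPs
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
        else:
            a, b = alleles(g1), alleles(g2)
            known[(a[0], b[0])] += 1
            known[(a[1], b[1])] += 1
    n_chrom = 2 * len(genotypes)
    freq = {h: 0.25 for h in HAPLOTYPES}   # start from equal frequencies
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is 00/11 rather than 01/10
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = Counter(known)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        # M-step: frequencies are expected counts divided by number of chromosomes
        new = {h: counts[h] / n_chrom for h in HAPLOTYPES}
        converged = max(abs(new[h] - freq[h]) for h in HAPLOTYPES) < tol
        freq = new
        if converged:
            break
    return freq


sample = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 10
print({h: round(f, 4) for h, f in em_haplotypes(sample).items()})
```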

Information on common patterns of genetic variation revealed by the International Haplotype Map (HapMap) Project (International HapMap Consortium et al. 2007) can be applied in the analysis of genome-wide association studies to infer genotypic variation at markers not typed directly in these studies (Servin and Stephens 2007; Marchini et al. 2007). Essentially, these methods perform haplotype-based tests but make use of information on variation in a set of reference samples (for example, HapMap) to guide the specific tests of association, collapsing a potentially large number of haplotypes into two classes (the allelic variation) at each marker. It is expected that these techniques will increase power in individual studies, and will aid in combining data across studies, and even across differing genotyping platforms. If imputation procedures have been used, it is useful to know the method, accuracy thresholds for acceptable imputation, how imputed genotypes were handled or weighted in the analysis, and whether any associations based on imputed genotypes were also verified on the basis of direct genotyping at a subsequent stage.
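Imputation software typically reports, for each individual and marker, posterior probabilities for the three genotypes. The Python sketch below shows two common ways such output is carried into analysis: as an expected allele count (dosage), which retains the uncertainty, or as a hard genotype call that is set to missing below a probability threshold. The input format and the 0.9 threshold are hypothetical; the approach and threshold actually used are what the text above asks authors to report.

```python
def dosage(probs):
    """Expected count of allele 1 from genotype probabilities (P0, P1, P2)."""
    p0, p1, p2 = probs
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2 * p2


def hard_call(probs, min_prob=0.9):
    """Most probable genotype (0, 1 or 2), or None if it falls below min_prob."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= min_prob else None


imputed = [(0.02, 0.95, 0.03), (0.30, 0.40, 0.30), (0.0, 0.1, 0.9)]
for probs in imputed:
    print(round(dosage(probs), 2), hard_call(probs))
```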

Hardy–Weinberg equilibrium

Recommendation for reporting of methods (Table 1, item 12(f)): State whether HWE was considered and, if so, how.

Hardy–Weinberg equilibrium has become widely accepted as an underlying model in population genetics since Hardy (1908) and Weinberg (1908) proposed that genotype frequencies at a genetic locus reach equilibrium after one generation of random mating and remain stable thereafter; the assumption of HWE is equivalent to the independence of the two alleles at a locus. Views differ on whether testing for departure from HWE is a useful way to detect errors or peculiarities in the data set, and on the appropriate method of testing (Minelli et al. 2008). In particular, it has been suggested that deviation from HWE may be a sign of genotyping errors (Xu et al. 2002; Hosking et al. 2004; Salanti et al. 2005). Testing for departure from HWE has a role in detecting gross errors of genotyping in large-scale genotyping projects, such as identifying SNPs for which the clustering algorithms used to call genotypes have broken down (Wellcome Trust Case Control Consortium 2007; Pearson and Manolio 2008). However, the statistical power to detect less severe genotyping errors by testing for departure from HWE is low (McCarthy et al. 2008), and, in hypothetical data, the presence of HWE was generally not altered by the introduction of genotyping errors (Zou and Donner 2006). Furthermore, the assumptions underlying HWE, including random mating, lack of selection according to genotype, and absence of mutation or gene flow, are rarely met in human populations (Shoemaker et al. 1998; Ayres and Balding 1998). In five of 42 gene-disease associations assessed in meta-analyses of almost 600 studies, the results of studies that violated HWE significantly differed from the results of studies that conformed to the model (Trikalinos et al. 2006).
Moreover, the study suggested that the exclusion of HWE-violating studies may result in loss of the statistical significance of some postulated gene-disease associations and that adjustment for the magnitude of deviation from the model may also have the same consequence for some other gene-disease associations. Given the differing views about the value of testing for departure from HWE and about the test methods, transparent reporting of whether such testing was done and, if so, the method used, is important for allowing the empirical evidence to accrue.

For massive-testing platforms, such as genome-wide association studies, it might be expected that many false-positive violations of HWE would occur if a lenient P value threshold were set. There is no consensus on the appropriate P value threshold for HWE-related quality control in this setting. Hence, we recommend that investigators state which threshold they have used, if any, to exclude specific polymorphisms from further consideration. For SNPs with low minor allele frequencies, substantially more significant results than expected by chance have been observed, and the distribution of alleles at these loci has often been found to show departure from HWE.
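The Python sketch below shows one simple form of HWE-based quality control: a 1-degree-of-freedom chi-square goodness-of-fit test for each SNP, followed by exclusion of SNPs whose P value falls below a stated threshold. The chi-square test is only one of the available methods (exact tests are often preferred when genotype counts are small, as for SNPs with low minor allele frequencies), and the SNP identifiers, counts and threshold used here are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of HWE from genotype counts; returns (statistic, P)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("HWE test is undefined for a monomorphic SNP")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square, 1 df


def hwe_exclusions(snp_counts, p_threshold):
    """IDs of SNPs whose HWE P value is below the (reported) threshold."""
    return [snp for snp, counts in snp_counts.items()
            if hwe_chi2(*counts)[1] < p_threshold]


counts = {"snp_1": (25, 50, 25), "snp_2": (50, 0, 50), "snp_3": (36, 48, 16)}
print(hwe_exclusions(counts, p_threshold=1e-6))
```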

For genome-wide association studies, another approach that has been used to detect errors or peculiarities in the data set (due to population stratification, genotyping error, HWE deviations or other reasons) has been to construct quantile–quantile (Q/Q) plots whereby observed association statistics or calculated P values for each SNP are ranked in order from smallest to largest and plotted against the expected null distribution (Pearson and Manolio 2008; McCarthy et al. 2008). The shape of the curve can lend insight into whether or not systematic biases are present.
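The coordinates of such a Q/Q plot can be computed without any plotting library. The Python sketch below sorts the observed P values from smallest to largest, pairs each with the corresponding quantile of the uniform distribution expected under the null hypothesis, and returns both on the conventional -log10 scale. The plotting position (i - 0.5)/n is one common choice among several, and the function name is illustrative.

```python
import math


def qq_points(p_values):
    """(expected, observed) -log10 P pairs for a Q/Q plot under a uniform null."""
    n = len(p_values)
    observed = sorted(p_values)                              # smallest to largest
    if observed and observed[0] <= 0:
        raise ValueError("P values must be greater than 0")
    expected = [(i - 0.5) / n for i in range(1, n + 1)]      # uniform quantiles
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


for exp_, obs in qq_points([0.9, 1e-8, 0.4, 0.03, 0.6]):
    print(f"{exp_:.2f}  {obs:.2f}")
```

Points lying close to the line of identity suggest that the statistics follow the null distribution; a systematic upward departure across most of the range can point to problems such as population stratification or genotyping error, whereas a departure confined to the smallest P values is what true associations would be expected to produce.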


Replication

Recommendation: State whether the study is the first report of a genetic association, a replication effort, or both (Table 1, item 3).

Articles that present and synthesize data from several studies in a single report are becoming more common. In particular, many genome-wide association analyses describe several different study populations, sometimes with different study designs and genotyping platforms, and in various stages of discovery and replication (Pearson and Manolio 2008; McCarthy et al. 2008). When data from several studies are presented in a single original report, each of the constituent studies and the composite results should be fully described. For example, a discussion of sample size and the reason for arriving at that size would include clear differentiation between the initial group (those that were typed with the full set of SNPs) and those that were included in the replication phase only (typed with a reduced set of SNPs) (Pearson and Manolio 2008; McCarthy et al. 2008). Describing the methods and results in sufficient detail would require substantial space in print, but options for publishing additional information on the study online make this possible.
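When discovery and replication samples are reported together, a composite estimate is often obtained by combining the stage-specific estimates. One common option, shown here as a minimal Python sketch, is an inverse-variance weighted fixed-effect combination of log odds ratios. The input values are hypothetical, and random-effects or other approaches may be more suitable when the stages are heterogeneous.

```python
import math


def fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling.

    estimates: list of (log_odds_ratio, standard_error), one per study or stage.
    Returns (pooled log OR, its standard error).
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1 / total)


stages = {"discovery": (math.log(1.30), 0.08), "replication": (math.log(1.15), 0.06)}
for name, (b, se) in stages.items():
    print(name, round(math.exp(b), 2), round(se, 3))
log_or, se = fixed_effect(list(stages.values()))
low, high = math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)
print("combined", round(math.exp(log_or), 2), (round(low, 2), round(high, 2)))
```

Printing the stage-specific estimates alongside the combined one mirrors the recommendation that each constituent study, as well as the composite result, be fully described.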


Discussion

The choices made for study design, conduct and data analysis potentially influence the magnitude and direction of results of genetic association studies. However, the empirical evidence on these effects is insufficient. Transparency of reporting is, thus, essential for developing a better evidence base (Table 2). Transparent reporting helps address gaps in empirical evidence (Bogardus et al. 1999), such as the effects of incomplete participation and genotyping errors. It will also help assess the impact of currently controversial issues such as population stratification, methods of inferring haplotypes, departure from HWE and multiple testing on effect estimates under different study conditions.

The STREGA Statement proposes a minimum checklist of items for reporting genetic association studies. The statement has several strengths. First, it is based on existing guidance on reporting observational studies (STROBE). Second, it was developed from discussions of an interdisciplinary group that included epidemiologists, geneticists, statisticians, journal editors, and graduate students, thus reflecting a broad collaborative approach and using terminology accessible to scientists from diverse disciplines. Finally, it explicitly describes the rationale for the decisions (Table 2) and has a clear plan for dissemination and evaluation.

The STREGA recommendations are available at We welcome comments, which will be used to refine future versions of the recommendations. We note that little is known about the most effective ways to apply reporting guidelines in practice, and that therefore it has been suggested that editors and authors collect, analyze, and report their experiences in using such guidelines (Davidoff et al. 2008). We consider that the STREGA recommendations can be used by authors, peer reviewers and editors to improve the reporting of genetic association studies. We invite journals to endorse STREGA, for example by including STREGA and its Web address in their Instructions for Authors and by advising authors and peer reviewers to use the checklist as a guide. It has been suggested that reporting guidelines are most helpful if authors keep the general content of the guideline items in mind as they write their initial drafts, then refer to the details of individual items as they critically appraise what they have written during the revision process (Davidoff et al. 2008). We emphasize that the STREGA reporting guidelines should not be used for screening submitted manuscripts to determine the quality or validity of the study being reported. Adherence to the recommendations may make some manuscripts longer, and this may be seen as a drawback in an era of limited space in a print journal. However, the ability to post information on the Web should alleviate this concern. The place in which supplementary information is presented can be decided by authors and editors of the individual journal.

We hope that the recommendations stimulate transparent and improved reporting of genetic association studies. In turn, better reporting of original studies would facilitate the synthesis of available research results and the further development of study methods in genetic epidemiology with the ultimate goal of improving the understanding of the role of genetic factors in the cause of diseases.



The authors thank Kyle Vogan and Allen Wilcox for their participation in the workshop and for their comments; Michele Cargill (Affymetrix Inc) and Aaron del Duca (DNA Genotek) for their participation in the workshop as observers; and the Public Population Project in Genomics (P3G), hosted by the University of Montreal and supported by Genome Canada and Genome Quebec. This article was made possible thanks to input and discussion by the P3G International Working Group on Epidemiology and Biostatistics, in discussions held in Montreal in May 2007. The authors also thank the reviewers for their very thoughtful feedback, and Silvia Visentin, Rob Moriarity, Morgan Macneill and Valery L’Heureux for administrative support. We were unable to contact Barbara Cohen to confirm her involvement in the latest version of this article. This article was supported by the Institutes of Genetics and of Nutrition, Metabolism and Diabetes, Canadian Institutes of Health Research; Genome Canada; Biotechnology, Genomics and Population Health Branch, Public Health Agency of Canada; Affymetrix; DNA Genotek; TrialStat!; and GeneSens. The funders had no role in the decision to submit the article or in its preparation.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


  1. Akey JM, Zhang K, Xiong M, Doris P, Jin L (2001) The effect that genotyping errors have on the robustness of common linkage-disequilibrium measures. Am J Hum Genet 68:1447–1456
  2. Altman D, Moher D (2005) Developing guidelines for reporting healthcare research: scientific rationale and procedures. Med Clin (Barc) 125:8–13
  3. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, Gotzsche PC, Lang T, CONSORT GROUP (Consolidated Standards of Reporting Trials) (2001) The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 134:663–694
  4. Anonymous (2005) Framework for a fully powered risk engine. Nat Genet 37:1153
  5. Antonarakis SE (1998) Recommendations for a nomenclature system for human gene mutations. Nomenclature Working Group. Hum Mutat 11:1–3
  6. Ardlie KG, Lunetta KL, Seielstad M (2002) Testing for population subdivision and association in four case–control studies. Am J Hum Genet 71:304–311
  7. Ayres KL, Balding DJ (1998) Measuring departures from Hardy–Weinberg: a Markov chain Monte Carlo method for estimating the inbreeding coefficient. Heredity 80(Pt 6):769–777
  8. Balding DJ (2006) A tutorial on statistical methods for population association studies. Nat Rev Genet 7:781–791
  9. Begg CB (2005) Reflections on publication criteria for genetic association studies. Cancer Epidemiol Biomarkers Prev 14:1364–1365
  10. Bogardus ST Jr, Concato J, Feinstein AR (1999) Clinical epidemiological quality in molecular genetic research. The need for methodological standards. J Am Med Assoc 281:1919–1926
  11. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FCP, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-kremer S, Stewart J, Taylor R, Vilo J, Vingron M (2001) Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat Genet 29:356–371
  12. Browning SR (2008) Missing data imputation and haplotype phase inference for genome-wide association studies. Hum Genet 124:439–450
  13. Byrnes G, Gurrin L, Dowty J, Hopper JL (2005) Publication policy or publication bias? Cancer Epidemiol Biomarkers Prev 14:1363
  14. Cardon LR, Abecasis GR (2003) Using haplotype blocks to map human complex trait loci. Trends Genet 19:135–140
  15. Cardon L, Bell J (2001) Association study designs for complex diseases. Nat Rev Genet 2:91–99
  16. Cardon LR, Palmer LJ (2003) Population stratification and spurious allelic association. Lancet 361:598–604
  17. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analysis using linkage disequilibrium. Am J Hum Genet 74:106–120
  18. Chan AW, Altman DG (2005) Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. Br Med J 330:753
  19. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG (2004a) Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. J Am Med Assoc 291:2457–2465
  20. Chan AW, Krleza-Jeric K, Schmid I, Altman DG (2004b) Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. Can Med Assoc J 171:735–740
  21. Clark MF, Baudouin SV (2006) A systematic review of the quality of genetic association studies in human sepsis. Intensive Care Med 32:1706–1712
  22. Clayton DG, Walker NM, Smyth DJ, Pask R, Cooper JD, Maier LM, Smink LJ, Lam AC, Ovington NR, Stevens HE, Nutland S, Howson JM, Faham M, Moorhead M, Jones HB, Falkowski M, Hardenbol P, Willis TD, Todd JA (2005) Population structure, differential bias and genomic control in a large-scale, case–control association study. Nat Genet 37:1243–1246
  23. Colhoun HM, McKeigue PM, Davey Smith G (2003) Problems of reporting genetic associations with complex outcomes. Lancet 361:865–872
  24. Contopoulos-Ioannidis DG, Alexiou GA, Gouvias TC, Ioannidis JP (2006) An empirical evaluation of multifarious outcomes in pharmacogenetics: beta-2 adrenoceptor gene polymorphisms in asthma treatment. Pharmacogenet Genomics 16:705–711
  25. Cooper DN, Nussbaum RL, Krawczak M (2002) Proposed guidelines for papers describing DNA polymorphism-disease associations. Hum Genet 110:208
  26. Crossman D, Watkins H (2004) Jesting Pilate, genetic case–control association studies, and heart. Heart 90:831–832
  27. Davidoff F, Batalden P, Stevens D, Ogrinc G, Mooney S, SQUIRE Development Group (2008) Publication guidelines for improvement studies in health care: evolution of the SQUIRE Project. Ann Intern Med 149:670–676
  28. DeLisi LE, Faraone SV (2006) When is a “positive” association truly a “positive” in psychiatric genetics? A commentary based on issues debated at the World Congress of Psychiatric Genetics, Boston, 12–18 October 2005. Am J Med Genet B Neuropsychiatr Genet 141:319–322Google Scholar
  29. den Dunnen JT, Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15:7–12CrossRefGoogle Scholar
  30. Dequeker E, Ramsden S, Grody WW, Stenzel TT, Barton DE (2001) Quality control in molecular genetic testing. Nat Rev Genet 2:717–723PubMedCrossRefGoogle Scholar
  31. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research, Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H, Roix JJ, Kathiresan S, Hirschhorn JN, Daly MJ, Hughes TE, Groop L, Altshuler D, Almgren P, Florez JC, Meyer J, Ardlie K, Bengtsson Bostrom K, Isomaa B, Lettre G, Lindblad U, Lyon HN, Melander O, Newton-Cheh C, Nilsson P, Orho-Melander M, Rastam L, Speliotes EK, Taskinen MR, Tuomi T, Guiducci C, Berglund A, Carlson J, Gianniny L, Hackett R, Hall L, Holmkvist J, Laurila E, Sjogren M, Sterner M, Surti A, Svensson M, Svensson M, Tewhey R, Blumenstiel B, Parkin M, Defelice M, Barry R, Brodeur W, Camarata J, Chia N, Fava M, Gibbons J, Handsaker B, Healy C, Nguyen K, Gates C, Sougnez C, Gage D, Nizzari M, Gabriel SB, Chirn GW, Ma Q, Parikh H, Richardson D, Ricke D, Purcell S (2007) Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316:1331–1336PubMedCrossRefGoogle Scholar
  32. Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, Abraham C, Regueiro M, Griffiths A, Dassopoulos T, Bitton A, Yang H, Targan S, Datta LW, Kistner EO, Schumm LP, Lee AT, Gregersen PK, Barmada MM, Rotter JI, Nicolae DL, Cho JH (2006) A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 314:1461–1463PubMedCrossRefGoogle Scholar
  33. Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, Struewing JP, Morrison J, Field H, Luben R, Wareham N, Ahmed S, Healey CS, Bowman R, SEARCH collaborators, Meyer KB, Haiman CA, Kolonel LK, Henderson BE, Le Marchand L, Brennan P, Sangrajrang S, Gaborieau V, Odefrey F, Shen CY, Wu PE, Wang HC, Eccles D, Evans DG, Peto J, Fletcher O, Johnson N, Seal S, Stratton MR, Rahman N, Chenevix-Trench G, Bojesen SE, Nordestgaard BG, Axelsson CK, Garcia-Closas M, Brinton L, Chanock S, Lissowska J, Peplonska B, Nevanlinna H, Fagerholm R, Eerola H, Kang D, Yoo KY, Noh DY, Ahn SH, Hunter DJ, Hankinson SE, Cox DG, Hall P, Wedren S, Liu J, Low YL, Bogdanova N, Schurmann P, Dork T, Tollenaar RA, Jacobi CE, Devilee P, Klijn JG, Sigurdson AJ, Doody MM, Alexander BH, Zhang J, Cox A, Brock IW, MacPherson G, Reed MW, Couch FJ, Goode EL, Olson JE, Meijers-Heijboer H, van den Ouweland A, Uitterlinden A, Rivadeneira F, Milne RL, Ribas G, Gonzalez-Neira A, Benitez J, Hopper JL, McCredie M, Southey M, Giles GG, Schroen C, Justenhoven C, Brauch H, Hamann U, Ko YD, Spurdle AB, Beesley J, Chen X, kConFab, AOCS Management Group, Mannermaa A, Kosma VM, Kataja V, Hartikainen J, Day NE, Cox DR, Ponder BA (2007) Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 447:1087–1093
  34. Edland SD, Slager S, Farrer M (2004) Genetic association studies in Alzheimer’s disease research: challenges and opportunities. Stat Med 23:169–178
  35. Ehm MG, Nelson MR, Spurr NK (2005) Guidelines for conducting and reporting whole genome/large-scale association studies. Hum Mol Genet 14:2485–2488
  36. Freedman ML, Reich D, Penney KL, McDonald GJ, Mignault AA, Patterson N, Gabriel SB, Topol EJ, Smoller JW, Pato CN, Pato MT, Petryshen TL, Kolonel LN, Lander ES, Sklar P, Henderson B, Hirschhorn JN, Altshuler D (2004) Assessing the impact of population stratification on genetic association studies. Nat Genet 36:388–393
  37. Freimer NB, Sabatti C (2005) Guidelines for association studies in human molecular genetics. Hum Mol Genet 14:2481–2483
  38. Garcia-Closas M, Wacholder S, Caporaso N, Rothman N (2004) Inference issues in cohort and case–control studies of genetic effects and gene–environment interactions. In: Khoury MJ, Little J, Burke W (eds) Human genome epidemiology: a scientific foundation for using genetic information to improve health and prevent disease. Oxford University Press, New York, pp 127–144
  39. Gelernter J, Goldman D, Risch N (1993) The A1 allele at the D2 dopamine receptor gene and alcoholism: a reappraisal. J Am Med Assoc 269:1673–1677
  40. Genomics Health and Society Working Group (2004) Genomics, Health and Society. Emerging Issues for Public Policy. Government of Canada Policy Research Initiative, Ottawa
  41. Gluud LL (2006) Bias in clinical intervention research. Am J Epidemiol 163:493–501
  42. Greenspan G, Geiger D (2004) Model-based inference of haplotype block variation. J Comput Biol 11:493–504
  43. Gudmundsson J, Sulem P, Steinthorsdottir V, Bergthorsson JT, Thorleifsson G, Manolescu A, Rafnar T, Gudbjartsson D, Agnarsson BA, Baker A, Sigurdsson A, Benediktsdottir KR, Jakobsdottir M, Blondal T, Stacey SN, Helgason A, Gunnarsdottir S, Olafsdottir A, Kristinsson KT, Birgisdottir B, Ghosh S, Thorlacius S, Magnusdottir D, Stefansdottir G, Kristjansson K, Bagger Y, Wilensky RL, Reilly MP, Morris AD, Kimber CH, Adeyemo A, Chen Y, Zhou J, So WY, Tong PC, Ng MC, Hansen T, Andersen G, Borch-Johnsen K, Jorgensen T, Tres A, Fuertes F, Ruiz-Echarri M, Asin L, Saez B, van Boven E, Klaver S, Swinkels DW, Aben KK, Graif T, Cashy J, Suarez BK, van Vierssen Trip O, Frigge ML, Ober C, Hofker MH, Wijmenga C, Christiansen C, Rader DJ, Palmer CN, Rotimi C, Chan JC, Pedersen O, Sigurdsson G, Benediktsson R, Jonsson E, Einarsson GV, Mayordomo JI, Catalona WJ, Kiemeney LA, Barkardottir RB, Gulcher JR, Thorsteinsdottir U, Kong A, Stefansson K (2007) Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet 39:977–983
  44. Haiman CA, Le Marchand L, Yamamoto J, Stram DO, Sheng X, Kolonel LN, Wu AH, Reich D, Henderson BE (2007a) A common genetic risk factor for colorectal and prostate cancer. Nat Genet 39:954–956
  45. Haiman CA, Patterson N, Freedman ML, Myers SR, Pike MC, Waliszewska A, Neubauer J, Tandon A, Schirmer C, McDonald GJ, Greenway SC, Stram DO, Le Marchand L, Kolonel LN, Frasco M, Wong D, Pooler LC, Ardlie K, Oakley-Girvan I, Whittemore AS, Cooney KA, John EM, Ingles SA, Altshuler D, Henderson BE, Reich D (2007b) Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet 39:638–644
  46. Hall IP, Blakey JD (2005) Genetic association studies in Thorax. Thorax 60:357–359
  47. Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49–50
  48. Hattersley AT, McCarthy MI (2005) What makes a good genetic association study? Lancet 366:1315–1323
  49. Hegele R (2002) SNP judgements and freedom of association. Arterioscler Thromb Vasc Biol 22:1058–1061
  50. Helgadottir A, Thorleifsson G, Manolescu A, Gretarsdottir S, Blondal T, Jonasdottir A, Jonasdottir A, Sigurdsson A, Baker A, Palsson A, Masson G, Gudbjartsson DF, Magnusson KP, Andersen K, Levey AI, Backman VM, Matthiasdottir S, Jonsdottir T, Palsson S, Einarsdottir H, Gunnarsdottir S, Gylfason A, Vaccarino V, Hooper WC, Reilly MP, Granger CB, Austin H, Rader DJ, Shah SH, Quyyumi AA, Gulcher JR, Thorgeirsson G, Thorsteinsdottir U, Kong A, Stefansson K (2007) A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science 316:1491–1493
  51. Higgins JP, Little J, Ioannidis JP, Bray MS, Manolio TA, Smeeth L, Sterne JA, Anagnostelis B, Butterworth AS, Danesh J, Dezateux C, Gallacher JE, Gwinn M, Lewis SJ, Minelli C, Pharoah PD, Salanti G, Sanderson S, Smith LA, Taioli E, Thompson JR, Thompson SG, Walker N, Zimmern RL, Khoury MJ (2007) Turning the pump handle: evolving methods for integrating the evidence on gene-disease association. Am J Epidemiol 166:863–866
  52. Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF, Craig DW (2008) Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet 4:e1000167
  53. Hosking L, Lumsden S, Lewis K, Yeo A, McCarthy L, Bansal A, Riley J, Purvis I, Xu CF (2004) Detection of genotyping errors by Hardy–Weinberg equilibrium testing. Eur J Hum Genet 12:395–399
  54. Huang Q, Fu YX, Boerwinkle E (2003) Comparison of strategies for selecting single nucleotide polymorphisms for case/control association studies. Hum Genet 113:253–257
  55. Huizinga TW, Pisetsky DS, Kimberly RP (2004) Associations, populations, and the truth: recommendations for genetic association studies in Arthritis & Rheumatism. Arthritis Rheum 50:2066–2071
  56. Hunter DJ, Kraft P, Jacobs KB, Cox DG, Yeager M, Hankinson SE, Wacholder S, Wang Z, Welch R, Hutchinson A, Wang J, Yu K, Chatterjee N, Orr N, Willett WC, Colditz GA, Ziegler RG, Berg CD, Buys SS, McCarty CA, Feigelson HS, Calle EE, Thun MJ, Hayes RB, Tucker M, Gerhard DS, Fraumeni JF Jr, Hoover RN, Thomas G, Chanock SJ (2007) A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet 39:870–874
  57. International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallee C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe’er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L’Archeveque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R, Stewart J (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851–861
  58. Ioannidis JP (2007) Non-replication and inconsistency in the genome-wide association setting. Hum Hered 64:203–213
  59. Ioannidis JP, Ntzani EE, Trikalinos TA (2004) ‘Racial’ differences in genetic effects for complex diseases. Nat Genet 36:1312–1318
  60. Ioannidis JP, Bernstein J, Boffetta P, Danesh J, Dolan S, Hartge P, Hunter D, Inskip P, Jarvelin MR, Little J, Maraganore DM, Bishop JA, O’Brien TR, Petersen G, Riboli E, Seminara D, Taioli E, Uitterlinden AG, Vineis P, Winn DM, Salanti G, Higgins JP, Khoury MJ (2005) A network of investigator networks in human genome epidemiology. Am J Epidemiol 162:302–304
  61. Ioannidis JP, Gwinn M, Little J, Higgins JP, Bernstein JL, Boffetta P, Bondy M, Bray MS, Brenchley PE, Buffler PA, Casas JP, Chokkalingam A, Danesh J, Smith GD, Dolan S, Duncan R, Gruis NA, Hartge P, Hashibe M, Hunter DJ, Jarvelin MR, Malmer B, Maraganore DM, Newton-Bishop JA, O’Brien TR, Petersen G, Riboli E, Salanti G, Seminara D, Smeeth L, Taioli E, Timpson N, Uitterlinden AG, Vineis P, Wareham N, Winn DM, Zimmern R, Khoury MJ, Human Genome Epidemiology Network and the Network of Investigator Networks (2006) A road map for efficient and reliable human genome epidemiology. Nat Genet 38:3–5
  62. Kamatani N, Sekine A, Kitamoto T, Iida A, Saito S, Kogame A, Inoue E, Kawamoto M, Harigari M, Nakamura Y (2004) Large-scale single-nucleotide polymorphism (SNP) and haplotype analyses, using dense SNP maps, of 199 drug-related genes in 752 subjects: the analysis of the association between uncommon SNPs within haplotype blocks and the haplotypes constructed with haplotype-tagging SNPs. Am J Hum Genet 75:190–203
  63. Ke X, Hunt S, Tapper W, Lawrence R, Stavrides G, Ghori J, Whittaker P, Collins A, Morris AP, Bentley D, Cardon LR, Deloukas P (2004) The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum Mol Genet 13:577–588
  64. Khlat M, Cazes MH, Genin E, Guiguet M (2004) Robustness of case–control studies of genetic factors to population stratification: magnitude of bias and type I error. Cancer Epidemiol Biomarkers Prev 13:1660–1664
  65. Khoury MJ, Little J, Burke W (2004) Human genome epidemiology: scope and strategies. In: Khoury MJ, Little J, Burke W (eds) Human genome epidemiology. Oxford University Press, New York, pp 3–16
  66. Khoury MJ, Little J, Gwinn M, Ioannidis JP (2007) On the synthesis and interpretation of consistent but weak gene-disease associations in the era of genome-wide association studies. Int J Epidemiol 36:439–445
  67. Kimmel G, Shamir R (2005) GERBIL: Genotype resolution and block identification using likelihood. Proc Natl Acad Sci USA 102:158–162
  68. Kittles RA, Chen W, Panguluri RK, Ahaghotu C, Jackson A, Adebamowo CA, Griffin R, Williams T, Ukoli F, Adams-Campbell L, Kwagyan J, Isaacs W, Freeman V, Dunston GM (2002) CYP3A4-V and prostate cancer in African Americans: causal or confounding association because of population stratification? Hum Genet 110:553–560
  69. Knowler WC, Williams RC, Pettitt DJ, Steinberg AG (1988) Gm3, 5, 13, 14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet 43:520–526
  70. Lawrence RW, Evans DM, Cardon LR (2005) Prospects and pitfalls in whole genome association studies. Philos Trans R Soc Lond B Biol Sci 360:1589–1595
  71. Lee W, Bindman J, Ford T, Glozier N, Moran P, Stewart R, Hotopf M (2007) Bias in psychiatric case–control studies: literature survey. Br J Psychiatry 190:204–209
  72. Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, de Vos M, Dixon A, Demarche B, Gut I, Heath S, Foglio M, Liang L, Laukens D, Mni M, Zelenika D, Van Gossum A, Rutgeerts P, Belaiche J, Lathrop M, Georges M (2007) Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet 3:e58
  73. Lin BK, Clyne M, Walsh M, Gomez O, Yu W, Gwinn M, Khoury MJ (2006) Tracking the epidemiology of human genes in the literature: The HuGE published literature database. Am J Epidemiol 164:1–4
  74. Little J (2004) Reporting and review of human genome epidemiology studies. In: Khoury MJ, Little J, Burke W (eds) Human genome epidemiology: a scientific foundation for using genetic information to improve health and prevent disease. Oxford University Press, New York, pp 168–192
  75. Little J, Higgins JPT (eds) (2006) The HuGENet™ HuGE Review Handbook, version 1.0. Available at Accessed 28 February 2006
  76. Little J, Bradley L, Bray MS, Clyne M, Dorman J, Ellsworth DL, Hanson J, Khoury M, Lau J, O’Brien TR, Rothman N, Stroup D, Taioli E, Thomas D, Vainio H, Wacholder S, Weinberg C (2002) Reporting, appraising, and integrating data on genotype prevalence and gene-disease associations. Am J Epidemiol 156:300–310
  77. Little J, Khoury MJ, Bradley L, Clyne M, Gwinn M, Lin B, Lindegren ML, Yoon P (2003) The human genome project is complete. How do we develop a handle for the pump? Am J Epidemiol 157:667–673
  78. Lynch M, Ritland K (1999) Estimation of pairwise relatedness with molecular markers. Genetics 152:1753–1766
  79. Manly K (2005) Reliability of statistical associations between genes and disease. Immunogenetics 57:549–558
  80. Marchini J, Cardon LR, Phillips MS, Donnelly P (2004) The effects of human population structure on large genetic association studies. Nat Genet 36:512–517
  81. Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39:906–913
  82. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9:356–369
  83. McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E, Hobbs HH, Cohen JC (2007) A common allele on chromosome 9 associated with coronary heart disease. Science 316:1488–1491
  84. Millikan RC (2001) Re: population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. J Natl Cancer Inst 93:156–157
  85. Minelli C, Thompson JR, Abrams KR, Thakkinstian A, Attia J (2008) How should we use information about HWE in the meta-analyses of genetic association studies? Int J Epidemiol 37:136–146
  86. Mitchell AA, Cutler DJ, Chakravarti A (2003) Undetected genotyping errors cause apparent overtransmission of common alleles in the transmission/disequilibrium test. Am J Hum Genet 72:598–610
  87. Moher D, Schulz KF, Altman D (2001) The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. J Am Med Assoc 285:1987–1991
  88. Nature Genetics (1999) Freely associating (editorial). Nat Genet 22:1–2
  89. NCI-NHGRI Working Group on Replication in Association Studies, Chanock SJ, Manolio T, Boehnke M, Boerwinkle E, Hunter DJ, Thomas G, Hirschhorn JN, Abecasis G, Altshuler D, Bailey-Wilson JE, Brooks LD, Cardon LR, Daly M, Donnelly P, Fraumeni JF Jr, Freimer NB, Gerhard DS, Gunter C, Guttmacher AE, Guyer MS, Harris EL, Hoh J, Hoover R, Kong CA, Merikangas KR, Morton CC, Palmer LJ, Phimister EG, Rice JP, Roberts J, Rotimi C, Tucker MA, Vogan KJ, Wacholder S, Wijsman EM, Winn DM, Collins FS (2007) Replicating genotype–phenotype associations. Nature 447:655–660
  90. Parkes M, Barrett JC, Prescott NJ, Tremelling M, Anderson CA, Fisher SA, Roberts RG, Nimmo ER, Cummings FR, Soars D, Drummond H, Lees CW, Khawaja SA, Bagnall R, Burke DA, Todhunter CE, Ahmad T, Onnie CM, McArdle W, Strachan D, Bethel G, Bryan C, Lewis CM, Deloukas P, Forbes A, Sanderson J, Jewell DP, Satsangi J, Mansfield JC, The Wellcome Trust Case Control Consortium, Cardon L, Mathew CG (2007) Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn’s disease susceptibility. Nat Genet 39:830–832
  91. Pearson TA, Manolio TA (2008) How to interpret a genome-wide association study. J Am Med Assoc 299:1335–1344
  92. Peters DL, Barber RC, Flood EM, Garner HR, O’Keefe GE (2003) Methodologic quality and genotyping reproducibility in studies of tumor necrosis factor -308 G→A single nucleotide polymorphism and bacterial sepsis: implications for studies of complex traits. Crit Care Med 31:1691–1696
  93. Pharoah PD, Dunning AM, Ponder BA, Easton DF (2005) The reliable identification of disease–gene associations. Cancer Epidemiol Biomarkers Prev 14:1362
  94. Plagnol V, Cooper JD, Todd JA, Clayton DG (2007) A method to address differential bias in genotyping in large-scale association studies. PLoS Genet 3:e74
  95. Pocock SJ, Collier TJ, Dandreo KJ, de Stavola BL, Goldman MB, Kalish LA, Kasten LE, McCormack VA (2004) Issues in the reporting of epidemiological studies: a survey of recent practice. Br Med J 329:883
  96. Pompanon F, Bonin A, Bellemain E, Taberlet P (2005) Genotyping errors: causes, consequences and solutions. Nat Rev Genet 6:847–859
  97. Qin ZS, Niu T, Liu JS (2002) Partition-ligation-expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms. Am J Hum Genet 71:1242–1247
  98. Rebbeck TR, Martinez ME, Sellers TA, Shields PG, Wild CP, Potter JD (2004) Genetic variation and cancer: improving the environment for publication of association studies. Cancer Epidemiol Biomarkers Prev 13:1985–1986
  99. Reid MC, Lachs MS, Feinstein AR (1995) Use of methodological standards in diagnostic test research. Getting better but still not good. J Am Med Assoc 274:645–651
  100. Rioux JD, Xavier RJ, Taylor KD, Silverberg MS, Goyette P, Huett A, Green T, Kuballa P, Barmada MM, Datta LW, Shugart YY, Griffiths AM, Targan SR, Ippoliti AF, Bernard EJ, Mei L, Nicolae DL, Regueiro M, Schumm LP, Steinhart AH, Rotter JI, Duerr RH, Cho JH, Daly MJ, Brant SR (2007) Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nat Genet 39:596–604
  101. Romero R, Kuivaniemi H, Tromp G, Olson JM (2002) The design, execution, and interpretation of genetic association studies to decipher complex diseases. Am J Obstet Gynecol 187:1299–1312
  102. Rothman N, Stewart WF, Caporaso NE, Hayes RB (1993) Misclassification of genetic susceptibility biomarkers: implications for case–control studies and cross-population comparisons. Cancer Epidemiol Biomarkers Prev 2:299–303
  103. Saito YA, Talley NJ, de Andrade M, Petersen GM (2006) Case–control genetic association studies in gastrointestinal disease: review and recommendations. Am J Gastroenterol 101:1379–1389
  104. Salanti G, Amountza G, Ntzani EE, Ioannidis JP (2005) Hardy–Weinberg equilibrium in genetic association studies: an empirical evaluation of reporting, deviations, and power. Eur J Hum Genet 13:840–848
  105. Scheet P, Stephens M (2006) A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 78:629–644
  106. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, Buchanan TA, Watanabe RM, Valle TT, Kinnunen L, Abecasis GR, Pugh EW, Doheny KF, Bergman RN, Tuomilehto J, Collins FS, Boehnke M (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316:1341–1345
  107. Scuteri A, Sanna S, Chen WM, Uda M, Albai G, Strait J, Najjar S, Nagaraja R, Orru M, Usala G, Dei M, Lai S, Maschio A, Busonero F, Mulas A, Ehret GB, Fink AA, Weder AB, Cooper RS, Galan P, Chakravarti A, Schlessinger D, Cao A, Lakatta E, Abecasis GR (2007) Genome-wide association scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS Genet 3:e115
  108. Servin B, Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet 3:e114
  109. Shen H, Liu Y, Liu P, Recker R, Deng H (2005) Nonreplication in genetic studies of complex diseases—lessons learned from studies of osteoporosis and tentative remedies. J Bone Miner Res 20:365–376
  110. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308–311
  111. Shoemaker J, Painter I, Weir BS (1998) A Bayesian characterization of Hardy–Weinberg disequilibrium. Genetics 149:2079–2088
  112. Slager SL, Schaid DJ (2001) Evaluation of candidate genes in case–control studies: a statistical method to account for related subjects. Am J Hum Genet 68:1457–1462
  113. Stacey SN, Manolescu A, Sulem P, Rafnar T, Gudmundsson J, Gudjonsson SA, Masson G, Jakobsdottir M, Thorlacius S, Helgason A, Aben KK, Strobbe LJ, Albers-Akkers MT, Swinkels DW, Henderson BE, Kolonel LN, Le Marchand L, Millastre E, Andres R, Godino J, Garcia-Prats MD, Polo E, Tres A, Mouy M, Saemundsdottir J, Backman VM, Gudmundsson L, Kristjansson K, Bergthorsson JT, Kostic J, Frigge ML, Geller F, Gudbjartsson D, Sigurdsson H, Jonsdottir T, Hrafnkelsson J, Johannsson J, Sveinsson T, Myrdal G, Grimsson HN, Jonsson T, von Holst S, Werelius B, Margolin S, Lindblom A, Mayordomo JI, Haiman CA, Kiemeney LA, Johannsson OT, Gulcher JR, Thorsteinsdottir U, Kong A, Stefansson K (2007) Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet 39:865–869
  114. Steinberg K, Gallagher M (2004) Assessing genotypes in human genome epidemiology studies. In: Khoury MJ, Little J, Burke W (eds) Human genome epidemiology: a scientific foundation for using genetic information to improve health and prevent disease. Oxford University Press, New York, pp 79–91
  115. Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978–989
  116. Tan N, Mulley J, Berkovic S (2004) Association studies in epilepsy: “the truth is out there”. Epilepsia 45:1429–1442
  117. Thomas DC (2006) Are we ready for genome-wide association studies? Cancer Epidemiol Biomarkers Prev 15:595–598
  118. Thomas DC, Witte JS (2002) Point: population stratification: a problem for case–control studies of candidate–gene associations? Cancer Epidemiol Biomarkers Prev 11:505–512
  119. Tobin MD, Sheehan NA, Scurrah KJ, Burton PR (2005) Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat Med 24:2911–2935
  120. Todd JA, Walker NM, Cooper JD, Smyth DJ, Downes K, Plagnol V, Bailey R, Nejentsev S, Field SF, Payne F, Lowe CE, Szeszko JS, Hafler JP, Zeitels L, Yang JH, Vella A, Nutland S, Stevens HE, Schuilenburg H, Coleman G, Maisuria M, Meadows W, Smink LJ, Healy B, Burren OS, Lam AA, Ovington NR, Allen J, Adlem E, Leung HT, Wallace C, Howson JM, Guja C, Ionescu-Tirgoviste C, Genetics of Type 1 Diabetes in Finland, Simmonds MJ, Heward JM, Gough SC, Dunger DB, The Wellcome Trust Case Control Consortium, Wicker LS, Clayton DG (2007) Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nat Genet 39:857–864
  121. Tomlinson I, Webb E, Carvajal-Carmona L, Broderick P, Kemp Z, Spain S, Penegar S, Chandler I, Gorman M, Wood W, Barclay E, Lubbe S, Martin L, Sellick G, Jaeger E, Hubner R, Wild R, Rowan A, Fielding S, Howarth K, the CORGI Consortium, Silver A, Atkin W, Muir K, Logan R, Kerr D, Johnstone E, Sieber O, Gray R, Thomas H, Peto J, Cazier JB, Houlston R (2007) A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet 39:984–988
  122. Trikalinos TA, Salanti G, Khoury MJ, Ioannidis JP (2006) Impact of violations and deviations in Hardy–Weinberg equilibrium on postulated gene-disease associations. Am J Epidemiol 163:300–309
  123. Uhlig K, Menon V, Schmid CH (2007) Recommendations for reporting of clinical research studies. Am J Kidney Dis 49:3–7
  124. van Duijn CM, Porta M (2003) Good prospects for genetic and molecular epidemiologic studies in the European Journal of Epidemiology. Eur J Epidemiol 18:285–286
  125. van Hylckama Vlieg A, Sandkuijl LA, Rosendaal FR, Bertina RM, Vos HL (2004) Candidate gene approach in association studies: would the factor V Leiden mutation have been found by this approach? Eur J Hum Genet 12:478–482
  126. Vandenbroucke JP, von Elm E, Altman DG, Gotzsche PC, Mulrow CD, Pocock SJ, Poole C, Schlesselman JJ, Egger M, STROBE initiative (2007) Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Ann Intern Med 147:W163–W194
  127. Vitali S, Randolph A (2005) Assessing the quality of case–control association studies on the genetic basis of sepsis. Pediatr Crit Care Med 6:S74–S77
  128. Voight BF, Pritchard JK (2005) Confounding from cryptic relatedness in case–control association studies. PLoS Genet 1:e32
  129. von Elm E, Egger M (2004) The scandal of poor epidemiological research. Br Med J 329:868–869
  130. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP, for the STROBE Initiative (2007) The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: Guidelines for Reporting Observational Studies. PLoS Med 4:e296
  131. Wacholder S (2005) Publication environment and broad investigation of the genome. Cancer Epidemiol Biomarkers Prev 14:1361
  132. Wacholder S, Rothman N, Caporaso N (2000) Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. J Natl Cancer Inst 92:1151–1158
  133. Wacholder S, Chatterjee N, Hartge P (2002) Joint effects of genes and environment distorted by selection biases: implications for hospital-based case–control studies. Cancer Epidemiol Biomarkers Prev 11:885–889
  134. Wain HM, Bruford EA, Lovering RC, Lush MJ, Wright MW, Povey S (2002a) Guidelines for human gene nomenclature. Genomics 79:464–470
  135. Wain HM, Lush M, Ducluzeau F, Povey S (2002b) Genew: the human gene nomenclature database. Nucleic Acids Res 30:169–171
  136. Wang Y, Localio R, Rebbeck TR (2004) Evaluating bias due to population stratification in case–control association studies of admixed populations. Genet Epidemiol 27:14–20
  137. Wedzicha JA, Hall IP (2005) Publishing genetic association studies in Thorax. Thorax 60:357
  138. Weinberg W (1908) Über den Nachweis der Vererbung beim Menschen. Jahreshefte des Vereins für vaterländische Naturkunde in Württemberg 64:368–382
  139. Weiss S (2001) Association studies in asthma genetics. Am J Resp Crit Care Med 164:2014–2015
  140. Weiss ST, Silverman EK, Palmer LJ (2001) Case–control association studies in pharmacogenetics. Pharmacogenomics J 1:157–158
  141. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661–678
  142. Whittemore AS (2005) Genetic association studies: time for a new paradigm? Cancer Epidemiol Biomarkers Prev 14:1359–1360
  143. Winker MA (2006) Race and ethnicity in medical research: requirements meet reality. J Law Med Ethics 34:520–525, 480
  144. Wong MY, Day NE, Luan JA, Wareham NJ (2004) Estimation of magnitude in gene-environment interactions in the presence of measurement error. Stat Med 23:987–998
  145. Xu J, Turner A, Little J, Bleecker ER, Meyers DA (2002) Positive results in association studies are associated with departure from Hardy–Weinberg equilibrium: hint for genotyping error? Hum Genet 111:573–574
  146. Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, Minichiello MJ, Fearnhead P, Yu K, Chatterjee N, Wang Z, Welch R, Staats BJ, Calle EE, Feigelson HS, Thun MJ, Rodriguez C, Albanes D, Virtamo J, Weinstein S, Schumacher FR, Giovannucci E, Willett WC, Cancel-Tassin G, Cussenot O, Valeri A, Andriole GL, Gelmann EP, Tucker M, Gerhard DS, Fraumeni JF Jr, Hoover R, Hunter DJ, Chanock SJ, Thomas G (2007) Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet 39:645–649
  147. Yesupriya A, Evangelou E, Kavvoura FK, Patsopoulos NA, Clyne M, Walsh M, Lin BK, Yu W, Gwinn M, Ioannidis JPA, Khoury MJ (2008) Reporting of human genome epidemiology (HuGE) association studies: an empirical assessment. BMC Med Res Methodol 8:31
  148. Yu Y, Yesupriya A, Clyne M, Wulf A, Gwinn M, Khoury MJ (2008) HuGE Literature Finder. HuGE Navigator. Available at Accessed 15 December 2008
  149. Zanke BW, Greenwood CM, Rangrej J, Kustra R, Tenesa A, Farrington SM, Prendergast J, Olschwang S, Chiang T, Crowdy E, Ferretti V, Laflamme P, Sundararajan S, Roumy S, Olivier JF, Robidoux F, Sladek R, Montpetit A, Campbell P, Bezieau S, O’shea AM, Zogopoulos G, Cotterchio M, Newcomb P, McLaughlin J, Younghusband B, Green R, Green J, Porteous ME, Campbell H, Blanche H, Sahbatou M, Tubacher E, Bonaiti-Pellie C, Buecher B, Riboli E, Kury S, Chanock SJ, Potter J, Thomas G, Gallinger S, Hudson TJ, Dunlop MG (2007) Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet 39:989–994
  150. Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney AS, Wellcome Trust Case Control Consortium (WTCCC), McCarthy MI, Hattersley AT (2007) Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316:1336–1341
  151. Zerhouni EA, Nabel EG (2008) Protecting aggregate genomic data. Science 322:44
  152. Zhang W, Collins A, Morton NE (2004) Does haplotype diversity predict power for association mapping of disease susceptibility? Hum Genet 115:157–164
  153. Zhao LP, Li SS, Khalid N (2003) A method for the assessment of disease associations with single-nucleotide polymorphism haplotypes and environmental variables in case–control studies. Am J Hum Genet 72:1231–1250
  154. Zou GY, Donner A (2006) The merits of testing Hardy–Weinberg equilibrium in the analysis of unmatched case–control data: a cautionary note. Ann Hum Genet 70:923–933

Copyright information

© The Author(s) 2009

Authors and Affiliations

  • Julian Little (1, 2), corresponding author
  • Julian P. T. Higgins (3)
  • John P. A. Ioannidis (4, 5)
  • David Moher (2)
  • France Gagnon (6)
  • Erik von Elm (7, 8)
  • Muin J. Khoury (9)
  • Barbara Cohen (10)
  • George Davey-Smith (11)
  • Jeremy Grimshaw (12)
  • Paul Scheet (13)
  • Marta Gwinn (9)
  • Robin E. Williamson (14)
  • Guang Yong Zou (15, 16)
  • Kim Hutchings (2)
  • Candice Y. Johnson (2)
  • Valerie Tait (2)
  • Miriam Wiens (2)
  • Jean Golding (17)
  • Cornelia van Duijn (18)
  • John McLaughlin (19, 20)
  • Andrew Paterson (21)
  • George Wells (22)
  • Isabel Fortier (23)
  • Matthew Freedman (24)
  • Maja Zecevic (25)
  • Richard King (26)
  • Claire Infante-Rivard (27)
  • Alex Stewart (28)
  • Nick Birkett (2)
  1. Canada Research Chair in Human Genome Epidemiology, Ottawa, Canada
  2. Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Canada
  3. MRC Biostatistics Unit, Cambridge, UK
  4. Department of Hygiene and Epidemiology, School of Medicine, University of Ioannina, Ioannina, Greece
  5. Center for Genetic Epidemiology and Modeling, Tufts University School of Medicine, Boston, USA
  6. CIHR New Investigator and Canada Research Chair in Genetic Epidemiology, University of Toronto, Toronto, Canada
  7. Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
  8. Department of Medical Biometry and Medical Informatics, German Cochrane Centre, University Medical Centre, Freiburg, Germany
  9. National Office of Public Health Genomics, Centers for Disease Control and Prevention, Atlanta, USA
  10. Public Library of Science, San Francisco, USA
  11. Department of Social Medicine, MRC Centre for Causal Analyses in Translational Epidemiology, University of Bristol, Bristol, UK
  12. Canada Research Chair in Health Knowledge Transfer and Uptake, Clinical Epidemiology Program, Department of Medicine, Ottawa Health Research Institute, University of Ottawa, Ottawa, Canada
  13. Department of Epidemiology, MD Anderson Cancer Center, University of Texas, Houston, USA
  14. American Journal of Human Genetics, Boston, USA
  15. Department of Epidemiology and Biostatistics, University of Western Ontario, London, Canada
  16. Robarts Clinical Trials, Robarts Research Institute, London, Canada
  17. Paediatric and Perinatal Epidemiology, Bristol, UK
  18. European Journal of Epidemiology, Rotterdam, The Netherlands
  19. Cancer Care Ontario, Toronto, Canada
  20. Prosserman Centre for Health Research, Samuel Lunenfeld Research Institute, Toronto, Canada
  21. Canada Research Chair in Genetics of Complex Diseases, Hospital for Sick Children (SickKids), Toronto, Canada
  22. Cardiovascular Research Methods Centre, University of Ottawa Heart Institute, Ottawa, Canada
  23. Genome Quebec and P3G Observatory, McGill University and Genome Quebec Innovation Center, Montréal, Canada
  24. Dana-Farber Cancer Institute, Boston, USA
  25. Lancet, New York, USA
  26. Genetics in Medicine, Minneapolis, USA
  27. Department of Epidemiology, Biostatistics and Occupational Health, Faculty of Medicine, McGill University, Montreal, Canada
  28. University of Ottawa Heart Institute, Ottawa, Canada
