Assessment of systematic effects of methodological characteristics on candidate genetic associations
- Cite this article as:
- Aljasir, B., Ioannidis, J.P.A., Yurkiewich, A. et al. Hum Genet (2013) 132: 167. doi:10.1007/s00439-012-1237-4
Candidate genetic association studies have been found to have a low replication rate in the past. Here, we aimed to assess whether aspects of reported methodological characteristics in genetic association studies may be related to the magnitude of effects observed. An observational, literature-based investigation of 511 case–control genetic association studies indexed in 2007 was undertaken. Meta-regression analyses were used to assess the relationship between 23 reported methodological characteristics and the magnitude of genetic associations. The 511 studies had been conducted in 52 countries and were published in 220 journals (median impact factor 5.1). The multivariate meta-regression model of methodological characteristics plus disease category accounted for 17.2 % of the between-study variance in the magnitude of the reported genetic associations. Our findings are consistent with the view that better conducted and better reported genetic association research may lead to less inflated results.
Over 7,000 papers are published annually on gene-disease associations and the pace is accelerating (Khoury et al. 2009). A well-developed body of knowledge about human genetic susceptibility is essential in identifying the causes of diseases influenced by genetic factors, and offering insights for prevention and developing new therapies (Lawrence et al. 2005; Thomas et al. 2005; Khoury et al. 2004, 2005, 2007; Genomics, Health and Society Working Group 2004; Lavedan et al. 2004; Mackay and Taylor 2006; Shastry 2005). The effective use of knowledge related to gene-disease associations relies on the optimal conduct and reporting of research studies. Transparent reporting enables readers to identify the strengths and weaknesses of research and subsequently to determine the quality of evidence supporting any particular piece of knowledge in the field of gene-disease associations. Adequate research reporting is important for both primary studies and systematic reviews that try to synthesize the findings of primary studies (Lohmueller et al. 2003; Ioannidis et al. 2001; Stroup et al. 2000; Patsopoulos et al. 2005; Little et al. 2002; Little 2004; Dickersin 2002; Committee on Assuring the Health of the Public in the 21st Century 2002). Issues related to the methodological and analytic processes of conducting primary studies, such as research design, study population, study sample, control characteristics, similarities in methods of processing and handling laboratory samples for cases and controls, quality assurance measures (e.g., blinding of staff conducting the research), statistical methods, and selective outcome reporting may affect the validity of the results. Previous evaluations of the reporting of genetic association studies relate to studies published more than a decade ago, and to specific disease areas (Bogardus et al. 1999; Peters et al. 2003; Clark and Baudouin 2006; Lee et al. 2007; Yesupriya et al. 2008). 
While many investigators have discussed the importance of these characteristics for genetic studies (Bogardus et al. 1999; Yesupriya et al. 2008; Ioannidis et al. 2006; Ioannidis 2003; Hegele 2002; Wedzicha and Hall 2005; Mayo 2008; Crow 1999; Edwards 2008; Rosenthal and DiMatteo 2001), there is limited and uncertain empirical evidence on whether methodological characteristics indeed correlate with increased or decreased reported genetic effects in gene-disease association studies (Lohmueller et al. 2003; Ioannidis et al. 2001, 2006; Peters et al. 2003; Clark and Baudouin 2006; Lee et al. 2007; Ioannidis 2003; Nat. Genet editorial 1999; Gambaro et al. 2000; Cardon and Bell 2001; Hirschhorn et al. 2002; Tabor et al. 2002; Lancet editorial 2003; Trikalinos et al. 2006; Dong et al. 2008; Rebbeck et al. 2004). Candidate studies remain important even with the advent of agnostic testing that has revolutionized the discovery and characterization of common variants associated with diverse phenotypes since 2005 (McCarthy et al. 2008). These studies are providing a new generation of more reliable candidate genes, and the subsequent validation and evaluation in diverse populations of these variants are still carried out with traditional case–control designs.
In the current study, we evaluated the reporting of methodological characteristics in a large number of candidate genetic association studies and investigated the correlation between reported methodological characteristics and the magnitude of genetic effect sizes in a large systematic sample of these studies. The study was intentionally limited to articles published in 2007, to provide a timepoint before the publication of the STREGA guidelines (Little et al. 2009). These guidelines are an effort to improve the reporting of genetic epidemiology studies. The information from the present study is aimed at providing a baseline assessment for the evaluation of the impact of these guidelines and of the evolution of methodological aspects in the genetic epidemiology field in future assessments.
Sampling frame and inclusion criteria
The sampling frame was comprised of studies investigating at least one gene-disease association indexed in the HuGE Literature Finder database (Yu et al. 2007) in 2007. The HuGE Literature Finder is a curated, regularly updated database of genetic epidemiology studies selected through PubMed (Yu et al. 2008). To enhance the homogeneity of the included studies, we only considered English language studies with non-familial case–control designs addressing one or multiple specific gene-disease associations. Nested designs were included. We excluded non-English language studies, cohort studies, studies of gene-environment and gene–gene interactions, articles that reported exclusively family-based analyses and those that reported genome-wide association studies with “agnostic” testing for associations.
Sample size calculation
The sample size calculation for the number of studies to be evaluated was based on a review of published studies investigating the effect of study characteristics on magnitude of the reported effects (e.g., odds ratio) (Moher et al. 2000; Pham et al. 2005). We estimated that a total sample size of 484 reports would be required to detect a 25 % change in the overall log odds ratio associated with a study characteristic, assuming a random effect, a two-sided t test, 95 % confidence interval, and 80 % power. We consider this effect conservative because previous meta-analyses of gene-disease associations have indicated more marked contrasts in the results of individual studies (Ioannidis and Trikalinos 2005). We increased the sample size requirement by 10 % (to n = 533) to allow for the exclusion of ineligible studies based on a review of the study title and abstract.
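The calculation described above can be approximated with the standard two-sample normal formula applied on the log odds ratio scale. The sketch below is illustrative only: the baseline odds ratio of 1.5 and the `var_total` value are assumptions, not figures reported by the authors, so it is not expected to reproduce n = 484 exactly.

```python
import math
from statistics import NormalDist  # stdlib normal quantiles

def studies_needed(delta, var_total, alpha=0.05, power=0.80):
    """Approximate number of studies required for a univariate
    meta-regression with a balanced binary covariate to detect a
    difference `delta` in the mean log odds ratio between the two
    groups of studies. Each study's effect is treated as having
    total variance `var_total` (within-study sampling variance
    plus between-study tau^2). Two-sided test at level `alpha`."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    per_group = 2 * var_total * (z_a + z_b) ** 2 / delta ** 2
    return math.ceil(2 * per_group)  # total across both groups

# A 25 % change in a log OR of ln(1.5); var_total is a placeholder.
k = studies_needed(delta=0.25 * math.log(1.5), var_total=0.32)
```

Because the authors' exact inputs are not reported, the sketch only illustrates the form of the calculation: the required number of studies grows with the assumed variance and shrinks with the square of the detectable difference.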
Selection of studies and data extraction
The selection process was first assessed in a pilot study of 62 articles to ensure that it was optimized. Simple computer-generated random sampling was used to select 533 citations. If the randomization process selected an article previously used in the pilot study, or one not available as full text in English, it was replaced by either the next article in the list of citations or the previous one; the next or previous listed article was chosen alternately to preserve randomization and reduce selection bias. Two reviewers assessed each study independently. The title, abstract, and keywords of every record retrieved were reviewed first. If the article was potentially eligible, the full text was retrieved for more detailed examination to confirm its eligibility. Two reviewers independently extracted details on the included studies using an electronic data extraction form. Disagreements between the reviewers were resolved by consensus, with discussion with a third party when required. This yielded a final list of 511 eligible articles.
Outcome measure (reported genetic effect)
The outcome measure extracted for this analysis was the most statistically significant odds ratio or other measures of association considered in a primary analysis in the article. We relied on the study-specific classification of reference genotype or allele. To identify the outcome measure, we first searched the abstract for an odds ratio and p value (or 95 % confidence interval from which a p value could be computed); when more than one odds ratio was presented, we selected the odds ratio with the lowest p value. If no odds ratio was reported in the abstract, then we sought information from which a 2 × 2 table could be constructed for the number of cases and controls with and without the genetic marker or markers under investigation; if no such information was presented, we then sought information that would enable construction of a 2 × 3 table for the case–control distribution by genotype. If measures of association were presented for subgroups only, we extracted the one with the most extreme p value (as reported or calculated from the 95 % confidence intervals). If no odds ratio or information enabling construction of a 2 × 2 or 2 × 3 table was presented in the abstract, then the entire article was reviewed to identify an eligible outcome according to the systematic process already described.
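The extraction rules above reduce to two standard computations: an odds ratio with Woolf's log-scale standard error from a 2 × 2 table, and a p value recovered from a reported odds ratio and its 95 % confidence interval. A minimal sketch (not the authors' actual code):

```python
import math
from statistics import NormalDist

def odds_ratio_from_2x2(a, b, c, d):
    """Odds ratio, 95 % CI, and two-sided p value from a 2x2
    case-control table: a/b = cases with/without the marker,
    c/d = controls with/without the marker (Woolf's method)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log OR
    z = math.log(or_) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    half = 1.96 * se
    lo = math.exp(math.log(or_) - half)
    hi = math.exp(math.log(or_) + half)
    return or_, (lo, hi), p

def p_from_ci(or_, lo, hi):
    """Approximate two-sided p value recovered from a reported
    odds ratio and its 95 % confidence limits."""
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    z = math.log(or_) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

The second function is what makes it possible to rank reported odds ratios by p value even when the article gives only confidence intervals.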
Extracted study characteristics
Table 1 Methodological characteristics investigated in the meta-regression analysis of the effect of methodological characteristics on case–control studies of gene-disease association outcomes (n = 511):
- Matching of cases and controls
- Claim of first report or of replication
- Origin of study (continent level)
- Association with a cancerous or a pre-cancerous disease
- Reporting of sample size/study power calculations
- Data source for included studies
- Selection of control participants
- Consideration of possible relatedness between case and control groups
- Reporting of quality assurance method
- Reporting of statistical adjustment for covariates
- Reporting on departure from the Hardy–Weinberg equilibrium
- Results of Hardy–Weinberg assessment
- Data that would allow independent testing for departure from the Hardy–Weinberg equilibrium
- Number of reported genes
- Number of reported genetic markers
- Ratio of genetic markers tested to number of genes reported
- Journal impact factor
- Final reported sample size
- Final reported number of cases
- Final reported number of controls
- Ratio of cases to controls
Descriptive and graphical presentations of the data were used to summarize reporting characteristics and investigate the associations between the different study characteristics and the magnitude of the genetic effect expressed as a log odds ratio.
Univariate and multivariate meta-regressions under random effects models using unrestricted maximum likelihood were used for the analysis. The random effects allow the gene-disease associations investigated in the different studies to vary within each category of the study characteristic(s) included in the model. We aimed at estimating the average relation of each characteristic with the reported genetic effect. The percentage contribution to the between-study variance (tau squared) was calculated for each methodological factor under the random effects model using unrestricted maximum likelihood for both the univariate and multivariate analyses (Rosenthal and DiMatteo 2001; Higgins and Thompson 2002; Higgins et al. 2003; Van Houwelingen et al. 1993, 2002; Sutton and Higgins 2008; Trikalinos et al. 2008; Borenstein et al. 2009). Heterogeneity in effects across studies with different methodological characteristics was also tested by the Q-statistic, a measure of weighted squared deviations.
All methodological characteristics, significantly associated with the genetic effect at p < 0.05 or associated with a reduction in the residual between-study variance by ≥0.5 % in univariate meta-regressions, were considered in a multivariate model for further evaluation. Any of the methodological characteristics evaluated by multivariate meta-regression that did not show any effect in explaining the between-study variance were excluded from the model with backward stepwise elimination. Meta-regression analyses were performed with the SAS 9.1 package.
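The analyses themselves were run in SAS 9.1. Purely as an illustration of the random effects/maximum likelihood machinery described above, a one-covariate meta-regression with a maximum-likelihood fixed-point update for tau squared can be sketched as follows (a didactic simplification, not the authors' implementation):

```python
def ml_meta_regression(y, v, x, iters=200):
    """Maximum-likelihood random-effects meta-regression with one
    covariate: study effects y_i (log odds ratios) with sampling
    variances v_i, model y_i = b0 + b1*x_i + u_i + e_i, where
    u_i ~ N(0, tau2). Alternates weighted least squares for
    (b0, b1) with the ML fixed-point update for tau2."""
    tau2 = 0.0
    for _ in range(iters):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * swxx - swx ** 2
        b0 = (swxx * swy - swx * swxy) / det
        b1 = (sw * swxy - swx * swy) / det
        resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
        # ML fixed-point update for the between-study variance
        num = sum(wi ** 2 * (ri ** 2 - vi)
                  for wi, ri, vi in zip(w, resid, v))
        tau2 = max(0.0, num / sum(wi ** 2 for wi in w))
    return b0, b1, tau2
```

The percentage of between-study variance explained by a covariate is then obtained by fitting the model with and without that covariate and computing (tau2_null - tau2_model) / tau2_null × 100.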
The study included 511 citations in the final sample (see supplement for details of sampling process and list of included and excluded studies).
The case–control studies selected for analysis (n = 511) were published in 220 journals with a median impact factor (2006) of 5.1. The studies reported on a median of one gene (maximum 135). The median number of genetic markers reported was 3 (range 1–1,515), and the median number of genetic markers per gene was 2.
Table 2 Methodological characteristics of case–control studies indexed in 2007 in the HuGENet literature database (n = 511). (Only the row labels survived extraction; the counts and percentages were lost: total number of study participants; matching of cases and controls; data source for the included studies (primary collection and analysis of data; building on pre-existing data; secondary analysis of pre-existing data); first report or replication; source of control participants (special groups, e.g., hospital patients or blood-bank donors; population-based, with or without details on the sampling frame; combination of special groups and population controls; combination or not reported); origin of study, including multi-centre (country level) and South and Central America; relatedness of cases and controls (blood relationship); any method of quality assurance; investigated a cancerous or a pre-cancerous disease; statistical adjustment for covariates; sample size or study power calculations (no, or not applicable).)
The 511 studies included in the analysis were conducted in 52 countries; the most frequent country of origin was the US (18.2 %), followed by China (15.1 %). Twenty-eight (5.5 %) of the studies were multi-centre studies conducted in more than one country. Just over a third of the studies investigated cancer or a pre-cancerous state. With regard to the body system affected, gastrointestinal diseases were the most commonly investigated (10.4 %), followed by neurological diseases (8.4 %) and vascular diseases (7.8 %). Sample size or study power calculations were reported in 25.6 % of the studies. Less than a third of the studies specified whether or not relatedness of cases and controls had been considered. Just under half reported on quality assurance and statistical adjustment for covariates.
Table 3 Reporting of Hardy–Weinberg equilibrium in case–control studies indexed in 2007 in the HuGENet literature database (n = 511). (Only the row labels survived extraction; the counts were lost: testing for departure from Hardy–Weinberg equilibrium (HWE); all included genotypes were in equilibrium; some included genotypes were in equilibrium, including that associated with the extracted outcome; some included genotypes were in equilibrium, not including that associated with the extracted outcome; genotypes were not in equilibrium or it was not clear; data that would allow independent testing for departure from HWE.)
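For a biallelic marker, testing for departure from HWE reduces to a one-degree-of-freedom Pearson chi-square comparing observed genotype counts in controls with those expected from the estimated allele frequencies; a minimal sketch:

```python
import math

def hwe_chi2(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for departure from
    Hardy-Weinberg equilibrium, given genotype counts
    n_AA, n_AB, n_BB for a biallelic marker."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # survival function of chi-square with 1 df via erfc
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Publishing the genotype counts themselves, rather than only a statement about HWE, is what allows this test to be re-run independently, which is the point of the last row of the table.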
Univariate meta-regression analysis
The methodological characteristics that were extracted are listed in Table 1. Categorical methodological characteristics that explained at least 0.5 % of the between-study variance included: disease category [5.72 % (Q value = 36.78, p = 0.046, df = 24)], the source of control participants [3.59 % (Q value = 23.67, p < 0.001, df = 4)], continent in which study conducted [3.16 % (Q value = 19.76, p = 0.003, df = 6)], first report versus replication of study findings [1.97 % (Q value = 11.48, p = 0.003, df = 2)], simple [1.75 % (Q value = 13.16, p < 0.001, df = 1)] and detailed [2.16 % (Q value = 19.70, p < 0.001, df = 4)] reporting on the HWE, sample size/study power calculation [1.41 % (Q value = 9.06, p = 0.003, df = 1)], matching of cases and controls [0.88 % (Q value = 5.91, p = 0.052, df = 2)], and whether relatedness between cases and controls was explicitly reported as having been considered [0.66 % (Q value = 6.00, p = 0.050, df = 2)]. Other categorical characteristics, including whether or not the study reported on cancer or pre-cancerous lesions, showed reduction in the between-study variance by <0.5 % and had non-significant differences.
For continuous factors, univariate meta-regression analysis showed a reduction in the magnitude of the reported gene-disease association outcomes with an increase in the number of participants, regardless of whether the total sample size (proportion of variance explained 1.00 %), the number of cases (proportion of variance explained 0.84 %), or number of controls (proportion of variance explained 1.06 %) was modeled. Univariate meta-regression analysis of the ratio of cases to controls, number of genes, genetic markers tested, and the ratio of genetic markers tested to the number of genes examined showed a minimal change in the between-study variance (<0.001 %) and the regression coefficients were not statistically significant (not shown). Univariate meta-regression analysis for journal impact factor showed a reduction of the between-study variance of only 0.13 % and the effect was also not significant.
Multivariate meta-regression analysis
Table 4 Final multivariate meta-regression model of reduction in the residual between-study variance (%) by gene-disease covariates. (Only the row labels survived extraction; the percentage values were lost: matching of cases and controls; first report versus replication of study findings; location of study (continent); overall number of controls; sample size/study power calculations; source of control participants; detailed reporting on Hardy–Weinberg equilibrium; total reduction in residual between-study variance.)
The final multivariate meta-regression analysis, using the random effects model with unrestricted maximum likelihood, reduced the residual between-study variance to 0.2651, thereby explaining 17.16 % of the total residual between-study variance. The proportion of the variance accounted for by specific characteristics was largely similar to that in the univariate analysis. The proportion of the residual between-study variance explained was slightly increased for disease category (6.34 %), remained about the same for case–control matching (0.94 %), and was somewhat attenuated for the other factors (Table 4).
In our investigation of over 500 articles on case–control genetic association studies, only about a quarter of the studies reported sample size or study power calculations, less than a third specified whether or not relatedness of cases and controls had been considered, and just under half reported on quality assurance and statistical adjustment for covariates. In addition, several methodological characteristics correlated with the magnitude of the reported genetic effects. In general, many of the correlations reflect a tendency for better conducted, better reported, and larger studies, and those with more stringent quality checks, to report smaller effects. Thus, smaller effects were reported in studies with larger sample sizes, detailed reporting of HWE testing, clear description of controls, and sample size/power calculations, and in those that performed individual matching of cases and controls. Moreover, we observed that replication studies tended to give smaller effects than initial studies, and effect sizes were influenced by where the study had been conducted and the disease it addressed.
Disease category accounted for a substantial proportion of the variability of the effects. Some fields such as allergic, andrologic, dental and hepatic diseases had the largest effects on average, while others such as breast, gynaecological and obstetric, hematologic, infectious disease, nephrologic, rheumatologic and thyroid had the smallest. One potential explanation may be that genetic effects indeed have different distributions in different diseases, but evidence from robustly replicated associations (Manolio et al. 2008; Hindorff et al. 2009; Ioannidis et al. 2010; Visscher et al. 2012) does not support this. A very large proportion of the reported significant odds ratios may simply represent false-positives (Ioannidis 2005b; Ioannidis et al. 2011) and many effects that may be true-positives may still be inflated in magnitude compared to their true size (Ioannidis 2008). Therefore, it is more likely that the fields finding and reporting larger effects for common variants on average simply have more false-positives and more inflated results than the others.
Studies in which individual matching was reported tended to find smaller magnitudes of effects than studies that used frequency matching or no matching, whereas studies in which it was stated that frequency matching was used tended to have a larger magnitude of effect than studies in which participants were not matched at all. The most common matching factors in both individually matched and frequency-matched studies were age, sex, ethnicity, and geographical location. Although in genetic association studies it may be important to match for ethnicity and location to address population stratification, in the current sample, the rationale for matching on the selected factors was frequently not specified. This raises the concern that over-matching might have occurred in some individually matched studies. This concern is supported by the observation that, across all studies, the genetic effect did not differ between studies that reported adjustment for covariates and those that did not. Moreover, in several individually matched studies, the statistical analysis did not take account of the matching, with neither conditional maximum likelihood nor the McNemar test being applied. The implications of over-matching in gene-disease association studies may require further evaluation in the future (Peterson and Kleinbaum 1991; Costanza 1995; Gissler and Hemminki 1996; Brookmeyer et al. 1986), and indicate a need to provide detailed reporting of the process of matching, and justification for this.
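For 1:1 individually matched pairs, the matched analysis referred to above can be as simple as McNemar's test on the discordant pairs (the counts in the example test are purely illustrative):

```python
import math

def mcnemar(discordant_case_exposed, discordant_control_exposed):
    """McNemar chi-square test (1 df, no continuity correction)
    for 1:1 individually matched case-control pairs. Only the
    discordant pairs contribute: b = pairs where only the case
    carries the exposure/marker, c = pairs where only the control
    does. The matched odds ratio is b / c."""
    b, c = discordant_case_exposed, discordant_control_exposed
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square 1 df
    matched_or = b / c
    return matched_or, chi2, p_value
```

An unmatched analysis of individually matched data discards the pairing and can bias the odds ratio toward the null, which is why reporting whether a matched analysis was used matters.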
The impact of the replication of reported outcomes was also investigated. Studies that reported on a replication investigation tended to report smaller magnitudes of effect than studies that claimed to present a first report of new discovery. This is in line with previous evidence on replication of proposed candidate associations (Ioannidis et al. 2001). It is widely accepted that replication of findings is extremely important for both candidate gene and agnostic associations derived from genome-wide association studies (Hegele 2002; Wedzicha and Hall 2005; Nat. Genet editorial 1999; Cardon and Bell 2001; Hirschhorn et al. 2002; Cooper et al. 2002; Huizinga et al. 2004; Tan et al. 2004; Hall and Blakey 2005; DeLisi and Faraone 2006; Ioannidis et al. 2003).
Geographic location of the study also seemed to correlate with the magnitude of the reported genetic effects. The largest deviations were seen in studies conducted in continents where there is a relative dearth of genetic association research, specifically Africa and South and Central America. Only 16 studies were performed in African populations and these generally had effects close to the null. This could reflect differences in haplotype blocks in African populations compared with the populations of European origin in which most of the genetic associations have been first discovered. The magnitudes of effect reported in European, North American, and Asian studies did not differ greatly on average. However, we should acknowledge that these are just average estimates and do not exclude the possibility of differences in the magnitude of effects for specific associations. Other studies have suggested that the results of gene-disease association studies published in Asia differed systematically from the results of studies published elsewhere, especially when studies were published in local languages, e.g., Chinese (Moher et al. 2000; Pham et al. 2005; Pan et al. 2005). A possible explanation for this discrepancy is that the present study represents studies that were published recently and indexed in HuGENet in 2007, whereas previous analyses were based on studies published between 1991 and 2004. Moreover, we only examined studies that appeared in English language journals. It is also possible that changes have occurred recently in publication practices for genetic epidemiology research in Asia (Ioannidis et al. 2001; Vickers et al. 1998; Ioannidis 2005a).
Each of the methodological aspects examined had a modest correlation with the magnitude of the genetic effect, but the cumulative proportion of the variance explained was considerable. We should clarify that these correlations are aggregate effects and they cannot be extrapolated to a single study, i.e., it would be inappropriate to “correct” the results of a single study based on some reported methodological features. Moreover, it is possible that the estimates of variance explained may be modestly inflated due to a winner’s curse phenomenon for the variables that were retained in the multivariable model. However, all the correlations that we observed are consistent with the view that better-conducted and better-reported research may be associated with less inflated results. Previous studies addressing single aspects of the study design also agree with our findings, e.g., as it pertains to HWE (Bogardus et al. 1999; Clark and Baudouin 2006; Wedzicha and Hall 2005; Salanti et al. 2005; Attia et al. 2003) or sample size (Ioannidis et al. 2001, 2003; Ioannidis 2005a; Gelernter et al. 1993), where other investigators have found inflated effects in studies with deviations from HWE or those with small sample size. Given that most genetic effects for common variants seem to be very subtle or modest at best, attention to methodological detail, the conduct of large studies, and avoidance of publication bias are therefore extremely important to avoid the propagation of spurious claims of genetic associations. The STREGA guidelines (Little et al. 2009) are an effort to improve the reporting of genetic epidemiology studies and they address the importance of reporting on sample size/power calculations, matching, source of controls and HWE. 
As the number of robustly documented genetic associations now increases rapidly with the use of agnostic platforms and consortia, these considerations will be important to take into account as these variants are further tested and finely characterized in case–control studies in additional populations and settings.
Some of the factors for which no differences were seen in the magnitude of the genetic effects also merit discussion. For example, while genetic effects were larger in studies that did not report the source of controls, among studies that reported on this, the genetic effects were similar regardless of whether controls were selected from specific groups such as hospital patients or blood-bank donors, or from the general population with detailed reporting on the sampling frame, or, to some extent, from the general population but without a full description of the sampling frame. This reassuring finding is consistent with the finding of Garte and colleagues that the use of hospital controls was not associated with biased outcomes related to genotype frequencies in comparison with the use of controls from the general population (Garte et al. 2001). The same was also reported by the Wellcome Trust Case Control Consortium, which found minimal differences between blood-donor/specific-group controls and population-based controls, and so did not preclude pooling of the two groups for the purpose of investigating a number of genetic disease associations in a genome-wide association study (Wellcome Trust Case Control Consortium 2007). As already mentioned, another aspect of reported design for which we found no evidence of a correlation with the genetic effect was whether adjustment for covariates was reported. This provides some reassurance that, for genetic associations, unadjusted analyses usually give very similar results to adjusted analyses.
Another reassuring “negative” finding was that the number of genes or genetic markers tested in the association made no difference to the magnitude of reported effects. This is perhaps surprising, because our choice of the most statistically significant odds ratio might be expected to induce a correlation between reported effect and the number of markers tested. We similarly observed no association for the contrast of primary analyses versus analyses using already collected data. The latter accounted for a sixth of the articles. As large genetic-association databases become available, it is possible that further analysis of previously collected data may become even more common.
Some caveats need to be discussed. First, in an effort to enhance the homogeneity of the included studies, our evaluation was limited to case–control designs, including nested designs (Yu et al. 2007), examining specific associations rather than pursuing massive agnostic testing. Studies of family-based analyses, gene–environment and gene–gene interactions, and genome-wide association were not included. These studies have very important contributions to make, but they still account for a minority of published articles (Yu et al. 2007) and the issues involved may be different. There is a large potential for exploratory analysis, selective reporting, and considerable variation in analytic approaches in gene–environment and gene–gene interaction studies (Little and Higgins 2006).
In view of the accrual of genome-wide association studies, it would be useful to evaluate whether aspects of the study design and reporting correlate with their results. Given the massive testing undertaken in genome-wide studies, there is increased potential sensitivity to detailed aspects of study design (McCarthy et al. 2008), including selection of cases and controls, HWE testing, consideration and correction for population stratification, power calculations, relatedness, and quality assurance. Information is increasingly reported in extensive online appendices, particularly as a result of an increasing tendency for collaborative studies to combine genome-wide data from many studies. Given that the discovered effects for common gene variants tend to become smaller and smaller on average, even small effects from methodological characteristics could create false associations and obscure true associations (Ioannidis et al. 2009). Therefore, the importance of attention to methodological detail is only likely to be heightened in genome-wide studies.
Second, we should caution that study reporting is not always an accurate or complete translation of the actual design and conduct of a study. Nevertheless, we focused on aspects where the discrepancy between what is reported and what actually happened should be minimal. Third, each study was represented in our analysis with a single odds ratio. This is clearly a simplification of what are often complex analyses involving many different genes and variants. However, we used an explicit, objective, and reproducible algorithm for selecting the representative odds ratio for each study. Conceptually, our choice is very close to what attracts most attention as a primary result in gene discovery, the lowest p value. Nevertheless, we recognize that all data and results are important, and transparent reporting of all analyses is important to enhance the credibility of the genetic association literature (GAIN Collaborative Research Group 2007; Mailman et al. 2007; National Institutes of Health (NIH) 2008; National Institutes of Health (NIH) 2009; European Bioinformatics Institute 2012).
We thank Diana Fox and Zahra Montazeri for administrative support. Funding was provided by the Knowledge Translation Branch, Canadian Institutes of Health Research (Grant Ref No: 200606KRS-162113-KRS-CECA-102805); the Biotechnology, Genomics and Population Health Branch, Public Health Agency of Canada; and National Guard Health Affairs, Saudi Arabia. Julian Little holds a Canada Research Chair in Human Genome Epidemiology. The funders had no role in the decision to submit the article or in its preparation.
Conflict of interest
The authors declare that they have no conflict of interest.