Introduction

The past decade has seen rapid developments in our understanding of the genetic etiology of various common multifactorial diseases, such as age-related macular degeneration (AMD), type 1 and type 2 diabetes, cardiovascular diseases, Crohn's disease and various cancers [1]. Further developments in genomic research, such as the growing number of genome-wide association studies, the large-scale consortia that are pooling data from various studies, and the advances in statistical genomics and genotype technology, are drastically improving the chances of identifying common low risk variants and rare high risk variants. It is beyond doubt that many more genetic susceptibility variants will be discovered in the next few years.

Expectations are high that increasing knowledge of the genetic bases of disease will eventually lead to personalized medicine, that is, to preventive and therapeutic interventions for complex diseases that are tailored to individuals on the basis of their genetic profiles [2, 3]. Genome-based personalized medicine already exists for monogenic disorders. For example, female carriers of BRCA1 or BRCA2 mutations are offered biannual mammography screening or provided the opportunity of preventive surgery. Potential applications of genetic profiling in multifactorial diseases include tailoring of prevention programs to at-risk individuals, determining the starting age of participation in screening programs [4] and, when profiles predict treatment success, tailoring treatment modalities and starting doses.

As we have reviewed recently [5], the predictive value of genetic profiling is still limited at present, with a few promising exceptions. The area under the receiver operating characteristic curve (AUC) gives an assessment of the discriminative accuracy of a prediction model, that is, the degree to which the test results can discriminate between persons who will develop the disease and those who will not. AUC ranges from 0.50 (equal to tossing a coin) to 1.00 (perfect prediction). We found that the AUC was low for the genetic prediction of type 2 diabetes and coronary heart disease and high for the prediction of hypertriglyceridemia and AMD [5]. Table 1 illustrates that the high AUC of 0.80 for hypertriglyceridemia resulted from very strong individual genetic factors, with odds ratios ranging from 2.0 to 7.4, and the low AUC of 0.55 for coronary heart disease from genetic variants with low odds ratios. Note that the strongest genetic predictor by far for coronary heart disease had a weaker effect than the weakest predictor for hypertriglyceridemia. In order to achieve appreciable predictive value, genetic profiles need to include a few strong genetic risk factors or a large number of weak susceptibility variants [6].

Table 1 AUC and effect estimates of susceptibility variants for the prediction of three diseases

Although the predictive value of genetic profiling is still limited, an increasing number of companies already offer personalized lifestyle health recommendations and nutritional supplements on the basis of clients' genetic profiles [7]. Despite the limited predictive value of genetic testing in multifactorial diseases, these commercial developments will yield ongoing interest from consumers, from health care professionals confronted with questions from patients who underwent testing, and from policy makers who search for novel strategies to improve health care and population health. These developments ask for a solid evidence base for genomics applications. One of the major challenges for the coming decades will be to investigate the translation of this emerging genomic knowledge into public health and medical care [8, 9].

In this article, we review recent studies on the predictive value of genetic profiling from a methodological perspective. We address five issues: the choice of the study population, the construction of genetic profiles, the measurement of the predictive value, calibration and validation of the predictive value, and finally assessment of the clinical utility of genetic profiles. These issues are illustrated using examples from recent studies on the predictive value of genetic profiling in common diseases. Methodological characteristics of these studies are listed in Table 2.

Table 2 Methodological characteristics of recent studies on the prediction of complex diseases using multiple genes*

From gene discovery samples to the target populations

In the gene discovery phase, researchers often make use of highly selected series of patients and controls. Patients are selected for severe pathology, early onset and familial clustering of disease, and controls for the absence of pathology. This procedure substantially improves the statistical power of gene discovery research without creating any bias. But hyperselection of cases and controls can be a problem for evaluating the usefulness of genetic testing, as it typically leads to an overestimation of the effect sizes and, thus, to an overestimation of the predictive value. Effect sizes are inflated because frequencies of the risk genotypes are particularly increased in enriched patient populations and particularly decreased in controls that have no pathology related to the disease of interest.

Table 2 shows that many studies on the predictive value of genetic profiling were conducted in hyperselected case-control series, comparing, for example, type 2 diabetes patients with normoglycemic individuals [10, 11], patients with severe hypertriglyceridemia with normolipidemic controls [12], or patients with end-stage AMD with individuals who have no eye pathology [13]. By excluding individuals with modestly elevated glucose or lipid levels, these case-control series largely lose their relevance for investigating predictive potential in clinical practice, where persons with such levels are part of the population. Predicting progression to disease is most difficult in individuals with early symptoms or mild pathology, but prediction in this population is clinically highly relevant. One could argue that if the predictive value of genetic profiling is low in the samples used in these studies [1013], it will be even poorer in unselected cohorts. Thus, hyperselected case-control studies can be useful to demonstrate that predictive genetic testing is not informative and, given the commercial interest in genome-based applications, this is an important message to get across.

Another consideration is the use of case-control studies in general, as illustrated by the recent findings on type 2 diabetes. Lango et al. [11] investigated the predictive value of 18 polymorphisms in a case-control study, comparing patients with normoglycemic controls, and van Hoek et al. [14] looked at the same polymorphisms in a prospective cohort of individuals aged 55 years and older. In both studies [11, 14], the AUC of the 18 polymorphisms was 0.60 and the improvement in AUC beyond prediction from age, sex and body mass index (BMI) was limited (ΔAUC = 0.02). But a more detailed analysis of the results reveals that even though the AUC was 0.60 in both studies, it was mainly contributed by different genetic variants in the two studies (Table 3). Moreover, the 0.02 improvement increased the AUC to 0.80 in the case-control study but to only 0.68 in the prospective cohort study. This difference is mostly explained by the difference in BMI. Mean BMI in the case-control study was 31.5 kg/m2 in patients and 26.9 kg/m2 in controls compared with 28.0 kg/m2 and 26.0 kg/m2 in the prospective cohort study, indicating that BMI was a stronger predictor of type 2 diabetes in the case-control study.

Table 3 Effect estimates of 18 established susceptibility variants on type 2 diabetes risk in two studies

Case-control studies tend to overestimate odds ratios and this may be related to selection bias (most likely the case in the example above) or information bias (patients may attribute a disease to a known risk factor and they over-report this exposure). An issue that is often ignored in gene discovery studies but that is extremely relevant in studies evaluating the predictive value is that of survival bias. If genes increase the risk of disease, they may also increase the risk of (early) mortality. Therefore, there are strong arguments that show the necessity that predictive testing in preventive medicine should be investigated in cohort studies consisting of individuals who do not have the disease of interest, and predictive testing for prognosis and therapy response should be evaluated prospectively in clinically relevant patient series.

There is no single golden standard by which study population and study design should be selected, other than that predictive genetic tests need to be evaluated in populations representative for their intended use. The choice of the target population is not arbitrary, but rather is a trade-off of the effectiveness, costs and harmful side effects of available interventions, among other factors. Table 2 shows three prospective cohort studies evaluating the prediction of coronary heart disease, one in Caucasian men of European ancestry aged 50-64 years [15], one in a general population of 45-64 years [16] and one in patients with familial hypercholesterolemia [17]. These different study populations assume different target populations for genetic profiling, and the predictive value will differ between these populations when disease risks, genotype frequencies and effect sizes are different.

Moving from risk variants to genomic profiles

When the predictive value of a limited number of variants is investigated in a large population-based study, disease risks can be calculated as the percentage of patients for each combination of genotypes. However, the number of genotype combinations increases exponentially with the number of variants tested. For example, combining 18 variants that have three possible genotypes, as did two studies of type 2 diabetes, theoretically yields 387,420,489 (318) unique profiles. To deal with such a large number, researchers adopt one of two approaches for the calculation of disease risks. First, they may calculate risk allele scores or genotype scores obtained by counting the number of risk alleles across all variants [10, 11, 1423]. This approach assumes that the differences between the effects of the individual variants can be ignored, which may be a realistic assumption for multifactorial disorders given that the effect sizes are generally small [24]. Second, researchers may use logistic or Cox proportional hazards regression analyses for risk prediction, which do account for differences in effects sizes between individual variants. Risk predictions from regression analysis can be regarded as weighted risk scores. Table 2 shows that all studies applied either logistic or Cox proportional hazards regression analyses, some in addition to the simpler risk allele scores.

In addition to the question of how to combine genetic variants into profiles, the question arises as to which of the variants to include. Several studies include variants that were already established risk factors (Table 2) [11, 12, 14, 19, 2123, 25]. Others include polymorphisms from candidate genes or regions that have been associated with disease risk in at least one other study or that are likely to be functionally implicated (for example, [13, 15, 17, 18, 20, 26, 27]). And again others include polymorphisms identified in their own genome-wide association study [10, 28]. Although the distinction between candidate and established variants is not crystal clear and findings from genome-wide association studies may be robust, we can expect that the predictive value of variants that are less convincingly established is less likely to be replicated in independent populations.

Another important issue in obtaining accurate estimates of the genetic predisposition at the individual level is how to handle gene-gene and gene-environment interaction in the prediction of common diseases. It is frequently argued that strong effects can be seen from the interaction of a gene with other genetic variants or environmental factors. Several studies reported in Table 2 investigated the presence of gene-gene interaction [19, 23, 26], but none included interaction effects in the regression models. The reported effect sizes for interaction terms were very modest, implying that the influence on the predictive value of risk profiles would have been limited [19]. When future studies give robust evidence for interaction, these interaction effects should be taken into account in the risk prediction.

Last but not least, we can anticipate improvement in the predictive value when we can identify the exact causal variants. Most variants that are included in the genetic profiles shown in Table 2 are derived directly from genome-wide association studies. There is a growing awareness that these might not be the causal variants and that the causal variants may have a very different allele distribution in patients and controls. It is anticipated that the causal variants will have stronger effects on disease risk. The large deep-sequencing efforts that are ongoing may shed light on this question.

Evaluation of the predictive value

The question of how well genetic profiles can predict disease can be answered by many different performance measures, which all are related but which highlight different features. Which measure is of interest depends on the question addressed. Individuals who undergo genetic testing will be most interested in their absolute risks of disease conditional on their genetic profile. Only a few empirical studies have presented absolute risks [14, 17, 18], most likely because these cannot be calculated from case-control data without assumptions on the incidence of disease. Many other studies have reported used risk ratios (odds ratios, relative risks or hazard ratios), which each compare the risks of disease with a reference risk, namely that of individuals who carry no risk alleles. Here, also, evaluation studies diverge from gene discovery studies. Although the comparison with those with the lowest number of risk alleles is a valid approach and the recommended strategy in gene-discovery studies, it is less relevant in translational studies. Individuals who undergo genetic testing and receive their results are not interested in learning their risk compared with individuals who have an extremely low risk of disease [29] but rather compared with the average risk of disease, that is, the risk before testing, which for a common disease such as type 2 diabetes may be as high as 10%. Thus, comparing the risk or odds of disease with those with the average risk is more appropriate [29].

When deciding about whether or not to perform a test from a clinical perspective, physicians need to know to what extent a test can make a difference. This makes them more interested in the distribution of risk allele scores and, related to that, the distribution of risks and risk ratios. Many empirical studies do present distributions of risk allele scores [10, 11, 16, 1822], and several others do present risks associated with the risk allele scores but do not show their distribution (Table 2) [13, 14, 17, 23, 25, 27, 30]. These distributions are all different presentations of the discriminative accuracy of a test, generally measured as the AUC (see earlier). All but two [23, 30] of the studies shown in Table 2 evaluated the AUC of genetic profiling. Despite reported shortcomings [3133], AUC is very suitable as a first screening indication of predictive value. Further evaluation of clinical validity and utility is warranted only if a reasonable AUC is demonstrated at first. This further evaluation can include evaluation of absolute risks, reclassification [31], net reclassification improvement and integrated discrimination improvement [32]. The value of reclassification should not be overestimated, as illustrated by the study of Kathiresan and colleagues [18], who studied the addition of genetic factors to traditional risk factors for cardiovascular disease. In this study, adding genetic variants did not improve the AUC, but 26% of the individuals in the intermediate risk group (absolute risks 10-20%) were reclassified into the lower and higher risk groups. A closer look at the findings shows that the observed risk of those who were reclassified to the lower risk group was 8.2%, only slightly lower than the cut-off value of 10%, and the observed risk of those reclassified to the highest risk category was 14.7%, which was similar to the observed risk among those who remained in the intermediate category (14.5%). Reclassification may thus not lead to better classification when no improvement in AUC is seen.

Moving towards the calibration and validation of the predictive value

Prediction of complex diseases from risk allele scores or on regression models makes several assumptions. As discussed earlier, risk allele scores assume that the differences in effect sizes between the individual variants can be ignored and that there is no gene-gene interaction. Regression methods generally do not consider gene-gene or gene-environment interaction effects either. One way to test whether these assumptions are reasonable is to evaluate the concordance between observed and expected disease risks, a method that is called calibration. Although calibration is an essential step in the development of clinical prediction models, it was examined only in two of the studies reported in Table 2[12, 22]. Wang et al. [12] found that the prediction of hypertriglyceridemia from traditional and genetic factors showed good calibration, indicating that observed risks were reasonably predicted by a regression model that did not include interaction effects.

All prediction models perform best in the dataset from which they were obtained. Therefore, it is crucial to replicate the predictive value of genetic profiling in independent datasets. Validation investigates the extent to which genetic profiles have similar predictive value in independent datasets. In large-scale studies, prediction models usually are developed using part of the data and applied to predict the outcome of interest in the rest (internal validation). In addition, the prediction needs to be evaluated in an independent dataset (external validation) to demonstrate its value.

Table 2 shows that only one study performed internal validation for the selection of the markers [16], and none performed external validation. One might think that the two studies investigating the same 18 polymorphisms in type 2 diabetes are replication studies [11, 14], but this is not true. Because the studies were published at the same time, they each developed their own prediction model. Given that these prediction models had only four variants in common, it is reasonable to expect that the AUC would be lower if the two research groups had validated their models on each others data. The lack of replication studies is, however, not so much a problem for studies that already show poor predictive value in the original population, as the predictive value typically becomes worse in the replication study. It is, however, important for studies that potentially show useful predictive value, such as the study on myocardial infarction following surgery (AUC 0.76) [26] and the studies in AMD and hypertriglyceridemia (both AUC 0.80) [12, 13].

Moving from clinical validity to personalized medicine

Whether genetic testing is useful for public health or clinical practice depends on what the implications of the test results are. The usefulness depends on the availability of alternative strategies for disease prediction and the availability of preventive or therapeutic interventions that can be targeted to genetic profiles, among other factors, such as effectiveness and cost-effectiveness of interventions, and patient preferences and attitudes.

An important consideration is whether genetic profiles yield substantially better predictive value than traditional risk factors. For instance, genes associated with cardiovascular disease may also be involved in intermediate outcomes, such as dyslipidemia or hypertension [18, 34]. From a theoretical perspective, genetic factors will not remain significant when considering both genetic factors and intermediate outcomes in the prediction analysis [5]. Overall, genetic factors will not be better predictors of disease risk than intermediate factors, but their greater ease of assessment may be worth a slight reduction in the predictive value. However, it should be realized that even when genetic profiles predict disease equally as well as intermediate biomarkers, this does not mean that they are equally useful. Typical intermediate outcomes suggest the existence of early pathology and point to clear targets for intervention, such as weight loss or medication for lowering blood pressure or cholesterol, whereas targeted interventions are often not clear for genetic risks. An exception is intensive surveillance, which is useful to broader populations at risk, independent of the underlying pathology.

Another issue is the availability of specific interventions for specific genetic profiles. Very often the number of alternative therapeutic interventions is quite limited. Also, from a public health perspective it can be argued that most preventive strategies, such as weight control and smoking cessation, will have effects on multiple disease outcomes, making it unreasonable - and unethical - to specifically target these strategies to people on the basis of a genetic profile that increases their risk for a single disorder and to withhold it from others. That is, not only persons at increased genetic risk for diabetes should be advised to control their weight, but also persons at increased risk for other disorders such as arthritis, cardiovascular disease and cancer. Thus, a key question to answer is how we can justify personalization of preventive and therapeutic interventions.

Conclusion

Prediction studies so far have been rather simplistic in the sense that most were based on a small number of variants that by themselves explain only a fraction of the genetic variability, were conducted in non-representative cohorts, were neither calibrated nor validated and hardly investigated clinical utility. This should not be interpreted as shortcomings of these studies; questions concerning calibration, validation and clinical utility are relevant only for genetic profiles with promising discriminative values. On the basis of AUC, further evaluations could be worthwhile for AMD and hypertriglyceridemia [12, 13], if only to find out whether their very high discriminative accuracy (AUC = 0.80) was due to the hyperselected case-control design or to true strong genetic effects. Replication is warranted in independent population-based prospective cohort studies that include the whole range of the clinical spectrum.

Another important question is the level of predictive value that is to be targeted before implementing a genomic profile. The level aimed for depends on the intended application, particularly on the goal of testing, the medical, psychological and financial burden of the disease, the availability of (preventive) treatment and the adverse effects of false-positive and false-negative test results. The aim of genetic screening is often to select high-risk subjects for preventive treatment or intensified surveillance programs. High predictive value is needed for interventions that are invasive and irreversible, whereas lower predictive value may be sufficient for interventions such as adopting a healthy diet or increasing physical activity, which are beneficial and not harmful for a broader population.

A legitimate question is whether we should evaluate predictive genetic testing for common diseases at the moment. It is clear that our current knowledge of their genetic basis is insufficient, and will probably remain so for the next five years. However, the current interest from biotechnology companies that offer genetic profiling on the internet, and from customers who want to learn about their risks of disease, currently asks for empirical evidence. Whether future genetic profiles should only be offered commercially if the clinical utility has been proven beyond reasonable doubt (as is the case for medical tests and treatments) or can enter the market if proven not harmful (as expected for health products such as vitamins and anti-aging cosmetics) remains an open question. From a clinical and public health perspective, we need to build the knowledge base that is needed to identify useful genome-based applications for implementation in a clinical setting.