Introduction

Infectious diseases impose a huge burden on modern health-care systems – a problem that is even more significant in developing countries. In older adults, infectious diseases accounted for 13% of all hospital charges in the USA in one study [1]. Another study conducted in a pediatric population estimated that in 2003 a total of 286,739 infectious disease hospitalizations occurred among infants in the USA, accounting for 42.8% of all hospitalizations of infants [2]. Additionally, we face the problem of increased hospital mortality rates and costs due to increasingly resistant organisms such as methicillin-resistant Staphylococcus aureus [3–6] and vancomycin-resistant enterococci [7, 8]. An understanding of what determines susceptibility and response to infectious disease is central to reducing its associated burden and improving health care.

Susceptibility and response to infectious disease are heritable. Sorensen and colleagues [9] found that the genetic contribution to death from infection is five times greater than the genetic contribution to cancer. Since that report was published, multiple groups have confirmed that susceptibility to and outcome from infectious disease are heritable [10–12]. As a result, investigators have sought to identify genetic variants associated with altered susceptibility and response to infectious disease. Identification of the genetic variants associated with infectious disease would permit early identification of patients at greater risk for adverse outcome from, for example, pneumonia, sepsis, and acute respiratory distress syndrome. It would also promote development of novel, perhaps individually tailored, treatments for these patients. In addition, detrimental side effects and expense of adjuvant therapy could be avoided in other patients who, by genotype, are predicted not to benefit.

Initial investigations have highlighted the complexity of the immune response and thus the large number of host genes that probably play a role in determining an individual's susceptibility and response to infection. Additionally, environmental factors may greatly modify genetic effects. Important environmental factors include type of organism, antibiotic susceptibility, site of infection, how soon the infection is detected, and whether it is treated appropriately with antibiotics, resuscitation, supportive medical management and/or surgery. Searching for genetic contributors to susceptibility and response to infection is challenging in view of these important confounders. Inadequate sample size and mismatching of patients with control individuals may contribute to the lack of reproducibility seen in case-control studies. Gene-gene interactions, epigenetic effects, and patterns of linkage disequilibrium contained within haplotypes are all issues that must be addressed. Despite this extremely high degree of complexity, high-throughput genotyping technologies and large patient cohorts may now allow us to tease out the key genetic variants that influence susceptibility and response to infection.

Candidate gene-based approach to genetic association studies

From a genetics perspective, infection is a complex disease that arises from the interaction of an individual's genotype with the environment (infectious micro-organisms). Classic Mendelian, single-gene diseases are studied using techniques such as linkage analysis. In linkage analysis an identifiable genetic marker is used as a tool to track the inheritance pattern of a nearby disease gene that has not yet been identified but whose approximate location is known [13]. This approach has not worked well for complex diseases that may involve many genes. In contrast, by using the known pathophysiology of specific diseases to direct good guesses – called the candidate gene approach [14] – investigators have discovered many associations between genetic variants in these relevant candidate genes and clinical outcome in diseases such as diabetes, hypertension, and infection. Candidate gene association studies determine whether the frequency of a 'risk' allele is higher in affected than in unaffected individuals. Linkage studies are not as powerful as candidate gene association studies in identifying risk genetic variants for common, complex diseases [13] because of the modest effect of risk alleles in complex disease and poor resolution. However, whole-genome genotyping in very large populations of patients with specific complex diseases is starting to yield discoveries.
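The comparison at the heart of a candidate gene association study, whether a 'risk' allele is more frequent in affected than in unaffected individuals, can be illustrated with a simple 2x2 table of allele counts. The following is a minimal sketch, not the method of any study cited here: the function name and the counts are hypothetical, and it uses an uncorrected Pearson chi-square test with one degree of freedom, ignoring confounders such as ancestry.

```python
import math


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Compare risk-allele counts in cases and controls using a 2x2 table.

    Counts are alleles (two per genotyped individual), not individuals.
    Returns (odds_ratio, chi_square, p_value).
    """
    cells = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(cells) <= 0:
        raise ValueError("this sketch needs every cell count to be positive")

    n = sum(cells)
    cases = case_risk + case_other
    controls = ctrl_risk + ctrl_other
    risk = case_risk + ctrl_risk
    other = case_other + ctrl_other

    # Odds of carrying the risk allele in cases relative to controls.
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)

    # Pearson chi-square for a 2x2 table, without continuity correction.
    cross = case_risk * ctrl_other - case_other * ctrl_risk
    chi_square = n * cross ** 2 / (cases * controls * risk * other)

    # For one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 100 cases and 100 controls, 200 alleles in each group.
odds_ratio, chi_square, p_value = allelic_association(60, 140, 40, 160)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

With modest effect sizes such as this one, a nominally significant result in a small sample is exactly the kind of finding that may fail to replicate, which is why sample size and case-control matching matter.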

Genetic association studies in infectious diseases have largely focused on candidate genes in the inflammatory and immune systems, because these are assumed to be important in the immune response to an infection. Polymorphisms in inflammatory and immune system genes may lead to inappropriate activation of the inflammatory system in response to invading micro-organisms. Critical care investigators have also looked at candidate genes in the coagulation system, because an inappropriate coagulation response is important in the pathology of sepsis and is intricately tied to the immune response [15–18].

Once a candidate gene has been selected for study, variants within the gene must be tested for association with phenotype. Single nucleotide polymorphisms (SNPs) are the most commonly occurring type of variant in the genome, and they are the most frequently studied in genetic association studies. A SNP is a single-base change in the DNA sequence. HapMap [19] and related projects have now identified most common SNPs in the human genome (about 2.2 million SNPs with a minor allele frequency >5%) in a variety of ancestral groups, greatly simplifying SNP selection for genetic association studies. Polymorphisms that change the amino acid sequence of a gene, that are in a potential regulatory sequence, or that alter a splice site of a gene have a higher probability of having functional consequences. Therefore, these polymorphisms have traditionally been the most popular candidates for genetic association studies [13].

Candidate gene single nucleotide polymorphism associations in sepsis

Early genetic association studies using a candidate gene strategy focused on potential functional SNPs have produced somewhat unclear and conflicting results. We review some well-known examples in genes familiar to many intensive care physicians.

Tumour necrosis factor-α promoter polymorphisms

The A allele of a G-to-A polymorphism at position -308 in the promoter region of the tumour necrosis factor-α gene was initially found to be associated with adverse outcome in patients with septic shock [20]. A number of subsequent studies yielded similar results [21, 22] but several studies [23], including a recent large study [24], were unable to reproduce these findings. Interestingly, the tumour necrosis factor-α gene is located close to the lymphotoxin-α gene, the heat shock protein 70 gene, and other inflammatory pathway genes. A number of investigators have suggested that SNPs in these genes may be the real cause of any observed differences in patient outcomes.

Interleukin-6 polymorphisms

A key inflammatory cytokine that has been well examined in genetic association studies in infectious disease is IL-6. These studies have also produced conflicting results and highlight the problems with reproducibility in genetic association studies. The C allele of a G-to-C polymorphism at position -174 of the IL-6 gene was associated with decreased levels of IL-6 [25] in one study, and another study found an association between -174 GG and increased serum IL-6 concentrations [26]. However, a third study found no association between either allele and serum concentrations [27]. In critically ill patients, one study found no association between the -174 G/C polymorphism and incidence of sepsis, although -174 GG was associated with improved survival rates in patients with sepsis [28], whereas our group found that the -174 G/C polymorphism was not associated with a difference in survival [29].

CD14 polymorphisms

CD14 is an innate immunity receptor for lipopolysaccharide, peptidoglycan, and lipoteichoic acid, which – in association with Toll-like receptor (TLR)4 and MD2 – forms the lipopolysaccharide receptor complex [30–33]. A C-to-T polymorphism at position -159 in the promoter of the CD14 gene has been examined for association with intermediate phenotypes and clinical outcomes related to infection by numerous groups (Table 1). There have been a number of contradictory reports regarding the risk for developing, and outcome from, severe sepsis and septic shock [34–40]. The CD14 -159 C/T polymorphism does not appear to be associated with risk for septic shock or mortality in Asian populations [39, 40], and there have been conflicting reports in mixed ethnicity and Caucasian patient samples [34–37, 41].

Table 1 Genetic association studies of the CD14 C-159T polymorphism and infectious disease

Toll-like receptor-2 polymorphisms

TLR2 is an innate immune receptor for Gram-positive bacteria that activates the nuclear factor-κB signaling cascade and transcription of inflammatory cytokines [42–44]. Polymorphisms in the TLR2 gene have been associated with increased risk for Gram-positive infections and decreased responsiveness to bacterial peptides [45–48] but, in contrast, not with mortality from severe S. aureus infection [49].

Haplotype associations in sepsis

With the development of public resources such as dbSNP, HapMap [50], the Human Genome Diversity Project [51], and gene-based re-sequencing projects (SeattleSNPs [52] and the National Institute of Environmental Health Sciences SNPs Program [53]), we are beginning to develop a better understanding of the patterns of diversity across the human genome. Data from the HapMap project have been used to describe patterns of linkage disequilibrium in the human genome, while detailed descriptions of variation in individual genes allow researchers to describe haplotypes – patterns of SNPs that are inherited as a single unit – of individual genes (Figure 1). These tools have allowed researchers to move away from a candidate (functional) SNP-based approach to a broader survey of 'tag' SNPs that represent all known and unknown polymorphisms in a haplotype of a candidate gene. This eliminates the potential bias of examining only candidate functional SNPs. The SeattleSNPs Program [54] has been especially useful in picking tag-SNPs to examine in infectious disease, because they focus on re-sequencing genes of the inflammatory and immune systems [52].

Figure 1

Protein C gene SNPs. Protein C gene single nucleotide polymorphisms (SNPs) arranged in simplified haplotypes are illustrated. Each SNP is a colored column labeled with its 'rs' number. (For example, the NCBI [National Center for Biotechnology Information] website [123] can be searched by choosing the 'SNP' database and searching, for example, for 'rs2069912'. A wealth of data relevant to this SNP is then displayed.) The common (major) allele is illustrated in blue and the less common (minor) allele is displayed in yellow. SNPs are arranged in patterns called haplotypes. There are four common SNP patterns, or haplotypes, observed in the protein C gene. Haplotype 3 is the most common, making up about 40% of the observed haplotypes in those of European ancestry, whereas haplotype 2 makes up about one-third of the observed haplotypes. Haplotype 4 is the most similar to the haplotype observed in chimpanzees, and it is therefore considered the ancestral haplotype. The common haplotype 3 is similar to this ancestral haplotype on the left-hand SNPs, or 5' end, but differs significantly on the right-hand SNPs, or 3' end. The 5' end of haplotype 1 is very similar to haplotype 2, which has evolved considerably away from the ancestral haplotype. However, the 3' end of haplotype 1 is very similar to the ancestral haplotype 4. Therefore, there has almost certainly been a crossing over event that created this haplotype from two precursors. It is evident that much more information can be determined from haplotypes than from single SNPs.

We may not have a complete understanding of how polymorphisms in genes alter their expression or function, and so it may be more useful to select SNPs that allow us to describe all of the variation in a gene, and not just the variation that we presume may have functional significance. Our limited knowledge of transcriptional regulation and the structure of linkage disequilibrium may in part be responsible for the lack of reproducibility of many genetic association studies in sepsis. A haplotype-based approach to candidate gene association studies enables us to avoid making presumptions about the functional significance of SNPs in candidate genes. A number of haplotype-based studies have found associations between candidate genes and infectious disease.

Protein C haplotypes

Two polymorphisms 13 base pairs apart in the promoter region of the protein C gene (-1,654 C/T and -1,641 G/A) have been suggested to alter outcome in sepsis [55] and to alter protein C levels in blood [56] (Figure 1). Chen and coworkers [57] found that the CA haplotype of protein C -1,654 C/T and -1,641 G/A was associated with increased risk for death and organ dysfunction in Chinese Han patients with severe sepsis. The C allele of protein C 673 T/C (in linkage disequilibrium with the CA haplotype, D' = 100%) was also found to be associated with increased mortality and organ dysfunction in a cohort of 100 North American East Asians with severe sepsis [58].
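The D' value quoted above is a normalized measure of linkage disequilibrium between two loci. As a minimal sketch, assuming that phased haplotype frequencies are already known, D' and the related r² statistic can be computed as follows. The function name and the example frequencies are hypothetical and are not taken from the protein C studies.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    # D is the excess of the observed AB haplotype frequency over independence.
    d = p_ab - p_a * p_b

    # D is scaled by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    # r^2 is the squared correlation between the alleles at the two loci.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical example: allele B (20%) occurs only on haplotypes carrying allele A (30%).
d_prime, r_squared = linkage_disequilibrium(p_ab=0.2, p_a=0.3, p_b=0.2)
print(f"D' = {d_prime:.2f}, r^2 = {r_squared:.2f}")
```

In this example D' is 1 (100%) even though r² is well below 1, which shows that complete linkage disequilibrium by D' does not mean that one SNP is a perfect proxy for the other.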

IL-6 haplotypes

IL-6 haplotype clades were associated with mortality and organ dysfunction in critically ill adults [29]. A different, common IL-6 haplotype running from nucleotides -1,363 to +4,835 relative to the transcription start site of IL-6, and spanning the gene, conferred risk for susceptibility and response to acute lung injury [59]. However, haplotype analysis revealed that the IL-6 gene was not associated with susceptibility and response to invasive pulmonary aspergillosis in a Spanish population [60].

Mannose-binding lectin haplotypes

Mannose-binding lectin (MBL) binds sugar groups on microbial surfaces and activates the 'alternative', or lectin, complement pathway [61]. Three structural mutations have been found in exon 1 of the MBL gene [62–64] that occur as six different haplotypes [65–67]. These haplotypes have consistently been associated with different serum levels of MBL [65–67], but there have been conflicting reports of the association between MBL haplotypes and outcome from sepsis [48, 68, 69], as well as from other infectious and inflammatory processes [70–76].

C-reactive protein haplotypes

The C-reactive protein haplotype 1,184C; 2,042C; 2,911C was found to be more frequent in individuals who were not colonized with S. aureus in the vestibulum nasi, and host genotype was associated with the carriage of specific S. aureus genotypes [77]. This is interesting in that it highlights the importance of looking not just at host genetic variation but also at variation in micro-organisms and how this affects the interaction between host and micro-organism.

Other inflammation/coagulation gene haplotypes

A fibrinogen-β gene haplotype was associated with mortality in sepsis [78]. An IL-10 haplotype has been associated with increased mortality in critically ill patients with sepsis from pneumonia but not in patients with extrapulmonary sepsis [79].

Remaining problems

Although haplotype analysis has produced some interesting results, there remains the problem of nonreproducible results seen in genetic association studies based on functional SNPs. Additionally, groups appear to be inconsistent in their definition of haplotypes within candidate genes, and haplotypes defined in one patient population may not be applicable to another. With the growing collection of documented SNPs in the genome, our improved understanding of the patterns of genetic variation, and high-throughput genotyping technologies, we now have the ability to move away from candidate gene-based association studies. The risk of looking for candidate genes among pathways we already know is that we may miss key genes because of ignorance of the other biologic systems involved [14]. Approximately 10% of the 30,000 human genes are immune response genes, and thus the likelihood of any single gene being associated with infectious disease is low [80]. We now have the tools to use a broader, less biased approach to genetic association studies, and this may allow us finally to tease out the contributions made by genetic variants to susceptibility and response to infectious disease.

Moving forward with genetic association studies in sepsis

Several technologies (Affymetrix and Illumina) have been developed during the past few years that allow thousands of SNPs to be genotyped rapidly and accurately using small amounts of DNA. As the speed and throughput of genotyping polymorphisms has increased, costs have decreased significantly. It is now feasible for researchers to genotype thousands of SNPs in thousands of patients at moderate cost. Concurrently, groups such as the International HapMap Project [50] and Perlegen Sciences [81] have provided high-resolution maps that allow researchers to select SNPs that are correlated with adjacent polymorphisms and can act as markers, or tag SNPs, for other unmeasured SNPs. Sets of thousands of common SNPs can now be selected so that they tag the most common variants in a population. These SNPs can then be genotyped at low cost in thousands of patient samples using new high-throughput genotyping platforms. These technologies and resources make new strategies for genetic association studies, such as genome-wide association, practical, and they allow researchers to take an unbiased approach to association studies independent from selection of candidate genes.
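The idea of choosing tag SNPs that act as markers for correlated, unmeasured SNPs can be sketched as a greedy selection over pairwise r² values. This is an illustration only and not the algorithm used by HapMap or any genotyping vendor: the function name, the SNP labels, the r² values, and the 0.8 threshold (a commonly used convention) are all assumptions.

```python
def select_tag_snps(snps, pair_r2, threshold=0.8):
    """Greedy tag SNP selection from pairwise r^2 values.

    snps: iterable of SNP identifiers.
    pair_r2: dict mapping frozenset({snp_a, snp_b}) -> r^2; missing pairs count as 0.
    Returns a dict mapping each chosen tag SNP to the sorted list of SNPs it captures.
    """
    untagged = set(snps)
    tags = {}

    def captured_by(snp):
        # A SNP captures itself plus any untagged SNP correlated at or above the threshold.
        return {
            other for other in untagged
            if other == snp or pair_r2.get(frozenset((snp, other)), 0.0) >= threshold
        }

    while untagged:
        # Sorting first makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(untagged), key=lambda snp: len(captured_by(snp)))
        group = captured_by(best)
        tags[best] = sorted(group)
        untagged -= group
    return tags


# Hypothetical pairwise r^2 values for five SNPs in one region.
example_r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp2", "snp3")): 0.70,
    frozenset(("snp4", "snp5")): 0.90,
}
print(select_tag_snps(["snp1", "snp2", "snp3", "snp4", "snp5"], example_r2))
```

In this toy region two genotyped SNPs stand in for all five, which is the economy that makes genome-wide tag SNP panels practical.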

Genome-wide association

Genome-wide association studies (GWAS), like linkage analyses, do not require a prior hypothesis of candidate genes to test for association with disease. In GWAS, as in genetic association studies, allele frequencies are compared between cases and controls. In GWAS, however, it is not allele frequencies in individual candidate genes that are compared, but rather allele frequencies in an unbiased selection of SNPs across the whole genome. Thus, assumptions about important genes and pathways in disease are avoided and novel insights into biology are possible. That is, whereas candidate gene studies test only for variants within genes of known relevance, GWAS make it possible to gain further insight into the pathophysiology of sepsis. Novel genes that have significant impact on outcome from sepsis would implicate the gene pathways involved in sepsis.

Now that it is economically feasible to genotype hundreds of thousands of SNPs in thousands of patients, and HapMap has made available intermediate allele frequency polymorphisms that are informative for association studies [50], whole-genome association studies for complex disease are possible and have been conducted in a number of diseases. The first published example of a GWAS in complex disease found that functional SNPs in the lymphotoxin-α gene are associated with susceptibility and response to myocardial infarction (MI) [82]. A total of 92,788 tag SNPs were genotyped in 94 individuals with MI and 653 control individuals to identify a locus on chromosome 6p21 that was associated with susceptibility and response to MI. Further linkage disequilibrium mapping and haplotype analysis allowed the researchers to narrow down the association to two SNPs in the lymphotoxin-α gene in 1,133 affected individuals versus 1,006 control individuals. Importantly, the researchers validated their GWAS findings with in vitro functional analysis to establish the biologic plausibility of their finding. GWAS has now been used to find disease-associated alleles in Crohn's disease [83], type 1 diabetes [84], type 2 diabetes [85] and age-related macular degeneration [86], and will be an important tool in identifying disease-associated alleles in infectious disease.
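Testing tens of thousands of SNPs in a single study means that some will appear associated by chance alone. One common and conservative convention, which the studies cited above may or may not have used, is a Bonferroni correction: the overall significance level is divided by the number of SNPs tested. The sketch below applies it to the 92,788 tag SNPs mentioned above; the function names, the 0.05 level, and the example p-values are assumptions for illustration.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-SNP significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_snps(p_values, n_tests, alpha=0.05):
    """Return the SNPs whose p-value passes the Bonferroni threshold, in sorted order.

    p_values: dict mapping SNP identifier -> p-value from a single-SNP test.
    n_tests: total number of SNPs tested in the scan, not just those listed.
    """
    threshold = bonferroni_threshold(n_tests, alpha)
    return sorted(snp for snp, p in p_values.items() if p < threshold)


# Number of tag SNPs taken from the text; the p-values are hypothetical.
threshold = bonferroni_threshold(92_788)
print(f"per-SNP threshold: {threshold:.2e}")
print(significant_snps({"snpA": 1e-8, "snpB": 1e-4, "snpC": 0.03}, n_tests=92_788))
```

A nominal p-value of 0.03, or even 0.0001, would not survive this threshold, which illustrates why genome-wide studies need large cohorts and independent replication.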

Genome-wide array of nonsynonymous single nucleotide polymorphisms

An alternative to genotyping tag SNPs across the genome, as in GWAS, is to directly test association of large numbers of nonsynonymous SNPs (nsSNPs), or amino acid changing SNPs, to disease. There are now almost 60,000 documented SNPs that cause nonsynonymous amino acid substitutions [87]. High-throughput genotyping technologies allow all of these nsSNPs to be genotyped simultaneously in thousands of patients. nsSNPs may cause functional changes in a protein that lead to increased susceptibility and response to disease. By screening all known nsSNPs in the human genome, and not just in candidate genes, researchers do not have to make assumptions about which genes or pathways may play a role in disease. However, this method, unlike genome-wide association, does require some knowledge of the structure of genes. Genome-wide scans of nsSNPs have identified polymorphisms associated with type 1 diabetes [88] and Crohn's disease [89].

Testing for differences in allelic expression

Recent studies have shown that polymorphic alleles may be differentially expressed within an individual and that this may contribute to phenotypic variation [90–94]. Classically, allele-specific differences in expression were attributed to phenomena such as genomic imprinting (methylation causing inactivation of one parental haplotype) [95] and X-chromosome inactivation [96]. More recently it has been recognized that allele-specific expression is relatively common among non-imprinted autosomal genes [91, 93, 97–99] and that this difference in allelic expression is heritable [93]. Common polymorphisms in autosomal genes may cause subtle quantitative changes in the expression of one allele of a gene that may make a minor contribution to a quantitative trait, or to the susceptibility and response to a disease. Genome-wide analysis of gene expression patterns has been used to examine differences in global patterns of gene expression between healthy and diseased individuals [90, 100, 101]. Allele-specific differences in expression appear to be cell-type and stimulus dependent [90, 100, 101]. Differential allelic expression has been associated with susceptibility and response to colorectal cancer [92], schizophrenia [102], and obesity [94].

Nonsynonymous coding SNPs can be used to test heterozygote cell lines for differences in allelic expression [93, 103]. Within one cell, if there are no cis-acting regulatory elements affecting the expression of each allele, both alleles should be equally expressed [93]. However, if an individual is heterozygous for a functional cis-acting regulatory polymorphism, then the two alleles will be differentially expressed [93]. A nonsynonymous coding SNP within the transcript can be used as a tag to distinguish between transcripts derived from each allele [103]. Allelic discrimination can then be used to measure relative allelic expression levels, with each allele serving as an internal control for the other. Allele-specific gene expression can be performed on a genome-wide scale using oligonucleotide arrays in order to find regulatory elements [91]. Regulatory polymorphisms can then be mapped and tested for association with disease. Identifying regulatory SNPs or the haplotypes in which they lie may help us to understand how genetic variation influences susceptibility and response to disease.
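The expectation that both alleles are equally expressed in a heterozygote, unless a cis-acting regulatory variant is present, can be framed as a test of whether the fraction of transcripts from one allele departs from one-half. The following is a minimal sketch that assumes the assay yields discrete transcript counts per allele. The function name and the counts are hypothetical, and real allelic discrimination assays typically need normalization against genomic DNA that is not shown here.

```python
import math


def allelic_imbalance(count_allele_1, count_allele_2):
    """Return (fraction from allele 1, two-sided exact binomial p-value versus 0.5)."""
    n = count_allele_1 + count_allele_2
    if n == 0:
        raise ValueError("at least one transcript must be observed")

    fraction = count_allele_1 / n

    # Under equal expression each outcome k has probability C(n, k) / 2^n, so the
    # p-value sums every outcome that is no more likely than the observed one.
    observed = math.comb(n, count_allele_1)
    tail = sum(c for c in (math.comb(n, k) for k in range(n + 1)) if c <= observed)
    p_value = tail / 2 ** n
    return fraction, p_value


# Hypothetical counts for a heterozygous cell line: 70 transcripts carry one allele, 30 the other.
fraction, p_value = allelic_imbalance(70, 30)
print(f"fraction from allele 1 = {fraction:.2f}, p = {p_value:.2g}")
```

Because each allele serves as the internal control for the other, the comparison is made within a single sample and is less sensitive to between-sample differences in overall expression.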

Copy number polymorphisms

In addition to regulatory polymorphisms that cause allele-specific differences in expression, protein expression may be altered among individuals as a result of copy number polymorphisms (CNPs) [104, 105]. CNPs are alterations in genomic DNA that cause deletions or duplications of a gene in adjacent segments of DNA [104, 105]. Analogous to the definition of SNPs, the minor form of a CNP must occur in more than 1% of the population for this variation to be termed a CNP. The deletions or duplications result in varying copy numbers of genes among individuals and can cause measurable differences in protein expression. The differences in protein expression are not due to altered regulation of gene transcription, as in allele-specific differences in expression, but are a result of a decrease or increase in the number of copies of the gene in the genome [104]. CNPs are likely to contribute to complex disease and quantitative traits. An example of a CNP that leads to human disease is the genomic duplication of the PMP22 gene, which causes the most common form of Charcot-Marie-Tooth disease [106]. CNPs are likely to have variable effects on phenotypes, depending on the sensitivity of the gene to dose, interactions with other loci, and the environment.

The availability of increasingly complex microarrays at decreasing cost has made it possible to perform genome-wide analysis of CNPs to quantify copy number differences. Affymetrix and Illumina offer combined SNP genotyping and copy number analysis, allowing researchers to perform genome-wide studies to detect associations of disease with either CNPs or SNPs. Genotyping of multibase, often multi-allelic CNPs is more challenging than genotyping di-allelic SNPs, however, and current data indicate that there is a low correlation between quantitative measures of CNPs and the true allelic state of each CNP in each individual [107]. More accurate assays are needed for association studies using CNPs.

Use of genetic tests in patient care

Although a number of important genetic associations with outcome from sepsis have been discovered, further steps are required to apply these discoveries to patient care. First, risk for adverse outcome predicted by genotype is somewhat helpful, but prediction of response to therapy is clearly more useful for clinicians deciding on therapeutic approaches. Therefore, genetic association studies must expand measured end-points to include response to specific therapies. Second, predictive genetic associations must also consider specificity and sensitivity analyses to confirm that genotypic information contributes to predictions of response to therapy or outcome beyond what is possible using classical measures (age, severity of illness, and so on). Third, prospective testing of predictive genetic tests in large multicenter studies will be important to validate the treatment-modifying discoveries and to define the effectiveness (a step beyond efficacy) of decisions based on the predictive genetic test. These are substantial hurdles but they can be addressed, particularly by global collaborations, which we should all now embrace.
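The sensitivity and specificity analyses called for above reduce to simple ratios once genotype-based predictions have been compared with observed outcomes. A minimal sketch follows; the function name and the counts are hypothetical and do not come from any study cited here.

```python
def genotype_test_performance(true_pos, false_pos, false_neg, true_neg):
    """Summarize how well a genotype-based prediction matches observed outcome.

    true_pos: predicted adverse outcome (risk genotype), outcome occurred.
    false_pos: predicted adverse outcome, outcome did not occur.
    false_neg: predicted no adverse outcome, outcome occurred.
    true_neg: predicted no adverse outcome, outcome did not occur.
    """
    if min(true_pos + false_neg, true_neg + false_pos,
           true_pos + false_pos, true_neg + false_neg) == 0:
        raise ValueError("each row and column of the table must be non-empty")
    return {
        # Proportion of patients with the outcome who carried the risk genotype.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Proportion of patients without the outcome who lacked the risk genotype.
        "specificity": true_neg / (true_neg + false_pos),
        # Probability of the outcome given the risk genotype.
        "ppv": true_pos / (true_pos + false_pos),
        # Probability of no adverse outcome given absence of the risk genotype.
        "npv": true_neg / (true_neg + false_neg),
    }


# Hypothetical cohort of 250 patients, 50 of whom had the adverse outcome.
print(genotype_test_performance(true_pos=30, false_pos=50, false_neg=20, true_neg=150))
```

In this hypothetical cohort, most carriers of the risk genotype do not experience the adverse outcome, so a statistically significant association does not by itself make a clinically useful test. Such figures would still need to be compared with what classical measures such as age and severity of illness already achieve.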

Conclusion

The age of genomic personalized medicine is within our reach. Previous genetic association studies in sepsis have had problems with reproducibility as a result of a number of issues, including small sample sizes, bias resulting from selection of candidate genes, the influence of multiple genes and environment on phenotype, epigenetics, and a lack of understanding of the patterns of variation in the human genome. We are beginning to develop the ability to deal with these issues as new, more economically feasible technologies allow us to genotype thousands of patients at hundreds of thousands of loci, and as we develop a better understanding of the complexity of patterns of variation in the human genome and the environment. Discoveries of novel genotype-phenotype associations in infectious disease may provide us with a clearer understanding of the pathways that are involved in susceptibility and response to infection, and they may one day allow us to treat patients with more specific treatments with fewer side effects.