Introduction

Over the past decade, several investigators have applied microarray technology and related bioinformatic approaches to clinical sepsis and septic shock, making it possible to assess whether, and how, this branch of genomic medicine has meaningfully affected the sepsis field. This review will first provide an overview of the gene microarray approach, including limitations and study design considerations. Subsequently, the review will focus on the potential translational application of microarray data and genome-wide expression profiling to the sepsis field. Four broad areas will be discussed: genome-level understanding of sepsis, biomarker discovery, gene expression-based identification of septic shock subclasses, and discovery of novel targets and pathways.

Technology, approaches, and limitations

Microarray-related technology, approaches, and limitations have been extensively reviewed elsewhere [1–5] and are summarized below. Notably, an emerging technology, RNA sequencing (RNA-Seq) [6], has potentially intriguing applications for the field, but it is not discussed further because there are currently no RNA-Seq data specifically related to sepsis.

The fundamental technical innovation of microarray technology is the ability to simultaneously measure mRNA abundance of thousands of transcripts (transcriptomics). The technique generally involves reverse transcription of RNA into cDNA, with the inclusion of a labeling molecule for detection. The labeled cDNA (targets) is subsequently applied to a support surface arrayed with nucleotide sequences corresponding to specific genes (probes). The probes and targets hybridize via standard nucleic acid base pairing, and the amount of hybridization reflects the abundance of a specific mRNA species. The supporting surface is subsequently washed and scanned to provide raw mRNA abundance data. An important limitation of transcriptomics is that it provides only a 'snapshot' of steady-state mRNA abundance. mRNA abundance is influenced by multiple factors, including rates of transcription and mRNA degradation, and provides no direct information about gene end products (proteins) or about post-translational modifications that alter protein function, such as phosphorylation or glycation.

One major consideration in designing a microarray experiment involves the RNA source. Ideally, the RNA source should be relatively homogeneous and closely represent the disease/condition biology of interest. For example, the discovery of neutrophil gelatinase-associated lipocalin as a biomarker for acute kidney injury included microarray-based analysis of kidneys from rodents subjected to renal ischemia [7]. Most of the studies described below have used the blood compartment as the RNA source. Reliance on the blood compartment has obvious limitations with regard to specific organ perturbations in clinical sepsis, but it reflects the practical limitations of tissue sampling in clinical research and does provide a broad picture of a systemic response. Blood-derived RNA can come either from whole blood (a mixed population of blood cells) or from specific blood cells after their isolation. The whole-blood approach facilitates the procurement of samples from multiple centers, without the requirement for cell separation expertise, and has the potential to provide a comprehensive picture. However, the whole-blood approach has the potential to confound data interpretation due to heterogeneous blood cell populations. The cell-specific RNA approach provides a more homogeneous RNA source, but has the potential to miss biologically relevant expression signatures from cells that are excluded from the experimental approach. For example, a study that focuses exclusively on peripheral blood mononuclear cells will not account for the potentially important response of neutrophils.

Another important consideration in designing a microarray experiment involves the reference (control) group to which gene expression in the population of interest will be compared. For example, if one is interested in studying gene expression patterns in sepsis relative to a normal state, then comparison with normal controls is appropriate. In contrast, if one is interested in discovering gene expression patterns that distinguish sepsis from 'sterile inflammation', then a more appropriate control group would consist of patients who are not infected but meet criteria for the systemic inflammatory response syndrome (SIRS).

The heterogeneity and complexity that characterize clinical sepsis present an important challenge to clinical microarray studies. From one perspective, the comprehensive nature of a microarray approach is ideally suited to studying such a heterogeneous and complex syndrome. From another perspective, the heterogeneity and complexity are potentially profound confounders for data interpretation. Accordingly, it is critical that microarray data be interpreted in the context of robust clinical and biological data on factors that can influence gene expression patterns. These include, but are not limited to, race, gender, age, co-morbidities, infecting pathogen class, state of immune competence, and therapy.

Analysis of microarray data is an evolving and complex field. A universal initial step is data normalization, which allows valid comparisons across samples by reducing technical variation not directly related to biological variation [5]. A typical next step involves statistical comparisons across groups of interest using either parametric or non-parametric analysis of variance. Unfortunately, there is no clear consensus as to which statistical test is most appropriate for a given data set, and it is particularly troubling that lists of 'differentially regulated genes' from the same data set can vary substantially depending on the statistical test used [8, 9]. Regardless of which statistical test is used, it must incorporate a correction for multiple comparisons, because testing thousands of genes simultaneously carries a high risk of false positives. Microarray data are also commonly subjected to an expression filter, which compares mRNA abundance of specific gene probes in one cohort versus a reference cohort. Expression filters are useful to assess 'magnitude of effect' and to reduce the number of comparisons for a subsequent statistical test, but they are not valid substitutes for formal statistical testing. Finally, there is the issue of statistical power in microarray experiments, which can be calculated but depends on assumptions that can be difficult to derive objectively [10]. In general, a heterogeneous study cohort will require substantially more independent samples than a more homogeneous cohort.
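The review does not specify which normalization method, multiple-comparison correction, or expression filter the cited studies used. As a minimal sketch under those assumptions, the Python below uses quantile normalization, the Benjamini-Hochberg false discovery rate procedure, and a simple two-fold ratio filter. These are common choices, not necessarily those of any study discussed here, and the function names are hypothetical.

```python
# Hypothetical helper names; a sketch of common choices, not the method of any cited study.
from typing import Dict, List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Force every sample (a list of probe intensities) onto one shared distribution.

    The shared distribution is the mean of the sorted intensities at each rank.
    Ties are not averaged in this simplified version.
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must have the same number of probes")
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(s[i] for s in sorted_samples) / len(samples) for i in range(n_probes)]
    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity in this sample.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, min(1.0, p_values[idx] * m / rank))
        adjusted[idx] = running_min
    return adjusted


def fold_change_filter(case_means: Dict[str, float], reference_means: Dict[str, float],
                       min_fold: float = 2.0) -> List[str]:
    """Keep probes whose case/reference ratio is at least min_fold up or down.

    Assumes positive, linear-scale intensities. This only sizes the effect;
    it is not a substitute for a statistical test.
    """
    keep = []
    for probe, case in case_means.items():
        ratio = case / reference_means[probe]
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            keep.append(probe)
    return keep
```

In practice the filter would be applied before testing to reduce the number of comparisons, and the surviving probes would then be tested and their p-values adjusted.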

The statistical tests described above typically yield large lists of differentially regulated genes, thus leaving one with the challenge of assigning biological meaning to these gene lists. One approach to data interpretation involves the generation of 'heat maps', which statistically cluster genes and samples based on similarity of expression. Heat maps provide a broad picture of gene expression patterns and allow for the discovery of disease 'subclasses' based on differential gene expression [11]. Another approach to viewing large microarray data sets involves the generation of gene expression 'mosaics' based on a 'self-organizing map' algorithm [12, 13]. These gene expression mosaics provide microarray data with a 'face' that is recognizable via intuitive pattern recognition, and were recently applied to allocate patients with septic shock into clinically relevant subclasses [14, 15].
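Heat maps of this kind are usually ordered by agglomerative hierarchical clustering. The sketch below assumes average linkage and a distance of 1 minus the Pearson correlation, neither of which is stated in the text. It clusters sample profiles and stops when a chosen number of clusters remains, which loosely mirrors reading subclasses off the upper branches of a dendrogram. The function names are hypothetical.

```python
# Hypothetical helpers: average-linkage clustering with 1 - Pearson correlation distance.
import math
from typing import Dict, List


def pearson_distance(x: List[float], y: List[float]) -> float:
    """1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # treat a flat profile as uncorrelated with everything
    return 1.0 - cov / (sx * sy)


def average_linkage_clusters(profiles: Dict[str, List[float]], k: int) -> List[List[str]]:
    """Merge the two closest clusters (mean pairwise distance) until k clusters remain."""
    clusters = [[name] for name in profiles]
    dist = {(a, b): pearson_distance(profiles[a], profiles[b])
            for a in profiles for b in profiles if a != b}
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

This naive version recomputes cluster distances at every merge and is meant only to show the idea; dedicated clustering libraries are far more efficient for thousands of genes.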

Beyond these global assessments of gene expression patterns, a number of public and proprietary databases allow biological function to be assigned to gene lists. These databases examine uploaded gene lists and determine whether a list is enriched for genes that are biologically related, based on the established literature. The outputs from these databases range from generic (for example, 'immune response') to specific (for example, 'antigen presentation') biological processes. Furthermore, the outputs include an estimate of significance (a P-value) indicating how likely it is that the observed enrichment for a given biological function would arise by chance alone. Significance increases with the number of genes in the list that correspond to the given biological function and, for a given number of such genes, decreases as the total number of genes in the list grows. A related approach to assigning biological meaning to gene lists involves the generation of gene networks based on known direct and indirect interactions between genes [16, 17].
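Enrichment P-values of this kind are often computed from the hypergeometric distribution (equivalent to a one-sided Fisher exact test), although the text does not say which test any particular database uses. The hypothetical helper below is a minimal sketch of that calculation, assuming the array's full gene set is the background.

```python
# Hypothetical helper; assumes a hypergeometric (one-sided Fisher exact) enrichment test.
from math import comb


def enrichment_p_value(total_genes: int, pathway_genes: int,
                       list_size: int, overlap: int) -> float:
    """P(X >= overlap) when list_size genes are drawn at random from the background.

    total_genes:   genes on the array (the background)
    pathway_genes: background genes annotated to the biological function
    list_size:     genes in the uploaded list
    overlap:       genes in the list that carry the annotation
    """
    denominator = comb(total_genes, list_size)
    upper = min(pathway_genes, list_size)
    tail = sum(comb(pathway_genes, k) * comb(total_genes - pathway_genes, list_size - k)
               for k in range(overlap, upper + 1))
    return tail / denominator
```

For example, ten annotated genes in a 100-gene list give a much smaller P-value than the same ten genes in a 1,000-gene list, matching the two trends described above.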

Genome-level understanding of sepsis

Microarray-based expression profiling has provided an unprecedented opportunity to gain a broader, genome-level 'picture' of complex and heterogeneous clinical syndromes such as sepsis. In addition, this genome-level approach has the potential to reduce investigator bias, and thus increase discovery capability, inasmuch as all genes are potentially interrogated, rather than a specific set of genes chosen by the investigator based on a priori and potentially biased assumptions.

Many of the fundamental physiologic and biologic principles of the sepsis paradigms are derived from experiments involving human volunteers subjected to intravenous endotoxin challenge [18–21]. More recently, the genome-level response during experimental human endotoxemia has been studied using microarray technology [16, 22, 23]. Talwar and colleagues [22] compared eight volunteers challenged with intravenous endotoxin to four controls challenged with saline. Mononuclear cell-specific RNA was obtained at four different time points after endotoxin challenge and analyzed via microarray. As expected, a large number of transcripts related to inflammation and innate immunity were substantially up-regulated in response to endotoxin challenge. Interestingly, the peak transcriptomic response to the single endotoxin challenge occurred within 6 hours and mRNA levels generally returned to control levels within 24 hours. The investigators also reported endotoxin-mediated differential regulation of over 100 genes not typically associated with acute inflammation (for example, cathepsin H, sialidase 1, UDP-glucose dehydrogenase, zinc finger protein 266, homeo box B2). Finally, and of relevance to subsequent sections of this review, endotoxin challenge also led to repression of several gene programs directly related to adaptive immunity (for example, interleukin-7 receptor, T cell receptor α locus, zeta-chain T cell receptor associated protein kinase 70 kDa, T cell receptor γ locus).

Calvano and colleagues [16] also studied normal volunteers subjected to a single endotoxin challenge, but applied a (then) novel approach to microarray data analysis centered on knowledge-based interactive gene networks. Again, the maximal up-regulation of gene networks corresponding to inflammation and innate immunity occurred at approximately 6 hours after the endotoxin challenge, and generally returned to baseline by 24 hours. Perhaps the most interesting finding from this network-centered analysis, however, was the widespread and early repression of gene networks related to mitochondrial energy production (for example, NADH dehydrogenase 1, pyruvate dehydrogenase, ATP synthase) and protein synthesis (ribosomal protein L3, ribosomal protein S8, eukaryotic translation initiation factor). Tang and colleagues [24] have corroborated the repression of mitochondrial energy production-related genes in a study focused on neutrophil-specific gene expression in critically ill patients with sepsis.

The human endotoxemia studies described above provide a highly controlled and reproducible experimental setting to explore sepsis biology at the level of the entire transcriptome, but as with all sepsis models, this model does not fully replicate the complex and heterogeneous syndrome seen at the bedside following infection with live microbes [25]. Consequently, several investigators have attempted microarray-based studies in critically ill patients with sepsis and septic shock. These studies present considerable experimental challenges due to the inherent heterogeneity of clinical sepsis and septic shock. Nonetheless, several studies have provided novel insight into the overall genome-level response to sepsis [9, 17, 24, 26–34]. A common theme across many of these studies is the massive up-regulation of inflammation- and innate immunity-related genes in patients with sepsis and septic shock. These observations are not intrinsically novel, but they are consistent with the long-standing sepsis paradigms centered on a hyperactive inflammatory response, and thus provide an important layer of biological plausibility with regard to overall microarray data output in the context of clinical sepsis.

Another common paradigm in the sepsis field involves a two-phase model consisting of an initial hyper-inflammatory phase followed by a compensatory anti-inflammatory phase, but this has been recently challenged, in large part due to the multiple failures of interventional clinical trials founded on this paradigm [35–37]. Recently, Tang and colleagues [3] conducted a formal systematic review of a carefully selected group of microarray-based human sepsis studies. A major conclusion of this systematic review is that, in aggregate, the transcriptome-level data do not consistently separate sepsis into distinct pro- and anti-inflammatory phases. This conclusion has been questioned [38], but is supported by several recent cytokine- and inflammatory mediator-based studies in clinical and experimental sepsis [39–41].

Another prevailing paradigm in the sepsis field involves the concept of immune paralysis, which frames sepsis as, in part, a problem of adaptive immune dysfunction and an inability to adequately clear infection, rather than solely an overactive innate immune response [42, 43]. Recently, this paradigm was elegantly corroborated in mice subjected to sepsis and rescued by administration of IL-7, an anti-apoptotic cytokine essential for lymphocyte survival and expansion [44, 45]. As mentioned previously, studies in human volunteers challenged with endotoxin revealed early repression of gene programs related to adaptive immunity [22]. In studies focused on mononuclear cell-specific expression profiles, Tang and colleagues [30, 31] have also reported early repression of adaptive immunity genes in patients with sepsis. Finally, multiple studies in children with septic shock have reported, and validated, early and persistent repression of adaptive immunity-related gene programs (for example, genes corresponding to the T cell receptor) [9, 11, 14, 15, 17, 32–34]. Thus, the concept of adaptive immune dysfunction as an early and prominent feature of clinical sepsis and septic shock seems to be well supported by the available genome-wide expression data.

Developmental age is thought to be a major contributor to sepsis heterogeneity. Recently, a microarray-based study in children with septic shock corroborated this concept at the genomic level [46]. Four developmental age groups of children were compared based on whole-blood-derived gene expression profiles. Children in the 'neonate' group (<28 days of age) demonstrated a unique expression profile relative to older children. For example, children in the neonate group demonstrated widespread repression of genes corresponding to the triggering receptor expressed on myeloid cells 1 (TREM1) pathway. TREM1 is critical for amplification of the inflammatory response to microbial products, and there has been recent interest in blockade of the TREM1 signaling pathway in septic shock [47]. The observation that TREM1 signaling may not be relevant in neonates with septic shock illustrates how some potential therapeutic strategies for septic shock may lack biological plausibility in certain developmental age groups.

Biomarker discovery

A daily conundrum in the intensive care unit is distinguishing patients who meet criteria for SIRS and are infected from those who meet criteria for SIRS but are not infected. Accordingly, there are ongoing efforts to discover diagnostic biomarkers for sepsis (SIRS secondary to infection), and microarray approaches have the potential to enhance these efforts. Several investigators have reported genome-level signatures that can distinguish patients with SIRS from patients with sepsis [26, 29, 31, 48]. A substantial amount of work, including validation, remains to be done in order to leverage these datasets into clinically applicable diagnostic biomarkers, but the datasets nonetheless provide a foundation for the derivation and development of diagnostic biomarkers for sepsis.

Investigators have also applied microarray technology to address other important clinical challenges directly related to infection. Cobb and colleagues [49, 50] have reported an expression signature (the 'ribonucleogram') having the potential to predict ventilator-associated pneumonia in critically ill blunt trauma patients up to 4 days before traditional clinical recognition. Similarly, Ramilo and colleagues [51] have reported expression signatures that can distinguish influenza A infection from bacterial infection, and Escherichia coli infection from Staphylococcus aureus infection, in hospitalized febrile children. In contrast, Tang and colleagues [30] were unable to define organism-specific gene expression signatures (Gram positive versus Gram negative bacteria) in critically ill adults with sepsis.

Another aspect of biomarker development in sepsis surrounds stratification biomarkers, particularly to predict outcome. Theoretically, any gene that is consistently differentially regulated between survivors and non-survivors in a microarray dataset may warrant further investigation as a potential outcome biomarker. For example, a microarray study by Pachot and colleagues [27, 52] identified CX3CR1 (fractalkine receptor) as a potential outcome biomarker in sepsis. Similarly, Nowak and colleagues [53] have leveraged microarray data to identify chemokine (C-C motif) ligand 4 (CCL4) as an outcome biomarker in children with septic shock. Both candidate stratification biomarkers, however, require further validation.

IL-8 has emerged as a robust stratification biomarker in children with septic shock [54], and the rationale for pursuing it stemmed directly from microarray-based studies identifying IL-8 as one of the more highly expressed genes in pediatric non-survivors of septic shock, compared to survivors [34]. Subsequent studies demonstrated that serum IL-8 protein levels, measured within 24 hours of presentation to the intensive care unit with septic shock, could predict survival in pediatric septic shock with a probability of 95% [54]. The ability of IL-8 to serve as a stratification biomarker was subsequently validated in a completely independent cohort of children with septic shock. Consequently, it has been proposed that IL-8 could be used in future pediatric septic shock interventional trials to exclude patients having a high likelihood of survival with standard care, thereby improving the risk-to-benefit ratio of a given intervention. This type of stratification strategy would be particularly applicable to an intervention that carries more than minimal risk. Interestingly, it appears that IL-8-based stratification may not perform in a similarly robust manner in adults with septic shock [55], thus providing another example of how developmental age contributes to septic shock heterogeneity.
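The stratification idea can be made concrete with a small calculation. The sketch below uses a hypothetical helper, hypothetical patient data, and a hypothetical cutoff (none are values from the cited studies). It labels patients as low risk when a biomarker level falls below the cutoff and reports the proportion of those patients who survived (the negative predictive value for mortality), together with the fraction of the cohort such a rule would exclude from a trial.

```python
# Hypothetical helper and data; illustrates threshold-based exclusion, not the published IL-8 analysis.
from typing import List, Tuple


def stratify_by_threshold(levels: List[float], died: List[bool],
                          cutoff: float) -> Tuple[float, float]:
    """Return (negative predictive value, fraction excluded) for a low-risk rule.

    A patient is labeled low risk when the biomarker level is below `cutoff`.
    NPV = survivors among low-risk patients / all low-risk patients.
    """
    low_risk = [d for level, d in zip(levels, died) if level < cutoff]
    if not low_risk:
        raise ValueError("no patients fall below the cutoff")
    survivors = sum(1 for d in low_risk if not d)
    return survivors / len(low_risk), len(low_risk) / len(levels)


# Eight hypothetical patients: biomarker level and whether the patient died.
npv, excluded = stratify_by_threshold(
    [50, 80, 120, 300, 500, 900, 60, 1500],
    [False, False, True, True, False, True, False, True],
    cutoff=200.0,
)
```

Raising the cutoff excludes more patients but usually lowers the NPV, which is the trade-off a trial designer would weigh when choosing a threshold.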

Currently, there is an ongoing effort to derive and validate a multi-biomarker sepsis outcome risk model in pediatric septic shock. The foundation of this effort is the relatively unbiased selection of a panel of candidate outcome biomarkers using microarray data from a large cohort of children with septic shock [56, 57].

Gene-expression-based identification of septic shock subclasses

Viewing septic shock as a highly heterogeneous syndrome implies the existence of 'disease subclasses', in an analogous manner to that encountered in the oncology field [37]. Recently, there has been an attempt to identify septic shock subclasses in children based on genome-wide expression profiling [11]. Complete microarray data from a large cohort of children with septic shock, representing the first 24 hours of admission, were used to identify septic shock subclasses. A heat map of over 6,000 differentially regulated genes was generated using an unsupervised clustering algorithm. Patients were then classified into one of three subclasses (subclasses 'A', 'B', or 'C') based on statistically similar gene expression patterns, as determined by the first and second order branching patterns of the heat map. Subsequently, the clinical database was mined to determine if there were any phenotypic differences between the three subclasses. Patients in subclass A had a significantly higher level of illness severity as measured by mortality, organ failure, and illness severity score.

The gene expression patterns that distinguished the subclasses were distilled to a 100-gene expression signature by conducting a leave-one-out cross-validation procedure and selecting the 100 genes having the greatest subclass prediction capability. These 100 genes were then uploaded to a functional annotation database, which identified enrichment for genes corresponding to adaptive immunity, glucocorticoid receptor signaling, and the peroxisome proliferator-activated receptor-α signaling pathway. Of note, the genes corresponding to these functional annotations were generally repressed in the subclass of patients with the higher level of illness severity (that is, subclass A patients).
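The text does not specify the classifier used inside the leave-one-out procedure. As a sketch under that uncertainty, the hypothetical functions below score each gene by the leave-one-out accuracy of a one-gene nearest-centroid classifier and keep the n best-scoring genes; the classifier choice is an assumption made for illustration.

```python
# Hypothetical helpers; the nearest-centroid classifier is an assumed stand-in.
from typing import Dict, List


def loocv_accuracy(values: List[float], labels: List[str]) -> float:
    """Leave-one-out accuracy of a one-gene nearest-centroid classifier."""
    correct = 0
    for held_out in range(len(values)):
        # Class centroids computed without the held-out patient.
        groups: Dict[str, List[float]] = {}
        for i, (v, lab) in enumerate(zip(values, labels)):
            if i != held_out:
                groups.setdefault(lab, []).append(v)
        means = {lab: sum(vs) / len(vs) for lab, vs in groups.items()}
        predicted = min(means, key=lambda lab: abs(values[held_out] - means[lab]))
        if predicted == labels[held_out]:
            correct += 1
    return correct / len(values)


def top_predictive_genes(expression: Dict[str, List[float]],
                         labels: List[str], n: int) -> List[str]:
    """Rank genes by LOOCV accuracy and keep the n best (ties broken by gene name)."""
    ranked = sorted(expression, key=lambda g: (-loocv_accuracy(expression[g], labels), g))
    return ranked[:n]
```

Because the genes are chosen using all available samples, accuracy measured on those same samples tends to be optimistic, which is one reason testing the signature in a separate cohort matters.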

In a subsequent study, the expression patterns of the 100 subclass-defining genes were depicted using visually intuitive gene expression mosaics and shown to a panel of clinicians with no formal bioinformatic training and blinded to the actual patient subclasses (Figure 1). The clinicians were able to allocate patients into the respective subclasses with a high degree of sensitivity and specificity [15]. The ability to identify a subclass of children with a higher illness severity was further corroborated when the gene-expression-based subclassification strategy was applied to a separate validation cohort of children with septic shock [14]. Collectively, these studies demonstrate the feasibility of subclassifying patients with septic shock, in a clinically relevant manner, based on the expression patterns of a discrete set of genes having relevance to sepsis biology. The availability of clinical microfluidics [58] and digital mRNA measurement technology [59] may make it clinically feasible to measure the 100 class-defining genes quickly enough to direct patient care or to stratify patients in clinical trials.

Figure 1

Examples of gene expression mosaics for individual patients in septic shock subclasses A, B, and C, respectively [14, 15]. The expression mosaics represent the expression patterns of the same 100 class-defining genes corresponding to adaptive immunity, glucocorticoid receptor signaling, and the peroxisome proliferator-activated receptor-α signaling pathway. The color bar on the right depicts the relative intensity of gene expression. These individual patient mosaics have not been previously published.
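The tool used to draw the mosaics in the cited studies is not reproduced here. As a generic sketch of the self-organizing map idea, the hypothetical functions below train a small map on per-gene expression profiles; the grid size, learning-rate schedule, and Gaussian neighborhood are all assumptions. Each tile's weight vector then acts as a 'metagene' with one value per sample, and one patient's mosaic is the grid of tile values in that patient's sample.

```python
# Hypothetical helpers: a generic self-organizing map, not the tool used in the cited studies.
import math
import random
from typing import Dict, List, Tuple

Node = Tuple[int, int]


def best_matching_unit(weights: Dict[Node, List[float]], v: List[float]) -> Node:
    """Grid node whose weight vector is closest (squared Euclidean) to v."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], v)))


def train_som(profiles: Dict[str, List[float]], rows: int, cols: int,
              epochs: int = 50, seed: int = 0) -> Dict[Node, List[float]]:
    """Train a rectangular map; each profile is one gene's expression across samples."""
    rng = random.Random(seed)
    dim = len(next(iter(profiles.values())))
    weights = {(r, c): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    vectors = list(profiles.values())
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                              # learning rate decays linearly
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # neighborhood shrinks over time
        for v in rng.sample(vectors, len(vectors)):          # shuffled pass over the genes
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))      # Gaussian neighborhood weight
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])           # pull the node toward the gene
    return weights


def mosaic(weights: Dict[Node, List[float]], rows: int, cols: int,
           sample_index: int) -> List[List[float]]:
    """One patient's mosaic: each tile is a metagene's value in that sample."""
    return [[weights[(r, c)][sample_index] for c in range(cols)] for r in range(rows)]
```

Because neighboring tiles are trained together, genes with similar profiles land in nearby tiles, which is what gives each patient's mosaic a recognizable 'face'.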

Discovery of novel targets and pathways

The potential to interrogate the entire genome in a relatively unbiased manner provides an opportunity to discover previously unrecognized, or unconsidered, targets and pathways relevant to sepsis biology. This is a daunting task in the context of a highly heterogeneous syndrome such as clinical sepsis, and the many unavoidable confounding factors inherent to clinical sepsis microarray studies. Nonetheless, several studies illustrate the potential of genome-wide expression profiling in the discovery of novel targets and pathways.

For example, using a combination of expression profiling and in vitro approaches, Pathan and colleagues [60] identified interleukin-6 as a major contributor to myocardial depression in patients with meningococcal sepsis. This is a particularly intriguing and robust study because the study population is relatively homogeneous (that is, exclusively patients with meningococcal sepsis) compared to the majority of sepsis microarray studies, which have enrolled patients with heterogeneous sepsis etiologies.

In one of the earliest clinical sepsis microarray studies, Pachot and colleagues [27] identified a set of genes differentially regulated between survivors and non-survivors. The gene most highly expressed in survivors, relative to non-survivors, was that encoding the chemokine receptor CX3CR1 (fractalkine receptor). In a subsequent validation study, these same investigators provided further evidence supporting the novel concept that dysregulation of CX3CR1 in monocytes contributes to immune-paralysis in human sepsis [52]. These studies further demonstrate the potential to discover novel pathways through discovery-oriented expression profiling.

Several studies in children with septic shock have documented early and persistent repression of gene programs directly related to zinc homeostasis, in combination with low serum zinc concentrations [9, 11, 17, 32, 34]. Since normal zinc homeostasis is absolutely critical to normal immune function [61], these observations have raised the possibility of zinc supplementation as a potentially safe and low-cost therapeutic strategy in clinical septic shock and other forms of critical illness [62–64]. Importantly, Knoell and colleagues [65, 66] have independently corroborated that zinc supplementation is a highly beneficial strategy in experimental sepsis. Additional studies by Knoell and colleagues [67] have corroborated decreased plasma zinc concentrations in patients with sepsis, and shown that low plasma zinc concentrations correlate with higher illness severity. Furthermore, plasma zinc concentrations correlate inversely with monocyte expression of the zinc transporter gene SLC39A8 (also known as ZIP8) [67, 68]. Interestingly, microarray-based studies in children with septic shock have reported high levels of SLC39A8 expression in non-survivors, relative to survivors [34]. Despite the intriguing convergence of these data from independent laboratories, the safety and efficacy of zinc supplementation in clinical sepsis remain to be directly demonstrated and are a current area of active investigation. One consideration for these studies will be the incorporation of transcriptomic analyses to determine whether zinc supplementation influences the zinc-related gene repression patterns described above.

In the aforementioned studies involving children with septic shock, matrix metalloproteinase-8 (MMP-8) has consistently been the most highly expressed gene in patients with septic shock, relative to normal controls [9, 11, 17, 32–34, 46]. In addition, MMP-8 is more highly expressed in patients with septic shock compared to patients with sepsis, and in septic shock non-survivors compared to septic shock survivors [69]. MMP-8 is also known as neutrophil collagenase because it is a neutrophil-derived protease that cleaves collagen in the extracellular matrix, but MMP-8 is also known to have other cellular sources and non-extracellular matrix substrates, including chemokines and cytokines [70]. The consistently high level of expression of MMP-8 in clinical septic shock recently stimulated the formal study of MMP-8 in experimental sepsis. These studies demonstrated that either genetic ablation of MMP-8 or pharmacologic inhibition of MMP-8 activity confers a significant survival advantage in a murine model of sepsis [69]. While these studies require further development and validation, the findings are intriguing given that there exist a number of drugs that effectively inhibit MMP-8 activity in the clinical setting [71].

Conclusion

Despite the tremendous methodological challenges that come with translational research involving humans with sepsis, microarray technology and complex bioinformatic approaches are beginning to provide novel insights into this complex syndrome. Progress, albeit slow, has been realized with regard to our understanding of the genome-level response during sepsis, the identification of potential novel targets and pathways, discovery of candidate diagnostic and stratification biomarkers, and the possibility of clinically relevant and clinically feasible gene-expression-based subclassification. The challenges ahead include robust validation studies, standardization of technical approaches, standardization and further development of analytical algorithms, and large scale collaborations.