Mechanical ventilation is a life-saving therapy for numerous critical illnesses. However, it is now recognized that ventilation with excessive tidal volumes, leading to hyperexpansion or excessive mechanical shear, is potentially directly harmful to susceptible patients. The benefits of lower tidal volumes, which reduce lung-cell stretch, have now clearly been established [1]. The clinical presentation of ventilator-associated lung injury (VALI) is identical to that of other causes of acute lung injury (ALI) and is characterized by increased pulmonary edema. Important studies by Parker [2, 3] and Webb and Tierney [4] demonstrated changes in microvascular permeability in isolated lung and intact animal models exposed to increased airway pressures, suggesting that these changes in permeability may in large part be attributed to the effects of mechanical stimuli on various cell-signaling pathways [5, 6]. Although several studies have suggested a genetic basis for susceptibility to VALI [7-9], few candidate genes have been implicated in this process.

To identify major genes associated with VALI, we examined gene-expression profiles of several in vivo models (rat, mouse, and dog) of ventilator-induced ALI. As a main component of ALI is presumed to involve biophysical stress-induced leakage of the pulmonary vasculature [10], we also included human lung vascular endothelial cells exposed to high-level cyclic stretch as a human in vitro model of mechanical stress. Gene-expression profiling of these models was performed and analyzed using species-specific Affymetrix GeneChips. The individual analysis of species-specific arrays produced large lists of candidate genes and several challenges, the most notable being an excessive number of genes (ranging from 548 candidates in the rat to 963 candidates in the human model) for candidate gene selection. While meta-analysis strategies exist for narrowing candidate gene selection from multiple experimental systems [11-13], these strategies can be applied only to cross-platform array comparisons within the same species. To extend this approach to experiments involving diverse species, we speculated that multi-species gene-expression profiles could be linked using the Eukaryote Gene Orthologs database (EGO [14]).

Orthologs are genes in different species that have evolved from a common ancestral gene by speciation and generally retain a similar function in the course of evolution. We speculated that overlapping responses to mechanical stretch in orthologous genes across species might reveal candidate genes involved in an evolutionarily conserved defense mechanism to lung injury that might be triggered by ventilator-induced lung injury. Previous studies of three-way comparative analysis of human, mouse and dog DNA [15] showed that the majority of highly conserved human-mouse elements are also conserved in the dog. Furthermore, Frazer et al. [16] speculated that comparing human sequence with those of multiple species might be an effective approach for distinguishing actively conserved elements from elements that simply result from a shared ancestry. On the basis of these observations, we predicted that a common stimulus (mechanical stretch) across four different species will initiate actively conserved mechanisms that defend the lungs against adverse environment factors or bacterial products. To select genes involved in these defense mechanisms the functionally related genes from different species should be first identified.

Despite the availability of tools for comparing gene-expression data from Affymetrix GeneChip arrays designed for different species [17, 18], there are limited resources for simultaneous array-data analysis across multiple species-specific platforms (GeneChip IDs U34, U74, U95, U133). GeneHopper [18, 19], which uses the UniGene and HomoloGene databases to provide comparisons between arrays, is useful for linkage of selected genes of interest from different array platforms, but is less suitable for linking expression sequence tags (ESTs) and uncharacterized genes represented on arrays. Moreover, the database for this software is not yet complete, and does not include the widely used HG_U95Av2 GeneChip.

A better alternative is RESOURCERER [17, 20], which is based on the TIGR Eukaryotic Gene Ortholog (EGO) database [21], and contains information for all commercially available Affymetrix GeneChips. However, RESOURCERER allows comparison of only two chips simultaneously and cannot be used directly for multi-species analysis. Therefore, we assembled a database using ortholog links (identified by RESOURCERER) between the most commonly used Affymetrix rat, mouse and human GeneChips (U34A, U74A, U95A and U133A) for our multi-species cross-platform gene-expression analysis.

We first calculated gene-expression changes for each tested species and linked expression values obtained for orthologous genes. Orthologous genes exhibiting similar patterns of expression across all species were selected as VALI-related candidates under the assumption that gene-expression responses conserved across evolutionary history would be most likely to reveal fundamental biological responses to VALI. After normalizing gene-expression values across species, we next identified orthologous genes with statistically significant changes in response to VALI. A biologically significant fold-change in gene-expression level was determined using MAPPFinder [22, 23] by linking selected genes to Gene Ontology (GO) biological processes and identifying functional categories that were significantly regulated. This filtering produced a candidate list of 69 genes that were significantly affected by mechanical stretch. A literature search for these genes using PubMatrix [24] identified 12 genes as related to ALI as well as six new VALI-related candidate genes. Our analytical gene ortholog approach also revealed a number of changes in unsuspected GO processes and biological pathways that may provide new insights and potential therapies in ALI. Thus, this technique offers the capacity to identify genes that are likely to be missed by individual species analysis and facilitates application of a meta-analysis approach to multi-species analyses.


To maximize the number of valid cross-species comparisons, we focused our analysis on the human, mouse and rat Affymetrix 'A' GeneChips, which contain the majority of 'named' or functionally classified genes and the least number of unannotated ESTs. Ortholog tables for each pair were generated using RESOURCERER. This software provides a table in which rows contain paired orthologous probe IDs; IDs corresponding to Affymetrix internal controls were ignored in further analysis. Because the U133A GeneChip contained the largest number of probe IDs (22,215) as compared to U95A, U74A, and U34A chips (12,588, 12,422, and 8,740 probe IDs, respectively), the U133A genes were selected as the reference gene set for orthologous comparisons. As anticipated, the total number of reference genes to participate in forming ortholog pairs (identified by RESOURCERER) was always higher than that of corresponding orthologs (Table 1), which justified our selection of the U133A array as the reference platform.

Table 1 Relationship of EGO orthologs between selected Affymetrix GeneChips

The linkage of all four arrays identified 3,077 genes common to the U133A reference gene ortholog nodes (Figure 1). An example of an ortholog node for the ODC-1 gene is shown in Figure 2a. This ortholog node was missing one link, rendering our ortholog-linked database incomplete. Therefore, we identified all orthologs with missing links (Figure 2b) and connected them as putative orthologs on the basis of homology to the common reference gene.

Figure 1

Overlaps between rat (U34A GeneChip), mouse (U74A GeneChip) and human (U95A GeneChip) Affymetrix array platforms based on the human (U133A GeneChip) ortholog assignments. The sum of numbers inside each circle represents the total number of ortholog pairs formed with reference genes on the U133A GeneChip by corresponding arrays (see also Table 1). The reference genes formed 3,077 pairs with corresponding orthologs that were represented on all depicted arrays.

Figure 2

Schema of the centric approach in ortholog-linked database building and putative ortholog detection. (a) An example of putative ortholog creation for the ornithine decarboxylase 1 (ODC-1) gene. U74A and U34A probe IDs were EGO orthologs (solid line) for the U133A and U95A ODC-1 gene but were not directly linked (dashed line) either in EGO or in the Affymetrix ortholog table. (b) The reference genes common to all arrays (see Figure 1) and their corresponding orthologs for U95A-U74A, U95A-U34A, and U74A-U34A pairs were permutated and all possible combinations counted (dashed lines). EGO combinations were retrieved from RESOURCERER-generated tables for these paired arrays and counted (solid lines). The difference in the predicted and existing pairs represents the number of putative orthologs to be created, based on homology to the common reference gene.

Gene-expression data for populating our ortholog-linked database was generated by hybridization of total mRNA from rat, mouse and dog lung tissues and human endothelial cell cultures to GeneChips U34A, U74A, U133A and U95A, respectively. All hybridizations were represented by a minimum of three control and four mechanical stretch-challenged samples, with the exception of the rat model which had two control and two stretch-affected samples (see Materials and methods). The signal intensities produced during hybridization were extracted from hybridization images using Affymetrix software MAS 5.0 and ratios of transcript abundance calls were computed. Rat, mouse, and human array assays produced 51%, 52% and 49% present (p < 0.04) and marginal (p < 0.06) transcript abundance calls, respectively. In contrast, however, the canine hetero-hybridization to the human U133A GeneChip created only 17% marginal and present calls (Figure 3a). Probe-level analysis revealed that poor cross-species hybridization to a subset of the probe pair sets was responsible for the loss of many present calls from the canine array data. To address this, we adjusted results of this cross-species hybridization by modifying U133A array probe-set compositions on the basis of differences between dog and human DNA. The poorly performing probes were also identified in the species-specific hybridizations and subsequently masked using masking protocol embedded in MAS 5.0. When modified probe sets were reprocessed by MAS 5.0, the ratio of present calls was increased on average by 25% (Figure 3b). Next, we replaced remaining absent calls with the corresponding chip background value (see Materials and methods), which allowed us to use all available data on each chip. Subsequent statistical analysis was conducted for each experimental system individually and four generated gene lists were later used for comparison with gene lists generated using the ortholog approach.

Figure 3

Experimental data used for populating the ortholog-link database. (a) Using Affymetrix MAS 5.0 software, absent (black), marginal (white) and present (gray) transcript-abundance calls were counted for each experimental dataset and the values obtained expressed as a percentage of all calls. (b) By masking poorly performing probes for U95A, U74A and U34A, the present call ratio for these GeneChips was increased by 25%. As dog mRNA was hybridized to the human U133A chip, the present call ratio for this hetero-hybridization was much lower than that in other experiments. We therefore corrected U133A probe sets for differences in gene sequence between human and canine, which increased the present call ratio by more than 50%.

For statistical analysis of combined cross-platform expression data, we pooled control and mechanical stretch-challenged samples from all tested species into corresponding groups: n_control = 11 (n_rat = 2, n_mouse = 3, n_canine = 3, and n_HPAEC = 3) and n_stretch = 14 (n_rat = 2, n_mouse = 4, n_canine = 4, and n_HPAEC = 4). Because these arrays contain multiple paralogues (similar sequences in a single organism), multiple orthologs for the same reference gene were identified (Figure 4). As a result, approximately 62% of the ortholog groups formed failed to follow the n_control = 11/n_stretch = 14 pattern. To avoid unequal contribution of each species to the statistical analysis, the expression values of multiple paralogues were averaged and the n_control = 11/n_stretch = 14 set was then built. Once the groups for comparison were formed, we used the two-tailed unequal-variance independent t-test for statistical evaluation of changes in gene expression of reference genes and their orthologs. This analysis identified significant changes in the expression of 141 reference genes and their corresponding orthologs across all experimental systems.
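The paralogue-averaging step can be illustrated with a minimal sketch. It assumes that each species contributes one or more probe sets per ortholog group and that paralogue probe sets from the same chip are averaged sample by sample before pooling. The function names and all expression values below are hypothetical and are not taken from the study data.

```python
from statistics import mean

SPECIES_ORDER = ("rat", "mouse", "canine", "HPAEC")

def average_paralogs(probe_values):
    """Collapse several paralogue probe sets from one chip into one value per sample.

    probe_values: list of equal-length lists, one per paralogue probe set,
    each holding expression values for the same ordered samples.
    """
    n_samples = len(probe_values[0])
    if any(len(p) != n_samples for p in probe_values):
        raise ValueError("all paralogue probe sets must cover the same samples")
    # Average across paralogues, sample by sample (chip-wise).
    return [mean(p[i] for p in probe_values) for i in range(n_samples)]

def pool_group(per_species):
    """Concatenate the per-species sample vectors into one pooled vector."""
    pooled = []
    for species in SPECIES_ORDER:
        pooled.extend(average_paralogs(per_species[species]))
    return pooled

# Hypothetical control values for one ortholog group (illustrative only).
controls = {
    "rat": [[100.0, 110.0]],
    "mouse": [[200.0, 210.0, 190.0], [220.0, 230.0, 210.0]],  # two paralogues
    "canine": [[50.0, 55.0, 60.0]],
    "HPAEC": [[300.0, 310.0, 305.0]],
}
pooled_controls = pool_group(controls)
```

Whatever the number of paralogues on a chip, each species contributes exactly one value per sample, so the pooled control vector always has 11 entries.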

Figure 4

Overall distribution of orthologs among reference genes. Most of the reference genes (1,088) had only one ortholog on each of the U95A, U74A and U34A arrays used in these studies. The first bar shown here represents the number of reference genes that had three orthologs. The majority of remaining reference genes had two orthologs on one of the studied arrays. Overall, about 62% of reference genes had at least one multiple ortholog set.

To further refine this list, we established a fold-change cutoff for biologically significant gene-expression changes based on the analysis of the relationship of the biological processes driven by these genes. Starting from the notion that genes coding for proteins involved in the same biological processes are regulated in a coordinated manner, and that expression of members of a given bioprocess is more likely to be co-regulated than inversely regulated [25], we speculated that an increased ratio of inversely regulated bioprocesses at low fold-change cutoff values (Figure 5a,b) is due to the contribution of spurious (false-positive) changes in gene expression assigned to low fold-change values. As shown in Figure 5a, the inflammatory response bioprocess was classified as inversely regulated at the 1.1- and 1.15-fold-change cutoffs. However, with a 1.2-fold-change cutoff, this becomes a co-regulated pathway. In contrast, the DNA-dependent regulation of transcription bioprocess (Figure 5b) is classified as inversely regulated through all tested fold-change cutoffs. Although most low fold-change genes in this process were eliminated, the ratio of upregulated and downregulated genes remained constant and was stabilized beyond the 1.3-fold-change cutoff. From these observations we propose that the point at which sharp changes in the number of genes involved in GO bioprocesses subside could be considered a biologically meaningful fold-change cutoff.

Figure 5

Distribution of co-regulated and inversely regulated biological bioprocesses identified by linkage to GO. (a) Genes involved in a co-regulated bioprocess (inflammatory response; GO 6954) and (b) an inversely regulated bioprocess (DNA-dependent regulation of transcription; GO 6355). Solid areas under the curve represent upregulated genes and gray areas under the curve represent downregulated genes. (c) A summary of all co-regulated (top curve) and inversely regulated (bottom curve) GO bioprocesses identified by MAPPFinder corresponding to the increment in the fold-change cutoff.

The bioprocesses affected by mechanical stretch were identified using MAPPFinder [13] software designed by BayGenomics PGA group for dynamic linkage of gene-expression data to the GO [26] hierarchy. When we analyzed the gene pool that included genes with slight changes in their expression (1.1-fold), the MAPPFinder identified 432 bioprocesses, with 288 activated and 147 suppressed bioprocesses. Of these 432 bioprocesses, a total of 54 bioprocesses were common to both groups and, therefore, were classified as inversely regulated (shared) bioprocesses (Figure 5). To identify the point at which the number of the shared bioprocesses will approach the monotonic phase at which only real inversely regulated pathways will survive, we tested our gene list by gradually increasing the stringency of the fold-change cutoff. The fold-change cutoff of ±1.3 and ±1.35 satisfied this condition for inversely regulated and co-regulated bioprocesses, respectively (Figure 5). Using this filtering strategy and applying ±1.3-fold-change cutoff, we further refined our gene list to 69 genes (see Additional data files) which comprised 61 upregulated and 8 downregulated genes.

We next matched these 69 genes against the PubMed database using the PubMatrix [24] software tool. This analysis identified 12 genes that were extensively linked to lung-injury-related articles, with six of these genes also linked to mechanical ventilation-related articles, a finding that indirectly validates our approach (Table 2). Given the pre-eminent importance of the vascular component in ALI pathogenesis, our primary criterion in selecting candidate genes was their expression in vascular endothelium. The PubMatrix output identified a number of genes linked to articles that included lung, endothelium, and even pulmonary endothelium terms in their context, which again facilitated our selection of new gene candidates for further studies.

Table 2 Genes showing significant changes in expression throughout all biological systems tested

We also investigated whether our gene list might reveal unsuspected biological processes and pathways activated or suppressed by VALI. To address this we linked the available GenMAPP [27] biological GO processes [23] to our gene list. The resulting picture of the biological processes affected by mechanical stretch in our models is shown in Table 3 with 'Immune Response,' 'Inflammatory Response,' 'Blood Coagulation,' and 'Cell Cycle Arrest' biological processes identified as the most significantly upregulated by mechanical stretch. As our gene list had only eight downregulated genes, the MAPPFinder output for downregulated pathways did not allow filtering (see Materials and methods). The complete list of genes and GO processes identified by our procedure is provided in our supplemental data files.

Table 3 MAPPFinder results for significantly upregulated genes throughout all species tested

Finally, we compared our list of candidate genes with the genes obtained from four individual experimental systems using the same filtering conditions (±1.3-fold-change cutoff and p < 0.05). As shown in Table 4, analysis of gene expression in canine, human, mouse and rat models identified 9, 7, 13, and 15 genes out of our 69 candidates, respectively. A total of 28 genes (~40%) successfully identified by our ortholog approach did not survive selection by individual species analysis, and included well known ALI-related candidate genes such as IL1β, COX-2, PAI-1, BTG1, and FGA. The linkage of orthologous genes from different arrays increased the statistical power of our gene-expression analysis and allowed us to identify candidate genes that would otherwise remain unnoticed. A small fraction (~15%) of known ALI-related genes [7, 28, 29] were identified by individual species analysis but not detected by our bioinformatics approach (Table 4). This is to be anticipated, as differences exist in gene representation on multiple array platforms. For example, genes coding for the ALI candidates interleukin-8 and tumor necrosis factor-alpha were not represented on the rodent arrays, and therefore were excluded from our analysis. Tissue-specific gene expression also contributed to this false-negative gene fraction. The gene coding for surfactant C, which is mainly expressed in epithelial cells, was identified during analysis of stretched canine lung tissues but was excluded by our orthologous method because of the virtual absence of expression in stretched endothelial cells (Table 4).

Table 4 Comparison of candidate gene list generated by multi-species cross-platform analysis with that obtained using a single-experiment analysis


The procedure we have described presents a complementary and potentially useful approach in searching for candidate genes involved in specific biological processes of interest. General trends in the expression of common groups of genes in response to a specific stimulus in diverse species might reveal unsuspected evolutionarily conserved responses triggered by this stimulus. At the same time, known biological pathways and genes, either activated or suppressed by a selected stimulus, can be used as a validation of this approach. In this study, we investigated the response of four different biological systems (rat, mouse, dog, and human cell culture) to levels of mechanical stretch relevant to ALI. Our ortholog approach and filtering algorithm allowed us to identify 12 VALI candidate genes previously linked to ALI, five of which went undetected using a common analytical approach. We also selected six novel endothelium-related candidate genes that warrant further investigation (Table 2).

The most commonly cited upregulated ALI genes in our list were those for IL-1β and interleukin-6 (IL-6), which were cited as lung-injury-related proteins in 287 and 173 references, respectively. Importantly, IL-1β did not survive standard selection as a candidate gene and was undetected by the same-species analytical approach. IL-6 had the highest number of links (75 citations) to mechanical ventilation (Table 2). Clinical studies showed that IL-1β and IL-6 concentrations in broncho-alveolar lavage fluid (BALF) from patients with established adult respiratory distress syndrome (ARDS) were higher than in BALF from normal volunteers [30]. Moreover, IL-1β was self-sufficient in causing ALI when overexpressed in mouse lungs [31] and was directly related to VALI in another mouse model [32]. IL-6 levels in ALI patients correlated with the mode of mechanical ventilation, as low tidal volume was associated with lower IL-6 and elevated tidal volume with high IL-6 concentrations [33].

Predictably, we identified several genes encoding enzymes that are highly conserved throughout evolution, including the ALI-related enzyme prostaglandin-endoperoxide synthase 2/cyclooxygenase-2 (PTGS-2/COX-2). COX-2 is involved in eicosanoid synthesis and appears to be important to both edemagenesis and the pattern of pulmonary perfusion in experimental ALI. Gust et al. showed that the effect of endotoxin on pulmonary perfusion in ALI could be, in part, the result of activation of inducible COX-2 [34]. Upregulation of the COX-2 gene is also linked to increased pulmonary microvascular permeability in a sheep model of combined burn and smoke inhalation injury [35].

We also showed that the lung-specific surfactant protein regulation transcription factor, CCAAT enhancer-binding protein (C/EBP), was upregulated in all VALI models. C/EBP has an important role in the regulation of expression of surfactant proteins A and D, which are heavily involved in pulmonary host defense and innate immunity [36], with increased gene expression in patients with ALI [37, 38]. Upregulation of C/EBP by severe lung injury [39] is highly correlated with our findings (1.4-fold increase in C/EBP expression, p = 0.013, Table 2). As endothelium does not generate surfactant, it will be of interest to identify the molecular targets of C/EBP in endothelium; these may include interleukin-13 (IL-13) [40] and cell chemokine 2 (CCL2) [41]. These genes belong to the 'Inflammatory Response' GO biological process that was rated by MAPPFinder as highly upregulated (Table 3).

The second most highly represented ontology in the ALI-related genes bioprocess was 'Blood Coagulation' (Table 3), a finding consistent with previous reports of increased levels of coagulation factor III (thromboplastin, tissue factor, F3) and plasminogen activator inhibitor type 1 (PAI-1) in patients with ALI [42-44] or VALI [45, 46]. Fibrinogen A (FGA) and plasminogen activator, urokinase receptor (PLAUR) are involved in IL-1β signaling and regulation, respectively. Fibrinogen indirectly activates transcription of IL-1β [47], which in turn increases expression of the urokinase receptor [48]. Interestingly, this bioprocess was identified by MAPPFinder solely on the basis of data generated by our ortholog algorithm, as in a single-species analysis, three out of four genes related to the blood coagulation bioprocess did not survive statistical filtering (Table 4).

The interconnection of coagulation and inflammation is well recognized in that inflammation leads to increased coagulation, relevant to ALI (for a review see [8]) and a likely link is vascular endothelium. There is some evidence that the 'cross-talk' between coagulation and inflammation could be reversed. Blood coagulation in vitro stimulates release of inflammatory mediators from neutrophils and endothelial cells [49, 50]. On the basis of these findings and data generated by our cross-species analysis of VALI, we speculate that mechanical stretch may produce either injury or activation of the pulmonary endothelium with activation of a coagulation cascade that may involve platelet aggregation. Procoagulation genes are therefore key participants in the early stages of VALI. Given that a multitude of inflammatory cytokines produce upregulation of the coagulation cascade, further studies of the time-course analysis of expression patterns of selected candidate genes in response to VALI are needed to clarify this paradigm.

In summary, our findings indicate that alterations in gene expression in response to mechanical ventilation alone can be detected by microarray techniques applied across diverse biological systems. Our data suggest that ortholog-link gene-expression analysis of multi-species VALI-simulating experimental systems is a useful tool in selecting candidate genes involved in this pathobiological process, with clear advantages over single-species analysis. We anticipate that predicted drawbacks such as incompleteness of gene representation on different array platforms and tissue-specific gene expression can be overcome by careful selection of array platforms and experimental models, respectively, as well as further improvements or refinements in the Affymetrix platform itself.

The ortholog gene-expression approach promotes application of the meta-analysis of multi-species gene-expression profiles in diverse human pathologic conditions and facilitates the selection of candidate genes of interest, with the emphasis on actively evolutionarily conserved genes.

Materials and methods

Animal models of acute lung injury (ALI)

Rats were anesthetized with 0.4 mL of etomidate (2 mg/ml) by intraperitoneal injection before cannulating the trachea for ventilation. Rats were then placed in heated water-jacketed chambers and core body temperature was adjusted to 37°C. The experimental group of rats (n = 2) was mechanically ventilated (12 ml/kg tidal volume, 150 breaths/min) while the control group (n = 2) breathed spontaneously. After 5 h ventilation the lungs were rapidly excised, snap frozen and stored at -80°C until processed for RNA isolation. Mice were anesthetized by intraperitoneal injection of ketamine (150 mg/kg) and acetylpromazine (15 mg/kg). The endotracheal intubation was performed and mice (n = 4) were exposed to high tidal volume (15 ml/kg; breathing rate = 92/min) ventilation for 2 h using a small animal mechanical ventilator; a control group (n = 3) was not ventilated. The excised lungs were snap-frozen and stored at -80°C.

Dogs were anesthetized, intubated, and the lungs were lavaged and either ventilated for 5 h (n = 4) or collected immediately following the lavage procedure (n = 3) as control tissues. Lungs were snap-frozen and stored at -40°C. All experimental protocols were approved by the Johns Hopkins University Animal Care Committee.

Human HPAEC cells (Clonetics), passages 6-8, grown on flexible-bottomed collagen I-coated BioFlex plates in the presence of complete culture medium (20% FCS) were exposed to cyclic stretch (25 cycles/min, 18% elongation) for 48 h (n = 4), as we have described [10], using the FlexerCell Tension Plus T-4000 cell culture stretch system, or remained static (n = 3).

The time-course of the experiments was selected according to the manifestation of the defining feature of ALI - vascular leakage.

RNA isolation and hybridization

Smaller frozen tissues (~50 mg) were directly solubilized in chaotropic solubilization buffer using a Brinkman Polytron tissue disruptor. Larger tissue fragments (>100 mg) were pulverized into frozen powder with a mortar and pestle, pre-chilled to liquid nitrogen temperature, and the frozen powder solubilized with the Polytron. RNA was purified using Trizol LS (Life Technologies) and an additional RNA purification step was conducted using the RNeasy purification kit (Qiagen). Approximately 10 μg of purified, total RNA was used for analyses. HPAEC total RNA was purified using Trizol LS and an additional RNA clean-up step was conducted using the RNeasy purification kit. Purified total RNA was reverse transcribed to first-strand cDNA using a hybrid primer consisting of oligo(dT) and T7 RNA polymerase promoter sequences. The single-stranded cDNA was then converted to double-stranded cDNA. Complementary DNA corresponding to 5-10 μg total RNA was used in a cRNA amplification step using T7 RNA polymerase and two biotinylated nucleotide precursors. The resulting biotinylated cRNA was fragmented to a size of approximately 50 bp. Approximately 20-30 μg of the biotinylated canine, mouse, rat and HPAEC cRNA was hybridized to U133A, U74A, U34A, and U95A GeneChips (Affymetrix), respectively. The bound cRNA was visualized by binding of streptavidin/phycoerythrin conjugates to the hybridized GeneChip, followed by laser scanning of bound phycoerythrin. These scan results are available on the HopGene website (Table 5).

Table 5 Data sets and analytical tools sources

Building the ortholog-linked database

Probe IDs of U74A, U34A, and U95A GeneChips were linked to their orthologous counterparts on the U133A using RESOURCERER (Table 5). This linkage identified 3,077 genes common to all array ortholog nodes (Figures 1, 2a), which were built around 2,887 reference genes from the U133A chip. The actual number of reference genes was lower, owing to the fact that in some cases multiple orthologs for the same reference gene are represented on the arrays (Figure 4). Any identified missing links between members of a node (Figure 2b) were filled, and the newly linked node members were coined as putative orthologs on the basis of their homology to the common reference gene. In total, the final ortholog-linked database contained 2,887 reference probe sets from U133A, and 2,631, 2,365 and 2,848 ortholog probe sets from U95A, U74A and U34A, respectively. The unequal numbers are due to the sharing of the same ortholog by different reference genes (Table 1).
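The gap-filling step can be sketched as follows. This is a minimal illustration, assuming that each ortholog node is stored as a mapping from array name to the probe IDs linked to one U133A reference gene, and that the pairwise links already present in the RESOURCERER tables are available as a set. The function name and the probe IDs are hypothetical.

```python
from itertools import product

def putative_links(node, known_links):
    """Return pairs of non-reference probes in a node that lack a direct link.

    node: dict mapping array name -> set of probe IDs orthologous to one
          reference gene.
    known_links: set of frozensets, each a directly linked pair of probe IDs.
    """
    arrays = sorted(node)
    missing = set()
    for i, a in enumerate(arrays):
        for b in arrays[i + 1:]:
            # Every cross-array combination is a predicted pair; those not
            # found among the known links become putative orthologs.
            for pa, pb in product(node[a], node[b]):
                pair = frozenset((pa, pb))
                if pair not in known_links:
                    missing.add(pair)
    return missing

# Hypothetical node mirroring the ODC-1 example: the mouse and rat probes are
# each linked to the human probes but not directly to each other.
node = {"U95A": {"h95_probe"}, "U74A": {"m74_probe"}, "U34A": {"r34_probe"}}
known = {
    frozenset(("h95_probe", "m74_probe")),
    frozenset(("h95_probe", "r34_probe")),
}
new_links = putative_links(node, known)
```

The difference between predicted and existing pairs corresponds to the number of putative orthologs created, as outlined in Figure 2b.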

Expression-data analysis

The signal intensity fluorescent images produced during Affymetrix GeneChip hybridizations were read using the Agilent Gene Array Scanner and converted into GeneChip Cell files (CEL) using MAS 5.0 software (Affymetrix). The analysis of the probe-level data (available in the .CEL files) was performed using the Bioconductor affy package [24, 51]. In particular, the package was used to extract the probe-level data and convert them into expression measures of individual probe pairs. Various strategies have been used, which in general involve three steps: background correction, across-array normalization, and summarization. For the analysis presented here, we utilized the mas5 module of the affy package [52]. This probe-level analysis was conducted in all species tested and poorly performing probe pairs thus revealed were masked before converting CEL files into GeneChip Sequence files (CHP) using MAS 5.0. The signal intensity values obtained for U95Av2, U74A and U34A were used directly for analysis, and those of the hetero-hybridized U133A arrays were adjusted on the basis of differences in canine and human gene sequences (the detailed procedure and validation of probe-level analysis and probe-set modification will be described elsewhere). This approach increased the present call of a transcript (p < 0.04) by 25% on average, compared to unadjusted probe-set processing (Figure 3b). The remaining absent calls (p > 0.06) for transcript abundance were assumed to be a result of undetectable message concentration (<1 pM [53]) rather than technical or detection errors. Therefore, absent calls for each GeneChip were averaged and all absent calls for a given chip were replaced with this average value. This modified dataset was used for further analysis.
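The absent-call replacement can be sketched in a few lines. The example assumes that each chip is represented by parallel lists of signal values and MAS 5.0 detection calls ('P', 'M' or 'A'). The function name and the numbers are hypothetical and serve only to illustrate the averaging rule described above.

```python
from statistics import mean

def replace_absent_calls(signals, calls):
    """Replace every absent-call signal with the chip's mean absent-call signal.

    signals: list of signal intensities for one chip.
    calls:   list of detection calls ('P', 'M' or 'A'), same order as signals.
    """
    absent = [s for s, c in zip(signals, calls) if c == "A"]
    if not absent:
        return list(signals)  # nothing to replace on this chip
    floor = mean(absent)      # chip-specific substitute value
    return [floor if c == "A" else s for s, c in zip(signals, calls)]

# Hypothetical chip with three absent calls (illustrative values only).
signals = [10.0, 500.0, 30.0, 800.0, 20.0]
calls = ["A", "P", "A", "P", "A"]
adjusted = replace_absent_calls(signals, calls)
```

Present and marginal signals are left unchanged, so every probe set on the chip retains a usable value for downstream statistics.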

Selecting significant gene-expression changes using a single experiment or orthologous approaches

The data from each orthologous group (reference gene and its orthologs) were pooled in four species-specific groups. For statistical analysis, each species-specific group contributed three control and four mechanical stretch-challenged samples, except for rat, which had two control and two challenged lung samples. Therefore, the dataset for each ortholog group comprised 11 control and 14 stress-challenged samples. However, for more than 60% of analyzed array probes, multiple paralogs existed in each species-specific group (Figure 3). To maintain the three-control/four-challenge input from these groups, we averaged the expression data of multiple paralogs on a chipwise basis (only data for paralogs from the same chip were averaged). At the same time, the data were scaled using row-wise average normalization. A two-tailed, unequal-variance independent t-test was performed on ortholog-generated expression datasets (11 controls vs 14 mechanical stretch samples) or individual experiments (three controls vs four mechanical stretch samples for the canine, HPAEC and mouse models, and two controls vs two mechanical stretch samples for the rat model), and p < 0.05 was used as the cutoff for changes in gene expression to produce preliminary lists of candidate genes. The fold-change ratio was computed from the mean values of the control and mechanical stretch sets used in the t-test.
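The chipwise paralog averaging and the unequal-variance (Welch) t-test can be sketched in plain Python. The sketch below is an illustration only, not the software used in the study. The function names and expression values are hypothetical, the fold change is assumed to be the ratio of the stretch mean to the control mean, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule instead of calling a statistics library.

```python
import math
from statistics import mean, variance


def average_paralogs(paralog_rows):
    """Average several paralog probe sets chip by chip (one value per chip)."""
    return [mean(chip_values) for chip_values in zip(*paralog_rows)]


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * P(0 < T < |t|), via Simpson's rule (even steps)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


# Invented, already normalized expression values for one gene in one model.
control = [1.0, 1.1, 0.9]
stretch = average_paralogs([[1.4, 1.5, 1.3, 1.6],   # paralog A, four chips
                            [1.6, 1.7, 1.5, 1.8]])  # paralog B, same chips
t_stat, dof = welch_t(stretch, control)
p_value = two_tailed_p(t_stat, dof)
fold_change = mean(stretch) / mean(control)  # assumed direction: stretch / control
print(t_stat, dof, p_value, fold_change)
```

A gene would enter the preliminary candidate list when `p_value` is below 0.05. The same functions apply unchanged to the pooled 11-control vs 14-stretch ortholog dataset.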

Gene ontology (GO) analysis

The MAPPFinder software is not yet compatible with the U133A probe sets. To overcome this incompatibility, we substituted the U133A probe IDs with the corresponding MAPPFinder-compatible U95A probe IDs. To link our gene list to the corresponding GenBank accession numbers, and consequently to GO terms, we utilized GenMAPP software and linked 2,278 of the 2,887 reference genes to GO. We then analyzed these results repeatedly (nine cycles) with MAPPFinder, varying the fold-change limit from ±1.1 to ±1.5 in increments of 0.05 (Figure 5). GO biological process assignments were selected and filtered by Z-score (>0) and the number of hits in the first GO node (>0). GO terms simultaneously identified as both down- and upregulated bioprocesses were selected using Microsoft Access 2000 and considered to be inversely regulated biological processes. If the results represented unique down- or upregulated GO bioprocesses, these processes were considered to be co-regulated (Figure 5). The point at which the number of shared GO terms became constant was selected as the threshold fold-change cutoff (±1.3). The analysis depicted in Table 3 was conducted on a candidate set of 69 genes, using GO and local MAPPs with the fold change set at 1.3 or higher and p < 0.05.
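The fold-change sweep and the choice of the plateau point can be sketched as follows. The sketch does not call MAPPFinder or GenMAPP. The function names are hypothetical, the counts of shared GO terms are invented placeholders and not the values behind Figure 5, and "became constant" is assumed here to mean the first cutoff from which the count no longer changes over the remainder of the sweep.

```python
def fold_change_cutoffs(start=1.1, stop=1.5, step=0.05):
    """Generate the sweep of fold-change limits (nine values for 1.1 to 1.5 by 0.05)."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 2) for i in range(n)]


def plateau_cutoff(cutoffs, shared_counts):
    """Return the first cutoff from which the shared-term count stays constant."""
    for i, cutoff in enumerate(cutoffs):
        if len(set(shared_counts[i:])) == 1:
            return cutoff
    return None


cutoffs = fold_change_cutoffs()
# Invented counts of GO terms flagged as both up- and downregulated
# at each cutoff; placeholders only, not results from the study.
shared_counts = [12, 9, 7, 5, 4, 4, 4, 4, 4]
threshold = plateau_cutoff(cutoffs, shared_counts)
print(cutoffs, threshold)
```

With real data, each entry of `shared_counts` would come from one MAPPFinder run at the corresponding cutoff, after filtering by Z-score and first-node hits.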

Journal articles that referenced lung, lung injury, mechanical ventilation, endothelium, or pulmonary endothelium together with our candidate genes were retrieved using the PubMatrix tool [24]. URLs for the statistical tools and analytical software employed in our analysis are listed in Table 5.

Additional data files

The following additional data files are available with the online version of this article: a final list of reference genes (Additional data file 1), a list of gene candidates (Additional data file 2), the full GO results (Additional data file 3), and data for Table 4 (Additional data file 4).