An expression module of WIPF1-coexpressed genes identifies patients with favorable prognosis in three tumor types
- Cite this article as:
- Staub, E., Groene, J., Heinze, M. et al. J Mol Med (2009) 87: 633. doi:10.1007/s00109-009-0467-y
Wiskott–Aldrich syndrome (WAS) predisposes patients to leukemia and lymphoma. WAS is caused by mutations in the protein WASP which impair its interaction with the WIPF1 protein. Here, we aim to identify a module of WIPF1-coexpressed genes and to assess its use as a prognostic signature for colorectal cancer, glioma, and breast cancer patients. Two public colorectal cancer microarray data sets were used for discovery and validation of the WIPF1 coexpression module. Based on expression of the WIPF1 signature, we classified more than 400 additional tumors with microarray data from our own experiments or from publicly available data sets. This allowed us to separate patient populations for colorectal cancers, breast cancers, and gliomas for which clinical characteristics like survival times and times to relapse were analyzed. Groups of colorectal cancer, breast cancer, and glioma patients with low expression of the WIPF1 coexpression module generally had a favorable prognosis. In addition, the majority of WIPF1 signature genes are individually correlated with disease outcome in different studies. Literature gene network analysis revealed that known direct transcriptional targets of c-myc, ESR1, and p53 are enriched among WIPF1-coexpressed genes. The mean expression profile of WIPF1 signature genes is correlated with the profile of a proliferation signature. The WIPF1 signature is the first microarray-based prognostic expression signature primarily developed for colorectal cancer that is also informative in other tumor types: low expression of the WIPF1 module is associated with better prognosis.
Keywords: Colorectal cancer, WIPF1, Prognosis, Expression signature, Microarray
The WIPF1 gene encodes the WASP/WASL interacting protein family member 1 that plays an important role in the organization of the actin cytoskeleton [1, 2]. The WIPF1-encoded protein WIP binds to a region of Wiskott–Aldrich syndrome protein (WASP) that is frequently mutated in patients with Wiskott–Aldrich syndrome (WAS) [3, 4], and WIP mutations themselves lead to an immunological disorder resembling Wiskott–Aldrich syndrome. WAS is an X-linked recessive disease that predisposes to leukemia and lymphoma. The WIP protein is essential for WASP synthesis and probably acts as its chaperone. Disruption of the WASP-WIP interaction by hereditary mutations leads to a rounded cell surface on immune cells, a conversion that is thought to coincide with a diminished capability to form immune synapses and a reduction of NK cell cytotoxicity. WIP is important for podosome formation in macrophages and cellular fusions in flies [8, 9], underlining its general role in cell membrane remodeling. Apart from its expression in diverse immune cells, several human tissues exhibit WIP expression. Little is known about the expression of WIPF1 in solid tumors. However, WIPF1 expression levels influence morphology and migration of fibroblasts. This prompted us to investigate the expression characteristics of WIPF1 in colorectal tumors with the aim of studying its potential for prognosis.
A multitude of microarray studies have been carried out during the past decade to gain a better understanding of basic colorectal cancer (CRC) biology [11, 12, 13, 14, 15, 16, 17, 18]. Other CRC microarray studies led to the discovery of informative gene sets for the prediction of the response to therapy or tumor recurrence [19, 20, 21, 22, 23], diagnosis of tumor stage [24, 25, 26, 27], lymph node metastasis [28, 29, 30], or liver metastasis [31, 32]. Until now, cross-validation of diagnostic or prognostic signatures with independent data sets has hardly been performed for colorectal cancer. This is probably because published signatures overlap only to a small degree and are difficult to reproduce when they originate from different laboratories and platforms (for a discussion see Groene et al.). In addition, until recently, data sets with sufficient patient information were lacking in public databases, which hindered cross-validation of signatures from different studies.
Here, we describe the identification of a set of genes that is coexpressed with WIPF1. It was discovered through re-analysis of two public microarray data sets on clinical colorectal cancer specimens that were deposited in the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo). Whereas the first data set was used for discovery of WIPF1-coexpressed genes, the second data set served for validation of the expression correlations. Using a simple classification algorithm trained on microarray data of WIPF1-coexpressed genes from the two studies, we identified patients with characteristic expression of the WIPF1 coexpression module in three further microarray data sets with information about survival or relapse of patients: a colorectal cancer data set of our own, a breast cancer data set, and a high-grade glioma data set. Strikingly, patients with low expression of the WIPF1 signature have the best prognosis in all three data sets, which together comprise more than 400 patients. Based on an analysis of the WIPF1 coexpression module in the context of literature-based gene networks, we identified plausible regulatory mechanisms responsible for lower WIPF1 module expression in patients with better prognosis.
Materials and methods
For the generation of our own microarray data set, 62 CRC patients undergoing elective standard oncological resection at the Department of General, Vascular, and Thoracic Surgery, Campus Benjamin Franklin, Charité, were prospectively recruited. Several clinical characteristics of the patients were recorded (see Supplementary Table 1). The study was approved by the local ethical committee, and informed consent was obtained from all patients.
Pre-processing of frozen tissue blocks by laser-capture microdissection was essentially performed as described in our previous publications [13, 14, 24, 33]. Briefly, all cancer specimens were snap frozen within 20 min following excision. For laser-capture microdissection, frozen tissue specimens were serially cut into 6- to 8-µm-thick sections which were mounted on a sterile 2.5 mm membrane. Slides were fixed in 70% ethanol. The sections were briefly stained with hematoxylin and eosin, dehydrated in ethanol, and dried for 10–15 min using an exsiccator. The membrane was turned around and fixed with adhesive tape on another sterile slide. First slides served as a template on which the areas of tumor or normal epithelium were marked. On the consecutive section, these areas were microdissected using a laser microdissection system (SL, Jena, Germany and P.A.L.M. Microlaser Technologies AG, Bernried, Germany) and capture transfer films (Arcturus GmbH, Moerfelden-Walldorf, Germany). For molecular analysis, up to 100,000 cells or approximately 30–60 mm² of tissue section areas were pooled and collected in ice-cooled tubes containing 100 ml of 98% guanidine thiocyanate (GTC) buffer and 2% beta-mercaptoethanol.
Messenger RNA preparation and DNA chip hybridization
PolyA mRNA from the microdissected specimens was prepared using the PolyA-tract 1000 kit (Promega, Heidelberg, Germany) according to the manufacturer’s recommendations. For each sample, cDNA synthesis and in vitro transcription were performed three times in succession. The total amount of prepared mRNA from each sample was used. First strand cDNA synthesis was initiated using the Affymetrix T7-oligo-dT promoter–primer combination at 0.1 mM. Second strand cDNA was generated by internal priming. In vitro transcription was performed using the Megascript kit (Ambion, Huntingdon, UK) as recommended by Ambion. From the generated cRNA, a new first strand synthesis was initiated using 0.025 mM of a random hexamer as primer. After completion, the second strand synthesis was performed using the Affymetrix T7-oligo-dT promoter–primer combination. A second in vitro transcription was performed, and then the procedure was repeated one additional time. During the last in vitro transcription, biotin-labeled ribonucleotides were incorporated into the cRNA, as recommended by the Affymetrix protocol. Hybridization and detection of the labeled cRNA on the Affymetrix U133A Chip were performed according to the Affymetrix standard protocol.
Microarray data pre-processing
Public expression data were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/projects/geo/). In addition to our own data, which were deposited in GEO with accession number GSE12945, we used four different data sets from this repository. The colorectal cancer data sets GSE5206 of the Aronow group (see Kaiser et al.), GSE7208 of Ayers and co-workers, and our own data set served for discovery and validation of the WIPF1 signature in CRC. The GSE2034 data set of Wang et al. served for assessment of the predictive power of the WIPF1 signature for breast cancer patients. The data set of Phillips et al. (GSE4271) served for assessment of the predictive power of the WIPF1 signature in high-grade glioma patients. We refer to the original publications and the GEO database for patient and sample characteristics.
For our own microarray experiment, we used algorithms implemented in the freely available statistical software package R (http://www.r-project.org/) and its public package repositories CRAN (http://cran.r-project.org/) and the bioinformatics R package repository Bioconductor (http://www.bioconductor.org/). Preprocessing: Raw expression data were condensed to probe set-wise intensity values using the RMA algorithm. For normalization across experiments, all colorectal cancer data sets were pre-processed using the same data transformations. If not already done, the raw data were log transformed. Independently of the original authors’ pre-processing, we quantile-normalized each data set on the probe set level. Then we filtered out the 10% of probesets with the lowest median expression and the 10% of probesets with the lowest variance to enrich informative probesets in an unbiased way. We restricted the further analysis to probesets passing this filter in each of the three data sets. Probeset annotations for Affymetrix expression microarrays (most importantly gene symbols) were retrieved from the Affymetrix web site (version 22). We condensed probeset signals on the gene symbol level by averaging across all remaining probesets of a gene. On the gene level, each data set was finally mean centered. Breast cancer and glioma data were processed in the same way, but here no additional probesets were filtered out due to low variance or mean expression signals in these data sets. Instead, for mapping of probeset expression intensities onto the gene level, we considered all probesets that were used for probeset-to-gene mapping during processing of the three CRC data sets. Using the applied normalization scheme, we intend to ensure that Affymetrix microarray data from the five different studies are comparable (we note that our study does not include a cross-platform comparison as all data sets were generated using Affymetrix gene chips).
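The core transformations described above (log transformation, quantile normalization across samples, and gene-wise mean centering) can be illustrated with a minimal Python sketch. It is not the R/Bioconductor code used in the study: the function names are hypothetical, the expression matrix is assumed to be a list of rows (probesets or genes) with one column per sample, ties are broken by row order, and the RMA summarization and the 10% filters are omitted.

```python
import math
from statistics import mean


def log2_transform(matrix):
    """Log2-transform a rows x samples matrix of positive raw intensities."""
    return [[math.log2(v) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every sample (column) the same empirical distribution.

    The k-th smallest value of each column is replaced by the mean of the
    k-th smallest values over all columns. Ties are broken by row order,
    which is a simplification of common implementations.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # For each column, the row indices ordered by increasing value.
    order = [sorted(range(n_rows), key=lambda i, j=j: matrix[i][j])
             for j in range(n_cols)]
    # Reference distribution: mean across columns at each rank.
    reference = [mean(matrix[order[j][k]][j] for j in range(n_cols))
                 for k in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(order[j]):
            out[i][j] = reference[rank]
    return out


def mean_center_rows(matrix):
    """Subtract each row's mean so that every gene has mean 0 within a study."""
    centered = []
    for row in matrix:
        m = mean(row)
        centered.append([v - m for v in row])
    return centered
```

Centering each gene within its own study removes study-specific offsets, which is what allows a classifier trained on one data set to be applied to another.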
Expression data mining
Expression data mining was carried out using the statistics software R supplemented with diverse packages from the CRAN or Bioconductor projects. The correlation of two expression profiles was evaluated with Pearson correlation coefficients determined with the function cor.test in package stats. Using the same function, we determined p values for the significance of the deviation of the correlation coefficient from 0. The average expression profile of a multigene expression signature (proliferation signature by Rosenwald et al.), here denoted as signature centroid, was determined by averaging across signature genes for each patient. For tumor class discovery, we applied hierarchical clustering using the heatmap.2 function of the gplots package in R. The distance matrices for row and column clustering were determined using pairwise correlation distances (d = 0.5 (1 − cor(x,y))) of the gene-wise mean-centered expression intensities of genes and samples, respectively. For clustering, we used the complete linkage hierarchical clustering algorithm.
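The correlation distance and the signature centroid defined above can be written out explicitly. The following is a minimal Python sketch under stated assumptions, not the R code of the study: the function names and the dictionary layout (gene symbol mapped to a list of per-patient values) are hypothetical, and no p values are computed.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """d = 0.5 * (1 - cor(x, y)); 0 for identical, 1 for opposite profiles."""
    return 0.5 * (1.0 - pearson(x, y))


def signature_centroid(expr, genes):
    """Average expression across signature genes for each patient.

    expr  : dict mapping gene symbol -> list of per-patient values
    genes : signature genes to average (all must be keys of expr)
    """
    n_patients = len(expr[genes[0]])
    return [sum(expr[g][p] for g in genes) / len(genes)
            for p in range(n_patients)]
```

The factor 0.5 scales the distance to the interval [0, 1], so perfectly anticorrelated profiles are maximally distant rather than treated as similar.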
In the preceding paragraph, we described how we identify tumor classes in a training data set using unsupervised analysis (clustering). Using classification (supervised analysis), we then attempted to detect the tumor classes that we identified on a primary (training) data set in tumors of secondary (test) data sets from independent studies. As a classification algorithm, we used k-nearest neighbor classification as implemented in the R package class (function knn with k = 9). The classifiers were trained on tumor expression profiles of the training data and then directly applied to test data sets. A prerequisite for our classifier to work properly on the test data (here solely external data sets from independent studies) is that training and test data are sufficiently normalized, which in our study should be ensured by log-transforming the expression values followed by gene-wise mean centering. Across-study normalization based on intra-study mean centering of log-transformed expression intensities from Affymetrix chips was already shown by Lusa et al. to be a pre-processing strategy that can be the basis of good classifier performance when the aim is to construct gene expression-based predictors for tumor classes across studies. However, the authors also stated that best classifier performance can only be expected if the fraction of tumor classes in the different data sets is comparable. In our own validation of our methodology, we found that k-nearest neighbor-based predictors of estrogen receptor status in breast cancers based on an estrogen-responsive set of genes achieve prediction accuracies on external data sets of ~90% on average on gene-wise mean-centered expression data (based on four Affymetrix U133A microarray data sets, data available upon request, manuscript in preparation). Classifier performance was still at 76% when the tumor classes were not balanced (e.g., 1:8 in test data).
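The classification step described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the knn function of the R package class: the function name is hypothetical, Euclidean distance and a simple majority vote are assumed, and ties are resolved deterministically rather than at random (with two classes and an odd k, ties cannot occur).

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, test_x, k=9):
    """Assign each test profile the majority class of its k nearest
    training profiles (Euclidean distance on mean-centered expression).

    train_x : list of training expression profiles (lists of floats)
    train_y : class label of each training profile
    test_x  : list of profiles to classify
    """
    predictions = []
    for sample in test_x:
        # Distance from this test sample to every training sample.
        neighbors = sorted(
            zip((math.dist(sample, tx) for tx in train_x), train_y),
            key=lambda pair: pair[0],
        )
        votes = Counter(label for _, label in neighbors[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```

Because the classifier compares raw distances between profiles from different studies, it depends on the gene-wise mean centering described above to put training and test data on a common scale.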
We note that we did not attempt to assess classifier accuracy using cross-validation on the primary data because we derived the tumor classes by data mining in the complete primary data set: it is obvious that a numerical difference between classes exists. Therefore, the assessment of classifier performance using cross-validation could lead to a serious overestimation of classifier accuracy. The reason is that even if test cases in cross-validation are not used for classifier training, they were already included in the initial clustering analysis that led to the assignment of class labels. This violates a main principle of cross-validation, the independence of training and test data.
Patient survival and relapse were visualized with Kaplan–Meier curves determined with the survfit function in the survival package. The logrank test as implemented in the coxph function of the survival package was used to assess the significance of differences in survival/relapse times between patient groups. The significance of association of continuous variables, here gene expression intensities, with survival/relapse was tested using Cox regression and Wald tests on the model coefficients and their variances as implemented in the coxph function. For survival analyses, we used robust estimates of Cox model coefficient variances (parameter robust=T). If not otherwise indicated, default parameter settings were used in the functions mentioned above.
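To make the survival statistics above concrete, the following minimal Python sketch computes a Kaplan–Meier curve and a two-group logrank test. It is a plain textbook reimplementation under stated assumptions, not the survfit or coxph code of the survival package: function names are hypothetical, events are coded 1 (event observed) or 0 (censored), the logrank p value uses the chi-square distribution with one degree of freedom, and at least one event must occur while both groups are at risk.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test; returns (chi-square statistic, p value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Censored patients contribute to the number at risk until their last follow-up but never count as events, which is why they lower later denominators without producing a step in the curve.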
Literature-based gene networks were investigated using the MetaCore software from GeneGo (Saxony Road, #104, Encinitas, CA 92024, USA). Subnetworks of genes with functional links based on literature evidence were screened for enrichment of genes coexpressed with WIPF1. Those networks with significant enrichment of WIPF1 genes were further investigated for enrichment of Gene Ontology categories. The significance of enrichment of either user-supplied gene lists (like the gene list of the WIPF1 signature) or gene lists associated with Gene Ontology (GO) terms in gene lists of literature subnetworks is given by hypergeometric p values (with the complete set of human genes with literature information as a reference set).
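A hypergeometric enrichment p value of the kind mentioned above can be computed directly from binomial coefficients. The sketch below is a generic illustration in Python, not the MetaCore implementation; the function and parameter names are hypothetical, and the p value is taken as the upper tail, i.e., the probability of observing at least the given overlap by chance.

```python
from math import comb


def hypergeom_enrichment_p(n_reference, n_list, n_network, n_overlap):
    """P(overlap >= n_overlap) when n_network genes are drawn at random.

    n_reference : size of the reference gene set
    n_list      : genes of the supplied list contained in the reference set
    n_network   : number of genes in the subnetwork
    n_overlap   : genes of the list found in the subnetwork
    """
    max_overlap = min(n_list, n_network)
    total = comb(n_reference, n_network)
    # Sum the probabilities of all overlaps at least as large as observed.
    tail = sum(
        comb(n_list, i) * comb(n_reference - n_list, n_network - i)
        for i in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

For example, with a reference set of 10 genes, a list of 5 genes, and a subnetwork of 5 genes that contains all 5 list genes, the call `hypergeom_enrichment_p(10, 5, 5, 5)` evaluates 1/252.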
Discovery and validation of a WIPF1 coexpression module
Prediction of low-risk colorectal cancers using the WIPF1 signature
The WIPF1 signature has prognostic predictivity for brain and breast tumors
Summary of WIPF1 correlation and survival/relapse association (Cox regression Wald test) for 38 core genes of the WIPF1 module and WIPF1 itself
The WIPF1 signature has links to c-myc, p53, proliferation, and apoptosis
Summaries of top literature subnetworks enriched with genes from the WIPF1 coexpression module. Enriched GO processes are listed with (percentage; p value).
General Molecular Network
- Genes: ADAM19, SLC25A10, CDC14a, UBE2E3, TXNIP (VDUP1), ...
  GO processes: Sulfate transport (8.1%; 7.943e−06), cell division (18.9%; 1.334e−05), mitosis (16.2%; 1.414e−05), M phase of mitotic cell cycle (16.2%; 1.598e−05), M phase (18.9%; 1.854e−05)
- Genes: REA, NLK, Chordin-like 1, Copine-1, ...
  GO processes: BMP signaling pathway (11.6%; 8.328e−08), positive regulation of osteoblast differentiation (9.3%; 1.255e−06), regulation of osteoblast differentiation (9.3%; 7.241e−06), transmembrane receptor protein serine/threonine kinase signaling pathway (11.6%; 1.533e−05), developmental process (67.4%; 2.095e−05)
- Genes: Neurofibromin, TXNIP (VDUP1), REA, DEDD, DEDD2, ...
  GO processes: Regulation of apoptosis (45.5%; 2.153e−13), regulation of programmed cell death (45.5%; 2.710e−13), regulation of developmental process (52.3%; 2.099e−12), Ras protein signal transduction (22.7%; 1.044e−11), negative regulation of cellular process (54.5%; 1.944e−11)
Transcriptional Regulation Network
- GO processes: Positive regulation of mitotic cell cycle (25.0%; 1.015e−05), cell cycle (62.5%; 3.803e−05), regulation of mitotic cell cycle (37.5%; 5.661e−05), regulation of cell cycle (50.0%; 5.920e−05), positive regulation of cell cycle (25.0%; 3.432e−04)
- GO processes: Response to hormone stimulus (57.1%; 2.915e−05), response to endogenous stimulus (57.1%; 3.184e−05), response to organic nitrogen (28.6%; 1.001e−04), response to steroid hormone stimulus (42.9%; 2.113e−04), negative regulation of hydrolase activity (28.6%; 2.580e−04)
- GO processes: Response to organic nitrogen (28.6%; 1.001e−04), positive regulation of cell cycle (28.6%; 2.580e−04), regulation of apoptosis (57.1%; 4.295e−04), nucleic acid–protein covalent cross-linking (14.3%; 4.446e−04), RNA–protein covalent cross-linking (14.3%; 4.446e−04)
The hypothesis that the WIPF1 gene is important for cancer development was based on two facts. First, its encoded WIP protein interacts with the Wiskott–Aldrich syndrome protein WASP through a surface that is affected by WASP mutations, and Wiskott–Aldrich syndrome predisposes to leukemia and lymphoma. Second, expression levels of the WIP protein influence the migratory and differentiation properties of fibroblasts. It has not been studied so far how the expression of the WIPF1 gene is regulated and whether its coexpression neighborhood provides an additional link to cancer. Here, we show that there exists a module of genes that is coexpressed with WIPF1 in colorectal cancers. The majority of genes in this module show a characteristic down-regulation in cancer patients with longer survival time or time to relapse, also in cancer types other than colorectal cancer. We found that the module genes do not overlap with the frequently rediscovered “proliferation” signature that is regulated during cancer cell mitosis. Instead, a large number of genes of the WIPF1 coexpression module have poorly characterized functions. Only single genes link directly to cancer-relevant processes like proliferation and apoptosis. However, we could show that the expression profile of the WIPF1 signature correlates significantly with the expression profile of the Rosenwald proliferation signature. Literature networks revealed that the link of the WIPF1 module to proliferation can partly be explained by the fact that a large fraction of WIPF1 module genes are known transcriptional targets of cancer-related transcription factors like c-myc, ESR1, or p53. In this context, it is interesting to note that estrogen receptor signaling is not only of importance for breast cancers, but also apparently able to modulate the aggressiveness of prostate cancers.
It is tempting to hypothesize that keeping the WIPF1 module in a low expression state is causative for a less aggressive cancer phenotype, e.g., by inhibition of WIPF1/WASP-related cytoskeletal remodeling, which would coincide with a reduced ability of cells to migrate and metastasize.
In conclusion, we presented a module of WIPF1-coexpressed genes. The expression signature of this module could be used to identify patients with better prognosis with respect to relapse or survival in expression data sets of three different tumor types, colorectal cancer, breast cancer, and high-grade glioma. The WIPF1 coexpressed genes seem to be linked to proliferation and apoptosis possibly by regulation through c-myc, ESR1, and p53. We propose the WIPF1 signature as an alternative predictor of breast, brain, and colorectal cancer prognosis.
We thank Anja von Heydebreck for critical reading of the manuscript and valuable comments.
Disclosure of potential conflict of interests
The authors declare that they have no conflicting interests related to this study.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.