Identification of Biomarkers for Prostate Cancer Prognosis Using a Novel Two-Step Cluster Analysis
Prognosis of prostate cancer is challenging because clinical variables such as Gleason score, metastasis stage, surgical margin status, seminal vesicle invasion status and preoperative prostate-specific antigen level give an incomplete assessment. Whole-genome gene expression assays offer an opportunity to identify molecular indicators that predict disease outcomes. However, heterogeneity in the cell composition of tissue samples usually produces inconsistent results in cancer profiling studies. We developed a two-step strategy to identify prognostic biomarkers for prostate cancer that takes into account the variation due to mixed tissue samples. In the first step, an unsupervised expectation-maximization (EM) clustering analysis was applied to each gene to cluster patient samples into subgroups based on the expression values of that gene. In the second step, genes were selected based on a χ² correlation analysis between the cluster indicators obtained in the first step and the observed clinical outcomes. Two simulation studies showed that the proposed method identified 30% more prognostic genes than traditional differential expression analysis methods such as SAM and LIMMA. We also analyzed a real prostate cancer expression data set using both the new method and the traditional methods. The pathway assay showed that the genes identified with the new method are significantly enriched for prostate cancer-relevant pathways such as the Wnt signaling pathway and the TGF-β signaling pathway. These genes, however, were not detected by the traditional methods.
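The two-step strategy can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the number of clusters, the mixture model, the form of the clinical outcome or the significance threshold. The sketch assumes two Gaussian clusters per gene, a binary outcome, a 2×2 Pearson χ² test with one degree of freedom, and an uncorrected cutoff of 0.05. The function names (`em_two_clusters`, `chi2_2x2`, `select_genes`) and the toy data are hypothetical.

```python
import math
import random


def em_two_clusters(values, n_iter=100):
    """Step 1: fit a two-component 1-D Gaussian mixture by EM; return 0/1 labels."""
    lo, hi = min(values), max(values)
    mu = [lo, hi]                          # initialise the means at the extremes
    spread = (hi - lo) / 4 or 1.0          # fall back to 1.0 if all values are equal
    var = [spread ** 2, spread ** 2]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to component 1
        resp = []
        for x in values:
            dens = [
                weight[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in (0, 1)
            ]
            total = dens[0] + dens[1]
            resp.append(dens[1] / total if total > 0 else 0.5)
        # M-step: update mixing weights, means and variances
        for k in (0, 1):
            r = [p if k == 1 else 1 - p for p in resp]
            nk = sum(r) or 1e-12
            weight[k] = nk / len(values)
            mu[k] = sum(ri * x for ri, x in zip(r, values)) / nk
            var[k] = max(
                sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, values)) / nk, 1e-6
            )
    return [1 if p > 0.5 else 0 for p in resp]


def chi2_2x2(labels, outcomes):
    """Step 2: Pearson chi-square statistic and p-value (1 d.f.) for two 0/1 vectors."""
    n = len(labels)
    table = [[0, 0], [0, 0]]
    for a, b in zip(labels, outcomes):
        table[a][b] += 1
    rows = [sum(table[0]), sum(table[1])]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in rows or 0 in cols:
        return 0.0, 1.0                    # degenerate table: no evidence of association
    stat = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in (0, 1)
        for j in (0, 1)
    )
    # For 1 d.f. the chi-square survival function equals erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def select_genes(expression, outcomes, alpha=0.05):
    """Cluster each gene separately, then keep genes whose clusters track the outcome."""
    selected = {}
    for gene, values in expression.items():
        labels = em_two_clusters(values)
        _, p_value = chi2_2x2(labels, outcomes)
        if p_value < alpha:                # assumption: no multiple-testing correction
            selected[gene] = p_value
    return selected


# Toy data (hypothetical): 40 patients, 20 with each outcome
random.seed(0)
outcomes = [0] * 20 + [1] * 20
expression = {
    "gene_signal": [random.gauss(0, 1) for _ in range(20)]
    + [random.gauss(5, 1) for _ in range(20)],
    "gene_noise": [random.gauss(0, 1) for _ in range(40)],
}
selected = select_genes(expression, outcomes)
```

Because each gene is clustered on its own expression values before the outcome is consulted, a gene whose expression differs only in a subgroup of patients can still yield cluster labels that associate with the outcome. A genome-wide analysis would also need a multiple-testing correction, which this sketch omits.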