Semisupervised Profiling of Gene Expressions and Clinical Data
We present an application of BioDCV, a computational environment for semisupervised profiling with Support Vector Machines, aimed at detecting outliers and deriving informative subtypes of patients with respect to pathological features. First, a sample-tracking curve is extracted for each sample as a by-product of the profiling process. The curves are then clustered according to a distance derived from Dynamic Time Warping. The procedure allows identification of noisy cases, whose removal is shown to improve predictive accuracy and the stability of the derived gene profiles. After the outliers are removed, the semisupervised process is repeated and patient subgroups are identified. The procedure is demonstrated through the analysis of a liver cancer dataset of 213 samples described by 1,993 genes and by pathological features.
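The abstract does not define the sample-tracking curves or name the clustering algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each curve is a sequence of per-sample misclassification rates recorded along successive feature-elimination steps (curves may differ in length), uses the classic DTW recursion with an absolute-difference local cost as the distance, and substitutes plain average-linkage hierarchical clustering. The function names `dtw_distance` and `average_linkage` and the toy curves are hypothetical and are not part of BioDCV.

```python
from itertools import combinations
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| local cost.

    Assumption: raw DTW is used here; the paper only says the distance
    is "derived from" DTW.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def average_linkage(dist: List[List[float]], k: int) -> List[int]:
    """Agglomerative clustering (average linkage) down to k clusters.

    Hypothetical stand-in for the clustering step; labels are numbered
    by the smallest item index in each cluster.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            # Mean pairwise distance between the two clusters
            d = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
            d /= len(clusters[p]) * len(clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best
        clusters[p].extend(clusters[q])  # q > p, so index p stays valid
        del clusters[q]
    labels = [0] * len(dist)
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical sample-tracking curves (illustrative values only):
# per-sample misclassification rate at successive elimination steps.
curves = [
    [0.00, 0.02, 0.05, 0.10],
    [0.00, 0.01, 0.04, 0.12],
    [0.00, 0.00, 0.03, 0.09, 0.11],
    [0.60, 0.70, 0.80, 0.90],
    [0.55, 0.75, 0.85],
]
n = len(curves)
dist = [[dtw_distance(curves[i], curves[j]) for j in range(n)] for i in range(n)]
labels = average_linkage(dist, k=2)
print(labels)  # [0, 0, 0, 1, 1]
```

In this toy example the two consistently misclassified samples fall into their own cluster, which is the kind of group one would inspect as candidate noisy cases. DTW is used because it compares curve shapes while tolerating shifts and unequal lengths, which a pointwise Euclidean distance cannot do.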
Keywords: statistical learning, semisupervised classification, feature selection, Support Vector Machines, functional genomics, DNA microarray