Semisupervised Profiling of Gene Expressions and Clinical Data

  • Silvano Paoli
  • Giuseppe Jurman
  • Davide Albanese
  • Stefano Merler
  • Cesare Furlanello
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3849)

Abstract

We present an application of BioDCV, a computational environment for semisupervised profiling with Support Vector Machines, aimed at detecting outliers and deriving informative subtypes of patients with respect to pathological features. First, a sample-tracking curve is extracted for each sample as a by-product of the profiling process. The curves are then clustered according to a distance derived from Dynamic Time Warping. The procedure allows identification of noisy cases, whose removal is shown to improve predictive accuracy and the stability of derived gene profiles. After removal of outliers, the semisupervised process is repeated and subgroups of patients are specified. The procedure is demonstrated through the analysis of a liver cancer dataset of 213 samples described by 1 993 genes and by pathological features.
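As an illustration of the clustering step, the following is a minimal sketch of a Dynamic Time Warping distance between two curves, plus a pairwise distance matrix that could be passed to any clustering routine. It is not the BioDCV implementation: the function names `dtw_distance` and `pairwise_dtw`, the absolute-difference local cost, and the representation of a sample-tracking curve as a plain list of numbers are all assumptions made for the example.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW between two numeric sequences.

    Assumption: the local cost is the absolute difference |a_i - b_j|;
    the distance actually used in the paper may differ.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def pairwise_dtw(curves):
    """Symmetric matrix of DTW distances between all pairs of curves."""
    k = len(curves)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(curves[i], curves[j])
    return dist


# Hypothetical curves, invented for illustration only.
curves = [[0, 1, 2], [0, 1, 1, 2], [2, 1, 0]]
distances = pairwise_dtw(curves)
```

In this toy input the first two curves differ only by a repeated point, so warping aligns them at zero cost, while the third curve, which runs in the opposite direction, lies farther from both. A curve that is far from every cluster under such a distance would be a candidate noisy case in the sense described above.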

Keywords

statistical learning; semisupervised classification; feature selection; Support Vector Machines; functional genomics; DNA microarray

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Silvano Paoli
    • 1
  • Giuseppe Jurman
    • 1
  • Davide Albanese
    • 1
  • Stefano Merler
    • 1
  • Cesare Furlanello
    • 1
  1. ITC-irst, Trento, Italy
