An Algorithm for Finding Gene Signatures Supervised by Survival Time Data

  • Stefano M. Pagnotta
  • Michele Ceccarelli
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6881)


Signature learning from gene expression consists of selecting a subset of molecular markers that best correlate with prognosis. It can be cast as a feature selection problem. Here we use as the optimality criterion the separation between the survival curves of the clusters induced by the selected features. We address some important problems in this field, such as developing an unbiased search procedure and analyzing the significance of a set of generated signatures. We apply the proposed procedure to the selection of gene signatures for non-small-cell lung cancer prognosis using a real data set.
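
The criterion described above can be illustrated with a minimal sketch. It assumes that the separation between survival curves is measured with the two-group log-rank statistic and that the clusters come from a simple median split on the mean expression of the selected genes. Neither choice is confirmed by the abstract, and the function names `logrank_statistic` and `signature_score` are hypothetical.

```python
def logrank_statistic(times, events, groups):
    """Two-group log-rank chi-square statistic.

    times  : follow-up time of each sample
    events : 1 if the event was observed, 0 if the sample was censored
    groups : 0/1 cluster label of each sample
    """
    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Samples still under observation at time t.
        at_risk = [(g, e, u) for u, e, g in zip(times, events, groups) if u >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, e, u in at_risk if e == 1 and u == t)
        d1 = sum(1 for g, e, u in at_risk if g == 1 and e == 1 and u == t)
        # Observed minus expected events in group 1 (hypergeometric model).
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0  # no separation can be measured (e.g. a single group)
    return observed_minus_expected ** 2 / variance


def signature_score(expression, signature, times, events):
    """Score a candidate signature (a list of gene column indices).

    Assumption: samples are split at the median of their mean expression
    over the signature genes, as a stand-in for a real clustering step.
    """
    means = [sum(row[g] for g in signature) / len(signature) for row in expression]
    cutoff = sorted(means)[len(means) // 2]
    groups = [1 if m >= cutoff else 0 for m in means]
    return logrank_statistic(times, events, groups)
```

A search procedure would call `signature_score` on each candidate subset of genes and prefer subsets with larger scores. Because the same data are used both to select and to score, the raw score is optimistically biased, which is why the abstract stresses an unbiased search and a significance analysis.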


Bayesian Information Criterion; Gene Ranking; Seed Gene; Prognosis Group; Feature Selection Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Stefano M. Pagnotta (1)
  • Michele Ceccarelli (1, 2)
  1. Department of Science, University of Sannio, Benevento, Italy
  2. Bioinformatics CORE, BIOGEM s.c.a.r.l., Contrada Camporeale, University of Sannio, Ariano, Italy
