Robust SVM-Based Biomarker Selection with Noisy Mass Spectrometric Proteomic Data

  • Elena Marchiori
  • Connie R. Jimenez
  • Mikkel West-Nielsen
  • Niels H. H. Heegaard
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3907)


Computational analysis of mass spectrometric (MS) proteomic data from sera is of potential relevance for diagnosis, prognosis, choice of therapy, and study of disease activity. To this aim, feature selection techniques based on machine learning can be applied for detecting potential biomarkers and biomarker patterns. A key issue concerns the interpretability and robustness of the results produced by such techniques. In this paper we propose a robust method for feature selection with MS proteomic data. The method consists of the sequential application of a filter feature selection algorithm, RELIEF, followed by multiple runs of a wrapper feature selection technique based on support vector machines (SVM), where each run is obtained by changing the class label of one support vector. The frequencies with which features are selected over the runs are used to identify features that are robust with respect to perturbations of the data. The method is tested on a dataset produced by a specific MS technique, MALDI-TOF MS, in which two classes have been artificially generated by spiking and the samples have been collected at different storage durations. Leave-one-out cross-validation (LOOCV) on the resulting dataset indicates that the proposed feature selection method is capable of identifying highly discriminatory proteomic patterns.
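The abstract describes the selection pipeline only in words. The Python sketch below (NumPy and scikit-learn) illustrates one plausible reading of it under stated assumptions: the RELIEF implementation is a minimal binary variant, the support vectors of a linear SVM are approximated by the samples on or inside the margin, the wrapper step is reduced to ranking features by the magnitude of the linear-SVM weights, and the function names and parameter values (n_filter, n_wrapper, C) are illustrative, not the authors' implementation or settings.

```python
# Minimal sketch (not the authors' code) of the two-stage scheme described
# above: a RELIEF filter, followed by repeated linear-SVM wrapper runs in
# which the class label of one support vector is flipped per run; features
# are then ranked by how often they are selected across the perturbed runs.
import numpy as np
from sklearn.svm import LinearSVC


def relief_scores(X, y):
    """Basic binary RELIEF: reward features on which each sample differs
    more from its nearest miss than from its nearest hit.
    X is (n_samples, n_features); y must contain labels 0 and 1."""
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)   # scale to [0, 1]
    scores = np.zeros(X.shape[1])
    for i in range(len(X)):
        d = np.abs(X - X[i]).sum(axis=1)
        d[i] = np.inf                                         # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))       # nearest same-class sample
        miss = np.argmin(np.where(y != y[i], d, np.inf))      # nearest other-class sample
        scores += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return scores / len(X)


def robust_feature_frequencies(X, y, n_filter=200, n_wrapper=20, C=1.0):
    """Return (feature indices kept by RELIEF, selection frequency per kept feature).
    n_filter, n_wrapper and C are illustrative values, not the paper's settings."""
    keep = np.argsort(relief_scores(X, y))[::-1][:n_filter]   # filter step
    Xf = X[:, keep]

    base = LinearSVC(C=C, max_iter=10000).fit(Xf, y)
    # Approximate the support vectors of the linear SVM as the samples on or
    # inside the margin (|decision value| <= 1).
    sv = np.where(np.abs(base.decision_function(Xf)) <= 1.0 + 1e-6)[0]

    counts = np.zeros(len(keep))
    for s in sv:                                   # one perturbed run per support vector
        y_pert = y.copy()
        y_pert[s] = 1 - y_pert[s]                  # flip the label of one support vector
        w = LinearSVC(C=C, max_iter=10000).fit(Xf, y_pert).coef_.ravel()
        top = np.argsort(np.abs(w))[::-1][:n_wrapper]   # wrapper step: largest |w|
        counts[top] += 1
    return keep, counts / max(len(sv), 1)          # high frequency = robust feature
```

In this reading, the features whose selection frequency remains high across the label-flip perturbations are the candidates that would be reported as robust biomarkers.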


Keywords: Support Vector Machine · Feature Selection · Feature Selection Algorithm · Storage Duration · Feature Selection Technique





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Elena Marchiori (1)
  • Connie R. Jimenez (2)
  • Mikkel West-Nielsen (3)
  • Niels H. H. Heegaard (3)

  1. Department of Computer Science, Vrije Universiteit Amsterdam, The Netherlands
  2. Department of Molecular and Cellular Neurobiology, Vrije Universiteit Amsterdam, The Netherlands
  3. Department of Autoimmunology, Statens Serum Institut, Copenhagen, Denmark
