Deep Learning for Proteomics Data for Feature Selection and Classification

  • Sahar IravaniEmail author
  • Tim O. F. Conrad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11713)


Todays high-throughput molecular profiling technologies allow to routinely create large datasets providing detailed information about a given biological sample, e.g. about the concentrations of thousands contained proteins. A standard task in the context of precision medicine is to identify a set of biomarkers (e.g. proteins) from these datasets that can be used for disease diagnosis, prognosis or to monitor treatment response. However, finding good biomarker sets is still a challenging task due to the high dimensionality and complexity of the data and the often quite high noise level.

In this work, we present an approach to this problem based on Deep Neural Networks (DNN) and a transfer learning strategy using simulation data. To allow interpretation of the results, we compare different approaches to analyze the learned DNN. Based on these interpretation approaches, we describe how to extract biomarker sets.

Comparison of our method to a state-of-the-art L1-SVM approach shows that the new approach is able to find better biomarker sets for classification when small sets are desired. Compared to a state-of-the-art \(\ell _1\)-support vector machine (\(\ell _1\)-SVM) approach, our method achieves better results for the classification task when a small number of features are needed.


Deep learning Attribution LRP Interpretation Feature selection Transfer learning Mass spectrometry Proteomics 



This study was funded by the German Ministry of Research and Education (BMBF) Project Grant 3FO18501 (Forschungscampus MODAL) and Project Grant 01IS18037I (Berlin Center for Machine Learning).


  1. 1.
    Aebersold, R., Mann, M.: Mass spectrometry-based proteomics. Nature 422(6928), 198 (2003)CrossRefGoogle Scholar
  2. 2.
    Alber, M., et al.: iNNvestigate neural networks!. J. Mach. Learn. Res. 20(93), 1–8 (2019)MathSciNetGoogle Scholar
  3. 3.
    Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10(7), e0130140 (2015)CrossRefGoogle Scholar
  4. 4.
    Chollet, F., et al.: Keras (2015).
  5. 5.
    Conrad, T.O., et al.: Sparse proteomics analysis-a compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data. BMC Bioinf. 18(1), 160 (2017)CrossRefGoogle Scholar
  6. 6.
    Conrad, T.O.F., et al.: Beating the noise: new statistical methods for detecting signals in MALDI-TOF spectra below noise level. In: Berthold, M.R., Glen, R.C., Fischer, I. (eds.) CompLife 2006. LNCS, vol. 4216, pp. 119–128. Springer, Heidelberg (2006). Scholar
  7. 7.
    Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)zbMATHGoogle Scholar
  8. 8.
    Donoho, D.L., et al.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)MathSciNetCrossRefGoogle Scholar
  9. 9.
    Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(Jul), 2121–2159 (2011)MathSciNetzbMATHGoogle Scholar
  10. 10.
    Fiedler, G.M., et al.: Serum peptidome profiling revealed platelet factor 4 as a potential discriminating peptide associated with pancreatic cancer. Clin. Cancer Res. 15(11), 3812–3819 (2009)CrossRefGoogle Scholar
  11. 11.
    Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1 (2010)CrossRefGoogle Scholar
  12. 12.
    Gibb, S., Strimmer, K.: MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics 28(17), 2270–2271 (2012)CrossRefGoogle Scholar
  13. 13.
    Gibb, S., Strimmer, K.: Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis. Bioinformatics 31(19), 3156–3162 (2015)CrossRefGoogle Scholar
  14. 14.
    Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)Google Scholar
  15. 15.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  16. 16.
    Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
  17. 17.
    Holzinger, A., Langs, G., Denk, H., Zatloukal, K., Müller, H.: Causability and explainabilty of artificial intelligence in medicine. Wiley Interdisc. Rev. Data Min. Knowl. Discovery, e1312 (2019)Google Scholar
  18. 18.
    Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. arXiv preprint arXiv:1608.06993 (2016)
  19. 19.
    Huang, G., Sun, Y., Liu, Z., Sedra, D., Weinberger, K.Q.: Deep networks with stochastic depth. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 646–661. Springer, Cham (2016). Scholar
  20. 20.
    Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015)Google Scholar
  21. 21.
    Jayrannejad, F., Conrad, T.O.F.: Better interpretable models for proteomics data analysis using rule-based mining. In: Holzinger, A., Goebel, R., Ferri, M., Palade, V. (eds.) Towards Integrative Machine Learning and Knowledge Extraction. LNCS (LNAI), vol. 10344, pp. 67–88. Springer, Cham (2017). Scholar
  22. 22.
    Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  23. 23.
    Kratzsch, J., et al.: New reference intervals for thyrotropin and thyroid hormones based on national academy of clinical biochemistry criteria and regular ultrasonography of the thyroid. Clin. Chem. 51(8), 1480–1486 (2005)CrossRefGoogle Scholar
  24. 24.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)Google Scholar
  25. 25.
    Liu, Q., et al.: Comparison of feature selection and classification for MALDI-MS data. BMC Genom. 10(1), S3 (2009)CrossRefGoogle Scholar
  26. 26.
    Marrugal, Á., Ojeda, L., Paz-Ares, L., Molina-Pinelo, S., Ferrer, I.: Proteomic-based approaches for the study of cytokines in lung cancer. Dis. Markers 2016 (2016)Google Scholar
  27. 27.
    Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-2010), pp. 807–814 (2010)Google Scholar
  28. 28.
    Qian, N.: On the momentum term in gradient descent learning algorithms. Neural Netw. 12(1), 145–151 (1999)MathSciNetCrossRefGoogle Scholar
  29. 29.
    Samek, W., Montavon, G., Binder, A., Lapuschkin, S., Müller, K.R.: Interpreting the predictions of complex ml models by layer-wise relevance propagation. arXiv preprint arXiv:1611.08191 (2016)
  30. 30.
    Shrikumar, A., Greenside, P., Shcherbina, A., Kundaje, A.: Not just a black box: learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713 (2016)
  31. 31.
    Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)
  32. 32.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  33. 33.
    Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017)
  34. 34.
    Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)
  35. 35.
    Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3319–3328 (2017).
  36. 36.
    Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)Google Scholar
  37. 37.
    Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). CrossRefGoogle Scholar
  38. 38.
    Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. Series B (Stat. Methodol.) 67(2), 301–320 (2005)MathSciNetCrossRefGoogle Scholar

Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  1. 1.Zuse Institute BerlinBerlinGermany
  2. 2.Free University of BerlinBerlinGermany

Personalised recommendations