RPML: A Learning-Based Approach for Reranking Protein-Spectrum Matches

  • Qiong Duan
  • Hao Liang
  • Chaohua Sheng
  • Jun Wu
  • Bo Xu
  • Zengyou He
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10954)


Searching top-down spectra against a protein database is a mainstream method for intact protein identification. Ranking true protein-spectrum matches (PrSMs) above their false counterparts is a feasible way to improve protein identification results. In this paper, we propose a novel model called RPML (Rerank PrSMs based on Machine Learning) to rerank PrSMs in top-down proteomics. Experimental results on real data sets show that RPML can distinguish more correct PrSMs from incorrect ones. The source code of the algorithm is available at


Keywords: Protein identification; Protein-spectrum matches; Machine learning; Rerank method



This work was partially supported by the Natural Science Foundation of China (Nos. 61572094, 61502071), the Fundamental Research Funds for the Central Universities (Nos. DUT2017TB02, DUT14QY07) and the Science-Technology Foundation for Youth of Guizhou Province (No. KY[2017]250).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Software, Dalian University of Technology, Dalian, China
  2. School of Information Engineering, Zunyi Normal University, Zunyi, China
