A Kernel-Based Case Retrieval Algorithm with Application to Bioinformatics

  • Yan Fu
  • Qiang Yang
  • Charles X. Ling
  • Haipeng Wang
  • Dequan Li
  • Ruixiang Sun
  • Hu Zhou
  • Rong Zeng
  • Yiqiang Chen
  • Simin He
  • Wen Gao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3157)

Abstract

Case retrieval in case-based reasoning relies heavily on the design of a good similarity function. This paper presents an approach that uses the correlative information among features to compute the similarity of cases for retrieval. It does so by extending dot product-based linear similarity measures to nonlinear versions with kernel functions. An application to the peptide retrieval problem in bioinformatics shows the effectiveness of the approach. In this problem, the objective is to retrieve, from a large database of known peptides, the peptide that corresponds to an input tandem mass spectrum. Because a kernel function implicitly maps the tandem mass spectrum to a high-dimensional space, the correlative information among fragment ions in the spectrum can be modeled, which dramatically reduces stochastic mismatches. An experiment on a dataset of real spectra shows a significant reduction of 10% in the error rate compared with a common linear similarity function.
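
The idea of replacing a dot product with a kernel can be illustrated with a minimal sketch. It is not the paper's implementation: the polynomial kernel, its parameters, the function names, and the toy feature vectors are all assumptions chosen for illustration. The sketch takes cosine similarity as the linear measure and kernelizes it by substituting a kernel value for each dot product.

```python
import math

def dot(x, y):
    """Plain dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_cosine(x, y):
    """Dot product-based linear similarity (cosine); assumes nonzero vectors."""
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel; an illustrative choice, not the paper's kernel.

    For degree > 1 it corresponds to a dot product in a space that also
    contains products of features, so feature correlations contribute.
    """
    return (dot(x, y) + coef0) ** degree

def kernel_cosine(x, y, kernel=poly_kernel):
    """Cosine similarity with every dot product replaced by a kernel value."""
    return kernel(x, y) / math.sqrt(kernel(x, x) * kernel(y, y))

def retrieve(query, cases, similarity):
    """Return the name of the stored case most similar to the query."""
    return max(cases, key=lambda name: similarity(query, cases[name]))

# Hypothetical toy case base: names and feature vectors are made up.
cases = {
    "A": [1.0, 1.0, 0.0, 1.0],
    "B": [1.0, 0.0, 1.0, 0.0],
}
query = [1.0, 1.0, 0.0, 0.0]

best_linear = retrieve(query, cases, linear_cosine)
best_kernel = retrieve(query, cases, kernel_cosine)
```

With `degree=1` and `coef0=0.0` the kernel reduces to the plain dot product, so `kernel_cosine` then coincides with `linear_cosine`; higher degrees introduce the nonlinear feature interactions.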

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Yan Fu (1, 2)
  • Qiang Yang (3)
  • Charles X. Ling (4)
  • Haipeng Wang (1)
  • Dequan Li (1)
  • Ruixiang Sun (2)
  • Hu Zhou (5)
  • Rong Zeng (5)
  • Yiqiang Chen (1)
  • Simin He (1)
  • Wen Gao (1, 2)

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. Graduate School of Chinese Academy of Sciences, Beijing, China
  3. Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong
  4. Department of Computer Science, The University of Western Ontario, London, Canada
  5. Research Center for Proteome Analysis, Key Lab of Proteomics, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
