Alignment of Mass Spectrometry Data by Clique Finding and Optimization

  • Daniel Fasulo
  • Anne-Katrin Emde
  • Lu-Yong Wang
  • Karin Noy
  • Nathan Edwards
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4532)


Mass spectrometry (MS) is becoming a popular approach for quantifying the protein composition of complex samples. A major challenge in comparative proteomic profiling is matching corresponding peptide features across experiments, so that intensities of the same protein are correctly identified. The multi-dimensional data acquired by liquid chromatography-mass spectrometry (LC-MS) make this alignment problem harder. We propose a general paradigm for aligning peptide features using a bounded error model. Our method is tolerant of imperfect measurements, missing peaks, and extraneous peaks. It can handle an arbitrary number of separation dimensions and is very fast in practice, even for large data sets. Finally, its parameters are intuitive, and we describe a heuristic for estimating them automatically. We demonstrate results on single- and multi-dimensional data.


Keywords: mass spectrometry, alignment, bounded error model, clique finding
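The abstract and keywords name a bounded error model and clique finding but give no algorithmic detail. The following is a minimal Python sketch of that general idea under our own assumptions, not the authors' algorithm: each feature is a tuple of coordinates (for example m/z and retention time), two features from different runs count as compatible when every coordinate differs by no more than a per-dimension tolerance `eps`, and alignment groups are picked greedily from the maximal cliques (found with the Bron-Kerbosch algorithm) of the resulting compatibility graph. The names `compatible`, `build_graph`, `maximal_cliques` and `align`, the greedy selection rule, and the toy data are all hypothetical. The sketch omits the optimization step named in the title and assumes the runs already share roughly the same coordinate frame.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Sequence, Set, Tuple

# Hypothetical representation: a feature is identified by (run index, index within run)
Feature = Tuple[int, int]
Graph = Dict[Feature, Set[Feature]]


def compatible(a: Sequence[float], b: Sequence[float], eps: Sequence[float]) -> bool:
    """Bounded error test: every coordinate differs by at most its tolerance."""
    return all(abs(x - y) <= e for x, y, e in zip(a, b, eps))


def build_graph(runs: List[List[Tuple[float, ...]]], eps: Sequence[float]) -> Graph:
    """Link features from *different* runs that agree within tolerance (O(n^2) pairs)."""
    nodes = [(r, i) for r, run in enumerate(runs) for i in range(len(run))]
    adj: Graph = {v: set() for v in nodes}
    for u, v in combinations(nodes, 2):
        if u[0] != v[0] and compatible(runs[u[0]][u[1]], runs[v[0]][v[1]], eps):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj: Graph) -> Iterator[FrozenSet[Feature]]:
    """Bron-Kerbosch with pivoting: yields every maximal clique of the graph."""
    def expand(r: FrozenSet[Feature], p: Set[Feature], x: Set[Feature]) -> Iterator[FrozenSet[Feature]]:
        if not p and not x:
            yield r  # r cannot be extended, so it is a maximal clique
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(frozenset(), set(adj), set())


def align(runs: List[List[Tuple[float, ...]]], eps: Sequence[float]) -> List[List[Feature]]:
    """Greedily keep the largest cliques that share no feature with earlier picks."""
    adj = build_graph(runs, eps)
    # Deterministic order: larger cliques first, ties broken by feature ids.
    cliques = sorted(maximal_cliques(adj), key=lambda c: (-len(c), sorted(c)))
    used: Set[Feature] = set()
    groups: List[List[Feature]] = []
    for c in cliques:
        if used.isdisjoint(c):
            groups.append(sorted(c))
            used |= c
    # Simplification: features left over after the greedy pass stay unmatched.
    groups.extend([v] for v in sorted(adj) if v not in used)
    return groups


# Toy example: (m/z, retention time) features from three runs, tolerances (0.05, 1.0).
runs = [
    [(500.25, 10.0), (800.40, 25.0)],
    [(500.26, 10.3), (800.41, 24.6), (650.00, 40.0)],
    [(500.24, 10.1)],
]
eps = (0.05, 1.0)
groups = align(runs, eps)
for g in groups:
    print(g)
```

Because no edge joins two features from the same run, every clique holds at most one feature per run, and a peptide missing from some runs simply yields a smaller clique; this loosely mirrors the tolerance to missing and extraneous peaks described in the abstract. Enumerating all maximal cliques is exponential in the worst case, so this sketch is only meant for small illustrative inputs.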





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Daniel Fasulo (1)
  • Anne-Katrin Emde (1)
  • Lu-Yong Wang (1)
  • Karin Noy (1)
  • Nathan Edwards (2)
  1. Integrated Data Systems Department, Siemens Corporate Research, 755 College Road East, Princeton, NJ, USA
  2. Center for Bioinformatics and Computational Biology, 3119 Biomolecular Sciences Bldg. #296, University of Maryland, College Park, MD 20742, USA
