LC-MS Data Analysis for Differential Protein Expression Detection

  • Rency S. Varghese
  • Habtom W. Ressom
Part of the Methods in Molecular Biology book series (MIMB, volume 694)


In proteomic studies, liquid chromatography coupled with mass spectrometry (LC-MS) is a common platform for comparing the abundance of peptides that characterize particular proteins in biological samples. Each LC-MS run generates data consisting of thousands of peptide peak intensities, each indexed by a retention time (RT) and a mass-to-charge ratio (m/z) value. In label-free differential protein expression studies, multiple LC-MS runs are compared to identify peptides that differ in abundance between distinct biological groups. This approach presents a computational challenge for three reasons: (i) substantial variation in RT across runs due to the LC instrument conditions and the variable complexity of peptide mixtures, (ii) variation in m/z values due to occasional drift in the calibration of the mass spectrometer, and (iii) variation in peak intensities caused by various factors, including noise and variability in sample handling and processing. In this chapter, we present computational methods for quantification and comparison of peptides by label-free LC-MS analysis. We discuss data preprocessing methods for alignment and normalization of LC-MS data. We also present multivariate statistical methods and pattern recognition methods for detecting differential protein expression from preprocessed LC-MS data.
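As a rough illustration of why intensity normalization matters before groups are compared, the following sketch applies a simple global median normalization to log2 intensities and then computes per-peptide log2 fold changes. It is not the chapter's method: the function names, the toy intensity values, and the choice of median centering are all assumptions made for illustration, and the runs are assumed to be already aligned so that each column is the same peptide feature.

```python
import math
from statistics import mean, median


def median_normalize(runs):
    """Shift each run's log2 intensities so that every run has median 0.

    Hypothetical helper: one simple way to remove a run-wide intensity offset.
    """
    normalized = []
    for intensities in runs:
        logs = [math.log2(x) for x in intensities]
        center = median(logs)
        normalized.append([v - center for v in logs])
    return normalized


def log2_fold_changes(group_a, group_b):
    """Per-peptide difference of mean normalized log2 intensity (B minus A)."""
    n_peptides = len(group_a[0])
    return [
        mean(run[i] for run in group_b) - mean(run[i] for run in group_a)
        for i in range(n_peptides)
    ]


# Toy data (made up): rows are runs, columns are aligned peptide features.
# The second run in each group has doubled intensities throughout,
# mimicking a purely technical difference in sample loading.
control = [[100.0, 200.0, 400.0], [200.0, 400.0, 800.0]]
treated = [[100.0, 200.0, 1600.0], [200.0, 400.0, 3200.0]]

norm_control = median_normalize(control)
norm_treated = median_normalize(treated)
fold = log2_fold_changes(norm_control, norm_treated)
print(fold)
```

In this toy case the doubled loading disappears after normalization, and only the third feature shows a log2 fold change of about 2 (a fourfold increase). Real data would also need RT and m/z alignment beforehand and a statistical test with multiple-testing control afterwards.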

Key words

  • Mass spectrometry
  • LC-MS
  • Alignment
  • Normalization
  • Difference detection



This work was supported in part by the National Science Foundation Grant IIS-0812246 awarded to HWR.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Rency S. Varghese (1)
  • Habtom W. Ressom (1)
  1. Department of Oncology, Georgetown University Medical Center, Washington, USA
