LC-MS Data Analysis for Differential Protein Expression Detection
In proteomic studies, liquid chromatography coupled with mass spectrometry (LC-MS) is a common platform for comparing the abundance of peptides that characterize particular proteins in biological samples. Each LC-MS run generates thousands of peptide peak intensities, each indexed by its retention time (RT) and mass-to-charge ratio (m/z). In label-free differential protein expression studies, multiple LC-MS runs are compared to identify peptides that are differentially abundant between distinct biological groups. This approach presents a computational challenge for three reasons: (i) substantial variation in RT across runs due to the LC instrument conditions and the variable complexity of peptide mixtures, (ii) variation in m/z values due to occasional drift in the calibration of the mass spectrometry instrument, and (iii) variation in peak intensities caused by various factors, including noise and variability in sample handling and processing. In this chapter, we present computational methods for quantification and comparison of peptides by label-free LC-MS analysis. We discuss data preprocessing methods for alignment and normalization of LC-MS data. We also present multivariate statistical and pattern recognition methods for detecting differential protein expression from preprocessed LC-MS data.
Key words: Mass spectrometry, LC-MS, Alignment, Normalization, Difference detection
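To make the intensity-variation problem (iii) concrete, the following is a minimal sketch of one simple preprocessing and comparison pipeline. It is not the chapter's method. It assumes the peaks have already been aligned into a peptides-by-runs matrix of log2 intensities, uses global median normalization as a stand-in for whichever normalization is chosen, and ranks peptides with a per-peptide Welch t-statistic. The function names, the simulated data, and the group layout (first three runs versus last three) are all hypothetical.

```python
import numpy as np

def median_normalize(log_intensity):
    """Shift each run (column) so that all runs share the same median log-intensity."""
    run_medians = np.nanmedian(log_intensity, axis=0)
    return log_intensity - run_medians + np.nanmean(run_medians)

def welch_t(group_a, group_b):
    """Per-peptide Welch t-statistic; rows are peptides, columns are runs."""
    mean_a, mean_b = group_a.mean(axis=1), group_b.mean(axis=1)
    se_a = group_a.var(axis=1, ddof=1) / group_a.shape[1]
    se_b = group_b.var(axis=1, ddof=1) / group_b.shape[1]
    return (mean_a - mean_b) / np.sqrt(se_a + se_b)

# Simulated data (assumption): 200 aligned peptides measured in 6 runs.
rng = np.random.default_rng(0)
n_peptides, n_runs = 200, 6
baseline = rng.normal(20.0, 1.0, size=(n_peptides, 1))       # peptide-level abundance
noise = rng.normal(0.0, 0.2, size=(n_peptides, n_runs))      # measurement noise
log_intensity = baseline + noise
log_intensity += np.array([0.0, 0.5, -0.3, 0.2, -0.4, 0.1])  # run-level bias
log_intensity[:10, 3:] += 2.0  # first 10 peptides are more abundant in group B

normalized = median_normalize(log_intensity)

# Compare group B (runs 3-5) against group A (runs 0-2).
t_stats = welch_t(normalized[:, 3:], normalized[:, :3])
top_candidates = np.argsort(-np.abs(t_stats))[:10]
print(sorted(top_candidates.tolist()))
```

In a real study the resulting statistics would be converted to p-values and corrected for multiple testing across the thousands of peptides tested, and RT and m/z alignment would have to be performed before any such matrix could be built.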
This work was supported in part by the National Science Foundation Grant IIS-0812246 awarded to HWR.