Abstract
Computational proteomics applications are often pictured as a pipeline in which information is processed at each stage before it flows to the next. Regardless of the type of application, the first stage invariably consists of obtaining the raw mass spectrometric data from the spectrometer and preparing it for the later stages by enhancing the signal of interest while suppressing spurious components. Numerous approaches for preprocessing MS data have been described in the literature. In this chapter, we describe both standard techniques originating from classical signal and image processing and novel computational approaches specifically tailored to the analysis of MS data sets. We focus on low-level signal processing tasks such as baseline reduction, denoising, and feature detection.
Notes
1. Of course, this picture changes as soon as we take posttranslational modifications or labelling techniques into account.
2. Please note that peptides will usually elute over several subsequent time points and will therefore appear in several neighbouring scans.
3. This whole “de-isotoping” step is often seen as part of a later stage of the proteomics pipeline (the identification stage), since it usually operates not on the raw data but on the list of sticks. However, as we will show in a later section, integrating de-isotoping (feature detection) into the signal processing can improve prediction performance by extracting further valuable information from the data that would otherwise be neglected.
4. In reality, the peak intensities follow a binomial distribution, which can, however, be approximated by a Poisson distribution.
5. The adaptive wavelet transform is a slight generalization of the classical wavelet transform in that the wavelet kernel can vary with position; hence, the transform does not correspond to a simple convolution, but rather to a more complicated integral transform.
6. The sinc function is defined by sinc(x) := sin(x)/x.
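The approximation mentioned in note 4 and the definition in note 6 can be illustrated with a short numerical sketch. It assumes that the note refers to isotopic peak intensities and, for simplicity, considers carbon only; the peptide size (60 carbon atoms), the 13C abundance of roughly 1.07%, and all function names are illustrative choices, not values taken from the chapter.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of observing k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def sinc(x):
    """Unnormalised sinc, sin(x)/x, using the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


# Assumption for illustration: a hypothetical peptide with 60 carbon atoms,
# each of which is 13C with probability of about 0.0107.
n_atoms, p_heavy = 60, 0.0107
lam = n_atoms * p_heavy  # the Poisson mean is matched to the binomial mean

# Compare the probability of finding k heavy atoms under both models.
for k in range(4):
    print(k, round(binomial_pmf(k, n_atoms, p_heavy), 4),
          round(poisson_pmf(k, lam), 4))
```

With a small per-atom probability and a moderately large number of atoms, the two columns should agree closely, which is the regime in which the Poisson approximation is usually considered adequate.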
Acknowledgments
The authors would like to express their gratitude to Mrs. Anna Katharina Dehof, Mrs. Sophie Weggler, and Mrs. Linda Wolters for critical reading of the manuscript.
Copyright information
© 2010 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Hussong, R., Hildebrandt, A. (2010). Signal Processing in Proteomics. In: Hubbard, S., Jones, A. (eds) Proteome Bioinformatics. Methods in Molecular Biology™, vol 604. Humana Press. https://doi.org/10.1007/978-1-60761-444-9_11
DOI: https://doi.org/10.1007/978-1-60761-444-9_11
Publisher Name: Humana Press
Print ISBN: 978-1-60761-443-2
Online ISBN: 978-1-60761-444-9
eBook Packages: Springer Protocols