Abstract
This article presents a method for processing mass spectrometry data in which peaks are detected using a mixture of distributions. Spectra are first preprocessed by calibration, normalization, denoising and baseline correction. They are then modeled as a mixture of Gaussian distributions, with each component representing a single peak. The mean of each Gaussian gives the m/z value of a peak, and its standard deviation describes the peak width. The third parameter, the component weight, represents the height of the peak. A simulation demonstrates the use of the Expectation-Maximization (EM) algorithm to process spectra in which the m/z value of albumin is known and the m/z values of the other compounds in the analyzed data sets are unknown. The developed algorithm was used to identify the m/z values of proteins attached to albumin by decomposing the spectrum into Gaussian components. The sought m/z values were found with the predetermined accuracy.
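The modeling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a preprocessed spectrum given as arrays of m/z values and non-negative intensities, treats the intensities as weights of the m/z points, and runs a plain EM loop for a one-dimensional Gaussian mixture in which one component mean can be held fixed (standing in for the known albumin peak). The function name `fit_spectrum_em`, the `fixed` flag, and all numeric values (including the placeholder albumin position of 66500) are hypothetical and chosen only for the synthetic demonstration.

```python
import numpy as np


def gaussian(x, mu, sigma):
    """Normal density evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_spectrum_em(mz, intensity, mu_init, sigma_init, fixed=None, n_iter=200):
    """EM for a 1-D Gaussian mixture fitted to an intensity-weighted spectrum.

    mz, intensity : sampled spectrum (intensity acts as a weight per m/z point)
    mu_init, sigma_init : starting means and standard deviations, one per peak
    fixed : boolean flags; True keeps that component's mean at its initial value
    Returns (alpha, mu, sigma): weights, means and standard deviations.
    """
    mz = np.asarray(mz, dtype=float)
    w = np.asarray(intensity, dtype=float)
    w = w / w.sum()                                # normalise intensities to sum to 1
    mu = np.array(mu_init, dtype=float)
    sigma = np.array(sigma_init, dtype=float)
    alpha = np.full(mu.size, 1.0 / mu.size)        # start from equal weights
    fixed = np.zeros(mu.size, dtype=bool) if fixed is None else np.asarray(fixed, dtype=bool)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each m/z point.
        dens = alpha[:, None] * gaussian(mz[None, :], mu[:, None], sigma[:, None])
        resp = dens / np.maximum(dens.sum(axis=0, keepdims=True), 1e-300)

        # M-step: intensity-weighted updates of weights, means and widths.
        mass = np.maximum((resp * w).sum(axis=1), 1e-300)
        new_mu = (resp * w * mz).sum(axis=1) / mass
        mu = np.where(fixed, mu, new_mu)           # known peaks keep their position
        var = (resp * w * (mz[None, :] - mu[:, None]) ** 2).sum(axis=1) / mass
        sigma = np.sqrt(np.maximum(var, 1e-12))
        alpha = mass / mass.sum()

    return alpha, mu, sigma


# Synthetic demonstration (all values are illustrative, not taken from the paper).
mz = np.linspace(60000.0, 75000.0, 3001)
known_peak = 66500.0                               # placeholder for the known albumin m/z
intensity = 0.7 * gaussian(mz, known_peak, 300.0) + 0.3 * gaussian(mz, 70000.0, 400.0)

alpha, mu, sigma = fit_spectrum_em(
    mz, intensity,
    mu_init=[known_peak, 71000.0],                 # second mean is a rough guess
    sigma_init=[500.0, 500.0],
    fixed=[True, False],                           # only the unknown peak is moved
)
print("weights:", alpha, "means:", mu, "widths:", sigma)
```

In this sketch the estimated mean of the free component plays the role of the unknown m/z value, the standard deviation describes the peak width, and the weight reflects the relative size of the peak. Real spectra would additionally require the preprocessing steps listed in the abstract and a way to choose the number of components.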
Keywords
- MALDI-TOF
- mass spectrometry
- EM algorithm
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Plechawska, M. et al. (2009). Analyze of Maldi-TOF Proteomic Spectra with Usage of Mixture of Gaussian Distributions. In: Cyran, K.A., Kozielski, S., Peters, J.F., Stańczyk, U., Wakulicz-Deja, A. (eds) Man-Machine Interactions. Advances in Intelligent and Soft Computing, vol 59. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00563-3_11
DOI: https://doi.org/10.1007/978-3-642-00563-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00562-6
Online ISBN: 978-3-642-00563-3
eBook Packages: Engineering (R0)