Efficient Model-Based Clustering for LC-MS Data
Proteomic mass spectrometry plays an increasing role in diagnostics and in studies of protein complexes and biological systems. High-throughput data processing is therefore becoming increasingly important, and with it come the problems of imperfect data, noise, and the various errors introduced during experiments.
In this paper we focus on the peak alignment problem. As an alternative to heuristic-based approaches to aligning peaks from different mass spectra, we propose a mathematically sound method based on model-based clustering. In this framework, experimental errors are modeled as deviations from the true values, and mass spectra are regarded as finite Gaussian mixtures. The advantage of this approach is that it provides convenient techniques for adjusting parameters and for selecting the best-quality solutions. The method can be parameterized by imposing various constraints on the model. We investigate different classes of models and select the most suitable one. We evaluate the results in terms of the statistically significant biomarkers that can be identified after the spectra are aligned.
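To make the idea of "finite Gaussian mixtures under various constraints" concrete, the following is a minimal sketch, not the authors' implementation. It fits one-dimensional Gaussian mixtures to synthetic peak positions with a hand-written EM loop, once with a variance shared by all components and once with a separate variance per component, and ranks the candidates with the Bayesian Information Criterion. The abstract does not say which selection criterion is used, so BIC is an assumption here, as are the synthetic data, the variance floor `min_var`, and all function names.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Density of the whole mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_1d(xs, k, shared_variance, n_iter=200, min_var=1e-4):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances, loglik)."""
    xs = sorted(xs)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    total_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [max(total_var / k ** 2, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j] for j in range(k)]
        ss = [sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) for j in range(k)]
        if shared_variance:
            # Constrained model: a single variance pooled over all components.
            variances = [max(sum(ss) / n, min_var)] * k
        else:
            # Unconstrained model: one variance per component (floored at min_var).
            variances = [max(s / c, min_var) for s, c in zip(ss, nk)]
    loglik = sum(
        math.log(max(mixture_density(x, weights, means, variances), 1e-300)) for x in xs
    )
    return weights, means, variances, loglik


def bic(loglik, k, shared_variance, n):
    """BIC in the 'lower is better' convention."""
    n_params = (k - 1) + k + (1 if shared_variance else k)
    return -2.0 * loglik + n_params * math.log(n)


# Synthetic example (an assumption): three true peak positions, each observed
# in 20 spectra with a small Gaussian measurement error.
random.seed(0)
true_positions = [500.0, 503.0, 507.0]
peaks = [random.gauss(mu, 0.05) for mu in true_positions for _ in range(20)]

results = {}
for k in range(1, 6):
    for shared in (True, False):
        fit = fit_gmm_1d(peaks, k, shared)
        results[(k, shared)] = (bic(fit[3], k, shared, len(peaks)), fit)

best_k, best_shared = min(results, key=lambda key: results[key][0])
print("selected components:", best_k, "shared variance:", best_shared)
print("estimated peak positions:", [round(m, 3) for m in results[(best_k, best_shared)][1][1]])
```

In this sketch each mixture component plays the role of one aligned peak: observations assigned to the same component are treated as measurements of the same true value, and the constraint on the variances is one example of the model classes that can be compared.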
Keywords: Feature Selection; False Discovery Rate; Liquid Chromatography Mass Spectrometry; Monoisotopic Peak; DBSCAN Algorithm