Beating the Noise: New Statistical Methods for Detecting Signals in MALDI-TOF Spectra Below Noise Level

  • Tim O. F. Conrad
  • Alexander Leichtle
  • Andre Hagehülsmann
  • Elmar Diederichs
  • Sven Baumann
  • Joachim Thiery
  • Christof Schütte
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4216)


Background: Computer-assisted detection of small molecules by mass spectrometry provides a snapshot of thousands of peptides, protein fragments and proteins in a biological sample. This new analytical technology has the potential to identify disease-associated proteomic patterns in blood serum. However, currently available bioinformatic tools are not sensitive enough to identify clinically important low-abundance proteins, such as hormones or tumor markers, that occur only at low blood concentrations.

Aim: Find, analyze and compare serum proteome patterns in groups of human subjects that differ in properties such as disease status, using a new workflow that enhances sensitivity and specificity.

Problems: Mass data acquired from high-throughput platforms are frequently blurred and noisy. This complicates the reliable identification of peaks in general, and of very small peaks below noise level in particular. However, this limitation holds only for single or few spectra. If the algorithm has access to a large number of spectra (e.g. N > 1000), new possibilities arise, one of them being a statistical approach.
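
Why a large number of spectra helps can be illustrated with a small simulation. The sketch below is not the workflow described in this paper; it only assumes independent white Gaussian noise and a peak at a fixed, known position, and shows that averaging N spectra shrinks the noise by roughly a factor of sqrt(N). All names and parameter values (`simulate_spectrum`, `peak_height=0.5`, and so on) are hypothetical choices made for the illustration.

```python
import math
import random


def simulate_spectrum(rng, n_points=200, peak_pos=120, peak_height=0.5,
                      peak_width=3.0, noise_sigma=1.0):
    """One synthetic spectrum: a small Gaussian peak buried in white noise.

    The peak height (0.5) is deliberately below the noise level (sigma = 1).
    """
    return [
        peak_height * math.exp(-0.5 * ((i - peak_pos) / peak_width) ** 2)
        + rng.gauss(0.0, noise_sigma)
        for i in range(n_points)
    ]


def mean_spectrum(spectra):
    """Point-wise average over a list of equally long spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def z_score_at(spectrum, index, exclude_halfwidth=15):
    """Intensity at `index`, in units of the noise estimated away from it."""
    background = [v for i, v in enumerate(spectrum)
                  if abs(i - index) > exclude_halfwidth]
    mu = sum(background) / len(background)
    var = sum((v - mu) ** 2 for v in background) / (len(background) - 1)
    return (spectrum[index] - mu) / math.sqrt(var)


rng = random.Random(0)  # fixed seed so the run is reproducible
spectra = [simulate_spectrum(rng) for _ in range(1000)]
avg = mean_spectrum(spectra)

single_z = z_score_at(spectra[0], 120)  # peak is lost in the noise
avg_z = z_score_at(avg, 120)            # peak stands well above the noise

print(f"single spectrum z-score: {single_z:.2f}")
print(f"mean of 1000 spectra z-score: {avg_z:.2f}")
```

In a single spectrum the peak cannot be distinguished from noise, whereas in the averaged spectrum it stands out clearly. Real spectra additionally need alignment and baseline correction before any such pooling is meaningful.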

Approach: Apply signal preprocessing steps followed by statistical analyses of the blurred data and the region below the typical noise threshold to identify signals usually hidden below this “barrier”.

Results: A new analysis workflow has been developed that accurately identifies and analyzes peaks and determines their parameters even below noise level, where other tools cannot detect them. A comparison with commercial software has clearly demonstrated this gain in sensitivity. These additional peaks can be used in subsequent steps to build better peak patterns for proteomic pattern analysis. We believe that this new approach will foster the identification of new biomarkers that have not been detectable by most currently available algorithms.


Keywords: Gaussian Mixture Model; Peak Detection; Sporadic Breast Cancer; Proteomic Pattern; Peak Detection Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tim O. F. Conrad (1)
  • Alexander Leichtle (2)
  • Andre Hagehülsmann (3)
  • Elmar Diederichs (1)
  • Sven Baumann (2)
  • Joachim Thiery (2)
  • Christof Schütte (1)

  1. Department of Mathematics, Free University Berlin, Germany
  2. Institute of Laboratory Medicine, Clinical Chemistry and Molecular Diagnostics, University Hospital Leipzig, Germany
  3. Microsoft Research, Cambridge, UK
