Volume 2, Issue 2, pp 75–83

Alignment of high resolution mass spectra: development of a heuristic approach for metabolomics

  • Saira A. Kazmi
  • Samiran Ghosh
  • Dong-Guk Shin
  • Dennis W. Hill
  • David F. Grant (email author)


One of the challenges of using mass spectrometry for metabolomic analyses of samples containing thousands of compounds is peak identification and alignment. This paper addresses the problem of aligning mass spectral data from different samples in order to determine average component m/z peak values. The alignment scheme takes the instrument m/z measurement error into account in order to heuristically align two or more samples, using a technique comparable to automated visual inspection and alignment. Results obtained using mass spectral profiles of replicate human urine samples suggest that this heuristic alignment approach is more efficient than other approaches that use hierarchical clustering algorithms. The output consists of an average m/z and intensity value for each spectral component, together with the number of matches from the different samples. A major advantage of this alignment strategy is that it eliminates the boundary problem that occurs when predetermined fixed bins are used to identify and combine peaks for averaging. In addition, its efficient runtime allows large datasets to be processed quickly.
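The idea of tolerance-based alignment, and the fixed-bin boundary problem it avoids, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It is a simple greedy grouping written under stated assumptions: each sample is a list of (m/z, intensity) pairs, the instrument error is expressed in parts per million (`ppm_tol`, a hypothetical parameter), and the names `align_peaks`, `AlignedPeak`, and `fixed_bin` are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AlignedPeak:
    mz: float         # mean m/z of the grouped peaks
    intensity: float  # mean intensity of the grouped peaks
    matches: int      # number of distinct samples contributing a peak


def align_peaks(samples, ppm_tol=10.0):
    """Greedy grouping of peaks whose m/z lies within ppm_tol of the group mean.

    Illustrative sketch only; the tolerance unit (ppm) is an assumption.
    """
    # Pool all peaks, remembering which sample each came from, and sort by m/z.
    peaks = sorted(
        (mz, intensity, idx)
        for idx, sample in enumerate(samples)
        for mz, intensity in sample
    )
    groups, current = [], []
    for peak in peaks:
        if current:
            center = sum(p[0] for p in current) / len(current)
            # Close the current group once the next peak is out of tolerance.
            if abs(peak[0] - center) > center * ppm_tol * 1e-6:
                groups.append(current)
                current = []
        current.append(peak)
    if current:
        groups.append(current)
    return [
        AlignedPeak(
            mz=sum(p[0] for p in g) / len(g),
            intensity=sum(p[1] for p in g) / len(g),
            matches=len({p[2] for p in g}),
        )
        for g in groups
    ]


def fixed_bin(mz, width=0.1):
    """Index of the predetermined fixed-width bin that contains mz."""
    return math.floor(mz / width)


# Two replicate samples whose first peaks straddle a fixed-bin edge at 100.1.
samples = [[(100.0999, 50.0), (250.0, 10.0)], [(100.1001, 70.0)]]
aligned = align_peaks(samples, ppm_tol=10.0)
```

In this toy input, the peaks at 100.0999 and 100.1001 fall into different fixed bins of width 0.1 even though they differ by only 0.0002, whereas the tolerance-based grouping places them in a single aligned component because groups are formed around the data rather than around preset edges.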


Keywords: mass spectrometry, alignment, clustering, preprocessing



This study was supported by the National Institutes of Health (P20 GM65764-02), the Department of Defense (N00014-99-1-0905; N00014-99-1-06006) and a Faculty Research Grant from the University of Connecticut.



Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Saira A. Kazmi (1)
  • Samiran Ghosh (2)
  • Dong-Guk Shin (1)
  • Dennis W. Hill (3)
  • David F. Grant (3, email author)

  1. Department of Computer Science and Engineering, University of Connecticut, Storrs, USA
  2. Department of Statistics, University of Connecticut, Storrs, USA
  3. Department of Pharmaceutical Sciences, University of Connecticut, Storrs, USA
