Optimization and testing of mass spectral library search algorithms for compound identification

  • Stephen E. Stein
  • Donald R. Scott


Five algorithms proposed in the literature for library search identification of unknown compounds from their low resolution mass spectra were optimized and tested by matching test spectra against reference spectra in the NIST-EPA-NIH Mass Spectral Database. The algorithms were probability-based matching (PBM), dot-product, Hertz et al. similarity index, Euclidean distance, and absolute value distance. The test set consisted of 12,592 alternate spectra of about 8000 compounds represented in the database. Most algorithms were optimized by varying their mass weighting and intensity scaling factors. Rank in the list of candidate compounds was used as the criterion for accuracy. The best performing algorithm (75% accuracy for rank 1) was the dot-product function, which measures the cosine of the angle between spectra represented as vectors. The other methods, in order of performance, were the Euclidean distance (72%), absolute value distance (68%), PBM (65%), and Hertz et al. (64%). Intensity scaling and mass weighting were important in the optimized algorithms: a square-root intensity scale was nearly optimal, and the square or cube of the mass was the best mass weighting power. Several more complex schemes were also tested, but had little effect on the results. A modest improvement in the performance of the dot-product algorithm was made by adding a term that gave additional weight to relative peak intensities for spectra with many peaks in common.
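The dot-product match described in the abstract can be illustrated with a minimal sketch. It assumes each spectrum is a mapping from integer m/z to intensity and weights each peak as (m/z)^a × (intensity)^b before taking the cosine between the two weighted vectors. The function names, the dictionary representation, and the default exponents (a = 2, b = 0.5, chosen from the "square or cube" and "square root" ranges the abstract reports) are assumptions for illustration; the abstract does not give the authors' exact formulation, and the extra term for spectra with many peaks in common is not included.

```python
import math


def weighted_peaks(spectrum, mass_power=2.0, intensity_power=0.5):
    """Return {m/z: (m/z ** mass_power) * (intensity ** intensity_power)}.

    The exponents are illustrative defaults, not the paper's fitted values.
    """
    return {
        mz: (mz ** mass_power) * (intensity ** intensity_power)
        for mz, intensity in spectrum.items()
        if intensity > 0
    }


def cosine_match(unknown, reference, mass_power=2.0, intensity_power=0.5):
    """Cosine of the angle between two weighted spectra (0.0 to 1.0)."""
    u = weighted_peaks(unknown, mass_power, intensity_power)
    r = weighted_peaks(reference, mass_power, intensity_power)
    # Only peaks present in both spectra contribute to the dot product.
    dot = sum(weight * r[mz] for mz, weight in u.items() if mz in r)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_r = math.sqrt(sum(weight * weight for weight in r.values()))
    if norm_u == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_u * norm_r)


def rank_candidates(unknown, library, **weighting):
    """Score every library spectrum and sort best match first."""
    scored = [
        (name, cosine_match(unknown, spectrum, **weighting))
        for name, spectrum in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under the abstract's accuracy criterion, a search counts as correct at rank 1 when the true compound is the first entry returned by a ranking such as `rank_candidates`. Changing `mass_power` and `intensity_power` is a rough stand-in for the kind of optimization the abstract describes.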


Keywords (machine-generated, not supplied by the authors): Mass Weighting; Electron Ionization Mass Spectrometry; Spectral Point; NIST Database; Intensity Scaling


  1. Zurcher, M.; Clerc, J. T.; Farkas, M.; Pretsch, E. Anal. Chim. Acta 1988, 206, 161–172.
  2. (a) Martinsen, D. P. Appl. Spectrosc. 1981, 35, 255–266; (b) Martinsen, D. P.; Song, B.-H. Mass Spectrom. Rev. 1985, 4, 461–490.
  3. Clerc, J. T. In Computer Enhanced Analytical Spectroscopy; Meuzelaar, H. L. C., Isenhour, T. L., Eds.; Plenum Press: New York, 1980; pp 145–162.
  4. Stein, S. E. J. Am. Soc. Mass Spectrom. 1994, 5, 316–323.
  5. (a) McLafferty, F. W.; Hertel, R. H.; Villwock, R. D. Org. Mass Spectrom. 1974, 9, 690–702; (b) Atwater, B. L.; Stauffer, D. B.; McLafferty, F. W.; Peterson, D. W. Anal. Chem. 1985, 57, 899–903; (c) McLafferty, F. W.; Stauffer, D. B. J. Chem. Inf. Comput. Sci. 1985, 25, 245–252; (d) Stauffer, D. B.; McLafferty, F. W.; Ellis, R. D.; Peterson, D. W. Anal. Chem. 1985, 57, 771–773.
  6. Sokolow, S.; Karnofsky, J.; Gustafson, P. The Finnigan Library Search Program; Finnigan Application Report 2; Finnigan Corp.: San Jose, CA, March 1978.
  7. Pellizzari, E. D.; Hartwell, T.; Crowder, J. A Comparative Evaluation of GC/MS Data Analysis Processing; Project Report PB-85-125664; U.S. Environmental Protection Agency: Research Triangle Park, NC, 1985.
  8. Rasmussen, G. T.; Isenhour, T. L. J. Chem. Inf. Comput. Sci. 1979, 19, 179–186.
  9. Hertz, H. S.; Hites, R. A.; Biemann, K. Anal. Chem. 1971, 43, 681–691.
  10. (a) Pesyna, G. M. Computerized Structure Retrieval and Interpretation of Mass Spectra: The Design and Evaluation of a Probability Based Matching System Using a Large Data Base; Doctoral Dissertation; Cornell University: Ithaca, NY, 1975; (b) Atwater, B. L. More Reliable Identifications of Unknown Mass Spectra Using the Probability Based Matching Algorithm; Doctoral Dissertation; Cornell University: Ithaca, NY, 1980; (c) Stauffer, D. B. Improved Identification of Unknown Mass Spectra Using the Probability Based Matching Algorithm; Doctoral Dissertation; Cornell University: Ithaca, NY, 1984.
  11. McLafferty, F. W.; Stauffer, D. B.; Loh, S. Y. J. Am. Soc. Mass Spectrom. 1991, 2, 438–440.
  12. McLafferty, F. W.; Stauffer, D. B.; Twiss-Brooks, A. B.; Loh, S. Y. J. Am. Soc. Mass Spectrom. 1991, 2, 432–437.
  13. McLafferty, F. W. Anal. Chem. 1977, 49, 1441–1443.
  14. Lam, R. B.; Foulk, S. J.; Isenhour, T. L. Anal. Chem. 1981, 53, 1679–1684.
  15. Crawford, L. R.; Morrison, J. D. Anal. Chem. 1968, 40, 1464–1469.

Copyright information

© American Society for Mass Spectrometry 1994

Authors and Affiliations

  • Stephen E. Stein (2)
  • Donald R. Scott (1)
  1. Atmospheric Research and Exposure Assessment Laboratory, U.S. Environmental Protection Agency, Research Triangle Park, USA
  2. NIST Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg
