Abstract
Five algorithms proposed in the literature for identifying unknown compounds by library search of their low-resolution mass spectra were optimized and tested by matching test spectra against reference spectra in the NIST-EPA-NIH Mass Spectral Database. The algorithms were probability-based matching (PBM), the dot product, the similarity index of Hertz et al., Euclidean distance, and absolute value distance. The test set consisted of 12,592 alternate spectra of about 8000 compounds represented in the database. Most algorithms were optimized by varying their mass weighting and intensity scaling factors. The rank of the correct compound in the list of candidate compounds was used as the criterion for accuracy. The best-performing algorithm (75% accuracy at rank 1) was the dot-product function, which measures the cosine of the angle between spectra represented as vectors. The other methods, in order of performance, were Euclidean distance (72%), absolute value distance (68%), PBM (65%), and Hertz et al. (64%). Intensity scaling and mass weighting were important in the optimized algorithms: taking the square root of the intensities was nearly optimal, and raising the mass to the second or third power gave the best mass weighting. Several more complex schemes were also tested but had little effect on the results. A modest improvement in the performance of the dot-product algorithm was obtained by adding a term that gave additional weight to relative peak intensities for spectra with many peaks in common.
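The weighting and scoring ideas summarized above can be sketched in Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions not stated in the abstract: spectra are dicts mapping integer m/z to intensity; each peak is weighted as intensity^0.5 × (m/z)^3 (square-root scaling and a mass power of 3, one of the two best powers reported); the two distance measures compare weighted spectra normalized to unit sum; and `ratio_term`/`composite` are a hypothetical form of the extra relative-intensity term, comparing intensity ratios of adjacent shared peaks. All function names are illustrative.

```python
import math

# Assumed representation: a spectrum maps integer m/z to peak intensity.
Spectrum = dict[int, float]


def weighted(spec: Spectrum, intensity_power: float = 0.5,
             mass_power: float = 3.0) -> Spectrum:
    """Weight each peak as intensity**intensity_power * mz**mass_power.

    Defaults mirror the abstract's best settings (square-root intensity,
    mass cubed); dropping zero-intensity peaks is an assumption.
    """
    return {mz: (inten ** intensity_power) * (mz ** mass_power)
            for mz, inten in spec.items() if inten > 0 and mz > 0}


def dot_product(query: Spectrum, ref: Spectrum, **kw) -> float:
    """Cosine of the angle between weighted spectrum vectors (1.0 = identical)."""
    wq, wr = weighted(query, **kw), weighted(ref, **kw)
    num = sum(wq[mz] * wr[mz] for mz in wq.keys() & wr.keys())
    den = math.sqrt(sum(v * v for v in wq.values())
                    * sum(v * v for v in wr.values()))
    return num / den if den else 0.0


def _unit_sum(spec: Spectrum, **kw) -> Spectrum:
    """Weighted spectrum scaled so its values sum to 1 (assumed normalization)."""
    w = weighted(spec, **kw)
    total = sum(w.values())
    return {mz: v / total for mz, v in w.items()} if total else w


def euclidean_distance(query: Spectrum, ref: Spectrum, **kw) -> float:
    """Euclidean distance between normalized weighted spectra (0.0 = identical)."""
    nq, nr = _unit_sum(query, **kw), _unit_sum(ref, **kw)
    return math.sqrt(sum((nq.get(mz, 0.0) - nr.get(mz, 0.0)) ** 2
                         for mz in nq.keys() | nr.keys()))


def absolute_distance(query: Spectrum, ref: Spectrum, **kw) -> float:
    """Sum of absolute differences between normalized weighted spectra."""
    nq, nr = _unit_sum(query, **kw), _unit_sum(ref, **kw)
    return sum(abs(nq.get(mz, 0.0) - nr.get(mz, 0.0))
               for mz in nq.keys() | nr.keys())


def ratio_term(query: Spectrum, ref: Spectrum, **kw) -> float:
    """Hypothetical relative-intensity term: compares the intensity ratios of
    adjacent shared peaks, folding each comparison into (0, 1]."""
    wq, wr = weighted(query, **kw), weighted(ref, **kw)
    shared = sorted(wq.keys() & wr.keys())
    if len(shared) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(shared, shared[1:]):
        r = (wr[cur] / wr[prev]) / (wq[cur] / wq[prev])
        total += min(r, 1.0 / r)
    return total / (len(shared) - 1)


def composite(query: Spectrum, ref: Spectrum, **kw) -> float:
    """Blend the cosine with the ratio term; the ratio term gains weight as the
    number of shared peaks grows (assumed weighting scheme)."""
    cos = dot_product(query, ref, **kw)
    wq, wr = weighted(query, **kw), weighted(ref, **kw)
    n_shared = len(wq.keys() & wr.keys())
    if n_shared < 2:
        return cos
    return (len(wq) * cos + n_shared * ratio_term(query, ref, **kw)) / (len(wq) + n_shared)


def rank_library(unknown: Spectrum, library: dict[str, Spectrum]) -> list[str]:
    """Library entry names sorted best match first by dot-product score."""
    return sorted(library, key=lambda name: dot_product(unknown, library[name]),
                  reverse=True)
```

Higher dot-product and composite scores indicate better matches, whereas smaller distances do. A search would sort the library accordingly and record the rank at which the true compound appears, which is the accuracy criterion used in the test described above.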
References
Zurcher, M.; Clerc, J. T.; Farkas, M.; Pretsch, E. Anal. Chim. Acta 1988, 206, 161–172.
(a) Martinsen, D. P. Appl. Spectrosc. 1981, 35, 255–266; (b) Martinsen, D. P.; Song, B.-H. Mass Spectrom. Rev. 1985, 4, 461–490.
Clerc, J. T. In Computer Enhanced Analytical Spectroscopy; Meuzelaar, H. L. C., Isenhour, T. L., Eds.; Plenum Press: New York, 1980; pp 145–162.
Stein, S. E. J. Am. Soc. Mass Spectrom. 1994, 5, 316–323.
(a) McLafferty, F. W.; Hertel, R. H.; Villwock, R. D. Org. Mass Spectrom. 1974, 9, 690–702; (b) Atwater, B. L.; Stauffer, D. B.; McLafferty, F. W.; Peterson, D. W. Anal. Chem. 1985, 57, 899–903; (c) McLafferty, F. W.; Stauffer, D. B. J. Chem. Inf. Comput. Sci. 1985, 25, 245–252; (d) Stauffer, D. B.; McLafferty, F. W.; Ellis, R. D.; Peterson, D. W. Anal. Chem. 1985, 57, 771–773.
Sokolow, S.; Karnofsky, J.; Gustafson, P. The Finnigan Library Search Program; Finnigan Application Report 2; Finnigan Corp.: San Jose, CA, March 1978.
Pellizzari, E. D.; Hartwell, T.; Crowder, J. A Comparative Evaluation of GC/MS Data Analysis Processing; Project Report PB-85-125664; U.S. Environmental Protection Agency: Research Triangle Park, NC, 1985.
Rasmussen, G. T.; Isenhour, T. L. J. Chem. Inf. Comput. Sci. 1979, 19, 179–186.
Hertz, H. S.; Hites, R. A.; Biemann, K. Anal. Chem. 1971, 43, 681–691.
(a) Pesyna, G. M. Computerized Structure Retrieval and Interpretation of Mass Spectra: The Design and Evaluation of a Probability Based Matching System Using a Large Data Base; Doctoral Dissertation; Cornell University: Ithaca, NY, 1975; (b) Atwater, B. L. More Reliable Identifications of Unknown Mass Spectra Using the Probability Based Matching Algorithm; Doctoral Dissertation; Cornell University: Ithaca, NY, 1980; (c) Stauffer, D. B. Improved Identification of Unknown Mass Spectra Using the Probability Based Matching Algorithm; Doctoral Dissertation; Cornell University: Ithaca, NY, 1984.
McLafferty, F. W.; Stauffer, D. B.; Loh, S. Y. J. Am. Soc. Mass Spectrom. 1991, 2, 438–440.
McLafferty, F. W.; Stauffer, D. B.; Twiss-Brooks, A. B.; Loh, S. Y. J. Am. Soc. Mass Spectrom. 1991, 2, 432–437.
McLafferty, F. W. Anal. Chem. 1977, 49, 1441–1443.
Lam, R. B.; Foulk, S. J.; Isenhour, T. L. Anal. Chem. 1981, 53, 1679–1684.
Crawford, L. R.; Morrison, J. D. Anal. Chem. 1968, 40, 1464–1469.
Cite this article
Stein, S.E., Scott, D.R. Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass Spectrom. 5, 859–866 (1994). https://doi.org/10.1016/1044-0305(94)87009-8