Abstract
Mass spectrometry (MS) is pivotal in analyses of the metabolome, but it presents a major challenge for subsequent data processing. While the last few years have brought new high-performance instruments, data processing has not seen a comparable development. In this paper we discuss an automated data processing pipeline for comparing large numbers of fingerprint spectra from direct infusion experiments analyzed by high-resolution MS. We describe some of the problems that have to be addressed, from the conversion and pre-processing of the raw data through to the final data analysis. Illustrated on the direct infusion analysis (ESI-TOF-MS) of complex mixtures, the method exploits the full resolution available in the mass spectra. Although the method is illustrated as a new library search method for high-resolution MS, we demonstrate that the output of the pre-processing is also applicable to cluster analysis, discriminant analysis, and related multivariate methods applied directly to mass spectra from direct infusion analysis of crude extracts. These methods are used to find the relationships between several terverticillate Penicillium species and to identify the ions responsible for their segregation.
Acknowledgments
The authors thank Professor Jens Christian Frisvad (BioCentrum-DTU) for identification of the species. Ellen Kirstine Lyhne and Hanne Jacobsen are gratefully acknowledged for cutting the plugs and for extracting and analyzing the samples. The project was supported by the Danish Technical Research Council under the project “Programme for predictive biotechnology: Functional biodiversity in Penicillium and Aspergillus” (grant no. 9901295) and The Danish Research Council (grant no. 274–05-0606).
Appendix
Scripts
The software used for extracting data from the MassLynx data files can be obtained, together with full documentation of the processing scripts, by contacting the corresponding author by email: meh@biocentrum.dtu.dk.
Adaptive binning
The adaptive binning algorithm used:
Require: \({\varvec{\Phi} =\left\{{\varphi _p^{{\rm S}_k} }\right\}}\)
for all \(k \in \{1,\ldots,K\}\) and \(p \in \{1,\ldots,|{\rm S}_k|\}\) do
    find the nearest cluster c to \(\varphi _p^{{\rm S}_k}\)
    if no cluster is found or distance \({d_{cp} \geq d_{max}}\) then
        create a new cluster with element p
    else if \({d_{cp} \leq d_{min}}\) then
        add element p to cluster c
    end if
end for
for all clusters i do
    if cluster i has at least \(N_m\) elements then
        update centroid \(c_i\) of cluster i
    else
        remove cluster i
    end if
end for
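The listing above can be sketched in Python as follows. This is a minimal, single-pass illustration and not the authors' script. It assumes that each peak is represented only by its m/z value, that the distance d is the absolute m/z difference, and that the final loop keeps clusters with at least N_m elements and discards the rest. The function name `adaptive_binning` and the parameter names `d_max`, `d_min` and `n_min` are hypothetical.

```python
from typing import Dict, List, Sequence


def adaptive_binning(
    spectra: Sequence[Sequence[float]],
    d_max: float,
    d_min: float,
    n_min: int,
) -> List[float]:
    """Single-pass adaptive binning of peak positions (illustrative sketch).

    spectra: one sequence of peak m/z values per spectrum (assumed input form).
    d_max:   a peak at least this far from every cluster seeds a new cluster.
    d_min:   a peak at most this far from its nearest cluster joins it.
    n_min:   clusters with fewer members than this are discarded.
    Returns the sorted centroids of the surviving clusters.
    """
    clusters: List[Dict] = []  # each cluster: {"centroid": float, "members": [float]}

    for spectrum in spectra:
        for peak in spectrum:
            # Find the nearest existing cluster (assumption: distance = |m/z difference|).
            nearest = None
            nearest_dist = float("inf")
            for cluster in clusters:
                dist = abs(peak - cluster["centroid"])
                if dist < nearest_dist:
                    nearest, nearest_dist = cluster, dist

            if nearest is None or nearest_dist >= d_max:
                # No cluster yet, or the peak is far from all clusters: start a new one.
                clusters.append({"centroid": peak, "members": [peak]})
            elif nearest_dist <= d_min:
                # Close enough to the nearest cluster: assign the peak to it.
                nearest["members"].append(peak)
            # Peaks with d_min < distance < d_max stay unassigned in this pass.

    centroids: List[float] = []
    for cluster in clusters:
        if len(cluster["members"]) >= n_min:
            # Update the centroid to the mean of the members.
            centroids.append(sum(cluster["members"]) / len(cluster["members"]))
        # Clusters with fewer than n_min members are dropped.

    return sorted(centroids)
```

Calling `adaptive_binning(spectra, d_max=0.5, d_min=0.05, n_min=2)` on a list of peak lists returns the bin centres that are supported by at least two peaks. The thresholds here are placeholders; suitable values depend on the mass accuracy of the instrument.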
Cite this article
Hansen, M.A.E., Smedsgaard, J. Automated work-flow for processing high-resolution direct infusion electrospray ionization mass spectral fingerprints. Metabolomics 3, 41–54 (2007). https://doi.org/10.1007/s11306-006-0044-0
DOI: https://doi.org/10.1007/s11306-006-0044-0