Introduction

The biological interpretation of post-genomic datasets depends on the ability to identify reliably the molecules that have been measured. For example, microarray analysis depends upon the hybridization behavior of complementary nucleotide sequences to enable detection of individual mRNA transcripts. For metabolomic analysis, comparable means of simple identification have not yet been described, as the chemical characterization of single molecular species is laborious and not currently amenable to automation. This has, to date, restricted the application of metabolomics largely to fingerprinting studies that detect diagnostic differences between samples but provide restricted biological insight (Allen et al., 2003; Goodacre et al., 2004; Kell, 2004; Nicholson et al., 2004).

Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS or simply FTMS) (Brown et al., 2005; Zhang et al., 2005) has so far been used only in a handful of published studies into metabolomics (Aharoni et al., 2002; Hirai et al., 2004; Murch et al., 2004; Tohge et al., 2005). However, the technique has great potential as a technology to unravel metabolomes. The extreme mass accuracy of the technique, coupled to ultra high resolution of mass species, means that thousands of metabolites can be identified simultaneously without the need for chromatographic separations (Brown et al., 2005; Zhang et al., 2005). The ultra high mass accuracy enables assignment of putative chemical formulae to metabolites since only a finite number of combinations of carbon, nitrogen, oxygen, hydrogen, sulfur and phosphorus can yield the same precise measured mass.
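The idea that only a finite number of elemental combinations can yield a given accurate mass can be illustrated with a brute-force search. The following is a minimal sketch, not the procedure used in this study: the function names, the cap on N, P and S counts (`max_nps`) and the absence of any chemical plausibility filter are all assumptions made for illustration. Only the standard monoisotopic masses of the six elements are taken as given.

```python
from itertools import product

# Monoisotopic masses (u) of the six elements named in the text.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "P": 30.9737616, "S": 31.9720710}


def formula_string(counts):
    """Render element counts as a compact formula, e.g. C6H12O6."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in counts.items() if n)


def candidate_formulae(mass, ppm=1.0, max_nps=4):
    """List CHNOPS formulae whose monoisotopic mass lies within `ppm` of `mass`.

    Hypothetical helper: `max_nps` caps the N, P and S counts to keep the
    search small, and no valence or ratio rules are applied.
    """
    tol = mass * ppm * 1e-6
    max_c = int(mass // MONO["C"])
    max_o = int(mass // MONO["O"])
    hits = []
    for c, n, o, p, s in product(range(max_c + 1), range(max_nps + 1),
                                 range(max_o + 1), range(max_nps + 1),
                                 range(max_nps + 1)):
        heavy = (c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
                 + p * MONO["P"] + s * MONO["S"])
        rest = mass - heavy          # mass left over for hydrogen atoms
        if rest < -tol:
            continue
        h = round(rest / MONO["H"])  # nearest whole number of hydrogens
        if h >= 0 and abs(rest - h * MONO["H"]) <= tol:
            hits.append(formula_string(
                {"C": c, "H": h, "N": n, "O": o, "P": p, "S": s}))
    return hits


if __name__ == "__main__":
    # A hexose (C6H12O6) has a monoisotopic mass of about 180.06339 u.
    for tolerance in (1.0, 10.0, 50.0):
        print(tolerance, "ppm:", len(candidate_formulae(180.06339, ppm=tolerance)))
```

Running the search at several tolerances shows how the candidate list grows as accuracy degrades, which is the effect the formula assignment relies on.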

This capability to assign likely chemical formulae to a multitude of metabolites may allow the analysis of metabolomes at a level comparable to microarray analysis of the transcriptome. However, as we demonstrate in this manuscript, the ability to assign individual chemical formulae diminishes as the masses of metabolites increase. Judicious analysis of the data from a given metabolomics experiment can nevertheless go some way to resolving this problem. This is because, in addition to offering putative identification of formulae, ultra high mass accuracy has the potential to identify the connectivity between related metabolites, since chemically transformed species will be related by measurable, clearly defined mass differences. In the present study we are able to position observed molecules uniquely and accurately in comprehensive metabolic networks that are generated ab initio from the measured mass peaks. Links in these networks correspond explicitly to actual chemical reactions and thus go beyond the metabolite correlation networks used previously (Steuer et al., 2003). We show that the ab initio metabolic networks have a highly informative, non-random structure and can be used to assign putative molecular identities to metabolites. Moreover, they open up a novel perspective on the global structure of cellular metabolism by providing the first comprehensive experimental assessment of metabolite connectivities, unbiased by the historical contingencies of classical biochemistry. The topology of metabolic networks described to date has been inferred from reactions predicted to occur within a given cell type, that is, reactions that may be catalyzed by enzymes whose presence is predicted through genome analysis (Arita, 2004; Ma and Zeng, 2003a, b; Pfeiffer et al., 2005).
The networks generated to date fail to take into account the fact that enzymes need not be expressed constitutively, nor compartmentalized in a manner allowing them to contribute to a given sub-network. Moreover, roles of non-enzymatic metabolite interconversions within the cell are not accounted for, nor are roles for enzymes with promiscuous substrate specificity. In spite of these limitations, metabolic network building is an important discipline and one that will benefit greatly from the introduction of techniques that can directly measure the metabolites present within a cell and report upon their connectivity. The work that we present here demonstrates the potential of using ultra high resolution mass spectrometry to generate such networks ab initio.

Another important implication of the present work is the ability to construct metabolic networks for organisms that have so far been outside the focus of classical biochemistry, i.e. beyond yeast and E. coli. Indeed, genomic information of any kind is not required for these analyses. Construction of such ab initio networks will form a useful basis for future system-wide comparative studies of metabolism in a wide variety of species.

Materials and methods

Chemicals and standards

ATP, ADP, NAD, NADP and diminazene aceturate (berenil) were of the highest grade available from Sigma. Trypanothione (N1,N8 bis-glutathionylspermidine) was from G.H. Coombs (University of Glasgow). DB75 (2,5-bis(4-amidinophenyl)furan) was from D. Boykin (Georgia State University). Pentamidine isethionate was from Aventis (through the World Health Organisation). Cymelarsan (melarsen oxide in solution) was from M. Turner (University of Glasgow).

Preparation of trypanosome extracts

Bloodstream form trypanosomes (Lister 427 line) were collected by cardiac puncture at 5 × 10⁸ parasites/ml from Wistar rats and separated from red blood cells as a buffy coat by centrifugation at 3000g. The same parasite line was grown in HMI-9 medium supplemented with 20% foetal calf serum to mid-log phase (8 × 10⁵ parasites/ml of culture medium). Parasites were then centrifuged at 3000g for 5 min, and pellets and supernatant were flash frozen in liquid nitrogen prior to extraction.

Mass spectrometric analysis

Fourier transform mass spectrometry was performed as we have described previously (Tohge et al., 2005). Briefly, cell pellets and media (300 μL) were extracted in solvents ranging from polar (aqueous) to non-polar with most proteins and nucleic acids removed during extraction. Extracts were stored at −80 °C. After appropriate dilution, samples were analysed on a Bruker Daltonics APEX III Fourier transform ion cyclotron resonance mass spectrometer equipped with a 7.0 T superconducting magnet (Bruker Daltonics, Billerica, MA). Samples were directly injected using electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI) at a flow rate of 600 μL per hour. Different sample extracts were analyzed separately, and the processed mass spectral data for each sample were combined. Sample peaks were calibrated using internal standards with peak mass error <1 ppm relative to the theoretical mass. Measured masses were combined into a single table for exploration using DISCOVAmetrics™ software.

Computational analysis

Further analyses used a combination of Microsoft Excel, MATLAB and custom-written Perl scripts, which are available from the authors upon request. The degree distributions in figure 1A were calculated from the ab initio networks by counting how often a particular “commonly observed” mass difference or a certain “biochemical reaction” mass difference occurred in the network. The mass differences were then ranked by their number of occurrences and plotted in that order. The degrees in figure 1B were obtained by counting the number of mass pairs each observed mass was involved in, i.e. its number of edges in the undirected ab initio networks.
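The two counting steps described above can be sketched in a few lines. This is an illustrative reconstruction only: the original analysis used Excel, MATLAB and Perl, and the edge representation `(mass_a, mass_b, label)` and the function name are assumptions introduced here.

```python
from collections import Counter


def ranked_degrees(edges):
    """Return ranked reaction counts and ranked metabolite degrees.

    `edges` is a list of (mass_a, mass_b, label) tuples, where `label`
    names the mass difference linking the two masses (hypothetical format).
    """
    # Figure 1A analogue: how often each mass difference occurs in the network.
    reaction_counts = Counter(label for _, _, label in edges)
    # Figure 1B analogue: number of undirected edges touching each mass.
    metabolite_degree = Counter()
    for mass_a, mass_b, _ in edges:
        metabolite_degree[mass_a] += 1
        metabolite_degree[mass_b] += 1
    # Sorting in descending order gives the rank order used in a Zipf plot.
    reactions_ranked = sorted(reaction_counts.values(), reverse=True)
    metabolites_ranked = sorted(metabolite_degree.values(), reverse=True)
    return reactions_ranked, metabolites_ranked


if __name__ == "__main__":
    toy_edges = [(100.0, 102.0, "H2"), (102.0, 104.0, "H2"), (100.0, 116.0, "O")]
    print(ranked_degrees(toy_edges))
```

Plotting each ranked list against its rank on logarithmic axes would give a Zipf plot of the kind shown in figure 1.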

Figure 1.

Zipf plots of the reaction (A) and metabolite (B) degrees in the ab initio metabolic network of Trypanosoma brucei. In (A), the red dots correspond to the reactions inferred automatically from the mass–mass differences, while the green dots are based on a pre-defined set of common biochemical reactions. The fitted lines are based on power-law (A) and exponential distributions (B), respectively.

Results and discussion

Mass precision and resolution of FTMS

We started our study by analyzing a mixture of standard chemicals, including ATP, ADP, NAD, NADP, glutathione and a number of trypanocidal drugs important to our research. For the 13 standards that were detected in our sample, the average mass accuracy was 0.783 ppm (maximum 2.47 ppm; table 1). As we detect masses between 100 and 1500 atomic mass units, this resolution is sufficient to discriminate at least 50,000 molecular species, even when we assume that many of them will be represented by several peaks (isotope peaks and ion adducts). Indeed, for most of our standard molecules, single and double 13C peaks were detected in the expected proportions, as were a large number of minor contaminants, confirming the high sensitivity of the method.

Table 1 Molecular standards detected by FTMS

FTMS analysis of a parasitic protozoan, Trypanosoma brucei

We then proceeded with the analysis of metabolite samples from Trypanosoma brucei, a protozoan parasite that causes the fatal disease sleeping sickness in Africa (Barrett et al., 2003). We collected metabolic profiles for parasites grown in vivo (in rats) and in vitro (in serum culture) and compared these to the profiles from their environment (rat serum and culture medium, respectively). As a parasite, T. brucei has a drastically streamlined set of metabolic enzymes, making it particularly suitable for pioneering studies in metabolomics. At the same time, metabolic enzymes represent key targets for drugs used in treating sleeping sickness, and new targets are urgently required (Butler, 2005).

Excluding 13C isotope peaks and common ion adducts, a total of 399 masses were identified from rat-derived trypanosomes, while for in vitro grown cells the total number was 262. Of these, about 30% could be matched to putative identities in the chemical database PubChem (http://www.pubchem.ncbi.nlm.nih.gov/). These matches offer reasonable certainty regarding the empirical formula (matches to two alternative empirical formulae within 2 ppm are rare) although mass alone does not allow discrimination of chemical connectivity of atoms within the molecules.

Generation of ab initio metabolic networks

A majority of masses in our sample did not match to any known compound. This is due to the prevailing lack of knowledge about the total complexity of the metabolome of T. brucei and most other biological species, rather than limitations of mass accuracy. We have systematically explored the accuracy requirements needed for database matching and found that about 1–2 ppm is sufficient for a unique hit in PubChem, which in the release used contained 72,634 unique empirical formulae (table 2). However, making use of the high mass accuracy of Fourier transform mass spectrometry, and particularly studying mass–mass differences, has allowed us here to make significant progress in surmounting this problem. Two approaches were used to generate ab initio reconstructions of metabolic networks from the available data:

  1. In a completely untargeted approach, all pairwise mass–mass differences were calculated. Considering all possible pairwise differences makes the approach unbiased, although not all molecules are related in a chemically feasible way. Thus, in a next step frequently occurring mass differences were identified (defined as clusters of more than five pairwise distances that differed by less than 0.0001 mass units). Such commonly observed distances are very unlikely to be observed by chance and can hence be expected to have a chemical basis. Compounds whose masses differed by one of these commonly observed masses were assumed to be related by a chemical transformation.

  2. In the first approach above we have focused on all measured relationships within the dataset. This totally unbiased approach will not distinguish between metabolic transformations within the cell and chemical alterations induced during sample preparation and analysis by FTMS. As the full inventory of FTMS based artifacts becomes clear it will become possible to filter data for as many of these as possible. To work towards this approach we adopted a second, semi-targeted approach in which the observed pairwise mass–mass differences were compared to a list of 83 mass differences corresponding to common metabolic reactions compiled from biochemistry textbooks (table 3). Here, metabolites whose mass differed by the expected amount (within 2 ppm) were considered to be related by the corresponding metabolic transformation.
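The two approaches above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the scripts used in the study: clusters are formed by chaining sorted differences whose neighbours lie less than 0.0001 mass units apart, the 2 ppm tolerance is taken relative to the heavier mass of each pair, and all function names are hypothetical.

```python
from itertools import combinations


def common_differences(masses, window=1e-4, min_size=6):
    """Untargeted approach: cluster all pairwise mass differences.

    Returns clusters (lists of (difference, lighter, heavier)) containing
    more than five members, i.e. at least `min_size` = 6. Chaining adjacent
    sorted differences is an assumption about how clusters were defined.
    """
    diffs = sorted((b - a, a, b) for a, b in combinations(sorted(masses), 2))
    clusters, current = [], []
    for entry in diffs:
        if current and entry[0] - current[-1][0] >= window:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(entry)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


def match_known_transformations(masses, transformations, ppm=2.0):
    """Semi-targeted approach: link masses differing by a listed transformation.

    `transformations` maps a name to its mass difference. The tolerance is
    computed relative to the heavier mass (an assumption).
    """
    edges = []
    for a, b in combinations(sorted(masses), 2):
        tol = b * ppm * 1e-6
        for name, delta in transformations.items():
            if abs((b - a) - delta) <= tol:
                edges.append((a, b, name))
    return edges


if __name__ == "__main__":
    h2 = 2 * 1.00782503  # mass difference for (de)hydrogenation
    ladder = [100.0 + i * h2 for i in range(7)]  # toy series of masses
    print(len(common_differences(ladder)))
    print(match_known_transformations(ladder, {"H2": h2}))
```

Both functions return edges or groups of edges that could be assembled into an undirected network, with masses as nodes and mass differences as edge labels.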

Table 2 Average number of matching empirical formulae identified in PubChem at various mass accuracies, averaging over all masses in the present release of the database (mass range 2–9200, median mass 438)
Table 3 Common metabolic transformations and corresponding formulae

This approach is related to the technique used for the de novo sequencing of proteins using tandem mass spectrometry (in which masses are related by the masses of individual amino acids). The task in proteomics is facilitated by the linear nature of the examined peptides and the well-defined set of possible building blocks, however the same principle, but using a more extensive set of transformation masses, should be informative in mass spectrometry as applied to metabolomics.

Table 4 summarizes the results of the first approach. About 25,370 mass–mass differences corresponded to one of 2472 “commonly occurring” mass differences. This is a dramatic excess over the number observed for random lists of masses (uniformly distributed between 100 and 1500 atomic mass units) shown in the same table. Thus, there are an astonishing 25,000 or more relationships between observed masses that can be explained by ab initio predicted chemical transformations. The most common of them are listed in table 4 and assigned to the most likely underlying chemical difference (including isotope variability). It is clear that not all of these enriched mass–mass differences will correspond to a catalyzed metabolic (or even chemical) reaction. Some of them may just be artificial fragmentation products, but just as in proteomics applications, where such artificial fragments are systematically exploited for peptide identification, they will also be informative in the case of metabolomics. More importantly, such artifacts provide an excellent “gold standard” for the evaluation of our approach: we know, for example, that isotope peaks should exist in our dataset, so re-discovering the corresponding patterns in an unsupervised manner demonstrates the general feasibility of the approach.

Table 4 Comparison of the most common mass differences in observed and random metabolite networks

The semi-targeted approach confirms the results of the untargeted network reconstruction, and largely overcomes the issue of mass spectrometric artifacts. In this case, 1438 mass differences correspond to one of the major biochemical transformations, compared to 271 (±25) for a random list of masses of the same size. The most common mass differences correspond to hydrogenation/dehydrogenation (H2; 284 occurrences), ethylene addition (C2H2; 211), ethyl addition (C2H4; 191), hydroxylation (O; 84) and palmitoylation (C16H30O; 57), all of them expected to be abundant in our membrane rich samples, based on general biochemical knowledge.

To determine the importance of mass accuracy for the reconstruction of metabolic networks, we added random noise of various sizes to the observed masses, i.e. a uniformly distributed random number from an interval indicated in the table was added to or subtracted from each observed mass. We then performed the same untargeted analysis as before on these noisy data (table 5). The results show clearly that the reconstructed networks are robust against noise of this type, provided the accuracy of mass identification is ultra high. When the mass error exceeds 10 ppm, on the order of 50% of the inferable transformations are lost. High accuracy spectra are therefore essential for this approach to work.
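A noise experiment of this kind can be sketched as below. The sketch makes several assumptions that are not stated in the text: the noise interval is expressed in ppm of each mass, a single known mass difference is counted instead of rerunning the full untargeted analysis, and the function names and the toy mass list are hypothetical.

```python
import random
from itertools import combinations


def add_uniform_noise(masses, max_ppm, rng):
    """Perturb each mass by a uniform random error within +/- max_ppm."""
    return [m * (1 + rng.uniform(-max_ppm, max_ppm) * 1e-6) for m in masses]


def count_matches(masses, delta, tol_ppm=2.0):
    """Count mass pairs whose difference equals `delta` within `tol_ppm`.

    The tolerance is taken relative to the heavier mass (an assumption).
    """
    return sum(1 for a, b in combinations(sorted(masses), 2)
               if abs((b - a) - delta) <= b * tol_ppm * 1e-6)


if __name__ == "__main__":
    rng = random.Random(0)                       # fixed seed for repeatability
    h2 = 2 * 1.00782503
    clean = [300.0 + i * h2 for i in range(50)]  # toy masses, not real data
    baseline = count_matches(clean, h2)
    for noise in (0.5, 1.0, 5.0, 10.0, 50.0):
        noisy = add_uniform_noise(clean, noise, rng)
        print(noise, "ppm:", count_matches(noisy, h2), "of", baseline)
```

The printed counts give a rough picture of how quickly recoverable relationships are lost as the added error grows relative to the matching tolerance.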

Table 5 Stability of network inference against noise

Further confirmation of the non-random nature of the observed mass–mass difference network is provided by an analysis of the frequency of the various reactions. As shown in figure 1A, the number of times a specific mass difference is observed depends on its rank in the form of a power-law. This means that there are many rare reactions, but a few principal reactions/mass differences account for most chemical interconversions visible within the total dataset. Such a distribution would not be expected in a random network, but has been reported as an organizing principle for various metabolic networks (Jeong et al., 2000; Wagner and Fell, 2001; Ravasz et al., 2002; Almaas et al., 2004). These previous studies were generally based on a select series of enzymes and metabolites reported in the basic biochemical literature, or from genome-wide analysis of enzymatic reactions putatively present in an organism, superimposed on this historical view of metabolism (Edwards et al., 2001; Schilling et al., 2002; Forster et al., 2003; Covert et al., 2004). In striking contrast, the networks that we have identified, based on mass spectrometric data, reveal the potential of generating network connectivity “on the fly” from experimental results, without biasing outcomes based on well-established, but clearly incomplete, biochemical pathways. Interestingly, the degree distribution of the observed metabolites (i.e. the number of metabolic reactions in which each is predicted to participate) does not fit a power-law distribution in our data, but rather follows an exponential distribution with only slightly heavy tails (figure 1B). This is not consistent with those earlier reports (Jeong et al., 2000; Wagner and Fell, 2001) describing network properties extrapolated from enzymatic pathways predicted from whole-organism genome sequence information. It is, however, important to reiterate that we only reliably measure metabolites in the range 100–1500 atomic mass units. 
Thus, many central metabolites (e.g. water, CO2, pyruvate, glutamate) fall outside the mass window that we explore. This results in an absence of numerous major network “hubs”, and this influences the overall topology of the network and in particular removes the corresponding heavy tails in the degree distribution. In contrast, when examined from the point-of-view of metabolic transformations, which will take many important “hub metabolites” into account implicitly, the degree distribution is clearly following a power-law, although in the case presented here this distribution is also influenced by other chemical relationships that result from our mass spectrometric analysis. A future challenge will be to refine networks to include maximal information derived from the metabolome, while minimizing interference related to technical effects associated with sample preparation and analysis. In spite of this, the ab initio metabolic networks described here are in good agreement with the in silico networks derived through interpretation of genome content and biochemical literature. Technical refinements and variations in experimental design will certainly lead to further improvements in the amount and quality of information that can be used to build networks ab initio using Fourier transform mass spectrometry. Our results indicate that the effort required for these technical refinements is clearly warranted by the potential of the method to provide comprehensive and relatively unbiased overviews of the cellular metabolome.

Figure 2 shows an extract of the metabolic network, focusing on compounds that are greatly enriched in parasites compared to their environment. The same diagram also demonstrates the ease with which predicted transformations may be visualized within the network. Mass 809.5939 was predicted by database matching to be a choline phospholipid with four unsaturated bonds and 38 carbons in the lipid side chains. While mass alone cannot provide the identity of such a lipid, 1-stearoyl, 2-arachidonoyl-phosphatidylcholine (calculated mass = 809.5935) falls within the limits determined for these FTMS experiments. Moreover, a phosphocholine of this class has previously been identified as predominant in the T. brucei phospholipidome (Patnaik et al., 1993), making this a very likely candidate. This identification was then used to predict the molecular identity of the connected metabolites, and all but one of the network’s 44 members were successfully assigned putative formulae in this way. All of them are phospholipids with various side chain compositions and different headgroups, which again conforms to expectations, as the parasite samples are rich in membrane material. This identification is confirmed by the clear pattern that emerges when one looks for metabolites whose mass-to-mass difference can be explained by side-chain elongation and side-chain (de-)saturation. Figure 3 shows the resulting pattern. It demonstrates that the abundance of the predicted phospholipid masses follows a clear trend, with higher degrees of unsaturation at larger side chain length. This is a well-known phenomenon and supports our mass identification. Even stronger support is obtained when we compare the abundance of masses in parasites and serum. Three ether phospholipids stand out as dramatically enriched in parasites compared to their environment. This overabundance is in perfect agreement with reports in the literature (Patnaik et al., 1993).
Figure 3 also shows that many of the putative phospholipid masses correlate with mass 809.5939 in abundance in the parasite samples. This indicates that correlations in ion abundance can also be used as indicators of connectivity, although at a coarser level than provided by the mass–mass differences. In a large-scale system perturbation study, such correlations could thus be an important piece of complementary information.
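The agreement between the measured mass 809.5939 and the calculated mass 809.5935 quoted above can be expressed in parts per million. The short sketch below uses the conventional definition of relative mass error; the function name is chosen here for illustration.

```python
def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    # Values quoted in the text for the putative phosphatidylcholine.
    error = ppm_error(809.5939, 809.5935)
    print(f"{error:.2f} ppm")
```

The result is roughly 0.5 ppm, which lies within the mass accuracy reported for the standards.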

Figure 2.

Extract of the ab initio metabolic network of Trypanosoma brucei. For clarity we show only metabolites that correlate in abundance with mass 809.5939, an unsaturated phosphatidyl choline phospholipid that is part of an enriched metabolite family in the parasite. The inset (A) highlights the first generation of transformations originating from mass 809.5939, the main figure (B) shows the entire subgraph, which connects more than 60% of the most strongly correlating masses (Pearson correlation r > 0.85). Assigned molecular identities for each metabolite are indicated in a shorthand notation, where Cn:m stands for a phosphatidyl choline with n carbon atoms in the side chains and m unsaturated bonds. Alternative headgroups are explicitly mentioned in the labels. Shades of green indicate the abundance of the metabolites in the parasite. The graph layout was generated using aiSee (http://www.aisee.com).

Figure 3.

Abundance profile of various phospholipid classes. Diacyl cholines, alkylacyl cholines and alkylacyl ethanolamines are shown. The number of unsaturated bonds increases from left to right, the number of carbons in the side chains from top to bottom. The upper left corresponds to a saturated C16, C16 phospholipid, the bottom right to an 8-fold unsaturated C22, C22 molecule. The left column shows the absolute signal strength in trypanosomes in vivo. The right column shows the relative abundance of the lipids compared to their concentration in the serum supernatant. Shades of blue indicate depletion in the parasites, yellow and red enrichment. One star denotes that the difference is significant at p < 0.05 (two-tailed t-test), two stars indicate that the same significant difference is also seen in vitro. A bar highlights masses that correlate in abundance with mass 809.5939. All three lipid classes show the same overall trends, with higher unsaturation at higher chain length. The highest abundance is found for three types of alkylacyl lipids, which can be putatively identified as C18:2, C18:0 alkylacyl phosphatidyl choline, C18:2, C18:0 alk-1-enylacyl phosphatidyl choline, and C18:2, C18:0 alk-1-enylacyl phosphatidyl ethanolamine.

Metabolic fragment analysis

Mass spectrometry fails to resolve structural isomers. Thus in spite of the high likelihood that our assigned chemical identification is robust (based on both exact mass calculation and metabolic connectivity) we have sought additional means of assigning an identity. The gold standard in determining positive identification involves targeted fragmentation of selected masses followed by a second mass spectrometry step. This tandem mass spectrometry process is, however, itself challenging and requires additional sample preparation and technical development.

Careful analysis of the FTMS dataset offers an additional route to add supporting data towards assignments, based on what we call “metabolic fragment analysis”. The technique is based solely on peaks derived from the dataset. Most biomolecules (including phospholipids) are formed by the condensation of building blocks and these may also be catabolized back to the building blocks by hydrolysis. For phospholipids, these building blocks will comprise the side chain fatty acids and the polar head groups. Hence, we searched for all triples of masses that could be explained by condensation/hydrolysis reactions (i.e. mass1 + mass2 = mass3 + massH2O, at 2 ppm accuracy). About 581 masses (about half of all those detected) are putative condensation/hydrolysis products of other masses within the dataset, with a total of 1637 inferred reactions. Fifteen masses are putatively involved in at least 20 condensations each, and four masses in more than 30 each (table 6). With the exception of phosphocholine these all have masses in a narrow range between 280 and 370, and most of them are putative sidechain fatty acids. Other common “metabolic fragments” are choline phosphate (183.0661) with 26, glycerylphosphorylcholine (257.1029) with 15, and palmitoyl lysolecithin (495.3316) with 17 condensation reactions. This information can be used to infer the side chain composition of the phospholipids. For example, mass 727.5509, the most abundant phospholipid of the trypanosome pellet, is a putative condensation product of masses 465.3207 and 280.2395. The latter corresponds to linoleic acid, leading to the prediction that 727.5509 is an 18:0 alk-1-enyl,18:2 acyl phosphatidylethanolamine. The single previous study (Patnaik et al., 1993) aimed specifically at characterizing the molecular species of phospholipids in trypanosomes also revealed that 18:0, 18:2 species were by far the most abundant in trypanosomes.
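The search for condensation/hydrolysis triples (mass1 + mass2 = mass3 + massH2O) can be sketched as below. This is a minimal illustration, not the code used in the study: the 2 ppm tolerance is applied relative to the expected product mass, self-condensation of a mass with itself is not considered, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import combinations

WATER = 2 * 1.00782503 + 15.9949146  # monoisotopic mass of H2O


def condensation_triples(masses, ppm=2.0):
    """Find (mass1, mass2, mass3) with mass1 + mass2 = mass3 + H2O.

    The tolerance is taken relative to the expected product mass, which is
    an assumption about how the 2 ppm criterion was applied.
    """
    ordered = sorted(masses)
    triples = []
    for a, b in combinations(ordered, 2):
        target = a + b - WATER            # expected condensation product
        tol = target * ppm * 1e-6
        lo = bisect_left(ordered, target - tol)
        hi = bisect_right(ordered, target + tol)
        for product_mass in ordered[lo:hi]:
            triples.append((a, b, product_mass))
    return triples


if __name__ == "__main__":
    # The three masses discussed in the text for the most abundant phospholipid.
    print(condensation_triples([465.3207, 280.2395, 727.5509]))
```

Applied to the three masses mentioned above, the sketch recovers 727.5509 as a putative condensation product of 465.3207 and 280.2395. Counting how often each mass appears as a building block in such triples would give tallies of the kind summarized in table 6.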

Table 6 Masses that occur in at least 30 putative condensation reactions among masses in our dataset. Their relative abundance in the various types of samples is indicated on an arbitrary scale. n.d., not detectable

Concluding remarks

Our results indicate that the unprecedented mass accuracy of Fourier transform mass spectrometry can lead to qualitative, rather than merely quantitative, advances in the study of cellular metabolism. Issues of sample preparation (e.g. loss of labile metabolites) and metabolite detection (e.g. ion suppression), which currently restrict the number of metabolites visible in Fourier transform mass spectrometry, remain a challenge (as discussed in Aharoni et al., 2002; Tohge et al., 2005). However, our study shows that the technology, coupled to advances in bioinformatic data interpretation, has great potential to allow unbiased and comprehensive studies of complex metabolic systems. Increasing numbers of metabolites should become visible as sample preparation parameters are optimized, and further advances in ultra-high resolution mass spectrometry promise to lead to substantial increases in the quantity of high resolution mass spectrometry data available for analysis (see for example Olsen et al., 2005). This will have a dramatic impact on the way such systems will be perceived and analyzed by biologists.