A novel stable isotope labelling assisted workflow for improved untargeted LC–HRMS based metabolomics research
- 3.6k Downloads
Many untargeted LC–ESI–HRMS based metabolomics studies are still hampered by the large proportion of non-biological sample derived signals included in the generated raw data. Here, a novel, powerful stable isotope labelling (SIL)-based metabolomics workflow is presented, which facilitates global metabolome extraction, improved metabolite annotation and metabolome wide internal standardisation (IS). The general concept is exemplified with two different cultivation variants, (1) co-cultivation of the plant pathogenic fungi Fusarium graminearum on non-labelled and highly 13C enriched culture medium and (2) experimental cultivation under native conditions and use of globally U-13C labelled biological reference samples as exemplified with maize and wheat. Subsequent to LC–HRMS analysis of mixtures of labelled and non-labelled samples, two-dimensional data filtering of SIL specific isotopic patterns is performed to better extract truly biological derived signals together with the corresponding number of carbon atoms of each metabolite ion. Finally, feature pairs are convoluted to feature groups each representing a single metabolite. Moreover, the correction of unequal matrix effects in different sample types and the improvement of relative metabolite quantification with metabolome wide IS are demonstrated for the F. graminearum experiment. Data processing employing the presented workflow revealed about 300 SIL derived feature pairs corresponding to 87–135 metabolites in F. graminearum samples and around 800 feature pairs corresponding to roughly 350 metabolites in wheat samples. SIL assisted IS, by the use of globally U-13C labelled biological samples, reduced the median CV value from 7.1 to 3.6 % for technical replicates and from 15.1 to 10.8 % for biological replicates in the respective F. graminearum samples.
Keywords13C-labelling Internal standardisation Metabolomics Fusarium Wheat Maize
While full genome sequences have been determined for many organisms, it is currently still not possible to measure the complete metabolite inventory of a biological system due to methodical limitations. Complementary, sensitive and generic techniques are required to cope with the large chemical diversity and wide dynamic range of low molecular weight metabolites. Gas chromatography (GC) or liquid chromatography (LC) coupled to mass spectrometry (MS) as well as nuclear magnetic resonance (NMR) spectroscopy have emerged as key techniques in the field of metabolomics, as recently reviewed by e.g. Zhang et al. (2012), Patti et al. (2012b) and Zhou et al. (2012). The combination of LC with electrospray ionisation (ESI) high resolution mass spectrometry (HRMS) has proven to be particularly powerful as this technique enables the detection of a large number of known and unknown metabolites simultaneously and requires only small amounts of the biological sample (Hiller et al. 2011; Patti et al. 2012b).
Two different metabolomics concepts can be distinguished: targeted and untargeted approaches. In targeted approaches, a set of predefined known substances is determined, thus, absolute quantification of those metabolites, which are available as authentic reference standards, is feasible. In contrast, untargeted approaches try to find mass spectrometric features of all detectable metabolites, including those unknown or at least unidentified at the time of measurement. Therefore, the untargeted approach has the advantage of probing the entire, observable metabolic space and can obtain relative abundances of several hundreds to thousands of metabolites simultaneously (Patti et al. 2012b). For the automated data processing of such LC–HRMS derived metabolomics datasets, various workflows and software packages have been developed and are frequently used in untargeted metabolomics studies e.g. XCMS (Smith et al. 2006), MzMine (Pluskal et al. 2010), MetAlign (Lommen and Kools 2012) or Maven (Clasquin et al. 2012). These software tools have in common, that they extract as many features as possible from raw LC–HRMS derived metabolomics data sets. In this respect the term feature has been defined to be a bounded, two dimensional LC–HRMS signal consisting of a chromatographic peak (i.e. retention time) and a MS signal (m/z value) (Kuhl et al. 2012).
Despite the recent advances regarding both LC–HRMS instrumentation and data handling platforms, the comprehensive annotation of the metabolome of a biological sample of interest and subsequent metabolite identification still remain the major bottlenecks in untargeted metabolomics, especially for LC-ESI-HRMS based studies (Scalbert et al. 2009; Castillo et al. 2011; Patti et al. 2012b; Theodoridis et al. 2012; Dunn et al. 2013). This limitation can largely be attributed to the generic nature of the ESI process, unavoidably leading to LC-ESI-HRMS full scan chromatograms and spectra, containing a large proportion of background and chemical noise compared to the signals originating from true metabolites (Keller et al. 2008; Covey et al. 2009; Trotzmüller et al. 2011). Further challenges arise from the fact that a single metabolite leads to more than one ion species (e.g. isotopologue peaks, different adducts, in-source fragments and even more complex combinations of the previous species). In addition, many metabolites cannot completely be separated in the chromatographic dimension and therefore LC–HRMS measurements result in mass spectra, which contain signals from more than one metabolite.
Another obstacle of untargeted LC-ESI-HRMS based metabolomics is related to relative quantification of the detected metabolite ions, which is caused by so called matrix effects. The composition of the evaporated sample at any time point of the LC–HRMS measurement can significantly influence the ionization efficiency and leads to ion suppression or ion enhancement in the ESI source of the mass spectrometer (Tang and Kebarle 1993; King et al. 2000). Matrix effects can seriously affect signal intensities as well as precision and even limit the coverage of the metabolome (Vogeser and Seger 2010; Koal and Deigner 2010). They are difficult to overcome in global untargeted studies as the matrix is composed of the biological sample itself. Thus, except protein precipitation, sample purification is generally not a suitable option as this would largely discriminate many sample constituents of interest (Tulipani et al. 2013). Moreover, the availability of appropriate internal standards is often limited. The detailed and comprehensive study of matrix effects is laborious and challenging, thus only a few studies reported the systematic evaluation of matrix effects and their limitations on relative metabolite quantification in the field of LC–HRMS based metabolomics (Böttcher et al. 2007; Redestig et al. 2011; Tulipani et al. 2013).
With respect to the above mentioned limitations regarding global annotation of the metabolome and method performance evaluation, there is a great demand for both innovative approaches for the analytical measurement of biological samples with LC–HRMS as well as the development of novel, improved data processing algorithms.
Stable isotope labelling (SIL) is a technique, which is becoming increasingly used in different areas of metabolomics research and it shows the potential to conquer many of the elucidated limitations in untargeted metabolomics research. In this respect, SIL assisted experiments employ stable isotopes of elements such as carbon (13C), hydrogen (2H), oxygen (18O), nitrogen (15N) and sulphur (34S) (Klein and Heinzle 2012; Nakabayashi et al. 2013) respectively. However, 13C is used most commonly as the main labelling isotope, since carbon is part of virtually any metabolite. Non-labelled, partly labelled and highly (>98 %) 13C enriched (U-13C) metabolites show the same physico-chemical properties and therefore are not separated by chromatography, but can easily be distinguished by their mass to charge ratio (m/z) using an MS instrument.
The use of globally U-13C labelled biological samples enables to circumvent problems in untargeted metabolomics, such as metabolome annotation, generation of sum formulas of the detected metabolites and putative metabolite identification. It was demonstrated that the combination of 13C, 15N and 34S labelling for example can help to assign the number of atoms of the respective labelling element to a metabolite ion correctly and thereby facilitates annotation of metabolites by database search (Hegeman et al. 2007; Giavalisco et al. 2008; Cano et al. 2013). In addition to improved feature extraction and metabolite annotation, SIL experiments have also been successfully used to accomplish internal standardisation (IS) for quantification of metabolite levels and, thus correct ion suppression or MS signal fluctuations caused by matrix effects in LC-ESI-HRMS (Bennett et al. 2008; Giavalisco et al. 2009; Hegeman 2010). Moreover, IS by globally stable isotope labelled biological samples allow both detailed characterisation of the performance of the used metabolomics workflow as well as an improved relative quantification/technical precision of metabolomics data.
Despite the high potential of SIL assisted approaches and their successful application in various fields, to the best of our knowledge only a few data processing tools have been published for the automated evaluation of LC–HRMS data originating from labelled biological samples. For non-targeted GC–MS based metabolomics SIL assisted metabolomics, Hiller and colleagues published the Non-targeted Tracer Fate Detection (NTFD) algorithm (Hiller et al. 2013) to study labelled tracer compounds in the central metabolism (Hiller et al. 2010). Moreover, de Jong and Beecher (2012) have successfully implemented a sophisticated method termed IROA (Isotopic Ratio Outlier Analysis™) to automatically extract features differing between experimental conditions after parallel cultivation on native and U-13C-labelled nutrition sources. IROA is offered as a commercial metabolomics application and software programme. The R package mzMatch-ISO (Chokkathukalam et al. 2013) is a software tool for the annotation and relative quantification of SIL derived MS data with the aim to provide insight into metabolic fluxes of biological systems. It is designed to use metabolomics data analysed by XCMS (Tautenhahn et al. 2008) and allows an in depth evaluation and visualisation of the associated isotopic patterns and their respective abundances of various native and labelled metabolites. To the best of our knowledge, MetExtract, which has been developed in our laboratory, is to date the only publicly available tool aiming at the untargeted, automated global detection of truly metabolite derived LC–HRMS signals originating from natural (compounds showing a natural carbon isotopic distribution pattern are termed “non-labelled” in the following) and stable isotope labelled biological samples (Bueschl et al. 2012).
Here a detailed analytical and data processing workflow for SIL assisted untargeted LC–HRMS based metabolomics experiments is presented. This workflow is exemplified by two representative experiments: In the first approach the filamentous fungus Fusarium graminearum is cultivated in parallel on a non-labelled and a U-13C labelled carbon source respectively under identical conditions. In the second approach a metabolomics experiment is performed using a non-labelled carbon source for cultivation of biological samples (wheat and maize), while globally U-13C labelled biological reference samples are used for IS. The concept and performance of both variants are presented in detail.
2 Materials and methods
2.1 Chemicals and biological samples
Acetonitrile (ACN, HiPerSolv Chromanorm, HPLC gradient grade) was purchased from VWR (Vienna, Austria); Methanol (MeOH, LiChrosolv, LC gradient grade) was purchased from Merck (Darmstadt, Germany); formic acid (FA, MS grade) was obtained from Sigma-Aldrich (Vienna, Austria). Water was purified successively by reverse osmosis and an ELGA Purelab Ultra-AN-MK2 system (Veolia Water, Vienna, Austria). Components of the modified FMM were purchased from the following suppliers: Fluka (KH2PO4, Fe(NH4)2(SO4)2·6H2O), Roth (MgSO4·7H2O, KCl, ZnSO4, H3BO3), Sigma Aldrich (NaNO3, MnSO4, CuSO4·5H2O, Na2MoO4·2H2O), Serva (citric acid) and VWR (glucose). U-13C6-glucose with a 13C enrichment degree of 99 % was obtained from Eurisotop (Saarbrücken, Germany). U-13C labelled wheat ear (>97 % 13C, cultivar Baldus), and U-13C labelled maize kernels (>97 % 13C, cultivar Yukon chief) were obtained from Isolife (Wageningen, The Netherlands).
2.2 Cultivation of Fusarium graminearum, wheat and maize samples
2.2.1 Variant A: 12C and 13C co-cultivation (F. graminearum samples)
Fusarium graminearum wild-type (PH-1, NRRL 31084) and its isogenic tri5Δ::loxP mutant lacking the gene encoding trichodiene synthase, the first enzyme in the trichothecene biosynthetic pathway were used. The strains were cultivated in a modified Fusarium minimal medium (FMM: 1 g/L KH2PO4, 0.5 g/L MgSO4·7H2O, 0.5 g/L KCl, 2 g/L NaNO3, 10 mg/L citric acid, 10 mg/L ZnSO4·6H2O, 2 mg/L Fe(NH4)2(SO4)2·6H2O, 0.5 mg/L CuSO4·5H2O, 0.1 mg/L MnSO4, 0.1 mg/L H3BO3, 0.1 mg/L Na2MoO4·2H2O) (Leslie and Summerell 2007) containing either non-labelled glucose or U-13C-glucose as sole carbon source at a concentration of 10 g/L. F. graminearum wild-type and tri5Δ strains were grown on non-labelled as well as on U-13C labelled FMM using six biological replicates per strain and nutrition condition resulting in a total of 24 samples. The cultures were set up as follows: the strains were sporulated in mung-bean medium as described before by Kluger et al. (2013). 1-ml aliquots of either non-labelled or U-13C labelled glucose containing medium were pipetted to each well of a UNIFILTER 24-well 10 ml filtration microplate equipped with a Whatman GF/C filter (VWR, Vienna, Austria) and each well was inoculated with 2,000 spores of the respective F. graminearum strain. Still cultures were grown at 20 °C in the dark for 7 days.
2.2.2 Variant B: 12C cultivation & 13C reference metabolome (wheat and maize samples)
Seeds of the wheat cultivars “Remus” and “CM-82036” were grown in pots with soil under environmentally controlled greenhouse conditions. Light and watering regime, humidity and temperature were kept under controlled conditions whenever possible and readjusted continuously to fit the plant’s actual developmental stage. At the onset of anthesis five ears (ten spikelets per ear) were harvested for each cultivar at the beginning of a luminescence cycle and immediately shock-frozen in liquid nitrogen to quench cellular metabolism. In total ten ears were sampled and stored at −80 °C until further sample preparation.
Seeds of the maize line CO354 were planted in pots and later transferred to an environmentally controlled greenhouse with controlled light regime and temperature conditions. Ears were harvested 18 days after hand pollination and kernels were immediately extracted using sterilised scalpels. Three pools of kernels were obtained, where each pool derived from the mixing of seeds came from three different ears, samples were immediately frozen in liquid nitrogen after collecting and stored at −80 °C for further analyses.
For IS using a U-13C labelled reference metabolome according to variant B (Fig. 1), a U-13C labelled wheat ear (>97 % 13C, cultivar Baldus), and U-13C labelled maize kernels (>97 % 13C, cultivar Yukon chief) were used respectively. It was taken into consideration that the labelled plant material had been grown to the same development stage as the non-labelled wheat cultivars and the native maize line CO354 respectively.
2.3 Sample preparation
2.3.1 Variant A: preparation of F. graminearum samples
The 24-well microtiter plate was removed from the climate chamber and immediately centrifuged for 10 min at 2,000 rpm to separate the mycelium from the extracellular metabolites in the supernatant. Non-labelled and U-13C labelled supernatants were prepared in parallel according to the following protocol. 500 μl aliquots of supernatants of wildtype and tri5Δ mutant, which had been grown on native glucose containing FMM were transferred separately each into a 1.5 ml Eppendorf tube resulting in a total of 12 samples (n = 6 replicates per fungal strain). 400 μl aliquots of each of U-13C labelled supernatants were pooled together in a 50 ml polystyrene tube (VWR International GmbH, Vienna, Austria) resulting in a total of 4,800 μl of a pooled U-13C-supernatant for IS. Immediately after centrifugation and aliquoting of supernatants, all aliquots were quenched with 30 % acetonitrile (v/v) resulting in a 7:3 (v/v) ratio of supernatant to acetonitrile. For LC–HRMS analysis, each quenched non-labelled supernatant was standardised by adding the same volume (200 μl) of the pooled and quenched 13C-supernatant resulting in LC–HRMS sample aliquots (1:1, v/v).
In addition, 60 μl of each of the U-13C-standardised LC–HRMS samples (n = 6 replicates per fungal strain) were merged to an aggregate sample (AG) which was used to evaluate the precision of the LC–HRMS measurement and chromatographic peak integration steps by repeated injection out of the same HPLC vial (n = 13 replicates). LC–HRMS analysis of all samples was carried out immediately after sample preparation.
For comparison of SIL assisted data processing and conventional data processing by XCMS as well as to demonstrate the extraction efficiency, aliquots of quenched native supernatants without any U-13C labelled material were mixed with quenched FMM (1:1, v/v) to yield LC–HRMS samples exhibiting same concentration levels of non-labelled metabolites as the U-13C labelled standardised analogues.
To further exemplify the selectivity of the proposed workflow to identify only SIL derived biological information, solvent blanks (water:acetonitrile (7:3, v/v)) containing purified water instead of supernatant were prepared in parallel according to the same procedure mentioned above.
2.3.2 Variant B: preparation of wheat and maize samples
The sample preparation of plant material was based on De Vos et al. (2007) carried out after slight modifications as reported in Kluger et al. (2012). Native wheat ears and U-13C labelled wheat ear “Baldus” were extracted and prepared in parallel. Native wheat ears of the cultivar “Remus” and “CM-82036” respectively (n = 5 replicates per cultivar) were milled separately to a fine powder using a ball mill (MM301 Retsch, Haan, Germany). 100 ± 5 mg of homogenised plant material were weighed to 1.5 mL-Eppendorf tubes with subsequent extraction using 1 mL of pre-cooled (4 °C) methanol:water (3:1, v/v) including 0.1 % formic acid (v/v) in an ultrasonic bath. After centrifugation an aliquot of the supernatants (300 μl of native samples and 420 μl of U-13C labelled reference sample) were transferred separately to another 1.5 ml-Eppendorf tube and pre-cooled (4 °C) water + 0.1 % formic acid (v/v) was added to achieve a final methanol:water ratio of 1:1 (v/v). IS was achieved by adding the same volume of U-13C labelled sample aliquots resulting in (1:1, v/v) mixtures of non-labelled and U-13C labelled supernatant. All samples were rigorously mixed for 10 s before transfer into HPLC vials for LC–HRMS measurements.
Maize line CO354 (n = 3) and U-13C labelled cultivar “Yukon chief” were prepared according to the same protocol, with “Yukon chief” diluted sample extracts being used for IS.
2.4 LC–HRMS analysis
All samples (F. graminearum, wheat, maize) were analysed on a UHPLC system (Accela, Thermo Fisher Scientific, San Jose, CA, USA) coupled to an LTQ Orbitrap XL (Thermo Fisher Scientific) equipped with an ESI source. A HTC PAL system (CTC analytics, Zwingen, Switzerland) was used for injection (10 μl) per sample and for thermostatisation of sample solutions to 10 °C throughout the whole sequence.
A reversed-phase XBridge C18, 150 × 2.1 mm i.d., 3.5 μm particle size (Waters, Milford, MA, USA) analytical column, preceded by a C18 4 × 3 mm i.d. security cartridge (Phenomenex, Torrance, CA, USA) was thermostated to 25 °C and used for chromatographic separation at a constant flow rate of 250 μl/min. Water containing 0.1 % FA (v/v) (eluent A) and MeOH containing 0.1 % FA (v/v) (eluent B) were used for linear gradient elution: The initial mobile phase composition (10 % eluent B) was held constant for 2 min, followed by a linear gradient to 100 % eluent B within 30 min. After a hold time of 5 min the column was re-equilibrated for 8 min at 10 % eluent B. A 10 μl sample loop was employed to maintain a constant injection volume.
The ESI interface was operated in positive ion mode with the following settings: sheath gas: 60 arbitrary units, auxillary gas: 15 arbitrary units, sweep gas: 5 arbitrary units, capillary voltage: 4 kV, capillary temperature: 300 °C. LTQ parameters were automatically tuned for maximum signal intensity of a 10 mg L−1 reserpine solution (Sigma Aldrich) as recommended by the instrument manufacturer. For measurements using the FT-Orbitrap in the fullscan mode, the automatic gain control was set to a target value of 3 × 105 and a maximum injection time of 500 ms was chosen. The mass spectrometer was operated in a scan range from m/z 100–1,000 with a resolving power setting of 60,000 FWHM (at m/z 400). Data were recorded using Xcalibur 2.1.0 (Thermo Fisher Scientific).
2.5 Data processing
For an efficient extraction of metabolite derived MS signals and analytical features, several consecutive data processing steps (illustrated in Fig. 1-3-a–f) were implemented as an expansion of the already released software version of MetExtract (Bueschl et al. 2012). The focus of this paper was laid on the detailed description of the complete workflow including the biological experiment, sample analysis and data processing as well as its application to samples of plants and fungi. The extended MetExtract 2.0 software, which is capable of performing all of the following data processing steps (2.5.1–2.5.7) will be published elsewhere together with the release of the programme.
2.5.1 Pre-processing (Fig. 1-3-a)
Measurement files were converted from acquired profile to centroid data and the mzXML format (Pedrioli et al. 2004) with the MSConvert programme from the freely available ProteoWizard package v.3.0.3980 32-bit (Chambers et al. 2012).
2.5.2 MS signal pair filtering and clustering of m/z values (Fig. 1-3-b)
Each recorded MS scan was inspected for the typical SIL derived isotopic pattern as described previously in Bueschl et al. (2012). The intensity threshold of both the monoisotopic 12C derived (M) and U-13C derived (M′) MS signal was set to 5,000 counts in at least 3 scans. The maximum m/z deviation from postulated m/z values was set to 2.5 ppm and the isotopologue abundance error was set to ±20 %. Each MS signal pair, fulfilling the criteria was annotated with m/z of M, the charge number z, deduced from the SIL derived isotopic pattern, and the determined number of carbon atoms nC, calculated from the m/z value difference between M and M′, and z. Extracted MS signal pairs were clustered together with hierarchical clustering to group redundantly extracted MS signal pairs of similar m/z value (i.e. MS signal pairs, which originate from the same chromatographic peak or structural isomers with identical sum formula). Hierarchical clustering was performed separately for all MS signals having the same number of carbon atoms and the same charge number. All clusters in the resulting tree, whose m/z values differed more than ±10 ppm were split into separate sub-clusters.
2.5.3 Feature pair picking (Fig. 1-3-c)
For each MS signal cluster, the algorithm of (Du et al. 2006) was utilised to inspect the XICs of both the monoisotopic 12C and the corresponding U-13C labelled ions for co-eluting and similarly shaped chromatographic peaks. For this purpose, a maximum retention time difference of ±15 scans was tolerated between chromatographic peaks in both XICs. Furthermore, the chromatographic peak profiles of the monoisotopic 12C and the U-13C labelled features were compared with the Pearson correlation coefficient and only those, with correlation coefficients greater than 0.5 were considered a valid feature pair derived from the SIL process. This data processing step resulted in a list of putative feature pairs (monoisotopic 12C- and corresponding U-13C labelled feature) with each feature pair being annotated with the m/z value of M, retention time (Rt), peak area, number of carbon atoms per ion (nC), and charge state z.
2.5.4 De-isotoping (Fig. 1-3-d)
Compared to correctly paired features, M+1 features, falsely picked as monoisotopic 12C features or M′−1 features, falsely picked as U-13C labelled features showed a reduced number of carbon atoms nC and/or an increased monoisotopic 12C m/z value. Thus such erroneously extracted features were removed from the feature list by comparing the m/z values of M, charge state z, Rt and nC among putative feature pairs.
2.5.5 Feature pair grouping (Fig. 1-3-e)
To group different features from the same metabolite, extracted feature pairs were convoluted by comparing the chromatographic peak shapes of all monoisotopic 12C features eluting at approximately the same retention time (±10 scans) (Kuhl et al. 2012). A minimum correlation coefficient of 0.85 was specified for features to be grouped together.
2.5.6 Matching results of several samples and generation of data matrix (Fig. 1-3-f)
To track metabolite features over all samples of a particular experiment, the extracted feature pairs of all LC–HRMS data files were compared using nC, m/z of M and Rt in that order. After data matrix generation, monoisotopic 12C and U-13C labelled features, initially missed in some of the data files due to the restrictive filtering criteria, were searched for in a targeted way. To this end, the described peak picking and integration algorithms were employed but without checking peak shape similarity.
2.5.7 Internal standardisation
Internal standardisation was carried out on a file basis for each feature pair by dividing the area of monoisotopic 12C by that of its corresponding U-13C labelled feature.
2.6 Comparison with labelling free strategy
To compare the feature extraction process with a labelling free approach, a non-labelled F. graminearum aggregate sample, which had been diluted with FMM (1:1, v/v) and did not contain any U-13C labelled culture supernatant and one of the U-13C standardised F. graminearum aggregate samples, were analysed, processed and evaluated. The data file derived from a non-labelled sample was processed with XCMS (1.34.0) and R (R Development Core Team, 2012, v. 2.15.2) using parameter settings as recommended by Patti et al. (2012a) for HPLC Orbitrap XL MS. The LC–HRMS data file obtained for the U-13C standardised aggregate sample was processed as described above (steps 2.5.1–2.5.4) and parameter settings similar to XCMS (i.e. minimum intensity of 5,000 in at least 3 scans; maximum tolerated m/z deviation of 2.5 ppm). Automated comparison of the results was performed by comparing both the determined m/z value and retention time of all extracted features and feature pairs respectively. For this, a maximum relative m/z deviation of ±10 ppm and ±0.15 min was allowed for two results to match. Features, which had only been found by the SIL assisted data processing, were further inspected manually using TOPPView (Sturm and Kohlbacher 2009, v 1.10).
2.7 Selectivity evaluation of SIL assisted workflow
To demonstrate the selectivity of feature pair extraction in the presented SIL assisted metabolomics workflow, blank samples (solvent blank) (n = 3) as well as five non-labelled F. graminearum aggregate samples (no internal standardisation with U-13C labelled supernatant) were processed as described earlier.
2.8 Evaluation of internal standardisation and matrix effects
Analysis of internal standardisation was performed on a feature pair level using only those pairs for which both the monoisotopic 12C and the corresponding U-13C labelled features were found in all replicates of a certain sample type after re-integration. Therefore, no imputation of missing values was required. For analytical precision demonstration before and after internal standardisation coefficient of variance (CV) histograms of individual feature pairs within all replicates of a sample type were calculated. The bin width was set to 5 %. CV values above 120 % were set to 120 % to achieve equidistant axis in the plots. To demonstrate internal standardisation with multivariate statistics, PCA plots were calculated for the monoisotopic 12C and U-13C labelled feature areas as well as for the internal standardisation derived feature pairs. For analytical precision analyses R (R Development Core Team 2012 v. 2.15.2) was used. The functionality for calculating the principal component analysis (PCA) was taken from the package ChemometricsWithR (Wehrens 2011, pp. 53–57). Data were range scaled (van den Berg et al. 2006) prior to PCA. For the ellipsis in the PCA plots, the ellipse package (Murdoch and Chow 1996) was used. Ellipses were calculated using the co-variance matrices of PC1 and PC2 of the respective sample types.
3 Results and discussion
Variant A can be realized without much extra effort for less complex organisms such as bacteria, yeasts or fungi, which can be grown on synthetic media that require only a limited number of carbon sources, available as stable isotope enriched nutrients. Growing U-13C labelled plants (e.g. A. thaliana, wheat) has also been carried out successfully but is a more challenging task regarding infrastructure, costs, and time to establish cultivation conditions in a controlled U-13CO2 enriched atmosphere and can be demanding, particularly under the experimental conditions of interest. If U-13C labelled plant material is commercially available, variant B might be a good compromise as it provides the possibility to use globally U-13C labelled plant samples as reference for both qualitative and quantitative measurements with the drawback however, that the biological experiment itself is not carried out under labelling conditions. Mammalian organisms can be regarded to be even more demanding than plants. So far, successful SIL experiments (15N or 13C stable isotope enrichment >90 %) of different mammalian cell lines (e.g. CHO) requiring complex media have been reported (Egorova-Zachernyuk et al. 2011), but to the best of our knowledge, labelling of whole mammalian organisms was not performed successfully to date.
With the presented approach, the systematic search of MS-signals and feature pairs carrying the SIL specific isotopic pattern virtually enables the complete annotation of that part of the metabolome of a biological sample which can be accessed by the chosen sample preparation and measurement method. Moreover, since the mass difference of the 12C monoisotopic ion M and its U-13C labelled isotopologue ion M′ is proportional to the mass difference between 12C and 13C isotopes (i.e. 1.00335 u), the number of carbon atoms contained in that particular metabolite ions can directly be calculated from the measured HRMS spectra (Bueschl et al. 2012). A prerequisite to unambiguously assign the number of carbon atoms (nC) in a particular metabolite ion directly, the workflow requires the m/z value of both the monoisotopic, non-labelled M- and fully 13C labelled M′ ions to be clearly identifiable among their respective isotopic patterns (Fig. 2c). For example, a 13C enrichment of 98 % for the labelled metabolites allows extracting and directly calculating the number of carbon atoms in metabolite ions containing up to 60 carbon atoms. In case the degree of 13C enrichment dropped to e.g. 85 %, this number is reduced to a maximum of 7 carbon atoms before the MS signal of a hypothetical 13C n-1 12 C1 reached the same intensity as the fully labelled 13Cn isotopologue and thus would interfere with direct nC assignment. It should be noted however that even at a further reduced enrichment degree of as low as e.g. 75 or 50 %, the presented workflow does still allow for the automated recognition of corresponding isotopic ion patterns when parameter settings for processing are adjusted accordingly. In addition to the qualitative aspects of improved metabolome annotation, the use of globally U-13C labelled biological samples permits a highly efficient internal standardisation thereby enabling the assessment of precision parameters (both biological and technical) as well as compensation of matrix effects and improved relative quantification of hundreds of metabolites simultaneously.
3.1 Feature reduction by two-dimensional data filtering and feature grouping
The table provides a quantitative overview of the data processing results with the proposed workflow (Fig. 1)
Wildtype PH-1 (variant A)
Aggregate samples (variant A)
Remus wheat (variant B)
CM wheat (variant B)
CO354 maize (variant B)
1.8 × 106 (±0.06 × 106)
1.9 × 106 (±0.09 × 106)
3.0 × 106 (±0.01 × 106)
3.0 × 106 (±2 × 102)
2.8 × 106 (±7 × 103)
SIL derived signal pairs
1.06 × 104 (±0.24 × 104)
1.65 × 104 (±0.52 × 104)
1.87 × 104 (±0.66 × 104)
2.09 × 104 (±0.19 × 104)
2.04 × 104 (±0.67 × 104)
MS signal clusters
8.09 × 102 (±0.27 × 102)
1.28 × 103 (±0.04 × 103)
2.07 × 103 (±0.03 × 103)
2.32 × 103 (±0.13 × 103)
1.39 × 103 (±0.37 × 103)
Feature pairs before de-isotoping
De-isotoped feature pairs
Feature groups (i.e. metabolites)
After MS signal clustering, feature pairs are extracted from the data. As the non-labelled monoisotopic and its corresponding U-13C labelled analogue of a particular metabolite can be expected to show perfect chromatographic co-elution, verification of retention time and chromatographic peak shape similarity is used for feature pair picking (Fig. 1-3-c). The subsequent de-isotoping step eliminates incorrectly paired monoisotopic 12C- and U-13C features which do not represent true monoisotopic or uniformly labelled features. Together, the data filtering steps 3b–3d reduced the metabolite-related information—depending on the investigated organism—to ca. 300–900 distinct de-isotoped feature pairs per LC–HRMS chromatogram.
Since ionisation by electrospray may give rise to several ion species for the same substance such as adducts, in-source fragments or dimers, the SIL derived feature pairs are further combined with the aim to convolute all ion species of a particular metabolite into single groups (Fig. 1-3-e). Feature grouping is greatly facilitated by the prior removal of all non-biology related as well as all M+1, M+2 isotopic features and, depending on the investigated samples, resulted in the detection of 87–135 metabolites for the F. graminearum—and 200–360 truly plant derived substances for the maize and the wheat extracts respectively. As a major benefit of the presented approach all of these metabolites can be used to build-up reference databases and can serve as positive lists for future metabolomics experiments. The list of approximately 135 metabolites detected for the Fusarium aggregate samples (variant A, labelling under experimental conditions) can be assumed to comprise all metabolites which have been produced and released by at least one of the fungal strains under the tested conditions. For variant B of the workflow, which is exemplified with globally U-13C-labelled reference samples, the detected metabolites are restricted to those present in both, the reference as well as the experimental samples. Nevertheless, all of the generated feature groups facilitate metabolic feature annotation and come with additional valuable characteristics for molecular formula generation and metabolite annotation such as the number of carbon atoms, the charge state for each metabolite ion as well as all other ion species detected for that particular metabolite (i.e. feature group).
3.1.1 Comparison of feature extraction with a labelling-free workflow
To compare the number of SIL derived features with those found with a labelling-free metabolomics approach, a selected native F. graminearum aggregate sample (consisting of a mixture of non-labelled F. graminearum wildtype and non-labelled tri5Δ mutant culture supernatants) was diluted 1:1 (v/v) with either Fusarium minimal medium (FMM) or a pooled 13C-aggregate supernatant, to generate two LC–HRMS samples both containing the native metabolites at identical concentration levels. Retention time shifts observed between the respective LC–HRMS chromatograms were negligible (see TICs in Fig. 3a). Parameter settings for XCMS-based (Smith et al. 2006) feature extraction for the non-labelled aggregate sample chromatogram such as intensity threshold (5,000 counts), minimum number of scans (3×), maximum ppm deviation (2.5 ppm) were kept identical to the SIL assisted feature picking in order to ensure maximum comparability of results between the two approaches. As expected, with XCMS every chromatographic peak in the data regardless of its origin (biological, background, noise…) was extracted as a feature. In total, n = 4,625 features (illustrated as grey symbols in scatter plot of Fig. 3b) were found by XCMS based data processing including all low abundant features with no observable isotopic peaks or MS signals. In contrast, with the SIL assisted approach MS signal- and feature pair picking and subsequent de-isotoping only yields monoisotopic 12C features with a high degree of confidence to correspond to truly sample derived metabolites. In total, application of the SIL assisted workflow yielded 431 feature pairs (red dots in scatter plot in Fig. 3b) which are about ten times less compared to XCMS. Moreover, data processing and automated comparison of the SIL and XCMS assisted approach resulted in 28 feature pairs (Fig. 3b blue dots) solely found with the SIL assisted approach. A closer, manual inspection of these 28 results showed, that 22 features were not automatically matched because of larger m/z or retention time deviations. Only three feature pairs found solely with the SIL assisted approach did not show any signals in the native sample processed with XCMS. Moreover, 3 of these 28 feature pairs were identified as false positives (e.g. pairings of Fourier transform artefacts (Brown et al. 2009)). For additional manual inspection of those parts of the LC–HRMS chromatograms showing a high density of features (e.g. Rt ≥ 30 min or m/z ≤ 150), which had only been found by XCMS, TOPPView was used to confirm that none of these features did show corresponding U-13C labelled isotopologues.
3.1.2 Selectivity of the presented approach
To demonstrate the selectivity of the workflow for the extraction of truly biologically related metabolite features by the use of SIL, eight blank samples were included in a measurement sequence of F. graminearum samples and evaluated according to the presented workflow (variant A). Three of these blanks, which consisted of purified water instead of fungal supernatant, were employed as solvent blanks. Furthermore, five aggregate samples containing only the non-labelled F. graminearum metabolome served as simulated matrix background blanks. Metabolite ions detected in the native F. graminearum supernatants were not expected to be found with the SIL assisted data processing steps since they did not contain any U-13C labelled metabolites. In the solvent blanks, hardly any MS signal pairs (searched in each mass scan, Fig. 1-3-b) were extracted (2, 26 and 39 MS signal pairs respectively). Better yet, none of these MS signal pairs were further confirmed to be valid feature pairs according to the predefined filtering criteria (Fig. 1-3-c). The simulated background blanks showed 14, 17, 7, 224 and 106 MS signal pairs on a mass scan level. Subsequent feature pair picking revealed 0, 1, 1, 1 and 5 feature pairs for the simulated background blanks respectively. However, none of such extracted feature pairs were found in more than one of the measured matrix blanks. Further manual inspection clearly showed that all of these randomly picked feature pairs fulfilled the present criteria either by chance or were pairings of different adducts or Fourier transformation artefacts (Brown et al. 2009). Such incorrectly picked adducts or artefacts showed nearly identical chromatographic profiles and they were therefore not discarded as false positives automatically (Fig. 1-3-e). Two of these feature pairs were detected at >700 u with a difference between the monoisotopic 12C and corresponding U-13C mass corresponding to less than ten carbon atoms. Thus such feature pairs can easily be excluded from further data analysis. In conclusion the very low rate of false positives in both types of blank samples demonstrates the exceptionally high selectivity of the presented approach in only extracting truly biologically related feature pairs.
3.2 Results of internal standardisation
Absolute quantification in untargeted metabolomics experiments of all detected (known or unknown) metabolites is generally not feasible by most of the current approaches. Moreover, the accuracy of relative feature/metabolite quantification in untargeted metabolomics experiments is limited by matrix effects which can cause problems during statistical analysis as the biased feature abundances complicate comparison across different experimental conditions. Internal standardisation using globally stable isotope labelled biological samples provides the ideal tool to overcome these limitations and has already been used for relative (e.g. Giavalisco et al. 2009) and even for absolute quantification in untargeted metabolomics approaches (Bennett et al. 2008). As the presented workflow also makes use of globally U-13C labelled biological samples, the effect of global internal standardisation on matrix effects and technical precision has been investigated at the example of the F. graminearum dataset.
3.2.1 Correction of matrix effects
As can be expected from the experiment, the score plot of the 12C-PCA shows a clear separation of the PH-1, tri5Δ and AGs samples into three distinct groups. The same holds true for the 12C/U-13C-PCA, which was calculated from the area ratios of non-labelled and corresponding U-13C features. Additionally the variance captured by PC1 and PC2 increased slightly from 85.4 to 88.8 %. Compared to the 12C-PCA, the aggregate samples are located in the centre of the 12C/U-13C-PCA score plot (Fig. 4c), which is explained by the range scaling process together with the fact that these aggregate samples constitute of equal amounts of U-13C internally standardised PH-1 and tri5Δ samples. In contrast to the ideal behaviour of the U-13C-standardised AG samples, matrix effects obviously affected the monoisotopic 12C feature areas of the aggregate samples (Fig. 4a), which in turn become visible as a shift of the aggregate sample group away from the centre of the 12C-PCA score plot.
Interestingly, the use of U-13C-feature areas for PCA also resulted in a clear separation into the three sample categories, although (according to the preparation protocol for the AGs, see Sect. 2.3.1) the absolute concentration levels of all U-13C labelled metabolites were identical in every of the analysed samples and sample type. In this case, the separation of the sample groups in the U-13C-PCA plots is explained by the different metabolic composition of the wildtype PH-1, tri5Δ and AGs samples with respect to their non-labelled metabolites, which resulted in distinct alterations of the areas derived from U-13C labelled features (i.e. matrix effects) for each of the tested sample categories. The separation of the three sample categories based on the peak areas of the respective U-13C features would have never been recognised as artefact (caused by matrix effects) without the availability of globally U-13C labelled biological samples. In contrast, an observation as depicted in Fig. 4b most probably would have led to the false conclusion that metabolites differing between the experimental samples had caused the separation according to the tested experimental states.
3.2.2 Precision of workflow and improvement of technical data variability
In order to estimate and dissect the precision of the end determination step (i.e. LC–HRMS measurement, data processing and feature integration), 12C/U-13C aggregate samples, consisting of 1:1 mixtures of PH-1/tri5Δ supernatants were measured as replicate injections (n = 13) at regular intervals over a complete LC–HRMS sequence. The distribution of CVs of the monoisotopic 12C feature areas showed a median CV of 7.1 % (90 % of all features had CV values below 18 %). After internal standardisation (blue histogram), the distribution of CVs shifted left (blue histogram) towards lower CV values with a median and 90 % quartile of to 3.6 and 9.7 % respectively, demonstrating a substantial improvement (~50 %) of the LC–HRMS end determination.
It should be noted that for a few (n < 10) features the internal standardisation of F. graminearum culture supernatants resulted in CV values >120 %. This might have been caused by non-reproducible degradation/chemical conversion of a few metabolite features in the samples but this phenomenon was not further investigated in this study however.
With ca. 20 % and 40–50 % respectively the distribution of CVs in wheat and maize samples (variant B) yielded slightly higher median and 90 % percentile values than for the F. graminearum samples (data not shown). Similar to the F. graminearum experiment, wheat and maize aggregate samples were used to study the technical precision. Similar to Fusarium, median CV (90 % percentile) values shifted from roughly 8 % (16 %) to 5 % (20 %) for wheat and 12 % (22 %) to 6 % (17 %) for maize respectively (data not shown).
In conclusion, the above illustrated results are in good agreement with the reports of e.g. Birkemeyer et al. (2005), Bennett et al. (2008), Giavalisco et al. (2009), who described enhanced precision and (relative) quantification after metabolome wide internal standardisation by use of U-13C labelled biological samples. Furthermore, as shown for F. graminearum samples, internal standardisation resulted in an improved performance of multivariate data analysis.
4 Concluding remarks
Recent reports of SIL assisted tools and techniques and their application to various fields of metabolomics have illustrated significant improvements to circumvent major challenges of untargeted metabolite profiling. The presented workflow enables the untargeted global extraction of truly metabolite related MS signals and features in LC–HRMS datafiles derived from native and U-13C labelled metabolomes. Together with the automated generation of hundreds of feature groups per sample, each of which is representing a distinct metabolite, this approach constitutes a major step forward towards global annotation of the entire metabolic composition of biological samples. Additionally, metabolome wide internal standardisation with U-13C labelled samples greatly enhances accuracy and reliability of relative quantification by correction of technical variability as well as correction of matrix effects, which otherwise are difficult to evaluate and compensate.
In conclusion, although stable isotope labelling of whole metabolomes for untargeted metabolomics is still challenging and generally requires additional efforts in terms of costs and/or experimental design, it is anticipated that SIL assisted metabolomics will arouse increasing interest and become a well-established technique in metabolomics research.
The authors thank Nora Neumann, Denise Schöfbeck, Sylvia Lehner and Benedikt Warth for useful discussions. The Austrian Science Fund (SFB-Fusarium-3706-B11;3702), WWTF (Toxi-Genome-9793008037), EC (KBBE-2007-22269-2-MYCORED), FFG (Grant-824186), BMWFJ, BMVIT, ZIT, and Zukunftsstiftung Tirol are acknowledged for financial support. V.M. was supported by Doctoral-School on the Agro-Food System (Agrisystem) of Università Cattolica del Sacro Cuore (Italy).
- Böttcher, C., Roepenack-Lahaye, E. V., Willscher, E., Scheel, D., & Clemens, S. (2007). Evaluation of matrix effects in metabolite profiling based on capillary liquid chromatography electrospray ionization quadrupole time-of-flight mass spectrometry. Analytical Chemistry, 79, 1507–1513.CrossRefPubMedGoogle Scholar
- Cano, P. M., Jamin, E. L., Tadrist, S., Bourdaud’Hui, P., Pean, M., Debrauwer, L., et al. (2013). New untargeted metabolic profiling combining mass spectrometry and isotopic labeling: Application on Aspergillus fumigatus grown on wheat. Analytical Chemistry, 85, 8412–8420.CrossRefPubMedGoogle Scholar
- Clasquin, M. F., Melamud, E., & Rabinowitz, J. D. (2012). LC-MS data processing with MAVEN: A metabolomic analysis and visualization engine. Current Protocols in Bioinformatics, 37, 14.11.1–14.11.23.Google Scholar
- Giavalisco, P., Hummel, J., Lisec, J., Inostroza, A. C., Catchpole, G., & Willmitzer, L. (2008). High-resolution direct infusion-based mass spectrometry in combination with whole 13C metabolome isotope labeling allows unambiguous assignment of chemical sum formulas. Analytical Chemistry, 80, 9417–9425.CrossRefPubMedGoogle Scholar
- Giavalisco, P., Köhl, K., Hummel, J., Seiwert, B., & Willmitzer, L. (2009). 13C isotope-labeled metabolomes allowing for improved compound annotation and relative quantification in liquid chromatography–mass spectrometry-based metabolomic research. Analytical Chemistry, 81, 6546–6551.CrossRefPubMedGoogle Scholar
- Kluger, B., Bueschl, C., Lemmens, M., Berthiller, F., Häubl, G., Jaunecker, G., et al. (2012). Stable isotopic labelling-assisted untargeted metabolic profiling reveals novel conjugates of the mycotoxin deoxynivalenol in wheat. Analytical and Bioanalytical Chemistry, 405, 5031–5036.CrossRefPubMedPubMedCentralGoogle Scholar
- Kluger, B., Zeilinger, S., Wiesenberger, G., Schöfbeck, D., & Schuhmacher, R. (2013). Detection and identification of fungal microbial volatile organic compounds by HS–SPME–GC–MS. In V. K. Gupta, M. G. Tuohy, M. Ayyachamy, K. M. Turner, & A. O’donovan (Eds.), Laboratory Protocols in Fungal Biology. New York: Springer.Google Scholar
- Leslie, J. F., & Summerell, B. A. (2007). Media—Recipes and preparation. The Fusarium Laboratory Manual. Hoboken: Blackwell Publishing.Google Scholar
- Murdoch, D. J., & Chow, E. D. (1996). A graphical display of large correlation matrices. American Statistician, 50, 178–180.Google Scholar
- Nakabayashi, R., Sawada, Y., Yamada, Y., Suzuki, M., Hirai, M. Y., Sakurai, T., et al. (2013). Combination of liquid chromatography–fourier transform ion cyclotron resonance-mass spectrometry with 13C-labeling for chemical assignment of sulfur-containing metabolites in onion bulbs. Analytical Chemistry, 85, 1310–1315.CrossRefPubMedGoogle Scholar
- R Development Core Team. (2012). R: A language and environment for statistical computing. Vienna, Austria: R foundation for statistical computing. Retrieved from http://www.R-project.org.
- Scalbert, A., Brennan, L., Fiehn, O., Hankemeier, T., Kristal, B. S., van Ommen, B., et al. (2009). Mass-spectrometry-based metabolomics: Limitations and recommendations for future progress with particular focus on nutrition research. Metabolomics, 5, 435–458.CrossRefPubMedPubMedCentralGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.