Ultra-fast searching assists in evaluating sub-ppm mass accuracy enhancement in U-HPLC/Orbitrap MS data
A strategy, a detailed methodology description and software are presented with which the mass accuracy of U-HPLC-Orbitrap data (resolving power 50,000 FWHM) can be enhanced by an order of magnitude to sub-ppm levels. After mass accuracy enhancement all 211 reference masses have mass errors within 0.5 ppm; only 14 of these are outside the 0.2 ppm error margin. Mass accuracy enhancement is further demonstrated on a pre-concentrated urine sample in which evidence for 89 (342 ions) potential hydroxylated and glucuronidated DHEA metabolites is found. Although most DHEA metabolites have low-intensity mass signals, only 11 out of 342 are outside the ±1 ppm error envelope; 272 mass signals have errors below 0.5 ppm (142 below 0.2 ppm). The methodology consists of: (a) a multiple internal lock correction (here ten masses; no identity of internal lock masses is required) to avoid suppression problems of a single internal lock mass as well as to increase lock precision, (b) a multiple external mass correction (here 211 masses) to correct for calibration errors, (c) intensity-dependent mass correction, (d) file averaging. The strategy is supported by ultra-fast file searching of baseline-corrected, noise-reduced metAlign output. The output and efficiency of ultra-fast searching are essential in obtaining the required information to visualize the distribution of mass errors and isotope ratio deviations as a function of mass and intensity.
Keywords: Sub-ppm mass accuracy; Orbitrap; metAlign; Liquid chromatography mass spectrometry (LC/MS); Data processing; Metabolomics
Advances in high resolution mass spectrometry play a significant role in obtaining elemental compositions of unknown compounds. However, high mass resolution alone is often not enough to get unique elemental compositions of compounds (Kind and Fiehn 2006). The larger the mass, the more uncertain the elemental composition becomes. Additional heuristic rules (such as the rings-plus-double-bonds equivalent (RDBE), LEWIS and SENIOR checks, probability of element ratios, and the nitrogen rule) and, most importantly, isotope ratio information and prior knowledge on possible elements can restrict the number of reasonable elemental compositions (Kind and Fiehn 2007).
High resolution, however, does not automatically mean high mass accuracy and a well established isotope ratio. Calibration is obtained externally and/or internally to correct for errors in mass. In Orbitrap technology—besides the normal external calibration—the option for an internal mass correction can be chosen to correct all the scans in a liquid chromatography mass spectrometry (LC/MS) experiment (Makarov et al. 2006a, b). With current Orbitrap mass analyzers a mass accuracy better than 5 ppm at resolving power 60,000 FWHM (Full Width at Half Maximum) can be achieved using only the standard external calibration; including a one-mass-based internal mass calibration a better than 2 ppm mass accuracy is feasible at the same resolution. On-going developments in Orbitrap technology may improve future specifications (Makarov et al. 2009; Olsen et al. 2009).
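Since every accuracy figure in this paper is quoted in ppm, a minimal sketch of the conversion may be useful. The function names `ppm_error` and `ppm_window` are illustrative only and do not come from the software described here.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_window(theoretical_mz: float, tolerance_ppm: float) -> tuple:
    """Absolute m/z interval that corresponds to a +/- ppm tolerance."""
    half_width = theoretical_mz * tolerance_ppm * 1e-6
    return theoretical_mz - half_width, theoretical_mz + half_width
```

At m/z 500, for instance, a 5 ppm specification corresponds to ±0.0025 Da, whereas a 0.5 ppm error corresponds to ±0.00025 Da.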
Two studies have been reported in which a further improvement of mass accuracy is realized through sophisticated post-analysis mass corrections. Scheltema et al. have shown a ZIC-HILIC HPLC-LTQ-Orbitrap MS application in which multiple system-dependent internal masses with known elemental composition are used to recalibrate the mass axis. The mass correction, however, is restricted by the presence, abundance and mass range of internal background masses in the system (Scheltema et al. 2008). This makes the whole procedure dependent on the background not being too clean. Furthermore, the software tools are not made available to potential users. Cox and Mann (2009) have shown elegantly how mass precision and accuracy improve by considering all associated measurements, starting from the MS peak and proceeding to its chromatographic elution profile, isotope envelope, and stable isotope pair in SILAC measurements (SILAC = Stable Isotope Labeling with Amino acids in Cell culture). The publicly available tools, however, are restricted to research such as SILAC-based proteomics.
It has been reported that the isotope ratio in Orbitrap technology may to a certain extent be compromised by a resolution-dependent phenomenon (possible isotopic beat patterns) (Erve et al. 2009). That study was conducted on compounds in the mass range of 639 to 1,663; the effect is particularly evident for the fourth and fifth isotopes, but not very apparent in the first few isotopes. Still, in combination with improving mass accuracy it is worthwhile, as done in this study, to also assess the isotope ratio robustness.
The goal of the present study is to substantially improve the mass accuracy over a wide mass range while being less dependent on the number of internal calibration masses and independent of their identity; a further goal is to obtain estimates of the experimental isotope ratio error to be expected in an LC-Orbitrap-MS setting at a resolving power of 50,000 (FWHM). This will substantially increase the feasibility of in silico post-processing towards sub-ppm mass accuracy and, with more information on isotope ratio correctness, thus increase the confidence in the determination of the elemental composition. The overall feasibility and data-processing strategy is demonstrated by application to relatively low signals of dehydroepiandrosterone (DHEA) metabolites in a real-life concentrated bovine urine sample with limited sample clean-up. The urine sample originates from a controlled animal experiment in which evidence is given for a large number of DHEA metabolites (Rijk et al. 2009).
The effect of using multiple internal calibration masses (unknown compounds), calculating the average mass over the chromatographic peak, additional external calibration with 211 compounds and averaging over multiple files is examined. Furthermore, the mass stability over a large amplitude range as well as the isotope ratio correctness are studied.
A combination of software tools is used for this study (freely available at http://www.metalign.wur.nl/UK/Download+and+publications/). The basis is the software package, metAlign™, which is capable of accurate mass calculation over the peak, data reduction and alignment (Lommen 2009). Besides this, two new additional software tools have now been created. The first tool is the Subppm_converter which calculates and applies special internal and external calibration to original raw data files. The second tool is Search_LCMS which can query highly reduced metAlign output of multiple data sets for hundreds of masses in retention windows in a few seconds; this makes multi-file large scale mass analysis highly feasible.
Ultra-fast searching of processed data is essential in obtaining the required information to visualize the distribution of mass corrections (ppm deviations) and isotope ratios in an acceptable time frame. The speed of mass searching is directly correlated to the size of files. For this purpose, in this study file sizes are decreased 100–1000-fold.
2 Materials and methods
Formic acid of analytical grade is obtained from Merck (Darmstadt, Germany). Acetonitrile of ultra-pure LC–MS quality and water of LC–MS quality are obtained from Biosolve (Valkenswaard, The Netherlands). A list of 211 standards is given in the supplementary section; these compounds are from different manufacturers.
2.2 Ultra high performance chromatography coupled to Orbitrap mass spectrometry
Ultra-high performance chromatography (U-HPLC) is performed on a U-HPLC Accela system (Thermo Fisher Scientific, San Jose, CA, USA). The U-HPLC system consists of a degasser, a quaternary pump, an autosampler and a column oven. Separation is performed on a Waters Acquity UPLC (UPLC = Ultra Performance Liquid Chromatography) BEH C18 column (150 × 2.1 mm, 1.7 μm particle size) (Waters, Etten-Leur, The Netherlands) which is kept at 40°C. A flow rate of 0.4 ml/min and an injection volume of 10 μl are used. The mobile phase consists of 20 mM formic acid in water (A) and 20 mM formic acid in water/acetonitrile (10/90 v/v) (B). The mobile phase composition is kept at 0% B for 1 min, followed by a linear change to 20% B in 2 min, 20–80% B in 20 min, and 80–100% B in 2 min. The composition of 100% B is kept for 10 min and is then returned linearly to 0% B in 1 min, remaining at this composition for 4 min prior to the next injection. The U-HPLC is directly interfaced to a single-stage Orbitrap mass spectrometer (Exactive, Thermo Fisher Scientific). The Orbitrap-MS is equipped with a heated electrospray interface (HESI) operating in the positive mode. Data are acquired between m/z 100 and m/z 1000; the resolving power is set to 50,000 (FWHM), resulting in a scan time of 0.5 s. Furthermore, the spray voltage is 2800 V, the capillary voltage is 47.5 V, the capillary temperature is 275°C, and sheath and auxiliary gas flows of 35 and 15 arbitrary units, respectively, are used. Instrument calibration is performed externally prior to the sequence by infusion of a calibration solution. The calibration solution (m/z 138 to m/z 1822) contains caffeine, MRFA (Met-Arg-Phe-Ala), Ultramark 1621 and acetic acid in acetonitrile/methanol/water (2:1:1, v/v) (Sigma-Aldrich). Data are recorded using Xcalibur software version 126.96.36.1999 (Thermo Fisher Scientific).
Two samples are used for this study. One is a reference standard solution containing 211 compounds (sample A) with molecular weights in the range of 100–1000. The other is a mix sample (B) of urine (after sample preparation) of DHEA (= dehydroepiandrosterone) treated male calves obtained from the study described previously (Rijk et al. 2009); this sample is known to contain a large number of DHEA metabolites.
2.4 Data acquisition
Twenty-five data sets are acquired. The run consists of 7 injections of the reference sample A, followed by 12 injections of the urine sample B, followed by 6 injections of the reference sample A.
2.5 Data processing
A detailed presentation of parameters and programs is given in the documentation in the supplemental information (see Electronic supplementary material). A separate detailed presentation is also given in the supplemental information for a two-file procedure with one reference and one real-life sample (see Electronic supplementary material).
2.6 Software development
The Subppmconverter software (see Electronic supplementary material) and the Search_LCMS module (see Electronic supplementary material) are developed in C and Visual C++ version 6.0. Installation requirements are identical to those of metAlign (Lommen 2009). MetAlign must be installed first.
The Subppmconverter module can use a list of internal background masses as reference to correct the mass axis in scans. In short the program does the following: (a) It reads a list of internal masses. (b) It reads the centroided data (not normally visible in Xcalibur) in the raw data files recorded in profile mode (using the correct filter) using the xrawfile.ocx (from Xcalibur 188.8.131.529) supplied by Thermo Fisher Scientific; if chosen, it performs an intensity correction while reading. (c) It searches for all the proposed internal background reference masses (see for example the list in Subppmconverter.pdf in the electronic supplementary material) in all the scans in all the files (delta mass ≤0.005 Da); per scan it calculates the ppm errors of the internal reference masses and then excludes those deviating more than 5 ppm from the average; if no internal mass is present it interpolates from the closest scans containing internal masses. (d) It calculates the average mass of each proposed corrected internal mass from all the scans of all the data sets. (e) It makes a new internal mass list based on (d). (f) It repeats (a)–(c). (g) It corrects the mass values in each scan in all data sets using the corresponding calculated average ppm deviation. (h) If chosen: it calculates an external correction profile from “.csv_profile” output from the Search_LCMS module (this consists of polynomial fitting (Press et al. 1988) followed by a 5-point averaging of subsequent points: ppm, mass); it applies this external correction. (i) It converts the corrected centroid data to a netCDF file. Further conversion of the corrected centroid data to Xcalibur format can be done in batch using any version of the Xcalibur Xconvert module.
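Steps (c) and (g) can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the Subppmconverter source: the function names are hypothetical, a scan is taken to be a plain list of centroid m/z values, the interpolation between scans is assumed to be linear, and the refinement of the lock mass list in steps (d)–(f) is omitted.

```python
from statistics import mean


def scan_ppm_offset(scan_mzs, lock_mzs, max_delta_da=0.005, max_spread_ppm=5.0):
    """Average ppm error of the lock masses found in one scan, or None."""
    errors = []
    for lock in lock_mzs:
        # Closest centroid to this lock mass, accepted only within 0.005 Da.
        nearest = min(scan_mzs, key=lambda mz: abs(mz - lock), default=None)
        if nearest is not None and abs(nearest - lock) <= max_delta_da:
            errors.append((nearest - lock) / lock * 1e6)
    if not errors:
        return None
    # Exclude lock masses deviating more than 5 ppm from the scan average.
    average = mean(errors)
    kept = [e for e in errors if abs(e - average) <= max_spread_ppm]
    return mean(kept) if kept else None


def fill_missing(offsets):
    """Fill scans without a lock mass from the closest scans that have one.

    Linear interpolation is an assumption; edge scans copy the nearest value.
    """
    known = [i for i, value in enumerate(offsets) if value is not None]
    if not known:
        raise ValueError("no scan contains a lock mass")
    filled = list(offsets)
    for i, value in enumerate(offsets):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = offsets[right]
        elif right is None:
            filled[i] = offsets[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = offsets[left] + weight * (offsets[right] - offsets[left])
    return filled


def correct_scan(scan_mzs, offset_ppm):
    """Remove a per-scan ppm offset from all m/z values of that scan."""
    return [mz / (1.0 + offset_ppm * 1e-6) for mz in scan_mzs]
```

Because only a subset of the lock masses needs to be present in any given scan, a single suppressed background ion does not leave the scan uncorrected.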
The Search_LCMS module reads a list of the “.redms_acc” output files from metAlign (Lommen 2009) (output after baseline correction and noise reduction) together with a csv file containing search criteria (see Electronic supplementary material) and then creates a search output in csv format. Excel is used to visualize search results.
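The following sketch illustrates why searching a strongly reduced peak list is fast; it does not reproduce the Search_LCMS implementation. The peak layout `(mz, retention_time, intensity)` and the function name `search_peaks` are assumptions made for the example, and the retention window option is left out.

```python
from bisect import bisect_left, bisect_right


def search_peaks(peaks, targets, tol_ppm=5.0):
    """Return {target m/z: [matching peaks]} for a ppm tolerance window.

    peaks   -- list of (mz, retention_time, intensity) tuples
    targets -- list of theoretical m/z values to look up
    """
    ordered = sorted(peaks)             # sort once by m/z
    mzs = [peak[0] for peak in ordered]
    hits = {}
    for target in targets:
        half_width = target * tol_ppm * 1e-6
        low = bisect_left(mzs, target - half_width)
        high = bisect_right(mzs, target + half_width)
        hits[target] = ordered[low:high]  # binary search, no full scan
    return hits
```

After the single sort, each query costs only two binary searches, so hundreds of masses can be looked up in a peak list that has already been reduced by baseline correction and noise elimination.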
A download of the software and detailed methodology description are provided at www.metalign.nl free of charge. Regular updates will be provided in the future through this website. The source code will be made available under a Material Transfer Agreement on an individual basis.
3 Results and discussion
3.1 Internal lock mass correction
In their software Thermo Fisher Scientific provides an option for internal mass correction of Orbitrap data. For an internal mass correction to be feasible an internal mass must be present in all scans and the exact mass (elemental composition) of this compound has to be known. If the internal mass is not present, no correction may be performed or an incorrect other ion may be taken. If the internal mass has a low intensity the noise variation on this mass is transferred to the scans. Both these potential problems can result in erroneous corrections in certain regions of the chromatogram leading to mass errors exceeding 20 ppm (data not shown). While in diluted samples with high system background ions these problems may not occur, this is—on the other hand—all the more likely to happen in complex concentrated extracts as used in particular in metabolomics. The likelihood of ion suppression of internal references increases in concentrated samples. Unfortunately, in our system no background ion seems to be present in all scans.
Therefore the initial goal of internal mass correction developed in this study is to make all mass traces in all samples comparable by locking to a larger set (here 10) of system background ions (see Electronic supplementary material) of which only a subset has to be present in each scan. A threshold is defined for the intensity of the background ions; this is set here to 5–7 times noise (20,000 counts in centroid mode). If there are scans without an internal lock mass the procedure interpolates between known ppm-deviations in adjacent corrected scans.
3.2 Ultra-fast mass searching assists in evaluation
In Fig. 3a the ppm error distribution is given for all 211 reference compounds when no internal lock is used and the mass is taken from the apex of a peak. Although all signals in all 13 files have a good signal-to-noise ratio, mass errors between −3.5 and +2.5 ppm occur. Furthermore, Fig. 3a shows a drift in mass between experiments over the time course of the sequence of experiments (ca. 1 day). This is, however, still in accordance with the instrument specification (error <5 ppm). In Fig. 3b an internal lock correction using one internal mass has been done (see also Fig. 2b); the ppm error distributions from 13 data files are now more or less superimposed. More precise superimposing is achieved, as expected and as illustrated in Fig. 3c (see also Fig. 2b), by averaging mass errors of multiple internal lock masses. Figure 3d shows the result of Fig. 3c after additional external mass correction as described in the next section; a clear further improvement (flatter distribution centered around 0 ppm) is obtained even though the mass spectrometer has been correctly calibrated according to the manufacturer’s instructions.
3.3 External lock mass correction
The Subppm_converter runs this module in line. The algorithm consists of a multiple recursive polynomial fit and a 5-point smoothing. For third- to fifth-order polynomials, repeated fits and corrections are performed up to 30 times. The fit with the smallest residual sum of squares is automatically selected as the best fit and stored. For each experimental mass a correction is performed on the basis of this fitted curve (see Fig. 4). The residual ppm error over five subsequent masses is then averaged and again subtracted (see Fig. 4). In addition to the multiple internal lock mass correction, this final 211-mass external correction profile is used by the Subppm_converter to correct the original urine sample data as well as the reference data (see Fig. 1).
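A simplified, single-pass sketch of such an external correction profile is given below. It is not the Subppm_converter code: the function names are hypothetical, the recursive refitting (up to 30 repeats) is left out, masses are rescaled to [-1, 1] purely for numerical stability, and a centered 5-point window is assumed for the averaging. The input is expected to be sorted by mass.

```python
def polyfit(xs, ys, order):
    """Least-squares polynomial coefficients, lowest power first."""
    n = order + 1
    # Normal equations a * c = b.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate a polynomial (lowest power first) with Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def external_profile(masses, ppm_errors, orders=(3, 4, 5)):
    """ppm correction per reference mass: best polynomial fit plus smoothing."""
    low, high = min(masses), max(masses)
    scaled = [2.0 * (m - low) / (high - low) - 1.0 for m in masses]
    best = None
    for order in orders:
        coeffs = polyfit(scaled, ppm_errors, order)
        fitted = [polyval(coeffs, x) for x in scaled]
        rss = sum((e - f) ** 2 for e, f in zip(ppm_errors, fitted))
        if best is None or rss < best[0]:
            best = (rss, fitted)      # keep the smallest residual sum of squares
    fitted = best[1]
    residuals = [e - f for e, f in zip(ppm_errors, fitted)]
    smoothed = []
    for i in range(len(residuals)):
        window = residuals[max(0, i - 2):i + 3]   # up to five neighbouring points
        smoothed.append(sum(window) / len(window))
    return [f + s for f, s in zip(fitted, smoothed)]
```

Subtracting the returned profile from the measured ppm errors gives the residual error after external correction; for masses between the reference masses, the profile would have to be interpolated.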
3.4 Setting-up a list of masses to search in the mass-corrected urine data
The 12 replicate mass-corrected data sets are pre-processed and aligned using metAlign software (see Fig. 1 and electronic supplementary material); only masses present in all 12 data sets are retained in the average pre-processed urine data set. This serves to further eliminate noise and to retain only real signals.
In the urine sample used in this study Rijk et al. (2009) found that a large number of metabolites downstream from DHEA are formed. Therefore it is known a priori (statistical and analytical evidence), that these metabolites can be traced in the urine sample data sets. Among these there are a number of hydroxylated metabolites conjugated to glucuronides. The known diagnostic ions (M + 1, M + NH4, 2M + 1, 2M + NH4, in-source fragments due to loss of glucuronide and/or water) can easily be calculated and searched for using the Search_LCMS module in the “all” mode (no retention windows involved) using a relatively wide mass tolerance window of ±5 ppm. The search is done on the averaged (12) urine data set (see Fig. 1 and see above). The output from the entire chromatogram consists of a list of 709 ions matching the calculated exact masses (see Electronic supplementary material). If these 709 ions are indeed related to DHEA, then 692 of them have mass errors below 2 ppm, 656 below 1 ppm, 540 below 0.5 ppm and 294 even below 0.2 ppm. For elemental composition analysis of these ions the elements may be restricted to C, H, N, O and S. The reason for this is that the urine has come from an animal experiment where DHEA was administered under very controlled conditions. Phosphorylated compounds do not generally occur in urine and the presence of other types of atoms in compounds is at least very unlikely to occur. Analysis, for example, of the positively charged ion 502.30106 (C25H44O9N = androstanetriol-glucuronide NH4-adduct) with application of the LEWIS, SENIOR and nitrogen-rule shows that the second hit is at −1.35 ppm mass difference. For a mass around 300 Dalton the second hit would be at ca. 3 ppm. In both cases the mass difference of the second hit is large with regard to expected errors. Furthermore, the second hit contains sulfur, which is not the most likely element. Normally sulfur will occur as sulphate in urine or as part of a cysteine conjugate. 
Sulphates do not ionize well in positive mode in the conditions used (data not shown). It is therefore highly likely that most of the 709 ions are related to elemental compositions of metabolites downstream of DHEA. Still, further restrictions towards more confidence in identification are imposed.
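The calculation of the diagnostic ions can be illustrated as follows. The sketch assumes singly charged positive ions, writes the protonated ion (M + 1 in the text) as M+H, and takes the loss of the glucuronide moiety to be C6H8O6; the function names and the ion labels are illustrative. The neutral formula C25H40O9 follows from the NH4-adduct C25H44O9N quoted above.

```python
import re

# Monoisotopic masses (Da) of the elements considered, and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
ELECTRON = 0.000548579909


def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C25H40O9'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def diagnostic_ions(neutral_formula):
    """m/z values of the diagnostic ions of one neutral metabolite M."""
    m = mono_mass(neutral_formula)
    proton = MONO["H"] - ELECTRON
    ammonium = mono_mass("NH4") - ELECTRON
    return {
        "M+H": m + proton,
        "M+NH4": m + ammonium,
        "2M+H": 2 * m + proton,
        "2M+NH4": 2 * m + ammonium,
        "M+H-H2O": m + proton - mono_mass("H2O"),        # in-source water loss
        "M+H-C6H8O6": m + proton - mono_mass("C6H8O6"),  # assumed glucuronide loss
    }
```

For androstanetriol-glucuronide, `diagnostic_ions("C25H40O9")["M+NH4"]` evaluates to the m/z 502.30106 discussed above.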
The 709 ions can also potentially include non-glucuronidated metabolites and non-glucuronidated, non-hydroxylated metabolites. The ions are filtered towards likely hydroxylated and glucuronidated DHEA metabolites as follows. For each peak, retention time, accurate mass and intensity are obtained. The list is reduced by manual examination of ions found at the same retention time (interval of 1.5 s). Only retention times containing M + 1 or M + NH4 of hydroxylated and glucuronidated metabolites are selected; furthermore, these M + 1 or M + NH4 ions should be accompanied by at least one other diagnostic ion (dimer or in-source fragment). The elemental compositions of 89 metabolites (each with at least two characteristic ions) can be identified with a total of 342 diagnostic ions (see Electronic supplementary material). This is in line with previous evidence (Rijk et al. 2009). Most of these mass peaks are in the lower range of detection and serve nicely as a test for the mass accuracy enhancement as a whole and as a test for the calibration using an external source (i.e. obtained from separately measured reference data).
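In this study the filtering is done by manual examination. Purely as an illustration, the selection rule could be automated along the following lines; the hit layout `(retention_time_s, ion_label)`, the labels and the function names are assumptions, and the simple grouping does not distinguish between different metabolites that co-elute.

```python
PARENT_IONS = {"M+H", "M+NH4"}


def coeluting_groups(hits, rt_interval=1.5):
    """Group hits whose retention times fall within one interval (seconds)."""
    groups, current = [], []
    for hit in sorted(hits):
        # Start a new group once the hit is too far from the group's first hit.
        if current and hit[0] - current[0][0] > rt_interval:
            groups.append(current)
            current = []
        current.append(hit)
    if current:
        groups.append(current)
    return groups


def keep_group(group):
    """True if a parent ion is accompanied by a dimer or in-source fragment."""
    labels = {label for _, label in group}
    return bool(labels & PARENT_IONS) and bool(labels - PARENT_IONS)
```

A group consisting of an M+NH4 ion and a 2M+NH4 ion within 1.5 s would be kept, whereas an isolated M+H ion or an isolated fragment would be discarded.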
Although the 89 downstream metabolites may seem a large number, it should be noted that this is a pre-concentrated urine with high levels of metabolites due to DHEA treatment of calves; furthermore the number of metabolites is also combinatorially determined by different hydroxylation positions, different positions of up to two double bonds, alpha and beta configurations in the steroidal structures and also the position of glucuronidation. There is still a lot of information left on non-hydroxylated, non-glucuronylated, sulphated (although not likely in positive mode) and other possible metabolites. The number of combinatorial possibilities could exceed 200 metabolites for DHEA.
3.5 Further evaluation of mass corrections
Having corrected the reference data files and urine data files using internal and external lock masses, the average ppm error can be visualized as a function of the base-10 logarithm of the intensity for the 13 reference and 12 urine replicates, respectively. For this, the mass-corrected reference data are searched for 211 molecular ions of reference compounds using a 5 ppm tolerance for mass deviation and 2.5% for retention time deviation (see Electronic supplementary material). For the urine sample the list of ions (342 ions in total; 89 metabolites) described above is used for the search.
3.6 Intensity dependent mass correction
3.7 Mass from apex or from whole peak
Interestingly, before intensity-dependent mass correction, calculating accurate masses by averaging the mass over multiple scans of a peak does not give an improvement over taking the accurate mass directly from the apex of a peak (data not shown). The reason is the slight intensity dependence of the mass itself; averaging only makes sense when fluctuation of the mass is noise related. Even after intensity correction of masses at the raw data level, however, whole-peak mass calculation improves the average mass accuracy over 211 compounds only negligibly. Therefore, it is assumed that all other issues concerning deviation in mass as well as precision in mass correction play a much larger role in mass accuracy than averaging over a few scans. In Fig. 5c the threshold to include scans for averaging of the mass over the peak is lowered to 5000, which amounts to ca. two times noise. This change slightly decreases mass accuracy, because more data close to noise are included; this seems to outweigh the effect of averaging over scans.
3.8 Effect of file averaging
Figure 5d shows the result of all corrections using only one data file of reference sample A and one data file of urine sample B. When comparing Fig. 5b–d it is clear that averaging over files is beneficial. This holds in particular for mass signals below an intensity of 30,000 (ca. 7–10 times noise). Overall, Fig. 5b shows that only 11 out of 342 relatively low-intensity mass signals are outside the ±1 ppm error envelope (272 mass signals below 0.5 ppm and 142 below 0.2 ppm) in accordance with the elemental compositions of the selected DHEA metabolites; for the reference compounds all 211 mass signals are within ±0.5 ppm, with only 14 out of 211 signals outside of ±0.2 ppm.
3.9 Isotope ratio evaluation
These findings will help to relate experimental results to theoretical results and also help determine error bars on isotope ratios. This in turn can then empirically help in limiting the number of possible elemental compositions of a mass. It is clear, however, that for low-intensity signals below a mass of ca. 550 the isotope ratio is not reliable.
Generally speaking, the average isotope ratios established experimentally for the DHEA metabolites are in accordance with the deviations found for the reference compounds. They therefore additionally help confirm the identities of the DHEA metabolites.
4 Concluding remarks
The strategy presented here makes use of a potentially unlimited number of internal and external masses as well as a possible intensity correction with the goal of improving mass accuracy in U-HPLC high-resolution Orbitrap MS data. Pre-processing using metAlign serves to eliminate baseline and noise contributions in mass-corrected data files. Alignment using metAlign serves to average data from different replicate files and to select ions present in all replicates, thus filtering out noise. The reduced (and averaged) data files obtained from metAlign can be searched in an ultra-fast way (for example, 211 ions in 13 data sets in 2 s); the output can easily be used to visualize error distributions in various ways. Typically, the mass accuracy can be increased by nearly one order of magnitude over an intensity range of at least four orders of magnitude, and the output can also be used to evaluate deviations of experimental isotope ratios. Together this increases the confidence with which elemental compositions can be correlated to experimental data on DHEA metabolites such as obtained from a urine sample in this study.
This work was supported by the Dutch Ministry of Agriculture, Nature and Food Quality, Strategic Research Funds RIKILT-WUR (project 1207232903), the Netherlands Toxicogenomics Centre (NTC; www.toxicogenomics.nl) and EU-META-PHOR (FP6: FOOD-CT-2006-036220). Thermo Fisher Scientific is acknowledged for making the Exactive available to RIKILT. Jens Griep-Raming and Helmut Muenster of Thermo Fisher Scientific are thanked for their assistance within the framework of EU-META-PHOR (FP6: FOOD-CT-2006-036220).
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1988). Algorithms gaussj.c and lfit.c, Numerical Recipes in C. Cambridge: Cambridge University Press.