1 Introduction

Advances in high-resolution mass spectrometry play a significant role in obtaining the elemental compositions of unknown compounds. However, high mass resolution alone is often not enough to obtain unique elemental compositions (Kind and Fiehn 2006). The larger the mass, the more uncertain the elemental composition becomes. Additional heuristic rules (for instance, the rings-plus-double-bonds equivalent (RDBE), LEWIS and SENIOR checks, probabilities of element ratios and the nitrogen rule) and, most importantly, isotope ratio information and prior knowledge of the possible elements can restrict the number of reasonable elemental compositions (Kind and Fiehn 2007).
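As an illustration of one of the heuristic rules named above, the RDBE of a neutral formula containing C, H, N, O and S can be computed as RDBE = C − H/2 + N/2 + 1 (oxygen and divalent sulfur do not contribute). The Python sketch below is a minimal, hypothetical helper written for this example (it is not part of metAlign); it assumes simple formula strings without brackets, charges or isotope labels, and treats a neutral candidate as plausible only if its RDBE is a non-negative whole number.

```python
import re


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'C19H28O2' into element counts.

    Hypothetical helper: element symbols followed by optional counts only
    (no brackets, charges or isotope labels).
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def rdbe(formula: str) -> float:
    """Rings-plus-double-bonds equivalent of a neutral C/H/N/O/S formula.

    RDBE = C - H/2 + N/2 + 1; oxygen and divalent sulfur do not contribute.
    """
    c = parse_formula(formula)
    return c.get("C", 0) - c.get("H", 0) / 2 + c.get("N", 0) / 2 + 1


def passes_rdbe_check(formula: str) -> bool:
    """A neutral candidate needs a non-negative, whole-number RDBE."""
    value = rdbe(formula)
    return value >= 0 and value == int(value)


print(rdbe("C19H28O2"))               # DHEA: 4 rings + C=C + C=O -> 6.0
print(passes_rdbe_check("C19H27O2"))  # odd H count for a CHO formula -> False
```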

High resolution, however, does not automatically imply high mass accuracy or well-established isotope ratios. Calibration is performed externally and/or internally to correct for mass errors. In Orbitrap technology, besides the normal external calibration, an internal mass correction can be chosen to correct all scans in a liquid chromatography mass spectrometry (LC/MS) experiment (Makarov et al. 2006a, b). With current Orbitrap mass analyzers, a mass accuracy better than 5 ppm at a resolving power of 60,000 FWHM (Full Width at Half Maximum) can be achieved using only the standard external calibration; adding an internal calibration based on a single mass makes a mass accuracy better than 2 ppm feasible at the same resolution. Ongoing developments in Orbitrap technology may improve future specifications (Makarov et al. 2009; Olsen et al. 2009).
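For reference, ppm figures such as these follow the usual relative mass error definition. The worked numbers at m/z 500 below are illustrative arithmetic, not values taken from the cited specifications; they show that a fixed ppm tolerance corresponds to a wider absolute window at higher mass, which is one reason elemental composition becomes more ambiguous as mass increases.

```latex
\begin{aligned}
\Delta_{\mathrm{ppm}} &= \frac{m_{\mathrm{meas}} - m_{\mathrm{theo}}}{m_{\mathrm{theo}}} \times 10^{6},
\qquad |m_{\mathrm{meas}} - m_{\mathrm{theo}}| = |\Delta_{\mathrm{ppm}}|\, m_{\mathrm{theo}} \times 10^{-6} \\
m_{\mathrm{theo}} = 500:\quad 5~\mathrm{ppm} &\to 0.0025~\mathrm{Da}, \quad 2~\mathrm{ppm} \to 0.0010~\mathrm{Da}, \quad 0.5~\mathrm{ppm} \to 0.00025~\mathrm{Da}
\end{aligned}
```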

Two studies have been reported in which mass accuracy is further improved through sophisticated post-analysis mass corrections. Scheltema et al. have shown a ZIC-HILIC HPLC-LTQ-Orbitrap MS application in which multiple system-dependent internal masses of known elemental composition are used to recalibrate the mass axis. The mass correction, however, is restricted by the presence, abundance and mass range of internal background masses in the system (Scheltema et al. 2008). This makes the whole procedure dependent on the background not being too clean. Furthermore, the software tools have not been made available to potential users. Cox and Mann (2009) have shown elegantly how mass precision and accuracy improve by considering all associated measurements, starting from the MS peak and proceeding to its chromatographic elution profile, isotope envelope, and stable isotope pair in SILAC measurements (SILAC = Stable Isotope Labeling with Amino acids in Cell culture). The publicly available tools, however, are restricted to applications such as SILAC-based proteomics.

It has been reported that the isotope ratio in Orbitrap technology may to a certain extent be compromised by a resolution-dependent phenomenon (possibly isotopic beat patterns) (Erve et al. 2009). That study was conducted on compounds in the mass range 639 to 1,663; the effect was particularly evident for the fourth and fifth isotopes but hardly apparent for the first few isotopes. Still, alongside improving mass accuracy it is worthwhile, as done in this study, to also assess the robustness of the isotope ratio.

The goal of the present study is to substantially improve mass accuracy over a wide mass range while being less dependent on the number of internal calibration masses and independent of their identity. A further goal is to estimate the experimental isotope ratio error to be expected in an LC-Orbitrap-MS setting at a resolving power of 50,000 (FWHM). This will substantially increase the feasibility of in silico post-processing towards sub-ppm mass accuracy and, with more information on isotope ratio correctness, increase confidence in the determination of elemental compositions. The overall feasibility and data-processing strategy are demonstrated by application to relatively low signals of dehydroepiandrosterone (DHEA) metabolites in a real-life concentrated bovine urine sample with limited sample clean-up. The urine sample originates from a controlled animal experiment in which evidence was found for a large number of DHEA metabolites (Rijk et al. 2009).

The effects of using multiple internal calibration masses (compounds of unknown identity), calculating the average mass over the chromatographic peak, additional external calibration with 211 compounds and averaging over multiple files are examined. Furthermore, mass stability over a large intensity range as well as isotope ratio correctness are studied.

A combination of software tools is used for this study (freely available at http://www.metalign.wur.nl/UK/Download+and+publications/). The basis is the software package metAlign™, which is capable of accurate mass calculation over the peak, data reduction and alignment (Lommen 2009). In addition, two new software tools have been created. The first, the Subppm_converter, calculates and applies special internal and external calibrations to the original raw data files. The second, Search_LCMS, can query the highly reduced metAlign output of multiple data sets for hundreds of masses within retention windows in a few seconds, which makes large-scale, multi-file mass analysis highly feasible.

Ultra-fast searching of processed data is essential for obtaining, in an acceptable time frame, the information required to visualize the distribution of mass corrections (ppm deviations) and isotope ratios. Search time scales directly with file size; in this study, therefore, file sizes are reduced 100–1000-fold.

2 Materials and methods

2.1 Chemicals

Formic acid of analytical grade is obtained from Merck (Darmstadt, Germany). Acetonitrile ultra pure LC–MS quality and water LC–MS quality are obtained from Biosolve (Valkenswaard, The Netherlands). A list of the 211 standards is given in the supplementary section; these compounds were obtained from different manufacturers.

2.2 Ultra high performance chromatography coupled to Orbitrap mass spectrometry

Ultra-high performance chromatography (U-HPLC) is performed on a U-HPLC Accela system (Thermo Fisher Scientific, San Jose, CA, USA). The U-HPLC system consists of a degasser, a quaternary pump, an autosampler and a column oven. Separation is performed on a Waters Acquity UPLC (UPLC = Ultra Performance Liquid Chromatography) BEH C18 column (150 × 2.1 mm, 1.7 μm particle size) (Waters, Etten-Leur, The Netherlands) kept at 40°C. A flow rate of 0.4 ml/min and an injection volume of 10 μl are used. The mobile phase consists of 20 mM formic acid in water (A) and 20 mM formic acid in water/acetonitrile (10/90 v/v) (B). The mobile phase composition is kept at 0% B for 1 min, followed by linear changes to 20% B in 2 min, 20–80% B in 20 min, and 80–100% B in 2 min. The composition of 100% B is kept for 10 min and then returned linearly to 0% B in 1 min, remaining at this composition for 4 min prior to the next injection. The U-HPLC is directly interfaced to a single stage Orbitrap mass spectrometer (Exactive, Thermo Fisher Scientific). The Orbitrap-MS is equipped with a heated electrospray interface (HESI) operating in positive mode. Data are acquired between m/z 100 and m/z 1000; the resolving power is set to 50,000 (FWHM), resulting in a scan time of 0.5 s. Furthermore, the spray voltage is 2800 V, the capillary voltage is 47.5 V and the capillary temperature is 275°C; sheath and auxiliary gas flows of 35 and 15 arbitrary units are used, respectively. Instrument calibration is performed externally prior to the sequence by infusion of a calibration solution. The calibration solution (m/z 138 to m/z 1822) contains caffeine, MRFA (Met-Arg-Phe-Ala), Ultramark 1621 and acetic acid in acetonitrile/methanol/water (2:1:1, v/v) (Sigma-Aldrich). Data are recorded using Xcalibur software version 2.1.0.1139 (Thermo Fisher Scientific).
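The gradient program above can be written compactly as a piecewise-linear table of (time, %B) breakpoints, which makes the 40 min injection cycle explicit. The Python sketch below is a hypothetical transcription for illustration only (it is not an instrument method file); it uses `numpy.interp` to return the programmed %B at any time and derives the acetonitrile percentage from the stated 10/90 composition of eluent B.

```python
import numpy as np

# Gradient program transcribed from the text as (time in min, %B) breakpoints.
# Between breakpoints the composition changes linearly; 40 min per injection cycle.
GRADIENT = [
    (0.0, 0.0),     # start at 0% B
    (1.0, 0.0),     # hold 1 min
    (3.0, 20.0),    # to 20% B in 2 min
    (23.0, 80.0),   # 20-80% B in 20 min
    (25.0, 100.0),  # 80-100% B in 2 min
    (35.0, 100.0),  # hold 100% B for 10 min
    (36.0, 0.0),    # back to 0% B in 1 min
    (40.0, 0.0),    # remain at 0% B for 4 min
]


def percent_b(t_min):
    """Programmed %B at time t_min (scalar or array), by linear interpolation."""
    times, comps = zip(*GRADIENT)
    return np.interp(t_min, times, comps)


def percent_acetonitrile(t_min):
    """Eluent B is water/acetonitrile 10/90 (v/v), so acetonitrile = 0.9 x %B."""
    return 0.9 * percent_b(t_min)


print(percent_b(13.0))  # midway through the 20-80% step -> 50.0
```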

2.3 Samples

Two samples are used for this study. The first is a reference standard solution containing 211 compounds (sample A) with molecular weights in the range 100–1000. The second is a mixed sample (B) of urine (after sample preparation) from DHEA-treated male calves, obtained from the study described previously (Rijk et al. 2009); this sample is known to contain a large number of DHEA metabolites.

2.4 Data acquisition

Twenty-five data sets are acquired. The run consists of 7 injections of the reference sample A, followed by 12 injections of the urine sample B, followed by 6 injections of the reference sample A.

2.5 Data processing

Data processing is done using software which can be downloaded at http://www.metalign.wur.nl/UK/Download+and+publications/. For mass accuracy correction the new Subppmconverter package (specifically developed for Orbitrap data) is used in combination with the previously published metAlign package (Lommen 2009) and the metAlign-based ultra-fast mass-searching Search_LCMS module. A schematic overview of the software procedure including multiple file averaging is given in Fig. 1.

Fig. 1

Schematic representation of the accurate mass correction procedure. R reference standard sample (here known compounds in solvent), S sample (here urine extract)

A detailed presentation of parameters and programs is given in the documentation in the supplemental information (see Electronic supplementary material). A separate detailed presentation is also given in the supplemental information for a two-file procedure with one reference and one real-life sample (see Electronic supplementary material).

2.6 Software development

The Subppmconverter software (see Electronic supplementary material) and the Search_LCMS module (see Electronic supplementary material) are developed in C and Visual C++ version 6.0. Installation requirements are identical to those of metAlign (Lommen 2009). MetAlign must be installed first.

The Subppmconverter module can use a list of internal background masses as references to correct the mass axis in scans. In short, the program does the following: (a) It reads a list of internal masses. (b) It reads the centroided data (not normally visible in Xcalibur) from raw data files recorded in profile mode (using the correct filter) via the xrawfile.ocx (from Xcalibur 2.1.0.1139) supplied by Thermo Fisher Scientific; if chosen, it performs an intensity correction while reading. (c) It searches for all the proposed internal background reference masses (see, for example, the list in Subppmconverter.pdf in the electronic supplementary material) in all scans of all files (delta mass ≤0.005 Da); per scan, it calculates the ppm errors of the internal reference masses and then excludes those deviating more than 5 ppm from the average; if no internal mass is present, it interpolates from the closest scans containing internal masses. (d) It calculates the average corrected mass of each proposed internal mass over all scans of all data sets. (e) It makes a new internal mass list based on (d). (f) It repeats (a)–(c). (g) It corrects the mass values in each scan of all data sets using the corresponding calculated average ppm deviation. (h) If chosen, it calculates an external correction profile from the “.csv_profile” output of the Search_LCMS module (polynomial fitting (Press et al. 1988) followed by 5-point averaging of subsequent (ppm, mass) points) and applies this external correction. (i) It converts the corrected centroid data to a netCDF file. The corrected centroid data can be further converted to Xcalibur format in batch using any version of the Xcalibur Xconvert module.
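Steps (a) to (g) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Subppm_converter code itself (which is written in C/Visual C++): scans are plain lists of centroid m/z values, the per-scan shift is removed as a relative (ppm) correction, steps (d) and (e) are read as replacing each lock mass by its average corrected m/z over all scans, and scans without lock masses simply take the shift of the nearest scan that has one (a simplification of the interpolation in step (c)). All function names are hypothetical.

```python
from statistics import mean

TOL_DA = 0.005     # lock-mass search window, step (c)
MAX_DEV_PPM = 5.0  # reject lock masses deviating > 5 ppm from the scan average


def ppm(measured, reference):
    return (measured - reference) / reference * 1e6


def correct(mass, shift_ppm):
    """Undo a relative shift, assuming measured = true * (1 + shift * 1e-6)."""
    return mass / (1 + shift_ppm * 1e-6)


def find_lock(peaks, ref):
    """Centroid closest to ref within TOL_DA, or None."""
    hits = [m for m in peaks if abs(m - ref) <= TOL_DA]
    return min(hits, key=lambda m: abs(m - ref)) if hits else None


def scan_shift(peaks, lock_list):
    """Outlier-filtered mean ppm deviation of the lock masses found in one scan."""
    errors = []
    for ref in lock_list:
        m = find_lock(peaks, ref)
        if m is not None:
            errors.append(ppm(m, ref))
    if not errors:
        return None
    avg = mean(errors)
    kept = [e for e in errors if abs(e - avg) <= MAX_DEV_PPM]
    return mean(kept) if kept else None


def fill_nearest(shifts):
    """Simplification: scans without lock masses take the nearest scan's shift.
    Raises ValueError if no scan has a shift at all."""
    known = [i for i, s in enumerate(shifts) if s is not None]
    return [shifts[min(known, key=lambda k: abs(k - i))] for i in range(len(shifts))]


def two_pass_correction(scans, proposed_locks):
    """scans: list of centroid m/z lists (all scans of all files).
    Returns (corrected scans, refined lock list)."""
    # Pass 1, steps (a)-(c): shifts against the proposed lock list.
    shifts = fill_nearest([scan_shift(s, proposed_locks) for s in scans])
    # Steps (d)-(e): each lock mass becomes its average corrected m/z over all scans.
    refined = []
    for ref in proposed_locks:
        found = []
        for peaks, shift in zip(scans, shifts):
            m = find_lock(peaks, ref)
            if m is not None:
                found.append(correct(m, shift))
        refined.append(mean(found) if found else ref)
    # Steps (f)-(g): recompute shifts against the refined list, correct all centroids.
    shifts = fill_nearest([scan_shift(s, refined) for s in scans])
    corrected = [[correct(m, shift) for m in peaks] for peaks, shift in zip(scans, shifts)]
    return corrected, refined
```

Because the refined list is derived from the data themselves, the exact identity of the background ions matters less than their presence, which is the property the text aims for.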

The Search_LCMS module reads a list of the “.redms_acc” output files from metAlign (Lommen 2009) (output after baseline correction and noise reduction) together with a csv file containing search criteria (see Electronic supplementary material) and then creates a search output in csv format. Excel is used to visualize search results.

A download of the software and detailed methodology description are provided at www.metalign.nl free of charge. Regular updates will be provided in the future through this website. The source code will be made available under a Material Transfer Agreement on an individual basis.

3 Results and discussion

3.1 Internal lock mass correction

Thermo Fisher Scientific's software provides an option for internal mass correction of Orbitrap data. For an internal mass correction to be feasible, an internal mass must be present in all scans and the exact mass (elemental composition) of this compound must be known. If the internal mass is absent, either no correction is performed or a different, incorrect ion may be used. If the internal mass has a low intensity, the noise on this mass is transferred to the scans. Both problems can result in erroneous corrections in certain regions of the chromatogram, leading to mass errors exceeding 20 ppm (data not shown). While these problems may not occur in diluted samples with abundant system background ions, they are all the more likely in complex concentrated extracts such as those used in metabolomics, because the likelihood of ion suppression of internal references increases in concentrated samples. Unfortunately, in our system no background ion appears to be present in all scans.

Therefore the initial goal of internal mass correction developed in this study is to make all mass traces in all samples comparable by locking to a larger set (here 10) of system background ions (see Electronic supplementary material) of which only a subset has to be present in each scan. A threshold is defined for the intensity of the background ions; this is set here to 5–7 times noise (20,000 counts in centroid mode). If there are scans without an internal lock mass the procedure interpolates between known ppm-deviations in adjacent corrected scans.
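The intensity threshold and the interpolation over scans without a usable lock mass can be sketched as below. This is an illustrative Python sketch under assumptions, not the converter's implementation: peaks are (m/z, intensity) pairs, the most intense candidate within 0.005 Da is taken per lock mass, and gaps are interpolated linearly by scan index (reasonable here because scans are acquired at a constant 0.5 s interval). Function names are hypothetical.

```python
INTENSITY_THRESHOLD = 20_000  # ca. 5-7x noise in centroid mode, as in the text
TOL_DA = 0.005


def scan_shift(peaks, lock_list):
    """peaks: list of (mz, intensity). Mean ppm error of the lock masses that are
    present above the intensity threshold, or None if none qualifies."""
    errors = []
    for ref in lock_list:
        hits = [(mz, inten) for mz, inten in peaks
                if abs(mz - ref) <= TOL_DA and inten >= INTENSITY_THRESHOLD]
        if hits:
            mz = max(hits, key=lambda p: p[1])[0]  # most intense candidate
            errors.append((mz - ref) / ref * 1e6)
    return sum(errors) / len(errors) if errors else None


def interpolate_gaps(shifts):
    """Linearly interpolate (by scan index) the ppm shift of scans without a lock mass."""
    known = [i for i, s in enumerate(shifts) if s is not None]
    if not known:
        raise ValueError("no scan contains a lock mass above the threshold")
    filled = list(shifts)
    for i, s in enumerate(shifts):
        if s is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # before the first usable scan: hold its value
            filled[i] = shifts[right]
        elif right is None:     # after the last usable scan: hold its value
            filled[i] = shifts[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = shifts[left] + w * (shifts[right] - shifts[left])
    return filled
```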

Figure 2 demonstrates the above using the Subppm_converter. The subfigures are constructed from the Excel-compatible “.csv” output of the converter. Typical ppm deviations for both samples (i.e. reference sample A and urine sample B) are given in Fig. 2a, d. Within a time frame of 36 min the ppm deviation can fluctuate by up to 3 ppm. Figure 2b, e shows what happens when only one lock mass is used; the lock mass chosen here is the candidate that is most frequently present. For a clean sample such as the reference sample, the lock mass is “lost” only directly after the void volume, where (high levels of) unretained compounds elute, and in one scan at the end of the chromatogram. While this works quite well for this relatively clean sample, in the urine sample the internal lock is clearly lost in the most signal-rich region of the chromatogram. Had this been the situation with the vendor’s software, erroneous mass corrections could have resulted in these regions. In Fig. 2c, f the number of internal lock masses found as a function of retention time shows that near-complete coverage of the chromatogram is possible when using a list of ten background ions.

Fig. 2

Internal lock mass correction. a–c From the reference sample. d–f From the urine sample. The horizontal axis refers to the retention time in minutes. a and d The ppm deviation versus the retention time. b and e The availability of the lock mass as a function of the retention time when only one lock mass is used. c and f The number of available lock masses (here max 10) versus the retention time

3.2 Ultra-fast mass searching assists in evaluation

An essential piece of analysis software in this study is the ultra-fast search tool Search_LCMS. A list of search criteria for all 211 reference compounds is given in the supplemental information. Using the Search_LCMS module, the 13 pre-processed reference data files can be searched for these 211 compounds in ca. 2 s. This speed is a direct result of the data reduction achieved by metAlign (ca. 100–1000-fold). The Excel-compatible output contains mass, retention, scan, ppm error and intensity for each mass in each file; in addition, data for external mass correction are created (see below). It therefore becomes feasible to examine ppm error distributions while varying the file processing, and to visualize the results (see for example Fig. 3).
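A search of this kind can be sketched as follows. The Python code below is a hypothetical illustration, not the Search_LCMS implementation: it assumes each reduced data set is a list of peaks sorted by m/z, so that a binary search (`bisect`) only inspects the slice within the ppm window, and it applies a relative retention window. The 5 ppm and 2.5% defaults are the tolerances used for the reference compounds in this study; `expected_rt=None` mimics the "all" mode without retention windows.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float
    rt: float          # retention time (min)
    intensity: float


def search(peaks, targets, ppm_tol=5.0, rt_tol_frac=0.025):
    """Find target masses in a reduced peak list.

    peaks   -- list of Peak, sorted by mz
    targets -- list of (name, exact_mz, expected_rt); expected_rt=None means
               search the whole chromatogram ("all" mode)
    Returns (name, mz, rt, intensity, ppm_error) tuples.
    """
    mzs = [p.mz for p in peaks]
    results = []
    for name, exact, rt in targets:
        delta = exact * ppm_tol * 1e-6
        # Binary search: only the slice within +/- delta is inspected.
        lo = bisect_left(mzs, exact - delta)
        hi = bisect_right(mzs, exact + delta)
        for p in peaks[lo:hi]:
            if rt is not None and abs(p.rt - rt) > rt_tol_frac * rt:
                continue
            results.append((name, p.mz, p.rt, p.intensity, (p.mz - exact) / exact * 1e6))
    return results
```

With peak lists reduced 100–1000-fold, each lookup touches only a handful of entries, which is consistent with the search times reported above.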

Fig. 3

Ppm error distribution of 211 reference compounds in 13 data files obtained for sample A (reference sample). a No internal or external lock mass correction. b Internal lock mass correction using one lock mass. c Internal lock mass correction (average) on the basis of a maximum of ten lock masses. d As c but with additional external lock mass correction based on 211 known masses

Figure 3a gives the ppm error distribution for all 211 reference compounds when no internal lock is used and the mass is taken from the apex of the peak. Although all signals in all 13 files have a good signal-to-noise ratio, mass errors between −3.5 and +2.5 ppm occur. Furthermore, Fig. 3a shows a drift in mass between experiments over the time course of the sequence (ca. 1 day). This is, however, still in accordance with the instrument specification (error <5 ppm). In Fig. 3b an internal lock correction using one internal mass has been applied (see also Fig. 2b); the ppm error distributions from the 13 data files are now more or less superimposed. As expected, averaging the mass errors of multiple internal lock masses gives a more precise superposition, as illustrated in Fig. 3c (see also Fig. 2c). Figure 3d shows the result of Fig. 3c after additional external mass correction as described in the next section; a clear further improvement (a flatter distribution centered around 0 ppm) is obtained even though the mass spectrometer was correctly calibrated according to the manufacturer’s instructions.

3.3 External lock mass correction

One of the outputs (i.e. binary output “.csv_polyfit”) of the Search_LCMS module contains the searched masses (here 211) and their ppm errors averaged over multiple (here 13) reference data sets. A software program called “polyfitmass” calculates an external correction profile on the basis of this file (see Electronic supplementary material); an output of “polyfitmass” is an Excel compatible file called polycurve.csv from which Fig. 4 is constructed.

Fig. 4

External lock mass calibration using average ppm errors from 13 data sets. Diamond shaped points are from the 211 reference masses from sample A. The lighter curve is the polynomial fit through the points. The darker curve is the final correction curve (polynomial including the 5-point average difference)

The Subppm_converter runs this module inline. The algorithm consists of a multiple recursive polynomial fit followed by 5-point smoothing. For third- to fifth-order polynomials, fits and corrections are repeated up to 30 times. The fit with the smallest sum of squared residuals is automatically selected as the best fit and stored. Each experimental mass is corrected on the basis of this fitted curve (see Fig. 4). The residual ppm error is then averaged over five subsequent masses and again subtracted (see Fig. 4). In addition to the multiple internal lock mass correction, the Subppm_converter uses this final 211-mass external correction profile to correct both the original urine sample data and the reference data (see Fig. 1).
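A simplified version of this external correction profile is sketched below in Python with NumPy. It is an illustration under assumptions, not the polyfitmass code: polynomials of order 3 to 5 are fitted once each (the recursive procedure with up to 30 repeated fits is not reproduced), the fit with the smallest sum of squared residuals is kept, and a centred 5-point moving average of the residuals is added back, interpolated to any m/z. Note that with a single ordinary least-squares pass on the same points the highest order can never have a larger residual sum, so the selection step only becomes meaningful within the authors' recursive scheme. Function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def moving_average(values, width=5):
    """Centred moving average; the window is truncated at both ends."""
    half = width // 2
    return np.array([values[max(0, i - half): i + half + 1].mean()
                     for i in range(len(values))])


def external_profile(masses, ppm_errors, orders=(3, 4, 5)):
    """Return a function giving the ppm correction at any m/z.

    Fits polynomials of the given orders, keeps the one with the smallest
    sum of squared residuals, then adds a 5-point average of the residuals.
    """
    idx = np.argsort(masses)
    x = np.asarray(masses, dtype=float)[idx]
    y = np.asarray(ppm_errors, dtype=float)[idx]
    best_ssr, best_poly = None, None
    for deg in orders:
        poly = Polynomial.fit(x, y, deg)  # domain mapped to [-1, 1] for stability
        ssr = float(np.sum((y - poly(x)) ** 2))
        if best_ssr is None or ssr < best_ssr:
            best_ssr, best_poly = ssr, poly
    smoothed = moving_average(y - best_poly(x))

    def correction_ppm(mz):
        return best_poly(mz) + np.interp(mz, x, smoothed)

    return correction_ppm


def apply_external(mz, correction_ppm):
    """Remove the profile's ppm error from a measured m/z."""
    return mz / (1 + correction_ppm(mz) * 1e-6)
```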

3.4 Setting-up a list of masses to search in the mass-corrected urine data

The 12 replicate mass-corrected data sets are pre-processed and aligned using metAlign software (see Fig. 1 and electronic supplementary material); only masses present in all 12 data sets are retained in the average pre-processed urine data set. This serves to further eliminate noise and to retain only real signals.
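The "present in all replicates" filter can be illustrated with a deliberately simple greedy alignment. The Python sketch below is not metAlign's alignment algorithm; it assumes features are (m/z, retention time in min, intensity) tuples, anchors on the features of the first replicate, and uses hypothetical tolerances (2 ppm, 0.05 min). A feature missing from any replicate is dropped; the others are averaged.

```python
from statistics import mean


def present_in_all(files, ppm_tol=2.0, rt_tol=0.05):
    """Keep only features found in every replicate and average them.

    files -- list of replicates, each a list of (mz, rt_min, intensity)
    Features of the first file are used as anchors; in every file the most
    intense feature within ppm_tol and rt_tol is taken as the match.
    """
    aligned = []
    for mz0, rt0, _ in files[0]:
        matches = []
        for features in files:
            candidates = [f for f in features
                          if abs(f[0] - mz0) / mz0 * 1e6 <= ppm_tol
                          and abs(f[1] - rt0) <= rt_tol]
            if not candidates:
                break  # missing in this replicate: drop the feature
            matches.append(max(candidates, key=lambda f: f[2]))
        else:  # no break: present in all replicates
            aligned.append(tuple(mean(f[k] for f in matches) for k in range(3)))
    return aligned
```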

In the urine sample used in this study, Rijk et al. (2009) found that a large number of metabolites downstream of DHEA are formed. It is therefore known a priori, from statistical and analytical evidence, that these metabolites can be traced in the urine sample data sets. Among them are a number of hydroxylated metabolites conjugated to glucuronic acid. The known diagnostic ions (M + 1, M + NH4, 2M + 1, 2M + NH4, and in-source fragments due to loss of glucuronide and/or water) can easily be calculated and searched for using the Search_LCMS module in “all” mode (no retention windows involved) with a relatively wide mass tolerance window of ±5 ppm. The search is done on the averaged (12) urine data set (see Fig. 1 and above). The output from the entire chromatogram consists of a list of 709 ions matching the calculated exact masses (see Electronic supplementary material). If these 709 ions are indeed related to DHEA, then 692 of them have mass errors below 2 ppm, 656 below 1 ppm, 540 below 0.5 ppm and 294 even below 0.2 ppm. For elemental composition analysis of these ions the elements may be restricted to C, H, N, O and S, because the urine comes from an animal experiment in which DHEA was administered under very controlled conditions. Phosphorylated compounds do not generally occur in urine, and the presence of other types of atoms in the compounds is very unlikely. For example, analysis of the positively charged ion 502.30106 (C25H44O9N = androstanetriol-glucuronide NH4-adduct) with application of the LEWIS, SENIOR and nitrogen rules shows that the second hit is at a mass difference of −1.35 ppm. For a mass around 300 Da the second hit would be at ca. 3 ppm. In both cases the mass difference of the second hit is large with regard to the expected errors. Furthermore, the second hit contains sulfur, which is not the most likely element: in urine, sulfur normally occurs as sulphate or as part of a cysteine conjugate, and sulphates do not ionize well in positive mode under the conditions used (data not shown). It is therefore highly likely that most of the 709 ions correspond to elemental compositions of metabolites downstream of DHEA. Nevertheless, further restrictions are imposed to increase confidence in identification.
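The diagnostic ion masses mentioned above can be calculated as in the following Python sketch. It is an illustration under assumptions, not the procedure used to build the actual search list: M + 1 is taken as [M+H]+, standard monoisotopic masses and the electron mass are used, and the in-source loss of the glucuronide moiety is modelled as a neutral loss of C6H8O6. Helper names and ion labels are hypothetical. For androstanetriol glucuronide (C25H40O9) the sketch reproduces the 502.30106 value quoted for the NH4 adduct.

```python
import re

# Monoisotopic masses (u) and electron mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
ELECTRON = 0.00054858


def mono_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C25H40O9'."""
    return sum(MONO[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def diagnostic_ions(formula):
    """m/z of singly charged positive-mode diagnostic ions for a neutral glucuronide."""
    m = mono_mass(formula)
    proton = MONO["H"] - ELECTRON
    nh4 = mono_mass("NH4") - ELECTRON
    gluc = mono_mass("C6H8O6")  # neutral loss of the glucuronide moiety
    water = mono_mass("H2O")
    return {
        "[M+H]+": m + proton,
        "[M+NH4]+": m + nh4,
        "[2M+H]+": 2 * m + proton,
        "[2M+NH4]+": 2 * m + nh4,
        "[M+H-C6H8O6]+": m + proton - gluc,
        "[M+H-C6H8O6-H2O]+": m + proton - gluc - water,
    }


def window(mz, ppm=5.0):
    """Search window of +/- ppm around a calculated m/z."""
    d = mz * ppm * 1e-6
    return mz - d, mz + d


ions = diagnostic_ions("C25H40O9")  # androstanetriol glucuronide
print(round(ions["[M+NH4]+"], 5))   # -> 502.30106
```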

The 709 ions may also include non-glucuronidated metabolites and non-glucuronidated, non-hydroxylated metabolites. The ions are therefore filtered towards likely hydroxylated and glucuronidated DHEA metabolites as follows. For each peak, retention time, accurate mass and intensity are obtained. The list is reduced by manual examination of ions found at the same retention time (interval of 1.5 s). Only retention times containing M + 1 or M + NH4 ions of hydroxylated and glucuronidated metabolites are selected; furthermore, these M + 1 or M + NH4 ions must be accompanied by at least one other diagnostic ion (dimer or in-source fragment). The elemental compositions of 89 metabolites (each with at least two characteristic ions) can be identified, with a total of 342 diagnostic ions (see Electronic supplementary material). This is in line with previous evidence (Rijk et al. 2009). Most of these mass peaks are in the lower range of detection and serve nicely as a test of the mass accuracy enhancement as a whole and of the calibration using an external source (i.e. obtained from separately measured reference data).
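The manual filtering step can be approximated in code for illustration. The Python sketch below is an assumption-laden automation of what the text describes as manual examination: ions are (retention time in s, label) pairs for one candidate metabolite, ions within 1.5 s of the previous ion in retention order are chained into one group (one possible reading of "interval of 1.5 s"), and a group is kept only if it contains a parent ion ([M+H]+ or [M+NH4]+) plus at least one dimer or in-source fragment. Labels and names are hypothetical.

```python
PARENT_IONS = {"[M+H]+", "[M+NH4]+"}


def group_by_rt(ions, window_s=1.5):
    """ions: list of (rt_seconds, label). Ions whose retention times lie within
    window_s of the previous ion in RT order are chained into one group."""
    groups, current = [], []
    for ion in sorted(ions):
        if current and ion[0] - current[-1][0] > window_s:
            groups.append(current)
            current = []
        current.append(ion)
    if current:
        groups.append(current)
    return groups


def accept(group):
    """Keep a group only if it has a parent ion plus at least one other
    diagnostic ion (dimer or in-source fragment)."""
    labels = {label for _, label in group}
    return bool(labels & PARENT_IONS) and bool(labels - PARENT_IONS)
```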

Although 89 downstream metabolites may seem a large number, it should be noted that this is a pre-concentrated urine with high levels of metabolites due to DHEA treatment of the calves; furthermore, the number of metabolites is combinatorially determined by different hydroxylation positions, different positions of up to two double bonds, alpha and beta configurations in the steroidal structures, and the position of glucuronidation. Much information remains to be explored on non-hydroxylated, non-glucuronidated, sulphated (although these are not likely to be detected in positive mode) and other possible metabolites. The number of combinatorial possibilities could exceed 200 metabolites for DHEA.

3.5 Further evaluation of mass corrections

Having corrected the reference and urine data files using internal and external lock masses, the average ppm error can be visualized as a function of (10Log) intensity for the 13 reference and 12 urine replicates, respectively. For this, the mass-corrected reference data are searched for the 211 molecular ions of the reference compounds using a 5 ppm tolerance for mass deviation and 2.5% for retention time deviation (see Electronic supplementary material). For the urine sample, the list of ions described above (342 ions in total; 89 metabolites) is used for the search.

3.6 Intensity-dependent mass correction

Figure 5a shows the combined picture of 211 reference masses (diamonds) and 342 (squares) DHEA-metabolite-related masses. This figure shows that towards lower intensity there is a slight tendency for a negative bias in the accurate mass. This is corroborated by the observation that the 211 reference masses (13 file average) calculated in apex mode are on average 0.144 ppm higher than when these masses are calculated over all scans of a peak (minimum intensity 20,000 per included scan); the masses were calculated on exactly the same data, i.e. internal multiple lock corrected data. Therefore, an empirically derived intensity correction option is built into the Subppm_converter. Since a mass intensity correction will (although only slightly) influence all subsequent steps, this correction is done at the original data level and requires total reprocessing (see Fig. 1). Figure 5b shows the same data after intensity correction of the mass. The intensity correction slightly decreases the mass errors for reference and urine-related masses.
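The empirical intensity correction given in the Fig. 5 caption (ppm correction = 0.24 × 10Log(Intensity) − 1.45) can be expressed as below. The Python sketch assumes that this expression is the expected ppm bias at a given intensity and that it is removed in the same relative way as a lock-mass shift; how the Subppm_converter applies it internally is not specified here. With this reading, the predicted bias is −0.49 ppm at 10,000 counts (raising the corrected mass, consistent with the negative bias at low intensity) and crosses zero near 1.1 × 10^6 counts.

```python
import math


def predicted_bias_ppm(intensity):
    """Empirical ppm bias from the Fig. 5 caption: 0.24 * log10(I) - 1.45."""
    return 0.24 * math.log10(intensity) - 1.45


def intensity_corrected(mz, intensity):
    """Assumption: the bias is removed like a lock-mass shift (relative correction)."""
    return mz / (1 + predicted_bias_ppm(intensity) * 1e-6)


print(round(predicted_bias_ppm(1e4), 2))  # -> -0.49 ppm at 10,000 counts
```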

Fig. 5

Ppm error as a function of the 10Log of the intensity. Diamonds indicate the results from the 211 reference masses in sample A (a–c: average of 13 files; d: one file). Squares indicate the 342 masses searched in sample B (urine) (a–c: average of 12 files; d: one file). Masses were calculated over the whole peak (Lommen 2009). a No intensity-dependent correction on the mass was performed; mass averaging threshold = 20,000. b An intensity-dependent correction on the mass was performed (empirical ppm correction used = 0.24 × 10Log(Intensity) − 1.45); mass averaging threshold = 20,000. c As b, but with mass averaging threshold = 5,000. d As b, but without file averaging

3.7 Mass from apex or from whole peak

Interestingly, before intensity-dependent mass correction, calculating accurate masses by averaging the mass over multiple scans of a peak gives no improvement over taking the accurate mass directly from the apex of the peak (data not shown). The reason is the slight intensity dependence of the mass itself: averaging only makes sense when the fluctuation of the mass is noise related. Even after intensity correction of the masses at the raw data level, however, whole-peak mass calculation improves the average mass accuracy over the 211 compounds only negligibly. It is therefore assumed that other sources of mass deviation, as well as the precision of the mass correction, play a much larger role in mass accuracy than averaging over a few scans. In Fig. 5c the threshold for including scans in the averaging of the mass over the peak is lowered to 5,000, which amounts to ca. two times noise. This change slightly decreases mass accuracy because more data close to noise are included; this seems to outweigh the benefit of averaging over more scans.
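The difference between apex and whole-peak mass, and the role of the inclusion threshold, can be illustrated with the Python sketch below. The intensity-weighted mean is an assumption made for this example; the exact weighting used by metAlign is described in Lommen (2009) and may differ. The toy peak shows how lowering the threshold from 20,000 to 5,000 pulls in a low-intensity scan and shifts the averaged mass.

```python
def apex_mass(scans):
    """scans: list of (mz, intensity) of one ion across its chromatographic peak."""
    return max(scans, key=lambda s: s[1])[0]


def whole_peak_mass(scans, threshold=20_000):
    """Intensity-weighted mean m/z over the scans at or above the threshold."""
    used = [(mz, i) for mz, i in scans if i >= threshold]
    if not used:
        return None
    total = sum(i for _, i in used)
    return sum(mz * i for mz, i in used) / total


peak = [(300.0010, 10_000), (300.0004, 60_000), (300.0000, 100_000), (300.0002, 40_000)]
print(apex_mass(peak))  # -> 300.0
```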

3.8 Effect of file averaging

Figure 5d shows the result of all corrections using only one data file of reference sample A and one data file of urine sample B. Comparing Fig. 5b–d, it is clear that averaging over files is beneficial, in particular for mass signals below an intensity of 30,000 (ca. 7–10 times noise). Overall, Fig. 5b shows that only 11 out of 342 relatively low-intensity mass signals fall outside the ±1 ppm error envelope (272 mass signals below 0.5 ppm and 142 below 0.2 ppm) relative to the elemental compositions of the selected DHEA metabolites; for the reference compounds, all 211 mass signals are within ±0.5 ppm, with only 14 out of 211 outside ±0.2 ppm.

3.9 Isotope ratio evaluation

Besides mass accuracy, isotope ratios are also important for identification because they limit the number of possible elemental compositions. This study is therefore extended to evaluating experimentally obtained average isotope ratios. All data shown in Fig. 6 come from file averaging as in Fig. 5. Figure 6a shows that the numbers of C-atoms estimated from experimental isotope ratios do not always match the known theoretical values. It has been reported that the isotope ratio in Orbitrap technology can be slightly compromised between mass 639 and 1663 Da by isotopic beat patterns (Erve et al. 2009). This is particularly evident at high resolution for the fourth and fifth isotopes, but less so for the first few isotopes. Figure 6a has limited data in the region above mass 639; the signal to noise in this study is not high enough to reliably detect the higher isotopes noted by Erve et al. However, major systematic deviations of unknown origin appear to occur in the region below mass 550; these are not the result of software pre-processing and fall outside the scope of the study by Erve et al. The averaged isotope ratios of the urine data used in Fig. 6a are in reasonable accordance with the averaged reference data, but also indicate an intensity dependence in addition to a mass dependence. Figure 6b shows the experimentally determined C-atom deviation versus the 10Log intensity of the first isotopic mass. Again, masses below 550 (diamonds and squares) and above 550 (triangles) give different results. The empirical results suggest that for lower masses (<550; diamonds and squares) the isotope ratio deteriorates towards lower intensity, ending in the more diffuse, noise-related dispersion that would be expected. Masses above 550 (triangles), although far fewer in number, are clearly much less affected.
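The text does not spell out how the number of C-atoms is derived from the isotope ratio; a common first-order approach is sketched below in Python as an assumption, not as the authors' method. The expected I(M+1)/I(M) ratio is the sum, over elements, of count × (abundance of the +1 isotope / abundance of the monoisotopic isotope); subtracting the contribution of the non-carbon elements and dividing by the per-carbon term gives an estimated carbon count, whose difference from the theoretical count is the kind of deviation plotted in Fig. 6. The abundances are approximate representative values, and all M+1 isotopologues are assumed to be summed in the measured intensity.

```python
# Approximate natural abundances: (monoisotopic isotope, +1 Da isotope).
ABUNDANCE = {
    "C": (0.9893, 0.0107),      # 12C, 13C
    "H": (0.999885, 0.000115),  # 1H, 2H
    "N": (0.99636, 0.00364),    # 14N, 15N
    "O": (0.99757, 0.00038),    # 16O, 17O
    "S": (0.9499, 0.0075),      # 32S, 33S
}


def m1_ratio(counts):
    """Expected I(M+1)/I(M) for element counts, e.g. {'C': 19, 'H': 28, 'O': 2}."""
    return sum(n * ABUNDANCE[el][1] / ABUNDANCE[el][0] for el, n in counts.items())


def estimate_carbons(measured_ratio, counts):
    """Number of C atoms implied by a measured I(M+1)/I(M), after subtracting
    the expected contribution of the non-carbon elements in `counts`."""
    others = m1_ratio({el: n for el, n in counts.items() if el != "C"})
    per_carbon = ABUNDANCE["C"][1] / ABUNDANCE["C"][0]
    return (measured_ratio - others) / per_carbon


dhea = {"C": 19, "H": 28, "O": 2}
print(round(m1_ratio(dhea), 4))  # -> 0.2095
```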

Fig. 6

Isotope ratio deviations. a Deviation of calculated number of C-atoms from the theoretical number of C-atoms versus the theoretical number of C-atoms. Diamonds indicate reference compounds from sample A (13 file average). Squares originate from compounds in urine; first isotopic peak intensity >20,000. Triangles (slightly shifted to the right for clarity) indicate compounds from urine; first isotopic peak intensity <20,000. b Deviation of calculated number of C-atoms from the theoretical number of C-atoms versus the 10Log of the isotopic (first) peak intensity. Diamonds indicate reference compounds from sample A with mass below 550 (13 file average). Squares originate from compounds in urine with mass below 550. Triangles are data points from both samples originating from masses > 550

These findings will help to relate experimental results to theoretical results and also help determine error bars on isotope ratios. This in turn can then empirically help in limiting the number of possible elemental compositions of a mass. It is clear, however, that for low-intensity signals below a mass of ca. 550 the isotope ratio is not reliable.

Generally speaking, the average isotope ratios established experimentally for the DHEA metabolites are in accordance with the deviations found for the reference compounds. They therefore provide additional confirmation of the identities of the DHEA metabolites.

4 Concluding remarks

The strategy presented here makes use of a potentially unlimited number of internal and external masses, as well as an optional intensity correction, to improve mass accuracy in U-HPLC high-resolution Orbitrap MS data. Pre-processing using metAlign serves to eliminate baseline and noise contributions in mass-corrected data files. Alignment using metAlign serves to average data from different replicate files and to select ions present in all replicates, thus filtering out noise. The reduced (and averaged) data files obtained from metAlign can be searched in an ultra-fast way (for example, 211 ions in 13 data sets in 2 s), and the output can easily be used to visualize error distributions in various ways. Typically, mass accuracy can be increased by nearly one order of magnitude over an intensity range of at least four orders of magnitude, and the approach can also be used to evaluate deviations of experimental isotope ratios. Together, this increases the confidence with which elemental compositions can be matched to experimental data on DHEA metabolites such as those obtained from the urine sample in this study.