A Practical Approach for Determination of Mass Spectral Baselines

  • Kui Yang
  • Xiaoling Fang
  • Richard W. Gross
  • Xianlin Han
Research Article

Abstract

Precise determination of the baseline levels of mass spectra is critical for identification and quantification of analytes. Herein, we present a practical approach for determination of the baselines of mass spectra acquired under differential conditions. The baseline determined by this approach was the sum of baseline drift and noise level. The baseline drift was determined by averaging a number of lowest ion intensities. The noise level was determined based on the fact that an accelerated intensity change exists from noise to signal. This change was best revealed by the established accumulative layer thickness curve that was derived from the thicknesses of individual deducted layers. Deductions were performed sequentially layer by layer, each of which has a thickness of averaged lowest ion intensities from existing spectral data. The layer where the accelerated intensity change occurred was defined as a transition layer, which was determined from the polynomial regression in the sixth order of the accumulative layer thickness curve followed by resolving the roots of its fourth derivative. We validated the presence of this transition layer through determination of its convergence from various accumulative layer thickness curves generated by varying either the ending or the fineness of the sequential layer deductions. This simple, practical, program-based baseline determination approach should greatly increase the accuracy and consistency of identification and quantification by mass spectrometry, and facilitate the automation of data processing, thereby increasing the power of any high throughput methodology in general and of shotgun lipidomics in particular.

Key words

Baseline correction Mass spectrometry Shotgun lipidomics 

1 Introduction

Determination of a baseline is critical for identification of analytes by mass spectrometry through properly discriminating noises and signals. It is also essential for accurate quantification of the mass content of each analyte through correcting the baseline contribution to the individual peak intensities of a mass spectrum, particularly when a shotgun lipidomics approach is employed where quantification of analytes is performed through direct comparison of their ion intensities with that of selected internal standard(s) [1] and when the species is in low abundance (e.g., signal/noise < 10). Baseline correction is also important for precise display of mass spectra, for example, in tissue mapping [2].

Two major factors contribute to the values of a baseline of a mass spectrum, i.e., detector drift (i.e., baseline drift) and chemical noise (i.e., noise level). The baseline drift represents a shift of a mass spectrum from the origin (i.e., zero) and can be tuned to a linear, minimal increase with increase in m/z for a mass spectrometer with a quadrupole as an analyzer. The chemical noise, the actual noise level of a spectrum after deduction of the baseline drift, is composed of background signals resulting from residual chemicals. In particular, inorganic salts presented in the sample matrix are largely responsible for this noise.

Many different approaches, based on a variety of algorithms, have previously been developed and been applied for different purposes [2, 3, 4]. However, most of these approaches have been limited to their particular utilities (e.g., correction for matrix-induced baseline in matrix-assisted laser desorption/ionization mass spectrometry; correction for mobile phase components-contributed baseline; etc.). Herein, we present a simple, practical approach for a general determination of the baseline of a mass spectrum. This approach includes the corrections for both the baseline drift and the noise level of a mass spectrum. The principles of this practical approach are discussed and the procedures to determine the baseline are described in detail. We compared many of the determined baselines using either this program-based approach or a manual approach and found that the baselines determined by these two approaches were very consistent. We specifically pointed out that although the new approach was derived largely from the spectra acquired by the Xcalibur operation system, it could be readily modified for other operation systems as we practiced for the mass spectra acquired by ABI 4800 mass analyzer. We believe that the development of this simple, practical, program-based approach should greatly facilitate the automation and increase the power of high throughput methodology.

2 Experimental

2.1 Determination of the Baseline Drift of a Mass Spectrum

In theory, the baseline drift of a single spectral peak acquired in the profile mode can be corrected by deduction of the lowest intensity of its data points. The baseline drift of a mass spectrum in a certain mass range can thus be corrected by averaging the lowest intensities of individual peaks of the mass spectrum (Scheme 1a). In practice, we first determined the number of peaks (N) of the mass spectrum and then averaged the N lowest intensities from the spectral data. This averaged intensity was defined as the baseline drift of the spectrum and deducted from the spectral data (Scheme 1b). The detailed procedures are described below.
Scheme 1

Illustration of the baseline drift as well as intensity difference between neighboring data points. The solid symbols in a three-peak imitated mass spectrum (a) indicate the three least intensity points, which were averaged to determine the baseline drift. The arrows indicate the intensity differences between neighboring data points in either a low or high intensity peak (a). The broken line in a mass spectrum acquired from a lipid extract (b) indicates the determined levels of baseline drift

Each of the mass spectra processed in the current work were acquired in the profile mode by the Xcalibur operation system. By using the Xcalibur Qual Browser, each spectrum was averaged from the acquired individual scans and smoothed with a Gaussian function. A full list of the spectral data points within a user-specified mass range was generated in the Qual Browser with the display option specified for “All peaks.” In this data list, the m/z difference between every two neighboring data points was one-tenth of the Peak Width (FWHM). For example, when a unit resolution spectrum was acquired at a Peak Width setting of 0.7 Th [5, 6], the m/z difference between two neighboring data points was 0.07 Th. Each single peak in a unit resolution spectrum was then composed of 14–15 data points (i.e., 1/0.07 = 14.3). If a spectrum was acquired at a Peak Width setting of 0.35 Th, the m/z difference between two neighboring data points was 0.035 Th and the spectrum was a half unit resolution spectrum each peak of which occupied a half mass unit and was also composed of 14–15 data points (i.e., 0.5/0.035 = 14.3). Accordingly, we determined the number of peaks of a mass spectrum from the number of raw spectral data points by the following equation:
$$ \begin{array}{*{20}{c}} {{\text{Number}}\;{\text{of}}\;{\text{peaks}}\left( {\text{N}} \right){\text{of}}\;{\text{a}}\;{\text{spectrum}}} \\ { = {\text{The}}\;{\text{total}}\;{\text{number}}\;{\text{of}}\;{\text{raw}}\;{\text{data}}\;{\text{points}}\;{\text{of}}\;{\text{the}}\;{\text{spectrum}}/{14}.{3}} \\ { = {\text{The}}\;{\text{total}}\;{\text{number}}\;{\text{of}}\;{\text{raw}}\;{\text{data}}\;{\text{points}}\;{\text{of}}\;{\text{the}}\;{\text{spectrum}}*0.0{7}} \\ \end{array} $$
(1)

For example, a unit resolution spectrum of a full list of 1430 data points contained 100 peaks (i.e., N = 1430 * 0.07 = 100). If, within the same mass range, a half unit resolution spectrum was acquired, a full list of 2860 spectral data points would be generated because the mass difference (i.e., 0.035) between every two neighboring data points was half of that from unit resolution spectrum. The number of peaks of the spectrum would therefore be doubled (i.e., N = 2860 * 0.07 = 200).

Next, we sorted the data in an ascending order of the intensities, and then averaged the N lowest intensity data. This averaged intensity was defined as the baseline drift of the mass spectrum in the mass region. For example, the baseline drift of the unit mass spectrum exemplified above was obtained by averaging the 100 lowest intensities after sorting the intensity data while the baseline drift of the exemplified half unit mass spectrum was obtained by averaging the 200 lowest intensities. Finally, the baseline drift was subtracted from each of the raw spectral intensity data points. Those data points that had intensities of either zero or negative after the baseline drift deduction were discarded. The remaining data points were re-sorted in an m/z order and termed as Data Set 0, which represented the baseline drift-deducted data set. A mass spectrum reconstructed from the baseline drift-deducted data set (i.e., Data Set 0) had baseline drift deducted but still had background noise present (Figures S1 to S4).

2.2 Determination of the Noise Level of a Mass Spectrum

The baseline drift-deducted data points (i.e., Data Set 0) of a mass spectrum were next used for determination of the noise level of the spectrum. In theory, the same procedure as the baseline drift deduction described above could be repeated on Data Set 0, which in turn generates a new data set (i.e., Data Set 1). If this type of deduction procedure is repeated over and over, many data sets could be generated from their previous ones (e.g., Data Set M from Data Set M-1) where each data set has less data points compared to its previous one. The mass spectrum reconstructed from Data Set 1 could be imagined as a spectrum with a thin “layer” wiped off from the bottom of the baseline drift-deducted spectrum (reconstructed from Data Set 0), while the thickness of the “layer” is the average of N (calculated by Equation 1) lowest intensities from Data Set 0. The mass spectrum reconstructed by Data Set M could be imagined as a spectrum with a “layer” wiped off from the bottom of the spectrum reconstructed from Data Set M-1, while the thickness of the “layer” is the average of N lowest intensities from Data Set M-1.

The thickness of each “layer” may vary. A “layer” wiped off to generate a very early data set (e.g., Data Set 1) from its previous data set (i.e., Data Set 0) is usually thin, while the “layer” wiped off to generate a later data set from its previous data set could be much thicker. One of the reasons is that each peak contains the same number of data points (i.e., 14–15) equally distributed in m/z dimension (i.e., x-axis), thus the higher the intensity of the peak, the bigger the intensity difference between its two neighboring data points (Scheme 1a). Since the spectral peaks are Gaussian smoothed and of Gaussian distribution, this intensity difference increases with the data point moving towards the top of the peak in comparison to the difference between two points near the bottom of the peak (Scheme 1a). Another reason is that the same number (N, calculated by Equation 1) of lowest intensities is averaged from each data set to generate next data set. Accordingly, the very early layer by layer deductions would not lead to a significantly-varied thickness of each layer since the N lowest intensities in each data set are almost exclusively from the low intensity peaks (i.e., noise peaks) and the intensity differences between neighboring data points of the low intensity peaks are small. When low intense peaks are wiped off after a few times of deduction, the data points from high intensity peaks (or signals) are then picked up in the N lowest intensities whose average is the layer thickness for deduction from the current data set to generate next data set. Therefore, a significant increase in layer thickness would occur at this point. We defined this layer as the Transition Layer. The sum of the deducted intensities (or the layer thicknesses) of individual layers from the first layer to the transition layer was designated as the noise level of the spectrum, which was deducted from Data Set 0 to obtain the spectrum with both baseline drift and noise corrected.

In reality, when working on complex mass spectra, searching for the transition layer might not be as straightforward as described above. We performed the following steps to practically search for this transition layer from the baseline drift-deducted spectral data points (i.e., Data Set 0).
  1. 1.

    Calculation of the Thickness of a Layer

    To calculate the thickness of each layer, we repeated the steps for determining the baseline drift described above on every new data set except that the number (N) of lowest intensity data points was replaced with a new number (n), which was calculated by the following equation:
    $$ {\text{n}} = {\text{The}}\;{\text{total}}\;{\text{number}}\;{\text{of}}\;{\text{raw}}\;{\text{data}}\;{\text{points}}\;{\text{of}}\;{\text{a}}\;{\text{spectrum}}*{\text{Step}}\;{\text{Length}} $$
    (2)
    where the Step Length determined how finely each layer deduction would be processed and is defined by the user. This number (n) could be different from the number (N). If a step length of 0.07 was defined, the number (n) calculated by Equation 2 was identical to the number (N) by Equation 1. If a step length of <0.07 was defined, a smaller number of lowest intensity data points from the current data set were used for deduction to yield the next data set. The average of a smaller number (e.g., p) of lowest intensity data points (e.g., intensity 1, intensity 2, …, intensity p in an ascending order) was smaller than the average of a bigger number (e.g., q, q > p) of lowest intensity data points (e.g., intensity 1, intensity 2, …, intensity p, intensity p + 1, …, intensity q in an ascending order). Accordingly, this represented a finer sequential layer deduction because more layers of deduction would necessarily be performed to wipe off the entire spectral data points if a smaller averaged intensity was used for deduction of each layer. In contrast, the defined step length of >0.07 led to a rougher layer deduction represented by a bigger averaged intensity for deduction of each layer and less layers of deduction for the entire spectrum. The significance of varying the number (n) for deduction of each layer was discussed in detail in the Discussion section.
    In practice, we first sorted the data points from the baseline drift deducted spectral data (Data Set 0) in the order of ascending intensity and averaged the n lowest intensity data points (n was calculated by Equation 2 with a user-defined Step Length). Then we deducted this averaged intensity from the intensities of individual data points in Data Set 0 and discarded the data points whose intensities were zero or negative after deduction. The remaining data points yielded Data Set 1. The mass spectrum reconstructed from the newly generated Data Set 1 could be viewed as the baseline drift-deducted spectrum (reconstructed from Data Set 0) was wiped a layer off from the bottom of the spectrum whose thickness was the average of the n lowest intensities from Data Set 0. This averaged intensity was designated as the Thickness of Layer 1 (TL1). This procedure was repeated to calculate the thicknesses of sequential layers. In general, the calculation of the Thickness of Layer i (TLi) could be represented by the following equation:
    $$ {\text{T}}{{\text{L}}_{\text{i}}} = {\text{ Average }}\left( {{\text{Intensity 1}},{\text{ Intensity 2}},{ } \ldots, {\text{ Intensity n of Data Set i}} - {1}} \right) $$
    (3)
    where TLi is the Thickness of Layer i, and Intensity 1, Intensity 2, …, Intensity n are the n lowest intensities of Data Set i-1. The deduction of TLi from the data points of Data Set i-1 yielded Data Set i. For example, Data Set 2 was generated from Data Set 1 by wiping from Data Set 1 one layer off with a thickness of TL2, which was calculated from averaging the n lowest intensities of Data Set 1. The procedure was repeated until the number of the remaining spectral data points was smaller than n. A series of the thicknesses of layers: TL1, TL2, …, TLm were generated accordingly where TLm was the thickness of the last Layer m, whose deduction generated the last data set m from Data Set m-1 while one-step further deduction on Data Set m would have generated a data set that contained less than n remaining spectral data points.
     
  2. 2.

    Generation of the Accumulative Thickness of a Layer

    In reality, it was difficult to directly determine the transition layer from the generated series of the layer thicknesses (i.e., TL1, TL2, …, TLm for layer 1, layer 2, …, and last layer m, respectively). One of the reasons was that the curve of layer thickness (TL) versus layer mostly had irregular trends for which no regression method worked consistently for different mass spectra with satisfactory correlation coefficients. To resolve this issue, we derived the Accumulative Thickness of a Layer (ATL) from the individual Thickness of the Layer (TL) by the following equation:
    $$ {\text{AT}}{{\text{L}}_{\text{i}}} = \sum {\text{T}}{{\text{L}}_{\text{i}}} $$
    (4)
    where TLi was the Thickness of Layer i (1 ≤ i ≤ m) and was calculated by Equation 3, and ATLi was the Accumulative Thickness of Layer i. The curve of the Accumulative Thickness of Layer (ATL) versus layer was termed as the accumulative layer thickness curve, which was used for determining the transition layer in the following steps.
     
  3. 3.

    Automated Determination of the Transition Layer

    Next, we fitted the accumulative layer thickness curve by regression to determine the transition layer. The first point (1, ATL1) of the curve was not included for the curve fitting because the thickness of the very first layer (i.e, TL1 or ATL1) was lack of stability and the first layer impossible to be the transition layer. We determined the transition layer from the accumulative layer thickness curve by the following procedures:
    1. (1)
      Fitted these accumulative layer thickness data points by polynomial regression in the order of 6 by using MATLAB function polyval as follows:
      $$ {\text{y }} = {\text{ a }} + {\text{ bx }} + {\text{ c}}{{\text{x}}^{{2}}} + {\text{ d}}{{\text{x}}^{{3}}} + {\text{ e}}{{\text{x}}^{{4}}} + {\text{ f}}{{\text{x}}^{{5}}} + {\text{ g}}{{\text{x}}^{{6}}} $$
      (5)
      where “x” was the layer number (1 ≤ x ≤ m-1); “y” was the accumulative layer thickness (ATL) of the layer x, which is calculated from Equation 4; a, b, …, and g were the regression coefficients. To specify, the first data point (x1, y1) for the regression is the second point (2, ATL2) of the original accumulative layer thickness curve due to elimination of the first layer for regression, and the last data point (xm-1, ym-1) is the last point (m, ATLm) of the original curve.
       
    2. (2)
      Calculated the derivatives of the obtained polynomial regression up to the fourth derivative. The fourth derivative was as follows:
      $$ {\text{y}}''''{ } = {\text{ 24e }} + { 12}0{\text{fx }} + { 36}0{\text{g}}{{\text{x}}^{{2}}} $$
      (6)
      where e, f, and g were regression coefficients from Equation 5.
       
    3. (3)
      Found zeros of y”” by solving the following single-variable quadratic equation by using MATLAB function polyder:
      $$ {\text{y}}'''' = 0{\text{ or 15g}}{{\text{x}}^{{2}}} + {\text{ 5fx }} + {\text{ e}} = 0 $$
      (7)
      where e, f, and g were regression coefficients from Equation 5. There were two possibilities for the roots of Equation 7: the two roots were real numbers xI and xII when 5f2 – 12eg ≥ 0 or the two roots were complex numbers when 5f2 – 12eg < 0. When two real-number roots were obtained and xI ≠ xII, the bigger number was taken as the transition layer and the smaller number was discarded. When two complex-number roots were obtained, the algorithm redid steps (1), (2), and (3) with the narrowed accumulative layer thickness curves by eliminating the last data point until real number roots were found from Equation 7.
       
     
  4. 4.

    Determination of the Accumulative Layer Thickness Corresponding to the Determined Transition Layer

    After the transition layer (e.g., the xI from step 3) was determined, we calculated the accumulative layer thickness yI corresponding to xI. If xI was an integer number, the yI from the data point (xI, yI) of the accumulative layer thickness curve was then taken as the accumulative layer thickness corresponding to the transition layer xI. If xI was not an integer number, the “y”s from the two adjacent data points that were neighbors to the determined transition layer xI (i.e., data points (xs, ys) and (xs+1, ys+1) where xs < xI < xs+1) were used to calculate the yI corresponding to xI by the following equation:
    $$ {y_I} = {y_s}*\left( {{x_{{s + 1}}} - {x_I}} \right) + {y_{{s + 1}}}*\left( {{x_I} - {x_s}} \right) $$
    (8)
    where xI is the determined transition layer; yI is the accumulative layer thickness corresponding to xI; xs and xs+1 are the two adjacent layers from the accumulative layer thickness curve that meet xs < xI < xs+1; ys and ys+1 are accumulative layer thicknesses corresponding to xs and xs+1, respectively.
     
  5. 5.

    Self-Check and Determination of the Spectral Noise Level

    In theory, the determined accumulative layer thickness yI corresponding to the determined transition layer xI could be considered as the noise level of the spectrum. In practice, to self check the stability of the determined transition layer and eliminate any potential uncertainty from single time polynomial regression on selected data points, we determined more transition layers from narrower regions of the accumulative layer thickness curve. Specifically, we first repeated the procedures (1), (2), and (3) of Step 3 to individually determine the transition layers (e.g., xI1, xI2, …) from a series of narrowed regions of the accumulative layer thickness curve (i.e., x = 1 to m-2, x = 1 to m-3, …, and x = 1 to 7) which were yielded by eliminating the last data point of the curve sequentially until minimal number (i.e., seven) of data required for determination of the regression coefficients by Equation 5 reached. Then, the accumulative layer thicknesses yI1, yI2, …, corresponding to the newly determined transition layers xI1, xI2, …, respectively, were determined by repeating Step 4. We discarded the values from the determined yI1, yI2, … that were either larger than yI or less than yI * 70% and averaged the rest of the values. This averaged accumulative layer thickness was defined as the noise level of a mass spectrum. The overall baseline of a mass spectrum was corrected by the sum of the determined baseline drift and the determined noise level of the spectrum. Meanwhile, the signal-to-noise ratio of an ion peak can be calculated as:
    $$ \begin{array}{*{20}{c}} {{\text{S}}/{\text{N}} = {\text{the}}\;{\text{signal}}\;{\text{of}}\;{\text{an}}\;{\text{ion}}\;{\text{peak}}/{\text{the}}\;{\text{noise}}\;{\text{level}}\;{\text{of}}\;{\text{the}}\;{\text{spectrum}}} \\ { = {\text{baseline}} - {\text{corrected}}\;{\text{peak}}\;{\text{intensity}}/{\text{the}}\;{\text{noise}}\;{\text{level}}\;{\text{of}}\;{\text{the}}\;{\text{spectrum}}} \\ { = \left( {{\text{ion}}\;{\text{peak}}\;{\text{intensity}}--{\text{the}}\;{\text{overall}}\;{\text{baseline}}} \right)/{\text{the}}\;{\text{noise}}\;{\text{level}}\;{\text{of}}\;{\text{the}}\;{\text{spectrum}}} \\ \end{array} $$
    (9)
     

3 Results and Discussion

3.1 Correction for Baseline Drift of a Mass Spectrum

The baseline drift of a mass spectrum is a constant shift of the peak intensities from their original values to the apparently determined values and consistently occurs for the entire spectrum. This drift is different from the noise background of the spectrum. The baseline drift of a mass spectrum could be either mass independent or mass dependent. However, when the mass range of interest is narrow, the baseline drift usually becomes minimally mass dependent within the mass range. In the current study, we focused on the correction of the mass independent baseline drift of a mass spectrum. When the mass range of interest is wide and the mass dependence of baseline drift has to be addressed, one can segment the entire widely ranged mass spectrum into a few narrowly ranged spectra [7] and then employ the current approach to correct the baseline drift individually for each of the segmented mass spectra.

The baseline drift of a Gaussian-smoothed single spectral peak acquired in the profile mode can be represented by its lowest intensity which is usually the intensity from either the first data point or the last data point of the peak. The baseline drift of a mass spectrum acquired in the profile mode containing N Gaussian-smoothed single peaks, in theory, can be represented by the line connecting each lowest intensity data point of each of the N single peaks in the spectrum in the order of m/z. If the baseline drift is considered mass-independent within the mass range, the baseline drift can be simplified by averaging the N individual lowest intensities, each of which is the lowest intensity for one of the N peaks of the mass spectrum. The baseline drift was determined in our approach by averaging the N lowest intensities of the entire spectrum while the N lowest intensities were selected through sorting the entire spectral data points in an ascending order. Therefore, among the N lowest intensities, instead of one from each peak, more than one might come from the data points of one peak, or none from another peak. However, the advantage of our approach for baseline drift determination is its simplicity with practically sufficient accuracy. Its simplicity results from the one-time sorting in an ascending order of the peak intensities that fishes out all the N lowest intensities simultaneously from the full data list. Its sufficient accuracy is due to that the average of a large number (N, generally in hundreds or more) of lowest intensities can very likely represent the trend of the baseline shift of the entire spectrum (Scheme 1b). Additional examples of the determined baseline drifts are demonstrated in the Supplementary Materials (Figures S1-S4).

3.2 Generation of the Accumulative Layer Thickness Curve from the Baseline Drift Corrected Mass Spectrum

Correction for noise levels was next conducted on the baseline drift-corrected mass spectrum. An identical procedure to the baseline drift deduction but with a number of n lowest intensities for deduction (as described in the Section 2) was repeated on the baseline drift-corrected data points (i.e., Data Set 0), which yielded Data Set 1. The spectrum reconstructed from Data Set 1 could be visualized as a spectrum with one layer (Layer 1) wiped off from the bottom of the baseline drift-corrected spectrum. The thickness of Layer 1, TL1, was the average of the n lowest intensities from Data Set 0. In general, Data Set i (1 ≤ i ≤ m, m is the last data set containing ≥ n spectral data points) were generated by deduction of a layer (Layer i; having a thickness of TLi by Equation 3) from Data Set i-1. When these layers (Layer 1, Layer 2, …, Layer m) were laid along the y-axis of the spectrum, they were unequally distributed by varied thicknesses of individual layers (TL1, TL2, …, TLm, respectively) (Figure 1). Only small changes existed in the thickness of the first few deducted layers due to the dominant contribution of the low intense peaks to these layer deductions (see the Section 2) (Figures 1b and 2a). When the low intensity peaks were deducted completely from the data set, the following layer deduction then picked the data points from the remaining high intensity peaks present in the data set. This would subsequently result in a significant increase in the thickness of the layer which we defined as the transition layer (Figure 2a).
Figure 1

Illustration of the layer thicknesses of a mass spectrum. (a) Displays a mass spectrum of mouse myocardial lipid extract acquired by negative-ion ESI-MS as described previously [6] and in Supplementary Materials. (b) Shows the amplified section of the mass spectrum displayed in (a) and illustrates the physical meaning of the baseline drift and the thickness of layer (TL). TL1, TL2, TL3, TL4, TL5, TL23, and TL24 representatively indicate the thickness of layer corresponding to layers 1, 2, 3, 4, 5, 23, and 24 (the last layer), respectively

Figure 2

Representative layer thickness curve and accumulative layer thickness curve. The layer thickness curve (a) and accumulative layer thickness curve (b) were derived from the mass spectrum in Figure 1(a). The closed symbols indicate the region of the determined transition layer (12.1398, see Table 1)

Before the transition, the thickness of each layer did not change significantly from layer to layer (Figure 1b) because the n lowest intensity data points in the current data set and in its previous data set were both from low intensity peaks that had small intensity differences between the neighboring data points (Scheme 1a). After the transition, the thickness of each layer increased significantly from layer to layer (Figure 1a) because the n lowest intensity data points in the current data set and in its previous data set were both from high intensity peaks that had big intensity changes (Scheme 1a). During the transition, the n lowest intensity data points in the current data set were, at least partially, from high intensity peaks while the n lowest intensity data points in its previous data set were exclusively from low intensity peaks. Therefore, the average of the n lowest intensities from the current data set increased significantly compared with that from the previous data set at the occurrence of the transition, and the rate of the increase at the transition should be differentiable from that before the transition (where the increase if any would be slight) and after the transition (where the increase would be dramatic) (Figure 2).

It was noted that the trends of these determined layer thicknesses were varied irregularly (inset in Figure 2a). This irregulation made the curve regression difficult while a precise curve regression is essential to automatically locate the transition layer mathematically by an algorithm. We found that employing an accumulative layer thickness curve yielded from the TLi data [i.e., a curve of ATLi versus layer i (Figure 2b), where ATLi was calculated by Equation 4] successfully bypassed this difficulty.

3.3 Determination of the Transition Layer from the Accumulative Layer Thickness Curve

The accumulative layer thickness curve with the first layer data point eliminated was used for the curve fitting as described in the Experimental section (Figure 2b, Table 1). The curve was fitted by polynomial regression in the order of 6, by which all the examined mass spectra thus far were well fitted. Next, the derivatives of the regression equation were performed, including the first derivative to the fourth derivative. Among these derivatives, the first derivative represents the rate of change of the accumulative layer thickness with layer (i.e., y’ = d(ATLx)/dx); the second derivative represents the rate of change of the accumulative layer thickness change rate or the acceleration of the accumulative layer thickness with layer (i.e., y” = d(y’)/dx); and the third derivative represents the rate of change of the acceleration of the accumulative layer thickness with layer (i.e., y”’ = d(y”)/dx). The roots of Equation 6 (i.e., the fourth derivative = 0) represent the layers that correspond to the extrema of the rate of change of the acceleration of the accumulative layer thickness (i.e., the extrema of the curve of y”’ versus x). Since the fourth derivative of a sixth order polynomial is in the order of 2, Equation 6 has two roots. We observed that the lower value of the two roots generally located within the first few layers, which might indicate a type of transition of which we did not yet know the meaning while the higher value of the two roots best represents the transition layer.
Table 1

The Effects of Different Numbers of Layers used for Determination of a Transition Layer to Derive the Noise Level of a Mass Spectruma

Layer i (x)

ATLi or yb

The coefficients of the six order polynomial regression

Determined transition layer

Determined noise levelb

x6

x5

x4

x3

x2

x1

x0

1c

96.0257

         

2 (1)

178.1054

         

3 (2)

241.4414

         

4 (3)

303.5546

         

5 (4)

362.1216

         

6 (5)

417.7978

         

7 (6)

475.7886

         

8 (7)

543.0676

−0.0060

0.1401

−1.1303

3.9280

−6.7077

68.9562

112.9251

6.4240

442.3857

9 (8)

611.4270

−0.0129

0.3021

−2.6454

10.9921

−23.7935

88.9226

104.3376

6.1826

428.3870

10 (9)

673.6276

0.0015

−0.0781

1.2995

−9.2411

29.4259

22.3163

134.4206

14.3558

1126.6866

11 (10)

743.2630

0.0091

−0.3010

3.8417

−23.4735

69.9014

−31.7203

159.9766

8.0684

547.7460

12 (11)

836.9986

0.0064

−0.2158

2.7817

−17.0385

50.2164

−3.7860

146.1761

8.2008

556.7936

13 (12)

942.5386

0.0003

−0.0028

−0.0868

1.7348

−11.2756

88.6708

98.5609

7.1979

489.1044

14 (13)

1071.3711

−0.0009

0.0433

−0.7543

6.4208

−27.6442

114.6717

84.6279

11.3401

775.1448

15 (14)

1226.8370

−0.0009

0.0426

−0.7440

6.3441

−27.3593

114.1949

84.8934

11.3739

778.3111

16 (15)

1384.3546

−0.0013

0.0574

−0.9883

8.2921

−35.0162

127.6661

77.1149

11.0511

748.0493

17 (16)

1561.4457

−0.0007

0.0332

−0.5633

4.6994

−20.1054

100.1519

93.5721

11.2062

762.5905

18 (17)

1795.2227

0.0001

−0.0087

0.2179

−2.2787

10.3935

41.2455

130.0248

12.5442

894.4364

19 (18)

2097.3211

0.0005

−0.0285

0.6083

−5.9529

27.2633

7.2054

151.7936

12.4177

881.0784

20 (19)

2483.2300

0.0005

−0.0289

0.6166

−6.0358

27.6624

6.3657

152.3480

12.4211

881.4445

21 (20)

2983.0357

0.0004

−0.0229

0.4860

−4.6806

20.8400

21.3147

142.1702

12.2109

859.2555

22 (21)

3729.9181

0.0006

−0.0319

0.6910

−6.9057

32.5348

−5.3273

160.8592

12.6271

903.1856

23 (22)

4801.2543

0.0007

−0.0389

0.8581

−8.7999

42.9106

−29.8675

178.5809

12.9036

932.3677

24 (23)

6467.8738

0.0009

−0.0558

1.2786

−13.7695

71.2385

−99.3373

230.1834

13.4421

999.4903

        

Averaged

12.1398

855.9413

aThe mass spectrum of mouse myocardial lipid extract acquired by negative-ion ESI-MS as described previously [6] was shown in Figure 1a and described in the Supplementary Materials. The accumulative layer thickness corresponding to each individual layer of the spectrum was calculated as described in the Experimental section and tabulated in the second column of the table. Different numbers of layers [from layer 2 to layer 8, layer 2 to layer 9, …, layer 2 to layer 24 (layer m, the last layer)] were used for determination of the transition layer through fitting their accumulative layer thicknesses with sixth order polynomial regression (see columns 3 to 9 of the table) followed by calculating the fourth derivative of the polynomial and solving the zero of the derivative as described in the Experimental section. Minimally varied transition layers (column 10) and corresponding noise levels (column 11) were determined from regression fitting data from layers 2 to 14, layers 2 to 15, …, layers 2 to 24 as highlighted in bold in the table.

bThe unit is in ion counts.

cThis first layer was not used for fitting (see Section 2).

dAverage of the bold values.

It should be pointed out that although the sixth order polynomial regression followed by finding the zeros of the fourth derivative of this polynomial regression to determine the transition layer worked best thus far among the regressions we tested, we would leave the possibility open that the regression with any other formula to fit the curve might achieve a similar result or even more precise determination of the transition layer.

3.4 Determination of the Noise Level of a Mass Spectrum

Since polynomial regression on experimental data points is not a mechanism based modeling of data, it is necessary to assure that any potential uncertainty from single time polynomial regression is eliminated and the transition layer determined from the regression is stable. Ideally, if a regression equation that displays a precise fitting of the experimental data points could reflect a true understanding of the relationship underlying the data, this regression equation should be independent of the number of data points employed for the regression. In practice, a similar regression equation would be obtained when the same regression were performed on selected, less data points from the curve. Accordingly, we determined the transition layers from the narrowed accumulative layer thickness curves covering different numbers of data points from the curve, i.e., x = 1 to m-2, x = 1 to m-3, …, and x = 1 to 7 (which is the minimal number of data points required for a sixth order polynomial regression). An example was tabulated (Table 1 where m = 24).

Intriguingly, we found that the determined transition layers and noise levels were minimally affected by reducing the number of the data points from the curve used for the regression if the transition layer was within the employed data points (the highlighted data in Table 1). These results indicated that the determination of the transition layer was independent of how many data points from the accumulative layer thickness curve were used for the regression analysis. It is important to implement these procedures to self check the stability and improve the liability of the determined transition layer. In addition, this stability validated the precise fitting of the accumulative layer thickness curve by the obtained polynomial regression equation and consequently the accurate determination of the transition layer from the regression. The accumulative layer thickness corresponding to each of the determined transition layers was calculated by Equation 8, and those that were consistent (within 30% deviation as described in the Experimental section) were averaged and designated as the noise level of the mass spectrum. A few examples of mass spectra demonstrating the baseline drifts and noise levels were provided in the Supplementary Materials (Figures S1–S4).

3.5 The Effects of the Step Length Setting on Determination of the Transition Layer

To further examine the stability of a determined transition layer, we generated a variety of accumulative layer thickness curves using different step lengths varied from 0.01 to 0.4, where the step length setting at 0.01 represents the finest process while the step length setting at 0.4 represents the roughest process for generating the curve (Figure 3) as described in the Experimental section. It is apparent that the bigger the step length is, the bigger the jump could be from one layer to the next. It is not surprising that a big jump may either overpass the transition layer or diminish the transition on the curve or both, which in turn may result in inaccurate determination of the transition layer. We found that if a step length (L) employed was in the range of ≥0.15, a very short transition was present or there was no transition at all; if the step length fell into the range of 0.04 < L < 0.15, a short, but steady transition was present; and if the step length was ≤0.04, a long transition was present (Figure 3). It should be pointed out that a very long transition might not be beneficial since the transition might be smoothed out. Accordingly, in practice, we employed a step length in the range of 0.04 < L < 0.15. Specifically, we employed L = 0.07 as a default setting in the program and routinely performed our data analysis with this step length.
Figure 3

The effects of the step length (L) on the generation of accumulative layer thickness curve of a mass spectrum. Each accumulative layer thickness curve corresponding to an individual step length (0.01 ≤ L ≤ 0.4) was constructed on the mass spectrum in Figure 1(a) as described under the Experimental section except the layer deduction procedure was repeated until no more spectral data points remained after deduction to generate an extended accumulative layer thickness curve for a clear demonstration purpose. The results indicate that if a large step length was employed (e.g., L ≥ 0.15), a very short transition was present or there was no transition at all; if the modest step length was used (e.g., 0.04 < L < 0.15), a short, but stable transition was present; and if the small step length was applied (e.g., L ≤ 0.04), a long transition was present. For clarification, only a few indicated accumulative layer thickness curves are displayed for demonstration

We observed that the transition layer and corresponding noise level determined from each of the accumulative layer thickness curves generated by using step lengths in the range of 0.04 < L < 0.15 were minimally varied (Table 2). This also validated the accurate determination of the transition layer from the regression of the accumulative layer thickness curve whose generation was independent of the step length within a certain range (i.e., 0.04 < L < 0.15) (i.e., independent of how finely each layer deduction was processed).
Table 2

Examples of the Constancy of the Determined Noise Levels of Mass Spectra for Lipid Analysis as Varied with the Step Length of the Accumulative Layer Thickness Curves

Step length

Determined baseline level (example I) (ion counts)

Determined baseline level (example II) (ion counts)

Determined baseline level (example III) (ion counts)

0.05

3527.7

1717.4

411164.1

0.06

3435.6

1622.2

404535.6

0.07

3350.6

1562.8

402887.7

0.08

3477.1

1555.6

402625.4

0.09

3358.2

1497.6

404002.9

0.10

3245.1

1542.4

401291.8

0.11

3234.3

1506.6

398201.2

0.12

3294.3

1505.9

398957.1

Mean ± SEM (relative error)

3365.4 ± 38.0 (1.1%)

1563.8 ± 26.2 (1.7%)

402958.2 ± 1417.0 (0.4%)

3.6 Examples of Improved Reproducibility of Quantification under Varied Analyte Concentrations with Baseline Correction

Accurate quantification requires sufficient reproducibility under various alterations on experimental conditions (e.g., varied analyte concentrations due to differential analyte recovery, and varied chemical residues due to differential carry-over during sample preparation). Those alterations on experimental conditions, which are inevitable but may not be noticeable in most cases, can affect spectral baseline. To determine the effect of baseline correction on accurate quantification under varied analyte concentrations, we performed the baseline correction for a series of mass spectra acquired from sequentially diluted lipid solutions of a mouse myocardial lipid extract (Table 3 and Figure S5). The results indicate that the absolute noise levels are reduced at higher dilution rates likely due to the decrease in concentration of residual chemicals while the relative noise levels increase with dilution because of the decreased S/N from the increased proportion of baseline in peak intensity (Table 3). Importantly, it was demonstrated that the variation of the peak intensity ratio of a lipid species versus the selected internal standard (i.e., ratiometric comparison) is substantially reduced with baseline correction (Table 3). This reduced variation represents improved reproducibility and therefore more accurate quantification. For example, the peak at m/z 758.6 (as indicated with arrows in Figure S5) represents a modestly low intense ion in the spectra. The variation of the ratio of its peak intensity relative to the internal standard at m/z 674.6 is reduced from 12.2% without baseline correction to 5.6% with baseline correction (Table 3). To those ions in much lower abundance, it is anticipated that baseline correction can improve the reproducibility of quantification even more dramatically due to the low S/N ratios of the low abundance peaks. These results further validate the powerful utility of our newly-developed baseline correction approach and clearly demonstrate the importance of baseline correction for accurate quantification.
Table 3

An Example that Baseline Correction Improves the Accuracy of Quantificationa

Dilution factorb

Driftc (relatived)

Noisec (relatived)

Baseline levelc,e

Base peakc

Ion peak at m/z 674 (I674)c

Ion peak at m/z 758 (I758)c

I758/I674 (no baseline correction)

I758/I674 (baseline corrected)

1

3.29 (0.60)

4.4 (0.81)

7.72

550

498.3

100.1

0.201

0.188

2

2.80 (0.91)

3.1 (1.00)

5.89

308

294.1

56.8

0.193

0.177

4

2.13 (1.09)

2.4 (1.24)

4.55

195

178.0

36.1

0.203

0.182

8

1.18 (1.38)

1.6 (1.94)

2.82

85

73.2

16.8

0.229

0.198

16

1.84 (2.87)

2.2 (3.44)

4.04

64

58.4

15.0

0.257

0.202

      

Mean ± SD

0.217 ± 0.026

0.189 ± 0.011

      

CV (%)

12.15

5.56

aThe reduced variation of ion peak intensity ratios of I758/I674 after baseline correction indicates the importance of baseline correction for accurate quantification. The ion peak at m/z 674 represents an internal standard and the ion peak at m/z 758 as indicated in Figure S5 represents a modestly low intense ion.

bA stock lipid extract of mouse myocardium was prepared by re-suspending dry lipid residue in 100 μL of solvent per mg of tissue protein followed by 100-fold dilution and was sequentially further diluted prior to direct infusion.

cNumbers are shown in absolute ion counts (×10–4).

dThe relative drift and noise (×102) are calculated by normalizing the absolute counts of drift and noise to the absolute counts of base peak and are given in parenthesis. The baseline level is the sum of drift and noise.

Notes

Acknowledgments

This work was supported by National Institute on Aging/National Institute of Diabetes and Digestive and Kidney Diseases grant R01 AG31675 (X.H.) and National Institutes of Health grant P01 HL57278 (R.W.G.). R.W.G. and H. X. have financial relationships with LipoSpectrum LLC. R.W.G. also has a financial relationship with Platomics, Inc.

Supplementary material

13361_2011_229_MOESM1_ESM.doc (3.4 mb)
Figure S1Illustration of mass spectrum and its deducted baseline drift and noise. Panel A displays a survey scan mass spectrum of a mouse myocardial lipid extract acquired by negative-ion ESI-MS with unit resolution in the presence of a small amount of LiOH as described previously [6]. Panel B shows the amplified section of the mass spectrum displayed in Panel A illustrating the deducted baseline drift and noise levels. (DOC 3492 kb)
13361_2011_229_MOESM2_ESM.doc (147 kb)
Figure S2Illustration of mass spectrum and its deducted baseline drift and noise. Panel A displays a survey scan mass spectrum of a mouse myocardial lipid extract acquired by negative-ion ESI-MS with a resolution setting of 0.3 Th as described previously [6]. Panel B shows the amplified section of the mass spectrum displayed in Panel A illustrating the deducted baseline drift and noise levels (DOC 147 kb)
13361_2011_229_MOESM3_ESM.doc (144 kb)
Figure S3Illustration of mass spectrum and its deducted baseline drift and noise. Panel A displays a survey scan mass spectrum of a mouse myocardial lipid extract acquired by positive-ion ESI-MS with unit resolution in the presence of a small amount of LiOH as described previously [6]. Panel B shows the amplified section of the mass spectrum displayed in Panel A illustrating the deducted baseline drift and noise levels. (DOC 144 kb)
13361_2011_229_MOESM4_ESM.doc (138 kb)
Figure S4Illustration of mass spectrum and its deducted baseline drift and noise. Panel A displays a neutral loss scan mass spectrum of a mouse myocardial lipid extract acquired by positive-ion ESI-MS/MS with unit resolution in the presence of a small amount of LiOH as described previously [6]. Panel B shows the amplified section of the mass spectrum displayed in Panel A illustrating the deducted baseline drift and noise levels. (DOC 138 kb)
13361_2011_229_MOESM5_ESM.doc (130 kb)
Figure S5Representative mass spectra of a mouse myocardial lipid extract at different dilution. Survey scan mass spectra of a mouse myocardial lipid extract at different dilution were acquired without addition of LiOH by positive-ion ESI-MS with unit resolution as described previously [6]. The spectra showed dominantly protonated choline phospholipid species. The spectral baseline drift and noise levels were derived as described in the text and indicated in each panel. Panel A, a mass spectrum of a properly-prepared lipid solution (i.e., by re-suspending dry lipid residue in 100 μL of solvent per mg of tissue protein followed by 100 fold dilution); Panel B, a mass spectrum of a lipid solution further diluted twice from the one used in panel A; Panel C, a mass spectrum of a lipid solution further diluted 4 times from the one used in panel A; Panel D, a mass spectrum of a lipid solution further diluted 8 times from the one used in panel A. Panel E, a mass spectrum of a lipid solution further diluted 16 times from the one used in panel A. The results indicate that the absolute noise levels are generally reduced while relative noise levels (%, relative to base peak) increased with a diluted solution. (DOC 130 kb)

References

  1. 1.
    Han, X., Gross, R.W.: Shotgun lipidomics: electrospray ionization mass spectrometric analysis and quantitation of the cellular lipidomes directly from crude extracts of biological samples. Mass Spectrom. Rev. 24, 367–412 (2005)CrossRefGoogle Scholar
  2. 2.
    Norris, J.L., Cornett, D.S., Mobley, J.A., Andersson, M., Seeley, E.H., Chaurand, P., Caprioli, R.M.: Processing MALDI mass spectra to improve mass spectral direct tissue analysis. Int. J. Mass Spectrom. 260, 212–221 (2007)CrossRefGoogle Scholar
  3. 3.
    Satten, G.A., Datta, S., Moura, H., Woolfitt, A.R., Carvalho Mda, G., Carlone, G.M., De, B.K., Pavlopoulos, A., Barr, J.R.: Standardization and denoizing algorithms for mass spectra to classify whole-organism bacterial specimens. Bioinformatics 20, 3128–3136 (2004)CrossRefGoogle Scholar
  4. 4.
    Ivanova, P.T., Milne, S.B., Byrne, M.O., Xiang, Y., Brown, H.A.: Glycerophospholipid Identification and quantitation by electrospray ionization mass spectrometry. Methods Enzymol. 432, 21–57 (2007)CrossRefGoogle Scholar
  5. 5.
    Han, X., Yang, J., Cheng, H., Ye, H., Gross, R.W.: Towards fingerprinting cellular lipidomes directly from biological samples by two-dimensional electrospray ionization mass spectrometry. Anal. Biochem. 330, 317–331 (2004)CrossRefGoogle Scholar
  6. 6.
    Han, X., Yang, K., Gross, R.W.: Microfluidics-Based electrospray ionization enhances intrasource separation of lipid classes and extends identification of individual molecular species through multi-dimensional mass spectrometry: development of an automated high throughput platform for shotgun lipidomics. Rapid Commun. Mass Spectrom. 22, 2115–2124 (2008)CrossRefGoogle Scholar
  7. 7.
    Williams, B.; Cornett, S.; Dawant, B.; Crecelium, A.; Bodenheimer, B.; Caprioli, R. M. An Algorithm for Baseline Correction of Maldi Mass Spectra. Proceedings of the 43 rd Annual Southeast Regional Conference; Kennesaw, GA, 2005 Guimaracs, M. Ed. Association for Computing Machinery, New York, NY, 2005; pp. 137–142.Google Scholar

Copyright information

© American Society for Mass Spectrometry 2011

Authors and Affiliations

  • Kui Yang
    • 1
  • Xiaoling Fang
    • 2
  • Richard W. Gross
    • 1
  • Xianlin Han
    • 2
  1. 1.Division of Bioorganic Chemistry and Molecular Pharmacology, Department of Internal MedicineWashington University School of MedicineSt. LouisUSA
  2. 2.Sanford-Burnham Medical Research InstituteOrlandoUSA

Personalised recommendations