# A Practical Approach for Determination of Mass Spectral Baselines


## Abstract

Precise determination of the baseline levels of mass spectra is critical for identification and quantification of analytes. Herein, we present a practical approach for determination of the baselines of mass spectra acquired under differential conditions. The baseline determined by this approach was the sum of baseline drift and noise level. The baseline drift was determined by averaging a number of lowest ion intensities. The noise level was determined based on the fact that an accelerated intensity change exists from noise to signal. This change was best revealed by the established accumulative layer thickness curve that was derived from the thicknesses of individual deducted layers. Deductions were performed sequentially layer by layer, each of which has a thickness of averaged lowest ion intensities from existing spectral data. The layer where the accelerated intensity change occurred was defined as a transition layer, which was determined from the polynomial regression in the sixth order of the accumulative layer thickness curve followed by resolving the roots of its fourth derivative. We validated the presence of this transition layer through determination of its convergence from various accumulative layer thickness curves generated by varying either the ending or the fineness of the sequential layer deductions. This simple, practical, program-based baseline determination approach should greatly increase the accuracy and consistency of identification and quantification by mass spectrometry, and facilitate the automation of data processing, thereby increasing the power of any high throughput methodology in general and of shotgun lipidomics in particular.

### Key words

Baseline correction, Mass spectrometry, Shotgun lipidomics

## 1 Introduction

Determination of a baseline is critical for identification of analytes by mass spectrometry because it allows noise to be properly discriminated from signal. It is also essential for accurate quantification of the mass content of each analyte, since the baseline contribution to the individual peak intensities of a mass spectrum must be corrected. This is particularly true when a shotgun lipidomics approach is employed, where quantification of analytes is performed through direct comparison of their ion intensities with that of selected internal standard(s) [1], and when the species is in low abundance (e.g., signal/noise < 10). Baseline correction is also important for precise display of mass spectra, for example, in tissue mapping [2].

Two major factors contribute to the baseline of a mass spectrum: detector drift (i.e., baseline drift) and chemical noise (i.e., noise level). The baseline drift represents a shift of a mass spectrum from the origin (i.e., zero) and can be tuned to a linear, minimal increase with increase in *m/z* for a mass spectrometer with a quadrupole as an analyzer. The chemical noise, the actual noise level of a spectrum after deduction of the baseline drift, is composed of background signals resulting from residual chemicals. In particular, inorganic salts present in the sample matrix are largely responsible for this noise.

Many different approaches, based on a variety of algorithms, have previously been developed and applied for different purposes [2, 3, 4]. However, most of these approaches are limited to their particular utilities (e.g., correction for matrix-induced baseline in matrix-assisted laser desorption/ionization mass spectrometry; correction for the baseline contributed by mobile phase components; etc.). Herein, we present a simple, practical approach for a general determination of the baseline of a mass spectrum. This approach includes the corrections for both the baseline drift and the noise level of a mass spectrum. The principles of this practical approach are discussed and the procedures to determine the baseline are described in detail. We compared many of the baselines determined by either this program-based approach or a manual approach and found that the baselines determined by these two approaches were very consistent. We specifically point out that although the new approach was derived largely from the spectra acquired by the Xcalibur operation system, it can be readily modified for other operation systems, as we did for the mass spectra acquired by an ABI 4800 mass analyzer. We believe that the development of this simple, practical, program-based approach should greatly facilitate the automation and increase the power of high throughput methodology.

## 2 Experimental

### 2.1 Determination of the Baseline Drift of a Mass Spectrum

The *m/z* difference between every two neighboring data points was one-tenth of the Peak Width (FWHM). For example, when a unit resolution spectrum was acquired at a Peak Width setting of 0.7 Th [5, 6], the *m/z* difference between two neighboring data points was 0.07 Th. Each single peak in a unit resolution spectrum was then composed of 14–15 data points (i.e., 1/0.07 = 14.3). If a spectrum was acquired at a Peak Width setting of 0.35 Th, the *m/z* difference between two neighboring data points was 0.035 Th and the spectrum was a half unit resolution spectrum, each peak of which occupied a half mass unit and was also composed of 14–15 data points (i.e., 0.5/0.035 = 14.3). Accordingly, we determined the number of peaks (N) of a mass spectrum from the number of raw spectral data points by the following equation:

$$ N = \text{The total number of raw data points of a spectrum} \times 0.07 $$ (1)

For example, a unit resolution spectrum of a full list of 1430 data points contained 100 peaks (i.e., N = 1430 * 0.07 = 100). If, within the same mass range, a half unit resolution spectrum was acquired, a full list of 2860 spectral data points would be generated because the mass difference (i.e., 0.035) between every two neighboring data points was half of that from unit resolution spectrum. The number of peaks of the spectrum would therefore be doubled (i.e., N = 2860 * 0.07 = 200).

Next, we sorted the data in an ascending order of the intensities, and then averaged the N lowest intensity data. This averaged intensity was defined as the baseline drift of the mass spectrum in the mass region. For example, the baseline drift of the unit mass spectrum exemplified above was obtained by averaging the 100 lowest intensities after sorting the intensity data while the baseline drift of the exemplified half unit mass spectrum was obtained by averaging the 200 lowest intensities. Finally, the baseline drift was subtracted from each of the raw spectral intensity data points. Those data points that had intensities of either zero or negative after the baseline drift deduction were discarded. The remaining data points were re-sorted in an *m/z* order and termed as Data Set 0, which represented the baseline drift-deducted data set. A mass spectrum reconstructed from the baseline drift-deducted data set (i.e., Data Set 0) had baseline drift deducted but still had background noise present (Figures S1 to S4).
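The baseline drift steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `baseline_drift` and `deduct`, the rounding of N to the nearest integer, and the synthetic spectrum are ours and are not part of the original program.

```python
import numpy as np

def baseline_drift(intensities, peak_fraction=0.07):
    """Average of the N lowest intensities, with N taken from Equation 1."""
    y = np.asarray(intensities, dtype=float)
    # Equation 1; rounding to the nearest integer is an assumption.
    n_lowest = max(1, int(round(y.size * peak_fraction)))
    return float(np.sort(y)[:n_lowest].mean())

def deduct(mz, intensities, thickness):
    """Subtract a constant and discard points that become zero or negative."""
    mz = np.asarray(mz, dtype=float)
    y = np.asarray(intensities, dtype=float) - thickness
    keep = y > 0
    # Boolean masking preserves the original m/z order of the kept points.
    return mz[keep], y[keep]

# Synthetic unit-resolution spectrum (illustration only): 1430 points,
# a flat offset of about 50 counts, small noise, and one Gaussian peak.
rng = np.random.default_rng(0)
mz = 700.0 + 0.07 * np.arange(1430)
raw = 50.0 + rng.uniform(0.0, 5.0, mz.size)
raw[700:715] += 1000.0 * np.exp(-0.5 * ((np.arange(15) - 7) / 2.5) ** 2)

drift = baseline_drift(raw)               # averages the 100 lowest intensities
mz0, data_set_0 = deduct(mz, raw, drift)  # Data Set 0
```

Because the mask keeps the points in their original order, no explicit re-sorting by *m/z* is needed in this sketch.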

### 2.2 Determination of the Noise Level of a Mass Spectrum

The baseline drift-deducted data points (i.e., Data Set 0) of a mass spectrum were next used for determination of the noise level of the spectrum. In theory, the same procedure as the baseline drift deduction described above could be repeated on Data Set 0, which in turn generates a new data set (i.e., Data Set 1). If this type of deduction procedure is repeated over and over, many data sets can be generated from their previous ones (e.g., Data Set M from Data Set M-1), where each data set has fewer data points than its previous one. The mass spectrum reconstructed from Data Set 1 can be imagined as a spectrum with a thin “layer” wiped off from the bottom of the baseline drift-deducted spectrum (reconstructed from Data Set 0), where the thickness of the “layer” is the average of the N (calculated by Equation 1) lowest intensities from Data Set 0. Likewise, the mass spectrum reconstructed from Data Set M can be imagined as a spectrum with a “layer” wiped off from the bottom of the spectrum reconstructed from Data Set M-1, where the thickness of the “layer” is the average of the N lowest intensities from Data Set M-1.

The thickness of each “layer” may vary. A “layer” wiped off to generate a very early data set (e.g., Data Set 1) from its previous data set (i.e., Data Set 0) is usually thin, while the “layer” wiped off to generate a later data set from its previous data set could be much thicker. One of the reasons is that each peak contains the same number of data points (i.e., 14–15) equally distributed in the *m/z* dimension (i.e., x-axis); thus, the higher the intensity of the peak, the bigger the intensity difference between its two neighboring data points (Scheme 1a). Since the spectral peaks are Gaussian smoothed and of Gaussian distribution, this intensity difference increases as the data point moves towards the top of the peak in comparison to the difference between two points near the bottom of the peak (Scheme 1a). Another reason is that the same number (N, calculated by Equation 1) of lowest intensities is averaged from each data set to generate the next data set. Accordingly, the very early layer-by-layer deductions would not lead to a significantly varied thickness of each layer, since the N lowest intensities in each data set are almost exclusively from the low intensity peaks (i.e., noise peaks) and the intensity differences between neighboring data points of the low intensity peaks are small. Once the low intensity peaks have been wiped off after a few deductions, the data points from high intensity peaks (or signals) are picked up among the N lowest intensities, whose average is the layer thickness for deduction from the current data set to generate the next data set. Therefore, a significant increase in layer thickness would occur at this point. We defined this layer as the Transition Layer. The sum of the deducted intensities (or the layer thicknesses) of individual layers from the first layer to the transition layer was designated as the noise level of the spectrum, which was deducted from Data Set 0 to obtain the spectrum with both baseline drift and noise corrected.

1. Calculation of the Thickness of a Layer

To calculate the thickness of each layer, we repeated the steps for determining the baseline drift described above on every new data set, except that the number (N) of lowest intensity data points was replaced with a new number (n), which was calculated by the following equation:

$$ n = \text{The total number of raw data points of a spectrum} \times \text{Step Length} $$ (2)

where the Step Length determined how finely each layer deduction would be processed and is defined by the user. This number (n) could be different from the number (N). If a step length of 0.07 was defined, the number (n) calculated by Equation 2 was identical to the number (N) by Equation 1. If a step length of <0.07 was defined, a smaller number of lowest intensity data points from the current data set were used for deduction to yield the next data set. The average of a smaller number (e.g., p) of lowest intensity data points (e.g., intensity 1, intensity 2, …, intensity p in an ascending order) was smaller than the average of a bigger number (e.g., q, q > p) of lowest intensity data points (e.g., intensity 1, intensity 2, …, intensity p, intensity p + 1, …, intensity q in an ascending order). Accordingly, this represented a finer sequential layer deduction because more layers of deduction would necessarily be performed to wipe off the entire spectral data points if a smaller averaged intensity was used for deduction of each layer. In contrast, a defined step length of >0.07 led to a rougher layer deduction, represented by a bigger averaged intensity for deduction of each layer and fewer layers of deduction for the entire spectrum. The significance of varying the number (n) for deduction of each layer is discussed in detail in the Results and Discussion section.

In practice, we first sorted the data points from the baseline drift-deducted spectral data (Data Set 0) in the order of ascending intensity and averaged the n lowest intensity data points (n was calculated by Equation 2 with a user-defined Step Length). Then we deducted this averaged intensity from the intensities of individual data points in Data Set 0 and discarded the data points whose intensities were zero or negative after deduction. The remaining data points yielded Data Set 1. The mass spectrum reconstructed from the newly generated Data Set 1 could be viewed as the baseline drift-deducted spectrum (reconstructed from Data Set 0) with a layer wiped off from its bottom, whose thickness was the average of the n lowest intensities from Data Set 0. This averaged intensity was designated as the Thickness of Layer 1 (TL_{1}). This procedure was repeated to calculate the thicknesses of sequential layers. In general, the calculation of the Thickness of Layer i (TL_{i}) could be represented by the following equation:

$$ \mathrm{TL}_{i} = \text{Average}\left( \text{Intensity 1}, \text{Intensity 2}, \ldots, \text{Intensity n of Data Set } i-1 \right) $$ (3)

where TL_{i} is the Thickness of Layer i, and Intensity 1, Intensity 2, …, Intensity n are the n lowest intensities of Data Set i-1. The deduction of TL_{i} from the data points of Data Set i-1 yielded Data Set i. For example, Data Set 2 was generated from Data Set 1 by wiping one layer off Data Set 1 with a thickness of TL_{2}, which was calculated by averaging the n lowest intensities of Data Set 1.

The procedure was repeated until the number of the remaining spectral data points was smaller than n. A series of layer thicknesses, TL_{1}, TL_{2}, …, TL_{m}, was generated accordingly, where TL_{m} was the thickness of the last Layer m, whose deduction generated the last Data Set m from Data Set m-1, while one further deduction on Data Set m would have generated a data set that contained fewer than n remaining spectral data points.

2. Generation of the Accumulative Thickness of a Layer

In reality, it was difficult to directly determine the transition layer from the generated series of the layer thicknesses (i.e., TL_{1}, TL_{2}, …, TL_{m} for layer 1, layer 2, …, and last layer m, respectively). One of the reasons was that the curve of layer thickness (TL) versus layer mostly had irregular trends for which no regression method worked consistently for different mass spectra with satisfactory correlation coefficients. To resolve this issue, we derived the Accumulative Thickness of a Layer (ATL) from the individual Thickness of the Layer (TL) by the following equation:

$$ \mathrm{ATL}_{i} = \sum_{k=1}^{i} \mathrm{TL}_{k} $$ (4)

where TL_{k} was the Thickness of Layer k (1 ≤ k ≤ i ≤ m), calculated by Equation 3, and ATL_{i} was the Accumulative Thickness of Layer i. The curve of the Accumulative Thickness of Layer (ATL) versus layer was termed the accumulative layer thickness curve, which was used for determining the transition layer in the following steps.

3. Automated Determination of the Transition Layer

Next, we fitted the accumulative layer thickness curve by regression to determine the transition layer. The first point (1, ATL_{1}) of the curve was not included in the curve fitting because the thickness of the very first layer (i.e., TL_{1} or ATL_{1}) lacked stability and the first layer could not be the transition layer. We determined the transition layer from the accumulative layer thickness curve by the following procedures:

- (1) Fitted these accumulative layer thickness data points by polynomial regression of order 6 using the MATLAB function polyfit as follows:

  $$ y = a + bx + cx^{2} + dx^{3} + ex^{4} + fx^{5} + gx^{6} $$ (5)

  where “x” was the layer number (1 ≤ x ≤ m-1); “y” was the accumulative layer thickness (ATL) of the layer x, calculated from Equation 4; and a, b, …, and g were the regression coefficients. To specify, the first data point (x_{1}, y_{1}) for the regression is the second point (2, ATL_{2}) of the original accumulative layer thickness curve, because the first layer was eliminated from the regression, and the last data point (x_{m-1}, y_{m-1}) is the last point (m, ATL_{m}) of the original curve.

- (2) Calculated the derivatives of the obtained polynomial regression up to the fourth derivative. The fourth derivative was as follows:

  $$ y'''' = 24e + 120fx + 360gx^{2} $$ (6)

  where e, f, and g were regression coefficients from Equation 5.

- (3) Found the zeros of y”” by solving the following single-variable quadratic equation using the MATLAB function roots:

  $$ y'''' = 0 \quad \text{or} \quad 15gx^{2} + 5fx + e = 0 $$ (7)

  where e, f, and g were regression coefficients from Equation 5. There were two possibilities for the roots of Equation 7: the two roots were real numbers x_{I} and x_{II} when 5f^{2} – 12eg ≥ 0, or the two roots were complex numbers when 5f^{2} – 12eg < 0. When two real-number roots were obtained and x_{I} ≠ x_{II}, the bigger number was taken as the transition layer and the smaller number was discarded. When two complex-number roots were obtained, the algorithm repeated steps (1), (2), and (3) on narrowed accumulative layer thickness curves, eliminating the last data point each time, until real-number roots were found from Equation 7.

4. Determination of the Accumulative Layer Thickness Corresponding to the Determined Transition Layer

After the transition layer (e.g., the x_{I} from step 3) was determined, we calculated the accumulative layer thickness y_{I} corresponding to x_{I}. If x_{I} was an integer, the y_{I} from the data point (x_{I}, y_{I}) of the accumulative layer thickness curve was taken as the accumulative layer thickness corresponding to the transition layer x_{I}. If x_{I} was not an integer, the “y”s from the two adjacent data points neighboring the determined transition layer x_{I} (i.e., data points (x_{s}, y_{s}) and (x_{s+1}, y_{s+1}) where x_{s} < x_{I} < x_{s+1}) were used to calculate the y_{I} corresponding to x_{I} by the following equation:

$$ y_{I} = y_{s} \times \left( x_{s+1} - x_{I} \right) + y_{s+1} \times \left( x_{I} - x_{s} \right) $$ (8)

where x_{I} is the determined transition layer; y_{I} is the accumulative layer thickness corresponding to x_{I}; x_{s} and x_{s+1} are the two adjacent layers from the accumulative layer thickness curve that meet x_{s} < x_{I} < x_{s+1}; and y_{s} and y_{s+1} are the accumulative layer thicknesses corresponding to x_{s} and x_{s+1}, respectively. Because adjacent layers differ by one (x_{s+1} − x_{s} = 1), Equation 8 is a linear interpolation between the two neighboring data points.

5. Self-Check and Determination of the Spectral Noise Level

In theory, the determined accumulative layer thickness y_{I} corresponding to the determined transition layer x_{I} could be considered as the noise level of the spectrum. In practice, to self-check the stability of the determined transition layer and eliminate any potential uncertainty from a single polynomial regression on selected data points, we determined more transition layers from narrower regions of the accumulative layer thickness curve. Specifically, we first repeated the procedures (1), (2), and (3) of Step 3 to individually determine the transition layers (e.g., x_{I}^{1}, x_{I}^{2}, …) from a series of narrowed regions of the accumulative layer thickness curve (i.e., x = 1 to m-2, x = 1 to m-3, …, and x = 1 to 7), which were yielded by eliminating the last data point of the curve sequentially until the minimal number (i.e., seven) of data points required for determination of the regression coefficients by Equation 5 was reached. Then, the accumulative layer thicknesses y_{I}^{1}, y_{I}^{2}, …, corresponding to the newly determined transition layers x_{I}^{1}, x_{I}^{2}, …, respectively, were determined by repeating Step 4. We discarded the values among the determined y_{I}^{1}, y_{I}^{2}, … that were either larger than y_{I} or less than 70% of y_{I} and averaged the rest of the values. This averaged accumulative layer thickness was defined as the noise level of a mass spectrum. The overall baseline of a mass spectrum was corrected by the sum of the determined baseline drift and the determined noise level of the spectrum.

Meanwhile, the signal-to-noise ratio of an ion peak can be calculated as:

$$ \begin{aligned} \text{S/N} &= \frac{\text{the signal of an ion peak}}{\text{the noise level of the spectrum}} \\ &= \frac{\text{baseline-corrected peak intensity}}{\text{the noise level of the spectrum}} \\ &= \frac{\text{ion peak intensity} - \text{the overall baseline}}{\text{the noise level of the spectrum}} \end{aligned} $$ (9)
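Steps 3 and 4 can be illustrated with a short NumPy sketch. It is not the authors' MATLAB program: the function name `transition_layer` is hypothetical, `np.polyfit`, `np.polyder`, and `np.roots` stand in for the MATLAB routines, and the shift of the larger root by one (from the regression index x back to the original layer number) is our reading of the "Layer i (x)" column of Table 1. The input values are the first eight ATL entries of Table 1.

```python
import numpy as np

def transition_layer(atl):
    """Return (x_I, y_I) for an accumulative layer thickness curve.

    atl[k] is the ATL of layer k + 1. Layer 1 is excluded from the fit,
    so the regression index x runs over 1..m-1 for layers 2..m.
    """
    atl = np.asarray(atl, dtype=float)
    layers = np.arange(1, atl.size + 1)
    last = atl.size
    while last - 1 >= 7:                     # seven points needed for order 6
        y = atl[1:last]
        x = np.arange(1, y.size + 1)
        coeffs = np.polyfit(x, y, 6)         # Equation 5, highest power first
        fourth = np.polyder(coeffs, 4)       # Equation 6: [360g, 120f, 24e]
        roots = np.roots(fourth)             # Equation 7
        if np.all(np.isreal(roots)):
            # Larger root, shifted back to the original layer numbering
            # (assumption inferred from Table 1).
            x_i = float(np.max(roots.real)) + 1.0
            # Equation 8: linear interpolation on the full curve; np.interp
            # clamps to the end values if x_i falls outside it.
            y_i = float(np.interp(x_i, layers, atl))
            return x_i, y_i
        last -= 1                            # complex roots: drop the last point
    raise ValueError("no real-valued transition layer found")

# First eight ATL values of Table 1 (layers 1 to 8).
atl_first8 = [96.0257, 178.1054, 241.4414, 303.5546,
              362.1216, 417.7978, 475.7886, 543.0676]
x_i, y_i = transition_layer(atl_first8)
print(round(x_i, 2), round(y_i, 1))  # close to 6.42 and 442.4 (Table 1, row 8 (7))
```

With only seven fitted points the sixth order polynomial passes through every point exactly, which is why this smallest case is convenient for checking an implementation against the tabulated values.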

## 3 Results and Discussion

### 3.1 Correction for Baseline Drift of a Mass Spectrum

The baseline drift of a mass spectrum is a constant shift of the peak intensities from their original values to the apparently determined values and consistently occurs for the entire spectrum. This drift is different from the noise background of the spectrum. The baseline drift of a mass spectrum could be either mass independent or mass dependent. However, when the mass range of interest is narrow, the baseline drift usually becomes minimally mass dependent within the mass range. In the current study, we focused on the correction of the mass independent baseline drift of a mass spectrum. When the mass range of interest is wide and the mass dependence of baseline drift has to be addressed, one can segment the entire widely ranged mass spectrum into a few narrowly ranged spectra [7] and then employ the current approach to correct the baseline drift individually for each of the segmented mass spectra.

The baseline drift of a Gaussian-smoothed single spectral peak acquired in the profile mode can be represented by its lowest intensity, which is usually the intensity of either the first or the last data point of the peak. The baseline drift of a mass spectrum acquired in the profile mode containing N Gaussian-smoothed single peaks can, in theory, be represented by the line connecting the lowest intensity data point of each of the N single peaks in the spectrum in the order of *m/z*. If the baseline drift is considered mass-independent within the mass range, it can be simplified to the average of the N individual lowest intensities, each of which is the lowest intensity of one of the N peaks of the mass spectrum. In our approach, the baseline drift was determined by averaging the N lowest intensities of the entire spectrum, where the N lowest intensities were selected by sorting the entire spectral data points in an ascending order. Therefore, among the N lowest intensities, instead of one from each peak, more than one might come from the data points of one peak, and none from another peak. However, the advantage of our approach for baseline drift determination is its simplicity with practically sufficient accuracy. Its simplicity results from the one-time sorting of the peak intensities in an ascending order, which picks out all the N lowest intensities simultaneously from the full data list. Its accuracy is sufficient because the average of a large number (N, generally in the hundreds or more) of lowest intensities can very likely represent the trend of the baseline shift of the entire spectrum (Scheme 1b). Additional examples of the determined baseline drifts are demonstrated in the Supplementary Materials (Figures S1–S4).

### 3.2 Generation of the Accumulative Layer Thickness Curve from the Baseline Drift Corrected Mass Spectrum

The thickness of the first deducted layer, TL_{1}, was the average of the n lowest intensities from Data Set 0. In general, Data Set i (1 ≤ i ≤ m, where m is the last data set containing ≥ n spectral data points) was generated by deduction of a layer (Layer i, having a thickness of TL_{i} by Equation 3) from Data Set i-1. When these layers (Layer 1, Layer 2, …, Layer m) were laid along the y-axis of the spectrum, they were unequally distributed because of the varied thicknesses of the individual layers (TL_{1}, TL_{2}, …, TL_{m}, respectively) (Figure 1). Only small changes existed in the thickness of the first few deducted layers due to the dominant contribution of the low intensity peaks to these layer deductions (see Section 2) (Figures 1b and 2a). When the low intensity peaks were deducted completely from the data set, the following layer deduction then picked the data points from the remaining high intensity peaks present in the data set. This would subsequently result in a significant increase in the thickness of the layer, which we defined as the transition layer (Figure 2a).

Before the transition, the thickness of each layer did not change significantly from layer to layer (Figure 1b) because the n lowest intensity data points in the current data set and in its previous data set were both from low intensity peaks that had small intensity differences between the neighboring data points (Scheme 1a). After the transition, the thickness of each layer increased significantly from layer to layer (Figure 1a) because the n lowest intensity data points in the current data set and in its previous data set were both from high intensity peaks that had big intensity changes (Scheme 1a). During the transition, the n lowest intensity data points in the current data set were, at least partially, from high intensity peaks while the n lowest intensity data points in its previous data set were exclusively from low intensity peaks. Therefore, the average of the n lowest intensities from the current data set increased significantly compared with that from the previous data set at the occurrence of the transition, and the rate of the increase at the transition should be differentiable from that before the transition (where the increase if any would be slight) and after the transition (where the increase would be dramatic) (Figure 2).

It was noted that the trends of these determined layer thicknesses varied irregularly (inset in Figure 2a). This irregularity made the curve regression difficult, while a precise curve regression is essential for an algorithm to locate the transition layer mathematically and automatically. We found that employing an accumulative layer thickness curve derived from the TL_{i} data [i.e., a curve of ATL_{i} versus layer i (Figure 2b), where ATL_{i} was calculated by Equation 4] successfully bypassed this difficulty.
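The construction of the layer thickness series and its accumulative curve can be sketched as follows. This is an illustration only: the function names, the rounding of n, the reading of the stopping rule (a layer is recorded only if at least n points remain after its deduction), and the synthetic Data Set 0 are assumptions made here.

```python
import numpy as np

def layer_thicknesses(data_set_0, n_raw_points, step_length=0.07):
    """Sequential layer deduction; returns TL_1 .. TL_m (Equation 3)."""
    # Equation 2; rounding to the nearest integer is an assumption.
    n = max(1, int(round(n_raw_points * step_length)))
    current = np.sort(np.asarray(data_set_0, dtype=float))
    thicknesses = []
    while current.size >= n:
        tl = current[:n].mean()                # average of the n lowest intensities
        remaining = current - tl
        remaining = remaining[remaining > 0]   # discard zero or negative points
        if remaining.size < n:                 # next data set would be too small
            break
        thicknesses.append(tl)
        current = remaining                    # still sorted after the shift
    return np.array(thicknesses)

def accumulative_thickness(thicknesses):
    """ATL_i as the running sum of TL_1 .. TL_i (Equation 4)."""
    return np.cumsum(thicknesses)

# Synthetic Data Set 0 (illustration only): many low "noise" intensities
# plus a smaller number of high "signal" intensities.
rng = np.random.default_rng(1)
data_set_0 = np.concatenate([rng.uniform(1.0, 20.0, 1200),
                             rng.uniform(100.0, 5000.0, 230)])
tl = layer_thicknesses(data_set_0, n_raw_points=1430, step_length=0.07)
atl = accumulative_thickness(tl)
```

Since every TL_{i} is positive, the accumulative curve is strictly increasing by construction, which is one reason it is easier to fit than the irregular TL series.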

### 3.3 Determination of the Transition Layer from the Accumulative Layer Thickness Curve

The first derivative of the accumulative layer thickness curve represents the rate of change of the accumulative layer thickness with layer (i.e., y’ = d(ATL_{x})/d*x*); the second derivative represents the rate of change of the accumulative layer thickness change rate, or the acceleration of the accumulative layer thickness with layer (i.e., y” = d(y’)/d*x*); and the third derivative represents the rate of change of the acceleration of the accumulative layer thickness with layer (i.e., y”’ = d(y”)/d*x*). The roots of Equation 6 (i.e., the fourth derivative = 0) represent the layers that correspond to the extrema of the rate of change of the acceleration of the accumulative layer thickness (i.e., the extrema of the curve of y”’ versus *x*). Since the fourth derivative of a sixth order polynomial is of order 2, Equation 6 has two roots. We observed that the lower value of the two roots was generally located within the first few layers, which might indicate a type of transition whose meaning we do not yet know, while the higher value of the two roots best represents the transition layer.
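For reference, the chain of derivatives that leads from Equation 5 to Equations 6 and 7 can be written out explicitly. This is a worked sketch that uses only the regression coefficients a to g of Equation 5:

$$
\begin{aligned}
y &= a + bx + cx^{2} + dx^{3} + ex^{4} + fx^{5} + gx^{6} \\
y' &= b + 2cx + 3dx^{2} + 4ex^{3} + 5fx^{4} + 6gx^{5} \\
y'' &= 2c + 6dx + 12ex^{2} + 20fx^{3} + 30gx^{4} \\
y''' &= 6d + 24ex + 60fx^{2} + 120gx^{3} \\
y'''' &= 24e + 120fx + 360gx^{2} = 24\left( e + 5fx + 15gx^{2} \right)
\end{aligned}
$$

Setting y'''' = 0 and applying the quadratic formula to 15gx^{2} + 5fx + e = 0 (for g ≠ 0) gives

$$
x_{I,II} = \frac{-5f \pm \sqrt{25f^{2} - 60eg}}{30g} = \frac{-5f \pm \sqrt{5\left( 5f^{2} - 12eg \right)}}{30g}
$$

so the two roots are real exactly when 5f^{2} − 12eg ≥ 0, which is the condition stated with Equation 7.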

Table 1. The Effects of Different Numbers of Layers used for Determination of a Transition Layer to Derive the Noise Level of a Mass Spectrum^{a}

Columns x^{6} to x^{0} list the coefficients of the sixth order polynomial regression for the corresponding power of x.

Layer i (x) | ATL | x^{6} | x^{5} | x^{4} | x^{3} | x^{2} | x^{1} | x^{0} | Determined transition layer | Determined noise level |
---|---|---|---|---|---|---|---|---|---|---|
1 | 96.0257 | | | | | | | | | |
2 (1) | 178.1054 | | | | | | | | | |
3 (2) | 241.4414 | | | | | | | | | |
4 (3) | 303.5546 | | | | | | | | | |
5 (4) | 362.1216 | | | | | | | | | |
6 (5) | 417.7978 | | | | | | | | | |
7 (6) | 475.7886 | | | | | | | | | |
8 (7) | 543.0676 | −0.0060 | 0.1401 | −1.1303 | 3.9280 | −6.7077 | 68.9562 | 112.9251 | 6.4240 | 442.3857 |
9 (8) | 611.4270 | −0.0129 | 0.3021 | −2.6454 | 10.9921 | −23.7935 | 88.9226 | 104.3376 | 6.1826 | 428.3870 |
10 (9) | 673.6276 | 0.0015 | −0.0781 | 1.2995 | −9.2411 | 29.4259 | 22.3163 | 134.4206 | 14.3558 | 1126.6866 |
11 (10) | 743.2630 | 0.0091 | −0.3010 | 3.8417 | −23.4735 | 69.9014 | −31.7203 | 159.9766 | 8.0684 | 547.7460 |
12 (11) | 836.9986 | 0.0064 | −0.2158 | 2.7817 | −17.0385 | 50.2164 | −3.7860 | 146.1761 | 8.2008 | 556.7936 |
13 (12) | 942.5386 | 0.0003 | −0.0028 | −0.0868 | 1.7348 | −11.2756 | 88.6708 | 98.5609 | 7.1979 | 489.1044 |
14 (13) | 1071.3711 | −0.0009 | 0.0433 | −0.7543 | 6.4208 | −27.6442 | 114.6717 | 84.6279 | 11.3401 | 775.1448 |
15 (14) | 1226.8370 | −0.0009 | 0.0426 | −0.7440 | 6.3441 | −27.3593 | 114.1949 | 84.8934 | 11.3739 | 778.3111 |
16 (15) | 1384.3546 | −0.0013 | 0.0574 | −0.9883 | 8.2921 | −35.0162 | 127.6661 | 77.1149 | 11.0511 | 748.0493 |
17 (16) | 1561.4457 | −0.0007 | 0.0332 | −0.5633 | 4.6994 | −20.1054 | 100.1519 | 93.5721 | 11.2062 | 762.5905 |
18 (17) | 1795.2227 | 0.0001 | −0.0087 | 0.2179 | −2.2787 | 10.3935 | 41.2455 | 130.0248 | 12.5442 | 894.4364 |
19 (18) | 2097.3211 | 0.0005 | −0.0285 | 0.6083 | −5.9529 | 27.2633 | 7.2054 | 151.7936 | 12.4177 | 881.0784 |
20 (19) | 2483.2300 | 0.0005 | −0.0289 | 0.6166 | −6.0358 | 27.6624 | 6.3657 | 152.3480 | 12.4211 | 881.4445 |
21 (20) | 2983.0357 | 0.0004 | −0.0229 | 0.4860 | −4.6806 | 20.8400 | 21.3147 | 142.1702 | 12.2109 | 859.2555 |
22 (21) | 3729.9181 | 0.0006 | −0.0319 | 0.6910 | −6.9057 | 32.5348 | −5.3273 | 160.8592 | 12.6271 | 903.1856 |
23 (22) | 4801.2543 | 0.0007 | −0.0389 | 0.8581 | −8.7999 | 42.9106 | −29.8675 | 178.5809 | 12.9036 | 932.3677 |
24 (23) | 6467.8738 | 0.0009 | −0.0558 | 1.2786 | −13.7695 | 71.2385 | −99.3373 | 230.1834 | 13.4421 | 999.4903 |
Average | | | | | | | | | 12.1398 | 855.9413 |

It should be pointed out that although the sixth order polynomial regression followed by finding the zeros of the fourth derivative of this polynomial regression to determine the transition layer worked best thus far among the regressions we tested, we would leave the possibility open that the regression with any other formula to fit the curve might achieve a similar result or even more precise determination of the transition layer.

### 3.4 Determination of the Noise Level of a Mass Spectrum

Since polynomial regression on experimental data points is not a mechanism-based modeling of data, it is necessary to ensure that any potential uncertainty from a single polynomial regression is eliminated and that the transition layer determined from the regression is stable. Ideally, if a regression equation that fits the experimental data points precisely reflects a true understanding of the relationship underlying the data, this regression equation should be independent of the number of data points employed for the regression. In practice, a similar regression equation should then be obtained when the same regression is performed on a selected, smaller set of data points from the curve. Accordingly, we determined the transition layers from the narrowed accumulative layer thickness curves covering different numbers of data points from the curve, i.e., x = 1 to m-2, x = 1 to m-3, …, and x = 1 to 7 (which is the minimal number of data points required for a sixth order polynomial regression). An example was tabulated (Table 1 where m = 24).

Intriguingly, we found that the determined transition layers and noise levels were minimally affected by reducing the number of data points used for the regression, provided that the transition layer lay within the employed data points (the highlighted data in Table 1). These results indicated that the determination of the transition layer was independent of how many data points from the accumulative layer thickness curve were used for the regression analysis. It is important to implement these procedures to self-check the stability and improve the reliability of the determined transition layer. In addition, this stability validated the precise fitting of the accumulative layer thickness curve by the obtained polynomial regression equation and, consequently, the accurate determination of the transition layer from the regression. The accumulative layer thickness corresponding to each of the determined transition layers was calculated by Equation 8, and those that were consistent (within 30% deviation, as described in the Experimental section) were averaged and designated as the noise level of the mass spectrum. A few examples of mass spectra demonstrating the baseline drifts and noise levels are provided in the Supplementary Materials (Figures S1–S4).
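The self-check described above can be sketched in Python as follows. This is a minimal illustration, not the authors' program. The function names `transition_estimate` and `stable_noise_level`, the choice of the largest in-range real root of the fourth derivative as the transition layer, the use of the fitted polynomial in place of Equation 8, and the median-based reading of the 30% consistency rule are all assumptions. The demonstration curve is synthetic, generated from illustrative coefficients rather than measured data.

```python
import numpy as np
from numpy.polynomial import Polynomial


def transition_estimate(x, y, order=6):
    """Fit one polynomial; return (transition_layer, thickness) or None."""
    fit = Polynomial.fit(x, y, order)
    roots = fit.deriv(4).roots()
    real = roots[np.abs(roots.imag) < 1e-9].real
    inside = real[(real >= x[0]) & (real <= x[-1])]
    if inside.size == 0:
        return None
    # Assumption: the transition layer is the largest real root in range.
    layer = inside.max()
    # Assumption: the fitted polynomial stands in for Equation 8.
    return float(layer), float(fit(layer))


def stable_noise_level(y, min_points=7, tolerance=0.30):
    """Repeat the regression on x = 1..k for shrinking k and average
    the thicknesses that agree with each other."""
    y = np.asarray(y, dtype=float)
    x = np.arange(1, len(y) + 1, dtype=float)
    estimates = []
    for k in range(len(y), min_points - 1, -1):
        est = transition_estimate(x[:k], y[:k])
        if est is not None:
            estimates.append((k, *est))
    if not estimates:
        raise ValueError("no transition layer found in any truncated curve")
    levels = np.array([level for _, _, level in estimates])
    # Assumption: "consistent" means within 30% of the median estimate.
    centre = np.median(levels)
    keep = np.abs(levels - centre) <= tolerance * abs(centre)
    return estimates, float(levels[keep].mean())


# Synthetic 24-point curve from illustrative coefficients (constant term first).
coeffs = [142.1702, 21.3147, 20.84, -4.6806, 0.486, -0.0229, 0.0004]
curve = Polynomial(coeffs)(np.arange(1, 25))
estimates, noise_level = stable_noise_level(curve)
for k, layer, level in estimates:
    print(f"points 1..{k}: layer {layer:.3f}, thickness {level:.1f}")
print(f"averaged noise level: {noise_level:.1f}")
```

On this synthetic curve the estimate stays fixed for every truncation that still contains the transition layer, and estimates from shorter truncations fall outside the 30% window and are excluded from the average.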

### 3.5 The Effects of the Step Length Setting on Determination of the Transition Layer

Table 2. Examples of the Constancy of the Determined Noise Levels of Mass Spectra for Lipid Analysis as Varied with the Step Length of the Accumulative Layer Thickness Curves

Step length | Determined baseline level (example I) (ion counts) | Determined baseline level (example II) (ion counts) | Determined baseline level (example III) (ion counts) |
---|---|---|---|
0.05 | 3527.7 | 1717.4 | 411164.1 |
0.06 | 3435.6 | 1622.2 | 404535.6 |
0.07 | 3350.6 | 1562.8 | 402887.7 |
0.08 | 3477.1 | 1555.6 | 402625.4 |
0.09 | 3358.2 | 1497.6 | 404002.9 |
0.10 | 3245.1 | 1542.4 | 401291.8 |
0.11 | 3234.3 | 1506.6 | 398201.2 |
0.12 | 3294.3 | 1505.9 | 398957.1 |
Mean ± SEM (relative error) | 3365.4 ± 38.0 (1.1%) | 1563.8 ± 26.2 (1.7%) | 402958.2 ± 1417.0 (0.4%) |
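The summary row of the table can be recomputed from the listed values. The short Python sketch below does this for example I, assuming that the SEM is the sample standard deviation divided by the square root of the number of step lengths and that the relative error is SEM divided by the mean; the helper name `mean_sem` is hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM, relative error in percent) for a list of values."""
    mean = statistics.fmean(values)
    # SEM = sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem, 100 * sem / mean


# Example I column of the table, step lengths 0.05 to 0.12
example_i = [3527.7, 3435.6, 3350.6, 3477.1, 3358.2, 3245.1, 3234.3, 3294.3]
mean, sem, rel = mean_sem(example_i)
print(f"{mean:.1f} ± {sem:.1f} ({rel:.1f}%)")
```

Under these assumptions the result agrees with the tabulated mean, SEM, and relative error for example I.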

### 3.6 Examples of Improved Reproducibility of Quantification under Varied Analyte Concentrations with Baseline Correction

The ion at *m/z* 758.6 (indicated with arrows in Figure S5) is of modestly low intensity in the spectra. The variation of the ratio of its peak intensity to that of the internal standard at *m/z* 674.6 is reduced from 12.2% without baseline correction to 5.6% with baseline correction (Table 3). For ions in much lower abundance, baseline correction is anticipated to improve the reproducibility of quantification even more dramatically because of the low S/N ratios of low-abundance peaks. These results further validate the powerful utility of our newly developed baseline correction approach and clearly demonstrate the importance of baseline correction for accurate quantification.

Table 3. An Example that Baseline Correction Improves the Accuracy of Quantification^{a}

Dilution factor | Drift | Noise | Baseline level | Base peak | Ion peak at *m/z* 674.6 | Ion peak at *m/z* 758.6 | I758.6/I674.6 (without baseline correction) | I758.6/I674.6 (with baseline correction) |
---|---|---|---|---|---|---|---|---|
1 | 3.29 (0.60) | 4.4 (0.81) | 7.72 | 550 | 498.3 | 100.1 | 0.201 | 0.188 |
2 | 2.80 (0.91) | 3.1 (1.00) | 5.89 | 308 | 294.1 | 56.8 | 0.193 | 0.177 |
4 | 2.13 (1.09) | 2.4 (1.24) | 4.55 | 195 | 178.0 | 36.1 | 0.203 | 0.182 |
8 | 1.18 (1.38) | 1.6 (1.94) | 2.82 | 85 | 73.2 | 16.8 | 0.229 | 0.198 |
16 | 1.84 (2.87) | 2.2 (3.44) | 4.04 | 64 | 58.4 | 15.0 | 0.257 | 0.202 |
Mean ± SD | | | | | | | 0.217 ± 0.026 | 0.189 ± 0.011 |
CV (%) | | | | | | | 12.15 | 5.56 |
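The effect of the correction on the intensity ratio can be checked directly from the tabulated values. The Python sketch below assumes that the larger of the two ion-peak columns is the internal standard at m/z 674.6, that the smaller is the ion at m/z 758.6, and that correction means subtracting the baseline level from both peak intensities before taking the ratio; the helper name `cv_percent` is hypothetical.

```python
import statistics

# (dilution factor, baseline level, peak at m/z 674.6, peak at m/z 758.6)
rows = [
    (1, 7.72, 498.3, 100.1),
    (2, 5.89, 294.1, 56.8),
    (4, 4.55, 178.0, 36.1),
    (8, 2.82, 73.2, 16.8),
    (16, 4.04, 58.4, 15.0),
]


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) in percent."""
    return 100 * statistics.stdev(values) / statistics.fmean(values)


# Ratio of analyte to internal standard without baseline correction
raw = [analyte / standard for _, _, standard, analyte in rows]
# Assumption: the same baseline level is subtracted from both peaks
corrected = [
    (analyte - base) / (standard - base) for _, base, standard, analyte in rows
]

for (dilution, *_), r, c in zip(rows, raw, corrected):
    print(f"dilution {dilution:>2}: raw {r:.3f}, corrected {c:.3f}")
print(f"CV raw {cv_percent(raw):.1f}%, CV corrected {cv_percent(corrected):.1f}%")
```

The recomputed ratios match the last two columns of the table to three decimals. The coefficients of variation differ slightly in the second decimal from the tabulated ones, which appear to have been computed from the rounded ratios.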

## Notes

### Acknowledgments

This work was supported by National Institute on Aging/National Institute of Diabetes and Digestive and Kidney Diseases grant R01 AG31675 (X.H.) and National Institutes of Health grant P01 HL57278 (R.W.G.). R.W.G. and X.H. have financial relationships with LipoSpectrum LLC. R.W.G. also has a financial relationship with Platomics, Inc.

### References

- 1. Han, X., Gross, R.W.: Shotgun lipidomics: electrospray ionization mass spectrometric analysis and quantitation of the cellular lipidomes directly from crude extracts of biological samples. *Mass Spectrom. Rev.* **24**, 367–412 (2005)
- 2. Norris, J.L., Cornett, D.S., Mobley, J.A., Andersson, M., Seeley, E.H., Chaurand, P., Caprioli, R.M.: Processing MALDI mass spectra to improve mass spectral direct tissue analysis. *Int. J. Mass Spectrom.* **260**, 212–221 (2007)
- 3. Satten, G.A., Datta, S., Moura, H., Woolfitt, A.R., Carvalho Mda, G., Carlone, G.M., De, B.K., Pavlopoulos, A., Barr, J.R.: Standardization and denoizing algorithms for mass spectra to classify whole-organism bacterial specimens. *Bioinformatics* **20**, 3128–3136 (2004)
- 4. Ivanova, P.T., Milne, S.B., Byrne, M.O., Xiang, Y., Brown, H.A.: Glycerophospholipid identification and quantitation by electrospray ionization mass spectrometry. *Methods Enzymol.* **432**, 21–57 (2007)
- 5. Han, X., Yang, J., Cheng, H., Ye, H., Gross, R.W.: Towards fingerprinting cellular lipidomes directly from biological samples by two-dimensional electrospray ionization mass spectrometry. *Anal. Biochem.* **330**, 317–331 (2004)
- 6. Han, X., Yang, K., Gross, R.W.: Microfluidics-based electrospray ionization enhances intrasource separation of lipid classes and extends identification of individual molecular species through multi-dimensional mass spectrometry: development of an automated high throughput platform for shotgun lipidomics. *Rapid Commun. Mass Spectrom.* **22**, 2115–2124 (2008)
- 7. Williams, B., Cornett, S., Dawant, B., Crecelium, A., Bodenheimer, B., Caprioli, R.M.: An algorithm for baseline correction of MALDI mass spectra. In: Guimaracs, M. (ed.) *Proceedings of the 43rd Annual Southeast Regional Conference*, Kennesaw, GA, 2005, pp. 137–142. Association for Computing Machinery, New York, NY (2005)