Abstract
In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains a few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs against that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs’s algorithms, JFC, and Emass compute the coarse-grained (low-resolution) isotopic distributions with comparable accuracy and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs’s polynomial algorithm, and MIDAs’s Fourier transform algorithm. Among the three, IC and MIDAs’s polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html
Key words
Isotopic distribution, Accurate mass, Mass spectrometry, Proteomics
1 Introduction
Most biomolecules are composed of hydrogen, carbon, nitrogen, oxygen, and sulfur. It is known that the natural isotopes of these elements occur with different probabilities [1, 2], and in some experiments the relative abundances of an element’s isotopes can be manipulated by using a technique known as stable isotopic labeling [3, 4]. The relative abundances of isotopes determine a molecule’s isotopic distribution (ID), which can be measured experimentally using a mass spectrometer. The measured ID constrains the elemental composition when compared with the in-silico computed ID and, hence, helps in identifying the underlying molecule. The realization of this goal, however, demands accurate in-silico ID prediction [5–11].
The information content in an experimentally measured ID depends on the resolution of the mass spectrometer. An ID generated by a low-resolution instrument contains less information than one generated by an ultra-high-resolution instrument [12–15]. Based on the instrument resolution, three different types of IDs are commonly mentioned in the literature: the aggregated, the fine structure, and the hyper-fine structure IDs [16]. The aggregated ID is computed by merging isotopic variants that have the same nucleon number into one aggregated isotopic variant [17, 18], whose corresponding molecular mass (MM) and occurrence probability are computed, respectively, from the probability-weighted sum of masses and from the sum of the probabilities of the isotopic variants merged. The fine and hyper-fine structure IDs are computed similarly to the aggregated ID, except that one merges only isotopic variants whose molecular mass differences are within some pre-specified mass accuracy.
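The merging step described above can be illustrated with a short sketch. The function name `merge_variants`, the greedy grouping rule (a new group starts once a mass lies more than the chosen accuracy above the first mass of the current group), and the toy numbers are all assumptions made for illustration; the text does not prescribe this particular grouping rule.

```python
def merge_variants(variants, accuracy):
    """Merge (mass, probability) pairs whose masses lie within `accuracy` Da.

    Hypothetical greedy rule: walk the variants in order of mass and open a
    new group whenever the next mass is more than `accuracy` above the first
    mass of the current group.
    """
    merged = []
    group_start = None
    prob_sum = mass_sum = 0.0
    for mass, prob in sorted(variants):
        if group_start is not None and mass - group_start > accuracy:
            # Close the current group: probability-weighted mass, summed probability.
            merged.append((mass_sum / prob_sum, prob_sum))
            group_start = None
        if group_start is None:
            group_start, prob_sum, mass_sum = mass, 0.0, 0.0
        prob_sum += prob
        mass_sum += prob * mass
    if group_start is not None:
        merged.append((mass_sum / prob_sum, prob_sum))
    return merged


# Toy input (made-up numbers, not real isotopic variants).
toy = [(100.000, 0.50), (100.004, 0.25), (101.003, 0.25)]
fine = merge_variants(toy, accuracy=0.01)   # keeps the two clusters apart
coarse = merge_variants(toy, accuracy=2.0)  # merges everything into one term
```

A small accuracy value keeps nearby variants separate, as in a fine structure ID, while a large value collapses them, which mimics the loss of information at low resolution.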
To make practical use of experimentally measured IDs, it is imperative to have methods that can compute in-silico IDs when given molecular formulas. Rockwood et al. [19] mentioned several criteria for a sound ID-computing method (IDCM): an IDCM must accurately compute in a very short time the masses and intensities without consuming much computational resource. We propose a few additional criteria by which to assess an IDCM’s application value: to handle experimentally generated IDs from both low-resolution and high-resolution instruments, an IDCM should allow adjustable mass accuracy; given that customized isotopic labeling has become a common experimental technique for quantitative analyses, an IDCM should be able to handle customized (or user-specified) isotopic abundances (or occurrence probabilities) of all chemical elements considered; finally, an IDCM should be able to compute IDs for a wide mass range and be user-friendly. Although there are several available methods [16] that can compute an aggregated ID [17, 18, 20–23], fine structure ID [19], and hyper-fine structure ID [23–26], there are not many methods that can satisfy all the requirements mentioned above.
In this manuscript, we present MIDAs, a software tool satisfying all the requirements above. MIDAs provides users with two accurate and efficient algorithms to compute IDs: the first algorithm belongs to the class of polynomial methods [27, 28], whereas the second belongs to the class of Fourier transform methods [29, 30]. The latter consists mainly of changes made to the existing Fourier transform method [19], and these changes are shown to improve significantly the accuracy of the computed ID. Both algorithms can compute low- and high-resolution IDs, referred to as the coarse-grained isotopic distribution (CGID) and the fine-grained isotopic distribution (FGID), respectively, for the remainder of this manuscript. Both algorithms implemented in MIDAs are also capable of computing CGIDs and FGIDs with adjustable mass accuracy.
To evaluate the performance of MIDAs, we have benchmarked it against eight methods: four of these methods, namely Mercury [19], NeutronCluster (NC) [17], Emass [21], and BRAIN [18, 31], are the four best-performing methods taken from a recent publication by Claesen et al. [18]; the other four methods are Mercury5 (a new version of Mercury2) [32], Qmass [20], Isotope Calculator (IC) [33], and a recently published Fourier-transform-based method [34], which we refer to as JFC. JFC is an improved version of Isotopica [35], which incorporates BRAIN’s generating function. The program of JFC was downloaded from http://bioinformatica.cigb.edu.cu/isotopica/centermass.html. The BRAIN code was downloaded from http://www.bioconductor.org/packages/release/bioc/html/BRAIN.html. The program IC was downloaded from http://agarlabs.com/. The rest of the programs were provided by the code authors, whom we acknowledge in the Acknowledgment section.
The performance evaluation was conducted using 25 molecules. Ten of these molecules are benchmark proteins previously used to evaluate the accuracy of computed CGIDs [17, 18]. Another 10 are hydrocarbon molecules whose CGIDs and FGIDs can be exactly computed, making them ideal for evaluating the accuracy of computed IDs. The remaining five molecules, made of a combination of sulfur, mercury, carbon and hydrogen, are used together with some of the other 20 molecules to evaluate the computational time of MIDAs’s algorithms. Results from our investigation show that MIDAs [both the polynomial-based algorithm (MIDAs^{ a }) and the Fourier-transform-based algorithm (MIDAs^{ b })], Emass, and JFC compute CGIDs with equivalent accuracy and are more accurate than the other methods evaluated. When computing the FGIDs, IC and MIDAs^{ a } yield FGIDs that are closest to the exact FGIDs. The results also show that MIDAs^{ a } and MIDAs^{ b } satisfy all aforementioned requirements to be considered a valuable tool, providing the community with two new options for computing accurate IDs.
2 Methods
In the subsections below we explain in detail the two algorithms implemented in MIDAs. The first subsection explains MIDAs^{ a }, a polynomial-based algorithm. The second subsection describes MIDAs^{ b }, a fast Fourier transform (FFT) based algorithm. Both algorithms can be used to compute CGIDs and FGIDs.
2.1 MIDAs Polynomial Multiplication Algorithm (MIDAs^{ a })
There are several polynomial-based methods designed to compute an ID from the molecular formula (MF). Methods such as the stepwise procedure and its improvement [36, 37], symbolic expansion [4], and multinomial expansion [28, 38] have been proposed to compute the expansion of the polynomial that represents a molecule’s ID. Although these methods have been shown to perform well for small molecules, they fail to handle large molecules: they yield inaccurate IDs, require a significant amount of computer memory, and take a considerable amount of computational time [16].
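For orientation, the polynomial that such methods expand is commonly written as a product over the elements of the molecule. The form below is a standard sketch and an assumption about notation, not a transcription of an equation from this paper; the symbols \( n_e \), \( p_{e,j} \), \( m_{e,j} \), \( P_k \), and \( M_k \) are introduced here only for illustration.

\[ Q(x) \;=\; \prod_{e} \Big( \sum_{j} p_{e,j}\, x^{m_{e,j}} \Big)^{n_e} \;=\; \sum_{k} P_k\, x^{M_k} \]

Here \( n_e \) is the number of atoms of element \( e \) in the molecular formula, and \( p_{e,j} \) and \( m_{e,j} \) are the abundance and mass of its \( j \)-th isotope. After expansion and collection of like powers, the coefficient \( P_k \) is the total probability of the isotopic variants with molecular mass \( M_k \). The number of distinct terms grows quickly with the atom counts \( n_e \), which is one way to see why naive expansion becomes costly for large molecules.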
Algorithm 1. Computes Coarse-Grained Isotopic Distribution
By summing only the contributions bounded by \( \mathcal{B} \) and \( \mathcal{U} \), we direct the calculations to the relevant part of the ID. This has a counterpart in FT-based methods, namely the heterodyning described in [24].
Algorithm 2. Computes Fine-Grained Isotopic Distribution
2.2 MIDAs Fast Fourier Transform Algorithm (MIDAs^{ b })
The MIDAs^{ b } algorithm is similar to an early FFT algorithm by Rockwood et al. [19], which was implemented in a computer program called Mercury. These two algorithms differ, however, in a few aspects. First, using the exact isotopic masses in the discrete FFT (DFFT) [39, 40], Mercury produces IDs with leakage (assigning nonzero probabilities to masses where exactly zero probability is expected) and employs an apodization function to minimize it [41]. On the other hand, by assigning each isotope mass to a point on a fixed grid, MIDAs^{ b } avoids the leakage problem. Using discrete masses to avoid leakage is not new: Rockwood and Van Orden [32] have written a computer program, whose latest version is called Mercury5, to compute IDs based on the nucleon numbers (that is, roughly on a one-dalton mass grid). The improvement we made was to allow users to specify a mass accuracy other than 1 Da. Second, Mercury uses a fixed number of sample points with the DFFT, whereas in MIDAs^{ b } the number of sample points used depends on the mass accuracy, which is a parameter adjustable by the user.
Every FFT based method relies on the convolution theorem, which states that a convolution can be performed as multiplication in the Fourier domain.
As we shall discuss in the Appendix, there are two key conditions in order for the convolution theorem to be used in the discrete case while computing IDs. The first one is that the masses of each isotope must lie on grid points. Using a mass that is not on the grid causes the “leakage” phenomenon [41]. If the masses considered all reside on grid points, the leakage problem no longer exists. The second important condition is that the mass domain must be large enough so that the “folded-back” phenomenon (which is also known as “aliasing”, “fold over”, or “wrap around” in the signal processing community) near the tail of the distribution is negligible (see Appendix).
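The two conditions above can be made concrete with a minimal sketch that places isotopes on a 1 Da grid (nucleon offsets from the lightest isotope) and compares a Fourier-domain product against direct convolution. This is an illustration only and not the MIDAs^{ b } implementation: the helper names, the fixed 1 Da grid, the hand-written radix-2 FFT, and the choice of glucose as a test molecule are all assumptions.

```python
import cmath

# Isotope probabilities on a 1 Da grid; index = nucleons above the lightest isotope.
# Values are natural abundances expressed as fractions.
ABUNDANCES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "O": [0.99757, 0.00038, 0.00205],
}


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def distribution_fft(formula, n_points):
    """Multiply element spectra in the Fourier domain, then transform back."""
    spectrum = [1 + 0j] * n_points
    for element, count in formula.items():
        probs = ABUNDANCES[element]
        grid = probs + [0.0] * (n_points - len(probs))  # every mass sits on a grid point
        transformed = fft(grid)
        spectrum = [s * t ** count for s, t in zip(spectrum, transformed)]
    return [x.real / n_points for x in fft(spectrum, inverse=True)]


def distribution_direct(formula, n_points):
    """Reference result: add one atom at a time by direct convolution."""
    dist = [1.0] + [0.0] * (n_points - 1)
    for element, count in formula.items():
        probs = ABUNDANCES[element]
        for _ in range(count):
            new = [0.0] * n_points
            for i, p in enumerate(dist):
                for j, q in enumerate(probs):
                    if i + j < n_points:
                        new[i + j] += p * q
            dist = new
    return dist


glucose = {"C": 6, "H": 12, "O": 6}  # at most 30 extra nucleons, so 64 points suffice
via_fft = distribution_fft(glucose, 64)
direct = distribution_direct(glucose, 64)
max_error = max(abs(a - b) for a, b in zip(via_fft, direct))
```

Because every isotope lies exactly on a grid point and the 64-point domain exceeds the largest possible offset, the two results should agree up to floating-point rounding. Choosing fewer points than the largest offset (for example 16 here) would make the circular convolution fold the tail back onto the low-mass end, which is the wrap-around effect described above.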
Algorithm 3. Computes Fine-Grained and Coarse-Grained Isotopic Distribution
3 Results and Discussion
Table 1. Atomic Masses and Abundances Used for the Benchmark Tests in this Paper
| Isotope | Atomic mass (Da) | Abundance (%) |
|---|---|---|
| Atomic masses and naturally occurring isotopic abundances [1] | | |
| ^{12}C | 12.0000000000 | 98.9300 |
| ^{13}C | 13.0033548378 | 1.0700 |
| ^{1}H | 1.0078250321 | 99.9885 |
| ^{2}H | 2.0141017780 | 0.0115 |
| ^{14}N | 14.0030740052 | 99.6320 |
| ^{15}N | 15.0001088984 | 0.3680 |
| ^{16}O | 15.9949146 | 99.7570 |
| ^{17}O | 16.9991312 | 0.0380 |
| ^{18}O | 17.9991603 | 0.2050 |
| ^{32}S | 31.97207070 | 94.9300 |
| ^{33}S | 32.97145843 | 0.7600 |
| ^{34}S | 33.96786665 | 4.2900 |
| ^{36}S | 35.96708062 | 0.0200 |
| ^{196}Hg | 195.965833 | 0.15 |
| ^{198}Hg | 197.966769 | 9.97 |
| ^{199}Hg | 198.968279 | 16.87 |
| ^{200}Hg | 199.968326 | 23.10 |
| ^{201}Hg | 200.970302 | 13.18 |
| ^{202}Hg | 201.970643 | 29.86 |
| ^{204}Hg | 203.973493 | 6.87 |
| Atomic masses and enriched carbon’s isotopic abundances | | |
| ^{12}C | 12.0000000000 | 1.0000 |
| ^{13}C | 13.0033548378 | 99.0000 |
Table 2. Molecules for which the Isotopic Distribution was Computed by Various Methods
| No.^{1} | Molecular formula | Lightest Mass (Da)^{2} | Average Mass (Da) |
|---|---|---|---|
| (1) | C_{50}H_{71}N_{13}O_{12} | 1045.5345145467 | 1046.1811074558 |
| (2) | C_{254}H_{377}N_{65}O_{75}S_{6} | 5729.6008666397 | 5733.5107592120 |
| (3) | C_{520}H_{817}N_{139}O_{147}S_{8} | 11616.8493497485 | 11624.4487510271 |
| (4) | C_{744}H_{1224}N_{210}O_{222}S_{5} | 16812.9547750824 | 16823.3213522608 |
| (5) | C_{2023}H_{3208}N_{524}O_{619}S_{20} | 45387.0070331016 | 45415.6793695079 |
| (6) | C_{2934}H_{4615}N_{781}O_{897}S_{39} | 66389.8624747027 | 66432.4555603617 |
| (7) | C_{5047}H_{8014}N_{1338}O_{1495}S_{48} | 112823.8795468070 | 112895.1259319964 |
| (8) | C_{8574}H_{13378}N_{2092}O_{2392}S_{77} | 186386.7992654122 | 186506.0525933526 |
| (9) | C_{17600}H_{26474}N_{4752}O_{5486}S_{197} | 398470.3669960258 | 398722.9724824960 |
| (10) | C_{23832}H_{37816}N_{6528}O_{7031}S_{170} | 533403.4750914392 | 533735.2146493989 |
| (11) | C_{5}H_{5} | 65.0391251605 | 65.0933832534 |
| (12) | C_{10}H_{10} | 130.0782503209 | 130.1867665069 |
| (13) | C_{50}H_{50} | 650.3912516049 | 650.9338325345 |
| (14) | C_{100}H_{100} | 1300.7825032099 | 1301.8676650690 |
| (15) | C_{1000}H_{1000} | 13007.8250320999 | 13018.6766506902 |
| (16) | C_{10000}H_{10000} | 130078.2503209999 | 130186.7665069023 |
| (17) | C_{20000}H_{20000} | 260156.5006419999 | 260373.5330138047 |
| (18) | C_{30000}H_{30000} | 390234.7509629999 | 390560.2995207072 |
| (19) | C_{40000}H_{40000} | 520313.0012839999 | 520747.0660276095 |
| (20) | C_{50000}H_{50000} | 650391.2516049999 | 650933.8325345119 |
| (21) | S_{20000} | 639441.4139999999 | 641321.6938997399 |
| (22) | Hg_{5000} | 159860.3534999999 | 160330.4234749349 |
| (23) | Hg_{1000}S_{1000} | 227937.9037000000 | 232665.2510595869 |
| (24) | S_{1000}C_{1000}H_{1000} | 44979.8957320999 | 45084.7613456772 |
| (25) | Hg_{1000}C_{1000}H_{1000} | 208973.6580321000 | 213617.8430152902 |
3.1 Overview of Methods Benchmarked
MIDAs’s performance was evaluated against eight published methods: Mercury [19], Mercury5 [32], JFC [34], Isotope Calculator (IC) [33], Qmass [20], BRAIN [18, 31, 43], NeutronCluster (NC) [17], and Emass [21]. The first three published methods are Fourier-transform-based, IC utilizes a divide-and-recursively-combine algorithm, Qmass has its core based on FFT, BRAIN and NeutronCluster are polynomial-based, whereas Emass is based on a direct convolution approach related to the stepwise procedure and its improvement [36, 37]. BRAIN, Qmass, NC, Emass, JFC, and Mercury5 all use nucleon numbers to classify a molecule’s isotopic variants, and all but the last assign to a given nucleon number the average isotopic mass of all variants of that nucleon number.
IC is suitable for computing FGIDs, not CGIDs. Qmass, BRAIN, NeutronCluster, and Emass are suitable for computing CGIDs, not FGIDs. The remaining three Fourier-transform-based methods are also suitable for computing CGIDs, although Mercury is the only one that has FGID computing capacity. To benchmark the FGIDs computed by MIDAs against those of Mercury, however, would require post-processing of Mercury data files, such as removing noise from leakage and rounding errors, as well as compiling output from different specified molecular masses. Each of these steps can be done in different ways, which would make the benchmark test less meaningful. For these reasons, we only evaluated MIDAs’s FGIDs against those of IC, not those of Mercury.
3.2 Benchmarking of Computed CGIDs
Following previous publications [18, 19, 24], the accuracy of a method is gauged by how accurately it yields ID mean, ID standard deviation, lightest and heaviest molecular masses, while computing a CGID. In our evaluation, the lightest mass and heaviest molecular mass are defined as a molecule’s molecular mass computed using the masses of the lightest and heaviest isotopes, respectively.
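The exact reference values used in this evaluation follow directly from the molecular formula and the isotope data of Table 1. The sketch below computes the lightest, heaviest, and average molecular masses for molecule (1) of Table 2; the function name `reference_masses` and the dictionary layout are assumptions made for illustration.

```python
# (mass in Da, abundance as a fraction) for each isotope, taken from Table 1.
ISOTOPES = {
    "C": [(12.0000000000, 0.9893), (13.0033548378, 0.0107)],
    "H": [(1.0078250321, 0.999885), (2.0141017780, 0.000115)],
    "N": [(14.0030740052, 0.99632), (15.0001088984, 0.00368)],
    "O": [(15.9949146, 0.99757), (16.9991312, 0.00038), (17.9991603, 0.00205)],
}


def reference_masses(formula, isotopes=ISOTOPES):
    """Return (lightest, heaviest, average) molecular mass in Da."""
    # Lightest/heaviest: every atom takes its lightest/heaviest isotope.
    lightest = sum(n * min(m for m, _ in isotopes[e]) for e, n in formula.items())
    heaviest = sum(n * max(m for m, _ in isotopes[e]) for e, n in formula.items())
    # Average: every atom contributes its abundance-weighted mean isotope mass.
    average = sum(n * sum(m * p for m, p in isotopes[e]) for e, n in formula.items())
    return lightest, heaviest, average


molecule_1 = {"C": 50, "H": 71, "N": 13, "O": 12}  # molecule (1) in Table 2
lightest, heaviest, average = reference_masses(molecule_1)
```

Passing a different `isotopes` table, for instance one with enriched carbon abundances, gives the corresponding reference values for customized isotopic abundances.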
Table 3. Coarse-Grained Isotopic Distribution Results using Naturally Occurring Isotopes

Difference in lightest mass
| No.^{1} | MIDAs^{ a } | MIDAs^{ b } | BRAIN | Emass | Mercury | Mercury5 | NC | Qmass | JFC |
|---|---|---|---|---|---|---|---|---|---|
| (1) | 0 | -3.4e-05 | -2.6e-10 | 2.2e-13 | 7.0 | 7.3 | 0 | 12.2 | 0 |
| (2) | 0 | -1.7e-03 | -1.3e-09 | 0 | 12.0 | 12.1 | 0 | 15.9 | 0 |
| (3) | 0 | -2.6e-03 | -2.8e-09 | 0 | 8.0 | 8.4 | 0 | 18.2 | -1.0e-10 |
| (4) | 0 | -2.1e-03 | -4.2e-09 | 0 | 22.0 | 21.6 | -360 | 39.1 | 8.0e-10 |
| (5) | 7.2e-12 | -7.4e-03 | 1.0e-08 | 1.4e-11 | 2.9 | 3.3 | 0 | 0.045 | -1.6e-01 |
| (6) | -1.4e-11 | -5.0 | -1.6e-08 | 0 | 22.0 | 21.3 | 0 | 65.2 | -4.6 |
| (7) | 1.5e-11 | -19.1 | -2.7e-08 | -8.0 | -8.0 | -7.2 | 0 | -69.7 | -18.1 |
| (8) | 0 | -49.1 | -4.1e-08 | -31.1 | -55.2 | -55.3 | 0 | -90.7 | -48.8 |
| (9) | -5.8e-11 | -147.4 | -1.1e-07 | -114.3 | -124.3 | -124.7 | 0 | -188.3 | -118.9 |
| (10) | 0 | -210.6 | -1.2e-07 | -172.5 | -203.7 | -203.7 | 0 | -355.6 | -146.4 |

Difference in heaviest isotopic mass
| No. | MIDAs^{ a } | MIDAs^{ b } | BRAIN | Emass | Mercury | Mercury5 | NC | Qmass | JFC |
|---|---|---|---|---|---|---|---|---|---|
| (1) | 7.4e+01 | 1.4e+02 | 1.5e+02 | 1.4e+02 | 1.5e+02 | 1.5e+02 | 1.5e+02 | 1.4e+02 | 1.4e+02 |
| (2) | 7.2e+02 | 8.4e+02 | 8.7e+02 | 8.4e+02 | 8.5e+02 | 8.5e+02 | 8.6e+02 | 8.4e+02 | 8.4e+02 |
| (3) | 1.6e+03 | 1.8e+03 | 1.8e+03 | 1.7e+02 | 1.8e+03 | 1.8e+03 | 1.8e+03 | 1.8e+03 | 1.8e+03 |
| (4) | 2.5e+03 | 2.6e+03 | 2.6e+03 | 2.6e+03 | 2.6e+03 | 2.6e+03 | 2.3e+03 | 2.6e+03 | 2.6e+03 |
| (5) | 6.8e+03 | 7.0e+03 | 7.0e+03 | 7.0e+03 | 7.0e+03 | 7.0e+04 | 7.0e+03 | 6.9e+03 | 7.0e+03 |
| (6) | 9.9e+03 | 1.0e+04 | 1.0e+04 | 1.0e+04 | 1.0e+04 | 1.0e+04 | 1.0e+04 | 1.0e+04 | 1.0e+04 |
| (7) | 1.7e+04 | 1.7e+04 | 1.7e+04 | 1.7e+04 | 1.7e+04 | 1.7e+04 | 1.7e+04 | 1.8e+04 | 1.7e+04 |
| (8) | 2.9e+04 | 2.9e+04 | 2.9e+04 | 2.9e+04 | 2.9e+04 | 2.9e+04 | 2.9e+04 | 2.9e+04 | 2.9e+04 |
| (9) | 6.0e+04 | 6.0e+04 | 6.0e+04 | 6.0e+04 | 6.0e+04 | 6.0e+04 | 6.0e+04 | 6.0e+04 | 6.0e+04 |
| (10) | 8.2e+04 | 8.3e+04 | 8.3e+04 | 8.3e+04 | 8.3e+04 | 8.3e+04 | 8.3e+04 | 8.2e+04 | 8.2e+04 |
The software NC reports correct lightest masses for nine out of the 10 molecules. For biomolecule number four, NC reports a mass that is 360 Da heavier. This same result has also been observed independently by others [17, 18].
For MIDAs^{ a }, BRAIN, and Emass, the differences between exact and computed lightest masses, for small and medium size biomolecules [numbered (1)–(6)], are smaller than 1.0e–08 Da. As for JFC and MIDAs^{ b }, although they do not perform as well as the polynomial-based methods above, they are not inferior to other Fourier-transform-based methods such as Mercury and Mercury5. When the biomolecules become heavier [say molecules numbered (7)–(10)], the chance of experimentally observing the exact lightest masses rapidly decreases, and the computed difference between exact and computed lightest masses becomes less important.
The evaluation of getting the correct heaviest mass is not as important under natural conditions. This is because heavy isotopes typically carry very low natural occurrence probabilities so that it is impossible to observe the exact heaviest isotopic variant of the molecule. Of course, when artificial isotopic abundances are enforced, obtaining the correct heaviest masses can become important, while obtaining the correct lightest masses can become unimportant. Since the current evaluation is using the natural isotopic abundances, we do not expect any method to provide correct heaviest masses. Indeed, because most methods are computing terms of an ID that are concentrated around a molecule’s average molecular mass, which is closer to the exact lightest mass under natural isotopic abundances, the mass range used for computing IDs usually will not include the heaviest masses. For biomolecules numbered (1)–(10), the differences between the exact heaviest masses and the heaviest masses computed by all methods considered are all of the same order of magnitude.
Table 4. Coarse-Grained Isotopic Distribution Results using Naturally Occurring Isotopes

Difference in average mass
| No.^{1} | MIDAs^{ a } | MIDAs^{ b } | BRAIN | Emass | Mercury | Mercury5 | NC | Qmass | JFC |
|---|---|---|---|---|---|---|---|---|---|
| (1) | 2.3e-13 | 6.8e-13 | 4.4e-03 | -4.5e-13 | 9.1e-05 | -2.9e-05 | 6.51e-05 | -4.6e-13 | 2.3e-12 |
| (2) | 1.8e-12 | 3.5e-11 | 3.2e-01 | -1.8e-12 | 8.1e-04 | 7.6e-05 | 3.7e-03 | -7.3e-12 | 3.6e-12 |
| (3) | -5.4e-12 | -5.4e-12 | 8.2e-02 | -3.6e-12 | 2.2e-04 | -5.1e-05 | 5.9e-03 | 5.4e-12 | 0 |
| (4) | -7.3e-12 | 2.0e-10 | 4.6e-02 | 0 | 2.9e-03 | -7.3e-04 | -360 | 5.8e-11 | 7.3e-12 |
| (5) | 4.3e-11 | 2.6e-10 | 1.4e-04 | 7.3e-12 | -3.1e-03 | 1.8e-04 | 3.7e-03 | -7.3e-12 | -2.9e-11 |
| (6) | 0 | 1.3e-10 | 1.7e-06 | -5.8e-11 | -4.1e-03 | 3.1e-03 | -8.5e-04 | 4.2e-09 | 1.2e-10 |
| (7) | 4.3e-11 | -2.4e-09 | -2.7e-08 | -1.4e-11 | -4.1e-03 | 2.1e-03 | -3.9e-03 | -1.3e-10 | 5.8e-11 |
| (8) | -2.9e-11 | 1.6e-09 | -4.1e-08 | 0 | -5.1e-03 | -7.9e-03 | -1.0e-02 | -7.5e-01 | 2.6e-10 |
| (9) | -3.5e-10 | 7.2e-09 | -1.1e-07 | -3.5e-10 | 1.7e-02 | 7.7e-03 | -4.0e-02 | -9.7e-03 | 7.6e-10 |
| (10) | -1.2e-10 | 7.6e-09 | -1.2e-07 | -1.2e-10 | -1.1e-01 | 3.3e-02 | -3.8e-02 | -4.4e+02 | -4.6e-10 |

Difference in standard deviation
| No.^{1} | MIDAs^{ a } | MIDAs^{ b } | BRAIN | Emass | Mercury | Mercury5 | NC | Qmass | JFC |
|---|---|---|---|---|---|---|---|---|---|
| (1) | 1.1e-06 | -4.2e-10 | 1.2e-02 | 1.1e-06 | 1.6e-06 | -1.2e-04 | -3.6e-04 | 1.1e-06 | 1.1e-06 |
| (2) | 6.5e-06 | -5.0e-09 | 3.6e-01 | 6.5e-06 | 1.2e-06 | -1.8e-04 | -1.2e-03 | 6.5e-06 | 6.5e-06 |
| (3) | 8.0e-06 | 1.5e-08 | 1.2e-01 | 8.0e-06 | 9.8e-05 | -3.3e-05 | 2.2e-03 | 7.9e-06 | 8.0e-06 |
| (4) | 7.2e-06 | 2.5e-08 | 7.3e-02 | 7.2e-06 | -3.0e-07 | -4.6e-04 | -4.5e-02 | 7.0e-06 | 7.1e-06 |
| (5) | 1.3e-05 | 1.8e-07 | 3.7e-04 | 1.3e-05 | 9.7e-06 | -2.7e-04 | -1.8e-03 | 1.3e-05 | 1.3e-05 |
| (6) | 1.8e-05 | -1.9e-07 | 2.3e-05 | 1.8e-05 | -3.9e-07 | -9.0e-04 | -8.4e-03 | -2.7e-06 | 1.7e-05 |
| (7) | 2.0e-05 | -8.0e-07 | 2.1e-05 | 2.0e-05 | -2.7e-07 | -7.1e-04 | -7.5e-03 | 2.2e-05 | 2.0e-05 |
| (8) | 2.5e-05 | 2.1e-06 | 2.5e-05 | 2.5e-05 | 4.4e-06 | -5.4e-04 | -8.7e-03 | -4.8e+00 | 2.6e-05 |
| (9) | 4.2e-05 | -7.8e-06 | 4.1e-05 | 4.5e-05 | -5.9e-07 | -1.5e-03 | -5.2e-03 | -9.9e-02 | 3.9e-05 |
| (10) | 4.8e-04 | -1.0e-05 | 5.0e-05 | 5.4e-05 | -1.2e-05 | -1.3e-03 | 9.6e-03 | -1.4e+02 | 3.8e-05 |
We have also considered the possibility of deviations from the natural frequencies of occurrence of an element’s isotopes. Such customized modifications can be accomplished experimentally by a technique known as isotopic labeling [3], which is frequently employed in quantitative proteomics [44]. To mimic such a situation, we have computed CGIDs for various molecules assuming different carbon isotopic abundances: 99% ^{13}C and 1% ^{12}C as listed in Table 1. We then derive from the computed CGIDs the average molecular masses and standard deviations, and compare them to the corresponding theoretical values that can be analytically calculated.
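The analytical reference values mentioned above follow from treating each atom as an independent draw from its element’s isotope distribution. The block below is a sketch under that assumption; the symbols \( n_e \), \( p_{e,j} \), and \( m_{e,j} \) (atom count of element \( e \), and abundance and mass of its \( j \)-th isotope) are introduced here for illustration.

\[ \mu \;=\; \sum_{e} n_e \sum_{j} p_{e,j}\, m_{e,j}, \qquad \sigma^{2} \;=\; \sum_{e} n_e \Big[ \sum_{j} p_{e,j}\, m_{e,j}^{2} - \Big( \sum_{j} p_{e,j}\, m_{e,j} \Big)^{2} \Big] \]

For an element with only two isotopes, the bracketed per-atom variance reduces to \( p(1-p)(\Delta m)^{2} \), where \( \Delta m \) is the mass difference between the isotopes. With the enriched carbon abundances (99% ^{13}C, 1% ^{12}C), each carbon atom therefore contributes a mean of \( 0.01 \times 12 + 0.99 \times 13.0033548378 \approx 12.99332 \) Da and a variance of \( 0.0099 \times (1.0033548378)^{2} \approx 0.009967 \) Da\(^2\).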
Table 5. Coarse-Grained Isotopic Distribution Evaluation using Abundances for Carbon’s Isotopes of 99% ^{13}C and 1% ^{12}C

Difference in average mass
| No.^{1} | MIDAs^{ a } | MIDAs^{ b } | BRAIN | Emass | Mercury | Mercury5 | NC | Qmass | JFC |
|---|---|---|---|---|---|---|---|---|---|
| (1) | -6.8e-13 | 3.9e-12 | -1.7e+01 | 2.3e-13 | 5.0e-05 | -4.0e-05 | 3.3e-02 | -2.3e-13 | 1.6e-12 |
| (2) | 0 | 4.5e-11 | NR^{2} | 3.6e-12 | 3.3e-04 | 7.3e-05 | 1.7e-01 | -1.8e-12 | 4.5e-12 |
| (3) | 1.8e-12 | 2.2e-10 | NR | -1.8e-12 | -6.3e-04 | -3.1e-04 | 3.1e-01 | -6.4e-11 | -2.4e-11 |
| (4) | -7.3e-12 | 4.4e-11 | NR | 0 | 3.9e-03 | -2.5e-04 | NR | -1.1e-11 | -1.1e-11 |
| (5) | -2.9e-11 | -5.8e-11 | NR | 7.3e-12 | -5.2e-03 | 6.4e-04 | 1.8e+03 | -4.1e-07 | -7.3e-12 |
| (6) | 5.8e-11 | -1.0e-10 | NR | 2.9e-11 | -5.4e-03 | 2.1e-03 | 2.7e+03 | -2.9e-11 | 2.9e-11 |
| (7) | 1.4e-11 | 6.8e-10 | NR | -7.3e-11 | -8.7e-04 | 6.6e-04 | 4.8e+03 | -4.9e-07 | 1.4e-10 |
| (8) | 3.2e-11 | -3.5e-10 | NR | 8.7e-11 | -8.1e-03 | 6.4e-03 | 8.4e+03 | -1.1e+02 | 1.2e-10 |
| (9) | 0 | 4.2e-09 | NR | 0 | -3.7e-03 | 8.7e-03 | 1.7e+04 | -1.5e+02 | -4.1e-10 |
| (10) | 2.3e-10 | 7.7e-09 | NR | 8.15e-10 | -1.2e-01 | 5.4e-03 | 2.4e+04 | -5.1e+02 | -1.2e-10 |

Difference in standard deviation
| No.^{1} | MIDAs^{ a } | MIDAs^{ b } | BRAIN | Emass | Mercury | Mercury5 | NC | Qmass | JFC |
|---|---|---|---|---|---|---|---|---|---|
| (1) | 7.9e-07 | -3.3e-10 | -1.4e+01 | 7.9e-07 | -2.6e-08 | -1.2e-04 | -4.9e+00 | 7.4e-07 | 7.9e-07 |
| (2) | 5.9e-06 | 5.9e-09 | NR^{2} | 5.9e-06 | 2.2e-07 | -1.9e-04 | -1.1e+01 | 5.9e-06 | 5.9e-06 |
| (3) | 7.6e-06 | 1.4e-08 | NR | 7.6e-06 | 8.2e-06 | -1.3e-04 | -1.6e+01 | 7.6e-06 | 7.6e-06 |
| (4) | 7.1e-06 | 1.4e-08 | NR | 7.1e-06 | -1.7e-07 | -4.7e-04 | NR | 7.1e-06 | 7.1e-06 |
| (5) | 1.3e-05 | -2.0e-07 | NR | 1.3e-05 | 3.6e-07 | -2.8e-04 | 5.2e+00 | 1.2e-05 | 1.3e-05 |
| (6) | 1.7e-05 | -5.7e-07 | NR | 1.7e-05 | -1.2e-07 | -9.2e-04 | 6.6e+00 | 1.8e-05 | 1.8e-05 |
| (7) | 2.1e-05 | -1.1e-06 | NR | 2.1e-05 | -6.9e-09 | -7.2e-04 | 8.6e+00 | 1.9e-05 | 2.0e-05 |
| (8) | 2.2e-05 | 6.3e-07 | NR | 2.4e-05 | -1.7e-06 | -5.5e-04 | 1.1e+01 | -1.1e+02 | 2.5e-05 |
| (9) | 4.0e-05 | 1.0e-06 | NR | 4.4e-05 | -3.5e-06 | -1.5e-03 | 1.6e+01 | -2.0e+02 | 5.1e-05 |
| (10) | 3.3e-05 | 1.0e-05 | NR | 3.5e-05 | -1.5e-05 | -1.4e-03 | 1.9e+01 | -2.3e-04 | 6.8e-05 |
3.3 Assessing Fidelity of Computed CGIDs and FGIDs
To evaluate the fidelity of the CGIDs and FGIDs reported, we used 10 hydrocarbon molecules [numbered (11)–(20) in Table 2] because the “exact” CGIDs and FGIDs can be calculated for these molecules. The exact CGID is defined as follows: one merges isotopic variants that have the same nucleon number into one aggregated isotopic variant, whose corresponding molecular mass (MM) and occurrence probability are computed, respectively, from the probability-weighted sum of masses and from the sum of the probabilities of the isotopic variants merged. Only aggregated isotopic variants having probability greater than 5e-12 were retained for accuracy evaluation. The exact FGIDs were defined similarly to the exact CGIDs, except that one merges only isotopic variants whose molecular mass differences are within some pre-specified mass accuracy, here set to 0.01 Da. For typical sample loads, the probability cutoff of 5e-12 probably already surpasses the detection capability of current mass spectrometers. Furthermore, it is small enough that ignoring terms below the cutoff has a negligible effect on the ID profile.
For CGIDs, ε is set to one Da, while for FGIDs, ε is set to 0.01 Da.
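For a hydrocarbon, the exact CGID defined in this subsection can be obtained from two binomial distributions, one for the number of ^{13}C atoms and one for the number of ^{2}H atoms. The sketch below is illustrative only: the function name `exact_cgid` is hypothetical, the aggregated mass is taken as the probability-weighted mean of the merged variants, and the plain double loop is only practical for small molecules.

```python
from math import comb

# Isotope masses (Da) and heavy-isotope abundances (fractions) from Table 1.
M_C12, M_C13 = 12.0000000000, 13.0033548378
M_H1, M_H2 = 1.0078250321, 2.0141017780
P_C13, P_H2 = 0.0107, 0.000115


def exact_cgid(n_carbon, n_hydrogen, cutoff=5e-12):
    """Return [(mass, probability)] aggregated by nucleon number."""
    terms = {}  # extra nucleons -> (summed probability, probability-weighted mass sum)
    for i in range(n_carbon + 1):  # i = number of 13C atoms
        p_c = comb(n_carbon, i) * P_C13 ** i * (1 - P_C13) ** (n_carbon - i)
        for j in range(n_hydrogen + 1):  # j = number of 2H atoms
            p = p_c * comb(n_hydrogen, j) * P_H2 ** j * (1 - P_H2) ** (n_hydrogen - j)
            mass = ((n_carbon - i) * M_C12 + i * M_C13
                    + (n_hydrogen - j) * M_H1 + j * M_H2)
            prob_sum, mass_sum = terms.get(i + j, (0.0, 0.0))
            terms[i + j] = (prob_sum + p, mass_sum + p * mass)
    # Keep only aggregated variants above the cutoff; mass is the weighted mean.
    return [(mass_sum / prob_sum, prob_sum)
            for _, (prob_sum, mass_sum) in sorted(terms.items())
            if prob_sum > cutoff]


c5h5 = exact_cgid(5, 5)      # molecule (11)
c10h10 = exact_cgid(10, 10)  # molecule (12)
```

With the 5e-12 cutoff, this sketch retains six terms for C_{5}H_{5} and seven for C_{10}H_{10}, in line with the τ values reported for molecules (11) and (12) in the CGID fidelity results.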
Table 6. Coarse-grained isotopic distribution (CGID) fidelity assessment results. τ is the number of terms in the exact CGID having probability greater than 5e-12; Δτ is the difference between τ and the number of terms of a computed CGID; Δχ is the difference between the sum of probability terms from the exact CGID and the sum of probability terms from the computed CGID; σ _{ m } is the root-mean-square difference of masses between exact and computed CGID, see Equation (11); U is the number of terms from the computed CGID that are not within ±2ε (ε = 1 Da) of any term in the exact CGID; E is the number of terms in the exact CGID that have at least one corresponding term in the computed CGID within ±2ε; ρ is the weighted correlation between computed and exact CGID
| No.^{1} | τ | Δτ | Δχ | σ _{ m } | U | E | ρ | Method |
|---|---|---|---|---|---|---|---|---|
| (11) | 6 | 0 | -5.6e-16 | 5.1e-13 | 0 | 6 | 1.0 | MIDAs^{ a } |
| | | 0 | -7.4e-15 | 2.2e-04 | 0 | 6 | 1.0 | MIDAs^{ b } |
| | | 0 | -4.7e-04 | 1.2e-14 | 0 | 6 | 0.99999988 | Emass |
| | | 0 | -4.4e-16 | 2.1e-05 | 0 | 6 | 1.0 | JFC |
| (12) | 7 | 0 | -8.9e-16 | 7.7e-12 | 0 | 7 | 1.0 | MIDAs^{ a } |
| | | 0 | 7.8e-16 | 7.5e-05 | 0 | 7 | 1.0 | MIDAs^{ b } |
| | | 0 | -8.3e-04 | 2.6e-14 | 0 | 7 | 0.99999963 | Emass |
| | | 0 | -5.6e-16 | 1.3e-05 | 0 | 7 | 1.0 | JFC |
| (13) | 12 | 0 | -5.0e-15 | 1.5e-11 | 0 | 12 | 1.0 | MIDAs^{ a } |
| | | 0 | -2.2e-14 | 3.3e-05 | 0 | 12 | 1.0 | MIDAs^{ b } |
| | | 0 | -1.6e-03 | 7.3e-14 | 0 | 12 | 0.99999885 | Emass |
| | | 0 | -5.0e-15 | 8.3e-04 | 0 | 12 | 1.0 | JFC |
| (14) | 15 | 0 | -1.0e-14 | 1.2e-11 | 0 | 15 | 1.0 | MIDAs^{ a } |
| | | 0 | 1.5e-14 | 2.3e-05 | 0 | 15 | 1.0 | MIDAs^{ b } |
| | | 0 | -1.3e-03 | 1.9e-13 | 0 | 15 | 0.99999957 | Emass |
| | | 0 | -1.2e-14 | 3.6e-03 | 0 | 15 | 1.0 | JFC |
| (15) | 40 | 0 | -9.8e-14 | 1.0e-11 | 0 | 40 | 1.0 | MIDAs^{ a } |
| | | 0 | 1.3e-12 | 6.3e-06 | 0 | 40 | 1.0 | MIDAs^{ b } |
| | | 0 | -5.6e-04 | 3.2e-12 | 0 | 40 | 0.99999996 | Emass |
| | | 0 | -9.7e-14 | 2.5e-03 | 0 | 40 | 1.0 | JFC |
| (16) | 139 | 0 | -9.6e-13 | 4.4e-10 | 0 | 139 | 1.0 | MIDAs^{ a } |
| | | 0 | 3.8e-12 | 6.3e-06 | 0 | 139 | 1.0 | MIDAs^{ b } |
| | | 0 | -1.9e-04 | 5.2e-11 | 0 | 139 | 1.0 | Emass |
| | | 0 | -6.6e-13 | 4.1e-02 | 0 | 139 | 1.0 | JFC |
| (17) | 195 | 0 | -2.0e-12 | 5.5e-10 | 0 | 195 | 1.0 | MIDAs^{ a } |
| | | 0 | 1.3e-12 | 6.3e-06 | 0 | 195 | 1.0 | MIDAs^{ b } |
| | | 0 | -1.3e-04 | 1.3e-10 | 0 | 195 | 1.0 | Emass |
| | | 0 | -1.9e-12 | 6.1e-02 | 0 | 195 | 1.0 | JFC |
| (18) | 238 | 0 | -3.0e-12 | 9.4e-10 | 0 | 238 | 1.0 | MIDAs^{ a } |
| | | 0 | 2.5e-11 | 6.4e-06 | 0 | 238 | 1.0 | MIDAs^{ b } |
| | | 0 | -1.1e-04 | 2.3e-10 | 0 | 238 | 1.0 | Emass |
| | | 0 | -2.6e-12 | 6.0e-02 | 0 | 238 | 1.0 | JFC |
| (19) | 274 | 0 | -4.1e-12 | 1.2e-09 | 0 | 274 | 1.0 | MIDAs^{ a } |
| | | 1 | 2.5e-11 | 6.1e-02 | 0 | 274 | 1.0 | MIDAs^{ b } |
| | | 0 | -9.5e-05 | 3.0e-10 | 0 | 274 | 1.0 | Emass |
| | | 0 | -5.4e-12 | 5.9e-02 | 0 | 274 | 1.0 | JFC |
| (20) | 306 | 0 | -4.8e-12 | 1.6e-09 | 0 | 306 | 1.0 | MIDAs^{ a } |
| | | 0 | 2.6e-11 | 6.8e-06 | 0 | 306 | 1.0 | MIDAs^{ b } |
| | | 0 | -8.5e-05 | 4.2e-10 | 0 | 306 | 1.0 | Emass |
| | | 0 | -4.6e-12 | 5.6e-02 | 0 | 306 | 1.0 | JFC |
Table 7. Fine-grained isotopic distribution (FGID) fidelity assessment results. τ is the number of terms in the exact FGID having probability greater than 5e-12; Δτ is the difference between τ and the number of terms of a computed FGID; Δχ is the difference between the sum of probability terms from the exact FGID and the sum of probability terms from the computed FGID; σ _{ m } is the root-mean-square difference of masses between exact and computed FGID, see Equation (11); U is the number of terms from the computed FGID that are not within ±2ε (ε = 0.01 Da) of any term in the exact FGID; E is the number of terms in the exact FGID that have at least one corresponding term in the computed FGID within ±2ε; ρ is the weighted correlation between computed and exact FGID
| No.^{1} | (ppm)^{2} | τ | Δτ | Δχ | σ _{ m } | U | E | ρ | Method |
|---|---|---|---|---|---|---|---|---|---|
| (11) | 307.25 | 6 | 0 | -5.6e-16 | 1.1e-09 | 0 | 6 | 1.0 | MIDAs^{ a } |
| | | | 0 | 8.9e-14 | 7.3e-05 | 0 | 6 | 1.0 | MIDAs^{ b } |
| | | | 0 | 1.5e-08 | 1.8e-09 | 0 | 6 | 1.0 | IC |
| (12) | 153.63 | 7 | 0 | -2.7e-15 | 4.2e-10 | 0 | 7 | 1.0 | MIDAs^{ a } |
| | | | 0 | 2.3e-12 | 1.6e-05 | 0 | 7 | 1.0 | MIDAs^{ b } |
| | | | 0 | 3.7e-07 | 2.3e-09 | 0 | 7 | 1.0 | IC |
| (13) | 30.72 | 13 | 1 | -5.7e-13 | 3.1e-03 | 0 | 13 | 0.99999591 | MIDAs^{ a } |
| | | | 0 | 3.0e-13 | 4.4e-03 | 0 | 12 | 0.99999592 | MIDAs^{ b } |
| | | | 1 | -2.9e-07 | 3.1e-03 | 0 | 13 | 0.99999591 | IC |
| (14) | 15.36 | 16 | 3 | 2.7e-12 | 5.3e-03 | 0 | 15 | 0.99937104 | MIDAs^{ a } |
| | | | 3 | 3.6e-12 | 7.1e-03 | 0 | 15 | 0.99937227 | MIDAs^{ b } |
| | | | 4 | -3.5e-07 | 5.1e-03 | 0 | 16 | 0.99937103 | IC |
| (15) | 1.53 | 65 | -6 | -4.6e-11 | 1.7e-03 | 0 | 58 | 0.99927870 | MIDAs^{ a } |
| | | | 3 | 1.2e-11 | 4.0e-03 | 0 | 64 | 0.98806083 | MIDAs^{ b } |
| | | | 5 | 2.4e-05 | 3.6e-03 | 0 | 65 | 0.98803755 | IC |
| (16) | 0.15 | 291 | -5 | 2.6e-11 | 5.2e-03 | 0 | 257 | 0.99999001 | MIDAs^{ a } |
| | | | 53 | 6.9e-11 | 5.6e-03 | 1 | 282 | 0.99958599 | MIDAs^{ b } |
| | | | 82 | 1.4e-07 | 5.2e-03 | 0 | 280 | 0.99998237 | IC |
| (17) | 0.077 | 500 | -18 | 1.8e-10 | 4.0e-03 | 0 | 453 | 0.99785805 | MIDAs^{ a } |
| | | | 126 | 1.3e-10 | 7.3e-03 | 13 | 488 | 0.95950051 | MIDAs^{ b } |
| | | | 124 | -1.6e-08 | 4.5e-03 | 0 | 496 | 0.99951891 | IC |
| (18) | 0.051 | 715 | -16 | -5.4e-10 | 4.5e-03 | 0 | 636 | 0.99466182 | MIDAs^{ a } |
| | | | 242 | 1.5e-10 | 7.0e-03 | 10 | 681 | 0.71069880 | MIDAs^{ b } |
| | | | 19 | -5.0e-08 | 4.3e-03 | 0 | 690 | 0.99735244 | IC |
| (19) | 0.038 | 881 | 57 | 7.6e-11 | 4.8e-03 | 0 | 824 | 0.95007936 | MIDAs^{ a } |
| | | | 437 | 1.9e-10 | 7.8e-03 | 33 | 866 | 0.59270671 | MIDAs^{ b } |
| | | | -26 | -1.7e-06 | 5.9e-03 | 0 | 713 | 0.97935224 | IC |
| (20) | 0.031 | 1143 | 93 | -1.4e-10 | 5.7e-03 | 0 | 1011 | 0.85638390 | MIDAs^{ a } |
| | | | 498 | 2.2e-10 | 8.4e-03 | 47 | 1092 | 0.63960000 | MIDAs^{ b } |
| | | | -173 | -1.7e-05 | 4.6e-03 | 0 | 838 | 0.98564325 | IC |
For fidelity assessment of CGIDs, all four methods shown in Table 6 yield small Δτ and ρ values close to one. In terms of σ_{m} and Δχ, more differences are revealed. Emass always yields small σ_{m}, reflecting good fidelity in terms of mass locations, but seems to give a larger |Δχ|, reflecting less accuracy in amplitudes. JFC and MIDAs^{b} seem to yield less precise mass locations, evidenced by a larger σ_{m}, but seem to provide more accurate amplitudes, evidenced by a smaller |Δχ|. MIDAs^{a} yields both accurate mass locations and accurate amplitudes.
The values of Δχ and σ_{m} in Table 7 indicate that IC, MIDAs^{a}, and MIDAs^{b} report FGID terms with similar mass accuracy and with probability sums that are close to the expected value. For small to medium molecules, numbered (11)–(15), IC, MIDAs^{a}, and MIDAs^{b} give equivalently accurate results. For molecules numbered (16)–(20), IC and MIDAs^{a} perform comparably, and both perform slightly better than MIDAs^{b}. The values of Δτ indicate that MIDAs^{b} reports many more terms than expected in its computed FGID. Because no leakage is expected, these extra terms are mainly due to rounding errors associated with the DFFT numerical procedure.
The difference observed in Δτ for MIDAs^{a} is caused by the pruning and merging procedures employed by the algorithm. All the FGID terms computed by IC and MIDAs^{a} are within 2∈ of the exact FGID terms, as shown by the number of unexplained terms (U) being zero in Table 7. Most of the terms computed by MIDAs^{b} are also within 2∈ of the exact FGID terms, with the exception of molecules (16)–(20), where U ranges from 1 to 47. The computed weighted correlation also shows that for heavier molecules, (18)–(20), both IC and MIDAs^{a} produce FGIDs that are more similar to the exact FGIDs than those produced by MIDAs^{b}.
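The counting measures defined in the caption of Table 7 can be illustrated with a minimal sketch. It assumes that each FGID is a list of (mass, probability) pairs, that Δτ is taken as the number of computed terms minus τ (the sign that makes "more terms than expected" positive), and that Δχ is the exact sum minus the computed sum. The function name `fidelity_counts` and the toy peak lists are hypothetical; σ_{m} and ρ are omitted because their definitions are given elsewhere in the paper.

```python
def fidelity_counts(exact, computed, eps=0.01, cutoff=5e-12):
    """Compute tau, delta_tau, delta_chi, U and E for two peak lists.

    exact, computed: lists of (mass, probability) pairs.
    eps: mass accuracy in Da; the matching tolerance is 2 * eps.
    cutoff: probability threshold used to count tau in the exact FGID.
    """
    tol = 2 * eps
    exact_masses = [m for m, _ in exact]
    computed_masses = [m for m, _ in computed]

    # tau: exact terms with probability above the cutoff
    tau = sum(1 for _, p in exact if p > cutoff)
    # Assumed sign convention: computed term count minus tau
    delta_tau = len(computed) - tau
    # Assumed sign convention: exact probability sum minus computed sum
    delta_chi = sum(p for _, p in exact) - sum(p for _, p in computed)
    # U: computed terms with no exact term within +/- 2*eps
    U = sum(1 for mc in computed_masses
            if all(abs(mc - me) > tol for me in exact_masses))
    # E: exact terms with at least one computed term within +/- 2*eps
    E = sum(1 for me in exact_masses
            if any(abs(mc - me) <= tol for mc in computed_masses))
    return {"tau": tau, "delta_tau": delta_tau,
            "delta_chi": delta_chi, "U": U, "E": E}


# Toy data, not taken from the benchmark molecules
exact = [(100.000, 0.90), (101.003, 0.09), (102.007, 0.01)]
computed = [(100.001, 0.90), (101.004, 0.09), (102.050, 0.005), (103.200, 0.005)]
result = fidelity_counts(exact, computed)
print(result)
```

In this toy case the last two computed terms lie farther than 2∈ from every exact term, so they count toward U, and the third exact term has no computed partner, so it does not count toward E.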
The weaker performance of MIDAs^{b} here may be related to the fact that pinning the elemental masses to grid points can introduce appreciable mass errors when computing IDs for larger molecules. In the worst case, the mass error introduced is apparently proportional to the number of atoms contained in the molecule. Even though MIDAs^{b} employs a mass rescaling [32] to bring the computed average masses and standard deviations close to their theoretical values, this linear mass rescaling is not sufficient to guarantee that the full profile of the computed ID resembles that of the exact ID. The non-negligible discrepancy between the computed and exact FGIDs for molecules (18)–(20), indicated by a weighted correlation ρ that is not very close to one, reflects this problem.
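The worst-case growth of the pinning error can be illustrated with a small sketch. It assumes that pinning means rounding each isotope mass to the nearest multiple of the grid spacing ∈, which may differ from the exact scheme used in MIDAs^{b}, and it considers the extreme case in which every atom is the same isotope so that all per-atom errors share one sign. The function names and the approximate ^{13}C mass constant are illustrative assumptions.

```python
def pin_to_grid(mass, eps):
    """Snap a mass to the nearest multiple of the grid spacing eps (assumed scheme)."""
    return round(mass / eps) * eps


def accumulated_pinning_error(isotope_mass, n_atoms, eps):
    """Exact total mass minus pinned total mass for n_atoms identical isotopes."""
    exact_total = n_atoms * isotope_mass
    pinned_total = n_atoms * pin_to_grid(isotope_mass, eps)
    return exact_total - pinned_total


C13_MASS = 13.00335484  # approximate mass of 13C in Da

for n_atoms in (1, 10, 100, 1000):
    error = accumulated_pinning_error(C13_MASS, n_atoms, eps=0.01)
    print(n_atoms, round(error, 6))
```

Under these assumptions the per-atom error is about 0.0034 Da, so the accumulated error exceeds the 0.01 Da grid spacing after only a few atoms and keeps growing linearly with the atom count.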
3.4 MIDAs Web Interface
The MIDAs web interface (http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html) is user-friendly and, at the same time, offers considerable flexibility. For the input molecule, for example, the user may type into the box an elemental composition, a molecular formula, a peptide, or even a protein sequence. The program recognizes the input molecule in all of these formats and extracts the corresponding elemental composition for computing CGIDs and FGIDs. The isotopic abundances and the elements' masses can also be customized within the web interface: the user simply clicks on the “change” button to edit the abundance table of all elements. Other fields that the user can easily customize are the charge of the input molecule and the cutoff probability. MIDAs displays the CGID and the FGID together, each with its own user-specified accuracy. The “algorithm” drop-down box allows the user to select either the FFT or the polynomial algorithm. The output, including the lightest mass, theoretical average mass, theoretical mass standard deviation, computed average mass, computed mass standard deviation, FGID peak list, and CGID peak list, can be exported to a flat file by clicking on the “download output” button on the result page. Contextual help is also available for every functional button.
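As an illustration of the step from a typed molecular formula to an elemental composition, the following minimal sketch parses a flat formula such as C6H12O6. It is not the parser used by the MIDAs web interface, whose implementation is not described here; the function name `parse_formula` is hypothetical, and parentheses, charges, and peptide or protein sequences are deliberately not handled.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Parse a flat molecular formula such as 'C6H12O6' into element counts."""
    # Each token is an element symbol followed by an optional count
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject input containing anything the pattern did not consume
    if "".join(symbol + count for symbol, count in tokens) != formula:
        raise ValueError(f"unrecognized formula: {formula!r}")
    composition = Counter()
    for symbol, count in tokens:
        composition[symbol] += int(count) if count else 1
    return dict(composition)


glucose = parse_formula("C6H12O6")
print(glucose)
```

The resulting dictionary of element counts is the kind of elemental composition from which CGIDs and FGIDs are then computed.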
4 Conclusion and Outlook
Computation Time in Seconds (s) and Number of Terms Reported with MIDAs’s Computed Coarse-Grained (CG) and Fine-Grained (FG) Isotopic Distributions (ID) Using 1.0 Da and 0.01 Da Mass Accuracy, Respectively
MIDAs^{a}

| No.^{1} | Number of terms CGID | CGID time (s) | Number of terms FGID | FGID time (s) |
|---|---|---|---|---|
| (1) | 85 | 0.006 | 42 | 0.001 |
| (2) | 154 | 0.01 | 151 | 0.002 |
| (3) | 186 | 0.02 | 246 | 0.001 |
| (4) | 203 | 0.02 | 301 | 0.001 |
| (5) | 290 | 0.04 | 809 | 0.006 |
| (6) | 341 | 0.05 | 1269 | 0.01 |
| (7) | 423 | 0.1 | 1945 | 0.05 |
| (8) | 540 | 0.13 | 3145 | 0.06 |
| (9) | 820 | 0.14 | 6579 | 0.3 |
| (10) | 956 | 0.2 | 7850 | 0.4 |
| (21) | 3022 | 0.35 | 13834 | 0.8 |
| (22) | 5908 | 1.0 | 74994 | 1.0 |
| (23) | 2706 | 0.23 | 28508 | 2.1 |
| (24) | 617 | 0.05 | 2805 | 0.01 |
| (25) | 2623 | 0.21 | 18261 | 0.5 |

MIDAs^{b}

| No. | Number of terms CGID | CGID time (s) | Number of terms FGID | FGID time (s) |
|---|---|---|---|---|
| (1) | 15 | 0.0006 | 29 | 0.025 |
| (2) | 29 | 0.001 | 114 | 0.041 |
| (3) | 38 | 0.001 | 193 | 0.043 |
| (4) | 42 | 0.001 | 241 | 0.08 |
| (5) | 78 | 0.002 | 740 | 0.08 |
| (6) | 95 | 0.002 | 1166 | 0.14 |
| (7) | 123 | 0.004 | 1784 | 0.14 |
| (8) | 157 | 0.004 | 2953 | 0.14 |
| (9) | 230 | 0.004 | 6527 | 0.3 |
| (10) | 257 | 0.01 | 7818 | 0.3 |
| (21) | 794 | 0.005 | 12405 | 0.4 |
| (22) | 1500 | 0.01 | 74994 | 1.0 |
| (23) | 706 | 0.01 | 23367 | 0.7 |
| (24) | 188 | 0.003 | 2209 | 0.2 |
| (25) | 698 | 0.01 | 16384 | 0.8 |
Acknowledgments
The authors thank Alfred Yergey for sending them the NeutronCluster code, and Alan Rockwood for providing them with codes of Mercury, Emass, Qmass, and Mercury5. The authors thank the administrative group of the National Institutes of Health Biowulf Clusters, where all the computational tasks were carried out. They also thank the National Institutes of Health Fellows Editorial Board for editorial assistance. This work was supported by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health. Funding for Open Access publication charges for this article was provided by the National Institutes of Health.
Appendix
Using the Convolution Theorem in the Discrete Fourier Transform
In general, when the number L is fixed, the folded-back problem should be less severe for the CGID than for its FGID counterpart. This is because if one keeps L fixed but decreases the mass difference between adjacent points, the effective mass range shrinks, and regions with significant probabilities may then be folded back into a mass window where much smaller probabilities would be expected if no folding back occurred. It is for this reason that MIDAs does not fix the number of sampled points, but rather increases it in proportion to 1/∈.
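The folded-back effect can be reproduced with a toy calculation. The sketch below applies the convolution theorem with a naive O(L²) discrete Fourier transform written with the standard library only, so it is an illustration rather than the DFFT procedure used in MIDAs. The two-isotope element with equal abundances, the function names, and the grid lengths are all hypothetical choices.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n)
                    for t in range(n))
        result.append(total / n if inverse else total)
    return result


def n_fold_convolution(abundances, n_atoms, L):
    """Distribution for n_atoms identical atoms on a circular grid of L points.

    By the convolution theorem, the n-fold convolution equals the inverse
    transform of the n-th power of the transform. The grid is circular, so
    any probability beyond index L - 1 folds back to the low indices.
    """
    if L < len(abundances):
        raise ValueError("L must be at least the number of isotopes")
    padded = list(abundances) + [0.0] * (L - len(abundances))
    spectrum = dft(padded)
    powered = [z ** n_atoms for z in spectrum]
    return [value.real for value in dft(powered, inverse=True)]


toy_element = [0.5, 0.5]  # hypothetical abundances at offsets 0 and 1
wide = n_fold_convolution(toy_element, n_atoms=4, L=8)    # range covers offsets 0..4
narrow = n_fold_convolution(toy_element, n_atoms=4, L=4)  # offset 4 folds onto 0
print([round(p, 4) for p in wide])
print([round(p, 4) for p in narrow])
```

With L = 8 the five binomial terms appear at offsets 0 to 4, whereas with L = 4 the probability at offset 4 is added to the term at offset 0, which is the kind of contamination that enlarging the number of sampled points is meant to avoid.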
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.