An Optimized Informatics Pipeline for Mass Spectrometry-Based Peptidomics

Wu, Chaochao; Monroe, Matthew E.; Xu, Zhe; Slysz, Gordon W.; Payne, Samuel H.; Rodland, Karin D.; Liu, Tao; Smith, Richard D.

doi:10.1007/s13361-015-1169-z

Focus: Mass Spectrometry-Based Strategies for Neuroproteomics and Peptidomics: Research Article
Published: 27 May 2015

Volume 26, pages 2002–2008, (2015)

Journal of The American Society for Mass Spectrometry

Abstract

Comprehensive MS analysis of the peptidome, the intracellular and intercellular products of protein degradation, has the potential to provide novel insights into endogenous proteolytic processing and its utility in disease diagnosis and prognosis. Along with advances in MS instrumentation and related platforms, a plethora of proteomics data analysis tools have been applied directly to peptidomics; however, an evaluation of the currently available informatics pipelines for peptidomics data analysis has yet to be reported. In this study, we first evaluated several popular MS/MS database search engines, including MS-GF+, SEQUEST, and MS-Align+, for peptidomics data analysis, and then performed identification and label-free quantification using the well-established accurate mass and time (AMT) tag approach and the newly developed informed quantification (IQ) approach, both based on direct LC-MS analysis. Our results demonstrated that MS-GF+ outperformed both SEQUEST and MS-Align+ in identifying peptidome peptides. Using a database established from MS-GF+ peptide identifications, both the AMT tag and IQ approaches provided significantly deeper peptidome coverage and less missing data for each individual data set than the MS/MS methods, while achieving robust label-free quantification. Besides correlating well with the AMT tag quantification results, IQ also provided slightly higher peptidome coverage. Taken together, we propose an optimized informatics pipeline that combines MS-GF+ for initial database searching with the IQ (or AMT tag) approach for identification and label-free quantification, enabling high-throughput, comprehensive, and quantitative peptidomics analysis.

Introduction

Peptidomics is defined as the comprehensive characterization of “native” peptides in a biological sample [1]. Because proteins are not digested into peptides with trypsin or other proteases, as is done in conventional bottom-up proteomics, peptidomics preserves the endogenous information carried by the peptidome peptides of a biological sample, including post-translational modifications (PTMs) and proteolytic products that reveal the natural proteases participating in proteolytic processes [2]. In contrast to earlier, more limited studies of small neuropeptides [3–6], broad and large-scale studies have increasingly been conducted to characterize endogenous peptides in different biological samples, including cell lines [7, 8], body fluids [9, 10], and tissues [11], for biomarker screening or clinically related studies.

The peptidome coverage and quantification effectiveness provided by a peptidomics pipeline depend on each step of the pipeline, including peptide extraction and/or enrichment/fractionation, separation, LC-MS data acquisition, and the subsequent informatics analysis [12]. Despite many new advances in both sample preparation [13, 14] and instrumentation [15], very limited progress has been made toward improved data analysis for peptidomics studies. Many proteomics software tools have been applied directly to peptidomics studies, such as MS-GF+ [16], SEQUEST [17], Mascot [18], and MS-Align+ [19], as reviewed recently [20]. However, to the best of our knowledge, no detailed study has compared the performance of these proteomics software tools for peptide identification in peptidomics analysis.

Different stable isotope labeling strategies such as ¹⁸O-labeling [21] and other chemical labeling [8] have been applied for quantification of peptidomic changes. Although the labeling approaches can provide accurate quantification, they are typically associated with increased cost, sample loss, and increased sample processing time. In contrast, label-free quantification, a simple yet effective method, can be highly reliable in well-controlled experiments [22]. In addition, there are also well-established methods, such as the accurate mass and time (AMT) tag approach [23], which provide both improved measurement throughput and reliable label-free quantification. More recently, we have developed a new software tool, informed quantification (IQ), which capitalizes on peptide LC elution and high-accuracy mass information, and is capable of accurate de-isotoping, peak matching, as well as label-free quantification, independent of MS/MS data [11]. Such MS/MS-independent proteomics analysis strategies have the potential of providing both increased coverage and reliable quantification in large-scale peptidomics studies.

In this study, we applied both the AMT tag and the augmented IQ informatics pipelines to data sets from our recent peptidomics study on potential ischemia effects in ovarian cancer tumors [11], and evaluated their performance in greater detail. Both the AMT tag and IQ analyses use a database consisting of peptides identified by conventional database searching of the MS/MS data from every individual peptidomics data set in the study. The study included an evaluation of different search engines, including MS-GF+, SEQUEST, and MS-Align+, for their effectiveness in identifying peptidome peptides. The results showed that MS-GF+ identified many more unique peptidome peptides than SEQUEST and MS-Align+. Both the AMT tag and IQ approaches provided more unique peptide identifications than the database searching methods, which greatly reduced missing data across the data sets. In addition to the good correlation observed between the AMT tag and IQ quantification results, IQ also provided slightly higher peptidome coverage and less missing data than the AMT tag approach. Taken together, our results demonstrate that integrating MS-GF+ database searching with IQ analysis for label-free quantification drastically improves peptidome coverage, reduces missing data, and represents an optimized informatics pipeline for large-scale, comprehensive, and quantitative peptidomics analysis.

Experimental

Tumor Samples, Peptidomics Sample Preparation, and LC-MS/MS Analysis

The ovarian tumor samples, sample preparation methods, and LC-MS/MS instrument analysis methods used for generating the peptidomics data sets have been described in detail previously [11]. Briefly, tumor tissues collected from three patients with high-grade serous ovarian carcinoma (A, B, and C) were rapidly dissected into four contiguous, adjacent specimen strips, placed into cryovials, and frozen in liquid nitrogen at four different time points (0, 5, 30, and 60 min at room temperature). The ovarian cancer tumor samples were further processed by cryopulverization and acid extraction (using 0.25% acetic acid and protease inhibitor cocktail) to obtain peptidomic peptides, followed by LC-MS/MS analysis using a nanoACQUITY UPLC system (Waters Corporation, Milford, MA, USA) coupled on-line to an LTQ Orbitrap Velos mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). A 110 cm × 75 μm i.d. fused-silica capillary column packed with 3 μm Jupiter C18 bonded particles (Phenomenex, Torrance, CA, USA) was used at a flow rate of 200 nL/min for analysis of the ovarian tumor samples. Mobile phases consisting of 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B) were run with the following effective gradient profile (min:%B): 0:1, 6:8, 60:12, 225:35, 291:45, 300:95. The LTQ Orbitrap Velos mass spectrometer was operated in the data-dependent mode, acquiring high-resolution CID scans (R = 15,000, 5 × 10⁴ target ions) after each full MS scan (R = 60,000, 1 × 10⁶ target ions) for the top six most abundant ions within the mass range of 400 to 2000 m/z. An isolation window of 2 Th and a normalized collision energy of 35 were used for CID. The dynamic exclusion time was 60 s.

Mass Spectrometry Data Analysis

The resulting MS data were first processed with DtaRefinery [24] to correct the overall mass measurement deviation before database searching. The corrected spectra were then searched against human protein sequences from UniProt (UniProt Knowledgebase release 2013_09) using MS-GF+ [16] with the following parameters: no enzyme digestion, a precursor mass tolerance of 50 ppm, methionine oxidation as a variable modification, and a target-decoy strategy for false discovery rate (FDR) calculation. The database search results were finally filtered to a spectrum-level FDR of less than 1% and a precursor mass error of less than 10 ppm, and only these confidently identified peptides were retained for the next analysis step.
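The filtering step described above can be illustrated with a minimal Python sketch. It is not the MS-GF+ implementation (MS-GF+ reports its own Q-values); it assumes a generic target-decoy estimate, FDR ≈ decoys/targets among the accepted spectrum matches, and a hypothetical `PSM` record in which a higher `score` means a better match. All names here are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (not an MS-GF+ type)."""
    peptide: str
    score: float            # assumed convention: higher is better
    is_decoy: bool
    observed_mass: float    # Da
    theoretical_mass: float  # Da


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def filter_psms(psms, max_fdr=0.01, max_ppm=10.0):
    """Keep target PSMs passing a target-decoy FDR cutoff and a ppm cutoff."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs accepted at the FDR threshold
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Extend the accepted list to the deepest rank still within max_fdr.
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [
        p for p in ranked[:cutoff]
        if not p.is_decoy
        and abs(ppm_error(p.observed_mass, p.theoretical_mass)) <= max_ppm
    ]
```

With the defaults, `filter_psms(psms)` mirrors the thresholds quoted in the text (FDR below 1%, precursor error below 10 ppm); the order in which the two filters are applied is an assumption of this sketch.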

After the initial database searching, the data sets were further analyzed using the AMT tag approach developed at Pacific Northwest National Laboratory (PNNL) [25]. Confidently identified peptides from all the individual data sets were assembled into an AMT tag database containing both the theoretical masses (calculated from the peptide sequences) and the LC elution times of those peptides. LC-MS features in each individual data set were then matched against the AMT tags in the database for peptide identification using VIPER [26] with a mass error tolerance of <10 ppm and a normalized elution time (NET) error tolerance of <2.5%. The AMT tag matching results were further filtered by requiring a uniqueness probability (UP) higher than 0.5 to reduce ambiguous matches to multiple AMT tags, and finally by Statistical Tools for AMT tag Confidence (STAC) value filtering to ensure an FDR of less than 2.5% [27]. Integrated MS peak area was used to derive changes in abundance for the peptide identifications.
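As a rough illustration of the mass and NET matching described above, the following Python sketch matches one LC-MS feature against a small AMT tag list. It is not VIPER code: the `AmtTag` and `Feature` types are hypothetical, NET is assumed to be scaled from 0 to 1 so that the 2.5% tolerance becomes 0.025, and the UP and STAC filters are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmtTag:
    """Hypothetical database entry: peptide with theoretical mass and NET."""
    peptide: str
    mass: float  # theoretical monoisotopic mass, Da
    net: float   # normalized elution time, assumed to lie in [0, 1]


@dataclass(frozen=True)
class Feature:
    """Hypothetical de-isotoped LC-MS feature from one data set."""
    mass: float       # observed monoisotopic mass, Da
    net: float        # aligned normalized elution time
    abundance: float  # integrated peak area


def match_feature(feature, tags, ppm_tol=10.0, net_tol=0.025):
    """Return every AMT tag within both the mass and the NET tolerance."""
    hits = []
    for tag in tags:
        ppm = abs(feature.mass - tag.mass) / tag.mass * 1e6
        if ppm < ppm_tol and abs(feature.net - tag.net) < net_tol:
            hits.append(tag)
    return hits
```

A feature that returns more than one hit is ambiguous; in the workflow described in the text, such cases are handled by the uniqueness probability filter, which this sketch does not attempt to reproduce.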

The MS data sets were also analyzed using IQ [11], a recently developed in-house approach that is a derivative and improved approach providing better de-isotoping and peak selection. Briefly, in IQ the m/z values of the theoretical isotopic profile (derived from the peptide sequences included in the AMT tag database) are used to guide the extraction of the observed isotopic profile from the summed mass spectra. Least-squares fitting of the theoretical isotopic profile to the observed profile is then performed [28], providing a measure of how well the observed isotopic profile matches the theoretical one. This metric is called the “fit score” and is a key metric for distinguishing correct from incorrect features. A key step in IQ, as in the AMT tag approach, is the alignment of observed masses and LC elution times to database values in order to correct for variations in mass and elution time measurements across multiple data sets. This alignment makes it possible to narrow both the mass tolerance used in generating extracted ion chromatograms (XICs) and the elution time window used for selecting the correct chromatographic peak. Currently, VIPER is also used in a first-pass analysis to output mass and NET alignment information, which is then loaded into IQ and used for mass and NET correction during subsequent processing. Data processed by the IQ approach were initially filtered by fit score (<0.1), NET tolerance (<2.5%), and mass accuracy (<10 ppm), followed by manual validation to eliminate false positives [29] (development of FDR computation for IQ is currently in progress). If a chromatographic peak has been selected for a given peptide/charge state target, IQ then performs a final step of extracting the abundance information. Currently, this consists of summing a total of five mass spectra centered on the apex scan of the elution profile. The abundances from the different charge states are then summed for each peptide for quantification.
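To make the fit score and the abundance extraction more concrete, here is a small Python sketch. The exact formula used by IQ is not given in the text, so the definition below is an assumption: the theoretical profile is scaled to the observed one by least squares, and the score is the residual sum of squares divided by the observed sum of squares (0 is a perfect match, larger is worse, consistent with a filter of fit score < 0.1). The function names are hypothetical.

```python
def fit_score(observed, theoretical):
    """Least-squares misfit between observed and theoretical isotopic profiles.

    Assumed definition (not taken from IQ): scale the theoretical profile to
    the observed one, then return residual energy / observed energy.
    """
    if len(observed) != len(theoretical):
        raise ValueError("profiles must have the same number of peaks")
    theo_energy = sum(t * t for t in theoretical)
    obs_energy = sum(o * o for o in observed)
    if theo_energy == 0 or obs_energy == 0:
        return 1.0  # nothing to fit; treat as the worst case
    scale = sum(o * t for o, t in zip(observed, theoretical)) / theo_energy
    residual = sum((o - scale * t) ** 2 for o, t in zip(observed, theoretical))
    return residual / obs_energy


def apex_abundance(intensities, apex_index, n_scans=5):
    """Sum intensities over n_scans spectra centered on the apex scan."""
    half = n_scans // 2
    lo = max(0, apex_index - half)
    hi = min(len(intensities), apex_index + half + 1)
    return sum(intensities[lo:hi])


def peptide_abundance(abundance_by_charge):
    """Add up the abundances of all charge states of one peptide."""
    return sum(abundance_by_charge.values())
```

For example, an observed profile that is an exact multiple of the theoretical one gives a fit score of 0, while two profiles with no peak in common give 1.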

Peptide-to-protein mapping was performed using IDPicker3 [30]. All the quantification results from the AMT tag and IQ analyses were imported into the DanteR program [31] for processing and plotting: the data were first log10 transformed and then median normalized before further analysis; hierarchical clustering analysis was performed with Euclidean distance as the distance metric and average linkage for clustering; and principal component analysis was performed with default parameters.
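The transformation and normalization step can be sketched in a few lines of Python. This is not DanteR code; it assumes that median normalization means centering each sample on its own median log10 abundance, and it simply skips missing or non-positive values. The table layout (sample name mapped to peptide abundances) is hypothetical.

```python
import math
from statistics import median


def log10_median_normalize(table):
    """Log10-transform abundances and center each sample at its median.

    table: dict mapping sample name -> dict of peptide -> abundance
           (None or non-positive values are treated as missing).
    """
    normalized = {}
    for sample, values in table.items():
        logged = {
            pep: math.log10(v)
            for pep, v in values.items()
            if v is not None and v > 0
        }
        if not logged:
            normalized[sample] = {}
            continue
        center = median(logged.values())
        normalized[sample] = {pep: x - center for pep, x in logged.items()}
    return normalized
```

After this step, two samples that differ only by a constant loading factor yield identical normalized profiles, which is the property the downstream clustering and PCA rely on.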

Results and Discussion

MS-GF+ Outperforms SEQUEST and MS-Align+ in Peptide Identification for Peptidomics Analysis

In total, 6845 unique peptides were confidently identified using MS-GF+ after 1% FDR filtering. As shown in Figure 1a (white), the precursor monoisotopic masses ranged from 785 to ~6000 Da with a median of 2059.3 Da. These peptides were further mapped to 1136 non-redundant protein groups using IDPicker3 (Supplementary Table 1). In our previous study [11], both SEQUEST and MS-Align+ were used for database searching of the same peptidomics data, resulting in 3977 and 2843 unique peptides after filtering (FDR <1%), respectively. MS-GF+ was able to identify many more unique peptides than SEQUEST and MS-Align+ (Figure 1b). Similar results were obtained for the breast tumor peptidomics data in our previous study (data not shown). Furthermore, 91.05% of the SEQUEST-identified peptides (3621) overlapped with the MS-GF+ identifications, and their distribution is depicted in Figure 1a (blue); similarly, 61.66% of the MS-Align+-identified peptides (1753) were covered by MS-GF+, and their distribution is depicted in Figure 1a (red). In agreement with a previous report [19], SEQUEST appears to better identify peptides with relatively lower molecular weight, whereas MS-Align+ has a preference for peptides with higher molecular weight (Figure 1a). In comparison, the peptide identifications from MS-GF+ covered a larger range of molecular weights, almost as large as the combined range of the other two methods. To confirm the quality of the MS-GF+ search results, we manually inspected the spectra with the worst scores for 50 peptides that were identified only by MS-GF+. The results suggest that the majority of the spectra are of acceptable quality for peptide identification (Supplementary Figure 1). Taken together, our data suggest that MS-GF+ outperforms SEQUEST and MS-Align+ in peptide identification for peptidomics analysis by providing significantly more unique peptide identifications and better molecular weight range coverage.
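The overlap percentages quoted above follow directly from the peptide counts. The short Python sketch below shows the set-based calculation one might use on the peptide lists themselves (the helper name and the toy sequences are hypothetical) and checks the arithmetic against the counts reported in the text.

```python
def overlap_percent(engine_peptides, reference_peptides):
    """Percentage of one engine's unique peptides also found in the reference."""
    engine = set(engine_peptides)
    if not engine:
        return 0.0
    shared = engine & set(reference_peptides)
    return 100.0 * len(shared) / len(engine)


# Toy example with made-up sequences: 2 of 3 peptides are shared.
toy = overlap_percent({"AAK", "GGR", "LLE"}, {"AAK", "GGR", "TTS"})

# Arithmetic check using the counts reported in the text.
sequest_overlap = round(100.0 * 3621 / 3977, 2)   # SEQUEST peptides in MS-GF+
msalign_overlap = round(100.0 * 1753 / 2843, 2)   # MS-Align+ peptides in MS-GF+
```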

AMT Tag and IQ Provide Significantly Higher Peptidome Coverage and Fewer Missing Data Than MS-GF+

Owing to the low stoichiometry of the peptidome as well as the undersampling inherent in typical data-dependent acquisition of MS/MS data, very low run-to-run consistency is usually observed in MS/MS-based peptide identification, leading to poor peptidome coverage and a significant amount of missing data in label-free quantification [32]. Indeed, although a total of 6845 unique peptides were identified by MS-GF+ from all 12 data sets, on average only approximately 2500 unique peptides were detected in each data set (blue in Figure 2a; also see Table 1). Furthermore, only about 500 peptides were consistently identified and quantifiable across all 12 samples, whereas more than 2000 peptides were detected in only one sample (Figure 2b, blue). This led to much smaller peptidome coverage and a much larger amount of missing data in the individual sample analyses, both well-known and significant issues for label-free quantification.

Table 1 Number of Unique Peptidome Peptides Identified by Different Informatics Approaches in Each Tumor Sample

To improve the quality of identification and quantification for each individual analysis, we next applied both the AMT tag and IQ approaches to the same 12 data sets, taking advantage of an LC-MS database created from the MS-GF+ peptide identifications. Both approaches are expected to yield more comprehensive quantification results because LC elution time alignment enables effective LC-MS peak matching. As depicted in Figure 2a, the average numbers of unique peptides identified in each individual data set by the AMT tag and IQ approaches were much larger than that from MS-GF+: 4630 and 5000, respectively, as opposed to 2500. Moreover, the number of unique peptides detected across all 12 data sets by the AMT tag (2112) and IQ (2182) approaches was significantly higher than with MS-GF+ (421), and the number of peptides detected in only one sample decreased greatly (243 for AMT tag and 133 for IQ), significantly reducing missing data (Figure 2b and Table 2). The peptide identifications resulting from the MS-GF+, AMT tag, and IQ analyses of each individual peptidomic data set are provided in Supplementary Tables 2–4.
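The quantification frequency discussed above (how many peptides are detected in exactly 1, 2, ..., 12 samples) and the overall fraction of missing values can be computed as in the following Python sketch. The input layout, a mapping from sample name to the set of peptides detected in it, and the function names are assumptions made for illustration.

```python
from collections import Counter


def quantification_frequency(detections):
    """Map 'number of samples' -> 'number of peptides detected in that many'.

    detections: dict mapping sample name -> iterable of detected peptides.
    """
    samples_per_peptide = Counter()
    for peptides in detections.values():
        samples_per_peptide.update(set(peptides))
    return Counter(samples_per_peptide.values())


def missing_fraction(detections):
    """Fraction of empty cells in the peptide-by-sample detection matrix."""
    all_peptides = set().union(*(set(p) for p in detections.values()))
    total_cells = len(all_peptides) * len(detections)
    if total_cells == 0:
        return 0.0
    observed = sum(len(set(p)) for p in detections.values())
    return 1.0 - observed / total_cells
```

In this representation, the peptides "detected across all 12 data sets" are those counted under key 12, and the peptides "detected in only one sample" are those counted under key 1.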

Table 2 The Distribution of Quantification Frequency (Number of Unique Peptide Identifications Common Across the Different Samples) for the Three Different Informatics Approaches

When the performance of the two MS/MS-independent LC-MS analysis approaches was compared, IQ provided slightly more unique peptide identifications (2.34%–14.69%, average 8.02%), and thus less missing data, than the AMT tag approach. This is likely because IQ employs all isotopic peaks, which results in improved sensitivity, better discrimination of overlapping features, and better reproducibility, whereas the AMT tag approach uses individually de-isotoped spectra. Taken together, our results demonstrate that, by significantly reducing undersampling, the direct LC-MS analysis pipelines (the AMT tag and IQ approaches) provide significantly higher peptidome coverage and, more importantly, less missing data across the peptidomics data sets than even the best-performing MS/MS-dependent analysis approaches such as MS-GF+.

Both the AMT Tag and IQ Approaches Provide Robust Label-Free Quantification for Peptidomics

The AMT tag approach is well established for label-free quantification [23, 33], so it is of interest to compare its quantification performance with that of the relatively newer IQ approach. Altogether, 1503 unique peptides were shared between the IQ and AMT tag analyses with no missing data across all 12 samples. Pearson correlation was first calculated for each of these 1503 unique peptides to assess the consistency between the AMT tag and IQ quantification results. As shown in Figure 3a, 91.62% of the peptides displayed a correlation coefficient of at least 0.8, whereas only 50 peptides (3.32%) had a correlation lower than 0.5. Pearson correlations between the AMT tag and IQ quantification results were also calculated for each sample. With no Pearson correlation coefficient below 0.93, the quantification results from the AMT tag and IQ analyses were well correlated (Table 3). As an example, the AMT tag and IQ quantification results for one sample (A_0), shown in Figure 3b, displayed excellent consistency, with most of the data points aligned along the diagonal and an overall correlation of 0.94.
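The per-peptide comparison described above can be reproduced with a plain Pearson correlation over the 12 sample abundances of each peptide. The Python sketch below uses only the standard library; the function names are hypothetical, and the inputs are assumed to be the normalized log abundances from the two approaches, with no missing values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def fraction_at_least(correlations, threshold=0.8):
    """Fraction of peptides whose correlation reaches the threshold."""
    return sum(r >= threshold for r in correlations) / len(correlations)
```

Applying `pearson` to each peptide's AMT tag and IQ abundance vectors, and then `fraction_at_least` to the resulting list, gives the kind of summary reported in the text (the share of peptides with a coefficient of at least 0.8).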

Table 3 Pearson Correlation of the AMT Tag and IQ Label-Free Quantification Results for Each Tumor Sample

The IQ and AMT tag label-free quantitation results also produced very similar hierarchical clustering analysis (HCA) and principal component analysis (PCA) plots (Supplementary Figure 2). Consistent with previously reported results [11], in both the IQ and AMT tag analyses of the same peptidomics data sets, the HCA heatmaps and PCA plots showed that the peptidomic profiles from the four time points of the same patient sample clustered together. This indicates that potential changes in the peptidomes due to post-excision delay (up to 1 h) were much smaller than those due to patient heterogeneity, and that both the IQ and AMT tag informatics pipelines provide robust quantitation for peptidomics analysis.

Conclusion

We describe an improved LC-MS-based informatics workflow for comprehensive and quantitative peptidomics analysis, which consists of MS-GF+ for initial database searching and the IQ (or AMT tag) approach for improved identification and more robust label-free quantification. MS-GF+ provides significantly more peptide identifications, spanning a broader molecular weight range, than the frequently used SEQUEST and MS-Align+ MS/MS search engines for identifying peptidome peptides. Owing to the direct LC-MS analysis strategy employed, both the AMT tag and IQ approaches significantly alleviate the undersampling issue and provide much better peptidome coverage and much less missing data for each sample than even the best MS/MS-based analysis methods such as MS-GF+. In addition to correlating well with the quantification results provided by the AMT tag approach, the IQ approach further improved peptidome coverage and reduced missing data across the data sets, likely because of better peak picking and retention time alignment. We believe that the combination of MS-GF+ and the IQ (or AMT tag) approach represents an optimal peptidomics informatics pipeline, and we expect this pipeline to be broadly applied for large-scale, highly effective, and robust peptidomics analysis.

References

1. Schrader, M., Schulz-Knappe, P.: Peptidomics technologies for human body fluids. Trends Biotechnol. 19, S55–S60 (2001)
2. Gelman, J.S., Fricker, L.D.: Hemopressin and other bioactive peptides from cytosolic proteins: are these non-classical neuropeptides? AAPS J. 12, 279–289 (2010)
3. Li, L., Kelley, W.P., Billimoria, C.P., Christie, A.E., Pulver, S.R., Sweedler, J.V., Marder, E.: Mass spectrometric investigation of the neuropeptide complement and release in the pericardial organs of the crab, Cancer borealis. J. Neurochem. 87, 642–656 (2003)
4. Hummon, A.B., Amare, A., Sweedler, J.V.: Discovering new invertebrate neuropeptides using mass spectrometry. Mass Spectrom. Rev. 25, 77–98 (2006)
5. Li, L., Sweedler, J.V.: Peptides in the brain: mass spectrometry-based measurement approaches and challenges. Annu Rev Anal Chem (Palo Alto, Calif) 1, 451–483 (2008)
6. Ma, M., Chen, R., Ge, Y., He, H., Marshall, A.G., Li, L.: Combining bottom-up and top-down mass spectrometric strategies for de novo sequencing of the crustacean hyperglycemic hormone from Cancer borealis. Anal. Chem. 81, 240–247 (2009)
7. Gelman, J.S., Sironi, J., Castro, L.M., Ferro, E.S., Fricker, L.D.: Peptidomic analysis of human cell lines. J. Proteome Res. 10, 1583–1592 (2011)
8. Fricker, L.D., Gelman, J.S., Castro, L.M., Gozzo, F.C., Ferro, E.S.: Peptidomic analysis of HEK293T cells: effect of the proteasome inhibitor epoxomicin on intracellular peptides. J. Proteome Res. 11, 1981–1990 (2012)
9. Villanueva, J., Shaffer, D.R., Philip, J., Chaparro, C.A., Erdjument-Bromage, H., Olshen, A.B., Fleisher, M., Lilja, H., Brogi, E., Boyd, J., Sanchez-Carbayo, M., Holland, E.C., Cordon-Cardo, C., Scher, H.I., Tempst, P.: Differential exoprotease activities confer tumor-specific serum peptidome patterns. J. Clin. Invest. 116, 271–284 (2006)
10. Fiedler, G.M., Baumann, S., Leichtle, A., Oltmann, A., Kase, J., Thiery, J., Ceglarek, U.: Standardized peptidome profiling of human urine by magnetic bead separation and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clin. Chem. 53, 421–428 (2007)
11. Xu, Z., Wu, C., Xie, F., Slysz, G.W., Tolic, N., Monroe, M.E., Petyuk, V.A., Payne, S.H., Fujimoto, G.M., Moore, R.J., Fillmore, T.L., Schepmoes, A.A., Levine, D.A., Townsend, R.R., Davies, S.R., Li, S., Ellis, M., Boja, E., Rivers, R., Rodriguez, H., Rodland, K.D., Liu, T., Smith, R.D.: Comprehensive quantitative analysis of ovarian and breast cancer tumor peptidomes. J. Proteome Res. 14, 422–433 (2015)
12. Tinoco, A.D., Saghatelian, A.: Investigating endogenous peptides and peptidases using peptidomics. Biochemistry 50, 7447–7461 (2011)
13. Van Dijck, A., Hayakawa, E., Landuyt, B., Baggerman, G., Van Dam, D., Luyten, W., Schoofs, L., De Deyn, P.P.: Comparison of extraction methods for peptidomics analysis of mouse brain tissue. J. Neurosci. Methods 197, 231–237 (2011)
14. Finoulst, I., Pinkse, M., Van Dongen, W., Verhaert, P.: Sample preparation techniques for the untargeted LC-MS-based discovery of peptides in complex biological matrices. J. Biomed. Biotechnol. 2011, 245291 (2011)
15. Rose, R.J., Damoc, E., Denisov, E., Makarov, A., Heck, A.J.: High-sensitivity Orbitrap mass analysis of intact macromolecular assemblies. Nat. Methods 9, 1084–1086 (2012)
16. Kim, S., Pevzner, P.A.: MS-GF+ makes progress towards a universal database search tool for proteomics. Nat. Commun. 5, 5277 (2014)
17. Mommen, G.P., Frese, C.K., Meiring, H.D., van Gaans-van den Brink, J., de Jong, A.P., van Els, C.A., Heck, A.J.: Expanding the detectable HLA peptide repertoire using electron-transfer/higher-energy collision dissociation (EThcD). Proc. Natl. Acad. Sci. U.S.A. 111, 4507–4512 (2014)
18. Dasgupta, S., Castro, L.M., Dulman, R., Yang, C., Schmidt, M., Ferro, E.S., Fricker, L.D.: Proteasome inhibitors alter levels of intracellular peptides in HEK293T and SH-SY5Y cells. PLoS One 9, e103604 (2014)
19. Liu, X., Sirotkin, Y., Shen, Y., Anderson, G., Tsai, Y.S., Ting, Y.S., Goodlett, D.R., Smith, R.D., Bafna, V., Pevzner, P.A.: Protein identification using top-down. Mol. Cell. Proteomics 11, 008524 (2012)
20. Menschaert, G., Vandekerckhove, T.T., Baggerman, G., Schoofs, L., Luyten, W., Van Criekinge, W.: Peptidomics coming of age: a review of contributions from a bioinformatics angle. J. Proteome Res. 9, 2051–2061 (2010)
21. Stingl, C., Soderquist, M., Karlsson, O., Boren, M., Luider, T.M.: Uncovering effects of ex vivo protease activity during proteomics and peptidomics sample extraction in rat brain tissue by oxygen-18 labeling. J. Proteome Res. 13, 2807–2817 (2014)
22. Quintana, L.F., Campistol, J.M., Alcolea, M.P., Banon-Maneus, E., Sol-Gonzalez, A., Cutillas, P.R.: Application of label-free quantitative peptidomics for the identification of urinary biomarkers of kidney chronic allograft dysfunction. Mol. Cell. Proteomics 8, 1658–1673 (2009)
23. Smith, R.D., Anderson, G.A., Lipton, M.S., Pasa-Tolic, L., Shen, Y., Conrads, T.P., Veenstra, T.D., Udseth, H.R.: An accurate mass tag strategy for quantitative and high-throughput proteome measurements. Proteomics 2, 513–523 (2002)
24. Petyuk, V.A., Mayampurath, A.M., Monroe, M.E., Polpitiya, A.D., Purvine, S.O., Anderson, G.A., Camp II, D.G., Smith, R.D.: DtaRefinery, a software tool for elimination of systematic errors from parent ion mass measurements in tandem mass spectra data sets. Mol. Cell. Proteomics 9, 486–496 (2010)
25. Zimmer, J.S., Monroe, M.E., Qian, W.J., Smith, R.D.: Advances in proteomics data analysis and display using an accurate mass and time tag approach. Mass Spectrom. Rev. 25, 450–482 (2006)
26. Monroe, M.E., Tolic, N., Jaitly, N., Shaw, J.L., Adkins, J.N., Smith, R.D.: VIPER: an advanced software package to support high-throughput LC-MS peptide identification. Bioinformatics 23, 2021–2023 (2007)
27. Stanley, J.R., Adkins, J.N., Slysz, G.W., Monroe, M.E., Purvine, S.O., Karpievitch, Y.V., Anderson, G.A., Smith, R.D., Dabney, A.R.: A statistical method for assessing peptide identification confidence in accurate mass and time tag proteomics. Anal. Chem. 83, 6135–6140 (2011)
28. Jaitly, N., Mayampurath, A., Littlefield, K., Adkins, J.N., Anderson, G.A., Smith, R.D.: Decon2LS: an open-source software package for automated processing and visualization of high resolution mass spectrometry data. BMC Bioinformatics 10, 87 (2009)
29. Slysz, G.W., Steinke, L., Ward, D.M., Klatt, C.G., Clauss, T.R., Purvine, S.O., Payne, S.H., Anderson, G.A., Smith, R.D., Lipton, M.S.: Automated data extraction from in situ protein-stable isotope probing studies. J. Proteome Res. 13, 1200–1210 (2014)
30. Holman, J.D., Ma, Z.Q., Tabb, D.L.: Identifying proteomic LC-MS/MS data sets with Bumbershoot and IDPicker. Curr. Protoc. Bioinformatics 13, Unit13–Unit17 (2012)
31. Taverner, T., Karpievitch, Y.V., Polpitiya, A.D., Brown, J.N., Dabney, A.R., Anderson, G.A., Smith, R.D.: DanteR: an extensible R-based tool for quantitative analysis of -omics data. Bioinformatics 28, 2404–2406 (2012)
32. Xie, F., Liu, T., Qian, W.J., Petyuk, V.A., Smith, R.D.: Liquid chromatography-mass spectrometry-based quantitative proteomics. J. Biol. Chem. 286, 25443–25449 (2011)
33. Lipton, M.S., Pasa-Tolic, L., Anderson, G.A., Anderson, D.J., Auberry, D.L., Battista, J.R., Daly, M.J., Fredrickson, J., Hixson, K.K., Kostandarithes, H., Masselon, C., Markillie, L.M., Moore, R.J., Romine, M.F., Shen, Y., Stritmatter, E., Tolic, N., Udseth, H.R., Venkateswaran, A., Wong, K.K., Zhao, R., Smith, R.D.: Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc. Natl. Acad. Sci. U. S. A. 99, 11049–11054 (2002)

Acknowledgments

Portions of this work were supported by grant U24CA160019 from the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC), National Institutes of Health Research Resource grant P41GM103493, and Department of Defense Interagency Agreement MIPR2DO89M2058. The experimental work described herein was performed in the Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the DOE and located at Pacific Northwest National Laboratory, which is operated by Battelle Memorial Institute for the DOE under Contract DE-AC05-76RL0 1830.

Author information

Authors and Affiliations

Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99352, USA
Chaochao Wu, Matthew E. Monroe, Zhe Xu, Gordon W. Slysz, Samuel H. Payne, Karin D. Rodland, Tao Liu & Richard D. Smith

Corresponding authors

Correspondence to Tao Liu or Richard D. Smith.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Figure 1

Manual inspection of spectra with the worst scores for 50 peptides that were identified only by MS-GF+ (PDF 1159 kb)

Supplementary Figure 2

(DOCX 415 kb)

Supplementary Table 1

(XLSX 227 kb)

Supplementary Table 2

(XLSX 503 kb)

Supplementary Table 3

(XLSX 867 kb)

Supplementary Table 4

(XLSX 1004 kb)

About this article

Cite this article

Wu, C., Monroe, M.E., Xu, Z. et al. An Optimized Informatics Pipeline for Mass Spectrometry-Based Peptidomics. J. Am. Soc. Mass Spectrom. 26, 2002–2008 (2015). https://doi.org/10.1007/s13361-015-1169-z

Received: 30 January 2015
Revised: 23 March 2015
Accepted: 27 March 2015
Published: 27 May 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s13361-015-1169-z
