Statistical elimination of spectral features with large between-run variation enhances quantitative protein-level conclusions in experiments with data-independent spectral acquisition

Cheng, Lin-Yang; Liu, Yansheng; Chang, Ching-Yun; Röst, Hannes; Aebersold, Ruedi; Vitek, Olga

doi:10.1186/1471-2105-16-S2-A4

Meeting abstract
Open access
Published: 28 January 2015

Volume 16, article number A4, (2015)

BMC Bioinformatics

Lin-Yang Cheng¹, Yansheng Liu², Ching-Yun Chang¹, Hannes Röst², Ruedi Aebersold²,³ & Olga Vitek⁴

Background

Many proteomic investigations summarize quantitative information across multiple spectral features into protein-level conclusions. Data-independent spectral acquisition (DIA) is attracting considerable interest because it quantifies many spectral features in a single run. However, a disadvantage of DIA experiments compared with, e.g., Selected Reaction Monitoring (SRM) is that the features are subject to interferences and noise. We argue that between-run variation provides additional insight for distinguishing good-quality from noisy DIA features. To use the quantitative between-run variation appropriately, it is important to account for the properties of the experimental design and to distinguish random artifacts from biological changes. We have previously proposed a method (Chang et al., ASMS 2013) that accounts for the experimental design to eliminate features with low information content.
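The distinction between shared biological change and feature-specific noise can be illustrated with a minimal sketch. It is not the method of Chang et al.; it assumes log2 intensities for the features of a single protein, estimates a shared per-run profile as the median of the mean-centered features, and reports each feature's residual variance around that profile. The function name `residual_variances`, the fragment names, and all numbers are hypothetical.

```python
from statistics import mean, median, pvariance

def residual_variances(intensities):
    """Variance of each feature around the shared per-run profile.

    intensities: dict mapping feature name -> list of log2 intensities,
    one value per run (same run order for every feature).
    """
    features = list(intensities)
    n_runs = len(intensities[features[0]])
    # Remove each feature's own baseline intensity.
    centered = {f: [x - mean(v) for x in v] for f, v in intensities.items()}
    # Shared profile: changes that most features agree on (e.g. biology).
    profile = [median(centered[f][r] for f in features) for r in range(n_runs)]
    # What is left is feature-specific between-run variation.
    return {
        f: pvariance([centered[f][r] - profile[r] for r in range(n_runs)])
        for f in features
    }

# Hypothetical protein: runs 1-2 are controls, runs 3-4 are treated.
protein = {
    "frag_a": [20.0, 20.2, 21.0, 21.2],
    "frag_b": [18.0, 18.2, 19.0, 19.2],
    "frag_c": [19.0, 19.2, 20.0, 20.2],
    "frag_d": [17.0, 21.0, 16.0, 22.0],  # inconsistent with the others
}
variances = residual_variances(protein)
noisiest = max(variances, key=variances.get)
print(noisiest)
```

In this toy example the treatment shift is shared by three fragments and so is absorbed into the profile, while the fourth fragment stands out because its run-to-run pattern disagrees with the rest.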

Results

In this project, we further emphasize that regularization avoids an exhaustive search over every subset of features and permits subsequent hypothesis tests, so that the false discovery rate of the feature selection process can be controlled. We evaluated the proposed approach on three datasets with some notion of ground truth: an extensive simulation study; a controlled mixture in which proteins were spiked into a complex background at known concentrations; and a study of 232 plasma samples in which 18 proteins were quantified in both SWATH and SRM modes in the presence of heavy-labeled reference peptides. We examined [1] protein-level estimates of fold changes between conditions, [2] sensitivity and specificity of detecting changes in protein abundance, and [3] accuracy of relative quantification of protein abundance in individual biological samples. A family of linear mixed models similar to that in MSstats (http://www.msstats.org) was fit to all the datasets. We then applied the regularization and the hypothesis tests to control the false discovery rate of the selection.
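The abstract does not state which multiple-testing procedure was used to control the false discovery rate. As one common option, the sketch below applies the Benjamini-Hochberg step-up procedure to per-feature p-values. The function name `benjamini_hochberg`, the p-values, and the 0.05 level are assumptions for illustration only, not values from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value is at most alpha * k / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# Hypothetical p-values, one per candidate feature.
p_values = [0.001, 0.045, 0.025, 0.2, 0.008]
selected = benjamini_hochberg(p_values)
print(selected)
```

Any test that yields valid per-feature p-values could feed such a step; the reference list includes a significance test for the lasso (Lockhart et al.).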

Conclusion

The results demonstrated that the proposed feature selection approach enhanced the sensitivity and specificity of the conclusions, was robust to the amount of noisy fragments, and increased the correlation of subject quantification between SRM and DIA workflows. Importantly, its performance exceeded that of the frequently used 'top 3' approach, which uses the three spectral features with the highest average intensity across runs. Furthermore, we showed that the proposed approach outperforms the use of correlation to select informative features.
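For reference, the 'top 3' baseline described above can be sketched in a few lines. The function name `top_n_features`, the fragment names, and the intensities are hypothetical; the example only shows that ranking by average intensity can retain a feature whose values fluctuate strongly between runs.

```python
from statistics import mean

def top_n_features(intensities, n=3):
    """Names of the n features with the highest mean intensity across runs."""
    ranked = sorted(intensities, key=lambda f: mean(intensities[f]), reverse=True)
    return ranked[:n]

# Hypothetical log2 intensities for one protein: feature -> one value per run.
protein = {
    "frag_a": [20.1, 20.3, 19.9, 20.2],
    "frag_b": [18.0, 18.2, 17.9, 18.1],
    "frag_c": [22.5, 15.0, 23.0, 14.8],  # high on average, unstable across runs
    "frag_d": [16.0, 16.1, 15.9, 16.2],
    "frag_e": [19.0, 19.1, 18.9, 19.2],
}
top3 = top_n_features(protein)
print(top3)
```

Here the unstable fragment is kept because its mean intensity ranks third, even though a stable fragment with a slightly lower mean is discarded.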

References

Clough T, Thaminy S, Ragg S, Aebersold R, Vitek O: Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs. BMC Bioinformatics. 2012, 13: S16-
Chang CY, Picotti P, Hüttenhain R, Heinzelmann-Schwarz V, Jovanovic M, Aebersold R, Vitek O: Protein significance analysis in selected reaction monitoring (SRM) measurements. Molecular and Cellular Proteomics. 2012, 11: Article M111.014662
Choi M, Chang CY, Clough T, Broudy D, Killeen T, MacLean B, Vitek O: MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics. 2014
Lockhart R, Taylor J, Tibshirani R, Tibshirani R: A significance test for the lasso. The Annals of Statistics. 2014, 42:

Author information

Authors and Affiliations

Department of Statistics, Purdue University, West Lafayette, IN, USA
Lin-Yang Cheng & Ching-Yun Chang
Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093, Zurich, Switzerland
Yansheng Liu, Hannes Röst & Ruedi Aebersold
Faculty of Science, University of Zurich, 8057, Zurich, Switzerland
Ruedi Aebersold
Department of Computer Science, Purdue University, West Lafayette, IN, USA
Olga Vitek

Corresponding author

Correspondence to Lin-Yang Cheng.

About this article

Cite this article

Cheng, LY., Liu, Y., Chang, CY. et al. Statistical elimination of spectral features with large between-run variation enhances quantitative protein-level conclusions in experiments with data-independent spectral acquisition. BMC Bioinformatics 16 (Suppl 2), A4 (2015). https://doi.org/10.1186/1471-2105-16-S2-A4
