Exploiting thread-level and instruction-level parallelism to cluster mass spectrometry data using multicore architectures

  • Original Article
Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

Modern mass spectrometers can produce large numbers of peptide spectra from complex biological samples in a short time. These data sets contain substantial redundancy because the same peptide may be selected multiple times in a liquid chromatography tandem mass spectrometry experiment. In addition, many spectra cannot be mapped to specific peptide sequences because of the low signal-to-noise ratio of the spectra these machines produce. Clustering is one way to mitigate these problems. We recently presented CAMS, a graph-theoretic framework for clustering large-scale mass spectrometry data. CAMS uses a novel metric that exploits the spatial patterns in the mass spectrometry peaks and yields highly accurate clustering results. However, comparing each spectrum with every other spectrum makes the clustering computationally expensive. In this paper, we present a parallel algorithm, called P-CAMS, that uses thread-level and instruction-level parallelism on multicore architectures to substantially decrease running times. P-CAMS relies on intelligent matrix completion to reduce the number of comparisons, on threads running on each core, and on the single instruction, multiple data (SIMD) paradigm inside each thread to exploit the parallelism available on multicore architectures. A carefully crafted load-balancing scheme, which uses the spatial locations of the mass spectrometry peaks mapped to the nearest-level cache and core, allows super-linear speedups. We study the scalability of the algorithm across a wide variety of mass spectrometry data and under variation in architecture-specific parameters. The results show that SIMD-style data parallelism combined with thread-level parallelism is a powerful combination on multicore architectures, allowing substantial reductions in run time even for all-to-all comparison algorithms. Clustering quality is assessed on a real-world data set and is shown to be consistent with the serial version of the same algorithm.
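
To make the all-to-all comparison pattern described in the abstract concrete, the following is a minimal sketch and not the P-CAMS implementation. Everything in it is an assumption made for illustration: spectra are taken to be equal-length binned intensity vectors, cosine similarity stands in for the CAMS metric (which the abstract does not define), symmetry of the similarity matrix stands in for the matrix completion step, rows are dealt out cyclically as a simple stand-in for the paper's cache-aware load balancing, and clusters are taken to be connected components of a thresholded similarity graph. The function names (`cosine`, `rows_for_worker`, `compare_rows`, `similarity_matrix`, `cluster`) are hypothetical. SIMD is not shown; the inner dot-product loop is the part a SIMD implementation would vectorize.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def cosine(a, b):
    """Placeholder metric (assumption): cosine similarity of two binned spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rows_for_worker(n, worker, n_workers):
    """Cyclic row assignment: row i holds n-1-i pairs, so interleaving
    rows gives each worker a similar number of comparisons."""
    return range(worker, n, n_workers)


def compare_rows(spectra, rows):
    """Compute only the upper-triangle pairs (i < j) for the given rows."""
    n = len(spectra)
    return [(i, j, cosine(spectra[i], spectra[j]))
            for i in rows for j in range(i + 1, n)]


def similarity_matrix(spectra, n_workers=4):
    """All-pairs similarity, computing each unordered pair exactly once."""
    n = len(spectra)
    # Diagonal is set to 1.0 by convention; off-diagonal entries are filled below.
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(compare_rows, spectra, rows_for_worker(n, w, n_workers))
            for w in range(n_workers)
        ]
        for fut in futures:
            for i, j, s in fut.result():
                sim[i][j] = sim[j][i] = s  # mirror the value across the diagonal
    return sim


def cluster(sim, threshold):
    """Assumed clustering rule: connected components of the graph whose
    edges are pairs with similarity >= threshold."""
    n = len(sim)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if label[v] == -1 and sim[u][v] >= threshold:
                    label[v] = current
                    stack.append(v)
        current += 1
    return label
```

Calling `cluster(similarity_matrix(spectra), threshold)` returns one integer label per spectrum. In standard CPython the global interpreter lock keeps pure-Python threads from running on several cores at once, so this sketch shows only how the work is decomposed; an implementation aiming at the speedups reported in the paper would use native threads and vector instructions.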

Notes

  1. Notations of Fset and F-set are used interchangeably in this paper.

Acknowledgements

This work was funded by the operating budget of the Division of Intramural Research, National Heart, Lung and Blood Institute, National Institutes of Health (NIH), Project ZO1-HL001285, and by the National Science Foundation (NSF) under grant CNS-1250264. All the mass spectrometry data were produced at the Proteomics Core at System Biology Center (SBC), NHLBI, NIH.

Author information

Corresponding author

Correspondence to Fahad Saeed.

About this article

Cite this article

Saeed, F., Hoffert, J.D., Pisitkun, T. et al. Exploiting thread-level and instruction-level parallelism to cluster mass spectrometry data using multicore architectures. Netw Model Anal Health Inform Bioinforma 3, 54 (2014). https://doi.org/10.1007/s13721-014-0054-1

  • DOI: https://doi.org/10.1007/s13721-014-0054-1
