Journal of Healthcare Informatics Research, Volume 2, Issue 3, pp 305–318

On Comprehensive Mass Spectrometry Data Analysis for Proteome Profiling of Human Blood Samples

  • Sameer Manchanda
  • Mikaela Meyer
  • Qianqian Li
  • Kai Liang
  • Yan Li
  • Nan Kong (corresponding author)
Research Article
Part of the following topical collections:
  1. Special Issue on Data Mining in Healthcare Informatics


Abstract

To guarantee meaningful interpretation of data in basic and translational medicine, it is critical to ensure the quality of biological samples. Mass spectrometers have become promising instruments for acquiring proteomic information that is known to be associated with sample quality. However, a universally applicable mass spectrometry data analysis platform for quality assessment is still greatly needed. We present a comprehensive pattern recognition study to facilitate the development of such a platform. The study involves feature extraction, binary classification, and feature ranking. We develop classifiers that achieve classification accuracy above 90% in distinguishing human serum samples stored for different lengths of time. We also derive fingerprint patterns of serum peptides that can be conveniently used for temporal classification.
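
The three stages named above (feature extraction, binary classification, feature ranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' method or data: the spectra are synthetic, windowed averaging stands in for feature extraction, a nearest-centroid rule stands in for the classifier, and an absolute standardized mean difference stands in for the ranking criterion. The names `make_spectrum`, `extract_features`, `predict`, and `rank_features`, and the constants `N_BINS`, `WIDTH`, and `SHIFTED_BINS`, are hypothetical.

```python
import random
import statistics

N_BINS, WIDTH = 40, 4          # hypothetical spectrum length and window size
SHIFTED_BINS = (5, 17)         # hypothetical m/z bins that change with storage time

def make_spectrum(long_stored, rng):
    """Synthetic spectrum: Gaussian noise, plus extra intensity in two bins for long storage."""
    spectrum = [rng.gauss(1.0, 0.2) for _ in range(N_BINS)]
    if long_stored:
        for b in SHIFTED_BINS:
            spectrum[b] += 2.0
    return spectrum

def extract_features(spectrum):
    """Feature extraction: mean intensity over consecutive windows of WIDTH bins."""
    return [statistics.fmean(spectrum[i:i + WIDTH]) for i in range(0, N_BINS, WIDTH)]

def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [statistics.fmean(col) for col in zip(*rows)]

def predict(x, c0, c1):
    """Binary classification: assign the label of the nearer class centroid."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return int(d1 < d0)

def rank_features(rows0, rows1):
    """Feature ranking: sort features by absolute standardized mean difference."""
    scores = []
    for j, (col0, col1) in enumerate(zip(zip(*rows0), zip(*rows1))):
        pooled = ((statistics.variance(col0) + statistics.variance(col1)) / 2) ** 0.5
        scores.append((abs(statistics.fmean(col1) - statistics.fmean(col0)) / pooled, j))
    return [j for _, j in sorted(scores, reverse=True)]

# Synthetic training and held-out sets: label 0 = short storage, 1 = long storage.
rng = random.Random(0)
train = [(extract_features(make_spectrum(y, rng)), y) for y in (0, 1) for _ in range(50)]
test = [(extract_features(make_spectrum(y, rng)), y) for y in (0, 1) for _ in range(50)]

rows0 = [x for x, y in train if y == 0]
rows1 = [x for x, y in train if y == 1]
c0, c1 = centroid(rows0), centroid(rows1)

accuracy = sum(predict(x, c0, c1) == y for x, y in test) / len(test)
top_two = rank_features(rows0, rows1)[:2]
print(f"held-out accuracy: {accuracy:.2f}, top-ranked windows: {top_two}")
```

In this toy setting the two windows containing the shifted bins should rank highest, which mirrors the idea of a small "fingerprint" of features that carries most of the temporal signal. Real spectra would additionally need baseline correction, normalization, and peak alignment before any such step.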


Keywords

Proteome profiling; Mass spectrometry; Blood sample; Binary classification; Feature ranking


Funding Information

This study received financial support from NSF grant DMS#1246818 and an industry grant from the Chinese Academy of Sciences Holding Co., Ltd.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Purdue University, West Lafayette, USA
  2. Department of Statistics and Mathematics, Purdue University, West Lafayette, USA
  3. Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
  4. Weldon School of Biomedical Engineering, Purdue University, West Lafayette, USA
