Dimensionality Reduction of Protein Mass Spectrometry Data Using Random Projection

  • Chen Change Loy
  • Weng Kin Lai
  • Chee Peng Lim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4233)


Protein mass spectrometry (MS) pattern recognition has recently emerged as a new method for cancer diagnosis. Unfortunately, classification performance may degrade owing to the enormously high dimensionality of the data. This paper investigates the use of Random Projection (RP) for dimensionality reduction of protein MS data. The effectiveness of RP is analyzed and compared against Principal Component Analysis (PCA) using three classification algorithms, namely Support Vector Machine, Feed-forward Neural Network and K-Nearest Neighbour. Three real-world cancer data sets are employed to evaluate the performance of RP and PCA. In these investigations, RP demonstrated classification performance better than, or at least comparable to, that of PCA, provided that the dimensionality of the projection matrix is sufficiently large. This paper also explores the use of RP as a pre-processing step prior to PCA. The results show that performing RP prior to PCA significantly reduces computation time without sacrificing classification accuracy.
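The RP-then-PCA pipeline described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the projection matrix is constructed, so a dense Gaussian matrix is used; the function names `random_projection` and `pca` are hypothetical; and the data are synthetic noise standing in for spectra, not any of the paper's cancer data sets.

```python
import numpy as np


def random_projection(X, k, seed=0):
    """Project the rows of X (n_samples x d) onto k dimensions.

    Assumption: a dense Gaussian projection matrix, scaled by 1/sqrt(k)
    so that pairwise distances are approximately preserved.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ R


def pca(X, k):
    """Return the scores of X on its first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


# Synthetic stand-in for high-dimensional spectra: 60 samples, 5000 features.
rng = np.random.default_rng(42)
X = rng.standard_normal((60, 5000))

# Step 1: cheap random projection to an intermediate dimensionality.
X_rp = random_projection(X, k=500)

# Step 2: PCA on the much smaller projected matrix.
X_rp_pca = pca(X_rp, k=10)

# Compare one pairwise distance before and after the random projection.
d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(X_rp[0] - X_rp[1])
distance_ratio = d_proj / d_orig
print(X_rp.shape, X_rp_pca.shape)
```

In this sketch the time saving would come from running the SVD on a 60 x 500 matrix instead of a 60 x 5000 one; the intermediate dimensionality of 500 is an arbitrary choice, not a value taken from the paper.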


Principal Component Analysis, Support Vector Machine, Partial Least Squares, Dimensionality Reduction, Projection Matrix





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chen Change Loy (1)
  • Weng Kin Lai (1)
  • Chee Peng Lim (2)
  1. Grid Computing and Bioinformatics Lab, MIMOS Berhad, Kuala Lumpur, Malaysia
  2. School of Electrical & Electronic Engineering, University of Science Malaysia, Nibong Tebal, Penang, Malaysia
