Dimensionality Reduction of Protein Mass Spectrometry Data Using Random Projection
Protein mass spectrometry (MS) pattern recognition has recently emerged as a new method for cancer diagnosis. Unfortunately, classification performance may degrade owing to the extremely high dimensionality of the data. This paper investigates the use of Random Projection (RP) for dimensionality reduction of protein MS data. The effectiveness of RP is analyzed and compared against that of Principal Component Analysis (PCA) using three classification algorithms: Support Vector Machine, Feed-forward Neural Networks and K-Nearest Neighbour. Three real-world cancer data sets are used to evaluate the performance of RP and PCA. In these experiments, RP achieved classification performance better than, or at least comparable to, that of PCA when the dimensionality of the projection matrix was sufficiently large. This paper also explores the use of RP as a pre-processing step prior to PCA. The results show that performing RP before PCA significantly reduces computation time without sacrificing classification accuracy.
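The two reduction pipelines compared above (RP alone, and RP followed by PCA) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say which random matrix was used, so the sketch assumes Achlioptas's sparse matrix with entries √3·{+1, 0, −1} (one common choice); PCA is computed with a thin SVD; the function names and data sizes are hypothetical.

```python
import numpy as np


def random_projection_matrix(d, k, rng):
    """Sparse random matrix (Achlioptas-style) of shape (d, k).

    Entries are sqrt(3) * {+1, 0, -1} with probabilities {1/6, 2/3, 1/6},
    so each entry has mean 0 and variance 1. This choice of distribution
    is an assumption; a dense Gaussian matrix would also work.
    """
    values = np.array([1.0, 0.0, -1.0])
    probs = np.array([1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0) * rng.choice(values, size=(d, k), p=probs)


def random_project(X, k, rng):
    """Map n x d data to n x k; the 1/sqrt(k) factor makes squared
    Euclidean distances preserved in expectation."""
    R = random_projection_matrix(X.shape[1], k, rng)
    return (X @ R) / np.sqrt(k)


def pca(X, m):
    """Project X onto its top-m principal components via a thin SVD."""
    Xc = X - X.mean(axis=0)  # PCA operates on mean-centred data
    # Rows of Vt are principal axes, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 100 spectra with 2,000 intensity features each.
    X = rng.normal(size=(100, 2000))

    Z = random_project(X, 200, rng)  # RP alone: 2,000 -> 200 features
    P = pca(Z, 10)                   # RP then PCA: 200 -> 10 features
    print(Z.shape, P.shape)          # (100, 200) (100, 10)
```

RP needs no training data: the matrix is drawn once and applied by a single matrix multiply. Eigen-decomposing a d × d covariance matrix costs on the order of d³ operations, so running PCA on k ≪ d projected features is one plausible source of the speed-up the abstract reports; how the paper actually computed PCA is not stated here.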
Keywords: Principal Component Analysis; Support Vector Machine; Partial Least Squares; Dimensionality Reduction; Projection Matrix