Protein Expression Molecular Pattern Discovery by Nonnegative Principal Component Analysis

  • Xiaoxu Han
  • Joseph Scazzero
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)


Robustly identifying cancer molecular patterns from high-dimensional protein expression data not only has a significant impact on clinical oncology but also presents a challenge for statistical learning. Principal component analysis (PCA) is a widely used feature selection algorithm that is generally integrated with classic classification algorithms to conduct cancer molecular pattern discovery. However, its holistic mechanism prevents it from capturing local data characteristics during feature selection. This may increase misclassification rates and reduce the robustness of cancer molecular diagnostics. In this study, we develop a nonnegative principal component analysis (NPCA) algorithm and propose an NPCA-based SVM algorithm with sparse coding for the cancer molecular pattern analysis of proteomics data. We report leading classification results from this novel algorithm in predicting cancer molecular patterns on three benchmark proteomics datasets, under 100 trials of 50% hold-out and leave-one-out cross-validation, by directly comparing its performance with that of the PCA-SVM, NMF-SVM, SVM, k-NN and PCA-LDA classification algorithms with respect to classification rates, sensitivities and specificities. Our algorithm also overcomes the overfitting problem in the SVM and PCA-SVM classifications and provides exceptional sensitivities and specificities.
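The evaluation protocol named in the abstract (repeated 50% hold-out trials scored by classification rate, sensitivity and specificity) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the split is stratified by class (the abstract does not say whether it was), and a simple nearest-centroid rule stands in for the classifier. It is not the NPCA-SVM algorithm, whose details are not given here.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT the paper's NPCA-SVM): store per-class feature means."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def rates(y_true, y_pred, positive=1):
    """Classification rate, sensitivity (TP rate) and specificity (TN rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


def stratified_half_split(y, rng):
    """Assumed splitting rule: half of each class goes to training, the rest to testing."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    return train, test


def holdout_trials(X, y, n_trials=100, seed=0):
    """Average (classification rate, sensitivity, specificity) over repeated 50% hold-out trials."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        train, test = stratified_half_split(y, rng)
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        pred = [nearest_centroid_predict(model, X[i]) for i in test]
        results.append(rates([y[i] for i in test], pred))
    return tuple(mean(col) for col in zip(*results))
```

Calling `holdout_trials(X, y)` with a list of feature vectors `X` and binary labels `y` (1 taken as the cancer class, by assumption) returns the three averaged scores. Swapping the two stand-in functions for any other fit and predict pair leaves the protocol unchanged.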


Nonnegative principal component analysis, sparse coding, support vector machine (SVM)



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Xiaoxu Han (1)
  • Joseph Scazzero (2)
  1. Department of Mathematics and Bioinformatics Program, USA
  2. Department of Accounting and Finance, Eastern Michigan University, Ypsilanti, USA
