Feature Extraction for Classification of Proteomic Mass Spectra: A Comparative Study

Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 207)

Abstract

To satisfy the ever-growing need for effective screening and diagnostic tests, medical practitioners have turned their attention to high-resolution, high-throughput methods. One approach is to use mass spectrometry for disease diagnosis, where effective diagnosis is achieved by classifying mass spectra as belonging to healthy or diseased individuals. Unfortunately, high-resolution mass spectrometry data contain a large amount of noisy, redundant and irrelevant information, which makes accurate classification difficult. To overcome these obstacles, feature extraction methods are used to select or create small sets of relevant features. This paper compares existing feature selection methods with a novel method that combines wrapper-based feature selection and centroid-based classification. A key contribution is the exposition of different feature extraction techniques, which encompass dimensionality reduction and feature selection methods. The experiments, on two cancer data sets, indicate that feature selection algorithms tend to both reduce data dimensionality and increase classification accuracy, while the dimensionality reduction techniques sacrifice performance as a result of lowering the number of features. To evaluate the dimensionality reduction and feature selection techniques, we use a simple classifier, which keeps the approach tractable. In relation to previous research, the proposed algorithm is very competitive in terms of (i) classification accuracy, (ii) size of feature sets, and (iii) usage of computational resources during both training and classification phases.
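
The abstract names two ingredients, wrapper-based feature selection and centroid-based classification, without giving their details. The sketch below illustrates how such a pairing can work in general; it is not the chapter's algorithm. It assumes a greedy forward search, squared Euclidean distance to per-class mean vectors, and leave-one-out accuracy as the wrapper's score. The function names and the toy intensity values are invented for illustration and do not come from the chapter's data sets.

```python
from statistics import mean


def centroids(X, y, features):
    """Per-class mean vector, restricted to the chosen feature indices."""
    result = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        result[label] = [mean(r[f] for r in rows) for f in features]
    return result


def predict(x, cents, features):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    def dist(c):
        return sum((x[f] - cf) ** 2 for f, cf in zip(features, c))
    # Sorting the labels makes tie-breaking deterministic.
    return min(sorted(cents), key=lambda label: dist(cents[label]))


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the centroid classifier on these features."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        cents = centroids(X_train, y_train, features)
        hits += predict(X[i], cents, features) == y[i]
    return hits / len(X)


def forward_select(X, y, max_features):
    """Greedy wrapper: repeatedly add the feature that most improves accuracy."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # (score, -index) so that ties go to the lowest feature index.
        scored = [(loo_accuracy(X, y, selected + [f]), -f) for f in remaining]
        score, neg_f = max(scored)
        if score <= best:  # stop when no candidate improves the score
            break
        selected.append(-neg_f)
        remaining.remove(-neg_f)
        best = score
    return selected, best


# Made-up toy "spectra": three intensity features per sample (illustrative only).
# Only feature 1 separates the two classes; features 0 and 2 are noise.
X = [[0.50, 1.0, 3.0], [0.40, 1.2, 1.0], [0.60, 0.9, 2.0],
     [0.50, 5.0, 2.5], [0.45, 5.3, 1.5], [0.55, 4.8, 3.0]]
y = ["healthy"] * 3 + ["diseased"] * 3

selected, best = forward_select(X, y, max_features=2)
print(selected, best)  # → [1] 1.0
```

Because the classifier only stores one mean vector per class, each wrapper evaluation is cheap, which is one plausible reading of why a simple classifier keeps this kind of search tractable.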

Keywords

  • feature extraction
  • classification
  • mining bio-medical data
  • mass spectrometry
  • dimensionality reduction

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Levner, I., Bulitko, V., Lin, G. (2006). Feature Extraction for Classification of Proteomic Mass Spectra: A Comparative Study. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_31

  • DOI: https://doi.org/10.1007/978-3-540-35488-8_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35487-1

  • Online ISBN: 978-3-540-35488-8

  • eBook Packages: Engineering, Engineering (R0)