
Frontiers of Optoelectronics, Volume 10, Issue 3, pp 273–279

Recursive feature elimination in Raman spectra with support vector machines

  • Bernd Kampe
  • Sandra Kloß
  • Thomas Bocklitz
  • Petra Rösch
  • Jürgen Popp
Research Article

Abstract

The presence of irrelevant and correlated data points in a Raman spectrum can lead to a decline in classifier performance. We introduce support vector machine (SVM)-based recursive feature elimination into the field of Raman spectroscopy and demonstrate its performance on a data set of spectra of clinically relevant microorganisms in urine samples, along with patient samples. As the original technique is only suitable for two-class problems, we adapt it to the multi-class setting. It is shown that a large number of spectral points can be removed without notably degrading the prediction accuracy of the resulting model.
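The abstract describes SVM-based recursive feature elimination (SVM-RFE) extended to the multi-class case. As a rough illustration of the idea, and not the authors' actual implementation, the sketch below follows the common multi-class extension of the two-class criterion of Guyon et al.: fit a one-vs-all linear SVM, score each feature by summing its squared weights over all class hyperplanes, and repeatedly discard the lowest-scoring features. The function name, the scoring aggregation, and the synthetic "spectra" are illustrative assumptions; here scikit-learn's `LinearSVC` stands in for whatever SVM implementation the study used.

```python
# Hedged sketch of multi-class SVM-RFE (one-vs-all), not the paper's code.
import numpy as np
from sklearn.svm import LinearSVC  # assumed stand-in SVM implementation


def svm_rfe_multiclass(X, y, n_keep, step=1):
    """Recursively eliminate features with a linear one-vs-all SVM.

    At each iteration a linear SVM is refit on the surviving features;
    feature i is scored by the sum of its squared weights w_i^2 across
    all class hyperplanes (a common multi-class extension of the
    two-class SVM-RFE criterion), and the `step` lowest-scoring
    features are dropped until `n_keep` remain.
    """
    remaining = np.arange(X.shape[1])
    while remaining.size > n_keep:
        clf = LinearSVC(C=1.0, dual=False, max_iter=5000)
        clf.fit(X[:, remaining], y)
        # coef_ has shape (n_classes, n_features) for >2 classes;
        # aggregate squared weights over the one-vs-all machines.
        scores = (clf.coef_ ** 2).sum(axis=0)
        n_drop = min(step, remaining.size - n_keep)
        drop = np.argsort(scores)[:n_drop]
        remaining = np.delete(remaining, drop)
    return np.sort(remaining)


# Toy usage: 20 noisy "spectral channels", two of which carry class signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = np.repeat([0, 1, 2], 40)
X[:, 3] += 1.5 * y  # informative channel (illustrative)
X[:, 7] += 2.0 * y  # informative channel (illustrative)
kept = svm_rfe_multiclass(X, y, n_keep=2)
```

On data like this toy example, the two informative channels survive the elimination while the pure-noise channels are discarded, mirroring the abstract's claim that many spectral points can be removed without hurting the model.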

Keywords

feature selection · Raman spectroscopy · pattern recognition · chemometrics



Acknowledgements

Funding of the research project InterSept (13N13852) from the Federal Ministry of Education and Research, Germany (BMBF) is gratefully acknowledged.


Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Bernd Kampe (1)
  • Sandra Kloß (1, 2)
  • Thomas Bocklitz (1, 2)
  • Petra Rösch (1, 2)
  • Jürgen Popp (1, 2, 3)

  1. Institute of Physical Chemistry and Abbe Center of Photonics, University of Jena, Jena, Germany
  2. InfectoGnostics Research Campus Jena, Center for Applied Research, Jena, Germany
  3. Leibniz-Institute of Photonic Technology, Jena, Germany
