Recursive feature elimination in Raman spectra with support vector machines

  • Research Article
Frontiers of Optoelectronics

Abstract

The presence of irrelevant and correlated data points in a Raman spectrum can degrade classifier performance. We introduce support vector machine (SVM)-based recursive feature elimination into the field of Raman spectroscopy and demonstrate its performance on a data set of spectra of clinically relevant microorganisms in urine samples, along with patient samples. Because the original technique is suitable only for two-class problems, we adapt it to the multi-class setting. We show that a large number of spectral points can be removed without notably degrading the prediction accuracy of the resulting model.
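To make the idea concrete, the following is a minimal sketch of SVM-based recursive feature elimination extended to more than two classes. It is not the authors' implementation: the abstract does not say how the multi-class adaptation works, so the sketch assumes one plausible choice, namely training one-vs-rest linear SVMs and summing the squared weights of each feature across classes (the squared weight being the ranking criterion of the original two-class SVM-RFE). The toy "spectra", the plain SGD trainer, and all function names and parameters are hypothetical and chosen only for illustration.

```python
import random

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=100, seed=0):
    """Fit a linear soft-margin SVM (labels in {-1, +1}) by SGD on the hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def multiclass_scores(X, labels, active):
    """Assumed criterion: sum of squared weights over one-vs-rest SVMs."""
    Xa = [[row[j] for j in active] for row in X]
    scores = [0.0] * len(active)
    for c in sorted(set(labels)):
        y = [1 if lab == c else -1 for lab in labels]
        w, _ = train_linear_svm(Xa, y)
        scores = [s + wj * wj for s, wj in zip(scores, w)]
    return scores

def svm_rfe(X, labels, n_keep):
    """Retrain, drop the lowest-scoring feature, repeat until n_keep remain."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        scores = multiclass_scores(X, labels, active)
        worst = min(range(len(active)), key=scores.__getitem__)
        eliminated.append(active.pop(worst))
    return active, eliminated

def make_toy_spectra(n_per_class=20, seed=1):
    """Hypothetical data: 3 classes, 6 spectral points; point c carries class c."""
    rng = random.Random(seed)
    X, labels = [], []
    for c in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 0.3) for _ in range(6)]
            row[c] += 2.0  # class-specific "peak"; points 3..5 are pure noise
            X.append(row)
            labels.append(c)
    return X, labels

X, labels = make_toy_spectra()
kept, eliminated = svm_rfe(X, labels, n_keep=3)
print("kept:", kept, "eliminated (first to last):", eliminated)
```

The models are retrained after every removal because the weights of correlated features change once one of them is dropped. Real spectra have hundreds of points, so in practice several features would typically be removed per iteration and the stopping point chosen by cross-validated accuracy rather than by a fixed `n_keep`.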

Acknowledgements

Funding of the research project InterSept (13N13852) from the Federal Ministry of Education and Research, Germany (BMBF) is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Jürgen Popp.

Additional information

Bernd Kampe studied bioinformatics at the Friedrich Schiller University Jena, Germany. He subsequently joined the work group of Jürgen Popp in July 2010 to start his Ph.D. studies focused on the identification of microorganisms with micro-Raman spectroscopy and chemometric methods, especially support vector machines. He is currently with the Jena University Language & Information Engineering Lab (JULIE Lab), where his research is aimed at the extraction of information about protein-protein interactions from text.

Sandra Kloß studied chemistry in Jena and received her Ph.D. from the Friedrich-Schiller-University Jena in 2015. She is currently a post-doctoral researcher in the work group of Jürgen Popp. Her main research interests are the isolation of microorganisms from complex matrices and their subsequent Raman spectroscopic and molecular biological investigation.

Thomas Bocklitz studied physics at the Friedrich-Schiller-University. He received his diploma in theoretical physics in 2007 and his Ph.D. in chemometrics in 2011. Dr. Bocklitz is a junior research group leader for statistical data analysis and image analysis, mostly for biophotonic applications. His research agenda is closely connected with the translation of physical information, measured by AFM, TERS, Raman spectroscopy, CARS, SHG, and TPEF, into medically or biologically relevant information. This research has led to over 60 publications in peer-reviewed journals and to his habilitation, which he completed in 2016.

Petra Rösch studied chemistry at the University of Würzburg. She is currently a research associate at the chair of Jürgen Popp at the University of Jena. Her research interests focus on investigating all kinds of biologically, medically, and pharmaceutically relevant problems with various vibrational spectroscopic methods. Her main focus lies on the characterization and identification of microorganisms with Raman spectroscopy.

Jürgen Popp studied chemistry at the universities of Erlangen and Würzburg. After his Ph.D. in Chemistry he joined Yale University for postdoctoral work. He subsequently returned to Würzburg University, where he finished his habilitation in 2002. Since 2002 he has held a chair for Physical Chemistry at the Friedrich-Schiller University Jena. Since 2006 he has also been the scientific director of the Leibniz Institute of Photonic Technology, Jena. His research interests are mainly concerned with biophotonics. Particularly noteworthy is his expertise in the development and application of innovative Raman techniques for biomedical diagnosis.

About this article

Cite this article

Kampe, B., Kloß, S., Bocklitz, T. et al. Recursive feature elimination in Raman spectra with support vector machines. Front. Optoelectron. 10, 273–279 (2017). https://doi.org/10.1007/s12200-017-0726-4

  • DOI: https://doi.org/10.1007/s12200-017-0726-4