Abstract
The presence of irrelevant and correlated data points in a Raman spectrum can degrade classifier performance. We introduce support vector machine (SVM)-based recursive feature elimination into the field of Raman spectroscopy and demonstrate its performance on a data set of spectra of clinically relevant microorganisms in urine samples, along with patient samples. As the original technique is suitable only for two-class problems, we adapt it to the multi-class setting. We show that a large number of spectral points can be removed without notably degrading the prediction accuracy of the resulting model.
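The idea of recursive feature elimination with a multi-class extension can be illustrated with a minimal sketch. It is not the authors' implementation: the one-vs-rest decomposition, the ranking of features by squared weights summed over all binary classifiers, the simple subgradient-descent trainer, and the names `train_linear_svm`, `multiclass_svm_rfe`, and `make_toy_spectra` are all assumptions made for illustration, and the data are synthetic stand-ins rather than Raman spectra.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM (hinge loss + L2 penalty) by batch subgradient descent.

    X is a list of feature lists, y a list of labels in {-1, +1}.
    Returns the weight vector w and the bias b.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def multiclass_svm_rfe(X, labels, n_keep):
    """Backward elimination: drop the lowest-ranked feature until n_keep remain.

    Ranking criterion (an assumption, not taken from the article): squared
    weights summed over all one-vs-rest classifiers.
    Returns (surviving feature indices, indices in order of elimination).
    """
    classes = sorted(set(labels))
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        # Retrain on the surviving features only.
        X_sub = [[row[j] for j in active] for row in X]
        scores = [0.0] * len(active)
        for c in classes:
            y = [1 if lab == c else -1 for lab in labels]
            w, _ = train_linear_svm(X_sub, y)
            for k, wk in enumerate(w):
                scores[k] += wk * wk
        worst = min(range(len(active)), key=scores.__getitem__)
        eliminated.append(active.pop(worst))
    return active, eliminated


def make_toy_spectra(n_per_class=10, n_noise=3, seed=0):
    """Synthetic data: two informative points followed by low-amplitude noise points."""
    rng = random.Random(seed)
    X, labels = [], []
    for label, (a, b) in {"A": (2.0, 0.0), "B": (0.0, 2.0), "C": (0.0, 0.0)}.items():
        for _ in range(n_per_class):
            noise = [rng.uniform(-0.01, 0.01) for _ in range(n_noise)]
            X.append([a, b] + noise)
            labels.append(label)
    return X, labels


X, labels = make_toy_spectra()
kept, dropped = multiclass_svm_rfe(X, labels, n_keep=2)
print("kept:", kept, "dropped in order:", dropped)
```

On this toy data the two informative positions (indices 0 and 1) survive, while the three noise positions are eliminated. Removing one feature per iteration is the simplest variant; for spectra with many points, removing a fraction of the remaining features per iteration would reduce the number of retraining steps at the cost of a coarser ranking.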
Acknowledgements
Funding of the research project InterSept (13N13852) from the Federal Ministry of Education and Research, Germany (BMBF) is gratefully acknowledged.
Author information
Additional information
Bernd Kampe studied bioinformatics at the Friedrich Schiller University Jena, Germany. He subsequently joined the work group of Jürgen Popp in July 2010 to start his Ph.D. studies focused on the identification of microorganisms with micro-Raman spectroscopy and chemometric methods, especially support vector machines. He is currently with the Jena University Language & Information Engineering Lab (JULIE Lab), where his research is aimed at the extraction of information about protein-protein interactions from text.
Sandra Kloß studied chemistry in Jena. In 2015 she received her Ph.D. at the Friedrich-Schiller-University Jena. Currently she is working as a post-doctoral researcher in the work group of Jürgen Popp. Her main research interests are the isolation of microorganisms from complex matrices and their subsequent Raman spectroscopic and molecular biological investigation.
Thomas Bocklitz studied physics at the Friedrich-Schiller-University. He received his diploma in theoretical physics in 2007 and his Ph.D. in chemometrics in 2011. Dr. Bocklitz is a junior research group leader for statistical data analysis and image analysis, mostly for biophotonic applications. Dr. Bocklitz's research agenda is closely connected with the translation of physical information, measured by AFM, TERS, Raman spectroscopy, CARS, SHG, and TPEF, into medically or biologically relevant information. This research has led to over 60 publications in peer-reviewed journals and to his habilitation, which he completed in 2016.
Petra Rösch studied chemistry at the University of Würzburg. She is currently a research associate at the chair of Jürgen Popp at the University of Jena. Her research interests are focused on the investigation of all kinds of biologically, medically, and pharmaceutically relevant problems with various vibrational spectroscopic methods. Her main focus lies on the characterization and identification of microorganisms with Raman spectroscopy.
Jürgen Popp studied chemistry at the universities of Erlangen and Würzburg. After his Ph.D. in chemistry he joined Yale University for postdoctoral work. He subsequently returned to Würzburg University, where he finished his habilitation in 2002. Since 2002 he has held a chair for Physical Chemistry at the Friedrich-Schiller-University Jena. Since 2006 he has also been the scientific director of the Leibniz Institute of Photonic Technology, Jena. His research interests are mainly concerned with biophotonics. In particular, his expertise in the development and application of innovative Raman techniques for biomedical diagnosis should be emphasized.
About this article
Cite this article
Kampe, B., Kloß, S., Bocklitz, T. et al. Recursive feature elimination in Raman spectra with support vector machines. Front. Optoelectron. 10, 273–279 (2017). https://doi.org/10.1007/s12200-017-0726-4