Introduction

Over the last decades, nanomaterials have attracted considerable attention because of their useful properties, such as a large surface-area-to-volume ratio and enhanced electrical and optical responses, mainly related to the quantum size effect [1]. By exploiting these properties, outstanding advances have been achieved in science and technology applications. From colorimetric assays to ultrasensitive fluorescent sensors, the optical properties of nanomaterials have made it possible to design a wide variety of analytical platforms [2]. In this regard, an efficient sensing platform should be capable of characterizing samples with complex matrices, which demands high sensitivity (to monitor relevant changes in composition at trace levels) as well as selectivity and chemical specificity (to capture and unequivocally identify the target molecule(s) present in the system).

Surface-enhanced Raman spectroscopy (SERS) is a nanoscience-based technology that combines high sensitivity with chemical specificity for molecules adsorbed on nanostructured metallic surfaces. Since the discovery and correct interpretation of SERS in the 1970s, advances in nanotechnology have driven its rapid development and analytical applications [3]. This is corroborated by the continuous increase in the number of SERS-related publications every year, as shown in Fig. 1. In contrast to other highly sensitive techniques (e.g., fluorescence spectroscopy), SERS provides an additional spectral fingerprint that stems from the vibrational modes of the molecules. Therefore, structural and chemical information can be obtained with high sensitivity. The selective adsorption of target molecules is nevertheless rarely observed in the analysis of real-world samples, in which the matrix molecules may be much more concentrated than the analyte and competitive adsorption for the available surface is recurrent. Consequently, at the core of every SERS-based analytical method is the challenge of selectively adsorbing target molecules onto the metallic surface before the measurements.

Fig. 1 Published SERS papers since its discovery. The number of recently published papers related to SERS coupled to chemometrics (Chem) and machine learning (ML) is shown in the inset. Source: Web of Science—January 2023 (keywords: SERS, Chemometrics and Machine Learning)

Owing to this lack of selectivity, a complex overlap of SERS signals is often observed. To deal with this issue, one can modify the metallic surface with capture agents that possess a high affinity for the analyte [4, 5]. However, the need for additional fabrication steps and for specific capture agents is the main limitation of this approach. Advanced signal processing methods based on chemometrics and machine learning can alternatively be implemented. The incorporation of these methods has been gaining prominence in SERS, mainly due to improvements in selectivity and automation toward the development of “smart” sensors. As can be seen in Fig. 1 (inset), the number of publications using SERS along with chemometrics and machine learning has rapidly increased in the last 5 years. These multivariate approaches are devoted to extracting key information contained in the data and, therefore, have great potential to boost the development of automated SERS sensors for real-world applications.

In this review, we start by defining some theoretical and practical concepts of SERS, chemometrics, and machine learning to provide a useful background in the context of analytical chemistry. We then survey recent advances in the combination of SERS and a range of different multivariate models for both qualitative and quantitative applications. Limitations, challenges, and trends of this coupling to achieve the much-desired SERS selectivity via advanced data analysis are also discussed. Finally, benchmarking and insights into the selection of a suitable multivariate method for SERS applications are presented.

Surface-enhanced Raman spectroscopy/scattering (SERS)

The intensity of Raman scattering from molecules adsorbed onto nanostructured metals (e.g., Ag, Au, and Cu) is substantially enhanced compared to normal Raman measurements. This enhancement effect, known as SERS, stems from chemical and electromagnetic mechanisms that are described in this section.

The Raman scattering intensity of a vibrational mode of frequency \({\omega }_{vib}\) excited with laser radiation of frequency \({\omega }_{0}\) can be expressed as [6]:

$${I}_{Raman}\propto {({\omega }_{0}\pm {\omega }_{vib})}^{4}N{|{E}_{0}|}^{2}\sum\nolimits_{\rho ,\sigma }{|{\alpha }_{\rho \sigma }|}^{2}$$
(1)

where \({\omega }_{0}\pm {\omega }_{vib}\) is the frequency of the scattered radiation (\(+\) for anti-Stokes and \(-\) for Stokes), N is the number of scattering molecules, \({E}_{0}\) is the incident radiation electric field amplitude, and \({\alpha }_{\rho \sigma }\) are the elements of the Raman polarizability tensor. For molecules with small polarizability, such as water, the Raman and SERS intensities are very weak. This constitutes an important advantage over infrared absorption spectroscopy for studying molecules in aqueous solution. The chemical mechanism is related to modifications of the polarizability tensor upon molecular adsorption, especially in chemisorption, which leads to strong interactions between molecular and metal electronic states [7]. On the other hand, the electromagnetic mechanism stems from modifications of the local field probed by a molecule at the metal surface (distance < 10 nm) and modifications in the dipole radiative emission rate [6]. As the electromagnetic mechanism is mainly responsible for signal enhancement, we will focus on it to highlight analytical applications. It should be noted, however, that important effects due to chemical mechanisms can be observed under specific experimental conditions, such as in spectroelectrochemical studies [8].

For nanostructured materials made of Ag, Au, and Cu, the incident radiation fields can strongly couple with excitations of surface plasmon polariton modes in the metal (collective oscillation of charge density). The general result of such excitations is a considerable increase in the electric field near the metallic surface, especially in the case of coupled nanoparticle systems [9, 10]. As an example, Fig. 2A (black line) presents a numerical simulation based on the generalized Mie theory for the near-field enhancement \({(|E|/|{E}_{0}|)}^{2}\) in the gap of two Ag nanospheres of 50 nm in diameter. At such regions, known as hot spots, it is possible to achieve very large field enhancements, making it even possible to experimentally detect signals from single molecules [11]. For this reason, it is a common analytical practice to promote nanoparticle cluster/aggregate formation before SERS measurements. Another important aspect of Raman scattering and SERS is the induced dipole in a molecule oscillating at a frequency \({\omega }_{0}\pm {\omega }_{vib}\) (see Eq. 1). For such an oscillating dipole, the relaxation may occur either via a radiative decay, giving rise to an inelastically scattered field, or via a non-radiative decay pathway. The key point here is that the radiative decay rate (scattering probability) can be strongly enhanced in the presence of metallic nanomaterials [6]. Figure 2A (red lines) shows the enhancement in the radiative decay rate, i.e., the ratio between rates in the presence and absence of nanoparticles, \(\Gamma /{\Gamma }_{0}\). The plasmon modes increase the scattering intensity associated to an oscillating dipole. Interestingly, the scattering enhancement is dependent on dipole orientation relative to the metal surface. Figure 2A also displays a comparison of the radiative decay rate enhancement for a dipole perpendicular to the metal surface and after 45° rotation. In SERS, such dipole configurations simulate orientations of vibrational modes, and the largest increase in scattering is for vibrations oscillating perpendicularly to the metal surface. This result leads to differential enhancements in the Raman signals among vibrational modes depending on orientations relative to the surface. Therefore, the SERS spectrum may display significant changes in terms of relative band intensities when compared to normal Raman measurements [6].

Fig. 2 Mie theory simulations for a dimer of Ag nanospheres. A Local field enhancement (computed as the squared ratio between local and incident electric fields, black line), and dipole radiative decay rate enhancement (ratio between decay rates in the presence and absence of nanoparticles, red lines). Two dipole orientations are considered: perpendicular and 45° rotation (inset scheme). The field and radiative enhancements are computed as a function of wavelength (incident and scattered radiation). B \({E}^{4}\) approximation simulation of SERS enhancement factor spatial distribution for a 633 nm excitation source

The discussion about the electromagnetic mechanism leads to the following figure of merit describing the expected enhancement in SERS measurements relative to normal Raman:

$$F={(\frac{\left|{E}_{loc}\right|}{\left|{E}_{0}\right|})}_{{\omega }_{0}}^{2}.{(\frac{\Gamma }{{\Gamma }_{0}})}_{{\omega }_{0}\pm {\omega }_{vib}}$$
(2)

where \(F\) is the so-called SERS enhancement factor, considering only the electromagnetic mechanism. The first term represents the near-field enhancement contribution (black line in Fig. 2A), whereas the second term describes the modification in dipole spontaneous emission (red lines in Fig. 2A). Note that, for a dipole perpendicular to the surface, the scattering probability and near-field enhancements are virtually the same, which allows approximating:

$$F\approx {(\frac{|{E}_{loc}|}{|{E}_{0}|})}_{{\omega }_{0}}^{2}.{(\frac{|{E}_{loc}|}{|{E}_{0}|})}_{{\omega }_{0}\pm {\omega }_{vib}}^{2}\approx {(\frac{|{E}_{loc}|}{|{E}_{0}|})}_{{\omega }_{0}}^{4}$$
(3)

where the second approximation, a.k.a. the \({E}^{4}\) approximation [12], holds because of the small difference between the incident and inelastically scattered field frequencies (see Raman shift scale in Fig. 2A).
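
To make the magnitude of Eq. 3 concrete, the short Python sketch below evaluates the \({E}^{4}\) approximation for a few hypothetical values of the local field enhancement; the numbers are illustrative only and do not correspond to the simulations in Fig. 2.

```python
# Illustrative use of the |E|^4 approximation (Eq. 3); the field-enhancement
# values are hypothetical and chosen only to show the scaling.
def sers_enhancement_e4(field_ratio):
    """F ~ (|E_loc|/|E_0|)^4 for a dipole perpendicular to the surface."""
    return field_ratio ** 4

for ratio in (10, 50, 100):  # assumed |E_loc|/|E_0| at the hot spot
    print(f"|E_loc|/|E_0| = {ratio:3d}  ->  F ~ {sers_enhancement_e4(ratio):.1e}")
# A field enhancement of 100 already gives F ~ 1e8, which is why hot spots
# dominate the measured SERS signal.
```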

The largest SERS enhancements can be achieved when the excitation source is in resonance with plasmon excitations tuned via nanoparticle/hot spot engineering. Plasmonic nanomaterial fabrication is a very active research area, and these materials can be obtained by bottom-up or top-down approaches [3, 12]. Lithographic techniques are excellent examples of the latter, in which shapes are crafted on a solid platform. The main limitation of such techniques is that the spatial resolution at which patterns can be created is not cost-effective. Alternatively, one can produce SERS substrates by adsorbing metal nanoparticles onto rigid surfaces to generate hot spots with large enhancement factors [13,14,15]. Wet chemistry synthesis is widely employed as a bottom-up methodology for obtaining colloidal nanoparticles with different shapes (spheres, rods, prisms, stars, cubes, etc.) and sizes. Furthermore, hot spot engineering and self-assembly strategies allow the fabrication of ultrasensitive rigid SERS substrates from these nanoparticles [3, 12]. SERS measurements can be performed by direct analyte addition to the colloid or onto the rigid SERS substrate. In colloidal solution, nanoparticle aggregation yields strong SERS signals from the analyte because of the large number of hot spots created. Nevertheless, uncontrolled aggregation and the natural shape dispersion of particles may decrease SERS repeatability. Controlled aggregation and self-assembly of nanoparticles have been reported in the literature to mitigate this problem [16,17,18,19].

Even under controlled conditions or with high-quality rigid SERS substrates, intensity fluctuations may occur because of the spatial localization of hot spots, especially at low analyte concentrations [20]. Figure 2B shows a 2D map simulation of the enhancement factor distribution in a hot spot at 633 nm excitation. A molecule in the hot spot region should display a large enhancement in the scattering intensities, making it possible to detect spectroscopic signals at ultra-low concentrations (e.g., below \(10^{-12}\) mol L\(^{-1}\)). However, moving a few nanometers from the hot spot center is enough to decrease \(F\) by several orders of magnitude. This means that only molecules approaching the hot spot produce SERS intensities significant enough to be detected at ultra-low concentrations, imposing a spatial limitation on the SERS response. Moreover, since \(F\) is not homogeneously distributed on the metal surface, the SERS intensities do not show a linear behavior with increasing molecular surface coverage [20]. Therefore, SERS intensities do not scale linearly with analyte concentrations at trace levels.

The electromagnetic mechanism predicts the same enhancement regardless of the molecular structure, though the affinity for the metal surface can dramatically change from one molecule to another (e.g., mercaptobenzoic acid displays a much higher affinity for metallic surfaces than glucose). Since the electromagnetic mechanism is dominant in most of the experiments, SERS is not considered a selective technique and depends on the competitive diffusion/adsorption of the molecules onto the metal surface. Thus, a SERS spectrum may contain a highly complex mixture of bands associated with the components of the sample. One strategy to selectively adsorb an analyte present in a complex matrix is to modify the metal surface with capture agents that interact strongly with the analyte [4, 5, 21]. However, this procedure requires additional experimental steps and might require specific capture agents when interferents with chemical structures similar to the analyte are present.

All these pieces of evidence lead to the need for alternative metrics and protocols for using SERS in real-world applications. In this context, chemometrics and machine learning can be useful tools to extract relevant qualitative and quantitative information with minimum sample preparation.

Multivariate data analysis: chemometrics and machine learning

In this section, we describe the fundamentals of multivariate data analysis, with an emphasis on chemometrics and machine learning methods that have already been implemented along with SERS. Although no sharp distinction can be assumed between chemometrics and machine learning, we adopt a terminology compatible with the literature: (1) we refer to chemometrics as those methods based on principal component and factor analysis, related to dimensionality reduction (e.g., principal component analysis, partial least squares, and multivariate curve resolution), and (2) we refer to machine learning as a set of advanced algorithms that learn from data and are applicable for non-linear/high complexity modelling (e.g., support vector machine, artificial neural networks, and random forest). Furthermore, most of the papers discussed in this review are focused on supervised methods, in which the datasets are labeled/tagged with actual answers/values (unlike unsupervised methods).

Data preprocessing

Preprocessing methods are useful to compensate for undesired external sources of signal fluctuation apart from the chemical components of the sample [22]. They can be applied to rows (objects/samples) or columns (variables/Raman shifts) of a data matrix. The most common artifacts in SERS are baseline deviations, fluorescence, misalignment, and noise. Although centering and autoscaling are very popular preprocessing methods, autoscaling is not always recommended since it gives equal weight to spectral bands of different intensities and may amplify noise. Savitzky-Golay smoothing is a well-known method for filtering noise and consists of a moving-window filter in which a polynomial function is fitted [23]. Noise filtering should be used carefully because important chemical information can be lost when using a too-large window size (e.g., more than 15 points per window). Moreover, it might introduce correlations into the noise structure, leading to worse model predictions when compared to the use of raw data [24]. SERS data may also contain linear baseline deviations, such as a constant offset caused by instrumental variations, and/or non-linear ones, such as drifts caused by other physical effects. In baseline correction, the baseline is estimated, for example, by asymmetric least squares or penalized least squares [25], and then subtracted from the spectrum. Derivatives are also useful preprocessing methods to improve the visualization of spectral differences [26], but they might also increase the spectral noise. The first derivative removes constant baseline offsets, whereas the second derivative also removes linear baseline slopes. Although derivatives change the spectral shape and might hinder the interpretation of informative vectors (e.g., spectral loadings and regression coefficients), these can still be related to the original spectra.
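
As a minimal illustration of these steps, the Python sketch below applies Savitzky-Golay smoothing, an asymmetric least squares (Eilers-type) baseline correction, and a first derivative to a synthetic spectrum; all parameter values (window size, polynomial order, lam, p) are arbitrary assumptions and would need tuning for real SERS data.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy import sparse
from scipy.sparse.linalg import spsolve

def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline (Eilers-type); lam and p are
    hypothetical starting values that usually need tuning per dataset."""
    m = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(m, m - 2))
    w = np.ones(m)
    z = y
    for _ in range(n_iter):
        W = sparse.spdiags(w, 0, m, m)
        z = spsolve(sparse.csc_matrix(W + lam * (D @ D.T)), w * y)
        w = p * (y > z) + (1 - p) * (y < z)
    return z

# Synthetic spectrum: one band on a curved, fluorescence-like background.
rng = np.random.default_rng(0)
shifts = np.linspace(400, 1800, 700)                     # Raman shift axis (cm-1)
band = 500 * np.exp(-0.5 * ((shifts - 1078) / 6) ** 2)   # hypothetical SERS band
background = 2e-4 * (shifts - 400) ** 2
spectrum = band + background + rng.normal(0, 5, shifts.size)

smoothed = savgol_filter(spectrum, window_length=9, polyorder=3)
corrected = smoothed - als_baseline(smoothed)
first_derivative = savgol_filter(spectrum, window_length=9, polyorder=3, deriv=1)
```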

Principal component analysis (PCA)

PCA is the most popular unsupervised chemometric method for pattern recognition/exploratory analysis, and the basis of various chemometric models [22, 27]. In SERS, one often handles data matrices containing hundreds or thousands of Raman shifts. PCA decomposes the data matrix by means of linear combinations of correlated original variables (Raman shifts) to create a reduced number of variables (principal components, PCs) containing valuable vibrational information. Figure 3A describes this dimensionality reduction strategy. Hence, a SERS dataset \({{\varvec{X}}}_{{\varvec{n}},{\varvec{m}}}\) (\(n\) samples and \(m\) Raman shifts) can be expressed by matrices of scores \({{\varvec{T}}}_{{\varvec{n}},{\varvec{A}}}\) and loadings \({{\varvec{P}}}_{{\varvec{m}},{\varvec{A}}}\):

$${\varvec{X}}={{\varvec{t}}}_{1}{{\varvec{p}}}_{1}^{T}+{{\varvec{t}}}_{2}{{\varvec{p}}}_{2}^{T}+\dots +{{\varvec{t}}}_{A}{{\varvec{p}}}_{A}^{T}+{\varvec{E}}={\varvec{T}}{{\varvec{P}}}^{T}+{\varvec{E}}$$
(4)

where \(A\) is the selected number of PCs, \({{\varvec{t}}}_{1}\) and \({{\varvec{p}}}_{1}\) are the score and loading vectors of the first PC, respectively, and \({\varvec{E}}\) contains the residual variance not described by the model. Scores (\({\varvec{T}}\)-matrix) provide the composition of each PC related to the samples, whereas loadings describe this composition in terms of variables (\({\varvec{P}}\)-matrix). For SERS data, one can plot the loadings for each PC as a function of Raman shifts to correlate them with differences or similarities among samples visualized in the scores plot. Moreover, the PCs are modeled in descending order of explained variance, the columns of \({\varvec{T}}\) are mutually orthogonal, and the columns of \({\varvec{P}}\) are orthonormal, i.e., uncorrelated. The suitable number of PCs can be determined by assessing the explained/captured variance and the random behavior of the residuals. Occasionally, one can be interested in the information representing a small part of the data variance (e.g., 10% or even less), but should be careful not to include spectral noise information in the model. Outliers can be identified in PCA models by using Hotelling’s \({T}^{2}\) statistics and high \(Q\) residues. While the \(Q\) statistic measures the unexplained variance of each sample (sum of the squares of the residual vectors), Hotelling’s T2 statistic represents the variation in each sample within the model [27]. For objective decisions in outlier detection, confidence limits for these two parameters are estimated at a certain probability level. Nevertheless, outliers should carefully be handled and interpreted.
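
A minimal Python sketch of this decomposition (using scikit-learn and synthetic stand-in data) is shown below; the Hotelling's \(T^{2}\) and \(Q\) values follow the definitions above, and confidence limits for both statistics would still have to be estimated at the chosen probability level.

```python
import numpy as np
from sklearn.decomposition import PCA

# X: (n samples x m Raman shifts) matrix of preprocessed SERS spectra.
# Synthetic stand-in data; replace with real measurements.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 600))

A = 3                                   # chosen number of PCs
Xc = X - X.mean(axis=0)                 # mean-centering (column-wise)
pca = PCA(n_components=A).fit(Xc)
T = pca.transform(Xc)                   # scores   (n x A)
P = pca.components_.T                   # loadings (m x A)

# Hotelling's T2: score distance inside the model subspace
T2 = np.sum(T ** 2 / T.var(axis=0, ddof=1), axis=1)

# Q residuals: squared reconstruction error outside the model subspace
E = Xc - T @ P.T
Q = np.sum(E ** 2, axis=1)
```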

Fig. 3
figure 3

A Dimensionality reduction using PCA in three steps: (1) estimate of PCs, (2) subspace representation, and (3) projection on the PCs of interest. B Schematic representation of PLS by maximizing the correlation between \({\varvec{X}}\) and \({\varvec{Y}}\) matrices

Soft independent modelling of class analogy (SIMCA)

SIMCA is a linear class modelling method introduced by S. Wold in 1976 [28] and used for supervised authentication and classification purposes. SIMCA may be considered as consisting of PCA models individually developed for predefined target classes with the best numbers of PCs selected by cross-validation. As a class modelling algorithm, SIMCA defines multivariate enclosed class spaces at a predefined confidence level. Class distances are estimated as statistical probabilities and SIMCA allows for a straightforward detection of outliers as samples not fitting to any of the modeled classes. Moreover, new samples can be assigned to more than one class when there is class overlapping. SIMCA is also suitable for one-class modelling, required for authentication purposes, mainly in food analysis. In this case, information from any non-targeted class should be avoided during modelling because it may introduce bias related to specific patterns of non-authenticity [29]. Thus, a rigorous one-class modelling considers only the target class information. Criteria for sample assignment in SIMCA are based on both \(Q\) and Hotelling’s \({T}^{2}\) statistics, which are useful for outlier detection and were introduced from the multivariate statistical process control [30].
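
A minimal one-class, SIMCA-style sketch is given below, under the simplifying assumption that the \(T^{2}\) and \(Q\) acceptance limits are taken as empirical percentiles of the training class rather than the usual statistical limits; the data handling mirrors the PCA sketch above.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_simca_class(X_class, n_pcs=2, alpha=0.95):
    """Fit a one-class (SIMCA-style) PCA model for a single target class.
    Acceptance limits are empirical training percentiles, a simplification
    of the usual T2/Q statistical limits."""
    mean = X_class.mean(axis=0)
    pca = PCA(n_components=n_pcs).fit(X_class - mean)
    T = pca.transform(X_class - mean)
    var = T.var(axis=0, ddof=1)
    E = (X_class - mean) - T @ pca.components_
    t2 = np.sum(T ** 2 / var, axis=1)
    q = np.sum(E ** 2, axis=1)
    return {"mean": mean, "pca": pca, "var": var,
            "t2_lim": np.quantile(t2, alpha), "q_lim": np.quantile(q, alpha)}

def simca_accepts(model, X_new):
    """True where a new spectrum falls inside the modelled class space."""
    Xc = X_new - model["mean"]
    T = model["pca"].transform(Xc)
    E = Xc - T @ model["pca"].components_
    t2 = np.sum(T ** 2 / model["var"], axis=1)
    q = np.sum(E ** 2, axis=1)
    return (t2 <= model["t2_lim"]) & (q <= model["q_lim"])
```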

Partial least squares (PLS) and partial least squares discriminant analysis (PLS-DA)

PLS was introduced in the 1980s [31] and has become the most popular multivariate calibration method. PLS models can quantify analytes or properties in the presence of interferences, provided these are included in the calibration samples. PLS decomposes the dataset \({\varvec{X}}\) into scores and loadings, but with the scores estimated in such a way as to maximize the correlation with a dependent variable \({\varvec{y}}\) (or \({\varvec{Y}}\) in the case of more than one). This is accomplished by the simultaneous decomposition of the two data blocks (see Fig. 3B). The factors in PLS are thus constrained and might deviate from orthogonality, giving rise to the so-called latent variables (LVs). An additional advantage of PLS is that a relationship between \({\varvec{X}}\) and \({\varvec{y}}\) may be detected even if the key variables have a small contribution to the first LV. The implementation of PLS in situations in which more than one dependent variable is predicted (\({\varvec{Y}}\)), a.k.a. PLS2, is advisable only when there is a strong correlation between the predicted variables [22]. The determination of the proper number of LVs is crucial for developing robust PLS models and is performed by internal cross-validation based on minimizing the root mean square error of cross-validation (RMSECV). A very small number of LVs may result in underfitting because important information/variance is left out of the model. On the other hand, a very large number of LVs may render an overfitted model unable to provide accurate predictions for external validation/test samples. An indication of overfitting is a root mean square error of prediction (RMSEP) much larger than the root mean square error of calibration (RMSEC). Some inadequate practices include the biased choice of the calibration/validation samples, the absence of an external validation set, or the expansion of the data set by incorporating replicates/virtual samples instead of true independent samples. An accurate PLS model should explain most of the variance of both \({\varvec{X}}\) and \({\varvec{y}}\) blocks.
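
A minimal sketch of LV selection by cross-validated RMSECV, using scikit-learn and synthetic stand-in data (the spectra, reference values, and the 10-fold split are illustrative assumptions), is shown below.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict, KFold

# X: (n, m) SERS spectra, y: (n,) reference concentrations (synthetic here).
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 500))
y = 2.0 * X[:, 100] + rng.normal(0, 0.1, 60)   # arbitrary linear relation

cv = KFold(n_splits=10, shuffle=True, random_state=0)
rmsecv = []
for n_lv in range(1, 11):
    y_cv = cross_val_predict(PLSRegression(n_components=n_lv), X, y, cv=cv)
    rmsecv.append(np.sqrt(np.mean((y - y_cv.ravel()) ** 2)))

best_lv = int(np.argmin(rmsecv)) + 1           # LVs minimizing RMSECV
model = PLSRegression(n_components=best_lv).fit(X, y)
print(f"Selected {best_lv} LVs, RMSECV = {min(rmsecv):.3f}")
```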

PLS-DA is based on PLS regression and is suitable for discrimination/classification purposes [22]. Nevertheless, it is worth emphasizing that there is a key difference between PLS-DA and SIMCA. SIMCA is suitable for class modelling: one has a well-defined target class (or a universe of less well-defined target classes), trains a model, produces a prediction rule, and proceeds to decide whether future samples belong to this specific class [32]. PLS-DA, on the other hand, is suitable for discrimination: one has two or more well-defined classes, trains a model, produces a discrimination rule, and proceeds to assign future samples to one of these previously defined classes. In the PLS-DA model, the response vector \({\varvec{y}}\) (or matrix \({\varvec{Y}}\), with as many columns as classes) consists of values of 0 and 1 indicating whether or not a sample belongs to a given class. The suitable threshold value for class separation can be calculated using Bayes' theory or the receiver operating characteristic (ROC) curve.
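
For a two-class problem, a minimal PLS-DA sketch reduces to PLS regression on a dummy-coded response with a fixed 0.5 threshold; the data below are synthetic stand-ins, and in practice a Bayes- or ROC-based threshold, as mentioned above, is preferable.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Two-class PLS-DA sketch on synthetic data.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 400))
y = np.repeat([0, 1], 25)                      # dummy-coded class labels
X[y == 1, 50] += 1.0                           # arbitrary class-specific band

plsda = PLSRegression(n_components=2).fit(X, y)
y_hat = plsda.predict(X).ravel()
predicted_class = (y_hat >= 0.5).astype(int)   # fixed 0.5 threshold
accuracy = np.mean(predicted_class == y)
```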

Artificial neural networks (ANN)

Inspired by the human brain’s capacity to perform very complex cognitive, perceptual, and control tasks, ANN was first proposed by McCulloch and Pitts in the 1940s [33]. ANN is a powerful mathematical tool used to accurately solve complex classification and regression problems [34]. ANN transforms a vector/SERS spectrum containing \(m\) input variables/Raman shifts, \({\varvec{x}}\boldsymbol{ }\epsilon \boldsymbol{ }{R}^{m}\), into an output signal \(y\) using an activation/transfer function \(f\), a threshold \(b\), and the weight vector \({\varvec{w}}\boldsymbol{ }\epsilon \boldsymbol{ }{R}^{m}\):

$$y=f\left({\varvec{w}}{\varvec{x}}-b\right)=f\left(\sum\nolimits_{i=1}^{m}{w}_{i}{x}_{i}-b\right),\quad f(z)=\left\{\begin{array}{ll} 1, & \text{if } z\ge 0\\ 0, & \text{if } z<0\end{array}\right.$$
(5)

The general idea of ANN is mathematical data processing, which is performed by the weighted sum and the application of an activation function. Alternative activation functions can also be considered, for instance, the rectified linear unit. The ANN model estimates the weight vector \({\varvec{w}}\) and the threshold value \(b\) such that they separate the classes or calculate the actual concentration/property value correctly. The predictive accuracy can be optimized by minimizing an error function of desired/actual and predicted values for all the calibration/training samples. During the learning process, the error value is minimized by gradually updating the parameters \({\varvec{w}}\) and \(b\).

A simple artificial neuron (a.k.a. perceptron) is shown in Fig. 4A and can solve only linearly separable problems. However, the use of specific activation functions, the backpropagation method, and more elaborate ANN architectures enables the solution of highly complex problems. The basic architecture of an ANN consists of input, hidden, and output layers, and the number of hidden neurons directly affects the model performance [34]. For example, Fig. 4B shows a fully connected ANN with two hidden layers. A very large number of hidden neurons may fit the training data well but yield poor predictions (overfitting), whereas a very small number of them may imply a poor learning process in which the error never reaches an acceptable value. Modelling ANNs with several hidden layers is sometimes required to solve highly complex qualitative and quantitative problems, giving rise to deep learning. ANN architectures can be divided into four main groups [34]: (1) single-layer feedforward networks, (2) multilayer feedforward networks, (3) recurrent networks, and (4) mesh networks.
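
A minimal multilayer feedforward example (two hidden layers, scikit-learn's MLPClassifier, synthetic stand-in spectra; the architecture and hyperparameters are arbitrary choices) is sketched below.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Three-class toy problem with synthetic SERS-like spectra.
rng = np.random.default_rng(4)
X = rng.normal(size=(120, 300))
y = np.repeat([0, 1, 2], 40)
for c in range(3):
    X[y == c, 10 * c] += 1.5                 # arbitrary class-specific bands

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                  max_iter=2000, random_state=0),
)
mlp.fit(X_tr, y_tr)
print("Test accuracy:", mlp.score(X_te, y_te))
```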

Fig. 4
figure 4

A Representation of signal processing by ANN. B ANN architecture with multiple hidden layers, which is the basis of deep learning

Support vector machine (SVM) and support vector regression (SVR)

SVM is a machine learning algorithm originally proposed by Vapnik and consists of finding a suitable hyperplane to maximize class separation [35]. Such hyperplane can be estimated using a set of \(n\) samples (e.g., one SERS spectrum per sample) and \(m\) measurable variables/Raman shifts. The dataset of \(n\) individual samples \({{\varvec{x}}}_{{\varvec{i}}}\boldsymbol{ }\epsilon \boldsymbol{ }{R}^{m}\) with labels \({y}_{i }\epsilon \boldsymbol{ }\{-1,+1\}\) is assumed to be separable by a hyperplane (\({\varvec{w}}{\varvec{x}}+b=0\), b is a constant) using the decision rule:

$$f\left({{\varvec{x}}}_{{\varvec{i}}}\right)={\varvec{w}}{{\varvec{x}}}_{{\varvec{i}}}+b,\quad \left\{\begin{array}{ll} {\varvec{w}}{{\varvec{x}}}_{{\varvec{i}}}+b\ge +1 & \text{for } {y}_{i}=+1\\ {\varvec{w}}{{\varvec{x}}}_{{\varvec{i}}}+b\le -1 & \text{for } {y}_{i}=-1\end{array}\right.\quad \text{or}\quad {y}_{i}\left({\varvec{w}}{{\varvec{x}}}_{{\varvec{i}}}+b\right)\ge 1$$
(6)

A geometrical representation of SVM for linearly separable classes is shown in Fig. 5A. Assuming a separation margin \(M\) between the class boundary and the training samples, the objective of SVM is to find a hyperplane that maximizes \(M\). This is an optimization problem that can be solved by using Lagrange functions. Additionally, in the case of not fully separable classes, a slack variable \(\xi\) is introduced:

Fig. 5 A Basic elements in SVM for classification. B Geometrical representation of SVM for non-linearly separable classes

$${y}_{i}\left({\varvec{w}}{{\varvec{x}}}_{{\varvec{i}}}+b\right)-1+{\xi }_{i}\ge 0\, where\, {\xi }_{i}\ge 0\, \forall i$$
(7)

An SVM soft margin is thus built, and the misclassified data points receive a penalty that increases with their distance from the margin. To limit the number of misclassified points, a new parameter \(C\) is introduced to balance \(\xi\) and \(M\). Hence, the objective is to minimize the Lagrangian with respect to \({\varvec{w}}\), \(b\), and \(\xi\), considering the Lagrange multipliers. Once the parameters are calculated, the model can be tested on new objects/samples.

In the case of SVR, the SVM algorithm is adapted to perform regression and it is necessary to predict a real output \({y}_{i} \epsilon R\):

$${y}_{i}={\varvec{w}}{{\varvec{x}}}_{{\varvec{i}}}+b$$
(8)

SVR uses sophisticated penalty functions and there is no penalty if the predicted value \({\widehat{y}}_{i}\) is similar (tolerance value of \(\epsilon\)) to the actual value \({y}_{ref,i}\) (i.e., \(\left|{\widehat{y}}_{i}-{y}_{ref,i}\right|<\epsilon\)). The region \({y}_{i}\pm \epsilon\) is called the \(\epsilon\)-insensitive tube and the output variables that are outside have a slack variable penalty above (\({\xi }^{+}\)) or below (\({\xi }^{-}\)) the tube (\({\xi }^{+}>0, {\xi }^{-}>0 \forall i\)). After calculating the parameters, new predictions are estimated.

In general, to apply SVM for non-linear data, a new dot product is defined by using a family of functions called kernel functions [36]. The direct calculation of the dot product \({\varvec{\phi}}({{\varvec{x}}}_{{\varvec{i}}}){\varvec{\phi}}({{\varvec{x}}}_{{\varvec{j}}})\) without the need for converting each object \({\varvec{x}}\) to the higher dimension object \({\varvec{\phi}}({\varvec{x}})\) is called the kernel trick. It is very useful because several systems are not linearly separable in the original space (see Fig. 5B). Some of the most popular kernel functions are linear, polynomial, radial/Gaussian, and sigmoidal functions.
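
A minimal sketch of kernel SVM classification and \(\epsilon\)-insensitive SVR with an RBF kernel (scikit-learn, synthetic stand-in data; the parameter grids and kernel choice are arbitrary assumptions) is shown below.

```python
import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

# Synthetic SERS-like data for classification and regression.
rng = np.random.default_rng(5)
X = rng.normal(size=(80, 350))
y_class = np.repeat([0, 1], 40)
X[y_class == 1, 20] += 1.0
y_conc = X[:, 20] ** 2 + rng.normal(0, 0.1, 80)   # non-linear target

svc = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid={"svc__C": [1, 10, 100], "svc__gamma": ["scale", 0.001]},
    cv=5,
).fit(X, y_class)

svr = GridSearchCV(
    make_pipeline(StandardScaler(), SVR(kernel="rbf", epsilon=0.1)),
    param_grid={"svr__C": [1, 10, 100]},
    cv=5,
).fit(X, y_conc)
```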

Analytical validation

Though the analytical validation of multivariate methods is not yet thoroughly established, especially for quantitative models, it is essential to show that the developed method fulfills its purpose by estimating proper figures of merit (FOM). For qualitative methods [37, 38], FOM are based on evaluating predicted discrete categorical/dummy variables, including the rates of false positive (FP) and false negative (FN) predictions. Important FOM are the sensitivity rate (STR), complementary to the FN rate, and the specificity rate (SPR), complementary to the FP rate. Predictive accuracy and the F1-score are also important FOM for the global comparison of qualitative methods. Although accuracy is a more intuitive value, the F1-score deals better with imbalanced datasets (classes with very different numbers of samples). ROC curves can also be used to evaluate the classification capability of a model, and the area under the curve (AUROC) is calculated for each class after plotting the sensitivity rate against 1 − specificity (the FP rate). This approach searches for a threshold value that simultaneously minimizes the number of FP and FN predictions.

For multivariate calibration models [22, 39], it is necessary to verify whether the method has adequate FOM for its specific application. FOM evaluated for multivariate calibration methods include accuracy (trueness and precision), linearity, analytical sensitivity, confidence intervals, bias, residual prediction deviation, and limits of detection (LOD) and quantification (LOQ). Furthermore, any model must be rigorously validated by carefully and systematically using calibration and internal and external/test validation sets. Trueness indicates the agreement between reference and predicted values, and RMSEP is the main parameter to express it. Relative errors for calibration and validation samples, as well as precision, should also be estimated and reported. For quantitative multivariate methods, the fit is evaluated through the correlation coefficient (R) of a plot of reference/actual versus predicted values. To assess linearity, an R value close to one and the random behavior of the residuals should be verified by several tools, such as visual inspection, ANOVA, or appropriate statistical tests (checking for normality, homoscedasticity, and independence).
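
The sketch below computes some of these FOM with scikit-learn for illustrative true/predicted values; the numbers are made up and serve only to show the calculations.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, roc_auc_score,
                             mean_squared_error, r2_score)

# Classification FOM from true labels, hard predictions, and scores.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)          # STR, complementary to the FN rate
specificity = tn / (tn + fp)          # SPR, complementary to the FP rate
f1 = f1_score(y_true, y_pred)
auroc = roc_auc_score(y_true, y_score)

# Regression FOM for an external validation/test set.
y_ref = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
rmsep = np.sqrt(mean_squared_error(y_ref, y_hat))
r2 = r2_score(y_ref, y_hat)
```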

Qualitative data analysis in SERS applications

Multivariate methods and qualitative analysis have established a vibrant relationship in recent years, with qualitative applications outnumbering quantitative ones in the literature [40]. Taking advantage of the high sensitivity of SERS, qualitative multivariate applications have focused on the discrimination and detection of pathogens [41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59], cancer studies and medical diagnosis [60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83], and food and forensic analysis [84,85,86,87,88,89,90,91,92,93,94,95,96]. To systematically validate the multivariate models, external sets of independent samples should be used based on clear, systematic, and representative criteria. Internal cross-validation or permutation tests alone are not sufficient to establish a reliable validation, though they may seem tempting in some situations, providing a performance that can look better to a non-specialist at first sight [97]. Unfortunately, a significant number of papers have not adopted an external validation. Another key point often absent in the literature is the spectral interpretation of the model to understand the underlying chemical phenomena. Particularly for PLS-DA, the so-called informative vectors, such as regression coefficients and variable importance in projection (VIP) scores [98], contain important model information that should be interpreted. Authors using multivariate analysis should not be restricted to the question “what is there?” but rather should seek to answer other queries: “what is its/their relation to?” and “what is the difference between?” [99].

Discrimination and detection of pathogens

An example of proper PLS-DA model interpretation has been reported for the rapid discrimination of bacteria using filter paper–based SERS substrates [50]. The PLS-DA model was built with 180 samples (120 for training and 60 for validation), providing 100% accuracy for discriminating genera. Uncertainties for model predictions were calculated by a bootstrap resampling strategy, and an in-depth model interpretation was carried out using VIP scores. This allowed the association of discriminant information with specific SERS bands of purine bases and the identification of possible metabolic pathways. Finally, a potentially new species of bacteria, not present in the training dataset, was correctly identified as an outlier using Q residuals and Hotelling's T2 statistic. The combination of SERS, PLS-DA, and suitable model validation to discriminate Gram-positive and Gram-negative bacteria [42] and sensitive and resistant E. coli strains [46] has also been reported. Spectral interpretations were performed using the loadings of the PCA models, and PLS-DA results were discussed by comparing ROC curves, which are a very useful tool in qualitative analysis. Score plots of the first two LVs were used to show PLS-DA results, as in unsupervised PCA models. However, some authors have argued that this is not the most reliable way to represent the results of supervised models [97, 100]. The interpretation of these scores can also be misleading if the number of variables far exceeds the number of samples, which is a common situation in SERS.

SERS analysis of bacteria using Au nanoparticle (NP) oligomers and ANN has been reported for rapid antimicrobial susceptibility testing [52]. The authors proposed the use of a variational autoencoder, an interesting approach inspired by natural language processing that projects the data into a latent space to improve the visualization of the results. Analogously to PCA, this latent space represents the distribution of the samples and allows the identification of SERS fingerprints of the main metabolites. Following this idea, the discrimination of multidrug-resistant bacteria has been performed by combining SERS and autoencoder-SVM (see Fig. 6) [41]. The autoencoder consisted of feedforward neural networks with non-linear activation functions that feed the data into the SVM model. In contrast to the linear combinations established in PCA-based dimensionality reduction, this method can directly fit non-linear data. However, a much higher computational cost is required.

Fig. 6 A Schematic description of a SERS-based protocol for bacterial resistance assessment using SVM. Figures of merit for the discrimination analysis using B autoencoder-SVM and C PCA-SVM. Reprinted from [41], with permission from Elsevier

SERS and multivariate analysis have also been applied to the detection of viruses. Serum samples containing hepatitis B virus (HBV) from clinically diagnosed patients have been discriminated from controls with excellent accuracy (98%) and proper spectral interpretation of biomarkers [43]. A limitation of this method was the small number of samples analyzed (30 HBV-positive and 11 healthy individuals) that might compromise the reliability and robustness of the proposed PLS-DA model. Additional multivariate methods used along with SERS to discriminate and detect pathogens are highlighted in Table 1. This table includes, for example, the use of linear discriminant analysis (LDA) and decision trees (DT). In LDA, a linear function is calculated by maximizing and minimizing the interclass and intraclass variances, respectively, thus dividing the space into regions/classes. DT is a non-parametric supervised algorithm that uses a flowchart-like tree model and is constituted by a root node, internal nodes (tests), branches (outcomes of the tests), and leaf nodes (class labels) [101].

Table 1 Complementary recent and relevant applications combining SERS and qualitative multivariate methods

Cancer studies and medical diagnosis

Portability is an important trend in spectroscopy for rapid and in situ sample analysis, for example, by incorporating portable methods into consumer devices, point-of-care tests, and smart appliances as part of the internet of things [102]. As can be seen in Fig. 7A–C, a portable SERS-based breathalyzer has been designed for the rapid diagnosis of COVID-19 using PLS-DA [71]. Breath samples were collected from 501 participants, 15% of them COVID-positive, and the method displayed excellent performance (sensitivity and specificity greater than 96%). It is worth mentioning that such an imbalanced class ratio may lead to reliability problems when models are built using only accuracy as the response. The portable SERS device successfully detected breath volatile organic compounds, whose concentrations were affected by the immune response and metabolic changes of COVID-19 patients. Orthogonal PLS-DA (OPLS-DA) is a variant of PLS-DA widely used in metabolomics that allegedly provides better prediction ability and easier model interpretation [103], although this is not a consensus. OPLS-DA has been applied to detect cervical and breast cancer in serum by SERS, jointly with LDA and PCA-SVM [82]. The proposed methods presented excellent accuracies (in the range of 93–98%) and allowed for non-invasive cancer diagnosis.

Fig. 7 A SERS-based sensor for breath analysis, B detection mechanism by surface modification, and C discriminant analysis of healthy and infected individuals. D Deep learning architecture used for SERS diagnostic of brain cancer location. Adapted with permission from [71] and [72]. Copyright 2022 American Chemical Society

One can also take advantage of the excellent predictive accuracy of ANN in medical diagnosis. By using ANN and ultrasensitive Ni-NiO SERS substrates, it has been possible to identify the type of cancer (lung, breast, colorectal, and brain cancer), and even the stage and location of brain cancer, with accuracies of 100% and 96%, respectively (see Fig. 7D) [78]. Importantly, the use of a liquid biopsy of serum instead of a conventional invasive and time-consuming tissue biopsy highlights the capability of SERS to characterize trace levels of cancer biomarkers in complex matrices. Following a similar idea, breast, lung, and colorectal cancer have been discriminated with 100% accuracy by using ANN and a tridimensional SERS nanosensor. Although ANN provided better predictions than PLS-DA, associating discriminant results with chemically meaningful SERS bands is easier in PLS-DA models (e.g., by evaluating the VIP scores). A DNA-target functionalized surface has been combined with SERS in a decision system for detecting a specific oligonucleotide sequence identical to a gene fragment responsible for β-lactam antibiotic resistance [72]. The decision system was composed of a Siamese neural network (a class of neural network architectures containing two or more identical subnetworks) coupled with robust statistics and Bayes decision theory. This strategy was complex but very sensitive, being able to detect target oligonucleotides at the picomolar level.

SVM has also shown excellent performance in medical diagnosis using SERS. SVM-based applications have included the diagnosis of lung and colon cancer [74] and colorectal cancer [75], with accuracies greater than 80%. In the latter application, differences between the mean SERS spectra of healthy individuals and colorectal cancer patients were assessed by PCA before supervised modelling. The preliminary PCA approach allowed the identification of differences in the SERS spectra of the classes by evaluating score and loading plots. It is worth emphasizing that, in addition to the advisable unsupervised exploratory analysis (e.g., by PCA), the discriminant results of the supervised model (e.g., SVM) should be correlated with SERS bands to validate the method. A comparison of SVM with other machine learning algorithms has been performed for the diagnosis/prognosis of coronary artery disease at different stages with 92% overall accuracy [66]. The authors used a Ti/Au SERS substrate to enhance the signal of small extracellular vesicles (EVs) from plasma samples, a new trend in liquid biopsy–based diagnosis. The use of simple and efficient supervised methods for discriminant analysis, such as LDA and decision trees, is a parsimonious alternative to SVM and ANN. For example, a comparison among SVM, RF, DT, and LDA has indicated the superior accuracy of LDA for detecting breast cancer in serum (83%), whereas DT presented the best accuracy in urine (89%) [81]. Importantly, the authors found that the SERS spectra of serum and urine are dominated by purine metabolites, and not sugars, lipids, or coenzymes, as previously suggested in the literature.

SVM, RF, and convolutional neural networks have been compared in cancer studies using a magnetically assisted sandwich assay with ultrasensitive SERS nanotags made of Au@Ag cages [70]. Convolutional neural networks displayed the highest predictive accuracy and were capable of recognizing specific miRNA cancer biomarkers extracted from tissue samples for potential head and neck cancer diagnosis. Although RF displayed a lower predictive accuracy than SVM and ANN, it provided a lower risk of overfitting. SERS has also been employed to differentiate and characterize EVs derived from pancreatic tissue and bone marrow using machine learning [68]. Logistic regression provided better sensitivity (89%) and specificity (88%) than SVM, random forest (RF), naïve Bayes, and CN2 rule induction. Logistic regression is a linear method that estimates class membership probabilities, whereas RF (an algorithm that combines several DTs) and naïve Bayes (a family of classifiers based on Bayes' theory) are non-linear. Moreover, naïve Bayes works better for a small number of uncorrelated variables, which is not the case with SERS data. Though no external validation was performed, the authors suggested the development of a platform for detecting cancer-derived EVs. Additional complementary papers using multivariate models in cancer studies and medical diagnosis are highlighted in Table 1.

Food and forensic analysis

Another important field for SERS is food analysis, with the potential to implement these methods in portable sensors for real-life end-user applications. Although discriminant classification models (PLS-DA and LDA) have been commonly used, class modelling by SIMCA and its variants has also been implemented. This is in connection with the controversy in the literature, mainly in food applications, about discriminant versus one-class modelling [32, 104]. SERS and data-driven SIMCA (DD-SIMCA) have been combined to detect the pesticide thiabendazole in water extracts of mango peels using Au NP-coated paper as substrate [90]. The one-class DD-SIMCA model was rigorously validated by incorporating an outlier detection step and showed good selectivity (95%) and sensitivity (92%) in classifying samples according to the limit established by the Brazilian Health Regulatory Agency. SIMCA and SERS have also been applied in forensic analysis, in which a preliminary liquid–liquid extraction step compensated for the matrix effect in the detection of clonazepam in beer, juice, and energy drink samples [94]. The estimated sensitivity and efficiency were 92 and 96%, respectively. In another forensic application, eleven anti-inflammatory and analgesic drugs typically used in China as adulterants in anti-rheumatic health foods have been detected by thin-layer chromatography coupled with portable SERS and hybrid multivariate methods (PC-LDA, PCA-KNN, and PCA-SVM) [89]. The PC-LDA model presented the best performance in terms of predictive accuracy (100%). However, replicated samples were used in the construction of the discriminant models, which might lead to biased results, and the chemometric discussion was not extensive.

Residues of four different pesticides have been detected in tea by SERS and a one-dimensional convolutional neural network (1D-CNN) [85]. The 1D-CNN model showed superior predictive accuracy (100%) in comparison to SVM, PLS-DA, and KNN. However, an in-depth model interpretation was not presented, probably because of the limited capacity of 1D-CNN to associate outputs with chemically meaningful variables. Alternatively, the trade-off between computational cost and interpretability offered by hybrid methods (e.g., PCA-ANN or PCA-SVM) may help to partially compensate for such limitations. For example, urine samples have been analyzed to detect drugs by combining SERS and various ANN architectures (deep learning networks) with a maximum accuracy of 98% [96]. Although CNN displayed a greater predictive accuracy than the PCA-ANN hybrid model, the latter provided an easier chemical interpretation of the results. Additional papers about the use of SERS and multivariate methods in food and forensic analysis are highlighted in Table 1.

In general, machine learning methods display excellent performance in solving classification problems. Nevertheless, the criteria and the real need to use them instead of simpler chemometric methods are still unclear and, in most cases, not systematically justified. Therefore, investigations beyond the conventional comparison of predictive accuracies will help to clarify these points.

Quantitative data analysis in SERS applications

Currently, one of the main challenges in SERS applications is performing reliable and reproducible quantitative analysis [3]. In the univariate strategy, the height of a SERS band or the ratio between two bands is used; nevertheless, this is limited by the need for identifying and isolating pure signals. When univariate analysis fails, multivariate analysis arises as an excellent alternative to circumvent this issue [105]. Quantitative SERS methods using multivariate analysis have been mainly focused on the determination of organic compounds, including pesticides [106,107,108,109,110,111,112,113,114,115,116], drugs [96, 117,118,119,120,121,122,123,124,125], biomolecules [19, 126,127,128,129,130], and metallic ions [131,132,133,134,135,136,137].

PLS has been the most widely used multivariate calibration method in the literature for pesticide residue detection, probably because of its simplicity, speed, good performance, and easy accessibility. Moreover, some modified PLS regression models are available for non-linear calibration purposes. In a recent publication, dynamic SERS was employed to detect acephate, an organophosphate pesticide, in rice samples [115]. Measurements were collected during the transition from the wet to the dry state of colloidal Au nanorods modified with cysteamine dropped on a silicon chip. PLS, RF, and SVM algorithms were tested for building multivariate calibration models, with 82 samples randomly split into calibration and internal and external validation sets. Alternatively, systematic methods that select the most representative samples for each subset are also advisable [138] (see the sketch below). PLS displayed the best performance, evidenced by the lowest RMSEP value. Therefore, machine learning/non-linear algorithms should not always be the first option for data analysis. As a good practice, preference should be given to the most straightforward and parsimonious model. Furthermore, linearity should preliminarily be evaluated in the concentration range of interest, for example, by using ANOVA and plotting the \({\varvec{X}}\) scores against \({\varvec{y}}\).
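
One widely used systematic splitting approach is the Kennard-Stone algorithm, which selects calibration samples that span the spectral space; a minimal sketch with synthetic stand-in data is given below, without claiming this is the exact procedure of the cited work.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Kennard-Stone selection: start from the two most distant samples,
    then repeatedly add the sample whose minimum distance to the already
    selected set is largest. Returns indices of the selected samples."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    selected = list(np.unravel_index(np.argmax(dist), dist.shape))
    while len(selected) < n_select:
        remaining = [i for i in range(len(X)) if i not in selected]
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(min_d))])
    return np.array(selected)

# Example: pick about two-thirds of the spectra as a calibration set.
rng = np.random.default_rng(6)
X = rng.normal(size=(30, 200))
cal_idx = kennard_stone(X, n_select=20)
val_idx = np.setdiff1d(np.arange(len(X)), cal_idx)
```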

The non-linear relationship between the SERS intensity and thiabendazole pesticide concentration has been demonstrated by Li et al. [108], as the signal depends on the number of nanoparticle clusters and molecules captured in the hot spots. Taking this into consideration, the self-assembly of Au NPs by supramolecular host molecules (acting as bridges) was employed to selectively capture methylxanthine compounds [120]. The advantage of this approach is the generation of hot spots with selective affinity for molecules with similar chemical structures (e.g., theobromine, theophylline, and caffeine). Quantitative analyses were performed by using a non-linear ANN model with excellent predictive accuracy (in terms of RMSE) and LODs down to 50 nmol/L. A bootstrapping random resampling procedure with 1000 iterations was performed to partially compensate for the small number of calibration samples. Machine learning and SERS have also been implemented for the indirect monitoring of pH by measuring changes in the SERS spectra of 4-mercaptopyridine (see Fig. 8) [139]. As a non-linear dependence of the analytical SERS signal on pH was observed, SVM, regression trees, and Gaussian process regression were assessed, with the latter displaying the best performance (lowest RMSE). Nevertheless, the number of samples analyzed was relatively small, which limits method generalization.

Fig. 8 Schematic representation of the pH sensing by the modified SERS substrate and the use of non-linear calibration strategy. Adapted with permission from [139]. Copyright 2022 American Chemical Society

As an alternative to PLS, multivariate curve resolution with alternating least squares (MCR-ALS) has also been used in quantitative SERS applications [19, 110, 126, 128]. MCR-ALS is a factor analysis method that aims at resolving mixture signals, thus providing information on the components of the sample [140]. MCR-ALS decomposes the data matrix (\({\varvec{X}}\)) into scores associated with relative SERS intensities and loadings associated with the pure SERS spectra. Combining MCR-ALS and SERS results in versatile and chemically meaningful pseudo-univariate models, which are compatible with the standard addition strategy and require a small number of samples for calibration [129]. Additionally, similar predictive accuracies in terms of RMSE have been reported for MCR-ALS and PLS-DA models built with SERS data [19].
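
A bare-bones MCR-ALS sketch with non-negativity constraints is shown below, using synthetic two-component data; real implementations (e.g., the pymcr package) add further constraints, normalization, and convergence criteria.

```python
import numpy as np

def mcr_als(X, C0, n_iter=50):
    """Minimal MCR-ALS sketch: alternately solve X = C S^T for the
    concentration profiles C and pure spectra S under non-negativity,
    starting from an initial guess C0 (e.g., from PCA or known spectra)."""
    C = C0.copy()
    for _ in range(n_iter):
        S = np.clip(np.linalg.lstsq(C, X, rcond=None)[0].T, 0, None)
        C = np.clip(np.linalg.lstsq(S, X.T, rcond=None)[0].T, 0, None)
    return C, S

# X: (n_samples, n_shifts); synthetic two-component mixture for illustration.
rng = np.random.default_rng(7)
shifts = np.linspace(0, 1, 300)
S_true = np.vstack([np.exp(-0.5 * ((shifts - 0.3) / 0.02) ** 2),
                    np.exp(-0.5 * ((shifts - 0.7) / 0.02) ** 2)])
C_true = np.abs(rng.normal(size=(20, 2)))
X = C_true @ S_true + rng.normal(0, 0.01, (20, 300))
C_est, S_est = mcr_als(X, C0=np.abs(rng.normal(size=(20, 2))))
```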

Variable selection is a valuable strategy to improve model performance by removing uninformative/interfering variables. In a recent SERS application, a genetic algorithm (GA), competitive adaptive reweighted sampling (CARS), and ant colony optimization (ACO) were combined with PLS to determine the pesticide thiabendazole in citrus using Au nanorods [109]. The GA-PLS model showed better performance (in terms of RMSEP) than the CARS-PLS, ACO-PLS, and full-spectrum PLS models, and the LOD was 0.33 µg/mL. The coffee-ring effect was exploited to build label-free SERS methods to quantify patulin and alternariol, primary mycotoxin contaminants, in fruits (see Fig. 9A) [118]. The coffee-ring effect is mostly employed to preconcentrate the analyte and nanoparticles at the edge of the ring by capillarity, without any external force [141, 142]. Synergy interval PLS (Si-PLS), GA-PLS, and uninformative variable elimination PLS (UVE-PLS) models were evaluated. Si-PLS and GA-PLS yielded the best performances (greatest R values) for patulin and alternariol, respectively. Note that there is no single best variable selection method, as performance depends on the interaction between the method and the data properties. Highlights of additional interesting SERS applications using multivariate and variable selection methods are shown in Table 2.

Fig. 9 A Utilization of the coffee-ring effect in SERS detection of mycotoxins. B Surface modification of AuNPs for metal ion detection and C comparison of actual and predicted concentrations for the quantitative PLS model. Reprinted from references [118] and [134], with permission from Elsevier

Table 2 Complementary recent and relevant applications combining SERS and quantitative multivariate methods

A methodology employing thin-layer chromatography with surface-enhanced Raman scattering (TLC-SERS) has been proposed to quantify histamine, a seafood allergen, in tuna samples [125]. TLC plates coated with gold nanoparticles served simultaneously as SERS substrate and as the stationary phase to separate histamine from the sample matrix. A PCA-SVR hybrid model showed better performance than PLS, demonstrated by the higher R value and lower RMSEP in the concentration range of interest. Advanced mathematical modelling based on machine learning algorithms seems to better fit the complex non-linear behavior of SERS signals. However, this should not be generalized, because an in-depth discussion is still necessary to provide systematic criteria for selecting a suitable machine learning method.

SERS has also found applicability in the quantification of metallic ions present in drinking water and contaminated food. Indirect SERS detection strategies have mainly been adopted for this purpose. For instance, the reduction of Au(III) to produce Au NPs in the presence of aptamers and graphene oxide allowed the indirect detection of Pb2+ ions in black tea. SERS signals were monitored using the Raman reporter 4-mercaptobenzoic acid, and PLS models with variable selection (CARS-PLS, Si-PLS, GA-PLS, and Si-GA-PLS) were compared [135]. In terms of R values, the CARS-PLS model provided the best performance, with a LOD of 0.1 µg/L. RMSEP values were in some cases significantly lower than the RMSEC values, which might indicate a biased choice of validation samples. Interestingly, some reports have reinforced the excellent performance of CARS-PLS for the label-free determination of Cd2+ ions [132] and total arsenic [131] in tea samples, with LODs of 23 pg/L and 27 ng/g, respectively.

Although several recently published articles have combined PLS with variable selection methods, some important aspects have not been systematically discussed, for example, the criteria for selecting a suitable number of LVs, an unsuitable choice of which may lead to underfitted or overfitted models. In a recent publication, a SERS method for quantitatively detecting Zn2+ ions at the nmol/L level using a complexometric indicator was reported (Fig. 9B–C) [134]. In this work, the model was properly built using PLS with 10 LVs, determined by leave-one-out cross-validation. Additionally, the authors followed the ASTM recommendations, and samples were split into about two-thirds for the calibration set and one-third for the validation set. Other important aspects often absent in papers using SERS and multivariate calibration models are spectral interpretation and outlier detection. Informative vectors generated for PLS models should be critically interpreted and associated with the respective spectral bands, and the identification of outliers is crucial for the development of calibration models with good prediction ability.

Benchmarking and considerations for method selection

The main characteristics of the chemometric and machine learning methods successfully incorporated in SERS applications are summarized in Table 3. Note that unsupervised models, such as PCA, were not included because they should always be performed before supervised modelling. The first aspect to consider is the dimensionality of the SERS data, where the number of variables/Raman shifts frequently exceeds the number of samples. Therefore, low-variance models based on dimensionality reduction are good options (e.g., PLS, PLS-DA, SIMCA, linear SVM, and MCR-ALS). On the other hand, if the number of samples is similar to or higher than the number of variables/Raman shifts, high-variance models may be a suitable first choice (e.g., RF, kernel SVM, SVR, and ANN). Moreover, one can consider the following key criteria: (1) predictive accuracy, (2) simplicity/parsimony, (3) interpretability, (4) linearity, and (5) computational cost. For example, the diagnosis of diseases such as cancer and COVID-19 demands highly accurate results because errors could have a dramatic impact on the population. Here, the use of flexible ANN architectures would be a good choice to fulfil such a requirement, despite the complexity (less parsimony), the poor ability to explain what is behind the outputs, and the high computational cost. Obtaining excellent accuracy is always desired; nevertheless, it is just one criterion for deciding whether a model is useful to solve a particular problem. Accordingly, the study of potential biomarkers or biological pathways underlying medical conditions by SERS demands interpretable models (e.g., PLS-DA and RF). In this sense, great efforts are being made to improve the interpretability of machine learning algorithms aiming to change their “black box” nature (e.g., ANN and SVM) [143, 144]. In food analysis, supervised classification and authentication problems are usually faced. Classification and discrimination analysis can be performed by using PLS-DA, LDA, or KNN with good predictive accuracies. On the other hand, rigorous authentication demands the use of SIMCA or other one-class modelling approaches, in which the authentic class is modeled without including the adulterated/non-authentic class information in the training set [29, 32]. This is because the adulterated samples can dramatically change over time and an infinite number of new types of adulteration may occur. Although it has not yet been applied in the SERS field, the use of machine learning for food authentication (e.g., one-class SVM and RF) has gained attention in recent spectroscopic applications [145, 146].

Table 3 Comparison of multivariate supervised methods for SERS data analysis

As can be seen in Table 3, SVM/SVR and PLS/PLS-DA have been the most used methods in SERS applications, probably because of their good predictive accuracies and availability in several software packages. PLS/PLS-DA outperforms SVM/SVR in terms of interpretability and simplicity/parsimony. However, the non-linear behavior of the SERS signals may negatively affect the performance of PLS models and restrict their applicability to narrow linear ranges. It is therefore advisable to test the linearity in the concentration range of interest before going through complex SVR or ANN modelling, which is less parsimonious, reduces the interpretability, and increases the computational cost. Finally, after choosing the suitable multivariate method, only a rigorous validation step will assure the robustness and generalization of the SERS-multivariate method in real-world applications.

Conclusions and outlook

Recent works implementing chemometrics and machine learning in SERS have been described and discussed in this review. These multivariate techniques allow SERS analyses to be performed with excellent predictive accuracy, especially when using ANN. Most qualitative applications have focused on the characterization of pathogens, human health care, and food and forensic analysis. Complex machine learning models with limited interpretability and high computational cost should be avoided for solving simple discrimination/classification problems (principle of parsimony). For example, PLS-DA and SIMCA are interpretable and good options for classification and authentication, respectively, although they might provide lower prediction accuracy for complex SERS data. Most of the published quantitative SERS applications have focused on monitoring pesticides, drugs, biomolecules, and metallic ions. Methods based on PLS, SVR, and ANN provide excellent performance in terms of predictive accuracy; nevertheless, linearity should be assessed beforehand when choosing a suitable/parsimonious multivariate method. Although RF is not yet frequently used in SERS, it combines important advantages, such as good prediction accuracy and interpretability. Future improvements in the interpretability of accurate machine learning algorithms, such as SVM and ANN, will certainly improve the quality of future smart SERS sensors. Moreover, the rigorous validation of multivariate models, by estimating FOM and by using external validation/test sets with independent samples, should become general practice because it will guarantee the robustness and reproducibility of SERS methods in real-world applications. Therefore, combining SERS and multivariate analysis significantly improves selectivity and allows crucial spectroscopic information to be extracted, although the sensing of metallic ions or molecules with very low affinity for metallic surfaces is still limited and requires preliminary surface modifications.