Quantitative Raman spectroscopy for the Ioncell® process: Part 2—quantification of ionic liquid degradation products and improvement of prediction performance through interval PLS

One of the main issues associated with ionic liquids (ILs) is their recyclability. Viable recycling concepts can only be developed if one knows what is in the IL mixtures and solutions. In our previous work, we showed that it is possible to quantify water and 1,5-diazabicyclo[4.3.0]non-5-enium acetate [DBNH][OAc] IL components in liquid mixtures using Raman spectroscopy. In this regard, we considered Raman spectroscopy as a promising analytical method for the inline monitoring and control of the Ioncell® process. In the present work, we push the limits of this analytical method further by extending it to more complex and realistic liquid mixtures including the hydrolysis product 1-(3-aminopropyl)-2-pyrrolidone (APP) that can be formed upon the reaction of 1,5-diazabicyclo[4.3.0]non-5-ene (DBN) with water. Quantifying APP is important in order to measure the extent of the hydrolysis reaction and apply the right corrective measures to reverse the reaction and to maintain the process within the optimal working conditions. The simultaneous quantification of the four components (acetic acid, DBN, APP and H2O) in typical Ioncell® liquid streams is investigated using Raman spectroscopy. The sensitivity of the Raman method in quantifying APP is also highlighted in comparison with refractometry, which is widely applied to measure IL concentration in aqueous mixtures. Finally, we propose simple modifications of the multivariate partial least squares regression model based on a variable selection algorithm to enhance the accuracy of the predicted calibration values.


Introduction
Process analytical technology (PAT) is increasingly adopted for inline analysis and process control. According to Kueppers and Haider, the advantages of integrated process analysis and control comprise better control over the process, safer operations, and significant economic advantages due to better product quality and short troubleshooting delays (Kueppers and Haider 2003).
For process control in industrial settings, waiting times of a few hours for an accurate and precise laboratory analysis are unacceptable. Fast and accurate feedback mechanisms are needed to avoid the production of inferior and substandard products during the analytical delay time. The combination of spectroscopy and chemometrics is ideal for such a situation where a compromise between the delay and accuracy is desired. Some of the accuracy and precision of the laboratory method is sacrificed for getting fast answers that can be used to monitor and control the process continuously (Geladi et al. 2004). In this regard, Raman spectroscopy is increasingly considered as a method of choice for fast, multi-component, inline quantitative analysis and for real-time process monitoring and control (Cooper 1999).
Indeed, a single fast-acquired Raman spectrum, with its well-resolved spectral features, can provide a large amount of information about a sample. The proportional relationship between the Raman scattering intensity and analyte concentration is the basis for most of the quantitative analyses done using Raman spectroscopy (Smith and Dent 2004;Larkin 2011). In a multicomponent system, quantitative Raman analysis relies on the principle of linear superposition: the Raman spectrum of a mixture is equal to the weighted sum of the Raman spectra of the components present in the mixture (Pelletier 2003). The attractiveness of multi-component analysis using Raman spectroscopy is reinforced by the absence of optical coherence between components in the sample, which means that the Raman scattering by one component in the sample does not influence the Raman scattering of another component (Pelletier 2003). Interference can only occur when the absorption spectrum of one or more components significantly affects the transmission of excitation or Raman scattered light to or from the target analyte. Further, possible changes in the interactions between the analytes upon changing their relative concentrations may alter their respective peak shape and intensity in the spectrum (Kauffmann and Fontana 2015).
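As a hedged illustration of the superposition principle described above, the spectrum of an ideal mixture can be written as a weighted sum of pure-component spectra. The symbols $S_{\mathrm{mix}}$, $S_k$, $c_k$ and $K$ are notation introduced here for illustration only and are not taken from the cited references:

```latex
S_{\mathrm{mix}}(\tilde{\nu}) = \sum_{k=1}^{K} c_k \, S_k(\tilde{\nu})
```

Here $\tilde{\nu}$ is the Raman shift, $S_k$ the spectrum of pure component $k$, $c_k$ a weight proportional to its concentration, and $K$ the number of components (four in the mixtures studied here). The relation holds only to the extent that absorption of the excitation or scattered light and interaction-induced changes in band shape and position can be neglected.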
From the perspective of process digitalization and real-time monitoring and control, we showed in our previous work that Raman spectroscopy is a very promising analytical tool for the Ioncell® process (Guizani et al. 2020). Combining Raman spectroscopy and chemometrics would allow a real-time quantification of the protic ionic liquid (IL) 1,5-diazabicyclo[4.3.0]non-5-enium acetate [DBNH][OAc] and water in the process liquid streams.
The formation of APP in the liquid streams may lower or even suppress the cellulose dissolution power of the IL and ultimately lead to the irreversible formation of APPAc. Recently, Hyde and coworkers showed that the hydrolysis reaction could be avoided by controlling the pH (Hyde et al. 2019). Limiting the hydrolysis reaction is hence possible but will not be discussed in this paper. The process viability is therefore dependent on the constant monitoring of APP formation and on reversing the hydrolysis reaction when it occurs. Hence, the inline Raman method must be capable of determining the hydrolysis product concentration in the process streams when protic ILs prone to hydrolysis are used.
Since DBN and APP show structural differences which would lead to distinct scattering signals, we hypothesized that the Raman quantitative analytical method could be extended to mixtures containing APP. Hence, in the present work, we investigate the potential of Raman spectroscopy for the quantification of a more challenging and complex multicomponent mixture of DBN, APP, acetic acid (AcOH) and H2O. We also explore a simple modification of a multivariate regression model algorithm using variable selection, in order to improve the model prediction performance.

Sample preparation
The [DBNH][OAc] stock solution was synthesized using the procedure described in Guizani et al. (2020) (see Table S1 in the Electronic Supplementary Information (ESI)).
Out of those stock solutions and Millipore H2O, forty samples (each weighing more than 3 g) with defined compositions were prepared gravimetrically using an electronic balance with a precision of 0.1 mg. These forty samples can be classified into four categories according to their water content (~0 wt.%, ~25 wt.%, ~50 wt.% and ~75 wt.%). The concentration ranges for the four individual molecular constituents are given in Table 1. The composition of the forty samples is illustrated in Fig. S1 in the ESI. The training set was prepared such that it spans wide ranges of the four analytes' concentrations and covers as wide a range of sample compositions as could be encountered in the process liquid streams.

Refractometry
The reader may legitimately wonder if other simpler inline methods could be considered instead of Raman spectroscopy. We asked ourselves similar questions while screening alternative analytical methods. Refractometry is widely applied for inline process monitoring and control and is suitable to quantify the IL concentration in aqueous mixtures (Liu et al. 2008;Kaneko et al. 2018). Therefore, we considered it as an alternative method that should be investigated and conducted refractive index (RI) measurements on the set of forty samples in order to assess its potential. The RI was measured with a Peltier heated Abbe refractometer (Abbemat 300, Anton Paar, Austria) at 293.15 K.

Raman spectroscopy
Samples were analyzed with an Alpha 300 R confocal Raman microscope (Witec GmbH, Germany) at ambient conditions. Nearly 100 µL of the sample was spread on a microscope concavity slide and covered with a cover glass. The Raman spectra were obtained by using a frequency-doubled neodymium-doped yttrium aluminum garnet (Nd:YAG) laser (532.35 nm) at a constant power of 30 mW, and a Nikon 20× (NA = 0.4) air objective. The Raman system was equipped with a DU970 N-BV EMCCD camera behind a 600 lines/mm grating. The excitation laser was polarized horizontally. After fixing the focus using the microscopy mode, each single spectrum was acquired as an average of 32 scans with an integration time of 0.5 s/scan. In total, forty spectra were collected for the forty mixtures.

Data analysis
Data analysis and plotting were performed with Matlab (The Mathworks, Inc.). The dataset analyzed during this study is available from the corresponding author on reasonable request.

Exploratory data analysis: Principal Components Analysis (PCA)
The spectra were first baseline corrected using a second order polynomial and then area normalized. PCA was done on the pre-processed mean centered spectra. For more details on PCA the reader is invited to read specialized literature (Brereton 2003;Geladi 2003;Geladi et al. 2004).
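The pre-processing chain described above (second-order polynomial baseline, area normalization, mean centering) can be sketched as follows. This is a minimal illustration in plain Python, not the Matlab code used in this study. The function names are hypothetical, and two choices are assumptions made here because the text does not specify them: the baseline is fitted only to user-supplied peak-free anchor points, and the area is computed with the trapezoidal rule.

```python
def fit_quadratic(u, y):
    """Least-squares coefficients (a, b, c) of y ~ a + b*u + c*u**2."""
    s = [sum(ui ** k for ui in u) for k in range(5)]               # sums of u^k
    r = [sum(yi * ui ** k for ui, yi in zip(u, y)) for k in range(3)]
    A = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]  # normal equations
    for col in range(3):                                           # forward elimination
        piv = max(range(col, 3), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for row in range(col + 1, 3):
            factor = A[row][col] / A[col][col]
            A[row] = [v - factor * p for v, p in zip(A[row], A[col])]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                                            # back substitution
        known = sum(A[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (A[i][3] - known) / A[i][i]
    return coef


def trapezoid_area(x, y):
    """Area under y(x) by the trapezoidal rule (x must be ascending)."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def preprocess_spectrum(wavenumbers, intensities, anchor_idx):
    """Subtract a quadratic baseline fitted at anchor_idx, then normalize to unit area."""
    lo, hi = wavenumbers[0], wavenumbers[-1]
    # Rescale the axis to [-1, 1] so the normal equations stay well conditioned
    u = [(2.0 * w - lo - hi) / (hi - lo) for w in wavenumbers]
    a, b, c = fit_quadratic([u[i] for i in anchor_idx],
                            [intensities[i] for i in anchor_idx])
    corrected = [y - (a + b * ui + c * ui * ui) for ui, y in zip(u, intensities)]
    area = trapezoid_area(wavenumbers, corrected)
    return [v / area for v in corrected]


def mean_center(spectra):
    """Subtract the mean spectrum (column-wise mean) from every spectrum."""
    n = len(spectra)
    means = [sum(col) / n for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, means)] for row in spectra]
```

Each spectrum would be passed through `preprocess_spectrum` individually, and `mean_center` would then be applied once to the full set of pre-processed spectra before PCA or PLS.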

Partial least squares regression (PLS)
The PLS1 algorithm was used in this study to generate a model for each of the components in the sample set. The same pre-processing method as described for PCA was adopted for the PLS modelling. The decomposition into latent structures was done by maximizing the covariance between the samples' pre-processed spectra and their specific analyte mean-centered concentrations using the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm (Geladi and Kowalski 1986). The model validation and the selection of the adequate number of latent variables for the PLS model were done using a cross-validation procedure based on the leave-one-out method. The root-mean-square error of cross-validation (RMSECV) was used as a quantitative measure for the selection of the model latent variables (LVs). It was calculated using the following formula:
$$\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where $y_i$ and $\hat{y}_i$ denote the measured and predicted values, respectively, and $n$ the number of samples in the data set.
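The procedure described above can be sketched as follows: a PLS1 model in its NIPALS form, wrapped in a leave-one-out loop that returns the RMSECV, i.e. the square root of the mean squared difference between measured and cross-validation-predicted concentrations. This is a minimal plain-Python illustration, not the Matlab implementation used in this study. The function names are hypothetical, and recomputing the mean centering inside each cross-validation fold is an assumption made here.

```python
import math


def pls1_fit(X, y, n_lv):
    """Fit a PLS1 model (NIPALS form) with up to n_lv latent variables."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [yi - y_mean for yi in y]                               # y residuals
    W, P, Q = [], [], []
    for _ in range(n_lv):
        # Weight vector: direction in spectral space with maximum covariance with y
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break                                               # nothing left to model
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]        # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                          # y loading
        # Deflate X and y before extracting the next latent variable
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(load)
        Q.append(q)
    return x_mean, y_mean, W, P, Q


def pls1_predict(model, x):
    """Predict the concentration for one spectrum x by sequential deflation."""
    x_mean, y_mean, W, P, Q = model
    e = [xj - m for xj, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in zip(W, P, Q):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


def rmsecv_loo(X, y, n_lv):
    """Leave-one-out RMSECV of a PLS1 model with n_lv latent variables."""
    squared_error = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
        squared_error += (y[i] - pls1_predict(model, X[i])) ** 2
    return math.sqrt(squared_error / len(X))
```

Calling `rmsecv_loo` for an increasing number of latent variables and inspecting where the error stops decreasing is one common way to choose the number of LVs.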

Results and discussion
The limitations of refractometry
Refractometry was first considered owing to its simplicity and its proven applicability in analyzing mixtures of ILs and water (Liu et al. 2008; Kaneko et al. 2018). Hence, before tackling the core of this paper, we would like to discuss our choice of further developing the Raman analytical method in the light of the results we got from refractometry. The evolution of the RI for the 40 samples is shown in Fig. 2. In addition to the RI measurements on the 40 samples, we measured the RI of APP-free [DBNH][OAc]/H2O mixtures, in order to assess the sensitivity of RI in detecting the formation of APP.
The RI evolves linearly with the IL mass fraction and the trends are very similar in the presence or absence of APP. Samples measured at similar water content but with large difference in APP content do not show any significant difference in the RI value, though some spread in the RI values can be seen at low dilution levels. Altogether, those results show that refractometry is very limited in probing the extent of DBN hydrolysis to APP and that an alternative more sensitive analytical method is needed.

Raman spectra of the liquid mixtures
The pre-processed Raman spectra of the different mixtures are shown in Fig. 3. The spectra are colored according to the H2O wt.% concentration in the mixture. The samples having the same water content are clearly grouped into four distinct categories corresponding to the four dilution levels. The scattering intensity in the 3000-3700 cm⁻¹ region results from the OH stretching vibrations in the water molecules and increases as the water content gets higher (Sun 2009). Conversely, the scattering intensity in the 300-2000 cm⁻¹ region decreases with the dilution level as it is mainly related to the other molecules. The bending mode of water (~1640 cm⁻¹) (Pavlović et al. 1991) has a low influence due to its low intensity compared to the scattering intensity of the other molecules, as discussed in our previous article (Guizani et al. 2020).

Effects of H 2 O addition
The spectra of the four APP-free samples having different water contents are shown in Fig. 4. The band assignment was done in the light of the existing literature on the Raman spectra-structure correlations and characteristic group frequencies (Larkin 2011).
In APP-free samples, the peaks at 464 and 518 cm⁻¹ originate from the C-N-C bending/deformation modes in DBN. The peak around 740 cm⁻¹ most probably originates from the C-C vibrations in DBN. The two prominent peaks at ~920 and ~2930 cm⁻¹ were ascribed to the C-C and C-H stretching bands in AcOH, respectively. The peaks at ~2890 and ~2980 cm⁻¹ would be attributed to -CH2 in-phase and out-of-phase stretching in DBN. In the water-free sample, the medium intensity band at ~1650 cm⁻¹ was safely attributable to the C=O stretching band from AcOH.
Upon addition of water, the scattering intensity in the 280-3000 cm⁻¹ region decreased notably due to the dilution effect. Conversely, the broad peak related to the OH vibrations in the water molecule (~3100 to 3700 cm⁻¹) increased markedly when increasing the water content. It is well known that the Raman scattering from an analyte can show a strong dependence on the molecule's environment (Kauffmann and Fontana 2015). Thus, in addition to the intensity change due to the concentration of the analyte, band shift and shape modifications result from the change in the molecule's environment.
Fig. 4 Pre-processed Raman spectra of four APP-free IL samples with different dilution levels. The spectra are divided into fingerprint region (bottom) and high frequency region (top) for the sake of clarity
Interactions of the analytes with water molecules via hydrogen bonding are expected upon addition of water. Those would partly explain some modifications other than the intensity decrease in the spectra of the diluted samples. For instance, the C=O stretching band shifted down to ~1600 cm⁻¹, which was most likely due to the structural modifications of the solutions in the presence of water (Nakabayashi et al. 1999; Gofurov et al. 2019). Further, upon addition of H2O, the band at ~900 cm⁻¹ vanished, which might indicate the absence of specific AcOH structures (dimers or trimers) that were only present in the water-free IL. At higher frequencies, the reader can notice a marked intensity decrease in the 2800-2870 cm⁻¹ region, reflecting modifications in the vibrational modes of DBN in the presence of water.

Effects of APP addition
The Raman spectra of APP-free samples and IL samples with APP/DBN = 4.47 mol/mol are shown in Fig. 5 for both cases of nearly water-free mixtures and mixtures with 75 wt.% of water. The reader can notice that there were visible changes in the spectra upon the variation of the APP/DBN ratio regardless of the water content, both in the fingerprint region and in the higher frequency region. In the fingerprint region, the scattering intensity increased in the range of 332-340 cm⁻¹; this increase was assigned to the δ C-C vibrational modes present in the aliphatic amino-propyl chain of APP. The bands at ~464 and ~516 cm⁻¹ decreased markedly upon the addition of APP. Those probably originated from the C-N-C bending/deformation modes, which would have lower intensities in APP than in DBN since APP has only one C-N-C bond after the DBN ring opening. Moreover, the 900-1680 cm⁻¹ region is marked by visible changes in peak intensities and shapes upon the addition of APP. This region probably reflects the different deformation and rocking vibrations of -NH3+ groups as reported in (Socrates 2001). Upon the addition of APP, the peak around ~1640 to 1675 cm⁻¹ broadened markedly. This peak would represent the overlapped contributions from C=O vibrations in the ketone group of APP, and the C=O vibrations from the carboxylic acid group in AcOH. Also, since amine -NH3+ groups also have medium-to-strong absorptions near 1600 cm⁻¹ and 1500 cm⁻¹ due to asymmetric and symmetric deformation vibrations (Socrates 2001), they probably contribute to this peak.
For several peaks, a shift in the wavenumber was noticed and attributed to the presence of water as discussed previously.
In the high-frequency region, the bands at ~2890 and ~2980 cm⁻¹, attributed to -CH2 in-phase and out-of-phase stretching in DBN, decreased markedly upon the addition of APP. Moreover, the peak intensity in the ~3100 to 3500 cm⁻¹ region increased in the presence of APP. The higher intensity might be due to the overlapping contribution of OH groups from H2O and -NH3+ groups from APP. In water-free samples, the stretching vibrations from -NH3+ groups were not visible. According to Socrates (2001), stretching vibrations of -NH3+ groups in amine hydrohalides yield a signal of medium intensity. The same observation stands for the NH+ vibrations in DBN, which do not show a contribution in the high frequency region. We speculate that the solution composition and the molecular environment around the -NH3+ and NH+ groups hinder those vibrational modes. More investigations are needed to elucidate this riddle.
The observed changes in the spectral features due to the variation of the APP/DBN ratio and H2O content encouraged the development of a quantitative analysis using Raman spectroscopy.

Principal components analysis (PCA)
PCA is a method for data reduction and visualization. It is at the core of chemometrics and is commonly used for exploratory multivariate data analysis and unsupervised pattern recognition. In PCA, the dimensionality of the data set is reduced by transforming the original spectral data set into a smaller data set composed of a few uncorrelated variables (PCs), which retain most of the variation present in all the original variables. The aim is to identify the directions of greatest variability in the data and interpret them in terms of the underlying chemistry.
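The decomposition described above can be sketched with the NIPALS form of PCA, which extracts one PC at a time and reports the fraction of the total variance each one explains. This is a minimal plain-Python illustration under stated assumptions, not the Matlab routine used in this study. The function name, the convergence tolerance and the choice of starting vector are hypothetical choices made here.

```python
import math


def pca_nipals(X, n_pc, tol=1e-12, max_iter=1000):
    """Return (scores, loadings, explained variance fractions) for up to n_pc PCs."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - means[j] for j in range(p)] for row in X]   # mean-centered residual
    total_ss = sum(v * v for row in E for v in row)
    scores, loadings, explained = [], [], []
    for _ in range(n_pc):
        # Start from the residual column with the largest sum of squares
        j0 = max(range(p), key=lambda j: sum(row[j] ** 2 for row in E))
        t = [row[j0] for row in E]
        if sum(v * v for v in t) <= 1e-12 * total_ss:
            break                                               # nothing left to explain
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
            norm = math.sqrt(sum(v * v for v in load))
            load = [v / norm for v in load]                     # unit-length loading
            t_new = [sum(E[i][j] * load[j] for j in range(p)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t, t_new))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break                                           # scores have converged
        # Remove the variation captured by this PC before extracting the next one
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        scores.append(t)
        loadings.append(load)
        explained.append(sum(v * v for v in t) / total_ss)
    return scores, loadings, explained
```

The `explained` list plays the role of the per-PC variance percentages, and the number of PCs needed before it levels off gives an estimate of the pseudo-rank of the data matrix.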
PCA was performed on the pre-processed (background corrected, area normalized and mean-centered) data (see Fig. S4 in the ESI for a comparison between the original and the pre-processed, mean-centered spectra). The results show that more than 99% of the variance in the data was captured by the first four PCs (see Fig. S5 in the ESI). The first PC explained 89% of the variance, while the three following ones explained 5.1%, 4.4% and 0.52%, respectively. The pseudo-rank of the data matrix should therefore be three or four, which is physically reasonable since four different molecules were present in the mixtures and contributed to the scattering. Sample N35 showed difficulties during the pre-processing (background subtraction) and had very high Hotelling T² and Q residual values. It was consequently considered an outlier and excluded from the decomposition procedure.
The scores and loadings of the first three PCs are shown in Fig. 6. For PC1, the scores were colored according to the H2O wt.% in the mixtures. PC1 scores were related to the water content in the different mixtures. Samples having the same water content had similar scores on PC1. The corresponding loadings showed positive contributions in the ~280 to 1800 cm⁻¹ region (corresponding mainly to the scattering from DBN, APP and AcOH), and negative ones in the ~3020 to 3720 cm⁻¹ region (corresponding mainly to the scattering from H2O).
PC2 and PC3 separated the samples within each group according to the proportion of DBN in the sum of DBN and APP. PC2 and PC3 also indicated some interesting features. In PC2, the water-free samples and the samples with the highest water content had negative scores and were separated from the less extreme samples, which had 25 wt.% and 50 wt.% H2O, respectively, and positive scores. In PC3, the spread in scores became narrower as the dilution increased, and the shape of the scores for the different samples was very similar to the shape of DBN wt.% in the mixture as shown in Fig. S2 in the ESI. Altogether, the PCA decomposition showed that the variance in the data could be captured almost entirely in the first four components. These four PCs reflected most of the
The PLS model was built on 39 samples after discarding sample N35, which showed some signal anomalies and for which the background correction was unsuccessful, resulting in a high Q residual and a clear outlier behavior when included in the models. The results from the PLS regression are shown in Table 2. As the PLS1 algorithm was adopted for the regression, each component was modeled separately. This choice was motivated by the fact that PLS1 regression results in a lower error than PLS2, with which all components are modeled simultaneously (Brereton 2003).
The number of chosen LVs varied between 3 and 5 for the different models. Each of the models captured more than 96% of the variability in the predictor (spectra) and more than 99% of the variability in the response (concentrations).
Fig. 6 Scores (left) and loadings plots for the three first PCs
The RMSECV values for AcOH, DBN, APP and H2O were 0.23 wt.%, 2.09 wt.%, 1.15 wt.% and 0.64 wt.%, respectively. The models showed better predictive performance for AcOH and H2O than for DBN and APP, although the results were still in a good range for the latter two molecules. Figure 7 shows the cross-validation predicted versus measured concentrations of AcOH, DBN, APP and H2O. The reader can notice that the data points for the four molecules lie very close to the 1:1 identity line. The Pearson correlation coefficient was above 0.99 for the four cases. Overall, the analytical method showed very good results even for more complex mixtures containing APP. Additional data about the PLS models are given in the ESI. Finally, it is important to stress the universality of such analytical methods based on spectroscopy and multivariate analysis. The applications are practically unlimited, and readers are encouraged to try applying them to their own systems where variations in signal intensities could be correlated to the analyte concentration.

Enhancing the PLS model prediction performance through variable selection
The purpose of variable selection is to obtain a model that is easier to understand, and which has a better predictive performance. In searching for the best variable selection procedure, one might be tempted to try all possible combinations of the predictor variables in order to select the best one. However, this turns out to be prohibitive due to the large number of variables and causes a high risk of overfitting when the number of variables is higher than the number of samples (Andersen and Bro 2010). Both conditions are encountered when dealing with spectroscopic data. Therefore, the purpose here is not to search for the best model, but for a more robust one, in terms of prediction and understanding.
In this regard, we adopted a simple method in which the spectral range was divided into 10 subintervals and PLS models were determined based on all possible interval combinations. The algorithm calculated the root-mean-square error of calibration (RMSEC) for all combinations and chose the combination that resulted in the lowest RMSEC. The results are shown in Fig. 8. The reader can see for instance that taking the whole spectrum to predict DBN or APP results in the worst prediction case in terms of the lowest possible RMSEC. For AcOH and H2O, the worst case in terms of the lowest RMSEC is obtained with one subinterval.
The best cases for lowest RMSEC were found between those two extremes. They are summarized in Table 3 with the optimal number of subintervals and the corresponding lowest RMSEC. The reader can notice that with this simple procedure, the model calibration errors can be further reduced.
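The interval search described in this section can be sketched as follows. With 10 subintervals there are 2^10 - 1 = 1023 non-empty interval combinations, so an exhaustive search is feasible. This is a minimal plain-Python illustration, not the Matlab code used in this study. The function names are hypothetical, and using the same fixed number of latent variables for every sub-model and near-equal contiguous blocks of variables are assumptions made here, since the text does not specify them.

```python
import math
from itertools import combinations


def pls1_rmsec(X, y, n_lv):
    """Calibration error (RMSEC) of a PLS1 model (NIPALS form) with up to n_lv LVs."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [yi - y_mean for yi in y]
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
    # After deflation, f holds the calibration residuals of y
    return math.sqrt(sum(v * v for v in f) / n)


def split_intervals(n_vars, n_int):
    """Split variable indices 0..n_vars-1 into n_int contiguous, nearly equal blocks."""
    bounds = [k * n_vars // n_int for k in range(n_int + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(n_int)]


def best_interval_combinations(X, y, n_int, n_lv):
    """For each number of intervals k, return (lowest RMSEC, selected interval indices)."""
    blocks = split_intervals(len(X[0]), n_int)
    best = {}
    for k in range(1, n_int + 1):
        for combo in combinations(range(n_int), k):
            cols = [j for b in combo for j in blocks[b]]
            X_sub = [[row[j] for j in cols] for row in X]
            err = pls1_rmsec(X_sub, y, n_lv)
            if k not in best or err < best[k][0]:
                best[k] = (err, combo)
    return best
```

The dictionary returned by `best_interval_combinations` holds, for each number of subintervals, the lowest RMSEC and the intervals that produced it, which is the kind of information summarized in Fig. 8 and Table 3.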

Conclusion and perspectives
With short analytical delays and acceptable accuracy in determining the expected liquid stream composition, Raman spectroscopy shows significant potential for the monitoring and control of the Ioncell® process. Concentrations of water, IL components and degradation product in the liquid streams could be determined in real time using adequate in-situ Raman probes. The real-time information can be used to monitor and control the process operations.
Compared to the more widely applied refractometry for measuring aqueous IL concentration, Raman spectroscopy reveals a much better sensitivity in detecting the IL degradation products and hence shows a clear advantage. This study further confirmed that the combination of Raman spectroscopy and chemometrics opens the door for reliable monitoring and efficient control of a potentially wide range of IL-based processes.
Fig. 8 The lowest RMSEC values for the 10 best models using an increasing number of intervals
Funding Open access funding provided by Aalto University.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.